[TAP] data type for column metadata
patrick.dowler at nrc-cnrc.gc.ca
Wed Mar 25 10:23:04 PDT 2009
On 2009-03-25 08:13:26 Markus Demleitner wrote:
> The discussion started as "How do we say, in TAP metadata, how a
> given quantity is stored in a database". Since the database is being
> queried through ADQL, and column types restrict what queries are
> syntactically legal, the question probably should be phrased: "How
> does a certain column look to ADQL?".
> A second point was "How is a value for that column transferred?"
> (i.e., do we want a timestamp type or geometry types in VOTable?).
> This is a different, and at least equally involved question.
Agreed - this should be decoupled from the "query writing metadata".
> Every nontrivial ADQL query will have lots of pitfalls, and the
> resulting tables will require lots of interpretation by both the
> client and the human using it. I'm afraid that's what science is
> about, and trying to push much of it into protocols will turn their
> descriptions into physics textbooks. This may be a good thing, but
> will take ages.
> Well, but for both TAP and VOTable data types only the fact that
> *some* time is being stored here is relevant. What time scale it is
> in, whether it's the time some GRB could be detected on Antares or
> the start of an exposure has nothing to do with data or legal queries
> but is entirely an issue of metadata and interpretation.
> I'm not saying it's unimportant, I'm just saying that it doesn't
> matter for purposes of storing or transferring data. I would also
> maintain that it doesn't matter for writing queries, since, as
> pointed out above, I have strong doubts that we want to build so
> much physics into your protocol that queries "violating physics"
> become some kind of error.
Also agree 100%. There are many many ways to write ADQL (SQL) queries that
generate nonsense results. Only the query writer can do anything about that.
> [My take: *If* we want to expose metadata this detailed up front, why not
> use the proven techique of DAL v1 and define some way to retrieve
> an "empty" VOTable for a given query? There are pitfalls in this as
> well, and I'd rather see an implementation of this before it ends up
> in a standard, but I think that would be the least intrusive way]
I recall that this option is available by setting MAXREC=0 when doing the
query. However, Gerard has argued in the past that populating the result
VOTable metadata from the query can be simple (if the select contains column
names) and very complex if it contains expressions.
> > Please do not use ISO-8601 with timezone indicators - just the
> > restricted form I gave above. Only use UTC - Timezone 0 - conforming
> > with current astronomical usage.
> Full support here. *Don't* mess with time zones, DST, and all those
> other horrors. The laws of physics are complex enough, we don't want
> to have to account for the laws of 200+ governments on this planet.
The reason I mentioned timezone is that without it people have to know/assume
the timezone is to correctly parse the string into a number, which is how
software actually works. Now, we can just say "all timestamps are in UTC" but
that is something we have to say and there will be existing databases out
there with timestamps in local time.
I am OK with requiring UTC and avoiding time zones (which are a pain), but the
implication of that is that TAP services with non-UTC timestamps must:
* be able to parse the query and transform all date constants from UTC to the
timezone they use
* convert selected values from local -> UTC in the output
I have argued elsewhere that I think TAP services will need to parse the query
anyway due to their own DB behaving differently than the ADQL+TAP behaviour,
so I don't personally think this is a major impediment. It is the implication
Note: It is quite possible that if we did the opposite that services would
still have to parse and deal with the serialised timestamp outside the DB
anyway (say their DB doesn't like the format with timezone).
> So -- I'll support basically any choice of SQL types being part of
> TAP metadata; I think we definitely should have timestamps in there
> (though of course, people are free to store times in floats, integers
> or varchars if that's appropriate; it's just that there is one
> standard way to store and transfer timestamps, and everything else
> has to be manually supported).
I take this as a proposal for one allowed iso8601 timestamp variant (mentioned
earlier, without timezone) and a requirement that all timestamps are in UTC.
> A quick inspection of the ADQL
> grammar seems to suggest that it's POINT and REGION indeed:
> <coord_value> ::= <point> | <column_reference>
> -- which singles out POINT from other geometry value expressions.
> This reasoning would break if we anticipated the need to define
> functions taking only certain types of geometries. Do we?
I don't anticipate more functions that take specific geometry types as
arguments. I think one can look at this as there only being just the two
types and the other functions (CIRCLE, BOX, POLYGON) as being constructors
that also create a REGION. In OO one might have REGION as the base class and
these constructors imply the presence of subclasses.... but SQL is not OO so
I don't think one is forced to interpret it that way.
Tel/Tél: (250) 363-0044
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2M7
Centre canadien de donnees astronomiques
Conseil national de recherches Canada
5071, chemin West Saanich
Victoria (C.-B.) V9E 2M7
More information about the dal