dtody at nrao.edu
Mon Jul 13 11:17:10 PDT 2009
On Mon, 13 Jul 2009, Francois Ochsenbein wrote:
> First, the question of TAP result in a single *table* : Alberto's
> question is quite right, and I'm afraid the reduction of the
> result to a single table will generate problems for us (vizier)
> and likely for other services. Yes the relational model implies
> that the result of any query is a single table -- but sticking
> to this means that queries like "give me all objects from any
> table in this region of the sky" are not possible. Such questions
> however are quite frequent... How to deal with those ? I see
> only the following alternatives if TAP sticks to a single
> output table:
> a. the client asks for tables existing in the service;
> upon the answer (7896 tables), the client generates
> 7896 queries. Not really realistic :-(
> b. the server creates some kind of minimal common schema
> between all these tables -- in practice this can only be
> the position and the table name (i.e. a 3 column table).
> But then you have to get more details about each result,
> details concerning data as well as metadata.
> Therefore you still have to generate many 'children' queries.
> Or should services like vizier give up on TAP ?
This is an important use case, but not really a conventional (relational)
table access problem. It is getting more into the domain of the other
DAL services which have data models. Some possible approaches:
o For this specific case (find tables with data in some region) PQL
could be used since it has a data model. For example, query
TAP_SCHEMA.tables with POS,SIZE or REGION specifying the region
of interest. Other simple constraints could be specified as well.
o More generally we could use the Generic Dataset (GDS) query.
The GDS (Observation) data model can describe any kind of
dataset, including tables (also images, spectra, etc.). So if
Vizier provides a global index table based upon the GDS model
it could be queried with either PQL or ADQL in TAP.
o A footprint service could also be used, although this is much
the same here as a GDS query using REGION.
In the first two of these cases the response is a single table. In the first
case it contains TAP_SCHEMA.tables metadata. In the second case it
contains GDS metadata providing a richer description of the tables,
with the possibility of data links pointing to either the table files
(if small) or to services which can be used to access the data.
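
As a rough sketch of the first approach, a client might build a PQL-style
request against TAP_SCHEMA.tables as below. Only POS, SIZE, REGION and
TAP_SCHEMA.tables come from the discussion above. The endpoint URL, the
LANG and FROM parameter names, and the value formats (comma-separated
decimal degrees) are assumptions made for illustration, not taken from the
TAP draft.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "http://example.org/tap/sync"


def build_discovery_params(ra_deg, dec_deg, size_deg):
    """Return request parameters asking which tables have data in a region."""
    return {
        "LANG": "PQL",                 # assumed parameter name and value
        "FROM": "TAP_SCHEMA.tables",   # assumed parameter name
        "POS": f"{ra_deg},{dec_deg}",  # assumed format: "ra,dec" in degrees
        "SIZE": str(size_deg),         # assumed format: degrees
    }


def build_query_url(params):
    """Encode the parameters into a GET URL for the hypothetical endpoint."""
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url(build_discovery_params(180.0, -30.5, 0.25)))
```

The point is that one request replaces the thousands of per-table queries:
the response would be a single table of TAP_SCHEMA.tables metadata for the
tables overlapping the region.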
> 2.3.5: it looks strange to me that constraints can be ignored in PQL.
> If a table is queried with just a constraint on TIME, and there
> is no time in the table, the fact that this parameter is
> ignored results in a dump of a (potentially very large) table.
> Similarly for POS query (section 1.1.5) -- if the table
> queried has no position, is it really a good solution to
> return the whole table ? Hopefully this is not possible
> with ADQL :-)
Again, I think people misunderstand what was meant by this. We should
just remove this from PQL as it is specific to the semantics of SIA/SSA
whereas PQL is a table query interface. When querying an actual table
the semantics need to be precise. This is different from global data
discovery in SIA and the like, where the same query is posed to many
services, each of which may provide a different subset of metadata.
Precise queries cannot easily be used in such a case; rather, we need an
iterative query, which is what the S*AP interfaces provide.
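
To make the distinction concrete, here is a minimal sketch of the two
behaviours on an in-memory table. Everything in it (the function name, the
exception, the range-constraint representation) is hypothetical and is not
how any TAP service is specified to work. It only shows why ignoring an
inapplicable constraint amounts to a full table dump, while precise
semantics reject the query.

```python
class UnknownConstraintError(ValueError):
    """Raised when a constraint names a column the table does not have."""


def select_rows(rows, columns, constraints, strict=True):
    """Filter rows by range constraints given as {column: (lo, hi)}.

    strict=True  : precise table-query semantics; a constraint on a
                   missing column is an error.
    strict=False : discovery-style semantics; a constraint on a missing
                   column is silently ignored.
    """
    applicable = {}
    for col, bounds in constraints.items():
        if col in columns:
            applicable[col] = bounds
        elif strict:
            raise UnknownConstraintError(f"table has no column {col!r}")
        # In lenient mode the constraint is dropped without notice.
    return [
        row for row in rows
        if all(lo <= row[col] <= hi for col, (lo, hi) in applicable.items())
    ]


if __name__ == "__main__":
    table = [{"ra": 10.0}, {"ra": 20.0}]
    # Lenient mode: the TIME constraint is dropped, so every row comes back.
    print(select_rows(table, {"ra"}, {"TIME": (0, 1)}, strict=False))
```

With strict=True the same call raises UnknownConstraintError instead of
returning the whole table.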
> 2.3.8: MTIME -- I still have problems with this. A service may have
> some tables which have such timestamp columns (typically
> TAP_SCHEMA tables) while other tables do not have this information.
> I can't therefore see this feature as a service-wide feature,
> and the MTIME capability would need to be specified in
> the TAP_SCHEMA (section 2.6.2)
MTIME is supposed to be a parameter query, hence it need not specify
how update/delete/add metadata is maintained internally.