Asynchronous querying and tabular data
dtody at nrao.edu
Wed May 2 12:51:18 PDT 2007
Good - sounds to me like we are most of the way there.
The async query operation both executes a query and stages the data,
so I think either name would work, and stageData is more consistent
with the other interfaces. But the operation could have a different
name for TAP if we feel this is important. The more important thing
is that it does essentially the same thing as the other versions;
probably almost everything could be common except for the table-specific
content of the "stageData" request.
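To make the "almost everything could be common" point concrete, here is a minimal Java sketch, assuming a UWS-style job with a small subset of phases. The job lifecycle (start, poll, get an acref) is generic; only the request content, here an ADQL string, is interface-specific. The names StagingJob, Phase and StagingDemo, the table survey.sources, and the example.org acref are all hypothetical, chosen for illustration only.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical UWS-style job phases (a subset, for illustration only).
enum Phase { PENDING, EXECUTING, COMPLETED, ERROR }

// Hypothetical generic staging job: the lifecycle is common to every interface;
// only the request content R (e.g. an ADQL string for TAP) is interface-specific.
final class StagingJob<R> {
    private final R request;
    private final Function<R, String> stager; // executes the request and returns an acref
    private volatile Phase phase = Phase.PENDING;
    private volatile String acref;            // access reference, written before COMPLETED

    StagingJob(R request, Function<R, String> stager) {
        this.request = request;
        this.stager = stager;
    }

    // Start execution in the background; the client monitors progress by polling phase().
    void start(ExecutorService pool) {
        phase = Phase.EXECUTING;
        pool.submit(() -> {
            try {
                acref = stager.apply(request);
                phase = Phase.COMPLETED;
            } catch (RuntimeException e) {
                phase = Phase.ERROR;
            }
        });
    }

    Phase phase() { return phase; }
    String acref() { return acref; }
}

class StagingDemo {
    // Submit a TAP-style query, poll until the job leaves EXECUTING, return the acref.
    static String runDemo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        StagingJob<String> job = new StagingJob<>(
            "SELECT ra, dec FROM survey.sources",           // hypothetical table
            adql -> "http://example.org/vospace/job1.vot");  // placeholder acref
        job.start(pool);
        try {
            while (job.phase() == Phase.EXECUTING) {
                Thread.sleep(10);                            // simple polling interval
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return job.phase() == Phase.COMPLETED ? job.acref() : null;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

A different interface would reuse StagingJob unchanged and supply its own request type and stager function.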
Based on these discussions, what we have at this point for a
simple-as-possible interface is something like:
o queryData. Synchronous data queries against a single
table or tableSet. Could also be used to query metadata
if we wish, by querying SCHEMA.tables and SCHEMA.columns
(following the information schema concept, but omitting most
of it and putting our own custom metadata in these tables).
Alternatively, separate methods could be used for metadata
queries. In either case, a TAP metadata query could be used
to generate table metadata to support registry queries.
o getData. Takes just an access reference (currently a URL), as
elsewhere; so far in TAP this is only needed to retrieve data
from an async query.
o stageData (or queryDataAsync etc.). Executes a query
asynchronously, staging the output table in a local or remote
VOSpace. Standard UWS-like mechanisms (polling, messaging) can
be used to monitor the progress of a job once execution begins.
When the job completes, an acref can be returned to the client,
or the table can be delivered directly to the client's VOSpace.
o getCapabilities - standard
o getAvailability - standard
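For those who think in terms of a Java API, the operation set above might map onto an interface roughly like the following. This is only a sketch: the operation names come from the list, but every parameter and return type, the InMemoryTap toy, its handling of SCHEMA.tables, and the example.org acref are assumptions. The toy understands only "SELECT * FROM <table>" and stages synchronously, whereas a real stageData would run as a UWS-like job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java rendering of the minimal TAP operation set.
// Operation names follow the proposal; all types are assumptions.
interface TapService {
    List<Map<String, Object>> queryData(String query);  // synchronous query
    List<Map<String, Object>> getData(String acref);    // fetch data by access reference
    String stageData(String query);                     // async query; yields an acref when done
    String getCapabilities();                           // standard
    boolean getAvailability();                          // standard
}

// Toy in-memory service for illustration only.
final class InMemoryTap implements TapService {
    private static final Pattern SELECT_ALL = Pattern.compile("(?i)SELECT \\* FROM (\\S+)");
    private final Map<String, List<Map<String, Object>>> tables = new HashMap<>();
    private final Map<String, List<Map<String, Object>>> staged = new HashMap<>();

    void addTable(String name, List<Map<String, Object>> rows) {
        tables.put(name, rows);
    }

    @Override
    public List<Map<String, Object>> queryData(String query) {
        Matcher m = SELECT_ALL.matcher(query.trim());
        if (!m.matches()) throw new IllegalArgumentException("unsupported: " + query);
        String name = m.group(1);
        if (name.equalsIgnoreCase("SCHEMA.tables")) {
            // Metadata exposed as an ordinary table, in the spirit of the information schema.
            List<Map<String, Object>> rows = new ArrayList<>();
            for (String t : tables.keySet()) rows.add(Map.<String, Object>of("table_name", t));
            return rows;
        }
        List<Map<String, Object>> rows = tables.get(name);
        if (rows == null) throw new IllegalArgumentException("no such table: " + name);
        return rows;
    }

    @Override
    public String stageData(String query) {
        // Stages immediately; a real service would return only after the job completes.
        String acref = "http://example.org/tap/staged/" + (staged.size() + 1);
        staged.put(acref, queryData(query));
        return acref;
    }

    @Override
    public List<Map<String, Object>> getData(String acref) {
        return staged.get(acref);
    }

    @Override public String getCapabilities() { return "queryData stageData getData"; }
    @Override public boolean getAvailability() { return true; }
}

class TapDemo {
    public static void main(String[] args) {
        InMemoryTap tap = new InMemoryTap();
        tap.addTable("survey.sources", List.of(Map.<String, Object>of("ra", 10.5, "dec", -3.2)));
        System.out.println(tap.queryData("SELECT * FROM SCHEMA.tables"));
        String acref = tap.stageData("SELECT * FROM survey.sources");
        System.out.println(acref + " -> " + tap.getData(acref));
    }
}
```

Routing metadata through an ordinary table keeps queryData as the single entry point; separate metadata methods would be the alternative mentioned in the list.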
I think there is still some work to be done (beyond what VOResource
defines) to define tableset and table/column metadata to support
complex queries against large tables; however, I recognize that others
don't necessarily agree with this.
Does this sound like it would work?
On Wed, 2 May 2007, Patrick Dowler wrote:
> On Wednesday 02 May 2007 11:07, Doug Tody wrote:
>> A single service could support both: queryData for synchronous DM and
>> ADQL-based queries, and optionally stageData for async/staged execution.
>> The client would then either have to guess which to use, or try a
>> few smaller synchronous queries first to determine what to do, and
>> then resubmit a larger query as a batch job.
> I said earlier that I think this is what we need (single-step sync and async
> querying methods). For what it's worth, I think they both need the word "query"
> in the name, for the simple reason that this is what people new to the API
> will look for. For example, when you first learn the JDBC API, you poke
> around aimlessly until you find the Statement interface and the method:
> ResultSet executeQuery(String sql)
> That is where you start. Then you learn about Connection and DriverManager
> (maybe DataSource) and ResultSet... but you grok it when you see that one
> method signature.
> ** I think it is really important for TAP to have this kind of very clear
> focal point. **
> PS: Sure, JDBC is a nightmare of bad design otherwise, but once someone finds
> that method signature they can proceed from there and get something working
> quite quickly.
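The client strategy raised in the quoted text (try a smaller synchronous query first, then resubmit a larger query as a batch job) could look something like this. It is a sketch under stated assumptions: QueryPlanner, Outcome, the maxRows row limit, and the two function parameters standing in for queryData and stageData are all hypothetical, not part of any agreed TAP interface.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical client-side strategy: probe with a row-limited synchronous query,
// and fall back to a staged (async) job only when the probe hits the limit.
final class QueryPlanner {
    // Either the rows are in hand (sync was enough) or an acref points at staged output.
    static final class Outcome {
        final List<String> rows;
        final String acref;
        Outcome(List<String> rows, String acref) { this.rows = rows; this.acref = acref; }
        boolean staged() { return acref != null; }
    }

    private final BiFunction<String, Integer, List<String>> syncQuery; // (query, maxRows) -> rows
    private final Function<String, String> stageData;                  // query -> acref

    QueryPlanner(BiFunction<String, Integer, List<String>> syncQuery,
                 Function<String, String> stageData) {
        this.syncQuery = syncQuery;
        this.stageData = stageData;
    }

    Outcome run(String query, int maxRows) {
        List<String> probe = syncQuery.apply(query, maxRows);
        if (probe.size() < maxRows) {
            return new Outcome(probe, null);               // complete result from the sync query
        }
        return new Outcome(null, stageData.apply(query));  // possibly truncated: resubmit as batch job
    }
}

class PlannerDemo {
    public static void main(String[] args) {
        List<String> table = List.of("r1", "r2", "r3", "r4", "r5");
        QueryPlanner planner = new QueryPlanner(
            (q, max) -> table.subList(0, Math.min(max, table.size())),
            q -> "http://example.org/vospace/results/job42.vot");  // placeholder acref
        System.out.println(planner.run("SELECT * FROM t", 10).rows);  // small enough: sync
        System.out.println(planner.run("SELECT * FROM t", 3).acref);  // hits limit: staged
    }
}
```

Treating a probe that returns exactly maxRows as possibly truncated is the conservative choice: it occasionally costs an unnecessary batch job, but never silently hands back a partial table.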
More information about the voql-teg