TAP and large resultsets

Kona Andrews kea at roe.ac.uk
Mon Jan 29 05:14:05 PST 2007


Hi Doug, all,

I don't disagree that paging is useful for simple clients - but I
do argue that there are (primarily sociological) problems with making 
it compulsory.

We have a hard time already in persuading third-party adopters
to install our components, even though they can be installed with
minimal space usage and locked-down read-only JDBC access to their 
datasets.  If we tell them they need to provide either a (potentially 
very) large disk cache or (even worse!) write access to their DBMS 
for temporary tables, many of them simply won't install the component - 
so no TAP at all.

I'm not arguing that we won't/shouldn't implement paging in our own 
TAP services - just that we need to allow providers to switch
the paging function off if they wish - and that therefore it 
shouldn't be a compulsory part of TAP.
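
As a rough illustration of what I mean by "switch off" (a sketch only:
QueryService, PagingNotSupported and the paging_enabled flag are names
invented here, not anything from the TAP draft, and sqlite3 merely stands
in for a read-only JDBC connection): with paging off the service streams
rows straight from the cursor and keeps no result-set state, while paging,
when a provider does enable it, is the only part that needs a cache.

```python
import sqlite3
from typing import Dict, Iterator, List, Tuple

Row = Tuple  # one result row as returned by the DB driver


class PagingNotSupported(Exception):
    """Raised when a client requests a page but the provider disabled paging."""


class QueryService:
    """Hypothetical query front end; not part of any TAP specification."""

    def __init__(self, conn: sqlite3.Connection, paging_enabled: bool) -> None:
        self.conn = conn
        self.paging_enabled = paging_enabled
        # In-memory stand-in for the disk cache or temporary table that
        # paging would need. It is never touched when paging is off.
        self._cache: Dict[str, List[Row]] = {}

    def stream(self, sql: str, batch_size: int = 1000) -> Iterator[Row]:
        """Yield rows in cursor order, fetching a batch at a time.

        Only reads from the connection; no result set is stored.
        """
        cur = self.conn.execute(sql)
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                return
            yield from rows

    def page(self, sql: str, page: int, page_size: int) -> List[Row]:
        """Return one page of a cached result set, if paging is allowed."""
        if not self.paging_enabled:
            raise PagingNotSupported("this provider has paging switched off")
        if sql not in self._cache:
            # The whole result set is materialised here, which is exactly
            # the storage cost a locked-down provider may refuse to pay.
            self._cache[sql] = self.conn.execute(sql).fetchall()
        start = page * page_size
        return self._cache[sql][start:start + page_size]
```

A client would call page() where the service allows it and fall back to
stream() when it gets PagingNotSupported, so a locked-down installation
still answers queries.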

Cheers,
Kona.

On Sun, Jan 28, 2007 at 10:15:13PM -0700, Doug Tody wrote:
> Hi Kona, All -
> 
> I agree that a fully streamed query could be a powerful way to deal
> with large queries, and we should consider supporting this.  However,
> a fully streamed query is not fully general (e.g., no ORDER BY or
> anything else which requires management of the full result set on
> the server; can't handle all cases), and it is semantically complex
> for the client to be able to deal with potentially very large query
> responses.  Another key point with paged queries is that we do this
> in part to attempt to make things more responsive given a slow
> Internet.  Often the client will abort the whole operation after
> receiving the first page or so of the result set, and repeat the
> operation with different parameters.
> 
> For very large queries we need advanced techniques such as use
> of asynchronous operations and VOStore, or a streaming query.
> For "modest" size queries one might define an upper limit for the size
> of the result set managed by the server without resorting to the more
> complex managed techniques, plus some options for how the client
> can get at the result set.  This could include either making the
> upper limit small enough to return it in one go in interactive times
> (the most basic interface, a la cone search), or some scheme based
> on automated server side caching.  If the result set is cached on the
> server, then it can be returned either via paging or via a streaming
> transfer (as we already do for other large datasets such as images).
> 
> So long as the server does not have to manage writeable storage on
> behalf of a client, caching result sets on the server is not necessarily
> very complicated.  TAP already assumes that a DBMS is involved, so it
> is not so difficult to store a result set in a temporary table managed
> transparently by the server, and deleted after some interval.
> 
> I agree that for the simplest possible service we probably do not have
> to require that it support paged queries; however, having a simple way
> to deal with queries up to the point where we get into grid techniques,
> while still providing reasonable interactive performance, is important.
> 
> 	- Doug


