a high level language

Tony Linde ael at star.le.ac.uk
Mon Feb 24 13:41:23 PST 2003


Ed,

Thanks for that - it succinctly explains what you eventually want.
Unfortunately, it is well beyond current capabilities. That should not
place me in your 'detractor' class - I want the same thing - it is what
attracted me to the VO projects in the first place: recognition of the
possibilities.

To put it into context, take Google, one of the best search engines ever
- how often do you get exactly and only what you wanted to find?

> red-giants with white dwarf companions and proper motions greater
> than x and variable by less than y%

That's a great example. Every word - nouns, adjectives, conjunctions,
values - is problematic. Rather than a query language or a workflow
language, what you're after is a natural language processor backed up by
astronomical knowledge and an ability to learn. Excellent! Let's go for
it.

BUT

While we're researching the ability to do this, we need to get stuck
into building the early releases of the VO. And for that we need to take
a few steps back (a mile or two :) and look at what we can achieve now.
A VOQL that can be built and interpreted now.
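For comparison, what can be built now is an explicit chain of simple queries, like the 20-minute workflow Ed's hypothetical detractor describes in the quoted message below: seed candidates from a binary-star catalogue, then pass them through a proper-motion filter and a variability filter. A minimal Python sketch of that chaining, in which the `Source` record, its units and the two filter functions are hypothetical in-memory stand-ins (not real VO service interfaces) and the catalogue values are invented for illustration:

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Source:
    # Hypothetical candidate record; fields and units are assumptions.
    ra: float       # degrees
    dec: float      # degrees
    pm: float       # proper motion, mas/yr
    var_pct: float  # variability amplitude, percent


# A workflow step takes a candidate list and returns a (possibly smaller) one.
Step = Callable[[list[Source]], list[Source]]


def min_proper_motion(x: float) -> Step:
    """Hypothetical stand-in for a proper-motion service: keep pm > x."""
    return lambda cands: [s for s in cands if s.pm > x]


def max_variability(y: float) -> Step:
    """Hypothetical stand-in for a variable-star service: keep var < y%."""
    return lambda cands: [s for s in cands if s.var_pct < y]


def run_workflow(seed: Iterable[Source], steps: list[Step]) -> list[Source]:
    """Apply each step in order, feeding each step's output to the next."""
    cands = list(seed)
    for step in steps:
        cands = step(cands)
    return cands


# Seed list standing in for an all-sky binary-catalogue query for
# red-giant/white-dwarf pairs (values are invented for illustration).
binaries = [
    Source(10.0, -5.0, pm=120.0, var_pct=0.5),
    Source(45.2, 12.3, pm=15.0, var_pct=0.2),
    Source(201.7, 33.1, pm=300.0, var_pct=8.0),
]

result = run_workflow(binaries, [min_proper_motion(50.0), max_variability(1.0)])
print(result)
```

Each step sees only the output of the previous one; choosing which services to call, and in what order, is left entirely to the user, which is the part Ed would like VOQL to automate.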

Cheers,
Tony. 

> -----Original Message-----
> From: Ed Shaya [mailto:edward.j.shaya.1 at gsfc.nasa.gov] 
> Sent: 24 February 2003 20:06
> Cc: voql at ivoa.net
> Subject: Re: a high level language
> 
> I just want to reiterate that I think VOQL should be a means for the
> astronomer to clearly specify what astronomical knowledge he/she wants.
> It should not require the astronomer to know how data is arranged at
> each of the data centers, nor all of the steps required. Astronomical
> data is held in a bewildering variety of forms. There are
> object-oriented data centers (IPAC, CDS and AMASE); there are thousands
> of catalogs at CDS, each in its own eccentric logical arrangement (yes,
> they are all in a similar numeric format, but each is ontologically
> unique); there are object-relational databases, XML databases (GSFC)
> and log-entry observations; and in the future one can only expect this
> to get more complex. The average astronomer cannot be expected to lay
> out an optimal workflow properly.
> 
> The astronomer thinks, "I am interested in objects with such and such
> properties," and VOQL should allow a description of these constraints,
> even if they are quite complex or detailed (e.g., red-giants with white
> dwarf companions and proper motions greater than x and variable by less
> than y%). Additionally, the astronomer says, "While selecting objects
> that fit these criteria, keep track of the following other properties
> of such objects." It is sobering how quickly the detailed workflow for
> such queries gets beyond what humans can reliably handle. The
> astronomer doesn't know whether it is better to first get the index
> numbers of records that fit the criteria and later extract additional
> properties, or to do both at the same time. Are we looking through
> tables or querying object databases? Are we looking at tables of
> red-giants or white dwarfs or proper motions or binary stars? Are there
> tables specifically of variability, or do we need to look at photometry
> catalogs and search for time variations? How do we do a many-way
> cross-correlation when the intermediate results include some tables
> with some of the properties, other tables with other properties, and a
> few tables with some of each? Etc.
> 
> So now the detractors say, "Well, if you don't know how to do this, how
> do you expect the machine to know better?" The answer is that computers
> are simply better at this kind of constrained problem and at the
> repetitive work of breaking a task down and assigning the pieces. To be
> sure, it will take us some time to find the appropriate set of
> algorithms and "workflow" language to automate this. But once that is
> done, it should need to change only slowly.
> 
> Next the detractor says, "I can break your example down easily. Go to
> an all-sky binary star catalog and get a list of binaries with
> red-giant/white dwarf members, then send the list of RA/Dec positions
> to a proper motion resource and have it delete candidates with low
> proper motions. Then send that list to a variable star resource and
> have it delete the highly variable ones. What is so hard about that? I
> can set up the entire workflow in about 20 minutes."
> 
> First of all, I agree that the user should indeed have access to
> individual resources in this simple manner, and this may get the user
> some objects that fulfill the criteria. But what if the user needs a
> more thorough search? Say it is not easy to see white dwarfs around
> red-giants, so one is likely to get only a few hits from the general
> catalogs. Inevitably, a more in-depth search is required. There are
> many useful sources for each of the desired properties, but most do not
> have all-sky coverage, so coverage maps need to be examined for proper
> overlap. Now we are talking about matching with Sloan and 2MASS and PSS
> and a few dozen other cataloged data sources, some perhaps published
> only in the last few months. Developing the workflow rapidly grows to a
> couple of days' work.
> 
> "This will be ruinous," says the conservative. "How is a machine to
> know when enough is enough? Perhaps it will attempt photometry on all
> objects in each and every image of the sky ever taken, in a desperate
> attempt to answer your query as thoroughly as technologically
> possible?" Yes, this is a concern. On the other hand, maybe that is
> what the user has in mind. For requests with options that would take
> more than a few minutes, the user should first be sent a high-level
> summary of the possible paths to satisfy the request, along with
> estimates of the time required. The user then selects one of these
> options before any such action is taken. So the concern is, and always
> has been: what prevents a single user from tying up vast resources too
> often? There must be time allotments or financial costs to put limits
> on what users do.
> 
> Admittedly, this level of automated service will not even begin to
> come together before the final year of the NVO grant. But it is useful
> to carefully outline and agree on the properties of the (pen)ultimate
> user interface before beginning to develop a system. The alternative
> is everyone marching in different directions because each sees a
> different end goal.
> 
> Back to annotation.
> 
>  Ed
> 
> Kirk Borne wrote:
> 
> >Tony:  thanks for clarifying the distinctions between workflow and
> >query, and between data services and functional services. This is in
> >fact a distinction that Ed, Brian, and I discussed, but somehow I
> >mangled it in my example. It is therefore perhaps appropriate and
> >prudent to keep those "functional" workflow actions separate from
> >VOQL's query actions.
> >
> >- Kirk
> >
> >>From: "Tony Linde" <ael at star.le.ac.uk>
> >>To: <voql at ivoa.net>
> >>Subject: RE: a high level language
> >>Date: Mon, 24 Feb 2003 09:42:34 -0000
> >>
> >>Hi Kirk,
> >>
> >>Thanks for the reply.
> >>
> >>>This query involves multi-wavelength data and multi-modal data
> >>>(catalogs, spectra), so the query must be parsed and distributed to
> >>>appropriate data centers, and maybe the data need to be shipped to
> >>>some service (e.g., to generate line lists from optical spectra).
> >>
> >>This is what I assumed from Ed & Brian's document and why I raised
> >>the question. I can see that a *query* language might cover more than
> >>a simple single-dataset query, e.g. selecting from a join of
> >>distributed datasets with sub-selects etc. - the sort of thing you can
> >>do at the moment using SQL on the more advanced databases (though
> >>without the distributed bit).
> >>
> >>However, when it comes to shipping intermediate data to another
> >>service for analysis, reduction etc., I would consider this to be
> >>*workflow*, requiring a separate description in a workflow language
> >>(as in the commercial world with the recent development of BPEL4WS).
> >>
> >>>VOQL is a standardized language to capture scientists' queries to
> >>>the distributed heterogeneous collections that comprise the VO.
> >>
> >>There I would agree. But the VO comprises more than data services; it
> >>also includes functional services, such as those to 'generate line
> >>lists'. Pushing the results of a query to such services, or using the
> >>results of one query in another, later query, amounts to workflow
> >>construction.
> >>
> >>There is a danger that, in trying to combine queries and workflow in a
> >>single language, we will overcomplicate the matter and reduce the
> >>chance of using or extending existing efforts in the development of
> >>query and workflow languages.
> >>
> >>Cheers,
> >>Tony.
> >>
> >>>-----Original Message-----
> >>>From: Kirk Borne [mailto:borne at rings.gsfc.nasa.gov]
> >>>Sent: 23 February 2003 22:01
> >>>To: ael at star.le.ac.uk
> >>>Cc: voql at ivoa.net
> >>>Subject: Re: a high level language
> >>>
> >>>...
> >
> >+------------------------------------+-------------------------------------+
> >| Dr. Kirk D. Borne                  | mailto:Kirk.Borne at gsfc.nasa.gov     |
> >| Institute for Science & Technology, Raytheon (IST at R)                     |
> >| NASA Goddard Space Flight Center   |                                     |
> >| Astrophysics Data Facility         | Phone: 301-286-0696                 |
> >| Code 631                           |     or 301-286-2772:Kathy Starling  |
> >| Greenbelt, MD  20771               | FAX:   301-286-1771                 |
> >+------------------------------------+-------------------------------------+
> >  US Virtual Observatory:  http://us-vo.org/
> >  Staff page:     http://rings.gsfc.nasa.gov/~borne/bio_borne_kirk.html
> >
> 


