VOTable alternative? Parsing VOTables

Martin Hill mchill at dial.pipex.com
Mon Jan 19 06:38:16 PST 2004


Hi folks,

Using ordinary XML, we have a mechanism for describing data values (schemas) 
which we can then use to validate XML document instances.  Generally, when we 
agree the schema we also agree the extra meaning we humans and our software will 
assign to certain elements.  For a simplistic example, we can say that the tag 
<UCD> in a particular context will have the obvious value, and we all agree on 
this.  But we need this higher level of agreement.

With this we can do anything.  Using it, VOTable was made to describe *tables*.
For this it has only one major problem, in the data section: using TD/TR means 
that the values in the data section have no meaning to any XML tools.  We cannot 
parse usefully using XQuery or XPath, and we can't use any ordinary XML display 
tools.  We have to build our own specialised ones.
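
To make that concrete, here is a rough Python sketch using only the standard library's xml.etree.ElementTree.  Both documents are invented for illustration: the first is a cut-down, namespace-free VOTable-style fragment, and the second uses made-up element names (StellarCatalogue, Star, Magnitude) that are my assumption, not an agreed format.  With TD/TR, the path only reaches anonymous cells, so the code has to look up the column position in the FIELD list first.  With named tags, one path expression does the job.  (Fuller XPath engines can express the positional lookup in a single expression, but it is still indexing by column number rather than asking for a magnitude.)

```python
import xml.etree.ElementTree as ET

# Cut-down VOTable-style fragment (illustrative, not a complete valid VOTable).
VOTABLE = """
<VOTABLE>
  <RESOURCE>
    <TABLE>
      <FIELD name="Name" datatype="char" arraysize="*"/>
      <FIELD name="Vmag" datatype="float" unit="mag"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>Vega</TD><TD>0.03</TD></TR>
          <TR><TD>Sirius</TD><TD>-1.46</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

# Hypothetical domain-specific document; the element names are assumptions.
CATALOGUE = """
<StellarCatalogue>
  <Star><Name>Vega</Name><Magnitude>0.03</Magnitude></Star>
  <Star><Name>Sirius</Name><Magnitude>-1.46</Magnitude></Star>
</StellarCatalogue>
"""


def magnitudes_from_votable(text):
    """Find the column position from the FIELD list, then index each TR."""
    root = ET.fromstring(text)
    table = root.find("./RESOURCE/TABLE")
    names = [field.get("name") for field in table.findall("FIELD")]
    col = names.index("Vmag")  # the TD cells themselves carry no name
    rows = table.findall("./DATA/TABLEDATA/TR")
    return [float(tr.findall("TD")[col].text) for tr in rows]


def magnitudes_from_catalogue(text):
    """A single path expression is enough when the tags carry the meaning."""
    root = ET.fromstring(text)
    return [float(m.text) for m in root.findall("./Star/Magnitude")]


print(magnitudes_from_votable(VOTABLE))
print(magnitudes_from_catalogue(CATALOGUE))
```

Both calls give the same list of numbers; the difference is how much table-specific knowledge the calling code needs before it can ask for them.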

And as we start using it for everything, it starts becoming so generalised that 
we end up reinventing what we already had with original XML - but with an extra 
layer of toolsets that we have to write ourselves.  We cannot validate a VOTable 
to make sure that it is a correct *table* vs *spectra* etc, so as we use it for 
more things we lose the validation too.

Working backwards to build tools and transformations to interface between 
standard XML tools and VOTable is the wrong way around.  We should be working 
with the standard XML tools to build a series of VOXmls that works with them 
directly. That means considering what we want to store - and not the structure 
that it happened to have been stored in before, but the structure of the 
information.  For example, we need to think not in terms of 'a table', but 'a 
Stellar Catalogue'.

The metadata section is more interesting and could be carried over to SoVOTable 
(Spawn of VOTable...), but we should also have a look at how this is starting to 
overlap with VOResource/VODescription.  There are better ways of linking tagged 
values with extended information (e.g., relating the tag <Magnitude>5</Magnitude> 
with other information, either inside the file or outside it, that describes 
passbands, zero points, etc.), though I confess to not having written any myself.
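
One possible shape for such a link, purely as a sketch: the value carries an attribute naming a description that lives elsewhere in the same file.  The element and attribute names here (Passband, id, passband, zeroPointJy) are hypothetical, and the numbers are illustrative placeholders only.  A reference to information outside the file could presumably use a URI in the same position.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: names and numeric values are illustrative assumptions.
DOC = """
<StellarCatalogue>
  <Passband id="V" zeroPointJy="3640"/>
  <Star>
    <Name>Vega</Name>
    <Magnitude passband="V">0.03</Magnitude>
  </Star>
</StellarCatalogue>
"""


def describe_magnitudes(text):
    """Pair each magnitude with the passband description it refers to."""
    root = ET.fromstring(text)
    # Index the passband descriptions by their id attribute.
    passbands = {p.get("id"): p.attrib for p in root.findall("Passband")}
    described = []
    for star in root.findall("Star"):
        mag = star.find("Magnitude")
        band = passbands[mag.get("passband")]  # follow the reference
        described.append(
            (star.findtext("Name"), float(mag.text), band["zeroPointJy"])
        )
    return described


print(describe_magnitudes(DOC))
```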

We also have a skill issue.  A fair amount of effort has been invested in 
VOTable and I know that quite a few people on this list can read one faster than 
a random XML document.  However, because of its unusual structure, newcomers to 
this industry are going to take longer getting to grips with it and the extra 
tools required, particularly as XML becomes a more common skill.  We should be 
designing XML documents that are easy to pick up and understand at least in 
outline by both humans and machines.

Finally our replacement VOXmls should be able to handle a certain amount of 
data, but we should not not not be thinking of using XML as a standard for 
passing around arbitrary sizes of data results.  The waste in processing power 
for de/compression and parsing, disk space and/or network bandwidth would be 
dreadful!  XML is a rich language for description and messaging.  Let's use it 
for that.

Cheers,

Martin


Clive Page wrote:

> On Mon, 19 Jan 2004, Tony Linde wrote:
> 
> 
>>I'm concerned that VOTable is contrary to the normal usage of xml, eg: the
>>metadata for a table is in a proprietary format rather than in XML Schema;
>>it is embedded within the document;
> 
> 
> Tony
> 
> Could you explain further what you mean by that: I suspect you are using
> the term "metadata" in a way that is different from an astronomer.  The
> important metadata of a tabular dataset such as is likely to be stored in
> a VOTable are things like the data types and physical units of each
> column, and for the table as a whole, the epoch and equinox of the
> celestial positions (if there are any), the origin of the data, history of
> processing and so on.  I can't believe that it is right to relegate such
> things into a separate schema file, nor that there's a proper XML-way to
> store these quantities.  We surely have to invent our own, even if that
> makes it "proprietary"?
> 
> I'm only guessing here, but I suspect that the structure of VOTable was
> suggested by the need to cope with the three alternative formats, in
> particular the BINARY format, in which the data in a row are compressed
> into a binary BLOB.  To keep the metadata (of the sort that I describe
> above) the same, the natural thing with the alternative TABLEDATA and FITS
> formats was to use <td> and <tr>, as in an HTML table.  As already noted,
> astronomical XML files easily get into the gigabyte range of size, so we
> need to concentrate on the BINARY and FITS forms, and let the TABLEDATA
> form tag along as best it can.
> 
> I don't really understand why that makes the VOtable harder to parse than
> a normal XML table, but that's because I haven't tried to write an XML
> parser myself.  For the benefit of those like me, probably the majority,
> maybe someone could explain?
> 
> 

-- 
Software Engineer
AstroGrid @ ROE
Tel: +44 7901 55 24 66
www.astrogrid.org



