VOTable alternative? Parsing VOTables
Martin Hill
mchill at dial.pipex.com
Mon Jan 19 06:38:16 PST 2004
Hi folks,
Using ordinary XML, we have a mechanism for describing data values (schemas)
which we can then use to validate XML document instances. Generally, when we
agree the schema we also agree the extra meaning we humans and our software will
assign to certain elements. For a simplistic example, we can say that the tag
<UCD> in a particular context will have the obvious value, and we all agree on
this. But we need this higher level of agreement.
With this we can do anything. VOTable used it to describe *tables*. For this
it has one major problem, in the data section: using TD/TR means that the
values in the data section have no meaning to any XML tools. We cannot parse
usefully using XQuery or XPath, and we can't use any ordinary XML display
tools. We have to build our own specialised ones.
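To make the TD/TR point concrete, here is a rough sketch in Python using only
the standard library. Both fragments are simplified and hypothetical: the
VOTable-style one omits the namespaces and RESOURCE wrapper a real file would
have, and the <StellarCatalogue>/<Star> names and all the values are invented
for illustration. The point is only that the first form needs a positional
lookup through FIELD before a value can be found, while the second can be
addressed by a path that names the quantity.

```python
import xml.etree.ElementTree as ET

# Simplified VOTable-style fragment (illustrative values): the meaning of each
# column lives in FIELD, and the values sit in anonymous TD cells.
VOTABLE_STYLE = """
<TABLE>
  <FIELD name="RA" unit="deg"/>
  <FIELD name="Dec" unit="deg"/>
  <FIELD name="Magnitude"/>
  <DATA><TABLEDATA>
    <TR><TD>10.68</TD><TD>41.27</TD><TD>3.4</TD></TR>
    <TR><TD>83.82</TD><TD>-5.39</TD><TD>4.0</TD></TR>
  </TABLEDATA></DATA>
</TABLE>
"""

# Hypothetical "semantic" equivalent: each value is tagged with what it is.
SEMANTIC_STYLE = """
<StellarCatalogue>
  <Star><RA unit="deg">10.68</RA><Dec unit="deg">41.27</Dec><Magnitude>3.4</Magnitude></Star>
  <Star><RA unit="deg">83.82</RA><Dec unit="deg">-5.39</Dec><Magnitude>4.0</Magnitude></Star>
</StellarCatalogue>
"""


def votable_column(root, name):
    """Find a column by first working out its position from the FIELD list."""
    names = [field.get("name") for field in root.findall("FIELD")]
    index = names.index(name)
    return [float(tr.findall("TD")[index].text) for tr in root.iter("TR")]


def semantic_column(root, name):
    """Find a column directly: the path itself names the quantity."""
    return [float(elem.text) for elem in root.findall(f"./Star/{name}")]


vot_mags = votable_column(ET.fromstring(VOTABLE_STYLE), "Magnitude")
sem_mags = semantic_column(ET.fromstring(SEMANTIC_STYLE), "Magnitude")
```

Both calls return the same numbers, but only the second is something a generic
XPath-aware tool could do without knowing the VOTable conventions.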
And as we start using it for everything, it starts becoming so generalised that
we end up reinventing what we already had with original XML - but with an extra
layer of toolsets that we have to write ourselves. We cannot validate a VOTable
to make sure that it is a correct *table* vs *spectra* etc, so as we use it for
more things we lose the validation too.
Working backwards to build tools and transformations to interface between
standard XML tools and VOTable is the wrong way around. We should be working
with the standard XML tools to build a series of VOXmls that work with them
directly. That means considering what we want to store - and not the structure
that it happened to have been stored in before, but the structure of the
information. For example, we need to think not in terms of 'a table', but 'a
Stellar Catalogue'.
The metadata section is more interesting and could be carried over to SoVOTable
(Spawn of VOTable...), but we should also have a look at how this is starting to
overlap with VOResource/VODescription. There are better ways of linking tagged
values with extended information (eg, relating the tag <Magnitude>5</Magnitude>
with other information, either inside the file or without, that describe
passbands, zero points, etc), though I confess to not having written any myself.
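As one possible shape for such a link (a sketch only, not a proposal for the
actual element names), a value could carry an attribute that points at a
description held elsewhere in the same file. Everything below is hypothetical:
the <Passband> element, the band and centralWavelength attributes, and the
numbers are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the Magnitude value refers to a Passband description
# by id, so the extended information is stated once and shared.
DOC = """
<StellarCatalogue>
  <Passband id="V" centralWavelength="550" unit="nm"/>
  <Star>
    <Magnitude band="V">5</Magnitude>
  </Star>
</StellarCatalogue>
"""

root = ET.fromstring(DOC)

# Index the passband descriptions by their id attribute.
passbands = {p.get("id"): dict(p.attrib) for p in root.findall("Passband")}

# Resolve the reference carried by the tagged value.
magnitude = root.find("./Star/Magnitude")
value = float(magnitude.text)
band_info = passbands[magnitude.get("band")]
```

The same lookup would work if the descriptions lived in a separate file; only
the step that builds the index would change.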
We also have a skill issue. A fair amount of effort has been invested in
VOTable and I know that quite a few people on this list can read one faster than
a random XML document. However, because of its unusual structure, newcomers to
this industry are going to take longer getting to grips with it and the extra
tools required, particularly as XML becomes a more common skill. We should be
designing XML documents that are easy to pick up and understand at least in
outline by both humans and machines.
Finally our replacement VOXmls should be able to handle a certain amount of
data, but we should not not not be thinking of using XML as a standard for
passing around arbitrary sizes of data results. The waste in processing power
for de/compression and parsing, disk space and/or network bandwidth would be
dreadful! XML is a rich language for description and messaging. Let's use it
for that.
Cheers,
Martin
Clive Page wrote:
> On Mon, 19 Jan 2004, Tony Linde wrote:
>
>
>>I'm concerned that VOTable is contrary to the normal usage of xml, eg: the
>>metadata for a table is in a proprietary format rather than in XML Schema;
>>it is embedded within the document;
>
>
> Tony
>
> Could you explain further what you mean by that: I suspect you are using
> the term "metadata" in a way that is different from an astronomer. The
> important metadata of a tabular dataset such as is likely to be stored in
> a VOTable are things like the data types and physical units of each
> column, and for the table as a whole, the epoch and equinox of the
> celestial positions (if there are any), the origin of the data, history of
> processing and so on. I can't believe that it is right to relegate such
> things into a separate schema file, nor that there's a proper XML-way to
> store these quantities. We surely have to invent our own, even if that
> makes it "proprietary"?
>
> I'm only guessing here, but I suspect that the structure of VOTable was
> suggested by the need to cope with the three alternative formats, in
> particular the BINARY format, in which the data in a row are compressed
> into a binary BLOB. To keep the metadata (of the sort that I describe
> above) the same, the natural thing with the alternative TABLEDATA and FITS
> formats was to use <td> and <tr>, as in an HTML table. As already noted,
> astronomical XML files easily get into the gigabyte range of size, so we
> need to concentrate on the BINARY and FITS forms, and let the TABLEDATA
> form tag along as best it can.
>
> I don't really understand why that makes the VOtable harder to parse than
> a normal XML table, but that's because I haven't tried to write an XML
> parser myself. For the benefit of those like me, probably the majority,
> maybe someone could explain?
>
>
--
Software Engineer
AstroGrid @ ROE
Tel: +44 7901 55 24 66
www.astrogrid.org