Comments on V1.1 - Future of VOTable (flame bait sigh)
martin hill
mchill at dial.pipex.com
Fri Apr 16 04:15:27 PDT 2004
Mark Taylor wrote:
> On Wed, 14 Apr 2004, martin hill wrote:
>>Nor do we save any effort in using VOTable to represent SEDs than in writing a
>>new SED schema (except maybe for a few people to learn schemas - but this is no
>>harder than learning VOTable and it's a good skill to have!).
>
> An important bit of effort which is saved is coming up with facilities for
> dealing with, crudely, columns of numbers. Schemas are not designed to
> represent tables, and though there are plenty of industry-standard tools
> for doing computer-science-type tasks with them (validation, web service
> specification, searching), the same is not true (as far as I'm aware)
> for astronomy-type tasks which rely on the tabular nature of the data
> (plotting columns against each other, converting to a FITS table or other
> tabular format for use with legacy applications, calculating statistics).
I suspect I haven't understood this right. You can use schemas to describe
tabular data very easily - the VOTable schema is an example of this. And I
don't see too much wrong with using VOTable (or even, <spit>, CSV!) for
representing tables. My issue is that using a generic tabular form (of any
sort) is not the right way to communicate between web services; we should be
explicitly 'typing' the data and its natural structure, which is *not* purely
tabular. A fair amount of astronomy data is in *relational* tables because that
has been the only suitable storage mechanism available, and VOTable does *not*
represent relational tables - at least, not without an extra VO-specific layer
to resolve the relationships. It's just as easy to write a new schema as it is
to design and agree a new VOTable flavour, and you get many advantages to doing
so.
One of the advantages of using tables is, as you say, you can do statistical
analysis/plots/etc. Good stuff. Let's use VOTable for that - as well as CSV
and any other commonly used formats - as inputs to these tools.
Transformation sheets that take, e.g., SEDs and create CSVs to be imported into
Excel (or your favourite non-Microsoft spreadsheet) would be very useful.
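A rough sketch of the kind of SED-to-CSV transformation meant here, written in
Python rather than as an XSLT sheet purely to keep it self-contained. The
element and attribute names (<sed>, <point>, wavelength, flux) and the values
are invented for illustration; no SED schema has been agreed in this thread.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical SED document: names and numbers are made up for illustration.
SED_XML = """
<sed target="example">
  <point wavelength="1.25" flux="0.31"/>
  <point wavelength="2.20" flux="0.52"/>
</sed>
"""

def sed_to_csv(xml_text):
    """Flatten the points of the (hypothetical) SED document into CSV text."""
    root = ET.fromstring(xml_text)
    if root.tag != "sed":
        # Refuse anything that is not typed as an SED.
        raise ValueError("not an SED document: root is <%s>" % root.tag)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["wavelength", "flux"])
    for point in root.findall("point"):
        writer.writerow([point.get("wavelength"), point.get("flux")])
    return out.getvalue()

csv_text = sed_to_csv(SED_XML)
```

The resulting text can be saved to a file and imported into a spreadsheet; an
XSLT sheet doing the same job would be of similar size.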
> An important related point is that VOTable provides ways of storing and
> transmitting very large data sets for which raw XML is not well suited
> in terms of bandwidth and/or processing efficiency.
Of course, VOTable/XML *is* XML... Any difference in size between VOTable/XML
and XML with longer element names is irrelevant; if size is a problem we should
not be using VOTable/XML in the first place! And there are many unresolved
issues with using VOTable/FITS, mostly based around ensuring that the VOTable is
correct for the wrapped FITS file. Using this mechanism you can submit FITS
catalogues to SExtractor... We are blinding our web services to each other's
capabilities, something we want to avoid as part of our 'metadata-rich' VO.
There are other possible solutions to using binary XML-like structures without
having to go through ASCII.
> If a SED is passed around as a VOTable then the application programmer
> can use an existing VOTable processing library to turn it into something
> which looks like a column of numbers for further processing without
> further ado, or the astronomer can use a VOTable-aware tool to do
> something tabularly-generic with it such as plot.
But we don't want to have to build special VOTable processing libraries in all
the various languages that astronomers use. We should be using standard XML
tools (below) to move data from one existing tool to another.
> If it's passed
> around using a SED-specific schema this functionality has to be
> rewritten from scratch for SEDs (and the same applies for any other
> specific formats that we want to define new schemas for).
>
With VOTable the same thing is going to have to be done 'from scratch' for each
flavour of VOTable - i.e. converting from an SED VOTable to SED visualisation
tools. This is 'hidden' because we can lump it all under 'developing a VOTable
tool'.
Our data formats should only be storage/transportation mechanisms; we should not
be trying to build our own toolsets around them. There are plenty of existing
tools in the astronomical world for visualising and analysing (and producing)
data. We need to be able to transform the outputs of one into the inputs of
another. VOTable provides an illusory 'many to one to many' interface
format, but the 'one' in the middle is of many flavours, and yet the flavours
are not explicitly specified in any standard way. Further, we are finding we
have to build a huge *code*set around it, which needs to be installed and run
somewhere (i.e. a new toolmaker is going to have to do so).
Instead we should have XML message formats that correctly represent the data,
and transformation sheets that understand the source data type so they can
produce the right 'native' format for the target tool. XML transformation sheets
are an industry-standard conversion mechanism available on many different
platforms, so people who want to introduce new tools to the VO can use them
rather than find and learn some VO-specific tools.
Building specialised schemas and associated sheets is *not that hard*! With an
SED-specific schema, we add a transformation sheet to create (say) VOTables and
CSVs for things like spreadsheets, and other transformers as required for
existing SED visualisation tools. This means that if you run an SED
visualisation tool *it will only accept SED data*. It means you can construct
the right input document, with help from XML-building tools. You can't throw a
sky catalogue at it, and you will know before you try.
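As a minimal sketch of "you will know before you try": a tool can look at the
type of the document it is handed before doing any work. The namespace
http://example.org/sed below is a placeholder, not a real SED namespace, and
the check only inspects the root element; full validation against a schema
would need an XSD validator, which the Python standard library does not have.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for a hypothetical SED schema (an assumption).
SED_NS = "http://example.org/sed"

def is_sed(xml_text):
    """Return True if the root element is <sed> in the SED namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced tags as "{namespace}localname".
    return root.tag == "{%s}sed" % SED_NS

sed_doc = '<sed xmlns="http://example.org/sed"><point/></sed>'
catalogue_doc = "<VOTABLE><RESOURCE/></VOTABLE>"
```

Here is_sed(sed_doc) is true and is_sed(catalogue_doc) is false, so the sky
catalogue is turned away on its type alone. A generic VOTable gives the tool
no equivalent handle without some extra convention layered on top.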
>>I have been assuming that VOTable is for representing 2d tables (I realise it
>>can now hold tables that include tables).
>
> If I understand this statement correctly, I don't think it's true.
>
Someone said recently you can store arbitrary data structures in VOTable? I
don't believe this (and I shy from what an example would look like - but go on,
show me!) The structure of VOTables is based around <TD> and <TR> elements.
There is not even a way of setting up (XML-based) relationships between rows or
cells. To do this we again have to set up VO-specific interpreters over the
top of standard XML ones to do what we need.
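To make the <TR>/<TD> point concrete, here is a small sketch that reads a
VOTable-style TABLEDATA block with a standard XML parser. The table contents
are made up, and namespaces and most of the VOTable metadata are left out for
brevity.

```python
import xml.etree.ElementTree as ET

# Cut-down VOTable-style document with invented values.
VOTABLE_XML = """
<VOTABLE>
  <RESOURCE>
    <TABLE>
      <FIELD name="wavelength" datatype="double"/>
      <FIELD name="flux" datatype="double"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>1.25</TD><TD>0.31</TD></TR>
          <TR><TD>2.20</TD><TD>0.52</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

def read_rows(xml_text):
    """Return (column names, rows) where each row is a list of cell strings."""
    root = ET.fromstring(xml_text)
    names = [field.get("name") for field in root.iter("FIELD")]
    rows = [[td.text for td in tr.findall("TD")] for tr in root.iter("TR")]
    return names, rows

names, rows = read_rows(VOTABLE_XML)
```

What comes back is a flat grid of cell strings keyed by column position.
Nothing in the <TR>/<TD> markup itself says that one row refers to another, so
any such link has to be interpreted by code that knows the convention.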
In summary: we need VOTable (for many of the reasons discussed). But we should
not be trying to use it as the sole representation of all astronomical data for
either transport or storage. It is not designed for it, we don't have the tools
for it, we shouldn't be spending effort making tools for it, and we will be
shooting ourselves in the foot trying to do so.
MC
--
Martin Hill
Software Engineer
AstroGrid @ ROE
Tel: +44 7901 55 24 66
www.astrogrid.org