norman at astro.gla.ac.uk
Mon Jan 26 09:06:54 PST 2004
Hmm, this list has been a bit quiet for several hours now, methinks it
needs a bit of gingering up.
The more I read of this discussion, the more I am persuaded that VOTable
is a well-designed and thoughtful application of XML.
Dictum no. 1: XML does not become purer, better, more proper, or
`harder' in direct proportion to the size of your schema.
Less is more.
I follow the argument that says that, in a case such as the following:
> <FIELD name="Object name" ID="NAME" datatype="char" arraysize="*"/>
> <FIELD name="V Magnitude" ID="VMAG" datatype="double"/>
...this is arguably ugly because you (a) can't write an XPath
expression to find the magnitudes, and (b) can't write a constraining
schema for <TD> (it has to be simply ANY). It does not follow that this
is just CSV, since you do have all the metadata you need in the sequence
of <FIELD> elements, in a way which is straightforwardly accessible.
(a) There isn't an element named `MAG', so you can't find it using XPath.
Nonsense: /TABLE/DATA/TABLEDATA/TR/TD[2] gives all the elements in
column 2 (the VMAG column) once you know (from <FIELD>) that's the column
you're after. Extracting that using XPath and XSLT is not _completely_
trivial, but it's not exactly hard. You also know the column's type
(FIELD/@datatype) and potentially semantic content (FIELD/@ucd).
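To make point (a) concrete, here is a minimal sketch in Python of the lookup described above: find the position of the wanted <FIELD>, then use that position in an XPath-style path. The sample rows, the enclosing structure and the helper name column_by_id are assumptions for illustration; only the two <FIELD> lines come from the example above. It uses the limited XPath subset in the standard library's xml.etree.ElementTree, not a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample table: only the two FIELD lines are from the message,
# the rows and the surrounding layout are made up for illustration.
SAMPLE = """\
<TABLE>
  <FIELD name="Object name" ID="NAME" datatype="char" arraysize="*"/>
  <FIELD name="V Magnitude" ID="VMAG" datatype="double"/>
  <DATA>
    <TABLEDATA>
      <TR><TD>Vega</TD><TD>0.03</TD></TR>
      <TR><TD>Sirius</TD><TD>-1.46</TD></TR>
    </TABLEDATA>
  </DATA>
</TABLE>
"""


def column_by_id(table, field_id):
    """Return (datatype, cell texts) for the column whose FIELD has this ID."""
    # XPath positions are 1-based, so count the FIELD elements from 1.
    for position, field in enumerate(table.findall("FIELD"), start=1):
        if field.get("ID") == field_id:
            break
    else:
        raise KeyError(field_id)
    # Equivalent to /TABLE/DATA/TABLEDATA/TR/TD[position], relative to TABLE.
    cells = table.findall(f"DATA/TABLEDATA/TR/TD[{position}]")
    return field.get("datatype"), [cell.text for cell in cells]


table = ET.fromstring(SAMPLE)
datatype, values = column_by_id(table, "VMAG")
print(datatype, values)
```

The cell contents come back as strings; what to do with them, given the datatype, is left to the application.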
(b) If you believe in XSchema, then you believe in your soul that element
contents are typed (column 1 here is a string, column 2 a numeric, for
example), and you will be upset that these don't have specified types.
If you don't believe in XSchema (and you possibly shouldn't, unless you
are a database person who wants to suck all XML into SQL databases), then
you don't care that there isn't a predefined type for each element --
the associated validation and processing lies very firmly on the
application side of this particular technology boundary.
The main thing you get from XSchemas (as opposed to DTDs) is the type
system, and that's where _all_ the complication lies.
Dictum no. 2: If you do not NEED XSchema's type system, then you
do not want XSchema.
VOTable defines reasonably general tables (ahem!), therefore you
don't know a priori what types of elements are going in which columns.
So what? You've still got to process the table with an application
which is smart enough to parse the <FIELD> elements and thus devise
something clever to do with the columns. This means that you might have
difficulty processing a VOTable with standard XML tools; but so what?
Echoing Mark: ``In short, I think that VOTable is primarily a storage
and transmission format, and we shouldn't be afraid of parsing it before
doing more complicated things with the data it contains.''
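As a rough illustration of `parsing it before doing more complicated things', the sketch below reads the <FIELD> elements and uses FIELD/@datatype to convert each cell on the application side, with no schema typing involved. The CONVERTERS mapping, the read_rows helper and the sample rows are assumptions made for this example; the mapping covers only a few datatype names and is not the VOTable specification's full list.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample table: only the two FIELD lines are from the message.
SAMPLE = """\
<TABLE>
  <FIELD name="Object name" ID="NAME" datatype="char" arraysize="*"/>
  <FIELD name="V Magnitude" ID="VMAG" datatype="double"/>
  <DATA>
    <TABLEDATA>
      <TR><TD>Vega</TD><TD>0.03</TD></TR>
      <TR><TD>Sirius</TD><TD>-1.46</TD></TR>
    </TABLEDATA>
  </DATA>
</TABLE>
"""

# Assumed mapping from a few datatype names to Python converters.
# Anything not listed here is left as a string.
CONVERTERS = {"char": str, "int": int, "float": float, "double": float}


def read_rows(table):
    """Return one dict per TR, keyed by FIELD ID, with converted values."""
    fields = table.findall("FIELD")
    rows = []
    for tr in table.iterfind("DATA/TABLEDATA/TR"):
        row = {}
        # The n-th TD in a row belongs to the n-th FIELD.
        for field, td in zip(fields, tr.findall("TD")):
            convert = CONVERTERS.get(field.get("datatype"), str)
            # An empty cell is treated as a missing value.
            row[field.get("ID")] = convert(td.text) if td.text else None
        rows.append(row)
    return rows


table = ET.fromstring(SAMPLE)
rows = read_rows(table)
for row in rows:
    print(row)
```

All the type handling lives in a dozen lines of application code, which is the side of the technology boundary this argument says it belongs on.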
Dictum no. 3: Spending months arguing over schemas in order to
generate a few days' worth of code is not a good idea.
XSchema types and schema-generated code are necessary _if_ what you
want to do is suck XML files into SQL databases, so that parsing and
validation are the bulk of what you need to code. It is _not_ the
bulk of what _we_ need to code, and obsessing about code-generation is
allowing the tail to wag the dog, and is counter-productive.
XSchemas and code-generation are Good Things -- I know that as well
as anyone else. However, they also have costs, and I can recall no
arguments that these costs are worth _us_ paying, in the production of
_our_ data analysis applications.
There's nothing wrong with VOTable. It ain't broke. Discuss.
Norman Gray http://www.astro.gla.ac.uk/users/norman/
Physics and Astronomy, University of Glasgow, UK norman at astro.gla.ac.uk