VOTable alternative?
Martin Hill
mchill at dial.pipex.com
Tue Jan 27 04:48:53 PST 2004
Guy Rixon wrote:
>
>>Where (perhaps?) we diverge is that I don't think it should be used as the
>>normal data exchange mechanism. Because it doesn't describe the data
>>sufficiently well _in a standard way_ that allows us and future astronomers to
>>build tools out of 'industry standard' tools. Other comments below...
>
> Yes. I think that VOTable _should_ be the normal medium of exchange, _for
> generic tables_. I.e., for the most common cases, we exchange generic tables.
> Catalogues, typically, which are the main data type for the VO.
>
> Yes, we can use fancy XML structures for special cases that aren't tables.
> Yes, we can constrain _some aspects_ of these with schemata (although there
> are always cases you can't catch like getting the RA and dec the wrong way
> round). But these are exceptions.
>
> I hold it to be more important to make VOTable a workable standard than to
> generate new structures for special cases. If we gain the special structures
> at the expense of being able to handle large, generic tables, then the VO
> fails.
My feeling is that VOTable *is* a workable standard for generic tables - any
more tweaking should be minor. We have started to consider major changes
because we want to do more with it - and instead of shoe-horning every data
transfer type into it, we should be looking at what the original problem is.
i.e., how do we store, transfer and process astronomical data, preferably in some
meaningful way?
>
>>Quoting Guy Rixon <gtr at ast.cam.ac.uk>:
>>
>>>On Mon, 26 Jan 2004, Martin Hill wrote:
>>
>>We can in fact say a lot of things about whether a catalogue is valid or not.
>>We can ensure that positions are within sensible ranges, that shapes are
>>associated with galaxies, etc.
>
> So every time I write a structure about a galaxy I have to put in morphology
> information? No, I know that's not what you meant...but it could be what the
> schemata end up saying. Option (a) all valid Galaxy elements include a
> Morphology element, so catalogues without morphology data are filled with
> spurious elements; (b) Morphology is optional, so a schema parser can't check
> that Morphology is present when it's needed. So...step 2: separate schemata
> for each representation of a Galaxy: Galaxy-with-morphology,
> Galaxy-with-photometry, Galaxy-with-morphology-only-IR-photometry-
> couple-of-spectra-on-the-side-no-fries-please. There are just too many things
> to say about a galaxy to support this; we'd need thousands of schemata.
>
> Ultimately, you need a more-generic structure. Granted, that structure need
> not be a table; but you always have to do some application-specific parsing
> and checking. Therefore, W3C XML schema is _not_ a silver bullet that gets us
> out of coding parsers and validators.
A more generic structure would be ideal. But just because we can't do the ideal
should not force us into doing nothing... The typical answer to above might
be that all components are optional in the overall schema. We can check at
least that things are valid if present, and we always ensure that the structure
is valid. For example, we can make sure there are no shapes for stars (unless the VO
community want it...). We can make sure that there's no RA element inside a
MAG, etc, etc. We can make sure that if an RA is given so is a DEC and the
epoch and equinox. We can make sure that a MAG always has a unit and passband
associated with it. And so on.
If we can have some way of extending schemas by restriction (dream on!), then
application servers could implement those as their input schemas, so that those
doing analysis on shape (?) can check that such data is present... but that is
for much later, I think...
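A minimal sketch of the kinds of structural rules listed above (RA implies DEC, epoch and equinox; every MAG has a unit and a passband; no RA inside a MAG; positions within sensible ranges), written as application code in Python. The element names (SOURCE, RA, DEC, EPOCH, EQUINOX, MAG) and the unit/passband attributes are hypothetical, not taken from VOTable or any agreed schema. Several of these rules could be expressed directly in a W3C XML Schema; they are spelled out as code here so each one is explicit. As Guy notes, no check of this kind catches an RA and Dec swapped when both values happen to be in range.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names chosen for illustration only;
# they do not come from VOTable or any agreed VO schema.
POSITION_COMPANIONS = ("DEC", "EPOCH", "EQUINOX")


def check_source(source):
    """Return the rule violations found in one hypothetical <SOURCE> element."""
    problems = []
    ra = source.find("RA")
    if ra is not None:
        # Co-occurrence rule: an RA must come with a DEC, an epoch and an equinox.
        for tag in POSITION_COMPANIONS:
            if source.find(tag) is None:
                problems.append(f"RA given without {tag}")
        # Range rule: RA in degrees must lie in [0, 360).
        if not 0.0 <= float(ra.text) < 360.0:
            problems.append(f"RA out of range: {ra.text}")
    dec = source.find("DEC")
    if dec is not None and not -90.0 <= float(dec.text) <= 90.0:
        problems.append(f"DEC out of range: {dec.text}")
    for mag in source.findall("MAG"):
        # Every magnitude must carry a unit and a passband.
        for attr in ("unit", "passband"):
            if attr not in mag.attrib:
                problems.append(f"MAG missing {attr}")
        # Structural rule: no position elements nested anywhere inside a MAG.
        if mag.find(".//RA") is not None:
            problems.append("RA nested inside MAG")
    return problems


def check_document(text):
    """Map each SOURCE id to its list of violations (an empty list means it passed)."""
    root = ET.fromstring(text)
    return {src.get("id"): check_source(src) for src in root.iter("SOURCE")}


SAMPLE = """<CATALOGUE>
  <SOURCE id="a">
    <RA>10.5</RA><DEC>-30.0</DEC><EPOCH>J2000</EPOCH><EQUINOX>J2000</EQUINOX>
    <MAG unit="mag" passband="V">12.1</MAG>
  </SOURCE>
  <SOURCE id="b">
    <RA>400</RA>
    <MAG unit="mag">9.0</MAG>
  </SOURCE>
</CATALOGUE>"""

report = check_document(SAMPLE)
print(report)
```

Source "a" passes; source "b" is reported for its missing DEC, EPOCH and EQUINOX, its out-of-range RA and its passband-less MAG. All components stay optional, but whatever is present must be consistent.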
>>Also *you* may know that a particular file is a spectrum or a catalogue. But
>>we're trying to build semi-intelligent systems, including workflows, that can
>>take certain 'types' of data and feed them into other steps. They need to know
>>what can go into what - not just to check the process as it runs, but also at
>>design time. We don't want to let people accidentally connect the wrong job step
>>output into some other input without it barfing. And we want our tools to be
>>able to look around and go 'there's a service that can take what I have, to do
>>what I want'.
>>
>>So why create an enumerated type in VOTable? Why make up yet another
>>VO-specific way of describing things when there is already an industry standard
>>way of doing it? We are/should be about reusing existing standards rather than
>>making new.
>
> Because the IT-industry standard seems not to work for efficient, binary
> structures, or for tables in ordinary RDBMS, which are the two things we have
> most of.
I assume you mean here using the VOTable astronomical metadata to describe the
binary structures, or the tables in an RDBMS? Since VOTable is also not an
efficient, binary structure :-) However I believe the VOResource registry entry
schema is going to give us a lot more than the VOTable astronomical metadata can
for these cases. And relevant schemas, specifying how a document should
include the relevant astronomical metadata for particular datasets, will make data
exchange between semi-automatic workflow systems much easier.
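The design-time checking described in the quoted text above (workflows that know "what can go into what", and tools that can find "a service that can take what I have") can be sketched as follows, assuming each service declares the data types it accepts and produces. The Service class, the registry and the type labels are hypothetical; the labels stand in for whatever identifies a data type, for instance the schema a document conforms to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical service description: the data types it accepts and produces."""
    name: str
    accepts: frozenset
    produces: str


def connect(upstream, downstream):
    """Design-time check: refuse to wire an output into an input that cannot take it."""
    if upstream.produces not in downstream.accepts:
        raise TypeError(
            f"{upstream.name} produces {upstream.produces!r}, "
            f"which {downstream.name} does not accept"
        )
    return (upstream.name, downstream.name)


def services_accepting(data_type, registry):
    """Find every registered service that can take the data type we already have."""
    return sorted(s.name for s in registry if data_type in s.accepts)


# Hypothetical registry; the type labels are placeholders, not real identifiers.
REGISTRY = [
    Service("cone-search", frozenset({"position"}), "catalogue"),
    Service("cross-match", frozenset({"catalogue"}), "catalogue"),
    Service("plot-spectrum", frozenset({"spectrum"}), "image"),
]

print(services_accepting("catalogue", REGISTRY))  # ['cross-match']
print(connect(REGISTRY[0], REGISTRY[1]))  # ('cone-search', 'cross-match')
```

Wiring cone-search into plot-spectrum raises a TypeError when the workflow is designed, before any job runs; that only works if the declared types are precise enough to compare.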
> If we don't put large datasets into XML, how do
> we describe them?
>
> We have cases - e.g. the output from an ADQL server - where any given job can
> produce either a small amount of data, which we can put in your proposed
> format, or in VOTable/TABLEDATA, or in V2, or it can produce a mass of data
> which we can't. In a workflow, we need to route this result from originating
> service A to consuming service B via a file or a stream. B is easier to write
> if the metadata for the dataset - the part doing the job of the FIELDs in
> VOTable - is the same in both cases. If B depends on a W3C XML schema and a
> validating XML-parser to check its input then either it can't handle the
> binary case or it has to forgo the validation. If it depends on a W3C schema
> to understand the data, then maybe it can't use the binary format at all.
>
> We need a format for metadata that is common to rich XML and binary formats.
> That's why I like the inclusion of VOTable FIELDS in Roy's V2 proposal.
OK I see your point - it would be useful to have a common astronomical metadata
to both binary and XML forms. Indeed I would say it would be useful if that (or
part of that) astronomical metadata was also common to the registry entries.
Let's see what Bob M has to say about BinX; but otherwise I don't see this as
significantly more difficult than defining the binary format that VOTable
maps to, and how VOTable metadata maps to it.
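Guy's consuming service B, whose input may arrive either as a small XML table or as a large binary stream, is easier to write if one set of column metadata drives both readers. A minimal Python sketch, assuming a hypothetical Field class modelled loosely on the VOTable FIELD (name, datatype, unit) and a simple big-endian, fixed-width row layout. This is not VOTable's actual BINARY serialization, nor BinX; nulls, strings and arrays are deliberately left out.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a VOTable FIELD: column name, datatype and unit."""
    name: str
    datatype: str
    unit: str = ""


# Assumed mapping from a few datatype names to struct codes and Python types.
STRUCT_CODES = {"double": "d", "float": "f", "int": "i", "long": "q"}
PY_TYPES = {"double": float, "float": float, "int": int, "long": int}


def row_format(fields):
    """Big-endian, fixed-width rows; nulls, strings and arrays are not handled."""
    return ">" + "".join(STRUCT_CODES[f.datatype] for f in fields)


def read_text_rows(fields, rows):
    """Parse rows of cell strings (as from TABLEDATA <TD> cells) using the metadata."""
    return [
        {f.name: PY_TYPES[f.datatype](cell) for f, cell in zip(fields, row)}
        for row in rows
    ]


def read_binary_rows(fields, blob):
    """Parse a packed byte stream using exactly the same metadata."""
    layout = struct.Struct(row_format(fields))
    names = [f.name for f in fields]
    return [dict(zip(names, values)) for values in layout.iter_unpack(blob)]


FIELDS = [Field("ra", "double", "deg"), Field("dec", "double", "deg"), Field("nobs", "int")]

# The same two rows, once as text cells and once packed as binary.
text_rows = [["10.5", "-30.25", "3"], ["200.0", "45.0", "7"]]
blob = b"".join(
    struct.pack(row_format(FIELDS), *row) for row in [(10.5, -30.25, 3), (200.0, 45.0, 7)]
)

from_text = read_text_rows(FIELDS, text_rows)
from_binary = read_binary_rows(FIELDS, blob)
print(from_text == from_binary)  # True
```

Because B sees the same column names, types and units either way, any checking it does on the metadata applies equally to the XML and the binary case.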
>>>You can imagine the scenario. Dr. Clever goes to her local VO contact and
>>>says "how do I record data about this new idea I've just invented in the
>>>VO?"
>>>Mr. VO looks at it and says "Cool. Give us six months to draw up the data
>>>model for the new bits and then you can publish your results. Don't try to
>>>publish it this week using that VOTable rubbish (like your competitors will)
>>>coz it's not semantically pure and my favourite IT-industry tool won't be able
>>>to read it." We'd get lynched!
>>
>>Of course we would. And rightly so! But saying that we need to *always*
>>describe data vaguely because there will be occasions where we want to be
>>flexible is not a Good Solution. Saying that we should *never* be able to use
>>standard tools because there will be (a few) occasions where it might not be the
>>best solution is also not a Good Solution. Saying that people are *always*
>>going to have to use VO-built tools when there may be more sophisticated
>>standard ones is also not a Good Solution.
>
> BUT: using VOTable doesn't stop you from using XML tools. Using XML tools
> exclusively stops you writing down data (a) until the data models are workable
> and (b) in the cases where the data sets need to be binary files or streams.
But using VOTable *does* put a big stumbling block in the way of using
information-processing/presentation XML tools. And using common-or-garden XML
does not mean we have to wait for anything, any more than we have to wait for
UCD2s to appear before we can use VOTable. And it certainly does not mean we
can't use binary files or streams any more than VOTable does.
>>>>So we need a different way of agreeing data models. We don't have to have an
>>>>absolute agreement - we just need a version 0.1. Indeed we don't need an agreed
>>>>data model before we have semi-agreed XML snippets (for those not in the
>>>>dm at ivoa.net mailing list, I've been whittering on there too on this). In
>>>>the same way as we are agreeing ADQL, or VOResource, etc.
>>>
>>>Sounds deadly. Is my structure using VOPosition v35.213.345 and VOPhotometry
>>>v4000000018 and VOMorph vvSNAPSHOT going to interoperate with your VOPosition
>>>v48.12 VOSpectropolarimetry v2004-12-18 and VOGalShapes v "latest"?
>>>Oink-flap.
>>
>>This is a problem we are going to have to face anyway from the data modelling
>>group. We already face it now in ADQL. It gets worse because these snippets are
>>likely to be common across schemas such as ADQL and Registry entries as well.
>>
>>I'd be interested in any ideas on this! :-)
>
>
> There is one data-modelling group. That's a lot more manageable than a
> free-for-all. Besides, we don't have to use the data-model references in
> generic structures until they are stable and tested; they add value to generic
> structures, they aren't essential for the structure to exist.
Yes. However the process of going from generic to data model involves a
translation process that we are going to have to add later - bringing me back to
my point that we are putting off something we need to do, and the sooner the
better. Again, no reason to remove generic VOTable, but definitely one to
start working on specific schemas. As is happening already - if the data
modelling team are too busy to make up standard models just now, we'll just have
to get on with real world examples (such as nvo-coord.xsd) and grow the models
from them.
The business of growing such models needs to be done early - before there are
too many implementations - rather than later. Even so, I think we are in a
position to make sensible decisions about many things (such as world
coordinates) even if we expect to extend them later. If we are saying that we
will always be using VOTable everywhere, then in fact we don't need the
data modelling group at all. Where would it fit in?
Cheers,
Martin
--
Martin Hill
Software Engineer
AstroGrid @ ROE
Tel: +44 7901 55 24 66
www.astrogrid.org