VOTable alternative? Mapping Rich Data to DBMSs
Guy Rixon
gtr at ast.cam.ac.uk
Tue Jan 27 02:45:26 PST 2004
Suppose we have the schema for a hierarchical structure. Suppose that the
actual data are flattened into a binary table, but that table can be crumpled
back into the hierarchy by a programme that understands the table. Can we
annotate the schema with column names or positions so that it informs a tool
that works on the binary data directly?
I'm thinking like this:
1. Write some standard schemata for bits of data model.
2. Combine those schemata into an application-specific schema describing a
data set.
3. Annotate the combined schema with attributes that say where to find
individual data in the binary table.
This may mean that every element in the schemata should be an extension of
some MappedDatum that mandates ColumnName and/or ColumnNumber
children/attributes.
4. Process the binary via a library that provides XML-like features,
mainly XPath and XSLT. The library works on {binary-file, XSD} tuples.
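
To make steps 3 and 4 concrete, here is a minimal sketch in Python. It assumes
fixed-width binary records and a hand-written mapping from schema paths to
column numbers. RECORD, COLUMN_MAP, unflatten and read_table are all
hypothetical names, and a real tool would read the mapping from the annotated
XSD rather than from a dictionary. The first row uses the NGC 300 position
quoted below, converted to degrees; the second row is purely illustrative.

```python
import struct

# Hypothetical binary layout: two little-endian doubles (RA, Dec in degrees)
# followed by a 16-byte, NUL-padded name.
RECORD = struct.Struct("<dd16s")

# Hypothetical annotation: schema path -> column number in the binary table.
COLUMN_MAP = {
    "Galaxy/Position/RA": 0,
    "Galaxy/Position/DEC": 1,
    "Galaxy/Name": 2,
}


def unflatten(row, column_map):
    """Rebuild a nested dict from one flat row, guided by the path mapping."""
    tree = {}
    for path, col in column_map.items():
        *parents, leaf = path.split("/")
        node = tree
        for name in parents:
            node = node.setdefault(name, {})
        value = row[col]
        if isinstance(value, bytes):
            # Strip the NUL padding added by the fixed-width string column.
            value = value.rstrip(b"\0").decode("ascii")
        node[leaf] = value
    return tree


def read_table(buf, column_map):
    """Yield one hierarchical record per fixed-width row in the buffer."""
    for row in RECORD.iter_unpack(buf):
        yield unflatten(row, column_map)


# Two example rows packed into one buffer (values are illustrative).
table = RECORD.pack(13.7192, -37.6825, b"NGC 300") + RECORD.pack(10.6847, 41.2690, b"M 31")
galaxies = list(read_table(table, COLUMN_MAP))
```

The binary table is never converted to XML as a whole; each row is crumpled
back into the hierarchy only when something asks for it, which is the property
that matters for tables with millions of rows.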
Apologies if this has already been discussed on some other list.
On Tue, 27 Jan 2004, Martin Hill wrote:
> Good question - how do we map such hierarchical views onto our existing
> flat databases?
>
> In fact few (if any?) of our databases are flat. However the results of
> our queries usually return a 'joined' table that is, so we would need to
> unflatten it to recreate structure. This is a way of re-introducing the
> natural structure of the information - with the normal query result we
> get, for example, shape cells for objects that don't have shape.
>
> I'd like to hear of more elegant suggestions, but as it's a one to one
> relationship the datacenter manager could fill out a template form with
> the column names inserted:
>
> <StellarCatalogue>
>   <Galaxy name="T-NAMES.GALNAME">
>     <Altname>T-NAMES.CATNAME</Altname>
>     <Position>
>       <WCS epoxinoxthingy="J2000">
>         <RA unit="sexagesimal">T-OBJS.RA</RA>
>         <DEC unit="sexagesimal">T-OBJS.DECL</DEC>
>       </WCS>
>       <Galactic>
>         <Long unit="degree">T_OBJS.GALLONG</Long>
>         <Lat unit="degree">T_OBJS.GALLAT</Lat>
>       </Galactic>
>     etc
>
> With a way of removing elements and parents when not present in the
> resultset. This should work for uploads too I would think.
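
A rough sketch of that template idea, using Python's standard ElementTree. It
assumes, for simplicity, that every attribute value and every leaf element's
text in the template is a column name; a real template would need some way to
tell column names from literals such as units. The T-SHAPE.RATIO column and
the fill function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Template in the spirit of the one above; attribute values and leaf text
# hold column names. T-SHAPE.RATIO is a made-up column for illustration.
TEMPLATE = """
<Galaxy name="T-NAMES.GALNAME">
  <Altname>T-NAMES.CATNAME</Altname>
  <Shape>
    <Ratio>T-SHAPE.RATIO</Ratio>
  </Shape>
</Galaxy>
"""


def fill(elem, row):
    """Replace column names with row values; return False if elem ends up empty."""
    for key, col in list(elem.attrib.items()):
        if row.get(col) is None:
            del elem.attrib[key]
        else:
            elem.set(key, str(row[col]))
    children = list(elem)
    for child in children:
        if not fill(child, row):
            elem.remove(child)
    if children:
        # A parent survives only if a child or an attribute survived.
        return len(elem) > 0 or bool(elem.attrib)
    col = (elem.text or "").strip()
    if row.get(col) is None:
        return False
    elem.text = str(row[col])
    return True


# One row of a joined result set; this object has no shape information.
row = {
    "T-NAMES.GALNAME": "NGC 300",
    "T-NAMES.CATNAME": "PGC 3238",
    "T-SHAPE.RATIO": None,
}
galaxy = ET.fromstring(TEMPLATE)
fill(galaxy, row)
xml_out = ET.tostring(galaxy, encoding="unicode")
```

Because Ratio is NULL in this row, both Ratio and its parent Shape are dropped,
so the output carries no empty shape cells.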
>
> Now I reiterate that XML is *not* a suitable transport/ storage/
> processing medium for large tables. Binary formats are a different and
> larger can of worms still to be dealt with, but we ought to be able to
> use a similar mapping if necessary.
>
> Cheers,
>
> Martin
>
> Guy Rixon wrote:
> > Right. Taking the example quoted below, and the implied schema, how do you
> > use the schema when there are 1,000,000 objects matching the Galaxy type in a
> > binary table?
> >
> > If the answer is "use an ORDBMS" (which is probably a _good_ answer; q.v. Ed's
> > contributions earlier in this list) can you tell us where we get a suitable
> > ORDBMS and a little more detail of how we use it? Also how the data get into
> > the ORDBMS when there are millions of Galaxy structures to move.
> >
> > I'm prepared to believe in schemas used to validate and process binary tables
> > if I see a worked example.
> >
> > On Mon, 26 Jan 2004, Martin Hill wrote:
> >
> >
> >>In fact with a good schema we can add much deeper, useful information than we
> >>can with VOTable. Consider the following, have a look at the <Brightness
> >>filter="peach"> and the <Filter ID="peach">:
> >>
> >><StellarCatalogue>
> >>  <Galaxy name="NGC 300">
> >>    <Altname>PGC 3238</Altname>
> >>    <Position>
> >>      <WCS epoxinoxthingy="J2000">
> >>        <RA unit="sexagesimal">00:54:52.6</RA>
> >>        <DEC unit="sexagesimal">-37:40:57</DEC>
> >>      </WCS>
> >>      <Galactic>
> >>        <Long unit="degree">299.2306</Long>
> >>        <Lat unit="degree">-79.4210</Lat>
> >>      </Galactic>
> >>    </Position>
> >>    <Brightness band="B" unit="johnsonmag">8.95</Brightness>
> >>    <Brightness filter="peach" unit="johnsonmag">6.95</Brightness>
> >>    <Extinction band="I" unit="johnsonmag">0.02</Extinction>
> >>    <Extinction band="B" unit="johnsonmag">0.055</Extinction>
> >>    <Shape>
> >>      <Ratio>0.73</Ratio>
> >>      <MorphologyT>7</MorphologyT>
> >>      <MorphologyM>SA(s)d</MorphologyM>
> >>    </Shape>
> >>    :
> >>  </Galaxy>
> >>  <Filter ID="peach">
> >>    <CenterWlen units='nm'>650</CenterWlen>
> >>    <FreqWidth units='millihertzteehee'>12</FreqWidth>
> >>  </Filter>
> >></StellarCatalogue>
> >>
> >>We can extend the schema snippet describing the <Filter> element to include all
> >>kinds of bandwidth information and characteristics. And as I understand it, we
> >>can also use this technique to refer to other XML documents, such as
> >>astronomical metadata held at the datacenter. Whether we want to allow metadata
> >>to be 'distributed' like this is another matter...
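
For what it's worth, following the filter="peach" reference takes only a few
lines with Python's standard ElementTree. This is a sketch on a cut-down copy
of the example above; it does a plain dictionary lookup on the ID attribute and
does no schema validation. The function name brightness_with_filter is
hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the catalogue example, keeping only the parts involved
# in the filter reference.
DOC = """
<StellarCatalogue>
  <Galaxy name="NGC 300">
    <Brightness band="B" unit="johnsonmag">8.95</Brightness>
    <Brightness filter="peach" unit="johnsonmag">6.95</Brightness>
  </Galaxy>
  <Filter ID="peach">
    <CenterWlen units="nm">650</CenterWlen>
  </Filter>
</StellarCatalogue>
"""


def brightness_with_filter(root):
    """Return (galaxy name, magnitude, filter centre wavelength) tuples."""
    # Index every Filter element by its ID so references can be looked up.
    filters = {f.get("ID"): f for f in root.iter("Filter")}
    out = []
    for galaxy in root.iter("Galaxy"):
        for brightness in galaxy.iter("Brightness"):
            ref = brightness.get("filter")
            if ref is None:
                continue  # described by a band name, not a filter reference
            centre = filters[ref].findtext("CenterWlen")
            out.append((galaxy.get("name"), float(brightness.text), float(centre)))
    return out


resolved = brightness_with_filter(ET.fromstring(DOC))
```

The B-band measurement is skipped because it names a band rather than pointing
at a Filter element; only the "peach" measurement is resolved.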
> >>
> >>Er, and I forgot to include UCDs in the above. There's no reason why they can't
> >>also be in as attributes.
> >>
> >>Cheers,
> >>
> >>Martin
> >>
> >>Martin Hill wrote:
> >>
> >>
> >>>
> >>>Mark Taylor wrote:
> >>>
> >>>
> >>>>On Mon, 26 Jan 2004, Martin Hill wrote:
> >>>>
> >>>>
> >>>>
> >>>>>3+) We need to argue over the schemas for a very important reason -
> >>>>>because we are arguing over how we share our information. VOTable
> >>>>>does *not* do this! It's a cop out that lets us pass information
> >>>>>around, but without having to agree what's in it. This is sometimes
> >>>>>presented as a good thing, but actually it's not - it just means we
> >>>>>have deferred the problem, and we can continue to defer it while we
> >>>>>pat ourselves on the back for having produced something. For
> >>>>>example, how do we use it to transport spectra? Aha, we need to
> >>>>>discuss this and agree it. How do we use it to describe datacenter
> >>>>>metadata? Aha, we need to discuss this and agree it. So in fact we
> >>>>>still have the original problem, and have solved nothing. Indeed,
> >>>>>we've made it worse, because now we have no way of checking our
> >>>>>agreements. Agreeing and publishing a schema means everyone
> >>>>>everywhere has something to develop against, and something to
> >>>>>validate against both as they publish data and receive it. You can
> >>>>>be sure you're not getting a spectrum when you expect a catalogue, etc.
> >>>>
> >>>>
> >>>>
> >>>>I think this is the confusion about what astronomers mean by metadata
> >>>>and what computing science people mean by metadata cropping up again
> >>>>(fourth time in this mailstorm by my count - well done Clive Page
> >>>>for spotting it the first time!).
> >>>>
> >>>>It seems to me (for the purposes of this discussion an astronomer
> >>>>rather than a computing person) that, trying to interpret a spectrum
> >>>>as a catalogue is not in practice the kind of problem that your
> >>>>working astronomer is going to encounter. You submit a catalogue
> >>>>request to a federated catalogue-serving service, and you expect to
> >>>>get a catalogue back. If you submit a catalogue request to a
> >>>>federated spectrum-serving service and expect to get a catalogue back,
> >>>>well you probably don't have any business using the service.
> >>>
> >>>
> >>>That's fair, and *perhaps* this is a bad example - it's just the two
> >>>things that we seem to be using at the moment. However, I can think of
> >>>tools that might take spectra and catalogues, or make spectra out of
> >>>catalogues, or spectra out of catalogues and other spectra. And getting
> >>>the right information in to the right point is a Good Thing.
> >>>
> >>>
> >>>>The kind of problem which astronomers really do need to solve is
> >>>>comparing two catalogues and working out what column in one can
> >>>>sensibly be compared to what column in the other. Metadata in the
> >>>>sense that it appears in VOTable FIELD elements (UCDs, utypes, units)
> >>>>is the way to do this; I don't believe that you can make much or
> >>>>any contribution to it by using schemas.
> >>>
> >>>
> >>>Quite - here is a specific use-case on catalogues. But careful - your
> >>>comment is based on the fact that most astronomy is done by thinking in
> >>>terms of columns in a database. 1) I expect that this will continue -
> >>>most joins/comparisons will continue to happen in a database. 2) It can
> >>>be done just as easily using 'rich' XML as using VOTable. eg, Find all
> >>>elements of type "StellarObject" and plot their child elements named
> >>>"Mag" against the child elements named "Freq".
> >>>
> >>>Yes we definitely need all the extra information - let's call it the
> >>>astronomical metadata. Yes we *must* provide it (whether as part of the
> >>>same document/file or separately is a different debate). But there is
> >>>no reason why it can't be included in 'rich' XML documents using more
> >>>common XML techniques (such as XPointer) that other (common) tools can
> >>>make use of.
> >>>
> >>>
> >>>>I'm not saying that the computing science problems here are
> >>>>unimportant, but if we solve those problems at the expense of
> >>>>the astronomical ones, then our software and standards, however
> >>>>beautiful and XML-friendly they are, are not going to be useful for or
> >>>>used by our intended customers.
> >>>
> >>>
> >>>I agree hugely largely etc. In fact the reason I am pushing for
> >>>better-described schemas is because it will make interpreting the data
> >>>*so much easier*. I am not after pure XML, but after something that is
> >>>straightforward to use for people with some IT skills but an interest in
> >>>a different *specific* discipline: astronomers.
> >>>
> >>>Cheers,
> >>>
> >>>Martin
> >>>
> >>>
> >>
> >>--
> >>Martin Hill
> >>Software Engineer
> >>AstroGrid @ ROE
> >>Tel: +44 7901 55 24 66
> >>www.astrogrid.org
> >>
> >>
> >
> >
> > Guy Rixon gtr at ast.cam.ac.uk
> > Institute of Astronomy Tel: +44-1223-337542
> > Madingley Road, Cambridge, UK, CB3 0HA Fax: +44-1223-337523
>
Guy Rixon gtr at ast.cam.ac.uk
Institute of Astronomy Tel: +44-1223-337542
Madingley Road, Cambridge, UK, CB3 0HA Fax: +44-1223-337523