a recipe for crumpets
edward.j.shaya.1 at gsfc.nasa.gov
Wed Jan 28 10:04:11 PST 2004
We are trying to synthesize a number of requirements into a consistent
model. We want to be able to make statements about very many different
types of objects using a vocabulary of terms from UCDs that is well over
1300 in number (to which we will be adding many more, I bet). We want
to be able to use XML tools, especially XPath, which in turn permits
XQuery. We need a high level language to express queries independent of
any datacenter's organization. We have extremely large quantities of
data that require the speed and compact size of relational databases.
But, our knowledge is not simply 2-dimensional and so one wants to be
able to address the data as if it is hierarchical, even though the
internal storage and access MAY be relational. This means that we
need clear rules for flattening and "crumbling".
Start by noting that a record in a table is usually a list of Quantities
about some Object. So we should have a clear way to identify in our XML
which elements are Objects and which are Properties, perhaps by
namespacing them. Along the way we find that there are a few tricks to
designing the schemas so that one generates nicer tables, and some
directions in which VOTable could develop.
O's are Objects. Statements always begin with an O element.
Objects take P's (Properties), which are of type A, G, or M.
A is an Atomic Quantity; an example is RA. Its child O's are Metadata.
G is a Group Property of A's, each A typically being different; an
example is a position with several coordinates. In fact each A requires a
bit of grouping to hold it together as well, but I ignore that here.
M is a Membership Property that holds Objects. For example, globular
clusters have M=MemberStars, which holds many O=star. It is probably
best if each M is constrained to a certain range of Object types.
All of this is much like OWL Lite, but I am paying special attention to
properties which take physical Objects as children. The OWL
ObjectProperty is a property that takes an Instance, i.e. not a native
number. We are now working a notch above OWL because our Quantities are
quite a bit richer than a common OWL property.
A basic example that conforms to O then P or M, M then O.
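As an illustration only, here is a minimal Python sketch of an instance
that follows the "O then P or M, M then O" rule, together with a small
conformance check. The element names, the values, and the kind="..."
attribute used to mark each element as O, A, G, or M are all assumptions
made for this sketch; the post itself only suggests namespacing or a
naming convention.

```python
import xml.etree.ElementTree as ET

# Allowed child kinds for each kind of element, following the rules above:
# O takes properties (A, G, M), G groups A's (or nested G's),
# M holds O's, and the children of an A are metadata O's.
ALLOWED = {
    "O": {"A", "G", "M"},
    "G": {"A", "G"},
    "M": {"O"},
    "A": {"O"},
}

# Hypothetical instance; names and numbers are made up for illustration.
DOC = """
<GlobularCluster kind="O" id="c1">
  <Position kind="G">
    <RA kind="A">250.42</RA>
    <Dec kind="A">36.46</Dec>
  </Position>
  <MemberStars kind="M">
    <Star kind="O" id="s1"><Vmag kind="A">14.2</Vmag></Star>
    <Star kind="O" id="s2"><Vmag kind="A">15.1</Vmag></Star>
  </MemberStars>
</GlobularCluster>
"""

def conforms(elem):
    """True if elem and all of its descendants obey the ALLOWED table."""
    kind = elem.get("kind")
    if kind not in ALLOWED:
        return False
    return all(c.get("kind") in ALLOWED[kind] and conforms(c) for c in elem)

root = ET.fromstring(DOC)
# A statement must begin with an O element.
print(root.get("kind") == "O" and conforms(root))
```

An Object placed directly inside another Object, with no M in between,
would fail this check.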
We can incorporate an image into this (we may not want to, but it can be
done without stretching too far) by simply noticing that each pixel
mapped onto the sky is a region of the sky which is an Object.
We may need to extend our id to include a position Group.
So an image, spectrum, or time series is
I=(O*,M). The first O* is metadata and the M refers to a series of O(id,A),
as in M=[O(spot1,A), O(spot2,A), O(spot3,A), ..., O(spotN,A)].
But in this fancy image one can add additional information at any spot.
So one can easily add in O(spot1,A/P1,A2/P2), O(spot2,A,M(O*)...), etc.
Why can we do this?
Because it is XML and so you can do just about anything.
And in fact we can include spectra and time series in a similar way. We
simply think about a region in coordinate space as an Object.
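A rough sketch of the I=(O*,M) idea in Python, assuming a plain nested
list of pixel values as input. The function name, the dictionary keys,
and the row/col position Group are hypothetical choices for this
illustration and are not part of any VOTable or UCD definition.

```python
def image_to_statement(pixels, metadata):
    """Return I=(O*, M) for a 2-D list of pixel values."""
    members = []
    for r, row in enumerate(pixels):
        for c, value in enumerate(row):
            members.append({
                "id": f"spot{len(members) + 1}",
                # Position Group extending the id, as suggested in the text.
                "position": {"row": r, "col": c},
                "A": {"value": value},
            })
    return {"O*": metadata, "M": members}

# Made-up 2x2 image with made-up metadata.
image = image_to_statement([[10, 12], [11, 40]], {"instrument": "hypothetical"})

# Extra information can be attached to any single spot without
# touching the others, here a second Atomic Quantity on the last spot.
image["M"][3]["A2"] = {"value": 0.3}
```

Each spot is an ordinary Object, so anything that can be said about an
Object can also be said about a pixel, a spectral bin, or a time sample.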
The path to any A Quantity starts with an O, passes through zero or more
M/O pairs, then ends with a series of G's and finally the A. For instance:
XPath = /O/M/O/G/G/A
represents a cluster of galaxies that M_hasGalaxies, and these have
velocities measured, and there are radial velocities, and one of them is
=/GalaxyCluster@id="343"/MemberStars/Star@id="2323"/Velocities/RadialVelocities/RadioCZ
(Actually I am cheating a bit on the Xpath expression just for explanation).
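The shape of such a path can be checked mechanically. The sketch below
assumes the kind (O, M, G, or A) of each step is already known and that
"a series of G's" may be empty, which is an interpretation rather than
something the post states; the function name is hypothetical.

```python
import re

# /O, then zero or more /M/O pairs, then zero or more /G, then /A.
PATH_SHAPE = re.compile(r"^/O(?:/M/O)*(?:/G)*/A$")

def is_quantity_path(kinds):
    """kinds is the list of element kinds along a path, root first."""
    return PATH_SHAPE.match("/" + "/".join(kinds)) is not None

print(is_quantity_path(["O", "M", "O", "G", "G", "A"]))
print(is_quantity_path(["O", "M", "A"]))
```

The first call describes the /O/M/O/G/G/A example and is accepted; the
second is rejected because an M must be followed by an O.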
There is a flattening algorithm that is wonderfully simple:
At the top level one can make tables of each ObjectType. Then, whenever
there is an M, each M becomes a table and the table id is the Xpath to M.
So there is a table here:
TableName='/GalaxyCluster@id="343"/MemberStars'
In the top level table, each A is 3 or so columns (value, error, units),
but for an M property a single column contains the pointer to the "MTable".
That MTable consists of the stars in GalaxyCluster343 and has all of the
A's and G's of A's.
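The flattening rule can be sketched in a few lines of Python. This is
only an illustration under several assumptions: elements carry a
hypothetical kind="..." attribute, each A is stored as a single value
column rather than value, error, and units, nested G's are joined into
dotted column names, the informal @id path notation is used for table
names, and the RadioCZ value is made up.

```python
import xml.etree.ElementTree as ET

DOC = """
<GalaxyCluster kind="O" id="343">
  <MemberStars kind="M">
    <Star kind="O" id="2323">
      <Velocities kind="G">
        <RadialVelocities kind="G">
          <RadioCZ kind="A">1200</RadioCZ>
        </RadialVelocities>
      </Velocities>
    </Star>
  </MemberStars>
</GalaxyCluster>
"""

def flatten(obj, table_name, parent_path, tables):
    """Add one row for Object obj to table_name; each M becomes a table."""
    oid = obj.get("id")
    here = f'{parent_path}/{obj.tag}@id="{oid}"'
    row = {"id": oid}
    tables.setdefault(table_name, []).append(row)

    def collect(elem, prefix):
        for child in elem:
            kind, name = child.get("kind"), prefix + child.tag
            if kind == "A":
                row[name] = (child.text or "").strip()
            elif kind == "G":
                collect(child, name + ".")   # G's only contribute to column names
            elif kind == "M":
                mtable = f"{here}/{child.tag}"
                row[name] = mtable           # pointer column to the MTable
                tables.setdefault(mtable, [])
                for member in child:
                    flatten(member, mtable, mtable, tables)

    collect(obj, "")
    return tables

root = ET.fromstring(DOC)
tables = flatten(root, root.tag, "", {})
for name, rows in tables.items():
    print(name, rows)
```

The top-level table is named after the ObjectType, and the MemberStars
cell in its row holds the name of the table that lists the stars.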
On the unlikely chance that there are actually several MemberStars at
this point, one needs to allow for a qualifier attribute. It does not
modify the theory, though, because this is to be thought of as subclassing.
One thing that I swept under the rug is the metadata in each A. These
can go into FIELD/Metadata. But if they differ from item to item, then
we need a column whose cells take XML. Also note that an MTable in the
Metadata is a likely occurrence, so it has to be transformed into a
table, with a pointer replacing it in the cell.
As it turns out, Norman Grey has just described how one adds extra
branches of XML info into VOTable (see the VOTable discussion list).
My conclusion is that one gets a wonderfully simple but powerful
mechanism if one can identify XML elements as one of type O, A, G, or
M. Actually O and A can be detected simply by position. It is the
M element that is difficult to distinguish (for the computer, that is)
from A. So we could name these special properties starting with M: or
M_ or whatever.
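A small sketch of how a program might assign kinds by position once M
elements carry an "M_" prefix. It makes two simplifying assumptions that
go beyond the text: a property with no children is treated as an A
(metadata children of an A are ignored), and any other non-M property is
treated as a G. The element names and values are made up.

```python
import xml.etree.ElementTree as ET

DOC = """
<GlobularCluster>
  <Position><RA>250.42</RA></Position>
  <M_MemberStars>
    <Star><Vmag>14.2</Vmag></Star>
  </M_MemberStars>
</GlobularCluster>
"""

def classify(elem, parent_kind=None, out=None, path=""):
    """Map each element path to O, M, G, or A using position and the M_ prefix."""
    out = {} if out is None else out
    here = f"{path}/{elem.tag}"
    if parent_kind in (None, "M"):
        kind = "O"          # the root and the children of an M are Objects
    elif elem.tag.startswith("M_"):
        kind = "M"          # Membership Properties are marked by name
    elif len(elem) == 0:
        kind = "A"          # a childless property is taken to be Atomic
    else:
        kind = "G"          # anything else groups further properties
    out[here] = kind
    for child in elem:
        classify(child, kind, out, here)
    return out

kinds = classify(ET.fromstring(DOC))
for path, kind in kinds.items():
    print(kind, path)
```

Without the prefix, M_MemberStars would be labeled G here and its stars
would be misread as properties, which is the ambiguity the naming
convention is meant to remove.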
This all follows from simply noting that a table is confined to an
/O/G/A or /O/G/G/A pattern (or can be cast into one), but that such
tables may be incorporated into a hierarchical pattern by linking
properties, the M's.
If this works, it would mean that with a little bit of simple code to
flatten and crumble and to convert XPath into SQL, any relational
database can become an XML ORDB. The price is that schemas need to follow
a few rules.
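To give a feel for the path-to-SQL step, here is a hedged sketch using
Python's sqlite3 module. It assumes the path has already been split into
(name, kind, id) steps, that M tables are named by the path to the M,
that nested G's become dotted column names, and that each table has an
id column. The function names, table layout, and stored value are
hypothetical.

```python
import sqlite3

def quote(ident):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + ident.replace('"', '""') + '"'

def path_to_sql(steps):
    """steps: (name, kind, id_or_None) tuples, root first, ending in an A."""
    table, path, where, column = None, "", None, []
    for name, kind, oid in steps:
        if kind == "O":
            if table is None:
                table = name              # top-level table named by ObjectType
            where, column = oid, []
            path += f"/{name}" + (f'@id="{oid}"' if oid else "")
        elif kind == "M":
            path += f"/{name}"
            table, where = path, None     # M table named by the path to M
        else:                             # G or A: extend the column name
            column.append(name)
    sql = f"SELECT {quote('.'.join(column))} FROM {quote(table)}"
    params = []
    if where is not None:
        sql += " WHERE id = ?"
        params.append(where)
    return sql, params

steps = [("GalaxyCluster", "O", "343"), ("MemberStars", "M", None),
         ("Star", "O", "2323"), ("Velocities", "G", None),
         ("RadialVelocities", "G", None), ("RadioCZ", "A", None)]

# Build a one-row MTable by hand so the generated query can be run.
mtable = '/GalaxyCluster@id="343"/MemberStars'
col = "Velocities.RadialVelocities.RadioCZ"
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {quote(mtable)} (id TEXT, {quote(col)} TEXT)")
conn.execute(f"INSERT INTO {quote(mtable)} VALUES (?, ?)", ("2323", "1200"))

sql, params = path_to_sql(steps)
result = conn.execute(sql, params).fetchall()
print(sql, params, result)
```

A real translator would also need to handle predicates, wildcards, and
joins across M pointers, none of which is attempted here.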