Is XPATH the way to search a data model?
gtr at ast.cam.ac.uk
Tue May 18 01:28:51 PDT 2004
On Mon, 17 May 2004, David Berry wrote:
> Obviously, the ability to search our data models will be very important,
Doubtless, but what exactly do you mean by "searching a model"? Parsing the
model's W3C XML schema in order to generate software? Or parsing a data
instance that conforms to the model? Let's assume the second one for now.
> but should we just assume that XPATH is the best way to do it? My question
> is, should we be optimising our data models specifically so that they can
> be searched using XPATH? This seems to be the general assumption, but I
> have two questions with this:
> 1) Does not the fact that XPATH is a specifically XML thing not mean that
> it is more to do with data >formats< than data >models<? Fine if you
> serialise your data as canonical XML but what if you use (e.g.)
> stand-alone FITS files, or in some non-canonical XML format for which your
> XPATH expressions are not valid?
XPATH is for searching data organized as graphs of nodes, where the graphs are
acyclic and the nodes are specialized as "elements", "attributes" and "text
values". Clearly, it's meant for searching XML, but, IIRC, it's actually
defined to work on XML infoset, not directly on XML. In practice, XPath
implementations usually work on DOM. (Dogma: XML is Unicode marked up with
angle brackets; infoset is not XML; DOM is not XML.)
XPath implementations ought to work on any DOM instance whether or not that
instance is derived from XML. If we could describe the content of a FITS file
as a DOM instance, then we ought to be able to search it with XPath. Mapping
from the selected nodes back to the bytes in the FITS file is a different
problem that we would have to solve for ourselves.
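As a rough illustration of that point, here is a minimal Python sketch that builds a node tree directly from header keywords and then searches it with an XPath expression. Everything in it is an assumption made for illustration: the header dictionary stands in for whatever a real FITS reader would return, the element names "fits", "hdu" and "card" are invented, and Python's ElementTree is not a W3C DOM and supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical header cards, standing in for the output of a FITS reader.
header = {"NAXIS": "2", "CTYPE1": "RA---TAN", "CTYPE2": "DEC--TAN"}

# Build the node tree directly in memory; no XML text is ever parsed.
root = ET.Element("fits")
hdu = ET.SubElement(root, "hdu", {"index": "0"})
for keyword, value in header.items():
    card = ET.SubElement(hdu, "card", {"keyword": keyword})
    card.text = value

# Search the tree with an XPath expression (ElementTree's XPath subset).
matches = root.findall(".//card[@keyword='CTYPE1']")
ctype1_values = [m.text for m in matches]
```

The search only ever sees the tree, so it neither knows nor cares that the data never existed as XML. Getting from a matched node back to bytes in the file is still left to the application.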
> 2) Can it have astronomical knowledge built into it, or is it just a
> sort of dumb regexp system for structured text?
It's the dumb one. But, as I said above, it's regexp for a data graph, not for
text; this makes it more powerful. If we want to use XPath for astronomical
inference, then we have to do that via the details of the data model.
> What I mean is, if for
> instance you searched a StandardQuantity for a Frame (Frame "A") holding
> the 3 axes:
> (heliocentric radio velocity, ICRS RA, ICRS Dec)
> could XPATH do anything sensible if the StdQ did not contain this exact
> Frame, but instead contained a Frame (Frame "B") containing the 3 axes:
> (Galactic longitude, geocentric frequency, galactic latitude)
> ?? Obviously Frame A and Frame B are not identical, but given a
> position in Frame B it is possible to convert it into Frame A without
> needing any extra information over and above that stored in the
> Frames. What we want is a search system which has enough astronomical
> knowledge to be able to do this. What you want from your search system is
> for it to say "no I cannot find Frame A but Frame B looks very similar and
> I can give you a Mapping which will convert positions in Frame B into the
> corresponding positions in Frame A". Such recoverable mis-matches between
> what the client wants and what the server can provide is bound to happen
> over and over again in the VO. So can XPATH be used to do this sort of
> searching? If not, then should we not drop XPATH in favour of building
> customised intelligent searching into our code which searches the data
> model itself rather than just some specific data format?
If Frame A and Frame B are of types that share some common ancestor in the
inheritance scheme, then I _think_ that XPath can be set to search for that
ancestor. I.e., you could ask it to find a StandardQuantity containing a
CoordinateFrame, say. Or you could pattern-match the UCDs looking for "POS.".
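A minimal Python sketch of the UCD pattern-match, under stated assumptions: the element names, the "ucd" attribute and the UCD strings below are invented for illustration and are not taken from any agreed data model. In full XPath 1.0 the prefix test could be written as //Axis[starts-with(@ucd, 'POS.')]. Python's ElementTree implements only a subset of XPath without that function, so the sketch selects on the attribute with XPath and applies the prefix test in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; names and UCD values are illustrative only.
doc = ET.fromstring("""
<StandardQuantity>
  <Frame name="B">
    <Axis ucd="POS.GALACTIC.LON"/>
    <Axis ucd="SPECT.FREQUENCY"/>
    <Axis ucd="POS.GALACTIC.LAT"/>
  </Frame>
</StandardQuantity>
""")

# XPath step: every Axis element that carries a ucd attribute.
candidates = doc.findall(".//Axis[@ucd]")

# Prefix test done in Python, standing in for XPath's starts-with().
positional = [a.get("ucd") for a in candidates if a.get("ucd").startswith("POS.")]
```

This finds the positional axes of Frame B regardless of which coordinate system they use, but it says nothing about how to convert them. That part is still outside XPath.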
XPath does searching; it doesn't do transformation. See XSLT for
transformation, and note that XSLT is built around XPath: it looks for
patterns using XPath and applies transformations to them. In principle, we
could build an astronomical search-and-transform thingy using XSLT scripts.
Guy Rixon gtr at ast.cam.ac.uk
Institute of Astronomy Tel: +44-1223-337542
Madingley Road, Cambridge, UK, CB3 0HA Fax: +44-1223-337523