Putting the pieces together...
dtody at nrao.edu
Fri May 14 13:31:59 PDT 2004
Hi Tom -
It will probably be simpler to discuss this once we have the data model and
interface specifications to review (hopefully next week), but a little more
discussion in the meantime should be helpful.
> > When you use SSA you will get back a spectrum (or SED) which conforms to the
> > SSA data model. It won't matter what format the data is in so long as your
> > application can parse it, since all formats encode the same data object.
> What does this mean... "conforms to the data model?" Does this
> mean that we're getting back a file that's in XML with a schema somewhere
> and items identified in the schema? Data cannot be in FITS?
The science data model (SDM) for a spectrum, SED, or whatever defines
the required and optional data elements, their allowed values, and their
meanings and relationships. The current thinking is that the SDM includes
a reference serialization in XML, with an accompanying schema.
At the SDM level this is all rather abstract. The data access interface
(SSA in this case) defines both the query interface for the particular type
of data in question, e.g., a spectrum, and the standard formats in which
data can be returned, i.e., an export data format (EDF) such as VOTable,
FITS, XML, etc.
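To make the role of the access interface concrete, here is a sketch of how a
client might form an SSA-style query, with the export data format selected by
a query parameter. The endpoint URL is hypothetical and the parameter names
are merely illustrative of the draft interface, not taken from a final
specification.

```python
from urllib.parse import urlencode

# Hypothetical SSA service endpoint; parameter names are illustrative only.
base_url = "http://example.org/ssa"
params = {
    "POS": "180.0,-30.0",   # search position (decimal degrees)
    "SIZE": "0.05",         # search radius (degrees)
    "FORMAT": "votable",    # requested export data format (EDF)
}
query = base_url + "?" + urlencode(params)
print(query)
```

The point is that the same query interface can deliver the same data object
in any of the standard formats, selected by the FORMAT parameter.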
Each EDF is a physical realization of the SDM in some specific format.
In the case of VOTable the tabular elements of the SDM are represented
in tables, and UTYPE values are given to refer the table fields back to
elements of the SDM. In the case of FITS, the EDF defines the mapping of
the SDM into specific FITS representations, such as one or more binary
tables or one or more rows of a single binary table, with 8-character
FITS keyword names for all elements of the SDM (for FITS the keyword name
takes the place of UTYPE).
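The UTYPE/keyword correspondence described above can be pictured as a simple
lookup table. The element names, UTYPE strings, and FITS keywords below are
invented for illustration and are not drawn from the actual SSA data model.

```python
# Hypothetical correspondence between abstract SDM elements and their
# serialized names in two EDFs. In VOTable the UTYPE attribute ties a
# table field back to the SDM; in FITS an 8-character keyword plays
# the same role.
SDM_MAPPING = {
    # abstract SDM element       (VOTable UTYPE,        FITS keyword <= 8 chars)
    "Spectrum.Flux.Value":       ("ssa:Flux.value",     "FLUX"),
    "Spectrum.Spectral.Value":   ("ssa:Spectral.value", "SPECCOO"),
    "Spectrum.Target.Name":      ("ssa:Target.name",    "OBJECT"),
}

def fits_keyword(sdm_element):
    """Return the FITS keyword that stands in for UTYPE for an SDM element."""
    return SDM_MAPPING[sdm_element][1]
```

Either name leads back to the same abstract SDM element, which is what makes
the formats interchangeable.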
In theory, regardless of what file format is used, the same data object
is returned. The chief difference is that some formats may be more
permissive in terms of nonstandard extensions than others (generic
containers like VOTable are more permissive in this regard than pure
XML representations of a specific data model).
> Or maybe it means that the data can be in FITS or XML, but the data can
> be deserialized using some standard applications that recognize
> the formats and allow the users some API to the data? There's
> a set of standard Java interfaces and classes that are provided by the VO.
> Wil writes an equivalent set that C# users can try.
Data can be in any format for which a standard mapping has been defined
by the access interface. Yes, one could write a standard wrapper
class to decode any standard format and return the object via an API.
Ultimately we will probably want to provide reference code for both the
server side (a data access framework for implementing services) and the
client side (libraries in major languages for applications to use).
Or application developers will just write their own client interfaces.
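Such a client-side wrapper might look like the sketch below: one parser per
standard format, all returning the same in-memory object, so the application
only ever sees the SDM. The class and function names here are invented for
illustration; real parsers would of course do actual VOTable/FITS decoding.

```python
class Spectrum:
    """Minimal stand-in for an object conforming to the spectrum SDM."""
    def __init__(self, spectral, flux):
        self.spectral = spectral   # e.g., wavelength coordinate values
        self.flux = flux           # corresponding flux values

def parse_votable(data):
    # A real implementation would walk the VOTable and match UTYPEs.
    return Spectrum(data["spectral"], data["flux"])

def parse_fits(data):
    # A real implementation would read binary table columns by keyword.
    return Spectrum(data["spectral"], data["flux"])

PARSERS = {"votable": parse_votable, "fits": parse_fits}

def read_spectrum(data, fmt):
    """Decode any supported EDF; the application never sees the format."""
    return PARSERS[fmt](data)
```

Whichever format arrives, `read_spectrum` hands the application the same
SDM-conformant object.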
> Or maybe it means that there are converter tools that know how to
> transform the data into a standard representation for a given data model?
> The converter tools are invokable from a few standard Web sites.
Transforming the data *into* a standard representation occurs mainly on
the server side and is done by the data access service. If the server
generates data in one format, e.g., XML, a standard module on the server
side could automatically generate the format requested by the client.
We could also make these transformer modules available as web services,
though the extra indirection is probably best avoided. Alternatively,
the service might generate any format directly in order to have maximum
control over formatting, e.g., to be able to add nonstandard extensions.
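A server-side transformer module of this kind is essentially a dispatch on
the requested format over one native record. The sketch below is purely
illustrative; the serializations are toy stand-ins for real VOTable/CSV
writers.

```python
# Sketch: the service holds one native record and a converter produces
# whatever format the client requested. Function names are illustrative.

def to_votable(record):
    rows = "".join(f"<TR><TD>{x}</TD><TD>{y}</TD></TR>"
                   for x, y in zip(record["spectral"], record["flux"]))
    return f"<VOTABLE><TABLEDATA>{rows}</TABLEDATA></VOTABLE>"

def to_csv(record):
    return "\n".join(f"{x},{y}"
                     for x, y in zip(record["spectral"], record["flux"]))

TRANSFORMERS = {"votable": to_votable, "csv": to_csv}

def serve(record, requested_format):
    """Transform the server's native record into the requested EDF."""
    return TRANSFORMERS[requested_format](record)
```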
> These are three very different ways I could see a data model being
> instantiated usefully within the VO. There are probably others, CORBA
> and IDL could probably play a role. I could understand what conforming
> to a data model means if it meant one or more of these!
CORBA could also be used, but normally only for local area computing,
e.g., to use a DAL service within a cluster (SOAP is better suited to
wide area computing such as the VO). For example, we could build a data
access component that implements the client side of a DAL service, using
an API conformant to the SDM to provide concurrent access to the data
object for application code running on a cluster.
The data access component would use a web service interface to talk to
the VO to fetch the data object, and a CORBA interface (or whatever)
to provide high performance shared access to the data within the cluster.
Again, the key thing here is that the data object conforms to the SDM.
The application sees only the SDM, and doesn't care about all the different
data access protocols and formats.
> Or does it mean, as you seem to imply below, that the data model is just
> a bit of documentation that is somehow associated with the file but
> every application needs to build their own custom parsers? I surely hope
> that's not what it is.
A data model is an abstract definition, but this does not mean every
application needs to build its own custom parser. An existing example
is FITS WCS: it is defined in a document, and various parties write
class code to implement it. We extract a WCS from some dataset, retrieve
it from a database, or whatever, load it into some WCS class library, and
we can use the WCS.
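As a toy version of such "class code", here is a 1-D linear world coordinate
system driven by the standard FITS keywords CRVAL, CRPIX, and CDELT. Real
WCS libraries implement the full FITS WCS standard; this sketch only shows
the pattern of loading header values into a class and then using it.

```python
class LinearWCS:
    """Minimal 1-D linear WCS: world = crval + (pix - crpix) * cdelt."""
    def __init__(self, crval, crpix, cdelt):
        self.crval = crval   # world coordinate at the reference pixel
        self.crpix = crpix   # reference pixel (1-indexed, FITS convention)
        self.cdelt = cdelt   # world increment per pixel

    def pix_to_world(self, pix):
        return self.crval + (pix - self.crpix) * self.cdelt

# e.g., values extracted from a (hypothetical) FITS header:
#   CRVAL1 = 5000.0, CRPIX1 = 1.0, CDELT1 = 2.0  (Angstroms per pixel)
wcs = LinearWCS(crval=5000.0, crpix=1.0, cdelt=2.0)
```

Once the WCS is loaded into the class, the application computes world
coordinates without caring where the keywords came from.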
> To me the real purpose of data models is that I can
> build high level interfaces over the data model and hide
> low level implementation details. The VO provides me
> with some way of getting the standard information regardless
> of the low level stuff. But if everyone has to build their
> own low level interfaces that's not going to win us any friends! Every time
> a new low level format comes along everyone needs to update their codes.
> That can't be right.
Your reading is pretty much right. Having an abstract definition of the
data model means that it is defined independently of the implementation,
so your application code does not have to know about all the various
protocols and access interfaces.
> If the data model is only documentation then
> I think we've wasted a lot of time on this. Just as resource
> metadata isn't very helpful without registries
> that give us common access to metadata from many resources, data model
> metadata isn't very helpful unless we have API's that give us common
> access to datasets which implement these models.
> What are the software capabilities that the data models are associated with?
This is intentionally left open; the data models themselves do not
prescribe software capabilities. Of course you can write any sort of
application to manipulate such data.
We seem to be having an abstract discussion about something which is
actually pretty straightforward. Once we have initial implementations of
SSA services we should write an actual application or two, and demonstrate
things end-to-end. This will make it all a lot more concrete.