UCD for SIAP
dtody at nrao.edu
Tue Jun 17 21:17:36 PDT 2003
Certainly we can replace the VOX: namespace UCDs with global UCDs, so
long as we are willing to make up new UCDs where needed. This might be
a good short-term solution.
The key problem I see with trying to use existing UCDs is that historically
UCDs have been used primarily as fuzzy tags to link similar fields in
catalogs. In data access metadata such as is introduced in SIA we are
using UCDs to identify the fields of a formal data model. Here the tag
is not a fuzzy link between similar fields of unrelated catalogs; rather,
it is a link to a field of a formally defined data model. Precision is
important for these data models - we are precisely defining attributes
of the data model.
We should formally define data models such as spectralBandpass or WCS
and define, as part of the data model, the UCD tag used to identify an
attribute of the data model. When we represent a data model as a set of
related columns in a table, or as an entity struct in XML (as in IDHA or
HDX), we will use the UCDs to formally type the data model attributes so
that programs can use them unambiguously, so that we can use XML Schemas
for automated validation, and so forth.
This one-to-one mapping of UCDs to formal data models is a concept that
does not currently exist in UCDs. If we try to take a more classical
UCD-like approach and use UCDs to associate "similar" fields of different
data models, then we no longer have a precisely defined data model.
This association of "similar" fields of different data models should occur
in the data model definitions, where data models may define attributes
in terms of more fundamental data models or quantities.
Some more specific comments on your proposed UCDs follow. I haven't tried
to be complete; these are only examples.
> example is there a reason for VOX:Image_AccessReference as a new UCD, why
> can't we simply use DATA_LINK from the existing set?
This would work if we have only one DATA_LINK in an interface such as
SIA, and we define that within this particular interface, DATA_LINK means
the formally defined SIA Image_AccessReference. If for some reason we
have two DATA_LINK attributes then we are in trouble, as the type is
then overloaded and the meaning is ambiguous. The problem with what you
suggest is that we are inherently overloading the type. It might work
for a while, but will cause a problem in the future if we apply the same
logic to a similar attribute. Since the attribute is precisely defined
we gain nothing by using a fuzzy tag.
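
To make the overloading problem concrete, here is a minimal sketch in
Python. The column names, the find_field helper, and the idea of a second
DATA_LINK column holding a preview link are all made up for illustration;
only the two UCD strings come from the discussion above.

```python
def find_field(fields, ucd):
    """Return the one column tagged with `ucd`; fail if it is missing or ambiguous."""
    matches = [name for name, tag in fields if tag == ucd]
    if len(matches) != 1:
        raise LookupError(f"UCD {ucd!r} matches {len(matches)} fields: {matches}")
    return matches[0]


# Hypothetical response metadata as (column name, UCD) pairs.
# Both columns reuse the fuzzy tag, so a client cannot tell them apart.
overloaded = [("image_url", "DATA_LINK"), ("preview_url", "DATA_LINK")]

# The access reference carries its own precisely defined tag instead.
precise = [("image_url", "VOX:Image_AccessReference"), ("preview_url", "DATA_LINK")]
```

With the precise tagging, find_field(precise, "VOX:Image_AccessReference")
resolves to a single column. With the overloaded tagging,
find_field(overloaded, "DATA_LINK") has no unambiguous answer and raises.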
> ** new: POS_TRANSF_WCS_NAXES
> specifying the number of image axes.
> ** new: POS_TRANSF_WCS_NAXIS
> NOTE: Can a UCD refer to an array like this?
> with the array value giving the length in pixels of each image axis.
This sort of mapping of the WCS data model onto "standard" UCDs (if
newly defined) is certainly possible, so long as we define these tags as
part of the WCS data model.
However, the geometry of an image (NAXES, NAXIS) is not really part of
the WCS - these are image attributes (in FITS they existed long before WCS).
A WCS is associated with an image. CDELT, FRAME, etc., are part of the WCS.
Do we need an image geometry data model? A general image attributes
data model? NAXES, NAXIS are clearly (or shall we say, clearly should be)
precisely defined terms of some formal image data model.
> ** existing: INST_FILTER_CODE
> identifying the bandpass by name (e.g., "V", "SDSS_U", "K", "K-Band", etc.).
BandPass is a more general concept than "instrument" or "filter". Filters
and instruments are examples of specific entities that have a bandpass.
"bandpass" should be a formal data model with "bandpass"-specific attributes.
> ** existing: UNITS -- but should be GROUPED with Bandpass
> identifying the units used to represent spectral values, selected from
> "meters", "hertz", and "keV".
What do you do once there are two data models both of which need to define
their units? GROUPED above, whatever that is, may address this, but it
is better to have an explicit, unambiguous attribute. The units have to be
precisely defined. The data model could define a default if this parameter
> ** existing: CODE_MISC
> specifying the type of processing done by the image service to produce an
> output image pixel. The string value should be formed from some
> combination of the following character codes: C, F, X, Z, V
What do you do as soon as there are two fields with UCD=CODE_MISC in the
metadata? In the current SIA, the implication is that Image_PixFlags is
an attribute of some sort of "image" data model used in SIA, hence it has
a precise definition.
Similar comments apply to other UCDs where we confuse UCD associations
with data model terms.
I think it would help a lot if we just took one of these simple data
models, e.g., spectralBandpass, and formally defined it, with UCDs
assigned to identify attributes. Then we could use the same approach
to define all the other SIA image attributes, grouped by data model.
Later perhaps we can show how to define data models in terms of other
data models or ultimately Quantities, to fully define complex data objects
via a hierarchy of formal definitions.
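
As a rough sketch of what such a definition might look like, here is a toy
spectralBandpass model in Python. The attribute set and the UCD strings
attached to it are placeholders chosen for illustration, not a proposed
definition. The point is only that each attribute carries exactly one UCD,
that the units are an explicit attribute of the model, and that a program
can check the one-to-one mapping and use it to read a table row.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SpectralBandpass:
    # Each attribute is identified by exactly one UCD (tags here are placeholders).
    name: str = field(metadata={"ucd": "VOX:BandPass_ID"})
    unit: str = field(metadata={"ucd": "VOX:BandPass_Unit"})
    ref_value: float = field(metadata={"ucd": "VOX:BandPass_RefValue"})


def ucd_map(model):
    """Map each UCD to its attribute name; fail if two attributes share a UCD."""
    mapping = {}
    for attr in fields(model):
        ucd = attr.metadata["ucd"]
        if ucd in mapping:
            raise ValueError(f"UCD {ucd!r} is used by more than one attribute")
        mapping[ucd] = attr.name
    return mapping


def from_row(model, column_ucds, row):
    """Build a model instance from a table row whose columns are tagged with UCDs."""
    mapping = ucd_map(model)
    values = {mapping[u]: v for u, v in zip(column_ucds, row) if u in mapping}
    return model(**values)
```

For example, a row ["V", "meters", 5.5e-7] whose columns are tagged with the
three UCDs above would map onto SpectralBandpass(name="V", unit="meters",
ref_value=5.5e-7). Columns with other UCDs are ignored, since they belong
to some other data model.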