From roy at cacr.caltech.edu Tue Jun 17 13:03:25 2003
From: roy at cacr.caltech.edu (Roy Williams)
Date: Tue, 17 Jun 2003 13:03:25 -0700
Subject: UCD for SIAP
Message-ID: <02f601c3350b$83283ab0$6b91d783@cacr.caltech.edu>

One of my actions as part of the UCD steering committee is to work with the new "VOX" UCDs that were created for the SIAP protocol.

Below, I have listed all of the VOX UCDs that I could find in the SIAP definition, with a little bit of the description. I have tried to find an equivalent in the standard UCD set, and in many cases was successful. For example, is there a reason for VOX:Image_AccessReference as a new UCD? Why can't we simply use DATA_LINK from the existing set? This is why it says "**existing: DATA_LINK" for that entry.

Of course, there will be a revision when we settle on the "base+modifier" scheme for UCD that we decided in Cambridge.

Please can some of you look at the suggested equivalents from the existing set, and the suggested new UCDs in the listing below? We gain interoperability from reusing and following the existing tree -- but do we lose anything?

Thank you
Roy

--------
Caltech Center for Advanced Computing Research
roy at cacr.caltech.edu
626 395 3670

VOX:Image_Title
** existing: ID_IMAGE
containing a short (usually one line) description of the image.

VOX:Image_MJDateObs
** existing: TIME_DATE
with datatype="double", representing the mean modified Julian date of the observation.

VOX:Image_Naxes
** new: POS_TRANSF_WCS_NAXES
specifying the number of image axes.

VOX:Image_Naxis
** new: POS_TRANSF_WCS_NAXIS
NOTE: Can a UCD refer to an array like this?
with the array value giving the length in pixels of each image axis.

VOX:Image_Scale
** new: POS_TRANSF_WCS_CDELT
NOTE: Can a UCD refer to an array like this?
with the array value giving the scale in degrees per pixel of each image axis.

VOX:Image_Format
** new: DATA_TYPE_MIME
specifying the MIME-type of the object associated with the image acref, e.g., "image/fits", "text/html", and so forth.

VOX:STC_CoordRefFrame
** existing: ID_FRAME
representing the coordinate system reference frame, selected from "ICRS", "FK5", "FK4", "ECL", "GAL", and "SGAL".

VOX:STC_CoordEquinox
** existing: TIME_EQUINOX
representing the Equinox (not required for ICRS) of the coordinate system used for the image world coordinate system (WCS).

VOX:WCS_CoordProjection
** new: POS_TRANSF_WCS_CTYPE
with the array value being the three-character code ("TAN", "ARC", "SIN", etc.)

VOX:WCS_CoordRefPixel
** new: POS_TRANSF_WCS_CRPIX
with the array value specifying the image pixel coordinates of the WCS reference pixel. This is identical to "CRPIX" in FITS WCS.

VOX:WCS_CoordRefValue
** new: POS_TRANSF_WCS_CRVAL
with the array value specifying the world coordinates of the WCS reference pixel. This is identical to "CRVAL" in FITS WCS.

VOX:WCS_CDMatrix
** new: POS_TRANSF_WCS_CD
with the array (matrix) value specifying the WCS CD matrix.

VOX:BandPass_ID
** existing: INST_FILTER_CODE
identifying the bandpass by name (e.g., "V", "SDSS_U", "K", "K-Band", etc.).

VOX:BandPass_Unit
** existing: UNITS -- but should be GROUPED with Bandpass
identifying the units used to represent spectral values, selected from "meters", "hertz", and "keV".

VOX:BandPass_RefValue
** new: INST_FILTER_REF
specifying the characteristic (reference) frequency, wavelength, or energy for the bandpass model.

VOX:BandPass_HiLimit
** new: INST_FILTER_MAX
specifying the upper limit of the bandpass.

VOX:BandPass_LoLimit
** new: INST_FILTER_MIN
specifying the lower limit of the bandpass.

VOX:Image_PixFlags
** existing: CODE_MISC
specifying the type of processing done by the image service to produce an output image pixel.
The string value should be formed from some combination of the following character codes: C, F, X, Z, V

VOX:Image_AccessReference
** existing: DATA_LINK
specifying the URL to be used to access or retrieve the image.

VOX:Image_AccessRefTTL
** existing: TIME_DELAY
specifying the minimum time to live in seconds of the access reference.

VOX:Image_FileSize
** new: DATA_SIZE
representing the actual or estimated size of the encoded image in bytes (not pixels!).

From dtody at nrao.edu Tue Jun 17 21:17:36 2003
From: dtody at nrao.edu (Doug Tody)
Date: Tue, 17 Jun 2003 22:17:36 -0600 (MDT)
Subject: UCD for SIAP
In-Reply-To: <02f601c3350b$83283ab0$6b91d783@cacr.caltech.edu>
Message-ID:

Roy -

Certainly we can replace the VOX: namespace UCDs with global UCDs, so long as we are willing to make up new UCDs where needed. This might be a good short-term solution.

The key problem I see with trying to use existing UCDs is that historically UCDs have been used primarily as fuzzy tags to link similar fields in catalogs. In data access metadata such as is introduced in SIA we are using UCDs to identify the fields of a formal data model. Here the tag is not fuzzy at all, linking similar fields of unrelated catalogs, rather it is a link to a field of a formally defined data model. Precision is important for these data models - we are precisely defining attributes of the data model.

We should formally define data models such as spectralBandpass or WCS and define, as part of the data model, the UCD tag used to identify an attribute of the data model. When we represent a data model as a set of related columns in a table, or as an entity struct in XML (as in IDHA or HDX), we will use the UCDs to formally type the data model attributes so that programs can use them unambiguously, so that we can use XML Schemas for automated validation, and so forth. This one-to-one mapping of UCDs to formal data models is a concept that does not currently exist in UCDs.

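As a rough illustration of such a one-to-one mapping, here is a minimal Python sketch. It is not part of SIA or of any UCD standard: the class SpectralBandpass, its field names, and the helper bandpass_from_row are hypothetical names chosen for this example, and only the VOX:BandPass_* tags are taken from the listing above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SpectralBandpass:
    """Hypothetical data model: each attribute carries exactly one UCD tag."""
    band_id: str = field(metadata={"ucd": "VOX:BandPass_ID"})
    unit: str = field(metadata={"ucd": "VOX:BandPass_Unit"})
    ref_value: float = field(metadata={"ucd": "VOX:BandPass_RefValue"})
    hi_limit: float = field(metadata={"ucd": "VOX:BandPass_HiLimit"})
    lo_limit: float = field(metadata={"ucd": "VOX:BandPass_LoLimit"})


def bandpass_from_row(row_by_ucd: dict) -> SpectralBandpass:
    """Build a SpectralBandpass from one table row keyed by UCD tag.

    Raises KeyError if any attribute of the data model is missing,
    which is a very simple form of integrity check.
    """
    values = {}
    for f in fields(SpectralBandpass):
        ucd = f.metadata["ucd"]
        if ucd not in row_by_ucd:
            raise KeyError(f"missing data model attribute: {ucd}")
        values[f.name] = row_by_ucd[ucd]
    return SpectralBandpass(**values)
```

Given a row such as {"VOX:BandPass_ID": "V", "VOX:BandPass_Unit": "meters", ...}, the helper returns a typed record, and a row that lacks one of the five tags is rejected. Because each tag names exactly one attribute, no guessing among "similar" columns is needed.
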
If we try to take a more classical UCD-like approach and use UCDs to associate "similar" fields of different data models, then we no longer have a precisely defined data model. This association of "similar" fields of different data models should occur in the data model definitions, where data models may define attributes in terms of more fundamental data models or quantities.

Some more specific comments based on your proposed UCDs. I haven't tried to be complete, these are only examples.

> example is there a reason for VOX:Image_AccessReference as a new UCD, why
> can't we simply use DATA_LINK from the existing set?

This would work if we have only one DATA_LINK in an interface such as SIA, and we define that within this particular interface, DATA_LINK means the formally defined SIA Image_AccessReference. If for some reason we have two DATA_LINK attributes then we are in trouble, as the type is then overloaded and the meaning is ambiguous.

The problem with what you suggest is that we are inherently overloading the type. It might work for a while, but will cause a problem in the future if we apply the same logic to a similar attribute. Since the attribute is precisely defined we gain nothing by using a fuzzy tag.

> VOX:Image_Naxes
> ** new: POS_TRANSF_WCS_NAXES
> specifying the number of image axes.
>
> VOX:Image_Naxis
> ** new: POS_TRANSF_WCS_NAXIS
> NOTE: Can a UCD refer to an array like this?
> with the array value giving the length in pixels of each image axis.

This sort of mapping of the WCS data model onto "standard" UCDs (if newly defined) is certainly possible, so long as we define these tags as part of the WCS data model.

However, the geometry of an image (NAXES, NAXIS) is not really part of the WCS - these are image attributes (in FITS they existed long before WCS). A WCS is associated with an image. CDELT, FRAME, etc., are part of the WCS. Do we need an image geometry data model? A general image attributes data model?
NAXES, NAXIS are clearly (or shall we say, clearly should be) precisely defined terms of some formal image data model.

> VOX:BandPass_ID
> ** existing: INST_FILTER_CODE
> identifying the bandpass by name (e.g., "V", "SDSS_U", "K", "K-Band", etc.).

BandPass is a more general concept than "instrument" or "filter". Filters and instruments are examples of specific entities that have a bandpass. "bandpass" should be a formal data model with "bandpass"-specific attributes (UCDs).

> VOX:BandPass_Unit
> ** existing: UNITS -- but should be GROUPED with Bandpass
> identifying the units used to represent spectral values, selected from
> "meters", "hertz", and "keV".

What do you do once there are two data models both of which need to define their units? GROUPED above, whatever that is, may address this, but it is better to have an explicit, unambiguous attribute. The units have to be precisely defined. The data model could define a default if this parameter is absent.

> VOX:Image_PixFlags
> ** existing: CODE_MISC
> specifying the type of processing done by the image service to produce an
> output image pixel. The string value should be formed from some
> combination of the following character codes: C, F, X, Z, V

What do you do as soon as there are two fields with UCD=CODE_MISC in the metadata? In the current SIA, the implication is that Image_PixFlags is an attribute of some sort of "image" data model used in SIA, hence it has a precise definition.

Similar comments apply to other UCDs where we confuse UCD associations with data model terms.

I think it would help a lot if we just took one of these simple data models, e.g., spectralBandpass, and formally defined it, with UCDs assigned to identify attributes. Then we could use the same approach to define all the other SIA image attributes, grouped by data model.
Later perhaps we can show how to define data models in terms of other data models or ultimately Quantities, to fully define complex data objects via a hierarchy of formal definitions.

- Doug

From derriere at newb6.u-strasbg.fr Fri Jun 20 08:34:56 2003
From: derriere at newb6.u-strasbg.fr (Sebastien Derriere)
Date: Fri, 20 Jun 2003 17:34:56 +0200
Subject: UCD for SIAP
References:
Message-ID: <3EF329A0.6B013D17@astro.u-strasbg.fr>

Doug Tody wrote:
>
> The key problem I see with trying to use existing UCDs is that historically
> UCDs have been used primarily as fuzzy tags to link similar fields in
> catalogs. In data access metadata such as is introduced in SIA we are
> using UCDs to identify the fields of a formal data model. Here the tag
> is not fuzzy at all, linking similar fields of unrelated catalogs, rather
> it is a link to a field of a formally defined data model. Precision is
> important for these data models - we are precisely defining attributes
> of the data model.
>
> We should formally define data models such as spectralBandpass or WCS
> and define, as part of the data model, the UCD tag used to identify an
> attribute of the data model. When we represent a data model as a set of
> related columns in a table, or as an entity struct in XML (as in IDHA or
> HDX), we will use the UCDs to formally type the data model attributes so
> that programs can use them unambiguously, so that we can use XML Schemas
> for automated validation, and so forth.

Hello,

The primary goal of UCDs is to ensure interoperability between heterogeneous datasets. That's why they have been defined to some "reasonable" level of precision (what you call fuzziness).

Internal attributes of a formally defined data model can be defined at any level of precision, and have their own names. But you can have *in addition* a UCD attached to every attribute (see the case of the IDHA model). Those UCDs can ensure interoperability between different data models, and between data models and datasets.
The names of the attributes cannot a priori ensure this, because nothing prevents the same concept from being named differently in different models.

Sebastien.
--
   _______
  / ~    /,   Sebastien Derriere           mailto:derriere at astro.u-strasbg.fr
 / ~~~~ //    Observatoire de Strasbourg   Phone   +33 (0) 390 242 444
/______//     11, rue de l'universite      Telefax +33 (0) 390 242 417
(______(/     F-67000 Strasbourg France

From dtody at nrao.edu Fri Jun 20 11:55:29 2003
From: dtody at nrao.edu (Doug Tody)
Date: Fri, 20 Jun 2003 12:55:29 -0600 (MDT)
Subject: UCD for SIAP
In-Reply-To: <3EF329A0.6B013D17@astro.u-strasbg.fr>
Message-ID:

Sebastien -

Good - we agree I think. This is exactly the point I was trying to make about data models and UCDs. The attributes of formal data models need to be defined precisely and unambiguously.

The attributes of different data models need to be uniquely identified by some means, e.g., a globally unique name or reference (e.g., a form of UCD), a namespace (e.g., our temporary VOX namespace), or some hierarchical structure as in IDHA. The attributes of different data models, although they need to be distinguished from one another, may well share the same fundamental type, and a UCD could be used to express this.

Using different approaches to naming data model attributes and types (UCDs), as I think you are suggesting below, is one way to solve the problem. This provides both the precision required to identify DM attributes, and the means to associate elements of different data models for interoperability.

The only problem I see with this is that we would like flexibility in how we represent data models and metadata. Mapping DM attributes into the columns of a flat table, as in SIA or in a FITS header, is convenient and can simplify representations, up to a point. If datasets get complex enough then eventually one needs more structure and an approach such as IDHA or HDX may be called for. In many cases the simpler representation is adequate.
It would be good if the underlying mechanisms, such as UCDs and how we define data models, were flexible enough to permit a variety of such representations.

If we map the attributes of a DM into table columns and we do NOT use the UCD to identify the DM attribute, then we need another tag of some sort for this purpose. This would be no problem in XML, but we would have the nuisance of carrying along an additional tag separate from the UCD. In VOTable this would give us NAME, ID, UCD, plus a new tag for the formal DM attribute association (conceivably ID could be used for this purpose but it already has other uses). In a representation such as FITS (e.g., if we try to represent VO data in FITS), it is harder. In this case one might want to use the comment field of a FITS keyword to contain something like a UCD: keyword = value / UCD. I am not saying we necessarily want to do this, but it is an example of representation flexibility and it would be good if our scheme could extend to this level.

If we DO use the UCD to carry this additional meaning, then the global UCD namespace could include both formal DM attribute names, and the more fundamental types used to associate different data elements as at present. UCDs would then provide a global naming index, with a single name (the UCD) being sufficient to carry all this meaning. Given the UCDs and an understanding of the associated DM (stored separately) we would then be able to recognize that different metadata elements (table columns in this case) are associated, define and use an XML schema to verify the integrity of the DM subset in these columns, use semantic relationships for inference, and so forth.

In this case what we would do is use the UCD tag in a representation to convey the data model attribute name, uniquely identifying both the data model and the attribute of the data model.
The formal definition of the DM would then define each attribute of the DM, ** giving for each attribute the UCD type of the attribute **. If this UCD type is elemental then we would have the desired interoperability, and the means to associate and compare similar data elements. UCDs would thus provide the metadata "glue" to link related concepts such as fundamental quantities and data models, making possible a uniform representation for both.

To summarize, UCDs or something like them can play a key role in structuring and linking fundamental metadata and data models. The issue has already come up in interfaces like SIA and IDHA. Can we come up with something which is sufficiently powerful and general to provide both types of representations?

- Doug

On Fri, 20 Jun 2003, Sebastien Derriere wrote:

> Doug Tody wrote:
> >
> > The key problem I see with trying to use existing UCDs is that historically
> > UCDs have been used primarily as fuzzy tags to link similar fields in
> > catalogs. In data access metadata such as is introduced in SIA we are
> > using UCDs to identify the fields of a formal data model. Here the tag
> > is not fuzzy at all, linking similar fields of unrelated catalogs, rather
> > it is a link to a field of a formally defined data model. Precision is
> > important for these data models - we are precisely defining attributes
> > of the data model.
> >
> > We should formally define data models such as spectralBandpass or WCS
> > and define, as part of the data model, the UCD tag used to identify an
> > attribute of the data model. When we represent a data model as a set of
> > related columns in a table, or as an entity struct in XML (as in IDHA or
> > HDX), we will use the UCDs to formally type the data model attributes so
> > that programs can use them unambiguously, so that we can use XML Schemas
> > for automated validation, and so forth.
>
> Hello,
>
> The primary goal of UCDs is to ensure interoperability between
> heterogeneous datasets.
> That's why they have been defined to some
> "reasonable" level of precision (what you call fuzziness).
> Internal attributes of a formally defined data model can be defined
> at any level of precision, and have their own names. But you can
> have *in addition* a UCD attached to every attribute (see the case
> of the IDHA model). Those UCD can ensure interoperability between
> different data models, and between data models and datasets.
> The names of the attributes can not a priori ensure this task,
> because nothing prevents from having the same concept named
> differently in different models.
>
> Sebastien.

From dtody at nrao.edu Fri Jun 20 17:13:10 2003
From: dtody at nrao.edu (Doug Tody)
Date: Fri, 20 Jun 2003 18:13:10 -0600 (MDT)
Subject: UCD for SIAP
In-Reply-To: <02f601c3350b$83283ab0$6b91d783@cacr.caltech.edu>
Message-ID:

Roy -

I don't think we are actually very far apart on this. The main thing I am trying to do is determine how to deal with data models in DAL headers such as for SIA. An SIA contains two types of information all collected together in a flat VOTable:

- Attributes which are part of the SIA interface (e.g., AccessReference).

- Attributes which are part of formal data models which are mapped into the columns of the table, e.g., BandPass, WCS. The current SIA defines these as part of the interface but in principle what you see represents formally defined data models which are defined external to SIA.

The SIA interface attributes could use standard UCDs (e.g., DATA_LINK) since the SIA interface is a controlled namespace and we can precisely define what UCDs mean when present in an SIA VOTable.

The data model attributes are something different. In general we might want to use any set of data models to define the attributes of a dataset described by a DAL service. A data model like BandPass or WCS should stand on its own, and be reusable in various contexts, e.g., any DAL service, or any VOTable (or VO-in-FITS) representation of a dataset.

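To make the flat-table situation concrete, the following Python sketch (illustrative only) groups the UCD tags of a flat table by a name prefix and reports any tag that is used for more than one column. The prefix convention (the text between an optional "VOX:" namespace and the first underscore) and both function names are assumptions made for this example; SIA does not define them.

```python
from collections import Counter, defaultdict


def group_by_prefix(ucds):
    """Group UCD tags by the part of the name before the first underscore.

    Assumed convention for this sketch: "VOX:BandPass_ID" -> "BandPass",
    "DATA_LINK" -> "DATA".
    """
    groups = defaultdict(list)
    for ucd in ucds:
        name = ucd.split(":", 1)[-1]      # drop a namespace such as "VOX:"
        prefix = name.split("_", 1)[0]
        groups[prefix].append(ucd)
    return dict(groups)


def overloaded(ucds):
    """Return the tags that appear on more than one column (ambiguous)."""
    counts = Counter(ucds)
    return sorted(ucd for ucd, n in counts.items() if n > 1)
```

Applied to a column list that contains CODE_MISC twice, overloaded() returns that tag, which is the ambiguity a precisely named data model attribute is meant to avoid, while group_by_prefix() collects the BandPass and WCS columns into separate groups.
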
If we map the attributes of a data model into the columns of a VOTable (or the keywords of a FITS header) then the attributes need to be uniquely named so that they don't get confused with one another, so that we can verify data model integrity (no missing required attributes), and so forth. Some mechanism is needed to uniquely and unambiguously identify the attributes of a data model in such a context. It doesn't have to be the UCD, but this seems to be a natural extension of the UCD concept and usage.

Your suggested UCDs for BandPass are close to what is needed, since they are basically direct one-to-one mappings of the data model attributes. We just need to go one step further and define a formal relationship between such UCDs and data model attributes.

I went back and looked at your slides from the IVOA workshop again to see what I could learn. The summary slide goes like this:

    UCD is inherently fuzzy
    UCD is a description, not a unique name
    ...
    UCD will be eventually replaced by "pointers into data model"

It seems to me this is exactly what we have been talking about. Conventional UCDs are fuzzy tags used to associate similar data elements. If we use a UCD to tag an attribute of a data model this UCD is a **pointer into a data model**. If we then go and look up the definition of this data model, it may state that the pointed-to DM attribute in turn has a UCD defining its "type", allowing it to be associated with other similar data elements in the more conventional UCD way.

It is not clear if UCDs really need to be replaced by a new scheme providing pointers into data models - they come pretty close to this already. This is what SIA is trying to do; some such scheme is needed to represent data models in DAL protocols like SIA, even for these early versions.

Well that's it for me for a while as I am about to leave on travel. I do think we are close to resolving this, and maybe even taking a step forward towards integrating data models and metadata via UCDs.
We would like to do something more standard for these new "data model" UCDs in the upcoming DAL interfaces.

- Doug

On Tue, 17 Jun 2003, Roy Williams wrote:

> One of my actions as part of the UCD steering committee is to work with the
> new "VOX" UCDs that were created for the SIAP protocol.
>
> Below, I have listed all of the VOX UCDs that I could find in the SIAP
> definition, with a little bit of the description. I have tried to find an
> equivalent in the standard UCD set, and in many cases was successful. For
> example is there a reason for VOX:Image_AccessReference as a new UCD, why
> can't we simply use DATA_LINK from the existing set? This is why it says
> "**existing: DATA_LINK" for that entry.
>
> Of course, there will be a revision when we settle on the "base+modifier" scheme
> for UCD that we decided in Cambridge.
>
> Please can some of you look at the suggested equivalents from the existing
> set, and the suggested new UCDs in the listing below? We gain
> interoperability from reusing and following the existing tree -- but do we
> lose anything?
>
> Thank you
> Roy
>
> --------
> Caltech Center for Advanced Computing Research
> roy at cacr.caltech.edu
> 626 395 3670
>
> VOX:Image_Title
> ** existing: ID_IMAGE
> containing a short (usually one line) description of the image.
>
> VOX:Image_MJDateObs
> ** existing: TIME_DATE
> with datatype="double", representing the mean modified Julian date of the
> observation.
>
> VOX:Image_Naxes
> ** new: POS_TRANSF_WCS_NAXES
> specifying the number of image axes.
>
> VOX:Image_Naxis
> ** new: POS_TRANSF_WCS_NAXIS
> NOTE: Can a UCD refer to an array like this?
> with the array value giving the length in pixels of each image axis.
>
> VOX:Image_Scale
> ** new: POS_TRANSF_WCS_CDELT
> NOTE: Can a UCD refer to an array like this?
> with the array value giving the scale in degrees per pixel of each image
> axis.
>
> VOX:Image_Format
> ** new: DATA_TYPE_MIME
> specifying the MIME-type of the object associated with the image acref,
> e.g., "image/fits", "text/html", and so forth.
>
> VOX:STC_CoordRefFrame
> ** existing: ID_FRAME
> representing the coordinate system reference frame, selected from "ICRS",
> "FK5", "FK4", "ECL", "GAL", and "SGAL".
>
> VOX:STC_CoordEquinox
> ** existing: TIME_EQUINOX
> representing the Equinox (not required for ICRS) of the coordinate system
> used for the image world coordinate system (WCS).
>
> VOX:WCS_CoordProjection
> ** new: POS_TRANSF_WCS_CTYPE
> with the array value being the three-character code ("TAN", "ARC", "SIN",
> etc.)
>
> VOX:WCS_CoordRefPixel
> ** new: POS_TRANSF_WCS_CRPIX
> with the array value specifying the image pixel coordinates of the WCS
> reference pixel. This is identical to "CRPIX" in FITS WCS.
>
> VOX:WCS_CoordRefValue
> ** new: POS_TRANSF_WCS_CRVAL
> with the array value specifying the world coordinates of the WCS reference
> pixel. This is identical to "CRVAL" in FITS WCS.
>
> VOX:WCS_CDMatrix
> ** new: POS_TRANSF_WCS_CD
> with the array (matrix) value specifying the WCS CD matrix.
>
> VOX:BandPass_ID
> ** existing: INST_FILTER_CODE
> identifying the bandpass by name (e.g., "V", "SDSS_U", "K", "K-Band", etc.).
>
> VOX:BandPass_Unit
> ** existing: UNITS -- but should be GROUPED with Bandpass
> identifying the units used to represent spectral values, selected from
> "meters", "hertz", and "keV".
>
> VOX:BandPass_RefValue
> ** new: INST_FILTER_REF
> specifying the characteristic (reference) frequency, wavelength, or energy
> for the bandpass model.
>
> VOX:BandPass_HiLimit
> ** new: INST_FILTER_MAX
> specifying the upper limit of the bandpass.
>
> VOX:BandPass_LoLimit
> ** new: INST_FILTER_MIN
> specifying the lower limit of the bandpass.
>
> VOX:Image_PixFlags
> ** existing: CODE_MISC
> specifying the type of processing done by the image service to produce an
> output image pixel.
The string value should be formed from some > > combination of the following character codes: C, F, X, Z, V > > VOX:Image_AccessReference > ** existing: DATA_LINK > specifying the URL to be used to access or retrieve the image. > > VOX:Image_AccessRefTTL > ** existing: TIME_DELAY > specifying the minimum time to live in seconds of the access reference. > > VOX:Image_FileSize > ** new: DATA_SIZE > representing the actual or estimated size of the encoded image in bytes (not > pixels!). > > From roy at cacr.caltech.edu Tue Jun 17 13:03:25 2003 From: roy at cacr.caltech.edu (Roy Williams) Date: Tue, 17 Jun 2003 13:03:25 -0700 Subject: UCD for SIAP Message-ID: <02f601c3350b$83283ab0$6b91d783@cacr.caltech.edu> One of my actions as part of the UCD steering committee is to work with the "VOX" new UCDs that were created for the SIAP protocol. Below, I have listed all of the VOX UCDs that could find in the SIAP definition, with a little bit of the description. I have tried to find an equivalent in the standard UCD set, and in many cases was successful. For example is there a reason for VOX:Image_AccessReference as a new UCD, why can't we simply use DATA_LINK from the existing set? This is why it says "**existing: DATA_LINK" for that entry. Of course, there will a revision when we steel on the "base+modifier" scheme for UCD that we decided in Cambridge. Please can some of you look at the suggested equivalents fron the exisiting set, and the suggested new UCDs in the listing below? We gain interoperability from reusing and following the existing tree -- but do we lose anything? Thenk You Roy -------- Caltech Center for Advanced Computing Research roy at cacr.caltech.edu 626 395 3670 VOX:Image_Title ** existing: ID_IMAGE containing a short (usually one line) description of the image. VOX:Image_MJDateObs ** existing: TIME_DATE with datatype="double", representing the mean modified Julian date of the observation. 
VOX:Image_Naxes ** new: POS_TRANSF_WCS_NAXES specifying the number of image axes. VOX:Image_Naxis ** new: POS_TRANSF_WCS_NAXIS NOTE: Can a UCD refer to an array like this? with the array value giving the length in pixels of each image axis. VOX:Image_Scale ** new: POS_TRANSF_WCS_CDELT NOTE: Can a UCD refer to an array like this? with the array value giving the scale in degrees per pixel of each image axis. VOX:Image_Format ** new: DATA_TYPE_MIME specifying the MIME-type of the object associated with the image acref, e.g., image/fits", "text/html", and so forth. VOX:STC_CoordRefFrame ** existing: ID_FRAME representing the coordinate system reference frame, selected from "ICRS", "FK5", "FK4", "ECL", "GAL", and "SGAL". VOX:STC_CoordEquinox ** existing: TIME_EQUINOX representing the Equinox (not required for ICRS) of the coordinate system used for the image world coordinate system (WCS). VOX:WCS_CoordProjection ** new: POS_TRANSF_WCS_CTYPE with the array value being the three-character code ("TAN", "ARC", "SIN", etc.) VOX:WCS_CoordRefPixel ** new: POS_TRANSF_WCS_CRPIX with the array value specifying the image pixel coordinates of the WCS reference pixel. This is identical to "CRPIX" in FITS WCS. VOX:WCS_CoordRefValue ** new: POS_TRANSF_WCS_CRVAL with the array value specifying the world coordinates of the WCS reference pixel. This is identical to "CRVAL" in FITS WCS. VOX:WCS_CDMatrix ** new: POS_TRANSF_WCS_CD with the array (matrix) value specifying the WCS CD matrix. VOX:BandPass_ID ** existing: INST_FILTER_CODE identifying the bandpass by name (e.g., "V", "SDSS_U", "K", "K-Band", etc.). VOX:BandPass_Unit ** existing: UNITS -- but should be GROUPED with Bandpass identifying the units used to represent spectral values, selected from "meters", "hertz", and "keV". VOX:BandPass_RefValue ** new: INST_FILTER_REF specifying the characteristic (reference) frequency, wavelength, or energy for the bandpass model. 
VOX:BandPass_HiLimit ** new: INST_FILTER_MAX specifying the upper limit of the bandpass. VOX:BandPass_LoLimit ** new: INST_FILTER_MIN specifying the lower limit of the bandpass. VOX:Image_PixFlags ** existing: CODE_MISC specifying the type of processing done by the image service to produce an output image pixel. The string value should be formed from some combination of the following character codes: C, F, X, Z, V VOX:Image_AccessReference ** existing: DATA_LINK specifying the URL to be used to access or retrieve the image. VOX:Image_AccessRefTTL ** existing: TIME_DELAY specifying the minimum time to live in seconds of the access reference. VOX:Image_FileSize ** new: DATA_SIZE representing the actual or estimated size of the encoded image in bytes (not pixels!). From dtody at nrao.edu Tue Jun 17 21:17:36 2003 From: dtody at nrao.edu (Doug Tody) Date: Tue, 17 Jun 2003 22:17:36 -0600 (MDT) Subject: UCD for SIAP In-Reply-To: <02f601c3350b$83283ab0$6b91d783@cacr.caltech.edu> Message-ID: Roy - Certainly we can replace the VOX: namespace UCDs with global UCDs, so long as we are willing to make up new UCDs where needed. This might be a good short-term solution. The key problem I see with trying to use existing UCDs is that historically UCDs have been used primarily as fuzzy tags to link similar fields in catalogs. In data access metadata such as is introduced in SIA we are using UCDs to identify the fields of a formal data model. Here the tag is not fuzzy at all, linking similar fields of unrelated catalogs, rather it is a link to a field of a formally defined data model. Precision is important for these data models - we are precisely defining attributes of the data model. We should formally define data models such as spectralBandpass or WCS and define, as part of the data model, the UCD tag used to identify an attribute of the data model. 
When we represent a data model as a set of related columns in a table, or as an entity struct in XML (as in IDHA or HDX), we will use the UCDs to formally type the data model attributes so that programs can use them unambiguously, so that we can use XML Schemas for automated validation, and so forth. This one-to-one mapping of UCDs to formal data models is a concept that does not currently exist in UCDs. If we try to take a more classical UCD-like approach and use UCDs to associate "similar" fields of different data models, then we no longer have a precisely defined data model. This association of "similar" fields of different data models should occur in the data model definitions, where data models may define attributes in terms of more fundamental data models or quantities. Some more specific comments based on your proposed UCDs. I haven't tried to be complete, these are only examples. > example is there a reason for VOX:Image_AccessReference as a new UCD, why > can't we simply use DATA_LINK from the existing set? This would work if we have only one DATA_LINK in an interface such as SIA, and we define that within this particular interface, DATA_LINK means the formally defined SIA Image_AccessReference. If for some reason we have two DATA_LINK attributes then we are in trouble, as the type is then overloaded and the meaning is ambiguous. The problem with what you suggest is that we are inherently overloading the type. It might work for a while, but will cause a problem in the future if we apply the same logic to a similar attribute. Since the attribute is precisely defined we gain nothing by using a fuzzy tag. > VOX:Image_Naxes > ** new: POS_TRANSF_WCS_NAXES > specifying the number of image axes. > > VOX:Image_Naxis > ** new: POS_TRANSF_WCS_NAXIS > NOTE: Can a UCD refer to an array like this? > with the array value giving the length in pixels of each image axis. 

This sort of mapping of the WCS data model onto "standard" UCDs (if newly
defined) is certainly possible, so long as we define these tags as part
of the WCS data model.

However, the geometry of an image (NAXES, NAXIS) is not really part of
the WCS - these are image attributes (in FITS they existed long before
WCS). A WCS is associated with an image. CDELT, FRAME, etc., are part
of the WCS. Do we need an image geometry data model? A general image
attributes data model? NAXES, NAXIS are clearly (or shall we say, clearly
should be) precisely defined terms of some formal image data model.

> VOX:BandPass_ID
> ** existing: INST_FILTER_CODE
> identifying the bandpass by name (e.g., "V", "SDSS_U", "K", "K-Band", etc.).

BandPass is a more general concept than "instrument" or "filter". Filters
and instruments are examples of specific entities that have a bandpass.
"bandpass" should be a formal data model with "bandpass"-specific
attributes (UCDs).

> VOX:BandPass_Unit
> ** existing: UNITS -- but should be GROUPED with Bandpass
> identifying the units used to represent spectral values, selected from
> "meters", "hertz", and "keV".

What do you do once there are two data models both of which need to define
their units? GROUPED above, whatever that is, may address this, but it
is better to have an explicit, unambiguous attribute. The units have to
be precisely defined. The data model could define a default if this
parameter is absent.

> VOX:Image_PixFlags
> ** existing: CODE_MISC
> specifying the type of processing done by the image service to produce an
> output image pixel. The string value should be formed from some
> combination of the following character codes: C, F, X, Z, V

What do you do as soon as there are two fields with UCD=CODE_MISC in the
metadata? In the current SIA, the implication is that Image_PixFlags is
an attribute of some sort of "image" data model used in SIA, hence it has
a precise definition.
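
To make the overloading problem concrete, here is a minimal sketch of how a client might detect it. The helper name, the (name, UCD) pair representation, and the second CODE_MISC column are all hypothetical, chosen for illustration; none of this is part of SIA or of any existing UCD tool. It simply reports every UCD that is attached to more than one column of a table.

```python
from collections import defaultdict


def ambiguous_ucds(fields):
    """Return {ucd: [field names]} for every UCD used by more than one field.

    `fields` is a list of (name, ucd) pairs, a hypothetical stand-in for
    the column descriptions of a flat table such as an SIA response.
    """
    by_ucd = defaultdict(list)
    for name, ucd in fields:
        by_ucd[ucd].append(name)
    # A UCD shared by several columns cannot identify one of them uniquely.
    return {ucd: names for ucd, names in by_ucd.items() if len(names) > 1}


# Illustrative table: two unrelated columns both tagged CODE_MISC.
fields = [
    ("pixflags", "CODE_MISC"),
    ("quality", "CODE_MISC"),  # hypothetical second column
    ("acref", "DATA_LINK"),
]

print(ambiguous_ucds(fields))
```

With the sample columns above, only CODE_MISC is reported, since a program reading the table has no way to tell from the UCD alone which of the two columns holds the pixel flags.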

Similar comments apply to other UCDs where we confuse UCD associations
with data model terms.

I think it would help a lot if we just took one of these simple data
models, e.g., spectralBandpass, and formally defined it, with UCDs
assigned to identify attributes. Then we could use the same approach to
define all the other SIA image attributes, grouped by data model.
Later perhaps we can show how to define data models in terms of other
data models or ultimately Quantities, to fully define complex data objects
via a hierarchy of formal definitions.

	- Doug

From derriere at newb6.u-strasbg.fr  Fri Jun 20 08:34:56 2003
From: derriere at newb6.u-strasbg.fr (Sebastien Derriere)
Date: Fri, 20 Jun 2003 17:34:56 +0200
Subject: UCD for SIAP
References:
Message-ID: <3EF329A0.6B013D17@astro.u-strasbg.fr>

Doug Tody wrote:
>
> The key problem I see with trying to use existing UCDs is that historically
> UCDs have been used primarily as fuzzy tags to link similar fields in
> catalogs. In data access metadata such as is introduced in SIA we are
> using UCDs to identify the fields of a formal data model. Here the tag
> is not fuzzy at all, linking similar fields of unrelated catalogs, rather
> it is a link to a field of a formally defined data model. Precision is
> important for these data models - we are precisely defining attributes
> of the data model.
>
> We should formally define data models such as spectralBandpass or WCS
> and define, as part of the data model, the UCD tag used to identify an
> attribute of the data model. When we represent a data model as a set of
> related columns in a table, or as an entity struct in XML (as in IDHA or
> HDX), we will use the UCDs to formally type the data model attributes so
> that programs can use them unambiguously, so that we can use XML Schemas
> for automated validation, and so forth.

Hello,

The primary goal of UCDs is to ensure interoperability between
heterogeneous datasets.
That's why they have been defined to some
"reasonable" level of precision (what you call fuzziness).
Internal attributes of a formally defined data model can be defined
at any level of precision, and have their own names. But you can
have *in addition* a UCD attached to every attribute (see the case
of the IDHA model). Those UCDs can ensure interoperability between
different data models, and between data models and datasets.
The names of the attributes cannot a priori ensure this,
because nothing prevents the same concept from being named
differently in different models.

Sebastien.
-- 
      _______
     /  ~   /,  Sebastien Derriere         mailto:derriere at astro.u-strasbg.fr
    / ~~~~ //   Observatoire de Strasbourg Phone   +33 (0) 390 242 444
   /______//    11, rue de l'universite    Telefax +33 (0) 390 242 417
   (______(/    F-67000 Strasbourg France

From dtody at nrao.edu  Fri Jun 20 11:55:29 2003
From: dtody at nrao.edu (Doug Tody)
Date: Fri, 20 Jun 2003 12:55:29 -0600 (MDT)
Subject: UCD for SIAP
In-Reply-To: <3EF329A0.6B013D17@astro.u-strasbg.fr>
Message-ID:

Sebastien -

Good - we agree I think. This is exactly the point I was trying to
make about data models and UCDs. The attributes of formal data models
need to be defined precisely and unambiguously. The attributes of
different data models need to be uniquely identified by some means, e.g.,
a globally unique name or reference (e.g., a form of UCD), a namespace
(e.g., our temporary VOX namespace), or some hierarchical structure as
in IDHA. The attributes of different data models, although they need
to be distinguished from one another, may well share the same fundamental
type, and a UCD could be used to express this.

Using different approaches to naming data model attributes and types
(UCDs), as I think you are suggesting below, is one way to solve the
problem. This provides both the precision required to identify DM
attributes, and the means to associate elements of different data models
for interoperability.

The only problem I see with this is that we would like flexibility in
how we represent data models and metadata. Mapping DM attributes into
the columns of a flat table, as in SIA or in a FITS header, is convenient
and can simplify representations, up to a point. If datasets get complex
enough then eventually one needs more structure and an approach such as
IDHA or HDX may be called for. In many cases the simpler representation
is adequate. It would be good if the underlying mechanisms, such as UCDs
and how we define data models, were flexible enough to permit a variety
of such representations.

If we map the attributes of a DM into table columns and we do NOT use
the UCD to identify the DM attribute, then we need another tag of some
sort for this purpose. This would be no problem in XML, but we would
have the nuisance of carrying along an additional tag separate from the
UCD. In VOTable this would give us NAME, ID, UCD, plus a new tag for
the formal DM attribute association (conceivably ID could be used for this
purpose but it already has other uses). In a representation such as FITS
(e.g., if we try to represent VO data in FITS), it is harder. In
this case one might want to use the comment field of a FITS keyword to
contain something like a UCD: keyword = value / UCD. I am not saying
we necessarily want to do this, but it is an example of representation
flexibility and it would be good if our scheme could extend to this level.

If we DO use the UCD to carry this additional meaning, then the global
UCD namespace could include both formal DM attribute names, and the more
fundamental types used to associate different data elements as at present.
UCDs would then provide a global naming index, with a single name (the
UCD) being sufficient to carry all this meaning.
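
As an illustration of the "keyword = value / UCD" idea mentioned above for FITS headers, here is a rough sketch of writing and reading such a header record. Everything in it is an assumption made for illustration: the function names are hypothetical, only simple numeric values are handled, and treating the whole comment field as the UCD is a convention invented here, not something defined by FITS or by any VO document.

```python
def format_card(keyword, value, ucd):
    """Build an 80-character 'KEYWORD = value / UCD' header record.

    Keyword in columns 1-8, '= ' in columns 9-10, value right-justified
    in the next 20 columns, then the UCD in the comment field.
    """
    card = f"{keyword.upper():<8}= {value!s:>20} / {ucd}"
    if len(card) > 80:
        raise ValueError("card longer than 80 characters")
    return card.ljust(80)


def parse_ucd(card):
    """Return the comment field of a card as the UCD, or None if absent.

    Assumes the value itself contains no ' / ' (true for plain numbers).
    """
    _, sep, comment = card.partition(" / ")
    return comment.strip() if sep else None


# Hypothetical use: tag the number of image axes with its SIA UCD.
card = format_card("NAXIS", 2, "VOX:Image_Naxes")
print(repr(card))
print(parse_ucd(card))
```

The point of the sketch is only that a single tag per keyword is enough in this representation, provided that tag identifies the data model attribute uniquely; a real scheme would also have to cope with string values and with comments that already carry other text.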
Given the UCDs and an
understanding of the associated DM (stored separately) we would then
be able to recognize that different metadata elements (table columns in
this case) are associated, define and use an XML schema to verify the
integrity of the DM subset in these columns, use semantic relationships
for inference, and so forth.

In this case what we would do is use the UCD tag in a representation to
convey the data model attribute name, uniquely identifying both the data
model and the attribute of the data model. The formal definition of the
DM would then define each attribute of the DM, ** giving for each attribute
the UCD type of the attribute **. If this UCD type is elemental then we
would have the desired interoperability, and the means to associate and
compare similar data elements. UCDs would thus provide the metadata
"glue" to link related concepts such as fundamental quantities and data
models, making possible a uniform representation for both.

To summarize, UCDs or something like them can play a key role to structure
and link fundamental metadata and data models. The issue has already come
up in interfaces like SIA and IDHA. Can we come up with something which
is sufficiently powerful and general to provide both types of
representations?

	- Doug

On Fri, 20 Jun 2003, Sebastien Derriere wrote:

> Doug Tody wrote:
> >
> > The key problem I see with trying to use existing UCDs is that historically
> > UCDs have been used primarily as fuzzy tags to link similar fields in
> > catalogs. In data access metadata such as is introduced in SIA we are
> > using UCDs to identify the fields of a formal data model. Here the tag
> > is not fuzzy at all, linking similar fields of unrelated catalogs, rather
> > it is a link to a field of a formally defined data model. Precision is
> > important for these data models - we are precisely defining attributes
> > of the data model.
> >
> > We should formally define data models such as spectralBandpass or WCS
> > and define, as part of the data model, the UCD tag used to identify an
> > attribute of the data model. When we represent a data model as a set of
> > related columns in a table, or as an entity struct in XML (as in IDHA or
> > HDX), we will use the UCDs to formally type the data model attributes so
> > that programs can use them unambiguously, so that we can use XML Schemas
> > for automated validation, and so forth.
>
> Hello,
>
> The primary goal of UCDs is to ensure interoperability between
> heterogeneous datasets. That's why they have been defined to some
> "reasonable" level of precision (what you call fuzziness).
> Internal attributes of a formally defined data model can be defined
> at any level of precision, and have their own names. But you can
> have *in addition* a UCD attached to every attribute (see the case
> of the IDHA model). Those UCDs can ensure interoperability between
> different data models, and between data models and datasets.
> The names of the attributes cannot a priori ensure this,
> because nothing prevents the same concept from being named
> differently in different models.
>
> Sebastien.

From dtody at nrao.edu  Fri Jun 20 17:13:10 2003
From: dtody at nrao.edu (Doug Tody)
Date: Fri, 20 Jun 2003 18:13:10 -0600 (MDT)
Subject: UCD for SIAP
In-Reply-To: <02f601c3350b$83283ab0$6b91d783@cacr.caltech.edu>
Message-ID:

Roy -

I don't think we are actually very far apart on this. The main thing I am
trying to do is determine how to deal with data models in DAL headers such
as for SIA. An SIA contains two types of information all collected
together in a flat votable:

    - Attributes which are part of the SIA interface (e.g., AccessReference).

    - Attributes which are part of formal data models which are mapped
      into the columns of the table, e.g., BandPass, WCS.
      The current SIA defines these as part of the interface but in
      principle what you see represents formally defined data models
      which are defined external to SIA.

The SIA interface attributes could use standard UCDs (e.g., DATA_LINK)
since the SIA interface is a controlled namespace and we can precisely
define what UCDs mean when present in an SIA votable.

The data model attributes are something different. In general we might
want to use any set of data models to define the attributes of a dataset
described by a DAL service. A data model like BandPass or WCS should
stand on its own, and be reusable in various contexts, e.g., any DAL
service, or any VOTable (or VO-in-FITS) representation of a dataset.

If we map the attributes of a data model into the columns of a VOTable
(or the keywords of a FITS header) then the attributes need to be uniquely
named so that they don't get confused with one another, so that we can
verify data model integrity (no missing required attributes), and so forth.
Some mechanism is needed to uniquely and unambiguously identify the
attributes of a data model in such a context. It doesn't have to be the
UCD, but this seems to be a natural extension of the UCD concept and usage.

Your suggested UCDs for BandPass are close to what is needed, since they
are basically direct one-to-one mappings of the data model attributes.
We just need to go one step further and define a formal relationship
between such UCDs and data model attributes.

I went back and looked at your slides from the IVOA workshop again to
see what I could learn. The summary slide goes like this:

    UCD is inherently fuzzy
    UCD is a description, not a unique name
    ...
    UCD will be eventually replaced by "pointers into data model"

It seems to me this is exactly what we have been talking about.
Conventional UCDs are fuzzy tags used to associate similar data elements.
If we use a UCD to tag an attribute of a data model this UCD is a
**pointer into a data model**.
If we then go and look up the definition
of this data model, it may state that the pointed-to DM attribute in turn
has a UCD defining its "type", allowing it to be associated with other
similar data elements in the more conventional UCD way.

It is not clear if UCDs really need to be replaced by a new scheme
providing pointers into data models - they come pretty close to this
already. This is what SIA is trying to do; some such scheme is needed
to represent data models in DAL protocols like SIA, even for these early
versions.

Well that's it for me for a while as I am about to leave on travel.
I do think we are close to resolving this, and maybe even taking a step
forward towards integrating data models and metadata via UCDs. We would
like to do something more standard for these new "data model" UCDs in
the upcoming DAL interfaces.

	- Doug

On Tue, 17 Jun 2003, Roy Williams wrote:

> One of my actions as part of the UCD steering committee is to work with the
> "VOX" new UCDs that were created for the SIAP protocol.
>
> Below, I have listed all of the VOX UCDs that I could find in the SIAP
> definition, with a little bit of the description. I have tried to find an
> equivalent in the standard UCD set, and in many cases was successful. For
> example is there a reason for VOX:Image_AccessReference as a new UCD, why
> can't we simply use DATA_LINK from the existing set? This is why it says
> "**existing: DATA_LINK" for that entry.
>
> Of course, there will be a revision when we settle on the "base+modifier"
> scheme for UCD that we decided in Cambridge.
>
> Please can some of you look at the suggested equivalents from the existing
> set, and the suggested new UCDs in the listing below? We gain
> interoperability from reusing and following the existing tree -- but do we
> lose anything?