Resource and Service Metadata for the Virtual Observatory

Version 6

February 2003

 

Bob Hanisch

NVO Metadata Working Group

IVOA Interoperability Working Group

 

 

Introduction

 

An essential capability of the Virtual Observatory is a means for describing what data and computational facilities are available where, and once identified, how to use them.  The data themselves have associated metadata (e.g., FITS keywords), and similarly we require metadata about data collections and data services so that VO users can easily find information of interest.  Furthermore, such metadata are needed in order to manage distributed queries efficiently; if a user is interested in finding x-ray images there is no point in querying the HST archive, for example.  In this document we suggest an architecture for resource and service metadata and describe the relationship of this architecture to emerging Web Services standards.  We also define an initial set of metadata concepts.

 

 

Architecture

 

In order to make it easy for astronomy information services to participate in the VO, we propose a hierarchical system for metadata management.  At the top level we require a minimum amount of information, sufficient primarily to note the existence of a resource and to describe who is responsible for it.  At lower levels, the metadata are more extensive and complex, allowing for the description of query syntax, access protocols, and usage policies.

 

A service is any VO element that can be invoked by the user to perform some action on their behalf.  Associated with any service is descriptive metadata about the service.  Metadata generally include information the user needs to determine if a service is of interest and how the service may be invoked.  Specific types of metadata are described below.  Note that the service itself need not be aware of the metadata that describe it.


A query service supports a query/response protocol.  The user submits a query to the service that may define characteristics of interest, and the service returns a set of information to the user.  The query may be null, e.g., a current-time service may only support a null query, and some services may respond to a null query with appropriate default actions.  Non-query services may also exist, e.g., services to copy or delete files on remote file systems, to mail information to other users, to kill existing jobs, to authorize actions, etc.

A registry is a query service for which the response is a structured description of other services.  The services described by a registry may be of any type.  The registry may support a query that allows the user to indicate which services might be of interest.
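As an illustration of a registry acting as a query service, the following minimal Python sketch filters a list of service descriptions by spectral region, and treats a null query as a request for everything.  The record layout, the function name query_registry, and the second registry entry are hypothetical and serve only to show the idea; they are not part of any proposed interface.

```python
from typing import Any, Dict, List, Optional

# Illustrative registry contents only; the entries are not complete descriptions.
REGISTRY: List[Dict[str, Any]] = [
    {"Title": "HST archive", "Coverage.Spectral": ["Optical"]},
    {"Title": "Hypothetical X-ray archive", "Coverage.Spectral": ["X-ray"]},
]


def query_registry(records: List[Dict[str, Any]],
                   spectral: Optional[str] = None) -> List[Dict[str, Any]]:
    """Return the descriptions matching a spectral region.

    A null query (spectral=None) returns every description, which is one
    possible default action for a registry.
    """
    if spectral is None:
        return list(records)
    wanted = spectral.lower()
    return [r for r in records
            if wanted in (s.lower() for s in r["Coverage.Spectral"])]


xray_titles = [r["Title"] for r in query_registry(REGISTRY, "X-ray")]
all_titles = [r["Title"] for r in query_registry(REGISTRY)]
```

A client looking for x-ray images would then contact only the services named in xray_titles and would never send a query to the HST archive.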

A resource is a collection of one or more services, or other resources, that share some common metadata characteristics (e.g., Publisher, Creator, Contributor, Identifier, Contact, Type, Facility).  The extent of commonality depends upon the resource.  The services described by a resource may themselves be resources.  For example, MAST, HEASARC, IRSA, NED, et al., are resources.  Each of these contains other resources, e.g., the HST archive in MAST.  They also contain specific services, such as an HST observation log query service or a cone search service.  A resource must include at least a minimalist service, i.e., a URL for a web site.  One could in principle describe all of NASA astrophysics data holdings as a resource, or all of NVO as a resource, but aggregates of this scale circumvent the goal of being able to locate the specific resources and services of interest for a particular application.
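The nesting of resources and services described above can be sketched as a small recursive data structure.  This is only one possible representation; the class names, the reduced set of attributes, and the example.org URLs are assumptions made for the illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Service:
    """A VO element a user can invoke; reduced here to a name and a URL."""
    name: str
    url: str


@dataclass
class Resource:
    """A collection of services and/or other resources sharing common metadata."""
    title: str
    publisher: str
    members: List[Union["Resource", Service]] = field(default_factory=list)

    def services(self) -> List[Service]:
        """Return every service reachable from this resource, depth first."""
        found: List[Service] = []
        for member in self.members:
            if isinstance(member, Service):
                found.append(member)
            else:
                found.extend(member.services())
        return found


# Hypothetical URLs; the resource names follow the examples in the text.
hst = Resource("HST archive", "STScI/MAST",
               [Service("HST observation log query", "http://example.org/hst/log")])
mast = Resource("MAST", "STScI",
                [hst, Service("MAST web site", "http://example.org/mast")])

service_names = [s.name for s in mast.services()]
```

Here the "MAST web site" entry plays the role of the minimalist service that every resource must include.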

Both resources and services are described by metadata.  Resource metadata are high-level and independent of any specific service.  Resource metadata include

 

·    Curation metadata, which describe who supports the resource and what its purpose is

 

·    Content metadata, which describe what kind of information is available (types of data, sky coverage, spectral coverage, etc.)

 

Resource metadata are typically not queryable parameters in the underlying services, but rather they encompass information that now is simply “known” to users, or must be discovered through other means.  Astronomers know that the HST archive includes optical images and spectra, for example, or that Vizier provides access to catalogs and tables.  Resource metadata constitute a “yellow pages” of astronomical information.  Resource metadata are analogous to the UDDI (Universal Description, Discovery and Integration) Web Service, and are analogous to the high-level descriptions included in the CDS GLU.

 

Service metadata include metadata that describe the service's interface (its input and output) as well as information that aids in effective use of the service (e.g., range of possible values returned).  Service metadata also describe access methods or protocols.  Service metadata are analogous to WSDL (Web Service Description Language) and the query specifications component of the CDS GLU.

 

These analogies are not perfect.  For example, WSDL can describe multiple services in one file, and UDDI probably does not convey as much information as resource metadata.  Nevertheless, the intention is for the resource metadata to describe what is available, and for the service metadata to describe how to access it.

 

 

Resource Metadata Concepts*

 

Below we describe the concepts we believe are needed in the resource metadata.  These concepts may be instantiated in a variety of standard forms, e.g. XML, UCD tags, or FITS keywords, and with a variety of mechanisms, such as Topic Maps, OWL, or RDBMSs.  Consequently, the exact names and rendering of the values may depend on the particular form in which they are represented.  For example, when Coverage.Spatial is rendered as a FITS keyword record, the name will need to be limited to 8 characters and the value rendered in a pure ASCII form; in contrast, when rendered in XML, it might be better to tag the different components of the value separately.  It will be necessary to define standard renderings for each of these common forms.
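To make the difference between renderings concrete, the sketch below writes one spatial coverage value both as a simplified FITS keyword record and as XML with separately tagged components.  The keyword name COVSPAT and the XML element and attribute names are invented for the illustration, since no standard renderings have been defined yet, and the FITS card handling is deliberately simplified (a single 80-character record holding a quoted string value).

```python
import xml.etree.ElementTree as ET
from typing import List


def fits_card(keyword: str, value: str) -> str:
    """Render a string value as one simplified 80-character FITS card."""
    if not 1 <= len(keyword) <= 8:
        raise ValueError("FITS keyword names are limited to 8 characters")
    value.encode("ascii")  # raises UnicodeEncodeError if not pure ASCII
    escaped = value.replace("'", "''")  # quotes inside a string are doubled
    card = f"{keyword.upper():<8}= '{escaped}'"
    if len(card) > 80:
        raise ValueError("value does not fit in a single 80-character card")
    return card.ljust(80)


def xml_region(shape: str, frame: str, values: List[float]) -> str:
    """Render the same region with each component tagged separately."""
    root = ET.Element("CoverageSpatial")  # hypothetical element name
    region = ET.SubElement(root, "Region", shape=shape, frame=frame)
    for v in values:
        ET.SubElement(region, "Value").text = repr(v)
    return ET.tostring(root, encoding="unicode")


card = fits_card("COVSPAT", "circle (ICRS, 10.0, 20.0, 0.5)")
xml_text = xml_region("circle", "ICRS", [10.0, 20.0, 0.5])
```

The FITS form keeps the whole value as one ASCII string under a short name, while the XML form exposes the frame and each coordinate to a parser directly.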

 

 

Curation Metadata

 

Title (string)

Definition:  A name given to the resource.
Comment:  Typically, a Title will be a name by which the resource is formally known.
 

Publisher (string)

Definition:  An entity responsible for making the resource available.
Comment:   Examples of a Publisher include a person or an organization.  Users of the resource should include Publisher in subsequent credits and acknowledgments.
 

Creator (string)

Definition:  An entity primarily responsible for making the content of the resource.
Comment:   Examples of a Creator include a person or an organization.  Users of the resource should include Creator in subsequent credits and acknowledgments.
 
Creator.Logo (URL)
Definition:  A URL pointing to a graphical logo, which may be used to help identify the information resource.
Comment:  Not in Dublin Core.
 
Subject (string, list)
Definition:  A list of the topics, object types, or other descriptive keywords about the resource.
Comment:  Subject is intended to provide additional information about the nature of the information provided by the resource.  Is this a catalog of quasars?  Of planetary nebulae?  Is this a tool for computing ephemerides?  Terms for Subject should be based on the IAU Astronomy Thesaurus (http://msowww.anu.edu.au/library/thesaurus/).

 

Description (string, free text)

Definition:  An account of the content of the resource.
Comment:  Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.

 

Contributor (string)

Definition:  An entity responsible for making contributions to the content of the resource.

Comment:   Examples of a Contributor include a person or an organization.  Users of the resource should include Contributor in subsequent credits and acknowledgments.
 
Date (string)
Definition:  A date associated with an event in the life cycle of the resource.  Typically, Date will be associated with the creation or availability (i.e., most recent release or version) of the resource.  ISO 8601 is the preferred format (YYYY-MM-DD).
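A registration service could check the preferred date format along the following lines.  This is a minimal sketch; the function name parse_date is hypothetical, and rejecting every form other than YYYY-MM-DD is an assumption, since the text only states a preference.

```python
import re
from datetime import date

_DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_date(value: str) -> date:
    """Parse a Date value given strictly as YYYY-MM-DD."""
    match = _DATE_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() itself rejects impossible dates such as 2003-02-30.
    return date(year, month, day)


release = parse_date("2003-02-01")
```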
 
Version (string)
Definition:  A label associated with the creation or availability (i.e., most recent release or version) of the resource.
Comment:  Not in Dublin Core.

 

Identifier (URI)

Definition:  An unambiguous reference to the resource within a given context.
Comment:  The URI corresponding to the resource.
 
Resource URL (URL)
Definition:  A URL pointing to additional information about the resource.
Comment:  Not in Dublin Core.
 
Service URL  (URL)
Definition:  A URL pointing to additional information about the service or services, describing, e.g., the service type (HTTP, web service, GLU resource).
Comment:  Not in Dublin Core.
 
Contact (string, e-mail address)
Definition:  The e-mail address for contacting the persons responsible for the resource.
Comment:  Not part of the Dublin Core.  Contact is split into two concepts for clarity.
 
               Contact.Name (string)
               Definition:  The name of the contact.
               Comment:  A person’s name, “John P. Jones”, or a group, “Archive Support
                                Team”.
 
               Contact.Email (e-mail address)
               Definition:  The e-mail address of the contact.
               Comment:  For example, “mailto:John.P.Jones@navy.gov”, or 
                               “mailto:archive@datacenter.org”.
 
Content Metadata
 

Type (string, list)

Definition:  The nature or genre of the content of the resource.
Comment:   Type includes terms describing general categories, functions, genres, or aggregation levels for content.  VO Types include:
 
               Type                       Description
               Archive                   Collection of pointed observations
               Survey                    Collection of observations covering substantial and
                                              contiguous areas of the sky
               Catalog                   Collection of derived data, primarily in tabular form
               Bibliography            Collection of bibliographic references, abstracts, and
                                              publications
               Journal                    Collection of scholarly publications under common
                                              editorial policy
               Library                    Collection of published materials (journals, books, etc.)
               Outreach                 Collection of materials appropriate for public outreach, such
                                              as press releases and photo galleries
               Education                Collection of materials appropriate for educational use, such
                                              as teaching resources, curricula, etc.
               EPOResource         Collection of materials that may be suitable for EPO products
                                              but which are not in final product form, as in Type Outreach
                                              or Type Education.  EPOResource would apply, e.g., to archives 
                                              with easily accessed preview images or to surveys with 
                                              easy-to-use images.
               
This list is extensible, and indeed requires further elaboration in the area of computational resources and theoretical models.  Resources providing more than one type of content should list all relevant types.
 
Coverage (string)
Definition:  The extent or scope of the content of the resource.
Comment:  The Dublin Core notion of coverage is too generic to be of much use in the VO, where we need more specific information.  We propose to subset this element as follows:
 
Coverage.Spatial (string)
Definition:  The sky coverage of the resource. 
Comment:  The syntax for the spatial coverage specification is based on the SAO region specifications, though for resource descriptions we need only support a subset.  We also enhance the SAO region specifications with an explicit reference to the coordinate system reference frame.  All positions should be given in degrees.
 
   Region Name          Specification
   Box                         box (cframe, ξmin, ηmin, ξmax, ηmax)
   Circle                      circle (cframe, ξcen, ηcen, radius)
   Polygon                   polygon (cframe, ξ1, η1, ξ2, η2, ξ3, η3, …)
 
ξ and η represent coordinates in the appropriate frame (α, δ; l, b; …).  Compound regions may be constructed with or’s.  The coordinate system reference frame is specified as follows (see http://aladin.u-strasbg.fr/java/doctech/cds.astro.astroframe.html for additional details):
 
   cframe                    Description
   ICRS                      International Celestial Reference System
   FK5                        Equatorial coordinates, FK5 system (J2000)
   FK4                        Equatorial coordinates, FK4 system (B1950)
   ECL                        Ecliptic coordinates (J2000)
   GAL                       Galactic coordinates (J2000)
   SGAL                     Supergalactic coordinates (J2000)
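The region syntax above is simple enough to parse with a few lines of code.  The following Python sketch is one possible reading of it, under several assumptions: names are treated as case-insensitive, arguments may be separated by commas or by whitespace (the worked example later in this document omits the commas), compound regions are joined by the word "or", and a polygon needs at least three vertices.  The names Region and parse_coverage are hypothetical.

```python
import re
from typing import List, NamedTuple

FRAMES = {"ICRS", "FK5", "FK4", "ECL", "GAL", "SGAL"}
FIXED_ARGS = {"box": 4, "circle": 3}  # exact number of numeric arguments

_REGION_RE = re.compile(r"\s*(\w+)\s*\(([^()]*)\)\s*")


class Region(NamedTuple):
    shape: str
    frame: str
    values: List[float]  # all positions and radii in degrees


def parse_coverage(text: str) -> List[Region]:
    """Parse a Coverage.Spatial string into a list of regions."""
    regions = []
    for part in re.split(r"\bor\b", text, flags=re.IGNORECASE):
        match = _REGION_RE.fullmatch(part)
        if match is None:
            raise ValueError(f"not a region specification: {part!r}")
        shape = match.group(1).lower()
        tokens = [t for t in re.split(r"[,\s]+", match.group(2).strip()) if t]
        if not tokens or tokens[0].upper() not in FRAMES:
            raise ValueError(f"missing or unknown reference frame in {part!r}")
        numbers = [float(t) for t in tokens[1:]]
        if shape in FIXED_ARGS:
            if len(numbers) != FIXED_ARGS[shape]:
                raise ValueError(f"wrong number of arguments for {shape}")
        elif shape == "polygon":
            if len(numbers) < 6 or len(numbers) % 2:
                raise ValueError("polygon needs at least three coordinate pairs")
        else:
            raise ValueError(f"unknown region name: {shape!r}")
        regions.append(Region(shape, tokens[0].upper(), numbers))
    return regions


regions = parse_coverage(
    "BOX (FK5 145.17 -1.25 235.9 1.25) OR circle (ICRS, 10, 20, 0.5)")
```

The parser checks only the syntax; it does not test whether the coordinates are sensible for the chosen frame.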
 
Coverage.Spectral (string, list)
Definition:  The spectral coverage of the resource.
Comment:  Spectral coverage at the resource level will be in terms of general spectral regions (gamma-ray, x-ray, extreme UV, UV, optical, infrared, radio).  The general spectral regions are defined specifically as follows:
 
   Coverage.Spectral   Represents
   Radio                      λ ≥ 100 μm
                                  ν ≤ 3000 GHz
   Infrared                   1 μm ≤ λ ≤ 100 μm
   Optical                    0.3 μm ≤ λ ≤ 1 μm
                                  300 nm ≤ λ ≤ 1000 nm
                                  3000 Å ≤ λ ≤ 10000 Å
   UV                         0.1 μm ≤ λ ≤ 0.3 μm
                                  1000 Å ≤ λ ≤ 3000 Å
   EUV                       100 Å ≤ λ ≤ 1000 Å
                                  12 eV ≤ E ≤ 120 eV
   X-ray                      0.1 Å ≤ λ ≤ 100 Å
                                  0.12 keV ≤ E ≤ 120 keV
   Gamma-ray             E ≥ 120 keV
 
Resources containing data in multiple spectral regions may give a list (e.g., “Radio, Infrared”).  
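The table of spectral regions translates directly into a lookup by wavelength.  The sketch below works in Ångströms and uses the boundaries listed above.  Because adjacent regions share their boundary values in the table, the code has to pick one; assigning a boundary wavelength to the longer-wavelength region is an assumption made here, as is the function name spectral_region.  The gamma-ray limit is taken as 0.1 Å, the wavelength the table pairs with 120 keV.

```python
from bisect import bisect_right

# Region boundaries in Angstroms, in increasing order (1 micron = 10000 A).
_EDGES_ANGSTROM = [0.1, 100.0, 1000.0, 3000.0, 10000.0, 1.0e6]
_REGIONS = ["Gamma-ray", "X-ray", "EUV", "UV", "Optical", "Infrared", "Radio"]


def spectral_region(wavelength_angstrom: float) -> str:
    """Return the general spectral region containing a wavelength."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    # bisect_right counts the boundaries at or below the wavelength, so a
    # value exactly on a boundary falls in the longer-wavelength region.
    return _REGIONS[bisect_right(_EDGES_ANGSTROM, wavelength_angstrom)]


h_alpha = spectral_region(6563.0)
```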
 
Coverage.Spectral.Bandpass  (string, list)
Definition:  A specific bandpass specification.
Comment:  Some resources and services may choose to give spectral coverage in more specific terms than the general spectral regions.  The list of possible bandpass names is too lengthy to enumerate here, but would include optical bandpasses (U, V, B, R, I), narrow line filters (H-alpha, [OIII]), or other specific bandpass names.   [Should it be required for any named bandpass to have an associated URL with the actual transmission curve (available as a VOTable)?]
 
Coverage.Temporal (string, list)
Definition:  The temporal coverage of the resource.
Comment:  Temporal coverage specifications will be given in years, with decimal years permitted.  Ranges are specified with a hyphen, e.g., “1987-1993” or “1998.275- ”.  Disjoint time spans may be given as a list, e.g., “1981-1984, 1987-1990”.
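A temporal coverage string of this form can be read as a list of (start, end) pairs, with an open-ended range represented by a missing end.  The sketch below is one way to do so; the function name parse_temporal is hypothetical, and rejecting an entry that has no hyphen is an assumption, since the text does not say how a single year should be written.

```python
from typing import List, Optional, Tuple


def parse_temporal(text: str) -> List[Tuple[float, Optional[float]]]:
    """Parse Coverage.Temporal into (start, end) pairs in decimal years.

    An open-ended range such as "1998.275- " yields an end of None.
    """
    spans = []
    for item in text.split(","):
        start_text, hyphen, end_text = item.strip().partition("-")
        if not hyphen or not start_text.strip():
            raise ValueError(f"expected 'start-end' or 'start-', got {item!r}")
        start = float(start_text)
        end = float(end_text) if end_text.strip() else None
        if end is not None and end < start:
            raise ValueError(f"range ends before it starts: {item!r}")
        spans.append((start, end))
    return spans


disjoint = parse_temporal("1981-1984, 1987-1990")
ongoing = parse_temporal("1998.275- ")
```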
 
ContentLevel (string, list)
Definition:  A description of the content level, or intended audience.
Comment:  VO resources will be available to professional astronomers, amateur astronomers, educators, and the general public.  These different audiences need a way to find material appropriate for their needs.
 
               ContentLevel                          Definition
               General                                  Resource provides information appropriate for all users
               Elementary Education              Resource provides information appropriate for grades
                                                             K-5 education
               Middle School Education          Resource provides information appropriate for grades
                                                             6-8 education
               Secondary Education               Resource provides information appropriate for grades
                                                             9-12 education
               University                               Resource provides information appropriate for 
                                                             university-level education
               Research                                Resource provides information appropriate for 
                                                             professional-level research and graduate school 
                                                             education
               Amateur                                 Resource provides information of interest to amateur
                                                             astronomers
 
Facility (string)
Definition:  The observatory or facility where the data were obtained.
Comment:  Not in Dublin Core.  Some resources are likely to hold data from multiple observatories.  If just a few, this could be a list; if very many, just say “many”.  Theoretical data will not originate with an observatory, but rather might be characterized by the computational facility used to create them (NCSA, SDSC, etc.).  
 
Instrument (string)
Definition:  The instrument used to collect the data.
Comment:  Not in Dublin Core.  Can be a specific instrument name (Wide Field/Planetary Camera 2) or generic instrument type (CCD camera).  Theoretical data are produced by a computer code, and the name of the code could be specified.
 
Format (string)
Definition:  The encoding format of data provided by the resource.
Comment:  Typical values would be “FITS”, “ASCII text”, “HTML”, “XML”, “VOTable”, “GIF”, etc.  The Dublin Core notion of Format is different, but very flexible.  We recommend employing MIME types here in order to utilize existing standards.
 
Rights (string)
Definition:  Information about rights held in and over the resource.
Comment:  Dublin Core uses Rights to describe copyright and other intellectual property rights issues.  In the VO context Rights would describe access privileges, using the following values: public, proprietary, mixed.
 
 
Resource metadata will be collected through resource registration services, i.e., web forms that present a resource curator with the requisite fields and enumerated lists, and construct a resource descriptor in a standard format (such as VOTable).  If content elements are not relevant for a given resource Type (e.g., Type “journal” does not have meaningful spatial coverage, Facility, or Instrument, though it does have a valid temporal coverage), then they should take on a “not applicable” value.  The resource registration service should not allow fields to be left unspecified.
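The rule that no field may be left unspecified could be enforced by a registration service roughly as follows.  This sketch assumes that every element defined in this document is required and that the literal string "not applicable" is the agreed placeholder; the field list, the function name, and the sample record are illustrative only.

```python
from typing import Dict, List

FIELDS = [
    "Title", "Publisher", "Creator", "Subject", "Description", "Contributor",
    "Date", "Version", "Identifier", "Resource URL", "Service URL",
    "Contact.Name", "Contact.Email", "Type", "Coverage.Spatial",
    "Coverage.Spectral", "Coverage.Temporal", "ContentLevel", "Facility",
    "Instrument", "Format", "Rights",
]
NOT_APPLICABLE = "not applicable"


def unspecified_fields(record: Dict[str, str]) -> List[str]:
    """Return the names of fields that are missing or blank.

    A field set to NOT_APPLICABLE counts as specified.
    """
    return [name for name in FIELDS if not str(record.get(name, "")).strip()]


# A hypothetical journal: coverage, facility, and instrument do not apply.
journal = {name: "some value" for name in FIELDS}
journal.update({"Type": "Journal",
                "Coverage.Spatial": NOT_APPLICABLE,
                "Instrument": NOT_APPLICABLE})
del journal["Facility"]  # left out by mistake

missing = unspecified_fields(journal)
```

A registration form would refuse the submission until the curator supplies Facility, either with a real value or with the "not applicable" placeholder.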
 
Example:  The Sloan Digital Sky Survey data as hosted by MAST at STScI.
 
Title                           Sloan Digital Sky Survey
Publisher                   Space Telescope Science Institute/MAST
Creator                      Sloan Digital Sky Survey Consortium
Creator.Logo             http://archive.stsci.edu/images/sdss_logo.gif
Subject                      galaxies, quasars, stars, CCD photometry, spectroscopy, redshift, 
                                  sky surveys
Description                The Sloan Digital Sky Survey is using a dedicated 2.5 m telescope and a large format CCD camera to obtain images of over 10,000 square degrees of high Galactic latitude sky in five broad bands (u', g', r', i' and z', centered at 3540, 4770, 6230, 7630, and 9130 Å, respectively). Medium resolution spectra will be obtained for approximately 10^6 galaxies and 100,000 quasars. The early data release (EDR), in June 2001, includes searchable catalogs of images and spectra, images for display and scientific purposes in both 2-D FITS and JPEG formats, and spectra in both 1-D FITS and GIF formats. The EDR covers about 460 square degrees of sky. The next data releases will occur every 18 months or so.
Contributor                 Sloan Digital Sky Survey Consortium
Date                          2003-02-01
Version                      SDSS EDR
Identifier                   http://archive.stsci.edu/sdss/
Resource URL           http://archive.stsci.edu/sdss/index.html
Service URL              http://skyserver.pha.jhu.edu/en/tools/getimg/fields.asp
Contact.Name            Archive Branch, Space Telescope Science Institute
Contact.Email            mailto:archive@stsci.edu
Type                          Survey, Catalog, EPOResource
Coverage.Spatial       BOX (FK5 145.17 -1.25 235.9 1.25) OR BOX (FK5
                                  250.71 52.15 267 66.29) OR BOX (FK5 350.43 -1.25
                                  416.37 1.17)
Coverage.Spectral     Optical
Coverage.Spectral.Bandpass   u’, g’, r’, i’, z’
Coverage.Temporal   1999.92-
ContentLevel             Research
Facility                      Apache Point Observatory, Sloan 2.5-m Telescope
Instrument                 Five-band clocked CCD camera
Format                       FITS, GIF, JPEG (image/fits, image/gif, image/jpeg)
Rights                        Public
 
 
 
Changes from V5
 
·    Added Date element, following Dublin Core and in response to October 2002 discussion.
·    Added Version element, in response to October 2002 discussion.  Not in Dublin Core.
·    Added Creator.Logo element, in response to October 2002 discussion.  Not in Dublin Core.
·    Added Subject element, to replace Coverage.Topics, as the former is Dublin Core, the latter is not, and the intent is basically the same.
·    Added ContentLevel element, to replace Coverage.Level.  Upon reflection I thought that ContentLevel was both a better label and was of a different nature than the other Coverage elements.  M. Voit will provide further inputs on appropriate element values.
·    Updated example to use new metadata elements.


* Resource metadata concepts are drawn from the Dublin Core, http://dublincore.org/documents/dces/, except where otherwise noted.