Resource and Service Metadata for the Virtual Observatory
NVO Metadata Working Group
An essential capability of the Virtual Observatory is a means for describing what data and computational facilities are available where, and once identified, how to use them. The data themselves have associated metadata (e.g., FITS keywords), and similarly we require metadata about data collections and data services so that VO users can easily find information of interest. Furthermore, such metadata are needed in order to manage distributed queries efficiently; if a user is interested in finding x-ray images there is no point in querying the HST archive, for example. In this document we suggest an architecture for resource and service metadata and describe the relationship of this architecture to emerging Web Services standards. We also define an initial set of metadata concepts.
In order to make it easy for astronomy information services to participate in the VO, we propose a hierarchical system for metadata management. At the top level we require a minimum amount of information, sufficient primarily to note the existence of a resource and to describe who is responsible for it. At lower levels, the metadata are more extensive and complex, allowing for the description of query syntax, access protocols, and usage policies.
A service is any VO element that can be invoked by the user to perform some action on their behalf. Associated with any service is descriptive metadata about the service. Metadata generally include information the user needs to determine if a service is of interest and how the service may be invoked. Specific types of metadata are described below. Note that the service itself need not be aware of the metadata that describe it.
A query service supports a query/response protocol. The user submits a query to the service that may define characteristics of interest, and the service returns a set of information to the user. The query may be null; e.g., a current-time service may only support a null query, and some services may respond to a null query with appropriate default actions. Non-query services may also exist, e.g., services to copy or delete files on remote file systems, to mail information to other users, to kill existing jobs, to authorize actions, etc.
A registry is a query service for which the response is a structured description of other services. The services described by a registry may be of any type. The registry may support a query that allows the user to indicate which services might be of interest.
A resource is a collection of one or more services, or other resources, that share some common metadata characteristics (e.g., Publisher, Creator, Contributor, Identifier, Contact, Type, Facility). The extent of commonality depends upon the resource. The services described by a resource may themselves be resources. For example, MAST, HEASARC, IRSA, NED, et al., are resources. Each of these contains other resources, e.g., the HST archive in MAST. They also contain specific services, such as an HST observation log query service or a cone search service. A resource must include at least a minimalist service, i.e., a URL for a web site. One could in principle describe all of NASA astrophysics data holdings as a resource, or all of NVO as a resource, but aggregates of this scale circumvent the goal of being able to locate the specific resources and services of interest for a particular application.
Both resources and services are described by metadata. Resource metadata are high-level and independent of any specific service. Resource metadata include
· Curation metadata, which describe who supports the resource and what its purpose is
· Content metadata, which describe what kind of information is available (types of data, sky coverage, spectral coverage, etc.)
Resource metadata are typically not queryable parameters in the underlying services, but rather they encompass information that now is simply “known” to users, or must be discovered through other means. Astronomers know that the HST archive includes optical images and spectra, for example, or that Vizier provides access to catalogs and tables. Resource metadata constitute a “yellow pages” of astronomical information. Resource metadata are analogous to the UDDI (Universal Description, Discovery and Integration) Web Service, and are analogous to the high-level descriptions included in the CDS GLU.
Service metadata include metadata that describe the service's interface (its input and output) as well as information that aids in effective use of the service (e.g., range of possible values returned). Service metadata also describe access methods or protocols. Service metadata are analogous to WSDL (Web Service Description Language) and the query specifications component of the CDS GLU.
These analogies are not perfect. For example, WSDL can describe multiple services in one file, and UDDI probably does not convey as much information as resource metadata. Nevertheless, the intention is for the resource metadata to describe what is available, and for the service metadata to describe how to access it.
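The split between resource metadata (what is available) and service metadata (how to access it) can be illustrated with a minimal Python sketch. The class names, field names, and example.org URLs are hypothetical and chosen only for illustration; they are not part of this proposal. The lookup shows how a registry could avoid sending an x-ray query to an optical archive.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    """Service metadata: how to reach one invocable service (hypothetical fields)."""
    name: str
    access_url: str


@dataclass
class Resource:
    """Resource metadata: what is available, plus nested services and resources."""
    title: str
    publisher: str
    spectral: List[str]
    services: List[Service] = field(default_factory=list)
    children: List["Resource"] = field(default_factory=list)


def find_services(resources: List[Resource], spectral_region: str) -> List[Service]:
    """Collect services of every resource whose spectral coverage matches."""
    matches: List[Service] = []
    for resource in resources:
        if spectral_region in resource.spectral:
            matches.extend(resource.services)
        # Resources may contain other resources, so descend recursively.
        matches.extend(find_services(resource.children, spectral_region))
    return matches


# Hypothetical registry content; the URLs are placeholders.
registry = [
    Resource(
        title="Hypothetical data center",
        publisher="Example Institute",
        spectral=["Optical", "X-ray"],
        services=[Service("web site", "http://example.org/")],
        children=[
            Resource(
                title="Optical archive",
                publisher="Example Institute",
                spectral=["Optical"],
                services=[Service("observation log query", "http://example.org/optical/log")],
            ),
            Resource(
                title="X-ray archive",
                publisher="Example Institute",
                spectral=["X-ray"],
                services=[Service("cone search", "http://example.org/xray/cone")],
            ),
        ],
    )
]

xray_services = find_services(registry, "X-ray")
```

Here the optical archive's observation log service is never returned for an x-ray request, because its resource-level metadata already rule it out.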
Below we describe the concepts we believe are needed in the resource metadata. These concepts may be instantiated in a variety of standard forms, e.g. XML, UCD tags, or FITS keywords, and with a variety of mechanisms, such as Topic Maps, OWL, or RDBMSs. Consequently, the exact names and rendering of the values may depend on the particular form in which they are represented. For example, when Coverage.Spatial is rendered as a FITS keyword record, the name will need to be limited to 8 characters and the value rendered in a pure ASCII form; in contrast, when rendered in XML, it might be better to tag the different components of the value separately. It will be necessary to define standard renderings for each of these common forms.
Definition: A name given to the resource.
Comment: Typically, a Title will be a name by which the resource is formally known.
Definition: An entity responsible for making the resource available
Comment: Examples of a Publisher include a person or an organization. Users of the resource should include Publisher in subsequent credits and acknowledgments.
Definition: An entity primarily responsible for making the content of the resource.
Comment: Examples of a Creator include a person or an organization. Users of the resource should include Creator in subsequent credits and acknowledgments.
Definition: An account of the content of the resource.
Comment: Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.
Comment: Examples of a Contributor include a person or an organization. Users of the resource should include Contributor in subsequent credits and acknowledgments.
Definition: An unambiguous reference to the resource within a given context.
Comment: The URI corresponding to the resource.
Resource URL (URL)
Definition: A URL pointing to additional information about the resource.
Comment: Not in Dublin Core.
Service URL (URL)
Definition: A URL pointing to additional information about the service or services, describing, e.g., the service type (HTTP, web service, GLU resource).
Comment: Not in Dublin Core.
Contact (string, e-mail address)
Definition: The e-mail address for contacting the persons responsible for the resource.
Comment: Not part of the Dublin Core. Contact is split into two concepts for clarity.
Definition: The name of the contact.
Comment: A person’s name, “John P. Jones”, or a group, “Archive Support Team”.
Contact.Email (e-mail address)
Definition: The e-mail address of the contact.
Comment: For example, “mailto:John.P.Jones@navy.gov”, or “mailto:email@example.com”.
Definition: The nature or genre of the content of the resource.
Comment: Type includes terms describing general categories, functions, genres, or aggregation levels for content. VO Types include:
Archive Collection of pointed observations
Survey Collection of observations covering substantial and contiguous areas of the sky
Catalog Collection of derived data, primarily in tabular form
Bibliography Collection of bibliographic references, abstracts, and publications
Journal Collection of scholarly publications under common editorial policy
Library Collection of published materials (journals, books, etc.)
Outreach Collection of materials appropriate for public outreach, such as press
releases and photo galleries
Education Collection of materials appropriate for educational use, such as teaching
resources, curricula, etc.
EPOResource Collection of materials that may be suitable for EPO products but which
are not in final product form, as in Type Outreach or Type Education.
EPOResource would apply, e.g., to archives with easily accessed
preview images or to surveys with easy-to-use images.
This list is extensible, and indeed requires further elaboration in the area of computational resources and theoretical models. Resources providing more than one type of content should list all relevant types.
Definition: The extent or scope of the content of the resource.
Comment: The Dublin Core notion of coverage is too generic to be of much use in the VO, where we need more specific information. We propose to subset this element as follows:
Definition: The sky coverage of the resource.
Comment: The syntax for the spatial coverage specification is based on the SAO region specifications, though for resource descriptions we need only support a subset. We also enhance the SAO region specifications with an explicit reference to the coordinate system reference frame. All positions should be given in degrees.
Region Name Specification
Box box (cframe, ξmin, ηmin, ξmax, ηmax)
Circle circle (cframe, ξcen, ηcen, radius)
Polygon polygon (cframe, ξ1, η1, ξ2, η2, ξ3, η3, …)
ξ and η represent coordinates in the appropriate frame (α, δ; l, b; …). Compound regions may be constructed with or’s. The coordinate system reference frame is specified as follows (see http://aladin.u-strasbg.fr/java/doctech/cds.astro.astroframe.html for additional details):
ICRS International Celestial Reference System
FK5 Equatorial coordinates, FK5 system (J2000)
FK4 Equatorial coordinates, FK4 system (B1950)
ECL Ecliptic coordinates (J2000)
GAL Galactic coordinates (J2000)
SGAL Supergalactic coordinates (J2000)
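A minimal sketch of how such region strings could be parsed and checked is given below. It assumes that arguments may be separated by commas or whitespace, that compound regions are joined by the word "or" in either case, and that the frame must be one of the six codes listed above. The function and class names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Frame codes listed in the text above.
FRAMES = {"ICRS", "FK5", "FK4", "ECL", "GAL", "SGAL"}
# Number of numeric arguments required by the fixed-size shapes.
FIXED_ARGS = {"box": 4, "circle": 3}

_REGION_RE = re.compile(r"^\s*(\w+)\s*\(([^()]*)\)\s*$")


@dataclass
class Region:
    shape: str            # "box", "circle", or "polygon"
    frame: str            # coordinate system reference frame code
    values: List[float]   # positions (and radius), in degrees


def parse_region(text: str) -> Region:
    """Parse one region such as 'circle (ICRS, 10.0, -5.0, 0.5)'."""
    match = _REGION_RE.match(text)
    if not match:
        raise ValueError(f"not a region specification: {text!r}")
    shape = match.group(1).lower()
    parts = [p for p in re.split(r"[,\s]+", match.group(2).strip()) if p]
    if not parts:
        raise ValueError(f"missing frame in {text!r}")
    frame = parts[0].upper()
    if frame not in FRAMES:
        raise ValueError(f"unknown reference frame: {parts[0]!r}")
    values = [float(p) for p in parts[1:]]
    if shape == "polygon":
        # At least three vertices, each given as a coordinate pair.
        valid = len(values) >= 6 and len(values) % 2 == 0
    elif shape in FIXED_ARGS:
        valid = len(values) == FIXED_ARGS[shape]
    else:
        raise ValueError(f"unknown region shape: {shape!r}")
    if not valid:
        raise ValueError(f"wrong number of arguments for {shape}: {len(values)}")
    return Region(shape, frame, values)


def parse_coverage(text: str) -> List[Region]:
    """Parse a compound specification whose regions are joined by 'or'."""
    pieces = re.split(r"\s+or\s+", text.strip(), flags=re.IGNORECASE)
    return [parse_region(piece) for piece in pieces]
```

For example, `parse_coverage("box (ICRS, 145.17, -1.25, 235.9, 1.25) or circle (GAL, 10, 20, 0.5)")` yields two `Region` objects. The frame check is deliberately strict, so a frame name outside the list above raises `ValueError`.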
Coverage.Spectral (string, list)
Definition: The spectral coverage of the resource.
Comment: Spectral coverage at the resource level will be in terms of general spectral regions (gamma-ray, x-ray, extreme UV, UV, optical, infrared, radio). The general spectral regions are defined specifically as follows:
Radio λ ≥ 100 μ; ν ≤ 3000 GHz
Infrared 1 μ ≤ λ ≤ 100 μ
Optical 0.3 μ ≤ λ ≤ 1 μ; 300 nm ≤ λ ≤ 1000 nm; 3000 Å ≤ λ ≤ 10000 Å
UV 0.1 μ ≤ λ ≤ 0.3 μ; 1000 Å ≤ λ ≤ 3000 Å
EUV 100 Å ≤ λ ≤ 1000 Å; 12 eV ≤ E ≤ 120 eV
X-ray 0.1 Å ≤ λ ≤ 100 Å; 0.12 keV ≤ E ≤ 120 keV
Gamma-ray E ≥ 120 keV
Resources containing data in multiple spectral regions may give a list (e.g., “Radio, Infrared”).
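As an illustration, the region boundaries can be turned into a small lookup. The sketch below works in ångströms and makes two assumptions that are not stated in the text: a wavelength lying exactly on a boundary is assigned to the longer-wavelength region, and the gamma-ray limit of 120 keV is taken as equivalent to the 0.1 Å lower edge of the X-ray range. The function name is hypothetical.

```python
import bisect

# Region boundaries in ångströms (1 μ = 10^4 Å), in increasing wavelength.
_EDGES_ANGSTROM = [0.1, 100.0, 1000.0, 3000.0, 1.0e4, 1.0e6]
_REGIONS = ["Gamma-ray", "X-ray", "EUV", "UV", "Optical", "Infrared", "Radio"]


def spectral_region(wavelength_angstrom: float) -> str:
    """Return the general spectral region for a wavelength given in ångströms."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    # bisect_right places a boundary value in the longer-wavelength region.
    return _REGIONS[bisect.bisect_right(_EDGES_ANGSTROM, wavelength_angstrom)]
```

With these boundaries, `spectral_region(6563.0)` gives "Optical" and `spectral_region(2.1e9)` (21 cm) gives "Radio".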
Coverage.Spectral.Bandpass (string, list)
Definition: A specific bandpass specification.
Comment: Some resources and services may choose to give spectral coverage in more specific terms than the general spectral regions. The list of possible bandpass names is too lengthy to enumerate here, but would include optical bandpasses (U, V, B, R, I), narrow line filters (H-alpha, [OIII]), or other specific bandpass names. [Should it be required for any named bandpass to have an associated URL with the actual transmission curve (available as a VOTable)?]
Coverage.Temporal (string, list)
Definition: The temporal coverage of the resource.
Comment: Temporal coverage specifications will be given in years, with decimal years permitted. Ranges are specified with a hyphen, e.g., “1987-1993” or “1998.275- ”. Disjoint time spans may be given as a list, e.g., “1981-1984, 1987-1990”.
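A minimal sketch of a parser for this syntax follows. It assumes that an open-ended range such as “1998.275- ” means coverage is ongoing, represented here as an end value of `None`, and that a bare year with no hyphen denotes a span starting and ending in that year. Neither convention is fixed by the text, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[float, Optional[float]]


def parse_temporal(text: str) -> List[Span]:
    """Parse a temporal coverage string into (start, end) year pairs."""
    spans: List[Span] = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        start_text, hyphen, end_text = item.partition("-")
        start = float(start_text)
        end_text = end_text.strip()
        if not hyphen:
            end: Optional[float] = start   # assumption: a lone year is a one-point span
        elif end_text:
            end = float(end_text)
        else:
            end = None                     # assumption: open-ended, still ongoing
        if end is not None and end < start:
            raise ValueError(f"range ends before it starts: {item!r}")
        spans.append((start, end))
    return spans
```

For instance, `parse_temporal("1981-1984, 1987-1990")` returns two closed spans, while `parse_temporal("1998.275- ")` returns one span whose end is `None`.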
Coverage.Topics (string, list)
Definition: A list of the topics, object types, or other descriptive keywords about the resource.
Comment: Coverage.Topics is intended to provide additional information about the nature of the information provided by the resource. Is this a catalog of quasars? Of planetary nebulae? Is this a tool for computing ephemerides? Terms for Coverage.Topics should be based on the IAU Astronomy Thesaurus (http://msowww.anu.edu.au/library/thesaurus/).
Coverage.Level (string, list)
Definition: A description of the content level, or intended audience.
Comment: VO resources will be available to professional astronomers, amateur astronomers, educators, and the general public. These different audiences need a way to find material appropriate for their needs.
General Resource provides information appropriate for all users
Elementary Education Resource provides information appropriate for grades
Middle School Education Resource provides information appropriate for grades
Secondary Education Resource provides information appropriate for grades
University Resource provides information appropriate for
Research Resource provides information appropriate for
professional-level research and graduate school
Amateur Resource provides information of interest to amateur astronomers
Definition: The observatory or facility where the data were obtained.
Comments: Not in Dublin Core. Some resources are likely to hold data from multiple observatories. If just a few, this could be a list; if very many, just say “many”. Theoretical data will not originate with an observatory, but rather might be characterized by the computational facility used to create them (NCSA, SDSC, etc.).
Definition: The instrument used to collect the data.
Comments: Not in Dublin Core. Can be a specific instrument name (Wide Field/Planetary Camera 2) or generic instrument type (CCD camera). Theoretical data are produced by a computer code, and the name of the code could be specified.
Definition: The encoding format of data provided by the resource.
Comments: Typical values would be “FITS”, “ASCII text”, “HTML”, “XML”, “VOTable”, “GIF”, etc. The Dublin Core notion of Format is different, but very flexible. We recommend employing MIME types here in order to utilize existing standards.
Definition: Information about rights held in and over the resource.
Comment: Dublin Core uses Rights to describe copyright and other intellectual property rights issues. In the VO context Rights would describe access privileges, using the following values: public, proprietary, mixed.
Resource metadata will be collected through resource registration services, i.e., web forms that present a resource curator with the requisite fields and enumerated lists, and construct a resource descriptor in a standard format (such as VOTable). If content elements are not relevant for a given resource Type (e.g., Type “journal” does not have meaningful spatial coverage, Facility, or Instrument, though it does have a valid temporal coverage), then they should take on a “not applicable” value. The resource registration service should not allow fields to be left unspecified.
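The registration rule above (no field left unspecified, with an explicit “not applicable” where an element has no meaning) might be enforced along the following lines. This is only a sketch: the field list is an illustrative subset of the concepts defined in this document, the function names are hypothetical, and a plain dictionary stands in for whatever descriptor format is eventually adopted.

```python
from typing import Dict, Iterable, List

# Illustrative subset of the metadata concepts; a real form would list them all.
FIELDS = (
    "Title", "Publisher", "Type",
    "Coverage.Spatial", "Coverage.Temporal", "Facility", "Instrument",
)
NOT_APPLICABLE = "not applicable"


def unspecified_fields(descriptor: Dict[str, str]) -> List[str]:
    """Return the fields that are absent or blank in the descriptor."""
    return [name for name in FIELDS if not descriptor.get(name, "").strip()]


def register(descriptor: Dict[str, str],
             not_applicable: Iterable[str] = ()) -> Dict[str, str]:
    """Return a completed descriptor, or raise if any field is left unspecified."""
    completed = dict(descriptor)
    for name in not_applicable:
        # The curator states explicitly that the element has no meaning here.
        completed.setdefault(name, NOT_APPLICABLE)
    missing = unspecified_fields(completed)
    if missing:
        raise ValueError("unspecified fields: " + ", ".join(missing))
    return completed


# Hypothetical journal resource: no spatial coverage, facility, or instrument.
journal = {
    "Title": "Example Journal",
    "Publisher": "Example Society",
    "Type": "Journal",
    "Coverage.Temporal": "1990-",
}
completed = register(
    journal, not_applicable=["Coverage.Spatial", "Facility", "Instrument"]
)
```

Calling `register(journal)` without the `not_applicable` list raises `ValueError`, which mirrors the requirement that the curator make an explicit choice for every field.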
Example: The Sloan Digital Sky Survey data as hosted by MAST at STScI.
Title Sloan Digital Sky Survey
Publisher Space Telescope Science Institute/MAST
Creator Sloan Digital Sky Survey Consortium
Description The Sloan Digital Sky Survey is using a dedicated 2.5 m
telescope and a large format CCD camera to obtain images
of over 10,000 square degrees of high Galactic latitude sky
in five broad bands (u', g', r', i' and z', centered at 3540,
4770, 6230, 7630, and 9130 Å, respectively). Medium
resolution spectra will be obtained for approximately 10^6
galaxies and 100,000 quasars. The early data release
(EDR), in June 2001, includes searchable catalogs of
images and spectra, images for display and scientific
purposes in both 2-D FITS and JPEG formats, and spectra in
both 1-D FITS and GIF formats. The EDR covers about
460 square degrees of sky. The next data releases will
occur every 18 months or so.
Contributor Sloan Digital Sky Survey Consortium
Reference URL http://archive.stsci.edu/sdss/index.html
Service URL http://skyserver.pha.jhu.edu/en/tools/getimg/fields.asp
Contact.Name Archive Branch, Space Telescope Science Institute
Type Survey, Catalog, EPOResource
Coverage.Spatial BOX (J2000 145.17 -1.25 235.9 1.25) OR BOX (J2000
250.71 52.15 267 66.29) OR BOX (J2000 350.43 -1.25
Coverage.Spectral u’, g’, r’, i’, z’
Coverage.Topics galaxies, quasars, stars, CCD photometry, spectroscopy,
redshift, sky surveys
Facility Apache Point Observatory, Sloan 2.5-m Telescope
Instrument Five-band clocked CCD camera
Format FITS, GIF, JPEG (image/fits, image/gif, image/jpeg)
[*] Resource metadata concepts are drawn from the Dublin Core, http://dublincore.org/documents/dces/ , except where otherwise noted.