Resource and Service Metadata for the
Virtual Observatory
Version 5
October 2002
NVO Metadata Working Group
An essential capability of the
Virtual Observatory is a means for describing what data and computational
facilities are available where, and once identified, how to use them. The data
themselves have associated metadata (e.g., FITS keywords), and similarly we
require metadata about data collections and data services so that VO users can
easily find information of interest. Furthermore, such metadata are needed in
order to manage distributed queries efficiently; if a user is interested in
finding x-ray images there is no point in querying the HST archive, for
example. In this document we suggest an architecture for resource and service
metadata and describe the relationship of this architecture to emerging Web
Services standards. We also define an initial set of metadata concepts.
In order to make it easy for
astronomy information services to participate in the VO, we propose a
hierarchical system for metadata management. At the top level we require a
minimum amount of information, sufficient primarily to note the existence of a
resource and to describe who is responsible for it. At lower levels, the
metadata are more extensive and complex, allowing for the description of query
syntax, access protocols, and usage policies.
A service is any VO element that can be
invoked by the user to perform some action on their behalf. Associated with any
service is descriptive metadata about the service. Metadata generally include information the user
needs to determine if a service is of interest and how the service may be
invoked. Specific types of metadata are described below. Note that the service
itself need not be aware of the metadata that describe it.
A query service supports a
query/response protocol. The user submits a query to the service that may
define characteristics of interest, and the service returns a set of
information to the user. The query may be null, e.g., a current-time service
may only support a null query, and some services may respond to a null query
with appropriate default actions. Non-query services may also exist, e.g.,
services to copy or delete files on remote file systems, to mail information
to other users, to kill existing jobs, to authorize actions, etc.
A registry is a query
service for which the response is a structured description of other services.
The services described by a registry may be of any type. The registry may
support a query that allows the user to indicate which services might be of
interest.
A resource is a collection
of one or more services, or other resources, that share some common metadata
characteristics (e.g., Publisher, Creator, Contributor, Identifier, Contact,
Type, Facility). The extent of commonality depends upon the resource. The
services described by a resource may themselves be resources. For example,
MAST, HEASARC, IRSA, NED, et al., are resources. Each of these contains other
resources, e.g., the HST archive in MAST. They also
contain specific services, such as an HST
observation log query service or a cone search service. A resource must include
at least a minimalist service, i.e., a URL for a web site. One could in
principle describe all of NASA astrophysics data holdings as a resource, or all
of NVO as a resource, but aggregates of this scale defeat the goal of being
able to locate the specific resources and services of interest for a particular
application.
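The nesting of resources and services described above can be sketched as a small recursive data structure. This is only an illustration under stated assumptions: the class names, the two fields kept per resource, and the example.org URLs are hypothetical and are not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Service:
    """A VO element that can be invoked; identified here only by name and URL."""
    name: str
    url: str


@dataclass
class Resource:
    """A collection of services and/or other resources sharing common metadata."""
    title: str
    publisher: str
    members: List[Union["Resource", Service]] = field(default_factory=list)

    def services(self) -> Iterator[Service]:
        """Yield every service reachable from this resource, depth first."""
        for member in self.members:
            if isinstance(member, Service):
                yield member
            else:
                yield from member.services()


# Placeholder URLs and publisher values, for illustration only.
hst = Resource(
    title="HST archive",
    publisher="MAST",
    members=[Service("HST observation log query", "http://example.org/hst/obslog")],
)
mast = Resource(
    title="MAST",
    publisher="STScI",
    members=[hst, Service("MAST web site", "http://example.org/mast/")],
)

service_names = [s.name for s in mast.services()]
```

Here the "MAST web site" entry plays the role of the minimalist service (a URL for a web site) that every resource must include.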
Both resources and services are described
by metadata. Resource metadata are
high-level and independent of any specific service. Resource metadata include
· Curation metadata, which describe who supports the resource and what its purpose is
· Content metadata, which describe what kind of information is available (types of data, sky coverage, spectral coverage, etc.)
Resource metadata are typically not
queryable parameters in the underlying services, but rather they encompass
information that now is simply “known” to users, or must be discovered through
other means. Astronomers know that the HST archive includes optical images and
spectra, for example, or that Vizier provides access to catalogs and tables.
Resource metadata constitute a “yellow pages” of astronomical information. Resource
metadata are analogous to the UDDI (Universal Description, Discovery and
Integration) Web Service, and are analogous to the high-level descriptions
included in the CDS GLU.
Service metadata include
metadata that describe the service's interface (its input and output) as well
as information that aids in effective use of the service (e.g., range of
possible values returned). Service metadata also describe access methods or
protocols. Service metadata are analogous to WSDL (Web Service Description Language)
and the query specifications component of the CDS GLU.
These analogies are not perfect. For
example, WSDL can describe multiple services in one file, and UDDI probably
does not convey as much information as resource metadata. Nevertheless, the
intention is for the resource metadata to describe what is available, and
for the service metadata to describe how
to access it.
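The division of labor between the two kinds of metadata can be illustrated with a toy registry lookup: resource metadata are used to decide which resources are worth querying, and service metadata are then used to reach them. The registry contents, field layout, and URLs below are hypothetical placeholders, chosen only to mirror the x-ray/HST example from the introduction.

```python
from typing import Dict, List

# Hypothetical registry entries. The spectral coverage values are illustrative,
# and the URLs are placeholders rather than real service endpoints.
REGISTRY: List[Dict] = [
    {
        "Title": "HST archive",
        "Coverage.Spectral": ["UV", "Optical", "Infrared"],
        "services": [{"interface": "HTTP", "url": "http://example.org/hst/query"}],
    },
    {
        "Title": "HEASARC",
        "Coverage.Spectral": ["X-ray", "Gamma-ray"],
        "services": [{"interface": "HTTP", "url": "http://example.org/heasarc/query"}],
    },
]


def services_for(registry: List[Dict], spectral_region: str) -> List[Dict]:
    """Return service descriptions of the resources that cover the given region."""
    selected = []
    for resource in registry:
        # Resource metadata: "what is available".
        if spectral_region in resource["Coverage.Spectral"]:
            # Service metadata: "how to access it".
            selected.extend(resource["services"])
    return selected


xray_services = services_for(REGISTRY, "X-ray")
```

With these entries, a search for x-ray data never touches the HST archive, which is the pruning of distributed queries that the resource metadata are meant to enable.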
Below we describe the concepts we believe are needed in the resource metadata. These concepts may be instantiated in a variety of standard forms, e.g. XML, UCD tags, or FITS keywords, and with a variety of mechanisms, such as Topic Maps, OWL, or RDBMSs. Consequently, the exact names and rendering of the values may depend on the particular form in which they are represented. For example, when Coverage.Spatial is rendered as a FITS keyword record, the name will need to be limited to 8 characters and the value rendered in a pure ASCII form; in contrast, when rendered in XML, it might be better to tag the different components of the value separately. It will be necessary to define standard renderings for each of these common forms.
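As a rough illustration of how one concept can take different forms, the sketch below renders a single Coverage.Spatial value both as XML with separately tagged components and as a FITS-like keyword record. The element names and the keyword COVSPAT are hypothetical, since no standard rendering has been defined here, and the keyword record is simplified rather than a complete 80-character FITS card.

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def to_xml(shape: str, frame: str, numbers: Sequence[float]) -> str:
    """Render a region as XML, tagging each component of the value separately."""
    region = ET.Element("CoverageSpatial")  # hypothetical element name
    ET.SubElement(region, "Shape").text = shape
    ET.SubElement(region, "Frame").text = frame
    for number in numbers:
        ET.SubElement(region, "Value").text = str(number)
    return ET.tostring(region, encoding="unicode")


def to_keyword_record(shape: str, frame: str, numbers: Sequence[float]) -> str:
    """Render the same region with an 8-character name and a pure ASCII value."""
    keyword = "COVSPAT"  # hypothetical keyword, within the 8-character limit
    value = "%s(%s,%s)" % (shape, frame, ",".join(str(n) for n in numbers))
    value.encode("ascii")  # raises UnicodeEncodeError if the value is not ASCII
    return "%-8s= '%s'" % (keyword, value)


xml_text = to_xml("circle", "ICRS", [180.0, -30.0, 2.5])
record = to_keyword_record("circle", "ICRS", [180.0, -30.0, 2.5])
```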
Curation Metadata
Title
Definition: A name given to the resource.
Comment: Typically, a Title will be a name by which the resource is formally known.
Publisher
Definition: An entity responsible for making the resource available.
Comment: Examples of a Publisher include a person or an organization. Users of the resource should include Publisher in subsequent credits and acknowledgments.
Creator
Definition: An entity primarily responsible for making the content of the resource.
Comment: Examples of a Creator include a person or an organization. Users of the resource should include Creator in subsequent credits and acknowledgments.
Description
Definition: An account of the content of the resource.
Comment: Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content, or a free-text account of the content.
Contributor
Comment: Examples of a Contributor include a person or an organization. Users of the resource should include Contributor in subsequent credits and acknowledgments.
Identifier
Definition: An unambiguous reference to the resource within a given context.
Comment: The URI corresponding to the resource.
Resource URL (URL)
Definition: A URL pointing to additional information about the resource.
Comment: Not in Dublin Core.
Service URL (URL)
Definition: A URL pointing to additional information about the service or services, describing, e.g., the service type (HTTP, web service, GLU resource).
Comment: Not in Dublin Core.
Contact (string, e-mail address)
Definition: The e-mail address for contacting the persons responsible for the resource.
Comment: Not part of the Dublin Core. Contact is split into two concepts for clarity.
Contact.Name (string)
Definition: The name of the contact.
Comment: A person’s name, “John P. Jones”, or a group, “Archive Support Team”.
Contact.Email (e-mail address)
Definition: The e-mail address of the contact.
Comment: For example, “mailto:John.P.Jones@navy.gov”, or “mailto:archive@datacenter.org”.
Content Metadata
Type
Definition: The nature or genre of the content of the resource.
Comment: Type includes terms describing general categories, functions, genres, or aggregation levels for content. VO Types include:
Type          Description
Archive       Collection of pointed observations
Survey        Collection of observations covering substantial and contiguous areas of the sky
Catalog       Collection of derived data, primarily in tabular form
Bibliography  Collection of bibliographic references, abstracts, and publications
Journal       Collection of scholarly publications under common editorial policy
Library       Collection of published materials (journals, books, etc.)
Outreach      Collection of materials appropriate for public outreach, such as press releases and photo galleries
Education     Collection of materials appropriate for educational use, such as teaching resources, curricula, etc.
EPOResource   Collection of materials that may be suitable for EPO products but which are not in final product form, as in Type Outreach or Type Education. EPOResource would apply, e.g., to archives with easily accessed preview images or to surveys with easy-to-use images.
This list is extensible, and indeed requires further elaboration in the area of computational resources and theoretical models. Resources providing more than one type of content should list all relevant types.
Coverage (string)
Definition: The extent or scope of the content of the resource.
Comment: The Dublin Core notion of coverage is too generic to be of much use in the VO, where we need more specific information. We propose to subset this element as follows:
Coverage.Spatial (string)
Definition: The sky coverage of the resource.
Comment: The syntax for the spatial coverage specification is based on the SAO region specifications, though for resource descriptions we need only support a subset. We also enhance the SAO region specifications with an explicit reference to the coordinate system reference frame. All positions should be given in degrees.
Region Name   Specification
Box           box (cframe, ξmin, ηmin, ξmax, ηmax)
Circle        circle (cframe, ξcen, ηcen, radius)
Polygon       polygon (cframe, ξ1, η1, ξ2, η2, ξ3, η3, …)
ξ and η represent coordinates in the appropriate frame (α, δ; l, b; …). Compound regions may be constructed with or’s. The coordinate system reference frame is specified as follows (see http://aladin.u-strasbg.fr/java/doctech/cds.astro.astroframe.html for additional details):
cframe   Description
ICRS     International Celestial Reference System
FK5      Equatorial coordinates, FK5 system (J2000)
FK4      Equatorial coordinates, FK4 system (B1950)
ECL      Ecliptic coordinates (J2000)
GAL      Galactic coordinates (J2000)
SGAL     Supergalactic coordinates (J2000)
Coverage.Spectral (string, list)
Definition: The spectral coverage of the resource.
Comment: Spectral coverage at the resource level will be in terms of general spectral regions (gamma-ray, x-ray, extreme UV, UV, optical, infrared, radio). The general spectral regions are defined specifically as follows:
Coverage.Spectral   Represents
Radio               λ ≥ 100 μ
                    ν ≤ 3000 GHz
Infrared            1 μ ≤ λ ≤ 100 μ
Optical             0.3 μ ≤ λ ≤ 1 μ
                    300 nm ≤ λ ≤ 1000 nm
                    3000 Å ≤ λ ≤ 10000 Å
UV                  0.1 μ ≤ λ ≤ 0.3 μ
                    1000 Å ≤ λ ≤ 3000 Å
EUV                 100 Å ≤ λ ≤ 1000 Å
                    12 eV ≤ E ≤ 120 eV
X-ray               0.1 Å ≤ λ ≤ 100 Å
                    0.12 keV ≤ E ≤ 120 keV
Gamma-ray           E ≥ 120 keV
Resources containing data in multiple spectral regions may give a list (e.g., “Radio, Infrared”).
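The spectral region boundaries defined above could be applied mechanically, for instance to assign a region name to a wavelength. The sketch below is one possible way to do so and rests on two assumptions that the table does not settle: the gamma-ray limit, given only in energy, is taken to coincide with the 0.1 Å lower bound of the X-ray region, and a wavelength lying exactly on a boundary is assigned to the longer-wavelength region. The function name is hypothetical.

```python
from bisect import bisect_right

# Region boundaries in Angstroms, in order of increasing wavelength
# (1 micron = 1.0e4 Angstroms, 100 microns = 1.0e6 Angstroms).
_BOUNDS = [0.1, 100.0, 1000.0, 3000.0, 1.0e4, 1.0e6]
_REGIONS = ["Gamma-ray", "X-ray", "EUV", "UV", "Optical", "Infrared", "Radio"]


def spectral_region(wavelength_angstrom: float) -> str:
    """Return the general spectral region name for a wavelength in Angstroms."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    # bisect_right counts the boundaries that are <= the wavelength, so a
    # boundary value falls into the longer-wavelength region (an assumption).
    return _REGIONS[bisect_right(_BOUNDS, wavelength_angstrom)]


h_alpha = spectral_region(6563.0)
hi_21cm = spectral_region(2.1e9)
```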
Coverage.Spectral.Bandpass (string, list)
Definition: A specific bandpass specification.
Comment: Some resources and services may choose to give spectral coverage in more specific terms than the general spectral regions. The list of possible bandpass names is too lengthy to enumerate here, but would include optical bandpasses (U, V, B, R, I), narrow line filters (H-alpha, [OIII]), or other specific bandpass names. [Should it be required for any named bandpass to have an associated URL with the actual transmission curve (available as a VOTable)?]
Coverage.Temporal (string, list)
Definition: The temporal coverage of the resource.
Comment: Temporal coverage specifications will be given in years, with decimal years permitted. Ranges are specified with a hyphen, e.g., “1987-1993” or “1998.275- ”. Disjoint time spans may be given as a list, e.g., “1981-1984, 1987-1990”.
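A minimal sketch of how a Coverage.Temporal string in this syntax might be parsed is given below. The function name is hypothetical, an open-ended range is represented with None as its end, and a bare year without a hyphen is rejected, which is an assumption since the text does not say whether single years are allowed.

```python
from typing import List, Optional, Tuple


def parse_temporal(text: str) -> List[Tuple[float, Optional[float]]]:
    """Parse e.g. "1981-1984, 1987-1990" into a list of (start, end) years."""
    spans = []
    for item in text.split(","):
        start, hyphen, end = item.strip().partition("-")
        if not hyphen:
            # Assumption: every entry must be a range written with a hyphen.
            raise ValueError("expected a range with a hyphen: %r" % item)
        end = end.strip()
        # An empty end, as in "1998.275- ", means the range is still open.
        spans.append((float(start), float(end) if end else None))
    return spans


closed = parse_temporal("1981-1984, 1987-1990")
ongoing = parse_temporal("1998.275- ")
```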
Coverage.Topics (string, list)
Definition: A list of the topics, object types, or other descriptive keywords about the resource.
Comment: Coverage.Topics is intended to provide additional information about the nature of the information provided by the resource. Is this a catalog of quasars? Of planetary nebulae? Is this a tool for computing ephemerides? Terms for Coverage.Topics should be based on the IAU Astronomy Thesaurus (http://msowww.anu.edu.au/library/thesaurus/).
Coverage.Level (string, list)
Definition: A description of the content level, or intended audience.
Comment: VO resources will be available to professional astronomers, amateur astronomers, educators, and the general public. These different audiences need a way to find material appropriate for their needs.
Coverage.Level           Definition
General                  Resource provides information appropriate for all users
Elementary Education     Resource provides information appropriate for grades K-5 education
Middle School Education  Resource provides information appropriate for grades 6-8 education
Secondary Education      Resource provides information appropriate for grades 9-12 education
University               Resource provides information appropriate for university-level education
Research                 Resource provides information appropriate for professional-level research and graduate school education
Amateur                  Resource provides information of interest to amateur astronomers
Facility (string)
Definition: The observatory or facility where the data were obtained.
Comments: Not in Dublin Core. Some resources are likely to hold data from multiple observatories. If just a few, this could be a list; if very many, just say “many”. Theoretical data will not originate with an observatory, but rather might be characterized by the computational facility used to create them (NCSA, SDSC, etc.).
Instrument (string)
Definition: The instrument used to collect the data.
Comments: Not in Dublin Core. Can be a specific instrument name (Wide Field/Planetary Camera 2) or generic instrument type (CCD camera). Theoretical data are produced by a computer code, and the name of the code could be specified.
Format (string)
Definition: The encoding format of data provided by the resource.
Comments: Typical values would be “FITS”, “ASCII text”, “HTML”, “XML”, “VOTable”, “GIF”, etc. The Dublin Core notion of Format is different, but very flexible. We recommend employing MIME types here in order to utilize existing standards.
Rights (string)
Definition: Information about rights held in and over the resource.
Comment: Dublin Core uses Rights to describe copyright and other intellectual property rights issues. In the VO context Rights would describe access privileges, using the following values: public, proprietary, mixed.
Resource metadata will be collected through resource registration services, i.e., web forms that present a resource curator with the requisite fields and enumerated lists, and construct a resource descriptor in a standard format (such as VOTable). If content elements are not relevant for a given resource Type (e.g., Type “journal” does not have meaningful spatial coverage, Facility, or Instrument, though it does have a valid temporal coverage), then they should take on a “not applicable” value. The resource registration service should not allow fields to be left unspecified.
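The rule that no field may be left unspecified could be enforced by a registration service along the following lines. This is a sketch under assumptions: the function name and the dictionary representation of a descriptor are hypothetical, only the content fields are checked, Coverage.Spectral.Bandpass is treated as optional, and the sample journal values are invented for illustration.

```python
from typing import Dict, List

NOT_APPLICABLE = "not applicable"

# Content fields that the sketch requires; Coverage.Spectral.Bandpass is
# treated as optional here, which is an assumption.
CONTENT_FIELDS = [
    "Type", "Coverage.Spatial", "Coverage.Spectral", "Coverage.Temporal",
    "Coverage.Topics", "Coverage.Level", "Facility", "Instrument",
    "Format", "Rights",
]


def missing_fields(descriptor: Dict[str, str]) -> List[str]:
    """Return the names of content fields that are absent or left blank."""
    return [
        name for name in CONTENT_FIELDS
        if not str(descriptor.get(name, "")).strip()
    ]


# Hypothetical descriptor for a journal; all values are illustrative.
journal = {
    "Type": "Journal",
    "Coverage.Spatial": NOT_APPLICABLE,
    "Coverage.Spectral": NOT_APPLICABLE,  # assumed for this illustration
    "Coverage.Temporal": "1995-",
    "Coverage.Topics": "galaxies",
    "Coverage.Level": "Research",
    "Facility": NOT_APPLICABLE,
    "Instrument": NOT_APPLICABLE,
    "Format": "HTML",
    "Rights": "public",
}
incomplete = dict(journal, Facility="  ")
del incomplete["Rights"]

journal_missing = missing_fields(journal)
incomplete_missing = missing_fields(incomplete)
```

A registration form would refuse to construct the resource descriptor until the returned list is empty.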
Example: The Sloan Digital Sky Survey data as hosted by MAST at STScI.
Title Sloan Digital Sky Survey
Publisher Space Telescope Science Institute/MAST
Creator Sloan Digital Sky Survey Consortium
Description The Sloan Digital Sky Survey is using a dedicated 2.5 m
telescope and a large format CCD camera to obtain images
of over 10,000 square degrees of high Galactic latitude sky
in five broad bands (u', g', r', i' and z', centered at 3540,
4770, 6230, 7630, and 9130 Å, respectively). Medium
resolution spectra will be obtained for approximately 10^6
galaxies and 100,000 quasars. The early data release
(EDR), in June 2001, includes searchable catalogs of
images and spectra, images for display and scientific
purposes in both 2-D FITS and JPEG formats, and spectra in
both 1-D FITS and GIF formats. The EDR covers about
460 square degrees of sky. The next data releases will
occur every 18 months or so.
Contributor Sloan Digital Sky Survey Consortium
Identifier http://archive.stsci.edu/sdss/
Reference URL http://archive.stsci.edu/sdss/index.html
Service URL http://skyserver.pha.jhu.edu/en/tools/getimg/fields.asp
Contact.Name Archive Branch, Space Telescope Science Institute
Contact.Email mailto:archive@stsci.edu
Type Survey, Catalog, EPOResource
Coverage.Spatial BOX (J2000 145.17 -1.25 235.9 1.25) OR BOX (J2000
250.71 52.15 267 66.29) OR BOX (J2000 350.43 -1.25 416.37 1.17)
Coverage.Spectral Optical
Coverage.Spectral.Bandpass u’, g’, r’, i’, z’
Coverage.Temporal 1999.92-
Coverage.Topics galaxies, quasars, stars, CCD photometry, spectroscopy, redshift, sky surveys
Coverage.Level Research
Facility Apache Point Observatory, Sloan 2.5-m Telescope
Instrument Five-band clocked CCD camera
Format FITS, GIF, JPEG (image/FITS, image/gif, image/jpg)
Rights Public
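The Coverage.Spatial value in this example is written with upper-case region names and space-separated values, whereas the region table in the Coverage.Spatial definition uses lower-case names and commas. The sketch below is a tolerant parser that accepts both forms and compound regions joined with "or". It is an illustration only: the names are hypothetical, the frame is kept as an opaque token (the example uses J2000, which is not one of the listed cframe values), and no range checking is applied to the coordinates.

```python
import re
from typing import List, NamedTuple


class Region(NamedTuple):
    shape: str           # "box", "circle", or "polygon"
    frame: str           # coordinate frame token, not validated
    values: List[float]  # coordinates (and radius for a circle), in degrees


_REGION = re.compile(r"(box|circle|polygon)\s*\(([^)]*)\)", re.IGNORECASE)
_VALUE_COUNT = {"box": 4, "circle": 3}


def parse_spatial(text: str) -> List[Region]:
    """Parse a Coverage.Spatial string into a list of regions."""
    regions = []
    for part in re.split(r"\s+or\s+", text.strip(), flags=re.IGNORECASE):
        match = _REGION.fullmatch(part.strip())
        if match is None:
            raise ValueError("unrecognized region: %r" % part)
        shape = match.group(1).lower()
        # Accept commas, whitespace, or both as separators.
        tokens = [t for t in re.split(r"[,\s]+", match.group(2)) if t]
        if not tokens:
            raise ValueError("empty region: %r" % part)
        frame, numbers = tokens[0], [float(t) for t in tokens[1:]]
        if shape == "polygon":
            valid = len(numbers) >= 6 and len(numbers) % 2 == 0
        else:
            valid = len(numbers) == _VALUE_COUNT[shape]
        if not valid:
            raise ValueError("wrong number of values in %r" % part)
        regions.append(Region(shape, frame, numbers))
    return regions


boxes = parse_spatial(
    "BOX (J2000 145.17 -1.25 235.9 1.25) OR BOX (J2000 250.71 52.15 267 66.29)"
)
circle = parse_spatial("circle (ICRS, 180.0, -30.0, 2.5)")
```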
[*] Resource metadata concepts are drawn from the Dublin
Core, http://dublincore.org/documents/dces/, except where otherwise noted.