International Virtual Observatory Alliance
Resource and Service Metadata
for the Virtual Observatory
Version 0.7
IVOA Working Draft 09 May 2003
This version:
http://www.ivoa.net/Documents/WD/ResMetadata/RSM-20030509.html
Latest version:
http://www.ivoa.net/Documents/latest/RM.html
Previous versions:
http://www.ivoa.net/Documents/WD/ResMetadata/RSM-20030206.html
http://www.ivoa.net/Documents/WD/ResMetadata/RSM-20021011.html
Editors:
Robert Hanisch
Authors:
IVOA Interoperability Working Group
NVO Metadata Working Group
An essential capability of the Virtual Observatory is a means for describing what data and computational facilities are available where, and once identified, how to use them. The data themselves have associated metadata (e.g., FITS keywords), and similarly we require metadata about data collections and data services so that VO users can easily find information of interest. Furthermore, such metadata are needed in order to manage distributed queries efficiently; if a user is interested in finding x-ray images there is no point in querying the HST archive, for example. In this document we suggest an architecture for resource and service metadata and describe the relationship of this architecture to emerging Web Services standards. We also define an initial set of metadata concepts.
This is a Working Draft. The first release of this document was 7 June 2002.
This is an IVOA Working Draft for review by IVOA members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use IVOA Working Drafts as reference materials or to cite them as other than "work in progress." A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.
Many members of the IVOA Registry Working Group, IVOA Interoperability Working Group, and NVO Metadata Working Group have made significant contributions to this document.
In order to make it easy for astronomy information services to participate in the VO, we propose a hierarchical system for metadata management. At the top level we require a minimum amount of information, sufficient primarily to note the existence of a resource and to describe who is responsible for it. At lower levels, the metadata are more extensive and complex, allowing for the description of query syntax, access protocols, and usage policies.
A resource is a general term referring to a VO element that can be described in terms of who curates or maintains it and which can be given a name and a unique identifier. Just about anything can be a resource: it can be an abstract idea, such as sky coverage or an instrumental setup, or it can be fairly concrete, like an organization or a data collection. This definition is consistent with its use in the general Web community as “anything that has an identity” (Berners-Lee 1998, IETF RFC2396). We expand on this definition by saying that it is also describable.
An organization is a specific type of resource that brings people together to pursue participation in VO applications. Organizations can be hierarchical and range greatly in size and scope. At a high level, an organization could be a university, observatory, or government agency. At a finer level, it could be a specific scientific project, space mission, or individual researcher. A provider is an organization that makes data and/or services available to users over the network.
A service is any VO element that can be invoked by the user to perform some action on their behalf. Associated with any service is descriptive metadata about the service. Metadata generally include information the user needs to determine if a service is of interest and how the service may be invoked. Specific types of metadata are described below. Note that the service itself need not be aware of the metadata that describe it.
A query service supports a query/response protocol. The user submits a query to the service that may define characteristics of interest, and the service returns a set of information to the user. The query may be null, e.g., a current-time service may only support a null query, and some services may respond to a null query with appropriate default actions. Non-query services may also exist, e.g., services to copy or delete files on remote file systems, to mail information to other users, to kill existing jobs, to authorize actions, etc.
A registry is a query service for which the response is a structured description of other services. The services described by a registry may be of any type. The registry may support a query that allows the user to indicate which services might be of interest.
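As a rough illustration of the registry concept, the sketch below models a registry as an in-memory list of service descriptions with a single filtering query. The record contents and the `query_registry` function are assumptions made for illustration only; this document does not define a registry interface.

```python
from typing import Dict, List, Optional

# Hypothetical service descriptions; the field names reuse concepts from this
# document, but the records themselves are invented for the example.
REGISTRY: List[Dict[str, object]] = [
    {"Title": "Optical observation log query", "Coverage.Spectral": ["Optical"]},
    {"Title": "X-ray source catalog search", "Coverage.Spectral": ["X-ray"]},
]


def query_registry(registry: List[Dict[str, object]],
                   field: Optional[str] = None,
                   value: Optional[str] = None) -> List[Dict[str, object]]:
    """Return the descriptions whose list-valued `field` contains `value`.

    A null query (no field given) returns every description, mirroring the
    'appropriate default action' a query service may take.
    """
    if field is None:
        return list(registry)
    return [record for record in registry if value in record.get(field, [])]
```

A client interested only in x-ray data could call `query_registry(REGISTRY, "Coverage.Spectral", "X-ray")` and skip the optical service entirely.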
In our model, the hierarchy of resources is one of management and curation. An organization may manage a collection of one or more services and even smaller organizations or projects. For example, MAST, HEASARC, IRSA, NED et al. are all resources. Each of these manages other resources, e.g., the HST archive in MAST. They also support specific services (which are also resources) such as an HST observation log query service or a cone search service. One could in principle describe all of NASA astrophysics data holdings as a resource, or all of NVO as a resource, but aggregates of this scale defeat the goal of being able to locate the specific resources and services of interest for a particular application.
All resources are described by metadata. Resource metadata are high-level and independent of any specific service. Resource metadata include
· Identity metadata, which give the resource a name and an identifier,
· Curation metadata, which describe who supports the resource and what its purpose is, and
· Content metadata, which describe what kind of information is available (types of data, sky coverage, spectral coverage, etc.).
Resource metadata are typically not queryable parameters in the underlying services, but rather they encompass information that now is simply “known” to users, or must be discovered through other means. Astronomers know that the HST archive includes optical images and spectra, for example, or that VizieR provides access to catalogs and tables.
Resource metadata constitute a “yellow pages” of astronomical information. Resource metadata are analogous to the UDDI (Universal Description, Discovery and Integration) Web Service and to the high-level descriptions included in the CDS GLU.
Organizations, data collections, and services can be considered classes of resources, each of which may require additional metadata, not shared by other classes, to describe it fully. For example, a service description would need to include its inputs, outputs, and how it can be accessed. Service metadata, therefore, can be thought of as an extension of the general resource metadata: whereas the resource metadata, through the content metadata, describe what is available, the service metadata describe how to access it.
Below we describe the concepts we believe are needed in the resource metadata. These concepts may be instantiated in a variety of standard forms, e.g. XML, UCD tags, or FITS keywords, and with a variety of mechanisms, such as Topic Maps, OWL, or RDBMSs. Consequently, the exact names and rendering of the values may depend on the particular form in which they are represented. For example, when Coverage.Spatial is rendered as a FITS keyword record, the name will need to be limited to 8 characters and the value rendered in a pure ASCII form; in contrast, when rendered in XML, it might be better to tag the different components of the value separately. It will be necessary to define standard renderings for each of these common forms.
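To make the idea of form-dependent rendering concrete, the following sketch renders a set of concepts as XML, tagging each dotted component of a concept name as its own nested element. The root element name `Resource` and the nesting rule are assumptions for illustration; no standard XML rendering is defined in this document.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def render_xml(concepts: Dict[str, str]) -> str:
    """Render concepts such as {"Coverage.Spectral": "Optical"} as nested XML."""
    root = ET.Element("Resource")  # hypothetical root element name
    for name, value in concepts.items():
        node = root
        # Each dotted component becomes one level of nesting, reusing an
        # existing element when two concepts share a prefix.
        for part in name.split("."):
            child = node.find(part)
            if child is None:
                child = ET.SubElement(node, part)
            node = child
        node.text = str(value)
    return ET.tostring(root, encoding="unicode")
```

A FITS keyword rendering of the same concepts would instead need a mapping from each concept name to a keyword of at most 8 characters, which is not attempted here.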
Title (string)
Definition: A name given to the resource.
Comment: Typically, a Title will be a name by which the resource is formally known.
Ticker (string)
Definition: A short abbreviation for the name given to the resource.
Comment: The Ticker name will be used where brief annotations for the resource name are required. Ticker strings are limited to a maximum of eight characters. Not in
Identifier (URI)
Definition: An unambiguous reference to the resource within a given context.
Comment: The URI corresponding to the resource. Not in
Publisher (string)
Definition: An entity responsible for making the resource available.
Comment: Examples of a Publisher include a person or an organization. Users of the resource should include Publisher in subsequent credits and acknowledgments.
PublisherID (URI)
Definition: The identifier for the entity responsible for making the resource available.
Comment: Not in
Creator (string)
Definition: An entity primarily responsible for making the content of the resource.
Comment: Examples of a Creator include a person or an organization. Users of the resource should include Creator in subsequent credits and acknowledgments.
Creator.Logo (URL)
Definition: A URL pointing to a graphical logo, which may be used to help identify the information resource.
Comment: Not in
Subject (string, list)
Definition: A list of the topics, object types, or other descriptive keywords about the resource.
Comment: Subject is intended to provide additional information about the nature of the information provided by the resource. Is this a catalog of quasars? Of planetary nebulae? Is this a tool for computing ephemerides? Terms for Subject should be based on the IAU Astronomy Thesaurus (http://msowww.anu.edu.au/library/thesaurus/).
Description (string, free text)
Definition: An account of the content of the resource.
Comment: Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.
Contributor (string)
Definition: An entity responsible for making contributions to the content of the resource.
Comment: Examples of a Contributor include a person or an organization. Users of the resource should include Contributor in subsequent credits and acknowledgments.
Date (string)
Definition: A date associated with an event in the life cycle of the resource. Typically, Date will be associated with the creation or availability (i.e., most recent release or version) of the resource. ISO8601 is the preferred format (YYYY-MM-DD).
Version (string)
Definition: A label associated with the creation or availability (i.e., most recent release or version) of the resource.
Comment: Not in
ReferenceURL (URL)
Definition: A URL pointing to additional information about the resource. In general, this should be in a human-readable format.
Comment: Not in
Contact (string, e-mail address)
Definition: The e-mail address for contacting the persons responsible for the resource.
Comment: Not in
Contact.Name (string)
Definition: The name of the contact.
Comment: A person’s name, “John P. Jones”, or a group, “Archive Support Team”.
Contact.Email (e-mail address)
Definition: The e-mail address of the contact.
Comment: For example, “mailto:John.P.Jones@navy.gov”, or “mailto:archive@datacenter.org”.
Type (string, list)
Definition: The nature or genre of the content of the resource.
Comment: Type includes terms describing general categories, functions, genres, or aggregation levels for content. VO Types include:
Type          Description
Archive       Collection of pointed observations
Bibliography  Collection of bibliographic references, abstracts, and publications
Catalog       Collection of derived data, primarily in tabular form
Journal       Collection of scholarly publications under common editorial policy
Library       Collection of published materials (journals, books, etc.)
Simulation    Theoretical simulation or model
Survey        Collection of observations covering substantial and contiguous areas of the sky
Education     Collection of materials appropriate for educational use, such as teaching resources, curricula, etc.
Outreach      Collection of materials appropriate for public outreach, such as press releases and photo galleries
EPOResource   Collection of materials that may be suitable for EPO products but which are not in final product form, as in Type Outreach or Type Education. EPOResource would apply, e.g., to archives with easily accessed preview images or to surveys with easy-to-use images.
Animation     Animation clips of astronomical phenomena
Artwork       Artists’ renderings of astronomical phenomena or objects
Background    Background information on astronomical phenomena or objects
BasicData     Compilations of basic astronomical facts about objects, such as approximate distance or membership in constellation.
Historical    Historical information about astronomical objects.
Photographic  Publication-quality photographs of astronomical objects.
Press         Press releases about astronomical objects.
This list is extensible. Resources providing more than one type of content should list all relevant types.
Coverage (string)
Definition: The extent or scope of the content of the resource.
Comment: The Dublin Core notion of coverage is too generic to be of much use in the VO, where we need more specific information. We propose to subset this element as follows:
[Next metadata element needs to be updated to STM region specification.]
Coverage.Spatial (string)
Definition: The sky coverage of the resource.
Comment: The syntax for the spatial coverage specification is described in the Space-Time Metadata definition document. All positions should be given in degrees.
Region Name Specification
Box box (cframe, ξmin, ηmin, ξmax, ηmax)
Circle circle (cframe, ξcen, ηcen, radius)
Polygon polygon (cframe, ξ1, η1, ξ2, η2, ξ3, η3, …)
ξ and η represent coordinates in the appropriate frame (α, δ; l, b; …). Compound regions may be constructed with or’s. The coordinate system reference frame is specified as follows (see http://aladin.u-strasbg.fr/java/doctech/cds.astro.astroframe.html for additional details):
cframe Description
ICRS International Celestial Reference System
FK5 Equatorial coordinates, FK5 system (J2000)
FK4 Equatorial coordinates, FK4 system (B1950)
ECL Ecliptic coordinates (J2000)
GAL Galactic coordinates (J2000)
SGAL Supergalactic coordinates (J2000)
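The region syntax above lends itself to simple machine parsing. The sketch below is one possible reading of it, assuming that region names are case-insensitive, that arguments may be separated by commas or whitespace, and that compound regions are joined by the word OR. The authoritative syntax is the one in the Space-Time Metadata definition document, and the function name `parse_coverage` is hypothetical.

```python
import re
from typing import List, Tuple

FRAMES = {"ICRS", "FK5", "FK4", "ECL", "GAL", "SGAL"}
SHAPES = {"box", "circle", "polygon"}

Region = Tuple[str, str, List[float]]


def parse_coverage(spec: str) -> List[Region]:
    """Split a Coverage.Spatial string into (shape, frame, values) tuples."""
    regions: List[Region] = []
    # Compound regions are OR-ed together; handle each component in turn.
    for part in re.split(r"\s+OR\s+", spec.strip(), flags=re.IGNORECASE):
        match = re.fullmatch(r"(\w+)\s*\(([^()]*)\)", part.strip())
        if match is None:
            raise ValueError(f"unrecognized region: {part!r}")
        shape = match.group(1).lower()
        tokens = [t for t in re.split(r"[,\s]+", match.group(2).strip()) if t]
        if shape not in SHAPES or not tokens or tokens[0].upper() not in FRAMES:
            raise ValueError(f"unrecognized region: {part!r}")
        # The first token is the reference frame; the rest are degrees.
        regions.append((shape, tokens[0].upper(), [float(t) for t in tokens[1:]]))
    return regions
```

The sketch checks only the overall shape of the string; it does not verify that a box has four values or a circle three.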
Coverage.RegionOfRegard (float, decimal degrees)
Definition: Both data archives and catalogs have an intrinsic scale size, or angular resolution. For example, a source catalog created from an instrument with one degree angular resolution would have a RegionOfRegard of 0.5 degree, meaning that if one is searching for information pertinent to a given position, objects in this catalog within 0.5 degree of the position of interest would need to be included. For an image archive the RegionOfRegard corresponds to the image field of view.
Comment: RegionOfRegard corresponds to CoordArea in the Space-Time Metadata definition document.
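One way a client might apply RegionOfRegard is sketched below: a catalog entry is treated as pertinent to a target position when its angular separation from the target does not exceed the RegionOfRegard. The function names are hypothetical, and the use of the haversine formula for the separation is a choice made for this example rather than something this document prescribes.

```python
import math
from typing import Tuple


def angular_separation(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Guard against rounding pushing the argument slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def within_region_of_regard(target: Tuple[float, float],
                            entry: Tuple[float, float],
                            region_of_regard: float) -> bool:
    """True if `entry` (ra, dec) lies within the RegionOfRegard of `target`."""
    return angular_separation(*target, *entry) <= region_of_regard
```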
Coverage.Spectral (string, list)
Definition: The spectral coverage of the resource.
Comment: Spectral coverage at the resource level will be in terms of general spectral regions (gamma-ray, x-ray, extreme UV, UV, optical, infrared, radio). The general spectral regions are defined specifically as follows:
Coverage.Spectral  Represents
Radio              λ ≥ 100 μm; ν ≤ 3000 GHz
Infrared           1 μm ≤ λ ≤ 100 μm
Optical            0.3 μm ≤ λ ≤ 1 μm; 300 nm ≤ λ ≤ 1000 nm; 3000 Å ≤ λ ≤ 10000 Å
UV                 0.1 μm ≤ λ ≤ 0.3 μm; 1000 Å ≤ λ ≤ 3000 Å
EUV                100 Å ≤ λ ≤ 1000 Å; 12 eV ≤ E ≤ 120 eV
X-ray              0.1 Å ≤ λ ≤ 100 Å; 0.12 keV ≤ E ≤ 120 keV
Gamma-ray          E ≥ 120 keV
Resources containing data in multiple spectral regions may give a list (e.g., “Radio, Infrared”).
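The region boundaries in the table can be applied mechanically. The sketch below maps a wavelength in meters to a general spectral region name. Because adjacent ranges in the table share their end points, the sketch assumes that a boundary value belongs to the longer-wavelength region, and it treats anything shorter than 0.1 Å as gamma-ray; both are assumptions of this example, as is the function name.

```python
# Lower wavelength bound in meters for each region, longest wavelengths first.
REGION_LOWER_BOUNDS = [
    ("Radio", 100e-6),     # 100 micrometers
    ("Infrared", 1e-6),    # 1 micrometer
    ("Optical", 0.3e-6),   # 3000 Angstroms
    ("UV", 0.1e-6),        # 1000 Angstroms
    ("EUV", 100e-10),      # 100 Angstroms
    ("X-ray", 0.1e-10),    # 0.1 Angstrom
]


def spectral_region(wavelength_m: float) -> str:
    """Return the Coverage.Spectral region containing a wavelength in meters."""
    if wavelength_m <= 0:
        raise ValueError("wavelength must be positive")
    for name, lower_bound in REGION_LOWER_BOUNDS:
        if wavelength_m >= lower_bound:
            return name
    return "Gamma-ray"
```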
Coverage.Spectral.Bandpass (string, list)
Definition: A specific bandpass specification.
Comment: Some resources and services may choose to give spectral coverage in more specific terms than the general spectral regions. The list of possible bandpass names is too lengthy to enumerate here, but would include optical bandpasses (U, V, B, R, I), narrow line filters (H-alpha, [OIII]), or other specific bandpass names.
Coverage.Spectral.CentralWavelength (float)
Definition: The central wavelength of the spectral bandpass, in meters.
Comment: This should be the most representative wavelength of the bandpass.
Coverage.Spectral.MinimumWavelength (float)
Definition: The minimum wavelength of the spectral bandpass, in meters.
Comment: Implementors are encouraged to set the minimum wavelength to be as inclusive as possible, allowing all relevant resources to be discovered.
Coverage.Spectral.MaximumWavelength (float)
Definition: The maximum wavelength of the spectral bandpass, in meters.
Comment: Implementors are encouraged to set the maximum wavelength to be as inclusive as possible, allowing all relevant resources to be discovered.
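The minimum and maximum wavelengths allow a simple interval test during resource discovery. The sketch below, with a hypothetical function name, reports whether a resource's bandpass overlaps a requested wavelength range; setting the limits inclusively, as encouraged above, makes the test err on the side of returning the resource.

```python
def bandpass_overlaps(resource_min_m: float, resource_max_m: float,
                      query_min_m: float, query_max_m: float) -> bool:
    """True if [resource_min, resource_max] and [query_min, query_max] overlap.

    All values are wavelengths in meters; end points are treated as inclusive.
    """
    return resource_min_m <= query_max_m and query_min_m <= resource_max_m
```

For a resource declaring 400.e-9 to 850.e-9 m, a query for 500 to 600 nm overlaps, while a query for 1 to 2 μm does not.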
Coverage.Temporal.StartTime (string)
Definition: The earliest temporal coverage of the resource.
Comment: Temporal coverage specifications will be given in ISO8601 format. An empty value field implies that there is no known earliest temporal coverage.
Coverage.Temporal.StopTime (string)
Definition: The latest temporal coverage of the resource.
Comment: Temporal coverage specifications will be given in ISO8601 format. An empty value field implies that there is no known latest temporal coverage, i.e., that information continues to be added to the resource.
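The open-ended convention for StartTime and StopTime can be handled as in the sketch below, which tests whether a given date falls within a resource's temporal coverage. It assumes values are ISO 8601 calendar dates (YYYY-MM-DD) and that an empty value is represented as an empty string; the function name is hypothetical.

```python
from datetime import date


def covers_date(start_time: str, stop_time: str, when: str) -> bool:
    """True if `when` lies within [start_time, stop_time].

    An empty start_time or stop_time means the coverage is unbounded on
    that side, as described for the Coverage.Temporal elements.
    """
    day = date.fromisoformat(when)
    if start_time and day < date.fromisoformat(start_time):
        return False
    if stop_time and day > date.fromisoformat(stop_time):
        return False
    return True
```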
Coverage.Depth (float)
Definition: The (typical) depth coverage, or sensitivity, of the resource. Coverage.Depth is specified in units appropriate to the resource [integrated magnitudes, surface brightness (mag/arcsec^2), or flux density (Jy)].
Comment: Refer to Greisen and Calabretta 2002 (A&A 395, 1061, Section 4) and references therein for standard units strings.
Coverage.ObjectDensity (float)
Definition: The (typical) density of objects, catalog entries, telescope pointings, etc., on the sky, in number per square degree.
Coverage.ObjectCount (int)
Definition: The total number of objects, catalog entries, telescope pointings, etc., in the resource.
ContentLevel (string, list)
Definition: A description of the content level, or intended audience.
Comment: VO resources will be available to professional astronomers, amateur astronomers, educators, and the general public. These different audiences need a way to find material appropriate for their needs.
ContentLevel             Definition
General                  Resource provides information appropriate for all users
Elementary Education     Resource provides information appropriate for grades K-4 education
Middle School Education  Resource provides information appropriate for grades 5-8 education
Secondary Education      Resource provides information appropriate for grades 9-12 education
Community College        Resource provides information appropriate for education at community colleges
University               Resource provides information appropriate for university-level education
Research                 Resource provides information appropriate for professional-level research and graduate school education
Amateur                  Resource provides information of interest to amateur astronomers
Informal Education       Resource provides information appropriate for education at museums, planetariums, and other centers of informal learning
Facility (string)
Definition: The observatory or facility where the data were obtained.
Comments: Not in
Instrument (string)
Definition: The instrument used to collect the data.
Comments: Not in
Format (string, list)
Definition: The physical or digital manifestation of the information provided by the resource.
Comments: Typical values would be “FITS”, “ASCII text”, “HTML”, “XML”, “VOTable”, “GIF”, etc. MIME types should be used where available to specify digital information formats in order to utilize existing standards.
Other format values will be used to describe the physical medium of the information: CDROM, Digital Planetarium, Online, Presentation, Print, Slides, Video. Format specifications may be combined, as in “Video, JPEG” or “CDROM, FITS, GIF”.
Rights (string)
Definition: Information about rights held in and over the resource.
Comment:
Resource metadata will be collected through resource registration services, i.e., web forms that present a resource curator with the requisite fields and enumerated lists, and construct a resource descriptor in a standard format (such as VOTable). If content elements are not relevant for a given resource Type (e.g., Type “journal” does not have meaningful spatial coverage, Facility, or Instrument, though it does have a valid temporal coverage), then they should take on a “not applicable” value. The resource registration service should not allow fields to be left unspecified.
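The rule that no field may be left unspecified could be enforced by a registration service along the lines of the sketch below. The list of required fields is an illustrative subset, and the placeholder spelling and function name are assumptions; a real registration form would cover every element defined above.

```python
from typing import Dict, Iterable

NOT_APPLICABLE = "not applicable"  # placeholder spelling is an assumption

# Illustrative subset of the resource metadata elements.
REQUIRED_FIELDS = ["Title", "Identifier", "Publisher", "Type",
                   "Coverage.Spatial", "Facility", "Instrument"]


def complete_descriptor(fields: Dict[str, str],
                        not_applicable: Iterable[str] = ()) -> Dict[str, str]:
    """Return a descriptor in which every required field has a value.

    Fields the curator marks as not applicable receive the placeholder;
    any other missing or blank field is rejected.
    """
    marked = set(not_applicable)
    descriptor: Dict[str, str] = {}
    missing = []
    for name in REQUIRED_FIELDS:
        raw = fields.get(name)
        value = "" if raw is None else str(raw).strip()
        if value:
            descriptor[name] = value
        elif name in marked:
            descriptor[name] = NOT_APPLICABLE
        else:
            missing.append(name)
    if missing:
        raise ValueError("unspecified fields: " + ", ".join(missing))
    return descriptor
```

For a resource of Type “journal”, the curator would mark Coverage.Spatial, Facility, and Instrument as not applicable rather than leaving them blank.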
The metadata necessary for describing a service will vary quite a bit depending on the type of service it is. We propose two general categories of service metadata:
Interface metadata, which describe how to access the service—the inputs and the outputs. There will be standard types of interfaces that could include a web-browser-based interface (i.e., HTML Forms), a Web Service interface (describable by a WSDL document), a general HTTP Get interface (e.g., using key=value arguments), and a GLU-described interface.
Capability metadata, which describe what the service does, its limitations, and other behavioral characteristics.
Note that these categories are reasonably orthogonal. We can imagine the same basic service—in terms of its capabilities—accessible through multiple interfaces.
We expect that for each standard service recognized by the VO there will be a specification document that defines all the specific metadata necessary to describe a particular implementation of that service; thus, we do not include them all here. However, we can identify a few metadata concepts that might be employed to describe a particular service. Described below, these concepts should be employed by standard service specifications wherever they are applicable. We note also that metadata associated with the VOTable schema can be reused to describe the inputs and outputs of a service that returns a VOTable.
ServiceInterfaceURL (URL)
Definition: A URL pointing to a document that presents or describes the service interface.
Comment: Not in
ServiceBaseURL (URL)
Definition: The base portion of a URL used to invoke a service with the expectation that an additional string must be appended for the service to execute properly. The syntax of the appended string is defined by the specific service.
Comment: Not in
ServiceHTTPResults (MIME type)
Definition: The MIME type that is returned by a service.
Comment: Not in
ServiceStandardURI (URI)
Definition: A URI identifying a standard service.
Comment: Not in
ServiceStandardURL (URL)
Definition: A URL that points to a human-readable document that describes the standard upon which a service is based.
ServiceMSR (float, decimal degrees)
Definition: Service providers may choose to restrict the scope of searches done against their services, lest they be swamped with requests for millions or billions of results records. ServiceMSR restricts searches to some maximum radius (in decimal degrees) about a celestial coordinate.
Comment: A value of 180.0 or greater denotes that there is no restriction.
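A client might combine ServiceBaseURL and ServiceMSR as in the sketch below, which appends a positional query to the base URL and limits the search radius to the service's maximum. The parameter names RA, DEC, and SR follow the cone search convention and are an assumption here, since the appended syntax is defined by each specific service. Clamping the radius on the client side is likewise only one possible policy, and the function names are hypothetical.

```python
from urllib.parse import urlencode


def effective_radius(requested_sr: float, service_msr: float) -> float:
    """Limit a requested radius (degrees) to ServiceMSR.

    A ServiceMSR of 180.0 or greater denotes no restriction.
    """
    if service_msr >= 180.0:
        return requested_sr
    return min(requested_sr, service_msr)


def build_query_url(base_url: str, ra: float, dec: float,
                    sr: float, service_msr: float) -> str:
    """Append a positional query string to a ServiceBaseURL."""
    params = urlencode({"RA": ra, "DEC": dec,
                        "SR": effective_radius(sr, service_msr)})
    # Use '&' if the base URL already carries a query string.
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{params}"
```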
Example: The Sloan Digital Sky Survey data as hosted by MAST at STScI.
Identity metadata
Title Sloan Digital Sky Survey
Ticker SDSS
Identifier http://archive.stsci.edu/sdss/
Curation metadata
Publisher Space Telescope Science Institute/MAST
PublisherID http://archive.stsci.edu/
Creator Sloan Digital Sky Survey Consortium
Creator.Logo http://archive.stsci.edu/images/sdss_logo.gif
Subject galaxies, quasars, stars, CCD photometry, spectroscopy, redshift, sky surveys
Description The Sloan Digital Sky Survey is using a dedicated 2.5 m telescope and a large format CCD camera to obtain images of over 10,000 square degrees of high Galactic latitude sky in five broad bands (u', g', r', i' and z', centered at 3540, 4770, 6230, 7630, and 9130 Å, respectively). Medium resolution spectra will be obtained for approximately 10^6 galaxies and 100,000 quasars. The early data release (EDR), in June 2001, includes searchable catalogs of images and spectra, images for display and scientific purposes in both 2-D FITS and JPEG formats, and spectra in both 1-D FITS and GIF formats. The EDR covers about 460 square degrees of sky. The next data releases will occur every 18 months or so.
Contributor Sloan Digital Sky Survey Consortium
Date 2003-02-01
Version SDSS EDR
ReferenceURL http://archive.stsci.edu/sdss/index.html
Contact.Name Archive Branch, Space Telescope Science Institute
Contact.Email mailto:archive@stsci.edu
Content metadata
Type Survey, Catalog, EPOResource
Coverage.Spatial BOX (FK5 145.17 -1.25 235.9 1.25) OR BOX (FK5 250.71 52.15 267 66.29) OR BOX (FK5 350.43 -1.25 416.37 1.17)
Coverage.RegionOfRegard 0.0001
Coverage.Spectral Optical
Coverage.Spectral.Bandpass u', g', r', i', z'
Coverage.Spectral.MinimumWavelength 400.e-9
Coverage.Spectral.MaximumWavelength 850.e-9
Coverage.Temporal.StartTime 1999-12-25
Coverage.Temporal.StopTime <null>
Coverage.Depth 22.6 mag
Coverage.ObjectDensity 6.e4
Coverage.ObjectCount 2.e7
ContentLevel Research
Facility Apache Point Observatory, Sloan 2.5-m Telescope
Instrument Five-band clocked CCD camera
Format Online, FITS, GIF, JPEG (image/FITS, image/gif, image/jpg)
Rights Public
Service metadata
ServiceInterfaceURL http://archive.stsci.edu/cgi-bin/sdss/catalog.html
ServiceBaseURL http://archive.stsci.edu/cgi-bin/sdss/catalog
ServiceHTTPResults text/xml
ServiceStandardID http://www.ivoa.net/services/ConeSearch/
ServiceStandardURL http://www.ivoa.net/services/ConeSearchSpecification.html
ServiceMSR 0.2
· Reformatted document to be compatible with new IVOA documentation standards.
· Modified much of the architecture description to reflect current concepts about the function and structure of registries.
· Distinguished identity metadata from other curation metadata.
· Added Ticker element to provide shorthand abbreviation for resource Title.
· Added PublisherID element to provide unique identifier for the resource publisher.
· Replaced ResourceURL with ReferenceURL.
· Deleted ServiceURL and incorporated more general service metadata (Section 4).
· Incorporated more complete metadata definitions for Education and Public Outreach resources. Affects: Type, Format, ContentLevel.
· Added Type = Simulation for theoretical models.
· Coverage.Spatial refers to Space-Time Metadata document for spatial coverage specifications.
· Added Coverage.Spectral.CentralWavelength, Coverage.Spectral.MinimumWavelength, and Coverage.Spectral.MaximumWavelength to more fully describe the spectral bandpass.
· Replaced Coverage.Temporal with Coverage.Temporal.StartTime and Coverage.Temporal.StopTime to be consistent with Date and use of ISO 8601 time format.
· Added Coverage.RegionOfRegard to aid users of the registry in assessing the utility of a resource.
· Added Coverage.Depth, Coverage.ObjectDensity, and Coverage.ObjectCount to provide information on resource depth of coverage (sensitivity) and richness.
· Broadened definition of Format, consistent with Dublin Core, to include specifications of both physical and digital formats.
· Added new section on service metadata, introducing metadata elements ServiceInterfaceURL, ServiceBaseURL, ServiceHTTPResults, ServiceStandardID, ServiceStandardURL, and ServiceMSR.
· Updated example to use new metadata elements.
* Resource metadata concepts are drawn from the Dublin Core, http://dublincore.org/documents/dces/, except where otherwise noted.