IVOA

International Virtual Observatory Alliance

 

Resource Metadata

for the Virtual Observatory

Version 1.12

 

IVOA Recommendation 2007 March 2

 

This version:

            http://www.ivoa.net/Documents/REC/ResMetadata/RM-20070302.html

 

Latest version:

            http://www.ivoa.net/Documents/latest/RM.html

 

Previous version(s):

            http://www.ivoa.net/Documents/PR/ResMetadata/RM-20061212.html

            http://www.ivoa.net/Documents/PR/ResMetadata/RM-20051115.html

            http://www.ivoa.net/Documents/WD/ResMetadata/RM-20050621.html       

            http://www.ivoa.net/Documents/REC/ResMetadata/RM-20040426.html

            http://www.ivoa.net/Documents/PR/ResMetadata/RM-20040323.html

            http://www.ivoa.net/Documents/PR/ResMetadata/RM-20040126.html

            http://www.ivoa.net/Documents/WD/ResMetadata/RM-20031002.html

            http://www.ivoa.net/Documents/WD/ResMetadata/RM-20030801.html

            http://www.ivoa.net/Documents/WD/ResMetadata/RM-20030709.html

            http://www.ivoa.net/Documents/WD/ResMetadata/RSM-20030509.html

            http://www.ivoa.net/Documents/WD/ResMetadata/RSM-20030206.html

            http://www.ivoa.net/Documents/WD/ResMetadata/RSM-20021011.html

 

Editor(s):

            Robert Hanisch

 

Author(s):

            IVOA Resource Registry Working Group

            NVO Metadata Working Group

 

 

Abstract

An essential capability of the Virtual Observatory is a means for describing what data and computational facilities are available where, and once identified, how to use them.  The data themselves have associated metadata (e.g., FITS keywords), and similarly we require metadata about data collections and data services so that VO users can easily find information of interest.  Furthermore, such metadata are needed in order to manage distributed queries efficiently; if a user is interested in finding X-ray images there is no point in querying the HST archive, for example.  In this document we suggest an architecture for resource and service metadata and describe the relationship of this architecture to emerging Web Services standards.  We also define an initial set of metadata concepts.

 

 

Status of this document

This is a Recommendation. The first release of this document was 7 June 2002.  This is an update to the Recommendation dated 2004 April 26.  The goal of this update is to clarify the definitions of certain metadata elements, add certain new elements, and delete elements that have not been useful.

 

 

A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.

 

Acknowledgments

Many members of the IVOA Registry Working Group, AstroGrid project, NVO Technical Working Group, and participants in IVOA Interoperability workshops have made significant contributions to this document.  Contributors to this document have been partly or completely supported by the following projects and programs:

 

·         The U.S. National Virtual Observatory project, which is funded by the National Science Foundation's Information Technology Research Program under Cooperative Agreement AST0122449 with The Johns Hopkins University.

·         The UK AstroGrid project, which is funded by the Particle Physics and Astronomy Research Council.

·         The Astrophysical Virtual Observatory, which is funded by the fifth framework program of the European Community for research, technological development, and demonstration activities (FP5).

 

 

Contents

Abstract

Status of this document

Acknowledgments

Contents

1      Introduction

2      Architecture

3      Resource metadata concepts

3.1       Identity metadata

3.2       Curation metadata

3.3       General content metadata

3.4       Collection and service content metadata

3.5       Correspondence of Coverage metadata with the Space-Time Coordinate schema

4      Data and metadata quality assessment

5      Service metadata concepts

5.1       Interface metadata

5.2       Capabilities metadata

6      Example

7      Changes from previous versions

 

 

1         Introduction

 

An essential capability of the Virtual Observatory is a means for describing what data and computational facilities are available where, and once identified, how to use them.  The data themselves have associated metadata (e.g., FITS keywords), and similarly we require metadata about data collections and data services so that VO users can easily find information of interest.  Furthermore, such metadata are needed in order to manage distributed queries efficiently; if a user is interested in finding X-ray images there is no point in querying the HST archive, for example.  In this document we suggest an architecture for resource and service metadata and describe the relationship of this architecture to emerging Web Services standards.  We also define an initial set of metadata concepts.

 

 

2         Architecture

 

In order to make it easy for astronomy information services to participate in the VO, we propose a hierarchical system for metadata management.  At the top level we require a minimum amount of information, sufficient primarily to note the existence of a resource and to describe who is responsible for it.  At lower levels, the metadata are more extensive and complex, allowing for the description of query syntax, access protocols, and usage policies.

 

A resource is a general term referring to a VO element that can be described in terms of who curates or maintains it and which can be given a name and a unique identifier.  Just about anything can be a resource: it can be an abstract idea, such as sky coverage or an instrumental setup, or it can be fairly concrete, like an organisation or a data collection.  This definition is consistent with its use in the general Web community as “anything that has an identity” (Berners-Lee 1998, IETF RFC2396).  We expand on this definition by saying that it is also describable. 

 

An organisation is a specific type of resource that brings people together to pursue participation in VO applications.  Organisations can be hierarchical and range greatly in size and scope.  At a high level, an organisation could be a university, observatory, or government agency.  At a finer level, it could be a specific scientific project, space mission, or individual researcher.  A provider is an organisation that makes data and/or services available to users over the network.

 

A service is any VO resource that can be invoked by the user to perform some action on their behalf.  Associated with any service is descriptive metadata about the service.  Metadata generally include information the user needs to determine if a service is of interest and how the service may be invoked.  Specific types of metadata are described below.  Note that the service itself need not be aware of the metadata that describe it.


A query service supports a query/response protocol.  The user submits a query to the service that may define characteristics of interest, and the service returns a set of information to the user.  The query may be null, e.g., a current-time service may only support a null query, and some services may respond to a null query with appropriate default actions.  Non-query services may also exist, e.g., services to copy or delete files on remote file systems, to mail information to other users, to kill existing jobs, to authorize actions, etc.



A registry is a query service for which the response is a structured description of resources.  The resources described by a registry may be of any type.  The registry may support a query that allows the user to indicate which resources might be of interest.
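
As an illustration of the registry definition above, the following Python sketch models a registry as a query service whose response is a list of structured resource descriptions.  The names ResourceRecord and Registry, the keyword-matching rule, and the ivo://example.org/ identifiers are assumptions made for this example only; this document does not define a registry interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResourceRecord:
    """Hypothetical structured description of one resource."""
    identifier: str
    title: str
    subject: List[str] = field(default_factory=list)


class Registry:
    """Toy in-memory registry: a query service returning resource descriptions."""

    def __init__(self) -> None:
        self._records: Dict[str, ResourceRecord] = {}

    def register(self, record: ResourceRecord) -> None:
        # Records are keyed by identifier, so re-registering replaces the entry.
        self._records[record.identifier] = record

    def query(self, keyword: Optional[str] = None) -> List[ResourceRecord]:
        # A null query (keyword=None) returns every record as a default action.
        if keyword is None:
            return list(self._records.values())
        needle = keyword.lower()
        return [
            r for r in self._records.values()
            if needle in r.title.lower()
            or any(needle in s.lower() for s in r.subject)
        ]


# Illustrative records; the identifiers are made up for this sketch.
registry = Registry()
registry.register(ResourceRecord(
    "ivo://example.org/hst", "Hubble Space Telescope Archive", ["optical images"]))
registry.register(ResourceRecord(
    "ivo://example.org/xray", "Example X-ray Survey", ["x-ray images"]))

print([r.identifier for r in registry.query("x-ray")])
```

The null query returning all records is one possible default action; a real registry might instead return a summary or reject the request.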

In our model, the hierarchy of resources is one of management and curation.  For example, an organisation may manage a collection of one or more services and even smaller organisations or projects.  MAST, HEASARC, IRSA, and NED, among others, are all resources.  Each of these manages other resources, e.g., the HST archive in MAST.  They also support specific services (which are also resources) such as an HST observation log query service or a cone search service.  One could in principle describe all of NASA astrophysics data holdings as a resource, or all of NVO as a resource, but aggregates of this scale defeat the goal of being able to locate the specific resources and services of interest for a particular application.

 
All resources are described by metadata.  Resource metadata are generic, high-level, and independent of any specific service.  Resource metadata include

 

·         Identity metadata, which give the resource a name and an identifier,

 

·         Curation metadata, which describe who supports the resource and its availability (i.e., version, release date), and

 

·         Content metadata, which describe what kind of information is available (types of data, sky coverage, spectral coverage, etc.).  Content metadata can be either general, applying to all resources, or associated more specifically with data collections and the services that deliver data from them.

 

Resource metadata are typically not queryable parameters in the underlying services, but rather they encompass information that now is simply “known” to users, or must be discovered through other means.  Astronomers know that the HST archive includes optical images and spectra, for example, or that Vizier provides access to catalogs and tables.  Resource metadata constitute a “yellow pages” of astronomical information.  They are analogous to the UDDI (Universal Description, Discovery and Integration) Web Service and to the high-level descriptions included in the CDS GLU.

 

Organisations, data collections, and services can be considered classes of resources, each of which may require additional metadata that fully describe it but are not shared by other classes.  For example, a service description would need to include its inputs, outputs, and how it can be accessed.  Service metadata, therefore, can be thought of as an extension of the general resource metadata:  whereas the resource metadata, through its content metadata, describes what is available, the service metadata describes how to access it.

 

Resource metadata will be collected through resource registration services, e.g., web forms that present a resource curator with the requisite fields and enumerated lists, and construct a resource descriptor in a standard format (such as VOTable).  The resource registration service should not allow fields to be left unspecified.  Some metadata elements may be irrelevant, unknown, or not provided by the publisher of a resource.  Since “irrelevant” conveys different information than “not provided”, we will adopt standard representations of these conditions:

 

            “Not Applicable”           irrelevant or not applicable to this resource

            “Unknown”                   unknown, cannot be defined

            “Not Provided”            no information was provided by the resource publisher

 

Various applications based on the registry may choose to include or exclude certain resources based on these attributes.  If a metadata element is “Not Provided” the application should make no assumption regarding applicability or relevance.
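
A minimal sketch of how an application might act on these three representations is given below.  The function name keep_resource, the keep_unknown policy switch, and the decision to exclude “Not Applicable” entries are assumptions of this example; the only behaviour taken from the text above is that “Not Provided” must not be used to infer applicability or relevance.

```python
NOT_APPLICABLE = "Not Applicable"
UNKNOWN = "Unknown"
NOT_PROVIDED = "Not Provided"


def keep_resource(value: str, wanted: str, keep_unknown: bool = True) -> bool:
    """Decide whether a resource stays in a result set for one metadata element.

    value  -- the element as stored in the registry record
    wanted -- the value the user asked for
    """
    if value == NOT_PROVIDED:
        # No assumption may be made about applicability, so never exclude.
        return True
    if value == NOT_APPLICABLE:
        # Policy chosen for this sketch: an irrelevant element cannot match.
        return False
    if value == UNKNOWN:
        # Application policy; hypothetical switch introduced for this sketch.
        return keep_unknown
    # Ordinary value: case-insensitive comparison (also an assumption).
    return value.lower() == wanted.lower()


print(keep_resource("Not Provided", "HST"))
```

Other applications may reasonably choose different policies for “Not Applicable” and “Unknown”.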

 

Similarly, some resources may provide quite large aggregations or collections, covering many bandpasses, types, or formats.  It may be prohibitive to list all such options.  In such cases acceptable representations for the metadata entries would be:

 

            “Any”           resource will respond to requests for any of the available types (though some may not actually be available)

            “All”           resource will respond to requests for all of the available types, and all are actually available in some non-zero quantity
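
The distinction between “Any” and “All” can be sketched as a three-valued answer to the question of whether a requested type is held.  The function name availability and the return values "yes", "maybe", and "no" are assumptions of this example, and the requested value is assumed to belong to the vocabulary of available types.

```python
from typing import List


def availability(entry: List[str], requested: str) -> str:
    """Classify a request against a list-valued metadata entry.

    Returns "yes", "maybe", or "no" (labels invented for this sketch).
    """
    if entry == ["All"]:
        # Every available type is actually held in some non-zero quantity.
        return "yes"
    if entry == ["Any"]:
        # The resource will respond, but the type may not actually be available.
        return "maybe"
    # Otherwise the entry is an explicit list of the types on offer.
    return "yes" if requested in entry else "no"


print(availability(["Any"], "Optical"))
```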

 

The most general resource metadata are similar in concept to the Dublin Core metadata definitions (http://dublincore.org/documents/dces/), and where possible DC metadata elements have been used.  VO metadata elements that correspond directly to DC counterparts are noted.  The Dublin Core elements Language and Relation are not currently used in the VO metadata.

 

 

3         Resource metadata concepts

 

Below we describe the concepts we believe are needed in the resource metadata.  These concepts may be instantiated in a variety of standard forms, e.g. XML, UCD tags, or FITS keywords, and with a variety of mechanisms, such as Topic Maps, OWL, or RDBMSs.  Consequently, the exact names and rendering of the values may depend on the particular form in which they are represented.  For example, when Coverage.Spatial is rendered as a FITS keyword record, the name will need to be limited to 8 characters and the value rendered in a pure ASCII form; in contrast, when rendered in XML, it might be better to tag the different components of the value separately.  It will be necessary to define standard renderings for each of these common forms.

 

A limited number of keywords are considered essential for a basic understanding of the resource, and are thus denoted as required.  All others are optional, or may be applied to certain classes of resources only.
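
A registration service could enforce the required elements with a check along the following lines.  This is a sketch under stated assumptions: the list of names is taken from the elements marked [Required] in Sections 3.1 to 3.3, the dictionary representation of a record and the function name missing_required are invented for the example, and the sample values are illustrative.

```python
from typing import Dict, List

# Elements marked [Required] in Sections 3.1 to 3.3 of this document.
REQUIRED = ["Title", "Identifier", "Publisher", "Date",
            "Subject", "Description", "ReferenceURL", "Type"]


def missing_required(record: Dict[str, object]) -> List[str]:
    """Return the required element names that are absent or empty."""
    missing = []
    for name in REQUIRED:
        value = record.get(name)
        # Treat None, an empty string, and an empty list as "left blank".
        if value is None or value == "" or value == []:
            missing.append(name)
    return missing


# Hypothetical, partially filled record.
record = {
    "Title": "Hubble Space Telescope",
    "Identifier": "ivo://example.org/hst",
    "Publisher": "Example Data Center",
    "Date": "2007-03-02",
    "Subject": [],
    "Type": ["Archive"],
}
print(missing_required(record))
```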

 

3.1      Identity metadata

 

Title (string)                 [Dublin Core] [Required]

Definition:  A name given to the resource.

Comment:  Typically, a Title will be a name by which the resource is formally known. Title should be an unabbreviated form (e.g., Hubble Space Telescope) rather than an acronym unless the acronym is so well known as to be part of standard usage.  Publishers are encouraged, but not required, to define unique Titles.

 

ShortName (string)

Definition:  A short abbreviation for the name given to the resource.

Comment:  The ShortName will be used where brief annotations for the resource name are desired, such as in GUIs that might refer to many resources in a compact display.  ShortName strings are limited to a maximum of sixteen characters.  Care should be taken to define illuminating ShortNames indicating either where the resource comes from or what data collection it provides.  ShortNames are not required to be unique.  Indeed, a resource provider may use the same ShortName for several related resources (e.g., different services that access the same collection), or the same ShortName might be used by different providers for common/mirrored resources.  In the latter case, the ShortName defined by the original publisher of the resource should have preference.

 

Identifier (URI)             [Dublin Core] [Required]

Definition:  An unambiguous reference to the resource within a given context.  The syntax for Identifiers is described in IVOA Identifiers in the IVOA document collection (http://www.ivoa.net/Documents/).

Comment:  The URI corresponding to the resource.
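
The constraints on the identity elements can be sketched as a simple validation routine.  The class name Identity, the method problems, and the example identifier are assumptions of this example.  The check on Identifier only confirms that the string has a URI scheme; it does not implement the syntax given in IVOA Identifiers.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse

MAX_SHORTNAME_LENGTH = 16  # limit stated in the ShortName comment


@dataclass
class Identity:
    """Hypothetical container for the identity metadata of one resource."""
    title: str
    identifier: str
    short_name: Optional[str] = None  # ShortName is optional

    def problems(self) -> List[str]:
        """Return human-readable descriptions of any violations found."""
        found = []
        if not self.title.strip():
            found.append("Title is required")
        # Only a generic URI check: a scheme must be present.
        if not urlparse(self.identifier).scheme:
            found.append("Identifier is not a URI with a scheme")
        if self.short_name is not None and len(self.short_name) > MAX_SHORTNAME_LENGTH:
            found.append("ShortName exceeds 16 characters")
        return found


hst = Identity("Hubble Space Telescope", "ivo://example.org/hst", "HST")
print(hst.problems())
```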

 

3.2      Curation metadata

 

Publisher (string)        [Dublin Core] [Required]

Definition:  An entity responsible for making the resource available.

Comment:   Examples of a Publisher include a person or an organisation.  Users of the resource should include Publisher in subsequent credits and acknowledgments.

 

PublisherID (URI)

Definition:  The identifier for the entity responsible for making the resource available.  The syntax for Identifiers is described in IVOA Identifiers in the IVOA document collection (http://www.ivoa.net/Documents/).

Comment:  This item is optional; an ID for the publisher may not yet be established (e.g., if the publisher has not yet been registered).

 

Creator (string)           [Dublin Core]

Definition:  An entity primarily responsible for making the content of the resource.

Comment:   Examples of a Creator include a person or an organisation.  Users of the resource should include Creator in subsequent credits and acknowledgments.  Creator is intended to refer to the organisation or individuals responsible for the intellectual content of the resource, and not the organisation or individuals who may have developed the service by which the content is made available.

Guidelines:

1) If the resource is a data collection or a service accessing a collection, then Creator fields should list the scientists responsible for the original data collection.  Typically, this would be the list of authors associated with the defining published paper for the collection.  At a minimum, the PI or lead author should be given.  Full names should be given, not just surnames.

2) For a collection that is a compilation of many separately published collections (e.g., an archive), the Creator should be set to "various".

3) If the resource is an organisation not associated with a specific collection, the most appropriate value is either empty or the name of the person responsible for assembling the organisation.  Often, an empty value is most appropriate.

4) If the resource is a Registry that publishes records for a single organisation, the Creator may contain the person(s) responsible for collecting or creating the metadata held in its records.  Otherwise, it can be an empty value.

5) If the resource is an Authority, Creator should contain the name of the person that reserved the authority ID it records.

 

Creator.Logo (URL)

Definition:  A URL pointing to a graphical logo, which may be used to help identify the information resource.

 

Contributor (string)      [Dublin Core]

Definition:  An entity responsible for making contributions to the content of the resource.

Comment:   Examples of a Contributor include a person or an organisation.  Users of the resource should include Contributor in subsequent credits and acknowledgments.  Like Creator, Contributor is intended to refer to the organisation or individuals responsible for the intellectual content of the resource, and not the organisation or individuals who may have developed the service by which the content is made available.  Also see the Guidelines under Creator.

 

Date (string)                [Dublin Core] [Required]

Definition:  A date associated with an event in the life cycle of the resource.  Typically, Date will be associated with the creation or availability (i.e., most recent release or version) of the resource.  ISO 8601 is the preferred format (YYYY-MM-DD).

Comment:  Dates may be approximate (e.g., year only, year and month).  When the resource is an organisation, Date should refer to the approximate genesis of the organisation.  When the resource is a service, Date should refer to the implementation date or the date the service became available.  When the resource describes an authority identifier, Date should refer to when the authority identifier was reserved.  (See IVOA Identifiers in the IVOA document collection (http://www.ivoa.net/Documents/)).
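
Because Date values may be approximate, software reading them cannot assume a full calendar date.  The following sketch accepts the three precisions mentioned above (YYYY, YYYY-MM, YYYY-MM-DD); the function name parse_date and the tuple return convention are assumptions of this example, and other ISO 8601 forms are deliberately not handled.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Year, optionally followed by -MM, optionally followed by -DD.
_DATE_RE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def parse_date(text: str) -> Tuple[int, Optional[int], Optional[int]]:
    """Parse YYYY, YYYY-MM, or YYYY-MM-DD into (year, month, day).

    Missing components are returned as None rather than guessed.
    """
    match = _DATE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a supported ISO 8601 calendar date: {text!r}")
    year = int(match.group(1))
    month = int(match.group(2)) if match.group(2) else None
    day = int(match.group(3)) if match.group(3) else None
    # Let datetime.date reject impossible values such as month 13 or 30 February;
    # missing parts default to 1 for this check only.
    date(year, month or 1, day or 1)
    return year, month, day


print(parse_date("2007-03-02"), parse_date("2004-04"), parse_date("2002"))
```

Returning None for the missing month or day preserves the stated precision, so that a year-only Date is not silently turned into 1 January.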

 

Version (string)

Definition:  A label associated with the creation or availability (i.e., most recent release or version) of the resource.

 

Contact (string, e-mail address)

Definition:  The e-mail address for contacting the persons responsible for the resource.

Comment:  Contact is split into the following components for clarity.

 

            Contact.Name (string)

            Definition:  The name of the contact.

            Comment:  A person’s name, “John P. Jones”, or a group, “Archive Support Team”.

 

            Contact.Address (string)

            Definition:  The mailing address of the contact.

            Comment:  All components of the mailing address are given in one string, e.g., “3700 San Martin Drive, Baltimore, MD 21218  USA”.

 

            Contact.Email (e-mail address)

            Definition:  The e-mail address of the contact.

            Comment:  For example, “John.P.Jones@navy.gov”, or “archive@datacenter.org”.

 

            Contact.Telephone (string)

            Definition:  The telephone number of the contact.

            Comment:  Complete international dialing codes should be given, e.g., “+1-410-338-1234”.

 

 

3.3      General content metadata

 

Subject (string, list)     [Dublin Core] [Required]

Definition:  A list of the topics, object types, or other descriptive keywords about the resource.

Comment:  Subject is intended to provide additional information about the nature of the information provided by the resource.  Is this a catalog of quasars?  Of planetary nebulae?  Is this a tool for computing ephemerides?  Terms for Subject should be drawn from the IAU Astronomy Thesaurus (http://msowww.anu.edu.au/library/thesaurus/), though in the absence of suitable terms (the IAU Thesaurus is not complete in all areas of astronomical research) the following alternate collections of astronomical research terms may be used:

            Vizier keywords (CDS):  http://vizier.u-strasbg.fr/doc/ADCkwds.htx

            Astronomy journal keywords:  http://www.edpsciences.org/journal/statique/doc/aa_keywords.html

Guidelines:  As this is a Required element, it must not be left blank.  Services that provide access to data from registered collections should replicate the Subject metadata in their registry entries.  To support keyword-based searches of registry contents, the Subject element should be as specific as possible and include as many relevant terms as possible.

 

Description (string, free text)  [Dublin Core] [Required]

Definition:  An account of the content of the resource.

Comment:  Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.  Thorough text descriptions are particularly encouraged in order to make text-based searches against the registries maximally useful.  Description should emphasize what the resource is about, as other matters such as who created it, when it was created, and where it is located are described elsewhere in the resource metadata.

 

Source  (string)           [Dublin Core]

Definition:  A bibliographic reference from which the present resource is derived or extracted.

Comment:  The present resource may be derived from the Source in whole or in part.  Recommended best practice is to use the standard bibcode (see http://cdsweb.u-strasbg.fr/simbad/refcode.html), where available.  If no bibcode is available, Source should use a string or number conforming to a formal identification or citation system.

 

ReferenceURL (URL)             [Required]

Definition:  A URL pointing to additional information about the resource.  In general, this information should be human-readable.

 

Type (string, list)         [Dublin Core] [Required]

Definition:  The nature or genre of the content of the resource.

Comment:   Type includes terms describing general categories, functions, genres, or aggregation levels for content.  VO Types include:

 

            Type                Description

            Archive             Collection of pointed observations

            Bibliography        Collection of bibliographic references, abstracts, and publications

            Catalog             Collection of derived data, primarily in tabular form

            Journal             Collection of scholarly publications under common editorial policy

            Library             Collection of published materials (journals, books, etc.)

            Simulation          Theoretical simulation or model

            Survey              Collection of observations covering substantial and contiguous areas of the sky

            Education           Collection of materials appropriate for educational use, such as teaching resources, curricula, etc.

            Outreach            Collection of materials appropriate for public outreach, such as press releases and photo galleries

            EPOResource         Collection of materials that may be suitable for EPO products but which are not in final product form, as in Type Outreach or Type Education.  EPOResource would apply, e.g., to archives with easily accessed preview images or to surveys with easy-to-use images.

            Animation           Animation clips of astronomical phenomena

            Artwork             Artists’ renderings of astronomical phenomena or objects

            Background          Background information on astronomical phenomena or objects

            BasicData           Compilations of basic astronomical facts about objects, such as approximate distance or membership in constellation.

            Historical          Historical information about astronomical objects.

            Photographic        Publication-quality photographs of astronomical objects.

            Press               Press releases about astronomical objects.

            Organisation        An organisation that is a publisher or curator of other resources.

            Project             A project that is a publisher or curator of other resources.

            Registry            A query service for which the response is a structured description of resources.

            Other               A resource not described by any of the above types.

 

This list is extensible.  Resources providing more than one type of content should list all relevant types.
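
Since the Type list is extensible and a resource may list several types, a checker should flag unrecognised values rather than assume the list is closed.  In the sketch below the set VO_TYPES is copied from the table above, while the function name unrecognised_types and the extra parameter for locally agreed extensions are assumptions of this example.

```python
from typing import Iterable, List, Set

# Type values listed in the table above.
VO_TYPES: Set[str] = {
    "Archive", "Bibliography", "Catalog", "Journal", "Library", "Simulation",
    "Survey", "Education", "Outreach", "EPOResource", "Animation", "Artwork",
    "Background", "BasicData", "Historical", "Photographic", "Press",
    "Organisation", "Project", "Registry", "Other",
}


def unrecognised_types(values: Iterable[str], extra: Iterable[str] = ()) -> List[str]:
    """Return the values found in neither the listed Types nor the extensions."""
    allowed = VO_TYPES | set(extra)
    return [v for v in values if v not in allowed]


# "Movie" is a made-up value used only to show the behaviour.
print(unrecognised_types(["Archive", "Movie"]))
```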

 

ContentLevel (string, list)

Definition:  A description of the content level, or intended audience.

Comment:  VO resources will be available to professional astronomers, amateur astronomers, educators, and the general public.  These different audiences need a way to find material appropriate for their needs.

 

            ContentLevel                    Definition

            General                         Resource provides information appropriate for all users

            Elementary Education            Resource provides information appropriate for grades K-4 education

            Middle School Education         Resource provides information appropriate for grades 5-8 education

            Secondary Education             Resource provides information appropriate for grades 9-12 education

            Community College               Resource provides information appropriate for education at community colleges

            University                      Resource provides information appropriate for university-level education

            Research                        Resource provides information appropriate for professional-level research and graduate school education

            Amateur                         Resource provides information of interest to amateur astronomers

            Informal Education              Resource provides information appropriate for education at museums, planetariums, and other centers of informal learning

 

Relationship (string)

Definition:  A resource may be related to another resource in a way that is important to document, so that associated services or duplicate copies may easily be located.

 

            mirror-of          The resource is a mirror of another resource.  Information gathered from the resources is indistinguishable.

            service-for        The resource is a service associated with a data collection.

            derived-from       The resource is a derivative of another resource, e.g., a subset selected for a particular scientific interest, or a reprocessed data collection.

            served-by          The resource (e.g., a data collection) can be accessed via another service resource.

 

RelationshipID (URI)

Definition:  The identifier of an associated resource.  The relationship is described in the Relationship metadata element.  The syntax for Identifiers is described in IVOA Identifiers in the IVOA document collection (http://www.ivoa.net/Documents/).
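
The Relationship and RelationshipID elements together form a link from one resource to another, which an application can follow to locate associated services or mirrors.  The sketch below stores such links as triples.  The triple representation, the function name related, the example identifiers, and in particular the treatment of service-for and served-by as inverses of one another are assumptions of this example, not statements of this document.

```python
from typing import List, Tuple

RELATIONSHIPS = {"mirror-of", "service-for", "derived-from", "served-by"}

# Assumption of this sketch: service-for and served-by are treated as inverses.
INVERSE = {"service-for": "served-by", "served-by": "service-for"}

# (resource Identifier, Relationship, RelationshipID)
Link = Tuple[str, str, str]


def related(links: List[Link], resource_id: str, relationship: str) -> List[str]:
    """Return identifiers related to resource_id by the given relationship."""
    if relationship not in RELATIONSHIPS:
        raise ValueError(f"unknown relationship: {relationship}")
    found = [target for source, rel, target in links
             if source == resource_id and rel == relationship]
    # Also follow links recorded from the other side, where an inverse exists.
    inverse = INVERSE.get(relationship)
    if inverse is not None:
        found += [source for source, rel, target in links
                  if target == resource_id and rel == inverse and source not in found]
    return found


# Hypothetical links between made-up resources.
links = [
    ("ivo://example.org/hst-cone", "service-for", "ivo://example.org/hst"),
    ("ivo://example.org/hst-mirror", "mirror-of", "ivo://example.org/hst"),
]
print(related(links, "ivo://example.org/hst", "served-by"))
```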

 

 

3.4      Collection and service content metadata

 

Facility (string, list)

Definition:  The observatory or facility where the data were obtained.

Comments:  Some resources are likely to hold data from multiple observatories.  If just a few, this could be a list; if very many, just say “many”.  Theoretical data will not originate with an observatory, but rather might be characterized by the computational facility used to create them (NCSA, SDSC, etc.).

Comments:  Facility should be used only to describe entities that specifically produce or manage data.  Observatory names are the most common values.  When the resource is an organisation, Facility may include the names of archives or well-known services (e.g., NED) from which one may obtain data.  The listing of Facility values need not be complete; rather, it can be indicative of the facilities that are most important or of most common interest.  The value may be "various" when many facilities are associated with the resource.  The value may be empty when there is no facility that is particularly relevant to the resource.

 

Instrument (string, list)

Definition: