International Virtual Observatory Alliance

URI fragments in IVOA specifications
Version 1.0

IVOA Note 25 May 2012

Working Group
Semantics
This version:
http://www.ivoa.net/Documents/Notes/URIFragments/20120525
Latest version:
http://www.ivoa.net/Documents/Notes/URIFragments/
Previous version(s):
None
Author(s):
Norman Gray

Abstract

The fragment identifier in a URI has a specific semantics attached to it. IVOA specifications should therefore not use it as a simple indicator of hierarchy or containment.

Status of This Document

This is an author's draft. It has no IVOA standing as such, but will be submitted as a Note to the IVOA documents series once it has received some feedback.

This is an IVOA Note expressing suggestions from and opinions of the authors. It is intended to share best practices, possible approaches, or other perspectives on interoperability with the Virtual Observatory. It should not be referenced or otherwise interpreted as a standard specification.

A list of current IVOA Recommendations and other technical documents can be found at http://www.ivoa.net/Documents/.

Acknowledgements

The author is most grateful for comments and criticisms received from Guy Rixon, Mark Taylor, Markus Demleitner, and Dick Shaw.

1. Introduction

URIs are defined in IETF RFC 3986 [std:rfc3986]. In its full generality, the syntax of URIs is quite complicated, but most of the URIs we commonly see use only a subset of the possible features, namely a scheme (which is usually http or sometimes, in VO contexts, ivo), a host prefixed by a pair of slashes //, a path with elements separated by single slashes /, and a possible fragment, separated from the rest of the URI by a hash or number sign, #. The point of this note is to stress that the fragment is importantly distinct from the other parts of the URI: it is not sent over the network to a remote server when the URI is retrieved or dereferenced.
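As a minimal illustration of these components, the following Python sketch splits a URI using the standard library's urllib.parse module. The function name describe and the dictionary keys are hypothetical, chosen only for this example.

```python
from urllib.parse import urlsplit


def describe(uri: str) -> dict:
    """Split a URI into the components discussed above.

    The function name and the dictionary keys are illustrative only.
    """
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # e.g. "http" or "ivo"
        "host": parts.netloc,        # the part after the leading "//"
        "path": parts.path,          # elements separated by "/"
        "fragment": parts.fragment,  # the part after "#", without the "#"
    }


print(describe("http://www.ivoa.net/Documents/#notes"))
print(describe("ivo://foo/bar#baz"))
```

The same generic parser handles both the http and the ivo URI, which reflects the fact that the fragment syntax belongs to the generic URI syntax rather than to any one scheme.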

When looking at a webpage in a web browser – for example the URL http://www.ivoa.net/Documents/#notes – the browser retrieves the path /Documents/ from the server at www.ivoa.net and once it has retrieved the HTML page that comes back, it searches within the page for the anchor labelled with notes. Crucially, this search happens entirely on the client side, and it or its analogue happens during the processing of any URI – it is not specific to HTTP or to HTML pages. It also therefore applies to IVORN URIs (starting ivo:) [std:ivo] and VOSpace URIs (starting vos:) [std:vospace].
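The browser behaviour just described can be sketched in Python as follows. This is only an approximation under stated assumptions: the names AnchorFinder and open_in_browser are hypothetical, the fetch argument stands in for whatever actually performs the network request, and a real browser does considerably more.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class AnchorFinder(HTMLParser):
    """Looks for an element whose id or name attribute equals the fragment."""

    def __init__(self, wanted: str):
        super().__init__()
        self.wanted = wanted
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("id", "name") and value == self.wanted:
                self.found = True


def open_in_browser(url: str, fetch):
    """Hypothetical sketch of a browser resolving a URL with a fragment.

    `fetch` is an assumed callable taking a URL and returning HTML text.
    """
    # Step 1: separate the fragment; only the remainder goes to the server.
    primary, fragment = urldefrag(url)
    page = fetch(primary)
    # Step 2: resolve the fragment locally, within the retrieved page.
    finder = AnchorFinder(fragment)
    finder.feed(page)
    return primary, finder.found
```

Note that fetch is only ever called with the fragment-free URL, so nothing on the server side can depend on the fragment.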

In brief: The fragment identifier in a URI (RFC 3986, [std:rfc3986]) has a specific semantics attached to it. IVOA specifications should therefore not use it as a simple indicator of hierarchy or containment. Or, put another way: punctu–ation,isn#t ju`st !dec$ora/tion.

This document is not intended to be a comprehensive survey of recommended and deprecated URL patterns. We note, however, that quite a lot of the suggestions in the famous Cool URIs don't change document are as valid now as they were in 1998.

2. The problem with fragments

Several IVOA standards define URI patterns for the objects they describe – the VOEvent and VOSpace standards are examples. In this context, it is natural to use the URI fragment as a way of referring to a resource which is conceptually contained within another, by analogy with the way that the fragments in HTML pages are conceptually within the page. Unfortunately, the fixed and invariable meaning attached to URI fragments means that the applications which process such URIs may be required by the (IETF RFC) standard to process them in ways which may be unintended by the IVOA standards. If applications, guided by an IVOA standard, do not process URIs in a conformant way, then we are concerned that those applications will risk being frustrated by conformant library APIs, by caches, and by future developments in URI standards themselves.

The rest of this section is a detailed discussion of the problem, with a rather legalistic tone, in terms which presume some acquaintance with the details of the URI specification [std:rfc3986].

The fundamental problem with URI formats such as scheme:foo#local_ID is that the specification for URIs [std:rfc3986] requires that the fragment (the #local_ID) is removed prior to any dereference – "the fragment identifier is separated from the rest of the URI prior to a dereference" (this and other quotations here are from section 3.5 of the URI RFC). Other language in this section makes it clear that the fragment has a special, and secondary, status ("[t]he fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information") and that this is independent of the scheme: "[f]ragment identifier semantics are independent of the URI scheme and thus cannot be redefined by scheme specifications".

Further, "the fragment identifier is not used in the scheme-specific processing of a URI". In order to conform to the URI specification, therefore, the processing of the ivo: URI scheme must ignore the fragment. Whenever an IVORN ivo://foo/bar#baz is processed (or in general used in any way other than as a name in the ivo://foo/bar namespace), that processing must be done on the IVORN ivo://foo/bar alone, and the presence of the #baz fragment taken account of only after retrieval is complete.

Another way of phrasing this is that there is no guarantee that a server will see the fragment in any URI, since any of possibly multiple intermediaries between the client and the server will be licensed to remove it (nor, incidentally, is there any guarantee that a server will not see the fragment).

The intention of the URI specification is that such a URI is conceptually handled by the client stripping the fragment, processing the resulting cropped URI, and then resolving the fragment, in some scheme-specific way, on the client.

In the VOEvent spec, however, .../streamid and .../streamid#local_ID are conceived as completely independent resources, contrary to the prescriptions in the URI RFC.

See section 2.5 for a note on affected IVOA Standards.

This is not merely a theoretical problem, for three reasons.

2.1. Issue 1: scheme handlers may not report the fragment

One can imagine a URI API which allows for scheme-specific handlers (e.g. for vos: or ivo:), in the way that the java.net.URI class does. Such a handler class's API could potentially be constructed in such a way that the handler code couldn't get access to the fragment part of the parsed URI. This would completely destroy the functionality of a custom handler for ivo: URLs which included significant fragments. And this would not be a bug in the API.

The java.net.URLStreamHandler abstract class is not in fact constructed in this way, but this is no guarantee that a different class, in this or a different language, won't act in the same inconvenient fashion.
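To make the concern concrete, here is a hypothetical Python sketch of such an API. The class Dereferencer and its methods are invented for illustration and do not correspond to any existing library; the point is only that an API of this shape is consistent with the URI specification and yet never shows the fragment to the scheme handler.

```python
from urllib.parse import urldefrag, urlsplit


class Dereferencer:
    """Hypothetical URI API with pluggable scheme-specific handlers."""

    def __init__(self):
        self._handlers = {}

    def register(self, scheme: str, handler):
        """Register a callable that retrieves URIs of the given scheme."""
        self._handlers[scheme] = handler

    def dereference(self, uri: str):
        # The fragment is separated before any scheme-specific processing.
        primary, fragment = urldefrag(uri)
        handler = self._handlers[urlsplit(primary).scheme]
        # The handler receives only the fragment-free URI.
        representation = handler(primary)
        # The fragment is handed back to the caller for client-side resolution.
        return representation, fragment
```

A handler registered for ivo: in this sketch could not distinguish a request for ivo://example.org/streamid#local_ID from one for ivo://example.org/streamid, however it was written.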

2.2. Issue 2: servers (including caches) may equate URIs with and without fragments

When a cache is asked for scheme:path#fragment, it should simply return the content of scheme:path since, according to the URI spec, and for any scheme, these are equivalent in this context. Indeed, any ivo: cache is required to behave like this (RFC section 6.1: "When URIs are compared to select (or avoid) a network action, such as retrieval of a representation, fragment components (if any) should be excluded from the comparison."). That is, if a user-agent were to ask a proxy or cache for ivo://auth/obj#frag, it should receive the contents of ivo://auth/obj.

This also is not a bug in the cache.
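A cache that follows this comparison rule might look like the following Python sketch. The class name FragmentBlindCache and the fetch callable are assumptions made for the example, and a real cache would also deal with expiry, validation and so on.

```python
from urllib.parse import urldefrag


class FragmentBlindCache:
    """Hypothetical cache that excludes fragments when comparing URIs."""

    def __init__(self, fetch):
        self._fetch = fetch  # assumed callable: URI -> representation
        self._store = {}

    def get(self, uri: str):
        # Exclude the fragment from the comparison used to select
        # (or avoid) the network action.
        key = urldefrag(uri).url
        if key not in self._store:
            self._store[key] = self._fetch(key)
        return self._store[key]
```

With such a cache, requests for ivo://auth/obj#frag and ivo://auth/obj share a single entry, and the origin server is contacted at most once, with the fragment-free URI.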

Superficially, it seems that these two problems can be evaded: don't use scheme-specific handlers, and don't use proxies or caches; or more generally, avoid tools which conform to the demands of the URI specification. Depending on the local network environment, however, user-agents may be obliged to use caches; this is unlikely in (current) practice, in the case of non-HTTP URIs, but this may not be avoidable in future for the following reason.

2.3. Issue 3: URIs won't last forever

The third point is the longest-term point, and may not be so easily worked around.

At some point – perhaps in a decade, perhaps longer – there will be a replacement standard for addressing things on the web (or whatever replaces it). As the web's core addressing technology, URIs are so important that there will certainly be a mechanism for mapping URIs to the new standard, supported by gateways or proxies of some type. At this point, using a URI proxy will not be optional, if the IVOA is to remain reasonably consistent with the rest of the world.

Whatever technology finally replaces URIs as an addressing mechanism will have a lot of work invested in it, to make sure the two are compatible. The gateways implementing this mapping cannot be guaranteed to be friendly to URI schemes which depend on behaviour which the URI specification declares must not happen.

2.4. Non-problem: URIs as names

We do not wish to suggest that fragments should be avoided in general; there are plenty of cases where they are perfectly appropriate. In the best-known use, to provide a direct link to elements within an HTML page, fragments are useful and unexceptionable; and when a fragment is used to create a name for something, as is used within the Standards Registry Extension, or in many Semantic Web use-cases, that is a useful and increasingly common technique which provides natural namespacing.

The Standards Registry Extension specification [std:stdregext] uses URIs as names: for example ivo://ivoa.net/std/QueryProtocol#case-insensitive. Here, there’s no suggestion that the #case-insensitive thing is a differently-retrieved resource – it is simply a name, and the non-fragment part of the URI is merely acting as a type of namespace. This goes with the grain of the URI definition.
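The use of a URI purely as a name can be sketched in Python as below. The function names split_name and same_namespace are hypothetical, and the second URI in the usage lines is an invented example rather than one taken from any specification. Nothing here is retrieved: the URI is treated as a namespace plus a local name.

```python
from urllib.parse import urldefrag


def split_name(uri: str):
    """Split a URI used as a name into (namespace, local name)."""
    namespace, local_name = urldefrag(uri)
    return namespace, local_name


def same_namespace(a: str, b: str) -> bool:
    """True if two names share the same non-fragment part."""
    return split_name(a)[0] == split_name(b)[0]


key = "ivo://ivoa.net/std/QueryProtocol#case-insensitive"
other = "ivo://ivoa.net/std/QueryProtocol#some-other-key"  # invented example
print(split_name(key))
print(same_namespace(key, other))
```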

At the risk of belabouring the point, the difference between this and the VOEvent case is that in the VOEvent case there is the clear implication that a VOEvent identifier stream#event is not merely a name for an event, but is expected to be retrievable directly, in contrast to being accessed by downloading the entire stream, and searching locally for the secondary resource #event. There is a similar situation, mutatis mutandis, when the VOSpace specification talks of accessing nodes.

2.5. Affected IVOA standards

VOEvent identifiers have the form ivo://example.org/streamid#local_ID (see section 2.2 of [std:voevent]). The URI RFC requires that this is resolved by retrieving the resource ivo://example.org/streamid and finding #local_ID within it, but the VOEvent specification indicates that the resources ivo://example.org/streamid and ivo://example.org/streamid#local_ID might be retrieved independently.

The text of the VOSpace specification [std:vospace] principally illustrates URI fragments being used as property names; this is unproblematic for the reasons discussed above (Sect. 2.4). However the specification also describes URIs in a vos: scheme (implicitly and explicitly including fragments) as names for VOSpace nodes, and describes these being retrieved to obtain the node contents. Depending on how this retrieval is done, this dereferencing procedure might be adversely affected by the issues described in this Note.

Other IVOA specifications which discuss URIs with fragments may need to be examined, to discover whether they are also unwittingly depending on unsupported behaviour.

3. Recommendations

This Note makes the following recommendations:

  1. IVOA protocols should not use URI fragments other than in a context in which (a) the fragment is being used as a name for an object which is not expected to be retrieved, or (b) there is an implication that the object so named will be retrieved in the way which is implied by the URI model.
  2. If a resource named by a standard-specified URI will ever be retrieved, then to avoid doubt the standard should explicitly note that the fragment processing is expected to be performed by the client.

References

[std:rfc3986] Tim Berners-Lee, Roy Thomas Fielding, and L Masinter.
Uniform resource identifier (URI): Generic syntax. RFC 3986, January 2005.
[std:vospace] Matthew Graham, Dave Morris, Guy Rixon, Pat Dowler, Andre Schaaff, and Doug Tody.
VOSpace specification, version 2.00-20111202. IVOA Proposed Recommendation, December 2011.
[std:stdregext] Paul Harrison, Douglas Burke, Ray Plante, Guy Rixon, and Dave Morris.
StandardsRegExt: a VOResource schema extension for describing IVOA standards, version 1.0-20120217. IVOA Proposed Recommendation, February 2012.
[std:ivo] Raymond Plante, Tony Linde, Roy Williams, and Keith Noddle.
IVOA identifiers, version 1.12. IVOA Recommendation, March 2007.
[std:voevent] Rob Seaman and Roy Williams, editors.
Sky event reporting metadata (VOEvent). IVOA Recommendation, 2006.

Volute $Revision: 1775 $ $Date: 2012-05-25 18:32:48 +0100 (Fri, 25 May 2012) $