First version of VOSpace 2.0 WD
K Gillies
gillies at stsci.edu
Tue Jun 16 08:20:34 PDT 2009
Hi All,
These are my comments on the VOSpace WD 2.00 document. Sorry for the
delay. My comments are primarily targeted at providing a better
specification document, but I have included some technical comments as
well. As someone relatively new to VO, I may be repeating some
points that have been discussed in the past.
Kim
-------------------------------------------------------------------------------------
Overall Comments
* The description of service operations in Section 5 is consistent,
complete and clearly will be of value to implementers.
With so many detailed error conditions, we should consider creating a
compliance test suite to enable developers to know their
implementation is correct. Knowing implementations have passed a test
suite will also make it easier on clients.
* The document needs page numbers so the table of contents is useful.
* The example XML in section 1.1 is very useful. The text would be
easier for the reader to understand if more examples of required XML
were interspersed with the text. (see below)
* The sections on various identifiers and descriptions (views,
capabilities, properties, protocols) that are "copy/pasted" should be
combined. The various concepts could be explained in one place, and
each section could provide specific feature details rather than
paragraphs of repeated text. This change would make it easier on the
reader and the document would be easier to maintain.
-------------------------------------------------------------------------------------
* The typical use of VOSpace in Section 1.1 doesn't do an adequate job
of demonstrating why VOSpace is as complex as it is. Some use cases
that validate the enhanced capabilities need to be provided. Maybe
some of these examples such as the data store capability in 3.4.1.1
could be moved into this part of the document?
One known use of VOSpace is as a service embedded within or
collaborating closely with other services such as TAP or other DAL
services. I think this usage should be mentioned and discussed. Does
the VOSpace spec work in this scenario? Minimally, this implies to a
reader that VOSpace was created to be part of an overall VO system.
-------------------------------------------------------------------------------------
* Descriptions (Property, Capability, Views, Protocols - referred to
as PCVP below)
It is suggested that identifiers resolve to descriptions. If they are
resolvable, the descriptions include information that seems useful. I
suggest dropping the optional resolvability of the URI for a specific
PCVP and including that information directly in the representation for
the PCVP. This obviates the problem with lack of registry
representations for PropertyDescriptions, CapabilityDescriptions,
ViewDescriptions, ProtocolDescriptions.
The tradeoff could be that representations would be larger than they
might need to be. On the other side is the added complexity of
requiring clients to do a fetch to obtain the descriptions for every
property, and the fact that doing the fetch could fail. I think we
would be better off adding this information into the item itself.
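To make the tradeoff concrete, here is a minimal sketch of what a client has to do in each case. The `Property` class, its `description` field, and the `fetch` callable are hypothetical names invented for illustration; nothing here comes from the WD schema.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Property:
    """Hypothetical client-side view of a property (not the WD schema)."""
    uri: str
    value: str
    # Inline description, present only if the service embeds it.
    description: Optional[str] = None


def describe(prop: Property, fetch: Callable[[str], str]) -> Optional[str]:
    """Return the description for a property.

    With an inline description no network access is needed. Without one,
    the client must resolve the identifier, and that fetch can fail.
    """
    if prop.description is not None:
        return prop.description
    try:
        return fetch(prop.uri)
    except OSError:
        # The identifier was not resolvable; the client gets nothing.
        return None
```

With inline descriptions the `fetch` branch never runs, which is the simplification argued for above; the cost is the extra text carried in every representation.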
-------------------------------------------------------------------------------------
* Section 7, Compliance Matrix is needed.
I believe VOSpace needs some sort of "compliance level" to indicate to
clients what is supported by a specific service and to allow certain
assumptions that could make VOSpace use simpler for many common
situations.
As a straw-man, I suggest Level 1 compliance SHALL include the
following:
* Unstructured Data Node support
* httpput/httpget protocols required
* mandatory Level 1 views: TBD - votable, blob?
* security as needed
* client initiated transfers only: pushToVoSpace, pullFromVoSpace
These features/capabilities are not present in Level 1:
* no Properties
* no Capabilities
* no LinkNode, StructuredDataNode, ContainerNode
* no getProperties, getProtocols, getViews (could be supported for
required features, but shouldn't be needed), no creation and node
manipulation, no metadata accessing, no pullToVoSpace, pushFromVoSpace
A Level 2 implementation could include support for properties,
capabilities, and their associated methods,
Level 3 could include the methods that appear to be administrative:
node manipulation, advanced listings...
I think the VOSpace client-service interaction could be reduced to one
interchange followed by data transfer for a Level 1 service with the
defaults mentioned.
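The straw-man levels could be written down roughly as follows. This is only a sketch: the feature labels are informal strings taken from the lists above, and it assumes each level includes everything in the levels below it, which the straw-man does not state explicitly.

```python
from enum import IntEnum


class ComplianceLevel(IntEnum):
    LEVEL_1 = 1
    LEVEL_2 = 2
    LEVEL_3 = 3


# Features each level adds. Labels are informal, not identifiers from the WD.
FEATURES_ADDED = {
    ComplianceLevel.LEVEL_1: {
        "UnstructuredDataNode",
        "httpput",
        "httpget",
        "pushToVoSpace",
        "pullFromVoSpace",
    },
    ComplianceLevel.LEVEL_2: {"properties", "capabilities"},
    ComplianceLevel.LEVEL_3: {"node manipulation", "advanced listings"},
}


def supported_features(level: ComplianceLevel) -> set:
    """Collect the features of a level, assuming levels are cumulative."""
    features = set()
    for lvl in ComplianceLevel:
        if lvl <= level:
            features |= FEATURES_ADDED[lvl]
    return features


def supports(level: ComplianceLevel, feature: str) -> bool:
    """True if a service advertising this level offers the feature."""
    return feature in supported_features(level)
```

A client that only knows a service's advertised level could then decide up front whether, say, properties are available, without probing the service.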
-------------------------------------------------------------------------------------
Section Comments
* In various places in the document the text refers to a "resource".
Although it's a common term, a definition of the meaning of this term
in the context of this document should be provided when the term is
first used.
-------------------------------------------------------------------------------------
Section 1.1
I think this section is critical to understanding how VOSpace works
and should be enlarged.
* Showing the XML passed back and forth between client and server is
extremely useful. The example use case would be improved if a mocked
up example of a service reply was added to go with paragraph 4 (The
service will reply...) to supplement the text. The views and any
other information could be shown. The point being made in paragraph 4
is that this response includes information the client uses to form the
next part of the interaction, so seeing this would help the reader
understand the process.
* Besides adding the missing server responses, this section would be
more valuable with the addition of more explanation for the pieces of
the XML passing back and forth including references to the sections in
the document where these elements are described.
* In my view this is a 3 step process, not 2 (as in paragraph 1). I
understand that VOSpace may only be involved in steps 1 and 2, but
clearly exchanging data with a VOSpace is a 3 step process. Maybe a
figure such as the one I've attached could be included that shows the
3 steps and indicates the first two are VOSpace negotiation with the
final step a transfer.
* Is "bmyspace" in paragraph 6 a typo: http://nvo.caltech.edu/bvospace/myData/table123/transfers/147516ab?
Regarding the note, I don't think UML diagrams are a good choice at
this point in the specification. I suggest a couple figures similar
to what I have attached to demonstrate the flow of a VOSpace
interaction. I could make another if appropriate.
[Attachment: VOSpace2Figure1.png (image/png, 64447 bytes)]
URL: <http://www.ivoa.net/pipermail/vospace/attachments/20090616/e2193e00/attachment-0001.png>
-------------------------------------------------------------------------------------
Section 2
The following paragraph appears at the end of section 2:
"All ancestors in the hierarchy must be resolvable as containers
(ContainerNodes), all the way up to the root node of the space (this
precludes any system of implied hierarchy in the naming scheme for
nodes with ancestors that are just logical entities and cannot be
reified, e.g. the Amazon S3 system). "
On reading the entire spec, I don't see any reason for or value that
this restriction (resolving all ancestors as containers) gives to the
service or user. Many exceptions are required to ensure it holds, which
makes the spec more complex. It should be good
enough for vos://a/b/c/file to exist without requiring vos://a/b to
actually be something that exists. If someone needs to place
something in vos://a/b they can certainly do it, and it has no
influence on vos://a/b/c/file. Regardless, I believe the S3 comment
is inappropriate in a spec.
* My suggestion is that this constraint and all its associated
exceptions be removed to be handled by the NodeNotFound exception.
* If there is a good reason for this restriction, we should _really_
weigh whether the feature this constraint supports is worth ruling out
a possible S3 implementation.
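To illustrate what the constraint demands, the sketch below lists the ancestors that would each have to exist as a ContainerNode before vos://a/b/c/file could exist. The function names and the set-of-URIs model of a space are my own assumptions, not anything defined in the WD.

```python
def ancestor_uris(node_uri: str) -> list:
    """List the ancestors of a node URI, nearest first, up to the root."""
    scheme, rest = node_uri.split("://", 1)
    segments = rest.split("/")
    # segments[0] is the authority, taken here as the root of the space.
    return [
        scheme + "://" + "/".join(segments[:i])
        for i in range(len(segments) - 1, 0, -1)
    ]


def missing_containers(node_uri: str, existing_containers: set) -> list:
    """Ancestors that do not exist as containers in this (toy) space."""
    return [u for u in ancestor_uris(node_uri) if u not in existing_containers]


# A flat key store in the style of S3 has no container objects at all,
# so every ancestor is reported missing.
print(missing_containers("vos://a/b/c/file", set()))
```

Under the current text a non-empty result means the node cannot be created; under the suggestion above the result would simply be irrelevant.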
My preference would be that VOSpace embrace and encourage the use of
S3. This could get widespread VOSpace use going and would take
advantage of the more available open source community tools. This
would leverage Amazon's cloud, providing a potential opportunity to
reduce IT costs and a way for smaller institutions or sites to provide
data more easily. I could imagine VAO budgeting for some S3 capacity
to be shared by all. One idea is to target Level 1 implementations
(see above) at systems like S3.
-------------------------------------------------------------------------------------
Top of page 7
* The first paragraph refers to the "resource key". This term is not
defined nor is it used anywhere else in the document.
* Is the scheme vos:// or "vos"? (see http://labs.apache.org/webarch/uri/rfc/rfc3986.html#components)
Section 2.1
On page 7 the text says the following is a VOSpace identifier:
vos://nvo.caltech!vospace/myresults/siap out 1.vot
It states the two step process for changing a VOSpace identifier into
an HTTP URL is:
1. Change ! to /
2. _Add_ the http:// prefix
If the above referenced use of scheme is correct, then step 2 should
be something like: "Replace the vos URI scheme with the http scheme."
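A minimal sketch of the two steps with the suggested wording, assuming (per RFC 3986) that the scheme is "vos" and that the "!" to be changed sits in the authority part. The function name `vos_to_http` and the example identifier are made up for illustration.

```python
from urllib.parse import urlsplit


def vos_to_http(identifier: str) -> str:
    """Turn a VOSpace identifier into an HTTP URL.

    Step 1: change "!" to "/" in the authority.
    Step 2: replace the vos URI scheme with the http scheme.
    """
    parts = urlsplit(identifier)
    if parts.scheme != "vos":
        raise ValueError("not a vos identifier: " + identifier)
    authority = parts.netloc.replace("!", "/")
    return "http://" + authority + parts.path


# A generic URI parser reports the scheme as "vos", without the "://".
print(urlsplit("vos://nvo.caltech!vospace/myresults/table.vot").scheme)
print(vos_to_http("vos://nvo.caltech!vospace/myresults/table.vot"))
```

Note that nothing is "added" in step 2: the scheme is swapped, which is why the replacement wording reads more accurately.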
-------------------------------------------------------------------------------------
3.2 Properties
The first paragraph and the concept/usefulness of properties would be
clearer for the reader if paragraph one was extended to include
examples of what Properties are used for within VOSpace with a user-
oriented use case (possibly in section 1.1). Are properties critical
to VOSpace use/operation or a possibly useful addon? Is support of
properties mandatory?
3.2.1 Property Values
* Paragraph 2 states that services don't need to understand the
meaning of all the properties of a node. Then it says that properties
that are not understood should be stored as text strings. Since the
first line of 3.2 says properties _are_ string-based meta data, what
else can a service do than store it as text? If the spec is implying
that the value of the property can be used by the VOSpace
implementation if it is understood, this should be stated. I think
this text should be clarified.
3.2.2 Property Identifiers
* The first sentence says property identifiers must have a unique
URI. Does this mean unique across all vospaces or within a single
vospace?
It seems that potentially there could be a large number of property
identifiers (as well as other VOSpace identifiers). Yet the spec
doesn't provide a recommended viable solution for identifiers +
descriptions. One approach doesn't scale, and the other can't
satisfy the recommendations of the spec because it's not based on
finalized technology. An approach should be recommended in the spec
that can be completely implemented now.
* If a simple URN as property identifier is not scalable for public
use and is not recommended, let's just remove this text from the spec.
* Given that the current VOSpace schema defines property identifiers
as anyURI (para 3), there is a third option of using unique URIs that
aren't registry URIs as property identifiers. The spec should
recommend this option as the first choice along with a requirement
that the URI be resolvable by the VOSpace server (see 3.2.3 below for
an alternate that bypasses this problem).
* TYPO: paragraph 5 - "... resolved into to a description..."
3.2.3 Property Descriptions
All of the information in a PropertyDescription seems reasonable. I
think one could argue that a property representation should just
include this information in addition to the items in 3.2. That way
clients get all the information, and the dependency on incomplete
registry work goes away.
The tradeoff may be that property representations would be bigger than
they might need to be. On the other side is the added complexity of
doing a URL fetch to obtain the descriptions for every property and
the fact that doing the fetch could fail. I think we are better off
adding this information into the property itself.
-------------------------------------------------------------------------------------
Section 3.3 Capabilities
* My feeling is that this functionality described by Capabilities is
not the core functionality of VOSpace, and we should consider dropping
it from the spec. Doesn't the registry provide this kind of
information for users?
-------------------------------------------------------------------------------------
Section 3.4 Views
3.4.1 TYPO: "stores data as a binary files"
3.4.4
* The default views are a great idea. I suggest that the spec also
define some standardized views such as VOTable, FITS, etc.
* The spec should possibly require support for one or more standard
views. This could be done through the compliance levels.
3.4.1.1 Database store
* I see the value of views, and I see this section is trying to
motivate creative uses of views. This idea of using a VOSpace to
access a database seems far-fetched, and this functionality seems to
overlap with TAP, which does it much better.
* It is a scenario for an advanced use of VOSpace. It might be better
integrated into section 1.1.
TYPO: "The contents of file would have been..."
3.4.2 View identifiers
* Given this section is basically cut/paste from 3.2.2, 3.3.2, 3.5.1,
is there some way that "identifiers" can be documented in one section
of the document? The sections seem to be essentially the same.
* Similar to my comments in 3.2.2 I think the text about approaches
that don't scale should be dropped and the text should specify one
approach that can be implemented now.
3.4.3.2 Mime types
* This section should state exactly what field of the HTTP response
should be used for the MIME type.
3.4.4 Default Views
* It should be a priority to specify default VOSpace behavior such
that all VOSpace implementations do something similar without the user
providing an import/export view.
* I understand why the second paragraph states the default import view
is OPTIONAL, but there SHOULD be a default "profile" for a VOSpace
that just works. Maybe this means users can import/export data into/
from UnstructuredDataNodes.
3.4.5 Container Views
This seems like a useful feature, although possibly not used
often. I think a scenario using this feature (and demonstrating
views) should be added to 1.1.
* Second paragraph... Can't a user access a subset of all the child
nodes and get them in tar, zip, etc? The line says the service SHALL
package _all_ the child nodes. This seems too restrictive. Is the
restriction required?
-------------------------------------------------------------------------------------
Section 3.7 Listings
* Would it be possible to use PQL for the match rather than yet
another syntax for specifying search criteria?
-------------------------------------------------------------------------------------
Section 3.8 REST bindings
5.2.1
* Is it necessary to support creating vos:Node or vos:DataNode?
* Is there a user scenario that requires this? The API would be
simpler without.
* A better approach might be to indicate methods as part of an
administration interface?
5.2.3 TYPO: "... to be split across more than response" _one_ needed?
5.2.4 findNodes
This operation is purposely marked as OPTIONAL. Does that mean all
methods not marked OPTIONAL are required? Aside from compliance
levels, if certain methods are optional, we should mark each as
required or optional.
5.4.1 pushToVoSpace
Under 5.4.1.1 Request
* It says, "If a Node already exists at the target URI, then the data
SHALL be imported into the existing Node and the Node properties SHALL
be cleared unless the node is a Container Node."
Isn't it required that the Node be a container node to allow importing
data? I suppose there could be a case where one DataNode is being
overwritten with an updated version. Is this the other case?
6.2 --
* I couldn't find the referenced document under the 2.0 section.
Section 7
I think this is a great idea, and it suggests support for compliance levels.