First version of VOSpace 2.0 WD

K Gillies gillies at stsci.edu
Tue Jun 16 08:20:34 PDT 2009


Hi All,

These are my comments on the VOSpace WD 2.00 document.   Sorry for the  
delay.  My comments are primarily targeted at providing a better  
specification document, but I have included some technical comments as  
well.   As someone relatively new to VO, I may be repeating some  
points that have been discussed in the past.

Kim

-------------------------------------------------------------------------------------

Overall Comments

* The description of service operations in Section 5 is consistent,  
complete and clearly will be of value to implementers.

With so many detailed error conditions, we should consider creating a  
compliance test suite to enable developers to know their  
implementation is correct.  Knowing implementations have passed a test  
suite will also make it easier on clients.

* The document needs page numbers so the table of contents is useful.

* The example XML in section 1.1 is very useful. The text would be  
easier for the reader to understand if more examples of required XML  
were interspersed with the text.  (see below)

* The sections on various identifiers and descriptions (views,  
capabilities, properties, protocols) that are "copy/pasted" should be  
combined.  The various concepts could be explained in one place, and  
each section could provide specific feature details rather than  
paragraphs of repeated text.  This change would make it easier on the  
reader and the document would be easier to maintain.

-------------------------------------------------------------------------------------

* The typical use of VOSpace in Section 1.1 doesn't do an adequate job  
of demonstrating why VOSpace is as complex as it is.  Some use cases  
that validate the enhanced capabilities need to be provided.  Maybe  
some of the examples such as the data store capability in 3.4.1.1
could be moved into this part of the document?

One known use of VOSpace is as a service embedded within or  
collaborating closely with other services such as TAP or other DAL  
services.  I think this usage should be mentioned and discussed.  Does  
the VOSpace spec work in this scenario?  Minimally, this implies to a  
reader that VOSpace was created to be part of an overall VO system.

-------------------------------------------------------------------------------------
* Descriptions (Property, Capability, Views, Protocols - referred to  
as PCVP below)

It is suggested that identifiers resolve to descriptions.  If they are  
resolvable, the descriptions include information that seems useful.  I  
suggest dropping the optional resolvability of the URI for a specific  
PCVP and including that information directly in the representation for  
the PCVP.  This obviates the problem with lack of registry  
representations for PropertyDescriptions, CapabilityDescriptions,  
ViewDescriptions, ProtocolDescriptions.

The tradeoff could be that representations would be larger than they
might need to be.  On the other side is the added complexity of
requiring clients to do a fetch to obtain the descriptions for every  
property, and the fact that doing the fetch could fail.  I think we  
would be better off adding this information into the item itself.

-------------------------------------------------------------------------------------
* Section 7, Compliance Matrix is needed.

I believe VOSpace needs some sort of "compliance level" to indicate to  
clients what is supported by a specific service and to allow certain  
assumptions that could make VOSpace use simpler for many common  
situations.

As a straw-man, I suggest Level 1 compliance SHALL include the  
following:

* Unstructured Data Node support
* httpput/httpget protocols required
* mandatory Level 1 views: TBD - votable, blob?
* security as needed
* client initiated transfers only: pushToVoSpace, pullFromVoSpace

These features/capabilities are not present in Level 1:
* no Properties
* no Capabilities
* no LinkNode, StructuredDataNode, ContainerNode
* no getProperties, getProtocols, getViews (could be supported for  
required features, but shouldn't be needed), no creation and node  
manipulation, no metadata accessing, no pullToVoSpace, pushFromVoSpace

A Level 2 implementation could include support for properties,  
capabilities, and their associated methods.

Level 3 could include the methods that appear to be administrative:  
node manipulation, advanced listings...

I think the VOSpace client-service interaction could be reduced to one  
interchange followed by data transfer for a Level 1 service with the  
defaults mentioned.
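
To make the straw-man concrete, here is a minimal sketch of how a client
might check a feature against a service's declared compliance level.  The
names Level, REQUIRED_LEVEL and supports are hypothetical and not part of
the WD; only features that the straw-man above assigns to a level are
listed.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical compliance levels from the straw-man proposal."""
    LEVEL_1 = 1
    LEVEL_2 = 2
    LEVEL_3 = 3


# Minimum level at which each feature becomes available (straw-man only).
REQUIRED_LEVEL = {
    "UnstructuredDataNode": Level.LEVEL_1,
    "httpput": Level.LEVEL_1,
    "httpget": Level.LEVEL_1,
    "pushToVoSpace": Level.LEVEL_1,
    "pullFromVoSpace": Level.LEVEL_1,
    "Properties": Level.LEVEL_2,
    "Capabilities": Level.LEVEL_2,
    "getProperties": Level.LEVEL_2,
    "node manipulation": Level.LEVEL_3,
    "advanced listings": Level.LEVEL_3,
}


def supports(service_level, feature):
    """Return True if a service at service_level offers the feature.

    Raises KeyError for features the straw-man does not assign to a level.
    """
    return service_level >= REQUIRED_LEVEL[feature]
```

A client talking to a Level 1 service could then skip getProperties,
getProtocols and getViews entirely and go straight to the transfer
request, which is the single interchange described above.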

-------------------------------------------------------------------------------------

Section Comments

* In various places in the document the text refers to a "resource".   
Although it's a common term, a definition of the meaning of this term  
in the context of this document should be provided when the term is  
first used.

-------------------------------------------------------------------------------------

Section 1.1

I think this section is critical to understanding how VOSpace works  
and should be enlarged.

* Showing the XML passed back and forth between client and server is  
extremely useful.  The example use case would be improved if a mocked  
up example of a service reply was added to go with paragraph 4 (The  
service will reply...) to supplement the text.   The views and any  
other information could be shown.  The point being made in paragraph 4  
is that this response includes information the client uses to form the  
next part of the interaction, so seeing this would help the reader
understand the process.

* Besides adding the missing server responses, this section would be  
more valuable with the addition of more explanation for the pieces of  
the XML passing back and forth including references to the sections in  
the document where these elements are described.

* In my view this is a 3 step process, not 2 (as in paragraph 1).  I  
understand that VOSpace may only be involved in steps 1 and 2, but
clearly exchanging data with a VOSpace is a 3 step process.  Maybe a  
figure such as the one I've attached could be included that shows the  
3 steps and indicates the first two are VOSpace negotiation with the  
final step a transfer.

* Is "bmyspace" in paragraph 6 a typo: http://nvo.caltech.edu/bvospace/myData/table123/transfers/147516ab?

Regarding the note, I don't think UML diagrams are a good choice at  
this point in the specification.  I suggest a couple figures similar  
to what I have attached to demonstrate the flow of a VOSpace  
interaction. I could make another if appropriate.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: VOSpace2Figure1.png
Type: image/png
Size: 64447 bytes
Desc: not available
URL: <http://www.ivoa.net/pipermail/vospace/attachments/20090616/e2193e00/attachment-0001.png>
-------------- next part --------------


-------------------------------------------------------------------------------------
Section 2

The following paragraph appears at the end of section 2:
"All ancestors in the hierarchy must be resolvable as containers  
(ContainerNodes), all the way up to the root node of the space (this  
precludes any system of implied hierarchy in the naming scheme for  
nodes with ancestors that are just logical entities and cannot be  
reified, e.g. the Amazon S3 system). "

On reading the entire spec, I don't see any reason for this restriction
(resolving all ancestors as containers) or any value that it gives to
the service or user.  Ensuring it holds requires a lot of extra
complexity and exceptions in the spec.  It should be good
enough for vos://a/b/c/file to exist without requiring vos://a/b to  
actually be something that exists.  If someone needs to place  
something in vos://a/b they can certainly do it, and it has no  
influence on vos://a/b/c/file.  Regardless, I believe the S3 comment  
is inappropriate in a spec.

* My suggestion is that this constraint and all its associated  
exceptions be removed to be handled by the NodeNotFound exception.

* If there is a good reason for this restriction, we should _really_
weigh whether the feature we are supporting with this constraint is
worth eliminating a possible S3 implementation.

My preference would be that VOSpace embrace and encourage the use of  
S3.  This could get widespread VOSpace use going and would take  
advantage of the more available open source community tools.  This  
would leverage Amazon's cloud, providing a potential opportunity to
reduce IT costs and a way for smaller institutions or sites to provide  
data more easily.  I could imagine VAO budgeting for some S3 capacity  
to be shared by all.   One idea is to target Level 1 implementations  
(see above) at systems like S3.
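
As an illustration of the implied hierarchy argued for in this section,
here is a minimal sketch of a flat, S3-style store in which
vos://a/b/c/file exists without vos://a/b existing.  FlatSpace and its
methods are hypothetical; only the NodeNotFound name is taken from the
spec.

```python
class NodeNotFound(Exception):
    """Raised when no node is stored under the requested URI."""


class FlatSpace:
    """Hypothetical store where node URIs are flat keys.

    No ancestor ContainerNodes are created or required.
    """

    def __init__(self):
        self._nodes = {}

    def put(self, uri, data):
        # Store the node directly; parents are never checked or created.
        self._nodes[uri] = data

    def get(self, uri):
        try:
            return self._nodes[uri]
        except KeyError:
            raise NodeNotFound(uri) from None

    def list_prefix(self, prefix):
        # A "directory listing" is just a prefix match on the keys.
        prefix = prefix.rstrip("/") + "/"
        return sorted(key for key in self._nodes if key.startswith(prefix))


space = FlatSpace()
space.put("vos://a/b/c/file", b"data")
```

With this layout, get("vos://a/b") simply raises NodeNotFound, while
list_prefix("vos://a/b") still finds the file beneath it.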

-------------------------------------------------------------------------------------

Top of page 7

* The first paragraph refers to the "resource key".  This term is not  
defined nor is it used anywhere else in the document.

* Is the scheme vos:// or "vos"?  (see http://labs.apache.org/webarch/uri/rfc/rfc3986.html#components)

Section 2.1

On page 7 the text says the following is a VOSpace identifier:
vos://nvo.caltech!vospace/myresults/siap out 1.vot

It states the two step process for changing a VOSpace identifier into  
an HTTP URL is:
1. Change ! to /
2. _Add_  the http:// prefix

If the above referenced use of scheme is correct, then step 2 should  
be something like: "Replace the vos URI scheme with the http scheme."
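
The two steps, with step 2 reworded as a scheme replacement, can be
sketched as follows.  The function name vos_to_http is hypothetical, and
the example identifier is a simplified one chosen for illustration; the
sketch does no percent-encoding.

```python
def vos_to_http(identifier):
    """Convert a vos identifier into an HTTP URL using the two steps above."""
    scheme, sep, rest = identifier.partition("://")
    if scheme != "vos" or not sep:
        raise ValueError("not a vos identifier: %r" % identifier)
    # Step 1: change every "!" to "/".
    rest = rest.replace("!", "/")
    # Step 2: replace the vos URI scheme with the http scheme.
    return "http://" + rest


print(vos_to_http("vos://nvo.caltech!vospace/myresults/table.vot"))
# prints http://nvo.caltech/vospace/myresults/table.vot
```

Treating "vos" (without the "://") as the scheme keeps the conversion
consistent with the RFC 3986 components referenced above.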

-------------------------------------------------------------------------------------

3.2 Properties

The first paragraph and the concept/usefulness of properties would be  
clearer for the reader if paragraph one was extended to include  
examples of what Properties are used for within VOSpace with a
user-oriented use case (possibly in section 1.1).  Are properties critical
to VOSpace use/operation or a possibly useful addon?  Is support of  
properties mandatory?

3.2.1 Property Values

* Paragraph 2 states that services don't need to understand the  
meaning of all the properties of a node.  Then it says that properties  
that are not understood should be stored as text strings.  Since the  
first line of 3.2 says properties _are_ string-based metadata, what
else can a service do than store it as text?   If the spec is implying  
that the value of the property can be used by the VOSpace  
implementation if it is understood, this should be stated.  I think  
this text should be clarified.

3.2.2 Property Identifiers

* The first sentence says property identifiers must have a unique  
URI.  Does this mean unique across all vospaces or within a single  
vospace?

It seems that potentially there could be a large number of property  
identifiers (as well as other VOSpace identifiers).  Yet the spec  
doesn't provide a recommended viable solution for identifiers +  
descriptions.   One approach doesn't scale, and the other can't  
satisfy the recommendations of the spec because it's not based on  
finalized technology.   An approach should be recommended in the spec  
that can be completely implemented now.

* If a simple URN as property identifier is not scalable for public  
use and is not recommended, let's just remove this text from the spec.

* Given that the current VOSpace schema defines property identifiers  
as anyURI (para 3), there is a third option of using unique URIs that  
aren't registry URIs as property identifiers.   The spec should  
recommend this option as the first choice along with a requirement  
that the URI be resolvable by the VOSpace server (see 3.2.3 below for  
an alternate that bypasses this problem).

*  TYPO:  paragraph 5 - "... resolved into to a description..."

3.2.3 Property Descriptions

All of the information in a PropertyDescription seems reasonable.  I  
think one could argue that a property representation should just  
include this information in addition to the items in 3.2.  That way  
clients get all the information, and the dependency on incomplete  
registry work goes away.

The tradeoff may be that property representations would be bigger than
they might need to be.  On the other side is the added complexity of
doing a URL fetch to obtain the descriptions for every property and  
the fact that doing the fetch could fail.  I think we are better off  
adding this information into the property itself.
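
A minimal sketch of what a property carrying its own description could
look like on the client side.  The class and field names below are
assumptions for illustration only; the WD defines the actual members of a
Property and a PropertyDescription, and the example URI is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Property:
    """Hypothetical property with its description embedded inline."""
    uri: str
    value: str
    read_only: bool = False
    # Description fields travel with the property; no second fetch needed.
    display_name: Optional[str] = None
    description: Optional[str] = None


def label(prop):
    """Return a human-readable label without resolving the property URI."""
    return prop.display_name or prop.uri


title = Property(
    uri="ivo://example.org/vospace/properties#title",  # made-up identifier
    value="My results",
    display_name="Title",
    description="Short title for the node",
)
```

The client can display label(title) immediately, and a property with no
embedded description still degrades gracefully to its URI.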

-------------------------------------------------------------------------------------

Section 3.3 Capabilities

* My feeling is that the functionality described by Capabilities is
not the core functionality of VOSpace, and we should consider dropping  
it from the spec.   Doesn't the registry provide this kind of  
information for users?

-------------------------------------------------------------------------------------

Section 3.4 Views

3.4.1 TYPO: "stores data as a binary files"

3.4.4

* The default views are a great idea.  I suggest that the spec also  
define some standardized views such as VOTable, FITS, etc.

* The spec should possibly require support for one or more standard  
views.  This could be done through the compliance levels.


3.4.1.1 Database store

* I see the value of views, and I see this section is trying to  
motivate creative uses of views.  This idea of using a VOSpace to  
access a database seems far-fetched, and this functionality seems to
overlap with TAP, which does it much better.

* It is a scenario for an advanced use of VOSpace.  It might be better  
integrated into section 1.1.

TYPO: "The contents of file would have been..."

3.4.2 View identifiers

* Given that this section is basically cut/paste from 3.2.2, 3.3.2, and
3.5.1, is there some way that "identifiers" can be documented in one
section of the document?  The sections seem to be essentially the same.

* Similar to my comments in 3.2.2 I think the text about approaches  
that don't scale should be dropped and the text should specify one  
approach that can be implemented now.

3.4.3.2 Mime types

* This section should state exactly what field of the HTTP response  
should be used for the MIME type.

3.4.4 Default Views
* It should be a priority to specify default VOSpace behavior such  
that all VOSpace implementations do something similar without the user  
providing an import/export view.

* I understand why the second paragraph states the default import view  
is OPTIONAL, but there SHOULD be a default "profile" for a VOSpace  
that just works.  Maybe this means users can import/export data
into/from UnstructuredDataNodes.

3.4.5 Container Views
This seems like it is a useful feature, although possibly not used
often.  I think a scenario using this feature (and demonstrating  
views) should be added to 1.1.

* Second paragraph...  Can't a user access a subset of all the child  
nodes and get them in tar, zip, etc?  The line says the service SHALL  
package _all_ the child nodes.  This seems too restrictive.  Is the  
restriction required?
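
To show that packaging a subset is no harder than packaging everything,
here is a minimal sketch using Python's standard tarfile module.  The
function package_children and its names argument are hypothetical and not
part of the WD; child nodes are modelled as a plain mapping from name to
bytes.

```python
import io
import tarfile


def package_children(children, names=None):
    """Pack child nodes into a tar archive and return it as bytes.

    children maps child name to its content as bytes.  If names is None,
    all children are packaged; otherwise only the named subset is.
    """
    if names is None:
        selected = dict(children)
    else:
        selected = {name: children[name] for name in names}
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in selected.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


children = {"a.vot": b"table a", "b.vot": b"table b", "c.fits": b"image c"}
subset = package_children(children, names=["a.vot", "c.fits"])
```

Calling package_children(children) with no names gives the "all child
nodes" behavior the spec currently mandates.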

-------------------------------------------------------------------------------------

Section 3.7 Listings

* Would it be possible to use PQL for the match rather than yet  
another syntax for specifying search criteria?

-------------------------------------------------------------------------------------

Section 3.8 REST bindings

5.2.1

* Is it necessary to support creating vos:Node or vos:DataNode?

* Is there a user scenario that requires this?  The API would be
simpler without it.

* A better approach might be to indicate methods as part of an  
administration interface?

5.2.3 TYPO:   "... to be split across more than response"  _one_ needed?

5.2.4 findNodes

This operation is purposely marked as OPTIONAL.  Does that mean all  
methods not marked OPTIONAL  are required?  Aside from compliance  
levels, if certain methods are optional, we should mark each as  
required or optional.

5.4.1 pushToVoSpace

Under 5.4.1.1 Request

* It says, "If a Node already exists at the target URI, then the data  
SHALL be imported into the existing Node and the Node properties SHALL  
be cleared unless the node is a Container Node."

Isn't it required that the Node be a container node to allow importing  
data?  I suppose there could be a case where one DataNode is being  
overwritten with an updated version.  Is this the other case?

6.2 --
* I couldn't find the referenced document under the 2.0 section.

Section 7
I think this is a great idea and suggests support for compliance levels.




