First version of VOSpace 2.0 WD

Matthew Graham mjg at cacr.caltech.edu
Tue Jun 16 08:45:33 PDT 2009


Hi Kim,

Thanks for the detailed comments. The VOSpace 2.0 spec is currently
under revision following the Interop to incorporate changes that we
agreed on. I hope to be able to release a new WD in a few weeks' time.
I will try to include as many of your comments as are appropriate
and will push unresolved issues to discussion on the list.

	Cheers,

	Matthew



On Jun 16, 2009, at 8:20 AM, K Gillies wrote:

> Hi All,
>
> These are my comments on the VOSpace WD 2.00 document.   Sorry for  
> the delay.  My comments are primarily targeted at providing a better  
> specification document, but I have included some technical comments  
> as well.   As someone relatively new to VO, I may be repeating some  
> points that have been discussed in the past.
>
> Kim
>
> -------------------------------------------------------------------------------------
>
> Overall Comments
>
> * The description of service operations in Section 5 is consistent,  
> complete and clearly will be of value to implementers.
>
> With so many detailed error conditions, we should consider creating  
> a compliance test suite to enable developers to know their  
> implementation is correct.  Knowing implementations have passed a  
> test suite will also make it easier on clients.
>
> * The document needs page numbers so the table of contents is useful.
>
> * The example XML in section 1.1 is very useful. The text would be  
> easier for the reader to understand if more examples of required XML  
> were interspersed with the text.  (see below)
>
> * The sections on various identifiers and descriptions (views,  
> capabilities, properties, protocols) that are "copy/pasted" should  
> be combined.  The various concepts could be explained in one place,  
> and each section could provide specific feature details rather than  
> paragraphs of repeated text.  This change would make it easier on  
> the reader and the document would be easier to maintain.
>
> -------------------------------------------------------------------------------------
>
> * The typical use of VOSpace in Section 1.1 doesn't do an adequate  
> job of demonstrating why VOSpace is as complex as it is.  Some use  
> cases that validate the enhanced capabilities need to be provided.   
> Maybe some of these examples, such as the data store capability in
> 3.4.1.1, could be moved into this part of the document?
>
> One known use of VOSpace is as a service embedded within or  
> collaborating closely with other services such as TAP or other DAL  
> services.  I think this usage should be mentioned and discussed.   
> Does the VOSpace spec work in this scenario?  Minimally, this  
> implies to a reader that VOSpace was created to be part of an  
> overall VO system.
>
> -------------------------------------------------------------------------------------
> * Descriptions (Property, Capability, Views, Protocols - referred to  
> as PCVP below)
>
> It is suggested that identifiers resolve to descriptions.  If they  
> are resolvable, the descriptions include information that seems  
> useful.  I suggest dropping the optional resolvability of the URI  
> for a specific PCVP and including that information directly in the  
> representation for the PCVP.  This obviates the problem with lack of  
> registry representations for PropertyDescriptions,  
> CapabilityDescriptions, ViewDescriptions, ProtocolDescriptions.
>
> The tradeoff could be that representations would be larger than they
> might need to be.  On the other side is the added complexity of
> requiring clients to do a fetch to obtain the descriptions for every  
> property, and the fact that doing the fetch could fail.  I think we  
> would be better off adding this information into the item itself.
>
> -------------------------------------------------------------------------------------
> * Section 7, Compliance Matrix is needed.
>
> I believe VOSpace needs some sort of "compliance level" to indicate  
> to clients what is supported by a specific service and to allow  
> certain assumptions that could make VOSpace use simpler for many  
> common situations.
>
> As a straw-man, I suggest Level 1 compliance SHALL include the  
> following:
>
> * Unstructured Data Node support
> * httpput/httpget protocols required
> * mandatory Level 1 views: TBD - votable, blob?
> * security as needed
> * client initiated transfers only: pushToVoSpace, pullFromVoSpace
>
> These features/capabilities are not present in Level 1:
> * no Properties
> * no Capabilities
> * no LinkNode, StructuredDataNode, ContainerNode
> * no getProperties, getProtocols, getViews (could be supported for  
> required features, but shouldn't be needed), no creation and node  
> manipulation, no metadata accessing, no pullToVoSpace, pushFromVoSpace
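
To make the straw-man concrete, here is a minimal sketch of how a client
might check a service against the Level 1 list above. The dictionary keys
and the shape of the "advertised" structure are assumptions made for this
illustration; nothing like them is defined in the VOSpace spec, and the
still-TBD mandatory views are left out.

```python
# Hypothetical encoding of the straw-man Level 1 profile from the list above.
# The keys and the "advertised" structure are illustrative only; they are
# not defined by the VOSpace specification.
LEVEL_1_REQUIRED = {
    "node_types": {"UnstructuredDataNode"},
    "protocols": {"httpput", "httpget"},
    "transfers": {"pushToVoSpace", "pullFromVoSpace"},
}


def missing_for_level_1(advertised):
    """Return the Level 1 features that a service does not advertise."""
    missing = {}
    for category, required in LEVEL_1_REQUIRED.items():
        absent = required - set(advertised.get(category, ()))
        if absent:
            missing[category] = sorted(absent)
    return missing


# An invented service description that lacks the httpput protocol.
service = {
    "node_types": ["UnstructuredDataNode", "ContainerNode"],
    "protocols": ["httpget"],
    "transfers": ["pushToVoSpace", "pullFromVoSpace"],
}
print(missing_for_level_1(service))
```

An empty result would mean the service offers everything the straw-man
asks of Level 1; extra features such as ContainerNode are simply ignored.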
>
> A Level 2 implementation could include support for properties,  
> capabilities, and their associated methods,
>
> Level 3 could include the methods that appear to be administrative:  
> node manipulation, advanced listings...
>
> I think the VOSpace client-service interaction could be reduced to  
> one interchange followed by data transfer for a Level 1 service with  
> the defaults mentioned.
>
> -------------------------------------------------------------------------------------
>
> Section Comments
>
> * In various places in the document the text refers to a  
> "resource".  Although it's a common term, a definition of the  
> meaning of this term in the context of this document should be  
> provided when the term is first used.
>
> -------------------------------------------------------------------------------------
>
> Section 1.1
>
> I think this section is critical to understanding how VOSpace works  
> and should be enlarged.
>
> * Showing the XML passed back and forth between client and server is  
> extremely useful.  The example use case would be improved if a  
> mocked up example of a service reply was added to go with paragraph  
> 4 (The service will reply...) to supplement the text.   The views  
> and any other information could be shown.  The point being made in  
> paragraph 4 is that this response includes information the client  
> uses to form the next part of the interaction, so seeing this would  
> help the reader understand the process.
>
> * Besides adding the missing server responses, this section would be  
> more valuable with the addition of more explanation for the pieces  
> of the XML passing back and forth including references to the  
> sections in the document where these elements are described.
>
> * In my view this is a 3 step process, not 2 (as in paragraph 1).  I  
> understand that VOSpace may only be involved in steps 1 and 2, but
> clearly exchanging data with a VOSpace is a 3 step process.  Maybe a  
> figure such as the one I've attached could be included that shows  
> the 3 steps and indicates the first two are VOSpace negotiation with  
> the final step a transfer.
>
> * Is "bmyspace" in paragraph 6 a typo: http://nvo.caltech.edu/bvospace/myData/table123/transfers/147516ab?
>
> Regarding the note, I don't think UML diagrams are a good choice at  
> this point in the specification.  I suggest a couple figures similar  
> to what I have attached to demonstrate the flow of a VOSpace  
> interaction. I could make another if appropriate.
>
> <VOSpace2Figure1.png>
>
> -------------------------------------------------------------------------------------
> Section 2
>
> The following paragraph appears at the end of section 2:
> "All ancestors in the hierarchy must be resolvable as containers  
> (ContainerNodes), all the way up to the root node of the space (this  
> precludes any system of implied hierarchy in the naming scheme for  
> nodes with ancestors that are just logical entities and cannot be  
> reified, e.g. the Amazon S3 system). "
>
> On reading the entire spec, I don't see any reason for or value that  
> this restriction (resolving all ancestors as containers) gives to  
> the service or user.  There is a lot of complexity and exceptions  
> required to ensure it is true making the spec more complex.  It  
> should be good enough for vos://a/b/c/file to exist without  
> requiring vos://a/b to actually be something that exists.  If  
> someone needs to place something in vos://a/b they can certainly do  
> it, and it has no influence on vos://a/b/c/file.  Regardless, I  
> believe the S3 comment is inappropriate in a spec.
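
The difference between the two models can be sketched as follows. This is
only an illustration under stated assumptions: the in-memory dictionary
stands in for a flat, S3-like key store, the helper names are invented
here, and the first segment after vos:// is treated as the root of the
space, as in the vos://a/b/c/file example above.

```python
def ancestors(uri):
    """List every ancestor of a node URI, from the root downwards."""
    scheme, sep, rest = uri.partition("://")
    parts = rest.split("/")
    # parts[0] is taken as the root of the space; the last part is the node.
    return [scheme + sep + "/".join(parts[:i]) for i in range(1, len(parts))]


def missing_containers(nodes, uri):
    """Return the ancestors of uri that are not stored as ContainerNodes."""
    return [a for a in ancestors(uri) if nodes.get(a) != "ContainerNode"]


# A flat store: the file exists, but vos://a/b and vos://a/b/c do not.
nodes = {
    "vos://a": "ContainerNode",
    "vos://a/b/c/file": "UnstructuredDataNode",
}
print(missing_containers(nodes, "vos://a/b/c/file"))
```

Under the quoted restriction any non-empty result is an error the service
has to detect and report; without it, the node lookup alone decides
whether a request succeeds or ends in NodeNotFound.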
>
> * My suggestion is that this constraint and all its associated  
> exceptions be removed to be handled by the NodeNotFound exception.
>
> * If there is a good reason for this restriction, we should _really_
> weigh whether the feature this constraint supports is worth
> eliminating a possible S3 implementation.
>
> My preference would be that VOSpace embrace and encourage the use of  
> S3.  This could get widespread VOSpace use going and would take  
> advantage of the more available open source community tools.  This  
> would leverage Amazon's cloud, providing a potential opportunity
> to reduce IT costs and a way for smaller institutions or sites to  
> provide data more easily.  I could imagine VAO budgeting for some S3  
> capacity to be shared by all.   One idea is to target Level 1  
> implementations (see above) at systems like S3.
>
> -------------------------------------------------------------------------------------
>
> Top of page 7
>
> * The first paragraph refers to the "resource key".  This term is  
> not defined nor is it used anywhere else in the document.
>
> * Is the scheme vos:// or  "vos"?  (see http://labs.apache.org/webarch/uri/rfc/rfc3986.html#components)
>
> Section 2.1
>
> On page 7 the text says the following is a VOSpace identifier:   
> vos://nvo.caltech!vospace/myresults/siap out 1.vot
>
> It states the two step process for changing a VOSpace identifier  
> into a HTTP URL is:
> 1. Change ! to /
> 2. _Add_  the http:// prefix
>
> If the above referenced use of scheme is correct, then step 2 should  
> be something like: "Replace the vos URI scheme with the http scheme."
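
Read this way, the two steps could be sketched as below. This is a minimal
illustration of the conversion as described in this message, not the
spec's normative procedure. The function name and the example path are
invented, and limiting the "!" replacement to the authority part is an
assumption of this sketch.

```python
def vos_to_http(identifier):
    """Convert a vos:// identifier to an http:// URL in two steps."""
    scheme, sep, rest = identifier.partition("://")
    if sep != "://" or scheme != "vos":
        raise ValueError("not a vos:// identifier: %r" % identifier)
    authority, slash, path = rest.partition("/")
    # Step 1: change "!" to "/" (here only within the authority part).
    authority = authority.replace("!", "/")
    # Step 2: replace the vos URI scheme with the http scheme.
    return "http://" + authority + slash + path


print(vos_to_http("vos://nvo.caltech!vospace/myresults/table1.vot"))
```

Treating "vos" as the scheme makes step 2 a substitution rather than an
addition, which is the point of the rewording suggested above.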
>
> -------------------------------------------------------------------------------------
>
> 3.2 Properties
>
> The first paragraph and the concept/usefulness of properties would  
> be clearer for the reader if paragraph one was extended to include  
> examples of what Properties are used for within VOSpace with a user- 
> oriented use case (possibly in section 1.1).  Are properties  
> critical to VOSpace use/operation or a possibly useful addon?  Is  
> support of properties mandatory?
>
> 3.2.1 Property Values
>
> * Paragraph 2 states that services don't need to understand the  
> meaning of all the properties of a node.  Then it says that  
> properties that are not understood should be stored as text  
> strings.  Since the first line of 3.2 says properties _are_ string- 
> based meta data, what else can a service do than store it as text?    
> If the spec is implying that the value of the property can be used  
> by the VOSpace implementation if it is understood, this should be  
> stated.  I think this text should be clarified.
>
> 3.2.2 Property Identifiers
>
> * The first sentence says property identifiers must have a unique  
> URI.  Does this mean unique across all vospaces or within a single  
> vospace?
>
> It seems that potentially there could be a large number of property  
> identifiers (as well as other VOSpace identifiers).  Yet the spec  
> doesn't provide a recommended viable solution for identifiers +  
> descriptions.   One approach doesn't scale, and the other can't  
> satisfy the recommendations of the spec because it's not based on  
> finalized technology.   An approach should be recommended in the  
> spec that can be completely implemented now.
>
> * If a simple URN as property identifier is not scalable for public  
> use and is not recommended, let's just remove this text from the spec.
>
> * Given that the current VOSpace schema defines property identifiers  
> as anyURI (para 3), there is a third option of using unique URIs  
> that aren't registry URIs as property identifiers.   The spec should  
> recommend this option as the first choice along with a requirement  
> that the URI be resolvable by the VOSpace server (see 3.2.3 below  
> for an alternate that bypasses this problem).
>
> *  TYPO:  paragraph 5 - "... resolved into to a description..."
>
> 3.2.3 Property Descriptions
>
> All of the information in a PropertyDescription seems reasonable.  I  
> think one could argue that a property representation should just  
> include this information in addition to the items in 3.2.  That way  
> clients get all the information, and the dependency on incomplete  
> registry work goes away.
>
> The tradeoff may be that property representations would be bigger  
> than they might need to be.  On the other side is the added
> complexity of doing a URL fetch to obtain the descriptions for every  
> property and the fact that doing the fetch could fail.  I think we  
> are better off adding this information into the property itself.
>
> -------------------------------------------------------------------------------------
>
> Section 3.3 Capabilities
>
> * My feeling is that the functionality described by Capabilities is
> not the core functionality of VOSpace, and we should consider  
> dropping it from the spec.   Doesn't the registry provide this kind  
> of information for users?
>
> -------------------------------------------------------------------------------------
>
> Section 3.4 Views
>
> 3.4.1 TYPO: "stores data as a binary files"
>
> 3.4.4
>
> * The default views are a great idea.  I suggest that the spec also  
> define some standardized views such as VOTable, FITS, etc.
>
> * The spec should possibly require support for one or more standard  
> views.  This could be done through the compliance levels.
>
>
> 3.4.1.1 Database store
>
> * I see the value of views, and I see this section is trying to  
> motivate creative uses of views.  This idea of using a VOSpace to  
> access a database seems far fetched, and this functionality seems to  
> overlap with TAP, which does it much better.
>
> * It is a scenario for an advanced use of VOSpace.  It might be  
> better integrated into section 1.1.
>
> TYPO: "The contents of file would have been..."
>
> 3.4.2 View identifiers
>
> * This section is basically cut/paste from 3.2.2, 3.3.2, and
> 3.5.1.  Is there some way that "identifiers" can be documented in
> one section of the document?  The sections seem to be essentially  
> the same.
>
> * Similar to my comments in 3.2.2 I think the text about approaches  
> that don't scale should be dropped and the text should specify one  
> approach that can be implemented now.
>
> 3.4.3.2 Mime types
>
> * This section should state exactly what field of the HTTP response  
> should be used for the MIME type.
>
> 3.4.4 Default Views
> * It should be a priority to specify default VOSpace behavior such  
> that all VOSpace implementations do something similar without the  
> user providing an import/export view.
>
> * I understand why the second paragraph states the default import  
> view is OPTIONAL, but there SHOULD be a default "profile" for a  
> VOSpace that just works.  Maybe this means users can import/export  
> data into/from UnstructuredDataNodes.
>
> 3.4.5 Container Views
> This seems like a useful feature, although possibly not used
> often.  I think a scenario using this feature (and demonstrating  
> views) should be added to 1.1.
>
> * Second paragraph...  Can't a user access a subset of all the child  
> nodes and get them in tar, zip, etc?  The line says the service  
> SHALL package _all_ the child nodes.  This seems too restrictive.   
> Is the restriction required?
>
> -------------------------------------------------------------------------------------
>
> Section 3.7 Listings
>
> * Would it be possible to use PQL for the match rather than yet  
> another syntax for specifying search criteria?
>
> -------------------------------------------------------------------------------------
>
> Section 3.8 REST bindings
>
> 5.2.1
>
> * Is it necessary to support creating vos:Node or vos:DataNode?
>
> * Is there a user scenario that requires this?  The API would be
> simpler without.
>
> * A better approach might be to indicate methods as part of an  
> administration interface?
>
> 5.2.3 TYPO:   "... to be split across more than response"  _one_  
> needed?
>
> 5.2.4 findNodes
>
> This operation is purposely marked as OPTIONAL.  Does that mean all  
> methods not marked OPTIONAL  are required?  Aside from compliance  
> levels, if certain methods are optional, we should mark each as  
> required or optional.
>
> 5.4.1 pushToVoSpace
>
> Under 5.4.1.1 Request
>
> * It says, "If a Node already exists at the target URI, then the  
> data SHALL be imported into the existing Node and the Node  
> properties SHALL be cleared unless the node is a Container Node."
>
> Isn't it required that the Node be a container node to allow  
> importing data?  I suppose there could be a case where one DataNode  
> is being overwritten with an updated version.  Is this the other case?
>
> 6.2 --
> * I couldn't find the referenced document under the 2.0 section.
>
> Section 7
> I think this is a great idea and suggests support for compliance  
> levels.
>
>
>


