Re: remarks about VOSpace level 1

From: Dave Morris <dave-at-ast.cam.ac.uk>
Date: Wed, 16 Aug 2006 09:39:11 +0100


Markus Dolensky wrote:

>5.2.3
>The paging mechanism is certainly a 'nice to have'. Judging from the list of
>exceptions it is one of the most complicated concepts in this spec. Therefore,
>I'd suggest to provide means to filter node listings instead. Filtering by node
>property value and owner (user/creator), for instance.
>
>Paging is not enough on its own without some prior sorting that brings the
>relevant records to the top. Otherwise it is likely that one has to page all the
>way to the end anyway.
>
>So again, it appears rather complicated but inefficient from a user's
>perspective and one can live without it in level 1.
>
>

Mark Taylor has already mentioned some of this. The paging mechanism is not really intended to display a series of separate pages to the user, in the way that Google search results are presented.

As Mark has suggested, the reason for splitting the results into a series of pages is to enable the system to handle large numbers of files without overloading the server or client.

There is no limit on the number of files in a space, and one of the target requirements was that the system should be able to handle lists of over 10^6 files without failing.
Although this wouldn't be an ideal way of organizing the data, it is possible to imagine a complex workflow creating a huge number of intermediate temp files, or a VO event receiver that stored a record of all the events it received. The paging mechanism enables us to avoid overloading the system if a client requests a list of the contents of a large space.

It also enables GUI developers to make the system appear much more responsive.
If a space contains a large number of files, a GUI client could start to render the first set of results as soon as it receives them, rather than wait for the whole list to arrive.
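
As a rough illustration of the idea (not the actual VOSpace-1.0 list operation), a client could consume a paged listing as a stream. In this sketch the continuation token, the fetch_page callable and the in-memory make_fake_space helper are all assumptions made up for the example:

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes a continuation token (None for the first
# page) and returns (node names, next token). The next token is None when
# there are no more pages. This is not the real VOSpace message format.
PageFetcher = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def iter_nodes(fetch_page: PageFetcher) -> Iterator[str]:
    """Yield node names page by page, never holding the full list in memory."""
    token: Optional[str] = None
    while True:
        nodes, token = fetch_page(token)
        yield from nodes
        if token is None:
            break


def make_fake_space(total: int, page_size: int) -> PageFetcher:
    """In-memory stand-in for a space service holding `total` nodes."""
    def fetch_page(token: Optional[str]) -> Tuple[List[str], Optional[str]]:
        start = int(token) if token is not None else 0
        end = min(start + page_size, total)
        nodes = [f"node-{i}" for i in range(start, end)]
        next_token = str(end) if end < total else None
        return nodes, next_token
    return fetch_page
```

A GUI client could start rendering as soon as the first page arrives, for example by taking next(iter_nodes(make_fake_space(10**6, 100))), which only triggers a single page request in this sketch.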

Your suggestion of sorting or filtering the list based on node properties would indeed be a very useful facility. We are planning to add more functionality to the list mechanism, including the ability to search, filter and sort the results based on node properties.
However, this will probably be added as a separate search API in future versions of VOSpace.

>On the other hand it is much harder to life with some other limitations. Even
>when accepting that there are no directories and no operations to change access
>policy for the sake of simplicity I would still argue that some cross VOSpace
>store operations can be supported without bloating the level 1 specs:
>
>A VOSpace that supports pullData{To|From}VOSpace can exploit this functionality
>to implement copyNode and moveNode across stores.
>
>

It is very tempting to try to include this type of functionality. Unfortunately, enabling pullData{To|From}VOSpace to move data directly from one space to another without involving the client does not scale well. Solving this is the reason that VOSpace has taken so long to get to this stage.

If the transfer only involves two spaces, then a direct transfer would probably work.
However, the next stages in the VOSpace road map are to add containers and inter-space links which will link the separate space services and make them behave as one unified space.

This document may provide a better explanation of why we avoided server to server method calls.
http://wiki.astrogrid.org/bin/view/Astrogrid/VoSpace20060228

The inter-space links mean that the target path for a transfer may not point to a file within a single space. Part of the path may include a link into another space, which may in turn contain a link into another space ... etc.
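
To make the chaining concrete, here is a toy model of link resolution. Links are not part of VOSpace-1.0, so everything in it (the dictionary layout, the "link" tuples, the resolve function and the max_hops limit) is an assumption invented for illustration only:

```python
from typing import Dict, List, Tuple, Union

# Hypothetical model: each space maps a node name either to the string "data"
# (a stored file) or to a ("link", space_name, node_name) tuple pointing at a
# node in another space.
Entry = Union[str, Tuple[str, str, str]]
Spaces = Dict[str, Dict[str, Entry]]


def resolve(spaces: Spaces, space: str, name: str,
            max_hops: int = 16) -> Tuple[str, str, List[str]]:
    """Follow links until a data node is reached.

    Returns (final space, final node name, list of spaces visited).
    Raises RuntimeError if no data node is found within max_hops lookups.
    """
    visited = [space]
    for _ in range(max_hops):
        entry = spaces[space][name]
        if entry == "data":
            return space, name, visited
        _, space, name = entry  # hop into the linked space
        visited.append(space)
    raise RuntimeError("too many link hops (possible cycle)")
```

In this sketch a request addressed to one space can end up touching several others before it finds the data, which is the situation that makes direct server to server calls hard to scale.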

>5.2.4
>What is the practical purpose of moveNode as is?
>
>According to the specs the data are untouched and it's just a way to create
>another node with some different identifier. Here's what I would expect from an
>operation called moveNode:
>
>- each node has a 'mandatory' name property similar to a filename
>- moveNode sets this property to a new value
>- when a different store is given as a 2nd parameter then it physically copies
>the data using pullDataToVOSpace and deleteNode.
>
>

You are right, within the flat space of a VOSpace-1.0 service, MoveNode is effectively 'rename'.
Once we get containers in place (top of the list for next version), MoveNode will move the node within the tree of containers within the space.

However, moving data between space services will still be part of the client API.
Again, moving data between services without involving the client does not scale well once we have inter-space links (see the document on links above).

>A last general remark:
>Only after going over the verbose XML messages it became apparent to me that
>this spec allows for specialized stores for DB storage and image manipulation.
>It would be interesting to see a bit clearer how to exploit this functionality
>given some scenario rather than being extremely detailed about specific XML
>messages.
>
>

Point taken.
There is still a lot of work to do in describing how it all fits together, with clearer and more detailed examples and explanations.

>So, thanks to the authors for their hard work in finding their way to an agreed
>document (which always involves difficult compromises).
>
>- Markus
>
>

Thank you for your comments and feedback. VOSpace-1.0 provides a very basic service, but we hope to address many of the limitations you highlighted in future versions of the specification.

Many thanks,
Dave Morris

Received on 2006-08-16Z10:40:44