VOStore interface

Matthew Graham mjg at cacr.caltech.edu
Fri Aug 5 11:22:47 PDT 2005


If the user is accessing the local storage system directly then they can 
do whatever they want. VOStore, however, is the presentation of that 
repository to the VO world and does not necessarily interface with a 
VOSpace layer: this means that the VOStore interface has to be capable 
of handling the VO authentication mechanism. The authorization story is 
as we seem to have agreed.



Reagan Moore wrote:

> Matthew:
> The expectation is that the VOStore interface does not need to do 
> either authentication or authorization.  If a person is working 
> directly with a local storage system, then they are accessing their 
> own personal data while running under their personal account ID. They 
> can execute the VOStore interface as a local application.
> If VOSpace is accessing the local storage system through VOStore, then 
> VOSpace authenticates its access to the local storage system to read 
> or write files under the VOSpace account ID.  Again VOStore is just a 
> local application that VOSpace executes.
> If the owner of data on the local storage repository chooses to make a 
> file world readable, then VOSpace would be able to access the file 
> through VOStore.
> Reagan
>> Reagan Moore wrote:
>>> I would like to propose the following separation of identity and 
>>> access control management.  The issues appear to be how to separate 
>>> support for local files in a local storage repository from the files 
>>> that are registered into a shared collection that spans multiple 
>>> storage repositories.  An easy way to make the differentiation is to 
>>> identify the usage model for each type of data management system.  I 
>>> would like to learn whether this approach would meet all of the IVOA 
>>> requirements.
>>> Local storage repository:
>>> This is a storage system that is controlled by local administrators 
>>> who establish access accounts for the persons who are allowed to use 
>>> the system.
>>> The users can choose their own file names, manipulate the files with 
>>> the utilities that are available on the local storage, and are 
>>> authenticated by the local system.  If desired, a user could log 
>>> onto the local storage repository, and use a VO specific interface 
>>> such as VOStore to access their own personal data.  Since VOStore 
>>> would be run under their account ID to access files that they own, 
>>> there is no additional required authentication.  They could also use 
>>> other access mechanisms such as perl scripts, or Unix shell 
>>> commands, C library calls, whatever is supported on the local 
>>> storage repository.  These access mechanisms allow them to access 
>>> files that they own.
>>> A VOStore interface for this usage model would provide:
>>> - get file
>>> - put file
>>> - list files
>>> The only advantage is that if the VOStore interface were supported 
>>> on all local storage repositories, the user would have a standard 
>>> access mechanism.
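The three operations listed for the local usage model can be sketched as a thin wrapper over a directory the caller already owns. This is only an illustration: the class name LocalVOStore, its method names, and the choice of a plain directory as the backing store are assumptions, not part of any VOStore draft. No authentication appears because, as described above, the code runs under the user's own account ID.

```python
from pathlib import Path


class LocalVOStore:
    """Hypothetical get/put/list wrapper over one directory owned by the caller."""

    def __init__(self, root):
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name):
        # Refuse names that would escape the store's root directory.
        path = (self.root / name).resolve()
        if self.root not in path.parents:
            raise ValueError(f"name escapes the store: {name!r}")
        return path

    def put_file(self, name, data):
        path = self._path(name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get_file(self, name):
        return self._path(name).read_bytes()

    def list_files(self):
        # Names are reported relative to the root, with "/" separators.
        return sorted(
            p.relative_to(self.root).as_posix()
            for p in self.root.rglob("*")
            if p.is_file()
        )
```

Access control is left entirely to the local file system permissions of the account running the code.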
>>> Shared collection - VOSpace:
>>> The purpose of the shared collection is to organize files across 
>>> multiple storage repositories, provide a way to register files into 
>>> the shared collection, establish access controls on the shared data, 
>>> provide standard services for manipulating the files (Cone Search, 
>>> SIAP, SSAP, Mosaic, ...), support replication, support selection of 
>>> the closest file.
>>> The shared collection provides a global (or logical) name space that 
>>> can be organized in a directory structure independently of the 
>>> naming convention and path hierarchy employed at the local storage 
>>> systems. Thus the VOSpace system must manage the mapping from the 
>>> logical name space to the naming convention used in the local 
>>> storage system.
>>> An account ID is established under which the shared collection 
>>> (VOSpace) is able to deposit files in the local storage repository. 
>>> This means the shared collection owns the data that is stored at the 
>>> local storage repository.  In order to access the data, a user would 
>>> need to authenticate herself to the shared collection, which in turn 
>>> authenticates itself to the local storage repository. Whether or not 
>>> to allow the access is controlled by ACLs managed by VOSpace.  This 
>>> means that the authentication mechanism used by VOSpace is 
>>> completely independent of the authentication mechanisms used by the 
>>> local storage systems.
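A minimal sketch of the access path described above, assuming the user has already been authenticated to VOSpace by some mechanism that is out of scope here. The names VOSpaceACL, read_via_vospace, and fetch_as_vospace are hypothetical; fetch_as_vospace stands in for whatever call VOSpace makes to the local storage repository under its own account ID.

```python
class VOSpaceACL:
    """Hypothetical per-file access control list held by VOSpace."""

    def __init__(self):
        self._acl = {}  # logical name -> {user: set of permissions}

    def grant(self, logical_name, user, permission):
        self._acl.setdefault(logical_name, {}).setdefault(user, set()).add(permission)

    def allows(self, logical_name, user, permission):
        return permission in self._acl.get(logical_name, {}).get(user, set())


def read_via_vospace(acl, fetch_as_vospace, user, logical_name):
    # The user is assumed to be authenticated to VOSpace already.
    # VOSpace decides on access; the local store only ever sees VOSpace.
    if not acl.allows(logical_name, user, "read"):
        raise PermissionError(f"{user} may not read {logical_name}")
    return fetch_as_vospace(logical_name)
```

The local storage system never learns who the end user is, which is what keeps the two authentication mechanisms independent.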
>>> In order to handle the fact that local storage systems use a variety 
>>> of authentication mechanisms (Unix password, PKI certificates, 
>>> Kerberos tickets, DCE credentials, ...) the VOSpace 
>>> implementation could use the Generic Security Service API (GSSAPI) 
>>> to handle the heterogeneity.  In addition, an arbitrary 
>>> authentication mechanism can be chosen for authenticating users to 
>>> VOSpace.
>>> If a VOStore interface is provided by the local storage repository, 
>>> then VOSpace would be able to invoke the VOStore access mechanism 
>>> (running under the VOSpace account ID).  Note that in this model 
>>> VOStore does no authentication.  All authentication is controlled by 
>>> a combination of the local storage system and VOSpace.
>>> The type of operations that would be required by VOStore, however, 
>>> are more sophisticated.  They include:
>>> - get file
>>> - put file
>>> - list files
>>> - register an existing file into VOSpace, while mapping from the 
>>> local name to the VOSpace preferred name
>>> - register an existing directory structure into VOSpace, while 
>>> setting the VOSpace logical names and VOSpace directory structure to 
>>> be the same as the local directory structure
>>> - register an existing local file into VOSpace as a replica of an 
>>> existing VOSpace logical file.
>>> With the latter three commands, it is possible to meet the specific 
>>> requirement that users be able to control the names of files both on 
>>> the local system and in VOSpace.  Note that for the user to access 
>>> the local file system they required an account ID on the local file 
>>> system.  They then stored a local file under their own account ID. 
>>> They would add read permission for the VOSpace account ID to their 
>>> local file to permit access by VOSpace.
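The three registration operations can be illustrated with a small in-memory catalogue that maps each logical name to its physical copies. Everything here is an assumption made for illustration: the class name VOSpaceCatalogue, the method names, and the representation of a replica as a (store_id, local_path) pair. A real VOSpace would keep this mapping in a persistent catalogue.

```python
from pathlib import Path


class VOSpaceCatalogue:
    """Hypothetical mapping from VOSpace logical names to local copies."""

    def __init__(self):
        self._replicas = {}  # logical name -> list of (store_id, local_path)

    def register_file(self, store_id, local_path, logical_name):
        # The logical name is chosen freely and need not match the local name.
        if logical_name in self._replicas:
            raise KeyError(f"logical name already registered: {logical_name}")
        self._replicas[logical_name] = [(store_id, local_path)]

    def register_directory(self, store_id, local_root, logical_root):
        # Mirror the local directory structure under logical_root.
        root = Path(local_root)
        for path in sorted(root.rglob("*")):
            if path.is_file():
                rel = path.relative_to(root).as_posix()
                logical_name = f"{logical_root.rstrip('/')}/{rel}"
                self.register_file(store_id, str(path), logical_name)

    def register_replica(self, store_id, local_path, logical_name):
        # Attach another physical copy to an existing logical file.
        if logical_name not in self._replicas:
            raise KeyError(f"unknown logical name: {logical_name}")
        self._replicas[logical_name].append((store_id, local_path))

    def replicas(self, logical_name):
        return list(self._replicas[logical_name])
```

Registration only records names; the files stay where the user put them, under the user's own account ID.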
>>> This separates authorization cleanly between the local storage 
>>> system (which only checks for access by local account IDs) and the 
>>> VOSpace shared collection (which authorizes all accesses to files 
>>> owned by VOSpace).  This means that VOSpace is managing multiple 
>>> levels of indirection:
>>> - mapping from the global or logical file name space to the local 
>>> repository name space
>>> - mapping from an authenticated user through application of ACLs to 
>>> decide whether the user can read a VOSpace owned file.
>>> - mapping preferred location for accessing replicas (typically pick 
>>> a file on the file system with the user's IP address, then any other 
>>> file system, then a tape archive)
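The replica preference order in the last item can be written as a simple ranking. The dictionary keys "kind" and "host_ip" and the function name pick_replica are hypothetical; the only thing taken from the text is the order: a file system at the user's IP address, then any other file system, then a tape archive.

```python
def pick_replica(replicas, user_ip):
    """Return the preferred replica for a user at user_ip.

    Each replica is assumed to be a dict with a "kind" of either
    "filesystem" or "tape" and a "host_ip" string.
    """

    def rank(replica):
        if replica["kind"] == "tape":
            return 2  # tape archive is the last resort
        return 0 if replica["host_ip"] == user_ip else 1

    # min() keeps the first replica among equally ranked candidates.
    return min(replicas, key=rank)
```

A production system would likely use a richer notion of network distance than IP equality.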
>>> For completeness, VOStore may need an operation that sets access 
>>> permission for VOSpace, when VOStore is run under the local user 
>>> account ID.
>>> Reagan Moore
>>>> I think that most of what is VOStore and what is VOSpace is clear; 
>>>> however, the two grey areas are access control (authorization) and 
>>>> identifiers and this stems from the use case where the user wants 
>>>> direct access to a VOStore (e.g. a local store) and does not want 
>>>> to go through the VOSpace layer. Here are my suggestions for 
>>>> handling these areas:
>>>> Access control:
>>>> -------------------
>>>> A VOStore can run in two modes: authorized and unauthorized. An 
>>>> unauthorized VOStore is semantically equivalent to an anonymous ftp 
>>>> site: any authenticated user (we still maintain security) can put 
>>>> something in, move/rename it, get it and delete it.
>>>> An authorized VOStore will only allow the requested operation if a 
>>>> valid authentication token is included in the request - all the 
>>>> VOStore has to do here is validate the authentication token. The 
>>>> generation of the authentication token is handled by VOSpace: it 
>>>> makes sure that the authenticated user has permission to do what 
>>>> they are requesting and if so, places a valid token in the request 
>>>> down to the VOStore.
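One way to picture the authorized mode is a signed token that VOSpace issues after its own permission check and that the VOStore only validates. The thread does not define a token format, so the sketch below is an assumption throughout: it uses an HMAC over a shared secret, hypothetical function names issue_token and validate_token, and "|"-separated fields that must not themselves contain "|".

```python
import hashlib
import hmac
import time


def issue_token(secret, user, operation, target, ttl=300, now=None):
    """VOSpace side: call only after the user has been authorized."""
    expires = int((time.time() if now is None else now) + ttl)
    message = f"{user}|{operation}|{target}|{expires}"
    signature = hmac.new(secret, message.encode(), hashlib.sha256).hexdigest()
    return f"{message}|{signature}"


def validate_token(secret, token, operation, target, now=None):
    """VOStore side: check signature, request match, and expiry only."""
    message, _, signature = token.rpartition("|")
    expected = hmac.new(secret, message.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    try:
        _user, op, tgt, expires = message.split("|")
        expires = int(expires)
    except ValueError:
        return False
    current = time.time() if now is None else now
    return op == operation and tgt == target and current < expires
```

The VOStore needs no user list or ACLs here; it only needs the secret it shares with the VOSpace it trusts.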
>>>> Identifiers:
>>>> --------------
>>>> The protocol identifier ivo:// identifies a resource that exists in 
>>>> the VO. It does not promise that you can completely resolve a URI 
>>>> beginning ivo:// in a registry, merely that some component of the 
>>>> URI will relate to a resource that has a registry entry, i.e. the 
>>>> bit before the first # can be resolved in a registry. So I can go 
>>>> to a registry and find out where 
>>>> ivo://nvo.caltech/vostores/vostore1 is
>>>> but I need to go to VOStore interface for this store to resolve 
>>>> ivo://nvo.caltech/vostores/vostore1#halibut3. I do not see why we 
>>>> need to introduce a second protocol just for VOStore contents.
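The two-stage resolution described here amounts to splitting the identifier at the first "#". A minimal sketch, with split_store_identifier as a hypothetical helper name:

```python
def split_store_identifier(identifier):
    """Split an ivo:// identifier into its registry and store-local parts."""
    if not identifier.startswith("ivo://"):
        raise ValueError(f"not an ivo:// identifier: {identifier!r}")
    # The part before the first "#" is what a registry can resolve;
    # the rest is meaningful only to the VOStore itself.
    registry_part, _, local_part = identifier.partition("#")
    return registry_part, (local_part or None)


registry_id, local_id = split_store_identifier(
    "ivo://nvo.caltech/vostores/vostore1#halibut3"
)
# registry_id is looked up in a registry; local_id is passed to that VOStore.
```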
>>>> Now resolution of individual VOStore identifiers has to be done at 
>>>> the VOStore level; however, VOSpace gives you the ability to set up 
>>>> a single logical identifier for multiple copies of the same 
>>>> resource so here we might want a separate protocol: vos and 
>>>> resolution of this identifier has to be done at the VOSpace level 
>>>> since VOSpace manages multiple VOStores.
>>>>    Cheers,
>>>>    Matthew
>>>> Paul Harrison wrote:
>>>>> Reagan Moore wrote:
>>>>>> The differentiation between the VOStore and VOSpace interfaces is 
>>>>>> becoming unclear.  The latest draft implies that properties that 
>>>>>> were originally associated with VOSpace would now be supported by 
>>>>>> VOStore.
>>>>> I have to say that I agree that there seems to be some confusion 
>>>>> in this area - with hindsight it was probably a mistake to defer 
>>>>> the specification of VOSpace and work on VOStore alone as the 
>>>>> "easier" problem - the specifications should be worked in tandem 
>>>>> to see where it is most appropriate to place roles and 
>>>>> responsibilities for particular use cases, so that a "global" 
>>>>> solution is arrived at.
>>>>> I thought that the original separation into VOStore and VOSpace 
>>>>> was done so that VOStore could be an essentially "dumb" BLOB 
>>>>> repository that did what it was told by the VOSpace layer when it 
>>>>> comes to issues of file permissions and hierarchical file names. 
>>>>> However, because no VOSpace specification was created, these more 
>>>>> advanced features have crept into the VOStore layer.
>>>>>> Let's look at the current VOStore and VOSpace proposal:
>>>>>> VOStore                                     VOSpace
>>>>>> Storage of objects                          management of virtual file system
>>>>>> data stored under unspecified ID?
>>>>>> no user home directory                      User home directory
>>>>>> directory hierarchy                         Directory hierarchy
>>>>>> Unique file name within storage             User-defined file names
>>>>>>                                             Mapping VOSpace name to VOStore name
>>>>>>                                             List files for user
>>>>>> Restrict access by user identity?
>>>>>> Identify files with URIs
>>>>>> Access controls on local file name          Access controls on VOSpace name
>>>>>> This characterization mixes name space, mixes access controls, 
>>>>>> does not provide consistent identity, does not allow consistent 
>>>>>> management.  For instance, if a URI is being provided for file 
>>>>>> identity within the VOStore interface, then there is no need for 
>>>>>> user-specified names within VOStore.  A second issue is the 
>>>>>> assumption that file access can be restricted by user identity. 
>>>>>> This means that the VOStore must manage the owner for each file, 
>>>>>> access controls for each file.  File systems usually do this by 
>>>>>> creating accounts for each user name and applying Unix 
>>>>>> permissions.  Is this capability to be provided now by both 
>>>>>> VOSpace and VOStore?  We need a cleaner separation of capabilities.
>>>>> This security aspect is crucial - it is clear that the owners of 
>>>>> VOStores would not want to be managing user identity lists of all 
>>>>> the VObs users at their stores - the fine grained access controls 
>>>>> should be at the VOSpace level. If VOStores only respond to 
>>>>> requests from trusted VOSpace services then this is possible, but 
>>>>> I think that the perceived requirement for more detailed access 
>>>>> control in the VOSpace layer has come about because prototype 
>>>>> end-user applications have appeared that talk directly to the 
>>>>> VOStore layer - of course, it is not surprising that this has 
>>>>> happened because there was no VOSpace definition for the end user 
>>>>> applications to talk to.
>>>>> How file/BLOB identity is managed is also crucial to producing a 
>>>>> system that offers more than ftp. I thought that one of the 
>>>>> fundamental driving use cases for a VOSpace was that the same 
>>>>> BLOB could potentially live on several VOStores, and that when 
>>>>> specifying a resource in VOSpace, in a workflow for instance, the 
>>>>> resource could be retrieved from the VOStore that was "closest" on 
>>>>> the network to where the resource would be consumed. This sort of 
>>>>> use case does require some careful thought about the allocation 
>>>>> and management of identifiers, and I think probably means that the 
>>>>> VOStore will have to be aware of the VOSpace identifier.
>>>>> I also have an issue with reusing ivo: as the protocol part for 
>>>>> the URI of an identifier in this system - ivo: is already well 
>>>>> defined and used as the identifier for registry entries, and the 
>>>>> "protocol" for accessing the entity associated with the identifier 
>>>>> is defined in the registry interface standard. This means that 
>>>>> given an identifier of the form ivo://authority.org/something#blah 
>>>>> a software agent (or human for that matter) cannot tell by 
>>>>> inspection whether the identifier refers to a file in VOSpace or 
>>>>> is simply a reference to a registry entry (e.g. for a SkyNode) - 
>>>>> this leads to software having to be more complex in order 
>>>>> constantly to test for the different possibilities. I think that 
>>>>> it would be better to have a URI with a different protocol part, 
>>>>> vos: for instance, it would then be immediately apparent that the 
>>>>> VOSpace protocol should be used to access the entity referred to 
>>>>> by the identifier.
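The point about telling identifiers apart by inspection can be shown with a dispatch on the URI scheme alone. The vos scheme is only the suggestion made above, and the function name and return labels are hypothetical:

```python
from urllib.parse import urlsplit


def protocol_for(identifier):
    """Choose the access protocol from the URI scheme alone."""
    scheme = urlsplit(identifier).scheme
    if scheme == "ivo":
        return "registry"  # resolve through the registry interface
    if scheme == "vos":
        return "vospace"   # resolve through the VOSpace protocol
    raise ValueError(f"unsupported scheme: {scheme!r}")
```

With a single ivo scheme for both cases, the same decision would need a registry lookup before the software could know which protocol applies.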
>>>>>> Let's look at the Storage Resource Broker data grid separation of 
>>>>>> local storage management from the virtual file system management:
>>>>>> Local storage system                        SRB name space
>>>>>> Storage of objects                          management of virtual file system
>>>>>> data stored under SRB ID
>>>>>> no user home directory                      User home directory
>>>>>> directory indirection structure             Directory hierarchy
>>>>>> Unique file name within storage             User-defined file names
>>>>>>                                             Mapping SRB name to local file name
>>>>>>                                             List files for user
>>>>>> Access through SRB ID, controlled by SRB
>>>>>>                                             Identify files by URIs
>>>>>>                                             Access controls on SRB name
>>>>> I think that, as Reagan points out, the separation of 
>>>>> responsibilities that SRB has with the local storage system is 
>>>>> pretty much the right model for VOSpace and VOStore - though it 
>>>>> means that SRB is pretty much at VOSpace level rather than a 
>>>>> VOStore as is suggested in the current VOSpace definition document.
>> Hi,
>> If you also allow the possibility that the local storage repository 
>> can run in an unauthorized (anonymous access) manner then this is 
>> exactly what Guy and I were suggesting. Does that mean that we 
>> actually all agree on this? :-)
>>    Cheers,
>>    Matthew
