VOStore interface

Matthew Graham mjg at cacr.caltech.edu
Thu Aug 4 11:56:54 PDT 2005


Reagan Moore wrote:

> I would like to propose the following separation of identity and 
> access control management.  The issues appear to be how to separate 
> support for local files in a local storage repository from the files 
> that are registered into a shared collection that spans multiple 
> storage repositories.  An easy way to make the differentiation is to 
> identify the usage model for each type of data management system.  I 
> would like to learn whether this approach would meet all of the IVOA 
> requirements.
>
> Local storage repository:
>
> This is a storage system that is controlled by local administrators 
> who establish access accounts for the persons who are allowed to use 
> the system.
> The users can choose their own file names, manipulate the files with 
> the utilities that are available on the local storage, and are 
> authenticated by the local system.  If desired, a user could log onto 
> the local storage repository, and use a VO specific interface such as 
> VOStore to access their own personal data.  Since VOStore would be run 
> under their account ID to access files that they own, there is no 
> additional required authentication.  They could also use other access 
> mechanisms such as perl scripts, or Unix shell commands, C library 
> calls, whatever is supported on the local storage repository.  These 
> access mechanisms allow them to access files that they own.
>
> A VOStore interface for this usage model would provide:
> - get file
> - put file
> - list files
> The only advantage is that if the VOStore interface were supported on 
> all local storage repositories, the user would have a standard access 
> mechanism.
>
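As a rough illustration of the three operations listed above (a minimal sketch, not taken from any VOStore draft): the class name LocalVOStore, the method names and the flat one-directory layout are all assumptions. The store does no authentication of its own and relies on the file permissions of the account it runs under, as the usage model describes.

```python
import tempfile
from pathlib import Path


class LocalVOStore:
    """Hypothetical store rooted at one directory owned by the calling user."""

    def __init__(self, root):
        self.root = Path(root)

    def put_file(self, name, data):
        # No extra authentication: ordinary file permissions of the
        # account running this code decide whether the write succeeds.
        (self.root / name).write_bytes(data)

    def get_file(self, name):
        return (self.root / name).read_bytes()

    def list_files(self):
        # Flat namespace (assumption): only regular files directly in root.
        return sorted(p.name for p in self.root.iterdir() if p.is_file())


def demo():
    # Use a throwaway directory so the sketch leaves nothing behind.
    with tempfile.TemporaryDirectory() as root:
        store = LocalVOStore(root)
        store.put_file("halibut3.fits", b"example bytes")
        return store.list_files(), store.get_file("halibut3.fits")
```

Calling demo() stores one file and reads it back through the same three calls a user would make on any repository offering this interface.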
> Shared collection - VOSpace:
>
> The purpose of the shared collection is to organize files across 
> multiple storage repositories, provide a way to register files into 
> the shared collection, establish access controls on the shared data, 
> provide standard services for manipulating the files (Cone Search, 
> SIAP, SSAP, Mosaic, ...), support replication, and support selection
> of the closest file.
>
> The shared collection provides a global (or logical) name space that 
> can be organized in a directory structure independently of the naming 
> convention and path hierarchy employed at the local storage systems. 
> Thus the VOSpace system must manage the mapping from the logical name 
> space to the naming convention used in the local storage system.
>
> An account ID is established under which the shared collection 
> (VOSpace) is able to deposit files in the local storage repository. 
> This means the shared collection owns the data that is stored at the 
> local storage repository.  In order to access the data, a user would 
> need to authenticate herself to the shared collection, which in turn 
> authenticates itself to the local storage repository.  Whether or not 
> to allow the access is controlled by ACLs managed by VOSpace.  This 
> means that the authentication mechanism used by VOSpace is completely 
> independent of the authentication mechanisms used by the local storage 
> systems.
>
> In order to handle the fact that local storage systems use a variety 
> of authentication mechanisms (Unix password, PKI certificates, 
> Kerberos certificates, DCE credentials, ...) the VOSpace 
> implementation could use the Generic Security Service API (GSSAPI) to 
> handle the heterogeneity.  In addition, an arbitrary authentication 
> mechanism can be chosen for authenticating users to VOSpace.
>
> If a VOStore interface is provided by the local storage repository, 
> then VOSpace would be able to invoke the VOStore access mechanism 
> (running under the VOSpace account ID).  Note that in this model 
> VOStore does no authentication.  All authentication is controlled by a 
> combination of the local storage system and VOSpace.
>
> The type of operations that would be required by VOStore, however, are 
> more sophisticated.  They include:
> - get file
> - put file
> - list files
> - register an existing file into VOSpace, while mapping from the local 
> name to the VOSpace preferred name
> - register an existing directory structure into VOSpace, while setting 
> the VOSpace logical names and VOSpace directory structure to be the 
> same as the local directory structure
> - register an existing local file into VOSpace as a replica of an 
> existing VOSpace logical file.
>
> With the latter three commands, it is possible to meet the specific 
> requirement that users be able to control the names of files both on 
> the local system and in VOSpace.  Note that for the user to access the 
> local file system they required an account ID on the local file 
> system.  They then stored a local file under their own account ID. 
> They would add read permission for the VOSpace account ID to their 
> local file to permit access by VOSpace.
>
> This separates authorization cleanly between the local storage system 
> (which only checks for access by local account IDs) and the VOSpace 
> shared collection (which authorizes all accesses to files owned by 
> VOSpace).  This means that VOSpace is managing multiple levels of 
> indirection:
> - mapping from the global or logical file name space to the local 
> repository name space
> - mapping from an authenticated user through application of ACLs to 
> decide whether the user can read a VOSpace owned file.
> - mapping preferred location for accessing replicas (typically pick a 
> file on the file system with the user's IP address, then any other 
> file system, then a tape archive)
>
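The three levels of indirection listed above can be sketched as follows. This is only an illustration under stated assumptions: the names Replica, VOSpace, register and locate are hypothetical, the ACL is reduced to a set of readers per logical name, and "closest" is approximated by comparing host names in the order given in the message (same host, then any other file system, then tape).

```python
from dataclasses import dataclass


@dataclass
class Replica:
    store_host: str        # host of the local storage repository
    local_name: str        # name used inside that repository
    on_tape: bool = False  # True if the copy lives in a tape archive


class VOSpace:
    def __init__(self):
        self.replicas = {}  # logical name -> list of Replica
        self.readers = {}   # logical name -> set of users allowed to read

    def register(self, logical_name, replica, readers=()):
        # Mapping 1: logical name -> name in a local repository.
        self.replicas.setdefault(logical_name, []).append(replica)
        self.readers.setdefault(logical_name, set()).update(readers)

    def locate(self, user, user_host, logical_name):
        # Mapping 2: authenticated user -> ACL decision.
        if user not in self.readers.get(logical_name, set()):
            raise PermissionError(f"{user} may not read {logical_name}")

        # Mapping 3: preferred replica (same host, other disk, tape).
        def rank(replica):
            if replica.on_tape:
                return 2
            return 0 if replica.store_host == user_host else 1

        return min(self.replicas[logical_name], key=rank)
```

A caller would register each physical copy once and afterwards refer only to the logical name; locate() then returns the replica to fetch.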
> For completeness, VOStore may need an operation that sets access 
> permission for VOSpace, when VOStore is run under the local user 
> account ID.
>
>
> Reagan Moore
>
>
>
>>
>> I think that most of what is VOStore and what is VOSpace is clear; 
>> however, the two grey areas are access control (authorization) and 
>> identifiers and this stems from the use case where the user wants 
>> direct access to a VOStore (e.g. a local store) and does not want to 
>> go through the VOSpace layer. Here are my suggestions for handling 
>> these areas:
>>
>> Access control:
>> -------------------
>>
>> A VOStore can run in two modes: authorized and unauthorized. An 
>> unauthorized VOStore is semantically equivalent to an anonymous ftp 
>> site: any authenticated user (we still maintain security) can put 
>> something in, move/rename it, get it and delete it.
>> An authorized VOStore will only allow the requested operation if a 
>> valid authentication token is included in the request - all the 
>> VOStore has to do here is validate the authentication token. The 
>> generation of the authentication token is handled by VOSpace: it 
>> makes sure that the authenticated user has permission to do what they 
>> are requesting and if so, places a valid token in the request down to 
>> the VOStore.
>>
>> Identifiers:
>> --------------
>>
>> The protocol identifier ivo:// identifies a resource that exists in 
>> the VO. It does not promise that you can completely resolve a URI 
>> beginning ivo:// in a registry, merely that some component of the URI 
>> will relate to a resource that has a registry entry, i.e. the bit 
>> before the first # can be resolved in a registry. So I can go to a 
>> registry and find out where ivo://nvo.caltech/vostores/vostore1 is
>> but I need to go to VOStore interface for this store to resolve 
>> ivo://nvo.caltech/vostores/vostore1#halibut3. I do not see why we 
>> need to introduce a second protocol just for VOStore contents.
>>
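To make the two-step resolution concrete, here is a minimal sketch that splits an identifier at the first '#'. The function name split_identifier is an assumption; the example identifier is the one used above.

```python
def split_identifier(identifier):
    """Split an ivo:// identifier into its registry part and store-local key."""
    if not identifier.startswith("ivo://"):
        raise ValueError(f"not an ivo:// identifier: {identifier!r}")
    # Everything before the first '#' can be looked up in a registry;
    # the remainder only has meaning to the VOStore itself.
    registry_part, _, store_key = identifier.partition("#")
    return registry_part, store_key or None
```

For ivo://nvo.caltech/vostores/vostore1#halibut3 the first element would be sent to a registry and the second ("halibut3") to the VOStore interface found there.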
>> Now resolution of individual VOStore identifiers has to be done at 
>> the VOStore level; however, VOSpace gives you the ability to set up a 
>> single logical identifier for multiple copies of the same resource so 
>> here we might want a separate protocol: vos and resolution of this 
>> identifier has to be done at the VOSpace level since VOSpace manages 
>> multiple VOStores.
>>
>>    Cheers,
>>
>>    Matthew
>>
>>
>> Paul Harrison wrote:
>>
>>> Reagan Moore wrote:
>>>
>>>> The differentiation between the VOStore and VOSpace interfaces is 
>>>> becoming unclear.  The latest draft implies that properties that 
>>>> were originally associated with VOSpace would now be supported by 
>>>> VOStore.
>>>>
>>>
>>> I have to say that I agree that there seems to be some confusion in 
>>> this area - with hindsight it was probably a mistake to defer the 
>>> specification of VOSpace and work on VOStore alone as the "easier" 
>>> problem - the specifications should be worked in tandem to see where 
>>> it is most appropriate to place roles and responsibilities for 
>>> particular use cases, so that a "global" solution is arrived at.
>>>
>>> I thought that the original separation into VOStore and VOSpace was 
>>> done so that VOStore could be an essentially "dumb" BLOB repository 
>>> that did what it was told by the VOSpace layer when it comes to 
>>> issues of file permissions and hierarchical file names. However, 
>>> because no VOSpace specification was created, these more advanced 
>>> features have crept into the VOStore layer.
>>>
>>>>
>>>> Let's look at the current VOStore and VOSpace proposal:
>>>>
>>>> VOStore                                     VOSpace
>>>> Storage of objects                          management of virtual file system
>>>> data stored under unspecified ID?
>>>> no user home directory                      User home directory
>>>> directory hierarchy                         Directory hierarchy
>>>> Unique file name within storage             User-defined file names
>>>>                                             Mapping VOSpace name to VOStore name
>>>>                                             List files for user
>>>> Restrict access by user identity?
>>>> Identify files with URIs
>>>> Access controls on local file name          Access controls on VOSpace name
>>>>
>>>> This characterization mixes name space, mixes access controls, does 
>>>> not provide consistent identity, does not allow consistent 
>>>> management.  For instance, if a URI is being provided for file 
>>>> identity within the VOStore interface, then there is no need for 
>>>> user-specified names within VOStore.  A second issue is the 
>>>> assumption that file access can be restricted by user identity. 
>>>> This means that the VOStore must manage the owner for each file, 
>>>> access controls for each file.  File systems usually do this by 
>>>> creating accounts for each user name and applying Unix 
>>>> permissions.  Is this capability to be provided now by both VOSpace 
>>>> and VOStore?  We need a cleaner separation of capabilities.
>>>
>>>
>>>
>>> This security aspect is crucial - it is clear that the owners of 
>>> VOStores would not want to be managing user identity lists of all 
>>> the VObs users at their stores - the fine grained access controls 
>>> should be at the VOSpace level. If VOStores only respond to requests 
>>> from trusted VOSpace services then this is possible, but I think 
>>> that the perceived requirement for more detailed access control in 
>>> the VOStore layer has come about because prototype end-user 
>>> applications have appeared that talk directly to the VOStore layer - 
>>> of course, it is not surprising that this has happened because there 
>>> was no VOSpace definition for the end user applications to talk to.
>>>
>>> How file/BLOB identity is managed is also crucial to producing a 
>>> system that offers more than ftp. I thought that one of the 
>>> fundamental driving use cases for a VOSpace was that the same BLOB 
>>> could potentially live on several VOStores, and that when 
>>> specifying a resource in VOSpace, in a workflow for instance, the 
>>> resource could be retrieved from the VOStore that was "closest" on 
>>> the network to where the resource would be consumed. This sort of 
>>> use case does require some careful thought about the allocation and 
>>> management of identifiers, and I think probably means that the 
>>> VOStore will have to be aware of the VOSpace identifier.
>>>
>>> I also have an issue with reusing ivo: as the protocol part for the 
>>> URI of an identifier in this system - ivo: is already well defined 
>>> and used as the identifier for registry entries, and the "protocol" 
>>> for accessing the entity associated with the identifier is defined 
>>> in the registry interface standard. This means that given an 
>>> identifier of the form ivo://authority.org/something#blah a software 
>>> agent (or human for that matter) cannot tell by inspection whether 
>>> the identifier refers to a file in VOSpace or is simply a reference 
>>> to a registry entry (e.g. for a SkyNode) - this leads to software 
>>> having to be more complex in order constantly to test for the 
>>> different possibilities. I think that it would be better to have a 
>>> URI with a different protocol part, vos: for instance, it would then 
>>> be immediately apparent that the VOSpace protocol should be used to 
>>> access the entity referred to by the identifier.
>>>
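The "tell by inspection" argument can be illustrated with a short dispatch on the scheme of the URI. This is a sketch only: the function name resolver_for and the returned labels are hypothetical, and the vos://authority/... layout is assumed by analogy with ivo://, since no vos: syntax has been defined here.

```python
from urllib.parse import urlsplit


def resolver_for(identifier):
    """Pick the service that should resolve an identifier, by scheme alone."""
    scheme = urlsplit(identifier).scheme
    if scheme == "ivo":
        return "registry"  # resolve through the registry interface
    if scheme == "vos":
        return "vospace"   # resolve through the VOSpace protocol
    raise ValueError(f"unsupported scheme: {scheme!r}")
```

With a single ivo: scheme for both cases, the same function would need an extra lookup before it could decide which service to call.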
>>>>
>>>> Let's look at the Storage Resource Broker data grid separation of 
>>>> local storage management from the virtual file system management:
>>>>
>>>> Local storage system                        SRB name space
>>>> Storage of objects                          management of virtual file system
>>>> data stored under SRB ID
>>>> no user home directory                      User home directory
>>>> directory indirection structure             Directory hierarchy
>>>> Unique file name within storage             User-defined file names
>>>>                                             Mapping SRB name to local file name
>>>>                                             List files for user
>>>> Access through SRB ID, controlled by SRB
>>>>                                             Identify files by URIs
>>>>                                             Access controls on SRB name
>>>>
>>>
>>> I think that, as Reagan points out, the separation of responsibilities 
>>> that SRB has with the local storage system is pretty much the right 
>>> model for VOSpace and VOStore - though it means that SRB is pretty 
>>> much at VOSpace level rather than a VOStore as is suggested in the 
>>> current VOSpace definition document.
>>
>
Hi,

If you also allow the possibility that the local storage repository can 
run in an unauthorized (anonymous access) manner then this is exactly 
what Guy and I were suggesting. Does that mean that we actually all 
agree on this? :-)
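A minimal sketch of the two modes, for illustration only. The thread does not say how tokens are generated or validated, so the HMAC scheme, the shared secret, and the names issue_token and VOStore below are all assumptions.

```python
import hashlib
import hmac

# Assumption: VOSpace and the VOStore share a secret key out of band.
SHARED_SECRET = b"hypothetical-shared-secret"


def _sign(operation, name):
    message = f"{operation}:{name}".encode()
    return hmac.new(SHARED_SECRET, message, hashlib.sha256).hexdigest()


def issue_token(user, operation, name, acl):
    """VOSpace side: check the ACL, then sign the request."""
    if operation not in acl.get((user, name), set()):
        raise PermissionError(f"{user} may not {operation} {name}")
    return _sign(operation, name)


class VOStore:
    def __init__(self, authorized):
        self.authorized = authorized
        self.blobs = {}

    def _check(self, operation, name, token):
        if not self.authorized:
            return  # unauthorized mode: comparable to anonymous ftp
        # Authorized mode: the store only validates the token.
        if token is None or not hmac.compare_digest(_sign(operation, name), token):
            raise PermissionError("missing or invalid token")

    def put(self, name, data, token=None):
        self._check("put", name, token)
        self.blobs[name] = data

    def get(self, name, token=None):
        self._check("get", name, token)
        return self.blobs[name]
```

In this sketch the store never sees user identities or ACLs; it only checks that a request carries a token that VOSpace could have produced.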

    Cheers,

    Matthew


