Re: Changes to VOSpace specification (properties)

From: Dave Morris <dave-at-ast.cam.ac.uk>
Date: Fri, 24 Nov 2006 16:45:24 +0000


Matthew Graham wrote:

> I've had meetings in the past two weeks with the folks at JHU about
> putting a VOSpace interface onto CasJobs and Arun Jagatheesan at SDSC
> about the VOSpace interface with SRB. Predictably both parties raised
> issues but there were three that both brought up and I think we need to
> address them:

Assuming that we are now talking about changes to add to the _next_ version, then ok.
We _are_ talking about the next version here ... aren't we guys ?

> 1. Properties
> The current scheme is limited to key-value pairs where the value is
> interpreted as a string. A problem with this that some key-values pairs
> might be intended to represent other datatypes, e.g. a date or a float,
> and without this typing information, it is impossible to check the
> validity of the value.

I know it is very tempting to try to add extended types to the properties.

1.1 Property type attribute

First, _if_ we do introduce a type attribute, it should be defined in our own schema rather than using the xsi:type attribute.

The reason is that some XML parsers may swallow the xsi:type attribute, so the type information might not be passed on to the application layer.

So, it would have to be

[property vos:type="xxxx"]
   

rather than

[property xsi:type="xxxx"]
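
As a rough illustration of what the application layer would see with an attribute in our own namespace, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The namespace URI "urn:example:vospace" and the type name "int" are placeholders for illustration, not part of any VOSpace schema, and the markup uses real angle brackets rather than the square brackets used in this message.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, for illustration only.
VOS_NS = "urn:example:vospace"

doc = (
    '<property xmlns:vos="urn:example:vospace" '
    'uri="xxxx" vos:type="int">51</property>'
)

element = ET.fromstring(doc)

# ElementTree exposes namespaced attributes under "{namespace}localname".
declared_type = element.get("{%s}type" % VOS_NS)
print(declared_type, element.text)
```

Here the attribute is ordinary application data, so the parser has no reason to interpret or consume it.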



1.2 Avoiding large blobs

I experimented with quite a few variations of typed properties before we introduced the node/properties structure into vospace, and all of them had one drawback: they make it very tempting for 3rd party developers to put large and complex information into the properties. Remember that the full properties list is returned for every element in a list response.

Personally, I'd prefer to keep node properties as small and lightweight as possible.
In a lot of my earlier work I actually had the property value as an attribute

[property uri="yyyy" value="data here"/]

rather than the element value we have at the moment

[property uri="yyyy"]data here[/property]

purely to discourage

[property uri="yyyy"]

        People who want to put large blobs of descriptive text
        in what should really be a primitive type.
        Adding this much text into a property makes it difficult
        for a database system to store, and it also makes it
        difficult for a UI developer to display in a GUI client.
        Ideally, all properties should be small values that can be
        displayed in a single line in the GUI.

[/property]

From what I remember, we didn't adopt property value as attribute in the initial vospace spec, mainly because it is easier for XML/SOAP handlers to extract element data than attribute values.

Allowing large complex bits of data in the node properties means we are in danger of heading down the same road that registry has gone, where we are starting to see clients using xpath expressions to ask the registry for small parts of the registration document because they don't want to handle the whole document when they only want one or two small bits of data, e.g. having to parse all of the curation information just to get the service endpoint(s).



1.3 Simple type information

On the other hand, type information for primitive values, e.g. string, int, float, date etc., would indeed be useful. However, adding the type attribute to the message schema means that the message sender has control over the type information.

If we allow the sender to set the type attribute, then what happens if client A sends

[property uri="xxxx" type="xsi:int" value="51"/]

and client B sends

[property uri="xxxx" type="xsi:string" value="fifty-one"/]

We would need some way for the recipient to find out what the property type should actually be, and convert or reject one or other of the two values.
That is what the property registration documents were supposed to do.

The property URI attribute should point to a resource that defines what the property means, in a human readable form, and it should also define the allowed format and range of values
in a machine readable form. In effect, a property registration document should contain the schema rules for the contents of a property element with that URI.
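
To make the idea concrete, here is a minimal Python sketch of a recipient that takes the type from the registration rather than from the sender. The REGISTERED_PARSERS table is a hypothetical stand-in for real property registration documents, and the two property URIs are invented for the example; how a service would actually fetch and cache registrations is left open.

```python
from datetime import date

# Hypothetical stand-in for property registration documents:
# maps a property URI to a parser for its registered type.
REGISTERED_PARSERS = {
    "ivo://example/properties/count": int,
    "ivo://example/properties/created": date.fromisoformat,
}


def accept_property(uri, value):
    """Return the parsed value, or raise ValueError to reject it."""
    parser = REGISTERED_PARSERS.get(uri)
    if parser is None:
        # Unregistered properties are treated as plain strings here.
        return value
    return parser(value)


print(accept_property("ivo://example/properties/count", "51"))

try:
    accept_property("ivo://example/properties/count", "fifty-one")
except ValueError as error:
    print("rejected:", error)
```

With this arrangement client A's "51" is accepted and client B's "fifty-one" is rejected, whatever type either of them claims in the message.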



1.4 Complex properties

If we really do have a use case for larger property values, could we add them using a separate xml element?
For text based information, we could add a new [text-property] element, which would wrap the text blob in a CDATA section.

[text-property uri="xxxx"]

        <![CDATA[
            The node property list may also contain text-property elements,
            which wrap the text in a CDATA element.
        ]]>

[/text-property]

To prevent large text properties from slowing the system down too much, I would ask that these are excluded from most of the service responses, and are only returned in response to a specific getNode() request.

It would be up to the client UI developers how they handled text-property elements.

For complex xml data, we could add a corresponding [xml-property] element, which can contain arbitrary xml data, possibly adding an additional schema and namespace attribute.

[xml-property uri="yyyy"]

        [colour]
            [red]123[/red]
            [blue]234[/blue]
            [green]89[/green]
        [/colour]

[/xml-property]

Note that we shouldn't need to add type, schema, or namespace attributes to the [xml-property] element itself. Again, the property URI should point to a registration document that defines these for all instances of properties with that URI.

Again, to reduce the impact on control messages, I would ask that these are excluded from most of the service responses, and are only returned in response to a specific getNode() request. Once we allow these larger properties, there is nothing to prevent people from abusing them.
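
A minimal Python sketch of that filtering rule, assuming a simple in-memory list of properties. The dictionary layout, the "kind" field and the operation names other than getNode are hypothetical and only serve to show the shape of the check.

```python
# Hypothetical in-memory representation of a node's properties.
# "kind" mirrors the element name: property, text-property, xml-property.
node_properties = [
    {"kind": "property", "uri": "xxxx", "value": "123:234:89"},
    {"kind": "text-property", "uri": "yyyy", "value": "A long description ..."},
    {"kind": "xml-property", "uri": "zzzz", "value": "<colour>...</colour>"},
]

# Element kinds that may carry large values.
LARGE_KINDS = {"text-property", "xml-property"}


def properties_for_response(properties, operation):
    """Return all properties for getNode, only simple ones otherwise."""
    if operation == "getNode":
        return list(properties)
    return [p for p in properties if p["kind"] not in LARGE_KINDS]


for operation in ("list", "getNode"):
    kinds = [p["kind"] for p in properties_for_response(node_properties, operation)]
    print(operation, kinds)
```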



1.5 Property syntax

Your example of the [colour] information is fairly harmless, but once this is in the spec. then there is nothing to prevent a 3rd party adding a 2 page description to a [text-property] or embedding a full registry resource document in an [xml-property].
If this became commonplace, then it would make things very difficult for UI developers to display the node properties in a clear and consistent way. Some properties would be a single numerical value, others would contain large blocks of text or xml. Unless we can filter out the complex values, then there is nothing to prevent large blobs of text and xml being returned in the node properties for every import, export and list response from then on.

I would prefer to restrict the size and complexity of properties, keeping node/properties as simple as possible and use references to point to the more complex metadata stored in separate nodes.

[node uri="xxxx"]

        [properties]
            [property uri="ivo://..../abstract"     value="vos://...."/]
            [property uri="ivo://..../registration" value="vos://...."/]
            ....
        [/properties]

[/node]

The text description and registration document can then be handled as separate data nodes, containing the complex data as text and xml files. The case you highlighted is an edge case, where the xml isn't that complex, and the structure would be useful. However, as Paul mentioned, the same information could be represented as a simple string :

[property uri="xxxx" value="123:234:89"/]

The property URI should point to a registration document that defines the property type, and syntax, in a machine readable way. One possibility would be to use a regular expression in the registration document to define the property syntax.

[property uri="xxxx" type="complex-string"]

        ....
        [regexp][0-9]+:[0-9]+:[0-9]+[/regexp]
        ....

[/property]

Message parsers could use the regular expression in the property registration document to validate the property value, without needing to understand the particular property meaning. If a property value did not match the regular expression, then the recipient would be allowed to reject the property.

This would allow you to define your own complex properties, with a specific syntax, and enable me to write a generic message handler that could validate the property as a string, by checking if it matched the regular expression, without having to understand the internal details of the property meaning.
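
A minimal Python sketch of such a generic handler. The REGISTERED_SYNTAX table is a hypothetical stand-in for looking up the regular expression in the property registration document, and the pattern is written with standard character classes ([0-9]) so that Python's re module accepts it.

```python
import re

# Hypothetical registry: property URI -> regular expression taken from
# that property's registration document.
REGISTERED_SYNTAX = {
    "xxxx": re.compile(r"[0-9]+:[0-9]+:[0-9]+"),
}


def is_valid(uri, value):
    """Check a property value against its registered syntax, if any."""
    pattern = REGISTERED_SYNTAX.get(uri)
    if pattern is None:
        return True  # no registered syntax, nothing to check
    # fullmatch requires the whole value to match, not just a prefix.
    return pattern.fullmatch(value) is not None


print(is_valid("xxxx", "123:234:89"))
print(is_valid("xxxx", "red:blue:green"))
```

The handler never needs to know that the three numbers are colour components; it only needs the pattern.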



1.6 Summary so far

To avoid large properties, use an attribute rather than an element for the value, to restrict its size and complexity.
Yes, I know it is theoretically possible to add a large blob of text or xml to an attribute, but it makes it more difficult.

[property uri="xxxx" value="123:234:89"/]

The property registration document should contain all the type and syntax information about the property.
For complex properties, this could include a regular expression to describe the property syntax.

[property uri="xxxx" type="complex-string"]

        ....
        [regexp][0-9]+:[0-9]+:[0-9]+[/regexp]
        ....

[/property]

If we want to add larger text or xml properties, we should add a new element to the schema to support them.

[text-property uri="xxxx"]

        <![CDATA[
            The text data, wrapped in a CDATA element.
        ]]>

[/text-property]

and

[xml-property uri="yyyy"]

        [colour]
            [red]123[/red]
            [blue]234[/blue]
            [green]89[/green]
        [/colour]

[/xml-property]

The [text-property] and [xml-property] would only be returned in response to an explicit getNode() call.
They should be filtered out of the import, export and list responses.

Thank you for reading this far :-)
Dave

Received on 2006-11-24Z17:46:04