Changes to VOSpace 2.0
dave.morris at bristol.ac.uk
Thu Jan 6 04:42:09 PST 2011
I've been out of the loop for a while, so this may no longer be
relevant, but I think the original intent was that the job state would
record the success/failure of the whole 'job', the negotiation stage,
the data transfer itself and any decoding/interpreting of the data when
it arrives.
On 05/01/11 23:58, Matthew Graham wrote:
> On Jan 5, 2011, at 3:41 PM, Patrick Dowler wrote:
>> On 2011-01-05 15:03:06 you wrote:
>>> The user submits an HTTP(S) request to /sync which consists of a <transfer>
>>> document detailing a pushTo/pullFrom transfer. The service responds with a
>>> Redirect-GET which returns the Job document for the transfer to the user
>>> with details of the transfer, i.e., endpoints. At this point, the status
>>> of the Job should be ERROR (if something went wrong) or EXECUTING (since
>>> the endpoints are live and ready for data transfer). The Job status should
>>> only go to COMPLETED when the data transfer has finished - the service
>>> will need to monitor the data streams (or delegate the monitoring) to
>>> determine when this occurs and respond accordingly.
>> Oh, I see. It is me that misunderstood the job state. I thought the job was
>> complete once the negotiation was done, but this is more sophisticated. After
>> the user PUTs the file (for example) the job will then be marked COMPLETED or
>> ERROR depending on whether that succeeds. Got it!
> OK, I'm happy with this solution as well.
Yep, I think this is similar to how the 1.1 SOAP service operated.
The job state was intended to be a semi permanent record of what
happened and could be referred to and examined after the transfer had
completed.
If the negotiation fails, then it should leave behind a job state that
records how far it got and why it failed. If the negotiation succeeds
but the data transfer fails, then it leaves behind a job state that
records how far it got and why it failed.
The job should only be listed as completed when all of the data has
arrived and is in good shape.
For a simple binary blob, the job completes when all of the bytes
arrive. For a transfer of a votable into a TAP service, even if all
of the bytes arrive, the job could still fail if the votable was not
valid XML or failed to import into the database.
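As a rough illustration of the rule above, here is a minimal Python
sketch of how a service might derive the job phase from the stages it
has run. The stage names, the JobRecord class and the summarise()
function are hypothetical names of mine, not anything from the VOSpace
or UWS specifications; only the phase names come from this thread.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stage names, used only for this illustration.
STAGES = ("negotiation", "transfer", "import")


@dataclass
class JobRecord:
    phase: str                  # EXECUTING, COMPLETED or ERROR
    last_stage: Optional[str]   # how far the job got
    error: Optional[str]        # why it failed, if it did


def summarise(stage_results, required=STAGES):
    """Derive a job record from (stage, error_or_None) pairs, in order."""
    last = None
    for stage, error in stage_results:
        if error is not None:
            # Leave behind a record of how far it got and why it failed.
            return JobRecord("ERROR", stage, error)
        last = stage
    done = {stage for stage, _ in stage_results}
    # Only COMPLETED once every required stage has succeeded.
    phase = "COMPLETED" if all(s in done for s in required) else "EXECUTING"
    return JobRecord(phase, last, None)
```

For a plain binary blob the required stages would just be
("negotiation", "transfer"); for a votable going into a TAP service the
"import" stage can still turn a fully delivered file into an ERROR.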
There are several use cases for this:
1) A 3rd party transfer where the client software can negotiate the
transfer and then delegate the actual PUT to a different system which
would not necessarily need to understand vospace, just http or ftp. The
client could then poll the job state to monitor the progress of the
transfer.
2) Data transfers as part of a larger workflow. By representing data
transfers as jobs with a status, a workflow engine can treat them as
just another step in the workflow, and monitor the status by examining
the job state.
3) Complex transfers of data in specific formats e.g. importing a
votable into a TAP service, where the recipient is expected to interpret
the data in a specific way. If the job state reports success when all of
the bytes arrive, then we need another mechanism to report a failure to
import invalid or badly formatted data.
4) Limited lifetime on the endpoint URLs. The server may want to limit the
lifetime of the endpoint URLs, allowing it to manage its resources and
close off unused transfers. The job lifetime provides a way of telling
the client about this. If the client completes the negotiation but
doesn't do the PUT, then the endpoint URLs may become invalid when the
lifetime of the job expires. If the client wants more time, then it can
use the UWS protocol to negotiate with the server to extend the lifetime
of the job and hence the endpoint URLs.
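The polling that use cases 1) and 2) rely on could look something like
the Python sketch below. It is only an outline under my own
assumptions: poll_job() and its parameters are hypothetical, and
fetch_phase stands for whatever call the client uses to read the
current phase from the job resource. The set of terminal phases is my
reading of UWS, so check it against the spec.

```python
import time

# Phases after which the job state will not change again (assumed).
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED"}


def poll_job(fetch_phase, interval=1.0, timeout=60.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call fetch_phase() until the job reaches a terminal phase.

    fetch_phase is any zero-argument callable returning the phase as a
    string. Raises TimeoutError if the job is still running once
    `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    while True:
        phase = fetch_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if clock() >= deadline:
            raise TimeoutError(f"job still {phase} after {timeout} seconds")
        sleep(interval)
```

A workflow engine would treat a returned COMPLETED as "step done" and
anything else as a failure to look into. A client that needs longer
than the job lifetime would ask the server for an extension before the
deadline rather than just polling for longer; that request is not
shown here.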
Many service providers will record this kind of information anyway for
performance monitoring and resource allocation. Exposing this
information in the job state provides a mechanism for making it
available to the end user.
The trade-off has always been trying to enable this level of status
reporting and error handling for the more complex use cases while still
making it fairly simple to implement a basic HTTP binary blob GET/PUT.
As Matthew says, a simple client can just follow the redirect, get the
endpoint, push the data and be done. The details in the job status are
there to support more complex clients who want to check if the data
arrived correctly and find out what happened if it didn't.
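For the simple client, the only thing it really needs from the job
document is the endpoint URL. A minimal Python sketch of that step
follows. The sample document, its namespace and its URLs are made up
for illustration and are not the real VOSpace schema; endpoint_urls()
is a hypothetical helper that just looks for elements named
"endpoint", whatever namespace they are in.

```python
import xml.etree.ElementTree as ET

# Made-up document for illustration only, not the real VOSpace schema.
SAMPLE = """<transfer xmlns="urn:example:transfer">
  <direction>pushToVoSpace</direction>
  <protocol uri="urn:example:httpput">
    <endpoint>http://example.org/transfer/abc123</endpoint>
  </protocol>
</transfer>"""


def endpoint_urls(job_xml):
    """Return the text of every element whose local name is 'endpoint'."""
    root = ET.fromstring(job_xml)
    urls = []
    for elem in root.iter():
        # Strip any '{namespace}' prefix from the tag name.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "endpoint" and elem.text:
            urls.append(elem.text.strip())
    return urls
```

The simple client would PUT its data to the first URL returned and
stop there; a more careful client would go on to check the job state
afterwards.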
Hope this helps,