VOSpace service-initiated transfers

Brian Major major.brian at gmail.com
Thu Apr 21 19:46:07 CEST 2016


Hi grid,

I'm looking for some feedback concerning the description of how
service-initiated transfers are performed.

The VOSpace specification defines four modes for the arrangement of data
transfer:

pushToVoSpace (optional)
pullFromVoSpace (required)

pullToVoSpace (optional)
pushFromVoSpace (optional)

The first two are "client-initiated" transfers and are fairly simple:
    1)  A client posts a transfer document to the service with the details
of the transfer request.
    2)  The service responds with transfer endpoints.
    3)  The client performs byte transfer on the endpoints (this is outside
of the VOSpace specification).
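
As a rough illustration of step 1, the transfer document a client posts
could be built like this.  This is only a sketch: the namespace, element
names and protocol URI are written from memory of the 2.0 schema and should
be checked against the spec, and the node URI and the function name
build_transfer_document are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumed VOSpace 2.0 namespace; verify against the schema you target.
VOS_NS = "http://www.ivoa.net/xml/VOSpace/v2.0"


def build_transfer_document(target, direction, protocol_uris):
    """Return a transfer document as an XML string (illustrative helper)."""
    ET.register_namespace("vos", VOS_NS)
    root = ET.Element(f"{{{VOS_NS}}}transfer")
    ET.SubElement(root, f"{{{VOS_NS}}}target").text = target
    ET.SubElement(root, f"{{{VOS_NS}}}direction").text = direction
    # One protocol element per protocol the client is willing to use.
    for uri in protocol_uris:
        ET.SubElement(root, f"{{{VOS_NS}}}protocol", {"uri": uri})
    return ET.tostring(root, encoding="unicode")


doc = build_transfer_document(
    "vos://example.org!vospace/mydata/image.fits",  # hypothetical node URI
    "pullFromVoSpace",
    ["ivo://ivoa.net/vospace/core#httpget"],        # assumed protocol URI
)
print(doc)
```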

The last two are "service-initiated".  pullToVoSpace is where one VOSpace
instance downloads a node from another VOSpace instance.  pushFromVoSpace is
where one VOSpace instance uploads a node to another VOSpace instance.

The 2.0 spec (and currently the 2.1 WD) defines the procedure for
service-initiated transfers roughly as follows:
    1)  A client posts a transfer document to service A with the details of
the transfer request.
    2)  Service A receives the request and negotiates with service B to
receive concrete endpoints for data transfer.
    3)  Service A then transfers the bytes to (pushFromVoSpace) or from
(pullToVoSpace) Service B.

Firstly, has anyone implemented server-to-server transfers in their VOSpace?

I think there may be some weaknesses with this approach.

Having the services themselves perform the byte transfer (step 3) could be
problematic.  It requires the service to become fully qualified at that
task.  wget, curl, ftp, and similar tools are all fairly complex clients
that do things such as retry on failure, resume byte transfer if
disconnected, set headers correctly, interact with caching proxies, work
over secure connections, and so on.  I guess most languages have libraries
that perform these types of network tasks well, so maybe that's not a
problem.

Having the services do the byte transfer does mean that both service A and
B need to speak the same protocol.  If one only supports FTP and the other
only supports HTTP/S then the negotiation will fail.
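
To make the failure case concrete, here is a minimal sketch of the protocol
matching that the negotiation boils down to.  The function name
common_protocols and the short protocol labels are invented for the example;
real services would compare protocol URIs from the transfer document.

```python
def common_protocols(offered_by_a, offered_by_b):
    """Protocols both services support, in service A's order of preference."""
    supported_by_b = set(offered_by_b)
    return [p for p in offered_by_a if p in supported_by_b]


service_a = ["ftp"]             # A only speaks FTP
service_b = ["http", "https"]   # B only speaks HTTP/S

shared = common_protocols(service_a, service_b)
if not shared:
    print("negotiation fails: no shared protocol")
else:
    print("transfer can use:", shared[0])
```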

With client-initiated transfers, I like the fact that the byte transfer
handling is not the responsibility of VOSpace.

Another complexity is in step 2, the negotiation.  This requires the
service to have the full functionality of a VOSpace client and to make
decisions on behalf of the client using only the transfer document it
received.

Lastly, it would require VOSpace services to have the resources and ability
to execute long-running jobs for the byte transfer.  To me, this sounds
like more than a simple UWS job implementation running on web server
threads could handle.

So, I'm not sure how to address these issues (if there is agreement, of
course, that they are issues), but here are some options I can think of:

Option 1: Have the client do the negotiation and be the moderator between
the services.  For example, a pullToVoSpace would look like this:
    1)  A client posts a transfer document to service B requesting download
endpoints.
    2)  Service B responds to the client.
    3)  The client posts a transfer document to service A, asking service A
to download the file from service B using the provided endpoints.
    4)  Service A downloads the file from service B.
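
The four steps of option 1 can be sketched with in-memory stand-ins for the
two services.  Everything here is hypothetical (the FakeService class, its
method names, and the tuple used in place of an endpoint URL); it only shows
who talks to whom, not a real VOSpace API.

```python
class FakeService:
    """Hypothetical in-memory stand-in for a VOSpace service."""

    def __init__(self, name):
        self.name = name
        self.nodes = {}  # node name -> bytes

    def negotiate_download(self, node):
        """Steps 1-2: answer a transfer request with a download endpoint."""
        if node not in self.nodes:
            raise KeyError(f"{self.name} has no node {node!r}")
        return (self, node)  # stands in for an endpoint URL

    def pull_from_endpoint(self, node, endpoint):
        """Steps 3-4: fetch the bytes from the given endpoint and store them."""
        source, source_node = endpoint
        self.nodes[node] = source.nodes[source_node]


def client_moderated_pull(service_a, service_b, node):
    """The client negotiates with B, then hands the endpoint to A."""
    endpoint = service_b.negotiate_download(node)   # client <-> B
    service_a.pull_from_endpoint(node, endpoint)    # client -> A, then A <- B


a, b = FakeService("A"), FakeService("B")
b.nodes["image.fits"] = b"example bytes"
client_moderated_pull(a, b, "image.fits")
```

Note that service A never acts as a VOSpace client here; it only needs to be
able to fetch bytes from an endpoint it is given.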

Option 2: Have the client do the negotiation and byte transfer.  Another
pullToVoSpace example:
    1)  A client posts a transfer document to service B requesting download
endpoints.
    2)  Service B responds to the client.
    3)  The client posts a transfer document to service A requesting upload
endpoints.
    4)  Service A returns the endpoints to the client.
    5)  The client downloads the file from B then uploads it to A.

Of course, option 2 is not ideal from a network point of view because there
are two hops.  However, a client is likely to support more protocols than a
service and thus could mix protocols between services.  (For example, the
client could download over FTP and then upload over HTTP.)
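
A small sketch of that protocol mixing: in option 2 the client picks the
download and upload protocols independently, so the two services never need
a protocol in common.  The function name plan_two_hop and the protocol
labels are invented for the example.

```python
def plan_two_hop(client_protocols, b_download_protocols, a_upload_protocols):
    """Pick one protocol per hop, or return None if either hop is impossible.

    Each hop only needs the client to share a protocol with that one
    service; services A and B are never compared with each other.
    """
    down = next((p for p in b_download_protocols if p in client_protocols), None)
    up = next((p for p in a_upload_protocols if p in client_protocols), None)
    if down is None or up is None:
        return None
    return (down, up)


client = {"ftp", "http", "https"}
plan = plan_two_hop(client, ["ftp"], ["http"])
print(plan)  # download from B over FTP, upload to A over HTTP
```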

Any other options?  Your comments and opinions are most welcome.

Cheers,
Brian