VOSpace service-initiated transfers

Matthew Graham mjg at cd3.caltech.edu
Thu Apr 21 20:03:12 CEST 2016


Hi Brian,

You make good points, but I don't think these are issues. My VOSpace implementations have always done the server-to-server transfer, and we need it in order to have interoperable VOSpace services without any third-party byte transfer. Data transfers are modelled as UWS jobs partly to support asynchronous tasks such as a server-to-server transfer. The transfer protocol negotiation likewise ensures that a VOSpace service does not need to be a client for all protocols, only for those the implementors actually want to support. Server-to-server transfer was one of the requirements that took us away from simpler protocols, such as WebDAV, in the first place. As more VOSpace services appear and people start needing to move their data between them, this aspect will see more use.

	Cheers,

	Matthew

On Apr 21, 2016, at 10:46 AM, Brian Major wrote:

> Hi grid,
> 
> I'm looking for some feedback concerning the description of how service-initiated transfers are performed.
> 
> The VOSpace specification defines four modes for the arrangement of data transfer:
> 
> pushToVoSpace (optional)
> pullFromVoSpace (required)
> 
> pullToVoSpace (optional)
> pushFromVoSpace (optional)
> 
> The first two are "client-initiated" transfers and are fairly simple:
>     1)  A client posts a transfer document to the service with the details of the transfer request.
>     2)  The service responds with transfer endpoints.
>     3)  The client performs byte transfer on the endpoints (this is outside of the VOSpace specification).
> 
> The second two are "service-initiated".  pullToVoSpace is where one VOSpace instance downloads a node from another VOSpace instance.  pushFromVoSpace is where one VOSpace instance uploads a node to another VOSpace instance.
> 
> The 2.0 spec (and the 2.1 WD currently) define the procedure for service-initiated transfers roughly as this:
>     1)  A client posts a transfer document to service A with the details of the transfer request.
>     2)  Service A receives the request and negotiates with service B to obtain concrete endpoints for data transfer.
>     3)  Service A then transfers the bytes to (pushFromVoSpace) or from (pullToVoSpace) Service B.
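The three steps above can be sketched as a toy in-memory model for the pullToVoSpace case. Everything here is an assumption made for illustration: the `Service` class, its method names and the `mem://` endpoint scheme are hypothetical and are not taken from the VOSpace specification or from any implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Toy stand-in for a VOSpace service (hypothetical, not a real API)."""
    name: str
    nodes: dict = field(default_factory=dict)

    def negotiate_download(self, node: str) -> str:
        # Step 2, remote side: return a concrete endpoint for the node.
        if node not in self.nodes:
            raise KeyError(f"{self.name} has no node {node!r}")
        return f"mem://{self.name}/{node}"

    def read_endpoint(self, endpoint: str) -> bytes:
        # Resolve the toy endpoint back to the stored bytes.
        node = endpoint.rsplit("/", 1)[1]
        return self.nodes[node]

    def pull_to_vospace(self, remote: "Service", node: str) -> None:
        # Step 1: the client's transfer request arrives at this service (A).
        # Step 2: A negotiates with the remote service (B) for an endpoint.
        endpoint = remote.negotiate_download(node)
        # Step 3: A performs the byte transfer itself.
        self.nodes[node] = remote.read_endpoint(endpoint)


service_a = Service("A")
service_b = Service("B", {"image.fits": b"\x00\x01"})
service_a.pull_to_vospace(service_b, "image.fits")
```

In this model service A acts as a client of service B in steps 2 and 3, which is the responsibility the rest of this message questions.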
> 
> Firstly, has anyone implemented server-to-server transfers in their VOSpace?
> 
> I think there may be some weaknesses with this approach.
> 
> Having the services themselves perform the byte transfer (step 3) could be problematic.  It requires the service to become fully qualified at that task.  wget, curl, ftp, etc. are all fairly complex clients that operate over their protocol and do things such as retry on failure, resume byte transfer if disconnected, set headers correctly, interact with caching proxies, work over secure connections, and so on.  I guess most languages have libraries that perform these types of network tasks well, so maybe that's not a problem.
> 
> Having the services do the byte transfer does mean that both service A and B need to speak the same protocol.  If one only supports FTP and one only supports HTTP/S then the negotiation will fail.
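A minimal sketch of that failure case, assuming negotiation amounts to picking the first offered protocol that the other side also supports. The function name and the short protocol labels are made up for the example; real services identify protocols by URI.

```python
def negotiate(offered, supported):
    """Return the first protocol in `offered` that is also in `supported`.

    `offered` is in order of preference. Raises ValueError when the two
    sides have no protocol in common.
    """
    for protocol in offered:
        if protocol in supported:
            return protocol
    raise ValueError("no common transfer protocol")


# Both sides can do HTTPS, so negotiation succeeds.
chosen = negotiate(["ftp", "https"], {"https", "http"})

# One side only speaks FTP, the other only HTTP/S: negotiation fails.
try:
    negotiate(["ftp"], {"http", "https"})
except ValueError as exc:
    failure = str(exc)
```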
> 
> With client-initiated transfers, I like the fact that the byte transfer handling is not the responsibility of VOSpace.
> 
> Another complexity is in step 2--the negotiation.  This requires the service to have the full functionality of a VOSpace client and to make decisions on behalf of the client using only the transfer document it received.
> 
> Lastly, it would require VOSpace services to have the resources and ability to execute long-running jobs for the byte transfer.  To me, this sounds like more than a simple UWS job implementation running on web server threads could handle.
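As a rough illustration of what running the byte transfer as a job involves, here is a sketch that executes the transfer in a background thread and tracks a subset of the UWS execution phases. The `TransferJob` class is hypothetical. A real service would probably also need a persistent queue, abort handling and some way to resume, none of which is shown.

```python
import threading
from enum import Enum


class Phase(Enum):
    # Subset of the UWS execution phases.
    PENDING = "PENDING"
    QUEUED = "QUEUED"
    EXECUTING = "EXECUTING"
    COMPLETED = "COMPLETED"
    ERROR = "ERROR"


class TransferJob:
    """Hypothetical job wrapper: runs `work` off the request thread."""

    def __init__(self, work):
        self.work = work          # callable that performs the byte transfer
        self.phase = Phase.PENDING
        self.error = None
        self._thread = None

    def start(self):
        # Return immediately; the transfer continues in the background.
        self.phase = Phase.QUEUED
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        self.phase = Phase.EXECUTING
        try:
            self.work()
            self.phase = Phase.COMPLETED
        except Exception as exc:
            # Record the failure so a client polling the job can see it.
            self.error = str(exc)
            self.phase = Phase.ERROR

    def wait(self, timeout=None):
        # Block until the job finishes (or the timeout expires).
        self._thread.join(timeout)
        return self.phase
```

The thread here is only a stand-in for whatever worker pool or batch system a service would actually use.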
> 
> So, I'm not sure how to address these issues (if there is agreement, of course, that they are issues), but here are some options I can think of:
> 
> Option 1: Have the client do the negotiation and be the moderator between the services.  For example, a pullToVoSpace would look like this:
>     1)  A client posts a transfer document to service B requesting download endpoints.
>     2)  Service B responds to the client.
>     3)  The client posts a transfer document to service A, asking service A to download the file from service B using the provided endpoints.
>     4)  Service A downloads the file from service B.
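Option 1 could be sketched like this, with each service modelled as a plain dict of node name to bytes. The function, the tuple used as an "endpoint" and the log messages are all invented for the example.

```python
def option1_pull(service_a: dict, service_b: dict, node: str) -> list:
    """Client moderates the negotiation; service A still moves the bytes."""
    log = []

    # Steps 1-2: the client asks B for a download endpoint and B responds.
    if node not in service_b:
        raise KeyError(node)
    endpoint = ("B", node)  # toy endpoint: (service name, node name)
    log.append("client -> B: request download endpoint")

    # Step 3: the client passes that endpoint to A in a transfer document.
    log.append("client -> A: pull from provided endpoint")

    # Step 4: A downloads directly from B, a single hop for the bytes.
    service_a[node] = service_b[endpoint[1]]
    log.append("B -> A: bytes")
    return log


service_a, service_b = {}, {"image.fits": b"data"}
steps = option1_pull(service_a, service_b, "image.fits")
```

Service A no longer needs to negotiate with B, but it still has to perform the download in step 4.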
> 
> Option 2: Have the client do the negotiation and byte transfer.  Another pullToVoSpace example:
>     1)  A client posts a transfer document to service B requesting download endpoints.
>     2)  Service B responds to the client.
>     3)  The client posts a transfer document to service A requesting upload endpoints.
>     4)  Service A returns the endpoints to the client.
>     5)  The client downloads the file from B then uploads it to A.
> 
> Of course, option 2 is not ideal from a network point of view because there are two hops.  However, it would be more likely that a client supports more protocols than a service and thus could mix protocols between services.  (For example, the client could download over FTP and then upload over HTTP.)
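The protocol-mixing point can be sketched as follows, assuming each service advertises a list of protocols and the client supports a superset. The dict layout and the `relay` function are hypothetical.

```python
def relay(node, source, dest, client_protocols):
    """Client downloads from `source` and uploads to `dest` (two hops).

    Returns the (download, upload) protocols used, which may differ.
    """
    # The client picks, per service, the first protocol it can also speak.
    down = next((p for p in source["protocols"] if p in client_protocols), None)
    up = next((p for p in dest["protocols"] if p in client_protocols), None)
    if down is None or up is None:
        raise ValueError("client shares no protocol with one of the services")

    data = source["nodes"][node]   # hop 1: service B -> client
    dest["nodes"][node] = data     # hop 2: client -> service A
    return down, up


service_b = {"protocols": ["ftp"], "nodes": {"image.fits": b"data"}}
service_a = {"protocols": ["http"], "nodes": {}}
used = relay("image.fits", service_b, service_a, {"ftp", "http"})
```

Here the two services share no protocol, so a direct transfer between them would fail to negotiate, while the relay through the client succeeds at the cost of the extra hop.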
> 
> Any other options?  Your comments and opinions are most welcome.
> 
> Cheers,
> Brian