Transfer of large data

Jean-Yves Nief nief at cc.in2p3.fr
Mon Aug 15 08:19:03 PDT 2005


hello Masahiro,

I would recommend neither HTTP nor FTP for large data transfers.
Multi-stream transfer tools are important for this, as Reagan pointed out.
We have been using tools like bbftp (a multi-stream ftp; gridFTP is 
another such tool) for several years, and now SRB, in order to move 
hundreds of terabytes of data over a high-latency network between the 
Stanford Linear Accelerator Center (California) and CC-IN2P3 (a 
computing centre in Lyon, France) for a High Energy Physics experiment 
called BaBar. In the case of SRB (used in production mode for more 
than a year), this represents roughly 130 TB shipped from the USA to 
France (> 200,000 files).
Scalability is another potential issue: using the method above, we 
could easily import much more data if needed (indeed, the network 
bandwidth could easily be saturated).
In all cases, keeping metadata and data separate from one another 
makes things easier to handle. In most systems, SRB included, you 
store your metadata in a database, well separated from your data 
files. That also makes searching for a set of data files based on a 
given set of metadata values easier and much more efficient. And the 
extra effort needed to keep data and metadata in sync is not that 
expensive.
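As a toy illustration of that separation, here is a minimal metadata catalogue in the spirit of SRB's approach (the table layout, column names, and locations are invented for the example): the database knows about the files and their attributes, while the bulk data lives elsewhere.

```python
import sqlite3

# Metadata lives in a small database; the data files themselves
# may be on disk, on tape, or at a remote site.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE files (
    logical_name TEXT PRIMARY KEY,
    location     TEXT,     -- where the data actually sits
    run_number   INTEGER,  -- example physics metadata
    size_bytes   INTEGER)""")
conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", [
    ("run12/evts.root", "tape://hpss/babar/run12", 12, 2_000_000_000),
    ("run13/evts.root", "disk://cc-in2p3/babar/run13", 13, 1_500_000_000),
])

# The search touches only the metadata -- no data file is opened,
# which matters when some of them sit on slow tape systems.
rows = conn.execute(
    "SELECT logical_name, location FROM files WHERE run_number = ?", (13,)
).fetchall()
```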
On top of that, the data can reside on any kind of storage device. 
When you have a huge amount of data to handle (that is true for HEP, 
and it seems to be becoming true for astrophysics as well), it can be 
stored on disk but also on various tape storage systems: the latter 
are not very convenient, but I don't see how one can avoid them in 
any project handling large amounts of data.
Any solution chosen for the transfer protocol should give you the 
possibility to plug in any kind of storage system you like in order 
to deposit your data.
And here is another reason to keep metadata separate from data: not 
all data files are necessarily quickly accessible, which can make 
searches extremely inefficient.
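The "pluggable storage" requirement above amounts to programming against a small backend interface rather than a particular device. A minimal sketch (the interface and class names are my own invention, not SRB's API):

```python
import abc

class Store(abc.ABC):
    """Minimal backend interface: anything that can save and fetch a
    named blob can sit behind the transfer layer."""
    @abc.abstractmethod
    def put(self, name, data): ...
    @abc.abstractmethod
    def get(self, name): ...

class MemoryStore(Store):
    # Stand-in for a disk pool; a tape or mass-storage backend would
    # implement the same two methods (possibly much more slowly).
    def __init__(self):
        self._blobs = {}
    def put(self, name, data):
        self._blobs[name] = bytes(data)
    def get(self, name):
        return self._blobs[name]

def deposit(store: Store, name, data):
    """The transfer layer only talks to the interface, so backends can
    be swapped without touching the protocol. Returns True if the
    write-then-read-back verification succeeds."""
    store.put(name, data)
    return store.get(name) == data
```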
cheers,
JY

Masahiro TANAKA wrote:

>Dear all,
>
>Have you experienced large data transfers (e.g. >100MB) between
>distributed VO machines?  Would you share your practical solutions?
>
>Our JVO team has built a SkyNode server which receives a query and
>returns a result VOTable through Web Services, using Apache Axis.
>We then encountered a performance problem with large data transfers
>using SOAP messages: a large amount of memory is allocated to hold
>the entire data set as Java objects, which slows processing too.
>Sending large data through SOAP messages seems unrealistic.
>
>So we are considering having SkyNodes pass the result VOTable using:
> 1. Staging (result data are transferred separately through HTTP or
>	     FTP, as in SIAP)
> 2. Attachments
>Both will require some extension to the SkyNode protocol.
>
>Masahiro Tanaka
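The staging option Masahiro describes — the service reply carries only a small envelope with a reference, and the bulk VOTable is fetched separately over plain HTTP/FTP — can be sketched like this. Everything here is hypothetical (the `STAGING` dict stands in for a web-accessible staging area, and the URL format is invented); it only illustrates the pattern, not the SkyNode protocol itself.

```python
STAGING = {}  # stand-in for a web-accessible staging directory

def run_query(query, votable_bytes):
    """Server side: stage the big result, reply with a tiny envelope.
    The SOAP response stays small regardless of the result size."""
    url = "http://example.org/staging/%d.vot" % abs(hash(query))
    STAGING[url] = votable_bytes
    return {"status": "OK", "result_url": url}

def fetch_result(reply):
    """Client side: follow the reference and pull raw bytes, instead
    of materialising the whole table as in-memory SOAP objects."""
    return STAGING[reply["result_url"]]
```

The point is that memory use on both ends is bounded by the envelope, not by the result table: the bulk bytes move over a channel that can be streamed (and, per the discussion above, multi-streamed).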



More information about the dal mailing list