Transfer of large data
Jean-Yves Nief
nief at cc.in2p3.fr
Mon Aug 15 08:19:03 PDT 2005
hello Masahiro,
I would recommend neither HTTP nor FTP for large data
transfer.
Multi-stream transfer tools are important for this, as Reagan pointed out.
For several years we have been using tools like bbftp (a multi-stream
ftp; gridFTP is another such tool), and now SRB, to move hundreds of
terabytes of data over a high-latency network between the Stanford
Linear Accelerator Center (California) and CC-IN2P3 (a computing centre
in Lyon, France) for a High Energy Physics experiment called BaBar. In
the case of SRB (used in production mode for more than a year), this
represents roughly 130 TB shipped from the USA to France (> 200,000 files).
Scalability is another potential issue: with the method above, we could
easily import much more data if needed (although the network bandwidth
could easily be saturated).
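To illustrate the multi-stream idea behind tools like bbftp and gridFTP, here is a minimal sketch: split a file into byte ranges and fetch them over several concurrent streams, so a single TCP stream on a high-latency link does not limit throughput. The `fetch_range` function is a stand-in that reads from an in-memory buffer; a real client would issue an HTTP Range request or a parallel-FTP retrieve instead. All names here are illustrative, not part of any real transfer tool's API.

```python
# Sketch of multi-stream transfer: fetch byte ranges in parallel,
# then reassemble them in offset order.
from concurrent.futures import ThreadPoolExecutor

REMOTE_DATA = bytes(range(256)) * 4000  # pretend this lives on the remote server

def byte_ranges(size, n_streams):
    """Split [0, size) into n_streams contiguous (start, end) ranges."""
    chunk = (size + n_streams - 1) // n_streams
    return [(start, min(start + chunk, size)) for start in range(0, size, chunk)]

def fetch_range(start, end):
    """Stand-in for one transfer stream (e.g. one HTTP Range request)."""
    return start, REMOTE_DATA[start:end]

def parallel_fetch(size, n_streams=8):
    """Fetch all ranges concurrently and reassemble the file."""
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        parts = pool.map(lambda r: fetch_range(*r), byte_ranges(size, n_streams))
    # Sort by offset: streams may complete in any order.
    return b"".join(data for _, data in sorted(parts))
```

The number of streams is a tuning knob: on a high-latency link, more streams keep more data in flight, up to the point where the link itself saturates.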
In all cases, keeping metadata and data separated from one another
makes things easier to handle. In most systems, SRB included, you store
your metadata in a database, which is then well separated from your
data files. It also makes searching for a set of data files, based on a
given set of metadata values, easier and much more efficient. And the
extra effort needed to keep data and metadata in sync is not that
expensive.
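A minimal sketch of the "metadata in a database, data elsewhere" idea: the catalogue (SQLite here) records where each file lives along with its attributes, so a search touches only the database and never the data files themselves. The table layout, column names, and `hpss://` URLs are purely illustrative, not SRB's actual schema.

```python
# Metadata catalogue kept separate from the data files it describes.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE catalogue (
    logical_name TEXT PRIMARY KEY,
    storage_url  TEXT,     -- where the bytes actually live (disk, tape, ...)
    run_number   INTEGER,  -- example experiment metadata
    size_bytes   INTEGER)""")

db.executemany(
    "INSERT INTO catalogue VALUES (?, ?, ?, ?)",
    [("run1001.root", "hpss://cc.in2p3.fr/babar/run1001.root", 1001, 2_000_000_000),
     ("run1002.root", "hpss://cc.in2p3.fr/babar/run1002.root", 1002, 1_500_000_000)])

# Find the files for a run range without opening a single data file.
rows = db.execute(
    "SELECT logical_name, storage_url FROM catalogue WHERE run_number >= ?",
    (1002,)).fetchall()
```

The query above returns only names and locations; the (possibly tape-resident) files are staged in afterwards, and only the ones that actually matched.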
On top of that, the data can reside on any kind of storage device.
When you have a huge amount of data to handle (true for HEP, and it
seems to be becoming true for astrophysics as well), it can be stored
on disk but also on various tape storage systems: the latter are not
very convenient, but I don't see how one can avoid them in any project
handling large amounts of data.
Whatever transfer protocol you choose should let you plug in any kind
of storage system you like to deposit your data.
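One way to read "plug in any kind of storage system" is the sketch below: the transfer layer talks to an abstract backend, and disk, tape, or anything else implements the same two calls. This is purely illustrative; no real SRB or gridFTP interface is being shown, and `DiskBackend` uses an in-memory dict as a stand-in for a filesystem.

```python
# Pluggable storage backends behind a common interface.
from abc import ABC, abstractmethod

class StorageBackend(ABC):
    @abstractmethod
    def get(self, name: str) -> bytes: ...
    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

class DiskBackend(StorageBackend):
    def __init__(self):
        self._store = {}  # in-memory stand-in for a real filesystem

    def get(self, name):
        return self._store[name]

    def put(self, name, data):
        self._store[name] = data

def transfer(src: StorageBackend, dst: StorageBackend, name: str):
    """The transfer code never needs to know which backend it talks to."""
    dst.put(name, src.get(name))
```

Adding a tape backend then means writing one more subclass, with the transfer code untouched.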
And here is another reason to keep metadata separated from data: not
all data files are quickly accessible, which can make searches over the
data themselves extremely inefficient.
cheers,
JY
Masahiro TANAKA wrote:
>Dear all,
>
>Have you experienced large data transfers (e.g. >100 MB) between
>distributed VO machines? Could you share your practical solutions?
>
>Our JVO team has built a SkyNode server which receives a query and
>returns a result VOTable through Web Services, using Apache Axis.
>We then encountered a performance problem with large data transfers
>using SOAP messages: a large amount of memory is allocated to hold
>the entire data set as Java objects, which also slows processing.
>Sending large data through SOAP messages seems unrealistic.
>
>So we are considering having SkyNodes pass the result VOTable via:
> 1. Staging (Result data are separately transferred through HTTP or
> FTP, like SIAP)
> 2. Attachment
>Both will need some extension to SkyNode protocol.
>
>Masahiro Tanaka
>
>