Transfer of large data

Reagan Moore moore at sdsc.edu
Thu Jul 21 05:41:47 PDT 2005


SDSC uses the Storage Resource Broker data grid to move and manage 
large amounts of data (on the order of tens of terabytes).  We find 
that more than just single file movement is required.  The transport 
operations that are used by multiple projects manipulating 
distributed data include:
- parallel I/O streams.  These are required to overcome the 
slow-start behavior of TCP
- bulk data transport.  This is the aggregation of small files (less 
than 10 MB in size) before transport to minimize per-file latency
- bulk data registration.  This is the registration of entire 
directory trees into a data management system
- remote procedures.  As pointed out, in many cases the amount of 
data that is moved can be reduced by filtering the data at the 
source.  Examples of remote procedures include Hierarchical Data 
Format manipulation, DataCutter filtering, partial file read, and 
metadata extraction
- replication.  We rely upon multiple copies for disaster recovery, 
performance optimization, and high availability.
- bulk delete and bulk access control setting.
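
To illustrate the first item, here is a minimal Python sketch of the 
parallel-stream idea.  It is not how SRB implements it: the function 
names (split_ranges, copy_range, parallel_copy) are hypothetical, and 
local file copies stand in for network streams.  The point is only 
that a file is divided into byte ranges and each range is moved by an 
independent worker.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, streams):
    """Return (offset, length) pairs covering [0, size) in at most `streams` pieces."""
    if size == 0:
        return []
    chunk = -(-size // streams)  # ceiling division
    return [(off, min(chunk, size - off)) for off in range(0, size, chunk)]


def copy_range(src, dst, offset, length, bufsize=1 << 20):
    """Copy one byte range; each call stands in for one independent stream."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        remaining = length
        while remaining > 0:
            data = fin.read(min(bufsize, remaining))
            if not data:
                break
            fout.write(data)
            remaining -= len(data)


def parallel_copy(src, dst, streams=4):
    """Copy src to dst using up to `streams` concurrent range copies."""
    size = os.path.getsize(src)
    with open(dst, "wb") as f:
        f.truncate(size)  # preallocate so each worker can write at its own offset
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [pool.submit(copy_range, src, dst, off, n)
                   for off, n in split_ranges(size, streams)]
        for fut in futures:
            fut.result()  # re-raise any error from a worker
```

In a real transfer each range would travel over its own TCP 
connection, so that no single connection's ramp-up limits the whole 
file.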

We find that the types of operations needed for manipulation of large 
data invariably require bulk operations for each of the commands one 
would normally issue on single files.
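
As one example of such a bulk operation, the aggregation of small 
files can be sketched in Python as below.  This is an illustration 
under stated assumptions, not the SRB mechanism: bundle_small_files 
is a hypothetical helper, a tar archive is assumed as the container 
format, and the 10 MB limit is the threshold mentioned above.

```python
import os
import tarfile

SMALL_FILE_LIMIT = 10 * 1024 * 1024  # 10 MB, the threshold mentioned in the text


def bundle_small_files(root, archive_path, limit=SMALL_FILE_LIMIT):
    """Pack every file under `root` smaller than `limit` into one tar archive.

    Returns (bundled, large): relative paths that went into the archive,
    and relative paths left over for individual transfer.
    """
    bundled, large = [], []
    archive_abs = os.path.abspath(archive_path)
    with tarfile.open(archive_path, "w") as tar:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # deterministic traversal order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                if os.path.abspath(full) == archive_abs:
                    continue  # never add the archive to itself
                rel = os.path.relpath(full, root)
                if os.path.getsize(full) < limit:
                    tar.add(full, arcname=rel)
                    bundled.append(rel)
                else:
                    large.append(rel)
    return bundled, large
```

The single archive is then moved as one large file, and the files in 
the second list are sent individually.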

Projects using the SRB technology at the moment include
- movement of images from Chile to the US (400,000 images, 3.3 TB moved)
- replication of image surveys onto the NSF Teragrid (10 million 
images, 25 TB moved)

Reagan Moore



>Dear all,
>
>Have you experienced large data transfers (e.g. >100MB) between
>distributed VO machines?  Would you share your practical solution?
>
>Our JVO team has built a SkyNode server which receives a query and
>returns a result VOTable through Web Services, using Apache Axis.
>Then we encountered a performance problem in large data transfer
>using SOAP messages: a large amount of memory is allocated to hold
>the entire data as Java objects. It slows processing speed, too.
>Sending large data through SOAP messages seems unrealistic.
>
>So we are considering having SkyNodes pass the result VOTable using:
>  1. Staging (Result data are separately transferred through HTTP or
>	     FTP, like SIAP)
>  2. Attachment
>Both will need some extension to SkyNode protocol.
>
>Masahiro Tanaka


