Transfer of large data
Reagan Moore
moore at sdsc.edu
Thu Jul 21 05:41:47 PDT 2005
SDSC uses the Storage Resource Broker (SRB) data grid to move and
manage large amounts of data (on the order of tens of terabytes). We
find that more than just single-file movement is required. The
transport operations used by multiple projects that manipulate
distributed data include:
- parallel I/O streams. These are required to overcome the slow
ramp-up of throughput caused by TCP slow start
- bulk data transport. This is the aggregation of small files (less
than 10 MB in size) before transport to minimize per-file latency
- bulk data registration. This is the registration of entire
directory trees into a data management system
- remote procedures. As pointed out, in many cases the amount of
data that is moved can be reduced by filtering the data at the
source. Examples of remote procedures include Hierarchical Data
Format manipulation, DataCutter filtering, partial file read, and
metadata extraction
- replication. We rely upon multiple copies for disaster recovery,
performance optimization, and high availability
- bulk delete and bulk access control setting.
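
As an illustration of the parallel I/O streams item above, here is a
minimal Python sketch. It is not SRB code: the function names
(split_ranges, copy_range, parallel_copy) are hypothetical, and a
local file copy stands in for the network transport. It only shows the
general idea of splitting one large file into byte ranges and moving
each range on its own stream.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, n_streams):
    """Partition [0, total_size) into contiguous (offset, length) ranges."""
    if total_size <= 0 or n_streams <= 0:
        return []
    chunk = -(-total_size // n_streams)  # ceiling division
    return [(off, min(chunk, total_size - off))
            for off in range(0, total_size, chunk)]


def copy_range(src_path, dst_path, offset, length, block=1 << 20):
    """Copy one byte range; each call stands in for one transport stream."""
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        src.seek(offset)
        dst.seek(offset)
        remaining = length
        while remaining > 0:
            data = src.read(min(block, remaining))
            if not data:
                break
            dst.write(data)
            remaining -= len(data)


def parallel_copy(src_path, dst_path, n_streams=4):
    """Copy src to dst using up to n_streams concurrent range copies."""
    size = os.path.getsize(src_path)
    # Pre-size the destination so every worker can write at its own offset.
    with open(dst_path, "wb") as dst:
        dst.truncate(size)
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        futures = [pool.submit(copy_range, src_path, dst_path, off, length)
                   for off, length in split_ranges(size, n_streams)]
        for future in futures:
            future.result()  # re-raise any error from a worker
```

In a real transfer each range would travel over its own TCP
connection, so that the connections ramp up concurrently instead of
one after another.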
We find that the types of operations needed for manipulation of large
data invariably require bulk operations for each of the commands one
would normally issue on single files.
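
The bulk data transport item above can be sketched in the same way.
This is an assumption-laden illustration, not the SRB implementation:
bundle_small_files is a hypothetical helper, a tar archive stands in
for whatever container the data grid uses, and the 10 MB threshold is
taken from the list above.

```python
import os
import tarfile

SMALL_FILE_LIMIT = 10 * 1024 * 1024  # 10 MB, the threshold quoted above


def bundle_small_files(root, archive_path, limit=SMALL_FILE_LIMIT):
    """Pack every file under root smaller than limit into one tar archive.

    Returns (bundled, skipped) lists of paths relative to root.
    archive_path should live outside root so it is not walked itself.
    """
    bundled, skipped = [], []
    with tarfile.open(archive_path, "w") as tar:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                if os.path.getsize(full) < limit:
                    # Small file: add it to the bundle under its relative path.
                    tar.add(full, arcname=rel)
                    bundled.append(rel)
                else:
                    # Large file: leave it for an individual transfer.
                    skipped.append(rel)
    return bundled, skipped
```

The single archive is then moved as one transfer, and the files in the
skipped list are sent individually.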
Projects using the SRB technology at the moment include
- movement of images from Chile to the US (400,000 images, 3.3 TB moved)
- replication of image surveys onto the NSF Teragrid (10 million
images, 25 TB moved)
Reagan Moore
>Dear all,
>
>Have you experienced large data transfers (e.g. >100MB) between
>distributed VO machines? Would you share your practical solution?
>
>Our JVO team has built a SkyNode server which receives a query and
>returns a result VOTable through Web Services, using Apache Axis.
>Then we encountered a performance problem in large data transfer
>using SOAP messages: a large amount of memory is allocated to hold
>the entire data as Java objects. It slows processing speed, too.
>Sending large data through SOAP messages seems unrealistic.
>
>So we are considering SkyNodes pass result VOTable using:
> 1. Staging (Result data are separately transferred through HTTP or
> FTP, like SIAP)
> 2. Attachment
>Both will need some extension to SkyNode protocol.
>
>Masahiro Tanaka