Transfer of large data

Andreas Wicenec awicenec at eso.org
Thu Jul 21 08:30:29 PDT 2005


On 21.07.2005, at 15:48, Roy Williams wrote:

> The key to efficient transfer of big data is to separate (XML) 
> metadata from (binary) data. The metadata can contain pointers to the 
> data (http:// or srb:// or gridftp://).
>
> When I buy a single brick at my local hardware store, I take it with 
> me to the cashier and deal with metadata (payment) and data (the 
> brick) together. But when I buy 1,000 bricks, it is different. I pay 
> the cashier and receive a piece of paper (the pointer), then I take 
> the paper somewhere else to load the bricks into my truck.
>
> As Andreas points out, VOTable was built in this way, to represent 
> table metadata, with pointers to binary or FITS data elsewhere. It is 
> not advised to use the TR,TD mechanism of VOTable to represent large 
> datasets. In the same way, the VOStore specification is being built 
> with the possibility of splitting metadata from data.
>
> Splitting metadata from data is more efficient.
> But it requires more effort to make sure the two are properly 
> synchronized.

That's why we chose to keep the two things together in one file (or 
rather transfer block) containing both the header (VOTable) and the binary data. The
references in the VOTable therefore use what is called a Content-ID 
(see http://www.ietf.org/rfc/rfc2111.txt), a construction
that many MIME handlers understand. This way we have a self-describing 
file containing both a valid XML file and plain binary data. This is 
much like FITS, but we are using well-known standards from the e-mail 
world. In fact, such a file can be opened and interpreted using a 
standard e-mail client. The VOTable itself contains only
the resource and field descriptions; all the data is in binary 
attachments. We actually wanted even a bit more flexibility 
and created VOTables where single fields refer to 
multi-dimensional binary arrays while other fields are still given as TD 
elements. Since this is not covered by the current version of the 
VOTable standard, we are not using this externally, though.
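A minimal sketch of such a transfer block follows, written with Python's standard `email` package. It assumes a `multipart/related` message whose first part is the VOTable header and whose second part holds the table rows as big-endian doubles. The VOTable points at that part through a `cid:` URL (RFC 2111) in a `STREAM href`. The Content-ID value, the field names and the helpers `build_transfer_block` and `read_transfer_block` are hypothetical illustrations, not the actual format used at ESO.

```python
import struct
from email import message_from_bytes, policy
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical Content-ID for the binary attachment; any unique value works.
DATA_CID = "table-data@example.org"

# VOTable header: only resource and field descriptions. The data stream
# refers to the binary MIME part through a cid: URL (RFC 2111).
VOTABLE = f"""<?xml version="1.0"?>
<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE>
      <FIELD name="ra" datatype="double"/>
      <FIELD name="dec" datatype="double"/>
      <DATA>
        <BINARY>
          <STREAM href="cid:{DATA_CID}"/>
        </BINARY>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def build_transfer_block(rows):
    """Pack the VOTable header and big-endian binary rows into one MIME message."""
    msg = MIMEMultipart("related", type="text/xml")
    msg.attach(MIMEText(VOTABLE, "xml"))
    # Each row is two 8-byte big-endian doubles (ra, dec).
    payload = b"".join(struct.pack(">dd", ra, dec) for ra, dec in rows)
    data = MIMEApplication(payload, "octet-stream")
    data["Content-ID"] = f"<{DATA_CID}>"
    msg.attach(data)
    return msg.as_bytes()


def read_transfer_block(raw):
    """Return the VOTable text and the decoded rows from a transfer block."""
    msg = message_from_bytes(raw, policy=policy.default)
    header, *attachments = msg.iter_parts()
    votable = header.get_content()
    # Resolve the cid: reference to the attachment carrying that Content-ID.
    by_cid = {str(p["Content-ID"]).strip().strip("<>"): p for p in attachments}
    payload = by_cid[DATA_CID].get_content()
    rows = [struct.unpack_from(">dd", payload, off)
            for off in range(0, len(payload), 16)]
    return votable, rows


raw = build_transfer_block([(10.5, -20.25), (11.0, 5.0)])
votable, rows = read_transfer_block(raw)
print(rows)  # [(10.5, -20.25), (11.0, 5.0)]
```

Because the result is ordinary MIME, the same bytes can be handed to any MIME-aware tool. Nothing in the sketch depends on how those bytes are later transported.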

I would also like to mention that we should strictly separate the issue 
of transfer from the issue of packaging/formatting. The fact 
that TCP connections have problems with latency and slow start-up 
speed should not be mixed up with what you would like to transfer. If we 
find a much better protocol for transferring large amounts of data, that 
should be used, but we should not be forced to use a different internal 
packaging because of it.

Anita: The transfer speed we actually reached is almost 0.8 Gb/s 
(equivalent to 96 MB/s) and is in fact limited by the network cards.
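As a quick check of how the two quoted figures relate (assuming decimal prefixes and 8 bits per byte, which the post does not state explicitly):

```latex
96\,\mathrm{MB/s} \times 8\,\mathrm{bit/byte} = 768\,\mathrm{Mb/s} \approx 0.77\,\mathrm{Gb/s}
\quad (\text{i.e. almost } 0.8\,\mathrm{Gb/s})
```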

Andreas
