Transfer of large data
Andreas Wicenec
awicenec at eso.org
Thu Jul 21 08:30:29 PDT 2005
On 21.07.2005, at 15:48, Roy Williams wrote:
> The key to efficient transfer of big data is to separate (XML)
> metadata from (binary) data. The metadata can contain pointers to the
> data (http:// or srb:// or gridftp://).
>
> When I buy a single brick at my local hardware store, I take it with
> me to the cashier and deal with metadata (payment) and data (the
> brick) together. But when I buy 1,000 bricks, it is different. I pay
> the cashier and receive a piece of paper (the pointer), then I take
> the paper somewhere else to load the bricks into my truck.
>
> As Andreas points out, VOTable was built in this way, to represent
> table metadata, with pointers to binary or FITS data elsewhere. It is
> not advised to use the TR,TD mechanism of VOTable to represent large
> datasets. In the same way, the VOStore specification is being built
> with the possibility of splitting metadata from data.
>
> Splitting metadata from data is more efficient.
> But it requires more effort to make sure the two are properly
> synchronized.
That's why we chose to keep the two things together in one file (or
rather transfer block) containing header (VOTable) and binary data. The
references in the VOTable thus use what is called a Content-ID
(see http://www.ietf.org/rfc/rfc2111.txt), a construction that
is known by many MIME handlers. This way we have a self-describing
file containing both a valid XML file and plain binary data. This is
much like FITS, but we are using well-known standards from the e-mail
world. In fact such a file can be opened and interpreted using a
standard e-mail client. The size of the actual VOTable is limited to
the resource and field description; all the data is in binary
attachments. We actually wanted even a bit more flexibility and
created VOTables in which single fields refer to multi-dimensional
binary arrays while other fields are still given as TD elements.
Since this is not covered by the current version of the VOTable
standard, we are not using it externally, though.
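To make the packaging idea concrete, here is a minimal sketch of such a
transfer block built with Python's standard email package. It is only an
illustration, not our actual implementation: the function names, the
Content-ID value data1@example.org and the single "flux" field are
made-up assumptions, and for simplicity the library base64-encodes the
attachment here, whereas the files described above carry plain binary.

```python
import struct
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical Content-ID used to link the VOTable header to its data part.
CID = "data1@example.org"


def build_package(values):
    """Pack a VOTable header and a binary column into one MIME block."""
    # The VOTable holds only metadata; the STREAM points at the
    # attachment through a cid: URL (RFC 2111).
    votable = f"""<?xml version="1.0"?>
<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE>
      <FIELD name="flux" datatype="double"/>
      <DATA>
        <BINARY>
          <STREAM href="cid:{CID}"/>
        </BINARY>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""
    # Big-endian doubles, as used by the VOTable BINARY serialization.
    payload = struct.pack(f">{len(values)}d", *values)

    msg = MIMEMultipart("related", type="text/xml")
    msg.attach(MIMEText(votable, "xml", "utf-8"))
    data_part = MIMEApplication(payload)  # base64-encoded by default
    data_part.add_header("Content-ID", f"<{CID}>")
    msg.attach(data_part)
    return msg.as_bytes()


def read_package(raw):
    """Split the block back into XML header, Content-ID and values."""
    msg = message_from_bytes(raw)
    header_part, data_part = msg.get_payload()
    xml = header_part.get_payload(decode=True).decode("utf-8")
    blob = data_part.get_payload(decode=True)
    values = list(struct.unpack(f">{len(blob) // 8}d", blob))
    return xml, data_part["Content-ID"], values
```

Calling read_package(build_package([1.0, 2.5])) gives back the XML
header, the Content-ID of the attachment and the original values, so
header and data travel together and cannot get out of sync.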
I would also like to mention that we should strictly separate the issue
of the transfer from the issue of the packaging/formatting. The fact
that TCP connections have their problems with latency and slow startup
speed should not be mixed with what you would like to transfer. If we
find a much better protocol to transfer large amounts of data, that
should be used, but we should not be forced to use a different internal
packaging because of this.
Anita: The transfer speed we actually reached is almost 0.8 Gb/s
(equivalent to 96 MB/s) and is in fact limited by the network cards.
Andreas