Proposals for VOSpace (content size)

Dave Morris dave.morris at metagrid.co.uk
Thu Jun 17 06:58:16 CEST 2021


Hi Sonia,
You raised several good suggestions in your email. To avoid confusion 
I'll reply to each one in a separate email thread.

On 2021-06-11 13:31, Zorba, Sonia wrote:
> 6. Clarify what to use as folder size (should it be the total size of 
> its content?)

Content size is hard to define because VOSpace is not just a file storage 
system. VOSpace is, as the name implies, an abstract space to access 
data. It can be used to store files, but that is just one use case.

For example, a node in VOSpace can represent an image. The VOSpace 
service can provide different views of the image, e.g. JPEG, PNG or 
FITS, full size, cutout or thumbnail. Each of the views would have a 
different number of bytes.

Several image nodes could be put in a container node. The VOSpace 
service can provide different views of the container, e.g. HTML page 
with thumbnails, tar.gz file or a SIAP service.

Only one of these, the tar.gz file, has a size in bytes. The others are 
abstract views for accessing the data. Even in the simplest case, a 
tar.gz of the original image files, there are two candidate content 
sizes: the total size of all the images, or the size of the compressed 
tar.gz file.
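
As a rough illustration of the two sizes, here is a minimal Python 
sketch. The function name, file names and byte counts are made up for 
the example and are not part of any VOSpace API; it just builds a tar.gz 
in memory and compares it with the sum of the member sizes.

```python
import io
import tarfile


def container_sizes(members):
    """Return (total raw bytes, bytes of a tar.gz of the same members)."""
    raw_total = sum(len(data) for data in members.values())

    # Build the tar.gz view in memory so that it can be measured.
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))

    return raw_total, len(buffer.getvalue())


# Hypothetical container with two placeholder "images".
members = {
    "image-1.fits": b"\0" * 28800,
    "image-2.fits": b"\0" * 57600,
}
raw_total, archive_size = container_sizes(members)
print(raw_total, archive_size)
```

Both numbers are a defensible "content size" for the same container, and 
they will not normally be equal.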

For tabular data, a VOSpace node can represent a database table. The 
VOSpace service can provide different views of the table, e.g. VOTable 
or FITS file, but the size of the downloadable content of those views is 
not directly related to the size on disc that the database service uses 
to store the data.

In this example, it is likely that the content of the VOTable or FITS 
files would be generated on demand, streaming the data in response to an 
HTTP GET request. In that case the server would never see the whole of 
the content as an entity that could be measured.
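
A minimal Python sketch of why a streamed view has no size up front. It 
assumes a hypothetical two-column table and uses CSV rather than VOTable 
to keep it short; stream_rows and send are illustrative names, not a 
real service API.

```python
def stream_rows(rows):
    """Yield one encoded chunk per row; nothing is buffered."""
    yield b"ra,dec\n"
    for ra, dec in rows:
        yield f"{ra},{dec}\n".encode("ascii")


def send(chunks):
    """Stand-in for writing chunks to an HTTP response body.

    The byte count only exists once the last chunk has gone out.
    """
    sent = 0
    for chunk in chunks:
        sent += len(chunk)
    return sent


# Hypothetical table rows, as they might come from a database cursor.
rows = [(10.5, -3.25), (11.0, 4.0)]
total_sent = send(stream_rows(rows))
print(total_sent)
```

The total is a by-product of the transfer, so the service has nothing to 
put in a size property before the request is made.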

Several table nodes could be put in a container node. The VOSpace 
service can provide different views of the container, e.g. HTML listing 
page, multi-table VOTable file or a TAP service.

Only one of these, the multi-table VOTable file, has a size in bytes, 
but again, the content would probably be generated on the fly in 
response to an HTTP GET request, so the server would never see the whole 
content as a measurable object.

I think the best we can do is say how many things (images or tables) the 
container contains. If we try to do anything more, we will end up 
presenting the right value for one view and the wrong value for all the 
other views.
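
Something like the following Python sketch is what I have in mind. The 
class and method names are hypothetical and not taken from the VOSpace 
specification; the point is only that the count is the same whichever 
view of the container is requested.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str


@dataclass
class ContainerNode(Node):
    children: list = field(default_factory=list)

    def child_count(self):
        """Number of direct children, independent of any view."""
        return len(self.children)


# Hypothetical container holding two image nodes.
container = ContainerNode("survey-images")
container.children.append(Node("image-1"))
container.children.append(Node("image-2"))
print(container.child_count())
```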

Cheers,
-- Dave

--------
Dave Morris
Research Software Engineer
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh
--------

