Updated ImageDM and SIAV2 drafts
Douglas Tody
dtody at nrao.edu
Sat Sep 28 19:56:54 PDT 2013
On Sat, 28 Sep 2013, Slava Kitaeff wrote:
>
> On 28/09/2013, at 10:40 AM, Douglas Tody <dtody at nrao.edu> wrote:
>
>> On Sat, 28 Sep 2013, Slava Kitaeff wrote:
>>
>>> I'd like to provide some quick feedback
>>> on http://wiki.ivoa.net/internal/IVOA/ImageDM/WD-ImageDM-20130812.pdf.
>>>
>>> The references to FITS in the document are quite numerous. Some of the statements,
>>> e.g. those on the good performance of FITS, can be challenged. I'm raising this because
>>> the radio astronomy community is rapidly moving away from FITS as it runs up against
>>> its limitations.
>>
>> I agree that FITS runs into performance problems for large datasets.
>> I suspect the comment about performance you are referring to was just
>> stating that representing the data as an n-D array can have major
>> advantages for performance, as opposed to for example trying to work
>> directly on a visibility dataset. We meant n-D arrays in general, not
>> FITS in particular.
>
> An n-D array, as a description of data, cannot be characterised in terms of performance on its own. The way a particular format stores an n-D array is what produces differences in performance for a specific type of data access (e.g. a cube cutout), not the use of an n-D array by itself. FITS and HDF5 will take the same n-D input and can produce the same output, but will show read performance differing by at least an order of magnitude.
The n-D array is a pure data model, well suited for high performance
computation; performance for accessing large arrays in external storage
is an entirely different matter. FITS is fine for small datasets but
has increasingly serious disk access performance issues for large
datasets and in particular for cubes. CASA image tables for example
have a block-storage option that scales better to higher dimensional
datasets. HDF5 integrates parallel i/o capabilities and so can achieve much
higher i/o performance. However, simply breaking a large dataset up into
multiple smaller files on a parallel file system can give comparable or
better performance without the complexity of HDF5.
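The point that the same n-D array can be cheap or expensive to read depending on how it is laid out on disk can be illustrated with a toy cost model. This is a minimal sketch, assuming that the number of separate contiguous runs (or chunks) read is a fair proxy for I/O cost; it ignores caching, block sizes, and parallel i/o, and the 64x64x64 chunk shape and the function names are hypothetical, not taken from FITS, CASA, or HDF5.

```python
# Hypothetical cost model: count separate reads needed to extract a box
# cutout from a 3-D cube. Axes are (z, y, x) with x varying fastest,
# matching the FITS layout where NAXIS1 is the fastest-varying axis.

Box = tuple[tuple[int, int], tuple[int, int], tuple[int, int]]  # (start, stop) per axis


def contiguous_runs(shape: tuple[int, int, int], box: Box) -> int:
    """Contiguous runs needed for a cutout from a flat, row-major (FITS-like) layout."""
    _, ny, nx = shape
    (z0, z1), (y0, y1), (x0, x1) = box
    runs = 0
    prev_end = None
    for z in range(z0, z1):
        for y in range(y0, y1):
            start = (z * ny + y) * nx + x0  # element offset of this row segment
            if start != prev_end:
                runs += 1  # a gap on disk means a new read request
            prev_end = start + (x1 - x0)
    return runs


def chunks_touched(box: Box, chunk: tuple[int, int, int]) -> int:
    """Chunks (tiles) intersected by the cutout in a block-storage layout."""
    count = 1
    for (start, stop), size in zip(box, chunk):
        count *= (stop - 1) // size - start // size + 1
    return count


cube = (4096, 1024, 1024)                       # channels, rows, columns
spectrum = ((0, 4096), (512, 513), (512, 513))  # one pixel, all channels
plane = ((100, 101), (0, 1024), (0, 1024))      # one full channel map

print(contiguous_runs(cube, spectrum), contiguous_runs(cube, plane))  # 4096 1
print(chunks_touched(spectrum, (64, 64, 64)), chunks_touched(plane, (64, 64, 64)))  # 64 256
```

In the flat layout a channel map is a single contiguous read while a spectrum needs one read per channel; tiling evens the cost out so that neither access pattern is pathological, which is the kind of scaling a block-storage option is meant to provide.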
> Considering any file format or service/transfer protocol while developing a data model is methodologically problematic. It can bias the model and make the generalisation incomplete. One demonstration of this is the presence of naxis/naxes. If one thinks of self-descriptive data in a sufficiently generic model, such parameters won't be there. They have clearly been introduced to replicate the FITS image header.
The names naxes,naxis were certainly motivated by FITS but these are
also primary attributes of any n-D regularly sampled array. In any
case, I agree both that the ImageDM draft has strong ties to FITS, and
that FITS has serious issues, particularly for large complex datasets
such as from modern radio instruments, wide-field optical mosaics, and
so forth.
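The claim that the number of axes and the axis lengths are primary attributes of any regularly sampled n-D array, with FITS being only one way to write them down, can be sketched as a format-neutral descriptor with two serializations. The class and attribute names here are hypothetical and are not the ImageDM's own naxes/naxis definitions; only the FITS NAXIS/NAXISn keywords and the C-order convention (slowest axis first, as used by NumPy and HDF5) are real.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class RegularArray:
    """Hypothetical format-neutral description of a regularly sampled n-D array.

    axis_lengths lists axis 1 first, and axis 1 varies fastest, as in FITS.
    """
    axis_lengths: tuple[int, ...]

    @property
    def ndim(self) -> int:
        # number of axes: FITS NAXIS, HDF5 dataset rank, NumPy ndim
        return len(self.axis_lengths)

    @property
    def npixels(self) -> int:
        return prod(self.axis_lengths)

    def fits_keywords(self) -> dict[str, int]:
        """One possible serialization: FITS NAXIS / NAXISn header keywords."""
        cards = {"NAXIS": self.ndim}
        for i, n in enumerate(self.axis_lengths, start=1):
            cards[f"NAXIS{i}"] = n
        return cards

    def c_order_shape(self) -> tuple[int, ...]:
        """Another: a C-order shape (NumPy, HDF5), which lists the slowest axis first."""
        return tuple(reversed(self.axis_lengths))


cube = RegularArray((1024, 1024, 4096))  # x, y, spectral channel
print(cube.fits_keywords())  # {'NAXIS': 3, 'NAXIS1': 1024, 'NAXIS2': 1024, 'NAXIS3': 4096}
print(cube.c_order_shape())  # (4096, 1024, 1024)
```

The same two facts (how many axes, how long each one is) survive in every serialization; only their spelling and axis ordering change.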
We need FITS to be well integrated with the ImageDM and SIAV2 to
encourage take-up by the broader community, which outside of radio (and
even within radio in certain areas) is mainly using FITS. The intention
though is to define the ImageDM independently of the FITS serialization,
while capturing the most important elements of FITS such as the FITS WCS
data model. This approach provides a high degree of compatibility with
existing data archives and analysis software while allowing migration to
more capable data formats.
- Doug
>>> I'd question whether a reference to a file format is necessary in a data model
>>> at all. Shouldn't the data model describe the data, with the required format then
>>> produced as needed, provided the server or client has the capability: FITS
>>> (uncompressed, or compressed with H-transform, GZIP_1, GZIP_2, or RICE: 5 options for
>>> encoding), JPEG, PNG, JPEG2000 (two options for encoding with many parameters), HDF5
>>> (several options for encoding), etc.?
>>
>> The approach is to define the data model abstractly, independent of
>> serialization. We just give some examples of serializations that we
>> expect to be used for cube data in astronomy, for which mappings from the
>> ImageDM are likely to be required. Other serializations are certainly
>> possible. When we produce actual data services like SIAV2 it will also
>> be necessary to say what is mandatory to support to return data to a
>> client. FITS would probably be the mandatory output format, but more
>> advanced services may support other serializations (JPEG2000, HDF5,
>> whatever).
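The idea of one abstract model with a mandatory serialization plus optional ones can be sketched as a simple output-format choice on the service side. This is a minimal sketch, assuming FITS is the mandatory format as suggested above; the function name and selection policy are hypothetical and not SIAV2 parameters. `application/fits` and `image/jp2` are registered media types, while `application/x-hdf5` is only a commonly used convention and is an assumption here.

```python
FITS = "application/fits"
JPEG2000 = "image/jp2"
HDF5 = "application/x-hdf5"  # assumption: unregistered but commonly used type


def choose_format(requested: list[str], supported: set[str]) -> str:
    """Return the first client-requested format the service supports, else fall back to FITS."""
    if FITS not in supported:
        raise ValueError("service must support the mandatory FITS format")
    for fmt in requested:
        if fmt in supported:
            return fmt
    return FITS


basic = {FITS}
advanced = {FITS, JPEG2000, HDF5}
print(choose_format([JPEG2000, FITS], basic))     # application/fits
print(choose_format([JPEG2000, FITS], advanced))  # image/jp2
```

A basic service and an advanced one answer the same request correctly; the client only gets something other than FITS when both sides support it.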
>>
>> Speaking of JPEG2000, accessData (SIAV2) can support streaming of data
>> back to the client application. So, given a sufficiently powerful
>> service, it will be possible to access a portion or view of a large cube
>> and stream JPEG2000 encoded data back to the client, allowing advanced
>> capabilities such as progressive display in an interactive client. The
>> same service might return the data subset in FITS format if that is what
>> the client wants back.
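Progressive display means the client receives a coarse view first and then successively finer ones. The sketch below is a hedged stand-in that uses a plain block-averaging pyramid; JPEG2000 itself obtains its resolution levels from a wavelet decomposition and encodes them quite differently, so this only illustrates the coarse-to-fine ordering. The function names are hypothetical.

```python
def downsample(plane: list[list[float]], factor: int) -> list[list[float]]:
    """Block-average a 2-D plane by an integer factor (trailing partial blocks are dropped)."""
    ny, nx = len(plane) // factor, len(plane[0]) // factor
    out = []
    for j in range(ny):
        row = []
        for i in range(nx):
            block = [plane[j * factor + dj][i * factor + di]
                     for dj in range(factor) for di in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def progressive_views(plane, levels):
    """Yield views from coarsest (factor 2**(levels-1)) to full resolution (factor 1)."""
    for k in range(levels - 1, -1, -1):
        yield downsample(plane, 2 ** k)


plane = [[8 * y + x for x in range(8)] for y in range(8)]
for view in progressive_views(plane, 4):
    print(len(view), "x", len(view[0]))  # 1 x 1, then 2 x 2, 4 x 4, 8 x 8
```

An interactive client could draw each view as it arrives and refine the display, while a client that asked for FITS would simply receive the last, full-resolution level.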
>>
>> - Doug
>>
>>
>>> Regards,
>>> Slava
>>>
>>> ----------------------------------------------------------
>>> Dr Slava Kitaeff
>>>
>>> Research Associate Professor
>>> International Centre for Radio Astronomy Research
>>> The University of Western Australia
>>>
>>> Ph. +61 8 6488 7744
>>>
>>> On 12/09/2013, at 4:33 AM, Douglas Tody <dtody at nrao.edu> wrote:
>>>
>>> The Image data model (ImageDM) and associated image access protocol
>>> (SIAV2) documents have been updated on the IVOA TWiki (updated versions
>>> were actually uploaded several weeks ago to support author group
>>> discussions, but were not announced at the time).
>>>
>>> The ImageDM doc was largely written from scratch and replaces the
>>> placeholder version from the May interop. The SIAV2 draft has also been
>>> largely rewritten to make use of the closely linked ImageDM and other
>>> recent DAL developments such as DALI. Full support is provided for
>>> multidimensional image data including large data cubes and sparse cubes.
>>>
>>> The drafts live on these pages:
>>>
>>> http://wiki.ivoa.net/twiki/bin/view/IVOA/ImageDM
>>> http://wiki.ivoa.net/twiki/bin/view/IVOA/SiaInterface
>>>
>>> Neither document is complete yet, although the ImageDM draft comes close
>>> so far as the core model is concerned. The main thing to be updated at
>>> this point is the data access model for slicing and dicing, computing
>>> moments, etc., of large image cubes. We hope to update one or both
>>> documents prior to the interop to reflect author group discussions and
>>> prototyping over the past month or so.
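Computing moments is one of the cube operations the data access model is meant to cover. As a hedged illustration of what that computation is, the sketch below derives the standard moment-0 (integrated intensity, sum of I_k times the channel width) and moment-1 (intensity-weighted mean velocity) maps from a small cube. It assumes uniformly spaced channels and applies no noise clipping or masking; the function name and the cube[channel][y][x] indexing are hypothetical and not part of SIAV2.

```python
def moment_maps(cube: list[list[list[float]]], velocities: list[float]):
    """Return (moment-0, moment-1) maps for a cube indexed cube[channel][y][x].

    Moment 0 is sum_k I_k * dv; moment 1 is sum_k I_k * v_k / sum_k I_k.
    Assumes at least two uniformly spaced channels.
    """
    nchan, ny, nx = len(cube), len(cube[0]), len(cube[0][0])
    dv = abs(velocities[1] - velocities[0])
    mom0 = [[0.0] * nx for _ in range(ny)]
    mom1 = [[None] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            total = sum(cube[k][y][x] for k in range(nchan))
            weighted = sum(cube[k][y][x] * velocities[k] for k in range(nchan))
            mom0[y][x] = total * dv
            # the mean velocity is undefined where there is no flux
            mom1[y][x] = weighted / total if total != 0 else None
    return mom0, mom1


cube = [[[1, 0]], [[2, 0]], [[1, 0]]]  # 3 channels, 1 row, 2 columns
mom0, mom1 = moment_maps(cube, [10.0, 20.0, 30.0])
print(mom0, mom1)  # [[40.0, 0.0]] [[20.0, None]]
```

For a large cube a service would compute this close to the data and return only the 2-D maps, which is the motivation for including such operations in the access model.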
>>>
>>> These drafts are also the basis for the cube project data access (SIAV2)
>>> prototype being developed by VAO, as Ray mentioned on the list a few
>>> days ago. The VAO SIAV2 prototype includes test multidimensional image
>>> datasets for ALMA, JWST, JVLA (and legacy VLA), JCMT, Keck (Osiris), and
>>> several cube datasets from NED. See the following for details and
>>> status: http://wiki.ivoa.net/twiki/bin/view/IVOA/SIA2VAOPrototype
>>>