SED FITS Serialisation: multi-extension?

Wed Jun 15 16:15:13 PDT 2005

Hi Alberto -

The query response table is in general not something which should be
thrown away.  Granted, this may happen for data analysis where a compliant
data product is returned which contains all the necessary metadata.
In other cases, such as when foreign/external data is returned, or when
a graphics format is returned, the query response metadata may be required
to understand the data.

Regarding MEF vs one-table-row-per-segment: you raise a good question,
but either approach will work.

As you say, we would like the data provider to be able to augment
the standard data model with additional metadata or data specific to
their data.  With the table approach one adds additional columns to the
table defined by the data model (alternatively one could add separate,
linked extension records as with the query response, providing a nice
separation between the standard and extension metadata).  An advantage
of the one-table approach is that it imposes a uniform structure on
all segments, making it easier for the client to understand the data.
Another advantage is that the 80-character limit on FITS headers is
avoided when table columns are used instead.

A problem if we go with one table per segment is that when a SED segment
is one photometry point (a frequently encountered case for a SED) then
you have a whole table containing only one data point!  This is too
inefficient to be acceptable.  An entire SED object in one table is far
more convenient and efficient.
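
To put a rough number on that overhead, here is a small Python sketch. It
relies only on the FITS rule that every header and every data unit is padded
to a whole number of 2880-byte blocks, with 80-byte header cards. The card
count, the 24-byte row and the 50-point SED are assumptions picked for
illustration, not values taken from the SED document, and the primary HDU is
ignored.

```python
FITS_BLOCK = 2880   # FITS logical record size in bytes
CARD = 80           # size of one header card in bytes


def padded(nbytes):
    """Round a byte count up to a whole number of FITS blocks."""
    return -(-nbytes // FITS_BLOCK) * FITS_BLOCK


def extension_size(n_cards, row_bytes, n_rows):
    """Bytes on disk for one table extension (header plus data)."""
    return padded(n_cards * CARD) + padded(row_bytes * n_rows)


# Assumed for illustration: 20 header cards per table, and a photometry
# point stored as three 8-byte doubles (coordinate, flux, error).
N_CARDS, ROW_BYTES = 20, 24
n_points = 50

one_table_per_point = n_points * extension_size(N_CARDS, ROW_BYTES, 1)
all_in_one_table = extension_size(N_CARDS, ROW_BYTES, n_points)

print(one_table_per_point, all_in_one_table)   # 288000 5760
```

Under these assumptions each single-point table costs two full blocks, so
the padding alone dominates the file size.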

Finally, a related point: if we are working with FITS tables it is attractive
to be able to do everything consistently with one table format.  BINTABLE,
when used for spectra, most commonly represents a spectrum as one table row,
so it is nice to be able to use this approach consistently.  In a simple
case of a table containing a single 1D spectrum there is only one row,
but it works fine and the format and semantics are the same as when we
have a SED table, or possibly in the future, a MOS/IFU table containing
many spectra.  In terms of efficiency, it is easier to extract a vector
from a table cell than from a column, and vectors can be of different
lengths, etc.
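
As a toy model of the one-row-per-segment layout, the following Python sketch
uses plain lists and dicts rather than a real FITS library. The column names
(SEGMENT, SPECTRAL, FLUX) and all values are made up for illustration; they
are not taken from the data model.

```python
# Each row is one segment; the SPECTRAL and FLUX cells hold whole vectors,
# and the vectors may differ in length from row to row.
sed_table = [
    {"SEGMENT": "photometry point", "SPECTRAL": [5500.0], "FLUX": [1.2e-14]},
    {"SEGMENT": "short spectrum", "SPECTRAL": [4000.0, 4010.0, 4020.0],
     "FLUX": [3.1e-14, 3.0e-14, 2.9e-14]},
]

# A table holding a single 1D spectrum is the same structure with one row.
single_spectrum = [sed_table[1]]


def flux_vector(table, row):
    """Return the flux vector of one segment: a single cell lookup."""
    return table[row]["FLUX"]


def segment_lengths(table):
    """Number of samples in each segment."""
    return [len(r["FLUX"]) for r in table]


print(segment_lengths(sed_table))        # [1, 3]
print(flux_vector(single_spectrum, 0))
```

The same two helper functions serve the single-spectrum table and the SED
table unchanged, which is the consistency argument made above.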

> For example I can imagine it being useful to associate with the standard
> WAVELENGTH, FLUX and ERROR other columns like "SUBTRACTED BACKGROUND" etc.
> Or, as is the case for spectropolarimetry, to add columns to store the
> Stokes parameters.

While I agree that data model extensions will always be needed for special
cases, if cases such as the ones you mention here are important enough we
should add them to the data model.  There is already support in there for
background-subtracted data.

	- Doug

---
From Alberto.Micol at eso.org Wed Jun 15 16:41:26 2005
Date: Mon, 13 Jun 2005 16:16:04 +0200
From: Alberto Micol <Alberto.Micol at eso.org>
Reply-To: dal at ivoa.net
To: dal at ivoa.net
Subject: Re: SED FITS Serialisation: multi-extension?

On Jun 13, 2005, at 14:23, Markus Dolensky wrote:

> Hi Alberto,
>
> Would you mind specifying which parts of either the spectral DM doc ...
>
> http://www.ivoa.net/twiki/bin/view/IVOA/IVOADMSpectraWP
>
Yes. Specifically chapter 8 "Serializations".
Sorry not to have mentioned that earlier.

> or the SSA interface doc. ...
>
> http://www.ivoa.net/internal/IVOA/InterOpMay2005DAL/ssa-v090.pdf
>
> ... triggered particular thoughts?
>
> For instance, what is meant by a "VOTABLE accompanying the SED"?
> There is going to be a VOTable query response, but the serialization 
> can either be an XML or VOTable document or a FITS binary table.

As I see it (tell me if I'm wrong), the SSA client receives back a VOTable
response, which might point to some FITS file for individual segments,
or even for a bunch of segments at once. The VOTable is the "messenger"
and I see it as quite volatile; the associated FITS file instead, containing
the actual data, is to be stored by the end user for subsequent scientific
analysis. And I'm afraid that, as soon as the message is received, the
VOTable will be kindly moved to .Trash, leaving the end user with no idea
which segment had which characteristics; even the Provenance (in DM
terminology) of any segment might get lost.

It is particularly important to remember which reference files were used
for those archives that offer on-the-fly calibration, where the SAME
dataset at different times will yield different (better) products
as time goes by.  If the user loses that info, s/he will not be able to
know whether a given product is still the best possible (the "current"
one) for a given observation.

>
> The scope is 1d spectra and time series. Are you suggesting to expand 
> this for V1.0 of the two docs?

No, I'm not looking for an "expansion", I'm just considering a 
different (let me say "better") serialisation.

>
> Remember, we are trying to serialize a DM. So, are your suggestions 
> aiming at expanding the DM or the way it's implemented (serialized)?

The second one.

> > Conclusions: I see only advantages in adopting MEF, am I biased?
>
> Does it mean to give up on serializing a particular DM and to use 
> existing formats instead?

Not at all. We need to agree on a single particular DM, otherwise it would
be a mess. When I say MEF I don't just mean "any MEF". I'm considering an
MEF that contains what the SED DM imposes, but also allows info specific to
the data provider.  Regarding metadata: even the current DM allows for "more
keywords" than just the suggested standard ones. My idea is to preserve all
the metadata that the DM already promotes, and *at the same time* preserve
all the metadata that the data provider has already published. I can see
that happening only with a MEF (one header per extension, i.e. per segment).
/* Note: SED proposed keywords "shall" not clash with the commonly used ones.*/
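
A minimal sketch of that idea in Python, assuming each header is modelled as
a plain dict: the VO keywords and the provider's original keywords are merged
into one per-segment header, and any clash between the two sets is reported
rather than silently overwritten. The function name and every keyword name
and value below are hypothetical, not taken from the SED document or from any
real archive.

```python
def merge_headers(vo_keywords, provider_keywords):
    """Combine VO and provider keywords for one segment's header.

    Raises ValueError if a keyword appears in both sets, since the note
    above asks that SED keywords not clash with commonly used ones.
    """
    clashes = sorted(set(vo_keywords) & set(provider_keywords))
    if clashes:
        raise ValueError("keyword clash: " + ", ".join(clashes))
    merged = dict(provider_keywords)   # keep everything the provider wrote
    merged.update(vo_keywords)         # add the VO keywords alongside
    return merged


# All keyword names and values below are hypothetical.
vo = {"VOSEGT": "spectrum", "VOPUB": "ExampleArchive"}
provider = {"INSTRUME": "EXAMPLE-SPEC", "CALFILE": "ref_2005_06.fits"}

header = merge_headers(vo, provider)
print(sorted(header))
```

With one header per extension, this merge would be done once per segment, so
each segment keeps its own calibration and provenance keywords.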

Regarding the data:

The actual format of the data is NOT to be the original format adopted maybe
20 years ago by a data provider; that of course needs to be standardised,
and the solution currently proposed in SED 0.93 is to use a binary table
with one segment per row.  Instead I'm proposing a MEF to allow for more
metadata than just the VO ones (see above), and to be able to cope with
other kinds of data such as echelle or spectropolarimetry, which are
still to be seen as 1d spectra, but need extra "columns", a concept ruled
out by the current SED 0.93. That's why I'm suggesting one binary table
per segment; a binary table per segment allows the data provider to fold
into the VO standard all the information judged to be useful.  For example
I can imagine it being useful to associate with the standard WAVELENGTH,
FLUX and ERROR other columns like "SUBTRACTED BACKGROUND" etc.  Or, as is
the case for spectropolarimetry, to add columns to store the Stokes
parameters.

And again:
> Does it mean to give up on serializing a particular DM and to use 
> existing formats instead?

On the contrary: I am proposing "one format to rule them all".

And in fact:

> BTW, the next step on the roadmap is to unify access to images, 
> spectra and catalogues by means of ADQL.

Also for images we probably need MEF if we want to offer not just the image
but also the accompanying weight maps, data quality, etc.  Hence MEF is
good for both imaging and spectroscopy.

> This is just to better understand your comments that you thankfully 
> took the time to put down.

Thanks for having taken the time to read me! :-)

> Cheers,
> Markus
>
Ciao,
Alberto

>
> Alberto Micol wrote:
>> Dear SSA/SEDers,
>>
>> I'd like to comment on the serialisation aspects of the protocol,
>> which now states that each segment is one row in a FITS binary table.
>> In such a serialisation the characterisation is left completely to the
>> VOTable accompanying the SED, since it becomes impossible to
>> characterise each and every segment with a single header.
>> That is fine IF the user does not care to know the origins of the
>> segments.  (And someone might claim that such a user is not too
>> careful, to say the least.)
>>
>> My view is that the VO should simplify life for the users in other ways
>> than just stripping off all the information that the data provider,
>> mostly painfully, put together. :-)
>>
>> My favourite solution would be to adopt a FITS extension for each of
>> the segments, each extension containing:
>>  -  a header with VO keywords PLUS the original header keywords,
>>  -  a binary table with scalar columns
>>
>> In that way the work of the data provider would be happily recognised,
>> and the user might be able to find any kind of details regarding any
>> segment, from the calibration reference files used to calibrate a
>> spectrum down to the acknowledgment sentence sometimes buried in some
>> FITS COMMENT or HISTORY keyword.
>>
>> The multiple-extension FITS format would also allow us to cover the
>> spectropolarimetry case (currently not supported at all), where for
>> each wavelength the Stokes parameters will also be stored in separate
>> scalar columns.
>>
>> Also, I think that echelle spectra are causing some trouble for the
>> current format. Each of the multi-order spectra should probably end up
>> in its own extension.
>>
>> Conclusions: I see only advantages in adopting MEF, am I biased?
>>
>> Alberto
>>
>> Aside: With such a format, it would then also be easy to build a SED
>> On The Fly, whereby a SED-OTF tool can compose SSAP queries to some
>> selected services and come back with a single multi-extension FITS
>> file: it is just a matter of appending any individually SSAP-returned
>> FITS file to the multi-extension file.
>> (Unless I'm wrong, I don't think that the current serialisation allows
>> such a simple assembly of the FITS files.)
>>
Alberto Micol
ST-ECF HST Archive Scientist