Spectrum data model

Anita Richards amsr at jb.man.ac.uk
Tue Sep 12 11:09:31 PDT 2006



Jonathan et al.,

Thanks very much for this model.  Although most of my comments are on
things which worry me, that should not detract from the enormous amount of
very useful stuff.  I apologise if I am reopening any old discussions which
I have forgotten; if I am not saying anything new then old decisions
should stand.  In any case, I think that nothing I have to say should
impede the first attempts to use the model, as that will be a much better
way to get feedback.


Comments on Spectrum data model 0.98c Rev 1

---------------------------------------------------------------------
General comments on document and on implementation:
---------------------------------------------------------------------

The tabulation of elements makes it very nice and easy to see what is
present. This could be useful for data publishers to see what is
required. Something which I would like to see soon is examples of how
a data provider should give the information, given that at present
metadata may live in a database (if we're lucky) or FITS headers or
???

I like the overlap with Registry in curation, as long as that is
workable in practice and (as with any element in common with another
data model) it is completely consistent and not too hard to keep in
synch. I am concerned about the lack of any explicit unique ID in
Char.

I like the provision of ucd1+ (I presume that is what is meant
everywhere you say UCD in any form) but we will need to help data
providers supply these, or include default ucd1+ for the most basic
elements in the templates.
Regarding defaults for element values, in Char we felt that this was
best done by the software agents using the model, e.g. if a task looks
for upper and lower error bounds and there is just one reference value
given then the software knows to take the single reference value for
both bounds.
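
A minimal sketch of that fallback, in Python.  The function and argument
names are hypothetical and do not come from Spectrum or Char; the sketch
only illustrates a software agent filling in a missing bound from the
single reference value.

```python
from typing import Optional, Tuple


def error_bounds(reference: Optional[float] = None,
                 lower: Optional[float] = None,
                 upper: Optional[float] = None) -> Tuple[float, float]:
    """Return (lower, upper) error bounds.

    Any bound that was not supplied falls back to the single
    reference value, as described in the paragraph above.
    """
    if lower is None:
        lower = reference
    if upper is None:
        upper = reference
    if lower is None or upper is None:
        raise ValueError("no error information supplied")
    return lower, upper


# Only a reference value was published: use it for both bounds.
symmetric = error_bounds(reference=0.5)
# An explicit upper bound overrides the reference value.
asymmetric = error_bounds(reference=0.5, upper=0.7)
```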

All these features might be useful for Char also.  I have not tried to
compare Spectrum and Char schemata in detail for consistency although
I know that is important.

---------------------------------------------------------------------
Mandatory fields
---------------------------------------------------------------------

It would be better to use 'must', 'should' and 'may' instead of
required and optional.  Not only is this consistent with SIAP etc. but
it allows more flexibility for data sets which may be missing a
desirable parameter - or for elements where we have imposed rules
which will turn out to be impractical in use.

A data provider who is willing to start filling in a data model at
all, will want to do it 'properly' insofar as they have the metadata
available.  For a limited number of 'must' fields they might be
willing to scratch around for missing information, but not if it takes
too long or is not perceived as relevant.  Of course a software agent
needs enough information to use the model but it is better to build in
safeguards for possibly missing parameters, than that the data
providers simply ignore them (as happens with the Registry) or,
worse, don't bother to supply data at all.
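
One way a software agent might build in such safeguards is sketched
below in Python.  The requirement levels assigned to each field are
purely illustrative assumptions (the real assignments are for the
Spectrum document to decide), and check_metadata is a hypothetical
helper, not part of any existing tool.

```python
# Illustrative requirement levels only, using field names from the
# Spectrum tables; the levels themselves are assumptions.
REQUIREMENT_LEVELS = {
    "Spectrum.Char.SpectralAxis.Unit": "must",
    "Spectrum.Char.SpatialAxis.Coverage.Location.Value": "should",
    "Spectrum.Target.VarAmpl": "may",
}


def check_metadata(metadata):
    """Split missing fields into errors ('must') and warnings ('should')."""
    errors, warnings = [], []
    for field, level in REQUIREMENT_LEVELS.items():
        if metadata.get(field) is not None:
            continue  # field is present, nothing to report
        if level == "must":
            errors.append(field)
        elif level == "should":
            warnings.append(field)
        # a missing 'may' field is silently accepted
    return errors, warnings


errors, warnings = check_metadata({"Spectrum.Char.SpectralAxis.Unit": "Hz"})
```

With this split, a data set missing a 'should' field is still accepted
and the consumer is merely warned, rather than the provider being
turned away.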

---------------------------------------------------------------------
Detailed comments
---------------------------------------------------------------------

I have tried to think how I would apply Spectrum to Galactic and
extra-galactic data, to blind surveys and to time series.

3.1 Summary

line 2 (e.g. aperture, position, etc.) - i.e. add the 'e.g.' so that
sharing aperture and position is not obligatory, to allow for SEDs
from multiple instruments, stacked spectra of scattered galaxies of a
particular type etc.

line 2 Change sentence starting 'Specifically' to 'Typically' (since
the quantities in the pink box are only applicable to a certain type
of spectrum, not to a time series).

Omit references to mandated units e.g. seconds, MJD in the pink boxes as
section 3.2 describes how easy it is to interconvert - maybe add a
sentence to say that the data will be in units which can be converted
to recognised SI and astronomical units using STC and dimensional analysis.
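
To illustrate how little machinery such a conversion needs, here is a
rough Python sketch.  The unit table is a tiny made-up example, not the
STC unit list, and the convert function is hypothetical; the
"dimensional analysis" is reduced to checking that both units share a
dimension.

```python
# Each unit maps to (dimension, factor to the SI base unit).
# A deliberately small illustrative table, not a complete registry.
UNITS = {
    "s": ("time", 1.0),
    "ms": ("time", 1.0e-3),
    "d": ("time", 86400.0),
    "Hz": ("frequency", 1.0),
    "GHz": ("frequency", 1.0e9),
}


def convert(value, from_unit, to_unit):
    """Convert value between two units of the same dimension."""
    from_dim, from_factor = UNITS[from_unit]
    to_dim, to_factor = UNITS[to_unit]
    if from_dim != to_dim:
        raise ValueError(
            f"cannot convert {from_unit} ({from_dim}) to {to_unit} ({to_dim})"
        )
    return value * from_factor / to_factor


seconds = convert(1500.0, "ms", "s")
```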



I have not commented on mandatory ucd's but as mentioned above I don't
think that they should be mandatory unless we can provide them for
archivists who aren't familiar with the concept.

Tables of elements

Spectrum.Target.Name      - I agree that there should be a unique
                             identifier for every description made
                             using Spectrum, but is this the
Spectrum.DataID.DatasetID - or is that for a parent dataset?

Pedantically, would Spectrum.Target.ID be better, to avoid confusion
with recognised IAU or SIMBAD names (which are not appropriate
e.g. for blind surveys or astroparticles).

Spectrum.Target.VarAmpl - Is this for spectral data which vary with
 			  time? What is it for? Should it be on the
 			  Flux axis? For stellar variability you also
 			  want the Spectrum.Target.VarPeriod and
 			  Spectrum.Target.VarPeriodOrigin to be really
 			  useful but maybe this is overkill.

Spectrum.Char.SpectralAxis.Unit - One of these should be 'must'
Spectrum.Char.TimeAxis.Unit     - depending on type of data?
                                   (not always SpectralAxis)
Might we allow other axes e.g. spatial frequency or whatever CMB
people use for the order of spherical harmonics? In Char we have
proposed that there has to be at least one axis other than the
observable, but it could be any dimension along which measurements are
made.

Spectrum.Char.SpatialAxis.Location.Value: I presume that RA and Dec
are there only as examples.  Coordinate conversion should be easy for
VO tools, so requiring it of archivists is an unnecessary burden;
moreover RA and Dec may be inappropriate e.g. for solar data.

Spectrum.Char.SpatialAxis.Coverage.Bounds.Extent: Bounds is box
corners in Char; I can see confusion if it is a diameter in Spectrum -
either call it something else or make it a box.

I think that .Bounds.Extent is the approximate (inclusive) area within
which observations were made and this could well be a box.  If you
want to use a radius then maybe that should be called e.g. Aperture
instead.

The spatial resolution of the instrument should also be somewhere but
I can't find it in the table - see notes below on 4.6.3


Spectrum.Char.SpatialAxis.Coverage.Location.Value and .Bounds.Extent
(or whatever) should be 'should' not 'must'.  There are occasions
(e.g. some astroparticle detections?) where there is no known location
in the sense of a sky direction, let alone aperture size, and either
or both concepts may be meaningless for simulated data or for stacked
spectra (e.g. 'a typical X-ray SED of a z=4 source' made from many
observations using many instruments) - and are not vital for all
analysis software.

Spectrum.Char.TimeAxis.Coverage.Location.Value and .Bounds.Extent
Unless the data are time series, these should be 'should' not 'must'
as not all spectra etc. have recorded times.

Spectrum.Char.SpectralAxis.Coverage.Location.Value and .Bounds.Extent
Unless the data are spectra, these should be 'should' not 'must' as
the VOEvent people don't see the waveband as indispensable, I believe.

Spectrum.Data.SpectralAxis.Value - One of these should be 'must'
Spectrum.Data.TimeAxis.Value	 - depending on type of data?
 				   (not always SpectralAxis)

- but I have a conceptual problem with this section, and with the
example on p40 - surely the model is supposed to describe the data,
not 'be' the data! Is p40 just an example? Surely this model is not
saying that e.g. FITS binary spectra must be converted to ascii-based
xml?  Maybe this is useful as a standard for the actual data where
SEDs are constructed on the fly by the VO from many separate
photometry points? Is that consistent with the ESO tool, for example,
or with SPECFind/ or with the input formats expected by spectral tools?
But not for all spectra.

4 Measurement objects

4.1 Note comment above that the Spectrum.Char.SpectralAxis should only be
'must' for spectral data not time series etc.

As far as I know, STC is compatible with the various Greisen et
al. papers and we should make full use of STC wherever appropriate.  I
presume that we are also allowing multiples of units as implied by
3.2 (e.g. wavenumber is often cm^-1)


4.2
   Jy/beam    also Jy/pixel, Jy/arcsec^2 etc.  (and W/arcsec^2...)

4.4 Time coordinate should be a 'must' for time series.  Again, I hope
   that multiples of the units are allowed e.g. ms for pulsar timing
   data.

4.5 Note comment above - there are many reasons why it is inadequate
   to restrict Spectrum.Char.SpatialAxis.Coverage.Location.Value to RA
   and Dec in decimal degrees and in fact on p27 Spectrum quotes the
   STC Coord Frames - we should allow these!

   Also the question of spectra without a unique position is
   acknowledged here, all the more reason to make
   Spectrum.Char.SpatialAxis.Coverage.Location.Value 'should' rather
   than must.

4.6 Can we make sure that the simpler levels of Accuracy and
   Uncertainties provided for in Spectrum are consistent with Char and
   in turn with STC - thus the Char accuracy axis could be used for
   more subtle levels with minimum confusion?

4.6.1
Maybe I have misread this, but I don't see why bin size and explicit
limits are mutually exclusive - software might need either, why not
let the data provider give 'the bin size and/or both of the high and
low limits'
This is essentially the same as Sampling in Char, can we make sure
that it is consistent?
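
A small Python sketch of how software could accept either form.  The
function name, argument names and the precedence rule (explicit limits
win over bin size) are assumptions made for illustration, not anything
specified by Spectrum or Char.

```python
def bin_limits(centre, width=None, low=None, high=None):
    """Return (low, high, width) for one sample.

    Explicit limits take precedence when both are given; otherwise
    the bin is taken to be symmetric about the centre.
    """
    if low is not None and high is not None:
        if not low <= centre <= high:
            raise ValueError("centre lies outside the stated limits")
        return low, high, high - low
    if width is not None:
        return centre - width / 2.0, centre + width / 2.0, width
    raise ValueError("need a bin width or both limits")


symmetric = bin_limits(10.0, width=2.0)            # (9.0, 11.0, 2.0)
asymmetric = bin_limits(10.0, low=9.0, high=11.5)  # (9.0, 11.5, 2.5)
```

The second call shows a case that a bin size alone cannot express: the
bin is not symmetric about the quoted value.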

4.6.3 Resolution
'trivial' has unfortunate connotations (as in the trivial solution to
an equation is usually the useless solution...) - is what is meant,
that Spectrum provides the simplest level of resolution?

This is the top level in Char, i.e. the reference or typical value,
and Char could be used if a more detailed description (upper and lower
bounds etc.) were required for the 1-D axis along which the Observable
is measured, e.g. the Spectral or Time axis.  For the Spatial axis,
Spectrum needs to have more detailed (but still optional or 'may')
parameters.
Firstly, one of the nicest uses of Spectrum may be that it allows the
construction of aperture-matched SEDs, i.e. stringing together
individual photometry points, SEDs and high-resolution spectra, in a
homogenised fashion. At the very least, the user should have the
resolution information to make this possible.
Secondly, if an allowed unit is Jy/beam, you need to know what the
beam size is!
Thirdly, the spatial resolution may be more or less than the pixel
size and more or less than the bounds, e.g. for resampled data or data
which have been extracted by drilling through a cube.

I suggest that the STC elements for Resolution are used in addition to
a reference value (the simple circular FWHM) to allow for ellipses at
given position angle etc.
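
To show why the beam size is needed, here is a short Python sketch that
converts Jy/beam to Jy/arcsec^2.  It assumes an elliptical Gaussian
beam described by its FWHM major and minor axes, which is the usual
radio-interferometry case but is an assumption here; the function names
are hypothetical.

```python
import math


def gaussian_beam_area(bmaj, bmin):
    """Solid angle of an elliptical Gaussian beam.

    bmaj and bmin are the FWHM axes; the result is in the square of
    their unit.  The position angle does not change the area.
    """
    return math.pi * bmaj * bmin / (4.0 * math.log(2.0))


def jy_per_beam_to_jy_per_arcsec2(flux_jy_per_beam, bmaj_arcsec, bmin_arcsec):
    """Convert a surface brightness from Jy/beam to Jy/arcsec^2."""
    return flux_jy_per_beam / gaussian_beam_area(bmaj_arcsec, bmin_arcsec)


# A 2.0 x 0.5 arcsec beam has the same area as a 1.0 x 1.0 arcsec one.
brightness = jy_per_beam_to_jy_per_arcsec2(0.3, 2.0, 0.5)
```

Without the two FWHM values the division cannot be done, so the same
Jy/beam figure from two instruments is not comparable.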

4.6.4 Maybe I have misunderstood, but are you expecting data providers
     to convert quality masks to some XML representation? As per my
     earlier concern - I can't see people converting their entire
     archive data collections to a new format when they know that most
     of the VO has gone to a lot of trouble to make tools which will
     read FITS, VOTable, and convert in ascii etc... The provision to
     allow that might be useful, but not compulsory; concomitantly, a
     key for the existence of a bitmask is what is important, I think,
     not reproducing it.


5.1 provides a list of Coordsys but 5.1.4 then says that it does not
   use them.  I disagree strongly with this.  If this model is any use
   for high-resolution spectra or Galactic spectra, then
   Firstly, there needs to be a Velocity axis quite separate from
   redshift;
   Secondly, there should be provision to specify whether velocities
   are LSR, Heliocentric etc. (it makes a huge difference
   e.g. studying the water masers of NGC 4258)
   Thirdly, the rest frequency (etc.) should be allowed for, as many
   data sets may have one frequency axis but several velocity axes as
   several overlapping transitions are present.
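
   The one-frequency-axis, several-velocity-axes case can be sketched
   in Python as follows.  This uses the radio velocity convention,
   v = c (1 - f/f0), as one possible choice; the class and function
   names, the rest frequencies and the frame label are all illustrative
   assumptions.  The frame string only records which frame the
   frequencies are already in; no frame conversion is attempted.

```python
from dataclasses import dataclass
from typing import List

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


@dataclass
class VelocityAxis:
    frame: str                 # e.g. "LSR" or "Heliocentric" (label only)
    rest_frequency_hz: float   # rest frequency this axis refers to
    values_m_per_s: List[float]


def radio_velocity_axis(frequencies_hz, rest_frequency_hz, frame):
    """Build a velocity axis using the radio convention v = c (1 - f/f0)."""
    values = [C_M_PER_S * (1.0 - f / rest_frequency_hz) for f in frequencies_hz]
    return VelocityAxis(frame, rest_frequency_hz, values)


# One observed frequency axis, two made-up rest frequencies,
# hence two velocity axes over the same channels.
frequencies = [1.0000e9, 0.9995e9, 0.9990e9]
axes = [radio_velocity_axis(frequencies, f0, "LSR") for f0 in (1.0000e9, 0.9995e9)]
```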

5.1.5 I suspect that it would be simpler in the long run to stick with
     STC.  How does the proposed system handle e.g.
     Spectrum.Char.TimeAxis.Coverage.Location.Value in MJD
     which contains several timing bands expressed in millisec offset
     from that value?
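
     For concreteness, the arithmetic involved is sketched below in
     Python.  The function name and the sample numbers are
     hypothetical; the only fact relied on is that one day is
     86 400 000 ms.

```python
MS_PER_DAY = 86_400_000.0


def absolute_mjd(reference_mjd, offsets_ms):
    """Turn millisecond offsets from a reference MJD into absolute MJDs."""
    return [reference_mjd + ms / MS_PER_DAY for ms in offsets_ms]


# A reference epoch plus offsets of 0 ms, 250 ms and one full day.
times = absolute_mjd(53990.0, [0.0, 250.0, 86_400_000.0])
```

     A double-precision MJD near 54000 resolves times only to of order
     a microsecond, so keeping the reference value and the offsets
     separate preserves precision that the summed form would lose.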


best wishes

Anita


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Dr. Anita M. S. Richards, AstroGrid Astronomer
MERLIN/VLBI National Facility, University of Manchester, 
Jodrell Bank Observatory, Macclesfield, Cheshire SK11 9DL, U.K. 
tel +44 (0)1477 572683 (direct); 571321 (switchboard); 571618 (fax).


