Spectrum data model

Jonathan McDowell jcm at head.cfa.harvard.edu
Tue Sep 12 18:01:59 PDT 2006


Hi Anita,

Thanks for your comments. I'll take on a few of them.

> 
> I like the provision of ucd1+ (I presume that is what is meant
> everywhere you say UCD in any form) but we will need to help data

Yes.


> providers supply these, or include default ucd1+ for the most basic
> elements in the templates.

? I thought I had provided those in the document...

> Regarding defaults for element values, in Char we felt that this was
> best done by the software agents using the model, e.g. if a task looks
> for upper and lower error bounds and there is just one reference value
> given then the software knows to take the single reference value for
> both bounds.

We decided to make it explicit here. The software agents can of course
provide a layer which hides this from users.
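
As a rough illustration of such a layer, here is a minimal Python sketch.
The ErrorBounds class and its field names are hypothetical and are not
taken from the Spectrum schema; the only point is that a client can fall
back to the single reference value when one bound is missing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ErrorBounds:
    # Hypothetical container; field names are illustrative only.
    value: Optional[float] = None  # single (symmetric) reference error
    low: Optional[float] = None    # explicit lower error bound
    high: Optional[float] = None   # explicit upper error bound


def resolve_bounds(err: ErrorBounds) -> Tuple[Optional[float], Optional[float]]:
    """Return (low, high), using the single value for any missing bound."""
    low = err.low if err.low is not None else err.value
    high = err.high if err.high is not None else err.value
    return low, high
```

With this, a record carrying only the single value yields the same number
for both bounds, while explicitly supplied bounds are left untouched.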

> All these features might be useful for Char also.  I have not tried to
> compare Spectrum and Char schemata in detail for consistency although
> I know that is important.

I've tried to be pretty close.

> 
> It would be better to use 'must', 'should' and 'may' instead of
> required and optional.  Not only is this consistent with SIAP etc. but
> it allows more flexibility for data sets which may be missing a
> desirable parameter - or for elements where we have imposed rules
> which will turn out to be impractical in use.

OK. required='must' and optional='may', and the question
is which, if any, items should be 'should'. I'll think about this,
although I feel this can wait as it won't affect software much (should==may
for any computer, except for something checking compliance level).
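
To make the compliance-checking case concrete, here is a small Python
sketch. The level assigned to each field below is an assumption made up
for the example, not what the document specifies; the element names are
just ones mentioned in this thread.

```python
# Hypothetical level assignments, for illustration only.
LEVELS = {
    "Spectrum.Data.SpectralAxis.Value": "must",
    "Spectrum.Target.Name": "should",
    "Spectrum.Target.VarAmpl": "may",
}


def check_compliance(metadata):
    """Return (errors, warnings) for missing 'must' and 'should' fields.

    Missing 'may' fields are never reported, so for ordinary software
    'should' and 'may' behave identically; only a checker tells them apart.
    """
    errors = [k for k, lvl in LEVELS.items()
              if lvl == "must" and k not in metadata]
    warnings = [k for k, lvl in LEVELS.items()
                if lvl == "should" and k not in metadata]
    return errors, warnings
```

A dataset missing only a 'should' field would pass with a warning, which
is the sort of flexibility being asked for.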

> A data provider who is willing to start filling in a data model at
> all, will want to do it 'properly' insofar as they have the metadata
> available.  For a limited number of 'must' fields they might be
> willing to scratch around for missing information, but not if it takes

That's why the must (required) items are very, very few
and most things are optional.

> 
> I have tried to think how I would apply Spectrum to Galactic and
> extra-galactic data, to blind surveys and to time series.
> 
> 3.1 Summary
> 
> line 2 (e.g. aperture, position, etc.) - i.e. add the 'e.g.' so that
> sharing aperture and position is not obligatory, to allow for SEDs
> from multiple instruments, stacked spectra of scattered galaxies of a
> particular type etc.

We'll revisit this in some more detail with the SED document.


> 
> Omit references to mandated units e.g. seconds, MJD in the pink boxes as
> section 3.2 describes how easy it is to interconvert - maybe add a
> sentence to say that the data will be in units which can be converted
> to recognised SI and astronomical units using STC and dimensional analysis.

This was a big argument among the authors. I agreed with you, but
other authors felt it was important, at least for an early release,
to restrict the units. It's easier to open up the choice in a later
release than to make it more restrictive later.

 
> I have not commented on mandatory ucd's but as mentioned above I don't
> think that they should be mandatory unless we can provide them for
> archivists who aren't familiar with the concept.

Right, but the table does this - the only mandatory UCDs are
those for the X and Y axes, which we carefully tabulate.


> 
> Tables of elements
> 
> Spectrum.Target.Name      - I agree that there should be a unique
>                              identifier for every description made
>                              using Spectrum, but is this the
> Spectrum.DataID.DatasetID - or is that for a parent dataset?

The Target.Name is mostly for humans and for by-object-name queries.
The DataID.DatasetID is really the parent dataset. I guess I don't
necessarily see the need for a separate ID for the VO version
of the dataset.

> Pedantically, would Spectrum.Target.ID be better, to avoid confusion
> with recognised IAU or SIMBAD names (which are not appropriate
> e.g. for blind surveys or astroparticles).

Well, I think Target.Name is reasonable, and will be the SIMBAD name
for things which have them and 'Survey Field 12' for other things.
If we'd called it Object.Name I would have more of a worry.


> Spectrum.Target.VarAmpl - Is this for spectral data which vary with
>  			  time? What is it for? Should it be on the
>  			  Flux axis? For stellar variability you also
>  			  want the Spectrum.Target.VarPeriod and
>  			  Spectrum.Target.VarPeriodOrigin to be really
>  			  useful but maybe this is overkill.

This is metadata requested by some providers. It's to warn you
that this thing varies by a factor of X so you might worry
that an SED with bits of the observation taken at different times
may be hard to interpret. It's not intended to characterize the
variability in detail in the way you describe.


> Spectrum.Char.SpectralAxis.Unit - One of these should be 'must'
> Spectrum.Char.TimeAxis.Unit     - depending on type of data?
>                                    (not always SpectralAxis)
> Might we allow other axes e.g. spatial frequency or whatever CMB
> people use for the order of spherical harmonics? In Char we have
> proposed that there has to be at least one axis other than the
> observable, but it could be any dimension along which measurements are
> made.

I think we can extend it later, we didn't want to get into that
for V1.0.

> Spectrum.Char.SpatialAxis.Location.Value: I presume that RA and Dec
> are there as examples, as having to do coordinate conversion should be
> easy for VO tools and thus is an unnecessary burden for archivists;
> moreover it may be inappropriate e.g. for solar data.

Again, this was a V1.0 issue. We would replace this with an STC
reference in later versions.


> Spectrum.Char.SpatialAxis.Coverage.Bounds.Extent: Bounds is box
> corners in Char; I can see confusion if it is a diameter in Spectrum -
> either call it something else or make it a box.

Well, the idea is that Bounds.Extent is an alternate representation
of the same information in Bounds.Min/Bounds.Max. It's more appropriate
for spectra, but you can convert it. 
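
A sketch of that conversion in Python, assuming the extent is the diameter
of a circular aperture centred on the location and that a small-angle
approximation is good enough. The function names are made up for the
example; RA wrap-around at 0/360 and the poles are not handled.

```python
import math


def extent_to_bounds(ra_deg, dec_deg, extent_deg):
    """Approximate RA/Dec box enclosing an aperture of diameter extent_deg."""
    half = extent_deg / 2.0
    # RA half-width grows as 1/cos(dec) away from the equator.
    half_ra = half / math.cos(math.radians(dec_deg))
    lo = (ra_deg - half_ra, dec_deg - half)
    hi = (ra_deg + half_ra, dec_deg + half)
    return lo, hi


def bounds_to_extent(lo, hi):
    """Recover (ra, dec, extent) from box corners; Dec span gives the diameter."""
    (ra_lo, dec_lo), (ra_hi, dec_hi) = lo, hi
    return (ra_lo + ra_hi) / 2.0, (dec_lo + dec_hi) / 2.0, dec_hi - dec_lo
```

For example, a 2 degree aperture at (10, 0) maps to corners (9, -1) and
(11, 1), and back again.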

> The spatial resolution of the instrument should also be somewhere but
> I can't find it in the table - see notes below on 4.6.3

It's in Spectrum.Char.SpatialAxis.Resolution.
 
> 
> Spectrum.Char.SpatialAxis.Coverage.Location.Value and .Bounds.Extent
> (or whatever) should be 'should' not 'must'.  There are occasions
> (e.g. some astroparticle detections?) where there is no known location
> in the sense of a sky direction, let alone aperture size, and either
> or both concepts may be meaningless for simulated data or for stacked
> spectra (e.g. 'a typical X-ray SED of a z=4 source' made from many
> observations using many instruments) - and are not vital for all
> analysis software.

Hmm. I'd like to introduce a concept of 'must' which means
'must, if it can be defined'.

> 
> Spectrum.Char.TimeAxis.Coverage.Location.Value and .Bounds.Extent
> Unless the data are time series, these should be 'should' not 'must'
> as not all spectra etc. have recorded times.

You usually know the date to within at least a decade.

Location.Value =  1975 Jan 1
Bounds.Extent = 10 years
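
If the values have to be expressed numerically, e.g. as an MJD, the
conversion is simple. A Python sketch follows; expressing the extent in
days and using a 365.25-day year are my assumptions for the example, not
something the document mandates.

```python
from datetime import date

MJD_EPOCH = date(1858, 11, 17)  # MJD 0 corresponds to 1858 Nov 17, 0h


def to_mjd(d):
    """Modified Julian Date at 0h of the given calendar date."""
    return (d - MJD_EPOCH).days


location_value = to_mjd(date(1975, 1, 1))  # 42413
bounds_extent_days = 10 * 365.25           # 3652.5, i.e. ten Julian years
```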

> 
> Spectrum.Char.SpectralAxis.Coverage.Location.Value and .Bounds.Extent
> Unless the data are spectra, these should be 'should' not 'must' as
> the VOEvent people don't see the waveband as indispensable, I believe.

The VOSpectrum authors disagreed.

> 
> Spectrum.Data.SpectralAxis.Value - One of these should be 'must'
> Spectrum.Data.TimeAxis.Value	 - depending on type of data?
>  				   (not always SpectralAxis)
> 
> - but I have a conceptual problem with this section, and with the
> example on p40 - surely the model is supposed to describe the data,
> not 'be' the data! Is p40 just an example? Surely this model is not
> saying that e.g. FITS binary spectra must be converted to ascii-based
> xml?  Maybe this is useful as a standard for the actual data where
> SEDs are constructed on the fly by the VO from many separate
> photometry points? Is that consistent with the ESO tool, for example,
> or with SPECFind/ or with the input formats expected by spectral tools?
> But not for all spectra.


We distinguish in the companion SSA protocol document between 'foreign'
data and VOSpectrum serializations.  The problem is that "the input
formats expected by spectral tools" currently encompass many dozens of
different input formats, even  within FITS. This is very different from
the image case, where everyone can interpret a FITS image. We felt that
we had the responsibility to provide proposed specific serializations which
could be VO standard ways of communicating spectral data. The SSA
protocol will allow 'foreign' data, meaning data that you didn't bother to
convert to one of these formats. But for those who do bother, it allows
the possibility for your archive to also serve up a standard format.
We provide standard formats in ASCII XML, ASCII VOTable and in
FITS binary table. I believe that unless we can evolve towards
such standards, spectra will NOT be interoperable. At first, not
everyone will do such conversion, and you'll have to hope that
your software can consume them. But if it catches on, the pressure
to provide the conversion will increase.

We have had some success in interoperating with different sites
using an earlier iteration of the formats.

> 

> 4.5 Note comment above - there are many reasons why it is inadequate
>    to restrict Spectrum.Char.SpatialAxis.Coverage.Location.Value to RA
>    and Dec in decimal degrees and in fact on p27 Spectrum quotes the
>    STC Coord Frames - we should allow these!

I was directed by the Exec and by those present at the
last (Victoria) WG meeting to push Spectrum to completion without
waiting for STC and Char. We therefore include a subset of STC and Char
in the schema. The hope is to phase the full versions
in when they are adopted.

> 
>    Also the question of spectra without a unique position is
>    acknowledged here, all the more reason to make
>    Spectrum.Char.SpatialAxis.Coverage.Location.Value 'should' rather
>    than must.

Again, we are trying with 1.0 to take a different tack than
some other data model projects in the VO: think of these issues and
make sure we can expand to include them, but make a simpler first
implementation which won't support everyone's needs.

> 4.6 Can we make sure that the simpler levels of Accuracy and
>    Uncertainties provided for in Spectrum are consistent with Char and
>    in turn with STC - thus the Char accuracy axis could be used for
>    more subtle levels with minimum confusion?

I believe I have ensured that they are not significantly in conflict.

> 
> 4.6.1
> Maybe I have misread this, but I don't see why bin size and explicit
> limits are mutually exclusive - software might need either, why not
> let the data provider give 'the bin size and/or both of the high and
> low limits'
> This is essentially the same as Sampling in Char, can we make sure
> that it is consistent?

Fair enough.
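
For what it's worth, either representation can be filled in from the
other, so allowing both costs software very little. A Python sketch,
assuming a bin symmetric about its central value when only the size is
given (complete_bin is a made-up helper name):

```python
def complete_bin(center, size=None, low=None, high=None):
    """Return (low, high, size) from whichever description was supplied.

    Explicit limits win if both are present; otherwise the bin is taken
    to be symmetric about the central value.
    """
    if low is not None and high is not None:
        return low, high, high - low
    if size is not None:
        return center - size / 2.0, center + size / 2.0, size
    raise ValueError("need either a bin size or both low and high limits")
```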

> 
> 4.6.3 Resolution
> 'trivial' has unfortunate connotations (as in the trivial solution to
> an equation is usually the useless solution...) - is what is meant,
> that Spectrum provides the simplest level of resolution?

Yes


> 5.1 provides a list of Coordsys but 5.1.4 then says that it does not
>    use them.  I disagree strongly with this.  If this model is any use
>    for high-resolution spectra or Galactic spectra, then
>    Firstly, there needs to be a Velocity axis quite separate from
>    redshift;
>    Secondly, there should be provision to specify whether velocities
>    are LSR, Heliocentric, etc. (it makes a huge difference
>    e.g. studying the water masers of NGC 4258)
>    Thirdly, the rest frequency (etc.) should be allowed for, as many
>    data sets may have one frequency axis but several velocity axes as
>    several overlapping transitions are present.

OK, I'll have to think about this and discuss it with you.
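
To illustrate the third point (one frequency axis, several velocity
axes), here is a Python sketch using the radio velocity convention
v = c (f0 - f) / f0. The choice of convention and the numbers are
assumptions for the example only; nothing here is in the current model,
and no velocity reference frame (LSR, heliocentric) correction is applied.

```python
C_KM_S = 299792.458  # speed of light in km/s


def radio_velocity(freq, rest_freq):
    """Radio-convention velocity (km/s); freq and rest_freq share a unit."""
    return C_KM_S * (rest_freq - freq) / rest_freq


def velocity_axes(freqs, rest_freqs):
    """One velocity axis per rest frequency, all from the same frequency axis."""
    return {f0: [radio_velocity(f, f0) for f in freqs] for f0 in rest_freqs}
```

For instance, velocity_axes([99.0, 100.0, 101.0], [100.0, 101.0]) gives
two velocity axes for the same three channels, one per transition.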

 - Jonathan 


