SED Data Model: Questions and Comments

Anita Richards amsr at jb.man.ac.uk
Mon Feb 14 03:32:08 PST 2005


(from Igor)
>>A measure is a value and an error. If there is no flux calibration, the
>>measurement does not exist, and a data model shall not be concerned by
>>this.
>>DO NOT PERMIT WRONG OR UNRELIABLE DATA TO BE OUT IN THE VO!!!

Sorry but I fundamentally disagree with this, for the sort of reasons Ed
has stated (quoted below).  Moreover, VOs are not data police. We do not
judge the quality of the data, only of the metadata.  We must ensure that
data are described as accurately as possible and have a warning flag when,
for example, errors are not given.  A practical case - radio spectral line
data may be given in units of antenna temperature or even simply as a
percentage of the peak.  This was common 20+ years ago.  AGB stars evolve
rapidly at certain phases, and masers appear and disappear on timescales
of decades or less, giving a clue as to the evolutionary state of the
star.  A simple detection of an OH maser in 1975, say, where one is not
seen today, is very useful and exactly the sort of thing which data mining
would uncover.  This demands good spectral accuracy and reasonable
position accuracy, but no flux accuracy other than being able to see if
the line is clear of the noise.

One issue is whether VOs can cope with units which are linear but
non-physical - counts, percentage of peak, multiples of noise, etc.  I
think that we must be able to do that (after all, we can cope with
non-linear arbitrary units, a.k.a. magnitudes).  In all these cases you
probably cannot use such data in a workflow which requires mathematical
processing of the data, but you could still find e.g. the statistics of
objects of spectral class K with OH masers.  This is probably a Quantity
issue.

(from Ed)
> There is yet another argument for purposefully taking spectral data
> without regard to absolute calibration.  Many physical quantities rely
> only on the ratio of equivalent widths in a given spectrum.   Gas
> collision temperature and gas density are examples.  The equivalent
> width is the ratio of the area of a line to the continuum flux at the
> center of the line so the sensitivity at that wavelength cancels out of
> the EW.   Studies of gas physics, which provide a major contribution to
> the spectral archives, need not and very often do not have calibrations.
>
> However, I would argue that some effort needs to be made to ensure some
> measure of an error estimate is on archived data.  It just is very hard
> for anyone to do valid science without this.  Calling the astronomer is
> not acceptable because most humans die.  To do Poisson statistics one
> needs to at least know how many photons are represented by a count
> (DN).  So that is necessary, usually sufficient, and easy.  In cases in
> which this is not available or appropriate, the noise can be estimated by
> assuming a certain section of the spectrum is actually smooth and so the
> fluctuations are a measure of the noise.  This is tricky however,
> because often at higher resolution or better S/N it turns out that a
> "smooth" section is a forest of lines.  Only someone who really
> understands the physics of the object can properly assess this, ie the
> observer or his/her analysis team.
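Both of Ed's points can be made concrete in a few lines (a sketch with made-up numbers, assuming a Gaussian absorption line and a detector gain; none of this comes from real archived data):

```python
import numpy as np

# 1) Equivalent width is insensitive to absolute calibration:
#    EW = integral of (1 - F_line/F_continuum) d(lambda), so a constant
#    sensitivity factor applied to the whole spectrum cancels.
wavelength = np.linspace(6550.0, 6575.0, 500)  # Angstrom (illustrative)
continuum = np.ones_like(wavelength)
line = continuum - 0.6 * np.exp(-((wavelength - 6563.0) / 1.5) ** 2)

def equivalent_width(flux, cont, wl):
    # Trapezoidal integration of the normalised line depth.
    depth = 1.0 - flux / cont
    return np.sum(0.5 * (depth[:-1] + depth[1:]) * np.diff(wl))

gain_factor = 0.37  # arbitrary, uncalibrated sensitivity
ew_raw = equivalent_width(line, continuum, wavelength)
ew_scaled = equivalent_width(gain_factor * line,
                             gain_factor * continuum, wavelength)
assert np.isclose(ew_raw, ew_scaled)  # the calibration factor cancels

# 2) For Poisson errors one only needs the photons-per-count conversion:
counts = 400.0
photons_per_dn = 4.0  # detector gain - the one number that must be archived
sigma_counts = np.sqrt(counts * photons_per_dn) / photons_per_dn
print(ew_raw, sigma_counts)
```

Here `sigma_counts` is 10 DN: sqrt(400 x 4) photons = 40 photons, divided back by the gain. Without the gain on record, even this minimal error estimate is impossible.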

These are issues which are indeed best dealt with when the data are first
harvested by collaboration between the VO and the data provider.  For data
which have yet to be produced, we can make sure that the data are
properly described, e.g. with accuracy estimates, even in the small
number of cases where physical flux density units are not available.  I
would rather see data
published albeit with warnings (Resource Metadata v1.0 has 3 grades of
overall quality, from totally calibrated to unknown), than left to rot.

Normally, the main criterion for how accurate the metadata are (I mean
in content, not format) would be whether the data came from a refereed
publication or an instrument archive from a facility subject to review
boards etc.  However this is not exhaustive, especially for 'legacy'
data, where it is a question of priorities.  There are some very rich
archives in the former Soviet Union which I would bite your hand off to
get published, even if the only information was frequency, position and
arbitrary units; there could be a mechanism for users to add feedback
if they deduced more accurate metadata.  For other data sources we
might not
want to put VO efforts into enhancing metadata.  However there is an
argument for simply listing what data are potentially available so that if
the user really wants them s/he can go figure.


cheers
a
