Comments on Canadian VO data model

Tue Apr 22 10:40:10 PDT 2003

Canadian VO Data Model Comments     - Jonathan McDowell
-------------------------------

The Canadian VO have published details of the data model used to
describe images in their archive.
The relevant documents are at
http://services.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/doc/cvo/
This data model is used to describe images, and potentially spectra
and other data products, returned from the CVO (the voObs object),
and also to describe entries in the derived source catalogs (the
voSrc object, which I do not review here).
I'm sending my comments to the whole list in the hope of prompting
the rest of you to look at their documents too.

I think a few changes to the voObs model could make it more general.
My major comments are:

A) lack of uniformity across the axes
B) lack of information on the observables.

(A) First, the axes: Spatial, Temporal and Spectral. Their attribute
sets overlap a lot, but not completely; this seems unfortunate because
it makes the model hard to generalize if you want to add another axis.

Specifically, the relevant attributes are:

          Spatial                 Temporal             Spectral

Shape     _bounds_eq  [deg]       NONE                 NONE
Bounds    NONE                    _bounds  [s?]        _bounds      [A]
Sample    _sample     [deg/bin]   _sample  [s?/bin]    _sample      [A/bin]
Bins      NONE                    _bins    [bin]       _bins        [bin]
Fill      _fill                   _fill                NONE
Res.      _resolution [deg]       NONE                 _resolution  [A]
Nyquist   _Nyquist                NONE                 _Nyquist
[Deprecated:]
Span      _span       [deg?]      _span    [s?]        _span        [A]
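To make the uniformity argument concrete, here is a minimal sketch of a
single axis description that every axis could share. The class and field
names are hypothetical (they are not part of the CVO model) and simply
mirror the rows of the table above; each spatial coordinate is treated as
its own axis, and the example values are made up.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AxisDescription:
    """Hypothetical uniform description of one data axis."""

    name: str                           # e.g. "spectral"
    unit: str                           # physical unit of the axis, e.g. "Angstrom"
    bounds: Tuple[float, float]         # outer hull (low, high), in `unit`
    sample: float                       # `unit` per bin
    bins: int                           # number of bins along the axis
    fill: Optional[float] = None        # fraction of the bounds actually covered
    resolution: Optional[float] = None  # size of a resolution element, in `unit`

    def extent(self) -> float:
        """Width of the outer bounds along this axis."""
        low, high = self.bounds
        return high - low


# Every axis gets the same attributes, so spatial bins and spectral fill
# exist automatically. All values below are invented for illustration.
axes = [
    AxisDescription("ra", "deg", (150.0, 150.5), 0.5 / 1024, 1024),
    AxisDescription("dec", "deg", (2.0, 2.5), 0.5 / 1024, 1024),
    AxisDescription("time", "s", (0.0, 3600.0), 60.0, 60, fill=0.8),
    AxisDescription("spectral", "Angstrom", (4000.0, 7000.0), 2.0, 1500,
                    resolution=5.0),
]
```

Adding a new axis then means adding one more instance, not a new set of
differently named attributes.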

Notes:

A.1  Spatial bounds are given as polygon nodes in J2000, and repeated in
     galactic and ecliptic coordinates. See the notes on regions and bounds
     below.

     The choice of polygon nodes to describe 2D regions is a fair one for
     the application in question, but doesn't generalize well to other VO
     uses. Eventually one should support a general VO region (which could
     include, for instance, a circle, which is not supported here).

     I would argue that it would be better to have 'bounds' mean the extreme
     bounds of each coordinate, as it does for the other axes. As described,
     the spatial bounds can be a complicated polygon giving the exact shape
     of the detector, while the temporal bounds are a simple range giving
     the outer hull of the temporal window function.

     It is useful to have these outer bounds to answer the question 'might
     this dataset contain stuff of interest?'. The detailed shape (detector
     polygon, temporal start and stop intervals) is needed when you actually
     come to analyse the data; the next step up is the sensitivity map and
     the effective exposure depth versus time. The detailed information
     should accompany the data when it is retrieved, but arguably may not
     be needed at the index layer that this data model seems to represent.
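     A minimal sketch of this two-level approach: reduce the detector
     polygon to the extremes of each coordinate (the outer bounds) and use
     those for the cheap 'might it contain?' test at the index layer. It
     assumes plain (x, y) coordinates and ignores RA wraparound at 0/360
     degrees and the poles, which a real implementation would have to
     handle; the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def outer_bounds(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return (xmin, xmax, ymin, ymax): the extreme bounds of each coordinate."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), max(xs), min(ys), max(ys)


def might_contain(polygon: List[Point], target: Point) -> bool:
    """Index-layer test: is the target inside the outer bounds?

    True means only 'possibly'; the exact polygon is needed to be sure.
    """
    xmin, xmax, ymin, ymax = outer_bounds(polygon)
    x, y = target
    return xmin <= x <= xmax and ymin <= y <= ymax


# A triangular footprint: (10.9, 10.9) passes the bounds test even though
# it lies outside the triangle itself, which is acceptable at this layer.
triangle = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
print(might_contain(triangle, (10.9, 10.9)))
```

     False positives cost only a wasted retrieval; the exact shape is then
     applied once the data (and its detailed footprint) are in hand.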

A.2  Why is there no spatial_bins? This seems a critical piece of information
     (e.g. a 1024 x 1024 image, or a spectrum with a single 1 x 1 spatial pixel).

A.3  Why is there no spectral_fill? It is not needed very often, but
     consistency is helpful.
     I'm not fully convinced that fill is a very useful value, since what
     you usually want is to take into account a QE that varies across the
     detector axis, rather than just an on/off mask; although I guess in
     the temporal case a simple fill number is often useful.
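     In the temporal case the fill number is easy to derive from the
     start/stop intervals: total good time divided by the span of the
     outer bounds. A minimal sketch, assuming non-overlapping intervals in
     seconds; the function name is hypothetical. A QE-weighted variant
     would multiply each interval's duration by its relative efficiency.

```python
from typing import List, Tuple


def temporal_fill(intervals: List[Tuple[float, float]]) -> float:
    """Fraction of the outer temporal bounds covered by good intervals.

    Assumes each interval is (start, stop) with stop > start, and that
    the intervals do not overlap.
    """
    if not intervals:
        return 0.0
    start = min(t0 for t0, _ in intervals)  # outer bound: earliest start
    stop = max(t1 for _, t1 in intervals)   # outer bound: latest stop
    good = sum(t1 - t0 for t0, t1 in intervals)
    return good / (stop - start)


# Two 100 s exposures separated by a 100 s gap cover 2/3 of the hull.
print(temporal_fill([(0.0, 100.0), (200.0, 300.0)]))
```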

A.4  Why is there no temporal resolution or Nyquist?
     For old, historical observations the accuracy of the recorded
     observing time may be poor (I've seen data in the literature,
     which one could imagine scanning back in, where the observation
     date is known only to within a year or so. Bad, bad referee.)

A.5  It seems a bit labored to have Nyquist as a separate attribute
     (rather than a method), since it is simply the ratio of two other
     attributes.
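     Assuming (this should be checked against the CVO documents) that
     Nyquist is resolution divided by sample, i.e. the number of bins per
     resolution element, it can be derived on demand instead of stored. A
     hypothetical sketch:

```python
def nyquist_ratio(resolution: float, sample: float) -> float:
    """Bins per resolution element, derived from two stored attributes.

    Assumes resolution and sample share the same axis units (e.g. Angstrom
    and Angstrom/bin). Two or more bins per resolution element is the
    usual Nyquist-sampling criterion.
    """
    if sample <= 0:
        raise ValueError("sample must be positive")
    return resolution / sample


# A 5 A resolution element sampled at 2 A/bin gives 2.5 bins per element.
print(nyquist_ratio(5.0, 2.0))
```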

B) Observables

The "content properties" attributes give derived properties of an image
that are really a summary of a derived catalog for that image.
But the huge thing that seems to be missing here is a description of
what the pixel values in the data actually represent. I think the
implied assumption of your model is that they are flux values in Jansky
(or if you prefer, Janskys, but please, not "Jansky's" :-)), or something
that can be converted to that.

Even within this assumption, I think there's crucial information that
could be added:
  - the actual units of the image
  - is the photometry absolutely calibrated, or not?
  - is it linear, or in magnitudes (instrumental or standard)?
  - other indications of photometric quality
  - the saturation level
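The linear-versus-magnitude flag matters because the two cannot be
interconverted without a known zero point. For example, if the pixel
values were AB magnitudes, converting them back to Jy uses the AB zero
point of 3631 Jy; an instrumental magnitude has no such fixed zero point,
which is why the calibration status also needs recording. A minimal
sketch (the function names are illustrative only):

```python
import math

AB_ZERO_POINT_JY = 3631.0  # flux density corresponding to AB magnitude 0


def ab_mag_to_jy(mag: float) -> float:
    """Convert an AB magnitude to flux density in Jy."""
    return AB_ZERO_POINT_JY * 10.0 ** (-0.4 * mag)


def jy_to_ab_mag(flux_jy: float) -> float:
    """Convert a positive flux density in Jy to an AB magnitude."""
    if flux_jy <= 0:
        raise ValueError("magnitudes are undefined for non-positive flux")
    return -2.5 * math.log10(flux_jy / AB_ZERO_POINT_JY)
```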

But I think one should allow for the possibility that what is in the
data is not sky intensity but some other quantity:
  - spatial image of spectral index (or B-V color)
  - spatial image of ISM extinction, or Faraday rotation measure
  - spatial image of CMB dT/T anisotropy
  - extinction versus wavelength
  - integrated line flux versus time
  - radial velocity versus time
  - observatory humidity versus time

So I would propose:

  observable_quantity: String [REQUIRED]  The quantity represented by the
                                          pixel values. The usual value is
                                          "SKY FLUX DENSITY".
  observable_unit:     String [REQUIRED]  The unit of the above quantity,
                                          e.g. "Jy", "count", "mag".
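A minimal sketch of these two attributes as a record that enforces the
REQUIRED constraint. The class name and the "non-blank" rule are
illustrative assumptions, not part of the CVO model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observable:
    """What the pixel values represent, following the proposal above."""

    observable_quantity: str  # e.g. "SKY FLUX DENSITY"
    observable_unit: str      # e.g. "Jy", "count", "mag"

    def __post_init__(self) -> None:
        # Both attributes are REQUIRED: reject missing or blank values.
        for field_name in ("observable_quantity", "observable_unit"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} is required")


sky = Observable("SKY FLUX DENSITY", "Jy")
velocity_curve = Observable("RADIAL VELOCITY", "km/s")
```

The same record covers the non-intensity cases listed above, such as a
radial velocity curve, without any change to the rest of the model.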

As for the content properties:

I'm intrigued by the choice of S/N = 10 for your point-source
reference. I would have thought that S/N = 3 might be more helpful
for people asking 'is there a chance my source might be there?',
which I think is the most common question.
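To put a rough number on the difference: if the noise is dominated by
the background (so it does not depend on the source flux), S/N scales
linearly with flux, and the S/N = 3 limit is 3/10 of the S/N = 10 flux,
about 1.3 magnitudes fainter. Source-limited (Poisson) data scale
differently, so this sketch holds only under the background-limited
assumption; the function names are hypothetical.

```python
import math


def rescale_limit(flux_limit: float, snr_from: float, snr_to: float) -> float:
    """Flux limit at snr_to, given the limit at snr_from.

    Background-limited assumption: S/N is proportional to source flux.
    """
    return flux_limit * snr_to / snr_from


def limit_delta_mag(snr_from: float, snr_to: float) -> float:
    """How many magnitudes fainter the snr_to limit is (same assumption)."""
    return 2.5 * math.log10(snr_from / snr_to)


print(limit_delta_mag(10.0, 3.0))  # roughly 1.31 mag
```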

Again, one can generalize across axes. The number-density attributes
are crying out for generalization. How about:
 spatial_feature_density_positive_total
 spatial_feature_density_positive_resolved
 spectral_feature_density_positive_total
 spectral_feature_density_positive_resolved
 spectral_feature_density_negative_total
 spectral_feature_density_negative_resolved
 temporal_feature_density_positive_total
 temporal_feature_density_positive_resolved
 temporal_feature_density_negative_total
 temporal_feature_density_negative_resolved

Negative spatial features may also be worth counting, since they
may indicate localized absorption or an incorrect background
estimate.
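Generated systematically, these names are just the product of axis,
sign and total/resolved. Including the negative spatial features
suggested above gives twelve attributes, and a new axis adds four more
with no special cases. The naming pattern is the one proposed above;
the code is only an illustration.

```python
from itertools import product

AXES = ("spatial", "spectral", "temporal")
SIGNS = ("positive", "negative")
KINDS = ("total", "resolved")


def feature_density_names(axes=AXES):
    """All <axis>_feature_density_<sign>_<kind> attribute names."""
    return [f"{axis}_feature_density_{sign}_{kind}"
            for axis, sign, kind in product(axes, SIGNS, KINDS)]


for name in feature_density_names():
    print(name)
```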