[QUANTITY] Plea for pragmatism

Doug Tody dtody at nrao.edu
Thu Oct 30 12:55:33 PST 2003


> > Our most urgent need is for the component data models, 
> 
> Could we try to make a list of components, from the common concepts listed
> in Strasbourg
> (http://www.ivoa.net/twiki/bin/view/IVOA/InterOpOct2003DataModel)
> and perhaps some items of interest raised in Cambridge,
> and then give them priorities?

Here is a first cut at the component data models needed for data
characterization.  This is based on further development of our discussions
in Cambridge and elsewhere.

Coverage / Characterization Component Data Models

    sky coverage
        coverage on the sky, if applicable
        e.g., a circular or rectangular region or aperture on the sky
        Can be used to estimate the WCS, but the full WCS should be
        defined elsewhere.

    time coverage / bandpass
        time of observation
        refValue, hiValue, loValue, fillFactor
        refValue is the mean time of observation, e.g., mid-point

    spectral bandpass
        range of spectral frequencies in data
        id, refValue, hiValue, loValue, units, fillFactor
        id is user-defined bandpass name, e.g., "V", "SDSS_U", "K-Band", etc.
        refValue is the characteristic frequency of the bandpass

    spatial bandpass
        range of spatial frequencies in data
        hiValue, loValue, units, fillFactor (no refValue)
        loValue is also known as the spatial resolution

    flux bandpass
        range of flux values in data
        hiValue, loValue, units, fillFactor (no refValue)
        loValue is also known as the limiting flux or magnitude
        hiValue is saturation limit or maximum flux

Not all of the above will be valid for every dataset, although in an
ideal world most datasets would have all these attributes specified.  The same
components could be used regardless of the type of data, e.g., both an
"image" and a "spectrum" have a spectral bandpass, and we could use the
same component to describe either dataset, to drive queries, etc.
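
To make the reuse concrete, here is a minimal sketch of one generic
component shared by an image and a spectrum.  The class names, the
snake_case attribute names, the overlaps() helper and the numbers are all
assumptions made for illustration; none of them is part of SIA or any
agreed model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bandpass:
    """Generic summary component: a covered range along one axis."""
    lo_value: float
    hi_value: float
    units: str
    fill_factor: float = 1.0           # fraction of [lo, hi] actually covered
    ref_value: Optional[float] = None  # characteristic value, if any
    id: Optional[str] = None           # user-defined name, e.g. "V"

    def overlaps(self, lo: float, hi: float) -> bool:
        """True if the query range [lo, hi] intersects this range."""
        return self.lo_value <= hi and lo <= self.hi_value


@dataclass
class Dataset:
    kind: str           # "image", "spectrum", ...
    spectral: Bandpass  # same component type whatever the kind


# Illustrative numbers only (frequencies in Hz), not taken from real data.
image = Dataset("image", Bandpass(4.8e14, 6.0e14, "Hz", ref_value=5.4e14, id="V"))
spectrum = Dataset("spectrum", Bandpass(3.0e14, 7.5e14, "Hz", fill_factor=0.9))

# The same query runs against both datasets, regardless of their type.
matches = [d.kind for d in (image, spectrum) if d.spectral.overlaps(5.0e14, 5.5e14)]
print(matches)
```

The query code never looks at the dataset kind, which is the point of
sharing the component.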

Did I leave anything out?  Polarization perhaps, but that might be better
dealt with elsewhere.

The refValue/hiValue/loValue concept comes from the spectral bandpass
model introduced in SIA.  From conversations with CVO, I think they have
a similar approach which adds fillFactor.  The refValue, if any, is the
characteristic value.  The filling factor says how much of the hi/lo
range specified is actually "covered".  For example, if an image does
not contain valid data for the full region, the fillFactor is less than
1.0.  Likewise, if we combine a number of exposures, the time coverage
fillFactor will generally be less than 1.0.
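
As a rough sketch of how a time coverage fillFactor could be derived from
a set of exposures: take the total time actually covered (merging any
overlapping exposures) and divide by the full lo-to-hi span.  The function
name, the use of the mid-point for refValue, and the example times are
assumptions for illustration only.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, stop) in the same units, start <= stop


def time_coverage(exposures: List[Interval]) -> Dict[str, float]:
    """Summarize exposures as loValue, hiValue, refValue and fillFactor."""
    if not exposures:
        raise ValueError("need at least one exposure")
    ordered = sorted(exposures)
    lo = ordered[0][0]
    hi = max(stop for _, stop in ordered)

    # Sum the length of the union of the intervals, merging overlaps.
    covered = 0.0
    cur_start, cur_stop = ordered[0]
    for start, stop in ordered[1:]:
        if start <= cur_stop:          # overlaps or touches the current run
            cur_stop = max(cur_stop, stop)
        else:                          # gap: close the current run
            covered += cur_stop - cur_start
            cur_start, cur_stop = start, stop
    covered += cur_stop - cur_start

    span = hi - lo
    fill = covered / span if span > 0 else 1.0
    # Assumption: refValue is taken as the mid-point of the full range.
    return {"loValue": lo, "hiValue": hi,
            "refValue": (lo + hi) / 2.0, "fillFactor": fill}


# Three exposures, the last two overlapping, with a gap after the first.
summary = time_coverage([(0.0, 10.0), (20.0, 30.0), (25.0, 40.0)])
print(summary)
```

Here 30 of the 40 time units are covered, so the fillFactor comes out at
0.75.  A single exposure, or exposures with no gaps between them, would
give 1.0.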

The idea above is that concepts such as spatial resolution and limiting
flux or magnitude might be better represented as hi/lo values in a more
uniform bandpass model.

Other fundamental metadata required to describe a dataset includes the
dataset identification (title, creator, creator dataset ID, publisher,
publisher dataset ID, and so forth), and provenance information (where
the data came from, e.g., virtual data description).

The priority order is roughly as presented above, although I would put
dataset identification at the highest priority.

I don't think the actual "summary data models" required need to be much
more complicated, in terms of numbers of attributes per model, than what
I sketch out above.  If we add more detail later, e.g., a transfer curve
for a spectral bandpass, or other more fine-grained modeling information,
this can be done in such a way that the summary data model is still
retained and still valid.
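
One way this could work, sketched under assumptions: the detailed model
extends the summary model rather than replacing it, so any code that
understands only the summary attributes keeps working.  The class names,
the transfer_curve representation and the summary_line() helper below are
hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SpectralBandpass:
    """Summary model: the attributes listed for the spectral bandpass."""
    lo_value: float
    hi_value: float
    units: str
    fill_factor: float = 1.0
    ref_value: Optional[float] = None
    id: Optional[str] = None


@dataclass
class DetailedSpectralBandpass(SpectralBandpass):
    """Hypothetical extension: adds a transfer curve, keeps the summary."""
    # (frequency, throughput) samples; an empty list means no curve supplied
    transfer_curve: List[Tuple[float, float]] = field(default_factory=list)


def summary_line(bp: SpectralBandpass) -> str:
    """A consumer that knows only about the summary attributes."""
    return f"{bp.id or '?'}: {bp.lo_value:g}-{bp.hi_value:g} {bp.units}"


# Illustrative values only.
detailed = DetailedSpectralBandpass(
    4.8e14, 6.0e14, "Hz", id="V",
    transfer_curve=[(4.8e14, 0.0), (5.4e14, 0.9), (6.0e14, 0.0)],
)
print(summary_line(detailed))  # the summary view ignores the extra detail
```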

	- Doug


