Spectrum data model

Wed Sep 13 22:41:27 PDT 2006

Hi All -

After reading Anita's careful review of Spectrum (thanks Anita!) and
Jonathan's thoughtful replies I think the issues below are the most
important, so some further elaboration follows.

	- Doug

Required/optional vs must/should/may

    The advantage of must/should/may is that it allows us to differentiate
    between "minimal compliance" (all the "must"s) and "full compliance"
    (all the "must"s plus all the "should"s).  This is useful as we want
    minimal compliance to be as low a bar as is reasonable, but we would
    really prefer that most services implement at least the "should"s.
    To reward service implementors for doing more we could, for example,
    flag fully compliant services in the registry.  Hence I tend to agree
    that it is useful to make the must/should/may distinction.
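
    As a rough sketch of how a registry might tell the two compliance
    levels apart: the field names and the level assigned to each one
    below are illustrative assumptions, not taken from the Spectrum
    spec, and compliance_level is a hypothetical helper.

```python
# Hypothetical requirement levels for a handful of Spectrum fields.
# Both the selection of fields and their levels are assumptions.
REQUIREMENTS = {
    "Coverage.Location": "must",
    "DataID.Title": "must",
    "Target.Name": "should",
    "DataID.CreatorDID": "should",
    "Curation.Reference": "may",
}


def compliance_level(provided):
    """Classify a service by the set of field names it provides."""
    provided = set(provided)
    musts = {f for f, lvl in REQUIREMENTS.items() if lvl == "must"}
    shoulds = {f for f, lvl in REQUIREMENTS.items() if lvl == "should"}
    if not musts <= provided:
        return "non-compliant"   # at least one "must" is missing
    if shoulds <= provided:
        return "full"            # every "must" and every "should"
    return "minimal"             # every "must", some "should"s missing
```

    The "may" fields never affect the result, which is what lets the
    minimal bar stay low while "full" can still be flagged.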

    In general what is required or optional depends upon how a general
    data model is used - it might be different in different circumstances.
    For Spectrum the priorities are probably pretty clear, but for
    something more general like Char it will really depend upon the
    application (hence it is not clear how much this should be specified
    at the level of the Char spec).

Coordinate systems other than just ra/dec

    For the 2nd generation DAL interfaces it is probably too restrictive
    to limit ourselves to ICRS/J2000 only, as SIA does.  For example, we
    already have folks trying to use DAL for solar data.  A reasonable
    compromise is to default coordinates to ICRS as in SIA, but provide a
    means to optionally specify a different coordinate system; whether or
    not other coordinate systems are supported would be a service-specific
    capability.
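
    A minimal sketch of that compromise on the query side, assuming a
    hypothetical resolve_frame helper and a made-up list of supported
    frames standing in for the service capability; none of these names
    come from the DAL specs.

```python
DEFAULT_FRAME = "ICRS"  # default coordinate system, as in SIA


def resolve_frame(requested=None, supported=(DEFAULT_FRAME,)):
    """Pick the coordinate system to use for a query.

    requested: frame name sent by the client, or None if omitted.
    supported: frames this particular service advertises (assumed to be
               published as a service-specific capability).
    """
    if requested is None:
        return DEFAULT_FRAME          # client did not ask: use the default
    if requested not in supported:
        raise ValueError(
            f"coordinate system {requested!r} is not supported by this service"
        )
    return requested
```

    A service that only knows ICRS keeps the default supported tuple
    and rejects anything else; a solar data service could advertise
    additional frames without affecting ICRS-only clients.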

    The above refers mainly to the query interface and standard
    parameters.  To describe the actual data we probably want to
    permit the native coordinate systems of the data to be used.
    This is already done in SIA 1.0, where the WCS information allows
    the coordinate system to be specified rather than requiring that a
    new WCS be computed to publish the data.

Should Coverage.Location (or whatever) be a MUST?

    I agree with Jonathan that fundamental metadata such as this is a
    "must".  Anita is correct that it may not be appropriate for all
    data, e.g., theory data, but we should at least require it where it
    is appropriate for the data.  Rather than define what "appropriate"
    means it might be better to define values such as "not applicable"
    or "undefined", and still require such a value to be specified even
    for data where the value is not applicable.  This would allow more
    rigorous queries to be performed.  The problem is that, for numeric
    values, this may only be possible in a text-based serialization.
    (I saw something like this elsewhere recently, possibly in VOEvent).
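
    To illustrate the numeric problem, here is a small sketch.  The
    Missing enum and the three helper functions are hypothetical; the
    point is only that a text field can carry either special value,
    while a bare float, absent some extra convention, has just NaN.

```python
import math
from enum import Enum


class Missing(Enum):
    """Explicit placeholders for a required field that has no value."""
    NOT_APPLICABLE = "not applicable"
    UNDEFINED = "undefined"


def to_text(value):
    """Text serialization: placeholders survive as distinct strings."""
    return value.value if isinstance(value, Missing) else repr(float(value))


def from_text(text):
    """Inverse of to_text."""
    for placeholder in Missing:
        if text == placeholder.value:
            return placeholder
    return float(text)


def to_binary_float(value):
    """Plain float serialization: both placeholders collapse to NaN."""
    return math.nan if isinstance(value, Missing) else float(value)
```

    With the text form a query can still distinguish "not applicable"
    from "undefined"; with the plain float form it cannot.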

Mediation to a standard data model vs pass-through of native data

    This is an essential feature of SSA.  There is no standard
    astronomical format for spectra, and at the scale of the VO, where
    a client application may access spectra from dozens of archives,
    it becomes impractical for each client application to know how to
    deal with spectral data from dozens of different projects (sure,
    a few applications do this now for a few archives, but that is not
    good enough, and such a scheme will break whenever anything changes).

    What we want to make possible is for each SSA service to return data
    conforming to the SSA data models (Spectrum in this case), so that
    the mediation occurs once in the service rather than hundreds of
    times in remote applications.  A pass-through for "native" format
    data is also important, in part for on-the-cheap services that can't
    perform the data conversion, or more importantly, to obtain direct
    access to the native data so that clients with intimate knowledge
    of a specific data collection can take advantage of project-specific
    features of the data.  Both approaches are important.
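
    The two paths can be sketched as follows.  This is only an
    illustration: mediate and respond are hypothetical functions, the
    native keywords in the usage note are made up, and a real service
    would do far more than rename fields.

```python
def mediate(native_record, mapping):
    """Map native keywords onto standard-model field names.

    mapping: {native_keyword: model_field}; native keywords that have no
    entry in the mapping are dropped from the mediated view.
    """
    return {
        model_field: native_record[keyword]
        for keyword, model_field in mapping.items()
        if keyword in native_record
    }


def respond(native_record, mapping=None, want_native=False):
    """Return mediated data by default, or the native data untouched.

    A service with no mapping (the "on-the-cheap" case) can only offer
    the pass-through.
    """
    if want_native or mapping is None:
        return {"format": "native", "data": native_record}
    return {"format": "spectrum", "data": mediate(native_record, mapping)}
```

    For instance, respond({"OBJECT": "M31"}, {"OBJECT": "Target.Name"})
    yields the mediated view, while passing want_native=True hands back
    the original record for clients that understand it.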

Target.Name vs dataset IDs, collection, etc.

    Target.Name is just the name of the observed object (if any), such
    as one might pass to a name resolver.  (Title is the more important
    version of this since it always applies and is broader).

    Collection is the data collection (ShortName), e.g., "SDSS-DR4"
    or whatever.  DataID.CreatorDID is the dataset ID (URI) assigned
    to the dataset (spectrum) by its creator, e.g., the survey project
    or observatory which created the data collection.  The CreatorDID
    does not change if the data is replicated.  Curation.PublisherDID is
    the dataset ID assigned by the publisher, and will be different for
    each publisher.

    It is possible that the published dataset returned by the service may
    differ significantly from the "parent" (Creator's) dataset, e.g., in
    the case of virtual or derived data.  This can be indicated with the
    CreationType attribute.  For example, if we extract a spectrum from a
    data cube, CreatorDID identifies the cube, PublisherDID the extracted
    spectrum, and CreationType is something like "extracted spectrum".
    This is a primitive form of a provenance model.  If a completely new
    collection is formed by analysis then a new Creator resource is
    required to describe it.
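
    The relationship between these identifiers can be sketched as
    below.  The DatasetIdentity class, the republish helper, every
    ivo:// URI, the "EXAMPLE-CUBES" collection and the "archival"
    CreationType value are placeholders made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetIdentity:
    collection: str      # Collection (ShortName)
    creator_did: str     # DataID.CreatorDID, assigned by the creator
    publisher_did: str   # Curation.PublisherDID, assigned by each publisher
    creation_type: str   # CreationType


def republish(ds, publisher_did, creation_type=None):
    """Identity of the same parent data as served by a (new) publisher.

    The CreatorDID is carried over unchanged; only the publisher-side
    ID, and optionally the CreationType, differ.
    """
    return replace(ds, publisher_did=publisher_did,
                   creation_type=creation_type or ds.creation_type)


# Replication: same creator dataset, different publisher.
original = DatasetIdentity("SDSS-DR4",
                           "ivo://creator.example/dr4#spec-0001",
                           "ivo://archive-a.example/pub#0001",
                           "archival")
mirror = republish(original, "ivo://archive-b.example/pub#0001")

# Derived data: CreatorDID still points at the parent cube.
cube = DatasetIdentity("EXAMPLE-CUBES",
                       "ivo://creator.example/cubes#cube-42",
                       "ivo://archive-a.example/pub#cube-42",
                       "archival")
extracted = republish(cube, "ivo://archive-a.example/pub#cube-42-spec",
                      "extracted spectrum")
```

    Here mirror and original share a CreatorDID but not a PublisherDID,
    and extracted can be traced back to the cube through its CreatorDID.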