SSA and Spectrum (SDM) documents update

Douglas Tody dtody at nrao.edu
Thu Apr 1 16:55:05 PDT 2010


Hi -

I am trying to synthesize these various comments on SSA/SDM.  I think
I mostly agree with the detailed changes suggested.  The comments here
are mostly higher level.

At this point it appears that both Jonathan and Bruno agree that the
SSA metadata and the SDM, while quite similar, are not quite the same
thing (hence, where they differ, the differences are not necessarily
"inconsistencies" but restrictions on units, mandatory fields,
etc. required by the specific application).  SSA defines the data
model used in the query response, describing the data which a
service can deliver.  SDM defines the data model of a spectrum
dataset/file/instance.

A key point which has not been adequately discussed is the role of
the generic GDS/Obs data model.  While the SDM explicitly describes
a Spectrum instance, most of what is in the SSA query response is
actually GDS/ObsDM metadata and not specific to Spectra at all - at
this level the metadata should be the same for an image, for an SED,
for a time series, etc.  It does not make much sense, for
example, to have Spectrum.Target.Name and Image.Target.Name and allow
these to be two different things.  An application which deals with
multiple types of data would like global and generic metadata such
as this, which is applicable to any type of data, to be usable as
just "Target.Name" (possibly in a future version something such as
Obs.Target.Name would be preferable, but we did not have that when
SSA was defined).  Once we have a full set of DAL2 services one would
like to have the common stuff factored out so that it does not have
to be customized for every type of data.  For the purposes of data
discovery and description most of this stuff is the same.
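
As a rough illustration of the point above (not anything defined by
SSA or the SDM), a client handling several types of data could reduce
class-prefixed utypes to one generic key.  The prefix list and the
function name generic_utype are assumptions made for this sketch only.

```python
# Hypothetical list of dataset-class prefixes; the message above only
# mentions Spectrum and Image as examples.
DATASET_CLASS_PREFIXES = ("Spectrum.", "Image.")


def generic_utype(utype: str) -> str:
    """Strip a leading dataset-class prefix, if one is present."""
    for prefix in DATASET_CLASS_PREFIXES:
        if utype.startswith(prefix):
            return utype[len(prefix):]
    return utype


if __name__ == "__main__":
    for u in ("Spectrum.Target.Name", "Image.Target.Name", "Target.Name"):
        print(u, "->", generic_utype(u))
```

With this, Spectrum.Target.Name and Image.Target.Name both resolve to
the same generic key, Target.Name.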

The above is the reason why the "Spectrum" prefix does not carry over
into the SSA query response.  Rather, essentially all of the SSA query
response metadata is either generic dataset metadata, or specific to
the service mechanism itself and hence also generic and applicable to
all such services.  It is not actually from the SDM at all; we merely
require the two to be consistent.  That small part of the query response
metadata which is not generic and which is specific to the type of
dataset being queried (Spectrum, Image, whatever) is what is in the
"Dataset" component model.

In other words, for a DAL2 query response what we have is virtually
all generic dataset / Obs metadata, plus a separate "Dataset" element
which is specific to the type of dataset being queried.  This generic
query response is supposed to be the same for all of the "typed"
services derived from the generic Dataset.  The data model (if any is
formally defined) for a specific type of dataset such as Spectrum,
Image, etc. also inherits the generic dataset metadata.  Ideally we
want to have all these be internally consistent for a particular major
version of the interface (1.x, 2.x, etc.).  Once we have SSA, SIAV2,
ObsTAP, etc. it is clear that it could be attractive to standardize
the generic metadata - which comprises most of the metadata returned -
among all these very similar services.
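
The structure described above could be sketched as follows.  This is
only an illustration under stated assumptions: the class and function
names, the target value, and the image-specific field are
placeholders, not part of any IVOA document.  Only Target.Name and
Dataset.SpectralAxis are names taken from this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class QueryResponseRecord:
    """One row of a query response: generic part plus "Dataset" part."""
    generic: Dict[str, str] = field(default_factory=dict)  # same for all data types
    dataset: Dict[str, str] = field(default_factory=dict)  # specific to one data type


def shared_generic_keys(a: QueryResponseRecord, b: QueryResponseRecord) -> List[str]:
    """Return the generic keys that two records have in common."""
    return sorted(set(a.generic) & set(b.generic))


def example_records() -> Tuple[QueryResponseRecord, QueryResponseRecord]:
    """Build one spectrum-like and one image-like record (placeholder values)."""
    spectrum = QueryResponseRecord(
        generic={"Target.Name": "placeholder-target"},
        dataset={"Dataset.SpectralAxis": "placeholder-column"},
    )
    image = QueryResponseRecord(
        generic={"Target.Name": "placeholder-target"},
        dataset={"Dataset.HypotheticalImageField": "placeholder"},
    )
    return spectrum, image


if __name__ == "__main__":
    spec, img = example_records()
    print(shared_generic_keys(spec, img))
```

The generic part is what a multi-type client would work with; only
the "Dataset" part differs between the two records.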

Now there are some things which could possibly be done differently
several years later, for example the "Dataset" component in the query
response might instead take on the class name of the specific type
of data which it describes, such as Spectrum or Image.  Either approach
is possible and it doesn't make all that much difference.

Some more detailed comments responding to earlier emails follow.

On Wed, 24 Mar 2010, Jonathan McDowell wrote:

> Bruno, Doug, Mireille
>
> I have now had time to review Bruno's document and here is
> my viewpoint.
>
> The SDM document as it stands is a specification for the
> data model to describe a spectrum document. The SSA is a specification
> for a query to find such a document. In retrospect, it would
> be better to define a general DM for spectrum-related applications
> which could serve as an underlying data model both for
> describing spectrum files (documents, data,..) and for describing queries
> about finding spectrum files.
>
> In fact, as shown by the SSA document and Doug's spreadsheet,
> the basic existing model can serve as both, but as Bruno notes
> the identification of certain fields as mandatory/required/optional
> are appropriate only for the spectrum file case and not the
> spectrum query case.

What is missing here is the role of the GDS/ObsDM as outlined in more
detail above.

> I am not sure that I entirely understand Doug's POV that the 'spreadsheet'
> matching the SSA and SDM should become the normative document.

This misstates my suggestion.  What I suggest is that the specification
documents should be the primary normative part of the standard, and
that the data model spreadsheet (DMS) play a role similar to that of
the XML schema we have in other services.  The DMS is a computer
readable, precise, complete version of the raw information content of
the data model.
We can use it to autogenerate or drive actual code, to generate HTML
documentation, etc.  When a data model is expressed in this form it is
easy to see an overall data model and the relationships of the elements
(the more graphical UML view is also useful).  Many ambiguities are
eliminated in a simple way, and it becomes clear how similar elements
relate and what is missing.  Some cells in the spreadsheet simply have
no value and this missing information is immediately obvious when the
information is expressed in this form.  Wide rows are not a problem,
unlike in the formatted specification document which has to fit on
a printed page.
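
To make the "missing cells are immediately obvious" point concrete,
here is a minimal sketch that scans a spreadsheet exported as CSV and
reports empty cells.  The column names (utype, datatype, description)
and the sample rows are assumptions for illustration; they are not the
layout of the actual spreadsheet.

```python
import csv
import io

# Hypothetical CSV export; column names and cell values are placeholders.
SAMPLE_DMS = """utype,datatype,description
Target.Name,string,placeholder description
Dataset.SpectralAxis,string,
Dataset.FluxAxis,,
"""


def find_missing_cells(csv_text):
    """Return (utype, column) pairs for every empty cell in the table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = []
    for row in reader:
        for column, value in row.items():
            if value is None or not value.strip():
                missing.append((row["utype"], column))
    return missing


if __name__ == "__main__":
    for utype, column in find_missing_cells(SAMPLE_DMS):
        print(f"{utype}: no value for '{column}'")
```

The same table could in principle drive code or HTML generation; the
check above covers only the gap-finding use.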

> In SSA, the fields
>   Dataset.SpectralAxis
>   Dataset.FluxAxis
> could be mapped to SDM Char.SpectralAxisName and Char.FluxAxisName.

Maybe.  These are not quite the same thing.  Dataset.SpectralAxis etc.
were motivated by the dimensional equation approach from ESAC,
and refer to the physical storage of an axis in a dataset/spectrum
(FITS binary table column name or whatever).  In the case of Char I
think this is the name that the model assigns to the axis, with no
clear relationship to the specific serialization.

> I don't know what to do about the deletion time, Dataset.Deleted.

This (also Association) is an example of a part of the SSA query
response model which is specific to the query itself.  It has nothing
to do with the SDM and should not be described there.

On Thu, 25 Mar 2010, Bruno Rino wrote:

> At the EuroVO AIDA a small group of people interested in updating the
> SSA and SpectrumDM document gathered. These are the minutes.

     [Note this AIDA meeting is in addition to the related email
      discussions which have been going on as well].

> We set forth the following goal: To create 1.1 versions (of SSA and
> SpectrumDM) that attempt to fix the inconsistencies between the two
> documents. Clarifications and small additions should be added in
> versions 1.2. Big changes, possibly not backwards compatible, are
> postponed to 2.0 versions.

I agree that we need an update such as a 1.1 document and that we
should defer any (significant) changes which are not backwards
compatible until there is time for more thorough discussion.
There might be some issues worth discussing at the Victoria interop,
however, e.g. the understated role of the GDS/ObsDM as noted above.

> 1. The SSA data model is derived, but decoupled, from the SpectrumDM.
>
> The acknowledged divergences are:
> - "required" flags (Mandatory, Recommended, Optional) are different
> - the SSA data model contains service related metadata, that have no
> meaning for the SpectrumDM
> - the SpectrumDM contains metadata related to data analysis (Data.*)
> that are of no interest for data discovery (which is the purpose of SSA)
> - in SSA, the "Spectrum." prefix found in the SpectrumDM utypes was
> dropped, and in some cases a "Dataset." prefix was added.

I agree with most of this, but it still misses the role of the GDS/ObsDM
which links all the DAL2 interfaces together.  We are focusing too much
here on just the relationship between SSA and SDM, but the real issue
is broader than that.

I probably agree with most of the more detailed comments, but we
should defer this level of detail to a smaller group of us
(not just limited to AIDA; certainly the original authors/editors
should be included as well).

 	- Doug


