Spectrum data model

Mark Taylor m.b.taylor at bristol.ac.uk
Wed Sep 13 11:34:31 PDT 2006


On Wed, 13 Sep 2006, Doug Tody wrote:

> Hi All -
> 
> There have been various discussions on the data model / file format
> issue, but to keep it simple I will respond to Mark's original message.
> 
> On Wed, 13 Sep 2006, Mark Taylor wrote:
> 
> > [...] this relates to a point that I raised with Markus Dolensky last
> > week and he forwarded to Doug concerning SSAP and serialization
> > formats.  Since it's come up here, I'll shove my oar in.
> >
> > My problem is that the information returned from an SSAP query
> > gives the serialization MIME type, but no more - as you point out
> > above, the fact that a spectrum is encoded as FITS could cover any
> > number of specific serialization formats.  So a client trying
> > to make sense of a spectrum returned from SSAP, which only has
> > the MIME type got from the Access.Format response field to tell
> > it what kind of data is at the other end of the Access.Reference,
> > has an unnecessarily difficult job, in that it really has to
> > examine the data itself to work out what the serialization format is
> > (and in doing that it may end up downloading a large data file only
> > to find out that it is in a format that it can't understand).
> >
> > Possibly the intention is that an SSAP Access.Format of application/fits
> > means the data is in the FITS format defined in the Spectral DM
> > document (ditto for application/x-votable+xml, application/xml),
> > but I can't see this stated explicitly anywhere.
> >
> > Otherwise, it seems to me that what is called for is an additional
> > field in the SSAP response which names the specific serialization
> > format, if known.  This would require assigning some sort of name
> > to the XML, FITS and VOTable formats defined in the Spectral DM
> > document (presumably a URI of some sort).
> 
> This is primarily a query matter whereas Spectrum is a dataset data
> model, hence we are getting into issues here which aren't addressed by
> the Spectrum model alone.
> 
> We distinguish between the data model and the data format or
> serialization.  Both are described in the query response.  Since the
> same data object, conformant to the Spectrum data model, may be viewed
> via various formats/serializations, it is not clear whether the data
> model itself should specify the serialization; my view has always been
> that this is best done externally, e.g., in the access protocol.

I agree that this is primarily DAL business, not DM (so admittedly
we're on the wrong list, but it started here, and I think most
interested parties are reading; if you want to send follow-ups to
dal@, though, go ahead).

> What we currently have in the access protocol in this area:
> 
> 	Dataset.Type		# Spectrum, TimeSeries, etc.
> 	Dataset.DataModel	# Data model, e.g., "Spectrum V1.0"
> 	Access.Format		# File format (MIME type)
> 
> If the DataModel is "Spectrum" then we have a fully VO-compliant dataset.
> (Yes, services will need to perform a conversion on the fly to return
> a dataset compliant with the VO Spectrum data model.)
> 
> If instead the service returns native project data (typically different
> for every data collection/mission/instrument) then Dataset.DataModel
> should identify the specific project data model for the data to be
> returned.  This is the "pass-through" mechanism for accessing native
> project data via an SSA query interface.  An application doesn't have to
> scan the data file to determine what it contains; this is specified
> directly by the dataset Type and DataModel.
> 
> The data format or serialization is (in principle at least) independent
> of the data model.  This is true for Spectrum but in general will not be
> true for native project data, where there is typically only one format.
> Currently, the file format is specified by its MIME type.

I follow this but I still have a problem: the Dataset.DataModel and
Access.Format are not sufficient to describe fully the format of a remote
data file (which I would like to know, for instance, if I'm a client
who has the SSAP query response and wants to know whether it's
worthwhile retrieving the remote resource or whether it's unreadable
to me).

The reason is that there may be multiple different and incompatible
serializations which share the same Data Model and MIME type.
For instance, an instance of the Spectrum V1.0 DM could be encoded
as application/fits either using the serialization suggested in
Part 4 of the current Spectral DM document or as, say, a 1D FITS
array, or any number of other FITS-based serializations I could
dream up.  Possibly the intention is that the *only* legal serialization
of Spectrum V1.0 into FITS is following the prescription in
spec98c Part 4 (though I'm not sure that would be a good idea),
but as far as I know that is not explicitly stated anywhere.
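
To make the ambiguity concrete, here is a minimal Python sketch of what
a client sees.  The serialization labels ("spectral-dm-part4-fits",
"1d-fits-array", "spectral-dm-votable") are names invented purely for
illustration; nothing in SSAP or the Spectral DM document defines them.
The point is only that two different encodings collapse onto the same
(Dataset.DataModel, Access.Format) pair.

```python
from collections import defaultdict

# Each tuple: (Dataset.DataModel, Access.Format, serialization label).
# The labels are hypothetical; SSAP does not currently report them.
KNOWN_SERIALIZATIONS = [
    ("Spectrum V1.0", "application/fits", "spectral-dm-part4-fits"),
    ("Spectrum V1.0", "application/fits", "1d-fits-array"),
    ("Spectrum V1.0", "application/x-votable+xml", "spectral-dm-votable"),
]


def group_by_response_fields(serializations):
    """Group serialization labels by the two fields a client actually sees."""
    groups = defaultdict(list)
    for data_model, mime_type, label in serializations:
        groups[(data_model, mime_type)].append(label)
    return dict(groups)


def ambiguous_keys(serializations):
    """Return the (DataModel, MIME type) pairs shared by more than one serialization."""
    groups = group_by_response_fields(serializations)
    return sorted(key for key, labels in groups.items() if len(labels) > 1)


if __name__ == "__main__":
    for key in ambiguous_keys(KNOWN_SERIALIZATIONS):
        print(key, "->", group_by_response_fields(KNOWN_SERIALIZATIONS)[key])
```

For the ambiguous pair, a client cannot tell from the query response
alone which of the encodings sits behind the Access.Reference.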

Hence my request for an additional specification somewhere (in the
SSAP response) which ties down the serialization more tightly.
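
A rough sketch of how a client might use such a field, if it existed.
Everything specific here is an assumption: the `serialization`
attribute, the `SsapRecord` class and the `urn:example:...` identifiers
are made up for illustration and are not part of SSAP or any IVOA
document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SsapRecord:
    """One row of a query response (hypothetical client-side representation)."""
    data_model: str                      # Dataset.DataModel
    mime_type: str                       # Access.Format
    reference: str                       # Access.Reference (URL of the data)
    serialization: Optional[str] = None  # hypothetical extra field; None if absent


# Serializations this imaginary client can parse (made-up identifiers).
READABLE = frozenset({"urn:example:spectrum-v1.0:fits-part4"})


def worth_retrieving(record, readable=READABLE):
    """Decide from the response alone whether to download the data.

    Returns True or False when the serialization is named, and None when it
    is not, in which case the client still has to fetch and inspect the file.
    """
    if record.serialization is None:
        return None
    return record.serialization in readable


if __name__ == "__main__":
    rec = SsapRecord("Spectrum V1.0", "application/fits",
                     "http://example.org/spec.fits",
                     "urn:example:spectrum-v1.0:fits-part4")
    print(worth_retrieving(rec))
```

The "if known" case is kept as `None` rather than `False`, so a service
that cannot name its serialization leaves the client no worse off than
it is today.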

Mark

-- 
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/


