[ObsCoreRFC] Minutes of the telco Monday June 6

Arnold Rots arots at head.cfa.harvard.edu
Tue Jul 5 08:51:57 PDT 2011


See below

Douglas Tody wrote:
> On Thu, 9 Jun 2011, Arnold Rots wrote:
> 
> > 3. dataproduct_type, dataproduct_subtype, access_format
> > I still think the scheme that is proposed is incomplete since it is
> > ill-suited (as currently defined) to accommodate datasets (i.e.,
> > collections of files).
> > I would like to suggest that it would be good to add a
> > dataproduct_type "package" (or some such thing) that indicates that
> > the client will be receiving not just a single file. However, the
> > client will still want to know what is in the package, so maybe the
> > subtype should contain a list of the science file data types?
> > In access format we are running into a somewhat similar problem:
> > it's nice (and necessary) to know that a tar file is coming, but it is
> > equally important to know what kinds of formats are hidden inside that
> > tar file: if it is, say, Cobol code, I am not interested. Should it be
> > a comma separated list? Or something like "tar(fits,pdf,txt)"?
> 
> Complex datasets are handled by the scheme.  It is true that we don't
> really have a way to define what is inside a tar, zip, FITS MEF,
> directory, etc.; that would be quite complex to attempt.  However
> support for this use case is provided in two ways.
> 
> First, the subtype may be used to define what the data object is in
> collection or archive specific terms.  For example if the data object is
> a tar file containing all the files comprising a ROSAT observation the
> data provider can define a subtype for this type of data.  It is up to
> the client to understand what the content of the proprietary data
> product is, but if they are able to deal with such instrument-specific
> data they probably do know what it is.

This is precisely the case I was trying to solve: a tarfile containing
a mix of data types: images, spectra, event lists.
The way I would like to solve it is to allow "package" (or something
similar) for the data type and enumerate the data files contained in
the tarfile in the data subtype.

It still leaves a similar issue for the access format: that would be
tar, but it would be nice to be able to enumerate the formats of the
files in the tarfile in an analogous format subtype. That would also
allow one to indicate whether the contents of the tarfile are
gzipped (as opposed to gzipping the tarfile itself).

I realize that this constitutes a use of subtypes that is different
from the original intent (at least, I think so), but it does seem a
useful mechanism.
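
To make the idea concrete, here is a minimal sketch of how a client
might read such a value, assuming the "tar(fits,pdf,txt)" notation
floated earlier in this thread. That notation is only a suggestion and
is not part of any ObsCore specification; the function names
parse_access_format and wants_package are hypothetical, and a ".gz"
suffix on a member format is just one possible way to mark gzipped
contents.

```python
import re

# Hypothetical notation from this thread, e.g. "tar(fits,pdf,txt)".
# It is an assumption for illustration, not an adopted standard.
_PACKAGE_RE = re.compile(r"^\s*([\w.+-]+)\s*(?:\((.*)\))?\s*$")


def parse_access_format(value):
    """Split 'container(fmt1,fmt2)' into (container, [fmt1, fmt2]).

    A bare value such as 'tar' yields an empty list of member formats.
    """
    match = _PACKAGE_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized access format: {value!r}")
    container, inner = match.groups()
    # Drop empty entries so "tar()" and "tar(fits,)" are tolerated.
    contents = [item.strip() for item in (inner or "").split(",") if item.strip()]
    return container, contents


def wants_package(value, acceptable):
    """Return True if every enumerated member format is acceptable.

    A package that enumerates nothing is accepted, since the client
    has no basis for rejecting it.
    """
    _, contents = parse_access_format(value)
    return all(fmt in acceptable for fmt in contents)


if __name__ == "__main__":
    print(parse_access_format("tar(fits,pdf,txt)"))
    print(wants_package("tar(cobol)", {"fits", "pdf", "txt"}))
```

A client could then skip a package whose enumerated contents it cannot
use, without having to download and unpack the tarfile first.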

> 
> Second, it is possible to expose the individual files comprising the
> complex dataset.  Then all the metadata can be specified separately
> for each data product allowing a full description.  All data products
> would share the same obs_id hence they are still associated as a complex
> dataset.

That is a trivial case that is completely covered and I have no issue
with it.

> 
> Which approach is better probably depends upon how one expects the data
> to be used.  If the client will almost always want to get all the data
> elements at once (e.g. for custom reprocessing or analysis of
> instrument-specific data) then the first approach is probably
> preferable.  If they are more likely to want only a higher level derived
> data product such as an image or spectrum, the second approach might be
> preferred.  Combinations of the two approaches are also possible since
> obs_id can link multiple associated data products of any type.
> 
> On Thu, 9 Jun 2011, Arnold Rots wrote:
> > Are you saying that it is unwise to include optional columns in a
> > query, because it may cause them to error out?
> > Then why do we bother with optional items?
> > It seems to me that their use is discouraged. By not specifying how
> > servers should handle them we render them useless, don't we?
> 
> Not at all.  The optional columns are ignored by a generic query without
> error but are still useful to more fully describe the data to the client
> or user.  Also, it is possible in a subsequent query to the specific
> service providing this extra metadata to reference the custom elements,
> and still have a well-formed query.  In this way the general mechanism
> can be used to pose more precise archive-specific queries, but the
> ability to pose generic queries to a number of services has not been
> compromised.

That's fine, but then I fail to understand the problem with making
polarization metadata optional. That was the original issue and, if I
understood the discussion correctly, the argument was that making
polarization optional would lead to many query errors.

> 
>  	- Doug
> 
--------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
60 Garden Street, MS 67                              fax:  +1 617 495 7356
Cambridge, MA 02138                             arots at head.cfa.harvard.edu
USA                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------

