[ObsCoreRFC] Minutes of the telco Monday June 6

Wed Jul 6 07:29:49 PDT 2011

I think I am beginning to realize what it is that makes me so
uncomfortable with ObsTAP and what makes it so hard to grasp the
correct way to implement it: its ambivalence.

It is primarily intended (I think) as a data discovery interface.
The problem is that it also doubles as a data access tool.
I think it is the intertwining of these two functions that makes it murky.
And I wish these two functions had been split into separate interfaces.
I know this is not an issue for some observatories (say, the ones that
only produce simple 2-D images), but it makes life difficult for more
complicated datasets.

As a data discovery tool, I would have expected its purpose to be:
- find available observations that fall within certain constraints in
  time, space, frequency, etc.
- tell me what kind of data products are available for each

For a data access tool:
- Give me the URL to a specific (set of) type(s) of data product for a
  specific (set of) observation(s)
For all I know, this role could be played by SIAP, SSAP, SCS, or
whatever protocols are already in existence.
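
To illustrate the split I have in mind, here is a minimal Python sketch
that builds the two kinds of ADQL queries. The table name ivoa.ObsCore
and the column names (obs_id, dataproduct_type, s_ra, s_dec, t_min,
t_max, access_url, access_format) come from the ObsCore model; the
function names and the naive string formatting are hypothetical and
only meant as an illustration, not as a proposed interface.

```python
def discovery_query(ra_deg, dec_deg, radius_deg, mjd_start, mjd_stop):
    """Build an ADQL query that only discovers observations.

    It returns which observations match and what product types exist,
    but deliberately asks for no access URLs.
    """
    return (
        "SELECT obs_id, dataproduct_type "
        "FROM ivoa.ObsCore "
        "WHERE CONTAINS(POINT('ICRS', s_ra, s_dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1 "
        # Keep observations whose time span overlaps [mjd_start, mjd_stop].
        f"AND t_max >= {mjd_start} AND t_min <= {mjd_stop}"
    )


def access_query(obs_ids, product_types):
    """Build an ADQL query that only fetches URLs for known observations.

    Sketch only: values are interpolated without escaping, so this is
    not safe for untrusted input.
    """
    ids = ", ".join(f"'{obs_id}'" for obs_id in obs_ids)
    types = ", ".join(f"'{ptype}'" for ptype in product_types)
    return (
        "SELECT obs_id, dataproduct_type, access_url, access_format "
        "FROM ivoa.ObsCore "
        f"WHERE obs_id IN ({ids}) AND dataproduct_type IN ({types})"
    )


if __name__ == "__main__":
    print(discovery_query(10.0, -5.0, 0.5, 55000.0, 55100.0))
    print(access_query(["1234"], ["image", "event"]))
```

The point of the sketch is only that the first query never needs to
know about files, and the second never needs positional or temporal
constraints.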

The trouble is that for Chandra data, the intertwining of the two
functions requires us to duplicate each ObsCore record six times to
enumerate, laboriously, the different data types we can provide.
When it comes to proper data discovery, it makes much more sense to
return a single record with the ObsCore parameters and a list of
available data product types (event lists, images, light curves,
spectra, tarfiles with all of the above, etc.).
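
As a rough sketch of the difference, the following Python fragment
collapses one-row-per-product records into one record per observation
with a list of product types. The sample rows, their values, the
"product_types" key, and the function name are all made up for
illustration; only obs_id, s_ra, s_dec, and dataproduct_type are
ObsCore column names.

```python
# Hypothetical ObsCore-style rows: the same observation repeated once
# per data product type, as ObsTAP currently requires.
rows = [
    {"obs_id": "obs-0001", "s_ra": 10.0, "s_dec": -5.0, "dataproduct_type": "event"},
    {"obs_id": "obs-0001", "s_ra": 10.0, "s_dec": -5.0, "dataproduct_type": "image"},
    {"obs_id": "obs-0001", "s_ra": 10.0, "s_dec": -5.0, "dataproduct_type": "spectrum"},
    {"obs_id": "obs-0002", "s_ra": 83.6, "s_dec": 22.0, "dataproduct_type": "event"},
]


def collapse_by_observation(rows):
    """Return one record per obs_id with a list of available product types."""
    merged = {}
    for row in rows:
        obs_id = row["obs_id"]
        if obs_id not in merged:
            # Copy the shared ObsCore parameters once per observation.
            record = {k: v for k, v in row.items() if k != "dataproduct_type"}
            record["product_types"] = []
            merged[obs_id] = record
        merged[obs_id]["product_types"].append(row["dataproduct_type"])
    return list(merged.values())


if __name__ == "__main__":
    for record in collapse_by_observation(rows):
        print(record["obs_id"], record["product_types"])
```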

Btw, Use Case 1.6 misquotes MJD as Mean Julian Date. Should be
Modified Julian Day.

I hope you don't mind these ruminations, but these are things that I
am discovering as we are trying to implement this - and it is hard.

Cheers,

  - Arnold

Douglas Tody wrote:
> On Tue, 5 Jul 2011, Arnold Rots wrote:
> 
> >> First, the subtype may be used to define what the data object is in
> >> collection or archive specific terms.  For example if the data object is
> >> a tar file containing all the files comprising a ROSAT observation the
> >> data provider can define a subtype for this type of data.  It is up to
> >> the client to understand what the content of the proprietary data
> >> product is, but if they are able to deal with such instrument-specific
> >> data they probably do know what it is.
> >
> > This is precisely the case I was trying to solve: a tarfile containing
> > a mix of data types: images, spectra, event lists.
> > The way I would like to solve it is to allow "package" (or something
> > similar) for the data type and enumerate the data files contained in
> > the tarfile in the data subtype.
> >
> > It still leaves a similar issue for the access format: that would be
> > tar, but it would be nice to be able to enumerate the formats of the
> > files in the tarfile in a similar format subtype - that also would
> > allow one to indicate whether or not the content of the tarfile is
> > gzipped (as opposed to gzipping the tarfile itself).
> >
> > I realize that this constitutes a use of subtypes that is different
> > from the original intent (at least, I think so), but it does seem a
> > useful mechanism.
> 
> Arnold - I agree that in principle it would be useful to have this extra
> information.  However we had to argue for quite a while to get support
> for instrumental data at this level included at all.  One *can* expose
> this data with ObsTAP 1.0 as outlined in my earlier email; in particular
> exposing the individual data products separately allows them to be
> described if the data provider wants to do so.  Even exposing only the
> tar/zip/MEF etc.  file works so long as the client recognizes the
> subtype.
> 
> To attempt to describe the contents of arbitrarily complex
> instrumental datasets is out of scope for ObsTAP, at least 1.0.  Perhaps
> we can address this issue in the next phase of development where we
> prototype related mechanisms such as data linking.
> 
> > However, there is also the reverse problem: what do we do with data
> > products based on multiple observations? Do we allow ObsId to be a
> > list of ObsIds?
> 
> This was addressed in the document as I recall.  In the case of complex
> data products which are derived from multiple inputs (e.g.  multiple
> observations) we essentially have a new "software observation", and a
> new obs_id should be assigned.  To say more about the derivation of a
> particular data product is complex and gets into the general issue of
> provenance which is being addressed separately.  Furthermore obs_id is a
> database key used to uniquely identify specific "observations" (usable
> as a foreign key in other tables for example) hence we cannot turn it
> into a list of obs_ids.
> 
>  	- Doug
> 
--------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
60 Garden Street, MS 67                              fax:  +1 617 495 7356
Cambridge, MA 02138                             arots at head.cfa.harvard.edu
USA                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------