[ObsCoreRFC] Minutes of the telco Monday June 6

Mon Jul 11 06:48:06 PDT 2011

I hear what you are saying (I think), but in retrospect it would have
been much cleaner (and clearer) if the data discovery and data access
roles had not been combined in ObsTAP.
If ObsTAP had purely provided information on the availability of data,
observational metadata, the data product types, the data formats, and
the packaging, a comprehensive data access tool could have provided
the access URLs in response to a query that is based on the
information provided through ObsTAP.
As it is, the data discovery is somewhat compromised by the data
access features and the data access features themselves are not
very satisfactory.
I am afraid that this is a bit of a missed opportunity.

Cheers,

  - arnold

François Bonnarel wrote:
> Arnold, Doug,
> On 06/07/2011 17:54, Douglas Tody wrote:
> > On Wed, 6 Jul 2011, Arnold Rots wrote:
> >
> >> I think I am beginning to realize what it is that makes me so
> >> uncomfortable with ObsTAP and what makes it so hard to grasp the
> >> correct way to implement it: its ambivalence.
> >>
> >> It is primarily intended (I think) as a data discovery interface.
> >> The problem is that it also doubles as a data access tool.
> >> I think it is the intertwining of these two functions that makes it
> >> murky.
> >> And I wish these two functions had been split into separate
> >> interfaces.
> >> I know this is not an issue for some observatories (say, the ones that
> >> only produce simple 2-D images), but it makes life difficult for more
> >> complicated datasets.
> >>
> >> As a data discovery tool, I would have expected its purpose to be:
> >> - find available observations that fall within certain constraints in
> >>  time, space, frequency, etc.
> >> - tell me what kind of data products are available for each
> >>
> >> For a data access tool:
> >> - Give me the URL to a specific (set of) type(s) of data product for a
> >>  specific (set of) observation(s)
> >> For all I know, this role could be played by SIAP, SSAP, SCS, or
> >> whatever protocols are already in existence.
> >
> > ObsTAP is intended mainly to provide uniform global data discovery; it
> > can find any type of data, even non-VO data formats.  The data access
> > capabilities provided at this level are very limited, but can be used to
> > retrieve static archive data files (the data product could actually be
> > generated on the fly if desired, but the description at least is
> > static).
> >
> > As you suggest, the idea is that for any non-trivial data access the
> > typed interfaces would be used (SIA, SSA, etc.).  So for example one
> > could do global data discovery using ObsTAP and then follow up with one
> > of the typed interfaces to get more complete object-specific metadata
> > and do the actual data access, which for a typed/OO interface will often
> > involve virtual data generation (subsetting, filtering, transforming,
> > output format specification, etc.).  Of course if just retrieving the
> > static archive file is enough then that can be done with just the acref
> > returned by ObsTAP.
> >
> >> The trouble is that for Chandra data, the intertwining of the two
> >> functions requires us to duplicate each ObsCore record six times to
> >> enumerate, laboriously, the different data types we can provide.
> >> When it comes to proper data discovery, it makes much more sense to
> >> return a single record with the ObsCore parameters and a list of
> >> available data product types (event lists, images, light curves,
> >> spectra, tarfiles with all of the above, etc.).
> >
> > True, but this is necessary to be consistent with the relational model
> > and to provide a simple mechanism.  For a Chandra observation one might
> > return a set of records with the same obs_id, one being a tar.gz of the
> > full instrumental dataset, the others being static images, spectra, etc.
> > derived from that data.  A query for a specific obs_id would thus
> > describe all the data products available for the observation.  As you
> > note it is necessary to duplicate some of the metadata in associated
> > records, but much of the metadata will differ for each data product as
> > well.
> >
> > So far as the archive goes one would probably want to autogenerate the
> > ObsTAP table from more fundamental, fully normalized database tables.
> > Any updates would be done only on the underlying tables (auto-updating
> > the ObsTAP "view" after each such update).  Then there should be no
> > problem with the redundant metadata in the ObsTAP index table becoming
> > inconsistent or whatever.
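
A minimal sketch of the arrangement described above, assuming a much
reduced schema: the table names `observation` and `data_product`, the
view name `obscore`, the small column set, and the example.org URLs are
all hypothetical and are not taken from any real archive. It uses
Python's built-in sqlite3 module only to show the idea of one record
per data product, all derived from normalized tables.

```python
import sqlite3

# Hypothetical, simplified schema. Table names and the reduced column set
# are assumptions for illustration, not the full ObsCore definition.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE observation (
    obs_id TEXT PRIMARY KEY,
    s_ra   REAL,
    s_dec  REAL
);
CREATE TABLE data_product (
    obs_id           TEXT NOT NULL REFERENCES observation(obs_id),
    dataproduct_type TEXT NOT NULL,
    access_url       TEXT NOT NULL
);
-- The flat index is derived from the tables above, never edited directly.
CREATE VIEW obscore AS
    SELECT o.obs_id, p.dataproduct_type, o.s_ra, o.s_dec, p.access_url
    FROM observation AS o
    JOIN data_product AS p ON p.obs_id = o.obs_id;
""")

conn.execute("INSERT INTO observation VALUES (?, ?, ?)", ("1234", 83.63, 22.01))
conn.executemany(
    "INSERT INTO data_product VALUES (?, ?, ?)",
    [
        ("1234", "event", "http://example.org/1234/evt.fits"),
        ("1234", "image", "http://example.org/1234/img.fits"),
        ("1234", "spectrum", "http://example.org/1234/spec.fits"),
    ],
)


def products_for(connection, obs_id):
    """Return (dataproduct_type, s_ra, access_url) rows for one obs_id."""
    cursor = connection.execute(
        "SELECT dataproduct_type, s_ra, access_url FROM obscore "
        "WHERE obs_id = ? ORDER BY dataproduct_type",
        (obs_id,),
    )
    return cursor.fetchall()


# Correcting the position once in the normalized table is reflected in
# every derived record for that observation.
conn.execute("UPDATE observation SET s_ra = ? WHERE obs_id = ?", (83.633, "1234"))
rows = products_for(conn, "1234")
```

Here `rows` holds one record per data product of observation 1234, and
the shared position is stored only once, so the duplicated metadata in
the flat view cannot drift out of sync with its source.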
> >
> > In addition to a few static images or spectra providing standard views
> > of an observation one would ideally provide SIA, SSA, etc.  services
> > capable of accessing the event data and computing custom virtual data
> > products on the fly.  In the future the proposed data linking facilities
> > would be able to point directly to such services.  At present one would
> > have to do a registry query to find the service and then use the
> > publisher DID from the ObsTAP query to access the desired dataset.
> >
> A few words about the data linking facilities we have in mind
> (presented in Nara and Napoli, for example).
> The obsid can be used as a key (or an entry parameter) to a table or
> service containing or returning links to related data and metadata.
> The basic idea is that it is not just an obsid / acref association (which
> we already have as a byproduct of ObsTAP), but also provides a description
> of the link. What we have is a small DataLink model with a few parameters:
> - the meaning or nature of the association (calibration files, dataset
>   retrieval, a particular X-ray band, etc.)
> - the nature of the link (simple URL, S*AP query or AccessData mode, etc.)
> A small model of the internal path to a given file or subfile in the
> global package is also proposed.
> 
> Thus we should be able to expose Chandra packages as a whole in the
> ObsTAP service, and the various files in the packages can be accessed
> via DataLink. If some of the files have their own dataset type, they can
> be described by an S*AP service (or ObsTAP again) given by the link,
> or accessed via the AccessData method of the relevant S*AP service
> (again with the URL given by the link).
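
To make the shape of such a link record concrete, here is a rough
sketch under stated assumptions. The class `Link`, its field names, the
link-type strings, and the example.org URLs are all invented for
illustration; the actual model is the one to be described in the IVOA
note mentioned below, and may well differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Link:
    """One described link from an observation to related data (hypothetical)."""
    obsid: str          # key shared with the ObsTAP record
    semantics: str      # meaning of the association, e.g. "calibration"
    link_type: str      # how to follow it: "url", "sap_query", "access_data"
    target: str         # URL or service endpoint
    path_in_package: Optional[str] = None  # file or subfile inside a package


# Hypothetical link table for a single packaged observation.
LINKS: List[Link] = [
    Link("1234", "dataset retrieval", "url",
         "http://example.org/1234/package.tar"),
    Link("1234", "dataset retrieval", "url",
         "http://example.org/1234/package.tar", "primary/evt2.fits"),
    Link("1234", "calibration", "url",
         "http://example.org/1234/cal.fits"),
    Link("1234", "spectrum", "sap_query",
         "http://example.org/ssa"),
]


def links_for(obsid: str, semantics: Optional[str] = None) -> List[Link]:
    """Return all links for an obsid, optionally filtered by their meaning."""
    return [
        link for link in LINKS
        if link.obsid == obsid
        and (semantics is None or link.semantics == semantics)
    ]


calibration_links = links_for("1234", "calibration")
```

The point of the sketch is only that the ObsTAP service can expose the
package as a single record, while a separate lookup keyed on the same
identifier says what each related file is and how to reach it.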
> 
> An IVOA note is in preparation on this.
> 
> François
> 
> 
> 
> >> Btw, Use Case 1.6 misquotes MJD as Mean Julian Date. Should be
> >> Modified Julian Day.
> >>
> >> I hope you don't mind these ruminations, but these are things that I
> >> am discovering as we are trying to implement this - and it is hard.
> >
> > Not at all; it is useful to have these discussions in the record for
> > others later as well.
> >
> >     - Doug
> >
> >
> >> Cheers,
> >>
> >>  - Arnold
> >>
> >>
> >> Douglas Tody wrote:
> >>> On Tue, 5 Jul 2011, Arnold Rots wrote:
> >>>
> >>>>> First, the subtype may be used to define what the data object is in
> >>>>> collection or archive specific terms.  For example if the data object
> >>>>> is a tar file containing all the files comprising a ROSAT observation
> >>>>> the data provider can define a subtype for this type of data.  It is
> >>>>> up to the client to understand what the content of the proprietary
> >>>>> data product is, but if they are able to deal with such
> >>>>> instrument-specific data they probably do know what it is.
> >>>>
> >>>> This is precisely the case I was trying to solve: a tarfile containing
> >>>> a mix of data types: images, spectra, event lists.
> >>>> The way I would like to solve it is to allow "package" (or something
> >>>> similar) for the data type and enumerate the data files contained in
> >>>> the tarfile in the data subtype.
> >>>>
> >>>> It still leaves a similar issue for the access format: that would be
> >>>> tar, but it would be nice to be able to enumerate the formats of the
> >>>> files in the tarfile in a similar format subtype - that also would
> >>>> allow one to indicate whether or not the content of the tarfile is
> >>>> gzipped (as opposed to gzipping the tarfile itself).
> >>>>
> >>>> I realize that this constitutes a use of subtypes that is different
> >>>> from the original intent (at least, I think so), but it does seem a
> >>>> useful mechanism.
> >>>
> >>> Arnold - I agree that in principle it would be useful to have this extra
> >>> information.  However we had to argue for quite a while to get support
> >>> for instrumental data at this level included at all.  One *can* expose
> >>> this data with ObsTAP 1.0 as outlined in my earlier email; in particular
> >>> exposing the individual data products separately allows them to be
> >>> described if the data provider wants to do so.  Even exposing only the
> >>> tar/zip/MEF etc.  file works so long as the client recognizes the
> >>> subtype.
> >>>
> >>> To attempt to describe the contents of arbitrarily complex
> >>> instrumental datasets is out of scope for ObsTAP, at least 1.0.  Perhaps
> >>> we can address this issue in the next phase of development where we
> >>> prototype related mechanisms such as data linking.
> >>>
> >>>> However, there is also the reverse problem: what do we do with data
> >>>> products based on multiple observations? Do we allow ObsId to be a
> >>>> list of ObsIds?
> >>>
> >>> This was addressed in the document as I recall.  Complex data products
> >>> which are derived from multiple inputs (e.g.  multiple observations)
> >>> essentially constitute a new "software observation", and a new obs_id
> >>> should be assigned.  To say more about the derivation of a particular
> >>> data product is complex and gets into the general issue of provenance
> >>> which is being addressed separately.  Furthermore obs_id is a database
> >>> key used to uniquely identify specific "observations" (usable as a
> >>> foreign key in other tables for example) hence we cannot turn it into
> >>> a list of obs_ids.
> >>>
> >>>      - Doug
> >>>
> >>
> 
--------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
60 Garden Street, MS 67                              fax:  +1 617 495 7356
Cambridge, MA 02138                             arots at head.cfa.harvard.edu
USA                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------