[ObsCoreRFC] Minutes of the telco Monday June 6
Arnold Rots
arots at head.cfa.harvard.edu
Tue Jul 12 08:29:17 PDT 2011
Douglas Tody wrote:
> On Mon, 11 Jul 2011, Arnold Rots wrote:
>
> > I hear what you are saying (I think), but in retrospect it would have
> > been much cleaner (and clearer) if the data discovery and data access
> > roles had not been combined in ObsTAP.
> > If ObsTAP had purely provided information on the availability of data,
> > observational metadata, the data product types, the data formats, and
> > the packaging, a comprehensive data access tool could have provided
> > the access URLs in response to a query that is based on the
> > information provided through ObsTAP.
> > As it is, the data discovery is somewhat compromised by the data
> > access features and the data access features themselves are not
> > very satisfactory.
> > I am afraid that this is a bit of a missed opportunity.
>
> Well then we would lose the ability to describe and point to non-VO data
> like Chandra or ALMA observations, for which we have no VO data
> services. For images/spectra etc. it could work to not have any acref
> in the ObsTAP metadata, but the current scheme already makes it possible
> to ignore the acref returned by ObsTAP and go directly to a SIA or
> whatever service if such is provided. Even so it could be useful and
> convenient to get a reference or preview image back from the ObsTAP
> acref without having to go to the full-up data service.
>
> What probably would be the best approach for Chandra (or ALMA etc.)
> would be to have the level 0 or 1 observational data plus some standard
> data products such as reference images etc., all described in the ObsTAP
> QR with a shared obs_id. Then also provide a data link service such as
> Francois describes to fully resolve all the data products or other
> resources available for the observation.
>
> In a data discovery portal one would then be able to do discovery and
> preview the data, but then optionally retrieve and examine all the data
> links for a given observation, and possibly do full-up data access,
> invoke a pipeline reprocessing job, examine auxiliary information like
> logs or proposal cover pages, etc.
>
> - Doug
>
We wouldn't lose anything.
If ObsTAP just returned observational parameters and a list of
available products (that's proper data discovery), the data access
protocol would allow users to get a list of access URLs for specific data
products and specific ObsIds, with information on file formats, etc.
That would be a proper separation of data discovery and data access
functions and it would in no way make us lose any capabilities.
Preview images could be included in the table returned by the access
service.
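A minimal Python sketch of the split proposed here, under stated assumptions: a discovery call returns one record per observation with a list of available product types and no URLs, and a separate access call maps the chosen obs_ids and product types to access URLs and formats. The field names obs_id, t_min, t_max, access_url and access_format are borrowed from ObsCore; the two record types, the product_types list, the function names and the example.org URLs are hypothetical illustrations, not part of any IVOA standard.

```python
from dataclasses import dataclass, field

# Discovery side: one record per observation, no access URLs.
# obs_id, t_min and t_max borrow ObsCore column names; the
# product_types list is the hypothetical part of the proposal.
@dataclass
class DiscoveryRecord:
    obs_id: str
    t_min: float  # observation start (MJD)
    t_max: float  # observation stop (MJD)
    product_types: list = field(default_factory=list)

# Access side: one row per retrievable file, keyed by obs_id.
@dataclass
class AccessRecord:
    obs_id: str
    product_type: str
    access_url: str
    access_format: str

def discover(records, t_lo, t_hi):
    """Observations whose time range overlaps [t_lo, t_hi]."""
    return [r for r in records if r.t_min <= t_hi and r.t_max >= t_lo]

def access(catalog, obs_ids, product_types):
    """Access rows for the chosen observations and product types."""
    ids, types = set(obs_ids), set(product_types)
    return [a for a in catalog if a.obs_id in ids and a.product_type in types]

# Hypothetical example data (example.org URLs are placeholders).
RECORDS = [
    DiscoveryRecord("obs-A", 55000.2, 55000.6, ["event", "image", "spectrum", "preview"]),
    DiscoveryRecord("obs-B", 55010.0, 55010.4, ["event", "image"]),
]
CATALOG = [
    AccessRecord("obs-A", "event", "https://example.org/obs-A/evt2.fits", "application/fits"),
    AccessRecord("obs-A", "spectrum", "https://example.org/obs-A/pha2.fits", "application/fits"),
    AccessRecord("obs-A", "preview", "https://example.org/obs-A/preview.png", "image/png"),
    AccessRecord("obs-B", "image", "https://example.org/obs-B/img.fits", "application/fits"),
]

if __name__ == "__main__":
    for rec in discover(RECORDS, 55000.0, 55001.0):
        print(rec.obs_id, rec.product_types)
        for row in access(CATALOG, [rec.obs_id], ["spectrum", "preview"]):
            print("  ", row.product_type, row.access_url)
```

In this sketch a preview image is simply another product type in the access table, as suggested in the reply.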
Cheers,
- Arnold
>
> > Cheers,
> >
> > - arnold
> >
> > François Bonnarel wrote:
> >> Arnold, Doug,
> >> On 06/07/2011 17:54, Douglas Tody wrote:
> >>> On Wed, 6 Jul 2011, Arnold Rots wrote:
> >>>
> >>>> I think I am beginning to realize what it is that makes me so
> >>>> uncomfortable with ObsTAP and what makes it so hard to grasp the
> >>>> correct way to implement it: its ambivalence.
> >>>>
> >>>> It is primarily intended (I think) as a data discovery interface.
> >>>> The problem is that it also doubles as a data access tool.
> >>>> I think it is the intertwining of these two functions that makes it
> >>>> murky.
> >>>> And I wish these two functions had been separated into distinct
> >>>> interfaces.
> >>>> I know this is not an issue for some observatories (say, the ones that
> >>>> only produce simple 2-D images), but it makes life difficult for more
> >>>> complicated datasets.
> >>>>
> >>>> As a data discovery tool, I would have expected its purpose to be:
> >>>> - find available observations that fall within certain constraints in
> >>>> time, space, frequency, etc.
> >>>> - tell me what kind of data products are available for each
> >>>>
> >>>> For a data access tool:
> >>>> - Give me the URL to a specific (set of) type(s) of data product for a
> >>>> specific (set of) observation(s)
> >>>> For all I know, this role could be played by SIAP, SSAP, SCS, or
> >>>> whatever protocols are already in existence.
> >>>
> >>> ObsTAP is intended mainly to provide uniform global data discovery; it
> >>> can find any type of data, even non-VO data formats. The data access
> >>> capabilities provided at this level are very limited, but can be used to
> >>> retrieve static archive data files (the data product could actually be
> >>> generated on the fly if desired, but the description at least is
> >>> static).
> >>>
> >>> As you suggest, the idea is that for any non-trivial data access the
> >>> typed interfaces would be used (SIA, SSA, etc.). So for example one
> >>> could do global data discovery using ObsTAP and then follow up with one
> >>> of the typed interfaces to get more complete object-specific metadata
> >>> and do the actual data access, which for a typed/OO interface will often
> >>> involve virtual data generation (subsetting, filtering, transforming,
> >>> output format specification, etc.). Of course if just retrieving the
> >>> static archive file is enough then that can be done with just the acref
> >>> returned by ObsTAP.
> >>>
> >>>> The trouble is that for Chandra data, the intertwining of the two
> >>>> functions requires us to duplicate each ObsCore record six times to
> >>>> enumerate, laboriously, the different data types we can provide.
> >>>> When it comes to proper data discovery, it makes much more sense to
> >>>> return a single record with the ObsCore parameters and a list of
> >>>> available data product types (event lists, images, light curves,
> >>>> spectra, tarfiles with all of the above, etc.).
> >>>
> >>> True, but this is necessary to be consistent with the relational model
> >>> and to provide a simple mechanism. For a Chandra observation one might
> >>> return a set of records with the same obs_id, one being a tar.gz of the
> >>> full instrumental dataset, the others being static images, spectra, etc.
> >>> derived from that data. A query for a specific obs_id would thus
> >>> describe all the data products available for the observation. As you
> >>> note it is necessary to duplicate some of the metadata in associated
> >>> records, but much of the metadata will differ for each data product as
> >>> well.
> >>>
> >>> So far as the archive goes one would probably want to autogenerate the
> >>> ObsTAP table from more fundamental, fully normalized database tables.
> >>> Any updates would be done only on the underlying tables (auto-updating
> >>> the ObsTAP "view" after each such update). Then there should be no
> >>> problem with the redundant metadata in the ObsTAP index table becoming
> >>> inconsistent or whatever.
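The idea that the ObsTAP table can be generated from fully normalized tables can be illustrated with a small SQLite sketch. The schema is hypothetical (only the column names follow ObsCore), and an ordinary SQL view stands in for whatever regeneration mechanism an archive would actually use; a real archive might instead rebuild a materialized table after each update. One observation has three products sharing an obs_id, so a query on that obs_id lists every product, and updating the single observation row changes the repeated metadata in every ObsTAP row at once.

```python
import sqlite3

# Hypothetical normalized schema: observation metadata stored once,
# one row per data product. Column names follow ObsCore where possible.
SCHEMA = """
CREATE TABLE observation (
    obs_id      TEXT PRIMARY KEY,
    target_name TEXT,
    t_min       REAL,
    t_max       REAL
);
CREATE TABLE product (
    obs_id           TEXT REFERENCES observation(obs_id),
    dataproduct_type TEXT,
    access_url       TEXT,
    access_format    TEXT
);
-- ObsTAP-style index: one row per product, observation columns repeated.
-- Because it is a view, the repeated values are always derived from the
-- single row in observation and cannot become inconsistent.
CREATE VIEW obscore AS
SELECT o.obs_id, o.target_name, o.t_min, o.t_max,
       p.dataproduct_type, p.access_url, p.access_format
FROM observation AS o JOIN product AS p ON p.obs_id = o.obs_id;
"""

def build_archive():
    """Create an in-memory archive with one observation and three products."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO observation VALUES (?, ?, ?, ?)",
                 ("obs-A", "placeholder target", 55000.2, 55000.6))
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?, ?)",
        [
            ("obs-A", "event", "https://example.org/obs-A/evt2.fits", "application/fits"),
            ("obs-A", "image", "https://example.org/obs-A/img.fits", "application/fits"),
            ("obs-A", "spectrum", "https://example.org/obs-A/pha2.fits", "application/fits"),
        ],
    )
    conn.commit()
    return conn

def products_for(conn, obs_id):
    """All ObsTAP rows sharing one obs_id, i.e. every product of that observation."""
    return conn.execute(
        "SELECT dataproduct_type, target_name FROM obscore "
        "WHERE obs_id = ? ORDER BY dataproduct_type",
        (obs_id,),
    ).fetchall()

if __name__ == "__main__":
    conn = build_archive()
    print(products_for(conn, "obs-A"))
    # Update only the underlying table; the view reflects it in every row.
    conn.execute("UPDATE observation SET target_name = ? WHERE obs_id = ?",
                 ("renamed target", "obs-A"))
    print(products_for(conn, "obs-A"))
```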
> >>>
> >>> In addition to a few static images or spectra providing standard views
> >>> of an observation one would ideally provide SIA, SSA, etc. services
> >>> capable of accessing the event data and computing custom virtual data
> >>> products on the fly. In the future the proposed data linking facilities
> >>> would be able to point directly to such services. At present one would
> >>> have to do a registry query to find the service and then use the
> >>> publisher DID from the ObsTAP query to access the desired dataset.
> >>>
> >> A few words about the data linking facilities we have in mind (see the
> >> presentations in Nara and Napoli, for example).
> >> The obs_id can be used as a key (or an input parameter) to a table or
> >> service containing or returning links to related data and metadata.
> >> The basic idea is that it is not just an obs_id/acref association (which
> >> we already have as a byproduct of ObsTAP); it also provides a description
> >> of the link. What we have is a small DataLink model with a few parameters:
> >> - the association meaning or nature (calibration files, dataset
> >>   retrieval, X-ray band, etc.)
> >> - the nature of the link (simple URL, S*AP query or AccessData mode, etc.)
> >> A small model of the internal path to a given file or subfile in the
> >> global package is also proposed.
> >>
> >> Thus we should be able to expose Chandra packages as a whole in the
> >> ObsTAP service, and the various files in the packages can be accessed
> >> via DataLink. If some of the files have their own dataset type, they can
> >> be described by an S*AP service (or ObsTAP again) given by the link,
> >> or accessed via the AccessData method of the relevant S*AP service
> >> (again with the URL given by the link).
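A rough Python rendering of the DataLink model François outlines: each link is keyed by obs_id and carries the association meaning, the nature of the link (simple URL, S*AP query or AccessData), and optionally the internal path of a file within the package. All class, field and function names here are hypothetical, since the note describing the actual model was still in preparation; the URLs and file paths are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class LinkKind(Enum):
    """How the link is to be followed (the 'nature of the link')."""
    URL = "simple URL"
    SAP_QUERY = "S*AP query"
    ACCESS_DATA = "AccessData"

@dataclass(frozen=True)
class DataLink:
    obs_id: str                          # key shared with the ObsTAP record
    meaning: str                         # association meaning, e.g. "calibration"
    kind: LinkKind
    href: str
    internal_path: Optional[str] = None  # file or subfile inside the package

def links_for(table, obs_id, meaning=None):
    """Links for one observation, optionally filtered by association meaning."""
    return [
        link for link in table
        if link.obs_id == obs_id and (meaning is None or link.meaning == meaning)
    ]

# Hypothetical link table for one packaged observation (placeholder URLs).
LINKS = [
    DataLink("obs-A", "dataset retrieval", LinkKind.URL,
             "https://example.org/obs-A/package.tar"),
    DataLink("obs-A", "calibration", LinkKind.URL,
             "https://example.org/obs-A/package.tar",
             internal_path="secondary/calib_file.fits"),
    DataLink("obs-A", "spectrum", LinkKind.SAP_QUERY,
             "https://example.org/ssa?obs_id=obs-A"),
]

if __name__ == "__main__":
    for link in links_for(LINKS, "obs-A"):
        print(link.meaning, link.kind.value, link.href, link.internal_path or "")
```

The package itself stays a single ObsTAP entry; the link table is what exposes its individual files and any typed services.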
> >>
> >> An IVOA note is in preparation on this.
> >>
> >> François
> >>
> >>
> >>
> >>>> Btw, Use Case 1.6 misquotes MJD as Mean Julian Date. Should be
> >>>> Modified Julian Day.
> >>>>
> >>>> I hope you don't mind these ruminations, but these are things that I
> >>>> am discovering as we are trying to implement this - and it is hard.
> >>>
> >>> Not at all; it is useful to have these discussions in the record for
> >>> others later as well.
> >>>
> >>> - Doug
> >>>
> >>>
> >>>> Cheers,
> >>>>
> >>>> - Arnold
> >>>>
> >>>>
> >>>> Douglas Tody wrote:
> >>>>> On Tue, 5 Jul 2011, Arnold Rots wrote:
> >>>>>
> >>>>>>> First, the subtype may be used to define what the data object is in
> >>>>>>> collection or archive specific terms. For example if the data object
> >>>>>>> is a tar file containing all the files comprising a ROSAT observation
> >>>>>>> the data provider can define a subtype for this type of data. It is up
> >>>>>>> to the client to understand what the content of the proprietary data
> >>>>>>> product is, but if they are able to deal with such instrument-specific
> >>>>>>> data they probably do know what it is.
> >>>>>>
> >>>>>> This is precisely the case I was trying to solve: a tarfile containing
> >>>>>> a mix of data types: images, spectra, event lists.
> >>>>>> The way I would like to solve it is to allow "package" (or something
> >>>>>> similar) for the data type and enumerate the data files contained in
> >>>>>> the tarfile in the data subtype.
> >>>>>>
> >>>>>> It still leaves a similar issue for the access format: that would be
> >>>>>> tar, but it would be nice to be able to enumerate the formats of the
> >>>>>> files in the tarfile in a similar format subtype - that also would
> >>>>>> allow one to indicate whether or not the content of the tarfile is
> >>>>>> gzipped (as opposed to gzipping the tarfile itself).
> >>>>>>
> >>>>>> I realize that this constitutes a use of subtypes that is different
> >>>>>> from the original intent (at least, I think so), but it does seem a
> >>>>>> useful mechanism.
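One way to picture the "package" proposal is a record whose subtype enumerates the data types inside the tarfile and whose format subtype notes whether the members are individually gzipped. The Python sketch below derives such a record by inspecting a tar archive with the standard tarfile module. The "package" value, the format_subtype field and the file-suffix convention are hypothetical (ObsTAP 1.0 defines none of them); dataproduct_type, dataproduct_subtype and access_format are ObsCore column names.

```python
import io
import tarfile

# Hypothetical file-naming convention mapping member names to data types.
SUFFIX_TYPES = {
    "_evt2.fits": "event",
    "_img.fits": "image",
    "_pha2.fits": "spectrum",
}

def describe_package(tar_bytes):
    """Build a hypothetical 'package' record from an uncompressed tar archive."""
    types = set()
    gzipped_members = False
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tf:
        for member in tf.getmembers():
            name = member.name
            if name.endswith(".gz"):
                # Members compressed individually, not the tarfile itself.
                gzipped_members = True
                name = name[: -len(".gz")]
            for suffix, dtype in SUFFIX_TYPES.items():
                if name.endswith(suffix):
                    types.add(dtype)
    return {
        "dataproduct_type": "package",
        "dataproduct_subtype": ",".join(sorted(types)),
        "access_format": "application/x-tar",
        # Assumes FITS members; reports fits.gz if any member is gzipped.
        "format_subtype": "fits.gz" if gzipped_members else "fits",
    }

def make_tar(names):
    """Create an in-memory tar archive with small placeholder members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name in names:
            data = b"placeholder"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()

if __name__ == "__main__":
    archive = make_tar(["obs_evt2.fits.gz", "obs_img.fits.gz", "obs_pha2.fits.gz", "README"])
    print(describe_package(archive))
```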
> >>>>>
> >>>>> Arnold - I agree that in principle it would be useful to have this
> >>>>> extra information. However we had to argue for quite a while to get
> >>>>> support for instrumental data at this level included at all. One *can*
> >>>>> expose this data with ObsTAP 1.0 as outlined in my earlier email; in
> >>>>> particular exposing the individual data products separately allows
> >>>>> them to be described if the data provider wants to do so. Even
> >>>>> exposing only the tar/zip/MEF etc. file works so long as the client
> >>>>> recognizes the subtype.
> >>>>>
> >>>>> To attempt to describe the contents of arbitrary complex instrumental
> >>>>> datasets is out of scope for ObsTAP, at least 1.0. Perhaps we can
> >>>>> address this issue in the next phase of development where we
> >>>>> prototype related mechanisms such as data linking.
> >>>>>
> >>>>>> However, there is also the reverse problem: what do we do with data
> >>>>>> products based on multiple observations? Do we allow ObsId to be a
> >>>>>> list of ObsIds?
> >>>>>
> >>>>> This was addressed in the document as I recall. Complex data products
> >>>>> derived from multiple inputs (e.g. multiple observations) essentially
> >>>>> constitute a new "software observation", and a new obs_id should be
> >>>>> assigned. To say more about the derivation of a particular data
> >>>>> product is complex and gets into the general issue of provenance,
> >>>>> which is being addressed separately. Furthermore obs_id is a database
> >>>>> key used to uniquely identify specific "observations" (usable as a
> >>>>> foreign key in other tables, for example), hence we cannot turn it
> >>>>> into a list of obs_ids.
> >>>>>
> >>>>> - Doug
> >>>>>
> >>>> --------------------------------------------------------------------------
> >>>> Arnold H. Rots Chandra X-ray Science Center
> >>>> Smithsonian Astrophysical Observatory tel: +1 617 496 7701
> >>>> 60 Garden Street, MS 67 fax: +1 617 495 7356
> >>>> Cambridge, MA 02138 arots at head.cfa.harvard.edu
> >>>> USA http://hea-www.harvard.edu/~arots/
> >>>> --------------------------------------------------------------------------
> >>>>
> >>>>
> >>
> >
>