[obs-tap]:updates on the Proposed recommendation

François Bonnarel francois.bonnarel at astro.unistra.fr
Tue Aug 2 05:18:15 PDT 2011

Hi all,
Basically I agree with Doug here. And one more word:
since we need at least a year or so to develop DataLink,
it's useful to have something for access in the meantime.
Best regards
On 02/08/2011 00:12, Douglas Tody wrote:
> Arnold -
> ObsTAP alone is sufficient for the simpler use cases; it is similar to
> what a classical archive provides in providing direct discovery and
> download of static archive data products.
> Data linking (also association for modeling complex data) will be a
> powerful advanced capability.  However it is not required for the more
> basic use cases, is optional, and will still be being prototyped after
> we get the basic ObsTAP indexing in place.
> If you really want to expose only your instrumental observations and
> rely upon data linking for all access, you can (it is possible within the
> interface design); however, it would be preferable, and more consistent
> with usage at other sites, if you would expose both the observations and
> major data products, using data linking only to support the more
> advanced queries.  This is what we plan to do here for example.
> Yes there is some duplication of metadata in the query response for
> related products, but that considerably simplifies the interface and as
> noted earlier this metadata can easily be autogenerated from the actual
> fully normalized and probably non-standardized base tables.  If a client
> just wants to work off the observation index plus data links that usage
> mode will still work fine.  They can simply restrict the query to a
> single subtype and follow the links.
> This could be a good approach for dealing with complex instrumental data
> such as for Chandra or a radio instrument, however if a user is just
> looking for smallish images with calib_level>=2 and a minimum spatial
> resolution of 2 arcsec (or whatever) it can be done in one step with the
> basic ObsTAP interface - but globally for the whole VO.  A primary
> characteristic of a good design is that simple things can be done
> simply, but the design can also handle more complex use cases without
> conflicting with the basic model.
>     - Doug
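The one-step query Doug describes could be sketched roughly as follows. This is only an illustration: it assumes the ObsCore column names dataproduct_type, calib_level, s_resolution (arcsec) and s_fov (degrees) and the table name ivoa.ObsCore, it reads "minimum spatial resolution of 2 arcsec" as s_resolution <= 2, and the field-of-view cut standing in for "smallish" is a made-up threshold. The helper name build_obstap_query is hypothetical.

```python
def build_obstap_query(min_calib_level=2, max_resolution_arcsec=2.0, max_fov_deg=0.5):
    """Return an ADQL string selecting small, calibrated, well-resolved images.

    Column and table names are assumed from the ObsCore data model;
    max_fov_deg is an illustrative stand-in for "smallish".
    """
    return (
        "SELECT obs_publisher_did, access_url, access_format, s_resolution "
        "FROM ivoa.ObsCore "
        "WHERE dataproduct_type = 'image' "
        f"AND calib_level >= {int(min_calib_level)} "
        f"AND s_resolution <= {float(max_resolution_arcsec)} "
        f"AND s_fov <= {float(max_fov_deg)}"
    )


# Build the query text; submitting it to a TAP service is left out on purpose.
query = build_obstap_query()
print(query)
```

The same query text could in principle be sent unchanged to any ObsTAP service, which is the "globally for the whole VO" point made above.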
> On Mon, 1 Aug 2011, Arnold Rots wrote:
>> See also my response to Francois.
>> I would argue that it's better to give up that ability, since it
>> yields a cleaner data discovery protocol, that will therefore be more
>> likely to survive future developments.
>> Besides, if I understand your usage correctly, it will require
>> separate records for the "reference" products and for the more
>> involved ones.
>> Cheers,
>>  - Arnold
>> Douglas Tody wrote:
>>> Arnold -
>>> As we see ObsTAP with association and data linking (which has long been
>>> the plan) is capable of doing what you want, i.e., describe only the
>>> observations and point to related data products or access services via
>>> data linking.
>>> What you would give up with this approach however is the ability to
>>> directly expose associated high level data products such as reference
>>> images or spectra via ObsTAP so that they can be accessed directly
>>> without having to follow data links or invoke additional services.
>>> As noted in earlier email a hybrid approach is possible, describing the
>>> observation and overall packaged dataset with data links pointing to the
>>> full list of individual data products or access services, as well as
>>> selected high level data products such as precomputed reference images.
>>>      - Doug
>>> On Mon, 1 Aug 2011, Arnold Rots wrote:
>>>> Francois,
>>>> Nothing is going on underground.
>>>> I have shared our experiences in implementing ObsTAP with some local
>>>> members of the TCG. I made it clear that the PR can be implemented,
>>>> but that there are problems.
>>>> But if I understand your argument below, we are in full agreement.
>>>> You want to separate Data Link from Data Discovery and that is
>>>> precisely what I was arguing; my complaint was that there are Data
>>>> Link elements in the ObsTAP Data Discovery protocol that are causing a
>>>> problem.
>>>> Specifically: the access_* elements belong in Data Link, not in Data
>>>> Discovery, and with them removed the data types available can be
>>>> enumerated in a single record.
>>>> So, the example I gave (the responses to a Data Discovery query and a
>>>> Data Linking query) are in full agreement with what you are
>>>> advocating, as far as I can tell.
>>>> Is there still an issue, then?
>>>> Cheers,
>>>>  - Arnold
>>>> Francois Bonnarel wrote:
>>>>> Hi Arnold, all dm people,
>>>>> Let me go back to this because, apparently, this discussion is going on
>>>>> underground.
>>>>> First, let's come back to the very beginning of the ObsTAP effort...
>>>>> There was a strong commitment from the committee to build something fast,
>>>>> reusing the TAP protocol and the Observation/Characterisation data model
>>>>> for data discovery, covering most of the needs...
>>>>> From the very beginning it was also obvious that data links
>>>>> and virtual data access could not and would not be covered by ObsTAP.
>>>>> The DataLink method or service concept has been around in various DAL
>>>>> notes for years now. For my part, I made presentations at the last three
>>>>> Interop meetings (Victoria, Nara and Napoli; see e.g. the latter:
>>>>> http://www.ivoa.net/internal/IVOA/DAL-InteropMay2011/DataLink.pdf )
>>>>> This concept is there because you cannot imagine providing both Data
>>>>> Discovery and complex linkage features (or linkage for complex data
>>>>> structures) in one step and in a SINGLE table (a single table being
>>>>> required by the TAP-ADQL protocol, as all may remember).
>>>>> So ObsTAP is there for Data Discovery... The only thing you can imagine
>>>>> to provide access to the various datasets in an observation is to
>>>>> duplicate the observation rows until you reach full discovery of all
>>>>> observation-related products, as was already explained... This is
>>>>> verbose, but it works. So now, how can DataLink work in the future? See
>>>>> below on your use case...
>>>>> DataLink is now in the roadmap of the DAL working group, and an IVOA
>>>>> note is in preparation as a very first drafting effort for this new
>>>>> "protocol"... The note will be available within 3 weeks or so.
>>>>> Arnold Rots wrote:
>>>>>> This is becoming unwieldy.
>>>>>> Trying to make X-ray data (and I suspect the same is true for aperture
>>>>>> synthesis data) fit into something that is designed with optical
>>>>>> images in mind is reminiscent of round pegs and square holes.
>>>>>> Service providers are free to define subtypes and titles, but you are
>>>>>> saying that if they don't follow rules that are not spelled out,
>>>>>> things won't work as envisaged.
>>>>>> Also, if I understand the argument correctly, if data discovery
>>>>>> software is to be helpful at all, it needs to be able to extract some
>>>>>> information from the title field - but that is intended for human
>>>>>> consumption.
>>>>>> If I see this, it looks like I need to generate at least eight records
>>>>>> for a single observation, some containing a mix of levels, and all
>>>>>> duplicating pretty much the same metadata.
>>>>>> This is not going to make it attractive to provide ObsTAP services.
>>>>>> Maybe I should do what you did and provide an example of how I thought
>>>>>> it should have worked.
>>>>>> Here is how I would envisage data discovery of Chandra data to work:
>>>>>>   A single record per Obsid that provides the observational metadata and:
>>>>>>     ObsId
>>>>>>       12345
>>>>>>     Dataset Identifier
>>>>>>       ivo://ADS/Sa.CXO#obs/12345
>>>>>>     Data Types available
>>>>>>       Package
>>>>>>       Event list
>>>>>>       Image
>>>>>>     Calibration level
>>>>>>       2
>>>>>>     Title
>>>>>>       Chandra/ACIS ObsId 12345
>>>>> DataLink is a method or a service that retrieves a table
>>>>> describing links between observations, identified by their obsid, and any
>>>>> kind of data retrieval... An obsid known from an ObsTAP discovery
>>>>> phase can of course be used directly to interrogate such a service.
>>>>> (By the way, if the ObsTAP service is a TAP-PQL service, the
>>>>> DataLink table could be attached to the main ObsTAP table in the same
>>>>> query response, because the single-table requirement no longer applies
>>>>> in that case.)
>>>>> But it is a qualified link, which means that the semantics or type of the
>>>>> link is given in one field of the table, while the nature of the access
>>>>> is given in another field: this can tell us whether it is a simple
>>>>> retrieval, an SIA query service, an SSA AccessData method, etc.
>>>>> So in your use case we will get three different links for the same
>>>>> observation (obsid). The types (or semantics) will be package, event list
>>>>> and image, and the access nature could be, respectively, retrieval,
>>>>> retrieval and SIA query (for example).
>>>>> In addition, the "Access" package (the group of access fields in the
>>>>> table) is proposed to be extended beyond the traditional "reference" and
>>>>> "format" to describe which part of a complex "file" is to be retrieved
>>>>> (path in a directory/tar file, extension in a MEF file, table name in a
>>>>> VOTable, etc.). A proposal for such an extended access package is
>>>>> described in the Characterisation 2 draft at the moment...
>>>>> Best regards
>>>>> François
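The qualified-link idea in François's message can be pictured with a small sketch. It is not the DataLink design itself (the note was still being drafted at the time); the class name Link, the field names semantics and access_nature, and the helper functions are all hypothetical, chosen only to mirror the "type of the link in one field, nature of the access in another" description and the three links given for the Chandra use case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    obs_id: str         # observation identifier known from the discovery phase
    semantics: str      # type of the link (hypothetical field name)
    access_nature: str  # how the target is reached (hypothetical field name)


# The three links from the use case in the message: same obsid, three semantics.
LINKS = [
    Link("12345", "package", "retrieval"),
    Link("12345", "event list", "retrieval"),
    Link("12345", "image", "SIA query"),
]


def links_for(obs_id):
    """Return every link attached to one observation."""
    return [link for link in LINKS if link.obs_id == obs_id]


def needs_service_call(link):
    """True when the link points at a service rather than a plain download."""
    return link.access_nature != "retrieval"


for link in links_for("12345"):
    action = "invoke service" if needs_service_call(link) else "download"
    print(f"{link.semantics}: {action}")
```

Keeping the two fields separate lets a client filter on what a link points to independently of how it must be fetched.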
>>>>>> Then a data access protocol that allows querying the archive using any
>>>>>> of the above in a where clause, with either ObsId or DID required, and
>>>>>> returning:
>>>>>>   ObsId  DataType   Contents   Level   Format      URL
>>>>>>   -----------------------------------------------------------
>>>>>>   12345  Pkg_1      evt,img    2       tar         http://...
>>>>>>   12345  Pkg_2      evt,img    1       tar         http://...
>>>>>>   12345  Pkg_12     evt,img    2,1     tar         http://...
>>>>>>   12345  evt        evt        2       fits-bin    http://...
>>>>>>   12345  evt        evt        1       fits-bin    http://...
>>>>>>   12345  img        img        2       fits        http://...
>>>>>>   12345  img        img        2       jpg         http://...
>>>>>>   12345  img        img        2       fits        http://...
>>>>>>   12345  img        img        2       jpg         http://...
>>>>>> This is an example where the client specified ObsId or DID, but no
>>>>>> data type or format.
>>>>>> Never mind the terms and abbreviations I used - you get the picture.
>>>>>> Cheers,
>>>>>>   - Arnold
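Arnold's second step, a data access query keyed on ObsId with optional filters, might look like the sketch below. The rows are copied from his example table, including the elided URLs, which are left as "http://..."; the names AccessRow and select_rows are hypothetical and no real service interface is implied.

```python
from typing import NamedTuple


class AccessRow(NamedTuple):
    obs_id: str
    data_type: str
    contents: str
    level: str
    fmt: str
    url: str  # elided in the original example, kept as-is


# Rows transcribed from the example response in the message.
ROWS = [
    AccessRow("12345", "Pkg_1", "evt,img", "2", "tar", "http://..."),
    AccessRow("12345", "Pkg_2", "evt,img", "1", "tar", "http://..."),
    AccessRow("12345", "Pkg_12", "evt,img", "2,1", "tar", "http://..."),
    AccessRow("12345", "evt", "evt", "2", "fits-bin", "http://..."),
    AccessRow("12345", "evt", "evt", "1", "fits-bin", "http://..."),
    AccessRow("12345", "img", "img", "2", "fits", "http://..."),
    AccessRow("12345", "img", "img", "2", "jpg", "http://..."),
    AccessRow("12345", "img", "img", "2", "fits", "http://..."),
    AccessRow("12345", "img", "img", "2", "jpg", "http://..."),
]


def select_rows(obs_id, data_type=None, level=None, fmt=None):
    """Mimic the where clause: obs_id is required, other filters are optional."""
    return [
        row for row in ROWS
        if row.obs_id == obs_id
        and (data_type is None or row.data_type == data_type)
        and (level is None or row.level == level)
        and (fmt is None or row.fmt == fmt)
    ]


# No data type or format given: every row for the observation comes back.
print(len(select_rows("12345")))
# Narrowing to JPEG images only.
print(len(select_rows("12345", data_type="img", fmt="jpg")))
```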
>>>>>> Douglas Tody wrote:
>>>>>>> More precisely what you might have is something like (display in a wide view):
>>>>>>>   ObsId  Type   Subtype               Level  Format                  Title
>>>>>>>   ------------------------------------------------------------------------------------------------------------------
>>>>>>>   123    event  chandra.hrc.pkg       1      application/x-tar-gzip  Chandra ACS-XYZ observation package (event,refimage)
>>>>>>>   123    image  chandra.hrc.refimage  2      image/fits              Chandra ACS-XYZ reference image
>>>>>>>   123    image  chandra.hrc.preview   2      image/jpeg              Chandra ACS-XYZ preview image
>>>>>>>   345    event  rosat.foo.pkg         1      application/x-tar-gzip  ROSAT whatever observation package (xxx)
>>>>>>> and so forth.  The subtype could in principle be more generic but will
>>>>>>> likely be instrument-specific for a level 1 observation.
>>>>>>> The Title should concisely describe the data product, e.g., origin,
>>>>>>> instrument, ID, what it is (observation package, calibration, standard
>>>>>>> view, etc.).  The title string is what one normally wants to output on a
>>>>>>> displayed image or plot to identify to a human the data being shown.
>>>>>>> You can put whatever you want in there to describe the data product so
>>>>>>> long as it is concise (one line of text).
>>>>>>>          - Doug
>>>>>>> On Mon, 11 Jul 2011, Douglas Tody wrote:
>>>>>>>> On Thu, 7 Jul 2011, Arnold Rots wrote:
>>>>>>>>> Aside from what I reported in a previous message, quoted below, there
>>>>>>>>> are more discrepancies between Table 5 and Tables 6 and 7:
>>>>>>>>> obs_creator_did is missing from Table 7
>>>>>>>>> o_units in Table 5 should be o_unit
>>>>>>>>> pol_states is missing from Table 6
>>>>>>>>> facility_name and instrument_name are spelled differently;
>>>>>>>>>  even though required, they show up in Table 7, rather than 6
>>>>>>>>> em_unit is missing from Table 5
>>>>>>>>> o_stat_error is missing from Table 7
>>>>>>>>> Also, note the comment I made on MJD in use case 1.6
>>>>>>>>> and on the uselessness of bib_reference because of its murky
>>>>>>>>> definition.
>>>>>>>>> I still lament the fact that the data access functionality is
>>>>>>>>> compromising the self-consistency and usefulness of the data discovery
>>>>>>>>> function, but decided for our tarred packages to use:
>>>>>>>>>  dataproduct_type = NULL
>>>>>>>>>  dataproduct_subtype = package:event,image
>>>>>>>>>  access_format = application/x-tar
>>>>>>>>> As far as I can tell, this is within the specifications.
>>>>>>>> Well we don't specify what the subtypes you provide for your archive
>>>>>>>> should be so I suppose you could get away with this, but this example is
>>>>>>>> not at all what we had in mind.  The subtype should be the science type
>>>>>>>> of the specific data product, *not* details about the content of the
>>>>>>>> data product.  I would expect the type to be "event" (meaning "event
>>>>>>>> data" not "event list") and the subtype to be something more like
>>>>>>>> "chandra.hrc.package", "chandra.hrc.refimage" (or "rosat.XX" etc.).
>>>>>>>> Note subtypes are supposed to be fixed strings so that one can search
>>>>>>>> the local archive for a particular type of data product; if you try to
>>>>>>>> describe what is included in a particular data product then such
>>>>>>>> selection won't be possible.  So for example a client will do a generic
>>>>>>>> query to see what subtypes Chandra defines, and then they can pose a
>>>>>>>> more specific query to get a certain type of Chandra-specific data
>>>>>>>> product.  Likewise for ALMA etc.
>>>>>>>> Note you also have obs.title where you can provide a short description
>>>>>>>> of the data product and for this you can provide whatever you want.
>>>>>>>>       - Doug
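The two-step client pattern Doug outlines (a generic query to learn which subtypes an archive defines, then a specific query on one fixed subtype string) can be sketched as below. The records reuse the subtype strings from his later example table; the dictionary keys and the function names list_subtypes and select_by_subtype are hypothetical and stand in for real ObsTAP queries.

```python
# Illustrative records; subtype strings are taken from the example table in the thread.
RECORDS = [
    {"obs_id": "123", "type": "event", "subtype": "chandra.hrc.pkg", "level": 1},
    {"obs_id": "123", "type": "image", "subtype": "chandra.hrc.refimage", "level": 2},
    {"obs_id": "123", "type": "image", "subtype": "chandra.hrc.preview", "level": 2},
    {"obs_id": "345", "type": "event", "subtype": "rosat.foo.pkg", "level": 1},
]


def list_subtypes(records, prefix):
    """Step 1: discover the distinct subtypes an archive defines."""
    return sorted({r["subtype"] for r in records if r["subtype"].startswith(prefix)})


def select_by_subtype(records, subtype):
    """Step 2: exact match on a fixed subtype string."""
    return [r for r in records if r["subtype"] == subtype]


chandra_subtypes = list_subtypes(RECORDS, "chandra.")
print(chandra_subtypes)
refimages = select_by_subtype(RECORDS, "chandra.hrc.refimage")
print([r["obs_id"] for r in refimages])
```

The exact-match step only works because the subtype is a fixed string; a free-form value such as "package:event,image" would have to be parsed rather than compared.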
