towards a DataLink IVOA Note

François Bonnarel francois.bonnarel at astro.unistra.fr
Fri Oct 14 01:46:39 PDT 2011


Dear all,
      Next week in Pune, a DAL working group session will focus on
DataLink.
      I present here some first ideas for an IVOA Note on this topic. Sorry
for not providing this sooner to people willing to co-write the note.
      I think that after discussing these ideas next week we can move
rapidly together towards something more collective.
Cheers
François

Introduction
---------------

Discussion in the IVOA has shown that "DataLink" is a useful concept
within the scope of the Generic Dataset (GDS) architecture. See "Service
architecture and standard profile"
- http://www.ivoa.net/internal/IVOA/SiaInterface/DAL2_Architecture.pdf -
for a first description of the GDS concept, and a presentation by Doug
Tody at the Strasbourg May 2009 Interop meeting
- http://www.ivoa.net/internal/IVOA/200905DALSessions/siapv2-may09.pdf -
for the first mention of the DataLink concept.

Within the scope of DAL protocols, the generic dataset protocol concept
illustrates the need for a type of service valid for any kind of data,
as a counterpart to typed interfaces such as SSA and SIA, which can
describe only a single category of data but can do so in finer detail,
with a data model specific to the data.

A second important difference is that the generic dataset can describe
only datasets or files as they are stored in some archive, whereas the
typed interfaces can describe, and provide access to, both static
archival datasets and virtual data. In addition, they actually
DRIVE the generation of the latter.

Finally, since the generic dataset can describe any type of data it can 
also describe all sorts of complex data.

This approach provides a lot of flexibility for both describing and 
accessing data. A complex observation consisting of several related data 
products can be described via the generic dataset query mechanism. For 
example we might have a survey field consisting of a spectral data cube, 
some 2-D projections of the cube (integrated flux, maps for a given
wavelength, velocity/position maps, etc.), a source catalog for the
field computed from the 2-D continuum, and possibly some extracted
spectra of objects in the field. It is the job of client applications
to deal with these datasets.

Simple clients will discover and help to retrieve entire datasets, while
more sophisticated ones could make use of the metadata to go further in
the analysis of the data. This phase could require the interpretation and
use of new data links attached to the dataset under consideration.

Within the generic dataset framework, "DataLinks" relate a data discovery
table record to some other "object". A record can have any number of such
links. A table representation of links is the most convenient one in
the TAP era.

Links contain at least a dataset ID used as a key, a link type and a
URL. DataLinks link a record in the dataset table to files of various
types, to a standard or custom service for accessing the data, to an HTML
page, etc.

Discovery with ObsTAP
------------------------------

ObsTAP actually plays the role of a data discovery service within
the framework of generic dataset protocols.

Building on the work done on data models (ref) and TAP (ref), it has
recently become possible to define a standard service protocol to expose
standard metadata describing available datasets: ObsTAP (ref). In
general, any data model can be mapped to a relational database and
exposed directly with the TAP protocol. The goal of ObsTAP is to provide
such a capability based upon an essential subset of the general
observational data model.

Specifically, this effort defined a database table that describes
astronomical datasets (data products) stored in archives and that can be
queried directly with the TAP protocol. This is very useful for global
data discovery, as any type of data can be described in a straightforward
and uniform fashion.
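
As a rough illustration of such a query, the sketch below builds a
synchronous TAP request URL carrying an ADQL query. The base URL is a
placeholder, and the table name ivoa.ObsCore and the column names
(obs_id, dataproduct_type, access_url, access_format) are assumptions
borrowed from the ObsCore data model, not something defined in this note.

```python
from urllib.parse import urlencode

# Hypothetical TAP endpoint; replace with a real ObsTAP service URL.
TAP_BASE = "http://aaa.bbbb.fr/tap"


def build_obstap_query_url(base_url, dataproduct_type, max_rows=10):
    """Return a synchronous TAP request URL selecting datasets of one type."""
    # Double any single quote so the value stays a valid ADQL string literal.
    safe_type = dataproduct_type.replace("'", "''")
    # Table and column names are assumptions taken from the ObsCore model.
    adql = (
        f"SELECT TOP {int(max_rows)} obs_id, dataproduct_type, "
        f"access_url, access_format "
        f"FROM ivoa.ObsCore WHERE dataproduct_type = '{safe_type}'"
    )
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return f"{base_url.rstrip('/')}/sync?{urlencode(params)}"


url = build_obstap_query_url(TAP_BASE, "cube")
print(url)
```

Each row returned by such a query is one dataset description, which is
the starting point for the links discussed next.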

The described datasets can be directly downloaded, or linked to IVOA
Data Access Layer (DAL) protocols such as those for accessing images
(SIA) or spectra (SSA), or to any other kind of service. These links can
potentially be used to perform more advanced data access operations on
the referenced datasets.

This is what lies behind the "DataLink" concept, which we
describe now.



Linking the Discovery results with "something else"
--------------------------------------------------------------------

Suppose we have queried an ObsTAP service. Each row in the
result page describes a dataset. The description contains a field
named "reference". It could be there for direct retrieval or to provide
richer types of "links". What kind of "links" could be provided to the
user for such a dataset? One can imagine various types, such as:
     - direct retrieval of the full dataset
     - access to a part of the dataset when the internal structure is
       known
     - access to a service with an ObsID-like parameter fixed
     - access to related files: previews, visualisations, calibration
       files, etc.

Of course each dataset described in the ObsTAP query response
may have several links like this, and the nature of each link has to be
described somehow. The reference itself (generally a web access) has to
be described by its URL, its format and its size, as in SSA or
ObsTAP query responses, but in addition the structure of the
file also has to be given.

Describing the link in practice
---------------------------------------

Concretely, how will the links be described? A small package of
attributes (a data model package) can be defined for this. We will define:
     - the ObsId attribute
     - an attribute giving the meaning (or semantics) of the link
       (calibration file, SIA description, catalogue part in a complex
       dataset such as an archive, ...)
     - an attribute describing the IVOA type of the link: simple
       retrieval, other ObsTAP, SIA or SSA service (with either the query
       or the DataAccess method), UWS service, etc.
     - an Access package detailing the structure of the link (see the
       Characterisation 2 reference for a more complete description of
       this Access package). It contains:
            * a URL or URI
            * a MIME type, e.g. image/fits
            * an estimated size for the response
            * a subtype: table, votable, mef, archive
            * a set of internal attributes:
                      . path
                      . array
                      . row
                      . field
                      . extnum
                      . extname
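
The attribute package listed above could be modelled along the following
lines. This is only a sketch: the class and attribute names (DataLink,
Access, AccessParams, obs_id, service_type) are illustrative choices, not
part of any agreed data model.

```python
from dataclasses import dataclass, field as dc_field
from typing import Optional


@dataclass
class AccessParams:
    """Optional "internal" locators inside a complex dataset."""
    path: Optional[str] = None
    array: Optional[str] = None
    row: Optional[int] = None
    field: Optional[str] = None
    extnum: Optional[int] = None
    extname: Optional[str] = None


@dataclass
class Access:
    """How to reach the linked resource."""
    reference: str                  # URL or URI
    format: Optional[str] = None    # MIME type, e.g. image/fits
    size: Optional[str] = None      # estimated size of the response
    subtype: Optional[str] = None   # table, votable, mef, archive
    params: AccessParams = dc_field(default_factory=AccessParams)


@dataclass
class DataLink:
    """One link attached to a dataset discovered through ObsTAP."""
    obs_id: str         # key shared with the discovery record
    semantics: str      # meaning of the link
    service_type: str   # IVOA type of the link: retrieval, sia, ssa, ...
    access: Access


# Example values taken from the tar-archive case discussed in this note.
link = DataLink(
    obs_id="ivoa://xxx.yyy.edu/123345",
    semantics="image",
    service_type="retrieval",
    access=Access(
        reference="http://xxx.yyy.de/archive.tar",
        format="image/fits",
        size="1Gb",
        params=AccessParams(path="image/cccc.fits"),
    ),
)
print(link.access.params.path)
```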

The "internal" attributes are essential for locating the target
of the link inside a complex dataset.

     - The "path" attribute describes the file path and name
       inside an "archive" dataset.
     - The "array" attribute defines a cutout in an n-dimensional array
       image using the cfitsio syntax: [50:100,70:200] is the subimage
       extracted from pixel 50 to 100 in x and 70 to 200 in y.
     - The "extnum" or "extname" attribute designates the extension
       number or name in a multi-extension FITS file. Extname can also be
       used to designate a RESOURCE or TABLE name in a complex VOTABLE
       document.
     - "field" and "row" have their obvious meanings in a FITS or
       VOTABLE table.
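
To make the "array" syntax concrete, here is a minimal sketch of a parser
for the simple first:last form used in the example above. The function
names are hypothetical, only that simple form is handled, and the
conversion to 0-based slices with the slowest axis first is an assumption
suited to row-major arrays, not something this note prescribes.

```python
import re


def parse_array_cutout(spec):
    """Parse '[50:100,70:200]' into [(50, 100), (70, 200)].

    The pixel ranges are kept 1-based and inclusive, one pair per axis.
    """
    match = re.fullmatch(r"\[(.*)\]", spec.strip())
    if match is None:
        raise ValueError(f"not a bracketed cutout: {spec!r}")
    ranges = []
    for axis in match.group(1).split(","):
        first, last = axis.split(":")   # raises ValueError if malformed
        ranges.append((int(first), int(last)))
    return ranges


def to_python_slices(ranges):
    """Convert 1-based inclusive ranges to 0-based slices, last axis first."""
    return tuple(slice(first - 1, last) for first, last in reversed(ranges))


cutout = parse_array_cutout("[50:100,70:200]")
slices = to_python_slices(cutout)
print(cutout)
print(slices)
```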

All these attributes are optional. By using the ordering
  [path][extnum|extname][field][row][array]
it should be possible to locate any kind of significant structure in
archives or datasets that use the most common astronomy file
standards.
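
The ordering rule can be sketched as follows. The helper names are
hypothetical, and the bracketed string produced by format_locator is only
one possible serialization, assumed here for illustration; the note fixes
the order of the attributes, not a concrete syntax. Treating extnum and
extname as mutually exclusive is my reading of the "|" in the ordering.

```python
# Order in which the optional locators are applied.
LOCATOR_ORDER = ("path", "extnum", "extname", "field", "row", "array")


def ordered_locator(**attrs):
    """Return the supplied attributes as (name, value) pairs in rule order."""
    unknown = set(attrs) - set(LOCATOR_ORDER)
    if unknown:
        raise ValueError(f"unknown locator attributes: {sorted(unknown)}")
    if attrs.get("extnum") is not None and attrs.get("extname") is not None:
        raise ValueError("give either extnum or extname, not both")
    return [
        (name, attrs[name])
        for name in LOCATOR_ORDER
        if attrs.get(name) is not None
    ]


def format_locator(steps):
    """Hypothetical serialization: one bracketed element per locator."""
    return "".join(f"[{value}]" for _, value in steps)


steps = ordered_locator(array="50:100,70:200", path="image/cccc.fits", extnum=1)
text = format_locator(steps)
print(steps)
print(text)
```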

Building a Data Link service
-------------------------------------

A DataLink service is a simple DAL service providing DataLinks for a
set of ObsIds. The result is presented as a VOTABLE with one FIELD per
attribute (the attributes described above). The input parameter is an
ObsId. A set of ObsIds stored in a file can also be given as input.

Here are examples of a query and a query response.

A service query could be of the following form:
http://aaa.bbbb.fr/dal-services/datalink?obsid="ivoa://xxx.yyy.edu/123345"
where ivoa://xxx.yyy.edu/123345 is the IVOA identifier of a specific
observation, which could have been provided by an ObsTAP query.
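
A client would normally percent-encode the identifier rather than quote
it. The sketch below shows one way to build such a request URL; the
function name is hypothetical, and the "obsid" parameter name and the
service URL are simply taken from the example above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_datalink_url(service_url, obs_id):
    """Return a DataLink query URL with the ObsId percent-encoded."""
    return f"{service_url}?{urlencode({'obsid': obs_id})}"


obs_id = "ivoa://xxx.yyy.edu/123345"
url = build_datalink_url("http://aaa.bbbb.fr/dal-services/datalink", obs_id)
print(url)

# Decoding the query string gives back the original identifier.
decoded = parse_qs(urlsplit(url).query)["obsid"][0]
print(decoded)
```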

The query response could be something like this:

<TABLE name="DataLinks">
<FIELD name="Obsid" utype="dl:Dataid.ObservationID" datatype="char" arraysize="*"/>
<FIELD name="Semantics" utype="dl:Semantics" datatype="char" arraysize="*"/>
<FIELD name="Servicetype" utype="dl:Votype" datatype="char" arraysize="*"/>
<FIELD name="reference" utype="dl:Access.Reference" datatype="char" arraysize="*"/>
<FIELD name="format" utype="dl:Access.Format" datatype="char" arraysize="*"/>
<FIELD name="size" utype="dl:Access.Size" datatype="char" arraysize="*"/>
<FIELD name="subtype" utype="dl:Access.Subtype" datatype="char" arraysize="*"/>
<FIELD name="path" utype="dl:Access.AccessParams.Path" datatype="char" arraysize="*"/>
<FIELD name="extnum" utype="dl:Access.AccessParams.Extnum" datatype="char" arraysize="*"/>
<FIELD name="extname" utype="dl:Access.AccessParams.Extname" datatype="char" arraysize="*"/>
<FIELD name="field" utype="dl:Access.AccessParams.Field" datatype="char" arraysize="*"/>
<FIELD name="row" utype="dl:Access.AccessParams.Row" datatype="char" arraysize="*"/>
<FIELD name="array" utype="dl:Access.AccessParams.Array" datatype="char" arraysize="*"/>
<DATA>
<TABLEDATA>
<!-- Link 1: retrieval of the whole tar archive -->
<TR>
<TD>ivoa://xxx.yyy.edu/123345</TD>
<TD>full dataset</TD>
<TD>retrieval</TD>
<TD>http://xxx.yyy.de/archive.tar</TD>
<TD>archive/tar</TD>
<TD>3.4Gb</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
</TR>
<!-- Link 2: one FITS image located by its path inside the archive -->
<TR>
<TD>ivoa://xxx.yyy.edu/123345</TD>
<TD>image</TD>
<TD>retrieval</TD>
<TD>http://xxx.yyy.de/archive.tar</TD>
<TD>image/fits</TD>
<TD>1Gb</TD>
<TD>none</TD>
<TD>image/cccc.fits</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
</TR>
<!-- Link 3: query method of an SIA service for the same ObsId -->
<TR>
<TD>ivoa://xxx.yyy.edu/123345</TD>
<TD>image metadata</TD>
<TD>sia</TD>
<TD>http://xxx.yyy.de/sinea?query&amp;obsid="ivoa://xxx.yyy.edu/123345"</TD>
<TD>application/x-votable+xml</TD>
<TD>1Kb</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
</TR>
</TABLEDATA>
</DATA>
</TABLE>

In this example response, where the main dataset is a tar archive, the
first record links to the retrieval of the whole archive and the
second record links to the FITS image cccc.fits in the "image" directory
of the tar file. The last record links to the query method of an SIA
service, which will answer with a description of the images sharing the
"ivoa://xxx.yyy.edu/123345" ObsId.

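A client could read such a response roughly as sketched below. The
embedded sample is a cut-down, namespace-free version of the table above
with only four of the FIELDs, and treating the literal "none" as a
missing value is an assumption based on how the example fills its cells.
A real VOTABLE document would wrap the TABLE in VOTABLE and RESOURCE
elements and usually declares an XML namespace, which this sketch does
not handle.

```python
import xml.etree.ElementTree as ET

# Reduced sample response; not a complete VOTABLE document.
SAMPLE = """<TABLE name="DataLinks">
<FIELD name="Obsid" datatype="char" arraysize="*"/>
<FIELD name="Semantics" datatype="char" arraysize="*"/>
<FIELD name="reference" datatype="char" arraysize="*"/>
<FIELD name="path" datatype="char" arraysize="*"/>
<DATA><TABLEDATA>
<TR><TD>ivoa://xxx.yyy.edu/123345</TD><TD>full dataset</TD>
<TD>http://xxx.yyy.de/archive.tar</TD><TD>none</TD></TR>
<TR><TD>ivoa://xxx.yyy.edu/123345</TD><TD>image</TD>
<TD>http://xxx.yyy.de/archive.tar</TD><TD>image/cccc.fits</TD></TR>
</TABLEDATA></DATA>
</TABLE>"""


def parse_datalinks(xml_text):
    """Return one dict per TR, keyed by the FIELD names."""
    table = ET.fromstring(xml_text)
    names = [f.get("name") for f in table.iter("FIELD")]
    rows = []
    for tr in table.iter("TR"):
        values = [td.text for td in tr.findall("TD")]
        # Map the placeholder "none" (and empty cells) to None.
        rows.append(
            {n: (None if v in (None, "none") else v)
             for n, v in zip(names, values)}
        )
    return rows


links = parse_datalinks(SAMPLE)
for link in links:
    print(link["Semantics"], link["reference"], link["path"])
```
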
Extensions of ObsTAP for DataLinking
----------------------------------------------------
In any case, a future version of ObsTAP could benefit from defining an
additional FIELD with utype "dataLink" pointing to a
DataLink service (a PARAM could also be sufficient), using the ObsId
value of each record as the main parameter for the query.

In addition, an ObsTAP service is a TAP service and may support several
query languages. The mandatory ADQL interface can usefully be
complemented by PQL, for example. When an ObsTAP service is used with a
PQL interface, the standard does not require a single-table response.
This makes it possible to add DAL extensions (additional tables; see the
SSA recommendation for a definition of the DAL extension mechanism) to
the main standard ObsTAP table. Adding a specific DataLink response to
the main ObsTAP table then becomes possible. The ObsId FIELD, which is
common to the two tables, relates records in the main ObsTAP table to
records in the concatenated DataLink response and can be used as a
reference key. This approach spares clients from having to
extract the URL of the DataLink service from the main table and
start a new query on another service.
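
On the client side, relating the two tables through the shared ObsId
amounts to a simple keyed join. The sketch below assumes both tables have
already been read into lists of dicts; the function name and the "links"
key it adds are hypothetical.

```python
from collections import defaultdict


def attach_links(obs_rows, link_rows, key="Obsid"):
    """Attach to each discovery record the DataLink rows sharing its ObsId."""
    by_id = defaultdict(list)
    for link in link_rows:
        by_id[link[key]].append(link)
    # Records without any link get an empty list.
    return [dict(row, links=by_id.get(row[key], [])) for row in obs_rows]


obs_rows = [
    {"Obsid": "ivoa://xxx.yyy.edu/123345"},
    {"Obsid": "ivoa://xxx.yyy.edu/999"},   # made-up record with no links
]
link_rows = [
    {"Obsid": "ivoa://xxx.yyy.edu/123345", "Semantics": "full dataset"},
    {"Obsid": "ivoa://xxx.yyy.edu/123345", "Semantics": "image"},
]

joined = attach_links(obs_rows, link_rows)
for row in joined:
    print(row["Obsid"], [link["Semantics"] for link in row["links"]])
```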




