towards a DataLink IVOA Note
François Bonnarel
francois.bonnarel at astro.unistra.fr
Fri Oct 14 01:46:39 PDT 2011
Dear all,
Next week in Pune a DAL working group session will be focused on
DataLink.
Here are some first ideas for an IVOA note on this topic. Sorry
for not providing this sooner to the people willing to co-write the note.
I think that after discussing these ideas next week we can move
rapidly together towards something more collective.
Cheers
François
Introduction
---------------
Discussion in the IVOA has shown that "DataLink" is a useful concept
within the scope of the Generic Dataset (GDS) architecture. See "Service
architecture and standard profile"
- http://www.ivoa.net/internal/IVOA/SiaInterface/DAL2_Architecture.pdf -
for a first description of the GDS concept, and a presentation by Doug
Tody at the Strasbourg May 2009 Interop meeting
- http://www.ivoa.net/internal/IVOA/200905DALSessions/siapv2-may09.pdf -
for the first mention of the DataLink concept.
Within the scope of DAL protocols, the generic dataset protocol concept
illustrates the need for a type of service valid for any kind of data,
as a counterpart to typed interfaces such as SSA and SIA, which can
describe only a single category of data but can do so in finer detail,
with a data model specific to the data.
A second important difference is that the generic dataset can describe
only data sets or files as they are stored in some archive, whereas the
typed interfaces can describe, and provide access to both static
archival datasets as well as virtual data. In addition, they actually
DRIVE the generation of the latter.
Finally, since the generic dataset can describe any type of data it can
also describe all sorts of complex data.
This approach provides a lot of flexibility for both describing and
accessing data. A complex observation consisting of several related data
products can be described via the generic dataset query mechanism. For
example we might have a survey field consisting of a spectral data cube,
some 2-D projections of the cube (integrated flux, maps for a given
wavelength, velocity/position maps, etc ...), a source catalog for the
field computed from the 2-D continuum, and possibly some extracted
spectra of objects in the field. It is up to client applications
to deal with these datasets.
Simple clients will discover and help to retrieve entire datasets, while
more sophisticated ones could make use of the metadata to go further in
the analysis of the data. This phase could require interpretation and use
of new data links attached to the dataset under consideration.
Within the frame of the generic dataset, "DataLinks" relate a data discovery
table record to some other "object". A record can have a number of such
links. A table representation of links is the most convenient one in
the TAP era.
Links contain at least a dataset ID used as a key, a link type and a
URL. DataLinks link a record in the dataset table to files of various
types, a standard or custom service to access the data, an HTML page,
etc.
Discovery with ObsTAP
------------------------------
ObsTAP plays the role of a data discovery service within
the frame of generic dataset protocols.
Building on the work done on data models (ref) and TAP (ref), it has
recently become possible to define a standard service protocol to expose
standard metadata describing available datasets: ObsTAP (ref). In
general, any data model can be mapped to a relational database and
exposed directly with the TAP protocol. The goal of ObsTAP is to provide
such a capability based upon an essential subset of the general
observational data model.
Specifically, this effort defined a database table that describes
astronomical datasets (data products) stored in archives and that can be
queried directly with the TAP protocol. This is very useful for global
data discovery, as any type of data can be described in a straightforward
and uniform fashion.
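As a rough illustration of this discovery step, a client could build a synchronous TAP request against an ObsTAP service as sketched below. The service URL is a placeholder, and the table and column names (ivoa.ObsCore, obs_id, dataproduct_type, access_url) are assumptions borrowed from the ObsCore data model rather than anything fixed by this note. Only the request URL is built; nothing is sent.

```python
from urllib.parse import urlencode

# Hypothetical ObsTAP endpoint; replace with the base URL of a real TAP service.
TAP_BASE = "http://aaa.bbbb.fr/tap"


def build_obstap_query(dataproduct_type: str, max_rows: int = 10) -> str:
    """Return a synchronous TAP request URL selecting datasets of one type."""
    # Double any single quote so the value stays a valid ADQL string literal.
    safe_type = dataproduct_type.replace("'", "''")
    adql = (
        f"SELECT TOP {int(max_rows)} obs_id, dataproduct_type, access_url "
        f"FROM ivoa.ObsCore WHERE dataproduct_type = '{safe_type}'"
    )
    # REQUEST, LANG and QUERY are the standard TAP parameters for an ADQL query.
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return f"{TAP_BASE}/sync?{urlencode(params)}"


url = build_obstap_query("cube")
print(url)
```

Each row returned by such a query would then carry the identifier used as the key for the links discussed in the rest of this note.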
The described datasets can be directly downloaded, or linked to IVOA
Data Access Layer (DAL) protocols such as those for accessing images (SIA) or
spectra (SSA), or to any other kind of service. These links can
potentially be used to perform more advanced data access operations on
the referenced datasets.
This is what lies behind the "DataLink" concept, which we
describe now.
Linking the Discovery results with "something else"
--------------------------------------------------------------------
Suppose we have queried an ObsTAP service. Each row in the
result page describes a dataset. The description contains a field
named "reference". It could be there for direct retrieval or to provide
a richer type of "link". What kind of "links" could be provided to the
user for such a dataset? One can imagine various types, such as:
- direct retrieval of the full dataset
- access to a part of the dataset when the internal structure is
known
- access to a service by forcing an ObsID-like parameter to be fixed
- access to related files: previews, visualisations, calibration
files, etc.
Of course each dataset described in the ObsTAP query response
may have several links like this, and the nature of each link has to be
described somehow. The reference itself (generally a web access) has to
be described by its URL, its format and its size, as in SSA or
ObsTAP query responses, but in addition the structure of the
file also has to be given.
Describing the link in practice
---------------------------------------
Concretely, how will the links be described? A small package of
attributes (data model package) can be defined for this. We will define:
- the ObsId attribute
- an attribute giving the meaning (or semantics) of the link
(calibration file, SIA description, or catalogue part in a complex
dataset such as an archive)
- an attribute describing the IVOA type of the link: simple
retrieval, other ObsTAP, SIA or SSA service, with either query or
DataAccess method, UWS service, etc.
- an Access package detailing the structure of the link (see the
Characterisation 2 reference for a more complete description of this
Access package). It contains:
  * a URL or URI
  * a MIME type, e.g. image/fits
  * an estimated size for the response
  * a subtype: table, votable, mef, archive
  * a set of internal attributes:
    . path
    . array
    . row
    . field
    . extnum
    . extname
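One possible in-memory shape for this package is sketched here, only to make the nesting explicit. The class names (DataLink, Access, AccessParams) and the Python attribute names are illustrative assumptions; the note itself only fixes the list of attributes, not a programming interface.

```python
from dataclasses import dataclass, field as dc_field
from typing import Optional


@dataclass
class AccessParams:
    """Internal attributes; all of them are optional."""
    path: Optional[str] = None
    extnum: Optional[int] = None
    extname: Optional[str] = None
    field: Optional[str] = None
    row: Optional[int] = None
    array: Optional[str] = None


@dataclass
class Access:
    """Structure of the link target."""
    reference: str                  # URL or URI
    format: Optional[str] = None    # MIME type, e.g. "image/fits"
    size: Optional[str] = None      # estimated size of the response
    subtype: Optional[str] = None   # table, votable, mef, archive
    params: AccessParams = dc_field(default_factory=AccessParams)


@dataclass
class DataLink:
    obs_id: str        # key back to the discovery record
    semantics: str     # meaning of the link
    service_type: str  # IVOA type of the link: retrieval, sia, ssa, ...
    access: Access


# Example: a FITS image stored inside a tar archive.
link = DataLink(
    obs_id="ivoa://xxx.yyy.edu/123345",
    semantics="image",
    service_type="retrieval",
    access=Access(
        reference="http://xxx.yyy.de/archive.tar",
        format="image/fits",
        size="1Gb",
        params=AccessParams(path="image/cccc.fits"),
    ),
)
print(link.access.params.path)
```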
The "internal" attributes are important because they locate
the link target inside a complex dataset:
- The "path" attribute describes the file path and name
inside an "archive" dataset.
- The "array" attribute defines a cutout in an n-dimensional array
image using the cfitsio syntax: [50:100,70:200] is the extracted
subimage from pixel 50 to 100 in x and 70 to 200 in y.
- The "extnum" or "extname" attribute designates the extension
number or name in a multi-extension FITS file. Extname can also be used
to designate a RESOURCE or TABLE name in a complex VOTable document.
- "field" and "row" have their obvious meanings in a FITS or
VOTable table.
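To make the "array" convention concrete, the sketch below converts a section such as [50:100,70:200] into Python slices. It assumes the usual FITS convention of 1-based, inclusive pixel ranges with the x axis listed first, and it handles only plain first:last ranges (no "*" or step values). The function name is hypothetical.

```python
def parse_array_cutout(spec: str) -> tuple:
    """Convert a cfitsio-style '[x1:x2,y1:y2]' section into Python slices.

    Input ranges are taken as 1-based and inclusive, x axis first. The
    result uses 0-based, end-exclusive slices, slowest axis first, which
    is the order needed to index a row-major array (y, then x).
    """
    body = spec.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError(f"not a bracketed section: {spec!r}")
    slices = []
    for axis in body[1:-1].split(","):
        first, last = (int(value) for value in axis.split(":"))
        if first < 1 or last < first:
            raise ValueError(f"bad pixel range: {axis!r}")
        slices.append(slice(first - 1, last))
    return tuple(reversed(slices))


cutout = parse_array_cutout("[50:100,70:200]")
print(cutout)
```

With a two-dimensional image held as a row-major array, `image[cutout]` would then select pixels 70 to 200 in y and 50 to 100 in x.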
All these attributes are optional, and by using the ordering
[path][extnum|extname][field][row][array]
it should be possible to locate any kind of significant structure in
archives or datasets that use the most common astronomy file
standards.
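The ordering rule can be sketched as a small helper that assembles whichever attributes are present, always in the order given above. Writing the result as a bracketed string is an assumption made here for illustration; the note only fixes the ordering and the fact that extnum and extname are alternatives.

```python
# Order taken from the text: [path][extnum|extname][field][row][array]
ORDER = ("path", "extnum", "extname", "field", "row", "array")


def build_locator(**parts) -> str:
    """Concatenate the supplied internal attributes in the agreed order."""
    unknown = set(parts) - set(ORDER)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    if parts.get("extnum") is not None and parts.get("extname") is not None:
        raise ValueError("extnum and extname are alternatives; give only one")
    pieces = []
    for name in ORDER:
        value = parts.get(name)
        if value is None:
            continue  # every attribute is optional
        text = str(value)
        # An 'array' value already carries its brackets in the cfitsio syntax.
        pieces.append(text if text.startswith("[") else f"[{text}]")
    return "".join(pieces)


# Keyword order does not matter; the output order is fixed by ORDER.
locator = build_locator(array="[50:100,70:200]", extnum=2, path="image/cccc.fits")
print(locator)
```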
Building a Data Link service
-------------------------------------
A DataLink service is a simple DAL service providing DataLinks for a
set of ObsIds. The result is presented as a VOTable with one FIELD per
attribute (the attributes described above). The input parameter is an
ObsId. A set of ObsIds stored in a file can also be given as input.
Here are examples of a query and a query response.
A service query could be of the following form:
http://aaa.bbbb.fr/dal-services/datalink?obsid="ivoa://xxx.yyy.edu/123345"
where ivoa://xxx.yyy.edu/123345 is the IVOA identifier of a specific
observation, which could have been provided by an ObsTAP query.
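A client could assemble such a query along the following lines. This is only a sketch: the base URL is the placeholder used above, and the identifier is percent-encoded and passed without the surrounding double quotes, which is an assumption about how a real service would expect the parameter.

```python
from urllib.parse import urlencode

# Placeholder service address taken from the example query above.
DATALINK_BASE = "http://aaa.bbbb.fr/dal-services/datalink"


def build_datalink_query(obsid: str, base: str = DATALINK_BASE) -> str:
    """Return the DataLink query URL for one observation identifier."""
    # urlencode percent-encodes ':' and '/' so the identifier survives
    # as a single parameter value.
    return f"{base}?{urlencode({'obsid': obsid})}"


query_url = build_datalink_query("ivoa://xxx.yyy.edu/123345")
print(query_url)
```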
The query response could be something like this:
<TABLE name="DataLinks">
<FIELD name="Obsid" utype="dl:Dataid.ObservationID" datatype="char"
arraysize="*"/>
<FIELD name="Semantics" utype="dl:Semantics" datatype="char" arraysize="*"/>
<FIELD name="Servicetype" utype="dl:Votype" datatype="char" arraysize="*" />
<FIELD name="reference" utype="dl:Access.Reference" datatype="char"
arraysize="*"/>
<FIELD name="size" utype="dl:Access.Size" datatype="char" arraysize="*" />
<FIELD name="format" utype="dl:Access.Format" datatype="char"
arraysize="*" />
<FIELD name="subtype" utype="dl:Access.Subtype" datatype="char"
arraysize="*" />
<FIELD name="path" utype="dl:Access.AccessParams.Path" datatype="char"
arraysize="*" />
<FIELD name="extnum" utype="dl:Access.AccessParams.Extnum"
datatype="char" arraysize="*" />
<FIELD name="extname" utype="dl:Access.AccessParams.Extname"
datatype="char" arraysize="*" />
<FIELD name="field" utype="dl:Access.AccessParams.Field" datatype="char"
arraysize="*" />
<FIELD name="row" utype="dl:Access.AccessParams.Row" datatype="char"
arraysize="*" />
<FIELD name="array" utype="dl:Access.AccessParams.Array" datatype="char"
arraysize="*" />
<DATA>
<TABLEDATA>
<TR>
<TD>ivoa://xxx.yyy.edu/123345</TD>
<TD>full dataset</TD>
<TD>retrieval</TD>
<TD>http://xxx.yyy.de/archive.tar</TD>
<TD>3.4Gb</TD>
<TD>archive/tar</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
</TR>
<TR>
<TD>ivoa://xxx.yyy.edu/123345</TD>
<TD>image</TD>
<TD>retrieval</TD>
<TD>http://xxx.yyy.de/archive.tar</TD>
<TD>1Gb</TD>
<TD>image/fits</TD>
<TD>none</TD>
<TD>image/cccc.fits</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
</TR>
<TR>
<TD>ivoa://xxx.yyy.edu/123345</TD>
<TD>image metadata</TD>
<TD>sia</TD>
<TD>http://xxx.yyy.de/sinea?query&amp;obsid="ivoa://xxx.yyy.edu/123345"</TD>
<TD>1Kb</TD>
<TD>application/x-votable+xml</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
<TD>none</TD>
</TR>
</TABLEDATA>
</DATA>
</TABLE>
In this example response, where the main dataset is a tar archive, the
first record links to retrieval of the whole archive, and the
second record links to a FITS image, cccc.fits, in the directory "image" of
the tar file. The last record links to the query method of a SIAP
service, which will answer with a description of the images sharing the
"ivoa://xxx.yyy.edu/123345" ObsId.
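On the client side, a response of this shape can be turned into one dictionary per link, keyed by FIELD name. The sketch below uses a shortened copy of the table with only three FIELDs and assumes no XML namespace on the elements; a real VOTable document would normally be wrapped in VOTABLE and RESOURCE elements and carry a namespace.

```python
import xml.etree.ElementTree as ET

# Shortened, namespace-free sample of a DataLink response (illustrative only).
SAMPLE = """
<TABLE name="DataLinks">
<FIELD name="Obsid" datatype="char" arraysize="*"/>
<FIELD name="Semantics" datatype="char" arraysize="*"/>
<FIELD name="reference" datatype="char" arraysize="*"/>
<DATA><TABLEDATA>
<TR><TD>ivoa://xxx.yyy.edu/123345</TD><TD>full dataset</TD>
<TD>http://xxx.yyy.de/archive.tar</TD></TR>
<TR><TD>ivoa://xxx.yyy.edu/123345</TD><TD>image</TD>
<TD>http://xxx.yyy.de/archive.tar</TD></TR>
</TABLEDATA></DATA>
</TABLE>
"""


def read_links(table_xml: str) -> list:
    """Return one dict per TR, mapping FIELD names to TD values."""
    table = ET.fromstring(table_xml.strip())
    names = [f.get("name") for f in table.findall("FIELD")]
    rows = []
    for tr in table.iter("TR"):
        cells = [(td.text or "").strip() for td in tr.findall("TD")]
        if len(cells) != len(names):
            raise ValueError("row length does not match the FIELD list")
        rows.append(dict(zip(names, cells)))
    return rows


links = read_links(SAMPLE)
for entry in links:
    print(entry["Semantics"], "->", entry["reference"])
```

Checking each row length against the FIELD list catches the kind of column slip that is easy to introduce when a table is written by hand.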
Extensions of ObsTAP for DataLinking
----------------------------------------------------
In any case, a future version of ObsTAP could benefit from defining an
additional FIELD with utype "dataLink" pointing to a
DataLink service (a PARAM could also be sufficient), using the ObsId
value of each record as the main parameter for the query.
In addition, an ObsTAP service is a TAP service and may offer several
query languages. The mandatory ADQL interface can usefully be
complemented by PQL, for example. When an ObsTAP service is used through a
PQL interface, the standard does not require a single-table response.
This makes it possible to add DAL extensions (additional tables; see the SSA
recommendation for a definition of the DAL extension mechanism) to the main
standard ObsTAP table. Adding a specific DataLink response to the main
ObsTAP table then becomes possible. The ObsId FIELD, which is common to
the two tables, relates records in the main ObsTAP table to
records in the concatenated DataLink response and can be used as a
reference key. This approach spares clients the need to
extract the URL of the DataLink service from the main table and
start a new query on another service.
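Using the ObsId as a reference key amounts to a simple join on the client side. The sketch below assumes both tables have already been read into lists of dictionaries; the key name "Obsid", the function name and the second identifier are hypothetical.

```python
from collections import defaultdict


def attach_links(obstap_rows, datalink_rows, key="Obsid"):
    """Return the discovery rows, each with its list of DataLink records."""
    by_id = defaultdict(list)
    for link in datalink_rows:
        by_id[link[key]].append(link)
    # .get() avoids creating empty entries for rows that have no links.
    return [dict(row, links=by_id.get(row[key], [])) for row in obstap_rows]


obstap_rows = [
    {"Obsid": "ivoa://xxx.yyy.edu/123345"},
    {"Obsid": "ivoa://xxx.yyy.edu/999"},  # hypothetical record without links
]
datalink_rows = [
    {"Obsid": "ivoa://xxx.yyy.edu/123345", "Semantics": "full dataset"},
    {"Obsid": "ivoa://xxx.yyy.edu/123345", "Semantics": "image"},
]

joined = attach_links(obstap_rows, datalink_rows)
for row in joined:
    print(row["Obsid"], len(row["links"]))
```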