DataLink issues

Laurent Michel laurent.michel at astro.unistra.fr
Thu Sep 13 06:15:42 PDT 2012


Hello,

I have the feeling that an agreement about the scope of DataLink is emerging, but we still have some work to do before moving ahead.
Links contained in a DataLink response must be "clickable" without any setup, as Doug said. But in some cases users must be
able to pass parameters to the queries. This flexibility is needed because the concept of linked data is rather
general. That is the core of the discussion.
Below I try to summarise my understanding of the status of the discussion and sketch some possible DataLink
specifications.


Le 10/09/2012 19:49, François Bonnarel a écrit :
> HI Doug, all,
> Le 07/09/2012 05:08, Douglas Tody a écrit :
>> On Thu, 6 Sep 2012, François Bonnarel wrote:
>>
>>> Dear all,
>>>       Last interop confirmed there was interest in the DataLink concept , defined as a "way to find related resources to
>>> datasets via a web service"
>>>      This is needed either as a complement to Data services (ObsTap, Tap, S*A service) or to SimDal...
>>> A general agreement was made on input/output (PublisherDID/ VOtable with links) and on a general concept for the structure of
>>> the links "records"
>>>
>>> There are  still a couple of issues, however:
>>>
>>>         -- Is "DataLink" a real, full DAL service recorded in the IVOA registries? Or is it just a web service referred to in
>>> the main services' query responses?
>>> The  first case requires  dataset ID to be a real PublisherDID. For the latter case an internal  DatasetID could be
>>> sufficient....
>>
>> The basic concept of DataLink is that we start with a Dataset (PubDID)
>> or maybe an Observation (ala TAP obs_id) and go get a list of data links
>> pointing to associated data files, services, or other resources, e.g.  a
>> batch job for custom reprocessing of the dataset.
>>
>> It would seem that, given a DAL query response (TAP, SIA, etc.) it
>> should be possible to directly query for the data links for a given
>> dataset or possibly obs_id, without having to try to infer the existence
>> of an associated datalink service via a registry query.  So the DAL
>> query response should point to whatever web service or service operation
>> is used to get datalinks.  If for the given service, a PubDID or obs_id
>> is global and persistent then there is no reason this datalink service
>> could not also be registered as a service in its own right, separate
>> from any shortcut links from related DAL query responses, but this need
>> not be required to get datalinks for an existing DAL service query
>> response.
> OK I agree.
>>
>>>        -- One of our use cases is "related access to another DAL service". Another one  "Internal access  to complex datasets
>>> (archives, MEFS ....)....
>>> The first use case can be some "AccessData" method of the DAL service performing some dynamic transformation on the dataset.
>>> The actual transformation is
>>> driven by the parameters values of the method. Probably "AccessData" URL of the service, without any parameter, could answer
>>> with a VOTABLE describing available parameters for the dataset...
>>>            For the internal access to complex datasets, the May 2012 DataLink draft proposed a little model for internal
>>> structure "mappable" on response VOTABLE FIELDs
>>> This has been criticized as an "ad hoc" solution for our MEF or tar archives examples...
>>>            Another solution could be "extended URI" (Norman Gray proposal). But the rightmost part of the URI is not
>>> interpretable anyway...
>>>            A new proposal (Laurent Michel) could be that the link to the complex dataset  provides a list of parameters
>>> allowing to extract the various subparts of the dataset.
>>> Something rather similar to the "AccessData" link behavior, eventually
>>
>> I think DataLink is a quite different mechanism from AccessData.
>>
> I agree. Probably the misunderstanding here comes from my bad english. When I write "The first use case can be some "AccessData"
> method of the DAL service" I mean that the acref in the link can be the URL root for a DAL service/accessData method.
> Suppose we have an image description in an Obstap query response. Datalink can provide a link to either SIA URL with query
> method for more metadata or SIA URL /accessdata method for cutouts or resampling
>> DataLink is a semantic Web type of capability, providing semantically
>> complex links (more complex than just URLs) which can be followed given
>> a starting object.  Although data links have more complex semantics I
>> think the basic mechanism should be kept fairly simple - given the right
>> software one might be able to just "click" on a link and something
>> happens.  Or at least one gets a list of associated data objects or
>> other resources, with more semantic detail than we get from just a Web
>> resource.
>>
>> AccessData is very different.  This provides precise, client-driven,
>> quantitative access to a given dataset.  So for example the client can
>> say give me this exact cutout, slice, or other subset of the dataset,
>> expressed in pixel (in the case of an image) or WCS coordinates.  This
>> is very different than a datalink as datalinks are predefined by the
>> data provider (here is a list of the standard datalinks we define)
>> whereas accessData provides the client with direct, sample/pixel level
>> access to a given dataset, to allow quantitative analysis without having
>> to download the full dataset.  The one place where they can come
>> together is where a datalink URL can invoke a specific accessData
>> operation on a dataset, e.g.  to provide some standard view of the
>> dataset.
> The only point is : do we force the link with static , predefined values of the parameters or do we let the AccessData method
> open for a given dataset...
>
> This can be done by AccessData answering  by a VOTABLE describing available  transformation parameters for the considered dataset


==== Parameterised Query =======================
A link can point to a simple data file (e.g. a preview). It can also point to a specific subset of a data file. These two cases
a priori require no parameters. But there are many other situations where link access must be parameterised. That is
especially true for services running post-processing (data tile extraction, filtering, (re)calibration, ...) or invoking DAL
services.

==== No standardisation of query parameters ==================
I don't think we can define global semantics or a generic grammar covering all present and future use cases. This task would
probably lead to something very complex and necessarily too restrictive.

==== Processing links individually ==================
Links contained in one DataLink response do not necessarily have the same semantics. In other words, the client must consider and
process them individually (one link = one response row).
A fairly simple way to sort this out would be for each link of a DataLink response to support a free definition of
its own query parameters.
An obvious solution would be to require the link service (pointed to by acref) to return a parameter description when parameters
are missing; but that doesn't work for services that accept queries both with and without parameters.
I think it would be safer to indicate, in some field of the link description, that parameters are required. In that case, a
specific query parameter, mandatory for all link services (similar to --help), could be used to get the description of all
other parameters.
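
To make the idea concrete, here is a minimal client-side sketch in Python. Everything named in it is an assumption for
illustration only: the reserved parameter name HELP, the "parameterised" key in the link row and the example.org URL
are not part of any draft.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical name for the reserved "describe yourself" parameter.
# The proposal only says it would be similar to --help.
HELP_PARAM = "HELP"


def description_url(acref: str) -> str:
    """Return the URL asking a link service to describe its parameters."""
    parts = urlsplit(acref)
    # Keep any query parameters already present in acref.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((HELP_PARAM, "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def first_request(link: dict) -> str:
    """Decide what the client fetches first for one row of a DataLink response.

    `link` is a hypothetical dict with an "acref" key and an optional
    boolean "parameterised" key (the flag suggested above).
    """
    if link.get("parameterised"):
        return description_url(link["acref"])
    # Plain link: directly "clickable", no setup needed.
    return link["acref"]
```

A link without the flag stays directly clickable; only flagged links trigger the extra round trip to get the
parameter description.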

==== Parameter description ============================
Each parameter can be defined by a name, a textual description, a range of possible values and a unit.
The range of values depends on the dataset the link starts from (identified by the PublisherDID). It can be a pixel range, a bandwidth, an
enumeration of FITS extension names or anything else. The client can use this information to reject inconsistent queries.
Another open question is the set of operators that can be used. I suggest that a basic list (!=, =, >, <) is enough to do the
job (it looks like a PQL use case).
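
As a rough illustration of how a client could use the advertised range to reject a query before sending it, here is a
small Python sketch. The "NAME op VALUE" constraint syntax and the function names are my own assumptions; only the four
operators come from the suggestion above.

```python
import re

# The four operators suggested above; "!=" must be tried before "=".
OPERATORS = ("!=", "=", ">", "<")

_CONSTRAINT = re.compile(
    r"^\s*(\w+)\s*(" + "|".join(re.escape(op) for op in OPERATORS) + r")\s*([^\s=<>!]+)\s*$"
)


def parse_constraint(text):
    """Split 'NAME op VALUE' into (name, op, value); raise ValueError otherwise."""
    match = _CONSTRAINT.match(text)
    if match is None:
        raise ValueError(f"not a valid constraint: {text!r}")
    return match.groups()


def is_consistent(text, allowed):
    """Check the constraint value against the advertised range or enumeration.

    `allowed` is either a (low, high) tuple of numbers (a range) or a
    set of strings (an enumeration, e.g. FITS extension names).
    """
    _name, _op, value = parse_constraint(text)
    if isinstance(allowed, tuple):
        low, high = allowed
        try:
            return low <= float(value) <= high
        except ValueError:
            # Non-numeric value given for a numeric range.
            return False
    return value in allowed
```

For example, is_consistent("BAND<5", (1.0, 2.0)) is False because 5 lies outside the advertised range, so the client
would not send that query.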

===== Summary of the Link definition (just a non-exhaustive example) ===================
* publisherDID
* acRef
* ResponseType: type of the response (DAL, resampling, preview, tile extraction, misc, ...) followed by a flag indicating that
acRef can be parameterised. The vocabulary of the response types is TBD.
...

===== Summary of the Parameter description (not exhaustive either) =====================================
* name
* uType (?)
* description
* range : range or enumeration
* unit

Regards
LM

>
> Regards
> François
>>
>>     - Doug
>>
>>
>>> Your comments are welcome. We have to modify the May draft to go forward at the next interop.
>>>
>>> Best regards
>>> François
>>>
>

-- 

---- Laurent MICHEL              Tel  (33 0) 3 68 85 24 37
      Observatoire de Strasbourg  Fax  (33 0) 3 68 85 24 32
      11 Rue de l'Universite      Mail laurent.michel at astro.unistra.fr
      67000 Strasbourg (France)   Web  http://astro.u-strasbg.fr/~michel
---