WD-DataLink-1.0

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Tue Dec 10 11:08:38 PST 2013


Lots of responses below:

On 12/10/2013 05:21 AM, Markus Demleitner wrote:
> Is Datalink a stand-alone service?
> ----------------------------------

The intent is that we are defining a single capability, so it is OK to 
have a {links} resource as part of some other service. The language 
about resources is probably too explicit about what you need to have 
in the service, without clarifying that the VOSI resources could 
describe an (arbitrary) collection of capabilities. [to be clarified]

>
> RESPONSEFORMAT?
> ---------------

DALI describes how to define compliant capabilities. All DALI 
capabilities support this param and need to specify the default format 
(which is required). Having the param here with VOTable as the default 
means any client can use any DataLink capability, but implementers can 
always have their own custom output formats (e.g. JSON) to support their 
own use cases (e.g. their own portal).
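
To make the default-plus-custom-formats idea concrete, here is a minimal 
service-side sketch. It is only an illustration: the function name, the 
format keys, and the MIME strings are assumptions, not text from the 
draft.

```python
# Illustrative format table; "json" stands in for an implementer's custom format.
FORMATS = {
    "votable": "application/x-votable+xml",
    "json": "application/json",
}
DEFAULT_FORMAT = "votable"  # the default every client can rely on


def choose_format(params):
    """Return the MIME type to use for a request's parameters."""
    # DALI parameter names are case-insensitive, so normalise the keys.
    normalised = {name.upper(): value for name, value in params.items()}
    requested = normalised.get("RESPONSEFORMAT", DEFAULT_FORMAT)
    try:
        return FORMATS[requested.lower()]
    except KeyError:
        raise ValueError(f"unsupported RESPONSEFORMAT: {requested}")
```

A client that never sends RESPONSEFORMAT gets VOTable; a portal that 
knows about the custom format can ask for it explicitly.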

>
> Case issues
> -----------
> id, access_url, error_message, service_type, semantics, content_type,
> content_length
>
> as column names (where, of course, I'd on principle prefer if concepts
> that exist in obscore had the same name in both obscore and datalink).

That style is just from my habitual ORM thinking and we could change 
this. [will call for vote on naming style for FIELDs]

>
> 2.4 Capabilities
> ----------------
...
> I've taken the liberty of changing the standardID; IMHO these should
> resolve to actual standard resource records, and thus any URI referring
> to a fragment is out with current VOResource.

The standardID should be the ivorn used to register the standard, and it 
should be allowed to include a datalink capability in another service. 
[will change]


> Oh, and the use="base" vs. use="full" -- I've always understood this as:
> on GET-based services with parameters, we have use="base".  I'm open
> for enlightenment, though.

I thought use="full" was used when both GET and POST can be used to send 
the request params, which is specifically required by a DALI-sync 
capability. So, I think this should be use="full"... also open for 
enlightenment :)
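
For what it's worth, "both GET and POST" here just means the same 
parameters can travel either way. A rough client-side sketch, where the 
endpoint, the ID, and the REQUEST value are all made-up placeholders:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameter values, for illustration only.
BASE_URL = "http://example.org/datalink/links"
params = {"REQUEST": "getLinks", "ID": "ivo://example.org/data?obs1"}

encoded = urlencode(params)

# GET: parameters go in the query string.
get_request = Request(f"{BASE_URL}?{encoded}")

# POST: the same encoded parameters go in the request body.
post_request = Request(
    BASE_URL,
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
```

Neither request is sent here; passing either object to 
urllib.request.urlopen() would do that.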

>
>
>
> 3.2 Service Resources
> ---------------------
...
> Meanwhile, we have to bring together a PARAM (presumably) with a field
> reference.  I claim rather than just using a naked GROUP it's much more
> straightforward to (ab-) use the LINK child of param.  This would mean
> striking the text between "To call the service, the inner..." and "...in
> the result table" and replacing it with something like:
>
>    To determine which column in the result table the values for the ID
>    parameters comes from, clients evaluate the xpath
>    GROUP[@name="inputParams"]/PARAM[@name="ID"]/LINK[@content-role="ddl:id-source"]/@value.
>    This contains a fragment identifier (including the hash, which means
>    it is a valid relative URI) for the FIELD element describing
>    the corresponding column in the primary result table.

I don't see how this is simpler than a GROUP with a PARAM and a FIELDref. 
And for use with a free service, it would be a PARAM without a (GROUPed) 
FIELDref, but with other metadata (TBD).
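
As a sketch of what the client side of the GROUP/PARAM/FIELDref approach 
could look like: the VOTable fragment below is a simplified, hypothetical 
layout (no namespace, invented IDs and column names), not the structure 
from the draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified VOTable: a results table plus a service resource.
VOTABLE = """
<VOTABLE>
  <RESOURCE type="results">
    <TABLE>
      <FIELD ID="pubDID" name="pub_did" datatype="char" arraysize="*"/>
      <FIELD ID="title" name="title" datatype="char" arraysize="*"/>
    </TABLE>
  </RESOURCE>
  <RESOURCE type="service">
    <GROUP name="inputParams">
      <PARAM name="ID" datatype="char" arraysize="*" value=""/>
      <FIELDref ref="pubDID"/>
    </GROUP>
  </RESOURCE>
</VOTABLE>
"""


def id_source_column(root):
    """Return the name of the result column that feeds the ID parameter."""
    fieldref = root.find(".//RESOURCE[@type='service']/GROUP/FIELDref")
    if fieldref is None:
        return None  # a "free" service: no column is tied to the PARAM
    target = fieldref.get("ref")
    # Follow the reference to the FIELD with the matching XML ID.
    for field in root.iter("FIELD"):
        if field.get("ID") == target:
            return field.get("name")
    return None


root = ET.fromstring(VOTABLE)
column = id_source_column(root)
```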

> Note that, again, once we have a proper modelling language in place,
> accepted, and supported by libraries, this kind of ad-hoc hack won't be
> necessary any more, so I'm not claiming that this is some sort of
> precedent.

We discussed this very point and it was generally agreed that 
DataLink-1.0 needed to be completed well before we could feasibly use a 
non-ad-hoc solution. We could deprecate the ad-hoc solution when it can 
be replaced by something better.

> The example resource above could then be:
>
>    <RESOURCE type="datalinkService">

Just a nitpick: the type="service" was not supposed to be an example; 
that was really defining a "type of resource" just like type="results" 
so it is easy for clients to know if they want to look inside (for the 
standardID).

>      <GROUP name="inputParams">
>        <PARAM arraysize="*" datatype="char"
>          name="ID" ucd="meta.id;meta.main" value="">
>          <LINK content-role="ddl:id-source" value="#ssa_pubDID"/>
>        </PARAM>
>      </GROUP>
>      <PARAM arraysize="*" datatype="char"
>        name="standardId"
>        value="ivo://ivoa.net/std/DataLink#links"/>
>      <PARAM arraysize="*" datatype="char"
>        name="accessURL"
>        value="http://localhost:8080/data/ssatest/c/dlmeta"/>
>    </RESOURCE>
>
> [Incidentally: If anyone feels these things should be GROUPs rather than
> RESOURCEs, you'd have my vote, but I don't think it matters much at this
> point]

In an earlier prototype shown in May 2013 we did use GROUPs in the 
results resource, but separate resources was generally agreed to be more 
tidy and flexible (and easier to ignore when you care only about the 
content).
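
For comparison, the PARAM/LINK lookup from the quoted example resource 
could be evaluated roughly like this. The XML is trimmed from the quoted 
example; the function name is made up, and the LINK test is done in plain 
Python rather than as a full XPath predicate.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example resource quoted above.
RESOURCE = """
<RESOURCE type="datalinkService">
  <GROUP name="inputParams">
    <PARAM arraysize="*" datatype="char"
      name="ID" ucd="meta.id;meta.main" value="">
      <LINK content-role="ddl:id-source" value="#ssa_pubDID"/>
    </PARAM>
  </GROUP>
</RESOURCE>
"""


def id_source_fragment(resource):
    """Return the fragment identifier naming the FIELD that supplies ID."""
    param = resource.find("GROUP[@name='inputParams']/PARAM[@name='ID']")
    if param is None:
        return None
    for link in param.findall("LINK"):
        if link.get("content-role") == "ddl:id-source":
            return link.get("value")
    return None


resource = ET.fromstring(RESOURCE)
fragment = id_source_fragment(resource)  # includes the leading hash
field_id = fragment.lstrip("#")          # XML ID of the FIELD to look up
```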

> UCDs
> ----
>
> I'd propose the following UCDs for the columns:
>
>                  ID               meta.id;meta.main
>                  accessURL        meta.ref.url
>                  serviceType      meta.code
>                  errorMessage     meta.code.error
>                  description      meta.note
>                  semantics        meta.code
>                  contentType      meta.code.mime
>                  contentLength    phys.size;meta.file
>
> -- where I'd say we should really register new UCDs for accessURL ("the
> URL a dataset can be retrieved at", meta.ref.accessURL, say), semantics
> ("a relationship between a dataset and a web resource",
> meta.ref.relationType), and description ("a human-readable elaboration
> on the nature of something", meta.description).

Requesting some new UCDs would probably be a good idea. We can do this 
once we have a WD with recommended UCDs (some of which are non-std). 
[will add UCDs]


> contentLength
> -------------
>
> I think the Description on 4.8 should more be something like
>
>    The contentLength column contains an estimate of the amount of data
>    that will be returned on retrieval of accessURL.  An order-of-magnitude
>    figure here is better than nothing, as it probably will not matter to a
>    user very much whether they will be retrieving 40000 or 50000 Bytes.
>    It probably will matter whether they will be retrieving 40 kB or 40 GB.
>
>    contentLength is given in Bytes.  This must be reflected in the
>    column metadata of the metadata response.

Does changing this to be an estimate, together with your request above to 
reconcile names with ObsCore, also imply you would want to use the ad-hoc 
name from ObsCore (access_estsize, iirc)?


>
> Abstract needs a bit more meat
> ------------------------------
>
> Here's a suggestion for a somewhat enhanced abstract:
>
>    Datalink is an IVOA defined protocol intended to allow access to
>    artifacts connected to a dataset -- e.g., pieces of complex datasets,
>    cutouts, processed and ancillary data, pieces of a dataset's
>    provenance, renderings and previews -- behind just a single URL.  It
>    thus works as an intermediate data access service that connects
>    discovered datasets on the one hand and downloadable resources,
>    services that can act upon the data files, and links to related
>    resources on the other.  It is intended to be used in connection with
>    IVOA data discovery services like Obscore/TAP, SIAP, or SSAP.
>
>
> Suggestions for clarification
> -----------------------------
>
> I'd appreciate some language on what a service should do without
> REQUEST.  Since the parameter is kinda superfluous in datalink, it's
> tempting to just work without it, but of course that's a liability as it
> may hide client bugs.

As with RESPONSEFORMAT, the REQUEST parameter is there to facilitate 
implementers having alternate functions that can be triggered, so it is 
like a reserved param they can use. It also lets the IVOA add new values 
later without breaking anything. For example, in TAP we have 
REQUEST=doQuery but I have considered implementing REQUEST=explainQuery 
to do, well, exactly what it says.


DALI says REQUEST is required in all requests so the description of each 
capability specifies the value(s). Currently we define one REQUEST value 
per capability (usually). REQUEST is kind of RPC-ish, while different 
accessURL for different capabilities is RESTful. Requiring REQUEST means 
that the RESTful-thinking folks have one boilerplate param to include 
while the RPC-thinking folks can also comply to the spec. This is only 
possible and transparent if REQUEST is required, which is why DALI is 
the way it is.
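
A small sketch of how a required REQUEST lets one endpoint grow new 
functions without breaking clients. The handler names and return strings 
are invented; doQuery is the TAP value mentioned above and explainQuery is 
only the hypothetical extension.

```python
def do_query(params):
    """Stand-in for the standard function."""
    return f"running query: {params.get('QUERY', '')}"


def explain_query(params):
    """Stand-in for a hypothetical implementer extension."""
    return f"explaining query: {params.get('QUERY', '')}"


# REQUEST values mapped to the functions they trigger.
HANDLERS = {
    "doQuery": do_query,
    "explainQuery": explain_query,
}


def dispatch(params):
    """Route a request by its REQUEST value, rejecting missing or unknown ones."""
    request = params.get("REQUEST")
    if request is None:
        # Failing loudly here is what exposes client bugs early.
        raise ValueError("missing required parameter: REQUEST")
    handler = HANDLERS.get(request)
    if handler is None:
        raise ValueError(f"unsupported REQUEST value: {request}")
    return handler(params)
```

Adding a new value later is one more entry in the table; existing clients 
that send the original value are unaffected.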

>
> Then again, if we agree this is not a full DALI service, maybe we can do
> away with REQUEST altogether?  IMHO that'd be a step forward (not only
> in Datalink:-).

Strongly opposed.

> Typos
> -----
>
> Sect 1.2.3, "may be of the some" -> "...same"
>
> Sect 1.2.5, "custom Uri" -> "...URI"
> No FIELDRef in a convenient location, hence PARAM/LINK for pointer to
> pubDID field.
>
> Sect 1.2.6, "response (e.g., recursive" -> "... (i.e., ..."
>
> Sect 4, "size of download" -- I'd rather have "size of resource" here.


Thanks for the work implementing and feeding back your thoughts.

-- 

Patrick Dowler
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9A 2L9

250-363-0044 (office) 250-363-0045 (fax)

