WD-DataLink-1.0

Laurent Michel laurent.michel at astro.unistra.fr
Thu Dec 12 01:05:24 PST 2013



On 10/12/2013 14:21, Markus Demleitner wrote:
> Dear DAL list,
>
> On Fri, Oct 25, 2013 at 10:40:52AM -0700, Patrick Dowler wrote:
>>
>> The first official and more or less complete WD for DataLink is now
>> available in the document repository (in the Documents in Progress
>> section).
>>
>> Direct link is here:
>>
>> http://www.ivoa.net/documents/DataLink/index.html
>
> I've now updated my prototype datalink service to something "pretty
> close" to the WD (for some features, I chose to implement the changes
> I'm suggesting below).
>
> This is a longish mail, but quite a bit of it is essentially editorial.
> I've tried to put what I think might be contentious near the top.
>
> There's also the bigger issue that I'm not entirely happy with the "free
> service" part.  This requires a fairly large chunk of text that I'm
> still preparing.
No doubt this gap will start a lot of discussion.
This aspect should nonetheless be clearly mentioned in the introduction and developed in the body of the standard.
Another important point that should be made in the introduction is that Datalink can connect non-VO services to the VO
world.

>
>
>
> Is Datalink a stand-alone service?
> ----------------------------------
>
> In Sect. 2, datalink is designed as a full, registrable, DALI-compliant
> service.  The implementation as an extra service is fairly natural, and
> so that's what I've done, too, but I don't think the standard should
> mandate this.  In principle, it should be possible to have datalink as a
> capability of another service (e.g., the ObsTAP one).
>
> My arguments:
>
> (1) I expect datalink services to be fairly closely bound to concrete
> data collections and hence services, as in general you'll need to know
> quite a bit about your data.
>
> (2) Having a separate registry entry for a datalink service clutters the
> registry with services that need no discovery, at least not as long as
> you cannot discover what IDs a service will have data for.
>
> (3) VOSI availability and examples can be re-used from the embedding
> service with no loss of functionality.
>
> This would mean striking the entire text between "2 Resources" and the
> 2.1 headline, and probably renaming the section "2 The Datalink
> Endpoint" (or "capability", if you prefer).  There would be some
> editorial changes further down (I'm making suggestions below).
>
There are cases (at least one) where a datalink service is invoked outside of any service response.
I have in mind the implementation of a datalink attached to HiPS tiles in Aladin; FB will say more about that use case.
Here, the client needs a convenient way to discover which ID must be used. I think this must be described in the
registry, following one aspect of the DL document.
>
> RESPONSEFORMAT?
> ---------------
>
> Is there a use case for that?  This appears to me an overgeneralization,
> and it's a liability if we more or less require certain metadata to be
> transferred; this, in particular, concerns STC metadata, but even
> trivialities like the unit of contentLength, for which there's language
> in the draft just to support RESPONSEFORMAT.  It also seems highly
> doubtful that service metadata could usefully be transferred in formats
> other than VOTable.
>
> The use case "support naive javascript clients" is, I would argue,
> already satisfied by requiring TABLEDATA serialization for the
> VOTable response.
I am really convinced that forcing JS clients (and others) to deal with VOTable serialization is too restrictive.
Why should we impose one format where simpler things can be more efficient (and more in line with current trends)?
The specification must allow services to offer different output formats, with the understanding that VOTable remains compulsory.
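To make the trade-off concrete, here is a minimal Python sketch of what a client has to do with a TABLEDATA-serialized datalink response, using only the standard library. The sample document, its values (including the semantics terms) and the helper names are hypothetical. The column names are the WD's camelCase ones. The last line re-serializes the same rows as JSON, the kind of simpler format a JavaScript client might prefer.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical minimal datalink response; all values are made up.
SAMPLE = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.2">
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="ID" datatype="char" arraysize="*"/>
      <FIELD name="accessURL" datatype="char" arraysize="*"/>
      <FIELD name="semantics" datatype="char" arraysize="*"/>
      <FIELD name="contentType" datatype="char" arraysize="*"/>
      <DATA><TABLEDATA>
        <TR><TD>ivo://example.org/data?obs1</TD>
            <TD>http://example.org/files/obs1.fits</TD>
            <TD>#this</TD><TD>application/fits</TD></TR>
        <TR><TD>ivo://example.org/data?obs1</TD>
            <TD>http://example.org/previews/obs1.png</TD>
            <TD>#preview</TD><TD>image/png</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_tabledata(text):
    """Return the rows of the first TABLE as a list of dicts."""
    root = ET.fromstring(text)
    table = next(el for el in root.iter() if local(el.tag) == "TABLE")
    names = [f.get("name") for f in table.iter() if local(f.tag) == "FIELD"]
    rows = []
    for tr in (el for el in table.iter() if local(el.tag) == "TR"):
        # Direct TD children of the row, in column order.
        cells = [(td.text or "") for td in tr if local(td.tag) == "TD"]
        rows.append(dict(zip(names, cells)))
    return rows


rows = parse_tabledata(SAMPLE)
print(json.dumps(rows, indent=2))  # the same links as a JSON array
```

The VOTable side is about twenty lines with a stock XML parser. The harder part, which this sketch ignores, is carrying FIELD metadata (units, UCDs) into any alternative format.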

>
> If we're going to go forward with this, we'll have to severely limit
> what we can express in datalink responses, or we'll have to accept
> dramatically different semantics in differing output formats.
>
> Killing RESPONSEFORMAT would also do away with 5.1.2, which is good, as
> optional features are the curse of interoperability...
>
>
> Case issues
> -----------
>
> I'm in favour of explicitly saying that at least the ID parameter is
> *not* case-insensitive.
>
> Then, the column names in the table described in "4 List of Links" are
> camelCase.  I'm not a big fan of that when we're talking about names
> that might end up in an actual SQL-based database; granted, that's not
> what we recommend now, but I can totally see exposing a database table
> of "pre-rendered" datalinks via TAP.
>
> When that happens, we don't want mixed case in there.  The reason is
> that SQL becomes really mystifying when you have delimited identifiers
> in mixed case, and in particular the MySQL crowd has a tendency to
> over-delimit.  What happens then is that
>
> select accessURL
>
> will fail, as will
>
> select accessUrl,
>
> select accessurl
>
> and everything else except
>
> select "accessURL".
>
> It's easy to mitigate this kind of issue by just having all-lower case
> identifiers and separate words with underscores.  So, I'd like to have
>
> id, access_url, error_message, service_type, semantics, content_type,
> content_length
>
> as column names (where, of course, I'd on principle prefer if concepts
> that exist in obscore had the same name in both obscore and datalink).
>
Great; in an ideal world, no identifier in an SQL table description would be case-sensitive.
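The failure mode described above can be modelled in a few lines. This is a sketch that assumes PostgreSQL-style rules: unquoted identifiers are folded to lower case, and delimited ("quoted") identifiers keep their exact case. The standard SQL rule folds to upper case instead, and other engines differ. The function names are hypothetical.

```python
def fold(identifier: str) -> str:
    """Normalize an SQL identifier, PostgreSQL-style (an assumption):
    delimited identifiers keep their exact case, regular identifiers
    are folded to lower case."""
    if len(identifier) >= 2 and identifier[0] == identifier[-1] == '"':
        return identifier[1:-1].replace('""', '"')
    return identifier.lower()


def matches(query_ident: str, column_ident: str) -> bool:
    """True if an identifier used in a query resolves to the column."""
    return fold(query_ident) == fold(column_ident)


queries = ["accessURL", "accessUrl", "accessurl", '"accessURL"']

# Column created with a delimited mixed-case name: only the exact
# delimited spelling finds it.
print([q for q in queries if matches(q, '"accessURL"')])

# Column created as plain access_url: case no longer matters.
print([q for q in ["access_url", "ACCESS_URL", "Access_Url"]
       if matches(q, "access_url")])
```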

>
>
> 2.4 Capabilities
> ----------------
>
> If we agree on datalink being an auxiliary endpoint rather than a
> full-fledged DALI service, then this section would become:
>
>    A service with one or more Datalink endpoint(s) SHOULD declare them
>    in its VOSI capabilities resource as well as its registry record.  The
>    capability is a standard VOResource capability (i.e., there is no
>    dedicated Datalink registry extension) with a standard id of
>
>      ivo://ivoa.net/std/DataLink/v1.0
>
>    The capability MUST have at least one interface of the type
>    vs:ParamHTTP, where vs corresponds to the namespace
>    http://www.ivoa.net/xml/VODataService/v1.1 or any earlier or later
>    namespace URI of VODataService version 1.x.  As usual in Registry
>    documents, the recommended namespace prefixes (in this case, vs) SHOULD
>    be used if at all possible.
>
>    Here is an example for such a capability [or rather have that in
>    an appendix with a clear indication that this is not normative?]:
>
>    <capability
>      xmlns:vs="http://www.ivoa.net/xml/VODataService/v1.1"
>      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>      standardID="ivo://ivoa.net/std/DataLink/v1.0">
>      <interface xsi:type="vs:ParamHTTP">
>        <accessURL use="base"
>          >http://example.com/datalink</accessURL>
>        <queryType>GET</queryType>
>        <resultType>application/x-votable+xml;content=datalink</resultType>
>        <param std="true">
>          <name>ID</name>
>          <description>The publisher DID of the dataset of interest</description>
>          <ucd>meta.id;meta.main</ucd>
>          <dataType>string</dataType>
>        </param>
>      </interface>
>    </capability>
>
>
>    Multiple capability elements with the Datalink standard identifier may
>    be included in a capabilities element; this is typically used if they
>    differ in protocol (http vs. https) and/or authentication requirements.
>
> I've taken the liberty of changing the standardID; IMHO these should
> resolve to actual standard resource records, and thus any URI referring
> to a fragment is out with current VOResource.
>
> Note that that would still allow people to register standalone datalink
> services if that's what they want.
>
> Now, if we want to allow discovery queries on these (i.e., query all
> known datalink services to see which has a dataset), we should
> explicitly say so and urge people to register (DaCHS, for example, does
> not by default create a public capability for a datalink endpoint unless
> the user orders it; this is for consistency, code simplicity, and to
> reduce registry clutter.  If global discovery is what we want, I'd
> change this policy).
>
> Oh, and the use="base" vs. use="full" -- I've always understood this as:
> on GET-based services with parameters, we have use="base".  I'm open
> for enlightenment, though.
This depends on the point above concerning section 2.
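Under the use="base" reading above, a client takes the accessURL of a ParamHTTP interface and appends its own query string. The following Python sketch does that for the proposed capability. It compares only the local part of xsi:type, which glosses over proper QName prefix resolution, and the dataset ID is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
DATALINK_ID = "ivo://ivoa.net/std/DataLink/v1.0"  # as proposed above

# The capability proposed above, trimmed to what the sketch needs.
CAPABILITY = """<capability
  xmlns:vs="http://www.ivoa.net/xml/VODataService/v1.1"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  standardID="ivo://ivoa.net/std/DataLink/v1.0">
  <interface xsi:type="vs:ParamHTTP">
    <accessURL use="base">http://example.com/datalink</accessURL>
    <queryType>GET</queryType>
  </interface>
</capability>"""


def datalink_url(capability_xml: str, dataset_id: str) -> str:
    """Build a GET request URL for one dataset from a Datalink capability."""
    cap = ET.fromstring(capability_xml)
    if cap.get("standardID") != DATALINK_ID:
        raise ValueError("not a Datalink capability")
    for intf in cap.findall("interface"):
        # xsi:type is a QName string; only its local part is checked here.
        if intf.get(XSI_TYPE, "").split(":")[-1] != "ParamHTTP":
            continue
        url_el = intf.find("accessURL")
        if url_el is not None and url_el.get("use") == "base":
            # use="base": the client appends the query string itself.
            return url_el.text.strip() + "?" + urlencode({"ID": dataset_id})
    raise ValueError("no ParamHTTP interface with a base accessURL")


print(datalink_url(CAPABILITY, "ivo://example.org/data?obs1"))
```

With use="full", by contrast, the URL would presumably be used as-is with no parameters appended, which does not fit a service that needs an ID.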
>
>
>
> 3.2 Service Resources
> ---------------------
>
> I'll say a bit more about those in the upcoming data service proposal,
> but even if that is rejected, the proposed method of communicating which
> column contains the IDs... ahem... has potential for beautification.
> The ideal solution here would be a FIELDref with a utype that tells a
> client it's the ID source.
>
> I don't want to clobber that, as I still have hopes we'll have proper
> VO-DML accompanying a future version of Datalink, which will offer a
> clean way of expressing this without having to do a lot of specification
> work.
Let's remain cautious about a mechanism that claims to resolve so many issues.

>
> Meanwhile, we have to bring together a PARAM (presumably) with a field
> reference.  I claim rather than just using a naked GROUP it's much more
> straightforward to (ab-) use the LINK child of param.  This would mean
> striking the text between "To call the service, the inner..." and "...in
> the result table" and replacing it with something like:
I agree with this mechanism.
>
>    To determine which column in the result table the values for the ID
>    parameter comes from, clients evaluate the xpath
>    GROUP[@name="inputParams"]/PARAM[@name="ID"]/LINK[@content-role="ddl:id-source"]/@value.
>    This contains a fragment identifier (including the hash, which means
>    it is a valid relative URI) for the FIELD element describing
>    the corresponding column in the primary result table.
>
> Note that, again, once we have a proper modelling language in place,
> accepted, and supported by libraries, this kind of ad-hoc hack won't be
> necessary any more, so I'm not claiming that this is some sort of
> precedent.
>
> The example resource above could then be:
>
>    <RESOURCE type="datalinkService">
>      <GROUP name="inputParams">
>        <PARAM arraysize="*" datatype="char"
>          name="ID" ucd="meta.id;meta.main" value="">
>          <LINK content-role="ddl:id-source" value="#ssa_pubDID"/>
>        </PARAM>
>      </GROUP>
>      <PARAM arraysize="*" datatype="char"
>        name="standardId"
>        value="ivo://ivoa.net/std/DataLink#links"/>
>      <PARAM arraysize="*" datatype="char"
>        name="accessURL"
>        value="http://localhost:8080/data/ssatest/c/dlmeta"/>
>    </RESOURCE>
>
> [Incidentally: If anyone feels these things should be GROUPs rather than
> RESOURCEs, you'd have my vote, but I don't think it matters much at this
> point]
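The proposed XPath lookup can be checked with Python's ElementTree, whose limited XPath subset supports the attribute predicates used here. In this sketch the VOTable is namespace-free for brevity (a real response would carry the VOTable namespace), and the results table and second FIELD are hypothetical filler around the resource quoted above.

```python
import xml.etree.ElementTree as ET

# Namespace-free stand-in for a result VOTable carrying the proposed
# service resource; the results TABLE contents are hypothetical.
DOC = """<VOTABLE>
  <RESOURCE type="results">
    <TABLE>
      <FIELD ID="ssa_pubDID" name="ssa_pubDID" datatype="char" arraysize="*"/>
      <FIELD ID="ssa_dstitle" name="ssa_dstitle" datatype="char" arraysize="*"/>
    </TABLE>
  </RESOURCE>
  <RESOURCE type="datalinkService">
    <GROUP name="inputParams">
      <PARAM arraysize="*" datatype="char" name="ID"
        ucd="meta.id;meta.main" value="">
        <LINK content-role="ddl:id-source" value="#ssa_pubDID"/>
      </PARAM>
    </GROUP>
  </RESOURCE>
</VOTABLE>"""

# The XPath from the proposal, relative to the service RESOURCE.
ID_SOURCE = ('GROUP[@name="inputParams"]/PARAM[@name="ID"]'
             '/LINK[@content-role="ddl:id-source"]')


def id_source_field(votable_xml: str):
    """Return the FIELD element whose values feed the ID parameter."""
    root = ET.fromstring(votable_xml)
    for res in root.iter("RESOURCE"):
        if res.get("type") != "datalinkService":
            continue
        link = res.find(ID_SOURCE)
        if link is None:
            continue
        ref = link.get("value", "")
        if not ref.startswith("#"):
            raise ValueError("id-source must be a fragment identifier")
        # Resolve the fragment against FIELD/@ID anywhere in the document.
        for field in root.iter("FIELD"):
            if field.get("ID") == ref[1:]:
                return field
    return None


print(id_source_field(DOC).get("name"))
```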
>
>
> UCDs
> ----
>
> I'd propose the following UCDs for the columns:
>
> 		ID               meta.id;meta.main
> 		accessURL        meta.ref.url
> 		serviceType      meta.code
> 		errorMessage     meta.code.error
> 		description      meta.note
> 		semantics        meta.code
> 		contentType      meta.code.mime
> 		contentLength    phys.size;meta.file
>
> -- where I'd say we should really register new UCDs for accessURL ("the
> URL a dataset can be retrieved at", meta.ref.accessURL, say), semantics
> ("a relationship between a dataset and a web resource",
> meta.ref.relationType), and description ("a human-readable elaboration
> on the nature of something", meta.description).
>
> I'll suggest the need for several more UCDs in the data service
> proposal, so there'd be no need to open a new UCD process just for
> those.
>
> I believe the UCDs should go into section 4, not section 5.1.1.

How about using a second word, something like "linkeddata.*"?
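For illustration, here is how the UCDs proposed above would appear in the FIELD declarations of a datalink response. This is a Python sketch. The datatypes are my own assumption, as is the VOUnit spelling "byte" for contentLength, which the proposal only says is given in Bytes. The column names are the WD's camelCase ones.

```python
import xml.etree.ElementTree as ET

# (column, UCD as proposed above, datatype [assumed], unit [assumed])
COLUMNS = [
    ("ID", "meta.id;meta.main", "char", None),
    ("accessURL", "meta.ref.url", "char", None),
    ("serviceType", "meta.code", "char", None),
    ("errorMessage", "meta.code.error", "char", None),
    ("description", "meta.note", "char", None),
    ("semantics", "meta.code", "char", None),
    ("contentType", "meta.code.mime", "char", None),
    ("contentLength", "phys.size;meta.file", "long", "byte"),
]


def field_elements():
    """Build one VOTable FIELD element per datalink column."""
    fields = []
    for name, ucd, datatype, unit in COLUMNS:
        attrs = {"name": name, "ucd": ucd, "datatype": datatype}
        if datatype == "char":
            attrs["arraysize"] = "*"  # variable-length strings
        if unit is not None:
            attrs["unit"] = unit
        fields.append(ET.Element("FIELD", attrs))
    return fields


for f in field_elements():
    print(ET.tostring(f, encoding="unicode"))
```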

>
>
> contentLength
> -------------
>
> I think the description in 4.8 should be something more like
>
>    The contentLength column contains an estimate of the amount of data
>    that will be returned on retrieval of accessURL.  An order-of-magnitude
>    figure here is better than nothing, as it probably will not matter to a
>    user very much whether they will be retrieving 40000 or 50000 Bytes.
>    It probably will matter whether they will be retrieving 40 kB or 40 GB.
>
>    contentLength is given in Bytes.  This must be reflected in the
>    column metadata of the metadata response.
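The argument above, that 40000 versus 50000 Bytes hardly matters while 40 kB versus 40 GB does, is about what a client shows the user. The following Python sketch has two hypothetical helpers. One gives the decimal order of magnitude of a contentLength in bytes. The other produces a rough display string with SI prefixes.

```python
def order_of_magnitude(n_bytes: int) -> int:
    """Decimal order of magnitude of a positive byte count."""
    if n_bytes <= 0:
        raise ValueError("expected a positive contentLength")
    return len(str(n_bytes)) - 1  # exact for ints, no float log10


def human_size(n_bytes: int) -> str:
    """Rough human-readable rendering with decimal (SI) prefixes."""
    units = ["B", "kB", "MB", "GB", "TB", "PB"]
    value, i = float(n_bytes), 0
    while value >= 1000 and i < len(units) - 1:
        value /= 1000
        i += 1
    return f"{value:.0f} {units[i]}"


for n in (40_000, 50_000, 40_000_000_000):
    print(n, order_of_magnitude(n), human_size(n))
```

40000 and 50000 bytes land in the same order of magnitude, while 40 GB is six orders higher. Only the latter difference needs to reach the user.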
>
>
> Abstract needs a bit more meat
> ------------------------------
>
> Here's a suggestion for a somewhat enhanced abstract:
>
>    Datalink is an IVOA-defined protocol intended to allow access to
>    artifacts connected to a dataset -- e.g., pieces of complex datasets,
>    cutouts, processed and ancillary data, pieces of a dataset's
>    provenance, renderings and previews -- behind just a single URL.  It
>    thus works as an intermediate data access service that connects
>    discovered datasets on the one hand and downloadable resources,
>    services that can act upon the data files, and links to related
>    resources on the other.  It is intended to be used in connection with
>    IVOA data discovery services like Obscore/TAP, SIAP, or SSAP.

A mention of free services would be welcome here again, as well as of the connection with non-VO services.
>
>
> Suggestions for clarification
> -----------------------------
>
> I'd appreciate some language on what a service should do without
> REQUEST.  Since the parameter is kinda superfluous in datalink, it's
> tempting to just work without it, but of course that's a liability as it
> may hide client bugs.
>
> Then again, if we agree this is not a full DALI service, maybe we can do
> away with REQUEST altogether?  IMHO that'd be a step forward (not only
> in Datalink:-).
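As a starting point for such language, here is a Python sketch of one possible server-side policy. It assumes parameter names are matched case-insensitively (the usual IVOA DAL convention) while values, notably ID, are kept verbatim, in line with the case remarks above. A "strict" flag decides whether a missing REQUEST is an error. The policy and the function names are hypothetical. The value "getLinks" is a placeholder, not quoted from the WD.

```python
from urllib.parse import parse_qs


def normalize_params(query_string: str) -> dict:
    """Match parameter names case-insensitively; keep values untouched."""
    params = {}
    for name, values in parse_qs(query_string, keep_blank_values=True).items():
        params.setdefault(name.upper(), []).extend(values)
    return params


def check_request(params: dict, expected: str, strict: bool = True) -> None:
    """Hypothetical policy: strict services reject a missing or wrong
    REQUEST; lenient ones tolerate a missing one but reject a wrong one."""
    values = params.get("REQUEST")
    if values is None:
        if strict:
            raise ValueError("missing REQUEST parameter")
        return
    if values != [expected]:
        raise ValueError(f"unsupported REQUEST: {values}")


p = normalize_params("request=getLinks&id=ivo://example.org/Data?obs1")
check_request(p, "getLinks")  # placeholder value, passes
print(p["ID"])  # the ID value keeps its case
```

A lenient service hides client bugs of the kind mentioned above. A strict one surfaces them at the cost of rejecting otherwise harmless requests.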
>
>
> Typos
> -----
>
> Sect 1.2.3, "may be of the some" -> "...same"
>
> Sect 1.2.5, "custom Uri" -> "...URI"
> No FIELDRef in a convenient location, hence PARAM/LINK for pointer to
> pubDID field.
>
> Sect 1.2.6, "response (e.g., recursive" -> "... (i.e., ..."
>
> Sect 4, "size of download" -- I'd rather have "size of resource" here.
>
> Cheers,
>
>            Markus
>


Bye
LM
-- 

---- Laurent MICHEL              Tel  (33 0) 3 68 85 24 37
      Observatoire de Strasbourg  Fax  (33 0) 3 68 85 24 32
      11 Rue de l'Universite      Mail laurent.michel at astro.unistra.fr
      67000 Strasbourg (France)   Web  http://astro.u-strasbg.fr/~michel
---

