Datalink document questions
Norman Gray
norman at astro.gla.ac.uk
Mon Apr 28 04:32:59 PDT 2014
Greetings, all.
I've just gone through the whole Datalink document for the first time (I originally started at 3.2.6 and worked outwards). I'm reading http://www.ivoa.net/documents/DataLink/20140228/WD-DataLink-1.0-20140228.html
I've a few comments and suggestions. The following are in document order rather than importance order.
* Header: The document doesn't refer to its own current/previous URL in the header, in the approved fashion.
* (I think this is quite an important point) Introduction, paragraph 2: this talks of "drilling down from a discovered identifier (typically an IVOA publisher dataset identifier) to find details about the data files that can be downloaded..." This leaves me, at least, with an incomplete picture. Say this identifier is ivo://foo/bar or http://example.org/bar (I presume the standard is agnostic about which type of URIs it's servicing),
1. Is ivo://foo/bar the identifier for the dataset, or ...
2. ...the identifier for a bag of metadata about the dataset?
More concretely, which of these is a synonym for the dataset DOI, presuming that one such were registered, so that it could be cited in papers? Or, in other terms, if one were to give the 'author' of ivo://foo/bar would it be referring to the scientist who generated the data, or the datacentre that assembled the {link} information? The same point goes for the (Sect. 3.2) 'description' -- is this describing the dataset or the metadata?
Markus mentioned (in passing) that the 'semantics' bit should mention a 'self' link. What would that point to? (...is yet another version of this question).
* Sect 1.2.2: this appears to be talking about provenance -- should it say so explicitly? If the Datalink and Provenance efforts are smart, Datalink will be able to use the Provenance work with no or minimal extra work.
* Sect 1.2.3, para 1: Content-type. Recall that in HTTP [RFC 2616] the content-type header on the HTTP response is the _overriding_ source of information on the type of a response (this obviously doesn't apply to other schemes such as ivo:). See Sects. 14.17 and 7.2.1 of the RFC. Based on the 'MAY' in the latter section, I think that the text in this Sect 1.2.3 should say something like "...other metadata such as the expected content-type", since if the response comes back with a different content-type that's supposed to be authoritative. The same point applies in a couple of other places in the document.
* Sect 1.2.3, para 1: paragraph ends mid-sentence.
* Sect 1.2.3, para 3: just to raise the point -- are these alternate URLs necessarily distinct? Does the Datalink standard want to mention the possibility of requesting an (HTTP) resource with different 'Accept' request headers, and so getting different formats by that route?
* Sect 1.2.4: It's not completely clear to me how this section is different from the previous one, or indeed from 1.2.5 or 1.2.6, since all are about going from 'this' resource to another one. I can see that there's a distinction in that 1.2.3 is about HTTP (yes?) but 1.2.4--1.2.7 are about other types of lookup services. They seem to be basically describing the same thing, so it's a bit confusing seeing them described so differently. These are all 'links to other resources' -- surely if the way they're linked is so very heterogeneous, that's hinting that there's something wrong here.
* Sect. 1.2.5: the paragraph cuts off mid-sentence.
* Sect 1.2.6: para 1 here seems to be describing something very like PDL. I know that that's intended for simulations, and that one of MarkT's responses to the PDL TCG review was to hope that PDL would be confined to theoretical services. That said, this paragraph appears to be saying so very clearly that there's an analogous need for data services, that it starts to seem perverse not to mention PDL.
* Sect 2: it might be worth a couple of words clarifying what's meant by '{links}'. Reading this section in combination with the DALI standard (which I'm familiar with, though not in much detail), I'm still not sure what sort of URL I'd expect to see. The examples in Sect 2.1 suggest that '/links' is a magic URI fragment; but in that case what are the curly brackets indicating in "{links}"?
It would probably be useful, in this section, to have an example of the sort of exchange that a client would expect to see.
* Sects 2.1.1--2.1.3: This describes the client having to build a DALI-style composite URL with a redundant REQUEST parameter, a RESPONSEFORMAT which appears to be redundant with the HTTP response (since it would be overridden by the Content-Type header in the response), and an ID parameter.
Is there a URI you can simply retrieve the links data from, like <http://example.org/service/[dataset-id]/links>? Or even simply retrieving <http://example.org/service/[dataset-id]> with Accept:application/x-votable+xml;content=datalink -- that's the easiest for both the client and for the specification-writer. I suspect this question risks taking us a little further afield... so ignore it if you want.
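To make the contrast concrete, here is a minimal sketch of the two request styles, using only the Python standard library. It builds the requests without sending them. The service root http://example.org/service, the REQUEST value "getLinks", and the percent-encoding of the identifier into a path segment are all assumptions made for illustration, not something taken from the Datalink draft.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://example.org/service"  # hypothetical service root
DATASET_ID = "ivo://foo/bar"         # example identifier used earlier in this message


def dali_style_request(base, dataset_id):
    """DALI-style request: everything is carried in query parameters."""
    query = urlencode({
        "REQUEST": "getLinks",  # assumed value, for illustration only
        "RESPONSEFORMAT": "application/x-votable+xml",
        "ID": dataset_id,
    })
    return Request(f"{base}/links?{query}")


def negotiated_request(base, dataset_id):
    """Alternative: plain URL, with the format requested via the Accept header."""
    path_id = quote(dataset_id, safe="")  # encode the identifier as one path segment
    return Request(
        f"{base}/{path_id}",
        headers={"Accept": "application/x-votable+xml;content=datalink"},
    )


if __name__ == "__main__":
    dali = dali_style_request(BASE, DATASET_ID)
    nego = negotiated_request(BASE, DATASET_ID)
    print(dali.full_url)
    print(nego.full_url, nego.get_header("Accept"))
```

In the second form the URL carries only the identifier, and the format preference travels in a standard HTTP header, so the server's Content-Type response header remains the single authoritative statement of what was actually returned.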
Sect 2.2: here and elsewhere, there are 'should' and 'must'. Are these RFC 2119 should/must? If so, it might be worth highlighting them as such.
Sect 2.4, second-last para: "...if the differ" -> "...if they differ"
Sect 3.1, para 1: "with an URL" -> "with a URL". Here also, the last sentence might (following the content-type remarks above) be better "...to denote that the response" -> "...to denote that the expected response".
Sect 3.2.1: "The ID contains the input identifier value" -- is this the URL from which the Datalink information was retrieved, or the ID of Sect. 2.1.3?
Sect 3.2.2: Is the access_url cacheable? At one extreme this could be just a URL for an FTP service, or something like that; at another, this could be a staged file with an unpredictable URL that will disappear in some short period. I think it makes good sense both ways, but it might be worth a sentence discussing this.
Sect 3.2.3: "all others should null" -> "all others should be null".
Sect 3.2.4: "This resource is typically describes..." sentence garbled.
Sect 3.2.4, second-last para: "such a on-the-fly..." -> "such as on-the-fly...".
Sect 3.2.7: "they will receive" -> "they should expect to receive" (content-type again)?
Sect 3.2.8: same point -- "the size of the download" -> "the expected size of the download"
Sect 3.3: if this is just the usual stipulations of the HTTP conversation, then it seems redundant to state these here; if in contrast there's something here beyond the usual HTTP stuff, then that might need to be highlighted a bit more.
Regarding the content-length: RFC 2616 Sect 14.13 says that "Applications SHOULD use [the content-length] field to indicate the transfer-length of the message-body" and (14.29) "HTTP/1.1 servers SHOULD send Last-Modified whenever feasible."
Sect 3.2.1: is this saying that the type='results' RESOURCE must be first, should be first, or is usually first? Sect 4.1 seems to suggest the last.
Sect 3.4: just to avoid doubt, it might be worth saying something like "... the error message must start with one of the strings in the following table (with no leading whitespace, and matching case)"
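As a rough sketch of what a client-side check under that stricter wording might look like (the two prefix strings below are hypothetical stand-ins, not the actual contents of the table in Sect 3.4):

```python
# Hypothetical stand-ins for the strings tabulated in Sect 3.4 of the draft.
ERROR_PREFIXES = ("NotFoundFault", "UsageFault")


def classify_error(message, prefixes=ERROR_PREFIXES):
    """Return the prefix the message starts with, or None if there is none.

    str.startswith is case-sensitive and does not skip leading whitespace,
    so this implements the strict reading suggested above.
    """
    for prefix in prefixes:
        if message.startswith(prefix):
            return prefix
    return None


if __name__ == "__main__":
    print(classify_error("NotFoundFault: ivo://foo/bar"))   # exact match at start
    print(classify_error(" NotFoundFault: ivo://foo/bar"))  # leading space: no match
    print(classify_error("notfoundfault: ivo://foo/bar"))   # wrong case: no match
```

Without the clarification, a client author has to guess whether to trim whitespace or fold case before matching, and different clients may guess differently.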
Sect 4.1: there's a pull-out discussion here about "...However, the VOTable schema only allows “results” and “meta” as values for type..." I'm not positive about the issue, but is there a use here for the <LINK> element whose meaning was adjusted in VOTable 1.3 (Sect 3.5)?
Sect 4.1: this section is about linking between this Datalink element and other resources (yes?). If this is about linking to other resources, why isn't it being done in the "{links}" table? That sort of linking appears to be exactly what Datalink is about, so I'm a bit confused about why this extra section is here, talking about an apparently completely different linking mechanism. (not to mention that it's a seriously complicated/confusing mechanism)
* Sect 6, References: Ref [1], link <http://www.ivoa.net/std/DALI> should (?) be <http://www.ivoa.net/documents/DALI/>
I hope all this helps....
All the best,
Norman
--
Norman Gray : http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK