PubDIDs (and DIDs in general, maybe)
Accomazzi, Alberto
aaccomazzi at cfa.harvard.edu
Thu Jan 16 07:14:22 PST 2014
Hi Markus,
Thank you for bringing this up. I admit that I should have paid more
attention to this and past threads in DAL about the subject of identifiers
and commented earlier, so forgive me for this late entry in the debate.
In trying to avoid a TL;DR issue, let me simply refer people interested in
finding out about the persistent dataset identifier standard that ADS, AAS,
and the NASA centers adopted in 2005 to this short read:
http://labs.adsabs.harvard.edu/adsabs/abs/2011ApSSP...1..135A/ (and
references therein).
To summarize my take about the whole situation:
1. The "ADS/FacilityID#datasetID" identifiers are a kind of "special" ids
in that they were designed to be persistent by linking them to a resolution
mechanism which will take care of following the datasets through archive
migrations, mirroring, etc. A comparison between these identifiers and the
more general ivo ids can be found in the presentations that Ray and I gave
at the Nara DCP session three years ago:
http://wiki.ivoa.net/twiki/bin/view/IVOA/InterOpDec2010DCP
Note that if you go back to the original proposals you will see that they
were not defined as ivo URIs (partly because it wasn't clear whether the
IVOA registry would ever be able to resolve them the way we wanted), but we
later thought we could subsume them under the ivo scheme. Part of the
current debate
stems from that point: as Arnold points out, if we want those identifiers
to be legal ivo ids then we need to deal with the fact that they break URI
semantics. I think a better way to keep this straight is to think of the
"ADS" identifiers as URNs and the ivo identifiers as URIs.
2. The fragment snafu was unfortunate. Thanks to Norman, it is finally
understood, now that we have a full grasp of the URI issue. I think we
should just deprecate the use of fragments in these identifiers and move on.
3. There is little or no evidence that I know of which suggests that the
"ADS" style persistent ids are used as ivo identifiers at the moment. Even
when you look at the few found in the wild (such as a recent ALMA one:
http://labs.adsabs.harvard.edu/adsabs/abs/2014AAS...22335032J/) there is no
"ivo" prefix specified, so the fragment problem does not arise.
4. I have been thinking for a while that given the current situation we
should simply move to a more widely used system for persistent identifiers
such as ARKs or DOIs and simply "upgrade" the existing ones by registering
them using the DOI/ARK infrastructure. I have spoken to John Kunze at CDL
and Chris Biemesderfer at AAS, both of whom think this is a good way to go.
This is similar to the way in which a DOI found in the literature can be
made actionable by prepending http://dx.doi.org/ to it.
5. Since we are talking about identifiers and how they should be designed:
I would strongly suggest you check the proposed syntax and set of allowed
characters currently being considered for DataCite DOIs (see
https://mds.datacite.org/static/apidoc). Experience has shown that using
URI-unfriendly characters as part of the identifiers (e.g. ">", "<", "&",
and yes, "?" as well) is asking for trouble. The reason is that these ids
often need to be encoded and embedded in URLs as part of their resolution
process, or encoded in XML or HTML. Anything that requires an
encoding/decoding is bound to create problems compared to a solution that
allows one to simply cut and paste a URI into a browser's location bar
(optionally prefixing it with the base uri of a resolver). So I would vote
NOT to use the "?" character in them.
6. Having said all of this, I still do have one basic question about the
ivo identifiers that you want to use in datalink, based on my current
understanding of them. Specifically, given that these lack persistence and
multiple resolution features, why bother with them at all rather than using
plain http URIs? I think this question is worth considering now, since the
experience with the dataset ids has taught me that unless there is a
compelling reason to go with a discipline-specific, custom solution, you may
be better off using what the web already gives you for free: namely http
and DNS.
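
To make point 5 concrete, here is a minimal Python sketch of what happens
when an identifier is pasted after a resolver's base URL. The resolver
address and the DOI-like string are made up for illustration, and the
"resolver" is simply assumed to read the identifier from the URL path; no
real service is implied.

```python
from urllib.parse import quote, unquote, urlsplit
from xml.sax.saxutils import escape

# Hypothetical resolver base URL, for illustration only.
RESOLVER_BASE = "http://resolver.example.org/"


def identifier_seen_by_resolver(url: str) -> str:
    """Return the identifier a resolver would recover from the URL path."""
    return unquote(urlsplit(url).path.lstrip("/"))


def encoded_resolver_url(identifier: str) -> str:
    """Percent-encode everything except "/" before prepending the base."""
    return RESOLVER_BASE + quote(identifier, safe="/")


friendly = "10.1234/abc-123"  # made-up DOI-like identifier
unfriendly = "ADS/Sa.CXO?type=obs&obsid=05285"  # form quoted in this thread

for ident in (friendly, unfriendly):
    pasted = RESOLVER_BASE + ident  # plain cut-and-paste, no encoding
    print(ident)
    print("  pasted :", identifier_seen_by_resolver(pasted))
    print("  encoded:", identifier_seen_by_resolver(encoded_resolver_url(ident)))
    print("  in XML :", escape(ident))
```

The identifier without "?" or "&" survives a plain paste unchanged, while
the other one is cut at the "?" unless it is percent-encoded first, and it
needs a further escape for "&" when embedded in XML.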
Thanks,
-- Alberto
On Thu, Jan 16, 2014 at 3:44 AM, Markus Demleitner <
msdemlei at ari.uni-heidelberg.de> wrote:
> Dear Arnold, dear DAL and DC&P lists,
>
> Introduction for DC&P folks: Over on DAL and Registry, I started a
> discussion on why now is the right time to say a bit more on dataset
> identifiers; see
> http://www.ivoa.net/pipermail/dal/2014-January/006617.html and
> followups, the big point of which was that the fragment identifier
> should be used to identify entities *within* resources, but not the
> resources themselves.
>
> Arnold then mentioned that there are lots of persistent identifiers
> that use the # to refer to resources, to which Norman (who deserves lots
> of credit for having pointed out that we have a problem here,
> http://www.ivoa.net/documents/Notes/URIFragments/index.html) said
> that we're discussing a matter of technology rather than taste
> here (paraphrasing *very* freely). Which brings me to Arnold's
> second mail:
>
> On Wed, Jan 15, 2014 at 12:22:13PM -0500, Arnold Rots wrote:
> > The question is not so much what the best way is to do this.
> > It would be perfectly fine, I think (and even better), if
> > ivo://ADS/Sa.CXO#obs/05285
> > were written as, e.g.,
> > ivo://ADS/Sa.CXO?type=obs&obsid=05285
> > since it would make resolving the URIs much simpler.
> > However, the issue is that there are tens of thousands of persistent
> > identifiers in existence that need to remain persistent.
>
> I realize it's probably too late to "invalidate" those identifiers.
> It would be highly useful to deprecate them, though, and start
> assigning new identifiers with a question mark. Since I suspect the
> identifiers are to a first order opaque to the client software that
> uses them so far, this can be fixed on the server sides, of which
> there presumably is only a very limited number. The legacy
> identifiers will be an oddity that might confuse future software, but
> that's nothing we can fix at this point.
>
> This is more than an academic discourse. The URI-correct
> interpretation of the fragment identifier becomes important as the
> registry becomes more expressive. In particular, the standard
> semantics -- reference something within the resource referenced -- is
> now used, e.g., to refer to keys in StandardRegExts. And of course I
> may mention your STC library with URIs like
> ivo://STClib/CoordSys#TT-ICRS-TOPO that would have to use a similar
> mechanism.
>
> Also consider the (foreseeable) need to reference entities within
> resources; e.g. ivo://example?spect.large#order142 could reference a
> single Echelle order (say). If persistent identifiers were to keep
> on clobbering the hash, this kind of usage is basically blocked (or
> would require ugly heuristic hacks -- "if there's a question mark in
> the URI, the hash is a fragment identifier, otherwise it's really a
> question mark semantically." Ugh.)
>
> So, I'd urge, for the sake of future IVORN interoperability, that # in
> persistent identifiers be deprecated and their issuers be strongly
> advised to use correct syntax in new ids (while doing some hack to
> still resolve the old ones).
>
> Talking about which: I couldn't find anything on these identifiers on
> the IVOA documents page. Wouldn't it be good to have something on
> them there? I, for one, managed to totally ignore this (laudable)
> initiative until now. Which of course is deeply embarrassing (and
> yes, it's also GAVO's fault for not having a single person on the
> DC&P list), but less so as there doesn't seem to be official
> documentation.
>
> Cheers,
>
> Markus
>
>
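
For reference, the fragment-versus-query distinction discussed in the
quoted message can be illustrated with a generic URI parser. This is only
a sketch using Python's standard urlsplit on the identifiers quoted above.
It shows how generic URI syntax splits them, not how any registry or
resolver actually treats them, and the function name split_identifier is
made up for this example.

```python
from urllib.parse import urlsplit


def split_identifier(uri: str) -> dict:
    """Split a URI into the resource it names, its query, and its fragment."""
    parts = urlsplit(uri)
    return {
        # Under generic URI syntax, everything before the first "#"
        # identifies the resource itself.
        "resource": uri.partition("#")[0],
        "query": parts.query,
        "fragment": parts.fragment,
    }


examples = [
    "ivo://ADS/Sa.CXO#obs/05285",             # legacy form with "#"
    "ivo://ADS/Sa.CXO?type=obs&obsid=05285",  # form suggested by Arnold
    "ivo://example?spect.large#order142",     # dataset plus a part of it
]

for uri in examples:
    print(uri)
    for key, value in split_identifier(uri).items():
        print(f"  {key:8s} = {value!r}")
```

With the "#" form, a generic parser sees every legacy identifier from one
facility as the same resource (ivo://ADS/Sa.CXO) and pushes the dataset key
into the fragment. The "?" form keeps the dataset key inside the resource
part and leaves the fragment free for referencing something within it.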
--
Dr. Alberto Accomazzi
Program Manager
NASA Astrophysics Data System - http://ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics - http://www.cfa.harvard.edu
60 Garden St, MS 83, Cambridge, MA 02138, USA