PubDIDs (and DIDs in general, maybe)
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Thu Jan 16 00:44:38 PST 2014
Dear Arnold, dear DAL and DC&P lists,
Introduction for DC&P folks: Over on DAL and Registry, I started a
discussion on why now is the right time to say a bit more on dataset
identifiers; see
http://www.ivoa.net/pipermail/dal/2014-January/006617.html and
followups, the big point of which was that the fragment identifier
should be used to identify entities *within* resources, but not the
resources themselves.
Arnold then mentioned that there are lots of persistent identifiers
that use the # to refer to resources, to which Norman (who deserves lots
of credit for having pointed out that we have a problem here,
http://www.ivoa.net/documents/Notes/URIFragments/index.html) said
that we're discussing a matter of technology rather than taste
here (paraphrasing *very* freely). Which brings me to Arnold's
second mail:
On Wed, Jan 15, 2014 at 12:22:13PM -0500, Arnold Rots wrote:
> The question is not so much what the best way is to do this.
> It would be perfectly fine, I think (and even better), if
> ivo://ADS/Sa.CXO#obs/05285
> were written as, e.g.,
> ivo://ADS/Sa.CXO?type=obs&obsid=05285
> since it would make resolving the URIs much simpler.
> However, the issue is that there are tens of thousands of persistent
> identifiers in existence that need to remain persistent.
I realize it's probably too late to "invalidate" those identifiers.
It would be highly useful to deprecate them, though, and start
assigning new identifiers with a question mark. Since I suspect the
identifiers are, to first order, opaque to the client software that
uses them so far, this can be fixed on the server side, and there
presumably is only a very limited number of servers. The legacy
identifiers will be an oddity that might confuse future software, but
that's nothing we can fix at this point.
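To make the server-side fix a bit more concrete, here is a minimal
Python sketch of how a resolver might map a legacy identifier to the
query form. The function name modernize_pubdid is hypothetical, the
key names "type" and "obsid" are simply taken from Arnold's example,
and the "slash in the fragment" test is an assumption about what the
legacy identifiers look like, not an agreed convention.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def modernize_pubdid(uri):
    """Rewrite a legacy '#kind/id' PubDID as '?type=kind&obsid=id'.

    Hypothetical helper: the key names follow the example in this
    thread and are an assumption, not a standard.
    """
    parts = urlsplit(uri)
    # Leave the URI alone if it already has a query part, or if the
    # fragment does not look like a legacy 'kind/id' identifier.
    if parts.query or "/" not in parts.fragment:
        return uri
    kind, _, ident = parts.fragment.partition("/")
    query = urlencode({"type": kind, "obsid": ident})
    # Rebuild the URI with the new query and an empty fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(modernize_pubdid("ivo://ADS/Sa.CXO#obs/05285"))
```

A genuine fragment reference such as ivo://STClib/CoordSys#TT-ICRS-TOPO
passes through unchanged, since its fragment contains no slash.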
This is more than an academic discourse. The URI-correct
interpretation of the fragment identifier becomes important as the
registry becomes more expressive. In particular, the standard
semantics -- reference something within the resource referenced -- is
now used, e.g., to refer to keys in StandardRegExts. And of course I
may mention your STC library with URIs like
ivo://STClib/CoordSys#TT-ICRS-TOPO that would have to use a similar
mechanism.
Also consider the (foreseeable) need to reference entities within
resources; e.g. ivo://example?spect.large#order142 could reference a
single Echelle order (say). If persistent identifiers were to keep
on clobbering the hash, this kind of usage is basically blocked (or
would require ugly heuristic hacks -- "if there's a question mark in
the URI, the hash is a fragment identifier, otherwise it's really a
question mark semantically". Ugh.)
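To illustrate what standard fragment semantics mean for client
software, here is a small Python sketch using the standard library's
urllib.parse. The helper name resource_and_fragment is made up for
this example. It only shows how a generic URI parser splits the
identifiers discussed here; it is not how any particular VO client
works.

```python
from urllib.parse import urldefrag


def resource_and_fragment(uri):
    """Split a URI into (resource URI, fragment) like a generic parser."""
    resource, fragment = urldefrag(uri)
    return resource, fragment


# Query form: the dataset is part of the resource URI, and the
# fragment stays free to point at something within the dataset.
print(resource_and_fragment("ivo://example?spect.large#order142"))
# -> ('ivo://example?spect.large', 'order142')

# Legacy form: two different datasets collapse onto the same
# resource URI once the fragment is stripped.
first, _ = resource_and_fragment("ivo://ADS/Sa.CXO#obs/05285")
second, _ = resource_and_fragment("ivo://ADS/Sa.CXO#obs/05286")
print(first == second)
# -> True
```

With the legacy form, the dataset identity lives entirely in the
part a generic client discards before dereferencing, which is why no
fragment is left over for referencing things inside the dataset.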
So, I'd urge that, for the sake of future IVORN interoperability, #
in persistent identifiers be deprecated and their issuers be strongly
advised to use correct syntax in new ids (while doing some hack to
still resolve the old ones).
Talking about which: I couldn't find anything on these identifiers on
the IVOA documents page. Wouldn't it be good to have something on
them there? I, for one, managed to totally ignore this (laudable)
initiative until now. Which of course is deeply embarrassing (and
yes, it's also GAVO's fault for not having a single person on the
DC&P list), but less so as there doesn't seem to be official
documentation.
Cheers,
Markus