PubDIDs (and DIDs in general, maybe)
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Tue Jan 14 04:54:48 PST 2014
Dear Registry and DAL,
First, apologies for crossposting *again*, but somehow the "IVORNs
in DAL protocols" theme is haunting me quite a bit these days.
With the upcoming datalink standard, our dataset identifiers, in
particular the publisher dataset identifiers, will gain importance,
as the protocol's primary parameter will typically be the PubDID.
I hence believe this would be the perfect time to give some Good
Advice (or "Best Practice" if you prefer) on what these should look
like, in particular with a view to Norman's note at
http://www.ivoa.net/documents/Notes/URIFragments/index.html
(the TL;DR of which is: Don't use URI fragment identifiers -- the
things behind a # -- to reference separate resources).
To outline the problem for newcomers: SSA, ObsCore and probably some
others have the concept of dataset identifiers that are supposed to be
IVORNs (i.e., URIs with an ivo:// scheme). These IVORNs in turn are
supposed to resolve in a registry. Of course, you cannot and should not
register all your images (or whatever). Hence, the plan is to register
data collections (or abuse, e.g., the authority record) as "roots" for
the DIDs and add "local parts" after a stop character, i.e., a magic
character that says "split here and only resolve what's before me
in the registry".
The current IVOA Identifiers REC (v. 1.12) defines two stop characters,
the # and the ?. For one reason or another, current PubDIDs lean
towards using the # (disclaimer: this is anecdotal based on what I've
seen while doing other things), which leads to PubDIDs somewhat like
[*]ivo://dc.g-vo.org#myGigaAndromedaCube.fits
or
[*]ivo://dc.g-vo.org/services/gigacubes#andromeda
or somesuch, where the [*] is supposed to indicate deprecation, since
after Norman's note, that practice should clearly be phased out.
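For illustration, here is a minimal Python sketch of the splitting a
client would do before asking the registry. The name split_did and the
shape of its return value are assumptions made up for this example, not
anything taken from the Identifiers REC; the function simply cuts at the
first # or ? it finds.

```python
def split_did(did):
    """Split a dataset identifier at its first stop character.

    Returns (registry_part, stop_char, local_part). When the DID has
    no stop character, stop_char and local_part are None.
    """
    # Positions of the two stop characters, ignoring those not present.
    positions = [i for i in (did.find("#"), did.find("?")) if i >= 0]
    if not positions:
        return did, None, None
    cut = min(positions)
    # Only the part before the stop character is resolved in the registry.
    return did[:cut], did[cut], did[cut + 1:]


base, stop, local = split_did("ivo://dc.g-vo.org/services/gigacubes#andromeda")
print(base, stop, local)
```

Returning the stop character itself lets a caller notice the
to-be-phased-out # form without a second pass over the string.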
This leaves the ? stop character, which, I suppose, is suitable
according to our own IVORN rules as well as rules for URIs in general.
There are, however, still some free parameters in building the DIDs, and
I can imagine quite a few scenarios where having conventions in
the DIDs might be useful or even really useful.
One of those is:
(a) should we have HTTP-URL-like keyword-value pairs after the stop
char or just the path?
That is, should the new PubDIDs look like
ivo://dc.g-vo.org?did=gigaAndromedaCube
or do we just dump the path like this:
ivo://dc.g-vo.org?gigaAndromedaCube
There's something to be said for both options (the second is more
straightforward and simpler, the first lets us encode other semantics
in the keyword if we need it) -- opinions?
Even if you don't see a need or can't be bothered to put your opinion
into words, you could help me by replying to me personally, checking one
of the following boxes (I'll summarize on-list if I get votes):
[ ] with did=x [ ] just the local identifier
[ ] should be left to the publisher [ ] undecided
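To make the two options in (a) a bit more concrete, the following
Python sketch extracts the local identifier from either form. The
function name local_id and the rule "treat the part after ? as
keyword-value pairs if it contains a =" are assumptions of this
example only; the keyword did is the one from the example above.

```python
from urllib.parse import parse_qs


def local_id(pubdid, key="did"):
    """Return (registry_part, local_identifier) for a ?-style PubDID."""
    base, sep, query = pubdid.partition("?")
    if not sep:
        raise ValueError("no ? stop character in %r" % pubdid)
    if "=" in query:
        # Keyword-value form, e.g. ...?did=gigaAndromedaCube
        values = parse_qs(query).get(key)
        if not values:
            raise ValueError("no %s= pair in %r" % (key, pubdid))
        return base, values[0]
    # Bare form, e.g. ...?gigaAndromedaCube
    return base, query


print(local_id("ivo://dc.g-vo.org?did=gigaAndromedaCube"))
print(local_id("ivo://dc.g-vo.org?gigaAndromedaCube"))
```

Note that with this guessing rule a bare local identifier that itself
contains a = would be misread as a keyword-value pair, which is one
argument for settling on a single form rather than accepting both.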
(b) Should there be recommendations or even requirements for the base
URI? One thing that has come up now and then is to require that the
resources they resolve to are datalink services (if we want those in the
registry at all) or at least should have some relationship to a
service that allows access to the dataset referred to.
Again, if you'd like to vote:
[ ] require relation [ ] leave open [ ] forbid
[ ] undecided
(c) Based on my own doubts when I had to come up with various PubDID
schemes in my resources, I do believe this Good Advice should be made
explicit in a prominent place. I believe two paragraphs would suffice,
so something like a non-normative sidebar would do just fine. Probably
even an appendix would be overdoing it.
The question is: where could that reside? A natural place would be
the Identifiers REC, but I don't see anyone taking that up any time
soon (though I'd argue that removing the XML form of identifiers
would be a positive move). An alternative would be datalink close to
the introduction of the ID parameter, where it could illustrate what
kind of value there'd typically be in that parameter. Or, we could
expand what one of obscore, SSAP, or SIAv2 have to say about their
DIDs. For the record, I'd like a sidebar in datalink most (but I
acknowledge the desire to keep that document short).
So, the options to me look like this:
[ ] Identifiers REC [ ] Datalink [ ] Next available DAL Spec
[ ] Obscore [ ] I want a Note on this [ ] Oh, just leave it alone
Did I miss something DID-wise?
I suggest keeping the discussion of this topic entirely on the DAL
list rather than on both lists. I submit that persons on the Registry
list who are interested in this but are not on the DAL list should be
sufficiently alerted to a possible discussion by this initial mail.
Thanks,
Markus