Identifiers 2.0 Public RFC results

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Tue Oct 6 11:00:45 CEST 2015


Hi Alberto,

On Mon, Oct 05, 2015 at 11:50:21AM -0400, Accomazzi, Alberto wrote:
> 1. I'm fine with the statement that pubDIDs are neither persistent nor
> resolvable per-se
> 2. However, I think that the capability of resolution should be explicitly
> exposed and optionally supported through a well-defined mechanism
> 3. It seems to me that Datalink would be the natural conduit for providing
> DID resolution

Let me give an executive summary, too:

I don't think I can require datalink interfaces on all services using
PubDIDs in Identifiers.  Without doing this, I don't see how to
improve on the resolution rules; hence, I'm increasingly leaning
towards saying less on resolution.  I'm proposing a change in that
direction at the foot of this mail.


And here's the long-winded prose leading me there:

> On Fri, Oct 2, 2015 at 4:01 AM, Markus Demleitner <
> msdemlei at ari.uni-heidelberg.de> wrote:
> > now SIAv2).  My understanding is that the motivation was to have
> > globally unique identifiers so you can combine responses from
> > different services and still can group by something (i.e., the DID)
> > to tell apart datasets.  Which is a reasonable use case, I'd say.
> 
> Ok, this is the part where my bias leads me to think that there is no
> practical use for an identifier unless it's actionable (and therefore
> resolvable).  It seems to me that in practice you are suggesting that the
> services that emit these identifiers are able to resolve them at some
> level, but there is no general normative resolution strategy defined by VO
> standards.  Note that I am ready to accept your argument that this ain't

What I'm trying to say is: ObsTAP and SSAP are too different to allow
a common interface to get from PubDID to dataset.

Hence, when we are resolved to have uniform PubDID resolution, we
will have to pose extra requirements on *all* ObsTAP, SSA, SIAv2 (and
potentially Datalink) services, and whatever else will use PubDIDs.

I am fairly sure increasing the implementation cost for these
standards is not worth the extra benefit -- as I'm pointing out
below, it'll still not really help persistence.  

But then I don't quite feel strongly about PubDIDs being actionable
beyond "you can figure out what service the PubDID originated from
with a bit of patience by using the Registry and its relations.  If
it turns out you speak that service's protocol, you can then retrieve
the dataset."  I wonder if I should.
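
As a minimal sketch of the first step of that lookup (in Python; the
class and function names are made up for illustration and are not part
of any VO library): split the PubDID at the first '?' into the Registry
part, which is what you would look up in the Registry, and the local
part, which only the originating service can interpret.  The example
identifier is the one quoted further down in this thread.  Querying the
Registry and talking to the service is not shown.

```python
from typing import NamedTuple


class PubDID(NamedTuple):
    registry_part: str  # the IVOID that should resolve in the Registry
    local_part: str     # publisher-defined dataset key (may be empty)


def split_pubdid(pubdid: str) -> PubDID:
    """Split a PubDID at the first '?' into Registry part and local part."""
    if not pubdid.lower().startswith("ivo://"):
        raise ValueError(f"not an ivo:// identifier: {pubdid!r}")
    # partition() leaves local_part empty if there is no '?' at all
    registry_part, _, local_part = pubdid.partition("?")
    return PubDID(registry_part, local_part)


did = split_pubdid("ivo://org.gavo.dc/feros/q/ssa?f04031.bdf")
print(did.registry_part)  # ivo://org.gavo.dc/feros/q/ssa
print(did.local_part)     # f04031.bdf
```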

> Well you could imagine a scenario in which you say "if you are going to
> mint pubDIDs, then you must provide a service for resolving them."  The

That is essentially the case right now.  PubDIDs come into being
because *some* service spits them out.  What Identifiers 2 adds is that
it makes explicit the formerly more implicit requirement that a
certain part of the PubDID actually resolves in the Registry.

Incidentally, I have no illusion that even that requirement will
always be satisfied.  Right now, people use fantasy strings in their
DIDs or simply neglect registration, and I have little hope for this
to improve any time soon.  I'm mentioning this to solicit some
sympathy for my assessment that PubDID resolution is a minor use case.

> service could very well be a Datalink endpoint, but in theory it could also
> be something else which returns standard metadata, and it would have to be
> defined in the Registry.  Based on my quick read of the Datalink standard I

This "standard metadata" brings us into the realm of data modelling.
Sure, it'd be cool if we'd say "embed ObsCore metadata into your
Datalink response", and I'm sure interesting use could be made of
this, but it'd again raise the bar for providing VO compliant
services.  That's a major thing that I certainly don't want to
introduce through Identifiers.

> (or even persistence of the identifier).  So long as we agree that the
> semantics behind an identifier should not change I'm fine (i.e. the "thing"
> that ivo://org.gavo.dc/feros/q/ssa?f04031.bdf points to is always the same
> entity, although its particular manifestations may change in time).

The standard at this point says (in section 3):

  Furthermore, the identifier SHOULD refer to at most one resource over
  all time; that is, IVOIDs should not be reused for unrelated
  resources. Note that a resource may potentially be dynamic (such as
  'weather at telescope' or 'current version of the standard') - here,
  there is a conceptually unique resource, even though the content of
  it may change in time.

Is that language strong enough in your assessment?  I could be swayed
to make that SHOULD a MUST, but I'm always reluctant to make strong
statements I can't really write a validator for.

> > The part about PubDID resolution was by far the most contentious one
> > of the whole standard.  Since global PubDID resolution is, I believe,
> > more a gimmick than something centrally important, I could well
> > leave it out.  The procedure described would still work, so there's
> > not even any harm done.
> >
> 
> I would suggest that we should at least consider the scenario where
> resolution is assured under certain circumstances (which are under the
> control of the data provider).  This could be simply indicated by the
> presence of a Datalink endpoint with an optional attribute.  Why bother
> with this?  Because if I know that I have a resolution service which emits
> standard metadata records then I can at least begin to contemplate
> registering collections of such identifiers with a persistent id some day.
> If instead these pubDIDs aren't actionable then I'll be looking to build
> these collections out of HTTP URIs or something else.

What kind of text change would that imply?  I just realise that I
indeed do not say something like

  "If the Registry part of a pubDID refers to a datalink service, that
  service MUST be able to resolve that pubDID."

But then I don't think I can really do this, because that'd
essentially mean that people can't take data out from datalink
services, and DAL would understandably be cross with Registry if we
proposed that.

Alternatively, we could mandate that deleted data in Datalink
services has to leave a "stub" so things at least resolve to either
"moved" or "gone".  But then we're sneaking persistency features
into Datalink, and once I take off my datalink tophat and pull on my
cool implementor's hood, I'll have to speak out against that.

So -- I don't think any existing IVOA protocol is a suitable basis
for persistent identifiers.  And without those, there's no assured
resolution.   Sigh -- persistence is hard.

> 
> > So, here's my offer: If you want this out and care enough, speak up
> > (or say: put it into an appendix and have a fat (red?)
> > "non-normative" in its title).  If you think it's cute and it can
> > remain in (and care enough), speak up, too.  I'll take private votes
> > if you're shy, and will summarise on-list if necessary.
> >
> > If there's no signal, I'd take the liberty to take PubDID resolution
> > into TCG review and let them shoot it down if they want.  If there's
> > mainly negative signals, I'll take it out without further griping.
> >
> 
> Well, I spoke up, so you know my point of view.  Is it silly to think that
> the resolution bit belongs in a separate spec? (And is it realistic to think
> that that spec will get written anytime soon?)  I note that RFC 3986 does
> not discuss the actual resolution mechanism except for the relative
> reference within a URI, so I think the document can stand as is without the
> section in question.

Ok -- I take that as an "in sum, take out resolution rules".  Since I am
fairly sure nobody will write a standard on resolving PubDIDs, and I
won't hold my breath for someone to take up the "standard metadata"
part, I feel the document might still hint at the possibility of
resolving PubDIDs.  


So, *SUGGESTED TEXT CHANGE*: We'd strike the text on p. 17 between
"This specification" and "suitable TAP capabilities" and would append
after the paragraph "Existing DIDs..." something like the following:

  Note that publishers have no obligation to ensure continued access
  to datasets identified with PubDIDs; they are *not* persistent
  identifiers but mainly intended to provide globally unique
  identifiers for use in, e.g., federating responses from different
  services.  
  
  Publishers are, however, encouraged to provide PubDID-resolving
  services (like Datalink, Obscore, or SSA) as a capability of the
  resource referenced by the Registry part of a PubDID [strike that?
  or as a capability of a resource declared as served-by by that
  resource].  This allows clients to resolve stand-alone PubDIDs,
  which is generally desirable.

I'm tempted to cite my global PubDID resolver 
http://dc.g-vo.org/ivoidval/q/didresolve/form
in the second paragraph.

Would that be a step forward?

Or should I say even less?

What I'd like to avoid is a time-like forward-reference to an
upcoming standard -- we have too many time-like forward references
dangling in our VO documents already.

Cheers,

         Markus


