PubDIDs (and DIDs in general, maybe)

Mon Feb 3 04:44:07 PST 2014

Dear List,

I promised to summarize the responses I got via personal mail.  There
was only one, voting for leaving the query part of the IVORN to the
publisher, requiring a relation from the resource referenced in the
PubDID to a service allowing access to the dataset (but not
necessarily to datalink), and putting the future PubDID Good Advice
into the Identifiers REC.

On the discussion that happened on-list -- one thing we might want to
think about once more is whether PubDIDs should/must be IVORNs at
all. Alberto has, for example, brought DOIs into the discussion.

There's certainly something to be said for not inventing URI schemes
when we don't need to.  However, we do have the registry in place and
quite a bit of our tech relying on it, so this is not about inventing
URI schemes.  It's about using one we already have.  I don't think
saying now that IVORNs in PubDID fields should be replaced by DOIs
(in particular since DOI minting rights aren't free, are they?) would
be a wise move making the VO simpler, more robust, and more useful.

Should we, then, let people choose whether to have DOIs or IVORNs in
the DID fields?

My take: No.  The more features clients may have to deal with, the
less likely it is that something actually works.

But maybe it's time we clarified the use cases for having structured
PubDIDs anyway.  For me, it's

* Let clients figure out, from the PubDID, how to locate the actual
  data.
  (subcase a): enable access to large datasets in this way
  (subcase b): enable access via existing DAL protocols

If you think anything else (all DIDs should be persistent?  persistent ids
should at least be supported? ...) should be in scope, this would be
a good time to speak up.

As far as Arnold's "drilling down" is concerned, I share Doug's
doubts as to adding many parameters to such DIDs.  For one,
referencing individual sub-entities (via fragment identifiers, which
is what they're for) is of course always possible.  

And certainly, if publishers are free to choose their local parts as
they like, we can't really keep them from adding as many additional
parameters as they like, and if these turn out to be arguments for
their services, we can't send in the police.  But I doubt that that
should be an encouraged practice.  DID is for Dataset Identifier, not
Subdataset Identifier.  

But maybe there are use cases for Subdataset Identifiers?  What would
those be?

Finally, Pat intervened:

On Wed, Jan 29, 2014 at 10:00:16AM -0800, Patrick Dowler wrote:
> On 28/01/14 01:33 PM, Arnold Rots wrote:
> >I should add that it does not matter, of course, whether the persistent identifier's
> >root is ivo://ADS/<something>.<something> or a DOI
> 
> There is one example of such identifiers in use today and it works
> exactly as designed: vospace resource identifiers. The general form
> is:
> 
> vos://<authority>/<path>
> 
> for example:
> 
> vos://cadc.nrc.ca!vospace/myProject/myFile
[...]
> More fuel for the fire maybe :-)

I'm not sure I understand what this is suggesting.  Is this about an
alternative to the resolution rule suggested in the original mail?

Here's how I had imagined PubDID resolution:

(a) split the DID at the first ?
(b) resolve the first part of this in a Registry
(c) do $SOMETHING with the registry record to obtain a service endpoint
(d) feed the id to that service endpoint $SOMEHOW
(e) analyze away.

Of course, SOMETHING needs specification (and SOMEHOW, too, unless
SOMETHING is just "get a datalink capability"), but is that otherwise
contentious?
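
To make steps (a) through (d) a bit more concrete, here is a rough
Python sketch.  Everything in it beyond the split at the first ? is
an assumption for illustration: the registry lookup is a plain dict,
the "access_url" key and the example.org identifiers are made up, and
SOMEHOW is guessed to be "pass the full DID in an ID parameter", which
is just one possible choice and not something agreed upon here.

```python
from urllib.parse import urlencode


def split_pubdid(pubdid):
    """Step (a): split the DID at the first '?'.

    Returns (registry_part, local_part); local_part is '' if there is no '?'.
    """
    registry_part, _, local_part = pubdid.partition("?")
    return registry_part, local_part


def resolve_pubdid(pubdid, lookup_record, pick_endpoint):
    """Steps (b) through (d), with SOMETHING and SOMEHOW left pluggable.

    lookup_record and pick_endpoint are hypothetical callables standing in
    for a real Registry query and for the unspecified SOMETHING.
    """
    registry_part, _ = split_pubdid(pubdid)
    record = lookup_record(registry_part)   # (b) resolve in a Registry
    endpoint = pick_endpoint(record)        # (c) SOMETHING
    # (d) SOMEHOW: assumed here to be an ID parameter carrying the full DID
    return endpoint + "?" + urlencode({"ID": pubdid})


# A made-up, in-memory stand-in for the Registry (not real records).
FAKE_REGISTRY = {
    "ivo://example.org/archive": {"access_url": "http://example.org/links"},
}

if __name__ == "__main__":
    print(resolve_pubdid(
        "ivo://example.org/archive?obs/1234",
        FAKE_REGISTRY.__getitem__,
        lambda record: record["access_url"]))
```

Step (e) is then whatever the client does with the response from that
URL.  Keeping the lookup and the endpoint selection as separate
arguments mirrors the point that SOMETHING and SOMEHOW still need
specification.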

Cheers,

          Markus