PubDIDs (and DIDs in general, maybe)

Accomazzi, Alberto aaccomazzi at cfa.harvard.edu
Tue Feb 11 17:20:07 PST 2014


Hi Markus,

Thank you for persevering on this subject!

On Mon, Feb 3, 2014 at 7:44 AM, Markus Demleitner <
msdemlei at ari.uni-heidelberg.de> wrote:

>
> On the discussion that happened on-list -- one thing we might want to
> think about once more is the question if PubDIDs should/must be
> IVORNs at all. Alberto has, for example, brought DOIs into the
> discussion.
>

The context in which I brought up DOIs was not as a substitute for IVORNs,
but as an example of a persistent identifier standard which is already used
by the broader science community.  In my mind, the context in which it
would make sense to use a DOI is at a high level (dataset), not at the data
product level, and its main purpose would be to serve an html "splash page"
as opposed to raw data.  However, I will note that DOIs can also work
well in a linked data environment, so there are a number of possibilities
for delivering machine-readable metadata using the DOI infrastructure.
Sorry if I confused people with my earlier comments.

> There's certainly something to be said for not inventing URI schemes
> when we don't need to.  However, we do have the registry in place and
> quite a bit of our tech relying on it, so this is not about inventing
> URI schemes.  It's about using one we already have.  I don't think
> saying now that IVORNs in PubDID fields should be replaced by DOIs
> (in particular since DOI minting rights aren't free, are they?) would
> be a wise move making the VO simpler, more robust, and more useful.
>

Again, what I was trying to do is clarify the DID situation at least as I
see it:

1. The use of IVORNs instead of http URIs comes at a cost which should be
at least considered (but note that I'm not advocating ditching IVORNs)
2. The issue of persistence is orthogonal to the issue of URI schemes, and
no amount of syntax will get around this point
3. The use of ADS persistent ids is for the most part historical and we
should not allow it to get in the way of making progress on this or other
standards; these IDs fall in the same class as the DOIs, i.e. to identify
datasets at a high level rather than to refer to a variety of
manifestations of the data products.

And BTW, the cost of minting DOIs is really minimal given the resources of
the VO as a whole and could be easily done via DataCite.  (I believe the
CADC is already doing this).

> Should we, then, let people choose whether to have DOIs or IVORNs in
> the DID fields?
>

Not something I'm advocating, but if you really wanted to do this then
you'd also want to enable content-negotiation to help deliver the data in a
format which is useful to the application.  And if you want to give
choices, then consider supporting IVORNs and http URIs and forget about
everything else.
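
To make the content-negotiation idea concrete, here is a minimal sketch in
Python.  It only builds the http request and does not send it.  The helper
name negotiated_request is made up for illustration, and which media types
a given DOI actually serves depends on its registration agency, so treat
"application/rdf+xml" as an assumption rather than a guarantee.

```python
from urllib.request import Request

# Public DOI resolver; the DOI itself is appended to this prefix.
DOI_RESOLVER = "https://doi.org/"


def negotiated_request(doi, media_type):
    """Build (but do not send) a GET request asking for `media_type`.

    Hypothetical helper: the Accept header is how an application would
    ask for machine-readable metadata instead of the html splash page.
    """
    return Request(DOI_RESOLVER + doi, headers={"Accept": media_type})


# Same identifier, two representations chosen by the client.
rdf_req = negotiated_request("10.1371/journal.pone.0088458",
                             "application/rdf+xml")
html_req = negotiated_request("10.1371/journal.pone.0088458", "text/html")

print(rdf_req.full_url, rdf_req.get_header("Accept"))
print(html_req.full_url, html_req.get_header("Accept"))
```

The point is that the identifier stays the same and only the Accept header
changes, which is why this works equally well for plain http URIs.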


> But maybe it's time we clarified the use cases for having structured
> PubDIDs anyway.  For me, it's
>
> * Let clients figure out, from the PubDID, how to locate the actual
>   data.
>   (subcase a): enable access to large datasets in this way
>   (subcase b): enable access via existing DAL protocols
>
> If you think anything else (all DIDs should be persistent?  persistent ids
> should at least be supported? ...) should be in scope, this would be
> a good time to speak up.
>

Well, I'll just note that on the issue of persistency the IVOA largely
punted when the IVOA identifiers RFC was adopted because it was (and still
is) a tough nut to crack.  But I do think that persistency can be built on
what we already have without needing to do anything special at the protocol
level.  DOIs are persistent thanks to their resolution service, and
piggy-back on http to deliver the intended content.  Similarly, a
persistent VO resolver service could provide the same functionality in the
context of an IVORN should the IVOA decide to implement that.


> As far as Arnold's "drilling down" is concerned, I share Doug's
> doubts as to adding many parameters to such DIDs.  For one,
> referencing individual sub-entities (via fragment identifiers, which
> is what they're for) is of course always possible.
>
> And certainly, if publishers are free to choose their local parts as
> they like, we can't really keep them from adding as many additional
> parameters as they like, and if these turn out to be arguments for
> their services, we can't send in the police.  But I doubt that that
> should be an encouraged practice.  DID is for Dataset Identifier, not
> Subdataset Identifier.
>
> But maybe there's use cases for Subdataset Identifiers?  What would
> those be?
>

There is precedent in the publishing industry where some journals are
assigning DOIs to article elements such as tables, figures, etc.  For
instance, this PLOS article: 10.1371/journal.pone.0088458 has a figure
with a DOI of 10.1371/journal.pone.0088458.g002.
I think that so long as you don't try too hard to infer semantics from
identifiers you will be fine no matter what.


>
> Finally, Pat intervened:
>
> On Wed, Jan 29, 2014 at 10:00:16AM -0800, Patrick Dowler wrote:
> > On 28/01/14 01:33 PM, Arnold Rots wrote:
> > >I should add that it does not matter, of course, whether the persistent
> > >identifier's root is ivo://ADS/<something>.<something> or a DOI
> >
> > There is one example of such identifiers in use today and it works
> > exactly as designed: vospace resource identifiers. The general form
> > is:
> >
> > vos://<authority>/<path>
> >
> > for example:
> >
> > vos://cadc.nrc.ca!vospace/myProject/myFile
> [...]
> > More fuel for the fire maybe :-)
>
> I'm not sure I understand what this is suggesting.  Is this about an
> alternative to the resolution rule suggested in the original mail?
>
> Here's how I had imagined PubDID resolution:
>
> (a) split the DID at the first ?
> (b) resolve the first part of this in a Registry
> (c) do $SOMETHING with the registry record to obtain a service endpoint
> (d) feed the id to that service endpoint $SOMEHOW
> (e) analyze away.
>
> Of course, SOMETHING needs specification (and SOMEHOW, too, unless
> SOMETHING is just "get a datalink capability"), but is that otherwise
> contentious?
>
> Cheers,
>
>           Markus
>
>
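
For concreteness, steps (a) through (d) of the resolution rule quoted above
could be sketched roughly as follows in Python.  Everything beyond the
split at the first "?" is an assumption on my part: a plain dict stands in
for the Registry, the "datalink" key in the record is a made-up stand-in
for $SOMETHING, and passing the full id as an "ID" parameter is only one
possible reading of $SOMEHOW.

```python
from urllib.parse import urlencode


def split_pubdid(pubdid):
    """Step (a): split the DID at the first '?'.

    Returns (registry_part, local_part); local_part is None when the
    DID contains no '?' at all.
    """
    registry_part, sep, local_part = pubdid.partition("?")
    return registry_part, (local_part if sep else None)


def resolve_pubdid(pubdid, registry):
    """Steps (b)-(d), with `registry` as a dict standing in for a Registry.

    Assumption: each record is a dict with a hypothetical "datalink" key
    holding a service endpoint URL.
    """
    ivorn, _local = split_pubdid(pubdid)
    record = registry[ivorn]        # (b) resolve the first part
    endpoint = record["datalink"]   # (c) $SOMETHING: pick an endpoint
    # (d) $SOMEHOW: here, hand the complete id to the endpoint as ID=...
    return endpoint + "?" + urlencode({"ID": pubdid})


# Toy registry with a single made-up record.
fake_registry = {
    "ivo://example.org/data": {"datalink": "http://example.org/links"},
}

url = resolve_pubdid("ivo://example.org/data?obs/123", fake_registry)
print(url)
```

Step (e) is left to the client.  Note that the sketch never looks inside the
local part, so publishers remain free to choose it as they like.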


-- 
Dr. Alberto Accomazzi
Program Manager
NASA Astrophysics Data System - http://ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics - http://www.cfa.harvard.edu
60 Garden St, MS 83, Cambridge, MA 02138, USA

