PubDIDs (and DIDs in general, maybe)

Tue Jan 28 13:33:23 PST 2014

The reason I was contemplating, in an earlier post, replacing the # by ?
was that it would allow parameterization of the identifiers.
The advantage is that it can implement very flexible drilling into datasets
without increasing the number of identifiers in the registry.

In the case quoted in earlier posts, ivo://ADS/Sa.CXO#obs/05285
currently translates into (I think):
http://cda.harvard.edu/chaser/searchOcat.do?obsid=05285
and the ADS keeps a full lookup table for all dataset identifiers.
That URL brings the client to a landing page where some tar packages
can be selected.

If, instead, the identifier were written as:
ivo://ADS/Sa.CXO?obsid=05285
then the ADS lookup service would only need to know that ivo://ADS/Sa.CXO
translates into http://cda.harvard.edu/chaser/searchOcat.do, for all Chandra
identifiers.
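
A minimal sketch of such a lookup, assuming the registry is nothing more
than a table keyed on the identifier root. The names REGISTRY and resolve
are hypothetical, and the target URL is the one quoted above (which I have
not verified):

```python
# Hypothetical registry: one entry per identifier root, not one per dataset.
REGISTRY = {
    "ivo://ADS/Sa.CXO": "http://cda.harvard.edu/chaser/searchOcat.do",
}


def resolve(identifier, registry=REGISTRY):
    """Translate a parameterized identifier into an actionable URL.

    Only the part in front of the first '?' is looked up; everything
    after it is passed through to the server unchanged.
    """
    root, sep, query = identifier.partition("?")
    base_url = registry[root]  # raises KeyError for an unknown root
    return base_url + sep + query


print(resolve("ivo://ADS/Sa.CXO?obsid=05285"))
```

With this layout, adding a new Chandra observation requires no change to
the registry at all.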

That simplifies matters already, but in addition one can allow extensions
that drill down directly to individual files in the package:
ivo://ADS/Sa.CXO?obsid=05285&type=event&level=2
Everything after the question mark gets passed on to the server.
If the server is smart, one might even allow drilling down into the file,
selecting columns:
ivo://ADS/Sa.CXO?obsid=05285&type=event&level=2&column=Time,pha
or just particular values:
ivo://ADS/Sa.CXO?obsid=05285&type=event&level=2&column=Time,pha&tstart=2010-04-15T12:30:36&tstop=2010-04-15T13:23:00
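
To illustrate what a server might do with the part after the question mark,
here is a rough sketch that turns such an identifier into a structured
selection. The function name parse_selection and the interpretation of each
parameter (comma-separated columns, ISO 8601 times, integer level) are
assumptions for illustration, not an existing Chandra archive interface:

```python
from datetime import datetime
from urllib.parse import parse_qsl


def parse_selection(identifier):
    """Split a parameterized identifier into its root and a typed selection."""
    root, _, query = identifier.partition("?")
    params = dict(parse_qsl(query))

    selection = {
        "root": root,
        # Keep obsid as a string so the leading zero is preserved.
        "obsid": params.get("obsid"),
        "type": params.get("type"),
    }
    if "level" in params:
        selection["level"] = int(params["level"])
    if "column" in params:
        # Assumed convention: column names are separated by commas.
        selection["columns"] = params["column"].split(",")
    for key in ("tstart", "tstop"):
        if key in params:
            # Assumed convention: times are ISO 8601 without a time zone.
            selection[key] = datetime.fromisoformat(params[key])
    return selection


sel = parse_selection(
    "ivo://ADS/Sa.CXO?obsid=05285&type=event&level=2"
    "&column=Time,pha&tstart=2010-04-15T12:30:36&tstop=2010-04-15T13:23:00"
)
print(sel["columns"], sel["tstart"], sel["tstop"])
```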

This may not be a high priority for the current use of dataset identifiers
linking entire datasets to papers, but it would be extremely useful when we
start using persistent identifiers for published data in data discovery and
focused data mining.

In short: the persistent identifier registry only needs to be aware of the
part in front of the '?' (%3F), and then it is up to the service to define what
parameters it allows (and that functionality needs to be queryable, of course);
potentially that single identifier can stand for an infinite number of
identifier instances.
I should add that it does not matter, of course, whether the persistent
identifier's root is ivo://ADS/<something>.<something> or a DOI.
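
As a sketch of what "queryable" could mean in practice, assume the service
publishes the set of parameter names it accepts, so that a client can check
an identifier before resolving it. The set SUPPORTED and the function
unsupported_parameters are hypothetical; how a real service would advertise
its parameters is left open here:

```python
from urllib.parse import parse_qsl

# Hypothetical parameter set a service might advertise for one identifier root.
SUPPORTED = {"obsid", "type", "level", "column", "tstart", "tstop"}


def unsupported_parameters(identifier, supported=SUPPORTED):
    """Return the sorted parameter names that the service does not declare."""
    _, _, query = identifier.partition("?")
    names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
    return sorted(names - supported)


print(unsupported_parameters("ivo://ADS/Sa.CXO?obsid=05285&energy=2"))
```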

Cheers,

  - Arnold

-------------------------------------------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory         tel:  +1 617 496 7701
60 Garden Street, MS 67                       fax:  +1 617 495 7356
Cambridge, MA 02138                           arots at cfa.harvard.edu
USA                                           http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------------------------------------------

On Thu, Jan 16, 2014 at 7:54 PM, Accomazzi, Alberto <
aaccomazzi at cfa.harvard.edu> wrote:

> At the danger of stating the obvious: we all know that Norman speaks the
> truth.
>
> Thanks for catching my URN vs. URI mangling -- I admit I hadn't looked up
> the definition of either one in quite a while.  But despite the misusage of
> terms, my point was that the ADS persistent ids were not born as IVORNs for
> both practical and political reasons, and I don't think it's worth
> agonizing about whether or not they can/should be retrofitted into that
> scheme now.  However, if agonize we must, one way out of this IMHO is to
> simply say the following:
>
> 1. the resource persistent identifier is: ADS/Sa.CXO#obs/05285
> 2. its corresponding IVO URI is: ivo://ADS/Sa.CXO%23obs/05285
> 3. its actionable URL is (as of today):
> http://vo.ads.harvard.edu/dv/DataResolver.cgi?ADS%2FSa.CXO%23obs%2F05285
>
> i.e. there is a URL-encoding step in going from the identifier to the
> URIs.  Doesn't look as pretty as we might have wanted, but it works.
>
> As far as managing these identifiers, let me add a pointer to the EZID
> system that CDL uses for its datacite DOIs and arks: http://n2t.net/ezid/
> The resolver and registry that they maintain could easily support the ivo
> URI scheme if we wanted to, but again no need to go that route unless we
> need it for something that plain http doesn't already provide.
>
> Cheers,
> -- Alberto
>
>
>
> On Thu, Jan 16, 2014 at 1:46 PM, Norman Gray <norman at astro.gla.ac.uk> wrote:
>
>>
>> Alberto and all, hello.
>>
>> On 2014 Jan 16, at 15:14, Accomazzi, Alberto <aaccomazzi at cfa.harvard.edu>
>> wrote:
>>
>> +1 generally, but...
>>
>> > I think a better way to keep this straight is to think of the "ADS"
>> identifiers as URNs and the ivo identifiers as URIs.
>>
>> Unleashing my inner lawyer: recall that URNs are (according to RFC 2396)
>> merely one of the two types of URIs, namely "the subset of URI that are
>> required to remain globally unique and persistent even when the resource
>> ceases to exist or becomes unavailable."
>>
>> RFC 3986 <https://www.ietf.org/rfc/rfc3986.txt> mentions that '[a] URI
>> can be further classified as a locator, a name, or both', and that '[t]he
>> term "Uniform Resource Name" (URN) has been used historically to refer to
>> both URIs under the "urn" scheme', but that 'Future specifications and
>> related documentation should use the general term "URI" rather than the
>> more restrictive terms "URL" and "URN".'
>>
>> All that said...
>>
>> > 6. Having said all of this, I still do have one basic question about
>> the ivo identifiers that you want to use in datalink, based on my current
>> understanding of them.  Specifically, given that these lack persistence and
>> multiple resolution features, why bother at all rather than using plain
>> http uris?  I think this question is worth considering now since the
>> experience with the dataset ids has taught me that unless there are
>> compelling reasons to go with a discipline-specific, custom solution you may
>> be better off using what the web already gives you for free: namely http
>> and dns.
>>
>> I think this is a really important point, which isn't made often enough
>> (cue hobbyhorse).  Without _necessarily_ discounting the existence of such
>> 'compelling reasons', non-standard schemes do come with a cost, and they're
>> not magic, so that if your resolution mechanism disappears, a URN-named
>> object is just as lost, and just as nameless, as one named with a 404ed
>> HTTP URI.
>>
>> I remember a workshop on persistent identifiers of a few years ago, where
>> Stuart Weibel (I think; or it may have been John Kunze) made this point
>> very convincingly.  Something under purl.org or under id.loc.gov has an
>> "institutional commitment to persistence" which is worth an awful lot more
>> than any amount of indirection that you get through a fancy URI scheme.  As
>> Stuart (or whoever) said, "loc.gov isn't going away any time soon".
>>
>> DOIs do, I think, have a pretty compelling reason to be a special URI
>> scheme, but the thing that's key about DOIs is not the scheme, or the
>> Handle-based lookup mechanism, but precisely the "institutional commitment
>> to persistence" that they represent.
>>
>> I don't plan to reopen any discussion here about IVORNs -- fear not,
>> everyone -- but will simply note that, on general principles, obsessing
>> about the punctuation of URIs is probably a distant second in importance to
>> developing and planning these sorts of institutional commitments within the
>> IVOA.
>>
>> All the best,
>>
>> Norman
>>
>>
>> --
>> Norman Gray  :  http://nxg.me.uk
>> SUPA School of Physics and Astronomy, University of Glasgow, UK
>>
>>
>
>
> --
> Dr. Alberto Accomazzi
> Program Manager
> NASA Astrophysics Data System - http://ads.harvard.edu
> Harvard-Smithsonian Center for Astrophysics - http://www.cfa.harvard.edu
> 60 Garden St, MS 83, Cambridge, MA 02138, USA
>