<div dir="ltr">Hi Markus,<div><br></div><div>Thank you for the thorough response on my comments.  I have detailed replies below but to offer an executive summary:</div><div><br></div><div>1. I&#39;m fine with the statement that pubDIDs are neither persistent nor resolvable per-se</div><div>2. However, I think that the capability of resolution should be explicitly exposed and optionally supported through a well-defined mechanism</div><div>3. It seems to me that Datalink would be the natural conduit for providing DID resolution</div><div><br></div><div>Cheers,</div><div>-- Alberto</div><div><br></div><div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Oct 2, 2015 at 4:01 AM, Markus Demleitner <span dir="ltr">&lt;<a href="mailto:msdemlei@ari.uni-heidelberg.de" target="_blank">msdemlei@ari.uni-heidelberg.de</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Dear Alberto,<br>

<span class=""><br>

On Thu, Oct 01, 2015 at 10:50:04PM -0400, Accomazzi, Alberto wrote:<br>

&gt; Sorry for coming late to the discussion, but I have some concerns about<br>

&gt; section 4.1 of the specification (Dataset Identifiers).  What troubles me<br>

&gt; is the resolution of these identifiers, and the fact that the spec itself<br>

&gt; states that &quot;This specification does not exhaustively define the resolution<br>

&gt; of publisher DIDs. Instead, we recommend the following procedure...&quot; .<br>

<br>

</span>I grant you that the non-exhaustivity is somewhat unfortunate, but PubDIDs<br>

weren&#39;t really designed to resolve, I believe.  They just turned up<br>

in various standards (SSAP, Obscore, some data models, then Datalink,<br>

now SIAv2).  My understanding is that the motivation was to have<br>

globally unique identifiers so you can combine responses from<br>

different services and can still group by something (i.e., the DID)<br>

to tell apart datasets.  Which is a reasonable use case, I&#39;d say.<br></blockquote><div><br></div><div>Ok, this is the part where my bias leads me to think that there is no practical use for an identifier unless it&#39;s actionable (and therefore resolvable).  It seems to me that in practice you are suggesting that the services that emit these identifiers are able to resolve them at some level, but there is no general normative resolution strategy defined by VO standards.  Note that I am ready to accept your argument that this ain&#39;t necessarily so, and if so I will keep my peace, but it would be nice to have some clarity on this IMHO.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

Now, SSAP said something on how they were to be formed, in a fashion<br>

that was later criticized by Norman; one reason I went to the<br>

trouble of revising Identifiers was an attempt to fix what Norman<br>

criticized.  For the unique-in-union-of-service-responses use case,<br>

that form may not even matter, so I&#39;m (for a change) not blaming<br>

SSAP.  I&#39;m just saying that *if* we want to do other things with the<br>

PubDIDs, and with Datalink we&#39;re starting to do that, we&#39;re smart if<br>

we don&#39;t bend URI rules.<br></blockquote><div><br></div><div>Agreed, I remember that well, and I&#39;m glad to see the realignment of IVOIDs with URIs.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

Now, if we follow URI rules, we suddenly can do some<br>

cute^H^H^H^Hpotentially useful things.  Parsing PubDIDs into Registry<br>

and local part is one of these cute things.  It&#39;s not the reason why<br>

the PubDIDs are there, it&#39;s something that happens to become possible<br>

as we give the URI parts meaning (for perspective: This is where it<br>

started:<br>

<a href="http://mail.ivoa.net/pipermail/registry/2014-January/004905.html" rel="noreferrer" target="_blank">http://mail.ivoa.net/pipermail/registry/2014-January/004905.html</a>, and<br>

what moderate response there was to the questionnaire essentially<br>

advocated something like what&#39;s in now, except of course the<br>

resolution procedure is entirely my fault).</blockquote><div><br></div><div>Ah, thanks, I admit I didn&#39;t chime in when you asked for input back then.  My bad.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class=""><br>

&gt; Here is why I think this is a problem:<br>

&gt;<br>

&gt; - the spec seems to suggest that resolving these is more a matter of<br>

&gt; heuristics than anything else, so different implementors may choose to tweak<br>

&gt; the logic in ways that are not totally consistent<br>

<br>

</span>True.  Unless we mandate a common interface on all services using<br>

PubDIDs (either on the service interfaces themselves, or in a<br>

separate, say, datalink capability), I think there&#39;s little we can do<br>

about this.  Of course it&#39;d have been great if we could just say<br>

&quot;grab this, this, and this capability and then query<br>

&lt;endpoint&gt;?ID=pubDID&quot;, but I guess we don&#39;t want to change the<br>

respective standards, least of all for something that&#39;s probably not<br>

going to be an important use case in the first place.</blockquote><div><br></div><div>Well you could imagine a scenario in which you say &quot;if you are going to mint pubDIDs, then you must provide a service for resolving them.&quot;  The service could very well be a Datalink endpoint, but in theory it could also be something else which returns standard metadata, and it would have to be defined in the Registry.  Based on my quick read of the Datalink standard I see no reason why it couldn&#39;t provide the kind of resolution I&#39;m thinking about, but I admit that I don&#39;t fully understand all the details.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class=""><br>

&gt; - even if the recipe were prescriptive, an addition of a new capability<br>

&gt; (e.g. Datalink in addition to SSA) could potentially change the way a<br>

&gt; particular DID is resolved, thus yielding a different result at a later time<br>

<br>

</span>True.  I&#39;d claim this is an advantage.  I don&#39;t mean PubDIDs to<br>

supplant DOIs, I don&#39;t think people should be referencing them, at<br>

least not in the (annoying) &quot;performance metric&quot; abuse of doing<br>

citations; as to &quot;here&#39;s what you need to reproduce my results&quot;, I<br>

believe letting data providers go with progress is really an<br>

advantage.  If your result depended on the concrete data format, it<br>

was probably wrong in the first place...<br></blockquote><div><br></div><div>Ok, agree with you.  I wasn&#39;t trying to imply that there needed to be persistence associated with the results returned via the resolution process (or even persistence of the identifier).  So long as we agree that the semantics behind an identifier should not change I&#39;m fine (i.e. the &quot;thing&quot; that ivo://org.gavo.dc/feros/q/ssa?f04031.bdf points to is always the same entity, although its particular manifestations may change in time).</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

For actual work, I claim it&#39;s an advantage if the resolution result<br>

can change over time.  Consider a spectrum you got through SSA.<br>

After a while, the publication changes, and the thing goes to<br>

Obscore+Datalink.  What changes now is that you get more metadata<br>

(from obscore) and you get potentially a lot more data links, which<br>

might, for instance, tell you that further processing has been done<br>

or that there&#39;s now a flux calibrated version of your dataset.  If<br>

data providers are really careful, they might still provide the<br>

&quot;original&quot; dataset alongside the re-reduced ones in the datalink<br>

result.<br>

<br>

Of course, it may be that your legacy client can&#39;t deal with datalink<br>

results or can&#39;t speak TAP/Obscore.  Given the choice between this<br>

and evolvability, I&#39;d go for evolvability: If you will, this is<br>

intended as &quot;operational&quot;, not (really) &quot;curational&quot;.<br>

<span class=""><br>

&gt; - there is no hint of what would be returned when a client tries to resolve<br>

&gt; one of these DIDs, which I think is a problem for any application which<br>

&gt; wants to do something with them<br>

<br>

</span>True.  But that&#39;s really true of all those services.  The VO (perhaps<br>

unfortunately) lets people toss in datasets of all sorts.  The<br>

protocols mentioned let you discover a media type, so if you really<br>

wanted you could have additional information (of course, in ways<br>

depending on the access protocol -- sigh) before actually going for<br>

it, but that is, I believe, not something any existing VO client<br>

actually does.  As far as I can tell, all of them jump first and see<br>

if they fall later.<br>

<span class=""><br>

&gt; Ultimately I am still confused as to the role and usefulness of these DIDs:<br>

&gt; they are not persistent, are difficult to resolve (it seems), and there is<br>

&gt; no infrastructure for returning standard metadata about the resource that<br>

&gt; they point to (is this correct?).  Which makes me wonder why one would not<br>

&gt; want to use plain http URIs rather than DIDs for retrieval, or more durable<br>

&gt; identifiers if persistence and metadata registration is required.<br>

<br>

</span>Essentially, there can be a 1:n relationship between PubDIDs and<br>

access urls, for instance, when there&#39;s different formats of a<br>

dataset, or there&#39;s the thing itself and the associated datalink<br>

document.  This one is a simple example:<br>

<br>

<a href="http://dc.g-vo.org/ivoidval/q/didresolve/form?__nevow_form__=genForm&amp;pub_did=ivo%3A%2F%2Forg.gavo.dc%2Fferos%2Fq%2Fssa%3Ff04031.bdf" rel="noreferrer" target="_blank">http://dc.g-vo.org/ivoidval/q/didresolve/form?__nevow_form__=genForm&amp;pub_did=ivo%3A%2F%2Forg.gavo.dc%2Fferos%2Fq%2Fssa%3Ff04031.bdf</a><br>

<br>
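To make the 1:n relationship concrete, here is a minimal Python sketch; it is an illustration only, not part of any VO standard or client. It splits a PubDID at the first ? into a registry part and a local part (that split point is an assumption based on the form of the example identifier), percent-encodes the PubDID as in the link above, and groups access URLs by PubDID. The function names and every example.org URL are hypothetical.<br>

```python
from collections import defaultdict
from urllib.parse import quote


def split_pubdid(pubdid):
    """Split a PubDID into (registry part, local part) at the first '?'.

    Assumption: the local part is whatever follows the first '?'.
    """
    registry_part, _, local_part = pubdid.partition("?")
    return registry_part, local_part


def group_by_pubdid(records):
    """Map each PubDID to the list of access URLs reported for it."""
    grouped = defaultdict(list)
    for pubdid, access_url in records:
        grouped[pubdid].append(access_url)
    return dict(grouped)


PUBDID = "ivo://org.gavo.dc/feros/q/ssa?f04031.bdf"

# Percent-encode the whole PubDID so it can travel as one query parameter.
encoded = quote(PUBDID, safe="")

# Hypothetical rows, as they might come back from different services.
records = [
    (PUBDID, "http://example.org/ssa/f04031.fits"),
    (PUBDID, "http://example.org/datalink?ID=" + encoded),
    ("ivo://example.org/other?x1", "http://example.org/other/x1.fits"),
]

grouped = group_by_pubdid(records)
registry_part, local_part = split_pubdid(PUBDID)
```

With these rows, grouped maps the FEROS PubDID to two access URLs (the dataset itself and a datalink document), which is the kind of situation described above.<br>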

PubDIDs let you detect such situations even in large bags of data<br>

from different services.  That, I think, is the entire reason why they<br>

were introduced.  And given the requirements on DOI-referenced<br>

datasets I claim we can&#39;t use DOIs for that purpose.<br></blockquote><div><br></div><div>Agree.  Just to be clear: I&#39;m not suggesting the use of DOIs in place of PubDIDs at all.  I&#39;m simply trying to explore if and how we can use existing VO infrastructure to solve some of the problems related to dataset publication and preservation.  And none of the use cases I have in mind include a one-to-one assignment of a DOI to a PubDID. </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

That, of course, doesn&#39;t mean we have to pretend PubDIDs are<br>

resolvable.<br>

<br>

The part about PubDID resolution was by far the most contentious one<br>

of the whole standard.  Since global PubDID resolution is, I believe,<br>

more a gimmick than something centrally important, I could well<br>

leave it out.  The procedure described would still work, so there&#39;s<br>

not even any harm done.<br></blockquote><div><br></div><div>I would suggest that we should at least consider the scenario where resolution is assured under certain circumstances (which are under the control of the data provider).  This could be simply indicated by the presence of a Datalink endpoint with an optional attribute.  Why bother with this?  Because if I know that I have a resolution service which emits standard metadata records then I can at least begin to contemplate registering collections of such identifiers with a persistent id some day.  If instead these pubDIDs aren&#39;t actionable then I&#39;ll be looking to build these collections out of HTTP URIs or something else.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

So, here&#39;s my offer: If you want this out and care enough, speak up<br>

(or say: put it into an appendix and have a fat (red?)<br>

&quot;non-normative&quot; in its title).  If you think it&#39;s cute and it can<br>

remain in (and care enough), speak up, too.  I&#39;ll take private votes<br>

if you&#39;re shy, and will summarise on-list if necessary.<br>

<br>

If there&#39;s no signal, I&#39;d take the liberty to take PubDID resolution<br>

into TCG review and let them shoot it down if they want.  If there&#39;s<br>

mainly negative signals, I&#39;ll take it out without further griping.<br></blockquote><div><br></div><div>Well, I spoke up, so you know my point of view.  Is it silly to think that the resolution bit belongs in a separate spec? (And is it realistic to think that that spec will get written anytime soon?)  I note that RFC 3986 does not discuss the actual resolution mechanism except for the relative reference within a URI, so I think the document can stand as is without the section in question.</div><div><br></div><div>Cheers,</div><div>-- Alberto</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

Cheers,<br>

<br>

           Markus<br>

<br>

</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">Dr. Alberto Accomazzi<br>Program Manager<br>NASA Astrophysics Data System - <a href="http://ads.harvard.edu" target="_blank">http://ads.harvard.edu</a><br>Harvard-Smithsonian Center for Astrophysics - <a href="http://www.cfa.harvard.edu" target="_blank">http://www.cfa.harvard.edu</a><br>60 Garden St, MS 83, Cambridge, MA 02138, USA</div>

</div></div></div>