Identifiers 2.0 pro-/regress

Tue Jan 27 15:34:16 CET 2015

Dear Registry WG,

Those subscribed to Volute log messages will have noticed I've just
spent some quality time further revising the Identifiers document, and
only now am I starting to appreciate the various subtleties.

In fact, IVORNs are a fairly restricted subset of URIs.  For
instance, percent-encoding is fairly obviously not supported in
IVORNs.  Now, this may not be a big deal, but frankly I'd have fairly
happily dumped a %E1 (or %20) into an IVORN if the need had
arisen[1].  And so it's actually fairly involved to see if something
that's fine as a URI really qualifies as an IVORN.

Frankly, I'm not quite sure why we should go to the trouble of
overriding all the perfectly good BNF from RFC 3986.  Sure, we should
impose some restrictions (nothing funny in the authorities, no queries
and fragments in whatever you resolve in a registry,...), but I'd
think we should say "IVORNs are run-of-the-mill RFC 3986-type URIs,
*except* (a), (b), and (c)" rather than trying to pretend Identifiers
could possibly work without RFC 3986.

Now, that would in the end amount to a complete rewrite of the
document, and before I even embark on the journey there:

(a) If any of the old authors still remember and care to comment: Why
wasn't it written as URI-RFC plus exceptions in the first place?  Is
there anything I'm missing badly?

(b) I'd try to arrive at a document that's not *terribly* more
permissive than what we have now, but I guess a few additional IVORN
forms would become legal.  My main spot of trouble
right now is percent-encoding.  It's a hassle and interoperability
hazard if we allow it (because it makes comparisons difficult if
you're not careful), but if we disallow it we severely restrict
what kind of identifiers can be used, and people will percent-encode
after all the first time there's a blank in one of their file names.
Any advice?

(c) Although this is version 2.0 and as such some breakage could be
accepted, I had wanted to keep things that actually are in use the way
they are.  But now that it turns out we need to dig a bit deeper anyway:
What about IVORN comparison?  In effect, we've claimed IVORNs are to
be compared case-insensitively (though it turns out that Identifiers
1.x only had a "should" there).  For quite some time now I've been
peddling the idea that this ruling is, as all case-folding things
are, a big pain in the neck.  Do we want to get rid of it?  I'd like
to as a matter of principle, but I shudder when I think of the chaos
that will ensue with current standards written to deal with the
painful case insensitivity (*cough* RegTAP *cough*).  If someone has
strong feelings there, I'd like to hear about them.[2]
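
To make the comparison worries in (b) and (c) a bit more tangible,
here is a rough Python sketch.  The IVORNs and the authority
example.org are made up for illustration, and the three helper
functions are hypothetical names, not anything Identifiers or RegTAP
prescribes; the only library call used is urllib.parse.unquote from
the standard library.

```python
from urllib.parse import unquote

# Hypothetical IVORNs; example.org is not a real registry authority.
plain = "ivo://example.org/my file"
encoded = "ivo://example.org/my%20file"
shouting = "ivo://EXAMPLE.ORG/My%20File"


def naive_equal(a, b):
    """Plain string comparison, the cheapest thing a client can do."""
    return a == b


def folded_equal(a, b):
    """Case-insensitive comparison, without touching percent-encoding."""
    return a.lower() == b.lower()


def decoded_equal(a, b):
    """Case-insensitive comparison after percent-decoding both sides."""
    return unquote(a).lower() == unquote(b).lower()


print(naive_equal(plain, encoded))      # the two spellings differ as strings
print(folded_equal(encoded, shouting))  # case folding alone handles this pair
print(folded_equal(plain, encoded))     # ...but not the percent-encoded blank
print(decoded_equal(plain, shouting))   # decoding plus folding matches them
```

Note that blindly decoding everything is itself questionable: RFC 3986
does not consider a reserved character and its percent-encoded form
equivalent, so a real rule would have to be more careful than
decoded_equal is.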

So much for a minor update...

Cheers,

       Markus

[1] It may be reassuring that the current registry does not seem to
contain resources with ids that have % in them...

[2] For inspiration on the world of pain that is URL comparison,
enjoy section 6.2 of http://www.ietf.org/rfc/rfc3986.txt; it's clear
that our Identifiers spec will have to unambiguously say what part of
the "comparison ladder" VO applications will have to climb; saying
"simple string comparison must be enough, so you're not allowed to
percent-encode anything you don't absolutely have to" would be dandy,
but then you'd still have to case fold the hex characters in
pct-encoded sequences.  Oh my.
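
As a minimal sketch of that last point, assuming we went for "simple
string comparison after a tiny normalisation step": the function
below uppercases the hex digits in percent-encoded triplets (the form
RFC 3986 recommends) and leaves everything else alone.  The name
fold_pct_hex and the example identifiers are made up for illustration.

```python
import re

# Matches one percent-encoded octet such as %e1 or %E1.
_PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def fold_pct_hex(uri):
    """Uppercase the hex digits of every percent-encoded triplet.

    Nothing else in the string is changed, so identifiers that differ
    only in the case of their hex digits become identical strings.
    """
    return _PCT_TRIPLET.sub(lambda match: match.group(0).upper(), uri)


# Hypothetical identifiers differing only in hex-digit case.
lower_hex = "ivo://example.org/caf%e1"
upper_hex = "ivo://example.org/caf%E1"

print(lower_hex == upper_hex)
print(fold_pct_hex(lower_hex) == fold_pct_hex(upper_hex))
```

With this step applied to both sides, the plain == comparison is
enough for identifiers that differ only in the case of their hex
digits; it does nothing about the other rungs of the ladder.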