Identifiers 2.0 new internal WD
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Wed May 13 14:38:05 CEST 2015
Hi Norman,
On Tue, May 12, 2015 at 05:49:52PM +0100, Norman Gray wrote:
> p4, just before Sect. 1.1: 'this standard sets the parameters left
> open defined for application use'. It looks as if this sentence
> has become a little garbled. Also, it would be useful (for more
> legalistic readers such as me) to highlight which
> 'implementation-defined' options are being defined. Do you mean
> simply 'we set the <scheme> to be "ivo:" '? In that case, I think
> this sentence looks more heavyweight than intended, and might be
> simply omitted, or replaced by a reference to Sect. 2.
Well, what I wanted to express here is that we define what our
authorities look like, how to compare our URIs, and so forth.
RFC 3986 actually gives you quite a bit of freedom in crafting your
URIs' shape and machinery to match your use cases.
As to the formulation, Mark had already noticed the sentence was
broken, and with his feedback the current formulation is:
  Essentially, this standard sets the parameters left open for
  application use by RFC 3986.
Do you think that better matches what I'm trying to say?
> If it's not _necessary_ to permit percent-encoding, it might be
> worth forbidding them -- this avoids worrying about edge cases such
> as decoding '?10%2521', which decodes to '?10%21', which _doesn't_
> decode to '?10!' (one must percent-decode at most once).
Version 1 did indeed entirely forbid percent-encoding, but I felt we
need to lift that requirement now that we admit that we're not only
dealing with references into the Registry any more. In particular, I
expect the query part in pubDIDs will frequently just be a partial
file system path on the operators' hard disks. That might very well
contain URI reserved or even non-ASCII characters.
Now, if we forbid percent-encoded characters (which sounds very
attractive to me, too), people in such situations will percent-encode
anyway, because they're so used to doing that from HTTP URIs, and
they'll be cross with us if we tell them their IVOIDs are invalid, or
they'll have to use horrible hacks and blame us for it.
Weighing things up, I figured allowing percent-encoded characters in
local parts but not in the IVORN was the path of least pain all
around.
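
To make the decode-at-most-once point concrete, here is a minimal sketch
using Norman's example and unquote from Python's standard urllib.parse.
The variable names are only illustrative; nothing here is taken from the
draft itself.

```python
from urllib.parse import unquote

local_part = "?10%2521"

# Decoding once is correct: %25 becomes a literal percent sign.
decoded_once = unquote(local_part)

# Decoding a second time is the mistake to avoid: it turns %21 into "!".
decoded_twice = unquote(decoded_once)

print(decoded_once)
print(decoded_twice)
```

A client that stores the once-decoded form and never decodes it again
avoids the edge case.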
> Sect. 2.3.3: you quite rightfully say that 'Naming authorities are
> discouraged from creating segments matching either "." or
> "..". Empty segments, resulting in two or more consecutive
> slashes or a trailing slash, are also discouraged.' Should this
> perhaps be SHOULD NOT? In fact, is there a real need for this to be
> other than 'MUST NOT'?
Probably not. This is again taken from version 1, and I've carefully
tried to avoid outlawing anything that might have been actually used
in version 1 IVOIDs. Checking the current registry as seen by GAVO,
there are no IVORNs with /./ or /../, and the three cases that
appear to have empty segments are obviously mistakes [I've already
informed the contact persons].
Hence I'd say there's no real need. I'd make this a MUST NOT if
nobody protests within the next few days.
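
For what it's worth, a check of this kind could look roughly like the
sketch below. The function name is hypothetical, and it assumes it is
handed only the path part of an identifier, including the leading slash.

```python
def has_discouraged_segments(path):
    """Return True if path contains a ".", "..", or empty segment.

    path is assumed to be the path part only, e.g. "/some/resource".
    An empty segment shows up as "//" or as a trailing slash.
    """
    # Drop the empty string that split() yields before the leading slash.
    segments = path.split("/")[1:]
    return any(seg in ("", ".", "..") for seg in segments)
```

With this, "/a/./b", "/a/../b", "/a//b" and "/a/b/" are all flagged,
while "/a/b" passes.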
[Reminder: You can also still protest against me dropping the
recommendation to present authorities in all-lower case, which I'll
otherwise do as well].
> Sect 2.3.3: 'VO applications MUST be case-insensitive when handling
> resource keys.' The word 'handling' is a little vague, it seems to
> me. How about 'All processing of the IVORN <authority> and <path>
> MUST be case insensitive but case-preserving.' Are applications
Hm, no, the case-preserving would be asking too much. In RegTAP, for
instance, we're lowercasing all IVORNs (we can do that there because
there are no local parts in the identifiers there by definition), and
there's not really much else we can do to make sure we have
case-insensitive processing within the database engine (not only of
IVOIDs but also of other items defined to be case-insensitive like
UCDs).
As always with case folding, this would be a better world if IVORNs
hadn't been born case-insensitive, but I don't think there's a way to
change case folding properties now (except for the local parts, which
haven't really been properly constrained so far and where the
consequences of case folding would be particularly dire).
> required to recognise 'IVO:' as an IVOID scheme? (I think the
> answer is yes, by RFC 3986) Are they obliged to serialise it as
Right, RFC 3986 says scheme parts are case-insensitive.
> 'ivo:' (ie, not be case-preserving) (I think the answer is yes, by
> a principle of minimising surprises).
Uh... Well, applications shouldn't care, so I see little cause to
constrain that. What would be the rationale to demand lower-casing?
Are you thinking of allowing constructs like
  if incoming_uri.startswith("ivo://"):
    # special ivoid-handling code
(i.e., people shouldn't be forced to actually parse the URI)? I'd
not lose sleep about requiring people who do that kind of thing to
throw in an extra .lower().
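
As a sketch of the alternative, parsing the URI with the standard
library gets the case folding of the scheme for free, since
urllib.parse.urlsplit returns the scheme in lower case. The function
name is only illustrative.

```python
from urllib.parse import urlsplit


def is_ivoid(uri):
    """Return True if uri uses the ivo scheme, in any capitalisation."""
    # urlsplit lower-cases the scheme, so "IVO://..." matches as well.
    return urlsplit(uri).scheme == "ivo"
```

The quick-and-dirty variant would be
incoming_uri.lower().startswith("ivo://").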
> In fact, since Sect. 2.3.4 says that applications mustn't change
> Query case, you could decide to apply this to the Fragment, too,
That has been the intention, and it's put like this in section 2.6.
I've put in some extra prose in 2.4.5 stressing that Fragments must
not be case-normalised either.
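
One way to picture the resulting rule (authority and path compare
case-insensitively, query and fragment are left alone) is the following
sketch. It is an assumption about how a client might implement the
comparison, not text from the draft, and the function name is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_for_comparison(ivoid):
    """Lower-case the case-insensitive parts of an identifier.

    Query and fragment are passed through untouched, since their
    case must not be changed.
    """
    parts = urlsplit(ivoid)
    return urlunsplit((
        parts.scheme,          # already lower-cased by urlsplit
        parts.netloc.lower(),  # authority
        parts.path.lower(),    # resource key
        parts.query,           # case preserved
        parts.fragment,        # case preserved
    ))
```

Two identifiers would then be considered equal if their normalised forms
are equal as strings.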
> The detail in these rules indicates that a validator, with a big
> set of test cases, would be a useful thing to have. I imagine a
> small Java or Python program would suffice.
I'm planning to write such a thing as the WD goes out.
> Sect. 3: 'that is, IVORNs should not be reused.' This rules out an
> IVORN which refers to 'today's weather', unless you decide that
> 'today's weather' is a single logical resource even though the
> referent -- the data it is referring to, as opposed to its
> description -- changes from day to day. Is that intentional? I
> think such an IVORN _should_ be permitted, by the way.
From the formulation you and Marco arrived at and a point that Pierre
Le Sidaner made that we shouldn't have special rules for IVORNs here,
I've made this:
  In the context of IVOA identifiers, ``unique'' means that a given
  identifier MUST NOT refer to two different resources at any
  instant. Furthermore, the identifier SHOULD refer to at most one
  resource over all time; that is, IVOIDs should not be reused for
  unrelated resources. Note that a resource may potentially be
  dynamic (such as 'weather at telescope' or 'current version of the
  standard') -- here, there is a conceptually unique resource, even
  though the content of it may change in time.
I'll try to not go into much more detail here as an exhaustive
treatment would incur building a major ontological apparatus.
But part of the problem is that as far as identifiers go, we don't
actually distinguish between a resource and its description in the
VO, i.e., the identifier "points at both". That's highly nontrivial,
but we more or less must do that given the way we're using our
registries.
To keep things simple, I normally recommend forgetting about the
actual resource when discussing what identifiers should do.
Surprisingly, when you've found something that works naturally for
the Registry record, it tends to magically work for the resource,
too.
I'd claim that's also true in this case: If the description of what
you're having starts deviating badly from what it was, you shouldn't
have the same identifier for it. If the description still works,
it's fine to keep the identifier.
But of course, things aren't that simple, as illustrated by Marco's
example in one of the followups to this mail:
> [Marco:]
> My claim, when I thought of this before, was: if I use
> ivo://my.auth/resource for an SCS, and later for a TAP (or even a data
> collection), a change that is permitted as per the reasoning above,
> would this impact Apps? (in an ideal world the answer is: no)
In VOResource language, you've changed the record's capabilities (SCS
to TAP) or even its data type (TabularService to DataCollection).
*My* judgement would be that touching the capabilities is almost
certainly unproblematic. Changing the type feels as if it should be
identity-changing, but even there I'm not sure.
Anyway, I think if this really needs further discussion, it should
take place within reviewing VOResource and not Identifiers.
Norman -- I believe all your other points went into the document in
rev. 2955 unless they had been reported before.
Thanks for the review,
Markus