HiPS IVOID issue => proposed solution

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Tue Jan 19 18:26:55 CET 2016


Hi Pierre, dear greater VO,

On Tue, Jan 19, 2016 at 01:59:37PM +0100, Pierre Fernique wrote:
> May I suggest a constructive solution for the IVOID issue that I tried to
> expose in my two last mails, and raised by the current HiPS standardization
> process.

Let me start off by explaining what IVOIDs really are -- they are
recipes to locate something.  Just as your http URIs lead you to web
pages (or something else), IVOIDs should lead you to something.  Hence,
the rule that plain IVOIDs (i.e., without # or ?) resolve to Registry
records isn't new in Identifiers 2.0; it's been there all along.
It provides a guarantee for the first step in resource discovery.

What is (somewhat) new in 2.0 is that # and ? get actual semantics
(they have been stop characters before, so there's really no new
criterion for the resolvability of IVOIDs).
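To make the resolution rule concrete, here is a minimal Python sketch of what a client does before asking the Registry: cut the IVOID at the first '?' or '#' and look up what is left.  The function name is made up for illustration, and the sketch deliberately ignores the additional stop characters mentioned in footnote [1].

```python
def registry_part(ivoid: str) -> str:
    """Return the part of an IVOID that a client looks up in the Registry.

    Only '?' and '#' are treated as stop characters here; the few
    other stop characters Identifiers defines are ignored in this sketch.
    """
    cut = len(ivoid)
    for stop in "?#":
        pos = ivoid.find(stop)
        if pos != -1:
            cut = min(cut, pos)  # cut at whichever stop character comes first
    return ivoid[:cut]
```

For instance, registry_part("ivo://ivoa.net/std/sia#aux") yields "ivo://ivoa.net/std/sia", which is a single Registry lookup away from rich metadata.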

That you can build unique names based on the Registry infrastructure
is then a nice side effect.  Identifiers 2.0 gives you tools to do
that without damaging the Registry ecosystem (using ? or # depending
on what you're naming).

> Why not continue to authorize any *ivo://authority.id/A/B/C/etc* at the
> condition that the full id is VO resolvable or, /at least, a left prefix of
> it/.

Because it would confuse clients that try to resolve IVOIDs.  Up to
now, they just had to cut off anything behind a hash or a question
mark[1] and retrieve a rich set of metadata from the registry.

If they have to try a lot of IVOIDs in turn, always cutting off path
segments, they'll have a much harder time.  Also, if at least your
authority.id exists (which it of course should if you're aiming for
unique identifiers), every IVOID will eventually resolve to the
authority record, which is also not good (see below).

> For VO resources, a simple recursive resolution could be used for a metadata
> query. For instance, imagine that a client tries to retrieve metadata on
> "*ivo://CDS.VizieR/I/221/smc*". It queries the VO registry with this full
> id. The registry returns the information concerning the longest prefix found
> in the VO registry. In this case "*ivo://CDS.VizieR/I/221*" with a dedicated
> flag alerting the client that this sub-resource is unknown, but its
> "parent" is described.
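For comparison, the prefix-walking resolution proposed above might look like the following Python sketch.  The in-memory set is a hypothetical stand-in for the Registry and the IVOIDs in it are purely illustrative; each loop iteration stands for one Registry query.  Note how an IVOID that names nothing at all still "resolves" to the authority record, which is the problem described above.

```python
from typing import Optional, Set, Tuple


def longest_registered_prefix(ivoid: str, registered: Set[str]) -> Tuple[Optional[str], int]:
    """Strip trailing path segments until a registered IVOID is found.

    Returns (matching IVOID or None, number of lookups made).  IVOIDs
    are compared case-insensitively, so `registered` must contain
    lower-cased IVOIDs.
    """
    candidate = ivoid.lower()
    lookups = 0
    while True:
        lookups += 1  # one (hypothetical) Registry query per iteration
        if candidate in registered:
            return candidate, lookups
        head, sep, _ = candidate.rpartition("/")
        if not sep or head.endswith("/"):  # only "ivo://" would be left
            return None, lookups
        candidate = head
```

With registered = {"ivo://cds.vizier", "ivo://cds.vizier/i/221"}, "ivo://CDS.VizieR/I/221/smc" takes two lookups to land on I/221, while the nonsensical "ivo://CDS.VizieR/no/such/thing" takes four and happily lands on the authority record.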

In the VO, this kind of recursion has traditionally been solved by
discovery services -- you don't want all images in the world in the
Registry, so you have a two-step process: The Registry contains the
discovery services, which then are used to discover the individual
datasets.

> Thanks to this more flexible constraint we are sure that any resource can
> continue to be identified with a simple, stable, evolutive and canonical
> (only one way to write the id) method. We avoid introducing the
> artificial separator "?" for delimiting "fragment" or "query syntax"
> (strange for an identifier and not necessarily canonical). And we will be able

Why do you feel a question mark is strange in an identifier?  As a
matter of fact, it's a very natural thing for "resource that's part
of a larger resource".  Google, for instance, has been using it to
identify their search results forever:
http://google.com/?q=identifier 

This is only half a joke.

The truth is that the query fragment was designed *exactly* for the
use case you're describing, and you're doing yourself and everyone
else a big favour if you just use it where appropriate.

> to manage the possible evolutions of the VO registry content without having
> to potentially modify the identifiers (for instance if the SMC table is, in
> the end, described itself in the VO registry). And we ensure that any
> existing IVOID has, at least, a declared authority id.

...which isn't terribly helpful -- it doesn't tell you anything you'd
really like to know about a VO resource: who made it, whom to complain
to about it, what it's about, where to access it, where to find more
information, etc.  So, no, it would really make the Registry useless,
which IMHO would be a shame.

Your SMC case is, I would claim, perfectly covered by the current
Registry DM -- the common metadata (authors, publishers, waveband,
coverage, description, access URLs, etc.) are the same for both
tables, and the individual metadata is covered in the tableset part
of the resource record.  It's close to perfect, and that's why I want
to go there in TAP table discovery.

I grant you it's not quite the same with HiPSes, because you don't
have the equivalent of tableset.  If you insisted, you could have a
HiPS capability that lets you enumerate HiPSes.  But I don't think
there's a major use case for that right now.

Let me again try to describe what I think you should be doing:

(1) define a standardID for a single HiPS.  Where there's one or only
very few HiPSes per VOResource, each would simply become a capability
in the corresponding registry record (e.g., VizieR tables).  The
actual IVOIDs associated with the HiPSes would be
<ivoid_of_main_record>?<anything> -- the anything could very well be
the table id for VizieR resources.  It would be nice if there were a
way to resolve these full IVOIDs to keep things tidy, but there are
no guarantees that IVOIDs with local parts are resolvable, so nothing
breaks if they aren't.
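Recipe (1) amounts to little more than string concatenation plus a sanity check.  The Python helper below is a hypothetical sketch, not part of any standard; it uses the local part verbatim and does not escape reserved characters.

```python
def hips_ivoid(main_ivoid: str, local_id: str) -> str:
    """Build <ivoid_of_main_record>?<local_id> for one HiPS.

    Hypothetical helper: local_id is used verbatim (no escaping).
    """
    if "?" in main_ivoid or "#" in main_ivoid:
        raise ValueError("main IVOID must be a plain Registry IVOID")
    if not local_id:
        raise ValueError("local part must not be empty")
    return f"{main_ivoid}?{local_id}"
```

So hips_ivoid("ivo://CDS.VizieR/I/221", "smc") gives "ivo://CDS.VizieR/I/221?smc"; cutting that at the '?' leads a client straight back to the Registry record for I/221.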

To discover these HiPSes, you would search the capability table in
RegTAP and from there, you'd easily get all the metadata you might
want, from titles to alternative access modes.
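As a rough sketch of that RegTAP search, the Python function below only builds the ADQL text.  The HiPS standardID in the example is hypothetical, since defining one is exactly what step (1) asks for; the tables and columns are RegTAP's rr.resource, rr.capability and rr.interface, joined the way RegTAP's own examples do.

```python
def hips_discovery_query(standard_id: str) -> str:
    """Return ADQL listing resources that declare a given capability.

    RegTAP stores IVOIDs (standard_id included) in lower case, so the
    literal is lower-cased as well; single quotes are doubled for ADQL.
    """
    literal = standard_id.lower().replace("'", "''")
    return (
        "SELECT ivoid, res_title, access_url\n"
        "FROM rr.capability\n"
        "NATURAL JOIN rr.resource\n"
        "NATURAL JOIN rr.interface\n"
        f"WHERE standard_id = '{literal}'"
    )
```

The resulting string, e.g. for the made-up "ivo://example.org/std/HiPS", could then be sent to any RegTAP endpoint with a TAP client such as pyvo.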

(2) for services that have a lot of resources, you should really
use a second-level discovery protocol (why not just use SIA?). These
services would get registered and have proper IVOIDs allowing the
discovery of all the relevant metadata and, in a second step, the
HiPSes themselves.  The HiPSes in there would have the normal
<main_service_id>?<some_local_id> IVOIDs as per Identifiers 2.0.

Locating those HiPSes then requires a two-step process -- but that's
just as with FITS files right now, and, I'd claim, completely
appropriate in that scenario.
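The two-step process of case (2) can be sketched in Python as follows.  Everything here is a hypothetical stand-in: the record class is a heavily trimmed caricature of a Registry record, and the discover callables play the role of a second-level discovery service (think SIA).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ServiceRecord:
    """Hypothetical, heavily trimmed stand-in for a Registry record."""
    ivoid: str
    waveband: str
    # Hypothetical second-level discovery call returning the local ids
    # of the HiPSes matching a target.
    discover: Callable[[str], List[str]]


def find_hipses(registry: List[ServiceRecord], waveband: str, target: str) -> List[str]:
    """Filter services by Registry metadata, then ask each for its HiPSes."""
    ivoids = []
    for rec in registry:
        # Step 1: Registry metadata decides which services are relevant.
        if rec.waveband != waveband:
            continue
        # Step 2: the service enumerates its HiPSes; each gets a
        # <main_service_id>?<some_local_id> IVOID.
        for local_id in rec.discover(target):
            ivoids.append(f"{rec.ivoid}?{local_id}")
    return ivoids
```

The Registry only ever holds the (few) service records; the (possibly very many) HiPSes live behind the services, which is why this scales where a central HiPS list would not.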

Trust me, it pays not to cut corners here.  In particular, case (2)
is what you really want as instruments might produce HiPSes in large
numbers because their individual datasets are just too large.  Your
central HiPS registry will just not scale to that situation.

While I'm already ranting, I'm not a big fan of extra, Registry-like
infrastructures for single purposes.  The arguments are essentially
those that speak against keeping GloTS as a permanent part of the VO
infrastructure, and I'm citing myself here
(http://ivoa.net/documents/Notes/DataCollect/20160108/NOTE-discovercollections-1.0-20160108.html#tth_sEc3.1):

  1. It introduces a separate mode of discovery, and typical clients
  would have to support both Registry- and GloTS-based discovery.

  2. It bypasses metadata dissemination by the proven OAI-PMH protocol,
  thus requiring regular crawls of all TAP services and updates of all
  their metadata although the vast majority of data will not have
  changed.

  3. Most importantly, the metadata model of TAP_SCHEMA is much less
  sophisticated than VOResources. Common query modes like searches by
  author or instrument would either not work at all or would have to
  rely on conventions for how to write table or schema descriptions.

-- mutatis mutandis, this applies to your HiPS registry, too.

I'm happy to drop over to Strasbourg for a nice afternoon of spec
hacking if you think it helps working out how to snugly fit HiPSes
into the VO.

Oh, and to prevent misunderstandings:

On Mon, Jan 18, 2016 at 08:09:51PM +0100, Pierre Fernique wrote:
[in the second mail, the "constraint" being resolvability of IVOID
Registry parts]
> So we can decide, as you did in your last note, to have a "cavalier
> approach" (section 3 - http://ivoa.net/documents/Notes/DataCollect/20160108/NOTE-discovercollections-1.0-20160108.pdf)
> and just ignore the constraint. But I would like to use a more coherent
> approach. Maybe we just have to change the word "MUST" to "SHOULD" in the
> IVOA PR-identifier 2.0 document - section 2.4 ?

The "cavalier approach" concerns identifiers like

  ivo://ivoa.net/std/sia#aux

-- these do resolve (to StandardsRegExt records) when you cut away
the fragment, which is what Identifiers requires and required.

Resolution with the fragment included is not guaranteed (by the
registry infrastructure), so this is still in line with Identifiers.
There SHOULD be a StandardKey "aux" in ivo://ivoa.net/std/sia,
though, and there isn't, because there's been a gentleman's agreement
that standards don't create keys in other standards' Registry records
ever since StandardsRegExt came out.  This, incidentally, is why all
the terms relevant for describing TAP services (e.g., upload methods)
have TAPRegExt IVOIDs, not TAP ones.
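The difference between "the record resolves" and "the key exists" can be sketched in Python.  The mapping below is a hypothetical, illustrative snapshot mirroring the situation described here (no aux key in the SIA record, upload-method keys living in TAPRegExt), not actual Registry content.

```python
from typing import Dict, Set, Tuple


def check_standard_ivoid(ivoid: str, standards: Dict[str, Set[str]]) -> Tuple[bool, bool]:
    """Return (record_resolves, key_declared) for a standard IVOID with a fragment.

    `standards` maps lower-cased StandardsRegExt record IVOIDs to the
    StandardKey names they declare (illustrative data only).
    """
    base, _, key = ivoid.partition("#")
    keys = standards.get(base.lower())
    if keys is None:
        return False, False  # not even the Registry part resolves
    return True, key in keys
```

With standards = {"ivo://ivoa.net/std/sia": set(), "ivo://ivoa.net/std/tapregext": {"upload-inline"}}, "ivo://ivoa.net/std/sia#aux" gives (True, False): the Registry part resolves, as Identifiers requires, while the key itself is missing.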

The cavalier approach here is ignoring that gentleman's agreement[2].
Which is, admittedly, odious and something reputable persons would
avoid.  But nothing breaks if you do this, and it's highly useful, so
I indulged in this little (!) sleight-of-hand.

It's a bit like serving a web page with invalid HTML, say, img as a
child of body in XHTML.  It's not nice, but everyone has been doing
it now and then as long as the browser renders it ok.

What you propose is to sanction 404s in everyday practice.

Well, now that I consider what I've just written I notice... Oh.
Never mind.  Just let's not do it.

Cheers,

         Markus

[1] strictly speaking, there are a few more stop characters, but
never mind that for now.

[2] apologies for the strongly gendered metaphors here.  I admit
they're indicators I'm on thin ice.