HiPS IVOID issue => proposed solution

Thu Jan 28 16:14:51 CET 2016

Hi,

On Mon, Jan 25, 2016 at 07:28:40PM +0100, Pierre Fernique wrote:
> I need your help Markus (and other Registry's involved persons) concerning
> the best way to declare HiPSes in the VO registry (independently of the
> identifier issue above).
> 
> I am studying these two ways, not necessarily exclusive:
> 
> 1. *Adding HiPS capabilities* to the VO resources already defined in
>    the VO registry (your suggestion (1)). For instance, we can imagine
>    to add two HiPS capabilities to Simbad VO resources, one for HiPS
>    simbad access to CDS location, and another HiPS capability to Simbad
>    Harvard mirror site => Both of these HiPS accesses will refer to the
>    same IVOID => ivo://CDS/Simbad. Good! [my first question] Can
>    you provide us a template of the XML capability that we should use ?
>    Similar to cone search capability ?

First off, this is not independent of what kind of discovery
scenarios you have in mind.  For what I think would be great -- I
discover a resource that looks interesting to me, and to get an idea
about its coverage and content, I send its access URL to a
HiPS-enabled client --, you do not need a special capability type.
All it takes is

<capability standardID="ivo://ivoa.net/std/hips#hips-1.0">
  <interface role="std" xsi:type="vs:ParamHTTP">
    <accessURL use="base"
      >http://example.com/hipses/data/band</accessURL>
  </interface>
</capability>

To make this really nice and proper, after publication of your
standards text, send a StandardsRegExt record to the RofR (if you
need assistance writing one, kindly ask me).  This would then be what
ivo://ivoa.net/std/hips points to, and it would have

  <key>
    <name>hips-1.0</name>
    <description>
      A HiPS cube accessible through HTTP.
    </description>
  </key>

in it (or analogous, I'm improvising based on incomplete knowledge
about how HiPS really works), which is what the standardID
references.

>    This method seems to be more difficult to apply for the VizieR
>    tables. As I was trying to explain in my previous mail, the HiPS
>    capabilities will not be inserted at the right level of the VO
>    Registry records (catalogs not tables). And we are facing the same
>    problem that we already have for the cone search capabilities and
>    footprint descriptions (for years). In this case [my second
>    question] do you recommend that we create several HiPS capabilities
>    at the catalog level, one per table (possibly multiplied by the
>    number of mirror sites)? Or maybe it is time to define each VizieR
>    table individually in the VO registry? (in discussion in the CDS team)

This is a bit of a philosophical question:  What's a "resource"?
The W3C people essentially say "if it's got a URI, it's a resource",
so that's not helpful.

Therefore, I recommend two criteria: If it's got common metadata,
it's one resource.  If users would expect to discover x1 together
with x2 because otherwise x1 or x2 don't make much sense, they should
be in one resource. That is, I'd claim, a bit more helpful, but of
course still full of ambiguities.  Indulge me for two examples:

Consider a catalog of cataclysmic variables, consisting of a table of
objects t_1 and a table of outbursts t_2.

One resource or two?  So, author and title are the same, presumably
also a resource-wide description.  But the tables of course can have
separate descriptions.  But, really, t_2 will have a foreign key into
t_1, so discovering it without t_1 won't help.  So, make one resource
out of it.  Indeed, tableset lets you attach detailed table (and
column) descriptions to each table.

I believe the situation is analogous in most VizieR tables -- but
really, there's no good reason why it couldn't be different for some.
Consider, for instance, two teams, each building one catalog of AGNs,
one in radio, the other in X-Ray, from original data obtained,
obviously, on two different instruments.  For the publication,
they've teamed up for some reason, so there's a bit of common
metadata.  But since there's so much differing metadata, my take
here would be: Make it two resources so you can properly support
discovery by wavelength or instrument.

After all that philosophy: I think it would make a lot of sense to
just include a capability as given above per HiPS in all but the
most pathological cases.  Capability lets you add a description
(about 30% of the capabilities in the current VO use that, so perhaps
clients should do more with these), so you could give hints on what's
going on like this:

  <capability standardID="ivo://ivoa.net/std/hips#hips-1.0">
    <description>HiPS for the table of X-Ray detections</description>
    <interface role="std" xsi:type="vs:ParamHTTP">
      <accessURL use="base"
        >http://example.com/hipses/data/xray</accessURL>
    </interface>
  </capability>

  <capability standardID="ivo://ivoa.net/std/hips#hips-1.0">
    <description>HiPS for the table of K-band detections</description>
    <interface role="std" xsi:type="vs:ParamHTTP">
      <accessURL use="base"
        >http://example.com/hipses/data/kband</accessURL>
    </interface>
  </capability>

  <capability standardID="ivo://ivoa.net/std/hips#hips-1.0">
    <description>HiPS for the table of detections with gravitational
      waves</description>
    <interface role="std" xsi:type="vs:ParamHTTP">
      <accessURL use="base"
        >http://example.com/hipses/data/gravity</accessURL>
    </interface>
  </capability>

I guess at least in WIRR we should think about a smart way to use
capability/description.
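
To illustrate what a client could do with capability/description,
here is a minimal Python sketch; it is not taken from any existing
client.  It assumes the improvised standardID used in the examples
above, and it wraps the capabilities in a made-up <resource> element
that declares the xsi prefix so the fragment parses on its own.  The
cone search capability and its URL are only there to show the
filtering; real registry records carry more structure than this.

```python
import xml.etree.ElementTree as ET

# Assumption: the standardID improvised in the examples above.
HIPS_STANDARD_ID = "ivo://ivoa.net/std/hips#hips-1.0"

# Hypothetical record fragment; <resource> is just a wrapper so that
# the xsi prefix is declared and the snippet is well-formed XML.
RECORD = """\
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <capability standardID="ivo://ivoa.net/std/hips#hips-1.0">
    <description>HiPS for the table of X-Ray detections</description>
    <interface role="std" xsi:type="vs:ParamHTTP">
      <accessURL use="base">http://example.com/hipses/data/xray</accessURL>
    </interface>
  </capability>
  <capability standardID="ivo://ivoa.net/std/ConeSearch">
    <interface role="std" xsi:type="vs:ParamHTTP">
      <accessURL use="base">http://example.com/scs?</accessURL>
    </interface>
  </capability>
  <capability standardID="ivo://ivoa.net/std/hips#hips-1.0">
    <description>HiPS for the table of K-band detections</description>
    <interface role="std" xsi:type="vs:ParamHTTP">
      <accessURL use="base">http://example.com/hipses/data/kband</accessURL>
    </interface>
  </capability>
</resource>
"""


def list_hips_capabilities(record_xml):
    """Return (description, access URL) pairs for each HiPS capability."""
    root = ET.fromstring(record_xml)
    found = []
    for cap in root.iter("capability"):
        # Skip capabilities of other standards (cone search, TAP, ...).
        if cap.get("standardID") != HIPS_STANDARD_ID:
            continue
        description = (cap.findtext("description") or "").strip()
        for url in cap.iter("accessURL"):
            found.append((description, (url.text or "").strip()))
    return found


for description, access_url in list_hips_capabilities(RECORD):
    print(f"{description}: {access_url}")
```

A client could show these descriptions next to the access URLs to
let the user pick the HiPS that matches the table of interest.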

> 2. *Declaring HiPS servers* in the VO registry (your suggestion (2)),
>    for letting each HiPS server enumerating HiPSes that it is
>    providing. [The way that this enumeration is provided is presently a
>    list of properties records in a specific HiPS syntax; your SIA
>    suggestion is unfortunately not really adaptable for HiPSes, I think
>    - but it is another discussion].

It is, indeed -- if we can avoid defining yet another protocol, that'd be
*really* great.  If SIA really doesn't work for you, perhaps ObsTAP
does?

>    In this last method I do not see how the same HiPS will not refer
>    potentially to several different IVOIDs. For instance the HiPS of
>    DSS colored is presently distributed by 3 HiPS servers located
>    respectively at CDS, ESAC and IAS. If the 3 HiPS servers are
>    declared in the VO registry, and following your IVOID building
>    mechanism (<main_service_id>?<some_local_id>) each HiPS server will
>    provide its own prefix and syntax of the IVOID and we will have,
>    at the end, three different IVOIDs associated to the same resource:
>    ivo://CDS/myhipsnode?id=DSSColor,
>    ivo://ESAVO/theirhipsnode?theirid=DSSColor,
>    ivo://IAS/yourhipsnode?yourid=DSSColor. So [my third question]
>    what have I misunderstood? And if nothing, do we agree to have several
>    IVOIDs for the same resource?

You have not misunderstood anything -- this is the nature of the
*publisher* DID -- it depends on who's publishing the dataset.  If
you look at, for instance, SSA, in the VO we've also had the notion
of a *creator* DID.  That is a unique id for a dataset, invariant
with respect to where it comes from, assigned by the creator.

Example: If the Aladin team builds HiPSes with the intent that they are
mirrored by other publishers, they would get themselves an IVOID,
ivo://cds/aladin, say, and then stamp their datasets 

ivo://cds/aladin?twomass/ks
ivo://cds/aladin?twomass/h
ivo://cds/aladin?twomass/j
ivo://cds/aladin?twomass/technicolor

As they distribute their datasets, this id would never change, even
if the publisher DID would.
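
As a rough Python sketch of why this helps a client: responses from
several servers can be reduced to one entry per dataset by keying on
the creator DID.  The record layout (plain dicts with "server",
"publisher_did" and "creator_did" keys) is an assumption made up for
this illustration, as are the local parts "dss/color" and
"theirid=2MASSKs"; the DSSColor publisher DIDs are the ones from the
question above.

```python
def unique_by_creator_did(records):
    """Keep the first record seen for each creator DID, in input order."""
    seen = {}
    for rec in records:
        # setdefault only stores rec if this creator DID is new.
        seen.setdefault(rec["creator_did"], rec)
    return list(seen.values())


# Hypothetical responses from three HiPS servers; the publisher DIDs
# differ per server, the creator DID stays the same for one dataset.
RESPONSES = [
    {"server": "CDS",
     "publisher_did": "ivo://CDS/myhipsnode?id=DSSColor",
     "creator_did": "ivo://cds/aladin?dss/color"},
    {"server": "ESAC",
     "publisher_did": "ivo://ESAVO/theirhipsnode?theirid=DSSColor",
     "creator_did": "ivo://cds/aladin?dss/color"},
    {"server": "IAS",
     "publisher_did": "ivo://IAS/yourhipsnode?yourid=DSSColor",
     "creator_did": "ivo://cds/aladin?dss/color"},
    {"server": "ESAC",
     "publisher_did": "ivo://ESAVO/theirhipsnode?theirid=2MASSKs",
     "creator_did": "ivo://cds/aladin?twomass/ks"},
]

for rec in unique_by_creator_did(RESPONSES):
    print(rec["creator_did"], "first seen at", rec["server"])
```

The three DSSColor entries collapse into one although their
publisher DIDs all differ.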

[Note, however, that actual mirrors probably should be handled
through different interfaces on the same capability and thus would
share a publisher, with the datasets served therefore having
identical pubDIDs regardless of the mirror; but that's, I guess, not
a pressing issue for HiPSes]

Creator DIDs so far haven't been used much.  But at least when looked
at from the outside, your use case looks like it is what they were
conceived for.

> HiPS identifier issue => normally solved
>
> It is clear that we (HiPS developers) had a wrong conception of the IVORN
> (IVOID now). We believed that we could use it as a stable and uniform
> astronomical resource identification mechanism. But as you said, "/they are
> recipes to locate something/", not to identify something (you are right: URI
> != URL). The two concepts seem similar but differ. I understand now that we

No, there is really no difference between URL and URI any more (see
RFC 3986).  And you cannot have one of uniqueness and discovery
without the other -- there's no way you can find out if some name is
already taken without some sort of Registry.  I still maintain you're
doing yourself a favour if you don't re-invent that Registry, even if
at first it may seem handing out some names to people you know anyway
can't be much of a problem.

> will have difficulties to use it as we need (notably because IVOID is still
> in evolution).

Uh, for the record, no, IVOID is *not* in evolution, the only thing
that has actually changed in the last 10 years[1] is that Identifiers
2.0 did away with the XML serialisation that nobody has used anyway.
The rest hasn't changed. Identifiers 2.0 is just a clarification
(or sanctioning of existing practice), and it filled in some gaps
left open by the old spec (comparison of IVOIDs with stop characters,
mainly).  I'd say stability over 10 years isn't at all bad for
something to do with computers.

> Also, forcing an "a priori" declaration in the VO registry of any HiPSes
> just for having an identifier is probably not a good idea for three reasons:
> 1) We have the risk to have a lot of "prototype" HiPS VO registrations -
> never maintained afterwards (as we had with cone search resources at the beginning
> of the VO)

In the creatorDID scheme above the only thing that's in the registry
is a record for the creator (aladin in the example above), probably
an organisation.  Those don't hurt, and as said above, they're really
necessary for uniqueness.

For test purposes, you could still register ivo://CDS/hips-testing
(and forgo uniqueness guarantees for HiPSes having that as Registry
part in their creator DID).

> 2) I'm not sure that all HiPS providers will agree.

That's a larger problem, but they'll have to agree to register their
id somewhere if you want to maintain uniqueness.  Why not use what's
already there?

> 3) The granularity of the VO registry is probably not always adapted in such
> HiPS cases (cf below)

[actually, above, now, because I first wanted to answer the questions]  

Oh, the granularity matters for discovery, but not for identification
purposes.  For that, I believe the creatorDID scheme above is as good
as anything you'd invent just for HiPSes.

> So to avoid jamming the HiPS standardization process and the deployment of
> the HiPS aware codes, we prefer to use another HiPS identification mechanism,
> no longer based on IVOID:
> 1) The publisher_did in the HiPS properties record (HiPS metadata file) will
> be now optional and no longer used for identifying the HiPS

If you want to use that id to un-dupe responses from several servers
then, yes, that is exactly what you should do.

> 2) The HiPS internal unique identifier will be now built by the
> concatenation of publisher_id and obs_id without the ivo:// prefix

Well, the publisher id could (and, to control uniqueness, probably
should) be an ivoid.  Then use a ? to concatenate it with the obs_id
as you propose, and presto, you have an ivoid, the uniqueness of
which is guaranteed by the Registry (in front of the question mark)
and the creator (after the question mark).

Nice, clean, no extra work required.  What's not to like?

Call that "internal identifier" a creator DID, and you're smack in
the middle of the VO mainstream.
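
The concatenation above can be sketched in a few lines of Python.
The function names are invented for this illustration, and the
checks are deliberately simple-minded; this is not a validator for
the Identifiers specification.

```python
def make_creator_did(publisher_id, obs_id):
    """Join a registered publisher IVOID and a local obs_id with '?'."""
    if not publisher_id.lower().startswith("ivo://"):
        raise ValueError("publisher_id should be an IVOID (ivo://...)")
    if "?" in publisher_id or "#" in publisher_id:
        raise ValueError("publisher_id must not carry a local part itself")
    if not obs_id:
        raise ValueError("obs_id must not be empty")
    return f"{publisher_id}?{obs_id}"


def split_creator_did(did):
    """Split a creator DID into (registry part, local part)."""
    registry_part, sep, local_part = did.partition("?")
    if not sep:
        raise ValueError("no '?' found, so there is no local part")
    return registry_part, local_part


did = make_creator_did("ivo://cds/aladin", "twomass/ks")
print(did)
print(split_creator_did(did))
```

The part in front of the question mark is what the Registry keeps
unique; the part after it is entirely up to the creator.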

> 3) We are modifying the various codes and data impacted by this evolution
> (Hipsgen.jar, Hipsgencat.jar, Aladin V9, Aladin Lite, MocServer, and some
> various scripts)
> 4) I will send a memo to all HiPS actors for upgrading their codes and
> synchronizing their HiPS data according to this evolution (notably, the mirror
> actions should be suspended amongst the HiPS sites to avoid duplication
> until the situation is cleaned up again).

If you're still convinced you have to go to all that trouble,
please let's have a quick chat on the phone -- I think there's not
terribly much you need to do to get your use cases covered without
having to invent new tech.

Cheers,

          Markus

[1] Approximately; I'd wager with PR-20040621 it's essentially been
there, though I've not really, formally, checked it.