HiPS IVOID issue => proposed solution

Pierre Fernique Pierre.Fernique at astro.unistra.fr
Mon Jan 25 19:28:40 CET 2016


Hi Markus, and everyone involved in HiPS and the Registry,


/HiPS identifier issue => should now be solved/
It is clear that we (the HiPS developers) had a wrong conception of the 
IVORN (now IVOID). We believed that we could use it as a stable and 
uniform mechanism for identifying astronomical resources. But as you said, 
"/they are recipes to locate something/", not to identify something (you 
are right: URI != URL). The two concepts seem similar but differ. I 
understand now that we will have difficulty using it the way we need to 
(notably because the IVOID is still evolving).

Also, forcing an a priori declaration of every HiPS in the VO registry 
just to obtain an identifier is probably not a good idea, for three reasons:
1) We risk ending up with a lot of "prototype" HiPS VO registrations 
that are never maintained afterwards (as happened with cone search 
resources in the early days of the VO).
2) I'm not sure that all HiPS providers will agree.
3) The granularity of the VO registry is probably not always suited to 
such HiPS cases (see below).

So, to avoid blocking the HiPS standardization process and the deployment 
of HiPS-aware code, we prefer to use another HiPS identification 
mechanism, no longer based on IVOID:
1) The publisher_did in the HiPS properties record (HiPS metadata file) 
will now be optional and no longer used for identifying the HiPS.
2) The HiPS internal unique identifier will now be built by 
concatenating publisher_id and obs_id, without the ivo:// prefix.
3) We are modifying the various codes and data affected by this 
change (Hipsgen.jar, Hipsgencat.jar, Aladin V9, Aladin Lite, 
MocServer, and various scripts).
4) I will send a memo to all HiPS actors asking them to upgrade their 
code and synchronize their HiPS data with this change (notably, 
mirroring between the HiPS sites should be suspended to avoid 
duplication until the situation is cleaned up).
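
To make item 2 concrete, here is a minimal sketch of how such an identifier could be built. It is only an illustration: the function name, the "/" separator and the sample values are assumptions of mine, since the rule above only says "concatenation" and does not fix a separator.

```python
IVO_PREFIX = "ivo://"


def hips_internal_id(publisher_id: str, obs_id: str, sep: str = "/") -> str:
    """Join publisher_id and obs_id into one identifier, dropping any ivo:// prefix.

    The "/" separator is an assumption; the rule only says "concatenation".
    """
    authority = publisher_id.strip()
    if authority.lower().startswith(IVO_PREFIX):
        authority = authority[len(IVO_PREFIX):]
    # Trim stray slashes so the two parts are joined by exactly one separator.
    return authority.strip("/") + sep + obs_id.strip().strip("/")


# Hypothetical property values, for illustration only.
print(hips_internal_id("ivo://CDS", "P/DSS2/color"))  # prints CDS/P/DSS2/color
```

The result no longer looks like an IVOID, so a client should not try to resolve it in the Registry.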


/HiPS VO registry declaration => questions/
I need your help, Markus (and others involved in the Registry), 
concerning the best way to declare HiPSes in the VO registry 
(independently of the identifier issue above).

I am studying these two approaches, which are not necessarily exclusive:

 1. *Adding HiPS capabilities* to the VO resources already defined in
    the VO registry (your suggestion (1)). For instance, we could
    add two HiPS capabilities to the Simbad VO resource, one for HiPS
    access to Simbad at the CDS, and another for HiPS access to the
    Simbad Harvard mirror site => both of these HiPS accesses would
    refer to the same IVOID => ivo://CDS/Simbad. Good! /*[my first
    question]*/ Can you provide us with a template of the XML
    capability that we should use? Would it be similar to the cone
    search capability?
    This method seems more difficult to apply to the VizieR
    tables. As I tried to explain in my previous mail, the HiPS
    capabilities would not be inserted at the right level of the VO
    Registry records (catalogs, not tables). We would face the same
    problem that we have had for years with the cone search
    capabilities and footprint descriptions. In this case, /*[my second
    question]*/ do you recommend that we create several HiPS capabilities
    at the catalog level, one per table (possibly multiplied by the
    number of mirror sites)? Or maybe it is time to define each VizieR
    table individually in the VO registry? (This is under discussion in
    the CDS team.)

 2. *Declaring HiPS servers* in the VO registry (your suggestion (2)),
    letting each HiPS server enumerate the HiPSes that it
    provides. [This enumeration is presently provided as a
    list of properties records in a specific HiPS syntax; your SIA
    suggestion is unfortunately not really adaptable to HiPSes, I think,
    but that is another discussion.]
    With this method I do not see how the same HiPS could avoid being
    referred to by several different IVOIDs. For instance, the DSS
    colored HiPS is presently distributed by 3 HiPS servers located
    respectively at CDS, ESAC and IAS. If the 3 HiPS servers are
    declared in the VO registry, and following your IVOID building
    mechanism (<main_service_id>?<some_local_id>), each HiPS server will
    provide its own IVOID prefix and syntax, and we will end up
    with three different IVOIDs associated with the same resource:
    ivo://CDS/myhipsnode?id=DSSColor,
    ivo://ESAVO/theirhipsnode?theirid=DSSColor,
    ivo://IAS/yourhipsnode?yourid=DSSColor. So /*[my third question]*/
    what have I misunderstood? And if nothing, do we agree to have
    several IVOIDs for the same resource?
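
The concern in approach 2 can be shown with a small sketch. It cuts each of the three example IVOIDs at the first "?" or "#" (the resolution rule Markus describes in the quoted mail below) and groups the resulting Registry records by the local value. The helper names are hypothetical, and only "?" and "#" are handled, although Markus notes that a few more stop characters exist.

```python
from collections import defaultdict


def registry_part(ivoid: str) -> str:
    """Return the IVOID with everything from the first '?' or '#' removed."""
    for stop in ("?", "#"):
        ivoid = ivoid.split(stop, 1)[0]
    return ivoid


def local_value(ivoid: str) -> str:
    """Return the value after '=' in the query part, or '' if there is none."""
    _, _, query = ivoid.partition("?")
    return query.partition("=")[2]


# The three example IVOIDs from the text above.
ivoids = [
    "ivo://CDS/myhipsnode?id=DSSColor",
    "ivo://ESAVO/theirhipsnode?theirid=DSSColor",
    "ivo://IAS/yourhipsnode?yourid=DSSColor",
]

# Group the Registry records by the local value they carry.
by_local_value = defaultdict(set)
for ivoid in ivoids:
    by_local_value[local_value(ivoid)].add(registry_part(ivoid))

for value, records in by_local_value.items():
    print(value, "->", sorted(records))
```

The single local value DSSColor maps to three distinct Registry records, and nothing in the IVOIDs themselves says that they designate the same HiPS.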

Thanks for your help.
Cheers
Pierre



On 19/01/2016 18:26, Markus Demleitner wrote:
> Hi Pierre, dear greater VO,
>
> On Tue, Jan 19, 2016 at 01:59:37PM +0100, Pierre Fernique wrote:
>> May I suggest a constructive solution for the IVOID issue that I tried to
>> expose in my two last mails, and raised by the current HiPS standardization
>> process.
> Let me start off explaining what IVOIDs really are -- they are
> recipes to locate something.  Just as your http URIs lead you to web
> pages (or something else), IVOIDs should lead you to something.  Hence,
> the rule that plain IVOIDs (so, without # or ?) resolve to Registry
> records isn't new in Identifiers 2.0, it's been there all the time.
> It provides a guarantee on a first step in resource discovery.
>
> What is (somewhat) new in 2.0 is defining actual semantics to # and
> ? (they have been stop characters before, so there's really no new
> criteria on resolvability of IVOIDs).
>
> That you can build unique names based on the Registry infrastructure
> is then a nice side effect.  Identifiers 2.0 gives you tools to do
> that without damaging the Registry ecosystem (using ? or # depending
> on what you're naming).
>
>> Why not continue to authorize any *ivo://authority.id/A/B/C/etc* at the
>> condition that the full id is VO resolvable or, /at least, a left prefix of
>> it/.
> Because it would confuse clients that try to resolve IVOIDs.  Up to
> now, they just had to cut off anything behind a hash or a question
> mark[1] and retrieve a rich set of metadata from the registry.
>
> If they have to try a lot of IVOIDs in turn, always cutting off path
> segments, they'll have a much harder time.  Also, if at least your
> authority.id exists (which it of course should if you're aiming for
> unique identifiers), every IVOID will eventually resolve, to the
> authority record, which is also not good (see below).
>
>> For VO resource, a simple recursive resolution could be used for a metadata
>> query. For instance, imagine that a client tries to retrieve meta data on
>> "*ivo://CDS.VizieR/I/221/smc*". It queries the VO registry with this full
>> id. The registry returns the information concerning the longer prefix found
>> in the VO registry. In this case "*ivo://CDS.VizieR/I/221*" with a dedicated
>> flag alerting the client that this sub-resource is unknown, but its
>> "parent" is described.
> In the VO, this kind of recursion has traditionally been solved by
> discovery services -- you don't want all images in the world in the
> Registry, so you have a two-step process: The Registry contains the
> discovery services, which then are used to discover the individual
> datasets.
>
>> Thanks to this more flexible constraint we are sure that any resource can
>> continue to be identified with a simple, stable, evolutive and canonical
>> (only one way to write the id) method. We avoid introducing the
>> artificial separator "?" for delimiting "fragment" or "query syntax"
>> (strange for an identifier and not necessarily canonical). And we will be able
> Why do you feel a question mark is strange in an identifier?  As a
> matter of fact, it's a very natural thing for "resource that's part
> of a larger resource".  Google, for instance, has been using it to
> identify their search results forever:
> http://google.com/?q=identifier
>
> This is only half a joke.
>
> The truth is that the query fragment was designed *exactly* for the
> use case you're describing, and you're doing yourself and everyone
> else a big favour if you just use it where appropriate.
>
>> to manage the possible evolutions of the VO registry content without having
>> to potentially modify the identifiers (for instance if the SMC table is, in
>> the end, itself described in the VO registry). And we ensure that any
>> existing IVOID has, at least, a declared authority id.
> ...which isn't terribly helpful -- it doesn't tell you anything you'd
> really like to know about a VO resource: Who did it, who to complain
> to about it, what it's about, where to access it, where to find more
> information, etc.  So, no, it would really make the Registry useless,
> which IMHO would be a shame.
>
> Your SMC case is, I would claim, perfectly covered by the current
> Registry DM -- the common metadata (authors, publishers, waveband,
> coverage, description, access URLs, etc.) are the same for both
> tables, and the individual metadata is covered in the tableset part
> of the resource record.  It's close to perfect, and that's why I want
> to go there in TAP table discovery.
>
> I give you it's not quite the same with HiPSes, because you don't
> have the equivalent of tableset.  If you insisted, you could have a
> HiPS capability that lets you enumerate HiPSes.  But I don't think
> there's a major use case for that right now.
>
> Let me again try to describe what I think you should be doing:
>
> (1) define a standardID for a single HiPS.  Where there's one or only
> very few HiPSes per VOResource, each would simply become a capability
> in the corresponding registry record (e.g., VizieR tables).  The
> actual ivoids associated with the HiPSes would be
> <ivoid_of_main_record>?<anything> -- the anything could very well be
> the table id for VizieR resources.  It would be nice if there were a
> way to resolve these full IVOIDs to keep things tidy, but there are
> no guarantees that IVOIDs with local parts are resolvable, so there's
> no problem.
>
> To discover these HiPSes, you would search the capability table in
> RegTAP and from there, you'd easily get all the metadata you might
> want, from titles to alternative access modes.
>
> (2) for services that have a lot of resources, you should really
> use a second-level discovery protocol (why not just use SIA?). These
> services would get registered and have proper IVOIDs allowing the
> discovery of all the relevant metadata and, in a second step, the
> HiPSes themselves.  The HiPSes in there would have the normal
> <main_service_id>?<some_local_id> IVOIDs as per Identifiers 2.0.
>
> Locating those HiPSes then requires a two-step process -- but that's
> just as with FITS files right now, and, I'd claim, completely
> appropriate in that scenario.
>
> Trust me, it pays not to cut corners here.  In particular, case (2)
> is what you really want as instruments might produce HiPSes in large
> numbers because their individual datasets are just too large.  Your
> central HiPS registry will just not scale to that situation.
>
> While I'm already ranting, I'm not a big fan of extra, Registry-like
> infrastructures for single purposes.  The arguments are essentially
> those that speak against keeping GloTS as a permanent part of the VO
> infrastructure, and I'm citing myself here
> (http://ivoa.net/documents/Notes/DataCollect/20160108/NOTE-discovercollections-1.0-20160108.html#tth_sEc3.1):
>
>    1. It introduces a separate mode of discovery, and typical clients
>    would have to support both Registry- and GloTS-based discovery.
>
>    2. It bypasses metadata dissemination by the proven OAI-PMH protocol,
>    thus requiring regular crawls of all TAP services and updates of all
>    their metadata although the vast majority of data will not have
>    changed.
>
>    3. Most importantly, the metadata model of TAP_SCHEMA is much less
>    sophisticated than VOResources. Common query modes like searches by
>    author or instrument would either not work at all or would have to
>    rely on conventions for how to write table or schema descriptions.
>
> -- mutatis mutandis, this applies to your HiPS registry, too.
>
> I'm happy to drop over to Strasbourg for a nice afternoon of spec
> hacking if you think it helps working out how to snugly fit HiPSes
> into the VO.
>
> Oh, and to prevent misunderstandings:
>
> On Mon, Jan 18, 2016 at 08:09:51PM +0100, Pierre Fernique wrote:
> [in the second mail, the "constraint" being resolvability of IVOID
> Registry parts]
>> So we can decide, as you did in your last note, to have a "cavalier
>> approach" (section 3 - http://ivoa.net/documents/Notes/DataCollect/20160108/NOTE-discovercollections-1.0-20160108.pdf)
>> and just ignore the constraint. But I would like to use a more coherent
>> approach. May be, we have just to change the "MUST" word by "SHOULD" in the
>> IVOA PR-identifier 2.0 document - section 2.4 ?
> The "cavalier approach" is on identifiers like
>
>    ivo://ivoa.net/std/sia#aux
>
> -- these do resolve (to StandardsRegExt records) when you cut away
> the fragment, which is what Identifiers requires and required.
>
> Resolution with the fragment included is not guaranteed (by the
> registry infrastructure), so this is still in line with Identifiers.
> There SHOULD be a StandardKey "aux" in ivo://ivoa.net/std/sia,
> though, and there isn't, because there's been a gentleman's agreement
> that standards don't create keys in other standards' Registry records
> ever since StandardsRegExt came out.  This, incidentally, is why all
> the terms relevant for describing TAP services (e.g., upload methods)
> have TAPRegExt IVOIDs, not TAP ones.
>
> The cavalier approach here is ignoring that gentleman's agreement[2].
> Which is, admittedly, odious and something reputable persons would
> avoid.  But nothing breaks if you do this, and it's highly useful, so
> I indulged in this little (!) sleight-of-hand.
>
> It's a bit like serving a web page with invalid HTML, say, img as a
> child of body in XHTML.  It's not nice, but everyone has been doing
> it now and then as long as the browser renders it ok.
>
> What you propose is to sanction 404s in everyday practice.
>
> Well, now that I consider what I've just written I notice... Oh.
> Never mind.  Just let's not do it.
>
> Cheers,
>
>           Markus
>
> [1] strictly speaking, there are a few more stop characters, but
> never mind that for now.
>
> [2] apologies for the strongly gendered metaphors here.  I admit
> they're indicators I'm on thin ice.
