STC in VOResource records

Arnold Rots arots at head.cfa.harvard.edu
Thu Dec 14 11:27:48 PST 2006


I just had a brief discussion with Jonathan and we may have a solution
that is a variation on Ray's, but generalized and formalized.

The problem really is how we can ensure that associations that are
unique within a document remain unique when elements are copied to,
or concatenated into, a new document.

Let's assume, for the sake of argument, that we are using ID/IDREF
pairs, though that is not essential (as I said before, the issue is
that the association needs to be unambiguous, not what the particular
datatype is).

If we require that all IVOA documents contain a document URI, assigned
by the publisher, then we can solve the problem by setting a rule that
all ID and IDREF tags, when extracted from the document, should
receive the document URI as a prefix.
Another way of putting it is that all tags should be URIs, but that
the common root may be omitted, provided that it is presented in a
document URI.

So, the STCResourceProfile from this document:

  <MyResource  ... documentURI="ivo://ncsa/MyResource">
     ...
     <STCResourceProfile>
        <AstroCoordSystem xlink:type="simple"
                          xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
                          id="UTC-FK5-TOPO"/>
        <AstroCoordArea coord_system_id="UTC-FK5-TOPO">
           <AllSky/>
        </AstroCoordArea>
     </STCResourceProfile>
     ...
  </MyResource>

gets extracted into the registry as:

     <STCResourceProfile>
        <AstroCoordSystem xlink:type="simple"
                          xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
                          id="ivo://ncsa/MyResource/UTC-FK5-TOPO"/>
        <AstroCoordArea coord_system_id="ivo://ncsa/MyResource/UTC-FK5-TOPO">
           <AllSky/>
        </AstroCoordArea>
     </STCResourceProfile>
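
The extraction rule illustrated above could be sketched in Python roughly
as follows. This is only an illustration under stated assumptions, not a
proposed implementation: the function names are hypothetical, the
documentURI attribute is assumed to sit on the document's root element,
the sample document declares the XLink namespace so that it parses, and
only the two attributes shown in the example (id and coord_system_id) are
treated as association tags.

```python
import xml.etree.ElementTree as ET

# Attributes assumed to carry association tags (taken from the example above).
TAG_ATTRS = ("id", "coord_system_id")

# Sample document modelled on the example; the xmlns:xlink declaration is
# added here so that the fragment is parseable on its own.
SAMPLE = """
<MyResource xmlns:xlink="http://www.w3.org/1999/xlink"
            documentURI="ivo://ncsa/MyResource">
  <STCResourceProfile>
    <AstroCoordSystem xlink:type="simple"
                      xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
                      id="UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="UTC-FK5-TOPO">
      <AllSky/>
    </AstroCoordArea>
  </STCResourceProfile>
</MyResource>
""".strip()


def qualify(tag, document_uri):
    """Prefix a local tag with the document URI (hypothetical helper)."""
    return document_uri.rstrip("/") + "/" + tag


def extract_profile(document_xml):
    """Return the STCResourceProfile element with all tags made global."""
    root = ET.fromstring(document_xml)
    document_uri = root.get("documentURI")
    if document_uri is None:
        raise ValueError("document carries no documentURI")
    profile = root.find("STCResourceProfile")
    if profile is None:
        raise ValueError("document carries no STCResourceProfile")
    # Rewrite every id / coord_system_id found inside the profile.
    for elem in profile.iter():
        for name in TAG_ATTRS:
            value = elem.get(name)
            if value is not None:
                elem.set(name, qualify(value, document_uri))
    return profile


if __name__ == "__main__":
    print(ET.tostring(extract_profile(SAMPLE), encoding="unicode"))
```

Because the prefix comes from the publisher's own document URI, the
rewriting only has to happen once, at extraction time, and needs no
coordination between registries.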


We believe that this would be a global and general solution for all
associations, in the registry and elsewhere.

  - Arnold

Ray Plante wrote:
> Hi RWGers,
> 
> So we have a bit of a crisis to contend with regarding our use of STC
> within a VOResource record which is standing in the way of our upgrade
> to RI v1.0.  To catch folks up, I'm going to summarize the problem and
> review some useful input that others have made, and then try to
> conclude with our current set of alternatives.
> 
> I. The Problem
> 
> We use the Space-Time Coordinates schema (STC) to describe a resource's
> coverage of the sky, time, and frequency.  In STC, this is done by first 
> defining "coordinate systems" for each of these things and then listing 
> how the resource maps onto those systems.  A single, simple instance looks 
> like this:
> 
>      <stc:STCResourceProfile
>           xmlns="http://www.ivoa.net/xml/STC/stc-v1.30.xsd">
> 
>         <AstroCoordSystem xlink:type="simple"
>                           xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
>                           id="UTC-FK5-TOPO"/>
> 
>         <AstroCoordArea coord_system_id="UTC-FK5-TOPO">
>            <AllSky/>
>         </AstroCoordArea>
> 
>      </stc:STCResourceProfile>
> 
> The <AstroCoordSystem> defines a system on the sky by referring to a 
> "standard system", via the xlink attributes.  The <AstroCoordArea> 
> describes the actual coverage on that system.  The two are linked through 
> the id value, "UTC-FK5-TOPO", which by convention, matches the local 
> identifier part of the xlink:href attribute.
> 
> An STC description may require multiple coordinate systems to describe its 
> coverage, so it needs a way to uniquely connect a particular coverage 
> description to a single coordinate system.  This is done with a little 
> XML magic by making <AstroCoordSystem>'s id of type xs:ID and 
> <AstroCoordArea>'s coord_system_id of type xs:IDREF.  For this to work, 
> there must be only one id="UTC-FK5-TOPO" in the entire document.
> 
> This is easily satisfied when we have single VOResource records; however, 
> the problem comes when we concatenate records into a single document. 
> If every record follows the conventional choice, there will be many 
> occurrences of id="UTC-FK5-TOPO".  We could change this convention; 
> however, we have to realize that the individual VOResource records are 
> created independently, so some coordination is needed to ensure 
> uniqueness.
> 
> Concatenation of VOResource records happens in two cases in the Registry 
> Interface: within a harvesting response and within a search query 
> response.  As Paul Harrison has pointed out, there is an analogous problem 
> with VOEvent's use of STC, so this is likely to be a more general problem.
> 
> II. Discussion
> 
> Paul Harrison posted this very useful summary of suggested alternatives:
> 
> On Tue, 5 Dec 2006, Paul Harrison wrote:
> > As I see it, there are several solutions to this:
> >
> > 1. The registry always rewrites the id and coord_system_id within a
> > single record with unique values - e.g. ascending integers for a
> > particular harvest set. This is relatively simple to implement; it
> > is rather a shame to lose the "human readable" ids, but the
> > document will be XML valid.
> >
> > 2. Gather all of the AstroCoordSystem definitions into a special
> > record and retain their human readable IDs and then do not emit the
> > individual AstroCoordSystem elements in the individual records -
> > though for a normal query to the registry (returning one record), it
> > must remember to insert the appropriate AstroCoordSystem(s) from the
> > special record. This would be an extra level of complexity in the
> > registry's housekeeping that it has not had to deal with so far
> > though.
> >
> > 3. Change the STC schema so that it does not use xs:ID and xs:IDREF
> > types for the cross referencing, but use xs:unique and xs:keyref
> > constraints to ensure integrity of the ids and references - this has
> > the advantage that the scope of the uniqueness can be defined rather
> > than it having to be global to the XML document, so that the ids
> > could be scoped to be unique just within each registry record. This
> > solution seems best to me as it retains XML parser checking of id
> > uniqueness, allows "human readable" ids within each record, and
> > requires no special processing by the registries.
> 
> Here are a few comments about these alternatives:
> 
> 1. Rewriting IDs.
> 
> This would have to be done at both publishing time and harvesting
> time since the IDs would have to be unique within the entire
> registry.  Note that you can't just take another registry's id
> as-is when you harvest; consider:
> 
>    o  you have to make sure that the remote registry's locally unique
>       id doesn't clash with yours.
>    o  when you reharvest a record, you don't know what has changed or
>       been added, so every id must at least be examined and perhaps
>       updated.
> 
> This might be made easier if we augment the id with the registry's
> IVOA ID; e.g.: id="nvo.ncsa/registry/5:UTC-FK5-TOPO".  In this case, we
> would only need to set the ID at publishing time; subsequent rewriting
> is not necessary.  Note that the ID part does not need to refer to the
> registry; it could be the ID of the resource itself.  If you used the
> resource id, then you shouldn't need the additional "/5".
> 
> My biggest misgivings are:
> 
>    o  this requires special processing for a special subset of records
>    o  we have to explain how (and why) to do this to publishers.  It's
>       not simple.
> 
> These are not insurmountable.
> 
> 2.  Restructure the records.
> 
> I believe Paul included this for completeness and for further
> illustrating the problem.  Nevertheless, this would require
> significant processing by both the sender and receiver to combine and
> then split the records.  So (unless I've misunderstood something),
> this is not particularly appealing.
> 
> 3.  Changing STC to use xs:keyref and xs:unique.
> 
> In principle this is possible because these constraints allow you to say
> that combinations of values--e.g. STC id and VOResource
> identifier--must be unique.  However, this would require coordination
> across these two schemas, which would break their respective designs.
> Any use of xs:keyref within just STC (I believe) would inevitably
> encounter the same problem.
> 
> III.  Current Options
> 
> We need a solution pretty much right away as this problem is standing
> in the way of our registry upgrade work.  I think the simplest
> solution available is Paul's suggestion #1, with the variation I
> suggest to incorporate the registry's (or the resource's) IVOA ID.
> 
> Arnold could, in principle, change the STC schema not to use the
> xs:ID/IDREF types.  It could retain the data model, but impose rules
> of uniqueness that are outside the capabilities of an XML
> Schema-aware parser to check; this would require an
> application-specific validator to check.  This is not unprecedented as
> we have this in VOResource now.  However, I'm not sure this is
> practical on a short timescale, and if the #1 solution above is
> viable, then changing the STC schema may not be wise or worth the
> extra validator development required.
> 
> If we assume #2 and #3 above are not viable (especially given our
> schedule), the only other option is to drop the use of STC altogether
> from VOResource until a solution can be found.  We still have the
> ability to point to a footprint service.  Personally, I'm not ready to
> go here, yet.  I'm not about to propose an alternate schema to STC
> (for one, this is not a quick solution).  More importantly, I'm not
> ready to drop an important set of metadata--coverage--recommended by
> the RM because of a technical glitch in STC.
> 
> In conclusion, if you guys agree that solution #1 is the way to go,
> then we will need to get out (quickly) a concise, unambiguous
> description of how to form and use these IDs.
> 
> cheers,
> Ray
> 
--------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
60 Garden Street, MS 67                              fax:  +1 617 495 7356
Cambridge, MA 02138                             arots at head.cfa.harvard.edu
USA                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------


