STC in VOResource records

Arnold Rots arots at head.cfa.harvard.edu
Thu Dec 14 07:30:15 PST 2006


Just a quick comment.

The ID/IDREF pairs are used for two purposes in STC:
One is effectively to provide a substitution mechanism by allowing a
particular element to say that it is identical to another of the same
type in the same document.
The other is to provide an association mechanism between coordinates
and coordinate systems.

I don't think anything would be broken if these id/coordsys_id pairs
in the second application were changed to, say, strings or other
tokens, but the underlying uniqueness problem remains: if there are
multiple coordinate systems with the same id in the document, which of
these is the coordinate referring to?
Using ID/IDREF does not allow you to get into this situation, but if
you change the schema so it doesn't get caught, you are only hiding
the problem.
This is not really a problem with STC, it is a problem that is
inherent in the registry: if you pull XML elements from other sources
and there are associations defined, there is no way you can guarantee
their uniqueness unless you impose uniqueness on the association
identifiers, for instance through the mechanism Ray is suggesting.
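
To make this concrete, here is a minimal sketch in Python (standard
library only) of the collision and of one way to impose uniqueness by
prefixing each id with its record's identifier, along the lines Ray
describes. The <records>/<record> wrappers, the "identifier" attribute,
and the function names are assumptions made for illustration; they are
not part of STC or VOResource. Since xs:ID values have to be NCNames
(no "/" or ":"), the sketch replaces those characters in the prefix.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Two heavily trimmed, hypothetical records concatenated into one document.
# <AstroCoordSystem> and <AstroCoordArea> follow the example quoted below;
# the <records>/<record> wrappers and the "identifier" attribute are
# assumptions made only for this sketch.
RECORDS = """
<records>
  <record identifier="ivo://example.org/a">
    <AstroCoordSystem id="UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="UTC-FK5-TOPO"/>
  </record>
  <record identifier="ivo://example.org/b">
    <AstroCoordSystem id="UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="UTC-FK5-TOPO"/>
  </record>
</records>
"""

ID_ATTRS = ("id", "coord_system_id")


def duplicate_ids(root):
    """Return the coordinate system ids that occur more than once."""
    counts = Counter(el.get("id") for el in root.iter("AstroCoordSystem"))
    return sorted(name for name, n in counts.items() if n > 1)


def ncname_prefix(identifier):
    """Turn a resource identifier into something usable inside an xs:ID."""
    if identifier.startswith("ivo://"):
        identifier = identifier[len("ivo://"):]
    # xs:ID values are NCNames, so characters such as "/" and ":" are replaced.
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "_", identifier)
    # An NCName may not start with a digit, "." or "-".
    return cleaned if re.match(r"[A-Za-z_]", cleaned) else "_" + cleaned


def qualify_ids(root):
    """Prefix every id and coord_system_id with its record's identifier."""
    for record in root.iter("record"):
        prefix = ncname_prefix(record.get("identifier"))
        for el in record.iter():
            for attr in ID_ATTRS:
                value = el.get(attr)
                if value is not None:
                    el.set(attr, prefix + "." + value)


root = ET.fromstring(RECORDS)
print(duplicate_ids(root))   # the shared id shows up as a duplicate
qualify_ids(root)
print(duplicate_ids(root))   # no duplicates remain after prefixing
```

Note that the character replacement is lossy: two different
identifiers could in principle map to the same prefix, so a real
scheme would need a mapping that is guaranteed to be one-to-one.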

But I am open to suggestions regarding the STC schema that won't break
anything currently defined.

  - Arnold

Ray Plante wrote:
> Hi RWGers,
> 
> So we have a bit of a crisis to contend with regarding our use of STC
> within a VOResource record which is standing in the way of our upgrade
> to RI v1.0.  To catch folks up, I'm going to summarize the problem and
> review some useful input that others have made, and then try to
> conclude with our current set of alternatives.
> 
> I. The Problem
> 
> We use the Space-Time Coordinates schema (STC) to describe a resource's
> coverage of the sky, time, and frequency.  In STC, this is done by first 
> defining "coordinate systems" for each of these things and then listing 
> how the resource maps onto those systems.  A single, simple instance looks 
> like this:
> 
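>      <stc:STCResourceProfile
>           xmlns:stc="http://www.ivoa.net/xml/STC/stc-v1.30.xsd"
>           xmlns:xlink="http://www.w3.org/1999/xlink"
>           xmlns="http://www.ivoa.net/xml/STC/stc-v1.30.xsd">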
> 
>         <AstroCoordSystem xlink:type="simple"
>                           xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
>                           id="UTC-FK5-TOPO"/>
> 
>         <AstroCoordArea coord_system_id="UTC-FK5-TOPO">
>            <AllSky/>
>         </AstroCoordArea>
> 
>      </stc:STCResourceProfile>
> 
> The <AstroCoordSystem> defines a system on the sky by referring to a 
> "standard system", via the xlink attributes.  The <AstroCoordArea> 
> describes the actual coverage on that system.  The two are linked through 
> the id value, "UTC-FK5-TOPO", which by convention, matches the local 
> identifier part of the xlink:href attribute.
> 
> An STC description may require multiple coordinate systems to describe its 
> coverage, so it needs a way to uniquely connect a particular coverage 
> description to a single coordinate system.  This is done with a little 
> XML magic by making <AstroCoordSystem>'s id of type xs:ID and 
> <AstroCoordArea>'s coord_system_id of type xs:IDREF.  For this to work, 
> there must be only one id="UTC-FK5-TOPO" in the entire document.
> 
> This is easily satisfied when we have single VOResource records; however, 
> the problem comes when we concatenate records into a single document. 
> If every record follows the conventional choice, there will be many 
> occurrences of id="UTC-FK5-TOPO".  We could change this convention; 
> however, we have to realize that the individual VOResource records are 
> created independently, so some coordination is needed to ensure 
> uniqueness.
> 
> Concatenation of VOResource records happens in two cases in the Registry 
> Interface, within a harvesting response and within a search query 
> response.  As Paul Harrison has pointed out, there is an analogous problem 
> with VOEvent's use of STC, so this is likely to be a more general problem.
> 
> II. Discussion
> 
> Paul Harrison posted this very useful summary of suggested alternatives:
> 
> On Tue, 5 Dec 2006, Paul Harrison wrote:
> > As I see it, there are several solutions to this,
> >
> > 1. The registry always rewrites the id and coord_system_id within a
> > single record with unique values - e.g. ascending integers for a
> > particular harvest set - this is relatively simple to implement, but
> > is rather a shame to lose the "human readable" ids; however, the
> > document will be xml valid.
> >
> > 2. Gather all of the AstroCoordSystem definitions into a special
> > record and retain their human readable IDs and then do not emit the
> > individual AstroCoordSystem elements in the individual records -
> > though for a normal query to the registry (returning one record), it
> > must remember to insert the appropriate AstroCoordSystem(s) from the
> > special record. This would be an extra level of complexity in the
> > registry's housekeeping that it has not had to deal with so far
> > though.
> >
> > 3. Change the STC schema so that it does not use xs:ID and xs:IDREF
> > types for the cross referencing, but use xs:unique and xs:keyref
> > constraints to ensure integrity of the ids and references - this has
> > the advantage that the scope of the uniqueness can be defined rather
> > than it having to be global to the XML document, so that the ids
> > could be scoped to be unique just within each registry record. This
> > solution seems best to me as it retains XML parser checking of id
> > uniqueness, allows "human readable" ids within each record, and
> > requires no special processing by the registries.
> 
> Here are a few comments about these alternatives:
> 
> 1. Rewriting IDs.
> 
> This would have to be done at both publishing time and harvesting
> time since the IDs would have to be unique within the entire
> registry.  Note that you can't just take another registry's id
> when you harvest; consider:
> 
>    o  you have to make sure that the remote registry's locally unique
>       id doesn't clash with yours.
>    o  when you reharvest a record, you don't know what has changed or
>       been added, so every id must be at least examined and perhaps
>       updated.
> 
> This might be made easier if we augment the id with the registry's
> IVOA ID; e.g: id="nvo.ncsa/registry/5:UTC-FK5-TOPO".  In this case, we
> would only need to set the ID at publishing time; subsequent rewriting
> is not necessary.  Note that the ID part does not need to refer to the
> registry; it could be the ID of the resource itself.  If you used the
> resource id, then you shouldn't need the additional "/5".
> 
> My biggest misgivings are:
> 
>    o  this requires special processing for a special subset of records
>    o  we have to explain how (and why) to do this to publishers.  It's
>       not simple.
> 
> These are not insurmountable.
> 
> 2.  Restructure the records.
> 
> I believe Paul included this for completeness and for further
> illustrating the problem.  Nevertheless, this would require
> significant processing by both the sender and receiver to combine and
> then split the records.  So (unless I've misunderstood something),
> this is not particularly appealing.
> 
> 3.  Changing STC to use xs:keyref and xs:unique.
> 
> In principle this is possible because these constraints allow you to say
> that combinations of values--e.g. STC id and VOResource
> identifier--must be unique.  However, this would require coordination
> across these two schemas, which would break their respective designs.
> Any use of xs:keyref within just STC (I believe) would inevitably
> encounter the same problem.
> 
> III.  Current Options
> 
> We need a solution pretty much right away as this problem is standing
> in the way of our registry upgrade work.  I think the simplest
> solution available is Paul's suggestion #1, with the variation I
> suggest to incorporate the registry's (or the resource's) IVOA ID.
> 
> Arnold could, in principle, change the STC schema not to use the
> xs:ID/IDREF types.  It could retain the data model, but impose rules
> of uniqueness that are outside the capabilities of an XML
> Schema-aware parser to check; this would require an
> application-specific validator to check.  This is not unprecedented as
> we have this in VOResource now.  However, I'm not sure this is
> practical on a short timescale, and if the #1 solution above is
> viable, then changing the STC schema may not be wise or worth the
> extra validator development required.
> 
> If we assume #2 and #3 above are not viable (especially given our
> schedule), the only other option is to drop the use of STC altogether
> from VOResource until a solution can be found.  We still have the
> ability to point to a footprint service.  Personally, I'm not ready to
> go here, yet.  I'm not about to propose an alternate schema to STC
> (for one, this is not a quick solution).  More importantly, I'm not
> ready to drop an important set of metadata--coverage--recommended by
> the RM because of a technical glitch in STC.
> 
> In conclusion, if you guys agree that solution #1 is the way to go,
> then we will need to get out (quickly) a concise, unambiguous
> description of how to form and use these IDs.
> 
> cheers,
> Ray
> 
--------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
60 Garden Street, MS 67                              fax:  +1 617 495 7356
Cambridge, MA 02138                             arots at head.cfa.harvard.edu
USA                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------


