STC in VOResource records

Ray Plante rplante at poplar.ncsa.uiuc.edu
Thu Dec 14 06:56:46 PST 2006


Hi RWGers,

So we have a bit of a crisis to contend with regarding our use of STC
within a VOResource record, which is standing in the way of our upgrade
to RI v1.0.  To catch folks up, I'm going to summarize the problem and
review some useful input that others have made, and then try to
conclude with our current set of alternatives.

I. The Problem

We use the Space-Time Coordinates schema (STC) to describe a resource's
coverage of the sky, time, and frequency.  In STC, this is done by first 
defining "coordinate systems" for each of these things and then listing 
how the resource maps onto those systems.  A single, simple instance looks 
like this:

     <stc:STCResourceProfile
          xmlns:stc="http://www.ivoa.net/xml/STC/stc-v1.30.xsd"
          xmlns="http://www.ivoa.net/xml/STC/stc-v1.30.xsd"
          xmlns:xlink="http://www.w3.org/1999/xlink">

        <AstroCoordSystem xlink:type="simple"
                          xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
                          id="UTC-FK5-TOPO"/>

        <AstroCoordArea coord_system_id="UTC-FK5-TOPO">
           <AllSky/>
        </AstroCoordArea>

     </stc:STCResourceProfile>

The <AstroCoordSystem> defines a system on the sky by referring to a 
"standard system" via the xlink attributes.  The <AstroCoordArea> 
describes the actual coverage on that system.  The two are linked through 
the id value, "UTC-FK5-TOPO", which, by convention, matches the local 
identifier part of the xlink:href attribute.

An STC description may require multiple coordinate systems to describe its 
coverage, so it needs a way to uniquely connect a particular coverage 
description to a single coordinate system.  This is done with a little 
XML magic by making <AstroCoordSystem>'s id of type xs:ID and 
<AstroCoordArea>'s coord_system_id of type xs:IDREF.  For this to work, 
there must be only one id="UTC-FK5-TOPO" in the entire document.

This is easily satisfied when we have single VOResource records; however, 
the problem comes when we concatenate records into a single document. 
If every record follows the conventional choice, there will be many 
occurrences of id="UTC-FK5-TOPO".  We could change this convention; 
however, we have to realize that the individual VOResource records are 
created independently, so some coordination is needed to ensure 
uniqueness.
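
To make the clash concrete, here is a minimal Python sketch (standard
library only).  The record below and the duplicate_ids() helper are
made up for illustration, and the element names are simplified
stand-ins for the real VOResource/STC markup.  It just counts id
values across a whole document, which is roughly the view an xs:ID
check takes of a concatenated response.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A hypothetical, simplified record that uses the conventional id.
RECORD = """
<Resource>
  <AstroCoordSystem id="UTC-FK5-TOPO"/>
  <AstroCoordArea coord_system_id="UTC-FK5-TOPO"/>
</Resource>
"""

def duplicate_ids(doc_xml):
    """Return the id values that occur more than once in the document."""
    root = ET.fromstring(doc_xml)
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)

# One record on its own versus two records concatenated in one response.
single = "<Response>" + RECORD + "</Response>"
combined = "<Response>" + RECORD + RECORD + "</Response>"

print(duplicate_ids(single))    # nothing is duplicated in a single record
print(duplicate_ids(combined))  # the conventional id now occurs twice
```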

Concatenation of VOResource records happens in two cases in the Registry 
Interface: within a harvesting response and within a search query 
response.  As Paul Harrison has pointed out, there is an analogous problem 
with VOEvent's use of STC, so this is likely to be a more general problem.

II. Discussion

Paul Harrison posted this very useful summary of suggested alternatives:

On Tue, 5 Dec 2006, Paul Harrison wrote:
> As I see it, there a several solutions to this,
>
> 1. The registry always rewrites the id and coord_system_id within a
> single record with unique values - e.g. ascending integers for a
> particular harvest set - this is relatively simple to implement, but
> is rather a shame to loose the "human readable" ids, however the
> document will be xml valid.
>
> 2. Gather all of the AstroCoordSystem definitions into a special
> record and retain their human readable IDs and then do not emit the
> individual AstroCoordSystem elements in the individual records -
> though for a normal query to the registry (returning one record), it
> must remember to insert the appropriate AstroCoordSystem(s) from the
> special record. This would be an extra level of complexity in the
> registries housekeeping that it has not had to deal with so far
> though.
>
> 3. Change the STC schema so that it does not use xs:ID and xs:IDREF
> types for the cross referencing, but use xs:unique and xs:keyref
> constraints to ensure integrity of the ids and references - this has
> the advantage that the scope of the uniqueness can be defined rather
> than it having to be global to the XML document, so that the ids
> could be scoped to be unique just within each registry record. This
> solution seems best to me as it retains XML parser checking of id
> uniqueness, allows "human readable" ids within each record, and
> requires no special processing by the registries.

Here are a few comments about these alternatives:

1. Rewriting IDs.

This would have to be done at both publishing time and harvesting
time since the IDs would have to be unique within the entire
registry.  Note that you can't just take another registry's id as is
when you harvest; consider:

   o  you have to make sure that the remote registry's locally unique
      id doesn't clash with yours.
   o  when you reharvest a record, you don't know what has been changed
      or added, so every id must at least be examined and perhaps
      updated.

This might be made easier if we augment the id with the registry's
IVOA ID; e.g., id="nvo.ncsa/registry/5:UTC-FK5-TOPO".  In this case, we
would only need to set the ID at publishing time; subsequent rewriting
is not necessary.  Note that the ID part does not need to refer to the
registry; it could be the ID of the resource itself.  If you used the
resource id, then you shouldn't need the additional "/5".
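
A rough Python sketch of this prefixing idea follows.  The function
names, the "--" separator, and the simplified element names are all my
own assumptions for illustration.  One caveat: xs:ID values must be
NCNames, so characters such as "/" and ":" cannot appear literally;
the sketch replaces them with "_", which is an assumption rather than
a settled rule, and a real scheme would need to make sure the
replacement cannot make two different IDs collide.

```python
import re
import xml.etree.ElementTree as ET

def scoped_id(resource_id, local_id):
    """Combine a resource's IVOA ID and a local STC id into one token.

    Assumption: characters that xs:ID (an NCName) does not allow, such
    as "/" and ":", are replaced with "_"; "--" is an arbitrary separator.
    """
    token = re.sub(r"[^A-Za-z0-9._-]", "_", resource_id + "--" + local_id)
    # An NCName must start with a letter or an underscore.
    if not re.match(r"[A-Za-z_]", token):
        token = "_" + token
    return token

def rewrite_record(record_xml, resource_id):
    """Rewrite every id and coord_system_id in one record, keeping them paired."""
    root = ET.fromstring(record_xml)
    for el in root.iter():
        for attr in ("id", "coord_system_id"):
            if el.get(attr) is not None:
                el.set(attr, scoped_id(resource_id, el.get(attr)))
    return ET.tostring(root, encoding="unicode")

# Hypothetical, simplified record using the conventional id.
record = (
    "<Resource>"
    '<AstroCoordSystem id="UTC-FK5-TOPO"/>'
    '<AstroCoordArea coord_system_id="UTC-FK5-TOPO"/>'
    "</Resource>"
)
print(rewrite_record(record, "nvo.ncsa/registry"))
```

Because the id and its reference go through the same function, they
stay linked after rewriting.  Note that this is meant to be applied
once, at publishing time; applying it again would add the prefix a
second time.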

My biggest misgivings are:

   o  this requires special processing for a special subset of records
   o  we have to explain how (and why) to do this to publishers.  It's
      not simple.

These are not insurmountable.

2.  Restructure the records.

I believe Paul included this for completeness and to further
illustrate the problem.  Nevertheless, this would require
significant processing by both the sender and receiver to combine and
then split the records.  So (unless I've misunderstood something),
this is not particularly appealing.

3.  Changing STC to use xs:keyref and xs:unique.

In principle this is possible because these constraints allow you to say
that combinations of values--e.g. STC id and VOResource
identifier--must be unique.  However, this would require coordination
across these two schemas, which would break their respective designs.
Any use of xs:keyref within just STC (I believe) would inevitably
encounter the same problem.

III.  Current Options

We need a solution pretty much right away as this problem is standing
in the way of our registry upgrade work.  I think the simplest
solution available is Paul's suggestion #1, with the variation I
suggest to incorporate the registry's (or the resource's) IVOA ID.

Arnold could, in principle, change the STC schema not to use the
xs:ID/IDREF types.  It could retain the data model, but impose rules
of uniqueness that are outside the capabilities of an XML
Schema-aware parser to check; this would require an
application-specific validator to check.  This is not unprecedented as
we have this in VOResource now.  However, I'm not sure this is
practical on a short timescale, and if the #1 solution above is
viable, then changing the STC schema may not be wise or worth the
extra validator development required.
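
To give a feel for what such an application-specific check might
involve, here is a rough Python sketch.  It assumes (hypothetically)
that each record is a <Resource> element and that the id and
coord_system_id attributes are plain strings as in the example in
section I; the function names are mine.  It checks uniqueness and
reference resolution within each record separately, rather than
across the whole document.

```python
import xml.etree.ElementTree as ET

def check_record(record):
    """Return a list of problems found within a single record element."""
    problems = []
    seen = set()
    # First pass: every id must be unique within this record.
    for el in record.iter():
        value = el.get("id")
        if value is None:
            continue
        if value in seen:
            problems.append("duplicate id: " + value)
        seen.add(value)
    # Second pass: every reference must resolve within this record.
    for el in record.iter():
        ref = el.get("coord_system_id")
        if ref is not None and ref not in seen:
            problems.append("unresolved coord_system_id: " + ref)
    return problems

def check_document(doc_xml, record_tag="Resource"):
    """Check each record independently; returns one problem list per record."""
    root = ET.fromstring(doc_xml)
    return [check_record(rec) for rec in root.iter(record_tag)]

# Two hypothetical records that reuse the same conventional id.
rec = (
    "<Resource>"
    '<AstroCoordSystem id="UTC-FK5-TOPO"/>'
    '<AstroCoordArea coord_system_id="UTC-FK5-TOPO"/>'
    "</Resource>"
)
doc = "<Response>" + rec + rec + "</Response>"
print(check_document(doc))
```

With per-record scoping, the repeated id in the two records above is
not reported as a problem, while a dangling coord_system_id within any
one record would be.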

If we assume #2 and #3 above are not viable (especially given our
schedule), the only other option is to drop the use of STC altogether
from VOResource until a solution can be found.  We still have the
ability to point to a footprint service.  Personally, I'm not ready to
go there yet.  I'm not about to propose an alternative schema to STC
(for one, this is not a quick solution).  More importantly, I'm not
ready to drop an important set of metadata--coverage--recommended by
the RM because of a technical glitch in STC.

In conclusion, if you guys agree that solution #1 is the way to go,
then we will need to get out (quickly) a concise, unambiguous
description of how to form and use these IDs.

cheers,
Ray


