STC in VOResource records
Arnold Rots
arots at head.cfa.harvard.edu
Thu Dec 14 11:27:48 PST 2006
I just had a brief discussion with Jonathan and we may have a solution
that is a variation on Ray's, but generalized and formalized.
The problem really is how we can ensure that associations that are
unique within a document remain unique when elements are copied to,
or concatenated into, a new document.
Let's assume, for the sake of argument, that we are using ID/IDREF
pairs, though that is not essential (as I said before, the issue is
that the association needs to be unambiguous, not what the particular
datatype is).
If we require that all IVOA documents contain a document URI, assigned
by the publisher, then we can solve the problem by setting a rule that
all ID and IDREF tags, when extracted from the document, should
receive the document URI as a prefix.
Another way of putting it is that all tags should be URIs, but that
the common root may be omitted, provided that it is presented in a
document URI.
So, the STCResourceProfile from this document:
<MyResource ... documentURI="ivo://ncsa/MyResource">
...
<STCResourceProfile>
<AstroCoordSystem xlink:type="simple"
xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
id="UTC-FK5-TOPO"/>
<AstroCoordArea coord_system_id="UTC-FK5-TOPO">
<AllSky/>
</AstroCoordArea>
</STCResourceProfile>
...
</MyResource>
gets extracted into the registry as:
<STCResourceProfile>
<AstroCoordSystem xlink:type="simple"
xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
id="ivo://ncsa/MyResource/UTC-FK5-TOPO"/>
<AstroCoordArea coord_system_id="ivo://ncsa/MyResource/UTC-FK5-TOPO">
<AllSky/>
</AstroCoordArea>
</STCResourceProfile>
We believe that this would be a global and general solution for all
associations, in the registry and elsewhere.
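
As a rough illustration of the prefixing rule described above, here is a
minimal Python sketch. It is not part of any IVOA tool or specification.
The function names (qualify_ids, extract_profile) and the ID_ATTRS list are
hypothetical, the documentURI attribute simply follows the example above,
and an xlink namespace declaration has been added to the sample record so
that it parses on its own.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the only attributes carrying ID/IDREF values here.
ID_ATTRS = ("id", "coord_system_id")


def qualify_ids(element, document_uri):
    """Prefix each local ID/IDREF value under `element` with the document URI."""
    prefix = document_uri.rstrip("/") + "/"
    for node in element.iter():
        for name in ID_ATTRS:
            value = node.get(name)
            # Leave values alone if they already look like full URIs.
            if value is not None and "://" not in value:
                node.set(name, prefix + value)
    return element


def extract_profile(record_xml):
    """Pull the STCResourceProfile out of a record and qualify its IDs."""
    root = ET.fromstring(record_xml)
    document_uri = root.get("documentURI")
    if document_uri is None:
        raise ValueError("record has no documentURI attribute")
    profile = root.find("STCResourceProfile")
    if profile is None:
        raise ValueError("record has no STCResourceProfile element")
    return qualify_ids(profile, document_uri)


RECORD = """\
<MyResource xmlns:xlink="http://www.w3.org/1999/xlink"
            documentURI="ivo://ncsa/MyResource">
  <STCResourceProfile>
    <AstroCoordSystem xlink:type="simple"
                      xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
                      id="UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="UTC-FK5-TOPO">
      <AllSky/>
    </AstroCoordArea>
  </STCResourceProfile>
</MyResource>
"""

profile = extract_profile(RECORD)
print(ET.tostring(profile, encoding="unicode"))
```

Because the same prefix is applied to both the id and the reference, the
association stays intact after extraction, and the xlink:href is left
untouched. Note that the prefixed values contain ":" and "/", so they would
no longer be lexically valid xs:ID values; the sketch assumes, as above,
that the particular datatype is not essential.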
- Arnold
Ray Plante wrote:
> Hi RWGers,
>
> So we have a bit of a crisis to contend with regarding our use of STC
> within a VOResource record which is standing in the way of our upgrade
> to RI v1.0. To catch folks up, I'm going to summarize the problem and
> review some useful input that others have made, and then try to
> conclude with our current set of alternatives.
>
> I. The Problem
>
> We use the Space-Time Coordinates schema (STC) to describe a resource's
> coverage of the sky, time, and frequency. In STC, this is done by first
> defining "coordinate systems" for each of these things and then listing
> how the resource maps onto those systems. A single, simple instance looks
> like this:
>
> <stc:STCResourceProfile
> xmlns="http://www.ivoa.net/xml/STC/stc-v1.30.xsd">
>
> <AstroCoordSystem xlink:type="simple"
> xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
> id="UTC-FK5-TOPO"/>
>
> <AstroCoordArea coord_system_id="UTC-FK5-TOPO">
> <AllSky/>
> </AstroCoordArea>
>
> </stc:STCResourceProfile>
>
> The <AstroCoordSystem> defines a system on the sky by referring to a
> "standard system" via the xlink attributes. The <AstroCoordArea>
> describes the actual coverage on that system. The two are linked through
> the id value, "UTC-FK5-TOPO", which, by convention, matches the local
> identifier part of the xlink:href attribute.
>
> An STC description may require multiple coordinate systems to describe its
> coverage, so it needs a way to uniquely connect a particular coverage
> description to a single coordinate system. This is done with a little
> XML magic by making <AstroCoordSystem>'s id of type xs:ID and
> <AstroCoordArea>'s coord_system_id of type xs:IDREF. For this to work,
> there must be only one id="UTC-FK5-TOPO" in the entire document.
>
> This is easily satisfied when we have single VOResource records; however,
> the problem comes when we concatenate records into a single document.
> If every record follows the conventional choice, there will be many
> occurrences of id="UTC-FK5-TOPO". We could change this convention;
> however, we have to realize that the individual VOResource records are
> created independently, so some coordination is needed to ensure
> uniqueness.
>
> Concatenation of VOResource records happens in two cases in the Registry
> Interface: within a harvesting response and within a search query
> response. As Paul Harrison has pointed out, there is an analogous problem
> with VOEvent's use of STC, so this is likely to be a more general problem.
>
> II. Discussion
>
> Paul Harrison posted this very useful summary of suggested alternatives:
>
> On Tue, 5 Dec 2006, Paul Harrison wrote:
> > As I see it, there are several solutions to this,
> >
> > 1. The registry always rewrites the id and coord_system_id within a
> > single record with unique values - e.g. ascending integers for a
> > particular harvest set - this is relatively simple to implement, but
> > it is rather a shame to lose the "human readable" ids; however, the
> > document will be XML valid.
> >
> > 2. Gather all of the AstroCoordSystem definitions into a special
> > record and retain their human readable IDs and then do not emit the
> > individual AstroCoordSystem elements in the individual records -
> > though for a normal query to the registry (returning one record), it
> > must remember to insert the appropriate AstroCoordSystem(s) from the
> > special record. This would be an extra level of complexity in the
> > registry's housekeeping that it has not had to deal with so far
> > though.
> >
> > 3. Change the STC schema so that it does not use xs:ID and xs:IDREF
> > types for the cross referencing, but use xs:unique and xs:keyref
> > constraints to ensure integrity of the ids and references - this has
> > the advantage that the scope of the uniqueness can be defined rather
> > than it having to be global to the XML document, so that the ids
> > could be scoped to be unique just within each registry record. This
> > solution seems best to me as it retains XML parser checking of id
> > uniqueness, allows "human readable" ids within each record, and
> > requires no special processing by the registries.
>
> Here are a few comments about these alternatives:
>
> 1. Rewriting IDs.
>
> This would have to be done at both publishing time and harvesting
> time since the IDs would have to be unique within the entire
> registry. Note that you can't just take another registry's id
> when you harvest; consider:
>
> o you have to make sure that the remote registry's locally unique
> id doesn't clash with yours.
> o when you reharvest a record, you don't know what has changed or
> been added, so every id must be at least examined and perhaps
> updated.
>
> This might be made easier if we augment the id with the registry's
> IVOA ID; e.g: id="nvo.ncsa/registry/5:UTC-FK5-TOPO". In this case, we
> would only need to set the ID at publishing time; subsequent rewriting
> is not necessary. Note that the ID part does not need to refer to the
> registry; it could be the ID of the resource itself. If you used the
> resource id, then you shouldn't need the additional "/5".
>
> My biggest misgivings are:
>
> o this requires special processing for a special subset of records
> o we have to explain how (and why) to do this to publishers. It's
> not simple.
>
> These are not insurmountable.
>
> 2. Restructure the records.
>
> I believe Paul included this for completeness and to further
> illustrate the problem. Nevertheless, this would require
> significant processing by both the sender and receiver to combine and
> then split the records. So (unless I've misunderstood something),
> this is not particularly appealing.
>
> 3. Changing STC to use xs:keyref and xs:unique.
>
> In principle this is possible because these constraints allow you to say
> that combinations of values--e.g. STC id and VOResource
> identifier--must be unique. However, this would require coordination
> across these two schemas, which would break their respective designs.
> Any use of xs:keyref within just STC (I believe) would inevitably
> encounter the same problem.
>
> III. Current Options
>
> We need a solution pretty much right away as this problem is standing
> in the way of our registry upgrade work. I think the simplest
> solution available is Paul's suggestion #1, with the variation I
> suggest to incorporate the registry's (or the resource's) IVOA ID.
>
> Arnold could, in principle, change the STC schema not to use the
> xs:ID/IDREF types. It could retain the data model, but impose rules
> of uniqueness that are outside the capabilities of an XML
> Schema-aware parser to check; this would require an
> application-specific validator to check. This is not unprecedented as
> we have this in VOResource now. However, I'm not sure this is
> practical on a short timescale, and if the #1 solution above is
> viable, then changing the STC schema may not be wise or worth the
> extra validator development required.
>
> If we assume #2 and #3 above are not viable (especially given our
> schedule), the only other option is to drop the use of STC altogether
> from VOResource until a solution can be found. We still have the
> ability to point to a footprint service. Personally, I'm not ready to
> go here, yet. I'm not about to propose an alternate schema to STC
> (for one, this is not a quick solution). More importantly, I'm not
> ready to drop an important set of metadata--coverage--recommended by
> the RM because of a technical glitch in STC.
>
> In conclusion, if you guys agree that solution #1 is the way to go,
> then we will need to get out (quickly) a concise, unambiguous
> description of how to form and use these IDs.
>
> cheers,
> Ray
>
--------------------------------------------------------------------------
Arnold H. Rots Chandra X-ray Science Center
Smithsonian Astrophysical Observatory tel: +1 617 496 7701
60 Garden Street, MS 67 fax: +1 617 495 7356
Cambridge, MA 02138 arots at head.cfa.harvard.edu
USA http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------