STC in VOResource records
KevinBenson
kmb at mssl.ucl.ac.uk
Tue Jan 2 02:18:46 PST 2007
I'm trying to confirm the final decision.
I suspect we are going with Paul's first suggestion of changing the 'id'
during publishing. I would suggest that if we use the {identifier} of
that same resource record, then the id will always be unique and we won't
have to worry about problems when harvesting the records. Also, clients
that are told about this convention can quite easily remove the
identifier part of the string if desired (when they do queries), and each
registry implementation does not have to come up with some kind of unique
string, as long as it uses the identifier.
So the id would become
id='{identifier of this Resource record}#UTC-FK5-TOPO'
As for user input, I would rather users just enter UTC-FK5-TOPO and let
the registry implementation (via XSL, DOM, or however) figure out whether
the {identifier} is already there and add it if not, but this is more of
an individual registry implementation matter.
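
As a rough sketch of the convention (Python purely for illustration; the
function names and the example identifier are hypothetical and not taken
from any registry code), the registry could add the prefix only when it is
missing, and a client could strip it again:

```python
# Sketch only: assumes '#' as the separator, and assumes the attribute is
# typed so that it accepts '/' and '#' (e.g. xs:string or xs:anyURI).
SEPARATOR = "#"


def qualify_id(identifier: str, local_id: str) -> str:
    """Return '{identifier}#{local_id}', leaving already-qualified ids alone."""
    prefix = identifier + SEPARATOR
    if local_id.startswith(prefix):
        return local_id  # the {identifier} is already there
    return prefix + local_id


def strip_identifier(identifier: str, qualified_id: str) -> str:
    """Client side: recover the short id, e.g. 'UTC-FK5-TOPO'."""
    prefix = identifier + SEPARATOR
    if qualified_id.startswith(prefix):
        return qualified_id[len(prefix):]
    return qualified_id


# Hypothetical resource identifier, for illustration only.
record_identifier = "ivo://example.org/some-resource"
full_id = qualify_id(record_identifier, "UTC-FK5-TOPO")
short_id = strip_identifier(record_identifier, full_id)
```

Presumably the same rewrite would have to be applied to each
coord_system_id in the record so that the references still match the
rewritten id values.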
Does this all sound good and correct? If so, then I can make the fix and
start upgrading very soon.
cheers,
Kevin
Arnold Rots wrote:
> Paul,
>
> With all due respect, I disagree, as I have argued before.
> I whole-heartedly agree that it is perfectly possible to create
> nonsense associations and that we need to rely on other means to guard
> against that.
> And I agree that changing from ID/IDREF allows validation against the
> schema.
> But it does not address the underlying (and in my opinion far more
> important) problem: we need a mechanism that allows us to specify
> unambiguous associations - and I don't believe that problem is limited
> to STC. The issue is that if we allow identical association tags
> (whether they be IDs or strings) in a (concatenated) document, the
> associations become ambiguous. What we need, therefore, is a
> mechanism or a convention that ensures the creation of unique tags;
> and once that is in place, it is immaterial whether they are IDs or
> strings.
> Put differently: the validation problem arises from the datatypes,
> agreed, but if you solve that by changing the datatypes you have
> introduced a more serious problem: ambiguous associations; and I'm
> sure unambiguous associations are not only needed in STC.
>
> Hence the proposal that Jonathan and I made, yesterday.
>
> - Arnold
>
>
> Paul Harrison wrote:
>
>> I have said some of this in private emails - but I am resummarizing
>> for the list
>>
>>
>> On 14.12.2006, at 20:27, Arnold Rots wrote:
>>
>>
>>> Let's assume, for the sake of argument, that we are using ID/IDREF
>>> pairs, though that is not essential (as I said before, the issue is
>>> that the association needs to be unambiguous, not what the particular
>>> datatype is).
>>>
>> The problem *only* arises because the <AstroCoordSystem> id
>> attribute and the <AstroCoordArea> coord_system_id attribute are of
>> ID and IDREF type. The XML parser requires global uniqueness of IDs
>> in a document and requires that IDREFs point to IDs, so a harvest
>> document fails XML validation: each VOResource record was using
>> "human readable" IDs (e.g. UTC-FK5-TOPO) that are fine if each
>> VOResource is a document on its own, but become a problem for a
>> harvest document of many such VOResource elements. However, if these
>> two attributes were typed as strings, then the XML parser would not
>> try to enforce the uniqueness and referential constraints - it would
>> be up to an external system to ensure the "STC validity" of a
>> document.
>>
>> Having the id and coord_system_id attributes as ID/IDREF does not
>> guarantee the "STC validity" of a document anyway, as all that
>> the XML parser checks is that the IDREF points at an ID somewhere
>> globally in the document - there is no guarantee that the
>> coord_system_id actually points at the id attribute of an
>> AstroCoordSystem - it can point to *any* ID-typed attribute - so the
>> following document is XML valid, but is obviously nonsense STC, as all
>> of the IDREFs point at the ObsDataLocation id.
>>
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <p:ObsDataLocation id="idvalue0" idref="idvalue0" ucd=""
>>     xmlns:p="http://www.ivoa.net/xml/STC/stc-v1.30.xsd"
>>     xmlns:xlink="http://www.w3.org/1999/xlink"
>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>>   <p:ObservatoryLocation id="idvalue1">
>>     <p:AstroCoordSystem id="idvalue3"></p:AstroCoordSystem>
>>     <p:AstroCoords coord_system_id="idvalue0"></p:AstroCoords>
>>   </p:ObservatoryLocation>
>>   <p:ObservationLocation id="idvalue2" idref="idvalue0">
>>   </p:ObservationLocation>
>> </p:ObsDataLocation>
>>
>> I had earlier argued for using the xs:unique and xs:keyref schema
>> constructs, which could potentially be used to define the
>> exact scope of these references, but that would require some thought
>> as the scope always has to be within one of the global elements of
>> STC, which might end up restricting the use of STC itself in other
>> schemas - in short, this is not a quick solution, but would require
>> careful consideration.
>>
>> In conclusion, *not* using ID/IDREF (and making the attributes
>> xs:string or xs:anyURI) is IMHO the quickest and simplest solution to
>> the immediate problem - it allows all current uses of STC still to
>> work, allows the registry harvest document to be valid (with no
>> changes) and gives breathing space to come up with a referencing
>> scheme that is not directly checked by the XML parser, but by a yet
>> to be written STC validator.
>>
>> Paul Harrison
>> ESO Garching
>> www.eso.org
>>
>>
> --------------------------------------------------------------------------
> Arnold H. Rots Chandra X-ray Science Center
> Smithsonian Astrophysical Observatory tel: +1 617 496 7701
> 60 Garden Street, MS 67 fax: +1 617 495 7356
> Cambridge, MA 02138 arots at head.cfa.harvard.edu
> USA http://hea-www.harvard.edu/~arots/
> --------------------------------------------------------------------------
>