STC in VOResource records
Alberto Accomazzi
aaccomazzi at cfa.harvard.edu
Tue Jan 2 16:57:49 PST 2007
Arnold's suggestion sounds the most logical to me but you have to be a
bit careful in the implementation: isn't there a problem if the
identifier of the resource contains a fragment already? This may very
well happen as discussed in the VO Identifier spec and can break the URI
syntax for id/idref. One solution would be to URI-encode the original
resource identifier when automatically generating the id/idref pair.
-- Alberto
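
A minimal sketch of the URI-encoding idea above, in Python. The function names and the example identifier are hypothetical, and it assumes the attributes are typed as xs:string or xs:anyURI (as Paul proposes in the quoted thread), since a percent-encoded value containing '#' is not a legal xs:ID name:

```python
from urllib.parse import quote, unquote


def make_qualified_id(identifier: str, local_tag: str) -> str:
    """Build '<encoded identifier>#<local tag>' (hypothetical helper).

    Percent-encoding the whole identifier (safe="" also encodes '/')
    guarantees that the only raw '#' in the result is the separator,
    even if the identifier already carries a fragment.
    """
    return f"{quote(identifier, safe='')}#{local_tag}"


def split_qualified_id(qualified: str) -> tuple[str, str]:
    """Recover (identifier, local tag) from a qualified id."""
    # The encoded identifier contains no raw '#', so the first '#'
    # is always the separator.
    encoded, _, local_tag = qualified.partition("#")
    return unquote(encoded), local_tag
```

A client that only wants the human-readable tag (for queries, say) can call split_qualified_id and discard the first element.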
KevinBenson wrote:
> Trying to check the final decision?
> I suspect we are going with Paul's first suggestion of changing the 'id'
> during publishing. I would suggest that if we use the {identifier} of
> that same resource record, then the id will always be unique and we
> don't have to worry about problems when harvesting the records. Also,
> clients, if told this, can quite easily remove the identifier string
> part if desired (when they do queries), and each registry
> implementation does not have to come up with some kind of unique
> string as long as it uses the identifier.
> So the id would become
> id='{identifier of this Resource record}#UTC-FK5-TOPO'
>
> As far as user input goes, I would rather they just put in UTC-FK5-TOPO;
> the registry implementation (via XSL, DOM, or however) should be
> able to figure out whether the {identifier} is there or not and add it,
> but this is more a matter for each individual registry implementation.
>
> Does this all sound good and correct? If so then I can make the fix and
> start the upgrading very soon.
>
> cheers,
> Kevin
>
> Arnold Rots wrote:
>> Paul,
>>
>> With all due respect, I disagree, as I have argued before.
>> I whole-heartedly agree that it is perfectly possible to create
>> nonsense associations and that we need to rely on other means to guard
>> against that.
>> And I agree that changing from ID/IDREF allows validation against the
>> schema.
>> But it does not address the underlying (and in my opinion far more
>> important) problem: we need a mechanism that allows us to specify
>> unambiguous associations - and I don't believe that problem is limited
>> to STC. The issue is that if we allow identical association tags
>> (whether they be IDs or strings) in a (concatenated) document, the
>> associations become ambiguous. What we need, therefore, is a
>> mechanism or a convention that ensures the creation of unique tags;
>> and once that is in place, it is immaterial whether they are IDs or
>> strings.
>> Put differently: the validation problem arises from the datatypes,
>> agreed, but if you solve that by changing the datatypes you have
>> introduced a more serious problem: ambiguous associations; and I'm
>> sure unambiguous associations are not only needed in STC.
>>
>> Hence the proposal that Jonathan and I made, yesterday.
>>
>> - Arnold
>>
>>
>> Paul Harrison wrote:
>>
>>> I have said some of this in private emails - but I am resummarizing
>>> for the list
>>>
>>>
>>> On 14.12.2006, at 20:27, Arnold Rots wrote:
>>>
>>>
>>>> Let's assume, for the sake of argument, that we are using ID/IDREF
>>>> pairs, though that is not essential (as I said before, the issue is
>>>> that the association needs to be unambiguous, not what the particular
>>>> datatype is).
>>>>
>>> The problem *only* arises because the <AstroCoordSystem> id
>>> attribute and the <AstroCoordArea> coord_system_id attribute are of
>>> ID and IDREF type. The XML parser requires global uniqueness of IDs
>>> in a document and requires that IDREFs point to IDs, so the XML
>>> validity of a harvest document breaks: each VOResource record was
>>> using "human readable" IDs (e.g. UTC-FK5-TOPO) that are fine if each
>>> VOResource is a document on its own, but become a problem for a
>>> harvest document of many such VOResource elements. However, if these
>>> two attributes were typed as strings then the XML parser would not
>>> try to enforce the uniqueness and referential constraints - it would
>>> be up to an external system to ensure the "STC validity" of a
>>> document.
>>>
>>> Having the id and coord_system_id attributes as ID/IDREF does not
>>> guarantee the "STC validity" of a document anyway, as all that
>>> the XML parser checks is that the IDREF points at an ID somewhere
>>> globally in the document - there is no guarantee that the
>>> coord_system_id actually points at the id attribute of an
>>> AstroCoordSystem - it can point to the id of *any* element - so the
>>> following document is XML valid, but is obviously nonsense STC as all
>>> of the IDREFs point at the ObsDataLocation id.
>>>
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <p:ObsDataLocation id="idvalue0" idref="idvalue0" ucd=""
>>>     xmlns:p="http://www.ivoa.net/xml/STC/stc-v1.30.xsd"
>>>     xmlns:xlink="http://www.w3.org/1999/xlink"
>>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>>>   <p:ObservatoryLocation id="idvalue1">
>>>     <p:AstroCoordSystem id="idvalue3"></p:AstroCoordSystem>
>>>     <p:AstroCoords coord_system_id="idvalue0"></p:AstroCoords>
>>>   </p:ObservatoryLocation>
>>>   <p:ObservationLocation id="idvalue2" idref="idvalue0">
>>>   </p:ObservationLocation>
>>> </p:ObsDataLocation>
>>>
>>> I had earlier argued for using the xs:unique and xs:keyref schema
>>> constructs, which could potentially be used to define the
>>> exact scope of these references, but that would require some thought
>>> as the scope always has to be within one of the global elements of
>>> STC, which might end up restricting the use of STC itself in other
>>> schemas - in short, this is not a quick solution, but would require
>>> careful consideration.
>>>
>>> In conclusion, *not* using ID/IDREF (and making the attributes
>>> xs:string or xs:anyURI) is IMHO the quickest and simplest solution
>>> to the immediate problem - it allows all current uses of STC to
>>> still work, allows the registry harvest document to be valid (with no
>>> changes) and gives breathing space to come up with a referencing
>>> scheme that is not directly checked by the XML parser, but by a
>>> yet-to-be-written STC validator.
>>>
>>> Paul Harrison
>>> ESO Garching
>>> www.eso.org
>>>
>>>
>> --------------------------------------------------------------------------
>> Arnold H. Rots                                Chandra X-ray Science Center
>> Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
>> 60 Garden Street, MS 67                              fax:  +1 617 495 7356
>> Cambridge, MA 02138                           arots at head.cfa.harvard.edu
>> USA                                     http://hea-www.harvard.edu/~arots/
>> --------------------------------------------------------------------------
>>
>>
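
To illustrate the harvesting problem discussed in this thread, here is a rough Python sketch. The element names are simplified (no namespaces), and the <harvest> wrapper, the example identifiers, and both helper functions are hypothetical. It only shows why repeated human-readable ids collide once records are concatenated, and how prefixing each id with the record's {identifier}, as Kevin suggests, removes the collision:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical harvest document: two records that each use the same
# human-readable id, which is fine per record but not when concatenated.
HARVEST = """
<harvest>
  <Resource>
    <identifier>ivo://example.org/a</identifier>
    <AstroCoordSystem id="UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="UTC-FK5-TOPO"/>
  </Resource>
  <Resource>
    <identifier>ivo://example.org/b</identifier>
    <AstroCoordSystem id="UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="UTC-FK5-TOPO"/>
  </Resource>
</harvest>
"""


def duplicate_ids(root):
    """Return the sorted id values that occur more than once in the document."""
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


def qualify_ids(root):
    """Prefix every id and coord_system_id with its record's identifier."""
    for resource in root.findall("Resource"):
        identifier = resource.findtext("identifier")
        for el in resource.iter():
            for attr in ("id", "coord_system_id"):
                if el.get(attr) is not None:
                    el.set(attr, f"{identifier}#{el.get(attr)}")
```

Calling duplicate_ids on the parsed document before and after qualify_ids shows the clash and its removal. The sketch does no URI-encoding, so an identifier that already contains a fragment would still need the treatment Alberto describes.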