STC in VOResource records

Alberto Accomazzi aaccomazzi at cfa.harvard.edu
Tue Jan 2 16:57:49 PST 2007


Arnold's suggestion sounds the most logical to me but you have to be a 
bit careful in the implementation: isn't there a problem if the 
identifier of the resource contains a fragment already?  This may very 
well happen as discussed in the VO Identifier spec and can break the URI 
syntax for id/idref.  One solution would be to URI-encode the original 
resource identifier when automatically generating the id/idref pair.
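
A minimal sketch of that encoding step, in Python. The function names and the example identifier are hypothetical, and it assumes the label part (e.g. UTC-FK5-TOPO) never contains a '#' itself. It is only meant to show that percent-encoding the identifier leaves exactly one literal '#' in the generated value:

```python
from typing import Tuple
from urllib.parse import quote, unquote


def make_local_id(resource_identifier: str, label: str) -> str:
    """Return '{percent-encoded identifier}#{label}' (hypothetical helper)."""
    # safe="" makes quote() encode '/', ':' and '#' as well, so the only
    # literal '#' in the result is the separator added here.
    return quote(resource_identifier, safe="") + "#" + label


def split_local_id(local_id: str) -> Tuple[str, str]:
    """Recover (original identifier, label) from a generated id."""
    encoded, _, label = local_id.rpartition("#")
    return unquote(encoded), label


# Hypothetical identifier that already carries a fragment.
example = make_local_id("ivo://example.org/catalog#table1", "UTC-FK5-TOPO")
```

A client that wants only the short label back can call split_local_id() on the generated value and ignore the first element.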

-- Alberto


Kevin Benson wrote:
> Trying to check the final decision?
> I suspect we are going with Paul's first suggestion of changing the 'id'
> during publishing.  I would suggest that if we use the {identifier} of
> that same resource record, then the id will always be unique and we don't
> have to worry about problems when harvesting the records.  Also, clients
> that are told about this can quite easily strip the identifier part if
> desired (when they do queries), and each registry implementation does not
> have to come up with some kind of unique string, as long as it uses the
> identifier.
> So the id would become id='{identifier of this Resource
> record}#UTC-FK5-TOPO'
> 
> As far as user input goes, I would rather they just put in UTC-FK5-TOPO
> and let the registry implementation (via XSL, DOM, or however) figure
> out whether the {identifier} is already there and add it if not, but
> this is more a matter for each individual registry implementation.
> 
> Does this all sound good and correct?  If so then I can make the fix and 
> start the upgrading very soon.
> 
> cheers,
> Kevin
> 
> Arnold Rots wrote:
>> Paul,
>>
>> With all due respect, I disagree, as I have argued before.
>> I whole-heartedly agree that it is perfectly possible to create
>> nonsense associations and that we need to rely on other means to guard
>> against that.
>> And I agree that changing from ID/IDREF allows validation against the
>> schema.
>> But it does not address the underlying (and in my opinion far more
>> important) problem: we need a mechanism that allows us to specify
>> unambiguous associations - and I don't believe that problem is limited
>> to STC.  The issue is that if we allow identical association tags
>> (whether they be IDs or strings) in a (concatenated) document, the
>> associations become ambiguous.  What we need, therefore, is a
>> mechanism or a convention that ensures the creation of unique tags;
>> and once that is in place, it is immaterial whether they are IDs or
>> strings.
>> Put differently: the validation problem arises from the datatypes,
>> agreed, but if you solve that by changing the datatypes you have
>> introduced a more serious problem: ambiguous associations; and I'm
>> sure unambiguous associations are not only needed in STC.
>>
>> Hence the proposal that Jonathan and I made, yesterday.
>>
>>   - Arnold
>>
>>
>> Paul Harrison wrote:
>>  
>>> I have said some of this in private emails - but I am resummarizing
>>> for the list
>>>
>>>
>>> On 14.12.2006, at 20:27, Arnold Rots wrote:
>>>
>>>    
>>>> Let's assume, for the sake of argument, that we are using ID/IDREF
>>>> pairs, though that is not essential (as I said before, the issue is
>>>> that the association needs to be unambiguous, not what the particular
>>>> datatype is).
>>>>       
>>> The problem arises *only* because the <AstroCoordSystem> id attribute
>>> and the <AstroCoordArea> coord_system_id attribute are of ID and IDREF
>>> type.  The XML parser requires IDs to be globally unique within a
>>> document and requires every IDREF to point to an ID, which breaks the
>>> XML validity of a harvest document: each VOResource record was using
>>> "human readable" IDs, e.g. UTC-FK5-TOPO, that are fine if each
>>> VOResource is a document on its own but collide in a harvest document
>>> containing many such VOResource elements.  However, if these two
>>> attributes were typed as strings, then the XML parser would not try to
>>> enforce the uniqueness and referential constraints - it would be up to
>>> an external system to ensure the "STC validity" of a document.
>>>
>>> Having the id and coord_system_id attributes as ID/IDREF does not
>>> guarantee the "STC validity" of a document anyway, as all that the XML
>>> parser checks is that the IDREF points at an ID somewhere in the
>>> document - there is no guarantee that the coord_system_id actually
>>> points at the id attribute of an AstroCoordSystem - it can point to an
>>> id on *any* element - so the following document is XML valid, but is
>>> obviously nonsense STC, as all of the IDREFs point at the
>>> ObsDataLocation id.
>>>
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <p:ObsDataLocation id="idvalue0" idref="idvalue0" ucd=""
>>>    xmlns:p="http://www.ivoa.net/xml/STC/stc-v1.30.xsd"
>>>    xmlns:xlink="http://www.w3.org/1999/xlink"
>>>    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>>>    <p:ObservatoryLocation id="idvalue1">
>>>    <p:AstroCoordSystem id="idvalue3"></p:AstroCoordSystem>
>>>    <p:AstroCoords coord_system_id="idvalue0"></p:AstroCoords>
>>>    </p:ObservatoryLocation>
>>>    <p:ObservationLocation id="idvalue2" idref="idvalue0" >
>>>    </p:ObservationLocation>
>>> </p:ObsDataLocation>
>>>
>>> I had earlier argued for using the xs:unique and xs:keyref schema
>>> constructs, which could potentially define the exact scope of these
>>> references, but that would require some thought, as the scope always
>>> has to be within one of the global elements of STC, which might end up
>>> restricting the use of STC itself in other schemas - in short, this is
>>> not a quick solution, but would require careful consideration.
>>>
>>> In conclusion, *not* using ID/IDREF (and making the attributes
>>> xs:string or xs:anyURI) is IMHO the quickest and simplest solution to
>>> the immediate problem - it allows all current uses of STC to keep
>>> working, allows the registry harvest document to be valid (with no
>>> changes) and gives breathing space to come up with a referencing
>>> scheme that is not directly checked by the XML parser, but by a
>>> yet-to-be-written STC validator.
>>>
>>> Paul Harrison
>>> ESO Garching
>>> www.eso.org
>>>
>>>     
>> --------------------------------------------------------------------------
>> Arnold H. Rots                                Chandra X-ray Science Center
>> Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
>> 60 Garden Street, MS 67                              fax:  +1 617 495 7356
>> Cambridge, MA 02138                             arots at head.cfa.harvard.edu
>> USA                                     http://hea-www.harvard.edu/~arots/
>> --------------------------------------------------------------------------
>>   
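
The collision Paul describes in the quoted thread, and the identifier-prefix fix Kevin proposes, can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the element names (harvest, Resource, identifier) and the example.org identifiers are hypothetical, namespaces are left out, and the check is done by hand rather than by a validating parser.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, namespace-free harvest document: two records that both
# use the same "human readable" id.
HARVEST = """<harvest>
  <Resource>
    <identifier>ivo://example.org/a</identifier>
    <AstroCoordSystem id="UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="UTC-FK5-TOPO"/>
  </Resource>
  <Resource>
    <identifier>ivo://example.org/b</identifier>
    <AstroCoordSystem id="UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="UTC-FK5-TOPO"/>
  </Resource>
</harvest>"""


def duplicate_ids(root):
    """Return the id values that occur more than once in the document."""
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


def qualify_ids(root):
    """Prefix id and coord_system_id with '{identifier}#' in each record."""
    for resource in root.iter("Resource"):
        prefix = resource.findtext("identifier") + "#"
        for el in resource.iter():
            for attr in ("id", "coord_system_id"):
                value = el.get(attr)
                # Leave values alone if they already carry the prefix.
                if value is not None and not value.startswith(prefix):
                    el.set(attr, prefix + value)


root = ET.fromstring(HARVEST)
before = duplicate_ids(root)   # ids that collide in the harvest document
qualify_ids(root)
after = duplicate_ids(root)    # ids that still collide after prefixing
```

Because the prefix is taken from each record's own identifier, the references stay paired within a record while becoming distinct across records.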


