schema validate question

KevinBenson kmb at mssl.ucl.ac.uk
Tue Dec 12 08:55:23 PST 2006


This is a very long e-mail, but it describes a potential problem that 
probably needs to be resolved before any Registries are upgraded to 1.0. 

I suspected there was an easy solution and originally sent this to Ray, 
Paul Harrison and Guy Rixon; later Arnold Rots became involved.  But it 
seems the problem is more complex and the solution is not as easy as I 
thought.  So I am e-mailing the Registry list for possible 
ideas/solutions, or even confirmation of some of the ideas PaulH wrote.

Please scroll down to the bottom to read the original e-mail and read 
bottom-up to see the replies.  I separated all the original responses 
with "---------".  The original is at "------------ (original by Kevin)" 
at the bottom.  So please give it a good read.

cheers,
Kevin
p.s. (let me know if you need me to attach the samples.xml, but it is 
essentially the same samples Ray made on the RegUpgrade wiki site just 
bundled together)

Arnold Rots wrote:
> Here is the problem I am having with the unique/keyref solution.
> If I understand the mechanism correctly, the scope is determined in
> the schema.  That means that every user would be required to repeat
> coordinate system definitions in STC-compliant documents which is
> something I wanted to avoid.
> It would seem reasonable to allow people to define a unique coordinate
> system once in a document and have it reused all over the place.
>
> I agree that Paul's solution makes life easier for the registry, but
> it takes flexibility away from everyone else.
>
> Cheers,
>
>   - Arnold
>   

I'm currently swamped, but after that I'll try to understand unique
and keyref.  My main concern is not to break anything that is working
now, or to make life more complicated for other users.

  - Arnold

-------(response from Paul and reply by Kevin)

> KevinBenson wrote:
>   
>> Put a couple of comments below.  Maybe we need to open up this issue more 
>> to the reg list?  But it would be good to get this issue resolved soon (very 
>> soon).  I have pretty much had the registry working with 1.0 for a while, but 
>> other things and delays have kept it from going to our main CVS 
>> HEAD; I would like to do that in the next week or so.
>> Of course though there is no point in doing any kind of release if 'the 
>> fix' is going to require some registry coding.  So in a way it's good 
>> this problem has been caught now before our small publishing registries 
>> were upgraded.
>>
>> cheers,
>> Kevin
>>
>> Paul Harrison wrote:
>>     
>>> On 04.12.2006, at 22:39, Ray Plante wrote:
>>>
>>>       
>>>> Hey Arnold,
>>>>
>>>> On Mon, 4 Dec 2006, Arnold Rots wrote:
>>>>         
>>>>> But why do you need multiple occurrences of the same coordinate
>>>>> system?  You can include it once and make multiple references to it.
>>>>>           
>>>> Yes, thank you--you are right.  This takes care of the first case.
>>>>
>>>> The other case occurs when we essentially concatenate multiple 
>>>> resource descriptions together into one XML document during 
>>>> harvesting.  This is the sort of case that Kevin ran into.  The 
>>>> multiple occurrences look something like this:
>>>>
>>>>        <stc:STCResourceProfile
>>>>             xmlns="http://www.ivoa.net/xml/STC/stc-v1.30.xsd">
>>>>           <AstroCoordSystem xlink:type="simple"
>>>>                             xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
>>>>                             id="UTC-FK5-TOPO"/>
>>>>           <AstroCoordArea coord_system_id="UTC-FK5-TOPO">
>>>>              <AllSky/>
>>>>           </AstroCoordArea>
>>>>        </stc:STCResourceProfile>
>>>>         
>>> As I see it, there are several solutions to this:
>>>
>>> 1. The registry always rewrites the id and coord_system_id within a 
>>> single record with unique values - e.g. ascending integers for a 
>>> particular harvest set - this is relatively simple to implement, but 
>>> it is rather a shame to lose the "human readable" ids; however, the 
>>> document will be XML valid.
>>>       
>> Yes, possible, but it would still require not just an ascending number but 
>> probably an authorityid or something else to make it unique.  Otherwise 
>> we will still run into a problem when a user does a harvest.  Probably a 
>> 'time in milliseconds' instead of an ascending number.  Overall, I'm not 
>> sure I like this idea though.
>>     
>>> 2. Gather all of the AstroCoordSystem definitions into a special 
>>> record and retain their human readable IDs and then do not emit the 
>>> individual AstroCoordSystem elements in the individual records - 
>>> though for a normal query to the registry (returning one record), it 
>>> must remember to insert the appropriate AstroCoordSystem(s) from the 
>>> special record. This would be an extra level of complexity in the 
>>> registry's housekeeping that it has not had to deal with so far, though.
>>>
>>>       
>> Yep more coding involved on this one, but not sure if it resolves the 
>> issue of harvesting.
>>     
>>> 3. Change the STC schema so that it does not use xs:ID and xs:IDREF 
>>> types for the cross referencing, but use xs:unique and xs:keyref 
>>> constraints to ensure integrity of the ids and references - this has 
>>> the advantage that the scope of the uniqueness can be defined rather 
>>> than it having to be global to the XML document, so that the ids could 
>>> be scoped to be unique just within each registry record. This solution 
>>> seems best to me as it retains XML parser checking of id uniqueness, 
>>> allows "human readable" ids within each record, and requires no 
>>> special processing by the registries.
>>>       
>> Yes, this seems like a good choice.  If I read your response right, I 
>> suspect that if you're doing a multiple Resource update or a Harvest 
>> ListRecords you won't get the validation problem.  As noted by 
>> Guy & Ray, I would suspect this requires a namespace change, and hence 
>> a namespace change on the extension schemas that use stc, correct?
>>
>>     
>>> Paul.
>>>       
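Paul's third option can be illustrated without changing any schema. What follows is only a sketch in Python that emulates, by hand, the per-record scoping that xs:unique and xs:keyref constraints declared on the record element would give; it is not schema validation. The <harvest> and <record> wrapper names and the "MISSING-SYSTEM" value are hypothetical, and the namespaces are left out to keep the example short.

```python
import xml.etree.ElementTree as ET


def check_record(record):
    """Check one record: ids unique and references resolved inside it."""
    problems = []
    seen = set()
    # Uniqueness is scoped to this record only (the role of xs:unique).
    for el in record.iter():
        value = el.get("id")
        if value is None:
            continue
        if value in seen:
            problems.append(f"duplicate id {value!r}")
        seen.add(value)
    # Every reference must resolve within the same record (the role of xs:keyref).
    for el in record.iter():
        ref = el.get("coord_system_id")
        if ref is not None and ref not in seen:
            problems.append(f"dangling coord_system_id {ref!r}")
    return problems


def check_harvest(xml_text, record_tag="record"):
    """Map each record's position in the document to its list of problems."""
    root = ET.fromstring(xml_text)
    return {i: check_record(rec) for i, rec in enumerate(root.iter(record_tag))}


# Two records reuse the same human-readable id; the second one also
# refers to a coordinate system that it does not define.
doc = """
<harvest>
  <record>
    <AstroCoordSystem id="UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="UTC-FK5-TOPO"/>
  </record>
  <record>
    <AstroCoordSystem id="UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="MISSING-SYSTEM"/>
  </record>
</harvest>
"""

result = check_harvest(doc)
print(result)
```

The repeated "UTC-FK5-TOPO" is not reported, because each record is checked in its own scope; only the unresolved reference in the second record is. In a schema the same scoping would be written with xs:selector and xs:field XPath expressions on whichever element wraps a single record.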
---------------(response from Ray)
On Mon, 4 Dec 2006, Arnold Rots wrote:
> But why do you need multiple occurrences of the same coordinate
> system?  You can include it once and make multiple references to it.

Yes, thank you--you are right.  This takes care of the first case.

The other case occurs when we essentially concatenate multiple resource 
descriptions together into one XML document during harvesting.  This is 
the sort of case that Kevin ran into.  The multiple occurrences look 
something like this:

       <stc:STCResourceProfile
            xmlns="http://www.ivoa.net/xml/STC/stc-v1.30.xsd">
          <AstroCoordSystem xlink:type="simple"
                            xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
                            id="UTC-FK5-TOPO"/>
          <AstroCoordArea coord_system_id="UTC-FK5-TOPO">
             <AllSky/>
          </AstroCoordArea>
       </stc:STCResourceProfile>

cheers,
Ray
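The harvesting case can be reproduced without a schema validator. The following is a minimal Python sketch, assuming a hypothetical <harvest>/<record> wrapper (neither element name comes from the registry interface) and treating every unqualified id attribute as if it were typed xs:ID; a real validator takes that typing from the STC schema. The namespace declarations on the profile are added here only so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET
from collections import Counter

STC_NS = "http://www.ivoa.net/xml/STC/stc-v1.30.xsd"
XLINK_NS = "http://www.w3.org/1999/xlink"

# One STC profile modelled on the example in this thread.
PROFILE = f"""
<stc:STCResourceProfile xmlns:stc="{STC_NS}" xmlns="{STC_NS}"
                        xmlns:xlink="{XLINK_NS}">
  <AstroCoordSystem xlink:type="simple"
                    xlink:href="ivo://STClib/CoordSys#UTC-FK5-TOPO"
                    id="UTC-FK5-TOPO"/>
  <AstroCoordArea coord_system_id="UTC-FK5-TOPO">
    <AllSky/>
  </AstroCoordArea>
</stc:STCResourceProfile>
"""


def duplicate_ids(xml_text):
    """Return the id values that occur more than once in the whole document."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# A single record is fine; two concatenated records collide.
single = f"<harvest><record>{PROFILE}</record></harvest>"
harvest = f"<harvest><record>{PROFILE}</record><record>{PROFILE}</record></harvest>"

print(duplicate_ids(single))
print(duplicate_ids(harvest))
```

The first call reports no duplicates and the second reports "UTC-FK5-TOPO", which mirrors why the records load individually but fail as a group.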


---------------(response from Arnold)

But why do you need multiple occurrences of the same coordinate
system?  You can include it once and make multiple references to it.

  - Arnold


---------------- (response from Ray)
Hey Kevin,

> Just wanted to resend this, if I don't get much of an answer in the 
> next couple of days will send it to the registry group.

I don't know why I didn't get this the first time.

>> Problem is I get validation errors such as these:
>> cvc-attribute.3: The value 'UTC-FK5-TOPO' of attribute 'id' on 
>> element 'AstroCoordSystem' is not valid with respect to its type, 'ID'.
>> cvc-id.2: There are multiple occurrences of ID value 'UTC-FK5-TOPO'.

Okay, this is definitely a problem.

The STC schema basically stipulates the use of xml ids to relate parts 
of the description together, and it is its convention to use an id that 
matches standard names.  Unfortunately, due to XML rules, the same ID 
cannot appear twice in one file.  In the examples we've had (both from 
the STC and my examples for the registry), this has been the case.  
However, we will run into this problem in the following situations:

   *  a record contains two references to the same standard coordinate
        system
   *  multiple records with references to the same system in a harvesting
        response.

It seems that the answer is that we need a different convention for 
naming these IDs.  Unfortunately, given the above context, they probably 
need to be globally unique to avoid this collision.  Rats!!

Does anyone have any good ideas here?
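One way to read "a different convention for naming these IDs" is to rewrite each id, and every coord_system_id that points at it, with a per-record prefix when records are combined. The Python below is a sketch of that idea only, not something any registry is known to do. The <harvest>/<record> wrapper and the "r1-", "r2-" prefixes are hypothetical, and a counter is unique only within one response, so a publishing registry would need something stronger, such as a value derived from the authority id.

```python
import xml.etree.ElementTree as ET


def make_ids_unique(root, record_tag="record"):
    """Prefix ids and their references with a per-record counter, in place."""
    for n, record in enumerate(root.iter(record_tag), start=1):
        prefix = f"r{n}-"  # starts with a letter, so the result is still a valid XML name
        for el in record.iter():
            for attr in ("id", "coord_system_id"):
                if el.get(attr) is not None:
                    el.set(attr, prefix + el.get(attr))
    return root


doc = """
<harvest>
  <record>
    <AstroCoordSystem id="UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="UTC-FK5-TOPO"/>
  </record>
  <record>
    <AstroCoordSystem id="UTC-FK5-TOPO"/>
    <AstroCoordArea coord_system_id="UTC-FK5-TOPO"/>
  </record>
</harvest>
"""

root = make_ids_unique(ET.fromstring(doc))
ids = [el.get("id") for el in root.iter() if el.get("id") is not None]
refs = [el.get("coord_system_id") for el in root.iter()
        if el.get("coord_system_id") is not None]
print(ids)
print(refs)
```

Each reference still matches the id in its own record after the rewrite, but the standard name survives only as a suffix, so any consumer that expects the id to equal the standard name would need to strip the prefix.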
------------ (original by Kevin)
Curiously, I just noticed that on my msslxt site I forgot to add back the 
registry samples you made, Ray.  I have now gone back and constructed the 
samples to put in the registry.

Problem is I get validation errors such as these:
cvc-attribute.3: The value 'UTC-FK5-TOPO' of attribute 'id' on element 
'AstroCoordSystem' is not valid with respect to its type, 'ID'.
cvc-id.2: There are multiple occurrences of ID value 'UTC-FK5-TOPO'.


I was thinking there was something I might have done to fix this, or you 
did, Ray, but I can't remember.  I should note that with some 
experimentation I noticed I could take the sample resources and input 
them individually fine, but not as a group.  Also, when I input them 
individually (which worked) and then tried to do a harvest, I got the 
same validation errors.  So for now I just took out the coverage data on 
the test registry; that way I can harvest it with no errors.
So I am flagging this up; maybe this is a simple fix, or possibly 
something bigger, I'm not sure.  I have attached a samples file that was 
not working.  The wrapper element around it, "StoreResources", is 
something astrogrid uses for inputs.

ok I have to log off and go to the airport to pick somebody up, probably 
won't be back for a couple of hours at least.

cheers,
Kevin
