Followup to Re: utype questions

Thu Jul 2 11:00:50 PDT 2009

Hi all,

Well, I suppose I wasn't precise or clear enough in my last email to 
provide substantive criticism of the Utype proposal to which others
might respond. In fact, on re-reading it, it does look as though I was
only trolling for a fight, which was not my intention (nor was it
to insult the proposers, so please accept my apology
if that is how my last email was taken!).

I am, in fact, very much in support of the idea behind the Utype
proposal that components of one data model may be shared/re-used
within another. What I take issue with is the present description
and the limitations that follow from the choices the authors
have made. As I will attempt to illustrate below, I believe some
of these limitations are fatal to the purpose of the proposal and
(at present) make the spec unworkable for all but the very simplest
use cases.

So. Allow me to re-try my criticism from another angle, which will
hopefully be clearer to the rest of the community.                

To start with, from reading the documentation (references [1] and [4],
with possible modifications proposed in [2] and [3]), I understand the
following main points about Utypes:

A. Represent data model elements ([1], pg 1)

B. May be serialized in a VOTable as param elements or fields of 
the table ([1], pg 5).                                           

C. Elements may be complex or simple, single values ([1] pg 10) 

D. The grouping mechanism in VOTable is used to handle multiple instances
of the same model within a single file ([1] pg 10, [4] pg 16).

E. Utype syntax ([1] pg 8) is meant to represent components of a model.
It is composed of the elements model-name, package-name and class-name,
and includes a final position for a property, which may be one of
attribute-name, collection or reference (using the XML ID/IDREF mechanism,
[1] pg 10). Some sort of chaining mechanism exists for naming Utypes
which occur within other models ([1] bottom of pg 10, illustrated
in [2] on pg 7).
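
As a rough illustration of point E, a Utype can be treated as a
structured string. The sketch below assumes a hypothetical
"model:package.class.property" layout. The separators, the helper name
parse_utype and the example strings are assumptions made for
illustration only; they are not taken from [1].

```python
from typing import NamedTuple


class Utype(NamedTuple):
    """The four positions named in point E (field names are illustrative)."""
    model: str
    package: str
    cls: str
    prop: str


def parse_utype(text: str) -> Utype:
    """Split a Utype under the ASSUMED layout model:package.class.property."""
    model, sep, rest = text.partition(":")
    if not sep or not model:
        raise ValueError(f"no model-name prefix in {text!r}")
    parts = rest.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected package.class.property in {text!r}")
    return Utype(model, *parts)
```

Anything richer, such as the chaining of Utypes across models, would
need rules that this sketch does not attempt to guess.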

Now on to the questions and problems which I see in the present document:

1. Where is the description of the ID/IDREF mechanism for the Utype
syntax? Much depends on this working properly: how easily the parser
can associate parent model element instances with the correct child
model instances, and whether collected child model elements can be
ordered correctly (important in some cases; will you infer this from
the order of appearance of model elements in the file? If so, it's
not mentioned). Just how this might be made to work within the
context of a FITS file (or an HTML table for that matter, as
mentioned in [2]) is very unclear.
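
To make question 1 concrete, here is a minimal sketch of what resolving
ID/ref links looks like on the XML side. The document fragment, the
"ex:" Utypes and the helper name resolve_refs are hypothetical; the
fragment only borrows the VOTable element names PARAM, GROUP and
PARAMref, and is not an example from [1].

```python
import xml.etree.ElementTree as ET

# Hypothetical VOTable-like fragment: a GROUP points at a PARAM via ID/ref.
DOC = """
<RESOURCE>
  <PARAM ID="target1" name="target" utype="ex:Target.name"
         datatype="char" arraysize="*" value="M31"/>
  <GROUP utype="ex:Observation">
    <PARAMref ref="target1"/>
  </GROUP>
</RESOURCE>
"""


def resolve_refs(root):
    """Return (referring element, referenced element) pairs in document order."""
    # Index every element that carries an ID attribute.
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID") is not None}
    pairs = []
    for el in root.iter():  # iter() walks the tree in document order
        ref = el.get("ref")
        if ref is None:
            continue
        if ref not in by_id:
            raise KeyError(f"dangling reference {ref!r}")
        pairs.append((el, by_id[ref]))
    return pairs
```

Note that the only ordering available here is document order, and that
nothing in this sketch has an obvious counterpart in a FITS header.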

2. Why use the grouping mechanism in VOTable to handle multiple instances
of a model? This means that the Utype spec is not a complete, standalone
specification. You are relying on an outside structure (grouping) to
provide meaning to the reconstructed model (i.e. which child nodes belong
to which parents). If I wanted to use Utypes for FITS tables or HTML,
then what do I replace the grouping mechanism with?
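
The dependence on grouping can be sketched as follows. The fragment,
the "ex:Target" Utypes and the helper names are hypothetical and only
illustrate the concern; they are not taken from [1] or [4].

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: two instances of the same model in one table.
DOC = """
<TABLE>
  <GROUP utype="ex:Target">
    <PARAM name="name" utype="ex:Target.name"
           datatype="char" arraysize="*" value="M31"/>
  </GROUP>
  <GROUP utype="ex:Target">
    <PARAM name="name" utype="ex:Target.name"
           datatype="char" arraysize="*" value="M33"/>
  </GROUP>
</TABLE>
"""


def instances(root, model_utype):
    """Rebuild one dict per GROUP; the GROUP element is what separates instances."""
    found = []
    for group in root.iter("GROUP"):
        if group.get("utype") == model_utype:
            found.append({p.get("utype"): p.get("value")
                          for p in group.findall("PARAM")})
    return found


def flat_utypes(root):
    """The same PARAMs seen without their enclosing GROUP."""
    return [p.get("utype") for p in root.iter("PARAM")]
```

Read through instances(), the two targets stay separate. Read through
flat_utypes(), both PARAMs carry an identical Utype and cannot be told
apart, which is the situation a container format without grouping
would be in.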

3. Why is the chaining mechanism important? If the grouping and
ID/IDREF referencing are working, you don't need it (the child will
already know its parent).

And now some criticism based on the above. 

1. Without working examples for FITS/HTML which use referencing and
grouping, I can only conclude that this proposal really only amounts
to a modification of the VOTable specification. The Utype spec relies on
outside structures to function: grouping, which is an element of VOTable,
and ID/IDREF, which is an XML mechanism that doesn't exist in FITS.

Please explain how Utypes are to be used with nesting/referencing in FITS and
provide examples of the ID/IDREF mechanism at work.

2. Why isn't it reasonable to use the existing XML/XML Schema mechanism of
import/redefine to allow 'foreign' model elements within the VOTable? This
is well-tested, XML parsers grok it already, and it's standard practice.
I assume the choice not to use it is related to some requirements which are
not present in [1] (I see that in [2] the Utypes are supposed to allow the
'application to treat the Utypes as opaque strings'). All sorts of bad
things flow from the present specification. Here's a short list:

  * You *must* modify the VOTable parser (to understand the Utype attribute,
    the grouping, and the special ID/IDREF mechanism of Utypes). There is
    no allowance for a parser simply passing over them.

  * It is not possible to create a stand-alone library for grokking Utypes;
    instead it will have to be tightly integrated with the parser (VOTable,
    FITS, and whatever is reading your HTML tables, should that be
    attempted as well).

Please consider a different nesting/referencing mechanism from the present ones.
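
For comparison, here is a minimal sketch of the "passing over" that
namespaced foreign elements allow. It does not show import/redefine
itself, which is a schema-level feature; it only shows the
instance-level consequence, assuming the foreign model elements live in
their own namespace. The namespace URI, the "tm" prefix, the element
names and the helper name are all hypothetical.

```python
import xml.etree.ElementTree as ET

MODEL_NS = "http://example.org/target-model"  # hypothetical namespace URI

# Hypothetical VOTable-like document carrying foreign model elements.
DOC = f"""
<VOTABLE xmlns:tm="{MODEL_NS}">
  <RESOURCE>
    <tm:Target>
      <tm:name>M31</tm:name>
    </tm:Target>
    <TABLE name="results"/>
  </RESOURCE>
</VOTABLE>
"""


def split_by_namespace(root, ns):
    """Separate foreign-model tags from host-format tags, in document order."""
    prefix = "{" + ns + "}"  # ElementTree spells qualified tags as {uri}local
    model, host = [], []
    for el in root.iter():
        (model if el.tag.startswith(prefix) else host).append(el.tag)
    return model, host
```

A host parser that does not know the model can skip everything in the
first list, and a stand-alone model library can ignore everything in
the second.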

3. I wonder: if repeated objects must be sent, why use VOTable to serialize
them? Why not develop a light-weight collection model in XML, and allow
compression of the contents? The amount of application/semantics work
looks to be about the same to me.

Furthermore, unless serendipity reigns, you are going to require the
re-factoring of various data models so that they conform to the Utype
standard. Are you suggesting this is mandatory for all VO-accepted models,
or will there be a contributory list of kosher models which work with Utypes?
Again, this adds to the work needed to implement Utypes successfully.

Even if not mandatory, some models will have to be re-factored (along with
any existing software based on them), so some work will exist for this.   
Along with the work on the VOTable parser, it seems like significant effort
will be needed to implement this proposal.                                 

Please point out the savings of doing it in this manner over a simpler
approach of importing/redefining elements in XML.

---

Well, that's enough for now (I have tried to stick purely with [1] and
to the main problem points. I have some support for and criticism of [2]
and [3], but I don't want to run on).

Hopefully I have been precise and clear. I have tried to cite the material
as well as read it closely. It is possible that I missed something which
has led to my misunderstanding and I am happy to be corrected on this.

Again, this is written in the spirit of substantive criticism, not as
flame bait.

Regards,

=brian

References
----------

[1] Utype: a data model field name convention, v0.3, IVOA Note draft,
     May 24, 2009

[2] Utype proposals, Norman Gray, 2009?,
     http://nxg.me.uk/note/utype-proposals

[3] Utypes and URIs, IVOA Note, v1.0, 2007, Norman Gray
     http://www.ivoa.net/Documents/latest/utype-uri.html

[4] Simple Spectral Access Protocol, v104,
     http://www.ivoa.net/Documents/latest/SSA