[QUANTITY] Data Model for Quantity v0.5 - Serialisation

Martin Hill mchill at dial.pipex.com
Mon May 10 09:32:34 PDT 2004


Brian Thomas wrote:

>>Suggested serialization changes:
>>
>>- Array values tagged rather than space-separated (Hill)
> 

I seem to remember having had this conversation before... :-)

> 	I disagree. First: the present serialization *does* allow for tagged values.

Having an option to tag or not to tag just makes interoperability harder: 
code has to cope with both forms, for example.  Let's do it or not do it.
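
To illustrate the extra branching, here is a minimal Python sketch of a 
reader that has to accept both forms.  The element names `values` and 
`value` are made up for the example; they are not taken from the Quantity 
draft.

```python
import xml.etree.ElementTree as ET


def read_values(xml_text):
    """Return the values held in a <values> element as a list of strings.

    Element names are hypothetical, not from the Quantity draft.
    """
    root = ET.fromstring(xml_text)
    tagged = root.findall("value")
    if tagged:
        # Tagged form: one child element per value.
        return [v.text or "" for v in tagged]
    # Untagged form: whitespace-separated character data.
    return (root.text or "").split()


tagged_doc = "<values><value>1.0</value><value>2.5</value></values>"
plain_doc = "<values>1.0 2.5</values>"

print(read_values(tagged_doc))
print(read_values(plain_doc))
```

Every consumer, in every language, would need the equivalent of that 
second branch for as long as both forms are allowed.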

> 	There is an *option* to not tag values. Why? For several important reasons:
> 
> 	1. Compactness (tags may double or triple data size. For large amounts of data, this is
>             unacceptable)

No.  Where data size is an issue *at all* then we should *not* be using 
ASCII XML.

I'm all for small and neat and fast, but optimisation should be left 
until we know it's required.  It's not a reason to make things difficult 
before we start.  We know data sizes in some cases are going to be 
large; we already know (?) that ASCII XML is not an appropriate data 
form to use for these cases.

> 	2. Compatibility with legacy data.

Examples please!  Surely anything that is being transformed into XML can 
be tagged?  If the problem is legacy data that is already space-separated 
XML, then that is a good example of a reason not to do it in the future!

> 	Furthermore, (and perhaps I am wrong, someone please confirm/deny) You CAN have
> 	PCDATA that are separated by whitespace and have it validate by a schema, so this 
> 	sort of thing *is* allowed in the XML (schema) spec, as far as I'm aware. If an application
> 	breaks because of space-delimits, then it's because the app is not completely XML compliant.
> 
> 	[And it's simply not that hard to parse a set of datum that are separated by strings!!]

Space-delimited values are technically part of XML.  However, this 
breaks the spirit of it, just as positionally-dependent cell values in 
VOTable do.  It's not so much a case of breaking apps as of requiring 
extra code to deal with it when that's not necessary.

I was not aware that you could get schemas to validate space-delimited 
values within an element; I can't find anything to say yes or no, or to 
say how you do it and what the limitations are.  Can you check type? 
Enumerations?  Maximum/minimum occurrences?

Space-delimited values:

1) In this example (7.8/8) are positionally dependent, so the document 
does not explicitly associate values with their axis.  This:

    a) is not robust - ie it's an easy place for bugs to creep in unnoticed.
    b) makes any meaningful XPath/XQuery statement very unpleasant, which 
       in turn makes any XSLT style sheet that has to do anything with the 
       values very nasty.
    c) requires extra code to reassociate the values.

2) Require extra code in general to do the extra parsing (though I agree 
it is trivial, it still needs extra code for every language and 
application that reads it).

3) Will not properly load into automatically-generated object models (eg 
Castor).

4) Limit you to values that do not contain spaces, or require a whole load 
of extra checks and processing.  Can we be sure that Quantity will never 
contain, say, an enumeration of values that might be strings?
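
Points 1 and 4 can be sketched in a few lines of Python.  The documents, 
the `axis` attribute and the string values are all invented for 
illustration and are not from the Quantity draft; the path expression is 
the limited XPath subset that ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents; element and attribute names are illustrative only.
tagged = ET.fromstring(
    "<values>"
    '<value axis="x">dwarf nova</value>'
    '<value axis="y">quasar</value>'
    "</values>"
)
spaced = ET.fromstring("<values>dwarf nova quasar</values>")

# Tagged: the value is selected by its axis, directly in the path expression.
y_tagged = tagged.find("value[@axis='y']").text

# Space-delimited: the reader has to know, out of band, that y is the
# second item, and the split cannot tell that "dwarf nova" is one value.
items = spaced.text.split()
y_spaced = items[1]

print(y_tagged)
print(y_spaced)
print(len(items))
```

The tagged lookup returns the intended value; the positional lookup 
silently returns half of the first one, which is the sort of bug that 
creeps in unnoticed.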

Cheers,

Martin

-- 
Martin Hill
www.mchill.net
07901 55 24 66


