Nulls in VOTables in TAP

Francois Ochsenbein (ext.52429) francois at cdsarc.u-strasbg.fr
Mon Jul 4 05:38:23 PDT 2011


Hi Tom,

I basically agree with all of Mark Taylor's answers:

* yes, VOTable was designed on the basis of FITS, not as
  a DBMS subset: NaN and a database 'null' are considered
  the same thing, as they are in FITS binary tables; and
  in the case of an array of floats/doubles in the <TABLEDATA>
  serialization, the array elements are separated by whitespace,
  so an empty element cannot be written; hence "NaN" as the
  alternative to the empty <TD/>...

* yes, there is some confusion about booleans: the FITS
  document allows only T, F and hex 00 (undefined), and
  hex 00 can't be used inside an array in the <TABLEDATA>
  serialization, a problem similar to that of NaN for doubles.

* for integers, no bit pattern exists for an undefined value.
  Section 4.7 merely "suggests" using the value -32768 for
  short integers.
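As a rough illustration of how a client might apply these rules when reading scalar <TABLEDATA> cells, here is a sketch with a hypothetical helper `decode_cell` (not part of any VOTable library). Treating an empty <TD/> in an integer column as null, and accepting "?" as a null boolean, are assumptions of the sketch, since the text above notes the standard is unclear on both.

```python
import math
import xml.etree.ElementTree as ET
from typing import Optional, Union

Cell = Union[None, bool, int, float, str]

def decode_cell(text: Optional[str], datatype: str,
                null_value: Optional[int] = None) -> Cell:
    """Decode one scalar TABLEDATA cell (hypothetical helper)."""
    text = (text or "").strip()
    if datatype in ("float", "double"):
        # An empty cell and "NaN" are treated as the same undefined value.
        return math.nan if text in ("", "NaN") else float(text)
    if datatype == "boolean":
        if text in ("", "?"):  # assumption: both spellings mean null
            return None
        return text[0] in "Tt1"
    if datatype in ("short", "int", "long"):
        if text == "":
            return None  # assumption: the standard leaves this undefined
        value = int(text)
        # A sentinel declared via <VALUES null="..."/> also means null.
        return None if value == null_value else value
    return text

row = ET.fromstring("<TR><TD/><TD>NaN</TD><TD>-32768</TD><TD>T</TD></TR>")
types = ["double", "double", "short", "boolean"]
print([decode_cell(td.text, t, null_value=-32768) for td, t in zip(row, types)])
# prints [nan, nan, None, True]
```

Note that the first two cells decode to the same value: once read, the empty cell and "NaN" are indistinguishable, which is exactly the conflation discussed in this thread.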
  
In fact, the lowest integer values are frequently used as the
bit pattern for "null" integers (in two's-complement arithmetic,
each of them is its own negation); these values are:
  -32768               (0x8000) for short int, 
  -2147483648          (0x80000000) for 32-bit integers,
  -9223372036854775808 (0x8000000000000000) for longs
These are the values produced by the GNU C compiler on x86
hardware (and by Fortran, as far as I know) for an instruction like
  i = x
when x is a double with a NaN value and i is an integer (strictly,
the C standard leaves this conversion undefined).
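The "own negation" property comes from two's-complement overflow: negating the lowest value wraps back to the same bit pattern. A small sketch that reproduces the three values above; Python integers are unbounded, so the wrap to a fixed width is done by hand, and the helper names `int_min` and `wrap` are hypothetical.

```python
def int_min(bits: int) -> int:
    """Lowest value of a signed two's-complement integer of this width."""
    return -(1 << (bits - 1))

def wrap(value: int, bits: int) -> int:
    """Reduce an unbounded Python int to a signed n-bit two's-complement value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value

for bits in (16, 32, 64):
    m = int_min(bits)
    pattern = m & ((1 << bits) - 1)  # only the sign bit is set
    print(f"{bits}-bit: {m} (0x{pattern:0{bits // 4}X}), "
          f"negation wraps to {wrap(-m, bits)}")
```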

Unfortunately, Java does not follow the same convention: the
language defines the conversion of a NaN double to an integer
type as zero, so Double.shortValue(), intValue() and longValue()
all return 0 for a NaN double...
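Because the result of converting NaN to an integer differs between languages and platforms (and is formally undefined in C), a writer can avoid relying on it by testing for NaN explicitly and substituting the sentinel itself. A minimal sketch with a hypothetical function name; rejecting finite values that would land on the reserved pattern is one possible policy against collisions, not something the standard requires.

```python
import math

# Lowest two's-complement value per width, used as the "null" pattern.
NULL_SENTINEL = {16: -32768, 32: -2147483648, 64: -9223372036854775808}

def double_to_nullable_int(x: float, bits: int = 32) -> int:
    """Convert a double to an n-bit integer, mapping NaN to the null sentinel."""
    null = NULL_SENTINEL[bits]
    if math.isnan(x):
        return null
    i = int(x)  # truncates toward zero; raises OverflowError for +/-inf
    # Keep real data off the reserved pattern: valid range is (null, -null - 1].
    if not null < i <= -null - 1:
        raise OverflowError(f"{x!r} does not fit a {bits}-bit nullable integer")
    return i
```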

Cheers, francois

>
>My recent security issues have caused me to relook at some of the
>formatting options for VOTables and in doing so I've become a bit
>confused about how database nulls should be handled properly.  It
>doesn't look like any VOTable representation can do a proper job of
>handling nulls as they appear in databases consistently with the
>recommendations of the VOTable standard.
>
>The TABLEDATA representation could do pretty well.  It could in
>principle represent nulls for most types by having empty text in the
>appropriate TD element.  This could work for all types except that it
>cannot distinguish between 0 length arrays and null arrays.  Most
>databases allow for 0 length strings distinct from null strings so
>that's a bit of an issue but we can probably live with it.  However
>the VOTable standard seems to suggest that using empty string values
>is not supported for anything other than boolean and float/complex
>data types. [The text is actually a bit confused here. E.g., at one
>point (4.7) it suggests that booleans will require a value attribute
>to specify a null, but later on (6) it describes how nulls should be
>represented for that type and makes the empty cell the default way.]
>
>E.g., if I have an 'int' field and represent the value of this field
>in some row with just <TD/> the interpretation of that value seems to
>be undefined by the standard.
>
>The VOTable standard also suggests conflating the ideas of null and
>NaN for floating point values. If I have a 'double' field, then the
>standard suggests that <TD/> should be interpreted as identical to
><TD>NaN</TD>.  These are very distinct in the database world but it
>looks like this distinction may be lost when we return results using TAP.
>
>In the BINARY and FITS serializations there is no natural way to
>represent null values for any types.  The only avenue is to use the
>value/null attribute.  The conflation of null and NaN numbers is
>explicitly mandated.
>
>For all representations there is a significant penalty for the short
>integer types (bytes, shorts and ints), where collisions between null
>values and actual occurrences of any reserved value are likely.
>
>One solution for TAP services might be to promote integer types.
>E.g., if I have a short in the underlying database I could represent
>it as an int in TAP so that I can be assured of not having collisions
>in the VOTable response.
>
>However it's all pretty inelegant for me at least.  Am I
>misunderstanding something here?  As far as I can tell neither the
>ADQL nor TAP standards actually talk about null values (except that
>TAP notes in some cases that certain metadata values are null) so the
>VOTable standard is where the action is.
>
>	Regards,
>	Tom
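Tom's suggestion of promoting integer types can be sketched as follows: a nullable 16-bit database value is written as a VOTable int whose FIELD would declare <VALUES null="-2147483648"/>, a value outside the 16-bit range, so no genuine short can collide with it. The function and constant names are hypothetical.

```python
from typing import Optional

SHORT_MIN, SHORT_MAX = -32768, 32767
INT_NULL = -2147483648  # lowest 32-bit value, outside the 16-bit range

def promote_short(value: Optional[int]) -> int:
    """Write a nullable 16-bit database value as a 32-bit VOTable int."""
    if value is None:
        return INT_NULL
    if not SHORT_MIN <= value <= SHORT_MAX:
        raise ValueError(f"{value} is not a 16-bit integer")
    return value  # every genuine short, including -32768, is kept as-is
```

The price is a column twice as wide in BINARY or FITS output, and the trick runs out for 64-bit integers, since long is already the widest VOTable integer type.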
=======================================================================
Francois Ochsenbein    ------   Observatoire Astronomique de Strasbourg
   11, rue de l'Universite 67000 STRASBOURG  Phone: +33-(0)368 85 24 29
Email: francois at astro.u-strasbg.fr (France)    Fax: +33-(0)368 85 24 17
=======================================================================

