Nulls in VOTables in TAP

Tom McGlynn Thomas.A.McGlynn at nasa.gov
Fri Jul 1 07:25:24 PDT 2011


My recent security issues have caused me to take another look at some 
of the formatting options for VOTables, and in doing so I've become a 
bit confused about how database nulls should be handled properly.  It 
doesn't look like any VOTable representation can do a proper job of 
handling nulls as they appear in databases while staying consistent 
with the recommendations of the VOTable standard.

The TABLEDATA representation could do pretty well.  It could in 
principle represent nulls for most types by having empty text in the 
appropriate TD element.  This could work for all types except that it 
cannot distinguish between 0 length arrays and null arrays.  Most 
databases allow for 0 length strings distinct from null strings so 
that's a bit of an issue but we can probably live with it.  However 
the VOTable standard seems to suggest that using empty string values 
is not supported for anything other than boolean and float/complex 
data types. [The text is actually a bit confused here. E.g., at one 
point (4.7) it suggests that booleans will require a value attribute 
to specify a null, but later on (6) it describes how nulls should be 
represented for that type and makes the empty cell the default way.]
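
To make the TABLEDATA ambiguity concrete, here is a minimal Python 
sketch. It is not taken from any VOTable library; the row content and 
the helper name cell_texts are made up for illustration. It reads the 
cells of one row with the standard xml.etree.ElementTree module and 
shows that an explicitly empty cell and a self-closed cell come out 
the same, so a writer has only one "empty" form available for both a 
zero-length string and a null.

```python
import xml.etree.ElementTree as ET

# Hypothetical row: the first cell is meant as a zero-length string,
# the second as a database null, the third as an ordinary value.
ROW = "<TR><TD></TD><TD/><TD>abc</TD></TR>"


def cell_texts(row_xml):
    """Return the text of each TD in the row, with empty cells as ''."""
    row = ET.fromstring(row_xml)
    return [td.text or "" for td in row.findall("TD")]


cells = cell_texts(ROW)
# The first two cells are indistinguishable once parsed.
print(cells[0] == cells[1])
```

Any convention that treats the empty cell as null therefore gives up 
zero-length strings (and zero-length arrays), and vice versa.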

E.g., if I have an 'int' field and represent the value of this field 
in some row with just <TD/> the interpretation of that value seems to 
be undefined by the standard.

The VOTable standard also suggests conflating the ideas of null and 
NaN for floating point values.  If I have a 'double' field, then the 
standard suggests that <TD/> should be interpreted as identical to 
<TD>NaN</TD>.  These are very distinct in the database world, but it 
looks like this distinction may be lost when we return results using TAP.
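
A small Python sketch of what that conflation means for a client, 
assuming a reader that follows the interpretation described above 
(the function name parse_double_cell is hypothetical, not part of any 
real VOTable parser): once both cell forms have been parsed, nothing 
is left to tell a database null from a stored NaN.

```python
import math


def parse_double_cell(text):
    """Parse the text of a TD cell for a 'double' field.

    Assumption for illustration: an empty cell is read as NaN, which is
    how the post reads the VOTable recommendation.
    """
    if text is None or text.strip() == "":
        return float("nan")
    return float(text)


from_null = parse_double_cell("")      # cell written as <TD/>
from_nan = parse_double_cell("NaN")    # cell written as <TD>NaN</TD>

# Both results are NaN; the original distinction is gone.
print(math.isnan(from_null) and math.isnan(from_nan))
```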

In the BINARY and FITS serializations there is no natural way to 
represent null values for any types.  The only avenue is to use the 
value/null attribute.  The conflation of null and NaN numbers is 
explicitly mandated.

For all representations there is a significant penalty for the short 
integer types (bytes, shorts and ints), where collisions between null 
values and actual occurrences of any reserved value are likely.
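
The collision problem can be sketched in Python as follows. This is 
only an illustration under stated assumptions: the helper name 
free_sentinel is made up, and it assumes the service can scan the 
whole column before choosing the reserved value, which a streaming 
service generally cannot do since the null value has to be declared 
in the field metadata ahead of the data.

```python
def free_sentinel(values, lo, hi):
    """Return a value in [lo, hi] that no non-null entry uses, or None.

    `values` is a column where database nulls are given as None.
    """
    used = {v for v in values if v is not None}
    for candidate in range(lo, hi + 1):
        if candidate not in used:
            return candidate
    return None  # every representable value already occurs in the data


# An 8-bit unsigned column that happens to use all 256 values plus a null:
full_byte_column = list(range(256)) + [None]
print(free_sentinel(full_byte_column, 0, 255))
```

For the narrow types the range is small enough that real data can 
plausibly exhaust it, or at least hit whichever value was reserved in 
advance.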

One solution for TAP services might be to promote integer types. 
E.g., if I have a short in the underlying database I could represent 
it as an int in TAP so that I can be assured of not having collisions 
in the VOTable response.
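
A minimal Python sketch of the promotion idea, assuming 16-bit shorts 
in the database and 32-bit ints in the response. The constant INT_NULL 
and the function promote_short_column are hypothetical names chosen 
for illustration; any int outside the short range would do as the 
reserved value.

```python
SHORT_MIN, SHORT_MAX = -32768, 32767

# Hypothetical reserved value: a 32-bit int that no 16-bit short can equal.
INT_NULL = -2147483648


def promote_short_column(values):
    """Map a short column (nulls given as None) to ints with a null marker."""
    promoted = []
    for v in values:
        if v is None:
            promoted.append(INT_NULL)
        elif SHORT_MIN <= v <= SHORT_MAX:
            promoted.append(v)
        else:
            raise ValueError(f"{v} is not a valid 16-bit short")
    return promoted


print(promote_short_column([-32768, None, 32767]))
```

The cost is that the field is no longer reported with its database 
type, and every value takes twice the space in a binary serialization.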

However, it all seems pretty inelegant, to me at least.  Am I 
misunderstanding something here?  As far as I can tell, neither the 
ADQL nor the TAP standard actually talks about null values (except 
that TAP notes in some cases that certain metadata values are null), 
so the VOTable standard is where the action is.

	Regards,
	Tom

