Nulls in VOTables in TAP
Tom McGlynn
Thomas.A.McGlynn at nasa.gov
Fri Jul 1 07:25:24 PDT 2011
Some recent security issues have led me to take another look at some
of the formatting options for VOTables, and in doing so I've become a
bit confused about how database nulls should be handled. It doesn't
look like any VOTable serialization can represent nulls as they appear
in databases while staying consistent with the recommendations of the
VOTable standard.
The TABLEDATA representation could do pretty well. It could in
principle represent nulls for most types by having empty text in the
appropriate TD element. This could work for all types, except that it
cannot distinguish between zero-length arrays and null arrays. Most
databases allow zero-length strings distinct from null strings, so
that's a bit of an issue, but we can probably live with it. However,
the VOTable standard seems to suggest that empty cells are not
supported for anything other than boolean and float/complex data
types. [The text is actually a bit confused here. E.g., at one point
(4.7) it suggests that booleans require a value attribute to specify
a null, but later on (6) it describes how nulls should be represented
for that type and makes the empty cell the default.]
E.g., if I have an 'int' field and represent the value of this field
in some row with just <TD/>, the interpretation of that value seems to
be undefined by the standard.
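
To make the zero-length versus null ambiguity concrete, here is a
minimal Python sketch. It assumes a hypothetical writer (serialize_row)
that emits an empty TD for a database NULL; neither function name comes
from any VOTable library, and the fragment ignores namespaces and the
rest of the VOTable document.

```python
import xml.etree.ElementTree as ET

def serialize_row(values):
    """Hypothetical writer: a database NULL (None) becomes an empty TD."""
    tr = ET.Element("TR")
    for v in values:
        td = ET.SubElement(tr, "TD")
        if v is not None:
            td.text = str(v)
    return ET.tostring(tr, encoding="unicode")

def parse_row(xml_text):
    """Return the raw text of each TD; an empty cell comes back as ''."""
    return [td.text or "" for td in ET.fromstring(xml_text).iter("TD")]

# A NULL string and a zero-length string serialize to the same bytes,
# so a reader has no way to tell them apart.
null_row = serialize_row([None, 42])
empty_row = serialize_row(["", 42])
print(null_row == empty_row)
print(parse_row(null_row))
```

Under these assumptions the two rows are identical on the wire, which
is the loss of information described above.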
The VOTable standard also suggests conflating the ideas of null and
NaN for floating point values. If I have a 'double' field, then the
standard suggests that <TD/> should be interpreted as identical to
<TD>NaN</TD>. These are very distinct in the database world, but it
looks like this distinction may be lost when we return results using TAP.
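
A small Python sketch of that conflation, assuming a reader that
follows the interpretation described above (empty cell treated the same
as NaN). The functions encode_double and decode_double are hypothetical
illustrations, not part of any real VOTable library.

```python
import math

def encode_double(value):
    """Hypothetical writer: NULL (None) -> empty cell, NaN -> 'NaN'."""
    if value is None:
        return ""
    if math.isnan(value):
        return "NaN"
    return repr(value)

def decode_double(cell):
    """Hypothetical reader that treats an empty cell as identical to NaN."""
    if cell.strip() in ("", "NaN"):
        return math.nan
    return float(cell)

# The database column holds a value, a NULL, and a genuine NaN.
column = [1.5, None, math.nan]
round_tripped = [decode_double(encode_double(v)) for v in column]

# After the round trip the NULL and the NaN are indistinguishable.
print(round_tripped)
```

The writer keeps the two cases apart, but the reader collapses them, so
the client can no longer tell which rows were NULL in the database.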
In the BINARY and FITS serializations there is no natural way to
represent null values for any type. The only avenue is to use the
value/null attribute. The conflation of null and NaN numbers is
explicitly mandated.
For all representations there is a significant penalty for the short
integer types (bytes, shorts and ints), where collisions between null
values and actual occurrences of any reserved value are likely.
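
The collision problem can be illustrated with a short Python sketch.
The helper find_free_sentinel is hypothetical: it looks for a value in
the type's range that the column does not already use and could
therefore be reserved as the null marker.

```python
def find_free_sentinel(values, lo, hi):
    """Return a value in [lo, hi] that is absent from the column, or None.

    None entries in `values` stand for database NULLs and are ignored.
    """
    used = {v for v in values if v is not None}
    for candidate in range(lo, hi + 1):
        if candidate not in used:
            return candidate
    return None

# A sparse column leaves plenty of room for a reserved null value.
sparse = [0, 1, 3, None]
print(find_free_sentinel(sparse, 0, 255))

# A byte column that uses all 256 values has nothing left to reserve.
full_byte_column = list(range(256)) + [None]
print(find_free_sentinel(full_byte_column, 0, 255))
```

The narrower the type, the more likely the second case is, and a
service generally cannot scan the whole result before choosing a
reserved value.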
One solution for TAP services might be to promote integer types.
E.g., if I have a short in the underlying database I could represent
it as an int in TAP so that I can be assured of not having collisions
in the VOTable response.
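
A minimal sketch of that promotion in Python, assuming big-endian
4-byte integers on the wire and a reserved null value chosen outside
the short range. The constant NULL_SENTINEL and both function names are
hypothetical choices made for illustration only.

```python
import struct

SHORT_MIN, SHORT_MAX = -32768, 32767
# Hypothetical reserved value: fits in an int but can never be a short.
NULL_SENTINEL = -2147483648

def encode_promoted(value):
    """Write a database short (or NULL) as a 4-byte big-endian int."""
    if value is None:
        return struct.pack(">i", NULL_SENTINEL)
    if not SHORT_MIN <= value <= SHORT_MAX:
        raise ValueError("value does not fit in a short")
    return struct.pack(">i", value)

def decode_promoted(buf):
    """Read the 4-byte int back, mapping the sentinel to None."""
    (raw,) = struct.unpack(">i", buf)
    return None if raw == NULL_SENTINEL else raw

column = [SHORT_MIN, -1, 0, SHORT_MAX, None]
decoded = [decode_promoted(encode_promoted(v)) for v in column]
print(decoded == column)
```

Every legal short survives the round trip and the NULL stays distinct,
at the cost of doubling the column width and misreporting the
underlying database type.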
However, it all seems pretty inelegant, to me at least. Am I
misunderstanding something here? As far as I can tell, neither the
ADQL nor the TAP standard actually talks about null values (except
that TAP notes in some cases that certain metadata values are null),
so the VOTable standard is where the action is.
Regards,
Tom