Nulls in VOTables in TAP
Mark Taylor
m.b.taylor at bristol.ac.uk
Fri Jul 1 08:40:10 PDT 2011
Tom,
yes, an empty TD for integer types is not permitted in VOTable;
a null in an integer column can only be represented by use of the
VALUES/null attribute. And yes, NaN and null are not distinguished
for floating point types.
VOTable was designed (I believe) as FITS-with-metadata rather than
serialized-database, and from this point of view those decisions
look sensible. So, you can map a column of numeric data from a VOTable
(or FITS table) into a simple array of primitive integer/floating values,
which makes storage in C/Fortran-like programming languages,
translation between TABLEDATA/BINARY/FITS VOTable formats, or
translation between FITS and VOTable straightforward. With a more
database-like value space these things would be more problematic.
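
As a rough illustration of the mapping described above, the following Python sketch reads a small TABLEDATA fragment into primitive arrays. The fragment, the field names and the magic value -32768 are made up for the example, XML namespaces are omitted, and the helper read_columns is hypothetical rather than part of any VOTable library.

```python
import math
import xml.etree.ElementTree as ET
from array import array

# Hypothetical VOTable fragment (namespace omitted for brevity).
VOTABLE = """
<TABLE>
  <FIELD name="flag" datatype="short">
    <VALUES null="-32768"/>
  </FIELD>
  <FIELD name="flux" datatype="double"/>
  <DATA><TABLEDATA>
    <TR><TD>3</TD><TD>1.5</TD></TR>
    <TR><TD>-32768</TD><TD/></TR>
    <TR><TD>7</TD><TD>NaN</TD></TR>
  </TABLEDATA></DATA>
</TABLE>
"""

# array typecodes for the two datatypes used in this example
TYPECODES = {"short": "h", "double": "d"}


def read_columns(xml_text):
    """Return ({name: array}, {name: null magic value or None})."""
    table = ET.fromstring(xml_text)
    fields = table.findall("FIELD")
    columns = {f.get("name"): array(TYPECODES[f.get("datatype")]) for f in fields}
    nulls = {}
    for f in fields:
        values = f.find("VALUES")
        null_text = values.get("null") if values is not None else None
        nulls[f.get("name")] = int(null_text) if null_text is not None else None
    for tr in table.iter("TR"):
        for f, td in zip(fields, tr.findall("TD")):
            text = (td.text or "").strip()
            if f.get("datatype") == "double":
                # An empty cell and a literal NaN both end up as NaN.
                columns[f.get("name")].append(float(text) if text else math.nan)
            else:
                # Integer cells must carry a value; a null is the magic value.
                columns[f.get("name")].append(int(text))
    return columns, nulls


columns, nulls = read_columns(VOTABLE)
flag_is_null = [v == nulls["flag"] for v in columns["flag"]]
flux_is_null = [math.isnan(v) for v in columns["flux"]]
```

Each column is a flat array of machine values with no per-cell null flag: the null in the short column is only recognisable by comparing against the declared magic value, and the empty double cell is indistinguishable from the NaN one.
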
In my personal opinion the conflation of NaN and null is not a
serious issue - I can't think of many astronomical processing
situations where the distinction would make much practical difference
(though I'm willing to be corrected). I do agree that having to come
up with an out-of-band value for nulls in nullable integer typed
columns makes life difficult for TAP services (and other generators
of VOTable, or FITS, tables), but you haven't misunderstood,
that's the way the VOTable standard is.
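
To make that difficulty concrete, here is a minimal sketch of how a service might pick an out-of-band null value for a nullable integer column, promoting to a wider type only when every value of the declared type is already in use. The function name choose_null and the search strategy are assumptions made for illustration, not anything prescribed by VOTable or TAP.

```python
# Ranges of the VOTable integer datatypes, narrowest first.
RANGES = [
    ("unsignedByte", 0, 255),
    ("short", -2**15, 2**15 - 1),
    ("int", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def choose_null(values, datatype):
    """Pick (datatype, null) such that null is not among the non-null values.

    `values` holds Python ints, with None standing for a database null.
    The type is promoted only if the declared one has no free value.
    """
    used = {v for v in values if v is not None}
    names = [name for name, _, _ in RANGES]
    for name, lo, hi in RANGES[names.index(datatype):]:
        if any(v < lo or v > hi for v in used):
            continue  # data does not fit this type
        # Try the extremes first, then scan for any gap.
        for candidate in (hi, lo):
            if candidate not in used:
                return name, candidate
        if hi - lo + 1 > len(used):
            return name, next(c for c in range(lo, hi + 1) if c not in used)
    raise ValueError("no free value in any integer type")
```

The returned value would go into the VALUES/null attribute of the FIELD. Note that this sketch needs to see every value in the column before the FIELD can be written, which a service streaming its results may not be able to do.
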
Mark
On Fri, 1 Jul 2011, Tom McGlynn wrote:
> My recent security issues have caused me to relook at some of the formatting
> options for VOTables and in doing so I've become a bit confused about how
> database nulls should be handled properly. It doesn't look like any VOTable
> representation can do a proper job of handling nulls as they appear in
> databases consistently with the recommendations of the VOTable standard.
>
> The TABLEDATA representation could do pretty well. It could in principle
> represent nulls for most types by having empty text in the appropriate TD
> element. This could work for all types except that it cannot distinguish
> between 0 length arrays and null arrays. Most databases allow for 0 length
> strings distinct from null strings so that's a bit of an issue but we can
> probably live with it. However the VOTable standard seems to suggest that
> using empty string values is not supported for anything other than boolean and
> float/complex data types. [The text is actually a bit confused here. E.g., at
> one point (4.7) it suggests that booleans will require a value attribute to
> specify a null, but later (6) on it describes how nulls should be represented
> for that type and makes the empty cell the default way.]
>
> E.g., if I have an 'int' field and represent the value of this field in some
> row with just <TD/> the interpretation of that value seems to be undefined by
> the standard.
>
> The VOTable standard also suggests conflating the ideas of null and NaN for
> floating point values. If I have a 'double' field, then the standard suggests
> that <TD/> should be interpreted as identical to
> <TD>NaN</TD>. These are very distinct in the database world but it looks like
> this distinction may be lost when we return results using TAP.
>
> In the BINARY and FITS serializations there is no natural way to represent
> null values for any types. The only avenue is to use the VALUES/null
> attribute. The conflation of null and NaN numbers is explicitly mandated.
>
> For all representations there is a significant penalty for the short integer
> types (bytes, shorts and ints), where collisions between null values and
> actual occurrences of any reserved value are likely.
>
> One solution for TAP services might be to promote integer types. E.g., if I
> have a short in the underlying database I could represent it as an int in TAP
> so that I can be assured of not having collisions in the VOTable response.
>
> However, it's all pretty inelegant, to me at least. Am I misunderstanding
> something here? As far as I can tell neither the ADQL nor TAP standards
> actually talk about null values (except that TAP notes in some cases that
> certain metadata values are null) so the VOTable standard is where the action
> is.
>
> Regards,
> Tom
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/