Support for data containing NaN values

Francois Ochsenbein (ext.52429) francois at cdsarc.u-strasbg.fr
Mon Sep 26 16:24:40 PDT 2011


Hi Tom,

I think I understand the difference you make between "wrong" and 
"unspecified"; however the situation is already clumpsy with the
lack of well-defined "null" value in the case of integer numbers
(yes, the "unspecified" value has to be accurately defined...) .
And since the databases are able to understand only a single "null"
specification, I would really encourage to avoid further distinctions
for the time being: new semantic definitions would need to be defined
in FITS (not only in VOTable!) and can't currently be ported to 
relational databases.

By the way I tried in Postgres to apply arithmetic operations
between tables with NaN values or +/-Inf; the operations are
not always working, e.g. 1/0 or 1/Inf generate an error.
For instance with the table
==> select * from t;
 n  |     v     
----+-----------
  0 |        -1
 -1 |       NaN
  9 |  Infinity
 -1 |  Infinity
  0 | -Infinity
==> select n, 1/v from t;
ERROR:  value out of range: underflow

Cheers, francois
>
>Hi Francois,
>
>Further comments in answer to your questions.
>	Tom
>
>Francois Ochsenbein (ext.52429) wrote:
>>
>> Hi Tom,
>>
>> Just 2 remarks:
>>
>> * have you in mind any practical case in which a distinction
>>    between NaN (not-a-number IEEE pattern) and NULL (in relational
>>    database jargon) is meaningful ?
>
>I tend to think of NaN's as expressing a concept that something is
>'wrong' while Null's represent that something is 'missing'.  E.g., in
>a telemetry stream I might use a NaN when the instrument is in some
>invalid state and Null when the instrument is off.  Developing this
>example a little: Suppose I want to have a set of photometric
>measurements in a standard time frame from an instrument.  In the
>calibrated photometry, Null brightness values indicate the instrument
>is off, NaN's that the data is invalid.  If I feel that I can do the
>calibration a bit better I might want to look for the NaN rows to see
>if I can get around the problems that occurred in the original
>calibration.  If the data has been passed through a VOTable somewhere
>in the processing, the distinction between the null and NaN rows has
>been lost so I can't easily pick out the 'invalid' values.
>
>Sure it would be possible to use a different representation...but this
>is a reasonable approach and we should be very chary of putting limits
>on what data users can represent or saying how they have to represent
>it in what is intended to be a generic format.  I think that lack of
>full support for IEEE floating point numbers is a substantial limit
>for VOTables.  The fact that a standard developed in the 1970's has
>the same limit isn't really a good reason that we need to propagate
>the constraint.
>
>
>>
>> * do you know any relational database which accepts to enter
>>    NaN (and/or +/-Inf) values ? As far as I know, these IEEE
>>    floating-point patterns are not handled (the  relational databases
>>    I use generate arithmetic errors, and refuse the operation which
>>    could lead to NaN's or Inf's)
>>
>
>It is my understanding (and personal experience in the first case)
>that Postgres, Oracle and Ingres support NaNs and Infs.  As far as I
>can tell Sybase, MySQL and SQLServer do not.  (See, e.g., 8.1.3 in
>http://www.postgresql.org/docs/8.2/static/datatype-numeric.html or
>http://www.techonthenet.com/oracle/functions/nanvl.php)
>
>I'm not clear if databases that do not allow NaNs as internal values
>in databases also preclude them as calculated results, e.g., what the
>result of
>    select sqrt(-1.)
>is.
>
>
>> And yes, the VOTable document assumes (as FITS does) that NaN
>> and NULL have the same semantic meaning -- i.e. that the answer
>> is "no" to both questions.
>>
>> Cheers, francois
>>
>>>
>>> I'm not sure that Mark's comments really addresses the Randy's
>>> question.  Suppose we have an original dataset, O, in some non-VO
>>> format and a VO serialization of this dataset, V.  Both O and V may
>>> contain NaNs.   As Mark points out a NaN in V is the recommended
>>> representation for a null value. So in any context in which null
>>> values are distinct from NaNs, VOTables cannot distinguish them, i.e.,
>>> a NaN in the VOTable does not in general mean that there was a NaN in
>>> the original data.  So if you wish to preserve NaNs VOTables are not
>>> currently a safe way to do so.
>>>
>>> As alluded to, this particular issue has been discussed before and
>>> some thoughts have been collected at
>>> http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/VOTableIssues.
>>>
>>> 	Tom
>>>
>>> Mark Taylor wrote:
>>>> Mike,
>>>>
>>>> in fact VOTable does permit NaN values in (single or double) floating
>>>> point data; I think this is a consequence of the design decision
>>>> that the VOTable model for data is to be as close as possible to
>>>> that of FITS.  Furthermore, NaN is how VOTable recommends to represent
>>>> NULL values in floating point data (again, following FITS) - whether
>>>> that's a good idea or not is a question that has been debated elsewhere,
>>>> but that's what VOTable section 6 says
>>>> (http://www.ivoa.net/Documents/VOTable/20091130/REC-VOTable-1.2.html#ToC41)
>>>>
>>>> Mark
>>>>
>>>> On Mon, 26 Sep 2011, Mike Fitzpatrick wrote:
>>>>
>>>>> I took the question differently:  If VO allows FITS  data, and
>>>>> FITS data allows NaN, then apps should of course allow
>>>>> for this.  OTOH, if the question is whether "VO data" as is
>>>>> serialized in a VOTable allows NaN values the I think the
>>>>> answer is 'no' (but I'd have to check).  There are similar issues
>>>>> with how NULL values are handled, but again it depends on
>>>>> whether it is in the serialized VOTable or the end data product
>>>>> being accessed.  The DAL protocols don't say anything about
>>>>> NaN/NULL beyond how they might be serialized, FITS is FITS
>>>>> and if that's what the app retrieves in the end and then that is
>>>>> the standard to follow when interpreting the data.  Was that
>>>>> your question?
>>>>>
>>>>> My $0.02,
>>>>> -Mike
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Sep 26, 2011 at 2:06 AM, Mark Taylor<m.b.taylor at bristol.ac.uk>wrote:
>>>>>
>>>>>> On Thu, 22 Sep 2011, Randy Thompson wrote:
>>>>>>
>>>>>>> As a general question, does data containing NaN values
>>>>>>> violate any VO standards or protocols,and if not, should VO
>>>>>>> applications be expected to accept them as input?
>>>>>>
>>>>>> the question is rather broad ("data" can take many forms), but on
>>>>>> the whole the answer is that most standards and software in the VO
>>>>>> should and do behave sensibly in the presence of NaN-valued floating
>>>>>> point values.
>>>>>>
>>>>>> --
>>>>>> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>>>>>> m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>>>> m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
>> =======================================================================
>> Francois Ochsenbein    ------   Observatoire Astronomique de Strasbourg
>>     11, rue de l'Universite 67000 STRASBOURG  Phone: +33-(0)368 85 24 29
>> Email: francois at astro.u-strasbg.fr (France)    Fax: +33-(0)368 85 24 17
>> =======================================================================
=======================================================================
Francois Ochsenbein    ------   Observatoire Astronomique de Strasbourg
   11, rue de l'Universite 67000 STRASBOURG  Phone: +33-(0)368 85 24 29
Email: francois at astro.u-strasbg.fr (France)    Fax: +33-(0)368 85 24 17
=======================================================================


More information about the dal mailing list