Support for data containing NaN values
Francois Ochsenbein (ext.52429)
francois at cdsarc.u-strasbg.fr
Mon Sep 26 16:24:40 PDT 2011
Hi Tom,
I think I understand the difference you make between "wrong" and
"unspecified"; however the situation is already clumpsy with the
lack of well-defined "null" value in the case of integer numbers
(yes, the "unspecified" value has to be accurately defined...) .
And since the databases are able to understand only a single "null"
specification, I would really encourage to avoid further distinctions
for the time being: new semantic definitions would need to be defined
in FITS (not only in VOTable!) and can't currently be ported to
relational databases.
By the way I tried in Postgres to apply arithmetic operations
between tables with NaN values or +/-Inf; the operations are
not always working, e.g. 1/0 or 1/Inf generate an error.
For instance with the table
==> select * from t;
n | v
----+-----------
0 | -1
-1 | NaN
9 | Infinity
-1 | Infinity
0 | -Infinity
==> select n, 1/v from t;
ERROR: value out of range: underflow
Cheers, francois
>
>Hi Francois,
>
>Further comments in answer to your questions.
> Tom
>
>Francois Ochsenbein (ext.52429) wrote:
>>
>> Hi Tom,
>>
>> Just 2 remarks:
>>
>> * have you in mind any practical case in which a distinction
>> between NaN (not-a-number IEEE pattern) and NULL (in relational
>> database jargon) is meaningful ?
>
>I tend to think of NaN's as expressing a concept that something is
>'wrong' while Null's represent that something is 'missing'. E.g., in
>a telemetry stream I might use a NaN when the instrument is in some
>invalid state and Null when the instrument is off. Developing this
>example a little: Suppose I want to have a set of photometric
>measurements in a standard time frame from an instrument. In the
>calibrated photometry, Null brightness values indicate the instrument
>is off, NaN's that the data is invalid. If I feel that I can do the
>calibration a bit better I might want to look for the NaN rows to see
>if I can get around the problems that occurred in the original
>calibration. If the data has been passed through a VOTable somewhere
>in the processing, the distinction between the null and NaN rows has
>been lost so I can't easily pick out the 'invalid' values.
>
>Sure it would be possible to use a different representation...but this
>is a reasonable approach and we should be very chary of putting limits
>on what data users can represent or saying how they have to represent
>it in what is intended to be a generic format. I think that lack of
>full support for IEEE floating point numbers is a substantial limit
>for VOTables. The fact that a standard developed in the 1970's has
>the same limit isn't really a good reason that we need to propagate
>the constraint.
>
>
>>
>> * do you know any relational database which accepts to enter
>> NaN (and/or +/-Inf) values ? As far as I know, these IEEE
>> floating-point patterns are not handled (the relational databases
>> I use generate arithmetic errors, and refuse the operation which
>> could lead to NaN's or Inf's)
>>
>
>It is my understanding (and personal experience in the first case)
>that Postgres, Oracle and Ingres support NaNs and Infs. As far as I
>can tell Sybase, MySQL and SQLServer do not. (See, e.g., 8.1.3 in
>http://www.postgresql.org/docs/8.2/static/datatype-numeric.html or
>http://www.techonthenet.com/oracle/functions/nanvl.php)
>
>I'm not clear if databases that do not allow NaNs as internal values
>in databases also preclude them as calculated results, e.g., what the
>result of
> select sqrt(-1.)
>is.
>
>
>> And yes, the VOTable document assumes (as FITS does) that NaN
>> and NULL have the same semantic meaning -- i.e. that the answer
>> is "no" to both questions.
>>
>> Cheers, francois
>>
>>>
>>> I'm not sure that Mark's comments really addresses the Randy's
>>> question. Suppose we have an original dataset, O, in some non-VO
>>> format and a VO serialization of this dataset, V. Both O and V may
>>> contain NaNs. As Mark points out a NaN in V is the recommended
>>> representation for a null value. So in any context in which null
>>> values are distinct from NaNs, VOTables cannot distinguish them, i.e.,
>>> a NaN in the VOTable does not in general mean that there was a NaN in
>>> the original data. So if you wish to preserve NaNs VOTables are not
>>> currently a safe way to do so.
>>>
>>> As alluded to, this particular issue has been discussed before and
>>> some thoughts have been collected at
>>> http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/VOTableIssues.
>>>
>>> Tom
>>>
>>> Mark Taylor wrote:
>>>> Mike,
>>>>
>>>> in fact VOTable does permit NaN values in (single or double) floating
>>>> point data; I think this is a consequence of the design decision
>>>> that the VOTable model for data is to be as close as possible to
>>>> that of FITS. Furthermore, NaN is how VOTable recommends to represent
>>>> NULL values in floating point data (again, following FITS) - whether
>>>> that's a good idea or not is a question that has been debated elsewhere,
>>>> but that's what VOTable section 6 says
>>>> (http://www.ivoa.net/Documents/VOTable/20091130/REC-VOTable-1.2.html#ToC41)
>>>>
>>>> Mark
>>>>
>>>> On Mon, 26 Sep 2011, Mike Fitzpatrick wrote:
>>>>
>>>>> I took the question differently: If VO allows FITS data, and
>>>>> FITS data allows NaN, then apps should of course allow
>>>>> for this. OTOH, if the question is whether "VO data" as is
>>>>> serialized in a VOTable allows NaN values the I think the
>>>>> answer is 'no' (but I'd have to check). There are similar issues
>>>>> with how NULL values are handled, but again it depends on
>>>>> whether it is in the serialized VOTable or the end data product
>>>>> being accessed. The DAL protocols don't say anything about
>>>>> NaN/NULL beyond how they might be serialized, FITS is FITS
>>>>> and if that's what the app retrieves in the end and then that is
>>>>> the standard to follow when interpreting the data. Was that
>>>>> your question?
>>>>>
>>>>> My $0.02,
>>>>> -Mike
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Sep 26, 2011 at 2:06 AM, Mark Taylor<m.b.taylor at bristol.ac.uk>wrote:
>>>>>
>>>>>> On Thu, 22 Sep 2011, Randy Thompson wrote:
>>>>>>
>>>>>>> As a general question, does data containing NaN values
>>>>>>> violate any VO standards or protocols,and if not, should VO
>>>>>>> applications be expected to accept them as input?
>>>>>>
>>>>>> the question is rather broad ("data" can take many forms), but on
>>>>>> the whole the answer is that most standards and software in the VO
>>>>>> should and do behave sensibly in the presence of NaN-valued floating
>>>>>> point values.
>>>>>>
>>>>>> --
>>>>>> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
>>>>>> m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
>>>> m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
>> =======================================================================
>> Francois Ochsenbein ------ Observatoire Astronomique de Strasbourg
>> 11, rue de l'Universite 67000 STRASBOURG Phone: +33-(0)368 85 24 29
>> Email: francois at astro.u-strasbg.fr (France) Fax: +33-(0)368 85 24 17
>> =======================================================================
=======================================================================
Francois Ochsenbein ------ Observatoire Astronomique de Strasbourg
11, rue de l'Universite 67000 STRASBOURG Phone: +33-(0)368 85 24 29
Email: francois at astro.u-strasbg.fr (France) Fax: +33-(0)368 85 24 17
=======================================================================
More information about the dal
mailing list