arith.diff, arith.ratio

Thu Sep 2 12:13:02 CEST 2021

On Thu, 2 Sep 2021, Markus Demleitner wrote:

> On Wed, Sep 01, 2021 at 07:05:18PM +0200, Mireille LOUYS wrote:
> > we did discuss this point some years ago.
> > we reached the agreement that when a mathematical operator is applied to
> > physical quantities,
> > what is important is still to keep the emphasis on the physical quantity and
> > not on the operator applied to it.

But that thinking doesn't seem to apply to the statistical operators;
e.g. stat.mean is S, but stat.stdev is P.  Is there a consistent
rationale for these choices?

> Hm... what you're saying is that in data discovery, if I'm looking
> for columns matching pos.eq.ra;%, I want RA-like things, and
> differences with other RAs count?  Yes, that's a fair point.  And
> conversely, I really don't see a plausible case for people wanting to
> discover "columns that are differences".
> 
> On the other hand, if I'm a client trying to figure out what to do
> with a table, and I'm looking for columns with UCDs matching
> pos\.eq\.ra.* in an attempt to find columns suitable for use in
> positions -- yes, I admit we should have better ways to do that, but
> so far we don't --, using a pos.eq.ra;arith.diff column quite
> certainly is wrong (unless it's a difference to a constant RA, of
> course).  For arith.ratio, inspecting the unit would probably prevent
> bad mistakes, but for arith.diff, even the unit would look right.

There are places in TOPCAT where I do exactly this, looking for
columns whose UCDs start with pos.eq.ra[;*], so a column labelled
pos.eq.ra;arith.diff would confuse it (though the worst that
will happen is the default value in a plot configuration selection
box will be inappropriate, and if it finds a pos.eq.ra or
pos.eq.ra;meta.main it will prioritise them instead, so this
isn't very likely to cause practical problems).
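For concreteness, here is a rough Python sketch of that kind of lookup.
It is not TOPCAT's actual code. The function name pick_ra_column and the
(name, ucd) column representation are made up for illustration, and the
case-insensitive comparison is an assumption of the sketch. It prefers
pos.eq.ra;meta.main, then bare pos.eq.ra, then anything else whose first
atom is pos.eq.ra, which is exactly where a pos.eq.ra;arith.diff column
slips in when nothing better is present.

```python
import re

# Hypothetical illustration only, not TOPCAT's implementation.
# Matches "pos.eq.ra" optionally followed by ";" and further atoms.
RA_PATTERN = re.compile(r"^pos\.eq\.ra(;.*)?$", re.IGNORECASE)


def pick_ra_column(columns):
    """Return the name of the preferred RA-like column, or None.

    columns: list of (name, ucd) pairs; ucd may be None.
    Preference: pos.eq.ra;meta.main, then bare pos.eq.ra,
    then any other UCD whose first atom is pos.eq.ra.
    """
    def rank(ucd):
        u = ucd.lower()
        if u == "pos.eq.ra;meta.main":
            return 0
        if u == "pos.eq.ra":
            return 1
        return 2  # e.g. pos.eq.ra;arith.diff still qualifies here

    candidates = [(rank(ucd), i, name)
                  for i, (name, ucd) in enumerate(columns)
                  if ucd and RA_PATTERN.match(ucd)]
    if not candidates:
        return None
    # min over (rank, column position) keeps the earliest column on ties
    return min(candidates)[2]


print(pick_ra_column([("dra", "pos.eq.ra;arith.diff"),
                      ("vmag", "phot.mag")]))
```

With only a pos.eq.ra;arith.diff column and a photometry column
available, the sketch picks the difference column as the default RA.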

I always kind of assumed that the flag assignments for e.g.
stat.stdev (P) and stat.mean (S) were like that to cater for
exactly this kind of reasoning: if you look at the first atom
and ignore the rest, you ought to be able to draw some basic
conclusions about how to use the quantity.  But the S flags on
arith.diff and arith.ratio mean that doesn't work.
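To spell out the first-atom reasoning, here is a small Python sketch
that checks the P/S/Q placement rules and reports what a client reading
only the first atom would conclude. The flag table is assumed and
deliberately tiny: it covers only words mentioned in this thread, E, C
and V flags aren't modelled, and the real values should be taken from
the current UCD word list. The helper names are hypothetical.

```python
# Flag values assumed for illustration; consult the official UCD list.
# P = primary only, S = secondary only, Q = either.
FLAGS = {
    "pos.eq.ra": "Q",
    "phot.mag": "Q",
    "meta.main": "S",
    "stat.error": "P",
    "stat.stdev": "P",
    "stat.mean": "S",
    "arith.diff": "S",
    "arith.ratio": "S",
}


def atoms(ucd):
    """Split a UCD into lower-cased, non-empty atoms."""
    return [a.strip().lower() for a in ucd.split(";") if a.strip()]


def is_valid(ucd):
    """Check the P/S/Q placement rules for a UCD built from known words."""
    words = atoms(ucd)
    if not words:
        return False
    for pos, word in enumerate(words):
        flag = FLAGS.get(word)
        if flag is None:
            return False  # unknown word (or an E/C/V word, not modelled)
        if pos == 0 and flag == "S":
            return False  # secondary-only word in first place
        if pos > 0 and flag == "P":
            return False  # primary-only word after the first place
    return True


def naive_kind(ucd):
    """What a client reading only the first atom would conclude."""
    return atoms(ucd)[0]


for ucd in ("stat.error;phot.mag", "pos.eq.ra;arith.diff"):
    print(ucd, is_valid(ucd), naive_kind(ucd))
```

Both stat.error;phot.mag and pos.eq.ra;arith.diff pass the placement
rules, but only in the first case does the first atom say what the
column actually is; arith.diff;pos.eq.ra, which would put the operator
first, is rejected because arith.diff is S.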

The UCD 1.10 standard (https://www.ivoa.net/documents/latest/UCD.html)
by my reading backs this up.  Section 3.3 says things like:

   The choice of the primary word (when a complex element is to be
   described) should be guided by the answer to the question: "in one
   word, what is this element?"

but "pos.eq.ra;arith.ratio" is not a Right Ascension.  Similarly:

   The content of the third column is an uncertainty, a measurement
   error. It can be expressed in magnitudes, but it is not a magnitude,
   so it is not correct to use phot.mag as primary word. One should
   use instead stat.error as the primary word, because the definition
   of this word corresponds precisely to the content of the column. The
   complete UCD could be written stat.error;phot.mag, to indicate that
   this error applies to a magnitude.

It seems to me that this reasoning about stat.error (P) ought to apply
equally to arith.diff (S) and arith.ratio (S).

As long as the first-atom-should-make-some-sense-on-its-own rule
can't be relied on, I'm even less likely to try to do any machine
reasoning about the UCDs I encounter, since it becomes a much
harder job.

Mark

--
Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
m.b.taylor at bristol.ac.uk          http://www.star.bristol.ac.uk/~mbt/