UCD question: what UCD word to use for percentiles? stat.percentile proposal

Mark Taylor m.b.taylor at bristol.ac.uk
Mon Mar 21 17:13:26 CET 2022


Hi Bo et al.,

I'm not a semantics expert, but my feeling is that trying to go into
more detail than stat.median and stat.percentile would be a mistake.
As noted, designating them by number of standard deviations away
from the mean requires assumptions about the distribution, and
in any case providing any fixed set of numeric values is bound to
disappoint data providers who have different percentiles available,
unless a very large number of them is provided.  If the UCD mechanism
provided some way to associate numeric values with the semantics
it would be nice to do that here, but it doesn't (we've encountered
this before with e.g. HEALPix pixel ID at depth N).

So my suggestion would be to stick with just adding stat.percentile
(or maybe stat.quantile) which is enough information to tell a human 
or computer *roughly* how to treat such a quantity.

Mark

On Thu, 17 Mar 2022, Bo Milvang-Jensen wrote:

> Dear Mireille, Sebastien, IVOA Semantics group and colleagues,
> 
> Thank you very much for giving my question so much thought. Your proposed new
> words are clearly useful for my use case. My comments are:
> 
> The proposed new word stat.percentile is clearly a good idea.
> 
> The proposed new words stat.percentile.1sigma (and 2sigma and 3sigma) are
> also useful (and something I had not thought about myself), as they provide
> more information about what percentile is meant. Your scheme of adding either
> stat.min or stat.max, as in
> stat.percentile.1sigma;stat.min
> stat.percentile.1sigma;stat.max
> works, but I am not sure it's the most satisfying solution. As far as I can
> see, one would never use stat.percentile.1sigma without adding either stat.min
> or stat.max, so I would therefore create separate words for the percentiles
> below and above the median, e.g.
> stat.percentile.lower1sigma
> stat.percentile.upper1sigma
> And similarly for 2sigma and 3sigma. I am not sure what the best wording would
> be. If you want to use more characters, one could insert the word "median", as
> in "1sigmabelowmedian". (And instead of lower/upper one could use
> below/above.) One could also have another level
> (stat.percentile.1sigma.lower), which could be more readable.
> 
> I want to note that e.g. the 16% percentile is only guaranteed to be located 1
> standard deviation ("sigma") below the median (and mean) for a normal
> distribution, whereas for asymmetric distributions that would not be the case.
> (Disclaimer: I am not a statistics expert.) It should therefore be
> understood that these new UCD words can be applied to the percentiles that in
> a normal distribution would correspond to 1,2,3 sigma below/above the median,
> but which in the concrete case may not have that property.
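
This caveat can be checked numerically. Below is a minimal Python sketch using only the standard-library statistics.NormalDist. It assumes a lognormal distribution with mu = 0 and s = 1, chosen purely as a hypothetical skewed example (it is not the photo-z PDF from the catalogue). It takes the percentiles that would be 1 sigma below/above the median for a normal distribution and shows that, for a skewed distribution, they sit at different distances from the median.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sigma 1

def lognormal_quantile(p, mu=0.0, s=1.0):
    """Quantile of a lognormal distribution (hypothetical example): exp(mu + s * z_p)."""
    return math.exp(mu + s * STD_NORMAL.inv_cdf(p))

# Cumulative fractions that a normal distribution puts 1 sigma below/above the median.
p_lo = STD_NORMAL.cdf(-1)  # about 0.1587 (the "16%" percentile)
p_hi = STD_NORMAL.cdf(+1)  # about 0.8413 (the "84%" percentile)

median = lognormal_quantile(0.5)
lower = lognormal_quantile(p_lo)
upper = lognormal_quantile(p_hi)

# For a skewed distribution the two offsets from the median differ.
print(f"median         = {median:.4f}")
print(f"median - lower = {median - lower:.4f}")
print(f"upper - median = {upper - median:.4f}")
```

For this example the lower offset is 1 - e^-1 (about 0.632) and the upper offset is e - 1 (about 1.718), so a single "1sigma" label would describe two quite different spreads.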
> 
> I think that the 1sigma/2sigma/3sigma naming is fine. If you instead wanted to
> have the actual numbers, a problem is the dot in e.g. 2.5%. Instead of per
> cent one could use per mille. I have looked up what the percentiles (in per
> mille!) are for a normal distribution for -3,-2,-1,+1,+2,+3 sigma:
> 1.3499
> 22.7501
> 158.6553
> 841.3447
> 977.2499
> 998.6501
> So one could create the words
> stat.percentile.1permille
> stat.percentile.23permille
> stat.percentile.159permille
> stat.percentile.841permille
> stat.percentile.977permille
> stat.percentile.999permille
> But I am not sure it is more elegant. (And I note that my catalogue (not
> created by me) has e.g. the 2.5% percentile and not 2.3%, which would be the
> logical choice.)
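
The per-mille figures above can be reproduced from the standard normal CDF. Below is a minimal Python sketch using only the standard-library statistics.NormalDist. The stat.percentile.Npermille words it prints follow the hypothetical naming scheme listed above and are not adopted UCDs. The last line goes the other way, showing where the catalogue's 2.5% percentile falls in sigma units.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sigma 1

# Forward direction: sigma offset -> cumulative fraction, expressed in per mille.
permille = {k: 1000 * STD_NORMAL.cdf(k) for k in (-3, -2, -1, 1, 2, 3)}
for k, value in permille.items():
    # Word names follow the hypothetical per-mille scheme, rounded to whole per mille.
    print(f"{k:+d} sigma: {value:9.4f} per mille -> stat.percentile.{round(value)}permille")

# Inverse direction: the 2.5% percentile sits near -1.96 sigma, not exactly -2 sigma.
print(f"2.5% percentile: {STD_NORMAL.inv_cdf(0.025):+.3f} sigma")
```

The rounded values are 1, 23, 159, 841, 977 and 999, matching the word list above, while the 2.5% percentile comes out at about -1.96 sigma rather than -2.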
> 
> I would like to use the new proposed UCD words (either directly what you
> wrote, or a modified version based on what I suggest now) in my catalogues for
> publication in ESO's Phase 3. How long would it take before the new words
> would be approved? I suppose they need to be approved before ESO can accept
> them. I can say that we found a small problem with one column in the
> catalogue, so the final version will probably not be ready before 1-2 weeks,
> as the main author is finishing his PhD thesis these days.
> 
> Kind regards, Bo
> 
> On 3/17/22 12:39 PM, Mireille LOUYS wrote:
> > 
> > Hi Bo, hi Semantics,
> > 
> > We have re-examined your use case together with S. Derriere and A. Preite
> > Martinez and checked also how Vizier handles percentiles.
> > 
> > There is indeed currently no proper way to describe with UCDs that a
> > measurement is associated with some percentile
> > of a statistical model/distribution.
> > Creating a new word could help describe these values:
> > Q stat.percentile    Percentile in a statistical distribution
> > We could also have a few more precise words to address exactly what you are
> > trying to describe:
> > Q stat.percentile.1sigma    Percentile corresponding to one standard
> > deviation from the median
> > Q stat.percentile.2sigma    Percentile corresponding to two standard
> > deviations from the median
> > 
> > With these words, we could use:
> > ucd="src.redshift.phot;stat.percentile.2sigma;stat.min"  for EAZY  2.5%
> > percentile of photo-z
> > ucd="src.redshift.phot;stat.percentile.1sigma;stat.min"  for EAZY  16%
> > percentile of photo-z AND LePhare photo-z lower limit, 68% conf. level
> > 
> > ucd="src.redshift.phot;stat.median"  for EAZY  50% percentile of photo-z
> > 
> > ucd="src.redshift.phot;stat.percentile.1sigma;stat.max"  for EAZY  84%
> > percentile of photo-z AND LePhare photo-z upper limit, 68% conf. level
> > 
> > ucd="src.redshift.phot;stat.percentile.2sigma;stat.max"  for EAZY  97.5%
> > percentile of photo-z
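
To make the proposal concrete, here is a minimal Python sketch that attaches the UCD strings from these examples to VOTable FIELD elements using the standard-library xml.etree.ElementTree. The column names (z_phot_p2.5 and so on) and the datatype are hypothetical, and the stat.percentile.* words are only proposals, not part of the adopted UCD vocabulary. The 2sigma;stat.max word is attached to the 97.5% percentile, the upper counterpart of 2.5%, in the same way that 84% pairs with 16%.

```python
import xml.etree.ElementTree as ET

# UCD strings proposed in this thread (not adopted words), keyed by the
# EAZY photo-z percentile they would label.
PROPOSED_UCDS = {
    2.5: "src.redshift.phot;stat.percentile.2sigma;stat.min",
    16.0: "src.redshift.phot;stat.percentile.1sigma;stat.min",
    50.0: "src.redshift.phot;stat.median",
    84.0: "src.redshift.phot;stat.percentile.1sigma;stat.max",
    97.5: "src.redshift.phot;stat.percentile.2sigma;stat.max",
}

def field_element(percentile):
    """Build a VOTable FIELD for a hypothetical column named z_phot_p<percentile>."""
    name = f"z_phot_p{percentile:g}"  # e.g. z_phot_p2.5, z_phot_p16
    return ET.Element("FIELD", name=name, datatype="double",
                      ucd=PROPOSED_UCDS[percentile])

fields = [field_element(p) for p in sorted(PROPOSED_UCDS)]
for field in fields:
    print(ET.tostring(field, encoding="unicode"))
```

Because each stat.min word has a stat.max partner at 100 minus the same percentile, a reader (human or software) can pair lower and upper bounds without knowing the exact numeric percentile, which is the level of detail the generic words are meant to convey.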
> > 
> > In the UCD vocabulary, maybe an extra word would cover all possible cases:
> > Q stat.percentile.3sigma   Percentile corresponding to three standard
> > deviations from the median
> > I hope this helps.
> > I have created a VEP-UCD for this term, and will circulate it in the UCD
> > Board to discuss it for adoption.
> > 
> > Tell us whether you can use this, and let us have your feedback in any case.
> > Thanks in advance.
> > 
> > Mireille & Sebastien
> > CDS, Strasbourg
> 

--
Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
m.b.taylor at bristol.ac.uk          http://www.star.bristol.ac.uk/~mbt/

