UCD question: what UCD word to use for percentiles? stat.percentile proposal

Bo Milvang-Jensen milvang at astro.ku.dk
Thu Mar 17 15:47:06 CET 2022


Dear Mireille, Sebastien, IVOA Semantics group and colleagues,

Thank you very much for giving my question so much thought. Your 
proposed new words are clearly useful for my use case. My comments are:

The proposed new word stat.percentile is clearly a good idea.

The proposed new words stat.percentile.1sigma (and 2sigma and 3sigma) 
are also useful (and something I had not thought about myself), as they 
provide more information about what percentile is meant. Your scheme of 
adding either stat.min or stat.max, as in
stat.percentile.1sigma;stat.min
stat.percentile.1sigma;stat.max
works, but I am not sure it's the most satisfying solution. As far as I 
can see, one would never use stat.percentile.1sigma without adding 
either stat.min or stat.max, so I would therefore create separate words 
for the percentiles below and above the median, e.g.
stat.percentile.lower1sigma
stat.percentile.upper1sigma
And similarly for 2sigma and 3sigma. I am not sure what the best wording 
would be. If you want to use more characters, one could insert the word 
"median", as in "1sigmabelowmedian". (And instead of lower/upper one 
could use below/above.) One could also have another level 
(stat.percentile.1sigma.lower), which could be more readable.
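To compare the alternatives side by side, here is a minimal Python sketch that writes out the three spellings discussed above for one column. None of these words is approved yet; the helper name ucd_variants and the base word src.redshift.phot (taken from the examples in the quoted message) are only illustrative assumptions.

```python
# Hypothetical helper: all stat.percentile.* words below are proposals only.
BASE = "src.redshift.phot"  # assumed base word, as in the quoted examples


def ucd_variants(n_sigma, side):
    """Return the candidate UCD strings for one percentile column.

    side is "lower" (below the median) or "upper" (above the median).
    """
    if side not in ("lower", "upper"):
        raise ValueError("side must be 'lower' or 'upper'")
    limit = "stat.min" if side == "lower" else "stat.max"
    return {
        # Scheme from the quoted proposal: generic word plus stat.min/stat.max
        "combined": f"{BASE};stat.percentile.{n_sigma}sigma;{limit}",
        # Separate words for below/above the median
        "single_word": f"{BASE};stat.percentile.{side}{n_sigma}sigma",
        # Extra hierarchy level
        "extra_level": f"{BASE};stat.percentile.{n_sigma}sigma.{side}",
    }


variants = ucd_variants(1, "lower")
for name, ucd in variants.items():
    print(f"{name:12s} {ucd}")
```

The "combined" form needs two words per column, while the other two carry the same information in one word.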

I want to note that e.g. the 16% percentile is only guaranteed to be 
located 1 standard deviation ("sigma") below the median (and mean) for a 
normal distribution, whereas for asymmetric distributions that would not 
be the case. (Disclaimer: I am not a statistics expert.) It should 
therefore be understood that these new UCD words can be applied to the 
percentiles that in a normal distribution would correspond to 1,2,3 
sigma below/above the median, but which in the concrete case may not 
have that property.
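A small Python sketch of this point, using only the standard library. The exponential distribution is just my assumed example of an asymmetric distribution (rate 1, so its standard deviation is 1 and its median is ln 2); it is not taken from the catalogue.

```python
import math
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Cumulative probabilities at -1 and +1 sigma (about 0.1587 and 0.8413)
p_lo = std_normal.cdf(-1.0)
p_hi = std_normal.cdf(+1.0)

# Normal distribution: these percentiles sit exactly at median -/+ 1 sigma
normal_lo = std_normal.inv_cdf(p_lo)
normal_hi = std_normal.inv_cdf(p_hi)


def exp_quantile(p):
    """Quantile function of an exponential distribution with rate 1."""
    return -math.log(1.0 - p)


exp_median = exp_quantile(0.5)  # ln 2
exp_sigma = 1.0                 # standard deviation for rate 1
exp_lo = exp_quantile(p_lo)
exp_hi = exp_quantile(p_hi)

print("normal:      ", normal_lo, normal_hi)
print("exponential: ", exp_lo, exp_hi)
print("median-/+sig:", exp_median - exp_sigma, exp_median + exp_sigma)
```

For the exponential case the same two percentiles do not coincide with median minus/plus one standard deviation, which is why the "1sigma" label should be read as naming the percentile and not the actual distance from the median.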

I think that the 1sigma/2sigma/3sigma naming is fine. If you instead 
wanted to have the actual numbers, a problem is the dot in e.g. 2.5%. 
Instead of per cent one could use per mille. I have looked up what the 
percentiles (in per mille!) are for a normal distribution for 
-3,-2,-1,+1,+2,+3 sigma:
-3 sigma:   1.3499
-2 sigma:  22.75013
-1 sigma: 158.655255
+1 sigma: 841.344745
+2 sigma: 977.24987
+3 sigma: 998.6501
So one could create the words
stat.percentile.1permille
stat.percentile.23permille
stat.percentile.159permille
stat.percentile.841permille
stat.percentile.977permille
stat.percentile.999permille
But I am not sure it is more elegant. (And I note that my catalogue (not 
created by me) has e.g. the 2.5% percentile and not 2.3% which would be 
the logical choice.)
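For reference, the per mille values and the rounded word names above can be reproduced with a few lines of Python. This is only a sketch using the standard library; the permille words are of course hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Cumulative probability at n sigma, expressed in per mille
permille = {n: 1000.0 * std_normal.cdf(n) for n in (-3, -2, -1, 1, 2, 3)}

# Hypothetical UCD words, rounding each value to the nearest integer
words = {n: f"stat.percentile.{round(v)}permille" for n, v in permille.items()}

for n in permille:
    print(f"{n:+d} sigma  {permille[n]:10.4f}  {words[n]}")
```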

I would like to use the new proposed UCD words (either directly what you 
wrote, or a modified version based on what I suggest now) in my 
catalogues for publication in ESO's Phase 3. How long would it take 
before the new words would be approved? I suppose they need to be 
approved before ESO can accept them. I can say that we found a small 
problem with one column in the catalogue, so the final version will 
probably not be ready for another 1-2 weeks, as the main author is finishing 
his PhD thesis these days.

Kind regards, Bo

On 3/17/22 12:39 PM, Mireille LOUYS wrote:
>
> Hi Bo , Hi semantics,
>
> We have re-examined your use case together with S. Derriere and A. 
> Preite Martinez and checked also how Vizier handles percentiles.
>
> There is indeed currently no proper way to describe with UCDs that a 
> measurement is associated to some percentile
> of a statistical model/distribution.
> Creating a new word could help describe these values :
> Q stat.percentile    Percentile in a statistical distribution
> We could also have a few more precise words to address exactly what 
> you are trying to describe :
> Q stat.percentile.1sigma    Percentile corresponding to one standard deviation from the median
> Q stat.percentile.2sigma    Percentile corresponding to two standard deviations from the median
>
> With these words, we could use :
> ucd="src.redshift.phot;stat.percentile.2sigma;stat.min"  for EAZY  2.5% percentile of photo-z
> ucd="src.redshift.phot;stat.percentile.1sigma;stat.min"  for EAZY  16% percentile of photo-z AND LePhare photo-z lower limit, 68% conf. level
>
> ucd="src.redshift.phot;stat.median"  for EAZY  50% percentile of photo-z
>
> ucd="src.redshift.phot;stat.percentile.1sigma;stat.max"  for EAZY  84% percentile of photo-z AND LePhare photo-z upper limit, 68% conf. level
>
> ucd="src.redshift.phot;stat.percentile.2sigma;stat.max"  for EAZY  97.5% percentile of photo-z
>
> In the UCD vocabulary, maybe an extra word would cover all possible 
> cases :
> Q stat.percentile.3sigma   Percentile corresponding to three standard deviations from the median
> I hope this helps .
> I have created a VEP-UCD for this term , and will circulate it in the 
> UCD Board to discuss it for adoption .
>
> Tell us whether you can use this, and send us your feedback if any.
> Thanks in advance .
>
> Mireille & Sebastien
> CDS, Strasbourg
> ----------------
> ------------------------------------------------------------------------