UCD question: what UCD word to use for percentiles? stat.percentile proposal
Bo Milvang-Jensen
milvang at astro.ku.dk
Thu Mar 17 15:47:06 CET 2022
Dear Mireille, Sebastien, IVOA Semantics group and colleagues,
Thank you very much for giving my question so much thought. Your
proposed new words are clearly useful for my use case. My comments are:
The proposed new word stat.percentile is clearly a good idea.
The proposed new words stat.percentile.1sigma (and 2sigma and 3sigma)
are also useful (and something I had not thought about myself), as they
provide more information about what percentile is meant. Your scheme of
adding either stat.min or stat.max, as in
stat.percentile.1sigma;stat.min
stat.percentile.1sigma;stat.max
works, but I am not sure it's the most satisfying solution. As far as I
can see, one would never use stat.percentile.1sigma without adding
either stat.min or stat.max, so I would therefore create separate words
for the percentiles below and above the median, e.g.
stat.percentile.lower1sigma
stat.percentile.upper1sigma
And similarly for 2sigma and 3sigma. I am not sure what the best wording
would be. If you want to use more characters, one could insert the word
"median", as in "1sigmabelowmedian". (And instead of lower/upper one
could use below/above.) One could also have another level
(stat.percentile.1sigma.lower), which could be more readable.
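
To make the two naming schemes concrete, here is a minimal Python sketch that builds the UCD strings both ways. The function names are hypothetical, and the stat.percentile.* words are only proposals under discussion, not approved UCD words.

```python
# Sketch only: the stat.percentile.* words are proposals, not approved UCD
# words, and the helper names below are hypothetical.

def ucd_combined(base, n_sigma, side):
    """Originally proposed scheme: generic word plus stat.min or stat.max."""
    if side not in ("lower", "upper"):
        raise ValueError("side must be 'lower' or 'upper'")
    bound = "stat.min" if side == "lower" else "stat.max"
    return f"{base};stat.percentile.{n_sigma}sigma;{bound}"


def ucd_dedicated(base, n_sigma, side):
    """Alternative scheme: one dedicated word per side of the median."""
    if side not in ("lower", "upper"):
        raise ValueError("side must be 'lower' or 'upper'")
    return f"{base};stat.percentile.{side}{n_sigma}sigma"


for side in ("lower", "upper"):
    print(ucd_combined("src.redshift.phot", 1, side))
    print(ucd_dedicated("src.redshift.phot", 1, side))
```

The dedicated words carry the side of the median in a single token, so a consumer does not have to combine two words to interpret the column.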
I want to note that e.g. the 16% percentile is only guaranteed to be
located 1 standard deviation ("sigma") below the median (and mean) for a
normal distribution, whereas for asymmetric distributions that would not
be the case. (Disclaimer: I am not a statistics expert.) It should
therefore be understood that these new UCD words can be applied to the
percentiles that in a normal distribution would correspond to 1, 2 or 3
sigma below/above the median, but which in the concrete case may not
have that property.
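
The caveat can be checked numerically. The sketch below compares a standard normal distribution with an exponential distribution of rate 1; the exponential is only an assumed example of an asymmetric distribution, chosen because its quantiles have a closed form. It measures how far the "1 sigma" percentiles lie from the median, in units of each distribution's own standard deviation.

```python
import math
from statistics import NormalDist

# Normal case: the percentile that sits exactly 1 sigma below the median.
std_normal = NormalDist(mu=0.0, sigma=1.0)
p_lower = std_normal.cdf(-1.0)  # roughly 0.1587, the "16%" percentile
normal_gap = std_normal.inv_cdf(0.5) - std_normal.inv_cdf(p_lower)


def exp_quantile(p):
    """Quantile function of an exponential distribution with rate 1."""
    return -math.log(1.0 - p)


# For rate 1 the standard deviation is 1, so gaps are already in sigma units.
exp_median = exp_quantile(0.5)
exp_gap_lower = exp_median - exp_quantile(p_lower)
exp_gap_upper = exp_quantile(1.0 - p_lower) - exp_median

print(f"normal:      median - lower percentile = {normal_gap:.4f} sigma")
print(f"exponential: median - lower percentile = {exp_gap_lower:.4f} sigma")
print(f"exponential: upper percentile - median = {exp_gap_upper:.4f} sigma")
```

For the normal distribution the gap is 1 sigma by construction; for the exponential the two gaps differ from 1 and from each other, which is the situation described above.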
I think that the 1sigma/2sigma/3sigma naming is fine. If you instead
wanted to have the actual numbers, a problem is the dot in e.g. 2.5%.
Instead of per cent one could use per mille. I have looked up what the
percentiles (in per mille!) are for a normal distribution for
-3,-2,-1,+1,+2,+3 sigma:
-3 sigma:   1.3499
-2 sigma:  22.7501
-1 sigma: 158.6553
+1 sigma: 841.3447
+2 sigma: 977.2499
+3 sigma: 998.6501
So one could create the words
stat.percentile.1permille
stat.percentile.23permille
stat.percentile.159permille
stat.percentile.841permille
stat.percentile.977permille
stat.percentile.999permille
But I am not sure it is more elegant. (And I note that my catalogue (not
created by me) has e.g. the 2.5% percentile and not 2.3% which would be
the logical choice.)
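
For reference, the per mille values can be reproduced from the normal cumulative distribution function, Phi(x) = (1 + erf(x / sqrt(2))) / 2. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and the generated "permille" words are just the candidates listed above, not approved UCD words.

```python
import math


def normal_cdf_permille(n_sigma):
    """Cumulative probability of a normal distribution at n_sigma, in per mille."""
    return 1000.0 * 0.5 * (1.0 + math.erf(n_sigma / math.sqrt(2.0)))


rows = []
for n_sigma in (-3, -2, -1, 1, 2, 3):
    permille = normal_cdf_permille(n_sigma)
    # Candidate word: per mille value rounded to the nearest integer.
    word = f"stat.percentile.{round(permille)}permille"
    rows.append((n_sigma, permille, word))
    print(f"{n_sigma:+d} sigma  {permille:9.4f}  {word}")
```

Rounding to the nearest integer gives the six words 1, 23, 159, 841, 977 and 999 permille. A 2.5% percentile, as in the catalogue, corresponds to about 1.96 sigma below the median in a normal distribution rather than exactly 2 sigma.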
I would like to use the new proposed UCD words (either directly what you
wrote, or a modified version based on what I suggest now) in my
catalogues for publication in ESO's Phase 3. How long would it take
before the new words would be approved? I suppose they need to be
approved before ESO can accept them. I can say that we found a small
problem with one column in the catalogue, so the final version will
probably not be ready for another 1-2 weeks, as the main author is finishing
his PhD thesis these days.
Kind regards, Bo
On 3/17/22 12:39 PM, Mireille LOUYS wrote:
>
> Hi Bo, Hi semantics,
>
> We have re-examined your use case together with S. Derriere and A.
> Preite Martinez and checked also how Vizier handles percentiles.
>
> There is indeed currently no proper way to describe with UCDs that a
> measurement is associated with some percentile
> of a statistical model/distribution.
> Creating a new word could help describe these values:
> Q stat.percentile Percentile in a statistical distribution
> We could also have a few more precise words to address exactly what
> you are trying to describe:
> Q stat.percentile.1sigma Percentile corresponding to one standard deviation from the median
> Q stat.percentile.2sigma Percentile corresponding to two standard deviations from the median
>
> With these words, we could use:
> ucd="src.redshift.phot;stat.percentile.2sigma;stat.min" for EAZY 2.5% percentile of photo-z
> ucd="src.redshift.phot;stat.percentile.1sigma;stat.min" for EAZY 16% percentile of photo-z AND LePhare photo-z lower limit, 68% conf. level
>
> ucd="src.redshift.phot;stat.median" for EAZY 50% percentile of photo-z
>
> ucd="src.redshift.phot;stat.percentile.1sigma;stat.max" for EAZY 84% percentile of photo-z AND LePhare photo-z upper limit, 68% conf. level
>
> ucd="src.redshift.phot;stat.percentile.2sigma;stat.max" for EAZY 97.5% percentile of photo-z
>
> In the UCD vocabulary, maybe an extra word would cover all possible
> cases:
> Q stat.percentile.3sigma Percentile corresponding to three standard deviations from the median
> I hope this helps.
> I have created a VEP-UCD for this term, and will circulate it in the
> UCD Board to discuss it for adoption.
>
> Tell us whether you can use this, and send us your feedback if any.
> Thanks in advance.
>
> Mireille & Sebastien
> CDS, Strasbourg
> ----------------
> ------------------------------------------------------------------------