<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class="">Hi all,</div><div class=""><br class=""></div><div class="">For the record, this issue has been discussed extensively since last January on the collaborative platform that was set up for that workshop to exercise the different proposals (<a href="https://github.com/ivoa/dm-usecases" class="">https://github.com/ivoa/dm-usecases</a>).</div><div class="">I don’t think it would be fruitful to replay this discussion on the DM list; with a bit of luck and a good search engine, you can find many pros and cons of Markus's proposal in the issues (<a href="https://github.com/ivoa/dm-usecases/issues" class="">https://github.com/ivoa/dm-usecases/issues</a>).</div><div class=""><br class=""></div><div class="">LM </div><div class=""><br class=""></div>
<div><br class=""><blockquote type="cite" class=""><div class="">On 18 May 2021, at 11:44, Markus Demleitner &lt;<a href="mailto:msdemlei@ari.uni-heidelberg.de" class="">msdemlei@ari.uni-heidelberg.de</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">Dear DM,<br class=""><br class="">At yesterday's DM interop preparation workshop, I was asked to bring<br class="">forward a model for Measurement that I'd consider fine for my programme<br class="">of "cover a governing use case".<br class=""><br class="">This use case for Measurement from my perspective is "plot error bars"<br class="">(which I think is easily sold to the client writers, which, full<br class="">disclosure, is what I'm convinced should guide us), with a<br class="">perspective to "automatic error propagation" in the future.<br class=""><br class="">I think the current Measurements proposal will essentially work for that<br class="">when we drop a few of the boxes -- and then drop anything that is not<br class="">used by a client at the time of RFC.<br class=""><br class="">What I'd like to see unconditionally dropped are the Time, Position,<br class="">Velocity, ProperMotion, and Polarization classes; they entangle the DM<br class="">with other DMs without giving a benefit I can perceive; for the rough<br class="">classification of quantities we have UCDs, and frames, photometric<br class="">metadata, and similar data can be attached directly to the columns.<br class=""><br class="">For the rest, I strongly suspect you won't see implementations for the<br class="">3D errors, so I'd not be surprised if those dropped out at the<br class="">RFC implementation test.<br class=""><br class="">The 2D errors I suspect may be convenient shortcuts. 
But really, in the<br class="">end we'll need a proper model for correlated errors, perhaps as<br class="">envisioned by<br class=""><a href="https://github.com/msdemlei/astropy#working-with-covariance" class="">https://github.com/msdemlei/astropy#working-with-covariance</a>, but I'd<br class="">strongly advise to postpone that to later versions -- it'll scare<br class="">adopters unnecessarily, and I think it's really only useful once we<br class="">want to do automatic error propagation (which is Sci-Fi at this point<br class="">for all I can see).<br class=""><br class="">That's basically it (and I've said as much on the two RFC pages).<br class=""><br class=""><br class=""><br class="">If, on the other hand, you ask me how I'd build the measurement/error<br class="">thing if I got to design it from scratch... Well, in some ad-hoc<br class="">notation what we ought to have is at first (where "column" could of<br class="">course be a param as well and perhaps a literal):<br class=""><br class="">Measurement:<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>location: the column containing the value<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>label: some human-readable designation how this annotation is to<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span> be understood<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>error_type: "stat" by default, or "sys", perhaps later other values;<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">        </span>note that a single column can have both stat and sys annotations<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>naive_error: a column containing a naive, symmetrical error<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>naive_lower: a column 
containing a naive lower bound<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>naive_upper: a column containing a naive upper bound<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>naive_plus: a column containing a naive upper error<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>naive_minus: a column containing a naive lower error<br class=""><br class="">"Naive" here means that we don't actually say what this is (as in "one<br class="">sigma" or so); that's not known or specified in many sorts of data, and<br class="">while humans will eventually have to figure it out if they want to<br class="">interpret the error bars, it's not important for the first governing use<br class="">case. Everything except location is optional, and data providers would<br class="">be encouraged to only give one of naive_error, (naive_upper/_lower), and<br class="">(naive_plus/_minus) in one annotation.<br class=""><br class="">If we find a client that wants to plot error ellipses, we'd add<br class=""><br class="">Measurement2D:<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>location1: columns containing the position<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>location2:<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>semiMajor:<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>semiMinor:<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>posAngle:<br class=""><br class="">as in current Measurement's ellipse (or whatever the client writer<br class="">says).<br class=""><br class="">That would be it for the first round.<br class=""><br class=""><br class="">Once we've figured out how to talk to the client writers, I expect<br class="">they'll want to learn about correlated errors. 
For that, there'd be a<br class="">class<br class=""><br class="">Correlation:<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>error1: column that contains the first error<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>error2: column that contains the second error<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>correlation_coeff: the entry in the covariance matrix<br class=""><br class="">(and possibly other representations of correlations as requested by the<br class="">client writers).<br class=""><br class=""><br class="">And then, when we want to actually enable error calculus, I expect we<br class="">need to represent actual distributions. I'm just mentioning this here<br class="">to show one way in which that could be done. I'm pretty sure we'll want<br class="">something else in the end, but that would need to be worked out between<br class="">consumers (client writers) and producers (data providers) strictly based<br class="">on actual use cases.<br class=""><br class="">Having said that, we could extend Measurement (meaning: even with<br class="">distributions, data providers should still provide some naive error<br class="">measure) by saying:<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>dist_func: (from a vocabulary)<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>dist_pars: array of DistPar<br class=""><br class="">and <br class=""><br class="">DistPar:<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>name: (literal, depending on dist_func)<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>value: something<br class=""><br class="">For instance, a Gaussian-distributed column z could have<br class=""><br class="">(Measurement) {<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>location: z<br 
class=""><span class="Apple-tab-span" style="white-space:pre">        </span>naive_error: z_err<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>dist_func: "normal"<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>dist_pars: [<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">        </span>{name: "mu", value: z}<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">        </span>{name: "sigma", value: z_err}<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>]<br class="">}<br class=""><br class="">I think defining all the various distributions as separate classes<br class="">wouldn't help the client writers enough to make it worthwhile. Just<br class="">having a master list (vocabulary?) of what dist_funcs have what<br class="">dist_pars ought to do the trick -- if a client doesn't know a specific<br class="">dist_func, it's hosed whatever we do.<br class=""><br class="">One important special case would be non-parametric distributions,<br class="">perhaps like this:<br class=""><br class="">(Measurement) {<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>location: z<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>naive_error: 0.5<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>dist_func: "deviation_histogram"<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>dist_pars: [<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">        </span>{name: "sampling_points", value: [-1, -0.5, 0, 0.5, 1]}<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" 
style="white-space:pre">        </span>{name: "sampling_values", value: [0.01, 0.2, 0.68, 0.1, 0.01]}<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>]<br class="">}<br class=""><br class="">-- but as I said, that's just Sci-Fi I'm inventing here to show that we<br class="">*can* extend this to support actual error calculus once we've worked out<br class="">the basic cases.<br class=""><br class=""> -- Markus<br class=""></div></div></blockquote></div><br class=""></body></html>