Post-workshop Measurement musings
Laurent Michel
laurent.michel at astro.unistra.fr
Tue May 18 20:48:00 CEST 2021
Hi all,
For the record, this issue has been extensively discussed since last January on the collaborative platform that was set up for that workshop to exercise different proposals (https://github.com/ivoa/dm-usecases).
I think it won’t be fruitful to replay this discussion on the DM list. With a bit of luck and a good search engine, you can find many pros and cons of Markus's proposal in the issues (https://github.com/ivoa/dm-usecases/issues).
LM
> On 18 May 2021, at 11:44, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
>
> Dear DM,
>
> At yesterday's DM interop preparation workshop, I was asked to bring
> forward a model for Measurement that I'd consider fine for my programme
> of "cover a governing use case".
>
> This use case for Measurement from my perspective is "plot error bars"
> (which I think is easily sold to the client writers, which, full
> disclosure, is what I'm convinced should guide us), with a
> perspective to "automatic error propagation" in the future.
>
> I think the current Measurements proposal will essentially work for that
> when we drop a few of the boxes -- and then drop anything that is not
> used by a client at the time of RFC.
>
> What I'd like to see unconditionally dropped are the Time, Position,
> Velocity, ProperMotion, and Polarization classes; they entangle the DM
> with other DMs without giving a benefit I can perceive; for the rough
> classification of quantities we have UCDs, and frames, photometric
> metadata, and similar data can be attached directly to the columns.
>
> For the rest, I strongly suspect you won't see implementations for the
> 3D errors, so I'd not be surprised if those dropped out at the
> RFC implementation test.
>
> The 2D errors I suspect may be convenient shortcuts. But really, in the
> end we'll need a proper model for correlated errors, perhaps as
> envisioned by
> https://github.com/msdemlei/astropy#working-with-covariance, but I'd
> strongly advise to postpone that to later versions -- it'll scare
> adopters unnecessarily, and I think it's really only useful once we
> want to do automatic error propagation (which is Sci-Fi at this point
> for all I can see).
>
> That's basically it (and I've said as much on the two RFC pages).
>
>
>
> If, on the other hand, you ask me how I'd build the measurement/error
> thing if I got to design it from scratch... Well, in some ad-hoc
> notation what we ought to have is at first (where "column" could of
> course be a param as well and perhaps a literal):
>
> Measurement:
>   location: the column containing the value
>   label: some human-readable designation how this annotation is to
>     be understood
>   error_type: "stat" by default, or "sys", perhaps later other values;
>     note that a single column can have both stat and sys annotations
>   naive_error: a column containing a naive, symmetrical error
>   naive_lower: a column containing a naive lower bound
>   naive_upper: a column containing a naive upper bound
>   naive_plus: a column containing a naive upper error
>   naive_minus: a column containing a naive lower error
>
> "Naive" here means that we don't actually say what this is (as in "one
> sigma" or so); that's not known or specified in many sorts of data, and
> while humans will eventually have to figure it out if they want to
> interpret the error bars, it's not important for the first governing use
> case. Everything except location is optional, and data providers would
> be encouraged to only give one of naive_error, (naive_upper/_lower), and
> (naive_plus/_minus) in one annotation.
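
For the "plot error bars" use case, a client-side reading of such an annotation could look roughly like the Python sketch below. It is only an illustration, not part of the proposal: the `error_bar` helper, the dict-per-row table representation, the order in which the three error forms are tried, and the reading of naive_plus/naive_minus as non-negative offsets from the value are all assumptions made here.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass
class Measurement:
    """One annotation; every field except location holds an optional column name."""
    location: str
    label: Optional[str] = None
    error_type: str = "stat"
    naive_error: Optional[str] = None
    naive_lower: Optional[str] = None
    naive_upper: Optional[str] = None
    naive_plus: Optional[str] = None
    naive_minus: Optional[str] = None


def error_bar(row: Mapping[str, float], m: Measurement) -> Optional[Tuple[float, float]]:
    """Return the (lower, upper) ends of the error bar for one row, or None."""
    value = row[m.location]
    if m.naive_error is not None:
        # Symmetrical error around the value.
        err = row[m.naive_error]
        return (value - err, value + err)
    if m.naive_lower is not None and m.naive_upper is not None:
        # The bounds are given directly.
        return (row[m.naive_lower], row[m.naive_upper])
    if m.naive_plus is not None and m.naive_minus is not None:
        # Assumed here: both are non-negative offsets from the value.
        return (value - row[m.naive_minus], value + row[m.naive_plus])
    return None  # no error information in this annotation


row = {"z": 1.0, "z_err": 0.25}
print(error_bar(row, Measurement(location="z", naive_error="z_err")))
```

Since what the numbers mean is deliberately left open, the client only needs the column references to draw the bar.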
>
> If we find a client that wants to plot error ellipses, we'd add
>
> Measurement2D:
>   location1: columns containing the position
>   location2:
>   semiMajor:
>   semiMinor:
>   posAngle:
>
> as in current Measurement's ellipse (or whatever the client writer
> says).
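
As a rough idea of what a plotting client would do with these five values, here is a small Python sketch that turns them into outline points. The `ellipse_outline` function is hypothetical, and so is the convention it uses: posAngle in degrees, measured counter-clockwise from the location1 axis, in a flat (non-spherical) plane. The actual convention would have to come from the model or the client writer.

```python
import math
from typing import List, Tuple


def ellipse_outline(x0: float, y0: float, semi_major: float, semi_minor: float,
                    pos_angle_deg: float, n: int = 64) -> List[Tuple[float, float]]:
    """Return n points on the error ellipse centred on (x0, y0)."""
    pa = math.radians(pos_angle_deg)
    cos_pa, sin_pa = math.cos(pa), math.sin(pa)
    points = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        # Point on the axis-aligned ellipse ...
        u = semi_major * math.cos(t)
        v = semi_minor * math.sin(t)
        # ... rotated by the position angle and shifted to the centre.
        points.append((x0 + u * cos_pa - v * sin_pa,
                       y0 + u * sin_pa + v * cos_pa))
    return points
```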
>
> That would be it for the first round.
>
>
> Once we've figured out how to talk to the client writers, I expect
> they'll want to learn about correlated errors. For that, there'd be a
> class
>
> Correlation:
>   error1: column that contains the first error
>   error2: column that contains the second error
>   correlation_coeff: the entry in the covariance matrix
>
> (and possibly other representations of correlations as requested by the
> client writers).
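
To illustrate how a client might consume such a Correlation annotation, the Python sketch below builds a 2x2 covariance matrix from two errors and a coefficient. It assumes correlation_coeff is a dimensionless correlation coefficient rho in [-1, 1], so that the off-diagonal covariance entry is rho * sigma1 * sigma2. If the field were instead meant to hold the covariance entry itself, the multiplication would be dropped. Both function names are made up for this example.

```python
from typing import List


def covariance_2x2(sigma1: float, sigma2: float, rho: float) -> List[List[float]]:
    """Covariance matrix of two quantities with errors sigma1, sigma2 and correlation rho."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("correlation coefficient must lie in [-1, 1]")
    off_diagonal = rho * sigma1 * sigma2
    return [[sigma1 ** 2, off_diagonal],
            [off_diagonal, sigma2 ** 2]]


def variance_of_sum(sigma1: float, sigma2: float, rho: float) -> float:
    """Variance of a + b: the simplest case where the correlation matters."""
    cov = covariance_2x2(sigma1, sigma2, rho)
    return cov[0][0] + cov[1][1] + 2.0 * cov[0][1]


print(covariance_2x2(2.0, 3.0, 0.5))
```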
>
>
> And then, when we want to actually enable error calculus, I expect we
> need to represent actual distributions. I'm just mentioning this here
> to show one way in which that could be done. I'm pretty sure we'll want
> something else in the end, but that would need to be worked out between
> consumers (client writers) and producers (data providers) strictly based
> on actual use cases.
>
> Having said that, we could extend Measurement (meaning: even with
> distributions, data providers should still provide some naive error
> measure) by saying:
>
> dist_func: (from a vocabulary)
> dist_pars: array of DistPar
>
> and
>
> DistPar:
>   name: (literal, depending on dist_func)
>   value: something
>
> For instance, a Gaussian-distributed column z could have
>
> (Measurement) {
>   location: z
>   naive_error: z_err
>   dist_func: "normal"
>   dist_pars: [
>     {name: "mu", value: z}
>     {name: "sigma", value: z_err}
>   ]
> }
>
> I think defining all the various distributions as separate classes
> wouldn't help the client writers enough to make it worthwhile.  Just
> having a master list (vocabulary?) of what dist_funcs have what
> dist_pars ought to do the trick -- if a client doesn't know a specific
> dist_func, it's hosed whatever we do.
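
A "master list" of this kind could be as simple as a lookup table on the client side. The Python sketch below is one hedged way to picture it: the `DIST_VOCABULARY` table and the `missing_dist_pars` function are invented for illustration, and the only entries are the two dist_func names and parameter names that appear in this mail. No such vocabulary exists yet.

```python
from typing import Dict, FrozenSet, Iterable, Mapping, Optional

# Hypothetical master list: dist_func name -> names of the dist_pars it needs.
DIST_VOCABULARY: Dict[str, FrozenSet[str]] = {
    "normal": frozenset({"mu", "sigma"}),
    "deviation_histogram": frozenset({"sampling_points", "sampling_values"}),
}


def missing_dist_pars(dist_func: str,
                      dist_pars: Iterable[Mapping[str, object]]) -> Optional[FrozenSet[str]]:
    """Return the required parameter names that are absent from dist_pars.

    Returns None when the client does not know dist_func at all, in which
    case it can only fall back to the naive error of the annotation.
    """
    required = DIST_VOCABULARY.get(dist_func)
    if required is None:
        return None
    given = {par["name"] for par in dist_pars}
    return required - given
```

An empty result means the annotation is complete for that dist_func. `None` is the "hosed" case, which is why the naive error should still be there.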
>
> One important special case would be non-parametric distributions,
> perhaps like this:
>
> (Measurement) {
>   location: z
>   naive_error: 0.5
>   dist_func: "deviation_histogram"
>   dist_pars: [
>     {name: "sampling_points", value: [-1, -0.5, 0, 0.5, 1]}
>     {name: "sampling_values", value: [0.01, 0.2, 0.68, 0.1, 0.01]}
>   ]
> }
>
> -- but as I said, that's just Sci-Fi I'm inventing here to show that we
> *can* extend this to support actual error calculus once we've worked out
> the basic cases.
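
To give a feel for what a client could compute from such a non-parametric annotation, here is a Python sketch that derives a mean deviation and a spread from the two arrays. It rests on an interpretation that the mail does not spell out: sampling_points are taken as deviations from the value in `location`, and sampling_values as relative weights of those deviations (they happen to sum to 1 in the example above). The `histogram_moments` function is hypothetical.

```python
import math
from typing import Sequence, Tuple


def histogram_moments(points: Sequence[float],
                      values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean deviation, standard deviation) of a sampled distribution."""
    if len(points) != len(values) or len(points) == 0:
        raise ValueError("sampling_points and sampling_values must match in length")
    total = sum(values)
    if total <= 0:
        raise ValueError("sampling_values must have a positive sum")
    # Normalise so the weights sum to one even if the provider's values do not.
    weights = [v / total for v in values]
    mean = sum(w * x for w, x in zip(weights, points))
    variance = sum(w * (x - mean) ** 2 for w, x in zip(weights, points))
    return mean, math.sqrt(variance)


mean, spread = histogram_moments([-1, -0.5, 0, 0.5, 1],
                                 [0.01, 0.2, 0.68, 0.1, 0.01])
print(f"mean deviation {mean:.3f}, spread {spread:.3f}")
```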
>
> -- Markus