Post-workshop Measurement musings

Tue May 18 20:48:00 CEST 2021

Hi all,

For the record, this issue has been extensively discussed since last January on the collaborative platform that was set up for the workshop to exercise the different proposals (https://github.com/ivoa/dm-usecases).
I don't think it would be fruitful to replay that discussion on the DM list; with a bit of luck and a good search engine, you can find many pros and cons of Markus's proposal in the issues (https://github.com/ivoa/dm-usecases/issues).

LM 

> On 18 May 2021, at 11:44, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
> 
> Dear DM,
> 
> At yesterday's DM interop preparation workshop, I was asked to bring
> forward a model for Measurement that I'd consider fine for my programme
> of "cover a governing use case".
> 
> This use case for Measurement from my perspective is "plot error bars"
> (which I think is easily sold to the client writers, which, full
> disclosure, is what I'm convinced should guide us), with a
> perspective to "automatic error propagation" in the future.
> 
> I think the current Measurements proposal will essentially work for that
> when we drop a few of the boxes -- and then drop anything that is not
> used by a client at the time of RFC.
> 
> What I'd like to see unconditionally dropped are the Time, Position,
> Velocity, ProperMotion, and Polarization classes; they entangle the DM
> with other DMs without giving a benefit I can perceive; for the rough
> classification of quantities we have UCDs, and frames, photometric
> metadata, and similar data can be attached directly to the columns.
> 
> For the rest, I strongly suspect you won't see implementations for the
> 3D errors, so I'd not be surprised if those dropped out at the
> RFC implementation test.
> 
> The 2D errors I suspect may be convenient shortcuts.  But really, in the
> end we'll need a proper model for correlated errors, perhaps as
> envisioned by
> https://github.com/msdemlei/astropy#working-with-covariance, but I'd
> strongly advise postponing that to later versions -- it'll scare
> adopters unnecessarily, and I think it's really only useful once we
> want to do automatic error propagation (which is Sci-Fi at this point
> for all I can see).
> 
> That's basically it (and I've said as much on the two RFC pages).
> 
> 
> 
> If, on the other hand, you ask me how I'd build the measurement/error
> thing if I got to design it from scratch... Well, in some ad-hoc
> notation what we ought to have is at first (where "column" could of
> course be a param as well and perhaps a literal):
> 
> Measurement:
> 	location: the column containing the value
> 	label: some human-readable designation of how this annotation is to
> 	  be understood
> 	error_type: "stat" by default, or "sys", perhaps later other values;
> 		note that a single column can have both stat and sys annotations
> 	naive_error: a column containing a naive, symmetrical error
> 	naive_lower: a column containing a naive lower bound
> 	naive_upper: a column containing a naive upper bound
> 	naive_plus: a column containing a naive upper error
> 	naive_minus: a column containing a naive lower error
> 
> "Naive" here means that we don't actually say what this is (as in "one
> sigma" or so); that's not known or specified in many sorts of data, and
> while humans will eventually have to figure it out if they want to
> interpret the error bars, it's not important for the first governing use
> case.  Everything except location is optional, and data providers would
> be encouraged to only give one of naive_error, (naive_upper/_lower), and
> (naive_plus/_minus) in one annotation.
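To make the "plot error bars" use case concrete, here is a minimal Python sketch of how a plotting client might turn one row plus such a Measurement annotation into error-bar endpoints. The class mirrors the ad-hoc notation above; the following are my assumptions, not part of the proposal: each attribute holds a column name looked up in a row dict, naive_plus/naive_minus are positive magnitudes, a missing half of a pair collapses onto the value itself, and the "only one representation" encouragement is enforced as a hard error.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass
class Measurement:
    """Hypothetical rendering of the ad-hoc Measurement notation.

    Every field except label and error_type holds a column *name*.
    """
    location: str
    label: Optional[str] = None
    error_type: str = "stat"           # "stat" by default, or "sys"
    naive_error: Optional[str] = None  # symmetrical error
    naive_lower: Optional[str] = None  # lower bound (absolute value)
    naive_upper: Optional[str] = None  # upper bound (absolute value)
    naive_plus: Optional[str] = None   # upper error (assumed positive)
    naive_minus: Optional[str] = None  # lower error (assumed positive)

    def __post_init__(self) -> None:
        # The proposal only *encourages* one representation; this sketch enforces it.
        representations = [
            self.naive_error is not None,
            self.naive_lower is not None or self.naive_upper is not None,
            self.naive_plus is not None or self.naive_minus is not None,
        ]
        if sum(representations) > 1:
            raise ValueError("use only one of naive_error, naive_lower/upper, naive_plus/minus")

    def error_bar(self, row: Mapping[str, float]) -> Optional[Tuple[float, float]]:
        """Return (bottom, top) of the error bar for one row, or None if no error is annotated."""
        value = row[self.location]
        if self.naive_error is not None:
            err = row[self.naive_error]
            return value - err, value + err
        if self.naive_lower is not None or self.naive_upper is not None:
            # A missing bound collapses onto the value itself (an assumption).
            bottom = row[self.naive_lower] if self.naive_lower is not None else value
            top = row[self.naive_upper] if self.naive_upper is not None else value
            return bottom, top
        if self.naive_plus is not None or self.naive_minus is not None:
            plus = row[self.naive_plus] if self.naive_plus is not None else 0.0
            minus = row[self.naive_minus] if self.naive_minus is not None else 0.0
            return value - minus, value + plus
        return None
```

For example, Measurement(location="mag", naive_error="e_mag").error_bar({"mag": 12.0, "e_mag": 0.5}) gives (11.5, 12.5). A stat and a sys annotation of the same column would simply be two Measurement instances sharing the same location.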
> 
> If we find a client that wants to plot error ellipses, we'd add
> 
> Measurement2D:
> 	location1: columns containing the position
> 	location2:
> 	semiMajor:
> 	semiMinor:
> 	posAngle:
> 
> as in current Measurement's ellipse (or whatever the client writer
> says).
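For Measurement2D, the client's job is to draw an ellipse. A sketch of computing its outline, assuming (none of this is specified above) that posAngle is in degrees, measured from the +y axis towards +x (north through east when location2 is north and location1 is east), that semiMajor/semiMinor share the units of the positions, and that we work on a flat plane with no cos(dec) correction. The function name ellipse_outline is hypothetical.

```python
import math
from typing import List, Tuple


def ellipse_outline(x0: float, y0: float, semi_major: float, semi_minor: float,
                    pos_angle_deg: float, n: int = 64) -> List[Tuple[float, float]]:
    """Return n points on an error ellipse centred on (x0, y0).

    Assumes pos_angle_deg is measured from the +y axis towards +x
    (north through east if y is north and x is east) on a flat plane.
    """
    pa = math.radians(pos_angle_deg)
    ux, uy = math.sin(pa), math.cos(pa)    # unit vector along the major axis
    vx, vy = math.cos(pa), -math.sin(pa)   # perpendicular unit vector (minor axis)
    points = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        a = semi_major * math.cos(t)       # offset along the major axis
        b = semi_minor * math.sin(t)       # offset along the minor axis
        points.append((x0 + a * ux + b * vx, y0 + a * uy + b * vy))
    return points
```

With posAngle 0 the major axis points along +y; with 90 it points along +x. The returned list can be handed directly to any polyline-drawing routine.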
> 
> That would be it for the first round.
> 
> 
> Once we've figured out how to talk to the client writers, I expect
> they'll want to learn about correlated errors.  For that, there'd be a
> class
> 
> Correlation:
> 	error1: column that contains the first error
> 	error2: column that contains the second error
> 	correlation_coeff: the corresponding entry in the correlation matrix
> 
> (and possibly other representations of correlations as requested by the
> client writers).
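A correlation coefficient rho is dimensionless, while the matching covariance entry is rho * sigma1 * sigma2, so a client needs both errors plus rho to rebuild the covariance matrix. A sketch, assuming correlation_coeff holds Pearson's rho, that rebuilds the 2x2 covariance matrix and propagates it through a linear combination of the two quantities (for instance a difference of two magnitudes), using var = c^T C c. Both function names are hypothetical.

```python
from typing import List, Sequence


def covariance_2x2(sigma1: float, sigma2: float, rho: float) -> List[List[float]]:
    """Covariance matrix of two quantities from their errors and correlation coefficient."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    cov = rho * sigma1 * sigma2  # off-diagonal covariance entry
    return [[sigma1 ** 2, cov], [cov, sigma2 ** 2]]


def propagate_linear(coeffs: Sequence[float], cov: Sequence[Sequence[float]]) -> float:
    """Variance of sum_i coeffs[i] * x_i, i.e. c^T C c (exact for linear combinations)."""
    n = len(coeffs)
    return sum(coeffs[i] * cov[i][j] * coeffs[j] for i in range(n) for j in range(n))
```

With sigma1 = 3, sigma2 = 4 and rho = 0.5, the covariance entry is 6, and the variance of x1 - x2 is 9 + 16 - 2*6 = 13 rather than the 25 one gets by ignoring the correlation.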
> 
> 
> And then, when we want to actually enable error calculus, I expect we
> need to represent actual distributions.  I'm just mentioning this here
> to show one way in which that could be done.  I'm pretty sure we'll want
> something else in the end, but that would need to be worked out between
> consumers (client writers) and producers (data providers) strictly based
> on actual use cases.
> 
> Having said that, we could extend Measurement (meaning: even with
> distributions, data providers should still provide some naive error
> measure) by saying:
> 
> 	dist_func: (from a vocabulary)
> 	dist_pars: array of DistPar
> 
> and 
> 
> DistPar:
> 	name: (literal, depending on dist_func)
> 	value: something
> 
> For instance, a Gaussian-distributed column z could have
> 
> (Measurement) {
> 	location: z
> 	naive_error: z_err
> 	dist_func: "normal"
> 	dist_pars: [
> 		{name: "mu", value: z}
> 		{name: "sigma", value: z_err}
> 	]
> }
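In this example, the dist_pars values are references to columns rather than numbers. A sketch of how a client might resolve them per row, assuming a string value names a column and anything else is a literal; it then uses the standard library's statistics.NormalDist to obtain a central ~68% interval. The dict layout and resolve_pars are hypothetical and not a proposed serialization.

```python
from statistics import NormalDist
from typing import Any, Dict, List, Mapping


def resolve_pars(dist_pars: List[Dict[str, Any]], row: Mapping[str, float]) -> Dict[str, Any]:
    """Map each DistPar name to a number for one row.

    Assumption: a string value names a column; anything else is a literal.
    """
    return {
        p["name"]: (row[p["value"]] if isinstance(p["value"], str) else p["value"])
        for p in dist_pars
    }


# The Gaussian example above, written as a plain dict (hypothetical layout).
annotation = {
    "location": "z",
    "naive_error": "z_err",
    "dist_func": "normal",
    "dist_pars": [
        {"name": "mu", "value": "z"},
        {"name": "sigma", "value": "z_err"},
    ],
}

row = {"z": 0.42, "z_err": 0.03}
pars = resolve_pars(annotation["dist_pars"], row)
dist = NormalDist(mu=pars["mu"], sigma=pars["sigma"])
low, high = dist.inv_cdf(0.1587), dist.inv_cdf(0.8413)  # central ~68% interval
```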
> 
> I think defining all the various distributions as separate classes
> wouldn't help the client writers enough to make it worthwhile.  Just
> having a master list (vocabulary?) of what dist_funcs have what
> dist_pars ought to do the trick -- if a client doesn't know a specific
> dist_func, it's hosed whatever we do.
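A sketch of the "master list" idea: a mapping from dist_func to the names of its required dist_pars, plus a check a client could run before using a distribution. Only "normal" (mu, sigma) and "deviation_histogram" come from the examples in this mail; the "uniform" and "poisson" entries and their parameter names are made-up placeholders. Returning False for an unknown dist_func lets the client fall back to the naive error that providers are still expected to supply.

```python
from typing import Dict, Iterable, Set

# Hypothetical master list: dist_func -> names of its required dist_pars.
# "normal" and "deviation_histogram" follow the examples in this mail;
# the other two entries are made-up placeholders.
DIST_VOCABULARY: Dict[str, Set[str]] = {
    "normal": {"mu", "sigma"},
    "deviation_histogram": {"sampling_points", "sampling_values"},
    "uniform": {"lower", "upper"},
    "poisson": {"lambda"},
}


def can_use_distribution(dist_func: str, par_names: Iterable[str]) -> bool:
    """True if the client knows dist_func and the parameter names match the list.

    Returns False for an unknown dist_func (the client should fall back to the
    naive error); raises ValueError if a known dist_func has the wrong parameters.
    """
    required = DIST_VOCABULARY.get(dist_func)
    if required is None:
        return False
    given = set(par_names)
    if given != required:
        raise ValueError(
            f"{dist_func}: missing {sorted(required - given)}, "
            f"unexpected {sorted(given - required)}"
        )
    return True
```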
> 
> One important special case would be non-parametric distributions,
> perhaps like this:
> 
> (Measurement) {
> 	location: z
> 	naive_error: 0.5
> 	dist_func: "deviation_histogram"
> 	dist_pars: [
> 		{name: "sampling_points", value: [-1, -0.5, 0, 0.5, 1]}
> 		{name: "sampling_values", value: [0.01, 0.2, 0.68, 0.1, 0.01]}
> 	]
> }
> 
> -- but as I said, that's just Sci-Fi I'm inventing here to show that we
> *can* extend this to support actual error calculus once we've worked out
> the basic cases.
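For the non-parametric case, one simple thing a client could derive is the mean and spread of the deviation distribution. A sketch, assuming sampling_values are (possibly unnormalised) weights of point masses at sampling_points, with deviations relative to the measured value; this is only one possible reading, and treating the values as a sampled density would need interpolation instead. histogram_moments is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def histogram_moments(points: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation of a discrete deviation distribution.

    Assumption: values are (possibly unnormalised) weights of point masses
    located at points; the deviations are relative to the measured value.
    """
    if len(points) != len(values) or not points:
        raise ValueError("need equally long, non-empty sampling_points and sampling_values")
    total = sum(values)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    weights = [v / total for v in values]
    mean = sum(w * x for w, x in zip(weights, points))
    variance = sum(w * (x - mean) ** 2 for w, x in zip(weights, points))
    return mean, math.sqrt(variance)


# The sampling points and values from the example above.
mean, spread = histogram_moments([-1, -0.5, 0, 0.5, 1], [0.01, 0.2, 0.68, 0.1, 0.01])
```

For these numbers the mean deviation is about -0.05 and the standard deviation about 0.30.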
> 
>           -- Markus
