[QUANTITY] The discussion so far

Gerard Lemson gerard.lemson at mpe.mpg.de
Fri Oct 31 05:34:35 PST 2003


Jonathan
Many thanks from me as well for this effort. Now I have something concrete
to react to.
The reply is rather long, but I think I've kept relatively quiet so far.
Gerard

> Quantity Discussions Oct 2003
>
> I've been drowning in email but I think I'm almost caught up. Phew!
> Here's my understanding of the issues so far, as represented
> in both dm at ivoa.net and off-list discussions, with a few of my own
> spins. Forgive me if I have misrepresented opinions and been sloppy
> about credit (e.g. the Dowler:: namespace should really be
> Dowler-Lemson::?; I use Foo:: to indicate an object introduced into the
> discussion by person Foo in some email).
>
> What should Q be called? Is Quantity an OK name?
>  Thomas yes, but it means what we want it to mean
>  Dowler yes, but it means what CS people mean
>  Berry no, use Data Container
>  McDowell yes.
Quantity is an OK name for some things, not for others. We need to define
what the concept is that the "Quantity modeling effort" is supposed to
model to be able to decide whether it is a good name for that concept.
I would like to add a question to this list here:

How many concepts are there that could go by the name Quantity, which one
are we discussing, and where does it fit in the greater context of the data
modeling effort?
Gerard: at least three. The first is the concept of the number(s) that
exist(s) in some archive, formatted and serialized in some way.
The modeling of this concept must take into account the way in which the
quantity is stored and must have a mapping to
meta-data concepts that provide it with meaning.
The second concept of Quantity is that it is a value for a *single*
phenomenon/property (which may however be more complex than a single
number). This is the concept to which the previous one is mapped, i.e. it is
the meta-data concept of a quantity as a number that has been produced in a
certain way and has a certain meaning. The third is that it is the
phenomenon itself (in the sense of SI's quantity in the generalized sense,
see again http://physics.nist.gov/cuu/Units/introduction.html).
I think all three of these concepts have been discussed, and I think not
separating them has caused a lot of confusion.
I think all three need modeling, simply because all three can be the subject
of discussion.

>
> Should we do Q now, or wait until we've done O (Observation)?
>  Alberto - wait
>  Everyone else writing emails, - presumably, now
>  McDowell: I think O is a higher priority, but no reason not to
>   forge ahead with Q too.
I agree.

>
> Is the name of the (phenomenon, variable, etc) part of Q?
>  (Dowler::Property name in a separate place)
>  (Berry::DataContainer: yes, include name and/or label)
>  McDowell: I would like to see a name as part of Q - then scalar Qs
>   can serialize as a FITS keyword.
Depending on which concept(ion) of Quantity we're discussing I'd say yes and
no.
I think that in one sense a Quantity is related to a phenomenon through a
measurement that associates it with a variable (Pat's opinion).
When it is serialized in a FITS file one may use the phenomenon there to
indicate what it means. Note that this is an implementation
question, not a conceptual modeling question.

We need to ask another question (and answer it in the affirmative!):
Do we need to model "phenomenon" separately?
Gerard: I believe YES. I believe it is through phenomena that we can find out
whether two measurements and their
results can be compared, namely by checking whether they were intended to
measure the same thing. Similarly, I can now
start modeling simulations in a way that allows me to make contact with
observation, simply by declaring that my simulation
calculates values for particular phenomena or properties, for example
galaxy/luminosity (which corresponds nicely to the latest UCD
proposals). And finally, IF we want archives to specify the meaning of a
quantity by means of a phenomenon keyword, for example, we
had better have a (meta-)data dictionary/repository somewhere that prescribes
valid values for this class. So we need to model these
independently of any use by quantities or anything else.
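
As a rough illustration of the reference repository idea, the following
Python sketch keeps a dictionary of agreed phenomenon names and uses it to
decide whether two values may be compared. The names PhenomenonRegistry,
TaggedValue and comparable are hypothetical and not part of any proposed
model, and the numbers are made up purely for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenomenon:
    """A named phenomenon/property, e.g. "galaxy/luminosity"."""
    name: str


class PhenomenonRegistry:
    """Hypothetical reference repository prescribing the valid phenomena."""

    def __init__(self):
        self._known = {}

    def register(self, name):
        # Registering the same name twice returns the existing entry.
        return self._known.setdefault(name, Phenomenon(name))

    def lookup(self, name):
        # An archive may only use names that the repository prescribes.
        if name not in self._known:
            raise ValueError(f"unknown phenomenon: {name!r}")
        return self._known[name]


@dataclass(frozen=True)
class TaggedValue:
    """A number plus the phenomenon it was intended to measure or compute."""
    phenomenon: Phenomenon
    value: float
    origin: str  # e.g. "observation" or "simulation"


def comparable(a, b):
    """Two values can be compared only if they target the same phenomenon."""
    return a.phenomenon == b.phenomenon


registry = PhenomenonRegistry()
registry.register("galaxy/luminosity")
registry.register("galaxy/redshift")

# Illustrative values only; they do not come from any real data set.
observed = TaggedValue(registry.lookup("galaxy/luminosity"), 2.1e10, "observation")
simulated = TaggedValue(registry.lookup("galaxy/luminosity"), 1.8e10, "simulation")
redshift = TaggedValue(registry.lookup("galaxy/redshift"), 0.03, "observation")
```

Here the observed and simulated luminosities are comparable because both
refer to the same registered phenomenon, while the redshift is not. A real
repository would presumably also carry units and type definitions per
phenomenon, which this sketch leaves out.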



> Does Q support arrays? Multi-dim arrays?
>  (Dowler: yes)
>  (Thomas Oct 25 0845: yes)
>  (Didelon Oct 28 no: model something simple)
>   Reason: same unit for many samples, commonalities
> McDowell: I say yes, because we will need such a multi-dim object,
> and it will be exactly the same as Q except for the multi-dimensionality.
> So why do the same work twice? So much of astronomical life
> is N-dim array based. We might as well put it in at the basic level.
Again, this depends on the concept(ion) of Quantity. In the conceptual model
Pat and I proposed,
a Quantity can only be an array (of whatever dimension) if there is a
corresponding phenomenon
that requires such a data structure for its description. At first I doubted
whether something like that exists,
but Pat persuaded me with the example of a "shape" phenomenon, which might be
represented as an array of positions
describing the vertices of a polygon.
I believe that arrays of numbers with common units and meaning, created
simply for efficient storage for example, do not fit in
with that concept of Quantity. Calling a column in a table, represented as
an array, a Quantity does not seem right either.
Such arrays do have a place in modeling storage, but should be named
differently, or at least the storage definition and mapping to
the meta-model of such a data container should be somewhat more complex than
that of an individual value/number needs to be.

I think we have to be careful not to group things in one concept simply
because they seem the same.
An array of individual atomic quantities that are grouped because together
they represent a new phenomenon (for example a polygon)
is different from an array of atomic quantities that are grouped together
simply because all of them have the same units and therefore
it is more efficient to store them that way.

>
> Does Q support multi-dim arrays with links to other Q's as axes?
>  example: Flux(Wavelength)
>  Thomas Oct 29: yes
>  Plante: no, higher level object to connect data Q with axis Qs
>  McDowell: I agree with Ray, this should be a higher level object.
>   Q should be the values associated with a single UCD (not counting
>   modifiers like error, quality), and anything connecting two UCDs
>   should not be in Q.
>  (admittedly our CfA DataContainer object does do this WCS stuff,
>   but I think we are trying
>   to keep Q a little simpler.)
I agree
>
> Does Q support complex types?
>  Dowler::Type = Ellipse2D, Oct 27
>  Dowler, polygon types (Oct 29)
>  McDowell: suggest we not rule this out, but an initial implementation
>    would only support basic datatypes.
I think we have to support them immediately, for example Position.
>
> Should array quantity and a scalar quantity be separate
> classes?
>  (Dowler::AtomicQuantity, Dowler::ArrayQuantity;
>  McDowell - My view:
>     Not a separate class, simply the case n = 1
>     Failing that, at least a class inheriting via restriction
>     from Array, not a separate derivation from Q.
>  Dowler's view (Oct 29): "really dislike the array of length 1..."
>     (but I think this is just for the serialization, not
>     the internal class representation, so perhaps reconciliation possible)
Yes, if only to be able to end the recursion in the model that expresses the
array quantity as an
array of a particular other quantity (possibly arrays again). In cases like
this I prefer to vote for explicitness.
In the end anyone can choose their own implementation if they want to, as
long as we can translate to the common definition.
Btw, we may well need to model explicitly *how* a phenomenon can be
represented and store this in the same reference meta-data
repository I mentioned before, because agreeing on having a phenomenon
called "position", for example, does not prescribe how to
recognize values of it. We need, as it were, to provide an abstract type
definition that can serve as the type of a quantity.
It is through this definition that a quantity would be related to a
phenomenon in the end.
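
To make the recursion argument concrete, here is a minimal Python sketch in
which an array quantity is an array of other quantities and the atomic
quantity is the explicit base case. Only the names AtomicQuantity and
ArrayQuantity come from the discussion; the fields, the helper functions
depth, flatten and position, and the "deg" unit are assumptions made for
illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class AtomicQuantity:
    """A single number with a unit; this is where the recursion ends."""
    value: float
    unit: str


@dataclass(frozen=True)
class ArrayQuantity:
    """An array whose elements are atomic quantities or further arrays."""
    elements: Tuple["Quantity", ...]


Quantity = Union[AtomicQuantity, ArrayQuantity]


def depth(q):
    """Nesting depth: 0 for an atomic quantity, 1 + deepest element otherwise."""
    if isinstance(q, AtomicQuantity):
        return 0
    return 1 + max((depth(e) for e in q.elements), default=0)


def flatten(q):
    """Collect all atomic values in order, recursing through nested arrays."""
    if isinstance(q, AtomicQuantity):
        return [q.value]
    values = []
    for element in q.elements:
        values.extend(flatten(element))
    return values


def position(x, y, unit="deg"):
    """A position modeled as an array of two atomic quantities (assumption)."""
    return ArrayQuantity((AtomicQuantity(x, unit), AtomicQuantity(y, unit)))


# A polygon "shape" as an array of positions, each itself an array.
triangle = ArrayQuantity(
    (position(0.0, 0.0), position(1.0, 0.0), position(0.0, 1.0))
)
```

Without a separate atomic class, both depth and flatten would need some
other convention (such as "an array of length 1") to know where to stop,
which is the kind of implicitness the explicit base case avoids.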

>
> Should Q include heterogenous arrays? (with different UCD, units etc)
>   (Thomas::QuantitySet table row construct)
>   Most people: No
>   Dowler: No, but consider representation issues (ISO date,
> numerical error)
>   McDowell: Probably no, at least for rev 1
Different UCDs seem to indicate that we can have arrays consisting of
quantities representing different concepts?
Such a thing should imo be modeled as a composite quantity. If we're talking
about representation only I see no reason
to forbid this, as long as it can be properly mapped to the meta-data.
>
> Should units be in Q?
>  Everyone (I think!): Yes
Yes, but we need to model units separately as well and store allowed
instances in the reference meta-data repository
mentioned earlier.
>
> Should errors be in Q or in Measurement (aggregation with Q)?
>  Dowler::Measurement: not in Q
>  Dowler: lots of things in VO are not physical measurements and
> do not have errors
>  Thomas: data fusion requires errors in Q.
>  Thomas (Oct 30): suggest that
>                   Dowler::Measurement maps to Q (and
> Dowler::Quantity does not)
>  Berry::DataContainer should include them
>  McDowell: yes, but don't model the Error object fully yet.
Yes? So which is it?
In the conceptual model, they belong in measurement imo. In the storage
model it is clear that one needs to
find a place to store them. Notice btw that even for the storage model one
is really storing measurement results, which are
more than the quantity.
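
One way to picture "errors belong in measurement" is the Python sketch
below, where Measurement aggregates a plain Quantity and optionally a second
Quantity as its error. The field names, the provenance string and the
relative_error method are assumptions for illustration, not an agreed
design, and the numbers are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quantity:
    """A plain value with a unit; deliberately carries no error."""
    value: float
    unit: str


@dataclass(frozen=True)
class Measurement:
    """Aggregates a Quantity with an optional error and its provenance."""
    result: Quantity
    error: Optional[Quantity] = None
    provenance: str = ""

    def relative_error(self):
        # Things that are not physical measurements may have no error at all.
        if self.error is None:
            return None
        # This sketch does no unit conversion, so the units must match.
        if self.error.unit != self.result.unit:
            raise ValueError("error and result must share a unit")
        return abs(self.error.value / self.result.value)


# Illustrative numbers only.
flux = Measurement(Quantity(10.0, "Jy"), Quantity(0.5, "Jy"), "pixel (3, 4)")
exposure = Measurement(Quantity(300.0, "s"))
```

The Quantity class stays usable on its own, for instance as a parameter,
while anything that needs an uncertainty goes through Measurement.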

>
> If errors are in Q, should there be a simpler class similar to
> Dowler::Quantity
> which does not contain errors?
>  Didelon yes, (Oct 30)
>  Thomas: implicitly no (Oct 30?) but didn't address Didelon's request for
>   a name to talk about Dowler::Quantity (the object with no errors)
>  McDowell: I think no, there's no need for a 'Simple-Q' with no errors
>  (as opposed to a Q with a null error), it doesn't add significant
>  weight to the class (and in the XML serialization doesn't have
> to add any weight?)
>  Lemson (Oct 30): Dowler::Quantity is individual pixels - but errors
>   may be correlated. Whole image is a Lemson::Result and not a
> Dowler::Quantity.
>  McDowell: I like the idea that a single pixel can be a Quantity
> on its own,
>   and an array of pixels can be a Quantity. Much fun will be hidden in the
>   Error model. In particular, even in an image where the errors are
>   correlated, one sometimes asks 'what is the absolute error on
> this pixel?',
>   or 'what is the relative error on this pixel?', information that really
>   is meaningful for just that pixel alone. Sometimes in contrast one asks:
>   'what is the error on the flux extracted from this group of pixels'
>   in which case the array's Error is the thing you need to use. The
>   fact that errors are correlated doesn't mean it's meaningless to
>   ask a pixel what its error is, and so doesn't mean Quantity shouldn't
>   have an Error object.
First, I think errors should not be in Q.
The different use of quantities is actually one of the reasons to keep them
simple.
As a parameter (a concept by itself) an error is in general not required.
In a measurement it is, though possibly rather complex. A parameter is
however not a simple kind
of measurement.

I actually want to refine my statement that a single pixel is a quantity. It
really represents a single
measurement, that is, one can get the provenance of it, and one may decide on
errors/uncertainties on a pixel-by-pixel basis.
The image as a whole I would not call a measurement. It is a collection of
possibly correlated measurements, what we call
a Result of an Experiment. Again, the physical/stored representation of the
image in some FITS file must be modeled
and mapped separately. If there is something like correlated errors on an
image, some representation must be designed for that.
This can of worms we left for others to open.

>
> Is there an intermediate astronomy/container-type object
> between O and Q?
>  Tody: Adding quality etc to Q makes it no longer Q, but Tody::Dataset
>  McDowell: I introduce this question because of Doug's comment;
>   one can perhaps recast the continuum of opinions into a divide between
>   those who want Q really simple (scalar value + unit, no array, no
>   name, no error) and those who (like me) want Q to be the basis
> for containing
>   everything except the astronomy (array values, unit, quality, errors,
>   perhaps even coords). Maybe that's an indication there are two
>   objects to be modelled, even if some of us think that using the
>   extra, simpler object will mean more difficulty in writing properly
>   general application code.
>
In the conceptual model Observation is very far removed from Quantity.
Observation is at the root of its hierarchy, i.e. represents an entity on
its own.
A quantity is a leaf in this hierarchy that is used to represent certain
aspects of the observation.
In this sense there is a whole lot between the two.

>
> Should quality be in Q?
>  Berry::DataContainer, yes (specific flags, not overall Obs quality)
>  Thomas: Yes, as part of Accuracy
>  Tody: No, keep Q simple
>  Micol: No, keep Q simple
>  Others: my impression is tending no
>  McDowell:  yes, I think it would be good to have this
>
Again, depends on concept. Not in conceptual, simple Quantity. Yes in
measurement/observation.

> Should Measurement inherit from Q vs use it as e.g. aggregation?
>  (Thomas yes, inherit?)
>  (Didelon no, aggregate)
>  (McDowell no, aggregate)
>  (Lemson no, "uses")
>
> Should Q support string values?
>  (Thomas, Oct 25 0845, yes)
>  (Plante, Oct 29, no)
>  (McDowell, strongly yes)
Depends on the conception. If we talk about serialized quantities in some
file there is no reason why one should not be allowed
to represent a real length using a string representation of a float.
But in the conceptual interpretation, too, there can be phenomena whose value
is most naturally represented by a string, for
example those that correspond to classifications such as the "Sc" spiral
galaxy type. Such a thing is not
a quantity in the sense that, for example, SI and we give to it. We've
introduced a separate class for this, called Classification,
which is a sibling of Quantity, both being subclasses of Value.
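
The hierarchy just described can be sketched in Python as follows. Value,
Quantity and Classification are the class names from the text; the fields,
the quantity_from_text helper, the "kpc" unit and the "Hubble" scheme label
are assumptions added only to make the example runnable.

```python
from dataclasses import dataclass


class Value:
    """Common parent of everything that can be the value of a phenomenon."""


@dataclass(frozen=True)
class Quantity(Value):
    """A numerical value with a unit."""
    number: float
    unit: str


@dataclass(frozen=True)
class Classification(Value):
    """A value that is naturally a label, such as a galaxy type."""
    label: str
    scheme: str  # which classification scheme the label belongs to


def quantity_from_text(text, unit):
    """A serialized string such as "3.5" still maps to a numerical Quantity."""
    return Quantity(float(text), unit)


length = quantity_from_text("3.5", "kpc")
galaxy_type = Classification("Sc", "Hubble")
```

The distinction is that a string in a file is only a representation, so
"3.5" ends up as a Quantity, whereas "Sc" has no numerical meaning and is
modeled as a Classification.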

>
> Should Q also be used for metadata?
>  (Thomas Oct 25 0845, yes)
>  Didelon says Dowler separation (in fact, layering) of concept
>  and Q is good. Thomas seems to think he is arguing for separation
>  of data and metadata. I don't see the connection but perhaps
>  I missed something.
I guess this question corresponds to my very first remark about the
different interpretations.
I think we need to model both.
>
> Should Q describe its datatype?
>  Most people: yes? and I agree
>   Dowler::Type can be things like Ellipse2D, or things like Double.
Yes, in the way that a datatype can be associated to the phenomenon the
quantity models.
>
> Should Q include coverage, completeness?
>  Everyone: no, belongs in Observation
agree
>
> Should Q describe everything in a FITS file or VOTable
>  Everyone (?): No! Maybe this is true for Observation
agree
>
> Should Q talk about transform/mappings?
>  (Didelon no)
>  (Thomas details should be at higher level)
>  (Thomas no astronomical concepts like astro coord transforms)
>  (Berry: yes, since need to know why pixel 3 and pixel 4 are distinct)
>  (Plante no)
>  (McDowell: no, this should be the next higher level object)
No, separate. Possibly part of units modeling.
>
> Should Q have methods to describe quantity arithmetic?
>  (Barnes yes?)
>  (Everyone else: maybe but not yet?)
An implementation/binding to software can well have this. For the conceptual
model
it is not important.
>
> Should things be attributes or pointers?
>  (Dowler: start off with everything as classes)
>  (Plante: can refer to quantity without it having a value)
>  McDowell: I argue everything should be classes as long as possible,
>  allows for interfaces to hide special cases.
Agree that it is often a good policy apart from very simple concepts such as
name, description.
>
>
>  - Cheers, Jonathan
>


