SED data model v0.92

Wed Nov 24 08:29:53 PST 2004

Ed Shaya wrote:

> Gilles DUVERT wrote:
>
>>
>>
>> Ed Shaya wrote:
>>
>>>
>>> You want the lower error bar to reach from the upper limit to the 
>>> x-axis.
>>>
>> I think, in view of your objection, that I would prefer upper limits 
>> to have a different status than measurements. Besides, I'm ill at ease 
>> thinking of measurement in geometrical terms (or, rather, in "plot+axis" 
>> terms); this looks too tied to our day-to-day external 
>> representation of data (a mathematician would perhaps read "measure" 
>> as a "norm" for topology).
>>
> Although it makes the model more complicated, it is probably worth 
> having an UpperLimitPoint and a LowerLimitPoint which contain only 
> the limit, to be used ONLY when the actual value is not known. They could 
> have an optional NSigma which tells how many sigma the limit 
> corresponds to, and the default should be 3. This way one does not confuse 
> the upper limit magnitude with the true error value or the true value.

Now that you phrase it in DM terms, it does indeed complicate the model...
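For concreteness, Ed's proposal could be sketched as two record types that carry only the limit plus an optional number of sigma defaulting to 3. The class and field names below (UpperLimitPoint, LowerLimitPoint, limit, n_sigma) are hypothetical illustrations and are not taken from SED data model v0.92.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpperLimitPoint:
    """Hypothetical: the true value is believed to lie below `limit`.

    Intended ONLY for cases where no measured value is available, so the
    record deliberately has no value or error field.
    """
    limit: float
    n_sigma: float = 3.0  # how many sigma the limit corresponds to


@dataclass(frozen=True)
class LowerLimitPoint:
    """Hypothetical: the true value is believed to lie above `limit`."""
    limit: float
    n_sigma: float = 3.0


ul = UpperLimitPoint(limit=2.5e-3)
print(ul.n_sigma)  # 3.0
```

Because neither type has a value or error field, software reading it cannot mistake the limit for a measured flux or for an error bar, which is the confusion the proposal aims to avoid.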

>
>>>> The risk would be, if any *measured* value is set, that it is taken 
>>>> at face value, when only the noise level makes sense. Could you use 
>>>> some kind of blanking value for the measured value in this case? (Or is 
>>>> there a general concept of upper limit that would go in the 
>>>> Quantity::Accuracy data model?)
>>>>
>>> All I need say here is that if 4 independent experiments come up 
>>> with 0.9 sigma detections of some measurement, it would be 
>>> awful if they each published only upper limits of the value.
>>
>>
>>
>> Here you suppose that the value "detected at 0.9 sigma" exists, or, 
>> rather, that I can pinpoint it in the graph, and put nice big 
>> errorbars on it for each of the 4 measurements. What I say is that 
>> this value does not exist until it is measured, and it is measured 
>> only when it is not an upper limit.
>
>
>
> Are we having some subtle argument about when a measurement is a 
> measurement and when a detection is a detection?
> If I measure 9 photons in a pixel but the sky noise is 10 photons, we 
> keep track of those 9 photons by placing them in the point/value just 
> like any other measured value.
> With just this measurement it is not a detection of anything except 
> for those 9 photons, but if I go back through the archives and find 
> that this location has been studied 3 times before, perhaps with 
> different filters, and each time this location has roughly 1 sigma 
> positive measurements, then I would say that an object is detected at 
> roughly 4 sigma and my observation was just a 0.9 sigma detection. 
> Perhaps you would phrase this differently.
> But I hope that you are not supporting the idea that the USUAL 
> treatment for a measurement that is less than 3 sigma should be to 
> drop the value and just quote the upper limit.
> You would not support blanking out all of the pixels in an image or 
> spectrum where the signal is below 3 times the sky noise and replacing 
> them with upper limit values? Or would you? Would we need to 
> reprocess all existing data archives to treat upper limits in this way?
> If 4 major physics laboratories come up with 1 sigma detections of the 
> mass of the neutrino at about 10 eV, should each publish just an upper limit?
>
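The argument above rests on combining independent low-significance measurements. A minimal sketch of one standard way to do that, the inverse-variance weighted mean, assuming the measurements refer to the same quantity and have independent Gaussian errors (an assumption that, for different experimental setups, has to be justified). For N measurements of equal error, the combined significance grows as sqrt(N). The function name `combine` is hypothetical.

```python
import math


def combine(values, errors):
    """Inverse-variance weighted mean of independent Gaussian measurements.

    Returns (mean, error). Assumes every error is > 0 and that all
    measurements refer to the same underlying quantity.
    """
    weights = [1.0 / e**2 for e in errors]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)


# Four independent 0.9-sigma measurements (value 0.9, error 1.0 each)
mean, err = combine([0.9] * 4, [1.0] * 4)
print(f"{mean / err:.1f} sigma")  # 1.8 sigma
```

Note that this only works if each experiment publishes its value and error; four published upper limits cannot be combined this way.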
Well, your example is good and I certainly do not want to drop all 
experiments below the 3 sigma limit! But I was not claiming that the SED 
data model does not properly describe measurements, on the contrary! (I 
would just *force* each Flux.StatErrXXX to be present and non-zero, 
instead of being optional and zero by default, because this *is* how a 
measurement *must be*.)
No, I was discussing the fact that UpperLimits [of non-detections] 
cannot be expressed that way while retaining their meaning.
Simple detectors such as the camera pixels counting photoevents in your 
example are just too easy to deal with... Imagine you map a region of 
interest in radio interferometry: nothing is evident in the maps, yet you 
know there is a source at some position, already detected at other 
wavelengths. You can fit a point-source model to the visibilities, even 
force it to be at the source position. The fit may not converge. You 
have not measured anything, but you can still give an upper limit, 
based either on the rms of the map or (more accurately) on the sensitivity 
of the instrument given the total integration time, etc. People do 
that all the time. Where is the pixel with the actual photoevent count? 
Will the SED DM force radioastronomers to transform their (valuable) 
non-detection into a value+errorbar? With which value?

So the point is, "are all SED-related measurements expressible in terms 
of value+errorbar?", and I say no for detection limits. Data producers 
should carefully choose whichever corresponds best to their data: 
(Flux.value + Flux.StatErrXXX) or Flux.UpperLimit (or 
Flux.DetectionThreshold).
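The rule proposed here can be sketched as a check on a flat flux record: either a value with a mandatory, non-zero statistical error, or an upper limit with no error bar, never both. The field names loosely mirror Flux.value, Flux.StatErrXXX and Flux.UpperLimit, but the record layout and the `validate` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flux:
    # Hypothetical flat record; None means "not given".
    value: Optional[float] = None
    stat_err: Optional[float] = None     # stands in for Flux.StatErrXXX
    upper_limit: Optional[float] = None  # stands in for Flux.UpperLimit


def validate(flux: Flux) -> str:
    """Return 'measurement' or 'upper_limit'; raise ValueError otherwise."""
    has_value = flux.value is not None
    has_limit = flux.upper_limit is not None
    if has_value and has_limit:
        raise ValueError("give either value+error or an upper limit, not both")
    if has_value:
        # A measurement must carry an error that is present and non-zero.
        if flux.stat_err is None or flux.stat_err <= 0:
            raise ValueError("a measured value requires a positive stat_err")
        return "measurement"
    if has_limit:
        # An upper limit is not a value+errorbar pair.
        if flux.stat_err is not None:
            raise ValueError("an upper limit carries no error bar")
        return "upper_limit"
    raise ValueError("record holds neither a measurement nor an upper limit")


print(validate(Flux(value=9.0, stat_err=10.0)))  # measurement
print(validate(Flux(upper_limit=30.0)))          # upper_limit
```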

>>
>> Of the 4 groups coming up with this "0.9 sigma detection", the last 3 
>> in time are morons: they were not able to devise an experiment with 
>> less uncertainty than the 1st pioneering group. Shall we continue to 
>> support them financially? ;^)
>
>
> That is a political statement.

(but with a smiley)

>
>> Besides, since the experiments are not the same, and you want to get 
>> a measurement at the end by averaging values+errors, you have to 
>> prove that errors and "measure" in those different experimental 
>> setups can be averaged. Unless the 4 experiments are just 4 
>> realizations of the 1st measurement, and, bingo, upper limits are 
>> still upper limits in this case....
>>
> We certainly need to have a system so that if the same measurement 
> shows up in several contexts, we and our applications can easily 
> recognize it. That is the point of putting IDs on observations.
> Beyond that, it is the job of a distributed data system to be able 
> to pull together similar data entities for processing. Isn't the 
> idea of the VO to allow us to do analyses across multiple data sets 
> and data archives that are beyond our present capabilities?

The point here is how (and how difficult it is) to recognize whether data are "similar".

>
>> Fortunately, for people bold enough to claim (and they are numerous!) 
>> that their 0.9 sigma measurement (was really an upper limit) _is_ a 
>> measurement, they can use the normal value+error scheme.
>
>
> This should be the normal mode, and our applications should be upgraded 
> to recognize measurements below 3 sigma, plot them in yellow and 
> properly draw the upper limit symbol.

yes
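An application following this suggestion might keep every value+error pair and only decide at display time whether to draw a point or an upper-limit arrow. A minimal sketch, assuming a 3-sigma threshold and placing the arrow at max(value, 0) + n_sigma * error, which is one possible convention among several (some tools use n_sigma * error alone). The helper `plot_style` is hypothetical.

```python
def plot_style(value: float, stat_err: float, n_sigma: float = 3.0):
    """Hypothetical helper: decide how to draw a value+error point.

    Returns ("point", value) when value / stat_err >= n_sigma, otherwise
    ("upper_limit", max(value, 0) + n_sigma * stat_err). The measurement
    itself is not modified, so it remains available for later combination.
    """
    if stat_err <= 0:
        raise ValueError("stat_err must be positive")
    if value / stat_err >= n_sigma:
        return "point", value
    # Below threshold: position the upper-limit arrow (assumed convention).
    return "upper_limit", max(value, 0.0) + n_sigma * stat_err


print(plot_style(9.0, 10.0))   # ('upper_limit', 39.0)
print(plot_style(50.0, 10.0))  # ('point', 50.0)
```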

>
>>
>>> The possibility that some moron may not look at the quoted noise 
>>> levels before coming to some silly conclusion about a measurement does 
>>> not justify missing out on potentially important real 
>>> discoveries that properly archived data make possible.
>>
>>
>>
>> a) The possibility of "some moron..." is huge.
>
>
> Oh yes! This is a tradeoff between a bunch of people wasting some 
> time and losing precious, perhaps irretrievable, information.

What I meant is that in every astronomer using VO services there may be 
a potential moron since, however smart I am, I will be as dumb as the 
dumbest part of the chain of software+specifications+documentation that 
conveys the data to me. If the DM is not accurate/clear enough, data 
producers may not be able to use the most appropriate tag for their 
data, data retrieval programs may misinterpret "measure" as "upper 
limits" or overlook "errors", and users may not be able to realize through 
the human-readable interface what was overlooked or misinterpreted.
I say this only to stress the importance of the work done on DMs as the 
foundations of the VO (and to praise those who conceive them!).

>> b) I would not place too much faith in (real, important) discoveries 
>> based on a sum of invalid measurements...
>>
> This is the crux of it. To me, a less than 3 sigma measurement is a 
> valid measurement. It tells me the probability of something being 
> there so that I can properly assess my risks in attempting to go 
> deeper in that direction. If it is an invalid measurement, then I 
> don't want it published at all anywhere.

I agree completely with the above sentence. Again, the point was not 
the use of a valid pair (measurement + 1 sigma error), but the 
properties of UpperLimits, which do not follow this scheme.

>
>>
>> Best,
>> Gilles
>>
> Cheers,
> Ed

Best,

Gilles