Spectra DM for theoretical spectra?

Gerard gerard.lemson at mpe.mpg.de
Wed Jun 3 06:05:24 PDT 2009


Hi Alberto and others,

I cannot give a reaction that does justice to all your philosophical
considerations.
So here are simply some comments on part of your mail, from the point of view
of the SimDB data modelling effort.
I think it would be good not to start redoing, in a few emails, the work that
has gone into that.

First, what is it we are discussing here?
I guess it started with the question of whether synthetic spectra MUST be
described using SSA/SpectrumDM,
or whether we can use the SimDB data model for describing them.
I first want to second Rick who said that it depends on the context and the
publisher.
If you want to publish synthetic spectra through SSA you MUST describe them
according to the SpectrumDM.
If you want to publish them through SimDB, you MUST describe them according
to its data model (IF we decide that SimDB will be extended to the
"micro-simulations", or "models" that can produce them).

I guess some discussion arises because of claims that some aspects are not
well covered by one model that are/seem to be covered by the other. 


> 
> Rick Wagner wrote:
> > 1) What we are calling Characterization in the simulation 
> data model, 
> > ...
> > ...
> I would ask Theory to try to avoid any confusion and not to 
> use Characterisation for what we are already used to calling Provenance, etc.
> 
Don't worry, we are not doing that.
I agree with you about the importance of using the right vocabulary.
But I think the problem here is that Rick's description is not correct,
not that existing names were attached to the wrong concepts.
The SimDB:Characterisation class (notation using utypes!) in the simulation
data model represents a similar concept of "characterisation" as does the
existing characterisation model. There are subtle differences that are still
under discussion with the "characterisation model group" (represented by
Mireille and Francois and me).

What is usually called Provenance is not explicitly represented "with a box"
in the Simulation data model.
Instead most of the SimDB model is about describing the provenance of
simulation results ("Snapshots").
The patterns in this model have a long history (ADASS 2003), and I'd advise
the people working on Provenance to have a look at it ;)



> I'm trying to depict the problem here, and later I'll make 
> some suggestions on the vocabulary and on a possible way to 
> solve the issue, that is, enabling a full description of a 
> theoretical spectrum.
> Please bear with me... I need to clarify my terminology to 
> get understood...
> 
> Observation flow:
> Real Universe --> Observation --> Telescope --> Data/Metadata 
> --> Measurements
> 
> e.g.
> input: a spectrum of star is observed by a camera and from it
> output: the metallicity, log g, eff. temperature are 
> obtained/measured;
> 
> 
> Simulation flow:
> in the Theoretical World, it is conceptually reversed, the 
> "Measurements" actually become the Input parameters in a flow 
> that looks very much reversed:
> 
> Input parameters --> Virtual Telescope --> Simulation --> 
> Modelled Universe
> 
> e.g.
> Input parameters are metallicity, log g, eff temperature
> output: a modelled spectrum is obtained.
> 
> 
> It is a reversed flow, it is like mathematically inverting a function:
>    function(domain)  ->  image
>    inverse function(image) -> domain
> 
> the "image" of the function called "observing"
> becomes the "domain" of the function called "simulating", and 
> vice versa.
> 

From the point of view of SimDB I don't think all this is very relevant.
First, the Simulation data model is not aimed at synthetic observations.
It is about a (seemingly ever-expanding) class of simulations and their
results, not (necessarily) about (virtually) observing such simulations. The
current discussion about synthetic spectra may seem to suggest otherwise, but I
guess that most synthetic spectra and images are created without virtual
telescopes at the moment. They simply result directly from models or
projections of 3D simulations and could be described inside SimDB without
much problem.

The SimDB:InputParameters are parameters into a simulation code or other
type of software. They may represent physical parameters, but may also be
purely technical, such as gravitational softening lengths. There is no
concept of inversion of functions here. In fact observations could easily be
described in the same pattern. They are simply experiments that produce
images, just as an N-Body simulation produces "SimDB:Snapshots".
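
As a rough illustration of that shared pattern, here is a minimal Python
sketch. It is NOT the SimDB data model: the class names, attribute names and
parameter values are all made up for the example. It only shows that a
simulation run and an observation can both be described as an experiment
with a protocol, some input parameters (physical or technical) and results.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class InputParameter:
    """A setting of a code or instrument (hypothetical, not a SimDB class)."""
    name: str
    value: Union[float, int, str]
    physical: bool  # True for physical quantities, False for technical settings


@dataclass
class Experiment:
    """Something that is run with given settings and produces results."""
    protocol: str  # the simulation code, or the telescope/instrument used
    parameters: List[InputParameter] = field(default_factory=list)
    results: List[str] = field(default_factory=list)  # snapshots, images, ...


# A simulation: parameter names and values are invented for illustration.
nbody_run = Experiment(
    protocol="some N-body code",
    parameters=[
        InputParameter("omega_matter", 0.25, physical=True),
        InputParameter("softening_length_kpc", 5.0, physical=False),
    ],
    results=["snapshot_000", "snapshot_001"],
)

# An observation described with exactly the same classes.
observation = Experiment(
    protocol="some telescope and camera",
    parameters=[InputParameter("exposure_time_s", 300.0, physical=False)],
    results=["image_001"],
)

# Select the purely technical parameters of the simulation run.
technical = [p.name for p in nbody_run.parameters if not p.physical]
```

Nothing in this sketch inverts anything: in both cases parameters go in and
results come out.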


> 
> So far, CharDM, ProvenanceDM and SpectrumDM have focused 
> their attention onto the HOW, and have intentionally avoided 
> to describe the WHAT is observed.
> It would be too difficult to describe anything that could be 
> observed [a star, a galaxy, a cluster, a cloud, a bird, a 
> girl (try to model it! ;-) ), etc.]
> 

I would argue that SimDB is mainly about the HOW (provenance), and a little
about the WHAT (the characterisation part and the "target" part). 
The first question that scientists (who were polled!) had about simulations
was "what is simulated".
We are NOT modelling this in detail, but have two "boxes",
SimDB:TargetObject and SimDB:TargetProcess, that have an attribute that
allows one to say that a simulation is modelling a "galaxy merger", for
example. For this we introduce an appropriate semantic vocabulary, since
for now we do not see it as our task to model the world in detail.
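
To make the idea concrete, here is a minimal Python sketch of a target that
is nothing more than a term taken from a controlled vocabulary. The class
name, the attribute name and the term list are assumptions for illustration
only (apart from "galaxy merger", which is the example used above); the
class is a stand-in for either of the two boxes, not their actual definition.

```python
from dataclasses import dataclass

# Hypothetical vocabulary; the real list of terms is not given in this mail.
TARGET_VOCABULARY = {"galaxy merger", "galaxy", "star", "galaxy cluster"}


@dataclass(frozen=True)
class Target:
    """Says WHAT is simulated by pointing at a vocabulary term, no more."""
    label: str

    def __post_init__(self):
        # Reject labels that are not part of the agreed vocabulary.
        if self.label not in TARGET_VOCABULARY:
            raise ValueError(f"unknown target term: {self.label!r}")


merger = Target("galaxy merger")
```

The point of such a design would be that the detailed structure of a galaxy
merger lives in the vocabulary, not in the data model.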



> 
> Maybe some confusion arises from the fact that the WHAT of 
> the real world becomes the Input parameters in the Simulated 
> World (e.g. the eff temp, the log g, the metallicity of a star).
> 
> The distinction though is that in the Real World the WHAT is 
> in the end a measurement, an estimate of the Truth, and hence 
> has got an associated error, while in the Theoretical World 
> the Input parameters _are_ the Truth (and error is virtually zero).
> If TheoryWG comes up with a good set of models for the Input 
> Parameters, please bear in mind the above, because the same 
> model, with some care, could be re-used to model the real 
> world's WHAT!
> 
> 

I don't know about this; I think you're making some oversimplifying
assumptions.
In SimDB we model the input parameters to a simulation explicitly, some of
which have physical meaning, some of which are technical. And we also model
the results of the simulations. And there we have "measurements" just as in
observations.
Though "measurement" is maybe not quite the right term for a value assigned
to a property by a simulation code.
In our domain model (ADASS 2003,
http://www.ivoa.net/internal/IVOA/IvoaDataModel/DomainModelv0.9.1.doc) we
called it simply a value assignment. Furthermore, we may easily have a
billion measurements, and so we use characterisation to summarise those.
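
A minimal Python sketch of what "summarising" could mean in practice is
given below. It is an assumption for illustration, not the SimDB or CharDM
definition of characterisation: the class, the function and the choice of
statistics (count, minimum, maximum, mean) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PropertySummary:
    """Hypothetical summary of all values assigned to one property."""
    prop: str
    count: int
    minimum: float
    maximum: float
    mean: float


def summarise(prop: str, values: Iterable[float]) -> PropertySummary:
    """Summarise values in a single pass, without holding them in memory."""
    count = 0
    total = 0.0
    lo = float("inf")
    hi = float("-inf")
    for v in values:
        count += 1
        total += v
        lo = min(lo, v)
        hi = max(hi, v)
    if count == 0:
        raise ValueError("no values to summarise")
    return PropertySummary(prop, count, lo, hi, total / count)


# A generator stands in for the (possibly huge) set of value assignments.
masses = (float(m) for m in range(1, 1001))
summary = summarise("particle_mass", masses)
```

A user searching for simulations would then query the few summary numbers
instead of the billion individual values.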




> Coming back to terminology/vocabulary:
> 
> Carlos Rodrigo Blanco wrote:
> > I'm not so sure that we are talking about provenance and not 
> > ...
> > the physical parameter space of observed or simulated astronomical 
> > data sets, such as 2D-images, data cubes, X-ray event 
> lists, IFU data, 
> > etc..
> 
> Characterisation DM: Care was taken *not* to describe "what" 
> was observed/simulated, CharDM describes the N-dimensional 
> space subtended by an observation/simulation product 
> specifying which part of the N-dimensional space [whose axes 
> are the _data_ axes: space, time, wavelength...] was covered, 
> with which sampling, and which resolution, all at various 
> levels of detail (from an indicative number (location) down 
> to finer details like detector sensitivity, transmission curve, etc).
> 
Partially because of this lack of a "what is characterised", the
characterisation data model in its current form was not sufficient to
describe those aspects of characterisation of simulations that we deemed
necessary. At the same time it contains types that so far we felt were not
needed for SimDB purposes, such as accuracy.
Nevertheless we tried incorporating the *ideas* behind the CharDM and tried
to expand it by analysing precisely WHAT is characterised. That alternative
and generalised model of characterisation can hopefully be seen as simply
another *view* of some Platonic ideal model of characterisation, and the
same should be true for 


> 
> The ProvenanceDM describes the process that resulted in an 
> observation/simulation (e.g., among other things, the 
> telescope/instrument configuration, but also the PI program, etc.)
> 
> 
This is what SimDB already does for simulations.
I would strongly suggest that the Provenance DM should not claim to
describe the provenance of simulations.
That has already been done in the simulation data model, and an approach to
describing "provenance" was proposed as far back as 2003.



> Rick Wagner wrote:
> > However, there are some properties, such as resolution (spatial, or
> > frequency) which are known beforehand. For this reason, there is a 
> > boolean "a priori" attribute on the Characterization class. This 
> > distinction may need to be made more clear, in order to align our 
> > model with the Characterization Data Model.
> 
> The SpectrumDM combines a number of "How"DMs (CharDM, 
> ProvenanceDM, Curation, etc.) though it allows some little 
> digression into the realm of "Derived Quantities", that is 
> quantities that are measured out of the described spectrum, 
> and into the realm of what the PI already knows about the 
> target ("a priori"). For example, for REDSHIFT, two different 
> utypes and FITS keywords exist to this effect (one for the "a 
> priori" knowledge, one for the "a posteriori" measured quantity).
> 
> 
> As many have already stated I also think that the Input 
> parameters are the equivalent to the telescope/instrument 
> settings, they are the Virtual Telescope settings, and as 
> such the word Characterisation should strongly be avoided, in 
> favour of Provenance.
> 
That is indeed exactly the way we modelled this (already in 2003!) and why
observations could maybe follow the patterns that we have laid out.


> 
> A possible recipe to solve the problem:
> 
> Given the multitude of simulators that create spectra 
> (stellar atmospheres, galaxy clusters, bl lacs, etc.) it is 
> not possible to have one ProvenanceDM that covers them all; I 
> think that each of those sets of Input Parameters should be 
> modelled separately; then it would be nice to enable 
> ProvenanceDM to reference any one of them; similarly SSAP (or 
> its subset TSAP) could easily make use of all that.

No, we can come up with a model that covers much of this in one go.
We have done so in the SimDB data model.
It may take some getting used to, but it's worth it. 
Note that models can come in many different shapes.
One does not always have to model every feature using named attributes.
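
For example, instead of one class with named attributes per simulator, each
simulator can declare its own parameter names as data, and a run then
supplies values for them. The Python sketch below is only an illustration of
that general idea under my own assumptions; the class, its method and the
galaxy cluster parameter names are hypothetical and not taken from the SimDB
model.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Simulator:
    """Hypothetical description of a code: its parameters are data, not fields."""
    name: str
    parameter_names: FrozenSet[str]

    def check(self, settings: Dict[str, float]) -> Dict[str, float]:
        # Accept only settings for parameters that this simulator declares.
        unknown = set(settings) - self.parameter_names
        if unknown:
            raise ValueError(f"{self.name}: unknown parameters {sorted(unknown)}")
        return settings


# Two very different simulators, described by the same single class.
stellar = Simulator(
    "stellar atmosphere model", frozenset({"teff", "logg", "metallicity"})
)
cluster = Simulator("galaxy cluster model", frozenset({"mass", "redshift"}))

stellar_run = stellar.check({"teff": 5800.0, "logg": 4.4, "metallicity": 0.0})
cluster_run = cluster.check({"mass": 1e15, "redshift": 0.2})
```

Adding a new kind of simulator then means adding a new instance, not a new
model.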

>  
> Sorry for the long email, but I think it is useful to clarify 
> things (and, vice versa, please clarify things that I might 
> have gotten wrong)
Hope to have been of help ;)



Cheers

Gerard




