Spectra DM for theoretical spectra?

Thu Jun 4 01:18:41 PDT 2009

Hi Gerard,
    First question (provenance):
       For a real observation it will have progenitors...
     Each of those can itself be described with the Obs data model, including
Provenance.
      If the provenance of simulated data includes some description coming
from SimDB, it is SimDB's job to describe the previous steps.

    Second point (input parameters): from Carlos and Miguel's questions I
inferred that the Characterization may be concerned and not the input
parameters. I was wrong, and I fixed this error in my "à la SSA" solution
according to your solution.

    Third point (Characterization): you are right, we discussed this several
times.
    What I agreed on is that, although it is the same concept on both sides
(SimDB and IVOA Char), sharing the same solution at the practical rather than
the abstract level has a lot of consequences (on both sides), and this effort
is probably not required unless we have strong use cases for it.
     I will answer Rick's mail of today on the mass resolution anyway.
Cheers
François 

-----Original Message-----
From: Gerard [mailto:gerard.lemson at mpe.mpg.de]
Sent: Wednesday, 3 June 2009 16:38
To: 'bonnarel'; 'Alberto Micol'; rwagner at physics.ucsd.edu
Cc: theory at ivoa.net; dm at ivoa.net
Subject: RE: Spectra DM for theoretical spectra?

Hi Francois

> 
> That is, if we are facing a simulated observation (simulated
> spectrum or image), the simulation experiment which gave this
> dataset is definitely part of the provenance. So in the case
> of a simulation described according to the SimDB data model, the
> provenance of the dataset should hook into a lot of material from SimDB.
> 
This has always been my main problem with a simple "Provenance" class that
should take care of everything.

For example, the synthetic image may actually have been obtained by virtually
observing a synthetic galaxy catalogue, which was built on the skeleton of
dark halo merger trees, which were derived by detecting substructure in a
friends-of-friends dark halo catalogue, which was extracted from the
Millennium N-Body simulation, which used a particular code to derive its
initial conditions. ALL of these steps are part of the provenance of the
final synthetic image.
Where do you stop in modelling your "Provenance" box?
Unless you make simplifying assumptions, or leave out certain steps because
you assume them to be known or unimportant for a certain application,
you cannot make a general model for Provenance like that.

Or do you actually model these steps themselves real world, where these
steps all occ
As you know (I suppose), SimDB, being based on the original domain model
(ADASS 2003), already took all of this into account by explicitly including
these steps in the model.
Clearly not every step is modelled explicitly; that would be impossible.
There is no box for "SemiAnalyticalGalaxyFormationCode", another box for
"FOFClusterFinder", another box for "NBodySimulator", etc. That model would
never finish.

Instead we recognise two main concepts: experiments (i.e.
simulations/observations/postprocessing/...) are done according to
well-described "protocols" (simulation code/telescope configuration/...),
and some experiments use the results of prior experiments.
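A minimal Python sketch of these two concepts, assuming simple in-memory classes. `Protocol` and `Experiment` mirror the concepts named here, but the `inputs` attribute and the `provenance()` helper are hypothetical illustrations, not the SimDB schema. Each step of the Millennium chain described earlier becomes one `Experiment`, so "where do you stop?" turns into how far a traversal follows `inputs`, without a dedicated class per code.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Protocol:
    """A well-described procedure: a simulation code, a telescope configuration, ..."""
    name: str


@dataclass
class Experiment:
    """A simulation, observation or post-processing run done according to a Protocol."""
    name: str
    protocol: Protocol
    # Hypothetical attribute: prior experiments whose results were used as input.
    inputs: list[Experiment] = field(default_factory=list)


def provenance(exp: Experiment) -> list[str]:
    """Names of every prior experiment reachable through `inputs`, nearest first."""
    seen: list[str] = []
    queue = deque(exp.inputs)
    while queue:
        current = queue.popleft()
        if current.name not in seen:
            seen.append(current.name)
            queue.extend(current.inputs)
    return seen


# The chain from the synthetic-image example, one Experiment per step.
ics = Experiment("initial conditions", Protocol("IC generator"))
nbody = Experiment("Millennium run", Protocol("N-body code"), [ics])
fof = Experiment("FOF halo catalogue", Protocol("FOF group finder"), [nbody])
trees = Experiment("merger trees", Protocol("substructure finder"), [fof])
galaxies = Experiment("galaxy catalogue", Protocol("semi-analytic code"), [trees])
image = Experiment("synthetic image", Protocol("virtual telescope"), [galaxies])

print(provenance(image))
# ['galaxy catalogue', 'merger trees', 'FOF halo catalogue', 'Millennium run', 'initial conditions']
```

The same two classes describe an observation or a post-processing step equally well; only the Protocol instances differ.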

This is less explicit than may be desired for certain use cases, but in the
world of simulations there is very little homogeneity. Maybe real
observations offer more handles?

> But Alberto, where does our colleagues' confusion come from?
>
> In SimDB itself there is a characterization class. As Rick
> states, it describes distributions of values for quantities
> (they say properties) measured on ObjectCollections (part of
> the so-called Snapshot), such as metallicity or RMS Mach
> number. It also covers things such as resolution, which causes
> some confusion too. SimDB Characterization metadata gives min/max,
> mean, mode or statistical moments for these distributions of
> values. (I did not understand that they would describe the
> input parameters of the simulation as well...)
> 
The input parameters are captured under the class SimDB:InputParameter,
defined on the SimDB:Protocol.
The values of the parameters are in SimDB:ParameterSetting.
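A hedged sketch of that split in Python dataclasses. The class names follow the SimDB classes named above, but their attributes (`name`, `unit`, `value`), hanging the settings on an `Experiment`, and the `assign()` check are assumptions made for illustration. What it shows is that the Protocol declares which parameters exist, while each run only records values for them; the example uses the stellar atmosphere parameters (Teff, log g, metallicity) discussed in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class InputParameter:
    """A parameter declared once, on the Protocol (e.g. by a simulation code)."""
    name: str
    unit: str = ""


@dataclass
class Protocol:
    name: str
    parameters: list[InputParameter] = field(default_factory=list)


@dataclass
class ParameterSetting:
    """The value one run assigns to one of its Protocol's InputParameters."""
    parameter: InputParameter
    value: float


@dataclass
class Experiment:
    name: str
    protocol: Protocol
    settings: list[ParameterSetting] = field(default_factory=list)

    def assign(self, name: str, value: float) -> None:
        """Record a value, refusing names the Protocol does not declare (assumed rule)."""
        for param in self.protocol.parameters:
            if param.name == name:
                self.settings.append(ParameterSetting(param, value))
                return
        raise KeyError(f"{self.protocol.name} declares no parameter {name!r}")


atmosphere_code = Protocol(
    "stellar atmosphere code",
    [InputParameter("Teff", "K"), InputParameter("logg"), InputParameter("metallicity")],
)
run = Experiment("model spectrum", atmosphere_code)
run.assign("Teff", 5750.0)
run.assign("logg", 4.5)
run.assign("metallicity", 0.0)
print({s.parameter.name: s.value for s in run.settings})
# {'Teff': 5750.0, 'logg': 4.5, 'metallicity': 0.0}
```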

> So if we are facing a simulated dataset, we have the
> following situation:
>       A spectrum with an IVOA Char + Provenance description.
>        This provenance provides hooks to the SimDB description of
> the simulation. This description itself contains a
> SimDB Characterization description of the simulated reality
> (stars, particles, etc.) but is not limited to it.
> 
If you also want to describe synthetic spectra in this way, that is possible,
I would think.
You can certainly extract such a view of theory spectra from the SimDB data
model, including the relation to those other aspects in the model that
describe the provenance.

>       You can read in one of the compilation mails I sent you
> this morning how the two Characs are related.
>       In my opinion, developing level 4 of IVOA
> characterization could fulfill the needs of SimDB
> (statistical). I can explain this in more detail.
As you know, from the beginning I have argued that characterisation is a
statistical description/summary of data values.
It is not that aspect that is missing in CharacterisationDM. After all,
location/bounds/support can already be interpreted as statistical quantities
derived from a more complete description (a priori or a posteriori). Most
likely we don't need the more complete description in SimDB; it does not help
much in discovering interesting simulations.
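To make the "statistical summary" reading concrete, here is a small Python sketch, assuming a flat list of data values. `Summary` and `characterise()` are hypothetical names, using the mean as the location is an arbitrary choice, and Support (a set of intervals in the Characterisation DM) is not modelled. It reduces raw values to location, bounds and a couple of moments of the kind SimDB Characterization metadata is said to carry (min/max, mean, mode).

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass


@dataclass
class Summary:
    location: float               # one representative value (here the mean)
    bounds: tuple[float, float]   # (min, max) interval enclosing all values
    mode: float                   # most frequent value
    std: float                    # population standard deviation (second moment)


def characterise(values: list[float]) -> Summary:
    """Reduce raw data values to a few statistical summary quantities."""
    return Summary(
        location=statistics.fmean(values),
        bounds=(min(values), max(values)),
        mode=statistics.mode(values),
        std=statistics.pstdev(values),
    )


# Hypothetical values, e.g. RMS Mach numbers measured in four snapshots.
print(characterise([1.0, 2.0, 2.0, 5.0]))
```

The full distribution is discarded on purpose: for discovering interesting simulations, a handful of summary numbers is usually enough to decide whether to look further.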

> Some other people (including Gerard and Rick, I suppose)
> prefer to derive the two Characs (the IVOA one and the SimDB one) from
> a more abstract common model.
> 
I have frequently discussed with you and Mireille the relations between
the SimDB view of characterisation and the view that the Characterisation DM
takes, and indeed that it might be fruitful to realise that they are both
different, specialised representations of a common "Platonic" model for
characterisation. And I thought we agreed on this.
Didn't we?

Cheers

Gerard

> Cheers
> François
> -----Original Message-----
> From: Alberto Micol [mailto:alberto.micol at eso.org]
> Sent: Wednesday, 3 June 2009 10:56
> To: rwagner at physics.ucsd.edu
> Cc: theory at ivoa.net; dm at ivoa.net
> Subject: Re: Spectra DM for theoretical spectra?
> 
> 
> Rick Wagner wrote:
> > 1) What we are calling Characterization in the simulation data model
> > is restricted to only provenance information. For example, the star
> > formation rate or RMS Mach number may be represented, and these are
> > physical properties measured from the data (in my cases).
> 
> I'd like to stress what Francois already mentioned: the
> importance of using the correct vocabulary.
> I take the example above to illustrate the problem
> (sorry Rick, nothing against you; yours just happens to be
> the last email about this, see also Carlos'
> post): in one sentence three different
> concepts are used somewhat intermingled: Characterisation,
> Provenance, and measured properties.
> 
> Given that in the IVOA world (or 'vosphere', to use Sebastien's nice
> expression) both Characterisation and Provenance DMs already
> exist (one already recommended, the other being worked out),
> I would ask Theory to try to avoid any confusion and not to
> use Characterisation for what we are already used to calling Provenance, etc.
> 
> I'm trying to depict the problem here, and later I'll make 
> some suggestions on the vocabulary and on a possible way to 
> solve the issue, that is, enabling a full description of a 
> theoretical spectrum.
> Please bear with me... I need to clarify my terminology to
> be understood...
> 
> Observation flow:
> Real Universe --> Observation --> Telescope --> Data/Metadata --> Measurements
> 
> e.g.
> input: the spectrum of a star is observed by a camera, and from it
> output: the metallicity, log g and eff. temperature are obtained/measured.
> 
> 
> Simulation flow:
> In the Theoretical World it is conceptually reversed: the
> "Measurements" actually become the Input parameters, in a flow
> that looks very much reversed:
>
> Input parameters --> Virtual Telescope --> Simulation --> Modelled Universe
> 
> e.g.
> Input parameters are metallicity, log g, eff temperature
> output: a modelled spectrum is obtained.
> 
> 
> It is a reversed flow; it is like mathematically inverting a function:
>    function(domain)  ->  image
>    inverse function(image) -> domain
>
> The "image" of the function called "observing"
> becomes the "domain" of the function called "simulating", and
> vice versa.
> 
> 
> So far, CharDM, ProvenanceDM and SpectrumDM have focused
> their attention on the HOW, and have intentionally avoided
> describing WHAT is observed.
> It would be too difficult to describe anything that could be 
> observed [a star, a galaxy, a cluster, a cloud, a bird, a 
> girl (try to model it! ;-) ), etc.]
> 
> 
> Maybe some confusion arises from the fact that the WHAT of
> the real world becomes the Input parameters in the Simulated 
> World (e.g. the eff temp, the log g, the metallicity of a star).
> 
> The distinction though is that in the Real World the WHAT is 
> in the end a measurement, an estimate of the Truth, and hence 
> has got an associated error, while in the Theoretical World 
> the Input parameters _are_ the Truth (and error is virtually zero).
> If TheoryWG comes up with a good set of models for the Input 
> Parameters, please bear in mind the above, because the same 
> model, with some care, could be re-used to model the real 
> world's WHAT!
> 
> 
> Coming back to terminology/vocabulary:
> 
> Carlos Rodrigo Blanco wrote:
> > I'm not so sure that we are talking about provenance and not 
> > characterization.
> >
> > I feel that the sentence "description of a dataset or an observation
> > in the Physical parameter space of the data" describes precisely what
> > we are talking about.
> >
> > The physical n-dimensional space where a theoretical spectrum is
> > located is a space parametrized by some parameters such as Teff, log g,
> > metallicity, etc.
> >
> > When I go to the characterization data model I read:
> >
> > ---
> > This document defines the high level metadata necessary to describe
> > the physical parameter space of observed or simulated astronomical
> > data sets, such as 2D-images, data cubes, X-ray event lists, IFU data,
> > etc.
> 
> Characterisation DM: care was taken *not* to describe "what"
> was observed/simulated. CharDM describes the N-dimensional
> space subtended by an observation/simulation product,
> specifying which part of the N-dimensional space [whose axes
> are the _data_ axes: space, time, wavelength...] was covered,
> with which sampling and which resolution, all at various
> levels of detail (from an indicative number (location) down
> to finer details like detector sensitivity, transmission curve, etc).
> 
> 
> The ProvenanceDM describes the process that resulted in an 
> observation/simulation (e.g., among other things, the 
> telescope/instrument configuration, but also the PI program, etc.)
> 
> 
> Rick Wagner wrote:
> > However, there are some properties, such as resolution (spatial, or
> > frequency) which are known beforehand. For this reason, there is a 
> > boolean "a priori" attribute on the Characterization class. This 
> > distinction may need to be made more clear, in order to align our 
> > model with the Characterization Data Model.
> 
> The SpectrumDM combines a number of "How"DMs (CharDM,
> ProvenanceDM, Curation, etc.), though it allows a little
> digression into the realm of "Derived Quantities", that is,
> quantities that are measured from the described spectrum,
> and into the realm of what the PI already knows about the
> target ("a priori"). For example, for REDSHIFT two different
> utypes and FITS keywords exist to this effect: one for the "a
> priori" knowledge and one for the "a posteriori" measured quantity.
> 
> 
> As many have already stated, I also think that the Input
> parameters are the equivalent of the telescope/instrument
> settings: they are the Virtual Telescope settings, and as
> such the word Characterisation should strongly be avoided in
> favour of Provenance.
> 
> 
> A possible recipe to solve the problem:
> 
> Given the multitude of simulators that create spectra
> (stellar atmospheres, galaxy clusters, BL Lacs, etc.), it is
> not possible to have one ProvenanceDM that covers them all. I
> think that each of those sets of Input Parameters should be
> modelled separately; then it would be nice to enable
> ProvenanceDM to reference any one of them. Similarly, SSAP (or
> its subset TSAP) could easily make use of all that.
>  
> Sorry for the long email, but I think it is useful to clarify
> things (and, vice versa, please clarify things that I might
> have gotten wrong).
> 
> Alberto
> 
>