Documentation for Provenance

Hi all,

     Material for IVOA provenance is not yet complete.
     The basic ideas can be found here:

      You can also read the discussion we had in the small Obs team
before the Baltimore meeting.
       In particular, these compilation mails:
as well as

     You can also read my presentations at the Trieste, Baltimore and Strasbourg
interops (mainly the latter two).

      I attach the first Provenance example (in XML).

This material will be presented for comments on the DM pages with an XML
schema by the end of next week...

> Hi,
> I will also try to be brief, and state the points I see as important
> up to this point.
> 1) What we are calling Characterization in the simulation data model,
> is restricted to only provenance information. For example, the star
> formation rate or RMS Mach number may be represented, and these are
> physical properties measured from the data (in my case).
> However, there are some properties, such as resolution (spatial, or
> frequency) which are known beforehand. For this reason, there is a
> boolean "a priori" attribute on the Characterization class. This
> distinction may need to be made more clear, in order to align our
> model with the Characterization Data Model.
> 2) The discussion about the relationship between a data model for
> theory data, and data models for observational data is not specific
> to spectra. As I mentioned in my initial response, I think the
> appropriate thing to do when presenting spectra or images in a
> context with a specific model (e.g., SSA or SIA) is to use the
> corresponding model.
> Having an additional box, such as "Provenance", hanging off of a
> Generic Data Set class may be a way to include the additional information
> from the simulation data model. But, because models for theory data
> must be more general, the data may be de-normalized.
> 3) The question is still open as to whether or not we need to
> incorporate these thoughts into the current simulation data model, or
> if we can bridge the theory and observational models in the future.
> --Rick
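
The "a priori" flag from point 1 above can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the class layout, attribute names and sample values are hypothetical and are not taken from the simulation data model itself.

```python
from dataclasses import dataclass


@dataclass
class Characterization:
    """Hypothetical stand-in for the Characterization class discussed above."""
    name: str
    value: float
    unit: str
    a_priori: bool  # True if known before the run, False if measured from the data


def split_by_origin(items):
    """Separate properties known beforehand from those measured afterwards."""
    known = [c for c in items if c.a_priori]
    measured = [c for c in items if not c.a_priori]
    return known, measured


# Sample values are invented for illustration only.
properties = [
    Characterization("spatial resolution", 0.5, "kpc", a_priori=True),
    Characterization("star formation rate", 2.0, "Msun/yr", a_priori=False),
    Characterization("RMS Mach number", 5.0, "", a_priori=False),
]

known, measured = split_by_origin(properties)
```

Here `known` holds the resolution entry and `measured` holds the two physical properties derived from the output.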
>> Hi all,
>> Very short. I fully agree with Francois. I think all this discussion can
>> be solved by adding a Provenance box to the current Spectrum DM, so we
>> can reuse Spectrum specific utypes for the spectrum part of the
>> theoretical spectra and the provenance box for the software, input
>> parameters, etc.
>> The boxes from SIMDB could be reused in this case but we should not
>> start from scratch just because there are some fields missing in the
>> current Spectrum DM.
>> As I said during last interop sessions, I am a little more worried about
>> how to characterize the output records of a S3 (or SIMDAP) server when
>> the response is _not_ spectra.
>> Best Regards,
>> Jesus
>>> Hi all,
>>>    My personal view on this.
>>>     A ) a question of vocabulary
>>>         Up to now, "characterization" in the IVOA has been reserved vocabulary
>>> for the description of a dataset or an observation in the physical parameter
>>> space of the data. What Carlos or Miguel would like to call "characterization"
>>> is more something like the "Provenance" of the dataset. Again, all this is a
>>> vocabulary question. But if we don't agree on the vocabulary, how can we do
>>> interoperability?
>>>      B ) The spectral DM is conceptually a simple and particular case of an
>>> overall Observation or Generic Dataset Model. The current version of
>>> Spectrum doesn't have the "Provenance" package. But it would be really easy
>>> to add this package in a future version of Spectrum, because the Obs DM
>>> currently being developed (with Provenance in it) is very similar in overall
>>> structure to the Spectrum DM.
>>>      C ) The provenance that IVOA is currently developing integrates the
>>> software provenance. In the case of a theoretical dataset, it would be a
>>> place to hook the necessary information described according to SimDB, I
>>> guess...
>>>      D ) A service giving access to theoretical spectra compliant with SSA
>>> may well have hooks to Provenance and SimDB information, because additional
>>> fields and extension resources in the query response are allowed by the
>>> protocol. They may have Provenance or SimDB utypes without difficulty.
>>>    So if you agree with this general view, it would be nice to have input on
>>> what we could have in Provenance for the use case of simulated
>>> observations...
>>> Cheers
>>> François
>>> Hi Carlos
>>>>>> The Spectra datamodel is perfect for most of the issues, but the
>>>>>> characterization in SimDB provides a better description of what the
>>>>>> theoretical spectrum is.
>>>>> Dear Miguel,
>>>>> May I ask you what is missing in the SpectrumDM?
>>>>> What is SimDB offering extra, specific to spectra?
>>>> The main point, as I see it, is that the SpectrumDM was
>>>> designed with observed spectra in mind, not theoretical ones.
>>>> The model contains everything that is necessary to describe
>>>> the content of a spectrum (the wavelength, flux and all that,
>>>> and this is the same for observed and theoretical ones) but
>>>> nothing of what is needed to __characterize__ a theoretical spectrum.
>>>> For instance, a theoretical spectrum is usually characterized
>>>> by giving, at least:
>>>> - the code used to synthesize it.
>>>> - the effective temperature of the star
>>>> - the gravity (logg) of the star
>>>> - the metallicity of the star
>>>> (and sometimes some other parameters)
>>>> And, in the spectrum data model there is no utype for those
>>>> properties.
>>>> To make a long story short, if two different developers make
>>>> two different services with theoretical spectra and one
>>>> chooses "Meta" for the parameter containing the value for the
>>>> metallicity and the other chooses "Z" for the same parameter,
>>>> a client/application does not have a way to know that both
>>>> refer to the same concept (and UCDs are not enough for
>>>> this).
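
The "Meta" versus "Z" problem described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the two service responses, the field names and the `example:` utype strings are hypothetical and do not come from the Spectrum DM or any other IVOA standard.

```python
# Hypothetical field lists returned by two independent services.
# Names, utype strings and values are invented for illustration only.
service_a = [
    {"name": "Meta", "utype": "example:Star.metallicity", "value": -0.5},
    {"name": "Teff", "utype": "example:Star.effectiveTemperature", "value": 5750.0},
]
service_b = [
    {"name": "Z", "utype": "example:Star.metallicity", "value": -1.0},
    {"name": "T_eff", "utype": "example:Star.effectiveTemperature", "value": 6000.0},
]


def value_by_name(fields, name):
    """Look a value up by the service-specific column name."""
    for field in fields:
        if field["name"] == name:
            return field["value"]
    return None


def value_by_utype(fields, utype):
    """Look a value up by the shared data-model identifier."""
    for field in fields:
        if field["utype"] == utype:
            return field["value"]
    return None


# A client that only knows the name "Meta" finds nothing in service B ...
missing = value_by_name(service_b, "Meta")
# ... while a shared utype identifies the same concept in both responses.
metallicities = [
    value_by_utype(s, "example:Star.metallicity") for s in (service_a, service_b)
]
```

The lookup by name fails for the second service, whereas the lookup by an agreed utype works for both, which is the interoperability gap being pointed out.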
>>>> By the way, I think that SimDB doesn't solve that problem
>>>> either, am I right?
>>> That depends on what you expect from SimDB.
>>> That is, SimDB could allow you to define in some detail what code was used
>>> to produce synthetic spectra, though it may need some additions to the
>>> model as discussed in previous emails.
>>> The code is represented by the SimDB:Protocol, which contains input
>>> parameters, physics, algorithms and allows
>>> you to describe what is contained in a result
>>> (SimDB:RepresentationObjectType). The input parameters have a name as well
>>> as a "semantic label", which may be a UCD or something more generic. So if
>>> metallicity is in that vocabulary you can describe this. SimDB allows you to
>>> find all protocols that use a metallicity in their list of input parameters.
>>> The actual experiment that you run to produce your synthetic spectra is
>>> described amongst others by the values you assign to the parameters.
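
The protocol, parameter and experiment relationship described above might look roughly like this in Python. It is a sketch under stated assumptions, not the SimDB model: the class and attribute names, the code names `codeA` and `codeB`, and the sample values are hypothetical, and the UCD-style strings are used only as example semantic labels.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InputParameter:
    name: str            # free-form name chosen by the code author
    semantic_label: str  # shared vocabulary term, e.g. a UCD


@dataclass
class Protocol:
    """Hypothetical stand-in for a simulation code and its input parameters."""
    name: str
    parameters: List[InputParameter] = field(default_factory=list)


@dataclass
class Experiment:
    """One run of a protocol, described by the values given to its parameters."""
    protocol: Protocol
    values: Dict[str, float]


def protocols_using(protocols, label):
    """Return every protocol with an input parameter carrying the given label."""
    return [
        p for p in protocols
        if any(q.semantic_label == label for q in p.parameters)
    ]


code_a = Protocol("codeA", [
    InputParameter("Meta", "phys.abund.Z"),
    InputParameter("Teff", "phys.temperature.effective"),
])
code_b = Protocol("codeB", [
    InputParameter("Z", "phys.abund.Z"),
    InputParameter("logg", "phys.gravity"),
])
code_c = Protocol("codeC", [InputParameter("boxSize", "phys.size")])

with_metallicity = protocols_using([code_a, code_b, code_c], "phys.abund.Z")
run = Experiment(code_a, {"Meta": -0.5, "Teff": 5750.0})
```

Searching on the label rather than the name returns both `codeA` and `codeB`, even though they call the parameter "Meta" and "Z" respectively.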
>>> Note that I am not suggesting that there cannot be other, possibly more
>>> explicit models for theoretical/synthetic spectra. The SimDB data model is
>>> rather abstract, i.e. not very concrete, as it aims to support many types of
>>> simulations and simulation codes etc. The SimDB data model could serve as
>>> the basis from which to derive more concrete models, but it may not serve
>>> the purposes of SimDB to do this in SimDB itself.
>>> For example, if there is a particular set of parameters that all
>>> codes-producing-synthetic-spectra use, one could create a subclass of
>>> SimDB:Protocol that has these explicitly as attributes.
>>> A SyntheticSpectralModel could (I do not say should! It seems
>>> rather specialised and in need of discussion with a larger group of
>>> astrophysicists) have an attribute "metallicity", for example. Such a model
>>> will now give rise to a corresponding UTYPE. Something similar occurs in the
>>> SimDB data model, where the SimDB:Snapshot is a special type of result (for
>>> 3+1D simulations) and has explicit attributes like spatialSize and time.
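
The idea of a subclass with explicit attributes giving rise to UTYPEs can be sketched as follows. The `prefix:Class.attribute` naming rule, the `example` prefix and the extra attributes are assumptions made for this illustration; they are not the actual SimDB UTYPE derivation rules.

```python
from dataclasses import dataclass, fields


@dataclass
class Protocol:
    """Hypothetical generic protocol: only a name, parameters left abstract."""
    name: str


@dataclass
class SyntheticSpectralModel(Protocol):
    """Hypothetical subclass that makes common parameters explicit attributes."""
    effective_temperature: float = 0.0
    logg: float = 0.0
    metallicity: float = 0.0


def utypes(cls, prefix="example"):
    """Derive one identifier per attribute, using an assumed naming rule."""
    return [f"{prefix}:{cls.__name__}.{f.name}" for f in fields(cls)]


model_utypes = utypes(SyntheticSpectralModel)
```

With this assumed rule, `model_utypes` contains `example:SyntheticSpectralModel.metallicity`, so two services built on the same subclass would expose the same identifier for metallicity regardless of their column names.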
>>>> I think that the spectrum data model should contain a section
>>>> for characterization of theoretical data providing utypes
>>>> for, at least, a minimum set of parameters associated to
>>>> theoretical spectra.
>>> One may argue that this is not "characterisation" but "provenance",
>>> something that the spectral data model does not deal with in detail.
>>>> The fact is that SSAP/SpectrumDM was done for observed
>>>> spectra, it considers a lot of details about them, etc., but it
>>>> included theoretical spectra just as a use case in an appendix.
>>>> If it is going to be mandatory to use the same schema for
>>>> theoretical spectra and it is expected that we do it (let's
>>>> say) forever, a little time should be dedicated to
>>>> filling the holes in the protocol/data model when it refers to
>>>> theoretical spectra.
>>> Correct, but I would second Rick in proposing this not be done in the
>>> current effort on SimDB or SimDAP/S3, at least not in their version 1.0.
>>> If you can come up with a more concrete model for synthetic spectra,
>>> possibly derived from SimDB/DM, you can easily create a service spec around
>>> this by mapping the model to a relational representation and using TAP for
>>> access.
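
Mapping a concrete model to a relational representation, as suggested above, could look like the sketch below. It uses Python's built-in `sqlite3` with plain SQL as a local stand-in; a real TAP service would accept ADQL over HTTP instead. The table name, column names and rows are hypothetical.

```python
import sqlite3


def build_catalogue():
    """Create an in-memory table with one row per synthetic spectrum."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE synthetic_spectrum (
            id          INTEGER PRIMARY KEY,
            code        TEXT NOT NULL,   -- code used to synthesize the spectrum
            teff        REAL,            -- effective temperature
            logg        REAL,            -- surface gravity
            metallicity REAL
        )
        """
    )
    # Rows are invented for illustration only.
    rows = [
        ("codeA", 5750.0, 4.5, 0.0),
        ("codeA", 6000.0, 4.0, -0.5),
        ("codeB", 4500.0, 2.5, -1.0),
    ]
    conn.executemany(
        "INSERT INTO synthetic_spectrum (code, teff, logg, metallicity) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def select_spectra(conn, teff_min, teff_max, metallicity_max):
    """Select spectra by temperature range and an upper bound on metallicity."""
    cursor = conn.execute(
        "SELECT code, teff, logg, metallicity FROM synthetic_spectrum "
        "WHERE teff BETWEEN ? AND ? AND metallicity <= ? ORDER BY teff",
        (teff_min, teff_max, metallicity_max),
    )
    return cursor.fetchall()


conn = build_catalogue()
matches = select_spectra(conn, 5000.0, 6500.0, -0.25)
```

Once the model attributes are columns, a client can select spectra by their physical parameters with an ordinary query, which is the kind of access a table-oriented protocol is designed for.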
>>> Cheers
>>> Gerard
