Basic/root of big-picture model and 2 history handling example

DIDELON Pierre dide at discovery.saclay.cea.fr
Tue May 20 10:40:01 PDT 2003


Hi Data modelers,

I am continuing the effort, started by Jonathan at Cambridge,
to build the first steps of a basic root model.
To this basic model we would attach the different models
of the various components we are supposed to work on.

I have posted 3 figures on the DM IVOA twiki,
(http://www.ivoa.net/twiki/bin/view/IVOA/IvoaDataModel)
as suggested by Jonathan.

1) The first one gives an overview of "my point of view"
concerning the basic domains that we must model.
It is a UML class diagram representing the relations
between Process, Data and History.
Process and Data are basic concepts (very abstract classes)
which must be designed in a little more detail in
sub-diagrams (I already have a hand-drawn version on
paper; a PostScript version will come asap).

History is materialised as a class
for which I don't see (for the moment) any need for attributes.
It can be seen as an agent (with some methods still to be defined)
which will handle all the links related to history
and, by following them, would be able to list
the history of specific data or processing.
See at the end an example of what (for me) a file
in XML format storing data history could look like.

The links/associations between Process and Data
handle the history from the process point of view.

BackwardHistory and ForwardHistory handle a more
precise history from the data point of view.

The Version association would be used to handle data versioning.
If needed, such an association could also be implemented
for the process.
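
To make the class diagram of point 1 a little more concrete, here is a
minimal Python sketch of it. Only the names Process, Data, History,
BackwardHistory, ForwardHistory, Version, DataIn and DataRejected come
from the diagram; the attribute names, the helper link() and the method
ancestors() are assumptions made for illustration. History has no
attributes and only follows links, as described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Data:
    name: str
    backward: list[Data] = field(default_factory=list)  # BackwardHistory
    forward: list[Data] = field(default_factory=list)   # ForwardHistory
    versions: list[Data] = field(default_factory=list)  # Version


@dataclass(eq=False)
class Process:
    name: str
    data_in: list[Data] = field(default_factory=list)        # DataIn
    data_out: list[Data] = field(default_factory=list)       # output link (name assumed)
    data_rejected: list[Data] = field(default_factory=list)  # DataRejected


def link(source: Data, product: Data) -> None:
    """Hypothetical helper: record both history directions at once."""
    product.backward.append(source)
    source.forward.append(product)


class History:
    """Agent without attributes; it only follows the history links."""

    def ancestors(self, item: Data) -> list[Data]:
        """List every data item reachable through BackwardHistory."""
        found: list[Data] = []
        seen_ids: set[int] = set()
        stack = list(item.backward)
        while stack:
            current = stack.pop()
            if id(current) in seen_ids:
                continue
            seen_ids.add(id(current))
            found.append(current)
            stack.extend(current.backward)
        return found


raw = Data("raw")
calibrated = Data("calibrated")
catalog = Data("catalog")
link(raw, calibrated)
link(calibrated, catalog)
ancestor_names = [d.name for d in History().ancestors(catalog)]
```

Here ancestor_names lists the data from which the catalog was derived,
nearest ancestor first.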

2) The second diagram illustrates a very simple example
of history handling at the level of the process.
It is a kind of data flow diagram, where the lines of the flow
use the link names of the class diagram.

In the case of a catalog, the output data level at which
the history is handled can be nested deeper in the catalog.
For example, Pat in the CVO model handles history at the level
of EntryProp contained in CatalogEntry.

This corresponds to the case where the entire input dataset
has an influence, at least, on the entire output data.
An example is astrometric calibration, where the positions of all
(or almost all) the input sources/detections are used
to obtain a calibration applied to the catalogs.
So more detailed history is not needed.
But if it is crucial to know which items of the input data list
are used to construct each individual output data item,
then more details and data-centric relations are needed.

3) The third diagram illustrates precisely this case.
It is also a kind of data flow diagram.

It shows the history link between one output data item
and the input data which have been used to construct it.
We now have some redundancy, and we can imagine that the link
DataIn between process and dataIn is not needed anymore,
but it would perhaps be useful to keep it anyway.
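
One possible reason to keep DataIn, sketched below in Python under my
own assumptions: item-level links may exist only for the outputs that
need them, and DataIn can then serve as the fallback for all the others.
The names process_data_in, item_history and inputs_of are hypothetical.

```python
# Process-level link: every input the process received (DataIn).
process_data_in = ["datain1", "datain2", "datain3", "datain4"]

# Item-level links: only the outputs that need detailed history are listed.
item_history = {"dataout1": ["datain1", "datain4"]}


def inputs_of(output_name):
    """Return the item-level inputs if recorded, else fall back to DataIn."""
    return item_history.get(output_name, process_data_in)


detailed = inputs_of("dataout1")  # history recorded for this item
coarse = inputs_of("dataout2")    # no item-level history, so all of DataIn
```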

The link DataRejected allows pointing to "Bad Data".
The way it is implemented, between the process and data
classes, implies that the rejection is done
at the process level, once for the whole sample of
input data. If rejection depends on the data produced
in output, the link has to be implemented as a loop
on the data class, like the Version and *wardHistory associations.


Now let's start the discussion.

Pierre

PS: below is
"my" example of what a file containing
history in XML format could look like.
Nothing has formal meaning except the structure.

<history>
 <process name="gnarf">
  <listdatain>
   <datain1 .../><datain2 .../>...<dataini .../>
  </listdatain>
  <listdataout>
   <dataout1 .../><dataout2 .../>...<dataoutj .../>
  </listdataout>
  <listdatarejected>
   <dataing .../><datainw .../>...<datainq .../>
  </listdatarejected>
  <datalink>
   <!-- This section gives the specific history between an output item and all
        the input data items used to produce this data product (it corresponds
        to the second example of history diagram).
        It will be repeated for all the output data which need this kind of
        history handling.
        It is a complement to the history handled at the process level given
        above. -->
   <dataout1>
    <datain1 .../><datain4 .../>...<dataink .../>
   </dataout1>
	...
  </datalink>
 </process>
 <process name="gnourf">
	...
 </process>
</history>
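
As a rough check that such a file can be read back, here is a small
Python sketch using the standard xml.etree.ElementTree module. SAMPLE is
a hypothetical, fully written-out instance of the structure above (the
"..." placeholders cannot be parsed as they stand), and the function
names and the layout of the returned dictionary are my own assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical concrete instance of the structure above (no "..." elisions).
SAMPLE = """\
<history>
 <process name="gnarf">
  <listdatain><datain1/><datain2/><datain3/></listdatain>
  <listdataout><dataout1/><dataout2/></listdataout>
  <listdatarejected><datain3/></listdatarejected>
  <datalink>
   <dataout1><datain1/><datain2/></dataout1>
  </datalink>
 </process>
</history>
"""


def child_tags(parent, name):
    """Tag names of the children of parent/<name>, or [] if it is absent."""
    node = parent.find(name)
    return [] if node is None else [child.tag for child in node]


def read_history(text):
    """Map each process name to its process-level and item-level history."""
    result = {}
    for proc in ET.fromstring(text).findall("process"):
        datalink = proc.find("datalink")
        links = {} if datalink is None else {
            out.tag: [src.tag for src in out] for out in datalink
        }
        result[proc.get("name")] = {
            "in": child_tags(proc, "listdatain"),
            "out": child_tags(proc, "listdataout"),
            "rejected": child_tags(proc, "listdatarejected"),
            "links": links,
        }
    return result


history = read_history(SAMPLE)
```

The "links" entry holds the item-level history of the datalink section,
while "in", "out" and "rejected" hold the process-level lists.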





