Comments on Canadian VO data model

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Mon Apr 28 12:25:02 PDT 2003


On April 28, 2003 09:47, DIDELON Pierre wrote:
> I am very interested in the provenance associated with an EntryProp.
> Does it mean that the processing history is handled at the level of
> individual values for each entry independently?

Yes, every EntryProp has its own provenance and can be updated
independently. There is by necessity a "null" provenance that means
"came from outside the Catalog system" which we only use for the observation 
catalog entries that are "published" by an Archive.

In the source catalog, all sources from a single processing instance (i.e. 
applied to a single observation) have the same provenance. This is an
arbitrary distinction brought about by the quantised nature of the data (many 
rows with the same provenance) and the monolithic processing we applied (many 
columns with the same provenance since one process measures multiple things).
Currently (within WFPC2 associations) all the source catalog properties come 
from one type of process (voSextractor :-). However, when one starts to
introduce observations and sources from other parts of the spectrum (X-ray 
from ROSAT is underway) or other types of observations (spectra from the 2QZ 
survey are underway), one also has different types of processing that 
measure the same properties (i.e. every source should have a total flux).

Thus, we feel this fine level of granularity in provenance is necessary.
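
To make the argument concrete, here is a minimal sketch (not code from the
Catalog system) of a column whose values come from more than one type of
process. The row layout, property names, numeric values, and provenance labels
are all made up for illustration; in the real design the provenance would be a
link rather than a string.

```python
# Hypothetical rows: (entry_id, property name, value, provenance label).
# The labels stand in for links to the processing instance that produced
# each value; the numbers are placeholders, not real measurements.
rows = [
    (1, "total_flux", 12.3, "voSextractor on WFPC2 observation A"),
    (2, "total_flux", 8.1, "voSextractor on WFPC2 observation A"),
    (3, "total_flux", 40.2, "X-ray processing of ROSAT observation B"),
    (1, "position", (10.0, 20.0), "voSextractor on WFPC2 observation A"),
]


def provenances_per_column(rows):
    """Map each property name to the set of distinct provenances seen in it."""
    out = {}
    for _entry_id, prop, _value, prov in rows:
        out.setdefault(prop, set()).add(prov)
    return out


by_column = provenances_per_column(rows)

# Columns whose values cannot be described by a single per-column provenance.
mixed = sorted(p for p, provs in by_column.items() if len(provs) > 1)
```

Here `mixed` contains `total_flux`: once a second type of processing measures
the same property, a single provenance per column can no longer describe it,
while a provenance per value still can.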


> In my design, because I thought at that time
> that catalogs are commonly updated by columns as a whole,
> I planned to implement the processing history handling at that level;
> the data provenance would have been handled column by column.

> Can you explain what is in common, apart from the structure used to store
> the data?

The things common to every Entry are:

- an identifier unique within the Catalog (Long: 64-bit integer)
- a set/array of EntryProp(s)

The things common to every EntryProp are:

- entry id : Long - identifier for the "parent" Entry
- property id : Short - identifier for this property, from the EntryPropMap
- tuple id : Short - unique identifier for this tuple (EntryProp) among 
others with the same entry_id and prop_id (i.e. this allows an Entry to have 
multiple EntryProp(s) - multiple values for the same property!!)
- group id: Short - group access control mechanism (crude permission spec)

- value: object type depends on the property id requirements
- error: object type depends on the property (i.e. the value type and error 
type are tightly coupled)
- provenance: an EntryLink to some other EntryProp or Entry (if link.prop_id 
and link.tuple_id are null) that answers "where did this EntryProp come 
from?"
 
The entry_id/prop_id/tuple_id uniquely designate an EntryProp in the entire 
Catalog. An EntryLink is also defined as these three things plus the name of 
the Catalog.
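
The structure described above could be sketched roughly as follows. This is
only an illustration in Python, not the actual implementation: the class and
field names follow the description, but the method names (targets_entry, key,
link), the catalog names "observation" and "source", and all example values are
assumptions, and Python's int stands in for the Long and Short types.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class EntryLink:
    """Catalog name plus entry_id/prop_id/tuple_id.

    If prop_id and tuple_id are both None, the link points at a whole Entry.
    """
    catalog: str
    entry_id: int
    prop_id: Optional[int] = None
    tuple_id: Optional[int] = None

    def targets_entry(self) -> bool:
        return self.prop_id is None and self.tuple_id is None


@dataclass
class EntryProp:
    entry_id: int   # identifier of the "parent" Entry
    prop_id: int    # identifier of the property (from the EntryPropMap)
    tuple_id: int   # distinguishes tuples sharing entry_id and prop_id
    group_id: int   # crude group access control
    value: Any      # type depends on the property
    error: Any      # type coupled to the value type
    provenance: Optional[EntryLink]  # None means "from outside the Catalog system"

    def key(self):
        """The triple that uniquely designates this EntryProp in its Catalog."""
        return (self.entry_id, self.prop_id, self.tuple_id)

    def link(self, catalog: str) -> EntryLink:
        """Build an EntryLink to this EntryProp within the named Catalog."""
        return EntryLink(catalog, *self.key())


# A published observation property: null provenance.
obs_prop = EntryProp(entry_id=1, prop_id=3, tuple_id=0, group_id=0,
                     value="published by an Archive", error=None,
                     provenance=None)

# A source property whose provenance links to the whole observation Entry.
src_prop = EntryProp(entry_id=42, prop_id=5, tuple_id=0, group_id=0,
                     value=12.3, error=0.4,
                     provenance=EntryLink("observation", entry_id=1))
```

Following src_prop.provenance answers "where did this EntryProp come from?";
a None provenance ends the chain at data that entered from outside.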

I should note that the multiplicity implied (well, allowed) by having a 
tuple_id is very important, not laziness on our part. It allows for:

- measuring the same property in different ways (different types of processing 
applied to the same input data)

- measuring the same property with different input data (multiple observations 
of the same "object"), which is more of an issue in the object catalog

- automatically decomposing a complex property into a set/list of more 
primitive ones: e.g. the observation catalog could have an Entry that is a 
collection of broadband images (UBVRI) of the same field. Each individual 
image could be an Entry, and the collection could be an Entry too, but with 
more detail in the spectral_bounds than is allowed by a single interval 
(which would be the "outer hull" that Jonathan described). One could do the 
same sort of thing with "time series" collections with ~ the same spatial 
and spectral bounds and a useful set of observations in the temporal 
dimension.
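
As a small sketch of the third case, a collection Entry can carry several
spectral_bounds tuples that differ only in tuple_id, and the single "outer
hull" interval can be derived from them. The property id, entry id, and
interval values below are placeholders chosen for illustration (they are not
real filter bandpasses), and the helper names are assumptions.

```python
SPECTRAL_BOUNDS = 7  # hypothetical property id for spectral_bounds

# (entry_id, prop_id, tuple_id) -> (lower, upper) interval, arbitrary units.
# Entry 100 is a collection; each tuple_id holds the bounds of one image.
props = {
    (100, SPECTRAL_BOUNDS, 0): (1.0, 2.0),
    (100, SPECTRAL_BOUNDS, 1): (3.0, 4.0),
    (100, SPECTRAL_BOUNDS, 2): (5.0, 6.0),
    (101, SPECTRAL_BOUNDS, 0): (3.0, 4.0),  # a single-image Entry
}


def values_for(props, entry_id, prop_id):
    """All values of one property for one Entry, ordered by tuple_id."""
    return [value for (e, p, _t), value in sorted(props.items())
            if e == entry_id and p == prop_id]


def outer_hull(intervals):
    """Smallest single interval that covers every interval in the list."""
    return (min(lo for lo, _hi in intervals),
            max(hi for _lo, hi in intervals))


collection_bounds = values_for(props, 100, SPECTRAL_BOUNDS)
hull = outer_hull(collection_bounds)
```

The hull (1.0, 6.0) hides the gaps between the individual intervals, which is
exactly the detail that the multiple tuples preserve.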


> I saw that you plan to come to the interop meeting in Cambridge.
> Perhaps we could meet there and have a more precise and fruitful
> discussion
> than what is possible by mail?

Definitely,

-- 
Patrick Dowler
Tel/Tél: (250) 363-6914 | Fax: (250) 363-0045
Canadian Astronomy Data Centre    | Centre canadien de donnees astronomiques
National Research Council Canada  | Conseil national de recherches Canada
Government of Canada                   | Gouvernement du Canada
5071 West Saanich Road                | 5071, chemin West Saanich
Victoria, BC                                   | Victoria (C.-B.)


