Putting the pieces together - Quality...

Doug Tody dtody at aoc.nrao.edu
Fri May 14 11:34:16 PDT 2004


Hi Martin -

On Fri, 14 May 2004, Martin Hill wrote:
> The data model would be the wrong place to put non-standard things :-)

Not in the data model, but in the dataset instance.  The dataset is an
instance of the data model and must conform to it, but it can contain
nonstandard extensions that convey information specific to the actual
dataset.  These would normally be put there by the data provider.
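
For example (a hypothetical sketch only; these field names are not part of
any IVOA standard), a dataset instance could carry the standard quality
vector plus a provider-specific extension record along these lines:

    # Hypothetical dataset instance: standard data-model fields plus a
    # nonstandard, provider-supplied extension record.  All names here are
    # illustrative, not taken from any IVOA specification.
    dataset = {
        # standard data-model fields
        "flux":    [1.2e-14, 1.3e-14, 1.1e-14],
        "quality": [0, 0, 1],              # standard quality vector (0 = good)
        # nonstandard extension supplied by the data provider
        "provider_extension": {
            "fuse_weight": [87, 92, 35],   # original 0-100 weights
            "quality_threshold": 50,       # threshold used to derive the flags
        },
    }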

 
> Quality threshold values I would have thought would be set by the datacenters 
> - after all, these are the people who have already set effective threshold 
> values when working out binary or trinary quality values. Later we can model 
> quality more carefully.

We need to define something in the model for quality.  Then, as you say,
it is up to the data provider to define this value when they map their data
into the standard data model.
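
To make that concrete for a FUSE-like case (see Ivo's message quoted below),
the provider's mapping might look something like the sketch below.  The
thresholds, flag encoding, and function name are purely illustrative:

    def weight_to_quality(weight, good_threshold=50, bad_threshold=10):
        """Map a continuous 0-100 weight onto a trinary quality flag.

        Illustrative only: the thresholds would be chosen by the data
        provider when mapping their data into the standard model.
        0 = known good, 1 = known bad, 2 = questionable.
        """
        if weight >= good_threshold:
            return 0
        if weight < bad_threshold:
            return 1
        return 2

    # e.g. FUSE-style weights cast into the standard quality vector
    quality = [weight_to_quality(w) for w in (87, 92, 35, 4)]
    # -> [0, 0, 2, 1]

A binary good/bad model is then just the special case where the two
thresholds coincide.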

 
> Which brings us to another point; if we can model the quality of an 
> individual bit of data, how do we model quality on large groups such as 
> datasets?  Presumably some datasets are better than others, and so the data 
> 'as a whole' has better value than another one - particularly as I understand 
> it even undergraduates should be able to post their data on the VO in order 
> to use VO tools, and to allow other people to use their data :-).

This is a hard problem, but would be a reasonable thing to do in the
level 1 (registry) or level 2 (uniform dataset characterization) metadata.
Better yet, we should refine the level 2 metadata to provide uniform
information about resolution, sensitivity, etc.; then we can do this
quantitatively.
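
Purely as a sketch (the metadata keys and scoring function here are invented
for illustration, not taken from any specification), a client could then rank
datasets quantitatively along these lines:

    # Rank datasets by (hypothetical) uniform characterization metadata.
    datasets = [
        {"id": "A", "spectral_resolution": 20000, "limiting_flux": 1e-16},
        {"id": "B", "spectral_resolution": 1000,  "limiting_flux": 1e-14},
    ]

    def score(ds):
        # Higher resolving power and a deeper (smaller) limiting flux
        # both count as "better" in this toy scoring.
        return ds["spectral_resolution"] / ds["limiting_flux"]

    best_first = sorted(datasets, key=score, reverse=True)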

	- Doug



> On Friday 14 May 2004 1:45 pm, Ivo Busko wrote:
> > Doug Tody wrote:
> > <snip>
> >
> > > Ivo - regarding your point about data quality vectors:  As you know,
> > > the SSA data model has a data quality vector.  We don't really know what
> > > to put in it though.  I don't think we should put anything instrumental
> > > in nature in the general SSA data model (this can be done but it would
> > > go into nonstandard extension records).  Simple models for the quality
> > > vector would be binary (good or bad) or trinary (known good, known bad
> > > or flagged, or questionable).  Perhaps once we get more experience with
> > > real data from archives it will be possible to develop a more refined
> > > quality model.  (Note this should not be confused with the error vectors
> > > which we already have).
> > >
> > >         - Doug
> >
> > Thanks, Doug, that sounds good enough. I agree that nothing
> > instrument-specific should be put in the data model. However, something
> > must be done to accommodate cases that do not follow the norm.
> >
> > I have in mind cases where a binary or trinary model wouldn't be enough
> > to summarize the data quality information available in the original file.
> > A good example is FUSE data; it uses a continuously variable 2-byte
> > integer value to store a kind of "weight" (between 0 and 100), instead
> > of the more commonly found bit-encoded mask. To cast that data into,
> > say, a binary good/bad model, one needs an additional piece of
> > information, in the form of a threshold value.
> >
> > Ideally, a VO transaction involving such data should allow the threshold
> > value to be either specified by the requestor or, alternatively, set by
> > the data provider in an instrument-dependent way.
> >
> > In short, my point is: shouldn't the data model allow some room for
> > non-standard extra bits of info such as data quality threshold values?
> >
> > Cheers,
> >
> > -Ivo
> 
> 


