Taxonomy issues

Sun Sep 29 05:05:12 PDT 2002

This is a very illuminating discussion. Thanks especially to Sean for
his lucid differentiation between an ontology of classes and instances
of those classes.

It seems that the bottom line is that we should try to organise our
'concepts' into a logically coherent, non-conflicting scheme, but
there's no problem with different people classifying individuals as
instances of different classes.

So, our ontology will state that a star cannot be the same as a galaxy
but entity X may be classified by author A as a star and by author B as
a galaxy. And all of this is consistent even though the inference engine
will tell us that there is a conflict between the authors' statements.
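
As a rough illustration of the point above, here is a minimal Python
sketch. The class names, the author labels and the 'find_conflicts'
helper are all hypothetical; a real system would presumably express
disjointness in the ontology language itself and leave the check to an
inference engine.

```python
from itertools import combinations

# Ontology level: pairs of classes declared mutually exclusive
# (assumed here purely for illustration).
DISJOINT = {frozenset({"Star", "Galaxy"})}

# Instance level: each author's classification of an entity
# (hypothetical data).
ASSERTIONS = [
    ("X", "Star", "author A"),
    ("X", "Galaxy", "author B"),
    ("Y", "Star", "author A"),
]


def find_conflicts(assertions, disjoint):
    """Return (entity, (class, author), (class, author)) for clashing claims."""
    by_entity = {}
    for entity, cls, author in assertions:
        by_entity.setdefault(entity, []).append((cls, author))
    conflicts = []
    for entity, claims in by_entity.items():
        for first, second in combinations(claims, 2):
            # Two claims clash only if their classes are declared disjoint.
            if frozenset({first[0], second[0]}) in disjoint:
                conflicts.append((entity, first, second))
    return conflicts


for entity, first, second in find_conflicts(ASSERTIONS, DISJOINT):
    print(f"{entity}: {first[1]} says {first[0]}, {second[1]} says {second[0]}")
```

The ontology itself stays consistent; the conflict is reported only at
the level of the individual statements, each still tied to its author.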

Questions to Sean: does an inference engine work on instances or just
on the classes? Can we store the ontology in OWL format and the
classification of instances in an RDBMS (while the instance data may be
stored in FITS files or whatever) and still make inferences?

The more serious issue is that raised by Anita: 'whether a red fairy is
a variable star with a period of <400 days or <350 days and a green
fairy has a period of >400 or >350 days'. The astronomy community, or
that part of it constructing an ontology plugged into a given VO, needs
to decide whether to plump for one or the other definition or allow both
and, if the latter, how.
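
One possible way to 'allow both' would be to give each competing
definition its own name and record which one was used for a given
classification. The sketch below assumes that approach; the labels
'def_400' and 'def_350' and the 'classify' function are hypothetical,
and a period exactly at the cutoff is left unclassified because the
quoted definitions do not cover it.

```python
# Each named definition carries its own period cutoff in days
# (the labels are hypothetical).
DEFINITIONS = {"def_400": 400.0, "def_350": 350.0}


def classify(period_days, definition):
    """Classify a variable star under one named definition."""
    cutoff = DEFINITIONS[definition]
    if period_days < cutoff:
        return "red fairy"
    if period_days > cutoff:
        return "green fairy"
    return None  # exactly at the cutoff: not covered by the quoted definitions


# A star with a 380-day period lands in different classes
# depending on which definition is applied.
for name in DEFINITIONS:
    print(name, classify(380.0, name))
```

Keeping the definition name alongside each result means two authors can
disagree about the cutoff without their classifications being confused.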

Looking at George's email, his classification of classifications was
illuminating. What I think he is looking for in any 'use' of an ontology
is being able to get at the provenance of any instantiations. So if
entity X is classified by author A as a star, then we need to provide a
record of how that classification was arrived at.

This provenance should include the original source data and any
selections, transformations and analyses which were performed on the
data. This has a couple of implications.

1. We need some way of organising or describing an author's set of
instantiations and their provenance. Namespaces could be one way but,
given that we'll have tens or hundreds of thousands of these, it could get
a little cumbersome. Somehow, an astronomer needs to be able to say: 'I
accept the sets described by authors A, D and M. Please check my own
work against theirs'.

2. We need a language (and ontology) for describing workflows and their
component activities. There has been work in the commercial world on
this, partly subsumed by the term 'organizational memory', which I know
Carole and the Geodise project are working on.
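
To make the first implication a little more concrete, here is a minimal
sketch of classification records that carry their provenance, together
with a check of one astronomer's work against the authors they accept.
The record fields, file names, author labels and the 'disagreements'
helper are assumptions for illustration, not an agreed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    entity: str
    cls: str
    author: str
    source_data: str   # where the original data came from
    steps: tuple = ()  # selections, transformations and analyses applied


def disagreements(mine, published, accepted_authors):
    """Pair each of my classifications with any differing one from an accepted author.

    'Differing' here just means a different class name; a fuller version
    would consult the ontology to see whether the classes really conflict.
    """
    trusted = [c for c in published if c.author in accepted_authors]
    return [
        (m, t)
        for m in mine
        for t in trusted
        if m.entity == t.entity and m.cls != t.cls
    ]


# Hypothetical records; file names and processing steps are made up.
published = [
    Classification("X", "Star", "A", "survey1.fits", ("select mag < 20",)),
    Classification("X", "Galaxy", "B", "survey2.fits", ("fit profile",)),
]
mine = [Classification("X", "Galaxy", "me", "my_obs.fits", ("fit profile",))]

# 'I accept the sets described by authors A, D and M.'
for m, t in disagreements(mine, published, {"A", "D", "M"}):
    print(f"{m.entity}: I say {m.cls}, {t.author} says {t.cls} via {t.steps}")
```

Because each record keeps its source data and processing steps, a
reported disagreement can be traced back to how the other author arrived
at their classification.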

This is partly what I was getting at in my previous email about
'ontology-based querying'. Is there some higher order of classification
for selection criteria (apart from their UCD-operation-Value graph) that
will allow deduction of criteria from the description of a desired set
of results?

The final issue is that of outliers or anomalies. We have to make sure
they aren't shoehorned by the software into inappropriate categories and
also that they can be identified as the target for future research. I'm
not sure whether what has been said so far protects them or highlights
them. I don't think they need separate handling, but they do need to be
accounted for in any scheme or approach.

Cheers,
Tony. 
__
Tony Linde                       Phone:  +44 (0)116 223 1292
AstroGrid Project Manager        Fax:    +44 (0)116 252 3311
Dept of Physics & Astronomy      Mobile: +44 (0)7753 603356
University of Leicester          Email:  tol at star.le.ac.uk
Leicester, UK   LE1 7RH          Web:    http://www.astrogrid.org