Taxonomy issues

S. G. Djorgovski george at astro.caltech.edu
Fri Sep 27 09:52:22 PDT 2002


OK, I have to weigh in on this.  Astronomical classifications are very
different from those in biology, say.  We use the word "classification"
with several different meanings.  It is important to distinguish between:

- Physically distinct classes which derive from the basic measurements, 
e.g., a radio source, an E galaxy, an IRAS 60um source, a QSO host ..., 
where obviously multiple classes can and should be assigned; they are
all correct.

- Really physical classes, e.g., QSO, normal galaxy, MS star, GRB, planet ...
They can manifest as different measurement-based classes, e.g., a QSO may
or may not be radio-loud, and a radio source may or may not be a QSO.

- Measurement-based classifications where there are boundary cases, e.g.,
star/galaxy classification, where you can have what the old FOCAS called
"stars with a fuzz"; this can be parametrized in some way (probability of
classification, PSF-ness...).  This is actually a subset of:

- Classes assigned statistically and objectively by some clustering algorithm
operating in some parameter space (a very much VO-type study), which then have
to be interpreted physically.

- Conventionally or subjectively assigned classes, such as the Hubble types,
which can easily differ by a class or two between different human classifiers,
different algorithms, wavelength, seeing, ... and which bear some, but not a
perfect, correlation to something physically meaningful.

There are several complications:

- An object morphologically classified as a "star" (= PSF-like) could
physically be really a star, or a QSO, or a slow-moving asteroid in a short
exposure, or an optical transient of some sort, etc.  There may be an
equivalent situation with the clustering analysis more generally.  In other
words, operationally and objectively defined classes can contain multiple
physical classes of objects (and v.v.), and for some sources this may be
known or suspected, and in some cases not.

- Blended sources of different classes (in any of the senses listed above),
where a dual class is suggested by the data.

- Of course, measurement errors can move a source in and out of the classes
in any one of these senses of classification.

As a practising astronomer, I would like to have a complete picture without
someone's particular value judgement imposed.  So, if I search by position
or indeed any non-class parameter, I'd like to know about ALL classifications
assigned to every source, with some suitable tracers as to where they
came from.  If I search by class (e.g., give me all radio-loud E's in this
part of the sky), then I also want to have all sources for which there is
at least one such classification, but I also want to know about other classes
that might have been assigned to any one of them.
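
To make the request above concrete, here is a minimal sketch in Python of
what such a record and a search by class might look like.  Everything in it
is an assumption made for illustration: the names Classification, Source and
search_by_classes, the "kind" and "origin" fields, and the example sources
are hypothetical and do not come from any existing VO schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Classification:
    label: str                            # e.g. "radio-loud", "E galaxy"
    kind: str                             # sense of classification (hypothetical tags)
    origin: str                           # tracer: who or what assigned this class
    probability: Optional[float] = None   # optional, for boundary cases


@dataclass
class Source:
    name: str
    classifications: List[Classification] = field(default_factory=list)


def search_by_classes(sources, wanted_labels):
    """Return sources with at least one classification per wanted label.

    Matching sources are returned whole, so every other class assigned
    to them (and its origin) stays visible to the caller.
    """
    wanted = set(wanted_labels)
    return [s for s in sources
            if wanted <= {c.label for c in s.classifications}]


# Hypothetical example data, invented for this sketch.
sources = [
    Source("SRC-1", [
        Classification("radio-loud", "measurement", "radio survey X"),
        Classification("E galaxy", "conventional", "human classifier A"),
        Classification("IRAS 60um source", "measurement", "IR catalog Y"),
    ]),
    Source("SRC-2", [
        Classification("E galaxy", "conventional", "algorithm B"),
    ]),
]

hits = search_by_classes(sources, ["radio-loud", "E galaxy"])
for s in hits:
    print(s.name, [(c.label, c.origin) for c in s.classifications])
```

The point of the design is that the search filters on classes but never
strips them: SRC-1 comes back with its IRAS classification attached, and any
down-filtering is left to the user.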

So, I would say that any technical solutions need to be liberal enough to
allow for this diversity, and then provide down-filtering tools.

Now, the really interesting issue in my mind is the danger of missing something
new by forcibly shoving every source into some pre-existing classification bin
or a set thereof.  It is crucial to avoid this, and to design tools which would
isolate "anomalous" sources, while not delivering TB's of garbage along with
them.  Discovery of new types of sources and phenomena is likely to be one
of the key types of VO-based astronomy, so the information infrastructure 
should be designed to facilitate this.
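
One crude way to picture such a tool, purely as a sketch: suppose some
classifier has already assigned each source a membership probability for
each known class, and flag the sources that fit none of them well.  The
function name flag_anomalous, the 0.5 threshold and the example numbers are
all assumptions made up for this illustration, not an established method.

```python
def flag_anomalous(memberships, threshold=0.5):
    """Return the names of sources that fit no known class well.

    memberships maps a source name to a dict of {class label: probability}.
    A source is flagged when its best class probability is below the
    (hypothetical) threshold; a source with no classes at all is flagged too.
    """
    return sorted(
        name
        for name, probs in memberships.items()
        if max(probs.values(), default=0.0) < threshold
    )


# Hypothetical membership probabilities, invented for this sketch.
memberships = {
    "SRC-A": {"star": 0.95, "galaxy": 0.03, "QSO": 0.02},
    "SRC-B": {"star": 0.10, "galaxy": 0.85, "QSO": 0.05},
    "SRC-C": {"star": 0.30, "galaxy": 0.35, "QSO": 0.35},
}

flagged = flag_anomalous(memberships)
print(flagged)
```

On its own this would also flag sources that are merely noisy or blended, so
it does nothing yet about the garbage problem; it only shows where an
"anomalous" label could enter the same record as the other classes.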

Should we introduce a classification or ontology or whatever of sources
"(possibly/probably/certainly) anomalous"?  And how does that get done?
Now, there is a problem worthy of this community...

Sorry for the ramble, but I had to get it off my chest.   George Djorgovski


