[QUANTITY] and [OBSERVATION] data models: new drafts
Brian Thomas
brian.thomas at gsfc.nasa.gov
Mon May 3 09:15:58 PDT 2004
Francois, all,
On Friday 23 April 2004 07:35 pm, Francois Bonnarel wrote:
> Dear DM partners,
> As I mailed already, I am a little concerned about the lack of
> reactions to the two drafts: Quantity and Observation. Not to mention
> STC! Even if there is no real opposition to these drafts, can we be
> sure that people will use this work if they are not convinced?
> [snip]
Well, please review these docs then. As far as practical application
is concerned, I am preparing a prototype reference package (in Java) for
Quantity. This package should allow one to create, and serialize to/from XML,
the various core classes which support general data models. While the code is not
ready for list-wide release at this time, I would be very happy to share
it with anyone interested (even better, anyone who wants to help
code the project is *very* welcome to join and add their name as an author).
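To give a flavor of what "create and serialize to/from XML" might mean for a single
core class, here is a minimal sketch in Java. It assumes a scalar quantity carrying
only a value, units, a data type and one error; the class name and the element and
attribute names are hypothetical, and are not the prototype package's API or the
draft's schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical sketch: class, element and attribute names are illustrative
// assumptions, not the prototype package's API or the draft's schema.
class QuantityXmlSketch {

    /** A minimal scalar quantity: the value plus metadata common to all scientific data. */
    record Quantity(double value, String units, String dataType, double error) {

        /** Serialize to a single XML element. */
        String toXml() {
            return "<quantity dataType=\"" + escape(dataType) + "\" units=\"" + escape(units) + "\">"
                + "<value>" + value + "</value>"
                + "<error>" + error + "</error>"
                + "</quantity>";
        }

        /** Parse an element in the format written by toXml(). */
        static Quantity fromXml(String xml) {
            try {
                Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
                return new Quantity(
                    Double.parseDouble(childText(root, "value")),
                    root.getAttribute("units"),
                    root.getAttribute("dataType"),
                    Double.parseDouble(childText(root, "error")));
            } catch (Exception e) {
                // Covers parser errors and missing child elements alike.
                throw new IllegalArgumentException("not a valid quantity element", e);
            }
        }

        private static String childText(Element parent, String tag) {
            return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
        }

        // Escape characters that are unsafe inside a double-quoted attribute value.
        private static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
        }
    }

    public static void main(String[] args) {
        Quantity q = new Quantity(12.5, "mag", "double", 0.05);
        String xml = q.toXml();
        System.out.println(xml);
        System.out.println(Quantity.fromXml(xml).equals(q)); // prints true
    }
}
```

The round trip (object to XML and back to an equal object) is presumably the property
a reference package would want to guarantee for every core class, whatever the final
schema looks like.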
> and
>
>
> Brian:
> Simply put, that's just not the reality of astrophysical concepts. Many
> (if not most) concepts belong to more than one group, and when they belong
> to a group, their relationship may be qualified (e.g. "I belong to this group
> under the following conditions...").
> As a result, I don't see how pushing strings as IDs/concepts
> within UCD is going to be effective in the long run in terms of meeting VO
> requirements. We need to take a more advanced approach, such as the UCD3
> approach that has frequently been suggested. (Brian)
Where/when did you "hear" me state this opinion?!?
I think you are confused about my point of view. As stated in the paper that
Ed and I presented at ADASS 2003, we completely believe that the issue
of overlapping concepts must be handled. Currently, the only technologies
that appear to be able to handle this are the RDF/OWL ontology-based ones
(in essence, "UCD3").
I have always maintained that the string-based UCD approach is very
flawed, and is the root cause of many of the issues in trying to transition
from UCD1 to 1+/2. I am very much a believer that we should just keep
UCD1 as is, and immediately start work on UCD3, skipping the "intermediate"
steps. Those aren't going to be very productive, as the gap between what
is really needed (a flexible, multi-node, interconnected set of semantics)
and what is currently implemented (a somewhat inflexible 2-level hierarchy based
on strings) is too wide. An incremental approach will likely be a failure here.
We need to admit the failures of the present approach and move on.
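As a toy illustration of that gap (not RDF/OWL, and not any UCD version), the sketch
below contrasts a two-level string code, which can only ever name one parent group,
with a small graph in which one concept may belong to several groups and each
membership may carry its own qualifying condition. All concept, group and code names
are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch, not RDF/OWL and not any UCD version; all names are made up.
class ConceptGraphSketch {

    /** One edge: a concept belongs to a group, optionally under a condition (null = always). */
    record Membership(String concept, String group, String condition) {}

    private final List<Membership> edges = new ArrayList<>();

    void add(String concept, String group, String condition) {
        edges.add(new Membership(concept, group, condition));
    }

    /** Every group the concept belongs to, each with its own qualifying condition. */
    List<Membership> groupsOf(String concept) {
        List<Membership> out = new ArrayList<>();
        for (Membership m : edges) {
            if (m.concept().equals(concept)) {
                out.add(m);
            }
        }
        return out;
    }

    /** A two-level string code "GROUP.LEAF" can only ever name one parent group. */
    static String parentOf(String code) {
        int dot = code.indexOf('.');
        return dot < 0 ? code : code.substring(0, dot);
    }

    public static void main(String[] args) {
        ConceptGraphSketch g = new ConceptGraphSketch();
        g.add("H-alpha emission", "photometry", null);
        g.add("H-alpha emission", "spectroscopy", "when measured from a spectrum");
        g.add("H-alpha emission", "star formation tracer", "for galaxies");
        g.groupsOf("H-alpha emission").forEach(System.out::println);
        System.out.println(parentOf("PHOT.HALPHA")); // prints PHOT
    }
}
```

A real ontology adds typed relationships and reasoning on top of this, but even the toy
graph can state the qualified, overlapping memberships that the string code cannot.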
> D) Serialization?
>
> What kind of serialization are we looking for? In a SIAP or "AVO demo"
> oriented perspective, we are looking for a serialization which is a
> standardized data description. But I think the full XML serialization in
> the Quantity/Brian/Ed perspective is serialization for the data themselves
> (replacing older data formats like FITS).
Yes, the quantity should underpin the other data models. Its reason for being
may be simply stated:
"To provide the basic mechanism for storage, retrieval (search) and transport
between VO services/repositories."
Hence, the Quantity gives semantic meaning only to those components
which relate to the above requirement. In other words, it only defines
things like dataTypes, units, etc., which relate to _all scientific data_. When
you start talking about astronomical images, spectra, etc., you are
talking about higher-level data models. When you start talking about concepts
like "Flux" or "phenomena" (as per some theorists), you are again talking
about higher-level data models. The Quantity only provides a minimal
framework for holding these things, and allows interchange between them.
Now, given this, the serialization for the Quantity should be fairly flexible: it
should be able to support all of the higher-level data models. Most astronomers
will not be aware of it, but it will make many things within the VO much easier
(such as searching across the VO for concepts like "galaxies which have H-alpha
emission of X and are observed in the V and I bands simultaneously").
Once the data is discovered, and perhaps downloaded, it will probably be
translated back into FITS. I say: why re-invent the wheel? We currently have
plenty of analysis software that astronomers can use with FITS, and plenty
of other VO-related work to do without trying to redesign how
astronomers do their analysis (at this time).
>
> Other question: is the serialization controlled by the developer/astronomer,
> as in the various VOTable serialization proposals, or automatically
> generated?
Why not both? (Seriously!!)
By having namespaces in which models are applicable, we can appropriately
control what is a valid data model.
I would guess that perhaps 3 levels of namespace could exist (we
could have more or fewer). These are:
"VO" = You need a serious proposal/community agreement to have a model here.
If some model is in the VO namespace, then most repositories will
need to implement it.
"Archive" = For archives that need to design internal models the public won't see,
or models that are highly specialized for a very small community of astronomers.
Oversight here is by the archive. The model may change more frequently,
and changes won't break the VO as a whole.
"Personal" = For individual astronomers who want to group information in a particular way.
I can see this as useful if the astronomer may use their model as a search tool.
For example, they may submit this model to the VO repositories as a "bag" to
be filled when they initiate a search. If this were the search paradigm, then there
probably would be some standard "search models" so that Joe Astronomer
doesn't have to design his own model in order to search the VO. I'm thinking
data model design is more of a power-user type of activity.
Perhaps the last namespace level isn't merited, but I think the other 2 are.
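One way these namespace levels might be encoded is sketched below in Java. The three
levels and who oversees them come from the list above; the "prefix:Name" syntax for
qualified model names, the model names themselves, and the single must-implement flag
are assumptions made purely for illustration.

```java
import java.util.Locale;

// Hypothetical sketch: the "prefix:Name" syntax, model names and field names are
// illustrative assumptions, not part of any VO standard.
class ModelNamespaceSketch {

    /** The three namespace levels proposed in the message. */
    enum Level {
        VO("community agreement", true),
        ARCHIVE("the archive", false),
        PERSONAL("the individual astronomer", false);

        final String oversight;                   // who controls models at this level
        final boolean repositoriesMustImplement;  // only VO-level models are expected everywhere

        Level(String oversight, boolean repositoriesMustImplement) {
            this.oversight = oversight;
            this.repositoriesMustImplement = repositoriesMustImplement;
        }
    }

    /** A model name qualified by its namespace level, e.g. "vo:Image". */
    record ModelName(Level level, String name) {
        static ModelName parse(String qualified) {
            int colon = qualified.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("missing namespace prefix: " + qualified);
            }
            // Level.valueOf throws IllegalArgumentException for an unknown prefix.
            Level level = Level.valueOf(qualified.substring(0, colon).toUpperCase(Locale.ROOT));
            return new ModelName(level, qualified.substring(colon + 1));
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"vo:Image", "archive:InternalCalib", "personal:MySearchBag"}) {
            ModelName m = ModelName.parse(s);
            System.out.println(m.name() + " -> oversight by " + m.level().oversight
                + ", repositories must implement: " + m.level().repositoriesMustImplement);
        }
    }
}
```

Keeping the level in the name means a repository can decide from the name alone whether
it is obliged to support a model, which matches the idea that only VO-level models bind
everyone.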
>
> And for serialization we need attributes: Jonathan defined a few of
> them by defining utypes for SIA. Should we continue with this before the next
> version of the draft?
>
>
> E) Packaging
>
> This is somewhat related to the previous point. If we are using the
> data model for data description, we need a description of the data which
> will come next? Is Quantity what we need for that? What about the coding/
> compression/format stuff? Don't we have to add a packaging class to the
> draft?
I guess it depends on what you mean by "packaging". I'd say packaging
depends on the model.
The Quantity controls the position/content of general scientific and IO (e.g. compression)
information at a low level.
So, to provide some concrete examples: if you ask "where do I find the errors for
this number?", the Quantity model tells you. If you are asking "where do I find the
name of the observer for this telescope?", then the UCD/phenomena model tells you
(which inherits from the Quantity; at least, that's how I see it in UCD3). If you are asking
"what is the fundamental information that I need to include to describe an astronomical
image?", then the SIAP/Image data model tells you what concepts from UCD and Quantity
are needed and where they go.
Thus, to give a short answer to your question, the Quantity should contain the prescription
for how to do the IO/compression. It is currently not complete in that regard, although the
Quantity DM group does have some early drafts of how to achieve this.
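The layering described above (Quantity at the bottom, UCD/phenomena concepts built on
it, an image model on top) can be sketched with plain Java inheritance and composition.
The class names and the list of concepts an image "requires" are hypothetical, chosen
only to show which layer answers which of the three questions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the layering; names and the required-concept list are illustrative only.
class LayeredModelSketch {

    /** Low level: answers "where do I find the errors for this number?" */
    static class Quantity {
        final Object value;
        final String units;   // null for unitless data such as names
        final Double error;   // null when no error is recorded

        Quantity(Object value, String units, Double error) {
            this.value = value;
            this.units = units;
            this.error = error;
        }
    }

    /** Middle level: a Quantity that also carries a semantic concept label (UCD-like). */
    static class ConceptQuantity extends Quantity {
        final String concept;

        ConceptQuantity(String concept, Object value, String units, Double error) {
            super(value, units, error);
            this.concept = concept;
        }
    }

    /** High level: an image model states which concepts must be present. */
    static class ImageModel {
        // Hypothetical requirement list, for illustration only.
        static final List<String> REQUIRED = List.of("observer name", "exposure time");

        final Map<String, ConceptQuantity> parts = new LinkedHashMap<>();

        void put(ConceptQuantity q) {
            parts.put(q.concept, q);
        }

        /** Required concepts not yet supplied, in the order they are listed. */
        List<String> missing() {
            List<String> out = new ArrayList<>();
            for (String c : REQUIRED) {
                if (!parts.containsKey(c)) {
                    out.add(c);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        ImageModel image = new ImageModel();
        image.put(new ConceptQuantity("observer name", "Joe Astronomer", null, null));
        System.out.println(image.missing()); // prints [exposure time]
        image.put(new ConceptQuantity("exposure time", 300.0, "s", 0.1));
        System.out.println(image.parts.get("exposure time").error); // prints 0.1
    }
}
```

Here the error is found on the Quantity fields, the meaning ("exposure time") on the
concept layer, and the completeness check on the image model, mirroring the three
questions above.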
Regards,
=b.t.
>
> Just to be discussed here.
>
> Cheers
> François
>
> SAUVONS LA RECHERCHE ("Let's save research"): <http://recherche-en-danger.apinc.org/>
>
> =====================================================================
> Francois Bonnarel
> Observatoire Astronomique de Strasbourg
> CDS (Centre de donnees astronomiques de Strasbourg)
> 11, rue de l'Universite
> F--67000 Strasbourg (France)
>
> Tel: +33-(0)3 90 24 24 11
> Fax: +33-(0)3 90 24 24 25
> WWW: http://cdsweb.u-strasbg.fr/people/fb.html
> E-mail: bonnarel at astro.u-strasbg.fr
> ---------------------------------------------------------------------
--
* Dr. Brian Thomas
* Dept of Astronomy/University of Maryland-College Park
* Code 630.1/Goddard Space Flight Center-NASA
* fax: (301) 286-1775
* phone: (301) 286-6128 [GSFC]
(301) 405-2312 [UMD]