[QUANTITY] and [OBSERVATION] data models: new drafts

Wed May 5 06:18:06 PDT 2004

Dear Brian, dear all,
   Here are some answers to your mail of Monday.

On Friday 23 April 2004 07:35 pm, Francois Bonnarel wrote:
<> Dear DM  partners,
<>     As I already mailed, I am a little concerned about the lack of
<> reactions on the two drafts: Quantity and Observation. Not to speak about
<> STC! Because even if there is no real opposition to these drafts, are we
<> sure that the people  will use this work if they are not convinced?
<> [snip]

< 	Well, please review these docs then.
     There was a misunderstanding there. As part of the group
who prepared the draft for Observation, I was complaining about the
lack of reactions as a sort of coauthor. The situation has changed
slightly since the time of my mail. This does not prevent me
from reviewing Quantity and STC, of course.
<      As far as practicality of application
<	is concerned, I am  preparing a prototype reference package (in Java)  for 
<	Quantity. This package should allow one to create, and serialize to/from XML,
<	various core classes which support general data models. While the code is not
<	ready for list-wide release at this time I would be very happy to share
<	the code with anyone interested (even better, anyone who wants to help
<	code the project is *very* welcome to join and add their name as an author).

   Please send it to me!

<> and
<>
<>
<> Brian:
<> Simply put, that's just not the reality of the astrophysical concepts. Many
<> (if not most) concepts belong to more than one group, and when they belong
<> to a group, their relationship may be qualified (ex. I belong to this group
<> under the following conditions..)
<>         As a result, I don't see how pushing the strings as id's/concepts
<> within UCD is going to be effective in the long run in terms of meeting VO
<> requirements. We need to take a more advanced approach, such as has been
<> frequently suggested by UCD3. (Brian)

<	Where/when did you "hear" me state this opinion?!?

    ????? -> dm list, Thu 1 April 2004
             Subject: Tom is right : and here is why this approach is wrong.

<	I think you are confused about my point of view. As per the paper that
<	Ed and I presented at the ADASS 2003, we completely believe that the issue
<	of overlapping concepts must be handled. Currently, the only technology
<	which appears to be able to handle this are the RDF/OWL Ontology-based ones
<	(in essence, "UCD3").
<
<	I have always maintained that the string-based UCD approach is very 
<	flawed, and is the root cause of many of the issues in trying to transition
<	from UCD1 to 1+/2. I am very much a believer that we should just keep
<	UCD1 as is, and immediately start work on UCD3. Skip the "intermediate"
<	steps which aren't going to be very productive as the gap between what
<	is really needed (a flexible, multi-node interconnected set of semantics)
<	and what is currently implemented (somewhat inflexible 2-level hierarchy based
<	on strings) is too wide. An incremental approach will likely be a failure here. 
<	We need to admit the failures of the present approach and move on.

      How can something like UCD1, which is used in various software
packages and performs real work, be considered a failure? (There was
a need for some simplification, error fixing and generalization of it,
hence UCD1+.) Only if we put data model theory above everything else.
Not that I fail to recognize that the UCD3/ontology approach has to be
pursued. I described another view of the relationship between data
model work and actual interoperability in my mail to G. Lemson yesterday.

<>      D )   Serialization ?
<>
<>     What kind of serialization are we looking for ? In a SIAP or "AVO demo"
<> oriented perspective, we are looking for a serialization which is a
<> standardized data description. But I think the full XML serialization in
<> the Quantity/Brian/Ed perspective is serialization for the data themselves
<> (replacing older data formats like FITS).

<	Yes, the quantity should underpin the other data models. Its reason for being
<	may be simply stated:

<	"To provide the basic mechanism for storage, retrieval (search) and transport
<	between VO services/repositories."
<
<
<	Hence, the Quantity gives semantic meaning only to those components
<	which relate to the above requirement. In other words, it only defines
<	things like dataTypes, units, etc which relate to _all scientific data_. When
<	you start talking about Astronomy Images, spectra, etc, then you are 
<	talking about higher-level data models. When you start talking about concepts
<	like "Flux", "phenomena" (as per some theorists) then you are talking
<	about higher level data models again. The quantity only provides a minimal
<	framework for holding these things, and allows interchange between them.
<
<	Now, given this, the serialization for the Quantity should be fairly mutable. It
<	should be able to support all of the higher level datamodels. Most astronomers
<	will not be aware of it, but it will make many things within the VO much easier
<	(such as search across the VO for concepts like "galaxies which have h-alpha
<	emission of X and are observed in the V and I bands simultaneously").
< 	Once the data is discovered, and perhaps downloaded, it will probably be
<	translated back into FITS. I say: Why re-invent the wheel? We have currently
<	plenty of analysis software that astronomers can use with FITS, and plenty
<	of other VO -related work to do to worry about trying to redesign how 
<	astronomers do their analysis (at this time).

OK, I consider the Quantity XML useful for describing the
quantities, but do we really have to put the content of the
FITS file into the XML if it is to be translated back afterwards?
Shipping the FITS file alongside the XML description would probably
save time.
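
To make this alternative concrete, here is a minimal sketch (in Python,
purely illustrative) of an XML description that points at a FITS file
instead of embedding its content. The element and attribute names
(quantity, unit, dataType, dataRef, href, format) and the file name
image.fits are assumptions made for this example; they are not taken
from the Quantity draft or from any serialization proposal.

```python
import xml.etree.ElementTree as ET


def describe_external_data(name, unit, datatype, fits_path):
    """Build an XML description whose data stay in an external FITS file.

    All element and attribute names are hypothetical, chosen only for
    illustration; they do not come from the Quantity draft.
    """
    quantity = ET.Element("quantity", {"name": name})
    ET.SubElement(quantity, "unit").text = unit
    ET.SubElement(quantity, "dataType").text = datatype
    # Reference the FITS file rather than copying its pixels into the XML.
    ET.SubElement(quantity, "dataRef", {"href": fits_path, "format": "FITS"})
    return quantity


def read_description(xml_text):
    """Parse the description back into a plain dictionary."""
    root = ET.fromstring(xml_text)
    ref = root.find("dataRef")
    return {
        "name": root.get("name"),
        "unit": root.findtext("unit"),
        "dataType": root.findtext("dataType"),
        "href": ref.get("href"),
        "format": ref.get("format"),
    }


description = describe_external_data("flux", "Jy", "float", "image.fits")
xml_text = ET.tostring(description, encoding="unicode")
print(xml_text)
```

In this sketch the XML carries only the description (name, unit, data
type) while the pixels remain in the FITS file, so nothing has to be
translated back before existing analysis software can read the data.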

<>
<> Other question: is the serialization controlled by Developer/Astronomer
<> like in the various VOTable serialization proposals or automatically
<> generated?

<	Why not both? (Seriously!!)

<	By having namespaces where models are applicable, we can control,
<	appropriately, what is a valid data model.

<	I would guess that perhaps 3 levels of namespace could exist: (we 
<	could have more or less), these being:

<	"VO"	= 	You need a serious proposal/community agreement to have model here.
<				IF some model is in the VO namespace, then most repositories will
<				need to implement it.

<	"Archive" 	= For archives that need to design internal models the public won't see
<				 or are highly-specialized for a very small community of astronomers.
<				Oversight here is by the archive. The model may change more frequently,
<				and changes won't break the VO as a whole.

<	"Personal"  = For individual astronomers who want to group information in a particular way.
<				I can see this as useful if the astronomer may use their model as a search tool.
<				For example, they may submit this model to the VO repositories as a "bag" to 
<				be filled when they initiate a search. IF this was the search paradigm, then there 
<				probably would be some standard "search models" so that Joe Astronomer
<				doesn't have to design his own model in order to search the VO. I'm thinking
<				Data model design is more of a power user type of activity.

<	Perhaps the last namespace level isn't merited, but I think the other 2 are. 

    On this I agree with you, but you will probably not like it if
I say that the level 2 and level 3 data models could usefully be
described, with accuracy and wider agreement, using VO-level utypes.

<>
<>      And for serialization we need attributes: Jonathan defined a few of
<> them by defining utypes for SIA. Should we go on before the next version of
<> the draft?
<>
<>
<>      E ) Packaging
<>
<>      This is somewhat related to the previous point. If we are using the
<> datamodel for data description, we need a description of the data which
<> will come next. Is Quantity what we need for that ? What about the coding
<> compression/format stuff? Don't we have to add a packaging class to the
<> draft ?

<	I guess it depends on what you mean by "packaging". I'd say packaging 
<	depends on the model.

<	Quantity controls the position/content of general scientific and IO (like compression)
<	information at a low-level.

<	So, to provide some concrete examples,  if you ask, "where do I find the errors for 
<	this number?" the Quantity model tells you. If you are asking "where do I find the 
<	name of the observer for this telescope?" then the UCD/phenomena model tells you 
<	(which inherits from the Quantity, at least, that's how I see it in UCD3). If you are asking 
<	"what is the fundamental information that I need to include to describe an astronomical 
<	image?" then the SIAP/Image data model tells you what concepts from UCD and Quantity 
<	are needed and where they go.
<
<	Thus, to provide a short answer to your question, the Quantity should contain a prescription
<	for how to do the IO/compression. It is currently not complete in that regard although the
<	Quantity DM group does have some early drafts of how to achieve this.

<	Regards,

<	=b.t.

   I would be happy to see these drafts. In the meantime, the
Observation subgroup concluded that we needed a Packaging class
distinct from Quantity because "Quantity gives the structure, but not
the coding and format." (JCM).
  I wrote a pre-draft which is currently being reviewed by the subgroup.
Something will probably appear on the list rather soon.
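
The split discussed above (Quantity gives the structure, a separate
Packaging class gives the coding and format) can be pictured with a
small sketch. It is only an illustration in Python; it is neither the
content of the pre-draft nor of the Java prototype for Quantity. The
class and field names (Quantity, Packaging, PackagedQuantity,
file_format, compression) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Quantity:
    """Structure only: values, unit and errors (hypothetical fields)."""
    name: str
    values: Sequence[float]
    unit: str
    errors: Optional[Sequence[float]] = None

    def error_of(self, index):
        """Answer 'where do I find the error for this number?'."""
        if self.errors is None:
            return None
        return self.errors[index]


@dataclass
class Packaging:
    """Coding and format only (hypothetical fields)."""
    file_format: str
    compression: Optional[str] = None


@dataclass
class PackagedQuantity:
    """Pairs a structure with the way its data are encoded."""
    quantity: Quantity
    packaging: Packaging


flux = Quantity("flux", [1.2, 3.4], "Jy", errors=[0.1, 0.2])
packaged = PackagedQuantity(flux, Packaging("FITS", compression="gzip"))
print(packaged.quantity.error_of(1), packaged.packaging.file_format)
```

Keeping the two classes separate means the same Quantity could be
delivered in several formats without its structure changing.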

Cheers
François

=====================================================================
Francois   Bonnarel               Observatoire Astronomique de Strasbourg
CDS (Centre de donnees          11, rue de l'Universite
astronomiques de Strasbourg)    F--67000 Strasbourg (France)

Tel: +33-(0)3 90 24 24 11       WWW: http://cdsweb.u-strasbg.fr/people/fb.html
Fax: +33-(0)3 90 24 24 25       E-mail: bonnarel at astro.u-strasbg.fr
---------------------------------------------------------------------