Problems about the Spectrum Data Model from the view of a Web Service programmer

Dobos, Laszlo dobos at pha.jhu.edu
Thu Sep 14 07:53:29 PDT 2006


Hello DM group,

For those who don't know me, I'm Alex Szalay's and Tamas Budavari's student
at JHU, and I have been working on the Spectrum Service
(http://voservices.net/spectrum) for the last three years. I'm studying
physics at Eotvos University, Hungary, and before starting university I
worked for years as a database and distributed application programmer for
commercial companies.

I just ran through the document quickly, because I was interested in the Web
Service issues. I'm pretty sure that from the scientific view the data model
is very well designed, and your work on collecting and organizing the
enormous number of different metadata fields is appreciated, but I've found
some problems that may cause trouble in a web-service-aware application.

Here are my observations:

* 1. XML serialization puts the value of a field between the opening and
closing tags. That's fine for an XML document, but SOAP (the web services XML
protocol) can deserialize _only strings_ if the value is written between the
tags and attributes are specified on the same element. The correct way to
store values is in elements, like

	<Wavelength>
		<Unit>nm</Unit>
		<Value>700</Value>
	</Wavelength>

or in attributes, like

	<Wavelength Unit="nm" Value="700" />

But the following is not correct:

	<Wavelength Unit="nm">700</Wavelength>

We should keep in mind that even though this third version looks nicer in a
text editor, XML is not meant for reading by humans but for use in web
services, so I suggest following the SOAP standard instead of making
good-looking documents.
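
To illustrate the first two forms, here is a minimal Java sketch. It is not
the Spectrum Service code: the class and method names are made up for this
example, the XML is written by hand rather than by a SOAP toolkit, and no XML
escaping is done. It only shows that when every field has its own named
element or attribute, the class maps one-to-one onto the XML.

```java
// Hypothetical value holder: each field maps to exactly one named XML node.
class Wavelength {
    final String unit;
    final double value;

    Wavelength(String unit, double value) {
        this.unit = unit;
        this.value = value;
    }

    // Element form: every field becomes a child element.
    String toElementXml() {
        return "<Wavelength><Unit>" + unit + "</Unit><Value>" + value
                + "</Value></Wavelength>";
    }

    // Attribute form: every field becomes an attribute.
    String toAttributeXml() {
        return "<Wavelength Unit=\"" + unit + "\" Value=\"" + value + "\" />";
    }
}

class WavelengthDemo {
    public static void main(String[] args) {
        Wavelength w = new Wavelength("nm", 700);
        System.out.println(w.toElementXml());
        System.out.println(w.toAttributeXml());
    }
}
```

The third form has no field name for the text between the tags, which is why
it needs special handling on the programming side.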

* 2. The model now supports serializing data points (i.e. the elements of
the arrays) in two ways (struct-style and flat). From the programmer's point
of view, both may be problematic. Flat style was obviously introduced to
shrink the resulting XML files and reduce network traffic, which is nice.

The problem is the following: if one creates program classes in Java or C#
or whatever and wants to expose their functions as a web service, one might
also want to add the optional fields to the data model. The problem is with
the simple value types. Consider an everyday double variable. It always
holds some value (0 by default), so the SOAP serializer writes it into the
XML whether or not it was set. To keep a value out of the XML, the variable
would have to be null, but only reference types (pointers) can be null. In
the Spectrum Service I'm developing I used the following approach: I stored
the header fields as usual, in classes forming a hierarchical set of
variables, and stored the actual data in simple arrays of doubles (or ints).
Because arrays are reference types, they can be null, so if I don't want to
use one of the optional axes, I simply set it to null.

If I had used an array of structs (i.e. simple data types grouped
together), I couldn't use this trick: I would have to set each unused field
in each instance of the struct to null... but the fields of the struct are
simple types and thus cannot be null...

So if we want to create a data model that supports web services and still
keeps the XML file size small, we should consider storing the actual data in
simple arrays instead of arrays of structs. This will also help in the
future, when the binary web service protocols become available!
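
The array approach can be sketched in Java roughly as follows. This is only
an illustration under assumptions: the class names, field names and the
hand-written toXml method are hypothetical and stand in for whatever a SOAP
serializer would do; it is not the actual Spectrum Service code.

```java
// Hypothetical container: one array per axis instead of an array of structs.
class SpectrumData {
    double[] wavelength;   // required axis
    double[] flux;         // required axis
    double[] fluxError;    // optional axis: null means "not present"

    // Writes only the axes that are not null, so an unused optional
    // axis adds nothing to the output.
    String toXml() {
        StringBuilder sb = new StringBuilder("<Data>");
        appendAxis(sb, "Wavelength", wavelength);
        appendAxis(sb, "Flux", flux);
        appendAxis(sb, "FluxError", fluxError);
        return sb.append("</Data>").toString();
    }

    private static void appendAxis(StringBuilder sb, String name, double[] values) {
        if (values == null) {
            return; // arrays are reference types, so null is a valid "unused" marker
        }
        sb.append('<').append(name).append('>');
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(values[i]);
        }
        sb.append("</").append(name).append('>');
    }
}

// Struct-style point for comparison: a primitive field always holds a value.
class DataPoint {
    double wavelength;
    double flux;
    double fluxError; // defaults to 0.0 and cannot be set to null
}

class SpectrumDataDemo {
    public static void main(String[] args) {
        SpectrumData s = new SpectrumData();
        s.wavelength = new double[] {700, 701};
        s.flux = new double[] {1.5, 2.5};
        // fluxError is left null, so no FluxError element is written
        System.out.println(s.toXml());
    }
}
```

With the struct-style DataPoint there is no way to tell an unused fluxError
from a measured error of zero, while the null array makes the distinction in
one place for the whole spectrum.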

* 3. Francois Ochsenbein and I found some issues with VOTables too. No
correct proxy class can be generated from the VOTable XSD; the resulting
code must be modified by hand in several places, at least in the Windows
.NET and Linux Mono frameworks (the two main web service implementations).

* 4. STC uses different data types than the spectrum header and
characterization use. They should be synchronised sooner or later, since if
we start implementing unit conversion software sometime in the (far)
future, it will be very useful to have everything in the same structure.
Thus, a Quantity object with unit, ucd and value fields is required. I have
such an object in my spectrum model implementation (which currently
implements the previous version of the document, unfortunately, without the
characterisation and STC), and I can automatically convert it to VOTable or
serialize it as XML using a single line of code, and I do not need to modify
the serializer/deserializer code if the underlying object structure (DM)
changes.
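
A Quantity object of this kind could look roughly like the Java sketch below.
It is an assumption-laden illustration, not the class from my implementation:
the method names, the element names and the explicit conversion factor are
all hypothetical, and the XML is written by hand without escaping.

```java
// Hypothetical Quantity: the same three fields for every physical value.
class Quantity {
    final String unit;
    final String ucd;
    final double value;

    Quantity(String unit, String ucd, double value) {
        this.unit = unit;
        this.ucd = ucd;
        this.value = value;
    }

    // Returns a new Quantity in another unit; the caller supplies the factor
    // (a real converter would look it up from the two unit names).
    Quantity scaled(String newUnit, double factor) {
        return new Quantity(newUnit, ucd, value * factor);
    }

    // Element-style serialization under the given element name.
    String toXml(String name) {
        return "<" + name + "><Unit>" + unit + "</Unit><Ucd>" + ucd
                + "</Ucd><Value>" + value + "</Value></" + name + ">";
    }
}

class QuantityDemo {
    public static void main(String[] args) {
        Quantity wl = new Quantity("nm", "em.wl", 700);
        System.out.println(wl.toXml("Wavelength"));
        // 1 nm = 10 Angstrom
        System.out.println(wl.scaled("Angstrom", 10.0).toXml("Wavelength"));
    }
}
```

Because every value has the same shape, a unit converter or a serializer
only has to understand one structure, whichever part of the model the value
comes from.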

* 5. The SSA protocol uses a different data model to return the spectrum
info in a table. Why not return an array of spectrum headers instead,
without the data points?

I've attached a document that I wrote a year ago but that somehow never
reached the DM group.

I hope I could help eliminate these problems with Web Services.
Unfortunately I cannot go to the Moscow meeting, but most of you will be at
ADASS, so if you're interested, I can show you the actual Spectrum Service
code and explain the programming tricks that bridge the gaps between the
complicated data model and the web service standard.

-Laszlo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: data model notes.pdf
Type: application/pdf
Size: 77958 bytes
Desc: not available
URL: <http://www.ivoa.net/pipermail/dm/attachments/20060914/67493aba/attachment-0001.pdf>

