[utypes] standard serialisation strategies/ Omar's proposal

Tue Feb 7 08:04:32 PST 2012

Hi Mireille,

It makes sense to me.

We would have one part that is devoted to describing how to construct utype
strings starting from a UML-like description of the entity to be
serialized. An introduction will include a general statement of the problem
(i.e. a standard strategy for building strings enables the description of
structured objects using arbitrary unstructured tabular file formats).

Another part will describe the general framework and the standardized
serialization strategy, which puts the utype strings in a consistent
framework, so that no future data model or DAL specification is forced to
introduce its own custom serialization.

However, notice that my proposal is incompatible with section 5 (and/or
includes some parts of it). Also, section 5.1, imho, raises one fundamental
question: why should we bother with a utypes syntax in the first place if
we can have short strings that express exactly the same thing through an
isomorphism?

I don't have a strong opinion on whether the latter part is an appendix or
not, as long as it is somewhere, but since my proposal is devoted to an
effective, efficient and OOP-friendly data model reuse strategy, I guess it
would make sense to use my proposal for section 5.

Similarly, sections 2 and 3 in the December draft are incompatible with
section 7, and with the original idea of representing DMs using XMI and
using this description to build utypes. Either we drop the latter approach
and update (or drop) section 7, or we resolve the inconsistency in sections
2 and 3 by making Markus' algorithm use the XMI instead of the XSD.

I still think that a different document is a better place for the Data
Modeling process abstraction. But again, as long as the information is
somewhere, I really don't care.

Bottom line, I think that with some minor tweaks we can consistently
implement what you described.

Cheers,

Omar.

On Fri, Jan 27, 2012 at 12:15 PM, Mireille Louys
<mireille.louys at unistra.fr> wrote:

> Dear Omar, dear all,
>
> I read again the WD issued in late December.
> I am trying to suggest a compromise so that we can converge soon.
>
> The requirements and use-cases sections can be included in the Utype WD,
> and completed with a simple example to express the mapping
> concept <---> keyword-string,
> e.g. region--STC, sampling--Characterisation, PhotometryFilter--PhotDM.
>
> Instead of being a separate IVOA note, the general strategy for
> serialisation, including both diagrams, could be an appendix to the Utype
> document, if agreed.
> I just fear a slow convergence process if we want to finalise 2 documents
> concurrently.
>
> More updates and suggestions on the text, next week.
> What are your opinions about that?
> Thanks for your comments,
> Cheers, Mireille.
>
> Laurino <olaurino at head.cfa.harvard.edu> wrote:
>
>> Hi All,
>>
>> I am attaching two diagrams. One is an abstraction of the Data Modeling
>> process and might not be included in the UTypes document. However, the
>> Utypes document should refer to that diagram (maybe a Note?).
>>
>> The other diagram shows an abstraction of the serialization process, and
>> uses classes defined in the DM diagram. Both diagrams describe logical
>> data models.
>>
>> I am also attaching an excerpt from the draft, with only the requirements
>> and use cases for Utypes.
>>
>> I have a prototype in Java that shows how the thing works. I will add a
>> demo to it and circulate it as soon as possible (probably at the beginning
>> of January, though).
>>
>> Here is a description of the DM diagrams: a StandardDocument contains one
>> or more Namespaces (Characterization, STC, Target, Spectrum, etc.). Each
>> namespace has a label (char, stc, target, spec) which is used to resolve
>> the Entities defined in each Namespace. For instance, char:Accuracy and
>> spec:Accuracy are different Entities (different concepts). Also, Entities
>> have versions. Thus, the qualified name of an Entity is ns:Name-x.y (e.g.
>> char:Accuracy-1.0). You can serialize a SerializableEntity, which is an
>> Entity with attributes that have concrete values: in particular they have
>> UtypedValues, i.e. values associated with a Utype string. These values hold
>> a reference to the actual attribute they refer to, through their utype. A
>> serialization (a Document, or file) holds several instances of an Entity,
>> one for each row. It also has a set of declarations. Note that the
>> declaration is one of the few differences between my model and what we
>> currently, implicitly, do. The other subtle differences should be clear in
>> the following description.
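
The qualified-name convention described above (ns:Name-x.y) can be sketched in Java as follows. This is only an illustration: the class name `QualifiedName` and the exact grammar accepted by the regular expression are assumptions, not part of the prototype or of any IVOA document.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a qualified Entity name such as char:Accuracy-1.0. */
class QualifiedName {
    // Assumed grammar: label ':' Name '-' major '.' minor
    private static final Pattern FORMAT =
            Pattern.compile("([A-Za-z]\\w*):([A-Za-z]\\w*)-(\\d+)\\.(\\d+)");

    final String namespace;
    final String name;
    final int major;
    final int minor;

    private QualifiedName(String namespace, String name, int major, int minor) {
        this.namespace = namespace;
        this.name = name;
        this.major = major;
        this.minor = minor;
    }

    /** Parses "ns:Name-x.y" or throws if the text does not follow that shape. */
    static QualifiedName parse(String text) {
        Matcher m = FORMAT.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a qualified Entity name: " + text);
        }
        return new QualifiedName(m.group(1), m.group(2),
                Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)));
    }

    @Override
    public String toString() {
        return namespace + ":" + name + "-" + major + "." + minor;
    }
}
```

With this, `QualifiedName.parse("char:Accuracy-1.0")` and `QualifiedName.parse("spec:Accuracy-1.0")` yield the same name and version but different namespace labels, which is what keeps the two concepts apart.
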
>>
>> This model and its prototype implementation provide a solution for meeting
>> the requirements and enable the use cases described in the document draft.
>>
>> So, Entities have Attributes, according to the DM diagram. Here is a
>> description of a sample Namespace from my prototype, in a simple ASCII
>> format (constraints are not included):
>>
>> [Namespace]
>> label = char
>>
>> [Accuracy-1.0]
>> attribute = StatError|Double|True|The Symmetric Statistic Error associated to a measurement
>>
>> [CharAxis-1.0]
>> attribute = Name|String|True|The name of this characterization axis
>> attribute = Unit|String|True|The unit for this characterization axis
>> attribute = UCD|String|False|The UCD that describes this characterization axis
>> attribute = Accuracy|char:Accuracy-1.0|False|The accuracy of the point on this characterization axis
>>
>>
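
A reader for the ASCII description above could look roughly like the sketch below. It is not the prototype's parser: the class `NamespaceParser`, its nested `Attribute` type and the decision to ignore keys other than `label` and `attribute` are all assumptions made for illustration, and each attribute is expected on a single line as `name|type|mandatory|description`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for the simple ASCII Namespace description. */
class NamespaceParser {
    /** One parsed "attribute = name|type|mandatory|description" line. */
    static class Attribute {
        final String name;
        final String type;
        final boolean mandatory;
        final String description;

        Attribute(String name, String type, boolean mandatory, String description) {
            this.name = name;
            this.type = type;
            this.mandatory = mandatory;
            this.description = description;
        }
    }

    /** Returns qualified Entity name -> attributes, in file order. */
    static Map<String, List<Attribute>> parse(List<String> lines) {
        Map<String, List<Attribute>> entities = new LinkedHashMap<>();
        String label = null;    // namespace label, e.g. "char"
        String section = null;  // current [Section] header
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                section = line.substring(1, line.length() - 1);
                if (!section.equals("Namespace")) {
                    if (label == null) {
                        throw new IllegalStateException("Entity declared before the namespace label");
                    }
                    entities.put(label + ":" + section, new ArrayList<>());
                }
                continue;
            }
            int eq = line.indexOf('=');
            if (eq < 0 || section == null) {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            if (section.equals("Namespace") && key.equals("label")) {
                label = value;
            } else if (key.equals("attribute") && !section.equals("Namespace")) {
                String[] f = value.split("\\|", 4);
                if (f.length != 4) {
                    throw new IllegalArgumentException("Malformed attribute: " + value);
                }
                entities.get(label + ":" + section).add(new Attribute(
                        f[0].trim(), f[1].trim(), Boolean.parseBoolean(f[2].trim()), f[3].trim()));
            }
            // Any other key (for instance "parent") is ignored in this sketch.
        }
        return entities;
    }
}
```

Keying the result by the qualified name (label plus section header) means the entries can be looked up directly with strings such as char:CharAxis-1.0.
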
>> Of course, an XML description might be possible. In this respect, note
>> that this description would be an XML document, not an XSD schema! In
>> other words, creating a new Data Model wouldn't require creating a new
>> schema, but just a description of the new data model in terms of a single
>> predefined XSD schema.
>>
>> Now, you can define new Entities in different Namespaces by aggregating or
>> extending the existing ones. For example, this is the Namespace "Data":
>>
>> [Namespace]
>> label = data
>>
>> [DataAxis-1.0]
>> parent = char:CharAxis-1.0
>> attribute = Value|String|True|The value of the point on this data axis
>> attribute = DataType|String|True|The Datatype of this value, expressed as a String and compliant with the description in [blah blah]
>> attribute = Quality|Integer|False|A quality index of this value, expressed as an Integer
>>
>> [Accuracy-1.0]
>> parent = char:Accuracy-1.0
>> attribute = StatError|Double|True|This attribute was overridden
>>
>> [DataAxis-2.0]
>> parent = data:DataAxis-1.0
>> attribute = Accuracy|data:Accuracy-1.0|False|This attribute is in the new version of DataAxis
>> attribute = Value|String|True|This is the overridden version of Value
>>
>>
>> data:DataAxis-1.0 extends char:CharAxis-1.0, adds three more attributes
>> and inherits all the others.
>>
>> Then, it defines its version of Accuracy, by overriding the one defined in
>> Char, and a new version of DataAxis that overrides both Value (declared by
>> its previous version) and Accuracy (inherited from Char in the previous
>> version).
>>
>> Note that "extension" means that a change occurred either in the semantic
>> description (the "meaning") or in the constraints (which are not included
>> in the example). The constraints (for example the possible values) can
>> only be reduced and not extended, for semantic consistency. In any case
>> the datatype can't change; otherwise, as would happen in any OOP program,
>> the extending class would break the contract (in Java such an extension
>> wouldn't even compile).
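
The two rules just stated (same datatype, constraints only narrowed) can be expressed as a small check. This is a minimal sketch under stated assumptions: constraints are modelled as an explicit set of allowed values, and the names `OverrideCheck`, `AttributeDef` and `isValidOverride` are invented for the example. It compares datatypes by strict equality; a fuller version would presumably also accept a datatype that extends the parent's, as DataAxis-2.0 does for Accuracy, which is left out here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical check of the extension rules: same datatype, narrower constraints. */
class OverrideCheck {
    static class AttributeDef {
        final String name;
        final String datatype;
        final Set<String> allowedValues;  // assumed constraint model: explicit value set

        AttributeDef(String name, String datatype, String... allowedValues) {
            this.name = name;
            this.datatype = datatype;
            this.allowedValues = new HashSet<>(Arrays.asList(allowedValues));
        }
    }

    /** True if child may override parent without breaking the contract. */
    static boolean isValidOverride(AttributeDef parent, AttributeDef child) {
        return parent.name.equals(child.name)
                && parent.datatype.equals(child.datatype)                  // datatype can't change
                && parent.allowedValues.containsAll(child.allowedValues);  // values only reduced
    }
}
```

For instance, an override that keeps the datatype and drops one allowed value passes, while one that adds a value the parent never allowed, or that switches the datatype, is rejected.
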
>>
>> All the instances have a set of "utype terminals" defined from these
>> descriptions using the usual convention described in the draft
>> (alternative notation). For instance, Accuracy defines this utype
>> terminal: *.StatError. CharAxis has several terminals, for example:
>> *.Accuracy.StatError.
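
One way to derive such terminals is a recursive walk over the attributes: a primitive attribute yields a terminal, an attribute whose type is another Entity is descended into. The sketch below assumes exactly that reading of the convention; the class `UtypeTerminals` and its `define` method are hypothetical, inheritance from a parent Entity is not handled, and circular definitions would recurse forever.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical generator of utype terminals such as *.Accuracy.StatError. */
class UtypeTerminals {
    // Entity qualified name -> (attribute name -> attribute type), in declaration order
    private final Map<String, Map<String, String>> model = new LinkedHashMap<>();

    /** Registers an Entity from alternating attribute-name / attribute-type strings. */
    void define(String entity, String... nameTypePairs) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameTypePairs.length; i += 2) {
            attrs.put(nameTypePairs[i], nameTypePairs[i + 1]);
        }
        model.put(entity, attrs);
    }

    /** Lists the terminals of an Entity, each starting with the "*" placeholder. */
    List<String> terminals(String entity) {
        if (!model.containsKey(entity)) {
            throw new IllegalArgumentException("Unknown Entity: " + entity);
        }
        List<String> out = new ArrayList<>();
        collect(entity, "*", out);
        return out;
    }

    private void collect(String entity, String prefix, List<String> out) {
        for (Map.Entry<String, String> attr : model.get(entity).entrySet()) {
            String path = prefix + "." + attr.getKey();
            if (model.containsKey(attr.getValue())) {
                collect(attr.getValue(), path, out);  // nested Entity: descend
            } else {
                out.add(path);                        // primitive type: a terminal
            }
        }
    }
}
```

Defining char:Accuracy-1.0 with a single StatError attribute and char:CharAxis-1.0 with Name, Unit, UCD and Accuracy gives *.StatError for the former and *.Name, *.Unit, *.UCD, *.Accuracy.StatError for the latter.
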
>>
>> The set of Namespaces defines some sort of database of models (the IVOA
>> conceptual domain) that can be used to dynamically define new ones. In
>> particular, I can now define a new Entity that I want to serialize in a
>> file. By "dynamically" I mean that this Entity is not defined in any
>> standard document, but created, let's say, by an SDSS cone search to
>> describe a catalog of photometric data.
>>
>> Here is a snippet from my Java tests:
>>
>> Entity photometryPoint = new Entity("PhotometryPoint");
>> photometryPoint.add(new Attribute("SpectralAxis|data:DataAxis-1.0", photometryPoint));
>> photometryPoint.add(new Attribute("UFilter|data:DataAxis-1.0", photometryPoint));
>> photometryPoint.add(new Attribute("GFilter|data:DataAxis-1.0", photometryPoint));
>> photometryPoint.add(new Attribute("RFilter|data:DataAxis-2.0", photometryPoint));
>>
>>
>> Basically I am adding a bunch of DataAxis with different names. However,
>> note how I can mix DataAxis instances of different versions. For clarity,
>> in the example I omit the mandatory field and the description.
>>
>> Now, the utypes are generated only at this stage, for the dynamically
>> created PhotometryPoint Entity. They are created using the strategy
>> described in the current draft (alternative notation), but without any
>> namespaces and starting from PhotometryPoint. Recall that the
>> PhotometryPoint Entity isn't defined in any standard document, so neither
>> are the utypes (and this is where the black magic will come in, later).
>>
>> So, if I ask my prototype to generate and list the Utypes for
>> PhotometryPoint, here is what I get:
>>
>> PhotometryPoint.SpectralAxis.Value
>> PhotometryPoint.SpectralAxis.DataType
>> PhotometryPoint.SpectralAxis.Quality
>> PhotometryPoint.SpectralAxis.Name
>> PhotometryPoint.SpectralAxis.Unit
>> PhotometryPoint.SpectralAxis.UCD
>> PhotometryPoint.SpectralAxis.Accuracy.StatError
>> PhotometryPoint.UFilter.Value
>> PhotometryPoint.UFilter.DataType
>> PhotometryPoint.UFilter.Quality
>> PhotometryPoint.UFilter.Name
>> PhotometryPoint.UFilter.Unit
>> PhotometryPoint.UFilter.UCD
>> PhotometryPoint.UFilter.Accuracy.StatError
>> PhotometryPoint.GFilter.Value
>> PhotometryPoint.GFilter.DataType
>> PhotometryPoint.GFilter.Quality
>> PhotometryPoint.GFilter.Name
>> PhotometryPoint.GFilter.Unit
>> PhotometryPoint.GFilter.UCD
>> PhotometryPoint.GFilter.Accuracy.StatError
>> PhotometryPoint.RFilter.Accuracy.StatError
>> PhotometryPoint.RFilter.Value
>> PhotometryPoint.RFilter.DataType
>> PhotometryPoint.RFilter.Quality
>> PhotometryPoint.RFilter.Name
>> PhotometryPoint.RFilter.Unit
>> PhotometryPoint.RFilter.UCD
>>
>>
>> Well, this is the list of utypes that will be used in the usual way, for
>> instance in a VOTable serialization. This alone allows a client
>> application to reconstruct the object according to its original,
>> dynamically created, data model, for example providing a GUI Tree widget.
>> [Of course, one could take the PhotometryPoint prefix off the utypes. It
>> doesn't change anything in this description.]
>>
>> However, there is no way the client can spot objects in its conceptual
>> domain, because the utypes were undefined and so unknown to the client.
>>
>> So, in order for the black magic to work, the code provides me with
>> additional information to be added to the file header, in the form of
>> (utype, value) pairs (the "declarations"):
>>
>> (Declarations.Entity.Instances, data:DataAxis-1.0!PhotometryPoint.SpectralAxis;PhotometryPoint.UFilter;PhotometryPoint.GFilter)
>> (Declarations.Entity.Instances, data:DataAxis-2.0!PhotometryPoint.RFilter)
>> (Declarations.Entity.Instances, data:Accuracy-1.0!PhotometryPoint.RFilter.Accuracy)
>> (Declarations.Entity.Extension, data:DataAxis-2.0->data:DataAxis-1.0->char:CharAxis-1.0)
>> (Declarations.Entity.Extension, data:Accuracy-1.0->char:Accuracy-1.0)
>>
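
On the client side, the declaration values above could be split into their parts along these lines. The separators `!`, `;` and `->` are taken from the examples; everything else (the class `Declarations`, its fields, and the tolerance for whitespace inside a value) is an assumption of this sketch rather than the prototype's behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for the (utype, value) declaration pairs of a file header. */
class Declarations {
    /** Entity type -> utype prefixes of its instances. */
    final Map<String, List<String>> instances = new LinkedHashMap<>();
    /** Each chain lists a type followed by the types it extends, nearest first. */
    final List<List<String>> extensions = new ArrayList<>();

    void add(String utype, String value) {
        String v = value.replaceAll("\\s+", "");  // tolerate wrapped header values
        if (utype.equals("Declarations.Entity.Instances")) {
            int bang = v.indexOf('!');
            if (bang < 0) {
                throw new IllegalArgumentException("Missing '!' in: " + value);
            }
            instances.computeIfAbsent(v.substring(0, bang), k -> new ArrayList<>())
                     .addAll(Arrays.asList(v.substring(bang + 1).split(";")));
        } else if (utype.equals("Declarations.Entity.Extension")) {
            extensions.add(Arrays.asList(v.split("->")));
        } else {
            throw new IllegalArgumentException("Unknown declaration utype: " + utype);
        }
    }
}
```

After feeding it the five pairs listed above, `instances` maps data:DataAxis-1.0 to the three prefixes SpectralAxis, UFilter and GFilter (each under PhotometryPoint), and `extensions` holds the two inheritance chains.
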
>>
>> If these PARAMs are in the file header, the client program can start
>> hunting objects it knows. Let's say for example that the client is a
>> VO-Enabled plotting program. In the fictional universe of my example, this
>> VOPlot has the data:DataAxis model in its conceptual domain. Let's also
>> say that it only knows version 1.0 of it.
>>
>> So, the plotting program parses the header going after the
>> Declarations.Entity.Instances utypes. There it finds some hooks in the
>> form of utype prefixes. For example, going after data:DataAxis-1.0, it
>> will find three instances associated to the utype prefixes
>> PhotometryPoint.SpectralAxis, .UFilter, and .GFilter. By appending the
>> known utype terminals to these prefixes, the program can build the
>> instances of the DataAxis objects and use them.
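
The "prefix plus terminal" step can be sketched as below. The sketch assumes one table row is available as a map from full utype to cell value; `InstanceBuilder` and `build` are illustrative names, and the row values used in the usage note are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical client-side step: rebuild one instance from a prefix and known terminals. */
class InstanceBuilder {
    /** Collects the values stored under prefix + terminal, keyed by attribute path. */
    static Map<String, String> build(String prefix, List<String> terminals,
                                     Map<String, String> row) {
        Map<String, String> instance = new LinkedHashMap<>();
        for (String terminal : terminals) {
            if (!terminal.startsWith("*.")) {
                throw new IllegalArgumentException("Not a utype terminal: " + terminal);
            }
            String attributePath = terminal.substring(2);  // "*.Unit" -> "Unit"
            String utype = prefix + "." + attributePath;   // e.g. PhotometryPoint.UFilter.Unit
            if (row.containsKey(utype)) {                  // optional attributes may be absent
                instance.put(attributePath, row.get(utype));
            }
        }
        return instance;
    }
}
```

Called with the prefix PhotometryPoint.UFilter, the terminals *.Value, *.Unit and *.Accuracy.StatError, and a row holding (say) 18.2, mag and 0.05 under the matching utypes, it returns those three values keyed by Value, Unit and Accuracy.StatError, ignoring the columns that belong to the other axes.
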
>>
>> For example, it will check that all the points have consistent units on
>> each axis and convert them if necessary. It will ask the user "which axis
>> to plot on X?", and not "which column to plot on X?". Also, once the user
>> has made a choice, the program will be able to draw the error bars
>> automatically, without the user having to ask. And let's not forget the
>> UCDs! A VO-enabled program will be able to parse them according to the
>> standard DataAxis-1.0 definition and could even figure out that, by
>> default, it can plot SpectralAxis on the X and the photometric axes on the
>> Y (and maybe Z in a 3D scatter plot), or generate a color-color plot
>> automatically, by simply combining the photometric axes.
>>
>> However, we haven't captured the instance of the DataAxis-2.0 type. Well,
>> the client program can sneak into the Declarations.Entity.Extension PARAMs
>> and find that data:DataAxis-1.0 is extended by the 2.0 version, and get
>> the handle to the RFilter axis. Since the extensibility mechanism is
>> standardized and "type safe", the plotting program can still use the
>> information to plot the points along that axis.
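
The lookup through the Extension declarations amounts to finding every type that appears before the known type in some chain. A minimal sketch, assuming each chain is already split into a list ordered from the most derived type to the root (`ExtensionResolver` and `compatibleTypes` are hypothetical names):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical resolver: which declared types can be handled as a known type? */
class ExtensionResolver {
    /**
     * Returns the known type plus every type that extends it according to the chains.
     * Each chain is ordered from the most derived type to the root.
     */
    static Set<String> compatibleTypes(String known, List<List<String>> chains) {
        Set<String> result = new LinkedHashSet<>();
        result.add(known);
        for (List<String> chain : chains) {
            int position = chain.indexOf(known);
            if (position > 0) {
                // Everything listed before the known type is a descendant of it.
                result.addAll(chain.subList(0, position));
            }
        }
        return result;
    }
}
```

With the two chains from the example, a client that knows only data:DataAxis-1.0 also picks up data:DataAxis-2.0 (and hence the RFilter prefix), while a client built around char:CharAxis-1.0 picks up both DataAxis versions.
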
>>
>> It might be a good idea, though, to inform the user that something is
>> slightly wrong. For example, the client can go to the URL where the
>> standard description of the data:DataAxis model is defined (this
>> information must be in the file, of course, associated to each of the
>> namespaces), look up the updates in the new version, and display what
>> semantic changes occurred and to which attributes. Or it could just inform
>> the user that the file was built for a different version of the Data
>> Model, and let the user figure out how much of a problem this is.
>>
>> It should be clear now that if a client was designed to support the
>> char:CharAxis-1.0 model, it would be able to identify all the instances
>> that extend this model and handle them according to its use cases.
>>
>> So, by using the metaDataModel I can dynamically define new Entities by
>> re-using concepts defined in the IVOA standards. Using the standardized
>> serialization strategy, I can store the entities in a tabular format,
>> including the relevant metadata about the model. This metadata will allow
>> clients to identify objects in their conceptual domain and instantiate
>> them.
>>
>> This meets the requirements, and in doing so enables the original use
>> cases (and many more).
>>
>> --
>> Omar Laurino
>> Smithsonian Astrophysical Observatory
>> Harvard-Smithsonian Center for Astrophysics
>> 100 Acorn Park Dr. R-376 MS-81
>> 02140 Cambridge, MA
>> (617) 495-7227
>>
>>
>
>
> --
> Mireille Louys, assistant professor at  UDS: ENSPS, Laboratoire ICube et
> CDS
> Observatoire de Strasbourg
> mail to: mireille.louys at unistra.fr
> Tel: +33 3 68 85 24 34
> Address 1: CDS/Observatoire de Strasbourg
> 11, rue de l'Université
> 67000 STRASBOURG
>
>
>

-- 
Omar Laurino
Smithsonian Astrophysical Observatory
Harvard-Smithsonian Center for Astrophysics
100 Acorn Park Dr. R-376 MS-81
02140 Cambridge, MA
(617) 495-7227