utype questions

Tue May 12 13:37:18 PDT 2009

On Tue, 12 May 2009, Norman Gray wrote:
>
> On 2009 May 12, at 16:54, Doug Tody wrote:
>
>> The purpose of utypes is to "parameterize a data model", that is,
>> assign unique tags to each field of a data model.  The data model
>> in question may have some hierarchical structure, and in the process
>> of parameterizing the data model we "flatten" it, reducing it to
>> a set of name-value pairs.
>> 
>> The reason we do this is to separate the semantics of the data model
>> from the representation, to allow the same semantic content to be
>> reliably represented in many different ways, both externally and in
>> program structures and containers.  Hence, we can take a data model
>> instance, parameterize it via utypes, and store the resultant data
>> in the fields of a table, in a parameter set, in a hashmap in Java,
>> or even in a FITS header.
>
> Then that turns into a formal requirement that the properties utypes 
> represent, or the objects with the types they represent, can be present in a 
> description only with a cardinality of zero or one.

A key point about UTYPE tagging is that we are not trying to provide
a completely general data structuring mechanism, merely something
which is simple and sufficient for at least 90% of a certain type
of application.  Where this is not sufficient we can instead use
a full-fledged structured dataset of some sort.  We have a lot of
experience with parameter sets, object properties, keyword tables,
etc. in many contexts (including within astronomy) and it is clear
that one can do a lot with such a simple mechanism, particularly when
the tags themselves can have some structure and are namespaced.

You are right that within a given data model, UTYPEs must be unique.
However, it is not true that only one instance of the represented
object can be present.  There is no problem having multiple instances,
with the same UTYPEs in each, so long as the container provides some
mechanism for dealing with this (GROUP in VOTable for example can do
this although it can complicate referencing a bit).
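
As a rough illustration of the above (a sketch only, not taken from any
standard or existing implementation), the following Python flattens a
nested data model instance into UTYPE-like name-value pairs and then
holds two instances in a simple container.  The "ex:" namespace, the
field names, and the flatten() helper are all invented for the example,
and the list merely stands in for something like GROUPs in a VOTable.

```python
def flatten(model, prefix=""):
    """Reduce a nested dict to a flat {tag: value} mapping."""
    flat = {}
    for key, value in model.items():
        tag = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into substructure, extending the dotted tag.
            flat.update(flatten(value, tag))
        else:
            flat[tag] = value
    return flat


def parameterize(model, namespace="ex"):
    """Flatten a model and prefix every tag with a (hypothetical) namespace."""
    return {f"{namespace}:{tag}": value for tag, value in flatten(model).items()}


# Hypothetical data model instances; names and values are invented.
first = {
    "Target": {"Name": "M31", "Pos": {"RA": 10.68, "Dec": 41.27}},
    "Char": {"SpectralAxis": {"Unit": "nm"}},
}
second = {
    "Target": {"Name": "M33", "Pos": {"RA": 23.46, "Dec": 30.66}},
    "Char": {"SpectralAxis": {"Unit": "nm"}},
}

# Within one instance each tag is unique, so a plain dict suffices.
params = parameterize(first)

# Multiple instances reuse the same tags; the container keeps them apart.
groups = [params, parameterize(second)]

for group in groups:
    print(group["ex:Target.Name"], group["ex:Target.Pos.RA"])
```

Nothing here parses a tag: once flattened, the tags are used only as
opaque keys, and it is the container that distinguishes one instance
from another.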

> That provides in turn an explicit articulation of The Uniqueness Problem: 
> given a data model, is it feasible to generate a usable set of utypes which 
> can reconstruct the data model under this restriction?

It is not difficult to define a data model (actually usually a data
structure) which is so complex that a "flat" representation such as
UTYPE is not possible.  However one can argue that such complexity
should be avoided unless it really is necessary, due to the greater
complexity required to deal with such an object.  Also, when a single
data model gets so large that this becomes an issue, it may be time to
consider decomposing it into components which can be aggregated within
a generic container, as we have done for example in SSA and DAL2.

This component data model business where we aggregate and relate
simpler flat keyword-value sets is similar to the relational approach,
where a more complex object can be modeled as several tables which
are related by the table content and overall abstract model, rather
than via explicit, fixed hierarchical structure.  While simpler in
many respects, this approach can also be more flexible and powerful
than representing relationships in explicit structure, and the use of
generic containers (a votable, a DBMS, a parameter set, a FITS file,
etc.) permits use of generic tools regardless of what kind of object
is stored in the container.
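
To make the relational analogy concrete, here is a minimal sketch in
Python, assuming two component models stored as flat keyword-value rows
and related only by their content.  The tags, the values, and the
select() helper are hypothetical and are not drawn from SSA or any other
specification.

```python
def select(rows, tag, value):
    """Generic lookup: return the rows whose `tag` equals `value`.

    It knows nothing about what kind of object the rows describe.
    """
    return [row for row in rows if row.get(tag) == value]


# Hypothetical component "tables"; each row is a flat keyword-value set.
datasets = [
    {"ex:Dataset.ID": "d1", "ex:Dataset.Title": "M31 spectrum"},
    {"ex:Dataset.ID": "d2", "ex:Dataset.Title": "M33 spectrum"},
]
segments = [
    {"ex:Segment.DatasetID": "d1", "ex:Segment.BandMin": 400.0},
    {"ex:Segment.DatasetID": "d1", "ex:Segment.BandMin": 550.0},
    {"ex:Segment.DatasetID": "d2", "ex:Segment.BandMin": 400.0},
]

# The relationship lives in the content (a shared ID value),
# not in any fixed hierarchical structure.
for dataset in datasets:
    related = select(segments, "ex:Segment.DatasetID", dataset["ex:Dataset.ID"])
    print(dataset["ex:Dataset.Title"], len(related))
```

The same select() works unchanged on either component, which is the
point about generic tools operating on generic containers.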

>> Since the purpose of utype tags is to simplify manipulation of data
>> model instances by providing a simple keyword-value mechanism, we do
>> not want to parse utypes as this would defeat their whole purpose.
>
> I get the feeling there are multiple accounts of what 'their whole purpose' 
> is.  But that could be just my misunderstanding.

All I can speak to is what we had in mind with UTYPE in the first
place, and how they have been used in the DAL interfaces and
related software.  People keep trying to make the UTYPE mechanism
more complicated, more XML-specific, more of a runtime referencing
mechanism, etc., and I am trying to head this off lest we lose what
has been accomplished and invested in current standards and software.

> I'm not proposing parsing anything, by the way.  I was very careful to avoid 
> proposing anything in the note I posted, but simply to point out what appear 
> to me to be unanswered but significant questions.  I can see that in some 
> cases (for example utype equality) each of the initial answers is 
> unattractive to someone, but ... answers don't become any less unattractive 
> by failing to ask the question!
>
>> UFI [...] plus we have other tags such as ID and NAME which
>> can provide shorter tags within a controlled context.  Examples of
>> all of the above are already in use in implementations and code today.
>
> I appreciate it's a complicated issue.  But that's sounding like a large 
> number of distinct elements to nibble at different parts of the problem, 
> which means that keeping things consistent, intelligible, and free from 
> ad-hockery, across multiple applications and disparate projects, could 
> potentially become quite difficult.

I agree there are some issues at the edges which need further
definition, and guidelines for using the mechanism in standard
patterns are needed, but let's make sure we do not lose track of the
basic concept in the process of dealing with these.

 	- Doug