UTYPEs and UFIs
Doug Tody
dtody at nrao.edu
Tue Sep 18 11:16:33 PDT 2007
On Tue, 18 Sep 2007, Jonathan McDowell wrote:
> Dear DM team,
> I have prepared a note on my view of the role and syntax of UTYPEs
> (data model field labels) and their relationship to what I call UFIs
> (ways to describe/locate uniquely a piece of information in a DM instance),
> to set things up for discussion in Cambridge.
> The doc is
> http://hea-www.harvard.edu/~jcm/vo/docs/utype/ut2.pdf
If we are going to attempt to formally define UTYPE I'd like to
volunteer to be involved as well. The DAL interfaces and their
associated applications probably represent the most evolved current
usage of UTYPE other than DM, and indeed the scope of UTYPE is broader
than just DM, as UTYPE is a general structural construct which can
be used (for example) to identify interface elements as well as data
model fields. UTYPE was originally created in a visit I made to
CDS several years ago, where we discussed the problem of the "VOX:"
UCDs used in DAL at that point.
While the usage of UTYPE is broader than data models, I think it is
fine to define UTYPE usage primarily within the DM group so long as
there is broader participation and consideration of use-cases.
It appears that UTYPE/UFI is scheduled to be discussed in the VOQL
session in Cambridge, which is parallel with DAL-2 so I will probably
have to miss the discussion - others may have the same problem;
perhaps this could be discussed in the DM session instead?
Some specific comments on the issues raised in the document follow.
- Doug
Section 1 Introduction pg 4
One could also mention that UTYPEs are used in the SSA data
model and interface; this is based upon Spectrum but incorporates
additional access-related elements as well. SSA and Spectrum are
an interesting test-case where two different constructs share the
same data model elements, which has implications for how UTYPE
is used.
Section 1.1 pg 5
The only thing I might differ with here is the suggestion that
"Name" (in the Spectrum.Target.Name example) can be considered
a UTYPE. "Name" is indeed a data model element or field with
implied hierarchical context, but the point of UTYPE is to flatten
a (sufficiently simple) data model so that the data model is
reduced to a simple set of named tokens, so that a wider range
of representations can be used.
In other words, my view is that UTYPEs should be fully qualified.
Otherwise we end up having to construct UTYPEs from UTYPEs,
which unnecessarily complicates software and is prone to error:
if a UTYPE can be either "simple" or fully qualified the software
has to work harder to figure out which is the case and generate
the fully qualified UTYPE required for external use.
Even with fully qualified UTYPEs there is always some context,
for example the overall container object (e.g., SSA or Spectrum),
or a re-usable "component data model" such as Target. This gets
into the issue of namespacing and scope discussed in 1.2.
Section 1.2 pg 5
For simplicity I suggest that the overall "container" (e.g.,
Spectrum or SSA) normally define the components that it uses, such
as Target. Versioning is then controlled by the container object.
If a "component" gets to be big enough we might separate it off
as a separate top level (stand-alone) data model.
The document states that either Spectrum.Target.Name or
spec:Target.Name could be used to represent namespaces.
I suggest the latter as it is easier to separate the namespace
from the UTYPE. That is, if we create a Spectrum object for
which the default namespace is "spec", the UTYPE within this
namespace is Target.Name. In an SSA namespace, we can also
have Target.Name. So long as we synchronize the two models,
this simplifies applications which use both types of objects.
(Note that the current Spectrum model differs from either of
these and uses "spec:Spectrum.Target.Name" - the "Spectrum." is
redundant).
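
As a rough sketch of why the "prefix:UTYPE" form is easy to work with (the helper name split_utype and the "ssa" prefix are made up for illustration, not taken from any specification), the namespace can be peeled off with a single split and the remaining UTYPE compared directly between the two models:

```python
def split_utype(qualified):
    """Split 'prefix:Path.To.Field' into (prefix, utype).

    The prefix is None when no namespace is given.
    """
    prefix, sep, rest = qualified.partition(":")
    if not sep:
        return None, qualified
    return prefix, rest

# "spec" is the prefix used in the discussion; "ssa" is an assumed prefix.
spec_ns, spec_utype = split_utype("spec:Target.Name")
ssa_ns, ssa_utype = split_utype("ssa:Target.Name")

# The same field is addressed in both namespaces once the prefix is removed.
same_field = spec_utype == ssa_utype
```

With the "Spectrum.Target.Name" form, the same comparison would require knowing which leading path elements are namespace and which belong to the model.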
Question: Top level scope.
I would argue that Target.Name is also fully qualified, within
the scope (namespace) defined by Spectrum (or SSA etc.). Hence
for example, "spec:Target.Name" is fully qualified.
Section 1.3 Combining data models
I agree that this needs more thought. My inclination would be to
argue against constructs such as
Coordinate.Resolution.PosAngle.Value;RedshiftFrame.CustomRefPos
(note the ";") which complicate the simple UTYPE syntax, and suggest
that string equivalence may no longer apply and that semantic parsing
is required. If we start to do this sort of thing we are losing
the point of UTYPE and starting to reinvent more general mechanisms.
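
To make the concern concrete, here is a minimal sketch (the dictionary, the function name lookup, and the value 12.5 are placeholders for illustration only). With flat UTYPEs a client can match by string equality; with the compound form it has to take the token apart first and then decide what the pieces mean:

```python
# Flat, fully qualified UTYPEs: lookup is plain string equality.
# The value 12.5 is a placeholder, not taken from any real dataset.
fields = {"Coordinate.Resolution.PosAngle.Value": 12.5}

def lookup(utype):
    """Return the value for an exact UTYPE match, else None."""
    return fields.get(utype)

compound = "Coordinate.Resolution.PosAngle.Value;RedshiftFrame.CustomRefPos"

direct = lookup(compound)     # no exact match for the compound string
parts = compound.split(";")   # the client must now parse the token
first = lookup(parts[0])      # and still interpret the second part itself
```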
The solution is probably either to simplify Characterization (it is
probably using STC in an awkward and overly general fashion; e.g.,
factor out the units etc. and fix them for the model); or if things
really need to be this complex, separate the data models.
Section 1.4 UFIs
Agreed that multiple instances of a data model within the same
container or namespace are required; we have to deal with this.
If there are multiple instances of the same data model, the instances
share the same UTYPEs, so some higher level mechanism is required to
select one. This could be a UFI, but in general it is application
dependent, and the "container" (whatever it is) may provide its own
solution for this. So one question is: do we really need UFIs?
What is their application domain? (I think we probably need them for
a range of applications; I am just raising the question.)
I don't agree with Roy that we can require that UTYPEs be unique
within a VOTable. We have already violated this in SSA, where for
example there can be multiple instances of an Association. (Note
that client applications probably do not require a UFI construct
to deal with this case.)
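
A small sketch of what a client might do here, assuming the fields arrive as (UTYPE, value) pairs in document order (the UTYPE "Association.ID" and the values are hypothetical examples, not quoted from the SSA document). Repeated UTYPEs are simply collected in order, with no UFI involved:

```python
from collections import defaultdict

# Hypothetical (utype, value) pairs as they might be read from one record.
fields = [
    ("Association.ID", "assoc-1"),
    ("Association.ID", "assoc-2"),
    ("Target.Name", "M31"),
]

# Keep every instance of a repeated UTYPE instead of assuming uniqueness.
by_utype = defaultdict(list)
for utype, value in fields:
    by_utype[utype].append(value)

associations = by_utype["Association.ID"]
```

A plain dictionary keyed on UTYPE would silently keep only the last Association, which is the failure mode a uniqueness requirement would paper over.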
Section 1.5 Value fields (decorated values)
I think I may have been the one who originally suggested this; it
is an issue currently in SSA/Spectrum and probably Char as well.
The problem is that we have something like Coverage.Location
and then later we want to add Errors or something, hence we
have Coverage.Location.Value, Coverage.Location.Error. This is
inconsistent and requires an interface change if we later add
this additional detail.
We see this now in SSA/Spectrum, where we have
Coverage.Location.Value
and (for example)
Coverage.Bounds.Extent
What if we later want to add an Error, Unit, etc. attribute to Extent?
Then we have
Coverage.Bounds.Extent.Value
and so forth, and the interface (UTYPE token) changes. In principle
this could happen to almost any data model element.
It is not clear if there is an elegant solution to this problem,
but perhaps the solution should be that if we reference a quantity
which may have attributes, we get the value of the quantity
by default. Then
Coverage.Location
Coverage.Location.Error
Coverage.Location.Unit
etc., and nothing changes if we come along later and add the
attributes.
One way to look at this is that Coverage.Location is not a class,
rather these are separate elements which are logically associated.
An advantage of this approach is that Error, Unit, etc. can be
specified either for an individual quantity, or globally (in a
frame, Accuracy model, etc.), without affecting the fundamental
value we are trying to represent.
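
The following sketch shows the idea under some assumptions: records are flat dictionaries keyed by UTYPE, the helper names get_value and get_attribute are invented here, and all values are placeholders. Client code written against the first version of the model keeps working after the attributes are added:

```python
# Version 1 of a record: only the quantity itself is present.
record_v1 = {"Coverage.Location": "10.68 41.27"}

# Version 2: attributes were added later; the value's UTYPE is unchanged.
record_v2 = {
    "Coverage.Location": "10.68 41.27",
    "Coverage.Location.Error": "0.01",
    "Coverage.Location.Unit": "deg",
}

def get_value(record, utype):
    """Referencing the quantity yields its value by default."""
    return record[utype]

def get_attribute(record, utype, attribute, default=None):
    """Look up an optional attribute; fall back to a global default."""
    return record.get(f"{utype}.{attribute}", default)

old = get_value(record_v1, "Coverage.Location")
new = get_value(record_v2, "Coverage.Location")
unit_v1 = get_attribute(record_v1, "Coverage.Location", "Unit", default="deg")
error_v2 = get_attribute(record_v2, "Coverage.Location", "Error")
```

The default argument stands in for an attribute specified globally (for example in a frame) rather than on the individual quantity.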
Section 1.7 VOTable serialization
It is good to see the fully qualified UTYPEs in the GROUP
construct; that is what I was arguing for earlier. This simplifies
interpretation by a client application, and allows UTYPE to be
used to directly navigate to an element of the data model.
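
As an illustration of that direct navigation, here is a sketch using Python's standard xml.etree.ElementTree on a made-up, namespace-free fragment (the fragment, the PARAM name, and the value "M31" are assumptions for the example, not copied from the note). Every element carrying a utype attribute is indexed by that string, so a model element is reached with one lookup:

```python
import xml.etree.ElementTree as ET

# A made-up VOTable-like fragment; real documents carry an XML namespace
# and more structure than this.
doc = """
<RESOURCE>
  <GROUP utype="spec:Target">
    <PARAM name="TargetName" datatype="char" arraysize="*"
           utype="spec:Target.Name" value="M31"/>
  </GROUP>
</RESOURCE>
"""

root = ET.fromstring(doc)

# Index every element that has a utype attribute by its exact string.
index = {el.get("utype"): el for el in root.iter() if el.get("utype")}

target_name = index["spec:Target.Name"].get("value")
```

If the PARAM carried only "Name", the client would instead have to walk up through the enclosing GROUPs to rebuild the full token.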