UTYPEs and UFIs

Tue Sep 18 11:16:33 PDT 2007

On Tue, 18 Sep 2007, Jonathan McDowell wrote:

> Dear DM team,
>  I have prepared a note on my view of the role and syntax of UTYPEs
> (data model field labels) and their relationship to what I call UFIs
> (ways to describe/locate uniquely a piece of information in a DM instance),
> to set things up for discussion in Cambridge.
>  The doc is
>   http://hea-www.harvard.edu/~jcm/vo/docs/utype/ut2.pdf

If we are going to attempt to formally define UTYPE I'd like to
volunteer to be involved as well.  The DAL interfaces and their
associated applications probably represent the most evolved current
usage of UTYPE other than DM, and indeed the scope of UTYPE is broader
than just DM, as UTYPE is a general structural construct which can
be used (for example) to identify interface elements as well as data
model fields.  UTYPE originated during a visit I made to
CDS several years ago, when we discussed the problem of the "VOX:"
UCDs then used in DAL.

While the usage of UTYPE is broader than data models, I think it is
fine to define UTYPE usage primarily within the DM group so long as
there is broader participation and consideration of use-cases.

It appears that UTYPE/UFI is scheduled to be discussed in the VOQL
session in Cambridge, which is parallel with DAL-2 so I will probably
have to miss the discussion - others may have the same problem;
perhaps this could be discussed in the DM session instead?

Some specific comments on the issues raised in the document follow.

	- Doug

Section 1 Introduction pg 4

    One could also mention that UTYPEs are used in the SSA data
    model and interface; this is based upon Spectrum but incorporates
    additional access-related elements as well.  SSA and Spectrum are
    an interesting test-case where two different constructs share the
    same data model elements, which has implications for how UTYPE
    is used.

Section 1.1 pg 5

    The only thing I might differ with here is the suggestion that
    "Name" (in the Spectrum.Target.Name example) can be considered
    a UTYPE.  "Name" is indeed a data model element or field with
    implied hierarchical context, but the point of UTYPE is to flatten
    a (sufficiently simple) data model into a simple set of named
    tokens, so that a wider range of representations can be used.

    In other words, my view is that UTYPEs should be fully qualified.
    Otherwise we end up having to construct UTYPEs from UTYPEs,
    which unnecessarily complicates software and is prone to error:
    if a UTYPE can be either "simple" or fully qualified the software
    has to work harder to figure out which is the case and generate
    the fully qualified UTYPE required for external use.
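    To make the flattening argument concrete, here is a minimal sketch
    (not taken from any existing implementation) that turns a nested
    model into fully qualified, dot-separated tokens.  The model
    contents (Target.Name, Target.Class, DataID.Title) are illustrative
    only.  Because every leaf carries its full path, a client never
    has to guess whether a bare "Name" needs a prefix added.

```python
from typing import Any, Dict


def flatten(model: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested data model into fully qualified UTYPE tokens."""
    flat: Dict[str, Any] = {}
    for field, value in model.items():
        utype = f"{prefix}.{field}" if prefix else field
        if isinstance(value, dict):
            # Recurse into a sub-object (e.g. Target) so that every leaf
            # is emitted with its full path, never as a bare "Name".
            flat.update(flatten(value, utype))
        else:
            flat[utype] = value
    return flat


# Illustrative model, not an excerpt of the Spectrum specification.
spectrum = {"Target": {"Name": "M31", "Class": "galaxy"},
            "DataID": {"Title": "Example"}}
print(flatten(spectrum))
# {'Target.Name': 'M31', 'Target.Class': 'galaxy', 'DataID.Title': 'Example'}
```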

    Even with fully qualified UTYPEs there is always some context,
    for example the overall container object (e.g., SSA or Spectrum),
    or a re-usable "component data model" such as Target.  This gets
    into the issue of namespacing and scope discussed in 1.2.

Section 1.2 pg 5

    For simplicity I suggest that the overall "container" (e.g.,
    Spectrum or SSA) normally define the components that it uses, such
    as Target.  Versioning is then controlled by the container object.
    If a "component" gets big enough we might split it off as a
    stand-alone, top-level data model.

    The document states that either Spectrum.Target.Name or
    spec:Target.Name could be used to represent namespaces.
    I suggest the latter as it is easier to separate the namespace
    from the UTYPE.  That is, if we create a Spectrum object for
    which the default namespace is "spec", the UTYPE within this
    namespace is Target.Name.  In an SSA namespace, we can also
    have Target.Name.  So long as we synchronize the two models,
    this simplifies applications which use both types of objects.

    (Note that the current Spectrum model differs from either of
    these and uses "spec:Spectrum.Target.Name" - the "Spectrum." is
    redundant).

    Question: Top level scope.
	I would argue that Target.Name is also fully qualified, within
	the scope (namespace) defined by Spectrum (or SSA etc.).  Hence
	for example, "spec:Target.Name" is fully qualified.
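    A minimal sketch of why the "prefix:Local.Path" form is convenient:
    the namespace separates from the UTYPE with a single split, and the
    local parts of spec:Target.Name and ssa:Target.Name compare equal.
    The function names and the default-namespace fallback are
    assumptions for illustration, and treating equal local parts as the
    same element presumes the two models are kept in sync, as suggested
    above.

```python
from typing import Optional, Tuple


def split_utype(utype: str,
                default_ns: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Split 'prefix:Local.Path' into (prefix, 'Local.Path')."""
    prefix, sep, local = utype.partition(":")
    if not sep:
        # No prefix: assume the container's default namespace applies.
        return default_ns, utype
    return prefix, local


def same_element(a: str, b: str) -> bool:
    """True if two UTYPEs share a local part (assumes synchronized models)."""
    return split_utype(a)[1] == split_utype(b)[1]


print(split_utype("spec:Target.Name"))     # ('spec', 'Target.Name')
print(split_utype("Target.Name", "spec"))  # ('spec', 'Target.Name')
print(same_element("spec:Target.Name", "ssa:Target.Name"))  # True
```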

Section 1.3 Combining data models

    I agree that this needs more thought.  My inclination would be to
    argue against constructs such as 

	Coordinate.Resolution.PosAngle.Value;RedshiftFrame.CustomRefPos

    (note the ";") which complicate the simple UTYPE syntax, and suggest
    that string equivalence may no longer apply and that semantic parsing
    is required.  If we start to do this sort of thing we are losing
    the point of UTYPE and starting to reinvent more general mechanisms.
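    One way to see the cost: a simple UTYPE can be checked (and
    matched) with a trivial pattern and plain string equality, while a
    ";" construct falls outside any such pattern and needs its own
    parser.  The grammar below (optional "prefix:", then dot-separated
    identifiers) is a hypothetical sketch, not an agreed definition.

```python
import re

# Hypothetical grammar: optional "prefix:" followed by dot-separated identifiers.
SIMPLE_UTYPE = re.compile(r"(?:[A-Za-z]\w*:)?[A-Za-z]\w*(?:\.[A-Za-z]\w*)*")


def is_simple_utype(token: str) -> bool:
    """True if the whole token matches the simple dotted-name grammar."""
    return SIMPLE_UTYPE.fullmatch(token) is not None


print(is_simple_utype("spec:Target.Name"))  # True
print(is_simple_utype(
    "Coordinate.Resolution.PosAngle.Value;RedshiftFrame.CustomRefPos"))  # False
```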

    The solution is probably either to simplify Characterization (it is
    probably using STC in an awkward and overly general fashion; e.g.,
    factor out the units etc. and fix them for the model); or if things
    really need to be this complex, separate the data models.

Section 1.4 UFIs

    Agreed that multiple instances of a data model within the same 
    container or name space are required; we have to deal with this.

    If there are multiple instances of the same data model, the instances
    share the same UTYPEs, so some higher level mechanism is required to
    select one.  This could be a UFI, but in general it is application
    dependent, and the "container" (whatever it is) may provide its own
    solution for this.  So one question is do we really need UFIs?
    What is their application domain?  (I think we probably need them for
    a range of applications, I am just raising the question).

    I don't agree with Roy that we can require that UTYPEs be unique
    within a VOTable.  We have already violated this in SSA, where for
    example there can be multiple instances of an Association.  (Note
    that client applications probably do not require a UFI construct
    to deal with this case.)
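    A minimal sketch of the multiple-instance case, assuming a
    hypothetical container that holds each instance as a dict of
    UTYPE to value.  The UTYPE is the same in every instance, so a
    lookup by UTYPE alone returns several values; picking one needs a
    container-level selector (here just a list index), which is the
    role a UFI or an application-specific mechanism would fill.  The
    ssa:Association.ID values are made up for illustration.

```python
from typing import Dict, List

# Hypothetical container: one dict of UTYPE -> value per instance.
Instance = Dict[str, str]


def lookup(instances: List[Instance], utype: str) -> List[str]:
    """Return the value of `utype` from every instance that defines it."""
    return [inst[utype] for inst in instances if utype in inst]


def lookup_one(instances: List[Instance], utype: str, index: int) -> str:
    """Select one instance by position, then read the UTYPE from it."""
    return instances[index][utype]


associations = [{"ssa:Association.ID": "assoc-1"},
                {"ssa:Association.ID": "assoc-2"}]
print(lookup(associations, "ssa:Association.ID"))         # ['assoc-1', 'assoc-2']
print(lookup_one(associations, "ssa:Association.ID", 1))  # assoc-2
```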

Section 1.5 Value fields (decorated values)

    I think I may have been the one who originally suggested this; it
    is an issue currently in SSA/Spectrum and probably Char as well.

    The problem is that we have something like Coverage.Location
    and later we want to add an Error or some other attribute, hence we
    have Coverage.Location.Value, Coverage.Location.Error.  This is
    inconsistent and requires an interface change if we later add
    this additional detail.

    We see this now in SSA/Spectrum, where we have

	Coverage.Location.Value

    and (for example)

	Coverage.Bounds.Extent

    what if later we want to add an Error, Unit, etc. attribute to Extent?
    Then we have

	Coverage.Bounds.Extent.Value

    and so forth, and the interface (UTYPE token) changes.  In principle
    this could happen to almost any data model element.

    It is not clear if there is an elegant solution to this problem,
    but perhaps the solution should be that if we reference a quantity
    which may have attributes, we get the value of the quantity
    by default.  Then

	Coverage.Location
	Coverage.Location.Error
	Coverage.Location.Unit

    etc., and nothing changes if we come along later and add the 
    attributes. 

    One way to look at this is that Coverage.Location is not a class,
    rather these are separate elements which are logically associated.
    An advantage of this approach is that Error, Unit, etc. can be
    specified either for an individual quantity, or globally (in a
    frame, Accuracy model, etc.), without affecting the fundamental
    value we are trying to represent.
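    The "value by default" rule can be sketched with a flat store in
    which the bare UTYPE holds the value and attributes are sibling
    tokens, with a fallback to global defaults (standing in for a
    frame or Accuracy model).  The store layout, the `get` helper, and
    the numbers are all hypothetical.  Adding Coverage.Location.Error
    later does not change what Coverage.Location returns.

```python
from typing import Any, Dict, Optional

# Hypothetical per-quantity store: bare UTYPE -> value, "UTYPE.Attr" -> attribute.
record: Dict[str, Any] = {
    "Coverage.Location": 5500.0,
    "Coverage.Location.Unit": "Angstrom",
}
# Hypothetical global defaults, e.g. supplied by a frame or Accuracy model.
defaults: Dict[str, Any] = {"Error": 0.5}


def get(utype: str, attribute: Optional[str] = None) -> Any:
    """Bare reference yields the value; attributes fall back to defaults."""
    if attribute is None:
        return record[utype]
    key = f"{utype}.{attribute}"
    if key in record:
        return record[key]  # attribute given for this quantity
    return defaults.get(attribute)  # attribute given globally, or None


print(get("Coverage.Location"))           # 5500.0
print(get("Coverage.Location", "Unit"))   # Angstrom
print(get("Coverage.Location", "Error"))  # 0.5
```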

Section 1.7 VOTable serialization

    It is good to see the fully qualified UTYPEs in the GROUP
    construct; that is what I was arguing for earlier.  This simplifies
    interpretation by a client application, and allows UTYPE to be
    used to directly navigate to an element of the data model.
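    As a sketch of direct navigation, the following reads PARAMs
    inside a GROUP into a map keyed by fully qualified UTYPE, using
    only the standard library.  The fragment and its values are
    illustrative; comparing the local tag name lets the same code
    accept a namespaced VOTable.  With fully qualified UTYPEs no
    context from the enclosing GROUP is needed to build the key.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment; a real VOTable would carry the VOTable namespace.
VOTABLE_FRAGMENT = """
<RESOURCE>
  <GROUP utype="spec:Target">
    <PARAM name="TargetName" datatype="char" arraysize="*"
           utype="spec:Target.Name" value="M31"/>
    <PARAM name="TargetClass" datatype="char" arraysize="*"
           utype="spec:Target.Class" value="galaxy"/>
  </GROUP>
</RESOURCE>
"""


def params_by_utype(xml_text: str) -> dict:
    """Map each PARAM's utype attribute to its value attribute."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for elem in root.iter():
        # Strip any "{namespace}" prefix so namespaced documents also work.
        if elem.tag.rsplit("}", 1)[-1] == "PARAM" and "utype" in elem.attrib:
            result[elem.attrib["utype"]] = elem.attrib.get("value")
    return result


print(params_by_utype(VOTABLE_FRAGMENT)["spec:Target.Name"])  # M31
```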