content, format, ctype, or xtype?

Mon May 4 11:14:09 PDT 2009

Thanks Pat, this helps clarify the issue.  This is a long email going
into the UTYPE issue in detail, but the conclusion is that *if* we just
want a simple way to specify the format of a votable field we probably
do want to define some new attribute.  The UTYPE issue itself is largely
independent of TAP, being part of the content TAP provides access to.

On Mon, 4 May 2009, Patrick Dowler wrote:
> 
> This is not intended to be a metadata concept at all, just an
> indication of the format a value is expressed in.
> 
> As far as I am aware, a FIELD has one utype attribute and a
> service with some data model will need to use that to describe its
> content. That model could have multiple dates in it, or multiple
> positions: just specifying the STC utype is not sufficient to describe
> this.

In principle this is not necessarily a problem.  If the UTYPE of
a field is specified by a data model then the data model should
specify the allowable representation, units, etc. for the field
(or alternatively specify this as data elsewhere in the model).
If the field is not part of a data model, then UTYPE is available and
could be used to name a "data model" such as ISO 8601 (i.e., the value
is an ISO 8601 date string).

The likely issue, though, is that an application might want to know
something about the format of a field without having to understand a
complex external data model.  Hence it could be useful to have some
way to specify the format or representation of a field separately
from its usage in a data model.

> As it stands, I don't think it is good for the TAP spec to specify
> a usage for the utype attributes or somehow require a specific usage
> to express low level metadata... but this has already been asked and
> I think not answered:

I agree we should probably avoid most UTYPE issues for data tables,
at least for the present.  If a data table contains a data model then
the model should set the UTYPEs, but this is not something TAP needs
to be concerned with.

An exception is the TAP_SCHEMA, which is a data model and which also
provides a means to specify UTYPE information for the data tables
it describes.  Since the TAP_SCHEMA is itself a formal data model
the fields should have assigned UTYPEs.  These may be trivial,
e.g., directly corresponding to the field names which the schema
also defines.
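To make the "trivial UTYPEs" idea concrete, here is a minimal Python
sketch that derives a UTYPE for each TAP_SCHEMA column directly from
its table and column names.  The "tap_schema:" prefix, the naming rule,
and the abbreviated column lists are all assumptions for illustration,
not anything TAP has defined.

```python
# Hypothetical naming rule: a TAP_SCHEMA column's UTYPE simply mirrors its
# table and column names. The "tap_schema" prefix is an assumption, and the
# column lists are abbreviated and illustrative only.
TAP_SCHEMA = {
    "tables": ["schema_name", "table_name", "description", "utype"],
    "columns": ["table_name", "column_name", "description", "unit", "ucd", "utype"],
}


def trivial_utypes(schema, prefix="tap_schema"):
    """Map each (table, column) pair to a UTYPE built from the names alone."""
    return {
        (table, column): f"{prefix}:{table}.{column}"
        for table, columns in schema.items()
        for column in columns
    }


utypes = trivial_utypes(TAP_SCHEMA)
print(utypes[("columns", "column_name")])  # tap_schema:columns.column_name
```

The point is only that such UTYPEs carry no information beyond what the
schema already defines, so assigning them costs essentially nothing.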

> Sure, we could jam both of those in there, but it would be ugly:
> 
> utype='simdb:Simulation.ExecutionTime@stc:AstroCoords.Time.TimeInstant.ISOTime'
> 
> where @ is some arbitrary separator chosen to cause the least pain
> and suffering. Ugh.

I realize this is not being seriously suggested, but this would not be
a legal UTYPE, at least not according to current practice.  UTYPEs are
supposed to be simple fixed tags used to identify data model elements
flexibly in a variety of different kinds of software.  They are not
parsed (except by human eyes); rather, simple case-insensitive string
equality is used to compare UTYPEs.
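As a minimal Python sketch of what "not parsed" means in practice:
software compares the whole string, ignoring case, and never splits it
on ':' or '.'.  The helper name utypes_match is hypothetical; the two
UTYPE strings are the ones from Pat's message.

```python
def utypes_match(a: str, b: str) -> bool:
    """Compare two UTYPEs as opaque tags: case-insensitive string equality.

    Nothing is split on ':' or '.'; the whole string is the identifier.
    """
    return a.casefold() == b.casefold()


exec_time = "simdb:Simulation.ExecutionTime"
compound = "simdb:Simulation.ExecutionTime@stc:AstroCoords.Time.TimeInstant.ISOTime"

print(utypes_match(exec_time, "SIMDB:simulation.executiontime"))  # True
print(utypes_match(exec_time, compound))                           # False
```

So a compound tag like the one above would simply fail to match either
of the UTYPEs it was built from, which is why it would not work.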

> There is a fundamental question here: how does one "use" a data
> model inside another data model? We don't know the answer to that,
> but I am also pretty sure TAP is not the place to answer it. However,
> since we don't know the answer, we don't even know if it is safe to
> defer to some other standard.

While I agree this is still a topic for analysis, SSA and DAL2 have
the concept of "component data models", which can be reused in more
complex data models (DataID, for example, is one such).  The intention
is that a component data model can be embedded within a larger model
but still be referenced separately as a component.  Elements of the
component are assigned full UTYPEs by the larger model (i.e., more
elements to the left and a different namespace).  But the component
can nonetheless be referenced as a separate object, starting at the
root element of the component.  In this case the UTYPE defined by the
component model would be used to reference the same field.  Hence the
UTYPE is determined by context, and UTYPEs remain simple fixed strings
rather than complex, parsed referential expressions.
Within a VOTable, the GROUP construct is used to group the fields
of a component data model.
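A rough Python sketch of context-dependent lookup follows.  It assumes,
purely for illustration, that the FIELD carries the UTYPE assigned by
the larger model while the FIELDref inside the component's GROUP
carries the component model's own UTYPE.  Where the component UTYPE is
actually written is not settled above, and the "bigmodel:" and
"compmodel:" strings are invented.  Either way, lookup is still plain
string matching within a context, with no parsing of the UTYPE itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical VOTable fragment (namespaces omitted). All UTYPE strings are
# invented; placing the component UTYPE on FIELDref is an assumption.
VOTABLE = """
<TABLE>
  <FIELD ID="title" name="title" datatype="char" arraysize="*"
         utype="bigmodel:Dataset.DataID.Title"/>
  <FIELD ID="flux" name="flux" datatype="double"/>
  <GROUP utype="compmodel:DataID">
    <FIELDref ref="title" utype="compmodel:DataID.Title"/>
  </GROUP>
</TABLE>
"""


def same(a, b):
    # UTYPEs are opaque tags: case-insensitive equality, never parsed.
    return a is not None and a.lower() == b.lower()


def find_field(table, utype, component=None):
    """Return the ID of the FIELD identified by `utype` in the given context."""
    if component is None:
        # Context is the larger model: match the FIELD's own utype.
        for field in table.iter("FIELD"):
            if same(field.get("utype"), utype):
                return field.get("ID")
        return None
    # Context is a component: enter its GROUP and match the FIELDref utypes.
    for group in table.iter("GROUP"):
        if same(group.get("utype"), component):
            for ref in group.iter("FIELDref"):
                if same(ref.get("utype"), utype):
                    return ref.get("ref")
    return None


table = ET.fromstring(VOTABLE.strip())
print(find_field(table, "bigmodel:Dataset.DataID.Title"))               # title
print(find_field(table, "compmodel:DataID.Title", "compmodel:DataID"))  # title
```

Both lookups reach the same field, but each uses the fixed string that
is meaningful in its own context.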

> Even then, suppose there is some way to combine all the above
> metadata. It *still* does not specify the format of the content. It
> could still be STC-S or STC-X (elements from the stc namespace)
> or some other format. So I still think we need a format/content
> attribute for FIELD (and PARAM) so applications can grok the content
> (or happily not do so if they are not capable).

This is the main argument for adding such an attribute.  If we want a simple way to
specify the format of a field, without having to understand a complex
external data model, then a separate mechanism is required.
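A minimal Python sketch of how a client might use such an attribute:
decode the value when the declared format is understood and otherwise
pass the raw string through untouched.  The attribute name "format" is
just one of the candidates in the subject line, and the values
"iso8601" and "stc-s" are assumptions.  Python's
datetime.fromisoformat also covers only a subset of ISO 8601.

```python
from datetime import datetime

# Hypothetical registry of formats this client understands. The keys are
# assumed values of a not-yet-defined FIELD attribute such as "format".
PARSERS = {
    "iso8601": datetime.fromisoformat,
}


def decode(value, fmt):
    """Decode value if its declared format is understood, else return it unchanged."""
    parser = PARSERS.get((fmt or "").lower())
    if parser is None:
        # Unknown or missing format: leave the raw string alone.
        return value
    return parser(value)


print(decode("2009-05-04T11:14:09", "iso8601"))  # 2009-05-04 11:14:09
print(decode("Circle ICRS 10 20 1", "stc-s"))    # Circle ICRS 10 20 1
```

No data model knowledge is needed here; the client either understands
the declared format or it doesn't.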

 	- Doug