content, format, ctype, or xtype ?

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Tue May 12 09:07:54 PDT 2009


On Tue, May 12, 2009 at 07:42:06AM -0700, Rob Seaman wrote:
> On May 12, 2009, at 7:16 AM, Markus Demleitner wrote:
>
>> I'd be still much happier if there could be a separation of
>> concerns like my understanding so far:
>>
>> (a) ucd: What's in the column?
>> (b) unit: What's the unit of the values?
>> (c) utype: What's the role of this column in a defined data model?
>> (d) content, ctype, representation, whatever: What representation for a
>>    value has been chosen?
>
> - My immediate visceral reaction to all this metadata is that there are 
> numerous use cases in which practicing astronomers are encouraged to 
> manipulate their own data holdings into VO compliant formats. We are 
> likely all skeptical that they will bother with even a fraction of this.
-- which is fair.  I'm sure, though, that all of them will see the
need for unit.

The other pieces of metadata are IMHO primarily interesting when
machines exchange data, and they concern data people, not astronomers
directly.  Astronomers should ideally just see that some things
magically work.  This is particularly true for utype, but quite
certainly also for UCD.

ctype, of course, is a different issue again.  However, few astronomers
will want to embed STC-S strings into their tables, so for the use
cases we have now, that would leave datetimes.  Maybe that's a sign
we want datetime as a VOTable type after all?  If not, I think it
shouldn't be hard to tell astronomers they have to say what kind of
time they are storing (or applications like Topcat can make sensible
choices, possibly domain-specific, for them).
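To illustrate why the kind of time has to be declared somewhere, here
is a minimal Python sketch.  The function name to_datetime and the
labels "iso8601", "mjd" and "jd" are made up for this example; they
are not values defined by VOTable or any other VO standard.  Time
scales (UTC vs. TT and friends) are ignored entirely.

```python
from datetime import datetime, timedelta

# MJD 0 corresponds to 1858-11-17 00:00; JD = MJD + 2400000.5.
MJD_EPOCH = datetime(1858, 11, 17)


def to_datetime(value, representation):
    """Interpret a raw table value according to a declared representation.

    The representation labels are hypothetical and chosen for this
    sketch only.
    """
    if representation == "iso8601":
        return datetime.fromisoformat(value)
    if representation == "mjd":
        return MJD_EPOCH + timedelta(days=float(value))
    if representation == "jd":
        return MJD_EPOCH + timedelta(days=float(value) - 2400000.5)
    raise ValueError("unknown representation: %r" % (representation,))
```

The same instant then comes out of three different raw values, and
without the declaration a bare number like 51544.5 could not be told
apart from a JD or from a plain float that happens to be in a table.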

> - The more left-brain response is that perhaps we aren't normalizing the 
> data structures properly.  Shouldn't some of this complexity be built into 
> the pertaining data model(s), for instance?
Well, "real" metadata on physical data is some orders of magnitude
more complicated than even these puny four (or three) values.
Agreed, it's not immediately obvious that we got the compromise
between "no" and "full" metadata right; but I think we're not very
far from an optimum between usefulness and relative ease of
implementation.

And I'd warn against referring to data models.  Much interesting data
will not have a data model (at any given point in time), and data
modelling is hard, time consuming, and rife with opportunities for
broken friendships.

VOTables must IMHO be useful without data models, but be able to make
use of them when they are available.  This is what I believe utypes
are about.  Again, ctype is a different matter.

> - Or are we sidestepping a responsibility to choose appropriate defaults?  
> Units can be forced into narrow options, e.g., RA can be required to be 
> either the decimal degrees or sexagesimal hours that correspond to 99.9% 
Well, you'd have my vote for a canonical representation of datetimes
any minute, and coordinates as well.  But I doubt you'll get all the
interested parties to agree on just *what* the canonical
representation would be.

Plus: How would you tell your two representations apart ("if it's a
string, it's sexagesimal, if it's numeric, it's decimal degrees" has
an eerie air about it...)
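To make the eerie part concrete, here is a minimal Python sketch of
that type-based rule.  The function ra_to_degrees is hypothetical and
only implements the guess quoted above; it is not taken from any VO
library.

```python
def ra_to_degrees(value):
    """Guess the RA representation from the value's type alone.

    Hypothetical rule: numbers are decimal degrees, strings are
    sexagesimal hours written as "hh:mm:ss.s" or "hh mm ss.s".
    """
    if isinstance(value, (int, float)):
        # Numeric: assumed to be decimal degrees already.
        return float(value)
    # String: assumed to be sexagesimal hours; 1 hour = 15 degrees.
    parts = value.replace(":", " ").split()
    hours, minutes, seconds = (float(p) for p in parts)
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)
```

This works for 180.0 and for "12:00:00", but a serialisation that
hands over decimal degrees as the string "180.0" makes it fail with a
ValueError, and sexagesimal degrees in a string would silently be
scaled by 15.  The type of a value is a poor stand-in for a declared
representation.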

> - And it seems like we have yet to even clarify what (a) means.  What  
> does "What's in the column?" mean, precisely?  Does anybody have the URL 
> to Tom McGlynn's screed on UCDs?
Well, the UCD specification gives you an idea what they currently can
express.  This, I would maintain, is the current VO answer to
what (a) means.  I'll give you that UCDs are severely restricted, and
I'd hope that at some point (possibly even before the Mars colony)
we'll come up with a "better" answer for the meaning of (a).  But for
now, I feel there are many interesting things we can do with UCDs
that we aren't doing yet.

Cheers,

          Markus


