Who chooses? (was Re: content, format, ctype, or xtype ?)
Mark Taylor
m.b.taylor at bristol.ac.uk
Wed May 13 04:10:16 PDT 2009
Sorry all, I hit the wrong button. Please ignore my previous incomplete
message. I'll send a completed one shortly.
On Wed, 13 May 2009, Mark Taylor wrote:
> On Tue, 12 May 2009, Rob Seaman wrote:
>
> > I'm snow-blind with the blizzard of messages today. Returning to Mark's use
> > case from yesterday...
> >
> > On May 11, 2009, at 5:48 AM, Mark Taylor wrote:
> >
> > > The use case which I have in mind (and I think Doug is thinking along
> > > similar lines) is this: a user acquires a VOTable from some source -
> > > perhaps TAP, perhaps not. It contains a column X whose contents
> > > is a string in iso-8601 format - this is perhaps identified by
> > > utype with part of the STC data model, or with some other data model,
> > > or perhaps is not. The user loads the table into TOPCAT
> > > (or some other generic table handling software) and wants to make a
> > > plot with column X as one of the axes.
> > >
> > > As far as TOPCAT can tell, the column contains a string, and so it is
> > > unable to make a plot with it, or otherwise do anything much apart
> > > from display the string contents. If it understood that the column
> > > contained a string with the semantics of an iso-8601 date/time,
> > > it could make this plot. Yes it may be possible to glean this
> > > information by inspecting the utype, but in order to do that it needs
> > > to have an understanding of the data model in question - a lot of work
> > > for the developer, and needs to be updated every time a new data model
> > > appears or is modified. Moreover, the additional, probably rather
> > > detailed, information supplied by the utype is not relevant for this
> > > kind of processing.
> > >
> > > You can think of similar stories for 'ctype' (or whatever) values of
> > > stc-s, stc-x, sexagesimal, and other possibilities of your own device,
> > > including domain-specific ones. It should not be necessary to invent
> > > a data model in order to flag this kind of thing, partly for practical
> > > reasons (you need to reach agreement about a data model and update
> > > software each time), and partly because use of a data model is orthogonal
> > > to this issue.
> >
> > ...and subtracting out all the high-falutin' computer science issues from
> > today, we see that this is simply a question of whether to flag some value.
> > Whether the value is flagged or not, if TOPCAT is to do what Mark's users
> > want, then TOPCAT has to be able to parse ISO-8601 datetime strings or
> > sexagesimal strings or stc-s strings. These parsing methods must all be in
> > place; the question is how to trigger them and who decides when to do so.
> >
> > Requiring an explicit metadata flag (whether expressed as a UCD, utype, ctype,
> > xtype, unit or whatever) implies that the data provider (or her minion
> > programmers) should be the one selecting how an application like TOPCAT
> > chooses to interpret different values. This, I think, is the real underlying
> > issue. Rather, might it not be asserted that TOPCAT is a power tool belonging
> > to the user?
> >
> > With a method to parse sexagesimal values - a method that is required in any
> > event - isn't it trivial for TOPCAT to activate user controlled plotting
> > capabilities for such string valued columns?
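To make the point about parsing methods concrete, here is a minimal sketch of the two converters a plotting tool would need before a string column can go on an axis. It is an illustration only, not TOPCAT code: the function names parse_sexagesimal and parse_iso8601 are hypothetical, the sexagesimal parser assumes colon- or space-separated fields, and Python's datetime.fromisoformat accepts only a subset of ISO-8601. Strings with no timezone are assumed here to be UTC.

```python
from datetime import datetime, timezone


def parse_sexagesimal(text):
    """Convert 'dd:mm:ss.s' or 'hh mm ss.s' to a decimal value in the leading unit.

    Hypothetical helper: accepts one to three colon- or space-separated fields.
    """
    text = text.strip()
    # Take the sign from the string itself so that '-0:30:00' stays negative.
    sign = -1.0 if text.startswith("-") else 1.0
    parts = text.lstrip("+-").replace(":", " ").split()
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"not a sexagesimal value: {text!r}")
    value = 0.0
    for i, part in enumerate(parts):
        value += float(part) / 60 ** i
    return sign * value


def parse_iso8601(text):
    """Convert an ISO-8601 string to seconds since 1970-01-01T00:00:00 UTC.

    Assumption: strings without a timezone offset are treated as UTC.
    """
    dt = datetime.fromisoformat(text.strip())
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


if __name__ == "__main__":
    print(parse_sexagesimal("-12:30:00"))
    print(parse_iso8601("2009-05-13T04:10:16"))
```

Either function turns the string into a plain number that can be plotted. What remains open is what triggers the conversion: a metadata flag set by the data provider, or a choice made by the user.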
>
> Rob,
>
> you're right, you could do it like this. It's really a matter of
> convenience.
>
> At one end of the scale you can have a data format
> like CSV (no data type declared) and it's up to either the user,
> or the application to make sense of each value. If the user has
> to mark values explicitly as numeric, or double precision, or
> iso-8601 or whatever, it's fiddly for them, they have to read
> documentation, they may have to have a clue what iso-8601 means, ...
> If the application does it there may be performance implications.
> In either case, the wrong decision might get made.
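As a rough sketch of the CSV end of the scale, the following shows an application guessing a column's type from its string values, with an optional declaration from the user taking precedence. The function name guess_column_type and the type labels "number", "iso8601" and "string" are made up for illustration, and datetime.fromisoformat covers only part of ISO-8601.

```python
from datetime import datetime


def _is_float(s):
    try:
        float(s)
        return True
    except ValueError:
        return False


def _is_iso8601(s):
    try:
        datetime.fromisoformat(s)
        return True
    except ValueError:
        return False


def guess_column_type(values, declared=None):
    """Return 'number', 'iso8601' or 'string' for a column of CSV cell strings.

    Hypothetical helper: if the user has declared a type it wins outright;
    otherwise every non-blank cell is tested, which costs a full pass over
    the column.
    """
    if declared is not None:
        return declared
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return "string"
    for name, test in (("number", _is_float), ("iso8601", _is_iso8601)):
        if all(test(c) for c in cells):
            return name
    return "string"


if __name__ == "__main__":
    print(guess_column_type(["1.5", "2", "-3e4"]))
    print(guess_column_type(["2009-05-13T04:10:16", "2009-05-12"]))
    print(guess_column_type(["2009", "2010"]))
```

The last call shows how the wrong decision can get made: a column of years is guessed as "number", which may or may not be what the user meant. The scan over every cell is where the performance cost comes from.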
>
> At the other end of the scale you have a data format featuring a
> semantic system (utypes, UCDs, units, plus maybe a load of other
> magic) so sophisticated that the application can make decisions
> on behalf of the astronomer about, say, how to perform a crossmatch
> between two tables.
>
> Of course where we want to be is somewhere between the two, and
> the question is exactly where. In my opinion where it's feasible
>
>
> the user could be required
> to declare
>
> a user tool could use CSV
> tables (no data type declared) and the user could be required to
> declare before use for each column whether it's a number, or a
> string and/or represents an angle, or a time,
>
> We could have CSV tables (no data type declared) and
> ask the user to mark each column before use as numeric
> >
> > Rob
> >
>
>
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/