ivoa: DM type system

Thu Apr 13 17:23:43 CEST 2017

Markus, All,

>
> Well, I've put it there, and as regards pulling the ivoa data model
> out of VO-DML proper, I think there still is a point.  Keeping what
> it specifies fluid until we have actual experience how these things
> work out while fundamental things like stc and whatever we use to
> model value+error are in flux would seem wise to me.
>

Hear, hear. Shall we do it?

> I feared it would come to this.  After long contortions, we
> eventually reached agreement on using
>
>   datatype="char", arraysize="*", xtype="timestamp"
>
> in DALI, sect. 3.3.3, for these.
>

If I am not mistaken the schema says xtype is a string (more specifically
an xs:token) while datatype is restricted to a bunch of strings.

You then need three pieces of information (datatype, arraysize, and xtype)
to express that something is a timestamp, but only because the schema
mandates datatype (and, in this case, arraysize); otherwise xtype (and
maybe even ucd) would be enough for the semantics. For a lot of code out
there that doesn't parse xtype, datatype and arraysize are still enough to
figure out that the content is a string (not a number or a boolean) and to
treat it as such.
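
A minimal sketch of that fallback behaviour, in Python. The function name,
the xtype_aware switch, and the coarse labels it returns are made up for
illustration; only the attribute names and the datatype values come from
VOTable.

```python
# Hypothetical helper: classify a FIELD/PARAM from its VOTable attributes.
NUMERIC = {
    "unsignedByte", "short", "int", "long",
    "float", "double", "floatComplex", "doubleComplex",
}


def coarse_kind(attrs, xtype_aware=True):
    """Return a coarse label for the content described by attrs (a dict)."""
    datatype = attrs.get("datatype")
    # A client that understands xtype can recognise the timestamp...
    if xtype_aware and datatype == "char" and attrs.get("xtype") == "timestamp":
        return "timestamp"
    # ...while one that ignores xtype still learns it is a string.
    if datatype in ("char", "unicodeChar"):
        return "string"
    if datatype == "boolean":
        return "boolean"
    if datatype in NUMERIC:
        return "number"
    return "other"


attrs = {"datatype": "char", "arraysize": "*", "xtype": "timestamp"}
print(coarse_kind(attrs))                     # xtype-aware client
print(coarse_kind(attrs, xtype_aware=False))  # client that ignores xtype
```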

I believe there could be value in keeping the votable datatypes in the
mapping syntax for the reasons you explained, yet the existence and usage
of xtype demonstrates that you need more semantics in some cases.

In the current mapping draft (I hope we can get an announcement early next
week; I have been pushing for announcing the draft for a while) you have
two XML types that refer to FIELDs and to PARAMs. For these elements I
wouldn't mandate anything other than the vodml type (currently called
dmtype, more on this below). The datatype and maybe xtype will be in the
elements being referenced.

As mentioned, the problem is just with LITERAL. The compromise that comes
to mind is to provide LITERAL with votable-like datatype and arraysize
attributes [although, for the reasons Gerard mentioned, I would keep them
separate from the current VOTable types so we can keep the schemata
distinct; that's a second-order detail, though].

One could then use the currently defined xtypes to come up with a core of
ivoa types. The problem here is that the vodml type descriptor needs to be
a vodmlref, i.e. an ns:type kind of string, which "timestamp" and the like
are not. So timestamp would need to become ivoa:timestamp.

Another option would be to define an xtype model with a bunch of vodml
primitive types, so you would have xtype:timestamp. Again, a second-order
detail, the point being that you need a qualified name for the dmtype
attribute.
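
To make the qualified-name point concrete, here is a small Python sketch.
Both function names are hypothetical, and the check is deliberately naive
(it only looks for a non-empty prefix and name around the first colon); the
actual vodmlref rules are whatever VO-DML ends up specifying.

```python
def is_vodmlref(value):
    """Naive check for an ns:type kind of string (illustrative only)."""
    prefix, sep, name = value.partition(":")
    return bool(sep) and bool(prefix) and bool(name)


def qualify(xtype, prefix="ivoa"):
    """Turn a bare xtype such as 'timestamp' into a qualified name."""
    if is_vodmlref(xtype):
        return xtype
    return f"{prefix}:{xtype}"


print(is_vodmlref("timestamp"))      # a bare xtype is not a vodmlref
print(qualify("timestamp"))          # first option: put it in the ivoa model
print(qualify("timestamp", "xtype")) # second option: a separate xtype model
```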

Another option might be to explicitly map ivoa primitive types to xtype
definitions (I guess there are also some VODML DataTypes, e.g.
xtype="point"), e.g. ivoa:datetime -> xtype="timestamp" (are they truly the
same, though?), and to votable datatypes, e.g. ivoa:string ->
datatype="char", arraysize="*". For ivoa primitive types this could be done
in the mapping document itself; for more complex stuff like xtype="point"
it can only be done once the relevant model is standardized, e.g. STC. It's
then less clear where additional mappings like xtype="point" ->
dmtype="stc2:Coordinate.Point" (I am making this up right now) should go.
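
As a rough illustration of such an explicit mapping, in Python: the table
below holds only the two examples given above, the names
PRIMITIVE_TO_VOTABLE, votable_attrs and dmtype_for are invented for this
sketch, and the ivoa:datetime/timestamp equivalence is an assumption that
would still need to be confirmed. Complex types such as xtype="point" are
left out on purpose, since their model is not settled.

```python
# Assumed mapping from ivoa primitive types to VOTable attributes.
PRIMITIVE_TO_VOTABLE = {
    "ivoa:string": {"datatype": "char", "arraysize": "*"},
    # Assumption: ivoa:datetime and xtype="timestamp" mean the same thing.
    "ivoa:datetime": {"datatype": "char", "arraysize": "*",
                      "xtype": "timestamp"},
}


def votable_attrs(dmtype):
    """Return a copy of the VOTable attributes for a primitive dmtype."""
    return dict(PRIMITIVE_TO_VOTABLE[dmtype])


def dmtype_for(datatype, arraysize=None, xtype=None):
    """Reverse lookup: the primitive dmtype matching these attributes, or None."""
    given = {"datatype": datatype, "arraysize": arraysize, "xtype": xtype}
    wanted = {key: value for key, value in given.items() if value is not None}
    for dmtype, attrs in PRIMITIVE_TO_VOTABLE.items():
        if attrs == wanted:
            return dmtype
    return None


print(votable_attrs("ivoa:datetime"))
print(dmtype_for("char", "*"))
print(dmtype_for("char", "*", "timestamp"))
```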

> But if the price for this is that people, within one VOTable, have to
> recognise timestamps in LITERALs by seeing it's
> vodml-type="ivoa:datetime" and using one literal parser, while having
> to check xtype and use a different literal parser when it's in PARAM
> makes me cringe.
>
>
Would this be reasonably avoided with any of the above suggestions?
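
One way to picture it, as a hedged Python sketch rather than anything the
draft prescribes: resolve whatever the element carries (a dmtype on
LITERAL, an xtype on PARAM) to a single qualified type name, and keep one
literal parser per name. The table contents, the function names, and the
ivoa:datetime/timestamp equivalence are all assumptions here.

```python
from datetime import datetime

# Assumed equivalence between an xtype and an ivoa primitive type.
XTYPE_TO_DMTYPE = {"timestamp": "ivoa:datetime"}


def parse_datetime(text):
    """Parse an ISO-8601-style timestamp; a trailing 'Z' is dropped."""
    return datetime.fromisoformat(text.strip().rstrip("Z"))


# One parser per qualified type name, shared by LITERAL and PARAM handling.
PARSERS = {"ivoa:datetime": parse_datetime}


def resolve_type(dmtype=None, xtype=None):
    """Prefer an explicit dmtype; otherwise translate the xtype, if known."""
    if dmtype:
        return dmtype
    return XTYPE_TO_DMTYPE.get(xtype)


def parse_literal(text, dmtype=None, xtype=None):
    """Parse text with the parser for its type; unknown types stay strings."""
    parser = PARSERS.get(resolve_type(dmtype, xtype), str)
    return parser(text)


from_literal = parse_literal("2017-04-13T17:23:43", dmtype="ivoa:datetime")
from_param = parse_literal("2017-04-13T17:23:43Z", xtype="timestamp")
print(from_literal == from_param)
```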

> Plus, it won't stop with datetimes.
>

The thing is that it probably shouldn't, because the current votable
datatypes and xtypes are not enough anyway, and xtype already goes beyond
simple primitive types (e.g. xtype="point"). On the other hand, there can't
be too many primitive types. Yes, we could make mistakes, have too few
primitive types or we could compromise our way to self-destruction by
introducing primitive types for everything, but that seems reasonably
avoidable with the emerging framework.

> Is that a benefit proportional to forcing VOTable implementors to
> supporting a second, incompatible type system with its own rules for
> serialisation?
>
> As I said, VODML annotation will do extremely basic things in the
> future.  I don't see how any VOTable library could get away with not
> supporting it.
>

I wouldn't like to give up a better mechanism because we currently have
what you claim to be a sub-optimal one. At the same time you make some good
points about having to compromise to make life easier for implementors, and
it would be good to find a reasonable compromise. To me the compromise is
in turning the "incompatible" above into "compatible". Is that possible
without dumbing down the syntax?

>
> > Also, it was decided that the dmtype would be mandatory even if the
> > type is not being cast to a more specific type than the one declared
> > for a Role?
> >
> > Would you want to remove that requirement from the mapping?
>
> Since the requirement doesn't seem to be something that makes
> clients' lives easier, I'm all for removing it, yes.
>

It depends on the client, though. How does xtype="timestamp" make clients'
lives easier? Isn't it "just a string" for a lot of use cases? Thing is,
there are use cases where knowing that something is a timestamp, and
assuming some domain knowledge about times, is rather useful, especially in
astronomy (hello, Capt. Obvious). The same should be true for other
primitive types. Not many, but the current votable datatypes don't help
either, as you and Gerard pointed out earlier from different angles.

By the same token, there will be clients that know about specific roles,
and so they don't need the extra type information, and clients that don't
know about roles for a lot of models but they still care about the
primitive types, e.g. a datetime/timestamp. And yes, there will be clients
that won't care about datetime/timestamps and just want to know it's a
string of characters.

> Since you mention it: Why not?  Sure, an extra reference is involved,
> which is always bad, but as far as I can see there's nothing you can
> do with LITERAL that you can't do with CONSTANT, and one feature less
> is always a big win.
>
>
I have mixed feelings about this. I like the idea that the new VODML
element could represent a standalone annotation, which means you need
LITERAL.

To recap:
  1. my personal Yay! to removing the ivoa model from the current VODML PR.
  2. I believe it's important to have the extra flexibility and
standardization that goes beyond the votable datatypes, and the ivoa
primitive types represent that.
  3. I believe it's important to require the dmtype on LITERAL, for clients
that are model-unaware but know how to deal with primitives (and dmtypes go
beyond votable datatypes because of point 2.)
  4. We should make sure we keep the ivoa model close to what was already
introduced, so as to make the type systems compatible.
  5. We should carefully decide what goes and what doesn't go in the list
of primitive types. Too few and we lose semantics, too many and
interoperability is hurt. Maybe we should even constrain primitive types to
only be defined in the ivoa model.

And I think it's good to have a mechanism that goes beyond the current
limitations/complexity of VOTable. How many attributes do FIELDs and PARAMs
have in order to describe what a piece of (meta)data is? xtype, utype,
datatype, ucd, as well as the deprecated 'type', I believe. That could be
fixed in the new LITERAL element by having only one dmtype attribute, which
uniquely describes the type, and by a sane set of primitives. Some
compromises might make it a little bit more complex, but hopefully not too
much.

Omar.

> $ python -c "import this" | grep preferably
> There should be one-- and preferably only one --obvious way to do it.
>
>
Yet Python is a rather bad example of this (not as bad as Perl!), at least
as soon as you go beyond the basic syntax and start doing something useful
with it. True, they are slowly fixing this in Python 3, by adding new,
admittedly saner ways of doing the same things you could do in Python 2.
And idiomatic Python can sometimes be far from obvious for beginners. The
very existence of an "idiomatic Python" defeats the rule above, which is
one of the reasons I think there is a telling second part of the rule:
"Although that way may not be obvious at first unless you're Dutch." ;)

[Teasing mode off]

-- 
Omar Laurino
Smithsonian Astrophysical Observatory
Harvard-Smithsonian Center for Astrophysics
100 Acorn Park Dr. R-377 MS-81
02140 Cambridge, MA
(617) 495-7227