UType proposals
Norman Gray
norman at astro.gla.ac.uk
Fri Jun 12 10:30:22 PDT 2009
Gerard, hello.
On 2009 Jun 12, at 15:31, Gerard wrote:
>>> The latter allows one to find the element represented by
>> the string in
>>> the model by parsing the string.
>>
>> I'm nervous of 'parsing the string'.
>
> I meant really only parsing as in "reading", or what you call
> "informally
> parsing by humans".
Ah right -- I'm with you there.
>>> So for the DM group to propose a rule for deriving Utype-s from the
>>> data model that does not require a separate lookup, i.e. is
>> parsable
>>> by humans I think is not a bad thing as it promotes homogeneity and
>>> readability between different modelling efforts.
>>
>> The recipe that Mireille proposes is informally parseable by humans,
>> and that has substantial mnemonic value if it is used
>> homogeneously.
>> It's just that turning that into a formally parseable thing
>> would have substantial costs with few benefits.
>>
> So are you suggesting that it might be ok for the Utype document to
> RECOMMEND, or SUGGEST
> that data modellers use these rules? Or should the document get rid
> of these
> rules altogether,
> in which case their "substantial mnemonic value" would be lost as
> well.
I was thinking of SHOULD in the RFC 2119 sense: "This word, or the
adjective "RECOMMENDED", mean that there may exist valid reasons in
particular circumstances to ignore a particular item, but the full
implications must be understood and carefully weighed before choosing
a different course."
>> <param id='foo'
>> utype='http://www.ivoa.net/dm/simdb/v1.0#Simulated.Foo'>
>> xxx
>> </param>
>>
>> or
>> <VOTABLE xmlns:simdb='http://www.ivoa.net/dm/simdb/v1.0#'>
>> ...
>> <param id='foo' utype='simdb:Simulated.Foo'>
>> xxx
>> </param>
>> </VOTABLE>
>>
[...]
>> nt way of representing this, if that were
>> deemed necessary. That doesn't matter, because it's the full
>> URI -- possibly after some string concatenation -- which the
>> resulting application would be required to recognise.
>>
> I was assuming that you meant for utypes to be something like your
> first
> case.
> I was referring to the second usage, including the explicit
> xmlns:simdb=...
I did mean that the UTypes should be the full URI as in the first
case, but that the difference between this and the second case is
merely a matter of syntax -- the definition of the serialisation.
That is, when processing the second case, an application's first step
would be to concatenate the two bits of information into a single URI
string -- this namespace information is readily available when
processing a SAX stream or within an XSLT template. Or rather -- as
is the usual way of specifying these things -- it should act _as if_
it had done that, since it might well be more efficient or
straightforward in a particular case to do something more direct. As
before, there would be a different normalisation step in the case of a
FITS serialisation (I emphasise this in order to emphasise that there
is nothing here which is fundamentally coupled to 'XML namespaces').
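A minimal sketch of that normalisation step, assuming Python's standard xml.etree.ElementTree module. The function name full_utypes is hypothetical, and the two sample documents are trimmed copies of the two cases quoted above. It ignores prefix scoping (a declaration is never forgotten once seen), which a real reader would need to handle.

```python
import io
import xml.etree.ElementTree as ET


def full_utypes(xml_text):
    """Map each element id to its utype, expanded to a full URI string.

    Hypothetical helper: 'prefix:local' values are expanded using the
    namespace declarations seen so far; anything else is left untouched.
    """
    prefixes = {}  # prefix -> namespace URI, filled from 'start-ns' events
    found = {}
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(stream, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
            continue
        utype = payload.get("utype")
        if utype is None:
            continue
        prefix, sep, local = utype.partition(":")
        if sep and prefix in prefixes:
            # Plain string concatenation, as described above.
            utype = prefixes[prefix] + local
        found[payload.get("id")] = utype
    return found


FIRST = ("<param id='foo' "
         "utype='http://www.ivoa.net/dm/simdb/v1.0#Simulated.Foo'>xxx</param>")
SECOND = ("<VOTABLE xmlns:simdb='http://www.ivoa.net/dm/simdb/v1.0#'>"
          "<param id='foo' utype='simdb:Simulated.Foo'>xxx</param>"
          "</VOTABLE>")

# Both serialisations should yield the same URI for 'foo'.
print(full_utypes(FIRST) == full_utypes(SECOND))
```

The application then only ever compares full URI strings, so a FITS reader could feed the same comparison from a different normalisation routine.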
> What I gathered from the discussion about the new 'xtype' attribute
> in the
> VOTable session seemed to indicate
> that no such xmlns declaration was desired in a case that has
> similarities
> to what we discuss here.
I confess I didn't follow all of the xtype discussion.
>>> Finally, I think one thing that Mireille's note does not
>> make clear is
>>> that to be able to have a rule deriving parsable Utypes from a data
>>> model such as the one used in SimDB, one must have defined the
>>> syntactic elements for expressing one's data model. In SimDB we do
>>> this explicitly and we have proposed a similar approach to the DM
>>> group. Once one has that one may also have hope of creating
>> instances
>>> for a specified Utype.
>>
>> I don't think I follow you. By 'syntactic elements' do you
>> mean some parseable syntax for UTypes?
>>
>
> No. What I meant was that it seems to me that if one wants to
> associate
> Utypes in a meaningful way to a data model, one needs to understand
> what
> kind of data model construct they may refer to/correspond to.
> The BNF-like syntax for utype-s assumes implicitly the existence of
> certain
> data model concepts (the "syntactic elements"): Class, Model,
> Package etc.
> Utypes may refer to any of these.
I see what you mean. I think there are some who would disagree
emphatically with you here, and assert that UTypes can only describe
things which have literal values. That's what I take from the
emphasis on the use-case of reconstructing an instance of a model from
a set of key-value pairs.
Myself, I agree with you, that it would be useful to associate
'UTypes' with each of the Classes, Models, and Packages in a data
model. Given a UML data model (or an XSchema data model, or whatever
modelling framework you prefer), it would be straightforward to
develop a simple ontology which reflected it, and RDFS or OWL would be
the languages to do that in. However I want to keep this fuller use-
case separate from the proposal for Utypes-as-URIs, in order to keep
their distinct advantages distinct, and to avoid too much talking at
cross-purposes.
> Currently the DM group does not have an agreed upon language in
> which to
> express data models.
> But in particular when you want machines to do something with utypes
> they
> must be able to find out what kind of thing they are referring to.
> There is quite some difference between an attribute and a reference,
> or
> between a package and a class.
> If we do not agree on a language for expressing the data models, it
> will be
> hard to code against them.
> Again, this is what SimDB has actually done. Because of this we can
> write
> code that uses metadata about a model (expressed in our intermediate
> representation), to infer things about instances of the model in
> various
> forms (XML, Java, RDB). Admittedly we are not using Utypes, these
> are too
> limited for this purpose.
All I think the IVOA UType standard has to do is agree on a way of
naming bits of models. In certain circumstances, it'll be possible to
know more about the relationships between those bits of models: in the
case of SimDB for example, there will be lots of extra information in
the XMI (say) describing the rich interrelationships between these
model items; the same would be true of SSA, say, though there the
interrelationships are described primarily in text (if I recall
correctly; at any rate, I don't think there is an SSA XMI, nor do I
believe there rfc2119-should be). These interrelationships can be
exploited by code hand-written or generated from an XMI file.
It's at a higher layer of interoperability that a restricted view of
what UTypes are for will pay off. I can imagine an application which
might want to handle bits of SSA, bits of SimDB, and some (SKOS)
vocabulary terms, perhaps using information pulled from an RDB, FITS
files and a registry query. That sort of application probably isn't
going to benefit from an intricately described structure for each of
the data models, but it _can_ benefit from a consistent and technology-
neutral way of naming entities (ie, UTypes), and a consistent way of
finding display labels, and (here moving into a potential payoff from
RDF) a consistent way of finding lightweight interrelationships, such
as that a simulated galaxy is the same sort of thing as a SIMBAD-
galaxy. [Just to be clear: that last one goes beyond what I'm
suggesting in this UTypes proposal].
>> This is indeed much less clear. I think it would be valuable
>> and fairly easy to do this, and it would mean that you could
>> envisage a future query which asked for (all of) a table by
>> giving the table's UType. The framework for this is in, for
>> example, section 3 of the proposal
>> <http://nxg.me.uk/note/2009/utype-proposals/#composite>,
>> which effectively suggests that char:coverage.location.coord
>> be a UType which has a structured value, and this is perfectly
>> strict.
>>
>> However, as Doug and others have argued, the current primary
>> use-case for UTypes is the notion of a list of key-value
>> pairs, where the keys are UTypes and the values are literals
>> (ie, columns or single values). I think there are benefits
>> and few costs to going beyond that (if one has a clear idea
>> of what one is doing), but that's where this argument would live.
>>
>
> I think the origin for utype was as an extra attribute on FIELD,
> where I
> thought it was supposed to assign extra meaning to the column in the
> table,
> somewhat (but not very much) different from UCDs.
I remember this, too. As I mentioned in my utype-questions posting, I
think there are multiple conceptions of what UTypes are and are for,
and that these are not always compatible, nor written down with much
precision. I listed the key-value-pair use-case as an explicit goal
in the utype-proposals posting, just so it was explicit which problem
I thought I was solving.
> But utype attributes are now everywhere in VOTable, also on GROUP,
> TABLE and
> RESOURCE. These are all complex constructs, and one might worry that
> in
> general it may not be correct to assume a 1-1 relation to a complex
> construct in a data model (unless designed to be so, like in SimDB's
> TAP
> mapping).
>
> For example consider a model for people with a class Individual having
> attributes
> - firstName
> - lastName
> - age
> - email
> - telephone number
>
> In a VOTable one might encounter a TABLE with name="Person" and FIELDs
> (surname, dateOfBirth, emailAddress).
> Could I add the utype people:Individual to the Person table?
That would seem fine and sensible to me. And the URI UTypes would
comfortably handle that, too.
> I seem to recall you telling me about a concept in ontologies/
> vocabularies
> that seems similar to me to the utype.
> If I am not mistaken in ontologies one can point from one ontology to
> another and declare that a thing in the former is
> similar(/equivalent/equal/?) to a thing in the latter. Is utype a
> similar
> construct, and if so which (if any) of these meanings might it
> correspond
> to?
If we want to talk about ontologies, then yes, you can declare
relationships between classes A and B in different ontologies. If A
sameAs B, then if you state that a thing is a member of the class A,
then it'll appear when you ask for the members of class B. Or if A is
a subClass of B, then if you state that x is in A, then it'll appear
when you ask for the members of B, but not vice versa. But, again,
this is separate from the notion of URI UTypes: I don't want to be
thought to be smuggling ontologies here -- URI UTypes are a pragmatic
solution to a simple problem, but they don't block off sophisticated
solutions to harder problems. Here a UType is just a name for a
class, or a property -- rather than 'A' and 'B' above, you'd use
URIs. That's all I'm suggesting.
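To make the sameAs and subClass behaviour concrete, here is a toy sketch in plain Python. It is not an RDFS or OWL reasoner, and the function name members and the example.org URIs are invented purely for illustration; the only point is that the class names are URI strings.

```python
# Hypothetical class names: any URI strings would do.
A = "http://example.org/modelA#Galaxy"
B = "http://example.org/modelB#Galaxy"


def members(cls, asserted, same_as=(), sub_class_of=()):
    """Return the things inferred to be members of cls.

    asserted     -- iterable of (thing, class) statements
    same_as      -- iterable of (class, class) pairs, symmetric
    sub_class_of -- iterable of (subclass, superclass) pairs, one-way
    """
    # Collect every class whose members also count as members of cls.
    sources = {cls}
    changed = True
    while changed:
        changed = False
        for a, b in same_as:
            for x, y in ((a, b), (b, a)):
                if y in sources and x not in sources:
                    sources.add(x)
                    changed = True
        for sub, sup in sub_class_of:
            if sup in sources and sub not in sources:
                sources.add(sub)
                changed = True
    return {thing for thing, c in asserted if c in sources}


facts = [("x", A), ("y", B)]

# A sameAs B: membership flows both ways.
print(members(B, facts, same_as=[(A, B)]))
# A subClass of B: x shows up in B, but y does not show up in A.
print(members(B, facts, sub_class_of=[(A, B)]))
print(members(A, facts, sub_class_of=[(A, B)]))
```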
All the best,
Norman
--
Norman Gray : http://nxg.me.uk
Dept Physics and Astronomy, University of Leicester