RDF Proposal (Was: Re: SKOS concepts in VOTable)

Tue Jun 5 15:30:21 PDT 2012

Brian and all, hello.

This is a rather discursive response -- it's thinking aloud.

On 2012 Jun 2, at 02:38, Brian Thomas wrote:

>> All the things you (Brian) say are true (one could also imagine a sort of VOTable RDFa), but:
>> 
>>  * The rdf:* elements look icky, and including them would require schema changes to avoid them 
>>  breaking validation (which many people care about).  That's not going to fly, because it was an explicit 
>>  principle of the current forced changes in forthcoming 1.3, that this wouldn't be an opportunity to embark
>>  on wholesale VOTable revisions: the proposed changes to link/@content-role are only acceptable 
>>  because they require no schema changes, and no more than a usage note added to the relevant 
>>  section of the document.
>> 
> 
> Icky?!? I suppose it's "eye of the beholder" and all that. I like seeing namespaced stuff. Nevertheless, a personal 
> standard of beauty is not an argument for or against using RDF (or namespaces on attributes).
> 
> As for breaking validation against the schema, this is, of course, true. It is not, however,
> a unique feature of my proposal: I'll note that a few other proposals (even a new one today)
> also have this 'flaw'. 

I've long been an enthusiast for the sort of 'mixin' extension mechanism which your proposal represents -- the idea of being able to add an rdf:Description element to an otherwise unrelated document instance, in a way which is invisible to a document processor which 'sees' VOTable (in this example) but ignores the unrecognised elements.  I first came across this in HyTime, and it's never quite gone away (the Wikipedia page at <http://en.wikipedia.org/wiki/HyTime> is opaque about the point of it, but illuminating about its legacy, which just about matches my experience of hacking my way through the gottverdamned standard).
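
To make the 'invisible to a document processor' behaviour concrete, here's a minimal Python sketch. It is an illustration only, not any existing VOTable library: it assumes the VOTable 1.3 namespace URI, and the embedded rdf:Description is a hypothetical mixin which a strict VOTable schema would reject. The reader acts only on the elements it recognises (here, FIELD) and walks straight past everything else.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: VOTable 1.3, RDF syntax, and Dublin Core elements.
VOT_NS = "http://www.ivoa.net/xml/VOTable/v1.3"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# A VOTable carrying a hypothetical rdf:Description mixin.
DOC = f"""<VOTABLE xmlns="{VOT_NS}" xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <RESOURCE>
    <rdf:Description rdf:about="">
      <dc:creator>Norman</dc:creator>
    </rdf:Description>
    <TABLE>
      <FIELD name="ra" datatype="double"/>
      <FIELD name="dec" datatype="double"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

FIELD_TAG = f"{{{VOT_NS}}}FIELD"

def field_names(xml_text):
    """Return the FIELD names, acting only on elements this reader recognises."""
    root = ET.fromstring(xml_text)
    names = []
    for elem in root.iter():
        # Duck typing: anything that isn't a VOTable FIELD (including the
        # rdf:* and dc:* elements) is simply passed over.
        if elem.tag == FIELD_TAG:
            names.append(elem.get("name"))
    return names

print(field_names(DOC))  # ['ra', 'dec']
```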

The problem with that is that you either (i) bake the extension elements into an extended schema, which loses the flexibility which was the original point, and annoys everyone who's depending on the original, and probably simpler, schema; or (ii) have a lot of fun-and-games and end up, as HyTime did, inventing a sort of rococo meta-schema mechanism which is intriguing and satisfying, but which is complicated enough that it's going to have ... major difficulty in the market of ideas; or (iii) abandon validation, and tell everyone "duck typing is good -- process what you recognise".   Option (i) is all 'con' and no 'pro', and option (ii) is fun, but I eventually gave up even aspiring to sell it.  I quite like option (iii), but no-one seems to agree with me.  We're doomed.

Or not: there's RDFa, which has the advantage of being more completely orthogonal to the *ML markup than HyTime ever managed to conceive of, and thus of evading some of the problems.

Brian, were you thinking of RDFa when you were describing your earlier proposal?

For those who aren't familiar with it, the idea of RDFa is that it's a smallish extension to the HTML DTD which allows one to embed a broad range of RDF statements into an HTML document.  There's a good example in the Wikipedia article <http://en.wikipedia.org/wiki/Rdfa>, but

    <p xmlns:dc="http://purl.org/dc/elements/1.1/">This page was
    written by <span property='dc:creator'>Norman</span>.</p>

...illustrates how it can intersperse normal HTML and RDF triples like "<> dc:creator 'Norman'."
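
For the curious, here is a toy Python extractor for just the one RDFa feature used in that snippet: a property attribute holding a CURIE whose prefix is declared with xmlns:. It is a sketch under loud simplifications, nowhere near a conforming RDFa processor: it keeps a single global prefix map with no scoping, ignores about, resource, typeof and the rest, and simply takes the document itself, <>, as the subject.

```python
import io
import xml.etree.ElementTree as ET

SNIPPET = """<p xmlns:dc="http://purl.org/dc/elements/1.1/">This page was
written by <span property='dc:creator'>Norman</span>.</p>"""

def extract_triples(xml_text, subject="<>"):
    """Toy extractor: one (subject, predicate IRI, literal) per @property."""
    prefixes = {}  # simplification: one global prefix map, no scoping rules
    triples = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "end")):
        if event == "start-ns":
            prefix, uri = item  # e.g. ("dc", "http://purl.org/dc/elements/1.1/")
            prefixes[prefix] = uri
        elif "property" in item.attrib:
            # Expand the CURIE "dc:creator" against the declared prefix.
            prefix, _, local = item.get("property").partition(":")
            triples.append((subject, prefixes[prefix] + local, item.text))
    return triples

for s, p, o in extract_triples(SNIPPET):
    print(f'{s} <{p}> "{o}" .')
# prints: <> <http://purl.org/dc/elements/1.1/creator> "Norman" .
```

Using the "end" event means each element's text has been fully parsed by the time its property attribute is examined.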

Now, RDFa is defined with respect to HTML, but there's no reason why one couldn't define an RDFa-like thing for VOTable, and the registry, and any other XML used in the IVOA.  It would mean defining a couple of extra attributes in each of the relevant schemas, and mandating that they're ignored by existing applications.  I think the result of that thought-experiment would look very similar to what you're proposing.
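
As a thought-experiment in code, here is roughly what such a VOTable 'RDFa-alike' might look like. Everything about the extra attributes is invented for illustration: the xa: prefix, the namespace urn:example:votable-rdfa, the property/content attribute pair (borrowed loosely from RDFa), and the convention that the subject is the element's ID used as a fragment. Full IRIs are used as property values to sidestep CURIE handling. The point is the last two lines: a reader that knows about the attributes gets a triple, while an 'existing application' which only asks for FIELD names gets exactly what it always did.

```python
import xml.etree.ElementTree as ET

VOT_NS = "http://www.ivoa.net/xml/VOTable/v1.3"  # assumed VOTable 1.3 namespace
XA_NS = "urn:example:votable-rdfa"  # hypothetical, not an IVOA namespace
DC_NS = "http://purl.org/dc/elements/1.1/"

DOC = f"""<VOTABLE xmlns="{VOT_NS}" xmlns:xa="{XA_NS}">
  <RESOURCE>
    <TABLE>
      <FIELD ID="ra" name="ra" datatype="double"/>
      <GROUP ID="g1" xa:property="{DC_NS}creator" xa:content="Norman">
        <FIELDref ref="ra"/>
      </GROUP>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

PROPERTY = f"{{{XA_NS}}}property"
CONTENT = f"{{{XA_NS}}}content"

def rdfa_like_triples(xml_text):
    """Collect (subject, predicate, literal) from the hypothetical xa:* attributes."""
    triples = []
    for elem in ET.fromstring(xml_text).iter():
        predicate = elem.get(PROPERTY)
        if predicate is not None:
            # The subject is a fragment reference to the element's ID.
            triples.append(("#" + elem.get("ID", ""), predicate, elem.get(CONTENT)))
    return triples

def field_names(xml_text):
    """An 'existing application': it never looks at the xa:* attributes."""
    root = ET.fromstring(xml_text)
    return [f.get("name") for f in root.iter(f"{{{VOT_NS}}}FIELD")]

print(rdfa_like_triples(DOC))  # [('#g1', 'http://purl.org/dc/elements/1.1/creator', 'Norman')]
print(field_names(DOC))        # ['ra']
```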

So, onward to actually replying to your message.

> 1. Regardless of the mechanism chosen, I'll posit that semantic labelling is unlikely to be initially very 
> popular with many sites/services in the IVOA.

We -- and for the purposes of this screed I mean semantics at ivoa -- have failed to persuade many people of the advantages of this sort of approach.  We're sure we've got the next XML here, but can't persuade people away from roff.

Part of the resolution here is, I believe, a 'build it and they will come' attitude: we build systems which solve simple existing problems in a way which is thought-provoking and principled.  The 'no schema changes' proposal for VOTable is, I think, such a thing, but that doesn't preclude thinking further into the future.

> 2. What we should be putting together (as has been called for by various persons already on 
> this list) is a standard for semantic labelling which can span across more than the VOTable standard.
> This means any proposal must be orthogonal and separable from the existing standard which becomes 
> "semantized". 
> 
> 3. It's already been pointed out that the existing VOTable standard is not going to be modified for the 
> sake of semantics, and this has caused many on this list to do some serious mental gymnastics in order
> to retrofit the semantics into VOTable; none of which, I'll hazard, are particularly 'pretty' (heh).

I think it was Sébastien who initially proposed repurposing the LINK element, and I think that's inspired, because it solves a simple case of the problem in a way which is defensible in principle, and without gymnastics.

Perhaps you're thinking of the vagueness of the 'type-like' relation implied by content-role='type'.  It is vague, yes, but it's no _more_ vague than the UCD or utype relations which are already in use, so it's clearly not _unacceptably_ vague to VOTable users.

Is it too vague to be a SemWeb solution?  No: the linked data world has run up against this sort of vagueness again and again ("what does owl:sameAs really _mean_?"), not to mention the looseness of the DC properties and the mutability of the FOAF ones, and has simply come to accept heuristics.

The possibility of the LINK[@content-role='doc']/@href returning RDF could act as a back door^W^W loading bay for associating arbitrary extra triples with a VOTable in a way which is both precise, and completely ignorable for existing VOTable users.  Your proposal, or an RDFa-style inflection of it, could be another.
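
For anyone who hasn't looked closely at the LINK proposal, here's a rough Python sketch of what a client might do with it. LINK already carries content-role and href attributes, so nothing here needs a schema change. The sketch assumes the VOTable 1.3 namespace URI, and both href values are made-up placeholders. A client which understood content-role='doc' might go on to fetch and parse the RDF at that URL (not shown), while any other reader just skips the LINK.

```python
import xml.etree.ElementTree as ET

VOT_NS = "http://www.ivoa.net/xml/VOTable/v1.3"  # assumed VOTable 1.3 namespace
NS = {"v": VOT_NS}

# Hypothetical example: both href values are placeholders.
DOC = f"""<VOTABLE xmlns="{VOT_NS}">
  <RESOURCE>
    <TABLE>
      <FIELD name="mag" datatype="float">
        <LINK content-role="type" href="http://example.org/vocab#magnitude"/>
      </FIELD>
      <LINK content-role="doc" href="http://example.org/extra-triples.rdf"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

def links_by_role(xml_text, role):
    """Return href values of LINK elements whose content-role equals role."""
    root = ET.fromstring(xml_text)
    return [link.get("href")
            for link in root.iterfind(".//v:LINK", NS)
            if link.get("content-role") == role]

print(links_by_role(DOC, "type"))  # ['http://example.org/vocab#magnitude']
print(links_by_role(DOC, "doc"))   # ['http://example.org/extra-triples.rdf']
```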

> Under the weight of these points I'd then suggest that we pitch a new, hybrid, standard which *is* easily 
> filtered back to the original standard and which uses a widely accepted semantic labelling mechanism 
> which we can reuse beyond VOTable labelling.

VOTable is probably not going to change, but that needn't stop us using it as a skeleton on which to work out what such a mixin standard cum pattern would look like.  If we can devise a set of patterns which solve real problems while using VOTable, it'd be clear how to extend them to other contexts, too.

That'd need real problems to solve.  I'm sure we can all come up with some more or less abstract ones -- "associate a writer and a date with a GROUP" -- but for the purposes of this exercise can we identify some manifestly concrete ones which people have run across?

So, any offers?

[Brian: I believe I've by now addressed the points you made following this quote -- have I missed anything major?]

All the best,

Norman
[if you got down here, congratulations...]

-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK