SKOS concepts in VOTable

Mon May 28 04:35:14 PDT 2012

Rick and all, hello.

On 2012 May 28, at 10:53, Frederic V. Hessman wrote:

> 
> On 25 May 2012, at 17:46, Norman Gray wrote:
> 
>>>> Solution 2 :
>>>> Create another attribute (or element) dedicated to SKOS vocabularies.
>>>> PROs : clean way to do things. Well defined scope, no ambiguities.
>>>> CONs : makes metadata generation more complicated or confusing for data
>>>> providers (generate ucds, utypes, skos, ...). If it is an attribute,
>>>> not possible to describe several SKOS concepts. Requires schema change in
>>>> VOTable 1.3 to allow this solution.
>>> 
>>> This is the only way to go, since it makes things backward compatible.  In the long term, we'll have to use UCDs and utypes as SKOS vocabularies anyway, since that is all they are (and UCDs have already been so expressed); a common use model would make life much easier.  Thus a transition towards a better and more flexible semantic model would push things in the right direction.
>> 
>> I don't think that the element would have to be dedicated to SKOS.
>> 
>> Since each of the SKOS identifiers is a URI, these could be identified from their form, including the vocabulary they're part of.  Thus <http://purl.org/astronomy/vocab/Algorithms/HartreeFock> is from the algorithms vocabulary, <http://purl.org/astronomy/vocab/DataObjectTypes/Image> is from the data object type vocabulary.
>> 
>> This doesn't have to be dedicated to SKOS for the following reason.
>> 
>> If you dereference <http://purl.org/astronomy/vocab/DataObjectTypes/Image>, and follow the redirection, you get more information about the concept, including the statement that it's a skos:Concept.  That is, any other future non-SKOS things could go in the same slot, and if it was necessary to distinguish SKOS from non-SKOS things, for some reason, that information is to hand.
> 
> No, it doesn't have to be skos, but it has to be officially semantic.

I'm not sure what you mean by 'officially semantic'.  All that's 'semantic' about <http://purl.org/astronomy/vocab/DataObjectTypes/Image> is that it does have a reasonably precisely defined meaning, which is documented at the end of the URL.

>> The use of <link> as a generic link is already established in HTML's <head> element, so there's no innovation here.  It simply expresses that there is a link of some sort between the current document and some other resource, with the nature of the link described in the @content-role attribute, orthogonally to @content-type.
>> 
>> The @content-role attribute is declared in the schema as NMTOKEN, so there are no restrictions on what values it can take.  Some future VOTable standard would simply have to note that content-role='type' (say) has a particular meaning, and that's all the standardisation required.  Indeed, since the value of this attribute is unconstrained, and there are no reserved words (appendix A.1 talks of "allowed values", but isn't normative), people could start using content-role='type' _now_, with no standardisation required at all.
>> 
>> Since each utype will have an associated documentation URL (that's still the case, isn't it? or has that changed again?), this would incidentally double as a way of associating utypes with parts of the VOTable, without further standardisation.  That is, people could start doing this tomorrow.
>> 
>> A slight variant on this would be arbitrarily extensible, expressive, and lightweight.
> 
> So you're suggesting a standardized @content-role?  Fine with me, but you still don't have content control.

I am indeed suggesting a standardized @content-role.  I'm also noting, though, that since the value of this is only syntactically constrained in the v1.2 VOTable schema, <link content-role='type' href='http://purl.org/astronomy/vocab/DataObjectTypes/Image' /> is already fully standard from a syntactic point of view, and could be deployed immediately.  All I'd suggest adding to a future VOTable spec would be a remark which said that "a <link> with content-role='type' should be, or has been, taken to mean that...".

I don't think we need worry about constraining the values used, and indeed should not: the value of the link/@href attribute should indeed be any URI you wish, with the interpretation that the link is expected to indicate a 'type' in some rather loose sense.  This will typically (exclusively, to begin with) be a SKOS concept.

If an application doesn't recognise the 'type', perhaps because it's not a SKOS concept, or perhaps because it's a concept in a vocabulary that the application doesn't recognise, that's fine: ignore it -- it's not telling you anything you can make use of.  For extra points, an application _might_ want to dereference the URI, find documentation and labels for the 'type', and display them to the user.  That's a nice trick, but that wouldn't be useful behaviour for all applications, and most applications would probably stick to just matching strings.
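
As a rough illustration of the "match strings, ignore the rest" behaviour described above, here is a minimal Python sketch. The KNOWN_TYPES table, the function name and the example.org URIs are assumptions made up for the illustration; they are not part of VOTable or of any existing library.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of 'type' URIs this application understands.
KNOWN_TYPES = {
    "http://purl.org/astronomy/vocab/DataObjectTypes/Image": "image",
}

def recognised_types(xml_text):
    """Return labels for the 'type' links this application recognises."""
    root = ET.fromstring(xml_text)
    labels = []
    for elem in root.iter():
        # Compare local names so the sketch works with or without a namespace,
        # and with either <link> or <LINK>.
        local = elem.tag.rsplit("}", 1)[-1]
        if local.lower() != "link":
            continue
        if elem.get("content-role") != "type":
            continue
        href = elem.get("href")
        if href in KNOWN_TYPES:
            labels.append(KNOWN_TYPES[href])
        # Any other URI is silently ignored: it tells us nothing we can use.
    return labels

SAMPLE = """<someVOElement>
  <link content-role='type' href='http://purl.org/astronomy/vocab/DataObjectTypes/Image' />
  <link content-role='type' href='http://example.org/unknown/Thing' />
  <link content-role='doc' href='http://example.org/readme' />
</someVOElement>"""

print(recognised_types(SAMPLE))
```

The matching is plain string comparison on the href value; nothing is dereferenced, which is the simple case most applications would probably settle for.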

Let me stress that this is 'loose' only from the point of view of someone who cares about ontologies and all that, and so is concerned about what they can formally/logically deduce from this annotation. From everyone else's point of view, this is probably quite precise enough to do useful work with, since it's at least as precise a relation as with UCDs and (what I've understood about) utypes.

> Yes and no.   Yes, this would be terrible for human readers and we would have to blush as we watch it easily perform the function we need, but no - I don't see any other formal (i.e. content-controlled) means of doing it with attributes, which means being forced to use whole tags:
> 
> 	<someVOTag ....>
> 		...
> 		<link content-role="semantic">iau93:Quasar</link>
> 		<link content-role="semantic">ucd:src.redshift</link>
> 	</someVOTag>

Looking at the VOTable schema, I think that would be:

    <someVOElement>
        <link content-role='type' href='http://www.ivoa.net/rdf/IAUT93#Quasar' />
        <link content-role='type' href='http://www.ivoa.net/rdf/UCD#src.redshift' />
        <link content-role='type' href='http://www.ivoa.net/myStandard/my-favourite-utype' />
    </someVOElement>

(preferring 'type' to the rather obscure-looking 'semantic').  I don't think that's too ugly, and while it would have to be done slightly differently for other IVOA schemas, would be a very straightforwardly portable pattern.  You could put several URIs into a single attribute, I expect, but it would require more application-layer parsing work to split them up, and gain little.
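
For a data provider, producing this pattern could be as small as the following Python sketch, which writes one link per URI. The helper name add_type_links is hypothetical, and someVOElement is only the placeholder element name used in the example above.

```python
import xml.etree.ElementTree as ET

def add_type_links(parent, uris):
    """Append one <link content-role='type'> child per URI and return them."""
    return [
        ET.SubElement(parent, "link", {"content-role": "type", "href": uri})
        for uri in uris
    ]

element = ET.Element("someVOElement")  # placeholder name, not a real VOTable element
add_type_links(element, [
    "http://www.ivoa.net/rdf/IAUT93#Quasar",
    "http://www.ivoa.net/rdf/UCD#src.redshift",
])
print(ET.tostring(element, encoding="unicode"))
```

Keeping one URI per link means a reader never has to split attribute values, at the cost of a few more elements.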

> My point was that the present schema formalism makes it difficult to do what we would need to do, so the solution is going to be a kludge or compromise of some sort, no matter what.

I disagree: using LINK for this purpose isn't even really repurposing it, so it's a fortiori not a kludge.

Taking a quick look at the TAP document, one could imagine permitting the table TAP_SCHEMA.columns to have a column 'type', of type varchar, whose value is a space-separated list of URIs with the same semantics as above.  The TAP spec indicates what columns must be present, but doesn't constrain them as far as I can see.  That's no kludge, either, surely.
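
A client reading such a column would only need to split on whitespace, since a URI cannot itself contain a space. A minimal Python sketch, assuming the hypothetical 'type' column suggested above (it is not part of the TAP spec) and made-up row contents:

```python
# Hypothetical set of 'type' URIs this client understands.
KNOWN_TYPES = {"http://www.ivoa.net/rdf/UCD#src.redshift"}

def split_type_uris(cell):
    """Split a space-separated 'type' cell into individual URIs."""
    if not cell:
        return []  # NULL or empty cell: no type information
    return cell.split()

# Made-up row from TAP_SCHEMA.columns, including the hypothetical 'type' column.
row = {
    "column_name": "z",
    "type": "http://www.ivoa.net/rdf/UCD#src.redshift http://example.org/vocab/Unknown",
}

uris = split_type_uris(row["type"])
recognised = [u for u in uris if u in KNOWN_TYPES]
print(recognised)
```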

All the best,

Norman

-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK