Datalink Feedback VI: Semantics
François Bonnarel
francois.bonnarel at astro.unistra.fr
Mon Apr 14 06:02:58 PDT 2014
Hi Pat, Markus, all
As far as the "full retrieval" (self) is concerned, I don't think it
belongs to the dataset. I think "dataset retrieval" is some kind of
service and should be described via the "service-def" field.
Secondly, I think we should not delegate the minimal definition of
a vocabulary to another group or document. If it is to be useful, this
field has to be provided with minimal useful content from the
beginning. This doesn't mean that a hierarchical refinement like the one
proposed by Markus will not be useful. With Mireille and Laurent I
proposed something very similar to this hierarchy in November (see
appended file).
If we want to organize a SKOS vocabulary, I think the datalink
document has to provide the list of broader concepts, which could be
refined now or later in the external document. So this basic "broad
concepts" list has to be solid.
I agree with Markus that the concepts carried by the
semantics field are better expressed as types.
By the way, if we want to refine the relationship between the
dataset and what is retrieved through the link, we could use the
hierarchical refinement of concepts to do it.
For example:
   science
      catalog
         external_reference_catalog
could be a different relationship to the dataset than
   science
      catalog
         extracted_sources
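
To make the idea a bit more concrete, here is a minimal sketch of how a
client might use such a hierarchy. It is only an illustration: the
BROADER table and the helper names (ancestors, is_a) are hypothetical,
and the terms are simply the ones from the example above, not an agreed
vocabulary.

```python
# Hypothetical narrower -> broader table built from the example terms above.
# This is not an agreed vocabulary, only an illustration.
BROADER = {
    "catalog": "science",
    "external_reference_catalog": "catalog",
    "extracted_sources": "catalog",
}


def ancestors(term):
    """Return the chain from `term` up to its broadest known concept."""
    chain = [term]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


def is_a(term, concept):
    """True if `concept` is `term` itself or one of its broader concepts."""
    return concept in ancestors(term)


if __name__ == "__main__":
    for term in ("external_reference_catalog", "extracted_sources"):
        print(" > ".join(reversed(ancestors(term))))
```

A client that only knows the broad concepts could still test
is_a(term, "science") for either link, while a more specialised client
could tell the two narrower terms apart.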
Best regards
François
On 08/04/2014 19:09, Patrick Dowler wrote:
>
> I completely agree with this:
>
> - has to be useful for software to use semantics to make decisions;
> the description is for people
> - vocab should be maintained outside the DataLink specification
>
> with the caveat that I don't have much of an opinion on how such a
> vocabulary should be maintained or what form it should take. It does
> seem like SKOS should be looked at first.
>
> In the next revision, I will change the language here to refer to an
> external vocabulary. That requires a URL where
> people/developers/software can go and find "words"... Someone (Markus?
> Norman?) should propose a URL and put some minimal content there.
> Where do we write down the responsibility/rules for maintaining it?
>
> Pat
>
> On 07/04/14 10:59 AM, Markus Demleitner wrote:
>> Just when you thought it'd be safe to read DAL again... here's
>> another one.
>>
>> This time it's about 3.2.6, semantics, on which the WD says that it's
>> a column containing "a single word (or comma-separated list?) from a
>> small vocabulary that describes the meaning of this link relative to
>> the dataset."
>>
>> First, I'd like to strongly suggest that a single word should do, as I'd
>> say we've done a bad job in vocabulary construction if overlaps are
>> common; also, people tend to get enumerations wrong (see, e.g.,
>> content.level in VOResource).
>>
>> But my main concern is: what are the valid values here? If it's a
>> closed vocabulary (and I think that's almost the right design
>> decision), we had better get this right. Which here means: useful.
>> Which begs the question: Useful for what?
>>
>> So: What are the use cases for semantics?
>>
>> I believe semantics should be for machines what description is for
>> humans: It should allow machines a selection of links of interest to
>> them, or to do a preselection based on what the user supposedly is
>> interested in.
>>
>> What could the selection tasks be?
>>
>> One is clear to me: "Retrieve the full, original dataset." I had
>> suggested "self" as a term for that, and I stand by it. "science" is
>> IMHO too general for that, as it would cover things like "joined Echelle
>> spectrum", too.
>>
>> From there on, it gets murky, and what I think we should do is go out
>> to service operators and client writers and pipeline builders and
>> explain things to them until they come up with use cases. One thing
>> that came up recently for me is "mask for contaminated areas of an
>> image". That's fairly common, and in an extraction or analysis task,
>> it's useful to have, and a machine presumably could automatically do
>> something with it.
>>
>> I suspect there are quite a few cases like this, but we don't have them,
>> and new ones might come up. I'd therefore suggest to have the
>> vocabulary in an external resource maintained by the DAL chairs, a
>> SKOS vocabulary, which is easy enough for clients to interpret. They
>> might thus still figure out that a mask is an auxiliary file although
>> they have never heard of mask as a term before (this is, e.g., for
>> preselection on behalf of the user).
>>
>> Based on this and the "preselection on behalf of the user" use case,
>> here's a proposal for the start of the vocabulary (with indentation
>> meaning a narrower-than relation; I volunteer for turning this into
>> well-formed SKOS if people agree this is where we want to go).
>>
>> self (the full main dataset)
>> science (science data related to or generated from the main dataset)
>> derivation
>> source-list
>> joined-dataset (e.g., stacked images, joined spectrum)
>> source-file (something the current data was made from)
>> calibration
>> preview
>> info
>> log
>> auxiliary
>> mask
>>
>> Maybe that's enough to get people (i.e., pipeline authors, client
>> writers, all the later users of datalink) dreaming?
>>
>> Pat has also raised the question of whether the terms should actually be
>> names of relations. I believe this is inspired by RDF-like triples,
>> which look somewhat like
>>
>> <entity1> <relation> <entity2>.
>>
>> Since entity2 could be "the link in this row" and entity1 "the dataset
>> referred to in the id column", I might like this relation thing. But
>> then I didn't quite like that in the end. Here's how that came
>> about: Suppose entity2 really is a big log of observation entries
>> for the Z observatory.
>>
>> Now, what's <relation> in
>>
>> ivo://x.ogs/data?exposure1 <relation> the Z observatory log?
>>
>>
>> is-logged-in? has-an-entry-in? Don't like it, seems very artificial.
>> And indeed, what we're talking about here simply is a file, a
>> dataset, "a thing", and semantics IMHO shouldn't do more than say
>> what kind of thing. Hence, semantics should contain a noun,
>> specifically, a noun that's narrower than "scientifically relevant
>> data" or "service producing scientifically relevant data". For that,
>> we don't need RDF, the notion of triples, or relations. The
>> computationally much simpler plain vocabulary suffices.
>>
>> And I'd consider that good news.
>>
>> Cheers,
>>
>> Markus
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DL_semanticField.pdf
Type: text/pdf
Size: 2981 bytes
Desc: not available
URL: <http://www.ivoa.net/pipermail/dal/attachments/20140414/ecee4045/attachment.bin>