Datalink Feedback VI: Semantics
François Bonnarel
francois.bonnarel at astro.unistra.fr
Mon Apr 14 06:42:25 PDT 2014
Apparently the previous attached file was bad.
Thanks to Markus I am able to send you the right one.
Cheers
François
On 14/04/2014 15:02, François Bonnarel wrote:
>
> Hi Pat, Markus, all
> As far as the "full retrieval" (self) is concerned, I don't think it
> belongs to the dataset. I think "dataset retrieval" is some kind of
> service and should be described via the "service-def" field.
> Secondly, I think we should not delegate the minimal definition
> of a vocabulary to another group or document. If it is to be useful,
> this field has to be provided with minimal useful content from the
> beginning. This doesn't mean that a hierarchical refinement like the one
> proposed by Markus will not be useful. With Mireille and Laurent I
> proposed something very similar to this hierarchy in November (see
> appended file).
> If we want to organize a SKOS vocabulary, I think the DataLink
> document has to provide the list of broader concepts, which could be
> refined now or later in the external document. So this basic "broad
> concepts" list has to be solid.
> I agree with Markus that basically the concepts driven by the
> semantics field are better expressed as types.
> By the way, if we want to refine the relationship between the
> dataset and what is retrieved through the link, we could use the
> hierarchical refinement of concepts to do it.
> For example:
>   science
>     catalog
>       external_reference_catalog
> could be a different relationship to the dataset than
>   science
>     catalog
>       extracted_sources
>
>
> Best regards
> François
>
> On 08/04/2014 19:09, Patrick Dowler wrote:
>>
>> I completely agree with this:
>>
>> - has to be useful for software to use semantics to make decisions;
>> the description is for people
>> - vocab should be maintained outside the DataLink specification
>>
>> with the caveat that I don't have much of an opinion on how such a
>> vocabulary should be maintained or what form it should take. It does
>> seem like SKOS should be looked at first.
>>
>> In the next revision, I will change the language here to refer to an
>> external vocabulary. That requires a URL where
>> people/developers/software can go and find "words"... Someone
>> (Markus? Norman?) should propose a URL and put some minimal content
>> there. Where do we write down the responsibility/rules for
>> maintaining it?
>>
>> Pat
>>
>> On 07/04/14 10:59 AM, Markus Demleitner wrote:
>>> Just when you thought it'd be safe to read DAL again... here's
>>> another one.
>>>
>>> This time it's about 3.2.6, semantics, on which the WD says that it's
>>> a column containing "a single word (or comma-separated list?) from a
>>> small vocabulary that describes the meaning of this link relative to
>>> the dataset."
>>>
>>> First, I'd like to strongly suggest that a single word should do, as
>>> I'd say we've done a bad job in vocabulary construction if overlaps
>>> are common; also, people tend to get enumerations wrong (see, e.g.,
>>> content.level in VOResource).
>>>
>>> But my main concern is: what are the valid values here? If it's a
>>> closed vocabulary (and I think that's almost the right design
>>> decision), we had better get this right. Which here means: useful.
>>> Which begs the question: Useful for what?
>>>
>>> So: What are the use cases for semantics?
>>>
>>> I believe semantics should be for machines what description is for
>>> humans: It should allow machines a selection of links of interest to
>>> them, or to do a preselection based on what the user supposedly is
>>> interested in.
>>>
>>> What could the selection tasks be?
>>>
>>> One is clear to me: "Retrieve the full, original dataset." I had
>>> suggested "self" as a term for that, and I stand by it. "science" is
>>> IMHO too general for that, as it would cover things like "joined
>>> Echelle spectrum", too.
>>>
>>> From there on, it gets murky, and what I think we should do is go out
>>> to service operators and client writers and pipeline builders and
>>> explain things to them until they come up with use cases. One thing
>>> that came up recently for me is "mask for contaminated areas of an
>>> image". That's fairly common, and in an extraction or analysis task,
>>> it's useful to have, and a machine presumably could automatically do
>>> something with it.
>>>
>>> I suspect there are quite a few cases like this, but we don't have
>>> them, and new ones might come up. I'd therefore suggest keeping the
>>> vocabulary in an external resource maintained by the DAL chairs, as a
>>> SKOS vocabulary, which is easy enough for clients to interpret. They
>>> might thus still figure out that a mask is an auxiliary file although
>>> they have never heard of mask as a term before (this is, e.g., for
>>> preselection on behalf of the user).
>>>
>>> Based on this and the "preselection on behalf of the user" use case,
>>> here's a proposal for the start of the vocabulary (with indentation
>>> meaning a narrower-than relation; I volunteer for turning this into
>>> well-formed SKOS if people agree this is where we want to go).
>>>
>>> self (the full main dataset)
>>> science (science data related to or generated from the main dataset)
>>>   derivation
>>>     source-list
>>>     joined-dataset (e.g., stacked images, joined spectrum)
>>>   source-file (something the current data was made from)
>>> calibration
>>> preview
>>> info
>>>   log
>>> auxiliary
>>>   mask
>>>
>>> Maybe that's enough to get people (i.e., pipeline authors, client
>>> writers, all the later users of datalink) dreaming?
>>>
>>> Pat has also raised the question of whether the terms should
>>> actually be names of relations. I believe this is inspired by
>>> RDF-like triples, which look somewhat like
>>>
>>> <entity1> <relation> <entity2>.
>>>
>>> Since entity2 could be "the link in this row" and entity1 "the dataset
>>> referred to in the id column", I might like this relation thing. But
>>> then I didn't quite like that in the end. Here's how that came
>>> about: Suppose entity2 really is a big log of observation entries
>>> for the Z observatory.
>>>
>>> Now, what's <relation> in
>>>
>>> ivo://x.ogs/data?exposure1 <relation> the Z observatory log?
>>>
>>>
>>> is-logged-in? has-an-entry-in? Don't like it, seems very artificial.
>>> And indeed, what we're talking about here simply is a file, a
>>> dataset, "a thing", and semantics IMHO shouldn't do more than say
>>> what kind of thing. Hence, semantics should contain a noun,
>>> specifically, a noun that's narrower than "scientifically relevant
>>> data" or "service producing scientifically relevant data". For that,
>>> we don't need RDF, the notion of triples, or relations. The
>>> computationally much simpler plain vocabulary suffices.
>>>
>>> And I'd consider that good news.
>>>
>>> Cheers,
>>>
>>> Markus
>>>
>>
>
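
The "preselection on behalf of the user" use case in the quoted proposal
can be sketched as a plain parent lookup, with no triples or relations
involved. This is only an illustration under assumptions: the nesting in
BROADER is one reading of the proposed term list, and the names BROADER,
broader_chain, is_a and preselect as well as the example URLs are
hypothetical, not part of DataLink or of any published vocabulary.

```python
# Hypothetical "narrower term -> broader term" map for the proposed
# vocabulary. The nesting is an assumed reading of the proposal.
BROADER = {
    "self": None,
    "science": None,
    "derivation": "science",
    "source-list": "derivation",
    "joined-dataset": "derivation",
    "source-file": "science",
    "calibration": None,
    "preview": None,
    "info": None,
    "log": "info",
    "auxiliary": None,
    "mask": "auxiliary",
}


def broader_chain(term):
    """Return the term followed by each successively broader term."""
    chain = []
    while term is not None:
        chain.append(term)
        term = BROADER[term]
    return chain


def is_a(term, concept):
    """True if `term` equals `concept` or is narrower than it."""
    return concept in broader_chain(term)


def preselect(links, concept):
    """Keep the (semantics, url) pairs whose term is `concept` or narrower.

    Terms missing from the vocabulary are skipped rather than guessed at.
    """
    return [
        (sem, url)
        for sem, url in links
        if sem in BROADER and is_a(sem, concept)
    ]


# Made-up link rows: (semantics value, access URL).
links = [
    ("self", "http://example.org/data/exposure1"),
    ("mask", "http://example.org/data/exposure1-mask"),
    ("joined-dataset", "http://example.org/data/stack1"),
    ("log", "http://example.org/logs/z-observatory"),
]

science_links = preselect(links, "science")
auxiliary_links = preselect(links, "auxiliary")
```

A client that knows only the broad term "auxiliary" would still pick up
the mask link here, which is the behaviour the proposal asks of a
vocabulary with narrower-than relations.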
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DL_semanticField.pdf
Type: text/pdf
Size: 59635 bytes
Desc: not available
URL: <http://www.ivoa.net/pipermail/dal/attachments/20140414/6c41a882/attachment-0001.bin>