Datalink Feedback VI: Semantics

François Bonnarel francois.bonnarel at astro.unistra.fr
Mon Apr 14 06:42:25 PDT 2014


Apparently the previously attached file was faulty.
Thanks to Markus, I am now able to send you the correct one.
Cheers
François
On 14/04/2014 15:02, François Bonnarel wrote:
>
> Hi Pat, Markus, all
> As far as "full retrieval" (self) is concerned, I don't think it
> belongs to the dataset. I think "dataset retrieval" is a kind of
> service and should be described via the "service-def" field.
>     Secondly, I think we should not delegate the minimal definition
> of the vocabulary to another group or document. If this field is to be
> useful, it has to come with a minimal useful content from the
> beginning. This doesn't mean that a hierarchical refinement like the
> one proposed by Markus will not be useful. With Mireille and Laurent, I
> proposed something very similar to this hierarchy in November (see
> attached file).
>     If we want to organize a SKOS vocabulary, I think the DataLink
> document has to provide the list of broader concepts, which could be
> refined now or later in the external document. So this basic "broad
> concepts" list has to be solid.
>     I agree with Markus that basically the concepts carried by the
> semantics field are better treated as types.
>      By the way, if we want to refine the relationship between the
> dataset and what is retrieved through the link, we could use the
> hierarchical refinement of concepts to do it.
>       For example:
>                 science
>                         catalog
>                                external_reference_catalog
>       could express a different relationship to the dataset than
>                 science
>                        catalog
>                               extracted_sources
>
>
> Best regards
> François
>
> On 08/04/2014 19:09, Patrick Dowler wrote:
>>
>> I completely agree with this:
>>
>> - has to be useful for software to use semantics to make decisions;
>> the description is for people
>> - vocab should be maintained outside the DataLink specification
>>
>> with the caveat that I don't have much of an opinion on how such a
>> vocabulary should be maintained or what form it should take. It does
>> seem like SKOS should be looked at first.
>>
>> In the next revision, I will change the language here to refer to an 
>> external vocabulary. That requires a URL where 
>> people/developers/software can go and find "words"... Someone 
>> (Markus? Norman?) should propose a URL and put some minimal content 
>> there. Where do we write down the responsibility/rules for 
>> maintaining it?
>>
>> Pat
>>
>> On 07/04/14 10:59 AM, Markus Demleitner wrote:
>>> Just when you thought it'd be safe to read DAL again... here's
>>> another one.
>>>
>>> This time it's about 3.2.6, semantics, on which the WD says that it's
>>> a column containing "a single word (or comma-separated list?) from a
>>> small vocabulary that describes the meaning of this link relative to
>>> the dataset."
>>>
>>> First, I'd like to strongly suggest that a single word should do, as
>>> I'd say we've done a bad job in vocabulary construction if overlaps
>>> are common; also, people tend to get enumerations wrong (see, e.g.,
>>> content.level in VOResource).
>>>
>>> But my main concern is: what are the valid values here?  If it's a
>>> closed vocabulary (and I think that's almost the right design
>>> decision), we had better get this right.  Which here means: useful.
>>> That raises the question: Useful for what?
>>>
>>> So: What are the use cases for semantics?
>>>
>>> I believe semantics should be for machines what description is for
>>> humans: It should allow machines a selection of links of interest to
>>> them, or to do a preselection based on what the user supposedly is
>>> interested in.
>>>
>>> What could the selection tasks be?
>>>
>>> One is clear to me: "Retrieve the full, original dataset."  I had
>>> suggested "self" as a term for that, and I stand by it.  "science" is
>>> IMHO too general for that, as it would also cover things like a
>>> "joined Echelle spectrum".
>>>
>>> From there on, it gets murky, and what I think we should do is go out
>>> to service operators and client writers and pipeline builders and
>>> explain things to them until they come up with use cases.  One thing
>>> that came up recently for me is "mask for contaminated areas of an
>>> image".  That's fairly common, and in an extraction or analysis task,
>>> it's useful to have, and a machine presumably could automatically do
>>> something with it.
>>>
>>> I suspect there are quite a few cases like this, but we don't know
>>> them all yet, and new ones might come up.  I'd therefore suggest
>>> keeping the vocabulary in an external resource maintained by the DAL
>>> chairs, as a SKOS vocabulary, which is easy enough for clients to
>>> interpret.  A client might thus still figure out that a mask is an
>>> auxiliary file even though it has never encountered "mask" as a term
>>> before (this is, e.g., for preselection on behalf of the user).
>>>
>>> Based on this and the "preselection on behalf of the user" use case,
>>> here's a proposal for the start of the vocabulary (with indentation
>>> meaning a narrower-than relation; I volunteer for turning this into
>>> well-formed SKOS if people agree this is where we want to go).
>>>
>>> self (the full main dataset)
>>> science (science data related to or generated from the main dataset)
>>>    derivation
>>>      source-list
>>>      joined-dataset (e.g., stacked images, joined spectrum)
>>>    source-file (something the current data was made from)
>>> calibration
>>> preview
>>> info
>>>    log
>>>    auxiliary
>>>    mask
>>>
>>> Maybe that's enough to get people (i.e., pipeline authors, client
>>> writers, all the later users of datalink) dreaming?
>>>
>>> Pat has also raised the question of whether the terms should actually
>>> be names of relations.  I believe this is inspired by RDF-like triples,
>>> which look somewhat like
>>>
>>> <entity1> <relation> <entity2>.
>>>
>>> Since entity2 could be "the link in this row" and entity1 "the dataset
>>> referred to in the id column", I was at first inclined to like this
>>> relation idea.  In the end, though, I didn't.  Here's how that came
>>> about:  Suppose entity2 really is a big log of observation entries
>>> for the Z observatory.
>>>
>>> Now, what's <relation> in
>>>
>>> ivo://x.ogs/data?exposure1 <relation> the Z observatory log?
>>>
>>>
>>> is-logged-in?  has-an-entry-in? Don't like it, seems very artificial.
>>> And indeed, what we're talking about here simply is a file, a
>>> dataset, "a thing", and semantics IMHO shouldn't do more than say
>>> what kind of thing.  Hence, semantics should contain a noun,
>>> specifically, a noun that's narrower than "scientifically relevant
>>> data" or "service producing scientifically relevant data".  For that,
>>> we don't need RDF, the notion of triples, or relations.  The
>>> computationally much simpler plain vocabulary suffices.
>>>
>>> And I'd consider that good news.
>>>
>>> Cheers,
>>>
>>>              Markus
>>>
>>
>
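Markus's proposal relies on clients walking the narrower-than hierarchy (skos:broader in SKOS terms) to place an unfamiliar term such as "mask" under a concept they do understand, such as "auxiliary". The Python below is a minimal sketch of that lookup, assuming the client has already turned the external vocabulary into a simple child-to-broader table. The names BROADER, broader_chain and is_narrower_or_same, and the example.org URLs, are hypothetical and come neither from the DataLink WD nor from any library.

```python
from typing import Optional

# Hypothetical in-memory form of the vocabulary proposed in this thread:
# each term maps to its broader term (None marks a top-level concept).
# A real client would build this table from the external SKOS vocabulary.
BROADER: dict[str, Optional[str]] = {
    "self": None,
    "science": None,
    "derivation": "science",
    "source-list": "derivation",
    "joined-dataset": "derivation",
    "source-file": "science",
    "calibration": None,
    "preview": None,
    "info": None,
    "log": "info",
    "auxiliary": None,
    "mask": "auxiliary",
}


def broader_chain(term: str,
                  broader: dict[str, Optional[str]] = BROADER) -> list[str]:
    """Return the term followed by all of its broader terms, narrowest first.

    A term missing from the table yields a chain containing only itself.
    """
    chain: list[str] = []
    current: Optional[str] = term
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in vocabulary at {current!r}")
        chain.append(current)
        current = broader.get(current)
    return chain


def is_narrower_or_same(term: str, concept: str) -> bool:
    """True if `term` equals `concept` or lies below it in the hierarchy."""
    return concept in broader_chain(term)


# Preselection on behalf of the user: a client that only knows it wants
# "auxiliary" material still picks up the link labelled "mask".
links = [
    ("http://example.org/exp1.fits", "self"),
    ("http://example.org/exp1-mask.fits", "mask"),
    ("http://example.org/exp1-preview.png", "preview"),
]
aux_links = [url for url, sem in links if is_narrower_or_same(sem, "auxiliary")]
print(aux_links)
```

The same walk would let a client treat François's extracted_sources and external_reference_catalog as both falling under catalog while still telling them apart. Parsing the actual SKOS/RDF document into the table is deliberately left out of this sketch.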

-------------- next part --------------
A non-text attachment was scrubbed...
Name: DL_semanticField.pdf
Type: text/pdf
Size: 59635 bytes
Desc: not available
URL: <http://www.ivoa.net/pipermail/dal/attachments/20140414/6c41a882/attachment-0001.bin>

