Datalink vocabulary extension: sibling/co-generated

François Bonnarel francois.bonnarel at astro.unistra.fr
Mon May 11 22:26:12 CEST 2020


Hi Paul, Markus, all

On 11/05/2020 at 11:37, Markus Demleitner wrote:
> Hi Paul,
>
> On Mon, May 11, 2020 at 09:09:27AM +0000, Paul Harrison wrote:
>>> On 2020-05-07, at 10:58, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
>>> for #co-generated?  And how much would you dislike it? [as in: enough
>>> to block the VEP?  Because then I'd probably save the time until
>>> we're closer to consensus or someone else goes ahead with
>>> #co-generated]
>> It strikes me that as finding the correct term is proving so
>> difficult then it is indeed the description/concept that is the
>> problem, and perhaps it should be dropped entirely - I get the
> Hmmm, no, frankly, in this case I am by now pretty much convinced
> we've identified a useful concept, and people in the debate haven't
> really been contesting it for quite some time now.  I think by now
> it's really about "choose a word that gives people who don't bother
> to read the description the right idea".
I agree with you, Markus. The concept is clear (see the family of 
use cases below); it's the word we have to find.
>
>> I always like to keep things simple and imagine what a fairly dumb
> Sure -- but keeping things as simple as possible is difficult not
> only because you have to fend off feature creep but also because
> making it simpler than possible is another danger.  And...
>
>> client is going to do with the information to further process it
>> rather than what a human looking at the pointed to information can
>> do, and I generally think that in many cases you cannot really do
>> better than “related” - Markus had said something similar earlier
>> in the thread
> ..."related" is just too simple.  If it weren't related, it wouldn't
> be in the datalink document in the first place.
>
> For the datalink vocabulary, the central question I think is "what
> would a reasonably smart client do with it?".  And I've argued for a
> long time: Organise the datalink matches in a tree, preferably in a
> consistent order so users' muscle memory can kick in.
>
> In that way, users don't see 200 links in the pop up on a
> datalink-enabled result but something like
>
>    Dataset
>    Preview
>    Derived data >
>    Progenitors and Calibration data >
>    ...
agreed.
> -- and then you can follow the non-atomic items down in sub-menus.
>
> That way, people get to access the most frequently used stuff in a
> common place, and you can still rather predictably place even fairly
> obscure stuff.
>
> And while you're right, we're categorising along provenance relations
> here, I think that absolutely makes sense for datalink (which is why
> we've had #progenitor and #derivation from day one) because it
> intersects with what you'd like to do with the data:
>
> Debugging? -> #progenitor [wait for my VEP pushing #calibration below
>    #progenitor]
> Analyses other people have made? -> #derivation
> Other things people have done with the observation? -> #sibling/#co-derived

Yes, I also agree with that description (except that I want to see what 
the calibration/progenitor linkage in the new VEP would look like).
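
As a rough illustration of the tree Markus describes, here is a minimal Python sketch that groups datalink rows by their semantics term under menu labels in a fixed order. The MENU_ORDER table, the build_menu function, the "Co-derived data" label and the "Other" bucket are assumptions made up for this example; #co-derived is only a proposed term, and no existing client is claimed to work this way.

```python
from collections import OrderedDict

# Hypothetical mapping from semantics terms to menu labels, in display order.
MENU_ORDER = [
    ("#this", "Dataset"),
    ("#preview", "Preview"),
    ("#derivation", "Derived data"),
    ("#progenitor", "Progenitors and Calibration data"),
    ("#calibration", "Progenitors and Calibration data"),
    ("#co-derived", "Co-derived data"),  # proposed term, not in the vocabulary yet
]


def build_menu(links):
    """Group (semantics, access_url) pairs under menu labels in a fixed order.

    Terms that are not listed in MENU_ORDER end up under "Other".
    Labels without any link are left out of the result.
    """
    label_for = dict(MENU_ORDER)
    menu = OrderedDict()
    for _, label in MENU_ORDER:
        menu.setdefault(label, [])
    menu["Other"] = []
    for semantics, url in links:
        menu[label_for.get(semantics, "Other")].append(url)
    return OrderedDict((label, urls) for label, urls in menu.items() if urls)
```

Because the order comes from the table and not from the order of the rows in the datalink document, the same kind of link always shows up in the same place, which is the "muscle memory" point made above.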

DataLink is not reproducing Provenance here, but the two are closely 
akin. What we are exploring are actually single relationships extracted 
from the ProvDM.

That's why I proposed "co-generated" and now "co-derived". In the 
context of provenance, "co-generated" means that both sides of the 
DataLink "WasGeneratedBy" a common activity that used the progenitor.

"Co-derived" is probably better because "WasDerivedFrom" bypasses the 
activity and makes a shortcut directly between progenitors and generated 
entities.

The granularity of these activities, and which detailed actions they 
actually encompass, is up to the implementer, so a WasDerivedFrom 
relation is always possible between progenitors and results.
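
To make the "co-derived" reading concrete, here is a minimal Python sketch, assuming provenance is available as plain (derived entity, progenitor) pairs standing in for WasDerivedFrom relations. The function name co_derived and the pair representation are hypothetical and are not taken from the ProvDM or from any DataLink implementation.

```python
from collections import defaultdict


def co_derived(was_derived_from):
    """Map each derived entity to the other entities sharing a progenitor with it.

    was_derived_from: iterable of (derived_entity, progenitor) pairs, a
    hypothetical flat stand-in for WasDerivedFrom relations.
    """
    # Collect everything that was derived from each progenitor.
    by_progenitor = defaultdict(set)
    for derived, progenitor in was_derived_from:
        by_progenitor[progenitor].add(derived)

    # Two entities are co-derived if they appear under the same progenitor.
    result = defaultdict(set)
    for derived_set in by_progenitor.values():
        for entity in derived_set:
            result[entity] |= derived_set - {entity}
    return dict(result)
```

Note that nothing here requires the co-derived entities to be of the same kind; the relation only says that they share a progenitor.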

The family of use cases is represented by the Gaia Time Series. These 
TimeSeries are to be linked in some way to records in the Gaia main 
catalog. Both come from the same observations, however.

On the other hand, "siblings", according to the definitions I found on 
the web, are "brothers and sisters". In some way they are certainly 
co-derived. But they also seem to be of the same nature ("same species"), 
which is not implied by co-derived and which is not generally the case 
in this family of use cases: the record in the main catalog is not the 
brother/sister of the TimeSeries.

>
> What's still missing is "Other things I might like to know
> about...<ugh>", as in, perhaps "HST spectra taken for items shown on
> this image".   I think that's, indeed, a tough one because you'd have
> to say what <ugh> is, and datalink doesn't really have a way to have
> <ugh> anything but "the thing you've just discovered", and that
> absolutely does not cover the use case I've made up above (where
> <ugh> is "objects in this image" rather than "this image").

Yes, Markus. The family of use cases for that is very large and 
encompasses things like finding charts or cross-correlated objects or 
datasets.

One of the attempts in this direction was made by Carlos Rodrigo and 
Enrique Solano a couple of years ago.

The use case and prototype were presented in Shanghai by Carlos: 
https://wiki.ivoa.net/internal/IVOA/InterOpMay2017-DAL/dlink-shanghai.pdf

For this family of use cases I now propose "counterpart" (my unfinished 
VEP005).

I'll say some more in reply to Paul's next email.

Cheers

François


>
> Given that, for now I'd like to agree with you on:
>
>> Because linking is a very powerful tool, there is a tendency to
>> want to put in lots of links,  but can you be sure that you are
>> presenting a “complete” set of links. The IVOA has a whole lot of
>> “discovery” protocols which are in competition with these “hard”
> I hope as people get used to things like the Aladin 10's discovery
> tree it'll be more natural to them to look for the HST spectra (or
> any spectra at all) using a generic client, rather than get them
> pre-packaged from the service operator.
>
>           -- Markus
>
> (perhaps getting carried away a bit in that last paragraph...)

