Discovering Data Collections Within Services Note version 1.0

Mon Feb 15 23:35:09 CET 2016

Hi Markus & Registry,

> meaning -- there's some leeway.  But really, if you change a service
> to move from a single-instrument archive (with the respective metadata
> like "observed at instrument", "PI is XY", and a matching
> description) to a thematic archive (with the respective
> metadata like "multiple instruments", "publisher is creator", and a
> description of the scope of the collection archive), I'd say it's a
> different resource in almost all cases, so it should get a new
> registry record and hence a new identifier.

Okay, so in principle you are saying that service-only records
(publishing multiple data sets) are usually very different from
service-descriptions within a data+service record?
I can see that the descriptions would be different, since the combined
record would probably concentrate more on the description of the
published data ("observed at instrument" is data-related metadata), and
the service-only record would concentrate of course on the service
descriptions. I guess I have to look at more example records to get a
better impression of the big picture.

>> That second or third dataset could also be a new release/updated
>> version. Or is there some other infrastructure already set up for
>> versions of datasets?
>
> Well, that's a bit orthogonal to the question here.  In principle,
> it's possible to register each release separately and then switch the
> associated discovery service (the main capability) between a, say,
> "current" and "known-broken-archived", so this would help flexibly
> support all kinds of schemes that keep multiple versions of data
> collections alive; but exactly because I think the proposed discovery
> scheme can essentially accommodate almost all ways to do this, this is
> the wrong place to figure out what's a good idea in that particular
> business and what is not.

Ok, discussion on this postponed. I did not intend to start a new thread
here, I was merely curious.

>> existing services or validators, but would you really want to have that
>> aux-thing inside the standardId in the long run?
>
> Yes -- I maintain this is a completely valid, and indeed intended,
> use of what StandardsRegExt introduced the standard keys for.

Since no one else is shouting out loud, and I haven't managed to study
the StandardsRegExt document yet, I trust that you know what you are
doing. :-)

>>>> * multiple auxiliary capabilities [...]
> There's also a minor technical point: What we would *really* want
> here is relationships between *capabilities*, i.e., not only should
> the source be a capability element, so should, in order to be true to
> the theory, the target.  We don't really have a precedent for
> referencing into resource records (except for StandardsRegExt keys,
> which are in a whole different ballpark in that respect), and
> building one for something that in my view is definitely among the 20%
> of functionality that takes 80% of the work seemed unwise to me.
>
> So, yes, collection discovery as proposed here is an 80% solution.
> But it does solve these 80% with 20% of the effort, and my feeling so
> far is that the 80% solved probably are all anyone is ever going to
> want to use.

Okay, if this is something that people probably won't use, then the
current version will be enough. I guess if the need arises, one can
discuss again about referencing from capabilities to capabilities in
other records or something like that for an updated version of the note.

> My proposal at this point is: The current Note is easy and fairly
> cheap to try out, and I hope the major TAP operators will push out
> such records fairly soon (I'll push out some more too, soon).  If we
> really see some important use case's requirements are clumsy to
> satisfy with this, there's no big damage done if a (on the VOResource
> level) minor correction becomes necessary later.

Fair enough.

>>> As to cleanliness and elegance -- well, that's for a good part in the
>>> eye of the beholder.  To me, cleanliness and elegance in the Registry
>>> by now are largely measured in "how hard is it to get the registry
>>> operators to actually do it?"
>>
>> It's a pity that it has to be reduced to that. But if that's the case,
>> then I can see no alternatives to your approach.
>
> But
> well, certainly efficiency is an unashamed element of elegance, no?

Well, I am not very practiced at writing really efficient code myself,
but some of the most efficient code I have seen was really ugly in
terms of readability, and probably had some beauty in that. ;-)
But, as you said, it's in the eye of the beholder, so let's not get too
philosophical here.

I think we discussed my main issues, so if everyone else is fine with
the current state of the discovery note, I'll stop here. I think it is a
very useful document, and though I have some doubts about how some
things are done or have to be done, it is better to have a note
explaining a way to handle data discovery than having nothing.

Cheers,

Kristin

-- 
-----------------------------------------------------------------
| Dr. Kristin Riebe
| eScience & GAVO
|
| Email: kriebe at aip.de
| Phone: +49 331 7499-377
| Room:  B6/25
-----------------------------------------------------------------
| Leibniz-Institut für Astrophysik Potsdam (AIP)
| An der Sternwarte 16, D-14482 Potsdam
| Vorstand: Prof. Dr. Matthias Steinmetz 	
|
| Stiftungsverzeichnis Brandenburg: 26 742-00/7026
-----------------------------------------------------------------