Discovering Data Collections Within Services Note version 1.0
Kristin Riebe
kriebe at aip.de
Mon Feb 8 14:53:20 CET 2016
Hi Markus & the registry,
>> * aux in full-capability approach:
>> Did I get it right that you propose the full-capability approach with
>> modifications, meaning that all the properties of a service (capability)
>> will be replicated for each record of a dataset that uses this service?
>
> Not necessarily, and at least for the TAP case I propose untyped
> capabilities, as given in the examples.
Okay, so it won't be the full service record that shall be given with
the data, but only its standardId, a link (URL) to its endpoint and a
publisher? That sounds much better.
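To make this concrete, here is a minimal sketch of what such a
stripped-down, untyped capability might look like when built in Python.
It is an illustration only: the standardID value assumes the "#aux"
pattern from the note's TAP examples, the endpoint URL is hypothetical,
and namespaces and the interface type that a real VOResource record
would carry are left out.

```python
import xml.etree.ElementTree as ET

# Assumption: the "#aux" standardID pattern used for TAP in the note.
TAP_AUX_STANDARD_ID = "ivo://ivoa.net/std/TAP#aux"


def make_aux_capability(access_url: str) -> ET.Element:
    """Build an untyped auxiliary capability carrying one access URL."""
    # No xsi:type on the capability: this is what "untyped" means here.
    cap = ET.Element("capability", {"standardID": TAP_AUX_STANDARD_ID})
    intf = ET.SubElement(cap, "interface", {"role": "std"})
    url = ET.SubElement(intf, "accessURL", {"use": "base"})
    url.text = access_url
    return cap


# Hypothetical endpoint, for illustration only.
aux = make_aux_capability("http://example.org/tap")
print(ET.tostring(aux, encoding="unicode"))
```

The point of the sketch is only that the dataset record repeats the
standardID and the access URL, not the rest of the service metadata.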
> with just a single record still being preferred.
What happens if someone publishes, say, an SIA service for one dataset,
using one combined service+dataset record for the registry, and later
on another two or three datasets are added for the same service?
Wouldn't that be the point when one goes back and splits the original
service+dataset record into 2 records, in order to be able to just
make a link to the service for each new data collection?
So wouldn't that mean: Unless I am really sure that there will be just
one dataset for one service, I should rather make separate records, one
for the service and one for the dataset (with aux-cap.), in case I need
to add additional datasets later on?
That second or third dataset could also be a new release/updated
version. Or is there some other infrastructure already set up for
versions of datasets?
>> * aux in standardId (section 2.1):
>> Does it have to be inserted into the standardId? It somewhat obscures
>> the standardId and looks to me like "misusing" it.
>
> We-ell, I don't agree that this is somehow "soiling" the standard id.
> Technically, a standard can define any number of "terms" (that's the
> "key" element from StandardsRegExt). These terms can refer to all
> kind of things: endpoint types, output formats, whatever.
>
> What we do here is define an endpoint type, i.e., a TAP endpoint
> whose metadata are defined somewhere else. I'd claim this is fairly
> well along the lines of both StandardsRegExt and the usage that
> capability/@standardID has found in VO practice.
>
> But even if it were a minor bending of the rules (it definitely is
> for the transition-phase identifiers in section 3), adding another
> attribute has the big disadvantage that legacy clients will ignore
> this -- this means that a TAP validator might re-validate VizieR
> 15000 times. Well, this particular service would get fixed (or
> blocked by VizieR) fairly quickly, but don't forget that there's
> quite a bit of infrastructure using the Registry, and so smooth
> transitions are a major concern. Anything that changes the
> behaviour of Registry components towards existing clients carries a
> massive price tag.
Hm, from my (admittedly probably very naive) point of view it still
looks like a "hack" to me. I understand the wish to not break
existing services or validators, but would you really want to have that
aux-thing inside the standardId in the long run?
Roughly how many legacy clients (validators etc.) are we talking about
here?
>> * multiple auxiliary capabilities [...]
>> So I would expect a relatedResource entry at servedBy for the
>> corresponding TAP service (which is already there) and for the SIA
>> service (which is not given).
>
> But it is (there is a problem in that the relationship to the TAP service
> is given twice, which is because my machinery doesn't realise the
> ObsCore and the TAP services are the same; I'll probably fix that).
> But the record correctly declares a served-by each to
>
> ivo://org.gavo.dc/tap and ivo://org.gavo.dc/lensunion/q/im
Oh, I see, I just expected something explicitly containing "SIA"
somewhere, and so I missed that.
>> or (even better?) one should add an attribute (e.g. the standardId?) to
>> each relatedResource that makes it clearer if and which type of
>
> Yes, it is a bit ugly that clients need to dereference the
> references to the related resources to figure out which of them is
> the main record, but the RegTAP query patterns are reasonable,
> whereas adding something to relatedResource would again be a problem
> in terms of migrating existing infrastructure.
What about adding the ivo-Id for the services (e.g.
ivo://org.gavo.dc/lensunion/q/im) to the aux-capabilities instead? In
addition to accessURL?
Then this could give a direct link and no dereferencing is needed.
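For comparison, the dereferencing step that clients need at the moment
can be sketched as a RegTAP query that looks up the capabilities of
every served-by resource. The helper below only builds the ADQL string.
It is a rough sketch, not a recommended query pattern: the function
name and the example ivoid are hypothetical, and the 'served-by'
relationship value and the lower-casing of ivoids assume the RegTAP 1.0
conventions.

```python
def served_by_query(collection_ivoid: str) -> str:
    """Return ADQL listing the served-by resources of a record,
    together with the standard ids of their capabilities."""
    # Assumption: RegTAP stores ivoids lower-cased.
    # Single quotes are doubled so the value is a valid ADQL literal.
    ivoid = collection_ivoid.lower().replace("'", "''")
    return (
        "SELECT rel.related_id, cap.standard_id\n"
        "FROM rr.relationship AS rel\n"
        "JOIN rr.capability AS cap ON cap.ivoid = rel.related_id\n"
        f"WHERE rel.ivoid = '{ivoid}'\n"
        "AND rel.relationship_type = 'served-by'"
    )


# Hypothetical data collection identifier, for illustration only.
query = served_by_query("ivo://example.org/MyCollection")
print(query)
```

With the service's ivo-Id given directly in the aux-capability, a
client would not need this extra join to find out which related
resource provides which protocol.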
> Well, updating 15000 records is not as bad as having to create
> another 15000. Even worse, it'd have to be more or less
> instantaneous to give the client writers a chance to maintain their
> sanity.
>
> And then all legacy clients would immediately break.
>
> I'm not saying that's totally out of the question forever -- perhaps
> one could keep some "legacy" searchable registries at the state
> before the flag day for a couple of years. But I think we'd have to
> have *very* tangible and substantial benefits to make that
> worthwhile, and I cannot see them in this case.
Hm, I would say: better clean up now than later on. Later we would have
even more records to repair.
(But maybe "later" never happens or a completely different approach with
a lot of "substantial benefits" will come along. Who knows. :-)
So maybe then it is better to not break existing clients etc.)
> As to cleanliness and elegance -- well, that's for a good part in the
> eye of the beholder. To me, cleanliness and elegance in the Registry
> by now are largely measured in "how hard is it to get the registry
> operators to actually do it?"
It's a pity that it has to be reduced to that. But if that's the case,
then I can see no alternatives to your approach.
>> 3. more complicated queries
>
> No, this is actually a very conceptual concern, and *that* was what
> finally convinced me that even migrating in that direction wouldn't
> fly. I was sure I had nicely laid that out at a recent interop talk,
> but I can't seem to find that now. Well, Fig. 4 from
> http://ads.ari.uni-heidelberg.de/abs/2015A%26C....11...91D will do,
> too: The problem when you do this is that there are essentially four
> classes of tables (or equivalently, metadata) when it comes to joins
> with that scheme. This is what made me decide the split-metadata
> approach won't fly -- there's too much to explain before people can
> write queries.
It seems to me that there is still much to explain with this approach.
It took me quite some time to go through your note ...
> Truth be told, my instinct was a bit like yours for a long while --
> let's go to a Registry with a clear separation of data collections
> and services? After I discovered how ugly the queries become unless
> we totally re-built the Registry, I now have my doubts. Perhaps it's
> for the better that our forefathers built the Registry the way they
> did.
Not really knowing how much effort that would be, I would even vote for
rebuilding the registry, and making a clean separation between datasets
and services. It sounds like this would be a lot of effort now, invested
in a cleaner system for the future.
But of course people would have to be willing to invest in this
effort, and rewrite their systems and clients where necessary.
Cheers,
Kristin
--
-------------------------------------------------------
Dr. Kristin Riebe
E-Science & GAVO
Email: kriebe at aip.de
Phone: +49 331 7499-377
Room: B6/25
-------------------------------------------------------
Leibniz-Institut für Astrophysik Potsdam (AIP)
An der Sternwarte 16, D-14482 Potsdam
Vorstand: Prof. Dr. Matthias Steinmetz, Matthias Winker
Stiftung bürgerlichen Rechts
Stiftungsverzeichnis Brandenburg: 26 742-00/7026
-------------------------------------------------------