new take on resource registration best practice

Wed Oct 23 06:45:56 PDT 2013

Dear Reg-WG,

On Wed, Oct 23, 2013 at 01:46:26PM +0200, Pierre Le Sidaner wrote:
> Thank you for raising this problem
> I just want us to separate two point :
> The way we want the services and collection to be registered by the
> users. (example for registering one tap with multiple collections)
> The way we want to ingest them in the registry. (or the way we would
> like to retrieve informations)

This is true to some extent -- but of course the design of the
registry data model (and its actual usage) has to keep both the needs
of the publishers and the needs of the searching users in mind.  If
these needs are so different that, in effect, two different data
models (i.e., we shove around pieces of information on resource
ingestion) are necessary, so be it.

But if we can avoid this, we should.

> To register services do we agree that we want to register one end
> point TAP service (one vodataservice) and multiple collections ?

In essence, that has been the plan so far.  To again point out the
problem with this:  Users searching for a certain kind of physics
("data about quasars with columns talking about magnetic flux")
will, in this design, get the collection records.  These do *not*
contain access URLs; to figure those out, they have to inspect
information that is in VOResource relations, which is fairly painful
at least in RegTAP and close to impossible with RI1 registries if you
want to have portable requests.
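
To make the indirection concrete, here is a minimal Python sketch on
toy in-memory records.  The record layout, the ivoids and URLs under
example.org, and the helper name access_urls_for are all made up for
illustration; this is neither RegTAP nor RI1 code, it only shows the
second lookup a client has to do.

```python
# Toy model of the current plan: collection records carry relationships
# but no capabilities; only the service record has an access URL.
# All ivoids and URLs below are invented for illustration.
RECORDS = {
    "ivo://example.org/quasars": {
        "capabilities": [],
        "relationships": [("served-by", "ivo://example.org/tap")],
    },
    "ivo://example.org/tap": {
        "capabilities": [
            {"standard_id": "ivo://ivoa.net/std/TAP",
             "access_url": "http://example.org/tap"},
        ],
        "relationships": [],
    },
}


def access_urls_for(ivoid, standard_id, records=RECORDS):
    """Return access URLs for ivoid, following served-by relations one hop."""
    record = records[ivoid]
    candidates = [record]
    # Second lookup: the services the collection says it is served by.
    for rel_type, related_id in record["relationships"]:
        if rel_type == "served-by" and related_id in records:
            candidates.append(records[related_id])
    return [cap["access_url"]
            for rec in candidates
            for cap in rec["capabilities"]
            if cap["standard_id"] == standard_id]


urls = access_urls_for("ivo://example.org/quasars", "ivo://ivoa.net/std/TAP")
```

The collection record alone yields nothing; the URL only turns up
after the served-by hop.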

There are various ways to deal with this; one is what Ray has
suggested.

> In the implementation of REST registry done by Jonathan we will
> consider one collection as one resource and we will add them the
> associate capabilities (the one associated to the related service)
> and it will transparent for the user.

This would be another possibility; even then, some "best practices"
recommendation would be in order; e.g.: should this kind of resolution
only happen for served-by relationships or also for service-for?  If
there's interest I'll be happy to elaborate on the various
entertaining technical and -- in particular -- sociological issues
behind that.

And since the relations point to a complete service: Do you copy all
capabilities?  Just the TAP one?  But the problem we're facing for
TAP also exists for other protocols; for example, I'm running a
federated SIAP service that uses that pattern (in case you're
curious: ivo://org.gavo.dc/lensunion/q/im).

If we figure all that out, we could specify that some sort of copying
of capabilities has to happen on ingestion; that specification needs
to be done regardless of whether you're querying through ADQL or
Solr's query language.  In any case, if you're doing this
capabilities-copying, you're creating a VOResource-user data model
distinct from VOResource-publisher.  How bad is that?  Given the
limited extent, it's probably not catastrophic.
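
For illustration, copying on ingestion could look roughly like the
following Python sketch.  The dictionary layout, the example.org
ivoids, and the function name copy_capabilities are assumptions made
up here; the standard_ids parameter just makes the "all capabilities
or only TAP?" question explicit, and only served-by relations are
resolved, which is itself one of the open points.

```python
import copy

TAP = "ivo://ivoa.net/std/TAP"
SIA = "ivo://ivoa.net/std/SIA"


def copy_capabilities(records, standard_ids=None):
    """Return a searchable copy of records with capabilities copied over.

    standard_ids=None copies every capability of the related service;
    otherwise only capabilities with a listed standard id are copied.
    Only served-by relations are resolved.
    """
    ingested = copy.deepcopy(records)
    for record in ingested.values():
        for rel_type, related_id in record["relationships"]:
            if rel_type != "served-by" or related_id not in records:
                continue
            # Read from the published records so copies never cascade.
            for cap in records[related_id]["capabilities"]:
                if standard_ids is None or cap["standard_id"] in standard_ids:
                    record["capabilities"].append(dict(cap))
    return ingested


# Invented publisher-side records: one service, one collection.
published = {
    "ivo://example.org/svc": {
        "capabilities": [
            {"standard_id": TAP, "access_url": "http://example.org/tap"},
            {"standard_id": SIA, "access_url": "http://example.org/sia"},
        ],
        "relationships": [],
    },
    "ivo://example.org/coll": {
        "capabilities": [],
        "relationships": [("served-by", "ivo://example.org/svc")],
    },
}

searchable = copy_capabilities(published, standard_ids={TAP})
```

Note that published stays untouched; searchable is the distinct
"user" data model mentioned above.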

But still: Before we do this, let's think again if we can't keep the
two together.

One possibility still is that we do nothing in VOResource.  Under the
assumption that there are not going to be thousands of "federated"
services, maybe clients could cope with resolving relationships by
just memoizing the most common federated services?  Maybe queries
against the original VOResource DM can be made natural enough that
this can work?  I believe the three-worlds approach I've described in
my Interop talk --
http://wiki.ivoa.net/internal/IVOA/InterOpSep2013Registry/regtap.pdf
-- is at least workable, for example, even if it is not too
beautiful. Similar approaches are, I guess, possible using Lucene.
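
As a rough idea of what client-side memoizing could amount to, here is
a small Python sketch using functools.lru_cache.  The SERVICES
dictionary stands in for an actual registry query, and all ivoids,
URLs, and names are invented for illustration.

```python
from functools import lru_cache

LOOKUPS = []  # records which service ivoids were actually fetched

# Stand-in for a registry query; a real client would ask a registry here.
SERVICES = {"ivo://example.org/tap": "http://example.org/tap"}


@lru_cache(maxsize=128)
def service_access_url(service_ivoid):
    """Resolve a service ivoid to its access URL, caching the result."""
    LOOKUPS.append(service_ivoid)
    return SERVICES.get(service_ivoid)


# Three collections, all served by the same federated service.
collections = {
    "ivo://example.org/coll1": "ivo://example.org/tap",
    "ivo://example.org/coll2": "ivo://example.org/tap",
    "ivo://example.org/coll3": "ivo://example.org/tap",
}
resolved = {coll: service_access_url(svc)
            for coll, svc in collections.items()}
```

With few federated services, the expensive lookup happens once per
service rather than once per collection.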

Frankly, I personally would probably still rather go with Ray and add
capabilities in what are now data collection records.  It's simple,
it'll not derail the old RI1 registries, and I believe it can be
pulled off with fairly minimal changes -- if at all -- to VOResource.

It is regrettable that DataCollections cannot have capabilities;
unless we change VODataService, we'd have to use CatalogServices or
similar.  These could then have the, say, TAP capability.

There's a catch, however: Let's say someone wants to enumerate all TAP
services -- if all the little data collections say they have this
TAP capability, that someone will get a lot of records.  Things are
even worse for the typed services since, for them, all-VO queries do
make sense.  If all contributing data collections say they have the
SSAP capability for a federated service, a naive all-VO SSA search
would hit the service containing that capability fairly often.

Therefore, I'd say these "served-by" capabilities should have special
standardIds (maybe just the normal standard ids with "?service-for"
appended?).
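
A small Python sketch of how such marked standardIds would behave;
the "?service-for" suffix is only the idea floated above, not an
agreed convention, and the ivoids and function names are made up.

```python
TAP = "ivo://ivoa.net/std/TAP"
MARKER = "?service-for"  # suffix floated above; not an agreed convention

# (ivoid, standard id) pairs; one real service, two marked collections.
CAPABILITIES = [
    ("ivo://example.org/tap", TAP),
    ("ivo://example.org/coll1", TAP + MARKER),
    ("ivo://example.org/coll2", TAP + MARKER),
]


def real_services(capabilities, standard_id):
    """Enumerate services: an exact match skips the marked copies."""
    return [ivoid for ivoid, sid in capabilities if sid == standard_id]


def all_providers(capabilities, standard_id):
    """Discovery: accept both real and marked capabilities."""
    return [ivoid for ivoid, sid in capabilities
            if sid.split("?", 1)[0] == standard_id]


services = real_services(CAPABILITIES, TAP)
providers = all_providers(CAPABILITIES, TAP)
```

Clients comparing standardIds exactly keep seeing one TAP service;
only clients that know the convention see the collections as well.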

This wouldn't necessarily require VOResource changes.  What would,
AFAICS, require those would be the declaration of the ivoid of the
"donor service", i.e., the one that embeds the "real" capability.  If
we're *really* afraid of schema changes, we can simply require that
ivoid be given in a relation element.  I don't like that much.  I'd
much rather have some way to say that in the capability itself --
it's far less brittle and easier to deal with, in particular with
interfaces returning actual VOResource XML.

All this relies on a fairly strong set of conventions, which
certainly is not ideal.  But right now I'm strongly leaning towards
liking this much better than the other two options I've discussed.

But maybe there are more?

Cheers,

        Markus