linking capabilities with tablesets

gilles landais gilles.landais at astro.unistra.fr
Mon Mar 25 14:50:27 CET 2024


:)


Changing the VizieR granularity from catalogue to dataset is of course 
not possible, for so many reasons that I will not detail them here.
Just keep in mind that a catalogue corresponds to a reference article and 
can contain several datasets (tables).
Being able to gather these datasets into a single entity that carries 
their common metadata is a valuable capability of the registry.

Note that the problem of associating services with tablesets is not 
specific to VizieR: other data centers and other records show the same 
problem (e.g., search for ukidss).

Linking a tableset with its interfaces seems natural.
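
To make the idea concrete, here is a minimal sketch of how a client could
resolve such links. It assumes the hypothetical <serves>/<reftable>
elements from my earlier proposal (quoted below), which are not part of
any published schema. The record content, the table descriptions and the
function name are invented for illustration, and the table names are
written without the "@" prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical record fragment. <serves> and <reftable> are only the
# elements proposed in this thread; namespaces are left out for brevity.
RECORD = """\
<resource>
  <capability standardID="ivo://ivoa.net/std/ConeSearch">
    <serves>
      <reftable name="table1"/>
    </serves>
  </capability>
  <capability standardID="ivo://ivoa.net/std/ConeSearch">
    <serves>
      <reftable name="table2"/>
    </serves>
  </capability>
  <tableset>
    <schema>
      <table>
        <name>table1</name>
        <description>First table of the catalogue</description>
      </table>
      <table>
        <name>table2</name>
        <description>Second table of the catalogue</description>
      </table>
    </schema>
  </tableset>
</resource>
"""


def tables_by_capability(record_xml):
    """Return one (standardID, {table name: description}) pair per capability."""
    root = ET.fromstring(record_xml)

    # Index the tableset by table name so references can be looked up.
    descriptions = {
        table.findtext("name"): table.findtext("description")
        for table in root.iter("table")
    }

    links = []
    for capability in root.iter("capability"):
        names = [ref.get("name") for ref in capability.iter("reftable")]
        # A reference to a table missing from the tableset is an error
        # in the record, so fail loudly instead of guessing.
        unknown = [name for name in names if name not in descriptions]
        if unknown:
            raise ValueError(f"reftable points to unknown table(s): {unknown}")
        links.append(
            (capability.get("standardID"),
             {name: descriptions[name] for name in names})
        )
    return links


for standard_id, tables in tables_by_capability(RECORD):
    print(standard_id, "->", ", ".join(tables))
```

With such a link, a client that finds several cone search capabilities in
one record can tell which table each of them queries instead of picking
one blindly.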


Talking about the protocol: OK, but I think that is another debate!
I agree with, and am in favour of, creating a new SCS that is more 
DALI-compliant and that would serve collections of tablesets. It would be 
a good feature, and perhaps one better integrated in the registry. But it 
does not exist yet, and when it does, the migration from an architecture 
with one service per dataset to an architecture that serves collections 
of tablesets (like TAP) will impact a non-negligible part of the VO 
architecture.
The idea of linking VOResource and VODataService is, from my point of 
view, an adaptation of the current architecture to repair an imperfection.

For the record, we discovered this issue recently when we added notebooks 
to teach our users how to work with pyvo.


Regards,

Gilles & Manon




On 22/03/2024 at 09:32, Markus Demleitner via registry wrote:
> Dear Registry,
>
> On Thu, Mar 21, 2024 at 04:53:36PM +0100, gilles landais via registry wrote:
>> <capability xsi:type="cs:ConeSearch"
>> standardID="ivo://ivoa.net/std/ConeSearch">
>>      <serves>
>>          <reftable name='@table1' />
>>          <reftable name='@table2' />
>>      </serves>
>> </capability>
>>
>> ...
>>
>> <tableset>
>>      <schema>
>>         <table>
>>             <name>table1</name>
>> ....
> Hmha... this has come up now and then before, and I cannot say I'm
> *particularly* keen on this kind of thing.  The main reason is that
> inter-branch references in XML have a way of getting out of hand.  In
> this particular case, also think about the discovery pattern once you
> map this into RegTAP; everything I can think of off-hand looks fairly
> ugly.
>
> Still, *if* you want to go for it, the thing to touch is the SCS
> standard, which should specify its Registry schema (taking over from
> SimpleDALRegExt).  In there, for simplicity I'd just define an
> element <queriesTable> (say), which would just contain the table name
> without further syntax (did you have a special reason to add the "@"
> in your example?).  Cone search only queries one table, so this could
> be maxOccurs="1", which further simplifies usage.
>
> In RegTAP, this would be mapped into rr.res_detail, the xpath would be
> "/queriesTable".
>
> But as I said: the ugly part of this is the client work; try an
> implementation of the discovery of this before you start writing the
> actual specification.
>
>
> Me, I'd prefer a different way to clean this up.  Part of it is
> metadata work, the rest may be protocol work.
>
>
> Metadata work
> -------------
>
> Part of this problem is the VizieR policy to group all tables
> belonging to one publication into one VO resource.  Admittedly, this
> is the right thing to do in several contexts, in particular if
> multiple tables primarily work as one unit.  A simple test would be:
> Does it make sense to JOIN these tables?  If it does (classic example:
> RegTAP), it should probably be a single resource.
>
> In other important cases, and I believe many of the multiple-SCS
> resources are of that sort, the publication-induces-resource policy
> leads to clumsy discovery.
>
> Let me invent a paper "Recent discoveries with the Volute Radio
> Dish", which contains three tables, "Cataclysmic Stars", "Radio
> Galaxies and QSOs", and "Solar System Objects".  I think these should
> become three resources, two of which would have SCS services, and one
> probably an EPN-TAP one.
>
> Sure, you *could* make a resource that has all the relevant
> capabilities, all the necessary subject keywords, and has a
> description with three sections that discuss the various collections
> in turn.
>
> But that would be painful for a machine that, say, iterates over all
> cone searches associated to records giving
> #cataclysmic-variable-stars (or wider) as a subject; I don't think
> there is a way for them to avoid hitting the QSO table and perhaps
> even puzzle about the EPN-TAP service.
>
> I always liked the term "unit of discovery"; what that is keeps
> requiring thought and may even change over time as use cases change.
> But I'm pretty sure at this time you should not, say, stick both
> gaia_source and the light curves into one resource record (as in
> I/355).  Think of metadata like "which product type do you serve?" as
> in https://github.com/ivoa-std/VODataService/pull/1.
>
> Punting such decisions down to the capability level is a recipe for
> constant grief and feature creep on the levels of tables and
> capabilities, which would keep growing metadata we already have
> mapped at the resource level.  Properly dissecting tables and
> services into well-fitting units of discovery saves that grief and
> keeps the whole system manageable, with reasonable, at least
> potentially expectable queries.
>
> Of course I realise that even figuring out which of the existing
> multi-SCS resources "should" (in my reasoning) be split up is a
> Herculean task that's not easily tackled.  But it's probably not
> orders of magnitude more complicated than teaching the SCS clients
> the discovery patterns you need with the table references, not to
> mention the effort of repeating resource-level metadata in tablesets
> and capabilities.
>
> By the way, that latter way would also include making the table
> descriptions hit by the standard freetext queries (which neither
> TOPCAT nor pyVO nor, to my knowledge, anyone else does these days).
> That's another aspect of it that I don't like.
>
> For me (who's not VizieR), it's easy to say that I'd rather spend
> work on aligning the metadata models than on doing client work, but
> I'll say it anyway :-)
>
>
> Protocol Work
> -------------
>
> For the other cases, where there is, in essence, a single resource
> that has multiple cone searches (perhaps: multiple epochs of the same
> things or so), I believe the right way to deal with them is fixing
> SCS.
>
> That fix would be to let you declare just one SCS capability, but
> the service then requires passing in a table name.  I distinctly
> remember that SCS at one point already had a facility for passing in
> table names, but I don't find it any more.
>
> Let's do this again, and in the cases when there really *are*
> multiple SCS-able tables in a resource, the associated SCS service
> will send back a nice error message when clients don't pass in a table
> name.  This then fixes the problem of people blindly querying a
> random table when there is more than one -- they get an error message
> if they're not explicit about what exactly they want.
>
> Giving users some UI to choose the tables then actually *is* client
> work, but it's client work that is in line with our current (and IMHO
> reasonable) discovery practices.
>
>
> Sorry for that sermon; but talking about what to discover when and how
> never is simple, and we've gotten it wrong several times before in
> the past.  Fixing things after the fact is even harder (cf.
> discovering data collections...).
>
> Thanks,
>
>            Markus
>

