Pre-cooked joins
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Mon Apr 15 09:35:35 CEST 2024
Dear Colleagues,
On Fri, Apr 12, 2024 at 10:42:42AM +0200, BONNAREL FRANCOIS via dm wrote:
> You can find the dedicated page for this meeting with presentations and
> Notes here: https://wiki.ivoa.net/twiki/bin/view/IVOA/RadioastronomyInterestGroupSeventhVirtualMeeting
Sorry I couldn't make it to the meeting, but I do care fairly deeply
about the question of whether we want a pre-cooked obs_radio join or
we want individual extension tables designed to be NATURAL-ly JOIN-ed
together. And I am *very* convinced that the pre-cooked joins are
trouble. On my pain level scale
<https://blog.g-vo.org/building-consensus.html#scale>, they would be
an 8+.
The main reason is that obs_radio hopefully will not be the last
obscore extension we will design. Once the next extension comes in,
what do we say? That the extensions cannot be used together? I
think that would be a mistake, as the domains of the extensions are
not necessarily orthogonal; this was obviously discussed in the
meeting on the example of time and radio, but I want to emphasise
this once more, because it is really the central point.
You see, *if* we expect extensions to be used together, you are
either back at the JOIN-s some of you seem to be skeptical about, or
you will end up with 2^(n-1) pre-cooked JOIN-s, each joining a subset of
the set of extensions you implement. That would be a horrible
nightmare. Let us not do that.
Let me also briefly comment the points made in this connection on
https://wiki.ivoa.net/internal/IVOA/RadioastronomyInterestGroupSeventhVirtualMeeting/_Radio_IG_7th_Running_meeting.txt:
> - registration and how to expose the tables.
> + Separate tables (Markus + ... ) vs. single table (François + ...).
> + tables identified by standardID set via utype on the table in
> the registry --> standardIF for the extenSION table or for the
> extenDED table ?
> + Combining multiple extensions (time+radio) in the same tables
> becomes painful.
That's my point above, so +1 (at least).
> + on the other side querying only the extension table makes no
> sense so why imposing a join to users ?
Because we cannot predict *which* tables users will want to query
together. And don't worry about the mental load imposed by JOIN-s;
it's fairly low on NATURAL JOIN-s to begin with, and then just
provide a few example queries in the document and in your VOSI
examples. People will cope just fine. Most of them start from
examples anyway and then tweak them. Also, whoever has passed rookie
level in ADQL will be loosely familiar with joins: crossmatching
means JOIN-s, and crossmatching, I claim, is the single most popular
use case for doing ADQL in the first place.
> + the issue has Two aspects; queries and discovery of the tables.
Marginally on these being two aspects. I claim it turns out
explicit JOIN-s work fine for both queries and discovery, whereas
pre-cooked tables are painful in both cases. I won't elaborate this
here because it seems so obvious to me, but I'm happy to expand on it
on request.
> + WHICH utype for extended tables : one ad hoc utype
> (standardID) per extended table or multiple utypes on the same
> extended single table
Please note that neither VODataService nor RegTAP can deal with
multiple table utypes at this point, so if we went for multi-utype,
someone would have to push through changes in both of these
standards, and presumably breaking ones (what's to end up in the
current table.table_utype column?) at that.
> + PD: if your database is natively standards compliant the
> join approach is more normalized and makes sense.
I don't think that's strongly related to what your native tables look
like, except perhaps if all you have is a single, "pre-extended"
table. But then defining two simple views on top of that
pre-extended table is a lot less hassle than forcing those with more
complex setups to bother with pre-cooked joins.
> + But other implementations implement obscore as a view on an
> existing database and making a complete view would be better.
> Joining views is going to be cumbersome. Joining two views
> that are views on the same underlying table is strange and
> would require implementers to remove the join.
No, absolutely not. Nested joins are not uncommon at all, and the
only people that *may* have to worry about them is the people writing
the database engines. And do not worry: *That* part they have long
worked out. It's no trouble at all.
> + in the current draft there are the two versions (single table
> appears in pdf, two tables is there but commented in latex)
As a general point: Please don't comment out text in
version-controlled material. Delete whatever you want to disappear
(and fork if you want to explore alternatives). It is *much* simpler
to follow the edit processes using VCS tools than to squint at
comment characters. That capability is one of the major reasons to
have standards in VCS in the first place.
Finally, one last, unrelated point:
> + FB: user defined functions only used for TAP (not for SIA/DAP interface)
Even if we want multi-unit in DAP, that is unrelated; you can, of
course, have independed parameters FREQ, WAVELENGTH, and ENERGY (or
perhaps BAND and BAND_UNIT) in such an interface regardless of the
underlying table structure, and if you *really* wanted that kind of
thing, you could have SPECTRAL_OUTPUT_UNIT on top.
Of course, I believe all of these are terrible ideas, and both the
conversion of user-preferred units to service units (TOPCAT shows how
to do it in a few places) and the presentation of spectral
coordinates are responsibilities of the clients in a well-designed
system. But let's not complicate the f_min/f_max debate with DAP,
which simply is unrelated.
Thanks,
Markus
More information about the dm
mailing list