Pre-cooked joins
Patrick Dowler
pdowler.cadc at gmail.com
Tue Apr 16 23:14:44 CEST 2024
I did participate in the meeting but (due to a major s/w release last week)
have
not had time to follow up (check the notes or post some of my thoughts in
more
detail). So I'll clarify what I meant that was quoted above and add some
others
to the mix.
> + PD: if your database is natively standards compliant the
> join approach is more normalized and makes sense.
I think what I meant here is that if someone was creating their database
from
scratch and chose to create an ObsCore table, then having a separate table
for
extensions and expecting to join them makes sense and is the more
normalised
approach. So "natively standards compliant" I think was trying to capture
that
one created an ObsCore table and populated it rather than created a "view"
in some
fashion (combination of database view and/or TAP and SIA code to manipulate
or
generate the query).
My understanding is that many data centres would implement ObsCore as a view
on existing underlying table(s). In the case of CADC (and others) where we
use
the CAOM data model, ObsCore is a well defined view that joins two tables
(because
CAOM is more normalised than ObsCore). For anyone who implements
ObsCore as a view on another model, extension views may well be views on
the same
tables and "ObsCore natural join ObsRadio" could be quite complex
underneath.In this
kind of situation, the complete "pre-cooked" views and no joins is more
direct and simple,
but I completely agree that the concept doesn't scale to N extensions and
their
combinations.
Just for completeness, the approach that we take with CAOM to make it
suitable for
describing all kinds of data is to have a single model that covers all
scenarios. We try
to minimise the "sparseness" by emphasizing the C(ommon) when designing the
model
and when it evolves (add new fields) to describe something new. Emphasizing
"common"
means finding ways to satisfy the broadest range of use cases with the
minimum number
of sufficiently generic concepts. The analogy here would be to try to add
things directly to
ObsCore instead of having extensions. CAOM v1 had a smaller "core" and
"archive specific
metadata" (extensions) and that turned out to be a bad idea. Of course, a
large and potentially
sparse model (set of tables) means users have to make good use of the
"select list" (not just
"select * from") and it also helps a lot to use "is not null" effectively
in queries. There are ways
to make both of these easier (TBD).
--
Patrick Dowler
Canadian Astronomy Data Centre
Victoria, BC, Canada
On Mon, 15 Apr 2024 at 00:35, Markus Demleitner via dm <dm at ivoa.net> wrote:
> Dear Colleagues,
>
> On Fri, Apr 12, 2024 at 10:42:42AM +0200, BONNAREL FRANCOIS via dm wrote:
> > You can find the dedicated page for this meeting with presentations and
> > Notes here:
> https://wiki.ivoa.net/twiki/bin/view/IVOA/RadioastronomyInterestGroupSeventhVirtualMeeting
>
> Sorry I couldn't make it to the meeting, but I do care fairly deeply
> about the question of whether we want a pre-cooked obs_radio join or
> we want individual extension tables designed to be NATURAL-ly JOIN-ed
> together. And I am *very* convinced that the pre-cooked joins are
> trouble. On my pain level scale
> <https://blog.g-vo.org/building-consensus.html#scale>, they would be
> an 8+.
>
> The main reason is that obs_radio hopefully will not be the last
> obscore extension we will design. Once the next extension comes in,
> what do we say? That the extensions cannot be used together? I
> think that would be a mistake, as the domains of the extensions are
> not necessarily orthogonal; this was obviously discussed in the
> meeting on the example of time and radio, but I want to emphasise
> this once more, because it is really the central point.
>
> You see, *if* we expect extensions to be used together, you are
> either back at the JOIN-s some of you seem to be skeptical about, or
> you will end up with 2^(n-1) pre-cooked JOIN-s, each joining a subset of
> the set of extensions you implement. That would be a horrible
> nightmare. Let us not do that.
>
> Let me also briefly comment the points made in this connection on
>
> https://wiki.ivoa.net/internal/IVOA/RadioastronomyInterestGroupSeventhVirtualMeeting/_Radio_IG_7th_Running_meeting.txt
> :
>
> > - registration and how to expose the tables.
> > + Separate tables (Markus + ... ) vs. single table (François + ...).
> > + tables identified by standardID set via utype on the table in
> > the registry --> standardIF for the extenSION table or for the
> > extenDED table ?
> > + Combining multiple extensions (time+radio) in the same tables
> > becomes painful.
>
> That's my point above, so +1 (at least).
>
> > + on the other side querying only the extension table makes no
> > sense so why imposing a join to users ?
>
> Because we cannot predict *which* tables users will want to query
> together. And don't worry about the mental load imposed by JOIN-s;
> it's fairly low on NATURAL JOIN-s to begin with, and then just
> provide a few example queries in the document and in your VOSI
> examples. People will cope just fine. Most of them start from
> examples anyway and then tweak them. Also, whoever has passed rookie
> level in ADQL will be loosely familiar with joins: crossmatching
> means JOIN-s, and crossmatching, I claim, is the single most popular
> use case for doing ADQL in the first place.
>
> > + the issue has Two aspects; queries and discovery of the tables.
>
> Marginally on these being two aspects. I claim it turns out
> explicit JOIN-s work fine for both queries and discovery, whereas
> pre-cooked tables are painful in both cases. I won't elaborate this
> here because it seems so obvious to me, but I'm happy to expand on it
> on request.
>
> > + WHICH utype for extended tables : one ad hoc utype
> > (standardID) per extended table or multiple utypes on the same
> > extended single table
>
> Please note that neither VODataService nor RegTAP can deal with
> multiple table utypes at this point, so if we went for multi-utype,
> someone would have to push through changes in both of these
> standards, and presumably breaking ones (what's to end up in the
> current table.table_utype column?) at that.
>
> > + PD: if your database is natively standards compliant the
> > join approach is more normalized and makes sense.
>
> I don't think that's strongly related to what your native tables look
> like, except perhaps if all you have is a single, "pre-extended"
> table. But then defining two simple views on top of that
> pre-extended table is a lot less hassle than forcing those with more
> complex setups to bother with pre-cooked joins.
>
> > + But other implementations implement obscore as a view on an
> > existing database and making a complete view would be better.
> > Joining views is going to be cumbersome. Joining two views
> > that are views on the same underlying table is strange and
> > would require implementers to remove the join.
>
> No, absolutely not. Nested joins are not uncommon at all, and the
> only people that *may* have to worry about them is the people writing
> the database engines. And do not worry: *That* part they have long
> worked out. It's no trouble at all.
>
> > + in the current draft there are the two versions (single table
> > appears in pdf, two tables is there but commented in latex)
>
> As a general point: Please don't comment out text in
> version-controlled material. Delete whatever you want to disappear
> (and fork if you want to explore alternatives). It is *much* simpler
> to follow the edit processes using VCS tools than to squint at
> comment characters. That capability is one of the major reasons to
> have standards in VCS in the first place.
>
> Finally, one last, unrelated point:
>
> > + FB: user defined functions only used for TAP (not for SIA/DAP
> interface)
>
> Even if we want multi-unit in DAP, that is unrelated; you can, of
> course, have independed parameters FREQ, WAVELENGTH, and ENERGY (or
> perhaps BAND and BAND_UNIT) in such an interface regardless of the
> underlying table structure, and if you *really* wanted that kind of
> thing, you could have SPECTRAL_OUTPUT_UNIT on top.
>
> Of course, I believe all of these are terrible ideas, and both the
> conversion of user-preferred units to service units (TOPCAT shows how
> to do it in a few places) and the presentation of spectral
> coordinates are responsibilities of the clients in a well-designed
> system. But let's not complicate the f_min/f_max debate with DAP,
> which simply is unrelated.
>
> Thanks,
>
> Markus
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ivoa.net/pipermail/dm/attachments/20240416/2b9e6af6/attachment.htm>
More information about the dm
mailing list