Statistics metadata in TAP
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Fri Oct 21 22:10:42 CEST 2016
Hi all,
On Fri, Oct 21, 2016 at 12:34:18PM +0200, Gregory Mantelet wrote:
> To answer Thomas: yes, I originally planned to also provide histograms in
> the TAP metadata, but since that is not as simple as the other information, I
> preferred to go step by step and keep that idea for later.
>
> As Tom proposes, providing histograms in TAP could be an interesting second
> step. But I admit being a bit sceptical about users' interest in such
> information. So far, since the statistics I presented went online, I have
> had very few downloads (nearly none) of the
I've always considered this information to be of most interest to
other machines, in particular when planning queries spanning several
services (whatever happened to OGSA-DAI?).
In Postgres, there's a view with (approximate) planner statistics.
Of course, they're using arrays, but I'd say that's not much of a
problem for TAP -- we don't want to match against these columns or
pull individual array elements from them. We just want to retrieve
them, and since VOTable can represent these values, missing support
of array *operations* in ADQL to me wouldn't be a blocker.
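To illustrate that clients only need to fetch such arrays, not operate on them in ADQL, here is a minimal Python sketch of how a float array could travel in a VOTable TABLEDATA cell, where array elements are whitespace-separated and the FIELD declares a variable-length arraysize="*". The column name in FIELD_DECL and the helper names are hypothetical; a real service would presumably use its VOTable library rather than hand-built strings.

```python
from typing import List, Sequence

# Hypothetical FIELD declaration for an array-valued statistics column;
# arraysize="*" marks a variable-length array.
FIELD_DECL = '<FIELD name="histogram_bounds" datatype="double" arraysize="*"/>'


def encode_td_array(values: Sequence[float]) -> str:
    """Serialize floats as the text of a TABLEDATA <TD> cell.

    Array elements are separated by whitespace; repr() keeps enough digits
    for an exact round trip of Python floats. Non-finite values are not
    handled here.
    """
    return " ".join(repr(float(v)) for v in values)


def decode_td_array(text: str) -> List[float]:
    """Parse a whitespace-separated <TD> cell back into floats."""
    return [float(tok) for tok in text.split()]


if __name__ == "__main__":
    cell = encode_td_array([0.0, 1.5, 3.0, 10.0])
    print(FIELD_DECL)
    print(f"<TD>{cell}</TD>")  # <TD>0.0 1.5 3.0 10.0</TD>
```

The client side is just a split and a float conversion; nothing here needs array support in the query language.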
Here's what's currently in pg_stats:
https://www.postgresql.org/docs/current/static/view-pg-stats.html
Without ever having touched anything doing SQL query planning, I'd
say null_frac, n_distinct, histogram_bounds (though that's anyarray
in the postgres incarnation, which I'd rather not have in TAP_SCHEMA,
not even optionally), and elem_count_histogram would appear the most
useful to me. I'd guess a suitable client could even make fairly
attractive displays from those (where I'd presumably not bother with
non-float columns in TAP).
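As a rough idea of what a client could do with null_frac, n_distinct and histogram_bounds, here is a minimal Python sketch following the semantics documented for pg_stats: positive n_distinct is an absolute count, negative n_distinct is minus the ratio of distinct values to rows, and histogram_bounds divides the non-null values into buckets of roughly equal population. The linear interpolation inside a bucket is an assumption in the spirit of planner range estimates, not Postgres' actual code, and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def distinct_estimate(n_distinct: float, row_count: int) -> float:
    """Turn a pg_stats-style n_distinct into an absolute count.

    Positive values are absolute estimates; negative values are minus the
    ratio of distinct values to rows (-1 means every value is unique).
    """
    return -n_distinct * row_count if n_distinct < 0 else n_distinct


def fraction_below(x: float, bounds: Sequence[float], null_frac: float) -> float:
    """Estimate the fraction of all rows whose value is < x.

    Assumes equi-depth buckets (each holds about the same number of
    non-null rows) and linear interpolation inside the bucket containing x.
    Values that pg_stats lists separately in most_common_vals are ignored.
    """
    nbuckets = len(bounds) - 1
    if nbuckets < 1:
        raise ValueError("need at least two histogram bounds")
    if x <= bounds[0]:
        below = 0.0
    elif x >= bounds[-1]:
        below = 1.0
    else:
        # Bucket i satisfies bounds[i] <= x < bounds[i + 1], so hi > lo.
        i = bisect_right(bounds, x) - 1
        lo, hi = bounds[i], bounds[i + 1]
        below = (i + (x - lo) / (hi - lo)) / nbuckets
    # The histogram only describes non-null rows.
    return (1.0 - null_frac) * below
```

For bounds [0, 10, 20, 40, 100] and null_frac 0.2, x = 30 falls halfway through the third of four buckets, giving 0.8 * 2.5/4, i.e. about half of all rows.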
> 1- A column of type ARRAY (1D or 2D depending on whether the histogram
> is contiguous or not). Very simple and minimalist solution, but ARRAYs are
> not yet supported in TAP.
That one has my vote, just like postgres does it (an array of bin
boundaries and an array of values).
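A contiguous histogram stored as two arrays, bin boundaries plus one value per bin, is easy for a client to check and display. Here is a minimal Python sketch, assuming n+1 strictly increasing boundaries for n values (the contiguous 1D case only; the 2D variant for non-contiguous bins is not covered). The function names are hypothetical.

```python
from typing import List, Sequence


def check_histogram(bounds: Sequence[float], values: Sequence[float]) -> None:
    """Validate the two-array layout: n+1 increasing boundaries, n values."""
    if len(bounds) != len(values) + 1:
        raise ValueError("need exactly one more boundary than bin values")
    if any(lo >= hi for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("bin boundaries must be strictly increasing")


def render_histogram(bounds: Sequence[float], values: Sequence[float],
                     width: int = 40) -> List[str]:
    """Render one text bar per bin, scaled so the largest value fills `width`."""
    check_histogram(bounds, values)
    peak = max(values, default=0)
    lines = []
    for lo, hi, v in zip(bounds, bounds[1:], values):
        n = round(width * v / peak) if peak > 0 else 0
        lines.append(f"[{lo:g}, {hi:g}) " + "#" * n)
    return lines


if __name__ == "__main__":
    for line in render_histogram([0, 1, 2, 4], [5, 10, 2], width=10):
        print(line)
```

Bars here scale with the raw per-bin values; with unequal bin widths, a client might prefer to divide each value by its bin width and plot a density instead.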
-- Markus