Statistics metadata in TAP
Patrick Dowler
pdowler.cadc at gmail.com
Sat Oct 22 11:52:30 CEST 2016
I have added this to the list of things to discuss in the TAP part of
DAL2 this afternoon.
Pat
On 21 October 2016 at 13:10, Markus Demleitner
<msdemlei at ari.uni-heidelberg.de> wrote:
> Hi all,
>
> On Fri, Oct 21, 2016 at 12:34:18PM +0200, Gregory Mantelet wrote:
>> To answer Thomas: yes, I originally planned to also provide histograms in
>> the TAP metadata, but since that information is not as simple as the rest, I
>> preferred to go step by step and keep that idea for later.
>>
>> As Tom proposes, providing histograms in TAP could be an interesting second
>> step. But I admit being a bit sceptical about users' interest in such
>> information. So far, since the statistics I presented went online, I
>> have had very few downloads (nearly none) of the
>
> I've always considered this information to be of most interest to
> other machines, in particular when planning queries spanning several
> services (whatever happened to OGSA-DAI?).
>
> In Postgres, there's a view with (approximate) planner statistics.
> Of course, they're using arrays, but I'd say that's not much of a
> problem for TAP -- we don't want to match against these columns or
> pull individual array elements from them. We just want to retrieve
> them, and since VOTable can represent these values, missing support
> of array *operations* in ADQL wouldn't be a blocker to me.
>
> Here's what's currently in pg_stats:
>
> https://www.postgresql.org/docs/current/static/view-pg-stats.html
>
> Without ever having touched anything doing SQL query planning, I'd
> say null_frac, n_distinct, histogram_bounds (though that's anyarray
> in the postgres incarnation, which I'd rather not have in TAP_SCHEMA,
> not even optionally), and elem_count_histogram would appear the most
> useful to me. I'd guess a suitable client could even make fairly
> attractive displays from those (where I'd presumably not bother with
> non-float columns in TAP).
>
>> 1- A column of type ARRAY (1D or 2D depending on whether the histogram
>> is contiguous or not). A very simple and minimalist solution, but arrays
>> are not yet supported in TAP.
>
> That one has my vote, just like postgres does it (an array of bin
> boundaries and an array of values).
>
> -- Markus
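A minimal Python sketch of the two-array layout discussed above (one array of bin boundaries, one array of values), assuming a contiguous histogram; a non-contiguous one would need the 2D variant, e.g. one [low, high] pair per bin. It takes equi-depth bin edges like those in the pg_stats histogram_bounds column, which the PostgreSQL documentation describes as dividing a column's values into groups of approximately equal population, and turns them into per-bin densities a client could plot. ColumnHistogram, from_equi_depth_bounds and to_tabledata are hypothetical names, not part of TAP, TAP_SCHEMA or any library. The last one only illustrates that VOTable can already carry such arrays (variable-length array columns use arraysize="*", and TABLEDATA separates the elements with whitespace).

```python
from dataclasses import dataclass
from typing import List, Sequence


def _check_ascending(bounds: Sequence[float]) -> None:
    """Reject bin edges that are not strictly increasing (avoids zero-width bins)."""
    if any(lo >= hi for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("bin bounds must be strictly increasing")


@dataclass
class ColumnHistogram:
    """Hypothetical per-column statistics record for a contiguous histogram.

    bounds holds n+1 bin edges and values holds one number per bin, i.e. the
    "array of bin boundaries and an array of values" layout. The names are
    illustrative only; they are not part of any TAP_SCHEMA definition.
    """
    bounds: List[float]
    values: List[float]

    def __post_init__(self) -> None:
        if len(self.bounds) != len(self.values) + 1:
            raise ValueError("need exactly one more bound than values")
        _check_ascending(self.bounds)


def from_equi_depth_bounds(bounds: Sequence[float], null_frac: float = 0.0) -> ColumnHistogram:
    """Convert pg_stats-style histogram_bounds into per-bin densities for plotting.

    Assumes every bin holds the same share of the non-null rows. Real pg_stats
    histograms exclude most_common_vals, which this sketch ignores.
    """
    if len(bounds) < 2:
        raise ValueError("need at least two bounds")
    _check_ascending(bounds)
    n_bins = len(bounds) - 1
    share = (1.0 - null_frac) / n_bins  # fraction of all rows falling in each bin
    densities = [share / (hi - lo) for lo, hi in zip(bounds, bounds[1:])]
    return ColumnHistogram(bounds=[float(b) for b in bounds], values=densities)


def to_tabledata(values: Sequence[float]) -> str:
    """Encode a numeric array as a VOTable TABLEDATA cell (elements separated by blanks)."""
    return " ".join(repr(float(v)) for v in values)


if __name__ == "__main__":
    hist = from_equi_depth_bounds([0.0, 10.0, 20.0, 40.0], null_frac=0.1)
    print(to_tabledata(hist.bounds))  # the bin-boundary array column
    print(to_tabledata(hist.values))  # the per-bin value array column
```

Storing densities rather than raw counts is a choice made for this sketch: with equal-population bins the count per bin carries no information, so it is the bin widths that a display actually needs.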
--
Patrick Dowler
Canadian Astronomy Data Centre
Victoria, BC, Canada