Statistics metadata in TAP

Walter Landry wlandry at caltech.edu
Fri Oct 21 01:29:19 CEST 2016


Gregory Mantelet <gmantele at ari.uni-heidelberg.de> wrote:
> ** Columns metadata
> 
> The idea is to add basic statistics like a count, min, max, ... for
> some numerical columns of tables published in a TAP service. For that,
> I have just added the following columns in TAP_SCHEMA.columns:
> 
>     - min_value
>     - max_value
>     - mean
>     - std_dev
>     - q1          (i.e. first quartile)
>     - median      (i.e. second quartile)
>     - q3          (i.e. third quartile)
>     - filling     (number of rows having a NOT NULL value for this column)

As a data point, at IRSA we already calculate min, max, and number of
rows for internal purposes.  Mean, std_dev, and filling would not be
difficult to calculate at the same time.  Quartiles would be somewhat
onerous.  We have rather large tables that grow over time (as the
project takes more data), and calculating exact quartiles requires
either sorting the data or lots of external storage.
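
To illustrate why min, max, mean, std_dev, and filling can share one
scan of a column while quartiles cannot, here is a minimal sketch.  It
is not IRSA's implementation.  The name column_stats, the use of None
to stand in for NULL, and the choice of the population standard
deviation (via Welford's update) are all assumptions made for the
example.

```python
import math

def column_stats(values):
    """One-pass statistics for a column; None stands in for SQL NULL.

    Hypothetical helper for illustration.  std_dev is the population
    standard deviation, which is an assumption: the proposal does not
    say which definition is intended.
    """
    n_rows = 0          # every row, NULL or not
    filling = 0         # rows with a NOT NULL value
    mean = 0.0
    m2 = 0.0            # running sum of squared deviations (Welford)
    min_value = None
    max_value = None

    for v in values:
        n_rows += 1
        if v is None:
            continue    # NULLs count as rows but contribute nothing else
        filling += 1
        if min_value is None or v < min_value:
            min_value = v
        if max_value is None or v > max_value:
            max_value = v
        delta = v - mean
        mean += delta / filling
        m2 += delta * (v - mean)

    has_data = filling > 0
    return {
        "n_rows": n_rows,
        "filling": filling,
        "min_value": min_value,
        "max_value": max_value,
        "mean": mean if has_data else None,
        "std_dev": math.sqrt(m2 / filling) if has_data else None,
    }
```

Each statistic here needs only a fixed amount of state per column, so
the cost does not depend on the table size beyond the scan itself.
Exact quartiles have no such constant-size running state, which is
where the sorting or extra storage comes in.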

As a side point, I am a little worried about what it means to take the
mean of a column with NULLs.  I can define it, but I do not know if I
like it.
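
To make the ambiguity concrete, here is a small sketch of two possible
definitions.  Both function names are hypothetical, and None stands in
for NULL.  The first matches what the SQL AVG aggregate does (NULLs
are skipped); the second divides by the total row count, which
amounts to treating NULL as zero.  Neither is claimed to be what the
proposal intends.

```python
def mean_ignoring_nulls(values):
    """Mean over NOT NULL values only, as SQL AVG() computes it."""
    present = [v for v in values if v is not None]
    if not present:
        return None     # no data at all: the mean is itself NULL
    return sum(present) / len(present)

def mean_over_all_rows(values):
    """Mean with NULL treated as zero (divides by the full row count)."""
    values = list(values)
    if not values:
        return None
    return sum(v for v in values if v is not None) / len(values)
```

For a column holding 2.0, NULL, 4.0 the first definition gives 3.0 and
the second gives 2.0, so a client reading the published mean needs to
know which one was used, ideally alongside the filling count.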

Cheers,
Walter Landry

