Statistics metadata in TAP
Walter Landry
wlandry at caltech.edu
Fri Oct 21 01:29:19 CEST 2016
Gregory Mantelet <gmantele at ari.uni-heidelberg.de> wrote:
> ** Columns metadata
>
> The idea is to add basic statistics like a count, min, max, ... for
> some numerical columns of tables published in a TAP service. For that,
> I have just added the following columns in TAP_SCHEMA.columns:
>
> - min_value
> - max_value
> - mean
> - std_dev
> - q1 (i.e. first quartile)
> - median (i.e. second quartile)
> - q3 (i.e. third quartile)
> - filling (number of rows having a NOT NULL value for this column)
As a data point, at IRSA we already calculate min, max, and number of
rows for internal purposes. Mean, std_dev, and filling would not be
difficult to calculate at the same time. Quartiles would be somewhat
onerous. We have rather large tables that grow over time (as the
project takes more data), and calculating exact quartiles requires
either sorting the data or lots of external storage.
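
To illustrate why the first group of statistics is cheap, here is a
minimal sketch of accumulating them in one pass over a column. It is
not IRSA's code. It assumes NULL is represented as None, that std_dev
means the population standard deviation, and it uses Welford's update
for the running mean and variance. The function name column_stats and
the returned keys are made up for the example.

```python
import math
from typing import Dict, Iterable, Optional


def column_stats(values: Iterable[Optional[float]]) -> Dict[str, Optional[float]]:
    """One-pass summary of a column; None stands in for SQL NULL (assumption)."""
    rows = 0          # all rows seen, NULL or not
    filling = 0       # rows with a NOT NULL value
    mean = 0.0        # running mean of the non-NULL values
    m2 = 0.0          # running sum of squared deviations (Welford)
    min_value: Optional[float] = None
    max_value: Optional[float] = None

    for v in values:
        rows += 1
        if v is None:
            continue  # NULLs count as rows but contribute nothing else
        filling += 1
        delta = v - mean
        mean += delta / filling
        m2 += delta * (v - mean)
        if min_value is None or v < min_value:
            min_value = v
        if max_value is None or v > max_value:
            max_value = v

    return {
        "rows": rows,
        "filling": filling,
        "min_value": min_value,
        "max_value": max_value,
        # Undefined when the column has no non-NULL values.
        "mean": mean if filling > 0 else None,
        # Population standard deviation (divide by filling), an assumption.
        "std_dev": math.sqrt(m2 / filling) if filling > 0 else None,
    }
```

Each value is visited once and only a handful of numbers are kept, so
the cost does not depend on holding the table in memory. Exact
quartiles have no comparable constant-memory update, which is where
the sorting or extra storage comes in.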
As a side point, I am a little worried about what it means to take the
mean of a column with NULLs. I can define it, but I do not know if I
like it.
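
The ambiguity can be shown with a small sketch. Two plausible
definitions are compared: skipping NULLs, which is what SQL's AVG()
does, and dividing by the total row count. Both function names are
hypothetical and None again stands in for NULL; neither is proposed
here as the definition TAP should adopt.

```python
from typing import Optional, Sequence


def mean_skipping_nulls(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average over non-NULL values only (the behaviour of SQL AVG)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def mean_over_all_rows(values: Sequence[Optional[float]]) -> Optional[float]:
    """Sum of non-NULL values divided by the total number of rows."""
    if not values:
        return None
    return sum(v for v in values if v is not None) / len(values)


column = [2.0, None, 4.0, None]
print(mean_skipping_nulls(column))  # 3.0
print(mean_over_all_rows(column))   # 1.5
```

The two answers differ by the ratio of filling to total rows, so a
published mean is only interpretable if the service says which
definition it used.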
Cheers,
Walter Landry