Statistics metadata in TAP

Thomas Boch thomas.boch at astro.unistra.fr
Fri Oct 21 09:15:05 CEST 2016


Hi Gregory,

I like the idea and I also like your histograms.
Did you think about how you could expose the histograms through TAP?

Cheers,
Thomas

On 20/10/2016 at 23:07, Gregory Mantelet wrote:
>
> Dear DAL and Apps members,
>
> Since I am not attending this Interop, I would like to quickly
> highlight one of my latest developments concerning TAP, because I
> think it may be of interest, either to do the same in your own TAP
> service or simply to use it. As the subject of this email suggests,
> it is about adding metadata to TAP.
>
> (I am also sending this email to Apps because of the last point
> below: compatibility with a new feature of TOPCAT.)
>
>
> ** Column metadata
>
> The idea is to add basic statistics (count, min, max, ...) for some
> numerical columns of the tables published in a TAP service. To do so,
> I have just added the following columns to TAP_SCHEMA.columns:
>
>     - min_value
>     - max_value
>     - mean
>     - std_dev
>     - q1          (i.e. first quartile)
>     - median      (i.e. second quartile)
>     - q3          (i.e. third quartile)
>     - filling     (number of rows having a NOT NULL value for this column)
>
> Except for "filling", which must be an integer (INTEGER or BIGINT in
> PostgreSQL), I have declared all these columns as DOUBLE PRECISION,
> since most of the columns being described are, in the "worst" case,
> double values.
>
> When no statistics can be provided for a column, all these additional
> metadata are NULL.
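A minimal sketch of how these eight values could be computed for one column, assuming NULLs arrive as Python None and treating an all-NULL column as a case where no statistics can be provided. Two conventions are assumptions, not taken from the service: std_dev is the population standard deviation, and the quartiles use the "inclusive" interpolation of `statistics.quantiles` (Python 3.8+). The function name `column_stats` is hypothetical.

```python
import statistics
from typing import Dict, Optional, Sequence

STAT_KEYS = ("min_value", "max_value", "mean", "std_dev",
             "q1", "median", "q3", "filling")


def column_stats(values: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Compute the TAP_SCHEMA.columns statistics for one numerical column.

    None stands for a NULL database value. Assumed conventions: population
    standard deviation and 'inclusive' quartiles (statistics.quantiles).
    """
    present = [float(v) for v in values if v is not None]
    if not present:
        # No statistics can be provided: every field is NULL.
        return {key: None for key in STAT_KEYS}
    if len(present) >= 2:
        q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    else:
        # A single value is its own quartiles.
        q1 = median = q3 = present[0]
    return {
        "min_value": min(present),
        "max_value": max(present),
        "mean": statistics.fmean(present),
        "std_dev": statistics.pstdev(present),
        "q1": q1,
        "median": median,
        "q3": q3,
        "filling": len(present),  # rows with a NOT NULL value
    }
```

In a real service these values would more likely be computed inside the database with aggregate functions; the Python version only makes the definitions explicit.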
>
>
> ** Table metadata
>
> In addition, I have added one column to TAP_SCHEMA.tables:
>
>     - row_count (of type INTEGER or BIGINT)
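A client could read these precomputed values with ordinary ADQL against TAP_SCHEMA. The sketch below assumes the column names listed above plus the standard `table_name` and `column_name` columns of TAP_SCHEMA; the helper names are hypothetical, and filtering on `filling IS NOT NULL` relies on the convention that columns without statistics have NULL everywhere.

```python
def adql_string(text: str) -> str:
    """Quote text as an ADQL string literal (single quotes are doubled)."""
    return "'" + text.replace("'", "''") + "'"


def row_count_query(table_name: str) -> str:
    """ADQL reading the precomputed row_count instead of running COUNT(*)."""
    return ("SELECT row_count FROM TAP_SCHEMA.tables "
            f"WHERE table_name = {adql_string(table_name)}")


STAT_COLUMNS = ("min_value", "max_value", "mean", "std_dev",
                "q1", "median", "q3", "filling")


def column_stats_query(table_name: str) -> str:
    """ADQL listing the statistics of every column that has them."""
    return (f"SELECT column_name, {', '.join(STAT_COLUMNS)} "
            "FROM TAP_SCHEMA.columns "
            f"WHERE table_name = {adql_string(table_name)} "
            "AND filling IS NOT NULL")
```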
>
>
> ** VOSI description of tables
>
> Since in TAP all table and column metadata MUST be the same in
> TAP_SCHEMA and /tables, I have also updated our /tables resource.
>
> In addition, on Mark Taylor's recommendation, I designed and
> referenced a simple XSD schema so that the document remains valid
> XML. You can find this schema at the following address:
>
> http://gaia.ari.uni-heidelberg.de/tap-stats.xsd
>
>
> ** Visibility in TOPCAT
>
> Thanks to Mark Taylor, any custom metadata (non-standard TAP columns)
> can be displayed in the latest version of TOPCAT. So all the
> statistics described above can be seen there for our Gaia TAP service
> (n.b. you can easily find this TAP service in the Registry with the
> keywords "Gaia" and "ARI"; in case you cannot, here is its root
> access URL: http://gaia.ari.uni-heidelberg.de/tap).
>
>
> ** Last words...
>
> In my opinion, all these basic statistics may be useful to discover
> the content of a table, especially one as large as Gaia, PPMXL,
> 2MASS, ... They can indeed save users from running apparently simple
> and short queries such as "SELECT COUNT(*) FROM a_big_table", which,
> contrary to what most people think, is often not a quick query on a
> large table. Having such information precomputed therefore saves time
> and resources for both the users and the server.
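To illustrate the saving, the row count can be fetched through the standard TAP synchronous endpoint with a one-row TAP_SCHEMA query instead of a full COUNT(*) scan. This sketch only builds the URL (REQUEST, LANG and QUERY are the standard TAP sync parameters); it assumes the service exposes `/sync` under the root URL given above, and `a_big_table` is just the placeholder used in this email.

```python
from urllib.parse import urlencode

# Root access URL given above for the ARI Gaia TAP service.
TAP_ROOT = "http://gaia.ari.uni-heidelberg.de/tap"


def sync_url(adql: str, root: str = TAP_ROOT) -> str:
    """Build a TAP synchronous query URL from an ADQL string."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return f"{root.rstrip('/')}/sync?{urlencode(params)}"


# Cheap: one row from TAP_SCHEMA instead of scanning a_big_table.
ROW_COUNT_ADQL = ("SELECT row_count FROM TAP_SCHEMA.tables "
                  "WHERE table_name = 'a_big_table'")
```

The resulting URL can be opened with `urllib.request.urlopen` or pasted into a browser; the answer is a VOTable with a single value.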
>
> Finally, I am not trying to convince anybody to provide such
> metadata; I just want to highlight a possible extension of TAP that
> helps with simple data discovery. This use case also demonstrates how
> easy it can be to add more simple metadata to a TAP service. So it
> might be worth writing an appendix about it in the next version of
> TAP, or simply an IVOA Note. What do you think?
>
> If anybody has questions or wants further details about the TAP
> "extension" I presented here, do not hesitate to ask; I am not at the
> Interop, but I am fully available by email.
>
> Regards,
> Grégory
>
>
> PS: For those interested, I also provide histograms and sky maps
> (using HEALPix) for most of the published columns on the page
> http://gaia.ari.uni-heidelberg.de/tap/tables. Both can be downloaded
> as images, but also as tables that you can then display or process as
> you like (e.g. display a histogram in TOPCAT, or display and navigate
> a HEALPix map in Aladin, ...).
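The email does not say how the histograms are binned. As one possible approach, assuming equal-width bins spanning a column's min_value and max_value, a histogram table (bin edges plus counts) could be derived as below; the function name and the bin convention are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def equal_width_histogram(values: Sequence[Optional[float]], n_bins: int,
                          lo: float, hi: float) -> Tuple[List[float], List[int]]:
    """Count non-NULL values in n_bins equal-width bins spanning [lo, hi].

    Bins are half-open [a, b) except the last, which also includes hi.
    Values outside [lo, hi] and NULLs (None) are ignored.
    """
    if n_bins < 1 or not hi > lo:
        raise ValueError("need n_bins >= 1 and hi > lo")
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        if v is None or v < lo or v > hi:
            continue
        # Clamp so that v == hi falls into the last bin.
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return edges, counts
```

Writing the (edges, counts) pair out as a two-column table gives the kind of output that can then be plotted in a tool such as TOPCAT.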
>
>

