<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Hello All,
<div class=""><br class="">
</div>
<div class="">At MAST, we are currently computing and using min and max for numeric columns, and of course could compute the other basic statistics.</div>
<div class=""><br class="">
</div>
<div class="">Our main use of the min/max values is to define bin sizes for histograms that support the user interface for certain queries. We also store histograms for discrete-valued columns. It seems to me that it is worth considering adding this histogram
information as optional TAP metadata.</div>
<div class=""><br class="">
</div>
<div class="">We use 4 tables to store the histogram-related information. Our default number of bins per column is 145, mostly because that gives a decent looking histgram in the UI without being too large. I’m sure these could all be rearranged to some degree
to make it more TAP-suitable, but the current tables look like this:</div>
<div class=""><br class="">
</div>
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> </span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class="">CREATE TABLE [dbo].[histogramBins](<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [columnName] [varchar](32) NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [step] [int] NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [lb] [float] NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [hb] [float] NULL<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class="">)<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> </span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> </span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class="">CREATE TABLE [dbo].[histogramContinuous](<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [columnName] [varchar](32) NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [step] [int] NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [lb] [float] NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [hb] [float] NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [theCount] [bigint] NULL<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class="">)<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> </span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class="">CREATE TABLE [dbo].[histogramDiscrete](<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [columnName] [varchar](32) NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [distinctValue] [varchar](2048) NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [theCount] [bigint] NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [hasBeenSplit] [char](1) NULL<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class="">)<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> </span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> </span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class="">CREATE TABLE [dbo].[histogramMinMax](<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [columnName] [varchar](32) NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [minimum] [float] NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class=""> [maximum] [float] NULL<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New'; color: rgb(31, 73, 125);" class="">)<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">
<span style="color: rgb(31, 73, 125);" class=""> </span></div>
</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class="">Tom</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">
<div>
<blockquote type="cite" class="">
<div class="">On Oct 20, 2016, at 5:07 PM, Gregory Mantelet <<a href="mailto:gmantele@ari.uni-heidelberg.de" class="">gmantele@ari.uni-heidelberg.de</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class="">
<div class=""><br class="webkit-block-placeholder">
</div>
<div class="moz-text-flowed" style="font-family: -moz-fixed;
font-size: 12px;" lang="x-unicode">
Dear DAL and Apps members, <br class="">
<br class="">
Since I do not attend to this interop, I would like to highlight quickly one of my last development concerning TAP because I think it may be interested to either do the same in your own TAP service or merely use it. As suggested by the title of this email it
is about adding metadata in TAP. <br class="">
<br class="">
(I send this email also to Apps because of the last point I make in this email: a compatibility with a new feature of TOPCAT)
<br class="">
<br class="">
<br class="">
** Columns metadata <br class="">
<br class="">
The idea is to add basic statistics like a count, min, max, ... for some numerical columns of tables published in a TAP service. For that, I have just added the following columns in TAP_SCHEMA.columns:
<br class="">
<br class="">
- min_value <br class="">
- max_value <br class="">
- mean <br class="">
- std_dev <br class="">
- q1 (i.e. first quartile) <br class="">
- median (i.e. second quartile) <br class="">
- q3 (i.e. third quartile) <br class="">
- filling (number of rows having a NOT NULL value for this column) <br class="">
<br class="">
Except for "filling" which must be an integer (INTEGER or BIGINT in PostgreSQL), I have chosen to set all these columns as DOUBLE PRECISION since most of the columns to describe are, in the "worst" case, double values.
<br class="">
<br class="">
When no statistics can be provided for a column, all these additional metadata would be NULL.
<br class="">
<br class="">
<br class="">
** Tables metadata <br class="">
<br class="">
In addition, I have also added another column in TAP_SCHEMA.tables: <br class="">
<br class="">
- row_count (of type INTEGER or BIGINT) <br class="">
<br class="">
<br class="">
** VOSI description of tables <br class="">
<br class="">
Since in TAP all tables and columns metadata MUST be the same in TAP_SCHEMA and /tables, I have also updated our /tables resource.
<br class="">
<br class="">
Besides, on a recommendation of Mark Taylor, I designed and linked a simple XSD schema in order to have a valid XML document. You can find this schema at the following address:
<br class="">
<br class="">
<a class="moz-txt-link-freetext" href="http://gaia.ari.uni-heidelberg.de/tap-stats.xsd">
http://gaia.ari.uni-heidelberg.de/tap-stats.xsd</a> <br class="">
<br class="">
<br class="">
** Visibility in TOPCAT <br class="">
<br class="">
Thanks to Mark Taylor, any custom metadata (non-standard TAP columns) can be displayed in the last version of TOPCAT. Thus, all the statistics described above can be seen there for our Gaia TAP service (n.b. you can find this TAP service easily in the registry
with the keywords "Gaia" and "ARI", but in case you can not, here is the root access URL:
<a class="moz-txt-link-freetext" href="http://gaia.ari.uni-heidelberg.de/tap">http://gaia.ari.uni-heidelberg.de/tap</a>).
<br class="">
<br class="">
<br class="">
** Last words... <br class="">
<br class="">
According to me all these basic statistics may be useful to discover the content of a table, especially when this one is as large as Gaia, PPMXL, 2MASS, ... It can indeed prevent some users to perform apparently simple and short queries such as "SELECT COUNT(*)
FROM a_big_table" which on the contrary to what most people think is not often a quick query on large tables. Having already computed such information is then an economy of time and resources for the users and the server.
<br class="">
<br class="">
Finally, I am not trying to convince anybody to have such metadata, but I just want to highlight a possible extension of TAP helping in simple data discovery. Besides, this use-case also demonstrates how easy it could be to add more simple metadata inside a
TAP service. So maybe it could be interested, if possible, to write an appendix about that in the next version of TAP or just as an IVOA note. What do you think?
<br class="">
<br class="">
If anybody has questions or wants further details about the TAP "extension" I presented here, do not hesitate to ask ; I am not at the interop, but I am fully available by email
<span class="moz-smiley-s1" title=":)"></span><br class="">
<br class="">
Regards, <br class="">
Grégory <br class="">
<br class="">
<br class="">
PS: For those who are interested, I also provide histograms and sky-maps (using Healpix) for most of the published columns on the page
<a class="moz-txt-link-freetext" href="http://gaia.ari.uni-heidelberg.de/tap/tables">
http://gaia.ari.uni-heidelberg.de/tap/tables</a>. Both can be downloaded as images but also as tables that you can then display/process as you want (e.g. display the histogram in TOPCAT, display and navigate inside the Healpix map in Aladin, ...).
<br class="">
<br class="">
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</body>
</html>