<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hi Thomas, Tom,<br>
<br>
To answer to Thomas, yes, I originally planned to provide also
histograms in the TAP metadata, but since it is not as simple
information as the others, I preferred to go step by step and keep
that idea for later.<br>
<br>
As proposes Tom, providing histograms in TAP could be a second
interesting step. But I admit being a bit sceptical about the
interest from the users for such information. For the moment,
since the moment the statistics I presented are online, I have
very few download (nearly none) of the histograms ; maybe because
this service is not so well known for the moment or maybe because
the image of the histogram is enough for the users, but I doubt
about all these reasons. Then, maybe users are merely not aware
yet of such possibility and if they were, maybe they would use it.
So, I do not know...<br>
<br>
These sceptical ideas apart, in order to have histograms in TAP I
originally thought to three solutions:<br>
<br>
1- A column of type ARRAY (1D or 2D depending of whether the
histogram is contiguous or not). Very simple and minimalist
solution, but ARRAY are not yet supported in TAP.<br>
<br>
2- A URL toward a VOTable containing the histogram ; the
structure/data-model would be then to be defined. Easy to store
and to get in TAP_SCHEMA but probably not really query-able by
users, although I assume that most of them would prefer to get the
whole histogram rather than just parts of it.<br>
<br>
3- Having an additional table in TAP_SCHEMA. I never thought
to any precise structure here as you did Tom. However I am not
sure I like the idea to have 4 tables to store an histogram, but I
may have wrongly interpreted your suggestion. Considering the
structure of your tables, I would say that just one table with the
following structure would be fairly enough and would cover the
case of continuous histogram or not:<br>
<br>
CREATE TABLE "TAP_SCHEMA".histogram_bins (<br>
table_name VARCHAR,<br>
column_name VARCHAR,<br>
step INTEGER,<br>
lb DOUBLE PRECISION,<br>
hb DOUBLE PRECISION,<br>
value INTEGER,<br>
PRIMARY KEY (table_name, column_name)<br>
);<br>
<br>
Note 1: I think "step" could be optional if the minimum and
maximum of the histogram are exactly the minimum and maximum
values provided in TAP_SCHEMA.columns (min_value and max_value).<br>
<br>
Note 2: Depending of the implementation, "step" and "value" could
be BIGINT instead of INTEGER.<br>
<br>
<br>
Considering these solutions and the fact that I have already all
histograms for the columns of my TAP service (for Gaia, Hipparcos,
Tycho2, PPMXL and 2MASS), I could easily implement a prototype for
one (or more if really needed) of these solutions or any other
preferred suggestion, if we all agree that having histograms in
TAP metadata is really a good idea and if we all merge toward of
solution.<br>
<br>
Cheers,<br>
Grégory<br>
<br>
<br>
On 21/10/2016 09:21, Tom Donaldson wrote:<br>
</div>
<blockquote
cite="mid:864BC27D-FB32-4D5F-A457-392450BC9D8E@stsci.edu"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Hello All,
<div class=""><br class="">
</div>
<div class="">At MAST, we are currently computing and using min
and max for numeric columns, and of course could compute the
other basic statistics.</div>
<div class=""><br class="">
</div>
<div class="">Our main use of the min/max values is to define bin
sizes for histograms that support the user interface for certain
queries. We also store histograms for discrete-valued columns.
It seems to me that it is worth considering adding this
histogram information as optional TAP metadata.</div>
<div class=""><br class="">
</div>
<div class="">We use 4 tables to store the histogram-related
information. Our default number of bins per column is 145,
mostly because that gives a decent looking histgram in the UI
without being too large. I’m sure these could all be rearranged
to some degree to make it more TAP-suitable, but the current
tables look like this:</div>
<div class=""><br class="">
</div>
<div class="">
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> </span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class="">CREATE TABLE
[dbo].[histogramBins](<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [columnName]
[varchar](32) NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [step] [int] NULL,<o:p
class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [lb] [float] NULL,<o:p
class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [hb] [float] NULL<o:p
class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class="">)<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> </span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> </span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class="">CREATE TABLE
[dbo].[histogramContinuous](<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [columnName]
[varchar](32) NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [step] [int] NULL,<o:p
class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [lb] [float] NULL,<o:p
class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [hb] [float] NULL,<o:p
class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [theCount]
[bigint] NULL<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class="">)<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> </span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class="">CREATE TABLE
[dbo].[histogramDiscrete](<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [columnName]
[varchar](32) NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [distinctValue]
[varchar](2048) NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [theCount]
[bigint] NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [hasBeenSplit]
[char](1) NULL<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class="">)<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> </span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> </span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class="">CREATE TABLE
[dbo].[histogramMinMax](<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [columnName]
[varchar](32) NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [minimum] [float]
NULL,<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class=""> [maximum] [float]
NULL<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="font-size: 8pt; font-family: 'Courier New';
color: rgb(31, 73, 125);" class="">)<o:p class=""></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt;
font-family: Calibri, sans-serif;" class="">
<span style="color: rgb(31, 73, 125);" class=""> </span></div>
</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class="">Tom</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">
<div>
<blockquote type="cite" class="">
<div class="">On Oct 20, 2016, at 5:07 PM, Gregory Mantelet
<<a moz-do-not-send="true"
href="mailto:gmantele@ari.uni-heidelberg.de" class="">gmantele@ari.uni-heidelberg.de</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div bgcolor="#FFFFFF" text="#000000" class="">
<div class=""><br class="webkit-block-placeholder">
</div>
<div class="moz-text-flowed" style="font-family:
-moz-fixed; font-size: 12px;" lang="x-unicode">
Dear DAL and Apps members, <br class="">
<br class="">
Since I do not attend to this interop, I would like to
highlight quickly one of my last development
concerning TAP because I think it may be interested to
either do the same in your own TAP service or merely
use it. As suggested by the title of this email it is
about adding metadata in TAP. <br class="">
<br class="">
(I send this email also to Apps because of the last
point I make in this email: a compatibility with a new
feature of TOPCAT)
<br class="">
<br class="">
<br class="">
** Columns metadata <br class="">
<br class="">
The idea is to add basic statistics like a count, min,
max, ... for some numerical columns of tables
published in a TAP service. For that, I have just
added the following columns in TAP_SCHEMA.columns:
<br class="">
<br class="">
- min_value <br class="">
- max_value <br class="">
- mean <br class="">
- std_dev <br class="">
- q1 (i.e. first quartile) <br class="">
- median (i.e. second quartile) <br class="">
- q3 (i.e. third quartile) <br class="">
- filling (number of rows having a NOT NULL
value for this column) <br class="">
<br class="">
Except for "filling" which must be an integer (INTEGER
or BIGINT in PostgreSQL), I have chosen to set all
these columns as DOUBLE PRECISION since most of the
columns to describe are, in the "worst" case, double
values.
<br class="">
<br class="">
When no statistics can be provided for a column, all
these additional metadata would be NULL.
<br class="">
<br class="">
<br class="">
** Tables metadata <br class="">
<br class="">
In addition, I have also added another column in
TAP_SCHEMA.tables: <br class="">
<br class="">
- row_count (of type INTEGER or BIGINT) <br
class="">
<br class="">
<br class="">
** VOSI description of tables <br class="">
<br class="">
Since in TAP all tables and columns metadata MUST be
the same in TAP_SCHEMA and /tables, I have also
updated our /tables resource.
<br class="">
<br class="">
Besides, on a recommendation of Mark Taylor, I
designed and linked a simple XSD schema in order to
have a valid XML document. You can find this schema at
the following address:
<br class="">
<br class="">
<a moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://gaia.ari.uni-heidelberg.de/tap-stats.xsd">
http://gaia.ari.uni-heidelberg.de/tap-stats.xsd</a>
<br class="">
<br class="">
<br class="">
** Visibility in TOPCAT <br class="">
<br class="">
Thanks to Mark Taylor, any custom metadata
(non-standard TAP columns) can be displayed in the
last version of TOPCAT. Thus, all the statistics
described above can be seen there for our Gaia TAP
service (n.b. you can find this TAP service easily in
the registry with the keywords "Gaia" and "ARI", but
in case you can not, here is the root access URL:
<a moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://gaia.ari.uni-heidelberg.de/tap">http://gaia.ari.uni-heidelberg.de/tap</a>).
<br class="">
<br class="">
<br class="">
** Last words... <br class="">
<br class="">
According to me all these basic statistics may be
useful to discover the content of a table, especially
when this one is as large as Gaia, PPMXL, 2MASS, ...
It can indeed prevent some users to perform apparently
simple and short queries such as "SELECT COUNT(*) FROM
a_big_table" which on the contrary to what most people
think is not often a quick query on large tables.
Having already computed such information is then an
economy of time and resources for the users and the
server.
<br class="">
<br class="">
Finally, I am not trying to convince anybody to have
such metadata, but I just want to highlight a possible
extension of TAP helping in simple data discovery.
Besides, this use-case also demonstrates how easy it
could be to add more simple metadata inside a TAP
service. So maybe it could be interested, if possible,
to write an appendix about that in the next version of
TAP or just as an IVOA note. What do you think?
<br class="">
<br class="">
If anybody has questions or wants further details
about the TAP "extension" I presented here, do not
hesitate to ask ; I am not at the interop, but I am
fully available by email
<span class="moz-smiley-s1" title=":)"></span><br
class="">
<br class="">
Regards, <br class="">
Grégory <br class="">
<br class="">
<br class="">
PS: For those who are interested, I also provide
histograms and sky-maps (using Healpix) for most of
the published columns on the page
<a moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://gaia.ari.uni-heidelberg.de/tap/tables">
http://gaia.ari.uni-heidelberg.de/tap/tables</a>.
Both can be downloaded as images but also as tables
that you can then display/process as you want (e.g.
display the histogram in TOPCAT, display and navigate
inside the Healpix map in Aladin, ...).
<br class="">
<br class="">
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</blockquote>
<p><br>
</p>
</body>
</html>