<div dir="ltr"><div>Hi Markus,</div><div>while I currently have no strong opinion on the content...</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 17 Jul 2019 at 14:48, Markus Demleitner &lt;<a href="mailto:msdemlei@ari.uni-heidelberg.de">msdemlei@ari.uni-heidelberg.de</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi all,<br>

<br>

Since there&#39;s no ADQL-2.2-Next yet, I&#39;d like to propose a feature<br>

that I could see in it here: TABLESAMPLE.<br></blockquote><div><br></div><div>...there&#39;s no ADQL-2.2-Next because even ADQL-2.1 is not yet a REC.</div><div>Thus, even if it may sound silly, you can choose zero, one, or more of:</div><div>- TWiki Next for current REC - <a href="https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL-2_0-Next">https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL-2_0-Next</a></div><div>- GitHub ADQL issues - <a href="https://github.com/ivoa-std/ADQL/issues">https://github.com/ivoa-std/ADQL/issues</a> </div><div><br></div><div>However, I would prefer the discussion itself to happen, as per current practice,</div><div>in this thread.</div><div><br></div><div>Cheers</div><div>     Marco</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

The purpose is that the server takes a more-or-less random sample of<br>

a table; you will specify a percentage to take in parentheses.  For<br>

instance, on <a href="http://dc.g-vo.org/tap" rel="noreferrer" target="_blank">http://dc.g-vo.org/tap</a> you can run something like<br>

<br>

select avg(phot_g_mean_mag) from gaia.dr2light tablesample(0.2)<br>

<br>

and still finish within the sync timeout (and be within some 1e-4 of<br>

the true value, I guess).  Tablesample can be applied to each table<br>

separately (though there are probably not many scenarios where you&#39;d<br>

want multiple of them).  For instance, you can do something like<br>

<br>

SELECT<br>

   *<br>

   FROM gaia.dr2light AS d TABLESAMPLE(0.01)<br>

   JOIN ppmxl.main AS n<br>

   ON distance(d.ra, d.dec, n.raj2000, n.dej2000)&lt; 2./3600.<br>

<br>

which gives you a reasonably all-sky sample of ~1e5 pairs of gaia and<br>

ppmxl objects (try it and do a sky plot), which might give you an<br>

idea of positional scatter, photometric matches, or whatever.  And<br>

that&#39;s still within the sync timeout on my box, which is a few seconds.<br>

<br>

Much better than TOP, anyway.<br>

<br>

<br>

<br>

This is fashioned after the corresponding feature of postgres.<br>

Postgres offers different sampling methods; essentially, row-wise and<br>

block-wise.  Only block-wise gives you a lot in terms of run-time,<br>

and I&#39;d avoid giving any guarantees (beyond &quot;best effort&quot;) here<br>

anyway, because it&#39;ll be hard to give them interoperably.  And that&#39;s<br>

why I&#39;d say <br>

<br>

tablesample(percentage) <br>

<br>

would be a great feature that probably can be reasonably implemented<br>

on most databases where the difference between a naive and a fast<br>

implementation matters in the first place.  <br>

<br>

I&#39;ve even given it an id one can use in TAPRegExt until, perhaps,<br>

it&#39;ll get into ADQL; capabilities fragment:<br>

<br>

      &lt;languageFeatures type=&quot;ivo://org.gavo.dc/std/exts#extra-adql-keywords&quot;&gt;<br>

        &lt;feature&gt;<br>

          &lt;form&gt;TABLESAMPLE&lt;/form&gt;<br>

          &lt;description&gt;<br>

            Written after a table reference,<br>

            TABLESAMPLE(10) will make the database only use 10% of the<br>

            rows; these are `somewhat random&#39; in that the system will<br>

            use random blocks.  This should be good enough when just<br>

            testing queries (and much better than using TOP n).<br>

          &lt;/description&gt;<br>

        &lt;/feature&gt;<br>

      &lt;/languageFeatures&gt;<br>

<br>

Opinions?<br>

<br>

        -- Markus<br>

</blockquote></div></div>