TABLESAMPLE?
Molinaro, Marco
marco.molinaro at inaf.it
Wed Jul 17 17:32:49 CEST 2019
Hi Markus,
while I currently have no strong opinion on the content...
Il giorno mer 17 lug 2019 alle ore 14:48 Markus Demleitner <
msdemlei at ari.uni-heidelberg.de> ha scritto:
> Hi all,
>
> Since there's no ADQL-2.2-Next yet, I'd like to propose a feature
> that I could see in it here: TABLESAMPLE.
>
...there's no ADQL-2.2-Next because even ADQL-2.1 is not REC.
Thus, even if it may sound silly, you can chose, zero, one or more from:
- TWiki Next for current REC -
https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL-2_0-Next
- github ADQL issues - https://github.com/ivoa-std/ADQL/issues
Discussion I would prefer however, as per current practice, to happen
in this thread.
Cheers
Marco
>
> The purpose is that the server takes a more-or-less random sample of
> a table; you will specify a percentage to take in parentheses. For
> instance, on http://dc.g-vo.org/tap you can run something like
>
> select avg(phot_g_mean_mag) from gaia.dr2light tablesample(0.2)
>
> and still finish within the sync timeout (and be within some 1e-4 of
> the true value, I guess). Tablesample can be applied to each table
> separately (though there's probably not many scenarios where you'd
> want multiple of them). For instance, you can do something like
>
> SELECT
> *
> FROM gaia.dr2light as d tablesample(0.01)
> JOIN ppmxl.main AS n
> ON distance(d.ra, d.dec, n.raj2000, n.dej2000)< 2./3600.
>
> which gives you a reasonably all-sky sample of ~1e5 pairs of gaia and
> ppmxl objects (try it and do a sky plot), which might give you an
> idea of positional scatter, photometric matches, or whatever. And
> that still within the sync timeout on my box, which is a few seconds.
>
> Much better than TOP, anyway.
>
>
>
> This is fashioned after the corresponding feature of postgres.
> Postgres offers different sampling methods; essentially, row-wise and
> block-wise. Only block-wise gives you a lot in terms of run-time,
> and I'd avoid giving any guarantees (beyond "best effort") here
> anyway, because it'll be hard to give them interoperably. And that's
> why I'd say
>
> tablesample(percentage)
>
> would be a great feature that probably can be reasonably implemented
> on most databases where the difference between a naive and a fast
> implementation matters in the first place.
>
> I've even given it an id one can use in TAPRegExt until, perhaps,
> it'll get into ADQL; capabilities fragment:
>
> <languageFeatures
> type="ivo://org.gavo.dc/std/exts#extra-adql-keywords">
> <feature>
> <form>TABLESAMPLE</form>
> <description>
> Written after a table reference,
> TABLESAMPLE(10) will make the database only use 10% of the
> rows; these are `somewhat random' in that the system will
> use random blocks. This should be good enough when just
> testing queries (and much better than using TOP n).
> </description>
> </feature>
> </languageFeatures>
>
> Opinions?
>
> -- Markus
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ivoa.net/pipermail/dal/attachments/20190717/24d36d86/attachment.html>
More information about the dal
mailing list