msdemlei at ari.uni-heidelberg.de
Wed Jul 17 14:48:39 CEST 2019
Since there's no ADQL-2.2-Next yet, I'd like to propose a feature
that I could see in it here: TABLESAMPLE.
The purpose is that the server takes a more-or-less random sample of
a table; you will specify a percentage to take in parentheses. For
instance, on http://dc.g-vo.org/tap you can run something like
select avg(phot_g_mean_mag) from gaia.dr2light tablesample(0.2)
and still finish within the sync timeout (and be within some 1e-4 of
the true value, I guess). Tablesample can be applied to each table
separately (though there's probably not many scenarios where you'd
want multiple of them). For instance, you can do something like
FROM gaia.dr2light as d tablesample(0.01)
JOIN ppmxl.main AS n
ON distance(d.ra, d.dec, n.raj2000, n.dej2000)< 2./3600.
which gives you a reasonably all-sky sample of ~1e5 pairs of gaia and
ppmxl objects (try it and do a sky plot), which might give you an
idea of positional scatter, photometric matches, or whatever. And
that still within the sync timeout on my box, which is a few seconds.
Much better than TOP, anyway.
This is fashioned after the corresponding feature of postgres.
Postgres offers different sampling methods; essentially, row-wise and
block-wise. Only block-wise gives you a lot in terms of run-time,
and I'd avoid giving any guarantees (beyond "best effort") here
anyway, because it'll be hard to give them interoperably. And that's
why I'd say
would be a great feature that probably can be reasonably implemented
on most databases where the difference between a naive and a fast
implementation matters in the first place.
I've even given it an id one can use in TAPRegExt until, perhaps,
it'll get into ADQL; capabilities fragment:
Written after a table reference,
TABLESAMPLE(10) will make the database only use 10% of the
rows; these are `somewhat random' in that the system will
use random blocks. This should be good enough when just
testing queries (and much better than using TOP n).
More information about the dal