Row count in TAP_SCHEMA

Mark Taylor m.b.taylor at bristol.ac.uk
Fri Aug 12 11:25:40 CEST 2022


It looks to me like there is agreement, or at least no disagreement, on this.

I have provisionally added code to topcat to read and use
the column "nrows" in tap_schema.tables if it is present
(the value is displayed under the heading "Rows (approx)" in the
Table tab of the TAP window Use Service panel).
A pre-release with this feature is available here:

   http://andromeda.star.bristol.ac.uk/releases/topcat/pre/topcat-full.jar

If some TAP service implements this, please let me know and I'll
check it works properly.
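
For anyone wanting to try the same thing from a script, here is a minimal
Python sketch of the "use nrows if it is present" idea. It is not how topcat
does it; the function names and the example.org base URL are hypothetical,
and the only things assumed about the service are the standard TAP /sync
endpoint and the standard table_name and description columns. The list of
available columns would come from a prior query on TAP_SCHEMA.columns.

```python
from urllib.parse import urlencode


def tables_query(available_columns):
    """Build an ADQL query on TAP_SCHEMA.tables.

    available_columns: names of the columns the service declares for
    TAP_SCHEMA.tables (e.g. obtained from TAP_SCHEMA.columns).
    The optional nrows column is requested only if it is declared.
    """
    wanted = ["table_name", "description"]
    if "nrows" in {name.lower() for name in available_columns}:
        wanted.append("nrows")
    return "SELECT " + ", ".join(wanted) + " FROM TAP_SCHEMA.tables"


def sync_url(base_url, adql):
    """Return a synchronous TAP query URL for the given ADQL text."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return base_url.rstrip("/") + "/sync?" + urlencode(params)


# Hypothetical service that does declare the optional column.
adql = tables_query(["schema_name", "table_name", "description", "nrows"])
url = sync_url("http://example.org/tap", adql)
```

A service without the column just gets the query without nrows, so nothing
breaks for existing services.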

Mark

On Wed, 13 Jul 2022, Mark Taylor wrote:

> That's fine by me.  The nrows attribute in VODataService is documented:
> 
>    Meaning
>        The approximate size of the table in rows.
>    Comment
>        This is not expected to be exact. For instance, the estimates
>        on table sizes databases keep for query planning purposes are
>        suitable for this field.
> 
> so a similar definition for the TAP_SCHEMA equivalent would make
> good sense.
> 
> Mark
> 
> On Wed, 13 Jul 2022, Patrick Dowler wrote:
> 
> > +1 on implementing now and adding to TAP-next (I intend to make a TAP_next
> > wiki page asap; will announce)
> > 
> > Can I assume that the definition of "nrows" would be that it is
> > approximate? In most cases I would have to
> > implement a periodic update to set the value based on current content so
> > the value returned could be out of date
> > wrt. reality (e.g. not agree with "select count(*) from <table>").
> > 
> > I am thinking about cases where there are millions of rows and the count
> > changes by thousands each day. My gut
> > says a daily update would be feasible... maybe a few times per day.
> > Probably not less frequent than 1/day.
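
A minimal sketch of such a periodic refresh, using Python's sqlite3 module
as a stand-in for a real service database. The table name tap_schema_tables,
the sources table and the refresh_nrows function are all hypothetical. For
simplicity it uses an exact count(*) at refresh time; on a large database
the query planner's row estimate would be the more likely source.

```python
import sqlite3


def refresh_nrows(conn, table_names):
    """Store the current row count of each table in the nrows column."""
    for name in table_names:
        # Identifiers cannot be bound as parameters, so the names are
        # assumed to come from a trusted list.
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()
        conn.execute(
            "UPDATE tap_schema_tables SET nrows = ? WHERE table_name = ?",
            (count, name),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tap_schema_tables (table_name TEXT PRIMARY KEY, nrows INTEGER)"
)
conn.execute("CREATE TABLE sources (id INTEGER)")
conn.execute("INSERT INTO tap_schema_tables VALUES ('sources', NULL)")
conn.executemany("INSERT INTO sources VALUES (?)", [(i,) for i in range(5)])

refresh_nrows(conn, ["sources"])                   # the periodic job runs
conn.execute("INSERT INTO sources VALUES (99)")    # table grows afterwards

stored = conn.execute(
    "SELECT nrows FROM tap_schema_tables WHERE table_name = 'sources'"
).fetchone()[0]
actual = conn.execute("SELECT COUNT(*) FROM sources").fetchone()[0]
```

After the extra insert the stored value lags the true count until the next
refresh, which is the sense in which nrows is only approximate.
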
> > 
> > 
> > --
> > Patrick Dowler
> > Canadian Astronomy Data Centre
> > Victoria, BC, Canada
> > 
> > 
> > On Mon, 11 Jul 2022 at 08:30, Gregory MANTELET <
> > gregory.mantelet at astro.unistra.fr> wrote:
> > 
> > > Hi Mark, DAL,
> > >
> > > I agree with the addition of this optional column to the TAP_SCHEMA.
> > >
> > > A little note from the ADQL side though. `size` is a reserved keyword in
> > > SQL/ADQL. It would be better to choose another one in order to avoid the
> > > annoying wrapping between double quotes.
> > >
> > > It was the main reason why I chose to call it `row_count` when I added
> > > this column in the TAP service of ARI-Gaia. The other reason was that
> > > the unit is immediately obvious, in contrast to the generic keyword
> > > `size`.
> > >
> > > `nrows` seems to be a very nice alternative to me: short, explicit, not
> > > reserved and consistent with VODataService.
> > >
> > > Cheers,
> > > Grégory M.
> > >
> > >
> > > On 11/07/2022 16:34, Mark Taylor wrote:
> > > > Hi DAL,
> > > >
> > > > since VODataService v1.2 (see sec 3.3), the Table element has had
> > > > an optional attribute "nrows" which allows services to declare how
> > > > many rows a table has.  That is useful information, and TOPCAT
> > > > displays it, if known, as part of the table metadata in its TAP window.
> > > >
> > > > However, there is currently no corresponding standard way to report
> > > > this information from TAP_SCHEMA.  Topcat sometimes gets TAP service
> > > > metadata from the /tables endpoint (VODataService) and sometimes from
> > > > TAP_SCHEMA (depending on things like apparent service size); in the
> > > > former case it's able to report table sizes, but in the latter case
> > > > it's not.
> > > >
> > > > So it would be nice to have a standard way in which TAP services
> > > > could report table size in TAP_SCHEMA if they wanted to.
> > > > This would just need to be a new optional column with an agreed
> > > > name in TAP_SCHEMA.tables.
> > > >
> > > > In fact some services already do this, but different column names
> > > > are in use.  ARI-Gaia uses "row_count" and ESA uses "size"
> > > > (and also has "size_bytes" for size in bytes).  "size" is a
> > > > somewhat problematic choice since it's an ADQL reserved word,
> > > > and it's also not very explicit about what it means.
> > > > "row_count" is OK by me, though "nrows" would also be reasonable
> > > > for consistency with VODataService.
> > > >
> > > > Could we agree here on a suitable column name for this?
> > > > Next time there's a TAP update it could go in there,
> > > > but there's nothing to stop people agreeing on and implementing
> > > > best practice in the meantime; since the column would be optional,
> > > > and you're allowed to add non-standard columns in TAP_SCHEMA,
> > > > it doesn't break anything.
> > > >
> > > > Mark
> > > >
> > > > --
> > > > Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
> > > > m.b.taylor at bristol.ac.uk          http://www.star.bristol.ac.uk/~mbt/
> > >
> > >
> > 
> 
> --
> Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
> m.b.taylor at bristol.ac.uk          http://www.star.bristol.ac.uk/~mbt/
> 

--
Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
m.b.taylor at bristol.ac.uk          http://www.star.bristol.ac.uk/~mbt/

