total number of matched results

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Fri Jun 16 10:03:35 PDT 2006


On 16.6.2006 08:52, Aurelien Stebe wrote:
> 
> I remember talking about it briefly too. Kevin's right saying its use is 
> limited to informing the user. That's why I would be in favor of it, 
> only if it's really easy to implement. I think that with RDBMS (at least 
> mine) it is easy and not at all time/memory consuming. JDBC gives that 
> total number of rows returned, or a "COUNT(id)" query could be done.
> I am not familiar with XMLDBs, so I don't know if a "COUNT(id)" query 
> would actually fill the memory with the records or just return the 
> value, or even if it's possible.

In general, we have implemented counting queries in many services, and they are a good way
to evaluate (quickly) whether a query is actually a decent one. For example,
a user may want to find services of type T which serve content from data
collection D. They formulate a query, do a count, and find out there are 5000 such
services... they expected 10 (say), so they know the query is likely not doing what
they want. A count of 0 also says something about the query (usually logically
inconsistent conditions) rather than about the content. It is very useful to be
able to do this...
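A minimal sketch of that count-first check, assuming a hypothetical `services(id, type, collection)` table and using Python's built-in SQLite as a stand-in for whatever RDBMS a registry actually uses. The helper names and the "10x the expected count" threshold are illustrative assumptions, not part of any registry interface:

```python
import sqlite3

def count_services(conn, service_type, collection):
    """Return how many services match, without fetching any rows."""
    row = conn.execute(
        "SELECT COUNT(*) FROM services WHERE type = ? AND collection = ?",
        (service_type, collection),
    ).fetchone()
    return row[0]

def sanity_check(count, expected):
    """Rough verdict on a query based only on its match count.
    The 10x threshold is an arbitrary illustrative choice."""
    if count == 0:
        return "no matches: conditions may be inconsistent"
    if count > 10 * expected:
        return "far more matches than expected: query may be too loose"
    return "plausible"

# Hypothetical demo data: 5000 'cone' services serving collection 'D'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (id INTEGER PRIMARY KEY, type TEXT, collection TEXT)")
conn.executemany(
    "INSERT INTO services (type, collection) VALUES (?, ?)",
    [("cone", "D")] * 5000 + [("ssa", "E")] * 10,
)
n = count_services(conn, "cone", "D")
print(n, sanity_check(n, expected=10))
```

The user only sees one number before deciding whether to refine the query or fetch the actual records.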

"select count(...)" in an RDBMS is usually very fast (assuming indexes that work with
the where clause). The server returns only one int to the caller (i.e., the JDBC driver
in Java) and never instantiates the results (apart from possible temp tables within the
server). For many simple queries the index structures can be used to
get the count without visiting all the rows, so it is very scalable, certainly more
scalable than a registry is likely to require.
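To see whether a count can be answered from an index, one can ask the database for its query plan. This sketch again uses SQLite as a stand-in (other RDBMSs have their own EXPLAIN variants with different output), with a hypothetical `services` table and index name `idx_services_type`. SQLite's plan text for this query typically reports a covering-index search, meaning the count comes from the index without reading table rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (id INTEGER PRIMARY KEY, type TEXT, collection TEXT)")
conn.execute("CREATE INDEX idx_services_type ON services (type)")
conn.executemany(
    "INSERT INTO services (type, collection) VALUES (?, ?)",
    [("cone", "D"), ("ssa", "E"), ("cone", "F")] * 1000,
)

query = "SELECT COUNT(*) FROM services WHERE type = ?"
# The last column of each EXPLAIN QUERY PLAN row is a readable description of the step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("cone",))]
# Only a single integer crosses from the database to the caller.
(count,) = conn.execute(query, ("cone",)).fetchone()
print(plan)
print(count)  # 2000
```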

As for truncating results via something like TOP, there are some gotchas if you
want to let the caller get more results without just redoing the query with
a larger batch size. The caller needs to be able to provide an offset to get the
second and subsequent batches, and the server needs to produce results in a consistent
order, which often means a stateless server will always have to sort. That can be
costly, though perhaps not enough to matter at the scale of a registry... but it does
add an implementation constraint.
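A minimal sketch of stateless offset paging under stated assumptions: SQLite's `LIMIT ... OFFSET` plays the role of TOP (which is SQL Server syntax), the `services` table and `fetch_page` helper are hypothetical, and `ORDER BY id` supplies the consistent order, so each request can be answered independently with no server-side cursor. It does not guard against rows being inserted or deleted between calls, which can still shift batch boundaries:

```python
import sqlite3

def fetch_page(conn, service_type, offset, page_size):
    """Return one batch of ids. ORDER BY id gives every request the same
    total order, so a stateless server can serve any offset on its own."""
    rows = conn.execute(
        "SELECT id FROM services WHERE type = ? ORDER BY id LIMIT ? OFFSET ?",
        (service_type, page_size, offset),
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (id INTEGER PRIMARY KEY, type TEXT)")
conn.executemany("INSERT INTO services (type) VALUES (?)", [("cone",)] * 25)

# The caller walks through the batches by advancing the offset.
pages = []
offset = 0
while True:
    page = fetch_page(conn, "cone", offset, page_size=10)
    if not page:
        break
    pages.append(page)
    offset += len(page)
print([len(p) for p in pages])  # [10, 10, 5]
```

The sort is paid on every request; an index on the ordering column is what keeps that cost reasonable.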

As a minor detail, if you use an RDBMS and its TOP feature, you do not actually know
that it did truncate (you guess, because you asked for TOP N and got exactly N results).
Alternatively, when the user asks for TOP N you put TOP N+something in the query and then
post-filter to get back to N, thus knowing there really are more results. It's a detail,
but a pain to get exactly right in the implementation.
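The "N+something" trick needs only one extra row. A minimal sketch, assuming SQLite `LIMIT` in place of TOP and a hypothetical `top_n` helper over a hypothetical `services` table: request N+1 rows, return the first N, and report truncation only if the extra row actually arrived. This removes the ambiguity when there are exactly N matches:

```python
import sqlite3

def top_n(conn, service_type, n):
    """Return (ids, truncated). Asks for n + 1 rows; if the extra row
    comes back, there really are more than n results."""
    rows = conn.execute(
        "SELECT id FROM services WHERE type = ? ORDER BY id LIMIT ?",
        (service_type, n + 1),
    ).fetchall()
    truncated = len(rows) > n
    return [r[0] for r in rows[:n]], truncated

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (id INTEGER PRIMARY KEY, type TEXT)")
conn.executemany("INSERT INTO services (type) VALUES (?)", [("cone",)] * 5)

print(top_n(conn, "cone", 5))  # ([1, 2, 3, 4, 5], False)
print(top_n(conn, "cone", 3))  # ([1, 2, 3], True)
```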


-- 
Patrick Dowler
Tel/Tél: (250) 363-6914                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de données astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)


