
Slow count(*) again...


Slow count(*) again...

Neil Whelchel
I know that there have been many discussions on the slowness of count(*), even
when an index is involved, because the visibility of the rows has to be
checked. In the past I have seen many suggestions about using triggers and
tables to keep track of counts, and while this works fine in a situation where
you know what the report is going to be ahead of time, it is simply not an
option when an unknown WHERE clause is to be used (dynamically generated).

I ran into a fine example of this when I was searching this mailing list:
"Searching in 856,646 pages took 13.48202 seconds. Site search powered by
PostgreSQL 8.3." Obviously at some point count(*) came into play here, because
the site made a list of pages (1 2 3 4 5 6 > next). I very commonly make a
list of pages from search results, and the biggest time killer here is the
count(*) portion. Worse yet, I sometimes have to hit the database with
two SELECT statements: one with OFFSET and LIMIT to get the page of results I
need, and another to get the total number of rows so I can estimate how many
pages of results are available. The point I am driving at here is that since
building a list of pages of results is such a common thing to do, there needs
to be some specific high-speed way to do this in one query. Maybe an
estimate(*) that works like count but gives an answer from the index without
checking visibility? I am sure that this would be good enough to make a page
list; it is really no big deal if it errs on the positive side, maybe the
list of pages has an extra page off the end. I can live with that. What I
can't live with is taking 13 seconds to get a page of results from 850,000
rows in a table.
-Neil-


Re: Slow count(*) again...

Scott Marlowe-2
On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel <[hidden email]> wrote:

> I know that there haven been many discussions on the slowness of count(*) even
> when an index is involved because the visibility of the rows has to be
> checked. In the past I have seen many suggestions about using triggers and
> tables to keep track of counts and while this works fine in a situation where
> you know what the report is going to be ahead of time, this is simply not an
> option when an unknown WHERE clause is to be used (dynamically generated).
> I ran into a fine example of this when I was searching this mailing list,
> "Searching in 856,646 pages took 13.48202 seconds. Site search powered by
> PostgreSQL 8.3." Obviously at some point count(*) came into play here because
> the site made a list of pages (1 2 3 4 5 6 > next). I very commonly make a
> list of pages from search results, and the biggest time killer here is the
> count(*) portion, even worse yet, I sometimes have to hit the database with
> two SELECT statements, one with OFFSET and LIMIT to get the page of results I
> need and another to get the amount of total rows so I can estimate how many
> pages of results are available. The point I am driving at here is that since
> building a list of pages of results is such a common thing to do, there need
> to be some specific high speed ways to do this in one query. Maybe an
> estimate(*) that works like count but gives an answer from the index without
> checking visibility? I am sure that this would be good enough to make a page
> list, it is really no big deal if it errors on the positive side, maybe the
> list of pages has an extra page off the end. I can live with that. What I
> can't live with is taking 13 seconds to get a page of results from 850,000
> rows in a table.

99% of the time, in these situations, you don't need an exact measure, and
assuming analyze has run recently, selecting reltuples from pg_class for
a given table is more than close enough.  I'm sure wrapping that in a
simple estimated_rows() function would be easy enough to do.
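
A minimal sketch of what such a wrapper might look like (the function name and
the regclass argument are just one possible shape; reltuples is only as fresh
as the last ANALYZE or VACUUM):

CREATE OR REPLACE FUNCTION estimated_rows(tbl regclass) RETURNS bigint AS $$
    -- reltuples is the planner's row estimate maintained by ANALYZE/VACUUM
    SELECT reltuples::bigint FROM pg_class WHERE oid = $1;
$$ LANGUAGE sql STABLE;

-- usage: SELECT estimated_rows('my_table');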


Re: Slow count(*) again...

Mladen Gogala-4
In reply to this post by Neil Whelchel
Neil Whelchel wrote:

> I know that there haven been many discussions on the slowness of count(*) even
> when an index is involved because the visibility of the rows has to be
> checked. In the past I have seen many suggestions about using triggers and
> tables to keep track of counts and while this works fine in a situation where
> you know what the report is going to be ahead of time, this is simply not an
> option when an unknown WHERE clause is to be used (dynamically generated).
> I ran into a fine example of this when I was searching this mailing list,
> "Searching in 856,646 pages took 13.48202 seconds. Site search powered by
> PostgreSQL 8.3." Obviously at some point count(*) came into play here because
> the site made a list of pages (1 2 3 4 5 6 > next). I very commonly make a
> list of pages from search results, and the biggest time killer here is the
> count(*) portion, even worse yet, I sometimes have to hit the database with
> two SELECT statements, one with OFFSET and LIMIT to get the page of results I
> need and another to get the amount of total rows so I can estimate how many
> pages of results are available. The point I am driving at here is that since
> building a list of pages of results is such a common thing to do, there need
> to be some specific high speed ways to do this in one query. Maybe an
> estimate(*) that works like count but gives an answer from the index without
> checking visibility? I am sure that this would be good enough to make a page
> list, it is really no big deal if it errors on the positive side, maybe the
> list of pages has an extra page off the end. I can live with that. What I
> can't live with is taking 13 seconds to get a page of results from 850,000
> rows in a table.
> -Neil-
>
>  
Unfortunately, the problem is in the rather primitive way PostgreSQL
does I/O. It didn't change in 9.0, so there is nothing you could gain by
upgrading. If you execute strace -o /tmp/pg.out -e read <PID of the
sequential scan process> and inspect the file /tmp/pg.out when the query
finishes, you will notice a gazillion read requests, all of them 8192
bytes in size. That means that PostgreSQL is reading the table block by
block, without any merging of the requests. You can alleviate the pain
by using OS tricks, like specifying the deadline I/O scheduler in
grub.conf and setting read-ahead on the FS block devices by using
blockdev, but there is nothing special that can be done short of
rewriting the way PostgreSQL does I/O. There were rumors about
version 9.0 and asynchronous I/O, but that didn't materialize. That is
really strange to me, because PostgreSQL tables are files, or groups of
files if the table size exceeds 1GB. It wouldn't be very hard to try
reading 1MB at a time, and that would speed up the full table scan
significantly.

The problem with single-block I/O is that there is a context switch for each
request, and the I/O scheduler has to work hard to merge requests
appropriately when there is really no need for that: tables are files, and
navigating through files is not a problem, even with much larger blocks.

In another database, whose name I will not mention, there is a parameter,
db_file_multiblock_read_count, which specifies how many blocks will be
read by a single read when doing a full table scan. PostgreSQL is in
dire need of something similar, and it wouldn't even be that hard to
implement.
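
For reference, the 8192-byte unit above is simply PostgreSQL's compiled-in
block size, and pg_class already reports table sizes in those blocks; a quick
way to see both (the table name here is illustrative):

SHOW block_size;                          -- 8192 on a stock build
SELECT relpages  AS blocks_8kb,           -- table size in 8 kB blocks
       reltuples AS estimated_row_count   -- planner's row estimate
FROM   pg_class
WHERE  relname = 'my_table';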


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com



Re: Slow count(*) again...

Joe Conway
On 10/09/2010 06:54 PM, Mladen Gogala wrote:
> In another database, whose name I will not mention, there is a parameter
> db_file_multiblock_read_count which specifies how many blocks will be
> read by a single read when doing a full table scan. PostgreSQL is in
> dire need of something similar and it wouldn't even be that hard to
> implement.

You're correct in that it isn't particularly difficult to implement for
sequential scans. But I have done some testing with aggressive read
ahead, and although it is clearly a big win with a single client, the
benefit was less clear as concurrency was increased.

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support



Re: Slow count(*) again...

Mladen Gogala-4
Joe Conway wrote:

> On 10/09/2010 06:54 PM, Mladen Gogala wrote:
>  
>> In another database, whose name I will not mention, there is a parameter
>> db_file_multiblock_read_count which specifies how many blocks will be
>> read by a single read when doing a full table scan. PostgreSQL is in
>> dire need of something similar and it wouldn't even be that hard to
>> implement.
>>    
>
> You're correct in that it isn't particularly difficult to implement for
> sequential scans. But I have done some testing with aggressive read
> ahead, and although it is clearly a big win with a single client, the
> benefit was less clear as concurrency was increased.
>
> Joe
>
>  
Well, in my opinion that should be left to the DBA, the same as in the
"other database". The mythical DBA, the creature that mighty Larry
Ellison himself is on a crusade against, usually can figure out the
right value for the database he or she is in charge of. I humbly
confess to being an Oracle DBA for more than 2 decades and now branching
into Postgres because my employer is less than enthusiastic about
Oracle, with a special accent on their pricing.

Modern databases, Postgres included, are quite complex, and companies
need DBA personnel to help fine-tune the applications. I know that good
DBA personnel are quite expensive, but without a competent DBA who knows
the database software well enough, companies can and will suffer from
blunders with performance, downtime, lost data and the like. In a world
where almost every application is written for the web, performance,
uptime and user experience are of critical importance. The
architects of the Postgres database would be well advised to operate under
the assumption that every production database has a competent DBA
keeping an eye on it.

Every application has its own mix of sequential and index scans; you
cannot possibly test all possible applications. Aggressive read-ahead
or "multi-block reads" can be a performance problem, and it will
complicate the optimizer, because the optimizer now has a new variable
to account for: the block size, potentially making seq_page_cost even
cheaper and random_page_cost even more expensive, depending on the
blocking. However, slow sequential scan is, in my humble opinion, the
single biggest performance problem of PostgreSQL databases and
should be improved, the sooner the better. You should, however, count
on the DBA personnel to help with the tuning.
We're the Tinkerbells of the database world. I am 6'4", 240 LBS, no wings.
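
For what it's worth, the seq_page_cost and random_page_cost knobs mentioned
above are already left to the DBA in Postgres and can even be changed per
session; the values below are purely illustrative, not recommendations:

SET seq_page_cost    = 1.0;   -- planner's cost for a sequentially fetched page
SET random_page_cost = 2.0;   -- planner's cost for a randomly fetched page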


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com



Re: Slow count(*) again...

Neil Whelchel
In reply to this post by Scott Marlowe-2
On Saturday 09 October 2010 18:47:34 Scott Marlowe wrote:
> On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel <[hidden email]>
wrote:

> > I know that there haven been many discussions on the slowness of count(*)
> > even when an index is involved because the visibility of the rows has to
> > be checked. In the past I have seen many suggestions about using
> > triggers and tables to keep track of counts and while this works fine in
> > a situation where you know what the report is going to be ahead of time,
> > this is simply not an option when an unknown WHERE clause is to be used
> > (dynamically generated). I ran into a fine example of this when I was
> > searching this mailing list, "Searching in 856,646 pages took 13.48202
> > seconds. Site search powered by PostgreSQL 8.3." Obviously at some point
> > count(*) came into play here because the site made a list of pages (1 2
> > 3 4 5 6 > next). I very commonly make a list of pages from search
> > results, and the biggest time killer here is the count(*) portion, even
> > worse yet, I sometimes have to hit the database with two SELECT
> > statements, one with OFFSET and LIMIT to get the page of results I need
> > and another to get the amount of total rows so I can estimate how many
> > pages of results are available. The point I am driving at here is that
> > since building a list of pages of results is such a common thing to do,
> > there need to be some specific high speed ways to do this in one query.
> > Maybe an estimate(*) that works like count but gives an answer from the
> > index without checking visibility? I am sure that this would be good
> > enough to make a page list, it is really no big deal if it errors on the
> > positive side, maybe the list of pages has an extra page off the end. I
> > can live with that. What I can't live with is taking 13 seconds to get a
> > page of results from 850,000 rows in a table.
>
> 99% of the time in the situations you don't need an exact measure, and
> assuming analyze has run recently, select rel_tuples from pg_class for
> a given table is more than close enough.  I'm sure wrapping that in a
> simple estimated_rows() function would be easy enough to do.

This is a very good approach and it works very well when you are counting the
entire table, but when you have no control over the WHERE clause, it doesn't
help. IE: someone puts in a word to look for in a web form.

From my perspective, this issue is the biggest problem there is when using
Postgres to create web pages, and it is such a common task that I think there
should be a specific way to deal with it, so that you don't have to run the
same WHERE clause twice.
IE: SELECT count(*) FROM <table> WHERE <clause>; to get the total number of
items for the page navigation links, then:
SELECT <columns> FROM <table> WHERE <clause> LIMIT <items_per_page> OFFSET
<(page_no-1)*items_per_page>; to get the actual page contents.

It's bad enough that count(*) is slow; then you have to do it all over again
to get the results you need! I have not dug into this much yet, but would it
be possible to return the number of rows that a WHERE clause would actually
return if the LIMIT and OFFSET were not applied? IE: When a normal query is
executed, the server returns the number of rows aside from the actual row
data. Would it be a big deal to modify this to allow it to return the number
of rows before the LIMIT and OFFSET are applied as well? This would sure cut
down on the time it takes to run the same WHERE clause twice... I have
considered using a cursor to do this; however, this requires a transfer of all
of the rows to the client to get a total count, then setting the cursor to get
the rows that you are interested in. Or is there a way around this that I am
not aware of?
-Neil-



Re: Slow count(*) again...

Samuel Gendler
In reply to this post by Mladen Gogala-4


On Sat, Oct 9, 2010 at 7:44 PM, Mladen Gogala <[hidden email]> wrote:
> The architects of Postgres database would be well advised to operate under the assumption that every production database has a competent DBA keeping an eye on the database.

I'd actually go so far as to say that they have already made this assumption. The out-of-the-box config needs modification for all but the most low-volume applications, and postgres really benefits from having some attention paid to performance. Not only does tuning the db provide enormous gains, but it is often possible to dramatically improve query responsiveness by simply restructuring a query (assuming an aggregating query over a fairly large table with a few joins thrown in). My team does not have a competent DBA (though I've got 15+ years of experience developing on top of various dbs and certainly don't make overly naive assumptions about how things work), and the gains that we made, when I finally just sat down and read everything I could get my hands on about postgres and started reading this list, were really quite impressive. I intend to take some of the courses offered by some of the companies that are active on this list when my schedule allows, in order to expand my knowledge even further, as a DBA is a luxury we cannot really afford at the moment.

Re: Slow count(*) again...

Craig Ringer
In reply to this post by Neil Whelchel
On 10/10/2010 11:02 AM, Neil Whelchel wrote:

> On Saturday 09 October 2010 18:47:34 Scott Marlowe wrote:
>> On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel<[hidden email]>
> wrote:
>>> I know that there haven been many discussions on the slowness of count(*)
>>> even when an index is involved because the visibility of the rows has to
>>> be checked. In the past I have seen many suggestions about using
>>> triggers and tables to keep track of counts and while this works fine in
>>> a situation where you know what the report is going to be ahead of time,
>>> this is simply not an option when an unknown WHERE clause is to be used
>>> (dynamically generated). I ran into a fine example of this when I was
>>> searching this mailing list, "Searching in 856,646 pages took 13.48202
>>> seconds. Site search powered by PostgreSQL 8.3." Obviously at some point
>>> count(*) came into play here because the site made a list of pages (1 2
>>> 3 4 5 6>  next). I very commonly make a list of pages from search
>>> results, and the biggest time killer here is the count(*) portion, even
>>> worse yet, I sometimes have to hit the database with two SELECT
>>> statements, one with OFFSET and LIMIT to get the page of results I need
>>> and another to get the amount of total rows so I can estimate how many
>>> pages of results are available. The point I am driving at here is that
>>> since building a list of pages of results is such a common thing to do,
>>> there need to be some specific high speed ways to do this in one query.
>>> Maybe an estimate(*) that works like count but gives an answer from the
>>> index without checking visibility? I am sure that this would be good
>>> enough to make a page list, it is really no big deal if it errors on the
>>> positive side, maybe the list of pages has an extra page off the end. I
>>> can live with that. What I can't live with is taking 13 seconds to get a
>>> page of results from 850,000 rows in a table.
>>
>> 99% of the time in the situations you don't need an exact measure, and
>> assuming analyze has run recently, select rel_tuples from pg_class for
>> a given table is more than close enough.  I'm sure wrapping that in a
>> simple estimated_rows() function would be easy enough to do.
>
> This is a very good approach and it works very well when you are counting the
> entire table, but when you have no control over the WHERE clause, it doesn't
> help. IE: someone puts in a word to look for in a web form.

For that sort of thing, there isn't much that'll help you except
visibility-aware indexes, covering indexes, etc if/when they're
implemented. Even then, they'd only help when it was a simple
index-driven query with no need to hit the table to recheck any test
conditions, etc.

I guess there could be *some* way to expose the query planner's cost
estimates in a manner useful for result count estimation ... but given
how coarse its stats are and how wildly out the estimates can be, I kind
of doubt it. It's really intended for query planning decisions and more
interested in orders of magnitude, "0, 1, or more than that" measures,
etc, and seems to consider 30% here or there to be pretty insignificant
most of the time.
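
For what it's worth, that planner estimate can be surfaced today by parsing
EXPLAIN output from a small PL/pgSQL helper. A rough sketch (the function name
is illustrative, and the number is only as good as the stats discussed above):

CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS bigint AS $$
DECLARE
    plan_line record;
    estimate  bigint;
BEGIN
    FOR plan_line IN EXECUTE 'EXPLAIN ' || query LOOP
        -- the top plan node's line looks like "... (cost=... rows=12345 width=...)"
        estimate := substring(plan_line."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN estimate IS NOT NULL;
    END LOOP;
    RETURN estimate;
END;
$$ LANGUAGE plpgsql;

-- usage: SELECT count_estimate('SELECT * FROM my_table WHERE col > 50');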

> It's bad enough that count(*) is slow, then you have to do it all over again
> to get the results you need! I have not dug into this much yet, but would it
> be possible to return the amount of rows that a WHERE clause would actually
> return if the LIMIT and OFFSET were not applied. IE: When a normal query is
> executed, the server returns the number of rows aside from the actual row
> data. Would it be a big deal to modify this to allow it to return the amount
> of rows before the LIMIT and OFFSET is applied as well?

It'd force the server to fully execute the query. Then again, it sounds
like you're doing that anyway.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


Re: Slow count(*) again...

Craig Ringer
In reply to this post by Mladen Gogala-4
On 10/10/2010 9:54 AM, Mladen Gogala wrote:

> Unfortunately, the problem is in the rather primitive way PostgreSQL
> does I/O. It didn't change in 9.0 so there is nothing you could gain by
> upgrading. If you execute strace -o /tmp/pg.out -e read <PID of the
> sequential scan process> and inspect the file /tmp/pg.out when the query
> finishes, you will notice a gazillion of read requests, all of them 8192
> bytes in size. That means that PostgreSQL is reading the table block by
> block, without any merging of the requests.

I'd be really interested in any measurements you've done to determine
the cost of this over doing reads in larger chunks. If they're properly
detailed and thought out, the -hackers list is likely to be interested
as well.

The Linux kernel, at least, does request merging (and splitting, and
merging, and more splitting) along the request path, and I'd personally
expect that most of the cost of 8k requests would be in the increased
number of system calls, buffer copies, etc required. Measurements
demonstrating or contradicting this would be good to see.

It's worth being aware that there are memory costs to doing larger
reads, especially when you have many backends each of which want to
allocate a larger buffer for reading. If you can use a chunk of
shared_buffers as the direct destination for the read that's OK, but
otherwise you're up for (1mb-8kb)*num_backends extra memory use on I/O
buffers that could otherwise be used as shared_buffers or OS cache.

Async I/O, too, has costs.

 > PostgreSQL is in
> dire need of something similar and it wouldn't even be that hard to
> implement.

I'd really like to see both those assertions backed with data or patches ;-)

Personally, I know just enough about how PG's I/O path works to suspect
that "not that hard to implement" is probably a little ...
over-optimistic. Sure, it's not that hard to implement in a new program
with no wired-in architectural and design choices; that doesn't mean
it's easy to retrofit onto existing code, especially a bunch of
co-operating processes with their own buffer management.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


Re: Slow count(*) again...

Neil Whelchel
In reply to this post by Craig Ringer
On Saturday 09 October 2010 23:56:15 Craig Ringer wrote:

> On 10/10/2010 11:02 AM, Neil Whelchel wrote:
> > On Saturday 09 October 2010 18:47:34 Scott Marlowe wrote:
> >> On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel<[hidden email]>
> >
> > wrote:
> >>> I know that there haven been many discussions on the slowness of
> >>> count(*) even when an index is involved because the visibility of the
> >>> rows has to be checked. In the past I have seen many suggestions about
> >>> using triggers and tables to keep track of counts and while this works
> >>> fine in a situation where you know what the report is going to be
> >>> ahead of time, this is simply not an option when an unknown WHERE
> >>> clause is to be used (dynamically generated). I ran into a fine
> >>> example of this when I was searching this mailing list, "Searching in
> >>> 856,646 pages took 13.48202 seconds. Site search powered by PostgreSQL
> >>> 8.3." Obviously at some point count(*) came into play here because the
> >>> site made a list of pages (1 2 3 4 5 6>  next). I very commonly make a
> >>> list of pages from search results, and the biggest time killer here is
> >>> the count(*) portion, even worse yet, I sometimes have to hit the
> >>> database with two SELECT statements, one with OFFSET and LIMIT to get
> >>> the page of results I need and another to get the amount of total rows
> >>> so I can estimate how many pages of results are available. The point I
> >>> am driving at here is that since building a list of pages of results
> >>> is such a common thing to do, there need to be some specific high
> >>> speed ways to do this in one query. Maybe an estimate(*) that works
> >>> like count but gives an answer from the index without checking
> >>> visibility? I am sure that this would be good enough to make a page
> >>> list, it is really no big deal if it errors on the positive side,
> >>> maybe the list of pages has an extra page off the end. I can live with
> >>> that. What I can't live with is taking 13 seconds to get a page of
> >>> results from 850,000 rows in a table.
> >>
> >> 99% of the time in the situations you don't need an exact measure, and
> >> assuming analyze has run recently, select rel_tuples from pg_class for
> >> a given table is more than close enough.  I'm sure wrapping that in a
> >> simple estimated_rows() function would be easy enough to do.
> >
> > This is a very good approach and it works very well when you are counting
> > the entire table, but when you have no control over the WHERE clause, it
> > doesn't help. IE: someone puts in a word to look for in a web form.
>
> For that sort of thing, there isn't much that'll help you except
> visibility-aware indexes, covering indexes, etc if/when they're
> implemented. Even then, they'd only help when it was a simple
> index-driven query with no need to hit the table to recheck any test
> conditions, etc.

Good point. Maybe this is turning more into a discussion of how to generate a
list of pages of results and one page of results with one query, so we don't
have to do the same painfully slow query twice to do a very common task.

On the other hand, I copied a table out of one of my production servers that
has about 60,000 rows with 6 columns (numeric, numeric, bool, bool, timestamp,
text). The first numeric column has numbers evenly spread between 0 and 100
and it is indexed. I put the table in a pair of database servers both running
on the same physical hardware. One server is Postgres, the other is a popular
server (I am not mentioning names here). On Postgres, SELECT count(*) FROM
table WHERE column>50; takes about 8 seconds to run. The other database server
took less than one second (about 25 ms), as it is using the index (I assume) to
come up with the results. It is true that this is not a fair test because both
servers were tested with their default settings, and the defaults for Postgres
are much more conservative; however, I don't think that any amount of settings
tweaking will bring them even into the same ballpark. There has been discussion
about the other server returning an incorrect count because all of the indexed
rows may not be live at the time. This is not a problem for the intended use;
that is why I suggested another function like estimate(*). Its name suggests
that the result will be close, not 100% correct, which is plenty good enough
for generating a list of results pages in most cases.

I am faced with a very serious problem here. If the query to make a list of
pages takes, say, 6 seconds and it takes another 6 seconds to generate a page
of results, the customer is waiting 12 seconds. This is not going to work. If
count made a quick estimate, say in less than a second, and it took 6 seconds
to come up with the actual results, I could live with that. Or if the query
that comes up with the window of results (via OFFSET and LIMIT) also returned
the total number of rows that would have matched, then I would still have
everything I need to render the page in a reasonable time. I really think that
this needs to be addressed somewhere. It's not like I am the only one that
does this. You see it nearly everywhere a long list of results is (expected to
be) returned on a web site. Among the people I work with, this seems to be the
most mentioned reason that they claim that they don't use Postgres for their
projects.

It would be nice to see how the server comes up with the search results and
list of links to pages of results for this mailing list.
(http://search.postgresql.org/search?q=slow+count%28%29&m=1&l=&d=365&s=r) I am
guessing that it probably uses the count and query method I am talking about.

>
> I guess there could be *some* way to expose the query planner's cost
> estimates in a manner useful for result count estimation ... but given
> how coarse its stats are and how wildly out the estimates can be, I kind
> of doubt it. It's really intended for query planning decisions and more
> interested in orders of magnitude, "0, 1, or more than that" measures,
> etc, and seems to consider 30% here or there to be pretty insignificant
> most of the time.
>
> > It's bad enough that count(*) is slow, then you have to do it all over
> > again to get the results you need! I have not dug into this much yet,
> > but would it be possible to return the amount of rows that a WHERE
> > clause would actually return if the LIMIT and OFFSET were not applied.
> > IE: When a normal query is executed, the server returns the number of
> > rows aside from the actual row data. Would it be a big deal to modify
> > this to allow it to return the amount of rows before the LIMIT and
> > OFFSET is applied as well?
>
> It'd force the server to fully execute the query. Then again, it sounds
> like you're doing that anyway.


Re: Slow count(*) again...

Vitalii Tymchyshyn
In reply to this post by Neil Whelchel


2010/10/10 Neil Whelchel <[hidden email]>
On Saturday 09 October 2010 18:47:34 Scott Marlowe wrote:
> On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel <[hidden email]>
wrote:
> > I know that there haven been many discussions on the slowness of count(*)
> > even when an index is involved because the visibility of the rows has to
> > be checked. In the past I have seen many suggestions about using
> > triggers and tables to keep track of counts and while this works fine in
> > a situation where you know what the report is going to be ahead of time,
> > this is simply not an option when an unknown WHERE clause is to be used
> > (dynamically generated). I ran into a fine example of this when I was
> > searching this mailing list, "Searching in 856,646 pages took 13.48202
> > seconds. Site search powered by PostgreSQL 8.3." Obviously at some point
> > count(*) came into play here because the site made a list of pages (1 2
> > 3 4 5 6 > next). I very commonly make a list of pages from search
> > results, and the biggest time killer here is the count(*) portion, even
> > worse yet, I sometimes have to hit the database with two SELECT
> > statements, one with OFFSET and LIMIT to get the page of results I need
> > and another to get the amount of total rows so I can estimate how many
> > pages of results are available. The point I am driving at here is that
> > since building a list of pages of results is such a common thing to do,
> > there need to be some specific high speed ways to do this in one query.
> > Maybe an estimate(*) that works like count but gives an answer from the
> > index without checking visibility? I am sure that this would be good
> > enough to make a page list, it is really no big deal if it errors on the
> > positive side, maybe the list of pages has an extra page off the end. I
> > can live with that. What I can't live with is taking 13 seconds to get a
> > page of results from 850,000 rows in a table.
>
> 99% of the time in the situations you don't need an exact measure, and
> assuming analyze has run recently, select rel_tuples from pg_class for
> a given table is more than close enough.  I'm sure wrapping that in a
> simple estimated_rows() function would be easy enough to do.

This is a very good approach and it works very well when you are counting the
entire table, but when you have no control over the WHERE clause, it doesn't
help. IE: someone puts in a word to look for in a web form.

From my perspective, this issue is the biggest problem there is when using
Postgres to create web pages, and it is so commonly used, I think that there
should be a specific way to deal with it so that you don't have to run the
same WHERE clause twice.
IE: SELECT count(*) FROM <table> WHERE <clause>; to get the total amount of
items to make page navigation links, then:
SELECT <columns> FROM table WHERE <clause> LIMIT <items_per_page> OFFSET
<(page_no-1)*items_per_page>; to get the actual page contents.

How about:

SELECT * FROM (
    SELECT *, count(*) OVER () AS total_count
    FROM <table> WHERE <clause>
) a
LIMIT <items_per_page> OFFSET <(page_no-1)*items_per_page>;

It will return a total_count column with the same value in each row. You may
have problems if no rows are returned (e.g. the page number is too high).
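
Concretely, with made-up table and column names, page 3 at 20 rows per page
would look something like this (count(*) OVER () needs window-function
support, i.e. PostgreSQL 8.4 or later):

SELECT *
FROM  (SELECT t.*,
              count(*) OVER () AS total_count   -- number of rows matching the WHERE clause
       FROM   items t
       WHERE  title ILIKE '%widget%') a
ORDER  BY created_at DESC
LIMIT  20 OFFSET 40;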
--
Best regards,
 Vitalii Tymchyshyn

Re: Slow count(*) again...

Reid Thompson
In reply to this post by Neil Whelchel
  On 10/10/2010 6:29 AM, Neil Whelchel wrote:

> On Saturday 09 October 2010 23:56:15 Craig Ringer wrote:
>> On 10/10/2010 11:02 AM, Neil Whelchel wrote:
>>> On Saturday 09 October 2010 18:47:34 Scott Marlowe wrote:
>>>> On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel<[hidden email]>
>>> wrote:
>>>>> I know that there haven been many discussions on the slowness of
>>>>> count(*) even when an index is involved because the visibility of the
>>>>> rows has to be checked. In the past I have seen many suggestions about
>>>>> using triggers and tables to keep track of counts and while this works
>>>>> fine in a situation where you know what the report is going to be
>>>>> ahead of time, this is simply not an option when an unknown WHERE
>>>>> clause is to be used (dynamically generated). I ran into a fine
>>>>> example of this when I was searching this mailing list, "Searching in
>>>>> 856,646 pages took 13.48202 seconds. Site search powered by PostgreSQL
>>>>> 8.3." Obviously at some point count(*) came into play here because the
>>>>> site made a list of pages (1 2 3 4 5 6>   next). I very commonly make a
>>>>> list of pages from search results, and the biggest time killer here is
>>>>> the count(*) portion, even worse yet, I sometimes have to hit the
>>>>> database with two SELECT statements, one with OFFSET and LIMIT to get
>>>>> the page of results I need and another to get the amount of total rows
>>>>> so I can estimate how many pages of results are available. The point I
>>>>> am driving at here is that since building a list of pages of results
>>>>> is such a common thing to do, there need to be some specific high
>>>>> speed ways to do this in one query. Maybe an estimate(*) that works
>>>>> like count but gives an answer from the index without checking
>>>>> visibility? I am sure that this would be good enough to make a page
>>>>> list, it is really no big deal if it errors on the positive side,
>>>>> maybe the list of pages has an extra page off the end. I can live with
>>>>> that. What I can't live with is taking 13 seconds to get a page of
>>>>> results from 850,000 rows in a table.
> Good point, maybe this is turning more into a discussion of how to generate a
> list of pages of results and one page of results with one query so we don't
> have to do the same painfully slow query twice to do a very common task.
>
> On the other hand, I copied a table out of one of my production servers that
> has about 60,000 rows with 6 columns (numeric, numeric, bool, bool, timestamp,
> text). The first numeric column has numbers evenly spread between 0 and 100
> and it is indexed. I put the table in a pair of database servers both running
> on the same physical hardware. One server is Postgres, the other is a popular
> server (I am not mentioning names here). on Postgres: SELECT count(*) FROM
> table where column>50; takes about 8 seconds to run. The other database server
> took less than one second (about 25 ms) as it is using the index (I assume) to
> come up with the results. It is true that this is not a fair test because both
> servers were tested with their default settings, and the defaults for Postgres
> are much more conservative, however, I don't think that any amount of settings
> tweaking will bring them even in the same ball park. There has been discussion
> about the other server returning an incorrect count because all of the indexed
> rows may not be live at the time. This is not a problem for the intended use,
> that is why I suggested another function like estimate(*). It's name suggests
> that the result will be close, not 100% correct, which is plenty good enough
> for generating a list of results pages in most cases. I am faced with a very
> serious problem here. If the query to make a list of pages takes say 6 seconds
> and it takes another 6 seconds to generate a page of results, the customer is
> waiting 12 seconds. This is not going to work. If count made a quick estimate,
> say less than a second, and it took 6 seconds to come up with the actual
> results, I could live with that. Or if coming up with the window of results
> via (OFFSET and LIMIT) and returned the total number of rows that would have
> matched the query, then I would still have everything I need to render the
> page in a reasonable time. I really think that this needs to be addressed
> somewhere. It's not like I am the only one that does this. You see it nearly
> everywhere a long list of results is (expected to be) returned in a web site.
> Among the people I work with, this seems to be the most mentioned reason that
> they claim that they don't use Postgres for their projects.
>
> It would be nice to see how the server comes up with the search results and
> list of links to pages of results for this mailing list.
> (http://search.postgresql.org/search?q=slow+count%28%29&m=1&l=&d=365&s=r) I am
> guessing that it probably uses the count and query method I am talking about.
>
>> I guess there could be *some* way to expose the query planner's cost
>> estimates in a manner useful for result count estimation ... but given
>> how coarse its stats are and how wildly out the estimates can be, I kind
>> of doubt it. It's really intended for query planning decisions and more
>> interested in orders of magnitude, "0, 1, or more than that" measures,
>> etc, and seems to consider 30% here or there to be pretty insignificant
>> most of the time.
>>
>>> It's bad enough that count(*) is slow, then you have to do it all over
>>> again to get the results you need! I have not dug into this much yet,
>>> but would it be possible to return the amount of rows that a WHERE
>>> clause would actually return if the LIMIT and OFFSET were not applied.
>>> IE: When a normal query is executed, the server returns the number of
>>> rows aside from the actual row data. Would it be a big deal to modify
>>> this to allow it to return the amount of rows before the LIMIT and
>>> OFFSET is applied as well?
>> It'd force the server to fully execute the query. Then again, it sounds
>> like you're doing that anyway.
How big is your DB?
How fast is your disk access?
Any chance disks/RAM can be addressed?

My disk access is pitiful...
first run, 2.3 million rows.. 0m35.38s, subsequent runs.. real    0m2.55s

rthompso@hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
   count
---------
  2340704
(1 row)


real    0m35.38s
user    0m0.25s
sys     0m0.03s

subsequent runs.... (count changes due to inserts.)

rthompso@hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
   count
---------
  2363707
(1 row)


real    0m2.70s
user    0m0.27s
sys     0m0.02s
rthompso@hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
   count
---------
  2363707
(1 row)


real    0m2.55s
user    0m0.26s
sys     0m0.02s
rthompso@hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
   count
---------
  2363707
(1 row)


real    0m2.50s
user    0m0.26s
sys     0m0.02s

reporting=# SELECT pg_size_pretty(pg_total_relation_size('my_production_table'));
  pg_size_pretty
----------------
  1890 MB
(1 row)



Re: Slow count(*) again...

Craig Ringer
In reply to this post by Neil Whelchel
On 10/10/2010 6:29 PM, Neil Whelchel wrote:
> On the other hand, I copied a table out of one of my production servers that
> has about 60,000 rows with 6 columns (numeric, numeric, bool, bool, timestamp,
> text). The first numeric column has numbers evenly spread between 0 and 100
> and it is indexed. I put the table in a pair of database servers both running
> on the same physical hardware. One server is Postgres, the other is a popular
> server (I am not mentioning names here).

Please do. Your comment is pretty meaningless otherwise.

If you're talking about MySQL: were you using InnoDB or MyISAM table
storage? Of course it's fast with MyISAM: it relies on table locks to do
updates and has bugger-all capability for write concurrency, or to
permit readers while writing is going on.

If you're using InnoDB, then I'd like to know how they've managed that.

If you're talking about some *other* database, please name it and
provide any useful details, because the hand waving is not helpful.

 > I don't think that any amount of settings
> tweaking will bring them even in the same ball park.

If you are, in fact, comparing MySQL+MyISAM and PostgreSQL, then you're
quite right. Pg will never have such a fast count() as MyISAM does or
the same insanely fast read performance, and MyISAM will never be as
reliable, robust or concurrency-friendly as Pg is. Take your pick, you
can't have both.

> There has been discussion
> about the other server returning an incorrect count because all of the indexed
> rows may not be live at the time. This is not a problem for the intended use,
> that is why I suggested another function like estimate(*). It's name suggests
> that the result will be close, not 100% correct, which is plenty good enough
> for generating a list of results pages in most cases.

Do you have any practical suggestions for generating such an estimate,
though? I find it hard to think of any way the server can do that
doesn't involve executing the query. The table stats are WAY too general
and a bit hit-and-miss, and there isn't really any other way to do it.

If all you want is a way to retrieve both a subset of results AND a
count of how many results would've been generated, it sounds like all
you really need is a way to get the total number of results returned by
a cursor query, which isn't a big engineering challenge. I expect that
in current Pg versions a trivial PL/PgSQL function could be used to
slurp and discard unwanted results, but a better in-server option to
count the results from a cursor query would certainly be nice.
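
A bare-bones PL/pgSQL helper along those lines might read all rows from an
already-open scrollable cursor, count them, and rewind so the caller can still
fetch just the page it wants. A sketch with illustrative names:

CREATE OR REPLACE FUNCTION cursor_row_count(cur refcursor) RETURNS bigint AS $$
DECLARE
    rec record;
    n   bigint := 0;
BEGIN
    LOOP
        FETCH cur INTO rec;        -- slurp a row ...
        EXIT WHEN NOT FOUND;
        n := n + 1;                -- ... and discard it, keeping only the count
    END LOOP;
    MOVE ABSOLUTE 0 FROM cur;      -- rewind; the cursor must be declared SCROLL
    RETURN n;
END;
$$ LANGUAGE plpgsql;

-- usage, inside a transaction:
--   DECLARE c SCROLL CURSOR FOR SELECT * FROM items WHERE title ILIKE '%widget%';
--   SELECT cursor_row_count('c');
--   MOVE FORWARD 40 IN c;        -- skip the first two pages
--   FETCH 20 FROM c;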

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


Re: Slow count(*) again...

Mladen Gogala-4
In reply to this post by Craig Ringer
Craig Ringer wrote:

> On 10/10/2010 9:54 AM, Mladen Gogala wrote:
>
>  
>> Unfortunately, the problem is in the rather primitive way PostgreSQL
>> does I/O. It didn't change in 9.0 so there is nothing you could gain by
>> upgrading. If you execute strace -o /tmp/pg.out -e read <PID of the
>> sequential scan process> and inspect the file /tmp/pg.out when the query
>> finishes, you will notice a gazillion of read requests, all of them 8192
>> bytes in size. That means that PostgreSQL is reading the table block by
>> block, without any merging of the requests.
>>    
>
> I'd be really interested in any measurements you've done to determine
> the cost of this over doing reads in larger chunks. If they're properly
> detailed and thought out, the -hackers list is likely to be interested
> as well.
>  
I can provide measurements, but from  Oracle RDBMS. Postgres doesn't
allow tuning of that aspect, so no measurement can be done. Would the
numbers from Oracle RDBMS be acceptable?


> The Linux kernel, at least, does request merging (and splitting, and
> merging, and more splitting) along the request path, and I'd personally
> expect that most of the cost of 8k requests would be in the increased
> number of system calls, buffer copies, etc required. Measurements
> demonstrating or contradicting this would be good to see.
>  

Even the cost of hundreds of thousands of context switches is far from
negligible. What kind of measurements do you expect me to do with a
database which doesn't support tweaking of that aspect of its operation?

> It's worth being aware that there are memory costs to doing larger
> reads, especially when you have many backends each of which want to
> allocate a larger buffer for reading.

Oh, it's not only larger memory; the buffer management would have to be
changed too, to prevent a process doing a sequential scan from inundating
the shared buffers. Alternatively, the blocks would have to be written
into private memory and immediately thrown away after that. However,
the experience with Oracle tells me that this is well worth it. Here are
the numbers:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> show parameter db_file_multi

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_file_multiblock_read_count        integer     16
SQL> alter session set db_file_multiblock_read_count=1;

Session altered.
SQL> select count(*) from ni_occurrence;

  COUNT(*)
----------
 402062638

Elapsed: 00:08:20.88
SQL> alter session set db_file_multiblock_read_count=128;

Session altered.

Elapsed: 00:00:00.50
SQL>  select count(*) from ni_occurrence;

  COUNT(*)
----------
 402062638

Elapsed: 00:02:17.58


In other words, when I batched the sequential scan to do 128-block I/O,
it was 4 times faster than when I did the single-block I/O.
Does that provide enough evidence and, if not, why not?


> If you can use a chunk of
> shared_buffers as the direct destination for the read that's OK, but
> otherwise you're up for (1mb-8kb)*num_backends extra memory use on I/O
> buffers that could otherwise be used as shared_buffers or OS cache.
>
> Async I/O, too, has costs.
>  

There is a common platitude that says that there is no such thing as a
free lunch. However, both Oracle RDBMS and IBM DB2 use asynchronous I/O,
probably because they're unaware of the danger. Let me now give you a
full table scan of a much smaller table located in a Postgres database:

news=> select count(*) from internet_web_sites;
  count
---------
 1290133
(1 row)

Time: 12838.958 ms


Oracle counts 400 million records in 2 minutes and Postgres 9.0.1 takes
12.8 seconds to count 1.2 million records? Do you see the disparity?

Both databases, Oracle and Postgres, are utilizing the same 3Par SAN
device; the machines housing both databases are comparable HP 64-bit
Linux machines, both running the 64-bit version of Red Hat 5.5. Respective
table sizes are here:

SQL> select bytes/1048576 as MB from user_segments
  2  where segment_name='NI_OCCURRENCE';

        MB
----------
     35329

news=> select pg_size_pretty(pg_table_size('moreover.internet_web_sites'));
 pg_size_pretty
----------------
 216 MB
(1 row)

So, I really pushed Oracle much harder than I pushed Postgres.

>  > PostgreSQL is in
>  
>> dire need of something similar and it wouldn't even be that hard to
>> implement.
>>    
>
> I'd really like to see both those assertions backed with data or patches ;-)
>  

With a database that doesn't allow tuning of that aspect, it's a
self-defeating proposition. However, I did my best to give you the numbers.

> Personally, I know just enough about how PG's I/O path works to suspect
> that "not that hard to implement" is probably a little ...
> over-optimistic. Sure, it's not that hard to implement in a new program
> with no wired-in architectural and design choices; that doesn't mean
> it's easy to retrofit onto existing code, especially a bunch of
> co-operating processes with their own buffer management.
>
>  
It may be so, but slow sequential scan is still the largest single
performance problem of PostgreSQL. The frequency with which that topic
appears on the mailing lists should serve as good evidence for that. I
did my best to prove my case. Again, requiring "hard numbers" when
using a database which doesn't allow tweaking of the I/O size is a
self-defeating proposition. Other databases, like DB2 and Oracle, both
allow tweaking of that aspect of their operation, Oracle even on a
per-session basis. If you still claim that it wouldn't make a difference,
the onus to prove it is on you.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com



Re: Slow count(*) again...

Reid Thompson
In reply to this post by Reid Thompson
  On 10/10/2010 11:02 AM, Reid Thompson wrote:

>>>> On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel<[hidden email]>
>>>>
>> On the other hand, I copied a table out of one of my production servers that
>> has about 60,000 rows with 6 columns (numeric, numeric, bool, bool, timestamp,
>> text). The first numeric column has numbers evenly spread between 0 and 100
>> and it is indexed. I put the table in a pair of database servers both running
>> on the same physical hardware. One server is Postgres, the other is a popular
>> server (I am not mentioning names here). on Postgres: SELECT count(*) FROM
>> table where column>50; takes about 8 seconds to run. The other database server
>> took less than one second (about 25 ms) as it is using the index (I assume) to
>> come up with the results. It is true that this is not a fair test because both
>> servers were tested with their default settings, and the defaults for Postgres
>> are much more conservative, however, I don't think that any amount of settings
>> tweaking will bring them even in the same ball park. There has been discussion
>> about the other server returning an incorrect count because all of the indexed
>> rows may not be live at the time. This is not a problem for the intended use,
>> that is why I suggested another function like estimate(*). It's name suggests
>> that the result will be close, not 100% correct, which is plenty good enough
>> for generating a list of results pages in most cases. I am faced with a very
>> serious problem here. If the query to make a list of pages takes say 6 seconds
>> and it takes another 6 seconds to generate a page of results, the customer is
>> waiting 12 seconds. This is not going to work. If count made a quick estimate,
>> say less than a second, and it took 6 seconds to come up with the actual
>> results, I could live with that. Or if coming up with the window of results
>> via (OFFSET and LIMIT) and returned the total number of rows that would have
>> matched the query, then I would still have everything I need to render the
>> page in a reasonable time. I really think that this needs to be addressed
>> somewhere. It's not like I am the only one that does this. You see it nearly
>> everywhere a long list of results is (expected to be) returned in a web site.
>> Among the people I work with, this seems to be the most mentioned reason that
>> they claim that they don't use Postgres for their projects.
>
> How big is your DB?
> How fast is your disk access?
> Any chance disks/RAM can be addressed?
>
> My disk access is pitiful...
> first run, 2.3 million rows.. 0m35.38s, subsequent runs.. real    0m2.55s
>
> rthompso@hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
>   count
> ---------
>  2340704
> (1 row)
>
>
> real    0m35.38s
> user    0m0.25s
> sys     0m0.03s
>
> subsequent runs.... (count changes due to inserts.)
>
> rthompso@hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
>   count
> ---------
>  2363707
> (1 row)
>
>
> real    0m2.70s
> user    0m0.27s
> sys     0m0.02s
> rthompso@hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
>   count
> ---------
>  2363707
> (1 row)
>
>
> real    0m2.55s
> user    0m0.26s
> sys     0m0.02s
> rthompso@hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
>   count
> ---------
>  2363707
> (1 row)
>
>
> real    0m2.50s
> user    0m0.26s
> sys     0m0.02s
>
> reporting=# SELECT pg_size_pretty(pg_total_relation_size('my_production_table'));
>  pg_size_pretty
> ----------------
>  1890 MB
> (1 row)
>
>
Forgot to note, my table schema is significantly larger.

rthompso@hw-prod-repdb1> time psql -c "\d my_production_table_201010" reporting
                                               Table "public.my_production_table_201010"
            Column            |            Type             |                           Modifiers
-----------------------------+-----------------------------+----------------------------------------------------------------
                              | integer                     | not null default
nextval('my_production_table_parent_id_seq'::regclass)
                              | character varying(20)       |
                              | character(1)                |
                              | character varying(32)       |
                              | character varying(32)       |
                              | character varying(20)       |
                              | character varying(5)        |
                              | character varying(5)        |
                              | date                        |
                              | character(1)                |
                              | character varying(32)       |
                              | character varying(32)       |
                              | character varying(32)       |
                              | character varying(2)        |
                              | character varying(10)       |
                              | character varying(10)       |
                              | character varying(32)       |
                              | character varying(7)        |
                              | character varying(10)       |
                              | character varying(2)        |
                              | character varying(9)        |
                              | character varying(9)        |
                              | character varying(9)        |
                              | character varying(10)       |
                              | character varying(32)       |
                              | character varying(32)       |
                              | character varying(20)       |
                              | character varying(5)        |
                              | character varying(5)        |
                              | character varying(32)       |
                              | character varying(32)       |
                              | character varying(32)       |
                              | character varying(2)        |
                              | character varying(10)       |
                              | character varying(10)       |
                              | character varying(10)       |
                              | character varying(10)       |
                              | integer                     |
                              | character varying(2)        |
                              | character varying(32)       |
                              | character varying(32)       |
                              | integer                     |
                              | integer                     |
                              | text                        |
                              | character varying(3)        |
                              | date                        |
                              | date                        |
                              | date                        |
                              | integer                     |
                              | integer                     |
                              | integer                     |
                              | integer                     |
                              | character varying(6)        |
                              | character varying(10)       |
                              | character varying(32)       |
                              | character varying(32)       |
                              | character varying(32)       |
                              | character varying(10)       |
                              | character varying(6)        |
                              | character varying(8)        |
                              | boolean                     |
                              | character(1)                |
                              | date                        |
                              | integer                     |
                              | date                        |
                              | character varying(11)       |
                              | character varying(4)        |
                              | character(1)                |
                              | date                        |
                              | character varying(5)        |
                              | character varying(20)       |
                              | date                        |
                              | character(1)                |
                              | character(1)                |
                              | character varying(2)        |
                              | text                        |
                              | integer                     |
                              | integer                     |
                              | timestamp without time zone | default now()
                              | timestamp without time zone |
                              | character varying(64)       |
                              | character varying(64)       |
                              | character varying(64)       |
Indexes:
     "my_production_table_201010_pkey" PRIMARY KEY, btree (id)
     "my_production_table_201010_date_idx" btree (xxxxdate), tablespace "indexspace"
     "my_production_table_201010_epatient_idx" btree (storeid, xxxxxxxxxxxxx), tablespace "indexspace"
     "my_production_table_201010_medicationname_idx" btree (xxxxxxxxxxxxxx), tablespace "indexspace"
     "my_production_table_201010_ndc_idx" btree (xxx), tablespace "indexspace"
Check constraints:
     "my_production_table_201010_filldate_check" CHECK (xxxxdate >= '2010-10-01'::date AND xxxxdate <
'2010-11-01'::date)
Foreign-key constraints:
     "my_production_table_201010_pkgfileid_fkey" FOREIGN KEY (pkgfileid) REFERENCES my_production_tablefiles(id)
Inherits: my_production_table_parent
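
For comparison, a minimal sketch of the pg_class-statistics shortcut mentioned
elsewhere in the thread, using the table names from the schema above. It assumes
ANALYZE (or autovacuum) has run reasonably recently, and it answers from the
catalog in milliseconds no matter how large the table is:

-- Rough whole-table row count from planner statistics (not exact):
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'my_production_table_201010';

-- With inheritance partitioning as above, sum the parent and its children:
SELECT sum(c.reltuples)::bigint AS estimated_rows
FROM pg_class c
WHERE c.oid = 'my_production_table_parent'::regclass
   OR c.oid IN (SELECT inhrelid FROM pg_inherits
                WHERE inhparent = 'my_production_table_parent'::regclass);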




Re: Slow count(*) again...

Neil Whelchel
In reply to this post by Vitalii Tymchyshyn
On Sunday 10 October 2010 05:02:03 Віталій Тимчишин wrote:

> 2010/10/10 Neil Whelchel <[hidden email]>
>
> > On Saturday 09 October 2010 18:47:34 Scott Marlowe wrote:
> > > > On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel <[hidden email]> wrote:
> > > > [Neil's original post quoted in full; snipped -- see the start of the thread]
> > >
> > > 99% of the time in these situations you don't need an exact measure, and
> > > assuming analyze has run recently, select reltuples from pg_class for
> > > a given table is more than close enough.  I'm sure wrapping that in a
> > > simple estimated_rows() function would be easy enough to do.
> >
> > This is a very good approach and it works very well when you are counting
> > the entire table, but when you have no control over the WHERE clause, it
> > doesn't help. IE: someone puts in a word to look for in a web form.
> >
> > From my perspective, this issue is the biggest problem there is when using
> > Postgres to create web pages, and it is so commonly used, I think that
> > there should be a specific way to deal with it so that you don't have to
> > run the same WHERE clause twice.
> > IE: SELECT count(*) FROM <table> WHERE <clause>; to get the total amount
> > of items to make page navigation links, then:
> > SELECT <columns> FROM <table> WHERE <clause> LIMIT <items_per_page> OFFSET
> > <(page_no-1)*items_per_page>; to get the actual page contents.
> >
> How about
>
> select * from (select *, count(*) over () as total_count from <table> where
> <clause>) a LIMIT <items_per_page> OFFSET <(page_no-1)*items_per_page>
>
> It will return a total_count column with the same value in every row. You may
> have problems if no rows are returned (e.g. the page number is too high).

I have done this before, but the speedup compared to the two hits to the database
that I mentioned above is tiny, just a few ms. It seems to end up doing about
the same thing on the database end. The reason that I don't commonly do this
is what you said about not getting a count result if you run off the end.
-Neil-
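
For reference, a sketch of the single-query pattern being discussed, written with
the same <...> placeholders used above. An ORDER BY is included because OFFSET
paging without one returns rows in no guaranteed order, and a follow-up count
covers the case Neil mentions where the requested page is past the end:

SELECT <columns>, count(*) OVER () AS total_count
FROM <table>
WHERE <clause>
ORDER BY <sort_column>
LIMIT <items_per_page> OFFSET <(page_no-1)*items_per_page>;

-- If this returns zero rows, run the bare count once to distinguish
-- "no matches at all" from "page number past the end":
SELECT count(*) FROM <table> WHERE <clause>;

As for the estimate(*) idea with an arbitrary WHERE clause, one approximation is
to parse the planner's own row estimate out of EXPLAIN. The helper below is a
hypothetical sketch, not anything built in, and it is only as accurate as the
planner's statistics for the clause in question:

-- Planner-based row estimate for an arbitrary query; EXPLAIN only plans,
-- it does not execute, so this is cheap even for huge tables.
CREATE OR REPLACE FUNCTION estimated_count(query text) RETURNS bigint AS $$
DECLARE
    plan_line text;
    est bigint;
BEGIN
    FOR plan_line IN EXECUTE 'EXPLAIN ' || query LOOP
        -- The first line holds the top plan node's "rows=" estimate.
        est := substring(plan_line FROM 'rows=([0-9]+)')::bigint;
        EXIT WHEN est IS NOT NULL;
    END LOOP;
    RETURN est;
END;
$$ LANGUAGE plpgsql;

-- Usage (hypothetical table and column):
-- SELECT estimated_count('SELECT * FROM my_table WHERE title LIKE ''%foo%''');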


Re: Slow count(*) again...

Craig Ringer
In reply to this post by Mladen Gogala-4
On 10/11/2010 01:14 AM, Mladen Gogala wrote:

> I can provide measurements, but from Oracle RDBMS. Postgres doesn't
> allow tuning of that aspect, so no measurement can be done. Would the
> numbers from Oracle RDBMS be acceptable?

Well, they'd tell me a lot about Oracle's performance as I/O chunk size
scales, but almost nothing about the cost of small I/O operations vs
larger ones in general.

Typically dedicated test programs that simulate the database read
patterns would be used for this sort of thing. I'd be surprised if
nobody on -hackers has already done suitable testing; I was mostly
asking because I was interested in how you were backing your assertions.

PostgreSQL isn't Oracle; their designs are in many ways very different.
Most importantly, Oracle uses a redo log, whereas PostgreSQL stores old
rows with visibility information directly in the tables. It is possible
that a larger proportion of Oracle's I/O costs are fixed per-block
overheads rather than per-byte costs, so it seeks to batch requests into
larger chunks. Of course, it's also possible that 8k chunk I/O is just
universally expensive and is something Pg should avoid, too, but we
can't know that without
dedicated testing, which I at least haven't done. I don't follow
-hackers closely, and wouldn't have seen discussion about testing done
there. The archives are likely to contain useful discussions.

Then again, IIRC Pg's page size is also its I/O size, so you could
actually get larger I/O chunking by rebuilding Pg with larger pages.
Having never had the need, I haven't examined the performance of page
size changes on I/O performance.
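
For what it's worth, the compiled-in page size can be checked from any session;
changing it means rebuilding the server (via the --with-blocksize configure
option in recent releases), which is presumably why it is rarely experimented with:

SHOW block_size;   -- bytes per page; 8192 in a stock build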

>> The Linux kernel, at least, does request merging (and splitting, and
>> merging, and more splitting) along the request path, and I'd
>> personally expect that most of the cost of 8k requests would be in the
>> increased number of system calls, buffer copies, etc required.
>> Measurements demonstrating or contradicting this would be good to see.
>
> Even the cost of hundreds of thousands of context switches is far from
> negligible. What kind of measurements do you expect me to do with the
> database which doesn't support tweaking of that aspect of its operation?

Test programs, or references to testing done by others that demonstrate
these costs in isolation. Of course, they still wouldn't show what gain
Pg might obtain (nothing except hacking on Pg's sources really will) but
they'd help measure the costs of doing I/O that way.

I suspect you're right that large I/O chunks would be desirable and a
potential performance improvement. What I'd like to know is *how*
*much*, or at least how much the current approach costs in pure
overheads like context switches and scheduler delays.

> Does that provide enough of an evidence and, if not, why not?

It shows that it helps Oracle a lot ;-)

Without isolating how much of that is raw costs of the block I/O and how
much is costs internal to Oracle, it's still hard to have much idea how
much it'd benefit Pg to take a similar approach.

I'm sure the folks on -hackers have been over this and know a whole lot
more about it than I do, though.

> Oracle counts 400 million records in 2 minutes and Postgres 9.01 takes
> 12.8 seconds to count 1.2 million records? Do you see the disparity?

Sure. What I don't know is how much of that is due to block sizes. There
are all sorts of areas where Oracle could be gaining.

> It may be so, but slow sequential scan is still the largest single
> performance problem of PostgreSQL. The frequency with which that topic
> appears on the mailing lists should serve as good evidence for that.

I'm certainly not arguing that it could use improvement; it's clearly
hurting some users. I just don't know if I/O chunking is the answer - I
suspect that if it were, then it would've become a priority for one or
more people working on Pg much sooner.

It's quite likely that it's one of those things where it makes a huge
difference for Oracle because Oracle has managed to optimize out most of
the other bigger costs. If Pg still has other areas that make I/O more
expensive per-byte (say, visibility checks) and low fixed per-block
costs, then there'd be little point in chunking I/O. My understanding is
that that's pretty much how things stand at the moment, but I'd love
verification from someone who's done the testing.

> If you still claim that it wouldn't make the difference,
> the onus to prove it is on you.

I didn't mean to claim that it would make no difference. If I sounded
like it, sorry.

I just want to know how _much_, or more accurately how great the
overheads of the current approach in Pg are vs doing much larger reads.

--
Craig Ringer



Re: Slow count(*) again...

Jon Nelson-14
In reply to this post by Mladen Gogala-4
On Sun, Oct 10, 2010 at 12:14 PM, Mladen Gogala
<[hidden email]> wrote:
>
>
>
> In other words, when I batched the sequential scan to do 128-block I/O, it
> was 4 times faster than when I did single-block I/O.
> Does that provide enough of an evidence and, if not, why not?


These numbers tell us nothing because, unless you dropped the caches
between runs, at least part of some runs was very probably cached.

--
Jon


Re: Slow count(*) again...

Joshua Tolley
In reply to this post by Craig Ringer
On Mon, Oct 11, 2010 at 06:41:16AM +0800, Craig Ringer wrote:

> On 10/11/2010 01:14 AM, Mladen Gogala wrote:
>
>> I can provide measurements, but from Oracle RDBMS. Postgres doesn't
>> allow tuning of that aspect, so no measurement can be done. Would the
>> numbers from Oracle RDBMS be acceptable?
>
> Well, they'd tell me a lot about Oracle's performance as I/O chunk size  
> scales, but almost nothing about the cost of small I/O operations vs  
> larger ones in general.
>
> Typically dedicated test programs that simulate the database read  
> patterns would be used for this sort of thing. I'd be surprised if  
> nobody on -hackers has already done suitable testing; I was mostly  
> asking because I was interested in how you were backing your assertions.
One thing a test program would have to take into account is multiple
concurrent users. What speeds up the single user case may well hurt the
multi user case, and the behaviors that hurt single user cases may have been
put in place on purpose to allow decent multi-user performance. Of course, all
of that is "might" and "maybe", and I can't prove any assertions about block
size either. But the fact of multiple users needs to be kept in mind.

It was asserted that reading bigger chunks would help performance; a response
suggested that, at least in Linux, setting readahead on a device would
essentially do the same thing. Or that's what I got from the thread, anyway.
I'm interested to know how similar performance might be between the large
block size case and the large readahead case. Comments, anyone?

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


Re: Slow count(*) again...

Craig Ringer
On 10/11/2010 08:27 AM, Joshua Tolley wrote:

> One thing a test program would have to take into account is multiple
> concurrent users. What speeds up the single user case may well hurt the
> multi user case, and the behaviors that hurt single user cases may have been
> put in place on purpose to allow decent multi-user performance. Of course, all
> of that is "might" and "maybe", and I can't prove any assertions about block
> size either. But the fact of multiple users needs to be kept in mind.

Agreed. I've put together a simple test program to test I/O chunk sizes.
It only tests single-user performance, but it'd be pretty trivial to
adapt it to spawn a couple of worker children or run several threads,
each with a suitable delay as it's rather uncommon to have a bunch of
seqscans all fire off at once.

From this test it's pretty clear that with buffered I/O of an uncached
700 MB file under Linux, the I/O chunk size makes very little difference,
with all chunk sizes taking 9.8s to read the test file and near-identical
CPU utilization. Caches were dropped between each test run.

For direct I/O (by ORing the O_DIRECT flag to the open() flags), chunk
size is *hugely* significant, with 4k chunk reads of the test file
taking 38s, 8k 22s, 16k 14s, 32k 10.8s, 64k - 1024k 9.8s, then rising a
little again over 1024k.

Apparently Oracle is almost always configured to use direct I/O, so it
would benefit massively from large chunk sizes. PostgreSQL is almost
never used with direct I/O, and at least in terms of the low-level costs
of syscalls and file system activity, shouldn't care at all about read
chunk sizes.

Bumping readahead from 256 to 8192 made no significant difference for
either case. Of course, I'm on a crappy laptop disk...

I'm guessing this is the origin of the OP's focus on I/O chunk sizes.

Anyway, for the single-seqscan case, I see little evidence here that
using a bigger read chunk size would help PostgreSQL reduce overheads or
improve performance.

OP: Is your Oracle instance using direct I/O?

--
Craig Ringer



[Attachments: testio.c, results_buffered_ra256, results_buffered_ra4096, results_odirect_ra256, results_odirect_ra4096]