Quantcast

count * performance issue

classic Classic list List threaded Threaded
107 messages Options
1234 ... 6
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

count * performance issue

sathiya psql
count(*) tooks much time...

but with the where clause we can make this to use indexing,... what where clause we can use??

Am using postgres 7.4 in Debian OS with 1 GB RAM,

am having a table with nearly 50 lakh records,

it has more than 15 columns, i want to count how many records are there, it is taking nearly 17 seconds to do that...

i know that to get a approximate count we can use
         SELECT reltuples FROM pg_class where relname = TABLENAME;

but this give approximate count, and i require exact count...

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

chris smith-9
sathiya psql wrote:
> count(*) tooks much time...
>
> but with the where clause we can make this to use indexing,... what
> where clause we can use??
>
> Am using postgres 7.4 in Debian OS with 1 GB RAM,
>
> am having a table with nearly 50 lakh records,

Looks suspiciously like a question asked yesterday:

http://archives.postgresql.org/pgsql-performance/2008-03/msg00068.php

--
Postgresql & php tutorials
http://www.designmagick.com/

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

A. Kretschmer
In reply to this post by sathiya psql
am  Thu, dem 06.03.2008, um 11:13:01 +0530 mailte sathiya psql folgendes:
> count(*) tooks much time...
>
> but with the where clause we can make this to use indexing,... what where
> clause we can use??

An index without a WHERE can't help to avoid a seq. scan.


>
> Am using postgres 7.4 in Debian OS with 1 GB RAM,

PG 7.4 are very old... Recent versions are MUCH faster.



>
> am having a table with nearly 50 lakh records,
>
> it has more than 15 columns, i want to count how many records are there, it is
> taking nearly 17 seconds to do that...
>
> i know that to get a approximate count we can use
>          SELECT reltuples FROM pg_class where relname = TABLENAME;
>
> but this give approximate count, and i require exact count...

There aren't a general solution. If you realy need the exact count of
tuples than you can play with a TRIGGER and increase/decrease the
tuple-count for this table in an extra table.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

Shoaib Mir
On Thu, Mar 6, 2008 at 5:08 PM, A. Kretschmer <[hidden email]> wrote:>
> am having a table with nearly 50 lakh records,
>
> it has more than 15 columns, i want to count how many records are there, it is
> taking nearly 17 seconds to do that...
>
> i know that to get a approximate count we can use
>          SELECT reltuples FROM pg_class where relname = TABLENAME;
>
> but this give approximate count, and i require exact count...

There aren't a general solution. If you realy need the exact count of
tuples than you can play with a TRIGGER and increase/decrease the
tuple-count for this table in an extra table.



Or do something like:

ANALYZE tablename;
select reltuple from pg_class where relname = 'tablename';

That will also return the total number of rows in a table and I guess might be much faster then doing a count(*) but yes if trigger can be an option that can be the easiest way to do it and fastest too.

--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

sathiya psql
buy every time i need to put ANALYZE...
this takes the same time as count(*) takes, what is the use ??

On Thu, Mar 6, 2008 at 11:45 AM, Shoaib Mir <[hidden email]> wrote:
On Thu, Mar 6, 2008 at 5:08 PM, A. Kretschmer <[hidden email]> wrote:>
> am having a table with nearly 50 lakh records,
>
> it has more than 15 columns, i want to count how many records are there, it is
> taking nearly 17 seconds to do that...
>
> i know that to get a approximate count we can use
>          SELECT reltuples FROM pg_class where relname = TABLENAME;
>
> but this give approximate count, and i require exact count...

There aren't a general solution. If you realy need the exact count of
tuples than you can play with a TRIGGER and increase/decrease the
tuple-count for this table in an extra table.



Or do something like:

ANALYZE tablename;
select reltuple from pg_class where relname = 'tablename';

That will also return the total number of rows in a table and I guess might be much faster then doing a count(*) but yes if trigger can be an option that can be the easiest way to do it and fastest too.

--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

Shoaib Mir
On Thu, Mar 6, 2008 at 5:19 PM, sathiya psql <[hidden email]> wrote:
buy every time i need to put ANALYZE...
this takes the same time as count(*) takes, what is the use ??



Dont you have autovacuuming running in the background which is taking care of the analyze as well?

If not then hmm turn it on and doing manual analyze then shouldnt I guess take much time!

But yes, I will say if its possible go with the trigger option as that might be more helpful and a very fast way to do that.

--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

Mark Mielke
In reply to this post by Shoaib Mir
 
There aren't a general solution. If you realy need the exact count of
tuples than you can play with a TRIGGER and increase/decrease the
tuple-count for this table in an extra table.

Of course, this means accepting the cost of obtaining update locks on the count table.

The original poster should understand that they can either get a fast estimated count, or they can get a slow accurate count (either slow in terms of select using count(*) or slow in terms of updates using triggers and locking).

Other systems have their own issues. An index scan may be faster than a table scan for databases that can accurately determine counts using only the index, but it's still a relatively slow operation, and people don't normally need an accurate count for records in the range of 100,000+? :-)

Cheers,
mark

-- 
Mark Mielke [hidden email]
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

sathiya psql
In reply to this post by Shoaib Mir
will you please tell, what is autovacuuming... and wat it ll do... is there any good article in this....

On Thu, Mar 6, 2008 at 11:56 AM, Shoaib Mir <[hidden email]> wrote:
On Thu, Mar 6, 2008 at 5:19 PM, sathiya psql <[hidden email]> wrote:
buy every time i need to put ANALYZE...
this takes the same time as count(*) takes, what is the use ??



Dont you have autovacuuming running in the background which is taking care of the analyze as well?

If not then hmm turn it on and doing manual analyze then shouldnt I guess take much time!

But yes, I will say if its possible go with the trigger option as that might be more helpful and a very fast way to do that.


--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

Shoaib Mir


On Thu, Mar 6, 2008 at 5:31 PM, sathiya psql <[hidden email]> wrote:
will you please tell, what is autovacuuming... and wat it ll do... is there any good article in this....



Read this --> http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html#AUTOVACUUM

--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

A. Kretschmer
In reply to this post by Mark Mielke
am  Thu, dem 06.03.2008, um  1:26:46 -0500 mailte Mark Mielke folgendes:

>  
>
>         There aren't a general solution. If you realy need the exact count of
>         tuples than you can play with a TRIGGER and increase/decrease the
>         tuple-count for this table in an extra table.
>
>
> Of course, this means accepting the cost of obtaining update locks on the count
> table.
>
> The original poster should understand that they can either get a fast estimated
> count, or they can get a slow accurate count (either slow in terms of select
> using count(*) or slow in terms of updates using triggers and locking).
>
> Other systems have their own issues. An index scan may be faster than a table
> scan for databases that can accurately determine counts using only the index,

No. The current index-implementation contains no information about the
row-visibility within the current transaction. You need to scan the
whole data-table to obtain if the current row are visible within the
current transaction.


> but it's still a relatively slow operation, and people don't normally need an
> accurate count for records in the range of 100,000+? :-)

right.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

sathiya psql
is there any way to explicitly force the postgres to use index scan

On Thu, Mar 6, 2008 at 12:06 PM, A. Kretschmer <[hidden email]> wrote:
am  Thu, dem 06.03.2008, um  1:26:46 -0500 mailte Mark Mielke folgendes:
>
>
>         There aren't a general solution. If you realy need the exact count of
>         tuples than you can play with a TRIGGER and increase/decrease the
>         tuple-count for this table in an extra table.
>
>
> Of course, this means accepting the cost of obtaining update locks on the count
> table.
>
> The original poster should understand that they can either get a fast estimated
> count, or they can get a slow accurate count (either slow in terms of select
> using count(*) or slow in terms of updates using triggers and locking).
>
> Other systems have their own issues. An index scan may be faster than a table
> scan for databases that can accurately determine counts using only the index,

No. The current index-implementation contains no information about the
row-visibility within the current transaction. You need to scan the
whole data-table to obtain if the current row are visible within the
current transaction.


> but it's still a relatively slow operation, and people don't normally need an
> accurate count for records in the range of 100,000+? :-)

right.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Fwd: count * performance issue

sathiya psql
In reply to this post by A. Kretschmer


---------- Forwarded message ----------
From: sathiya psql <[hidden email]>
Date: Thu, Mar 6, 2008 at 12:17 PM
Subject: Re: [PERFORM] count * performance issue
To: "A. Kretschmer" <[hidden email]>
Cc: [hidden email]


TRIGGER i can use if i want the count of the whole table, but i require for some of the rows with WHERE condition....

so how to do that ???


On Thu, Mar 6, 2008 at 12:06 PM, A. Kretschmer <[hidden email]> wrote:
am  Thu, dem 06.03.2008, um  1:26:46 -0500 mailte Mark Mielke folgendes:
>
>
>         There aren't a general solution. If you realy need the exact count of
>         tuples than you can play with a TRIGGER and increase/decrease the
>         tuple-count for this table in an extra table.
>
>
> Of course, this means accepting the cost of obtaining update locks on the count
> table.
>
> The original poster should understand that they can either get a fast estimated
> count, or they can get a slow accurate count (either slow in terms of select
> using count(*) or slow in terms of updates using triggers and locking).
>
> Other systems have their own issues. An index scan may be faster than a table
> scan for databases that can accurately determine counts using only the index,

No. The current index-implementation contains no information about the
row-visibility within the current transaction. You need to scan the
whole data-table to obtain if the current row are visible within the
current transaction.


> but it's still a relatively slow operation, and people don't normally need an
> accurate count for records in the range of 100,000+? :-)

right.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

A. Kretschmer
In reply to this post by sathiya psql
am  Thu, dem 06.03.2008, um 12:13:17 +0530 mailte sathiya psql folgendes:
> is there any way to explicitly force the postgres to use index scan

Not realy, PG use a cost-based optimizer and use an INDEX if it make
sense.


>
> On Thu, Mar 6, 2008 at 12:06 PM, A. Kretschmer <
> [hidden email]> wrote:

please, no silly top-posting with the complete quote below.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

A. Kretschmer
In reply to this post by A. Kretschmer
am  Thu, dem 06.03.2008, um 12:17:55 +0530 mailte sathiya psql folgendes:
> TRIGGER i can use if i want the count of the whole table, but i require for
> some of the rows with WHERE condition....
>
> so how to do that ???

Okay, in this case a TRIGGER are a bad idea. You can use an INDEX on
this row. Can you show us the output for a EXPLAIN ANALYSE SELECT
count(*) from <your_table> WHERE <your_row> = ... ?


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

sathiya psql

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=205756.95..205756.95 rows=1 width=0) (actual time=114675.042..114675.042 rows=1 loops=1)
   ->  Seq Scan on call_log  (cost=0.00..193224.16 rows=5013112 width=0) (actual time=11.754..91429.594 rows=5061619 loops=1)
         Filter: (call_id > 0)
 Total runtime: 114699.797 ms
(4 rows)


it is now taking 114 seconds, i think because of load in my system.... any way will you explain., what is this COST, actual time and other stuffs....

On Thu, Mar 6, 2008 at 12:27 PM, A. Kretschmer <[hidden email]> wrote:
am  Thu, dem 06.03.2008, um 12:17:55 +0530 mailte sathiya psql folgendes:
> TRIGGER i can use if i want the count of the whole table, but i require for
> some of the rows with WHERE condition....
>
> so how to do that ???

Okay, in this case a TRIGGER are a bad idea. You can use an INDEX on
this row. Can you show us the output for a EXPLAIN ANALYSE SELECT
count(*) from <your_table> WHERE <your_row> = ... ?


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

A. Kretschmer
am  Thu, dem 06.03.2008, um 12:36:48 +0530 mailte sathiya psql folgendes:

>
>                                                           QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=205756.95..205756.95 rows=1 width=0) (actual time=
> 114675.042..114675.042 rows=1 loops=1)
>    ->  Seq Scan on call_log  (cost=0.00..193224.16 rows=5013112 width=0)
> (actual time=11.754..91429.594 rows=5061619 loops=1)
>          Filter: (call_id > 0)
>  Total runtime: 114699.797 ms
> (4 rows)

'call_id > 0' are your where-condition? An INDEX can't help, all rows
with call_id > 0 are in the result, and i guess, that's all records in
the table.


>
>
> it is now taking 114 seconds, i think because of load in my system.... any way
> will you explain., what is this COST, actual time and other stuffs....


08:16 < akretschmer> ??explain
08:16 < rtfm_please> For information about explain
08:16 < rtfm_please> see http://explain-analyze.info
08:16 < rtfm_please> or http://www.depesz.com/index.php/2007/08/06/better-explain-analyze/
08:16 < rtfm_please> or http://www.postgresql.org/docs/current/static/sql-explain.html

and

http://redivi.com/~bob/oscon2005_pgsql_pdf/OSCON_Explaining_Explain_Public.pdf


Read this to learn more about explain.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

Dave Cramer-8
In reply to this post by sathiya psql

On 6-Mar-08, at 1:43 AM, sathiya psql wrote:

> is there any way to explicitly force the postgres to use index scan
>
>

If you want to count all the rows in the table there is only one way  
to do it (without keeping track yourself with a trigger ); a seq scan.

An index will not help you.

The only thing that is going to help you is really fast disks, and  
more memory, and you should consider moving to 8.3 for all the other  
performance benefits.

Dave

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

Dave Cramer-8
Hi,

On 6-Mar-08, at 6:58 AM, sathiya psql wrote:

The only thing that is going to help you is really fast disks, and
more memory, and you should consider moving to 8.3 for all the other
performance benefits.
Is 8.3 is a stable version or what is the latest stable version of postgres ??

Yes it is the latest stable version.
moving my database from 7.4 to 8.3 will it do any harm ??

You will have to test this yourself. There may be issues 
what are all the advantages of moving from 7.4 to 8.3

Every version of postgresql has improved performance, and robustness; so you will get better overall performance. However I want to caution you this is not a panacea. It will NOT solve your seq scan problem.


Dave


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

sathiya psql

Yes it is the latest stable version.

is there any article saying the difference between this 7.3 and 8.4


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: count * performance issue

Harald Armin Massa[legacy]
Of course, the official documentation covers that information in its
release notes

http://www.postgresql.org/docs/8.3/static/release.html

best wishes

Harald

On Thu, Mar 6, 2008 at 1:43 PM, sathiya psql <[hidden email]> wrote:

>
>
> >
> >
> >
> > Yes it is the latest stable version.
>
> is there any article saying the difference between this 7.3 and 8.4
>
>
>



--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Stra├če 49
70435 Stuttgart
0173/9409607
fx 01212-5-13695179
-
EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
1234 ... 6
Loading...