Performance

Ogden
I have been wrestling with the configuration of the dedicated Postgres 9.0.3 server at work. Granted, there's more activity on the production server, but the same queries take twice as long on the beefier server than on my Mac at home. I have pasted what I have changed in postgresql.conf - I am wondering if anyone can help me change things around to be more efficient.

Dedicated PostgreSQL 9.0.3 Server with 16GB Ram

Heavy write and read (for reporting and calculations) server. 

max_connections = 350 
shared_buffers = 4096MB  
work_mem = 32MB
maintenance_work_mem = 512MB


seq_page_cost = 0.02                    # measured on an arbitrary scale
random_page_cost = 0.03 
cpu_tuple_cost = 0.02  
effective_cache_size = 8192MB



The planner costs seem a bit low, but these came from suggestions on this very list a while ago.
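
For reference, a generic way to confirm which planner settings the server is actually running with (nothing specific to this setup - any psql session will do):

SELECT name, setting, source
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size',
               'seq_page_cost', 'random_page_cost', 'cpu_tuple_cost');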


Thank you

Ogden

Re: Performance

Andreas Kretschmer
Ogden <[hidden email]> wrote:

> Dedicated PostgreSQL 9.0.3 Server with 16GB Ram
>
> Heavy write and read (for reporting and calculations) server.
>
> max_connections = 350
> shared_buffers = 4096MB  
> work_mem = 32MB
> maintenance_work_mem = 512MB

That's okay.


>
>
> seq_page_cost = 0.02                    # measured on an arbitrary scale
> random_page_cost = 0.03

Do you have super, Super, SUPER fast disks? I think these (seq_page_cost
and random_page_cost) are completely wrong.



> cpu_tuple_cost = 0.02  
> effective_cache_size = 8192MB
>
>
>
> The planner costs seem a bit low but this was from suggestions from this very
> list a while ago.

Are you sure? Can you point us to a link in the archives?


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°


Re: Performance

Ogden

On Apr 12, 2011, at 12:18 PM, Andreas Kretschmer wrote:

> Ogden <[hidden email]> wrote:
>
>> I have been wrestling with the configuration of the dedicated Postges 9.0.3
>> server at work and granted, there's more activity on the production server, but
>> the same queries take twice as long on the beefier server than my mac at home.
>> I have pasted what I have changed in postgresql.conf - I am wondering if
>> there's any way one can help me change things around to be more efficient.
>>
>> Dedicated PostgreSQL 9.0.3 Server with 16GB Ram
>>
>> Heavy write and read (for reporting and calculations) server.
>>
>> max_connections = 350
>> shared_buffers = 4096MB  
>> work_mem = 32MB
>> maintenance_work_mem = 512MB
>
> That's okay.
>
>
>>
>>
>> seq_page_cost = 0.02                    # measured on an arbitrary scale
>> random_page_cost = 0.03
>
> Do you have super, Super, SUPER fast disks? I think, this (seq_page_cost
> and random_page_cost) are completly wrong.
>

No, I don't have super fast disks - just 15K SCSI drives in RAID. I find that by raising them to:

seq_page_cost = 1.0
random_page_cost = 3.0
cpu_tuple_cost = 0.3
#cpu_index_tuple_cost = 0.005           # same scale as above - 0.005
#cpu_operator_cost = 0.0025             # same scale as above
effective_cache_size = 8192MB

things are better - some queries run much faster. Is this better?

I will find the archive and post.

Thank you

Ogden




Re: Performance

Tomas Vondra
On 12.4.2011 19:23, Ogden wrote:

> No, I don't have super fast disks - just 15K SCSI drives in RAID. I find
> that by raising them to:
>
> seq_page_cost = 1.0
> random_page_cost = 3.0
> cpu_tuple_cost = 0.3
> effective_cache_size = 8192MB
>
> things are better - some queries run much faster. Is this better?

I guess it is. What really matters with those cost variables is the
relative scale - the original values

seq_page_cost = 0.02
random_page_cost = 0.03
cpu_tuple_cost = 0.02

suggest that random reads are almost as expensive as sequential
reads (which usually is not true - random reads are significantly
more expensive), and that processing each row is about as expensive as
reading the page from disk (again, reading data from disk is much more
expensive than processing it).

So yes, the current values are much more likely to give good results.
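
To make the relative scale concrete: with the stock defaults (seq_page_cost = 1.0, random_page_cost = 4.0, cpu_tuple_cost = 0.01) a random page fetch is costed at 4x a sequential one, and a sequential page fetch at 100x the cost of processing one tuple. With 0.02 / 0.03 / 0.02 the random:sequential ratio shrinks to 1.5:1 and a page fetch costs the same as processing a single row - in effect telling the planner that I/O is nearly free.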

You've mentioned those values were recommended on this list - can you
point out the actual discussion?

regards
Tomas


Re: Performance

Ogden

On Apr 12, 2011, at 1:16 PM, Tomas Vondra wrote:

> You've mentioned those values were recommended on this list - can you
> point out the actual discussion?

Thank you for your reply.

http://archives.postgresql.org/pgsql-performance/2010-09/msg00169.php is how I first played with those values...

Ogden

Re: Performance

Tomas Vondra
On 12.4.2011 20:28, Ogden wrote:

> Thank you for your reply.
>
> http://archives.postgresql.org/pgsql-performance/2010-09/msg00169.php is how I first played with those values...

OK, what JD said there generally makes sense, although those values are
a bit extreme - in most cases it's recommended to leave seq_page_cost=1
and decrease the random_page_cost (to 2; the default value is 4). That
usually pushes the planner towards index scans.

I'm not saying those small values (0.02 etc.) are bad, but I guess the
effect is about the same and it changes the impact of the other cost
variables (cpu_tuple_cost, etc.)
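
As a sketch of that more conventional starting point (the numbers are just the ones mentioned above, not something tuned for this particular box), the postgresql.conf entries would be:

seq_page_cost = 1.0                     # leave the baseline at 1
random_page_cost = 2.0                  # default is 4.0
cpu_tuple_cost = 0.01                   # back to the default

followed by a reload (SELECT pg_reload_conf(); or pg_ctl reload) so the planner picks up the new values.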

I see there is 16GB of RAM but shared_buffers is just 4GB. So there's
nothing else running and the rest of the RAM is used for pagecache? I've
noticed the previous discussion mentions there are 8GB of RAM and the DB
size is 7GB (so it might fit into memory). Is this still the case?

regards
Tomas


Re: Performance

Ogden

On Apr 12, 2011, at 4:09 PM, Tomas Vondra wrote:

> OK, what JD said there generally makes sense, although those values are
> a bit extreme - in most cases it's recommended to leave seq_page_cost=1
> and decrease the random_page_cost (to 2; the default value is 4). That
> usually pushes the planner towards index scans.
>
> I see there is 16GB of RAM but shared_buffers is just 4GB. So there's
> nothing else running and the rest of the RAM is used for pagecache? I've
> noticed the previous discussion mentions there are 8GB of RAM and the DB
> size is 7GB (so it might fit into memory). Is this still the case?


Tomas,

By decreasing random_page_cost to 2, there is a performance decrease compared to leaving it at 3 or 4. For example, with it set to 3 (or 4), a query may take 0.057 seconds; the same query takes 0.144s when I set random_page_cost to 2. Should I keep it at 3 (or 4) as I have done now?

Yes, there is 16GB of RAM, but the database is much bigger than that. Should I increase shared_buffers?

Thank you so very much

Ogden

Re: Performance

Tomas Vondra
On 12.4.2011 23:19, Ogden wrote:

> By decreasing random_page_cost to 2, there is a performance decrease
> compared to leaving it at 3 or 4. For example, with it set to 3 (or 4),
> a query may take 0.057 seconds; the same query takes 0.144s when I set
> random_page_cost to 2. Should I keep it at 3 (or 4) as I have done now?
>
> Yes, there is 16GB of RAM, but the database is much bigger than that.
> Should I increase shared_buffers?

OK, that's a very important piece of information and it kinda explains all
the problems you had. When the planner decides what execution plan to use,
it computes a 'virtual cost' for different plans and then chooses the
cheapest one.

Decreasing 'random_page_cost' decreases the expected cost of plans
involving index scans, so that at a certain point it seems cheaper than
a plan using sequential scans etc.

You can see this when using EXPLAIN - do it with the original cost
values, then change the values (for that session only) and run the
EXPLAIN again. You'll see how the execution plan suddenly changes and
starts to use index scans.
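
A minimal sketch of that experiment (the table and query here are placeholders - substitute one of your slow reporting queries):

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;   -- plan with the current costs

SET random_page_cost = 2;                              -- affects this session only
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;   -- the plan may now switch to an index scan

RESET random_page_cost;                                -- back to the configured value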

The problem with random I/O is that it's usually much more expensive
than sequential I/O as the drives need to seek etc. The only case when
random I/O is just as cheap as sequential I/O is when all the data is
cached in memory, because within RAM there's no difference between
random and sequential access (right, that's why it's called Random
Access Memory).

So in the previous post, setting both random_page_cost and seq_page_cost
to the same value makes sense, because when the whole database fits into
memory there's no difference and index scans are favorable.

In this case (the database is much bigger than the available RAM) this
no longer holds - index scans hit the drives, resulting in a lot of
seeks etc. So it's a serious performance killer ...

Not sure about increasing the shared_buffers - if a block is not found
in shared buffers, it still might be found in the pagecache (without
needing to do a physical read). There are ways to check whether the
current size of shared buffers is enough or not - I usually use the
pg_stat views (bgwriter and database).
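
For example (a generic starting point, not something tailored to this server):

SELECT datname, blks_hit, blks_read,
       round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
FROM pg_stat_database;

SELECT buffers_checkpoint, buffers_clean, buffers_backend, buffers_alloc
FROM pg_stat_bgwriter;

A consistently low hit_pct, or backends writing out lots of buffers themselves (buffers_backend) compared to the bgwriter, suggests shared_buffers is too small for the working set.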

regards
Tomas


Re: Performance

Ogden

On Apr 12, 2011, at 5:36 PM, Tomas Vondra wrote:

> So in the previous post, setting both random_page_cost and seq_page_cost
> to the same value makes sense, because when the whole database fits into
> memory there's no difference and index scans are favorable.
>
> In this case (the database is much bigger than the available RAM) this
> no longer holds - index scans hit the drives, resulting in a lot of
> seeks etc. So it's a serious performance killer ...


Tomas,

Thank you for your very detailed and well written description. In conclusion, I should keep my random_page_cost (3.0) at a value higher than seq_page_cost (1.0)? Is this bad practice, or will this suffice for my setup (where the database is much bigger than the RAM in the system)? Or is this not what you are suggesting at all?

Thank you

Ogden




Re: Performance

Tomas Vondra
> Tomas,
>
> Thank you for your very detailed and well written description. In
> conclusion, I should keep my random_page_cost (3.0) at a value higher
> than seq_page_cost (1.0)? Is this bad practice, or will this suffice for
> my setup (where the database is much bigger than the RAM in the system)?
> Or is this not what you are suggesting at all?

Yes, keep it that way. Having 'random_page_cost >= seq_page_cost'
generally reflects the fact that random reads are more expensive than
sequential reads. The actual values are hardware and workload dependent,
but 4:1 is usually OK, unless your db fits into memory etc.

The decrease in performance after decreasing random_page_cost is due to
changes in some execution plans (an index scan becomes slightly less
expensive than a seq scan), but in your case that's based on a false
assumption. So keep it at 4 (you may even try to increase it, just to see
if that improves the performance).

regards
Tomas



Re: Performance

Kevin Grittner
In reply to this post by Ogden
Ogden <[hidden email]> wrote:
 
> In conclusion, I should keep my random_page_cost (3.0) at a value
> higher than seq_page_cost (1.0)? Is this bad practice or will this
> suffice for my setup (where the database is much bigger than the
> RAM in the system)?
 
The idea is to adjust the costing factors to model the actual
relative costs of the various actions in your environment with your
workload.  The best way to determine whether your settings are good
is to gauge how happy those using the database are with
performance.  :-)
 
The degree of caching has a large effect on the page costs.  We've
managed to keep the active portion of our databases cached to a
degree that we have always benefited by reducing the
random_page_cost to 2 or less.  Where the entire database is cached,
we get the best plans with seq_page_cost and random_page_cost set to
equal values in the 0.1 to 0.05 range.  We've occasionally needed to
bump the cpu_tuple_cost up a bit relative to other cpu costs, too.
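
As an illustration of that fully-cached case (the exact numbers are taken from the ranges above and are not a recommendation for the server in this thread):

seq_page_cost = 0.1
random_page_cost = 0.1                  # equal to seq_page_cost, since everything comes from cache
cpu_tuple_cost = 0.03                   # nudged up relative to the other cpu_* costs (default 0.01)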
 
On the other hand, I've seen reports of people who have found it
necessary to increase random_page_cost to get good plans.  These
have been people with large databases where the entire database is
"active" (versus our databases where recent, active data is accessed
much more heavily than, say, 20 year old data).
 
If you model the costing to reflect the reality on your server, good
plans will be chosen.
 
-Kevin


Re: Performance

Claudio Freire
On Wed, Apr 13, 2011 at 4:32 PM, Kevin Grittner
<[hidden email]> wrote:
> If you model the costing to reflect the reality on your server, good
> plans will be chosen.

Wouldn't it be "better" to derive those costs from actual performance
data measured at runtime?

Say, pg could measure random/seq page cost, *per tablespace* even.

Has that been tried?
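
Setting them per tablespace (as opposed to measuring them) is already possible in 9.0, so whatever a measurement produced could feed into something like this (tablespace names made up for illustration):

ALTER TABLESPACE fast_ssd SET (seq_page_cost = 1.0, random_page_cost = 1.5);
ALTER TABLESPACE slow_sata SET (random_page_cost = 6.0);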


Re: Performance

Tom Lane
Claudio Freire <[hidden email]> writes:
> On Wed, Apr 13, 2011 at 4:32 PM, Kevin Grittner
> <[hidden email]> wrote:
>> If you model the costing to reflect the reality on your server, good
>> plans will be chosen.

> Wouldn't it be "better" to derive those costs from actual performance
> data measured at runtime?

> Say, pg could measure random/seq page cost, *per tablespace* even.

> Has that been tried?

Getting numbers that mean much of anything is a slow, expensive
process.  You really don't want the database trying to do that for you.
Once you've got them, you *really* don't want the database
editorializing on them.

                        regards, tom lane


Re: Performance

Claudio Freire
On Wed, Apr 13, 2011 at 11:52 PM, Tom Lane <[hidden email]> wrote:
> Getting numbers that mean much of anything is a slow, expensive
> process.  You really don't want the database trying to do that for you.
> Once you've got them, you *really* don't want the database
> editorializing on them.
>

So it hasn't even been tried.


Re: Performance

Kevin Grittner
Claudio Freire <[hidden email]> wrote:
 
> So it hasn't even been tried.
 
If you want to do that, I would be interested in your benchmark
numbers.  Or if you're not up to that, there are a number of
companies which I'd bet would be willing to spend the time if they
had a sponsor to pay for their hours.  So far nobody has felt it
likely enough to be beneficial to want to put their time or money on
the line for it.  Here's your chance to be first.
 
-Kevin


Re: Performance

Nathan Boley
In reply to this post by Claudio Freire
>> If you model the costing to reflect the reality on your server, good
>> plans will be chosen.
>
> Wouldn't it be "better" to derive those costs from actual performance
> data measured at runtime?
>
> Say, pg could measure random/seq page cost, *per tablespace* even.
>
> Has that been tried?

FWIW, a while ago I wrote a simple script to measure this and found
that the *actual* random_page / seq_page cost ratio was much higher
than 4/1.

The problem is that caching effects have a large effect on the time it
takes to access a random page, and caching effects are very workload
dependent. So anything automated would probably need to optimize the
parameter values over a set of 'typical' queries, which is exactly
what a good DBA does when they set random_page_cost...
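
A rough sketch of this kind of measurement (not the actual script; big_table and id are placeholder names, and as noted the result is heavily skewed by caching):

-- time a forced sequential scan
SET enable_indexscan = off; SET enable_bitmapscan = off;
EXPLAIN ANALYZE SELECT count(*) FROM big_table;

-- time a forced index scan over the same table (heap pages visited in index order)
SET enable_seqscan = off; SET enable_indexscan = on; SET enable_bitmapscan = on;
EXPLAIN ANALYZE SELECT count(*) FROM big_table WHERE id > 0;

RESET ALL;

Comparing the per-page timings of the two runs gives a crude random/seq ratio.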

Best,
Nathan


Re: Performance

Kevin Grittner
Nathan Boley <[hidden email]> wrote:
 
> The problem is that caching effects have a large effect on the
> time it takes to access a random page, and caching effects are
> very workload dependent. So anything automated would probably need
> to optimize the parameter values over a set of 'typical' queries,
> which is exactly what a good DBA does when they set
> random_page_cost...
 
Another database product I've used has a stored procedure you can
run to turn on monitoring of workload, another to turn it off and
report on what happened during the interval.  It drags performance
enough that you don't want to leave it running except as a tuning
exercise, but it does produce very detailed statistics and actually
offers suggestions on what you might try tuning to improve
performance.  If someone wanted to write something to deal with this
issue, that seems like a sound overall strategy.
 
-Kevin


Re: Performance

Tomas Vondra
In reply to this post by Nathan Boley-2
On 14.4.2011 00:05, Nathan Boley wrote:

> The problem is that caching effects have a large effect on the time it
> takes to access a random page, and caching effects are very workload
> dependent. So anything automated would probably need to optimize the
> parameter values over a set of 'typical' queries, which is exactly
> what a good DBA does when they set random_page_cost...

Plus there's a separate pagecache outside shared_buffers, which adds
another layer of complexity.

What I was thinking about was a kind of 'autotuning' using the real
workload. I mean - measure the time it takes to process a request
(depends on the application - it could be the time to load a page,
process an invoice, whatever ...) and compute some reasonable metric on
it (average, median, variance, ...). Move the cost variables a bit (e.g.
random_page_cost) and see how that influences performance. If it
improved, take another step in the same direction, otherwise take a step
in the other direction (or don't change the values at all).

Yes, I've had some lectures on non-linear programming, so I'm aware that
this won't work if the cost function has multiple extremes (valleys /
hills etc.), but I somehow suppose that's not the case with cost estimates.

Another issue is that when measuring multiple values (processing of
different requests), the decisions may be contradictory so it really
can't be fully automatic.

regards
Tomas


Re: Performance

Claudio Freire
On Thu, Apr 14, 2011 at 12:19 AM, Tomas Vondra <[hidden email]> wrote:
>
> Another issue is that when measuring multiple values (processing of
> different requests), the decisions may be contradictory so it really
> can't be fully automatic.
>

I don't think it's soooo dependent on workload. It's dependent on
access patterns (and working set sizes), and that all can be
quantified, as opposed to "workload".

I've been meaning to try this for a while yet, and it needs not be as
expensive as one would imagine. It just needs a clever implementation
that isn't too intrusive and that is customizable enough not to
alienate DBAs.

I'm not doing database stuff ATM (though I've been doing it for
several years), and I don't expect to return to database tasks for a
few months. But whenever I get back to it, sure, I'd be willing to
invest time on it.

What an automated system can do that a DBA cannot - and it's why this
idea occurred to me in the first place - is tailor the metrics for
variable contexts and situations. Like, I had a DB that was working
perfectly fine most of the time, but some days it got "overworked", and
sticking with fixed cost variables made no sense - in those
situations, random page cost was insanely high because of the
workload, but sequential scans would have run much faster because of
OS read-ahead and because of synchroscans. I'm talking about a decision
support system that did lots of heavy duty queries, where sequential
scans are an alternative. I reckon most OLTP systems are different.

So, to make things short: adaptability to varying conditions is what
I'd imagine this technique would provide, and that is something a DBA
cannot offer, no matter how skilled. That, and the advent of SSDs and
the really, really different characteristics of different tablespaces,
only strengthens my intuition that automation might be better than
parameterization.


Re: Performance

Tomas Vondra
On 14.4.2011 01:10, Claudio Freire wrote:

> On Thu, Apr 14, 2011 at 12:19 AM, Tomas Vondra <[hidden email]> wrote:
>>
>> Another issue is that when measuring multiple values (processing of
>> different requests), the decisions may be contradictory so it really
>> can't be fully automatic.
>>
>
> I don't think it's soooo dependent on workload. It's dependent on
> access patterns (and working set sizes), and that all can be
> quantified, as opposed to "workload".

Well, think about a database that's much bigger than the available RAM.

Workload A: Touches just a very small portion of the database, so the
'active' part actually fits into memory. In this case the cache hit
ratio can easily be close to 99%.

Workload B: Touches a large portion of the database, so it hits the drive
very often. In this case the cache hit ratio is usually around RAM/(size
of the database).

So yes, it may be very workload dependent. In the first case you may
actually significantly lower the random_page_cost (even to
seq_page_cost) and it's going to be quite fast (thanks to the cache).

If you do the same thing with workload B, the database is going to burn.

I'm not saying it's not possible to do some autotuning, but it's a bit
tricky and it's not just about hardware. The workload *is* a very
important part of the equation.

But I have to admit this post probably sounds like overengineering.
If you can develop something simple (even if it does not consider
the workload at all), it might be a useful starting point. If I can help
you in any way with this, let me know.

regards
Tomas
