Quantcast

TCP keepalive support for libpq

classic Classic list List threaded Threaded
58 messages Options
123
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

TCP keepalive support for libpq

Tollef Fog Heen-8

(please Cc me on replies, I am not subscribed)

Hi,

libpq currently does not use TCP keepalives.  This is a problem in our
case where we have some clients waiting for notifies and then the
connection is dropped on the server side.  The client never gets the FIN
and thinks the connection is up.  The attached patch unconditionally
adds keepalives.  I chose unconditionally as this is what the server
does.  We didn't need the ability to tune the timeouts, but that could
be added with reasonable ease.

--
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

libpq_keepalive.diff (803 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Magnus Hagander-2
On Tue, Feb 9, 2010 at 14:03, Tollef Fog Heen
<[hidden email]> wrote:

>
> (please Cc me on replies, I am not subscribed)
>
> Hi,
>
> libpq currently does not use TCP keepalives.  This is a problem in our
> case where we have some clients waiting for notifies and then the
> connection is dropped on the server side.  The client never gets the FIN
> and thinks the connection is up.  The attached patch unconditionally
> adds keepalives.  I chose unconditionally as this is what the server
> does.  We didn't need the ability to tune the timeouts, but that could
> be added with reasonable ease.

Seems reasonable to add this. Are there any scenarios where this can
cause trouble, that would be fixed by having the ability to select
non-standard behavior?
I don't recall ever changing away from the standard behavior in any of
my deployments, but that might be platform dependent?

If not, I think this is small and trivial enough not to have to push
back for 9.1 ;)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Tollef Fog Heen-8
]] Magnus Hagander

| Seems reasonable to add this. Are there any scenarios where this can
| cause trouble, that would be fixed by having the ability to select
| non-standard behavior?

Well, it might be unwanted if you're on a pay-per-bit connection such as
3G, but in this case, it just makes the problem a bit worse than the
server keepalive already makes it – it doesn't introduce a new problem.

| I don't recall ever changing away from the standard behavior in any of
| my deployments, but that might be platform dependent?

If you were (ab)using postgres as an IPC mechanism, I could see it being
useful, but not in the normal case.

| If not, I think this is small and trivial enough not to have to push
| back for 9.1 ;)

\o/

--
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Andrew Chernow-3
In reply to this post by Tollef Fog Heen-8
Tollef Fog Heen wrote:

> (please Cc me on replies, I am not subscribed)
>
> Hi,
>
> libpq currently does not use TCP keepalives.  This is a problem in our
> case where we have some clients waiting for notifies and then the
> connection is dropped on the server side.  The client never gets the FIN
> and thinks the connection is up.  The attached patch unconditionally
> adds keepalives.  I chose unconditionally as this is what the server
> does.  We didn't need the ability to tune the timeouts, but that could
> be added with reasonable ease.

ISTM that the default behavior should be keep alives disabled, as it is
now, and those wanting it can just set it in their apps:

setsockopt(PQsocket(conn), SOL_SOCKET, SO_KEEPALIVE, ...)

If you really want libpq to manage this, I think you need to expose the
probe interval and timeouts.  There should be some platform checks as
well.  Check out...

http://www.mail-archive.com/pgsql-hackers@.../msg128603.html

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Fujii Masao-2
On Tue, Feb 9, 2010 at 11:34 PM, Andrew Chernow <[hidden email]> wrote:
> If you really want libpq to manage this, I think you need to expose the
> probe interval and timeouts.

Agreed.

Previously I was making the patch that exposes them as conninfo
options so that the standby can detect a network outage ASAP in SR.
I attached that WIP patch as a reference. Hope this helps.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

keepalive.patch (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

DavidGould
In reply to this post by Andrew Chernow-3
On Tue, Feb 09, 2010 at 09:34:10AM -0500, Andrew Chernow wrote:

> Tollef Fog Heen wrote:
> >(please Cc me on replies, I am not subscribed)
> >
> >Hi,
> >
> >libpq currently does not use TCP keepalives.  This is a problem in our
> >case where we have some clients waiting for notifies and then the
> >connection is dropped on the server side.  The client never gets the FIN
> >and thinks the connection is up.  The attached patch unconditionally
> >adds keepalives.  I chose unconditionally as this is what the server
> >does.  We didn't need the ability to tune the timeouts, but that could
> >be added with reasonable ease.
>
> ISTM that the default behavior should be keep alives disabled, as it is
> now, and those wanting it can just set it in their apps:
>
> setsockopt(PQsocket(conn), SOL_SOCKET, SO_KEEPALIVE, ...)

I disagree. I have clients who have problems with leftover client connections
due to server host failures. They do not write apps in C. For a non-default
change to be effective we would need to have all the client drivers, eg JDBC,
psycopg, DBD-DBI, and the apps like psql make changes to turn it on. Adding
this option as a non-default will not really help.

-dg
 

--
David Gould       [hidden email]      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Tollef Fog Heen-8
]] daveg

| I disagree. I have clients who have problems with leftover client connections
| due to server host failures. They do not write apps in C. For a non-default
| change to be effective we would need to have all the client drivers, eg JDBC,
| psycopg, DBD-DBI, and the apps like psql make changes to turn it on. Adding
| this option as a non-default will not really help.

FWIW, this is my case.  My application uses psycopg, which provides no
way to get access to the underlying socket.  Sure, I could hack my way
around this, but from the application writer's point of view, I have a
connection that I expect to stay around and be reliable.  Whether that
connection is over a UNIX socket, a TCP socket or something else is
something I would rather not have to worry about; it feels very much
like an abstraction violation.

--
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Magnus Hagander-2
In reply to this post by DavidGould
2010/2/10 daveg <[hidden email]>:

> On Tue, Feb 09, 2010 at 09:34:10AM -0500, Andrew Chernow wrote:
>> Tollef Fog Heen wrote:
>> >(please Cc me on replies, I am not subscribed)
>> >
>> >Hi,
>> >
>> >libpq currently does not use TCP keepalives.  This is a problem in our
>> >case where we have some clients waiting for notifies and then the
>> >connection is dropped on the server side.  The client never gets the FIN
>> >and thinks the connection is up.  The attached patch unconditionally
>> >adds keepalives.  I chose unconditionally as this is what the server
>> >does.  We didn't need the ability to tune the timeouts, but that could
>> >be added with reasonable ease.
>>
>> ISTM that the default behavior should be keep alives disabled, as it is
>> now, and those wanting it can just set it in their apps:
>>
>> setsockopt(PQsocket(conn), SOL_SOCKET, SO_KEEPALIVE, ...)
>
> I disagree. I have clients who have problems with leftover client connections
> due to server host failures. They do not write apps in C. For a non-default
> change to be effective we would need to have all the client drivers, eg JDBC,
> psycopg, DBD-DBI, and the apps like psql make changes to turn it on. Adding
> this option as a non-default will not really help.

Yes, that's definitely the use-case. PQsocket() will work fine for C apps only.

But it should work fine as an option, no? As long as you can specify
it on the connection string - I don't think there are any interfaces
that won't take a connection string?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Andrew Chernow-3

>>> ISTM that the default behavior should be keep alives disabled, as it is
>>> now, and those wanting it can just set it in their apps:
>>>
>>> setsockopt(PQsocket(conn), SOL_SOCKET, SO_KEEPALIVE, ...)
>> I disagree. I have clients who have problems with leftover client connections
>> due to server host failures. They do not write apps in C. For a non-default
>> change to be effective we would need to have all the client drivers, eg JDBC,
>> psycopg, DBD-DBI, and the apps like psql make changes to turn it on. Adding
>> this option as a non-default will not really help.
>
> Yes, that's definitely the use-case. PQsocket() will work fine for C apps only.
>
> But it should work fine as an option, no? As long as you can specify
> it on the connection string - I don't think there are any interfaces
> that won't take a connection string?
>

Perl and python appear to have the same abilities as C.  I don't use either of
these drivers but I *think* the below would work:

DBD:DBI
setsockopt($dbh->pg_socket(), ...);

psycopg
conn.cursor().socket().setsockopt(...);

Although, I think Dave's comments have made me change my mind about this patch.
  Looks like it serves a good purpose.  That said, there is no guarentee the
driver will implement the new feature ... JDBC seems to lack the ability to get
the backing Socket object but java can set socket options.  Maybe a JDBC kong fu
master knows how to do this.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Robert Haas
In reply to this post by Tollef Fog Heen-8
On Thu, Feb 11, 2010 at 2:15 AM, Tollef Fog Heen
<[hidden email]> wrote:

> ]] daveg
>
> | I disagree. I have clients who have problems with leftover client connections
> | due to server host failures. They do not write apps in C. For a non-default
> | change to be effective we would need to have all the client drivers, eg JDBC,
> | psycopg, DBD-DBI, and the apps like psql make changes to turn it on. Adding
> | this option as a non-default will not really help.
>
> FWIW, this is my case.  My application uses psycopg, which provides no
> way to get access to the underlying socket.  Sure, I could hack my way
> around this, but from the application writer's point of view, I have a
> connection that I expect to stay around and be reliable.  Whether that
> connection is over a UNIX socket, a TCP socket or something else is
> something I would rather not have to worry about; it feels very much
> like an abstraction violation.

I've sometimes wondered why keepalives aren't the default for all TCP
connections.  They seem like they're usually a Good Thing (TM), but I
wonder if we can think of any situations where someone might not want
them?

...Robert

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Kevin Grittner
Robert Haas <[hidden email]> wrote:
 
> I've sometimes wondered why keepalives aren't the default for all
> TCP connections.  They seem like they're usually a Good Thing
> (TM), but I wonder if we can think of any situations where someone
> might not want them?
 
I think it's insane not to use them at all, but there are valid use
cases for different timings.  Personally, I'd be happy to see a
default of sending them if a connection is idle for two minutes, but
those people who create 2000 lightly used connections to the
database might feel differently.
 
-Kevin

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Peter Geoghegan
From the Slony-I docs (http://www.slony.info/documentation/faq.html) :

"Supposing you experience some sort of network outage, the connection
between slon and database may fail, and the slon may figure this out
long before the PostgreSQL  instance it was connected to does. The
result is that there will be some number of idle connections left on
the database server, which won't be closed out until TCP/IP timeouts
complete, which seems to normally take about two hours. For that two
hour period, the slon will try to connect, over and over, and will get
the above fatal message, over and over. "

Speaking as someone who uses Slony quite a lot, this patch sounds very
helpful. Why hasn't libpq had keepalives for years?

Regards,
Peter Geoghegan

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Andrew Chernow-3
In reply to this post by Robert Haas
Robert Haas wrote:

> On Thu, Feb 11, 2010 at 2:15 AM, Tollef Fog Heen
> <[hidden email]> wrote:
>> ]] daveg
>>
>> | I disagree. I have clients who have problems with leftover client connections
>> | due to server host failures. They do not write apps in C. For a non-default
>> | change to be effective we would need to have all the client drivers, eg JDBC,
>> | psycopg, DBD-DBI, and the apps like psql make changes to turn it on. Adding
>> | this option as a non-default will not really help.
>>
>> FWIW, this is my case.  My application uses psycopg, which provides no
>> way to get access to the underlying socket.  Sure, I could hack my way
>> around this, but from the application writer's point of view, I have a
>> connection that I expect to stay around and be reliable.  Whether that
>> connection is over a UNIX socket, a TCP socket or something else is
>> something I would rather not have to worry about; it feels very much
>> like an abstraction violation.
>
> I've sometimes wondered why keepalives aren't the default for all TCP
> connections.  They seem like they're usually a Good Thing (TM), but I
> wonder if we can think of any situations where someone might not want
> them?
>

The only case I can think of are systems that send application layer
keepalive-like packets; I've worked on systems like this.  The goal
wasn't to reinvent keepalives but to check-in every minute or two to
meet a different set of requirements, thus TCP keepalives weren't
needed.  However, I don't think they would of caused any harm.

The more I think about this the more I think it's a pretty non-invasive
change to enable keepalives in libpq.  I don't think this has any
negative impact on clients written while the default was disabled.

This is really a driver setting.  There is no way to ensure libpq, DBI,
psycopg, JDBC, etc... all enable or disable keepalives by default.  I
only bring this up because it appears there are complaints from
non-libpq clients.  This patch wouldn't fix those cases.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Dimitri Fontaine-3
In reply to this post by Kevin Grittner
"Kevin Grittner" <[hidden email]> writes:
> those people who create 2000 lightly used connections to the
> database might feel differently.

Yeah I still run against installation using the infamous PHP pconnect()
function. You certainly don't want to add some load there, but that
could urge them into arranging for being able to use pgbouncer in
transaction pooling mode (and stop using pconnect(), damn it).

Regards,
--
dim

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Peter Geoghegan
In reply to this post by Peter Geoghegan
Also, more importantly (from
http://www.slony.info/documentation/slonyadmin.html):

"A WAN outage (or flakiness of the WAN in general) can leave database
connections "zombied", and typical TCP/IP behaviour  will allow those
connections to persist, preventing a slon restart for around two
hours. "

Regards,
Peter Geoghegan

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Tollef Fog Heen-8
In reply to this post by Robert Haas
]] Robert Haas

| I've sometimes wondered why keepalives aren't the default for all TCP
| connections.  They seem like they're usually a Good Thing (TM), but I
| wonder if we can think of any situations where someone might not want
| them?

As somebody mentioned somewhere else (I think): If you pay per byte
transmitted, be it 3G/GPRS.  Or if you're on a very, very high-latency
link or have no bandwidth.  Like, a rocket to Mars or maybe the moon.
While I think they are valid use-cases, requiring people to change the
defaults if that kind of thing sounds like a sensible solution to me.

--
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Kris Jurka
In reply to this post by Andrew Chernow-3


On Thu, 11 Feb 2010, Andrew Chernow wrote:

>
> Although, I think Dave's comments have made me change my mind about this
> patch.  Looks like it serves a good purpose.  That said, there is no
> guarentee the driver will implement the new feature ... JDBC seems to
> lack the ability to get the backing Socket object but java can set
> socket options. Maybe a JDBC kong fu master knows how to do this.

Use the tcpKeepAlive connection option as described here:

http://jdbc.postgresql.org/documentation/84/connect.html#connection-parameters

Java can only enable/disable keep alives, it can't set the desired
timeout.

http://java.sun.com/javase/6/docs/api/java/net/Socket.html#setKeepAlive%28boolean%29

Kris Jurka

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Fujii Masao-2
In reply to this post by Peter Geoghegan
On Fri, Feb 12, 2010 at 1:33 AM, Peter Geoghegan
<[hidden email]> wrote:
> Why hasn't libpq had keepalives for years?

I guess that it's because keepalive doesn't work as expected
in some cases. For example, if the network outage happens
before a client sends some packets, keepalive doesn't work,
then it would have to wait for a long time until it detects
the outage. This is the specification of linux kernel. So
a client should not have excessive expectations of keepalive,
and should have another timeout like QueryTimeout of JDBC.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Peter Geoghegan
> <[hidden email]> wrote:
>> Why hasn't libpq had keepalives for years?
>
> I guess that it's because keepalive doesn't work as expected
> in some cases. For example, if the network outage happens
> before a client sends some packets, keepalive doesn't work,
> then it would have to wait for a long time until it detects
> the outage. This is the specification of linux kernel. So
> a client should not have excessive expectations of keepalive,
> and should have another timeout like QueryTimeout of JDBC.

In my experience, the problems described are common when using libpq
over any sort of flaky connection, which I myself regularly do (not
just with Slony, but with a handheld wi-fi PDT application, where
libpq is used without any wrapper). The slony docs say it takes about
2 hours for the problem to correct itself, but I have found that it
may take a lot longer, perhaps because I have a hybrid Linux/Windows
Slony cluster.

> keepalive doesn't work,
> then it would have to wait for a long time until it detects
> the outage.

I'm not really sure what you mean. In this scenario, would it take as
long as it would have taken had keepalives not been used?

I strongly welcome anything that can ameliorate these problems, which
are probably not noticed by the majority of users, but are a real
inconvenience when they do arise.

Regards,
Peter Geoghegan

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TCP keepalive support for libpq

Fujii Masao-2
On Fri, Feb 12, 2010 at 6:40 PM, Peter Geoghegan
<[hidden email]> wrote:
>> keepalive doesn't work,
>> then it would have to wait for a long time until it detects
>> the outage.
>
> I'm not really sure what you mean. In this scenario, would it take as
> long as it would have taken had keepalives not been used?

Please see the following threads.
http://archives.postgresql.org/pgsql-bugs/2006-08/msg00098.php
http://lkml.indiana.edu/hypermail/linux/kernel/0508.2/0757.html

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
123
Loading...