|
It looks like initdb doesn't fsync all the files it creates, e.g. the
PG_VERSION file.

While it's unlikely that it would cause any real data loss, it can be
inconvenient in some testing scenarios involving VMs.

Thoughts? Would a patch to add a few fsync calls to initdb be accepted?

Is a platform-independent fsync available at initdb time?

Regards,
	Jeff Davis

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
|
On Fri, Jan 27, 2012 at 04:19:41PM -0800, Jeff Davis wrote:
> It looks like initdb doesn't fsync all the files it creates, e.g. the
> PG_VERSION file.
>
> While it's unlikely that it would cause any real data loss, it can be
> inconvenient in some testing scenarios involving VMs.
>
> Thoughts? Would a patch to add a few fsync calls to initdb be accepted?

+1. If I'm piloting "strace -f" right, initdb currently issues *no* syncs.

We'd probably, then, want a way to re-disable the fsyncs for hacker benefit.

> Is a platform-independent fsync available at initdb time?

Not sure.

Thanks,
nm
|
On 01/27/2012 11:52 PM, Noah Misch wrote:
>> Is a platform-independent fsync available at initdb time?
> Not sure.

It's a macro on Windows that calls _commit(fd), so it should be portable
enough.

I'm curious what problem we're actually solving here, though. I've run
the buildfarm countless thousands of times on different VMs, and five of
my seven current animals run in VMs, and I don't think I've ever seen a
failure ascribable to inadequately synced files from initdb.

cheers

andrew
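For readers following the portability point, the idea can be sketched as a small wrapper. This is an illustration only, not PostgreSQL's actual source: the macro name portable_fsync is hypothetical, and the only things assumed are that POSIX systems provide fsync() and that the Windows C runtime provides _commit().

```c
#define _POSIX_C_SOURCE 200809L  /* expose fileno() and fsync() under strict -std modes */
#include <stdio.h>

#ifdef _WIN32
#include <io.h>
/* Windows has no fsync(); _commit() flushes a file descriptor to disk. */
#define portable_fsync(fd) _commit(fd)
#else
#include <unistd.h>
/* On POSIX systems the wrapper is just fsync(). */
#define portable_fsync(fd) fsync(fd)
#endif
```

Both underlying calls return 0 on success and -1 on failure, so callers can treat the wrapper the same way on either platform.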
|
On Sat, 2012-01-28 at 10:31 -0500, Andrew Dunstan wrote:
> I'm curious what problem we're actually solving here, though. I've run
> the buildfarm countless thousands of times on different VMs, and five of
> my seven current animals run in VMs, and I don't think I've ever seen a
> failure ascribable to inadequately synced files from initdb.

I believe I have seen such a problem second hand in a situation where
the VM was known to be killed harshly (not sure if you do that
regularly).

It's a little difficult for me to _prove_ that this would have solved
the problem, and I think it was only observed once (though I could
probably reproduce it if I tried). The symptom was a log message
indicating that PG_VERSION was missing or corrupt on a system that was
previously started and online (albeit briefly for a test).

Regards,
	Jeff Davis
|
In reply to this post by Andrew Dunstan
Andrew Dunstan <[hidden email]> writes:
> I'm curious what problem we're actually solving here, though. I've run
> the buildfarm countless thousands of times on different VMs, and five of
> my seven current animals run in VMs, and I don't think I've ever seen a
> failure ascribable to inadequately synced files from initdb.

Yeah. Personally I would be sad if initdb got noticeably slower, and
I've never seen or heard of a failure that this would fix.

I wonder whether it wouldn't be sufficient to call sync(2) at the end,
anyway, rather than cluttering the entire initdb codebase with fsync
calls.

			regards, tom lane
|
In reply to this post by Andrew Dunstan
On Sat, Jan 28, 2012 at 7:31 AM, Andrew Dunstan <[hidden email]> wrote:
> On 01/27/2012 11:52 PM, Noah Misch wrote:
>>>
>>> Is a platform-independent fsync available at initdb time?
>>
>> Not sure.
>>
>
> It's a macro on Windows that calls _commit(fd), so it should be portable
> enough.
>
> I'm curious what problem we're actually solving here, though. I've run the
> buildfarm countless thousands of times on different VMs, and five of my
> seven current animals run in VMs, and I don't think I've ever seen a failure
> ascribable to inadequately synced files from initdb.

I wouldn't expect you to ever see that problem on the buildfarm.

If the OS gets thunked during the middle of a regression test, when it
comes back up the code is not going to try to pick up where it left
off; it is just going to blow away the entire install and start over
from scratch. So any crash-recoverability problems will never be
detected.

I would guess the original poster is doing a more stringent kind of test.

Cheers,

Jeff
|
In reply to this post by Tom Lane-2
On Sat, 2012-01-28 at 13:18 -0500, Tom Lane wrote:
> Andrew Dunstan <[hidden email]> writes:
> > I'm curious what problem we're actually solving here, though. I've run
> > the buildfarm countless thousands of times on different VMs, and five of
> > my seven current animals run in VMs, and I don't think I've ever seen a
> > failure ascribable to inadequately synced files from initdb.
>
> Yeah. Personally I would be sad if initdb got noticeably slower, and
> I've never seen or heard of a failure that this would fix.
>
> I wonder whether it wouldn't be sufficient to call sync(2) at the end,
> anyway, rather than cluttering the entire initdb codebase with fsync
> calls.

I can always add a "sync" call to the test, also (rather than modifying
initdb). Or, it could be an initdb option, which might be a good
compromise. I don't have a strong opinion here.

As machines get more memory and filesystems get more lazy, I wonder if
it will be a more frequent occurrence, however. On the other hand, if
filesystems are more lazy, that also increases the cost associated with
extra "sync" calls. I think there would be a surprise factor if
sometimes initdb had a long pause at the end and caused 10GB of data to
be written out.

Regards,
	Jeff Davis
|
In reply to this post by Tom Lane-2
On Sat, Jan 28, 2012 at 10:18 AM, Tom Lane <[hidden email]> wrote:
> Andrew Dunstan <[hidden email]> writes:
>> I'm curious what problem we're actually solving here, though. I've run
>> the buildfarm countless thousands of times on different VMs, and five of
>> my seven current animals run in VMs, and I don't think I've ever seen a
>> failure ascribable to inadequately synced files from initdb.
>
> Yeah. Personally I would be sad if initdb got noticeably slower, and
> I've never seen or heard of a failure that this would fix.
>
> I wonder whether it wouldn't be sufficient to call sync(2) at the end,
> anyway, rather than cluttering the entire initdb codebase with fsync
> calls.
>
> regards, tom lane

Does sync(2) behave like sync(8) and flush the entire system cache, or
does it just flush the files opened by the process which called it?
The man page didn't enlighten me on that.

Sometimes sync(8) never returns. It doesn't just flush what was dirty
at the time it was called; it actually keeps running until there are
simultaneously no dirty pages anywhere in the system. On busy
systems, this condition might never be reached. And it can't be
interrupted, not even with kill -9.

Cheers,

Jeff
|
In reply to this post by Jeff Davis-8
On 01/28/2012 01:46 PM, Jeff Davis wrote:
> On Sat, 2012-01-28 at 13:18 -0500, Tom Lane wrote:
>> Andrew Dunstan<[hidden email]> writes:
>>> I'm curious what problem we're actually solving here, though. I've run
>>> the buildfarm countless thousands of times on different VMs, and five of
>>> my seven current animals run in VMs, and I don't think I've ever seen a
>>> failure ascribable to inadequately synced files from initdb.
>> Yeah. Personally I would be sad if initdb got noticeably slower, and
>> I've never seen or heard of a failure that this would fix.
>>
>> I wonder whether it wouldn't be sufficient to call sync(2) at the end,
>> anyway, rather than cluttering the entire initdb codebase with fsync
>> calls.
> I can always add a "sync" call to the test, also (rather than modifying
> initdb). Or, it could be an initdb option, which might be a good
> compromise. I don't have a strong opinion here.
>
> As machines get more memory and filesystems get more lazy, I wonder if
> it will be a more frequent occurrence, however. On the other hand, if
> filesystems are more lazy, that also increases the cost associated with
> extra "sync" calls. I think there would be a surprise factor if
> sometimes initdb had a long pause at the end and caused 10GB of data to
> be written out.
>

-1 for that. A very quick look at initdb.c suggests to me that there are
only two places where we'd need to put fsync(), right before we call
fclose() in write_file() and write_version_file(). If we're going to do
anything, that seems to be the least painful and most portable way to go.

cheers

andrew
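A minimal sketch of the "fsync right before fclose" idea, assuming a POSIX system. The function write_version_file_synced is a hypothetical stand-in, not the real write_version_file() from initdb.c; it only shows where fflush() and fsync() would sit relative to fclose().

```c
#define _POSIX_C_SOURCE 200809L  /* expose fileno() and fsync() under strict -std modes */
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a one-line version file and force it to disk
 * before closing. Returns 0 on success, -1 on any error.
 */
int write_version_file_synced(const char *path, const char *version)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;

    /*
     * fflush() moves data from the stdio buffer to the kernel;
     * fsync() then asks the kernel to push it to stable storage.
     * Calling fsync() without the fflush() could sync an empty file.
     */
    if (fprintf(fp, "%s\n", version) < 0 ||
        fflush(fp) != 0 ||
        fsync(fileno(fp)) != 0)
    {
        fclose(fp);
        return -1;
    }

    return fclose(fp) == 0 ? 0 : -1;
}
```

Note that this only makes the file contents durable. Making the directory entry durable as well would need a separate fsync on the parent directory, which is not shown here.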
|
In reply to this post by Tom Lane-2
* Tom Lane:
> I wonder whether it wouldn't be sufficient to call sync(2) at the end,
> anyway, rather than cluttering the entire initdb codebase with fsync
> calls.

We tried to do this in the Debian package manager. It works as
expected on Linux systems, but it can cause a lot of data to hit the
disk, and there are kernel versions where sync(2) never completes if
the system is rather busy.

initdb is much faster with 9.1 than with 8.4. It's so fast that you
can use it in test suites, instead of reusing an existing cluster.
I think this is a rather desirable property.
|
In reply to this post by Tom Lane-2
On Sat, 2012-01-28 at 13:18 -0500, Tom Lane wrote:
> Yeah. Personally I would be sad if initdb got noticeably slower, and
> I've never seen or heard of a failure that this would fix.

I worked up a patch, and it looks like it does about 6 file fsync's and
a 7th for the PGDATA directory. That degrades the time from about 1.1s
to 1.4s on my workstation.

pg_test_fsync says this about my workstation (one 8kB write):

    open_datasync         117.495 ops/sec
    fdatasync             117.949 ops/sec
    fsync                  25.530 ops/sec
    fsync_writethrough                n/a
    open_sync              24.666 ops/sec

25 ops/sec means about 40ms per fsync, times 7 is about 280ms, so that
seems like about the right degradation for fsync. I tried with fdatasync
as well to see if it improved things, and I wasn't able to realize any
difference (not sure exactly why).

So, is it worth it? Should we make it an option that can be specified?

> I wonder whether it wouldn't be sufficient to call sync(2) at the end,
> anyway, rather than cluttering the entire initdb codebase with fsync
> calls.

It looks like there are only a few places, so I don't think clutter is
really the problem with the simple patch at this point (unless there is
a portability problem with just calling fsync).

Regards,
	Jeff Davis
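The back-of-the-envelope estimate in this message can be checked with a few lines of C. The helper name fsync_overhead_ms is made up for illustration, and it assumes the fsyncs run one after another with nothing else competing for the disk.

```c
/*
 * Estimated extra wall-clock time, in milliseconds, for `calls` serial
 * fsyncs on storage that sustains `ops_per_sec` fsyncs per second.
 */
double fsync_overhead_ms(double ops_per_sec, int calls)
{
    double ms_per_call = 1000.0 / ops_per_sec;

    return calls * ms_per_call;
}
```

With the rounded figure of 25 ops/sec and 7 calls this gives 280 ms; with the measured 25.530 ops/sec it comes to roughly 274 ms. Both are close to the observed slowdown from about 1.1s to 1.4s.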
|
On Sat, Feb 04, 2012 at 03:41:27PM -0800, Jeff Davis wrote:
> On Sat, 2012-01-28 at 13:18 -0500, Tom Lane wrote:
> > Yeah. Personally I would be sad if initdb got noticeably slower, and
> > I've never seen or heard of a failure that this would fix.
>
> I worked up a patch, and it looks like it does about 6 file fsync's and
> a 7th for the PGDATA directory. That degrades the time from about 1.1s
> to 1.4s on my workstation.

> So, is it worth it? Should we make it an option that can be specified?

If we add fsync calls to the initdb process, they should cover the entire data
directory tree. This patch syncs files that initdb.c writes, but we ought to
also sync files that bootstrap-mode backends had written. An optimization
like the pg_flush_data() call in copy_file() may reduce the speed penalty.

initdb should do these syncs by default and offer an option to disable them.
|
On Sat, 2012-02-04 at 20:18 -0500, Noah Misch wrote:
> If we add fsync calls to the initdb process, they should cover the entire data
> directory tree. This patch syncs files that initdb.c writes, but we ought to
> also sync files that bootstrap-mode backends had written.

It doesn't make sense for initdb to take responsibility to sync files
created by the backend. If there are important files that the backend
creates, it should be the backend's responsibility to fsync them (and
their parent directory, if needed). And if they are unimportant to the
backend, then there is no reason for initdb to fsync them.

> An optimization
> like the pg_flush_data() call in copy_file() may reduce the speed penalty.

That worked pretty well. It took it down about 100ms on my machine,
which closes the gap significantly.

> initdb should do these syncs by default and offer an option to disable them.

For test frameworks that run initdb often, that makes sense.

But for developers, it doesn't make sense to spend 0.5s typing an option
that saves you 0.3s. So, we'd need some more convenient way to choose
the no-fsync option, like an environment variable that developers can
set. Or maybe developers don't care about 0.3s?

Regards,
	Jeff Davis
|
Jeff Davis <[hidden email]> writes:
> On Sat, 2012-02-04 at 20:18 -0500, Noah Misch wrote:
>> If we add fsync calls to the initdb process, they should cover the entire data
>> directory tree. This patch syncs files that initdb.c writes, but we ought to
>> also sync files that bootstrap-mode backends had written.

> It doesn't make sense for initdb to take responsibility to sync files
> created by the backend.

No, but the more interesting question is whether bootstrap mode troubles
to fsync its writes. I'm not too sure about that either way ...

			regards, tom lane
|
In reply to this post by Jeff Davis-8
On Sun, Feb 05, 2012 at 10:53:20AM -0800, Jeff Davis wrote:
> On Sat, 2012-02-04 at 20:18 -0500, Noah Misch wrote:
> > If we add fsync calls to the initdb process, they should cover the entire data
> > directory tree. This patch syncs files that initdb.c writes, but we ought to
> > also sync files that bootstrap-mode backends had written.
>
> It doesn't make sense for initdb to take responsibility to sync files
> created by the backend. If there are important files that the backend
> creates, it should be the backend's responsibility to fsync them (and
> their parent directory, if needed). And if they are unimportant to the
> backend, then there is no reason for initdb to fsync them.

I meant primarily to illustrate the need to be comprehensive, not comment on
which executable should fsync a particular file. Bootstrap-mode backends do
not sync anything during an initdb run on my system. With your patch, we'll
fsync a small handful of files and leave nearly everything else vulnerable.

That being said, having each backend fsync its own writes will mean syncing
certain files several times within a single initdb run. If the penalty from
that proves high enough, we may do well to instead have initdb.c sync
everything just once.

> > initdb should do these syncs by default and offer an option to disable them.
>
> For test frameworks that run initdb often, that makes sense.
>
> But for developers, it doesn't make sense to spend 0.5s typing an option
> that saves you 0.3s. So, we'd need some more convenient way to choose
> the no-fsync option, like an environment variable that developers can
> set. Or maybe developers don't care about 0.3s?

Developers have shell aliases/functions/scripts and command history. I
wouldn't object to having an environment variable control it, but I would not
personally find that more convenient than a command-line switch.

Thanks,
nm
|
In reply to this post by Jeff Davis-8
On sön, 2012-02-05 at 10:53 -0800, Jeff Davis wrote:
> > initdb should do these syncs by default and offer an option to
> disable them.
>
> For test frameworks that run initdb often, that makes sense.
>
> But for developers, it doesn't make sense to spend 0.5s typing an
> option
> that saves you 0.3s. So, we'd need some more convenient way to choose
> the no-fsync option, like an environment variable that developers can
> set. Or maybe developers don't care about 0.3s?
>

You can use https://launchpad.net/libeatmydata for those cases.
|
On Fri, Feb 10, 2012 at 3:57 PM, Peter Eisentraut <[hidden email]> wrote:
> On sön, 2012-02-05 at 10:53 -0800, Jeff Davis wrote:
>> > initdb should do these syncs by default and offer an option to
>> disable them.
>>
>> For test frameworks that run initdb often, that makes sense.
>>
>> But for developers, it doesn't make sense to spend 0.5s typing an
>> option
>> that saves you 0.3s. So, we'd need some more convenient way to choose
>> the no-fsync option, like an environment variable that developers can
>> set. Or maybe developers don't care about 0.3s?
>>
> You can use https://launchpad.net/libeatmydata for those cases.

That's hilarious.

But, a command-line option seems more convenient. It also seems
entirely sufficient. The comments above suggest that it would take
too long to type the option, but any PG developers who are worried
about the speed difference surely know how to create shell aliases,
shell functions, shell scripts, ... and if anyone's really concerned
about it, we can provide a short form for the option.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
|
In reply to this post by Noah Misch-2
On Sun, 2012-02-05 at 17:56 -0500, Noah Misch wrote:
> I meant primarily to illustrate the need to be comprehensive, not comment on
> which executable should fsync a particular file. Bootstrap-mode backends do
> not sync anything during an initdb run on my system. With your patch, we'll
> fsync a small handful of files and leave nearly everything else vulnerable.

Thank you for pointing that out. With that in mind, I have a new version
of the patch which just recursively fsync's the whole directory
(attached).

I also introduced a new option --nosync (-N) to disable this behavior.

The bad news is that it introduces a lot more time to initdb -- it goes
from about 1s to about 10s on my machine. I tried fsync'ing the whole
directory twice just to make sure that the second was a no-op, and
indeed it didn't make much difference (still about 10s).

That's pretty inefficient considering that

  initdb -D data --nosync && sync

only takes a couple seconds. Clearly batching the operation is a big
help. Maybe there's some more efficient way to fsync a lot of
files/directories? Or maybe I can mitigate it by avoiding files that
don't really need to be fsync'd?

Regards,
	Jeff Davis
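A recursive fsync of a directory tree might look roughly like this on a POSIX system. It is a sketch under stated assumptions, not the attached patch: fsync_path and fsync_tree are hypothetical names, the path buffer size is arbitrary, and symlinks and special files are simply skipped.

```c
#define _POSIX_C_SOURCE 200809L  /* expose lstat(), fsync() under strict -std modes */
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* fsync one file or directory by path; returns 0 on success, -1 on error. */
int fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * fsync every regular file and directory under `dir`, then `dir` itself.
 * Returns the number of paths that could not be synced (0 means all good).
 */
int fsync_tree(const char *dir)
{
    int failures = 0;
    struct dirent *de;
    DIR *d = opendir(dir);

    if (d == NULL)
        return 1;

    while ((de = readdir(d)) != NULL)
    {
        char path[4096];
        struct stat st;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        /* Count over-long paths and unreadable entries as failures. */
        if (snprintf(path, sizeof path, "%s/%s", dir, de->d_name) >= (int) sizeof path ||
            lstat(path, &st) != 0)
        {
            failures++;
            continue;
        }

        if (S_ISDIR(st.st_mode))
            failures += fsync_tree(path);   /* also syncs the subdirectory itself */
        else if (S_ISREG(st.st_mode) && fsync_path(path) != 0)
            failures++;
    }
    closedir(d);

    /* Sync the directory last so its entries are durable too. */
    if (fsync_path(dir) != 0)
        failures++;
    return failures;
}
```

Because each fsync() blocks until that one file is on disk, a walk like this pays the full sync latency once per file, which is consistent with the slowdown from about 1s to about 10s reported in this message.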
|
On Tuesday, March 13, 2012 04:49:40 AM Jeff Davis wrote:
> On Sun, 2012-02-05 at 17:56 -0500, Noah Misch wrote:
> > I meant primarily to illustrate the need to be comprehensive, not comment
> > on which executable should fsync a particular file. Bootstrap-mode
> > backends do not sync anything during an initdb run on my system. With
> > your patch, we'll fsync a small handful of files and leave nearly
> > everything else vulnerable.
>
> Thank you for pointing that out. With that in mind, I have a new version
> of the patch which just recursively fsync's the whole directory
> (attached).
>
> I also introduced a new option --nosync (-N) to disable this behavior.
>
> The bad news is that it introduces a lot more time to initdb -- it goes
> from about 1s to about 10s on my machine. I tried fsync'ing the whole
> directory twice just to make sure that the second was a no-op, and
> indeed it didn't make much difference (still about 10s).

for recursively everything in dir:
   posix_fadvise(fd, POSIX_FADV_DONTNEED);

for recursively everything in dir:
   fsync(fd);

In my experience that gives much better performance because it does
not force its own metadata/journal commit/transaction for every file;
the writes can be batched instead. copydir() does the same since some
releases...

Obviously it's not that nice to use _DONTNEED, but I haven't found
something that works equally well. You could try
sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) in the first loop, but
my experience with that hasn't been that good.

Andres
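The two-loop idea in this message can be sketched in C as follows. This is an illustration, not the patch or copydir(): the names hint_writeback, force_to_disk, apply_to_path, walk_tree and sync_tree_two_pass are all hypothetical. The real posix_fadvise() takes an offset and a length (0, 0 meaning the whole file), and the assumption that POSIX_FADV_DONTNEED starts writeback without blocking reflects Linux behaviour rather than anything POSIX guarantees.

```c
#define _POSIX_C_SOURCE 200809L  /* expose posix_fadvise(), lstat(), fsync() */
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

typedef int (*fd_action)(int fd);

/* Pass 1: hint that the file's pages can go, which on Linux tends to
 * queue dirty pages for writeback. Advisory only, so errors are ignored. */
int hint_writeback(int fd)
{
#ifdef POSIX_FADV_DONTNEED
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif
    (void) fd;
    return 0;
}

/* Pass 2: block until the file or directory is actually on disk. */
int force_to_disk(int fd)
{
    return fsync(fd);
}

/* Open `path`, run `action` on the descriptor, and close it again. */
int apply_to_path(const char *path, fd_action action)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = action(fd);
    close(fd);
    return rc;
}

/* Run `action` on every regular file and directory under `dir`, then on
 * `dir` itself. Returns the number of paths where it failed. */
int walk_tree(const char *dir, fd_action action)
{
    int failures = 0;
    struct dirent *de;
    DIR *d = opendir(dir);

    if (d == NULL)
        return 1;

    while ((de = readdir(d)) != NULL)
    {
        char path[4096];
        struct stat st;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (snprintf(path, sizeof path, "%s/%s", dir, de->d_name) >= (int) sizeof path ||
            lstat(path, &st) != 0)
        {
            failures++;
            continue;
        }
        if (S_ISDIR(st.st_mode))
            failures += walk_tree(path, action);
        else if (S_ISREG(st.st_mode) && apply_to_path(path, action) != 0)
            failures++;
    }
    closedir(d);

    if (apply_to_path(dir, action) != 0)
        failures++;
    return failures;
}

/* First queue writeback for everything, then fsync everything. */
int sync_tree_two_pass(const char *dir)
{
    (void) walk_tree(dir, hint_writeback);
    return walk_tree(dir, force_to_disk);
}
```

The point of splitting the work is that by the time the second walk calls fsync(), much of the data may already be on its way to disk, so each call has less to wait for.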
|
On Tue, 2012-03-13 at 09:42 +0100, Andres Freund wrote:
> for recursively everything in dir:
>    posix_fadvise(fd, POSIX_FADV_DONTNEED);
>
> for recursively everything in dir:
>    fsync(fd);

Wow, that made a huge difference!

  no sync:      ~ 1.0s
  sync:         ~10.0s
  fadvise+sync: ~ 1.3s

Patch attached.

Now I feel much better about it. Most people will either have fadvise, a
write cache (rightly or wrongly), or actually need the sync. Those that
have none of those can use -N.

Regards,
	Jeff Davis
