|
On Thu, Jan 20, 2011 at 09:36:11PM +0000, Simon Riggs wrote:
> I agree that the DDL behaviour is wrong and should be fixed. Thank you > for championing that alternative view. > > Swapping based upon names only works and is very flexible, much more so > than EXCHANGE could be. > > A separate utility might be worth it, but the feature set of that should > be defined in terms of correctly-working DDL behaviour. It's possible > that no further requirement exists. I remove my own patch from > consideration for this release. > > I'll review your patch and commit it, problems or objections excepted. I > haven't looked at it in any detail. Thanks. I wouldn't be very surprised if that patch is even the wrong way to achieve these semantics, but it's great that we're on the same page as to which semantics they are. > Having said that, writing the patch did nothing to convince me this was > the correct approach. Reviews should be reviews, they are not an > opportunity to provide your own alternate version of a patch. That just > confuses things and creates a competitive, not a cooperative > environment. Authors do need to listen to reviewers, so I hope I'm > demonstrating that here. Understood. I can see now that posting a second code patch, however framed, in the same thread creates a presumption of aggression that is difficult to dispel. I will have a lot to think about before doing that again. Thanks for giving this discussion, which started poorly due to my actions, a second chance. nm -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
In reply to this post by Robert Haas
Robert Haas <[hidden email]> writes:
> On Thu, Jan 20, 2011 at 4:24 PM, Tom Lane <[hidden email]> wrote: >> If you're willing to substitute an incompatible table, it's not clear >> why you don't just do >> >> begin; >> drop table t; >> alter table t_new rename to t; >> commit; > Because the whole source of this problem is dependency hell. Well, if you want to preserve dependencies, you can *not* just blindly substitute an incompatible table. You must ensure that views and foreign keys referencing the table are still valid. So I'm not sure where anybody got the idea that an implementation that fails to check all that is even worth presenting. regards, tom lane -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
In reply to this post by Simon Riggs
On Thu, Jan 20, 2011 at 11:17 AM, Simon Riggs <[hidden email]> wrote:
> On Wed, 2011-01-19 at 22:16 -0500, Robert Haas wrote: > >> That's another way of saying "the patch is not anywhere close to being done". > > My patch is materially incomplete. Certainly we may see that as grounds > for rejection, which I would not and could not argue with. It is a > popular feature, so I submitted anyway. I wouldn't say rejection per se - but I would definitely say push it out to 9.2. > When I said Noah's patch was trivial, I was referring to the amount of > work expended on it so far; no insult intended. I think the amount of > code to finish either is fairly low as well. > > If we wish to continue in this release then we must decide how. What I > was trying to indicate in my earlier comments was that my focus is on > achieving the required functionality in this release, or put another > way, I would accept Noah's patch rather than end with nothing. > > The main requirement, as I see it, is error checking. We need to do the > same checking however we do it; neither patch currently does it. > > If Noah's patch had error checking, then it would at least be safe to > recommend people do that. Then it is a simple matter of whether we think > implicit is OK, or whether it needs an explicit command. My patch does > it explicitly because that was the consensus from the earlier > discussion; I am in favour of the explicit route which is why I wrote > the patch that way, not because I wrote it that way. I'm not too sure I understand what you mean in saying that Noah's patch is "implicit"... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
In reply to this post by Noah Misch-2
On Thu, Jan 20, 2011 at 6:19 PM, Noah Misch <[hidden email]> wrote:
> On Thu, Jan 20, 2011 at 09:36:11PM +0000, Simon Riggs wrote: >> I agree that the DDL behaviour is wrong and should be fixed. Thank you >> for championing that alternative view. >> >> Swapping based upon names only works and is very flexible, much more so >> than EXCHANGE could be. >> >> A separate utility might be worth it, but the feature set of that should >> be defined in terms of correctly-working DDL behaviour. It's possible >> that no further requirement exists. I remove my own patch from >> consideration for this release. >> >> I'll review your patch and commit it, problems or objections excepted. I >> haven't looked at it in any detail. > > Thanks. I wouldn't be very surprised if that patch is even the wrong way to > achieve these semantics, but it's great that we're on the same page as to which > semantics they are. I think Noah's patch is a not a good idea, because it will result in calling RangeVarGetRelid twice even in the overwhelmingly common case where no relevant invalidation occurs. That'll add several syscache lookups per table to very common operations. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
In reply to this post by Simon Riggs
On Thu, 2011-01-20 at 21:36 +0000, Simon Riggs wrote:
> I'll review your patch and commit it, problems or objections excepted. Tom's comments elsewhere prevent me from committing. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
In reply to this post by Robert Haas
On Thu, Jan 20, 2011 at 09:48:12PM -0500, Robert Haas wrote:
> On Thu, Jan 20, 2011 at 6:19 PM, Noah Misch <[hidden email]> wrote: > > On Thu, Jan 20, 2011 at 09:36:11PM +0000, Simon Riggs wrote: > >> I agree that the DDL behaviour is wrong and should be fixed. Thank you > >> for championing that alternative view. > >> > >> Swapping based upon names only works and is very flexible, much more so > >> than EXCHANGE could be. > >> > >> A separate utility might be worth it, but the feature set of that should > >> be defined in terms of correctly-working DDL behaviour. It's possible > >> that no further requirement exists. I remove my own patch from > >> consideration for this release. > >> > >> I'll review your patch and commit it, problems or objections excepted. I > >> haven't looked at it in any detail. > > > > Thanks. ?I wouldn't be very surprised if that patch is even the wrong way to > > achieve these semantics, but it's great that we're on the same page as to which > > semantics they are. > > I think Noah's patch is a not a good idea, because it will result in > calling RangeVarGetRelid twice even in the overwhelmingly common case > where no relevant invalidation occurs. That'll add several syscache > lookups per table to very common operations. [Refresher: this was a patch to improve behavior for this test case: psql -c "CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new')" psql -c 'SELECT pg_sleep(2) FROM t' & # block the ALTER or DROP briefly sleep 1 # give prev time to take AccessShareLock # Do it this way, and the next SELECT gets data from the old table. psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t' & # Do it this way, and get: ERROR: could not open relation with OID 41380 #psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t' & psql -c 'SELECT * FROM t' # I get 'old' or an error, never 'new'. psql -c 'DROP TABLE IF EXISTS t, old_t, new_t' It did so by rechecking the RangeVar->oid resolution after locking the found relation, by which time concurrent DDL could no longer be interfering.] I benchmarked the original patch with this function: Datum nmtest(PG_FUNCTION_ARGS) { int32 n = PG_GETARG_INT32(0); int i; RangeVar *rv = makeRangeVar(NULL, "pg_am", 0); for (i = 0; i < n; ++i) { Relation r = relation_openrv(rv, AccessShareLock); relation_close(r, AccessShareLock); } PG_RETURN_INT32(4); } (Releasing the lock before transaction end makes for an unrealistic benchmark, but so would opening the same relation millions of times in a single transaction. I'm trying to isolate the cost that would be spread over millions of transactions opening relations a handful of times. See attached shar archive for a complete extension wrapping that test function.) Indeed, the original patch slowed it by about 50%. I improved the patch, adding a global SharedInvalidMessageCounter to increment as we process messages. If this counter does not change between the RangeVarGetRelid() call and the post-lock AcceptInvalidationMessages() call, we can skip the second RangeVarGetRelid() call. With the updated patch, I get these timings (in ms) for runs of "SELECT nmtest(10000000)": master: 19697.642, 20087.477, 19748.995 patched: 19723.716, 19879.538, 20257.671 In other words, no significant difference. Since the patch removes the no-longer-needed pre-lock call to AcceptInvalidationMessages(), changing to "relation_close(r, NoLock)" in the test case actually reveals a 15% performance improvement. Granted, nothing to get excited about in light of the artificiality. This semantic improvement would be hard to test with the current pg_regress suite, so I do not include any test case addition in the patch. If sufficiently important, it could be done with isolationtester. Thanks, nm -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On Sun, Jun 12, 2011 at 3:18 PM, Noah Misch <[hidden email]> wrote:
> Indeed, the original patch slowed it by about 50%. I improved the patch, > adding a global SharedInvalidMessageCounter to increment as we process > messages. If this counter does not change between the RangeVarGetRelid() call > and the post-lock AcceptInvalidationMessages() call, we can skip the second > RangeVarGetRelid() call. With the updated patch, I get these timings (in ms) > for runs of "SELECT nmtest(10000000)": > > master: 19697.642, 20087.477, 19748.995 > patched: 19723.716, 19879.538, 20257.671 > > In other words, no significant difference. Since the patch removes the > no-longer-needed pre-lock call to AcceptInvalidationMessages(), changing to > "relation_close(r, NoLock)" in the test case actually reveals a 15% > performance improvement. Granted, nothing to get excited about in light of > the artificiality. In point of fact, given the not-so-artificial results I just posted on another thread ("lazy vxid locks") I'm *very* excited about trying to reduce the cost of AcceptInvalidationMessages(). I haven't reviewed your patch in detail, but is there a way we can encapsulate the knowledge of the invalidation system down inside the sinval machinery, rather than letting the heap code have to know directly about the counter? Perhaps AIV() could return true or false depending on whether any invalidation messages were processed, or somesuch. > This semantic improvement would be hard to test with the current pg_regress > suite, so I do not include any test case addition in the patch. If > sufficiently important, it could be done with isolationtester. I haven't had a chance to look closely at the isolation tester yet, but I'm excited about the possibilities for testing this sort of thing. Not sure whether it's worth including this or not, but it doesn't seem like a bad idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On Sun, Jun 12, 2011 at 06:20:53PM -0400, Robert Haas wrote:
> On Sun, Jun 12, 2011 at 3:18 PM, Noah Misch <[hidden email]> wrote: > > Indeed, the original patch slowed it by about 50%. ?I improved the patch, > > adding a global SharedInvalidMessageCounter to increment as we process > > messages. ?If this counter does not change between the RangeVarGetRelid() call > > and the post-lock AcceptInvalidationMessages() call, we can skip the second > > RangeVarGetRelid() call. ?With the updated patch, I get these timings (in ms) > > for runs of "SELECT nmtest(10000000)": > > > > master: 19697.642, 20087.477, 19748.995 > > patched: 19723.716, 19879.538, 20257.671 > > > > In other words, no significant difference. ?Since the patch removes the > > no-longer-needed pre-lock call to AcceptInvalidationMessages(), changing to > > "relation_close(r, NoLock)" in the test case actually reveals a 15% > > performance improvement. ?Granted, nothing to get excited about in light of > > the artificiality. > > In point of fact, given the not-so-artificial results I just posted on > another thread ("lazy vxid locks") I'm *very* excited about trying to > reduce the cost of AcceptInvalidationMessages(). Quite interesting. A quick look suggests there is room for optimization there. > I haven't reviewed > your patch in detail, but is there a way we can encapsulate the > knowledge of the invalidation system down inside the sinval machinery, > rather than letting the heap code have to know directly about the > counter? Perhaps AIV() could return true or false depending on > whether any invalidation messages were processed, or somesuch. I actually did it exactly that way originally. The problem was the return value only applying to the given call, while I wished to answer a question like "Did any call to AcceptInvalidationMessages() between code point A and code point B process a message?" Adding AcceptInvalidationMessages() calls to code between A and B would make the return value test yield a false negative. A global counter was the best thing I could come up with that avoided this hazard. -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On Sun, Jun 12, 2011 at 9:23 PM, Noah Misch <[hidden email]> wrote:
>> I haven't reviewed >> your patch in detail, but is there a way we can encapsulate the >> knowledge of the invalidation system down inside the sinval machinery, >> rather than letting the heap code have to know directly about the >> counter? Perhaps AIV() could return true or false depending on >> whether any invalidation messages were processed, or somesuch. > > I actually did it exactly that way originally. The problem was the return value > only applying to the given call, while I wished to answer a question like "Did > any call to AcceptInvalidationMessages() between code point A and code point B > process a message?" Adding AcceptInvalidationMessages() calls to code between A > and B would make the return value test yield a false negative. A global counter > was the best thing I could come up with that avoided this hazard. Oh, interesting point. What if AcceptInvalidationMessages returns the counter? Maybe with typedef uint32 InvalidationPositionId or something like that, to make it partially self-documenting, and greppable. Taking that a bit further, what if we put that counter in shared-memory? After writing new messages into the queue, a writer would bump this count (only one process can be doing this at a time because SInvalWriteLock is held) and memory-fence. Readers would memory-fence and then read the count before acquiring the lock. If it hasn't changed since we last read it, then don't bother acquiring SInvalReadLock, because no new messages have arrived. Or maybe an exact multiple of 2^32 messages have arrived, but there's probably someway to finesse around that issue, like maybe also using some kind of memory barrier to allow resetState to be checked without the lock. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On Sun, Jun 12, 2011 at 10:56:41PM -0400, Robert Haas wrote:
> On Sun, Jun 12, 2011 at 9:23 PM, Noah Misch <[hidden email]> wrote: > >> I haven't reviewed > >> your patch in detail, but is there a way we can encapsulate the > >> knowledge of the invalidation system down inside the sinval machinery, > >> rather than letting the heap code have to know directly about the > >> counter? ?Perhaps AIV() could return true or false depending on > >> whether any invalidation messages were processed, or somesuch. > > > > I actually did it exactly that way originally. ?The problem was the return value > > only applying to the given call, while I wished to answer a question like "Did > > any call to AcceptInvalidationMessages() between code point A and code point B > > process a message?" ?Adding AcceptInvalidationMessages() calls to code between A > > and B would make the return value test yield a false negative. ?A global counter > > was the best thing I could come up with that avoided this hazard. > > Oh, interesting point. What if AcceptInvalidationMessages returns the > counter? Maybe with typedef uint32 InvalidationPositionId or > something like that, to make it partially self-documenting, and > greppable. That might be a start, but it's not a complete replacement for the global counter. AcceptInvalidationMessages() is actually called in LockRelationOid(), but the comparison needs to happen a level up in RangeVarLockRelid(). So, we would be adding encapsulation in one place to lose it in another. Also, in the uncontended case, the patch only calls AcceptInvalidationMessages() once per relation_openrv. It compares the counter after that call with a counter as the last caller left it -- RangeVarLockRelid() doesn't care who that caller was. > Taking that a bit further, what if we put that counter in > shared-memory? After writing new messages into the queue, a writer > would bump this count (only one process can be doing this at a time > because SInvalWriteLock is held) and memory-fence. Readers would > memory-fence and then read the count before acquiring the lock. If it > hasn't changed since we last read it, then don't bother acquiring > SInvalReadLock, because no new messages have arrived. Or maybe an > exact multiple of 2^32 messages have arrived, but there's probably > someway to finesse around that issue, like maybe also using some kind > of memory barrier to allow resetState to be checked without the lock. This probably would not replace a backend-local counter of processed messages for RangeVarLockRelid()'s purposes. It's quite possibly a good way to reduce SInvalReadLock traffic, though. Exact multiples of 2^32 messages need not be a problem, because the queue is limited to MAXNUMMESSAGES (4096, currently). I think you will need to pack into one 32-bit value all data each backend needs to decide whether to proceed with the full process. Given that queue offsets fit into 13 bits (easily reduced to 12) and resetState is a bit, that seems practical enough at first glance. nm -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On Mon, Jun 13, 2011 at 1:12 AM, Noah Misch <[hidden email]> wrote:
> That might be a start, but it's not a complete replacement for the global > counter. AcceptInvalidationMessages() is actually called in LockRelationOid(), > but the comparison needs to happen a level up in RangeVarLockRelid(). So, we > would be adding encapsulation in one place to lose it in another. Also, in the > uncontended case, the patch only calls AcceptInvalidationMessages() once per > relation_openrv. It compares the counter after that call with a counter as the > last caller left it -- RangeVarLockRelid() doesn't care who that caller was. Hmm, OK. >> Taking that a bit further, what if we put that counter in >> shared-memory? After writing new messages into the queue, a writer >> would bump this count (only one process can be doing this at a time >> because SInvalWriteLock is held) and memory-fence. Readers would >> memory-fence and then read the count before acquiring the lock. If it >> hasn't changed since we last read it, then don't bother acquiring >> SInvalReadLock, because no new messages have arrived. Or maybe an >> exact multiple of 2^32 messages have arrived, but there's probably >> someway to finesse around that issue, like maybe also using some kind >> of memory barrier to allow resetState to be checked without the lock. > > This probably would not replace a backend-local counter of processed messages > for RangeVarLockRelid()'s purposes. It's quite possibly a good way to reduce > SInvalReadLock traffic, though. > > Exact multiples of 2^32 messages need not be a problem, because the queue is > limited to MAXNUMMESSAGES (4096, currently). I think you will need to pack into > one 32-bit value all data each backend needs to decide whether to proceed with > the full process. Given that queue offsets fit into 13 bits (easily reduced to > 12) and resetState is a bit, that seems practical enough at first glance. I was imagining one shared global counter, not one per backend, and thinking that each backend could do something like: volatile uint32 *the_global_counter = &global_counter; uint32 latest_counter; mfence(); latest_counter = *the_global_counter; if (latest_counter != previous_value_of_global_counter || myprocstate->isReset) really_do_it(); previous_value_of_global_counter = latest_counter; I'm not immediately seeing why that wouldn't work for your purposes as well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On Mon, Jun 13, 2011 at 08:21:05AM -0400, Robert Haas wrote:
> On Mon, Jun 13, 2011 at 1:12 AM, Noah Misch <[hidden email]> wrote: > > This probably would not replace a backend-local counter of processed messages > > for RangeVarLockRelid()'s purposes. ?It's quite possibly a good way to reduce > > SInvalReadLock traffic, though. > I was imagining one shared global counter, not one per backend, and > thinking that each backend could do something like: > > volatile uint32 *the_global_counter = &global_counter; > uint32 latest_counter; > mfence(); > latest_counter = *the_global_counter; > if (latest_counter != previous_value_of_global_counter || myprocstate->isReset) > really_do_it(); > previous_value_of_global_counter = latest_counter; > > I'm not immediately seeing why that wouldn't work for your purposes as well. That takes us back to the problem of answering the (somewhat rephrased) question "Did any call to AcceptInvalidationMessages() between code point A and code point B call really_do_it()?" in a way not prone to breaking when new calls to AcceptInvalidationMessages(), perhaps indirectly, get added. That's what the local counter achieved. To achieve that, previous_value_of_global_counter would need to be exposed outside sinval.c. That leaves us with a backend-local counter updated in a different fashion. I might be missing something... -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On Mon, Jun 13, 2011 at 4:04 PM, Noah Misch <[hidden email]> wrote:
> On Mon, Jun 13, 2011 at 08:21:05AM -0400, Robert Haas wrote: >> On Mon, Jun 13, 2011 at 1:12 AM, Noah Misch <[hidden email]> wrote: >> > This probably would not replace a backend-local counter of processed messages >> > for RangeVarLockRelid()'s purposes. ?It's quite possibly a good way to reduce >> > SInvalReadLock traffic, though. > >> I was imagining one shared global counter, not one per backend, and >> thinking that each backend could do something like: >> >> volatile uint32 *the_global_counter = &global_counter; >> uint32 latest_counter; >> mfence(); >> latest_counter = *the_global_counter; >> if (latest_counter != previous_value_of_global_counter || myprocstate->isReset) >> really_do_it(); >> previous_value_of_global_counter = latest_counter; >> >> I'm not immediately seeing why that wouldn't work for your purposes as well. > > That takes us back to the problem of answering the (somewhat rephrased) question > "Did any call to AcceptInvalidationMessages() between code point A and code > point B call really_do_it()?" in a way not prone to breaking when new calls to > AcceptInvalidationMessages(), perhaps indirectly, get added. That's what the > local counter achieved. To achieve that, previous_value_of_global_counter would > need to be exposed outside sinval.c. That leaves us with a backend-local > counter updated in a different fashion. I might be missing something... I see your point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
In reply to this post by Noah Misch-2
So I was the victim assigned to review this patch.
The code is pretty much impeccable as usual from Noah. But I have questions about the semantics of it. Firstly this bit makes me wonder: + /* Not-found is always final. */ + if (!OidIsValid(relOid1)) + return relOid1; If someone does BEGIN; DROP TABLE foo; CREATE TABLE foo; COMMIT; Then what prevents this logic from finding the doomed relation, blocking until the transaction commits, then finding it's deleted and returning InvalidOid? RangeVarGetRelid is just going to complete its index scan of pg_class and may not come across the newly inserted row. Am I missing something? I would have expected to have to loop around and retry if no valid record is found. But this raises the question -- if no lock was acquired then what would have triggered an AcceptInvalidatationMessages and how would we know we waited long enough to find out about the newly created table? As a side note, if there are a long stream of such concurrent DDL then this code will leave all the old versions locked. This is consistent with our "hold locks until end of transaction" semantics but it seems weird for tables that we locked "accidentally" and didn't really end up using at all. I'm not sure it's really bad though. -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
Hi Greg,
On Sun, Jun 19, 2011 at 11:44:35PM +0100, Greg Stark wrote: > So I was the victim assigned to review this patch. Thanks for doing so. > The code is pretty much impeccable as usual from Noah. But I have > questions about the semantics of it. > > Firstly this bit makes me wonder: > > + /* Not-found is always final. */ > + if (!OidIsValid(relOid1)) > + return relOid1; > > If someone does > > BEGIN; DROP TABLE foo; CREATE TABLE foo; COMMIT; > > Then what prevents this logic from finding the doomed relation, > blocking until the transaction commits, then finding it's deleted and > returning InvalidOid? > RangeVarGetRelid is just going to complete its index scan of pg_class > and may not come across the newly inserted row. RangeVarGetRelid() always runs its index scan to completion, and the blocking happens in LockRelationOid(). You will get a sequence like this: RangeVarGetRelid("foo") => 20000 LockRelationOid(20000) [... blocks ...] AcceptInvalidationMessages() [process a message] RangeVarGetRelid("foo") => 20001 [restart loop] LockRelationOid(20001) AcceptInvalidationMessages() [no new messages - done] RangeVarGetRelid() *is* vulnerable to the problem Simon just reported in the "ALTER TABLE lock strength reduction patch is unsafe" thread, which arises when the DDL transaction actually commits in the middle of a concurrent system table scan. I don't think this patch makes things better or worse in that regard, but I haven't thought it through in great depth. > Am I missing something? I would have expected to have to loop around > and retry if no valid record is found. But this raises the question -- > if no lock was acquired then what would have triggered an > AcceptInvalidatationMessages and how would we know we waited long > enough to find out about the newly created table? Good question. I think characterizing "long enough" quickly leads to defining one or more sequence points after which all backends must recognize a new table as existing. My greenfield definition would be "a command should see precisely the tables visible to its MVCC snapshot", but that has practical problems. Let's see what implementation concerns would suggest... This leads to a case I had not considered explicitly: CREATE TABLE on a name that has not recently mapped to any table. If the catcache has a negative entry on the key in question, we will rely on that and miss the new table until we call AcceptInvalidationMessages() somehow. To hit this, you need a command that dynamically chooses to query a table that has been created since the command started running. DROP/CREATE of the same name in a single transaction can't hit the problem. Consider this test script: psql -X <<\_EOSQL & -- Cleanup from last run DROP TABLE IF EXISTS public.foo; BEGIN; -- Create the neg catcache entry. SAVEPOINT q; SELECT 1 FROM public.foo; ROLLBACK to q; --SET client_min_messages = debug5; -- use with CACHEDEBUG for insight DO $$ BEGIN EXECUTE 'SELECT 1 FROM pg_am'; -- prime basic catcache entries PERFORM pg_sleep(11); EXECUTE 'SELECT 1 FROM public.foo'; END $$; _EOSQL sleep 1 psql -Xc 'CREATE TABLE public.foo ()' wait The first backend fails to see the new table despite its creating transaction having committed ~10s ago. Starting a transaction, beginning to process a new client-issued command, or successfully locking any relation prevents the miss. We could narrow the window in most cases by re-adding a call to AcceptInvalidationMessages() before RangeVarLockRelid()'s first call to RangeVarGetRelid(). My current thinking is that it's not worth adding that cost to every RangeVarLockRelid(). Thus, specify that, minimally, each client-issued command will see all tables whose names were occupied at the time the command started. I would add a comment to that effect. Thoughts? > As a side note, if there are a long stream of such concurrent DDL then > this code will leave all the old versions locked. This is consistent > with our "hold locks until end of transaction" semantics but it seems > weird for tables that we locked "accidentally" and didn't really end > up using at all. I'm not sure it's really bad though. Yes. If that outcome were more common, this would be a good place to try relaxing the rule. nm -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On Sun, Jun 19, 2011 at 11:42 PM, Noah Misch <[hidden email]> wrote:
> On Sun, Jun 19, 2011 at 11:44:35PM +0100, Greg Stark wrote: >> So I was the victim assigned to review this patch. > > Thanks for doing so. This discussion seems to have died off. Let's see if we can drive this forward to some conclusion. I took a look at this patch and found that it had bit-rotted slightly. I am attaching a rebased version. Maybe this is a stupid idea, but what about changing the logic so that, if we get back InvalidOid, we AcceptInvalidationMessages() and retry if the counter has advanced? ISTM that might cover the example you mentioned in your last post, where we fail to detect a relation that has come into existence since our last call to AcceptInvalidationMessages(). It would cost an extra AcceptInvalidationMessages() only in the case where we haven't found the relation, which (a) seems like a good time to worry about whether we're missing something, since users generally try not to reference nonexistent tables and (b) should be rare enough to be ignorable from a performance perspective. In the department of minor nitpicking, why not use a 64-bit counter for SharedInvalidMessageCounter? Then we don't have to think very hard about whether overflow can ever pose a problem. It strikes me that, even with this patch, there is a fair amount of room for wonky behavior. For example, as your comment notes, if search_path = foo, bar, and we've previously referenced "x", getting "bar.x", the creation of "foo.x" will generate invalidation messages, but a subsequent reference - within the same transaction - to "x" will not cause us to read them. It would be nice to AcceptInvalidationMessages() unconditionally at the beginning of RangeVarGetRelid() [and then redo as necessary to get a stable answer], but that might have some performance consequence for transactions that repeatedly read the same tables. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On Wed, Jul 06, 2011 at 03:06:40PM -0400, Robert Haas wrote:
> On Sun, Jun 19, 2011 at 11:42 PM, Noah Misch <[hidden email]> wrote: > > On Sun, Jun 19, 2011 at 11:44:35PM +0100, Greg Stark wrote: > >> So I was the victim assigned to review this patch. > > > > Thanks for doing so. > > This discussion seems to have died off. Let's see if we can drive > this forward to some conclusion. > > I took a look at this patch and found that it had bit-rotted slightly. > I am attaching a rebased version. Thanks. > Maybe this is a stupid idea, but what about changing the logic so > that, if we get back InvalidOid, we AcceptInvalidationMessages() and > retry if the counter has advanced? ISTM that might cover the example > you mentioned in your last post, where we fail to detect a relation > that has come into existence since our last call to > AcceptInvalidationMessages(). It would cost an extra > AcceptInvalidationMessages() only in the case where we haven't found > the relation, which (a) seems like a good time to worry about whether > we're missing something, since users generally try not to reference > nonexistent tables and (b) should be rare enough to be ignorable from > a performance perspective. Agreed on all points. Good idea. That improves our guarantee from "any client-issued command will see tables committed before its submission" to "_any command_ will see tables committed before its _parsing_". In particular, commands submitted using SPI will no longer be subject to this source of déjà vu. I, too, doubt that looking up nonexistent relations is a performance-critical operation for anyone. > In the department of minor nitpicking, why not use a 64-bit counter > for SharedInvalidMessageCounter? Then we don't have to think very > hard about whether overflow can ever pose a problem. Overflow is fine because I only ever compare values for equality, and I use an unsigned int to give defined behavior at overflow. However, the added cost of a 64-bit counter should be negligible, and future use cases (including external code) might appreciate it. No strong preference. > It strikes me that, even with this patch, there is a fair amount of > room for wonky behavior. For example, as your comment notes, if > search_path = foo, bar, and we've previously referenced "x", getting > "bar.x", the creation of "foo.x" will generate invalidation messages, > but a subsequent reference - within the same transaction - to "x" will > not cause us to read them. It would be nice to > AcceptInvalidationMessages() unconditionally at the beginning of > RangeVarGetRelid() [and then redo as necessary to get a stable > answer], but that might have some performance consequence for > transactions that repeatedly read the same tables. A user doing that should "LOCK bar.x" in the transaction that creates "foo.x", giving a clean cutover. (I thought of documenting that somewhere, but it seemed a tad esoteric.) In the absence of such a lock, an extra unconditional call to AcceptInvalidationMessages() narrows the window in which his commands parse as using the "wrong" table. However, commands that have already parsed will still use the old table without interruption, with no particular bound on when they may finish. I've failed to come up with a use case where the narrower window for parse inconsistencies is valuable but the remaining exposure is acceptable. There may very well be one I'm missing, though. While a mere "LOCK bar.x" is sufficient to get a clean cutover with respect to parsing, it fails to invalidate plans. To really cover all bases, you need some no-op action that invalidates "bar.x". For actual practical use, I'd recommend something like: BEGIN; ALTER TABLE bar.x RENAME TO x0; ALTER TABLE bar.x0 RENAME TO x; CREATE TABLE foo.x ... COMMIT; Probably worth making it more intuitive to DTRT here. Thanks, nm -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On Wed, Jul 6, 2011 at 6:32 PM, Noah Misch <[hidden email]> wrote:
>> Maybe this is a stupid idea, but what about changing the logic so >> that, if we get back InvalidOid, we AcceptInvalidationMessages() and >> retry if the counter has advanced? ISTM that might cover the example >> you mentioned in your last post, where we fail to detect a relation >> that has come into existence since our last call to >> AcceptInvalidationMessages(). It would cost an extra >> AcceptInvalidationMessages() only in the case where we haven't found >> the relation, which (a) seems like a good time to worry about whether >> we're missing something, since users generally try not to reference >> nonexistent tables and (b) should be rare enough to be ignorable from >> a performance perspective. > > Agreed on all points. Good idea. That improves our guarantee from "any > client-issued command will see tables committed before its submission" to > "_any command_ will see tables committed before its _parsing_". In > particular, commands submitted using SPI will no longer be subject to this > source of déją vu. I, too, doubt that looking up nonexistent relations is a > performance-critical operation for anyone. > >> In the department of minor nitpicking, why not use a 64-bit counter >> for SharedInvalidMessageCounter? Then we don't have to think very >> hard about whether overflow can ever pose a problem. > > Overflow is fine because I only ever compare values for equality, and I use an > unsigned int to give defined behavior at overflow. However, the added cost of > a 64-bit counter should be negligible, and future use cases (including > external code) might appreciate it. No strong preference. Yeah, that's what I was thinking. I have a feeling we may want to use this mechanism in other places, including places where it would be nice to be able to assume that > has sensible semantics. >> It strikes me that, even with this patch, there is a fair amount of >> room for wonky behavior. For example, as your comment notes, if >> search_path = foo, bar, and we've previously referenced "x", getting >> "bar.x", the creation of "foo.x" will generate invalidation messages, >> but a subsequent reference - within the same transaction - to "x" will >> not cause us to read them. It would be nice to >> AcceptInvalidationMessages() unconditionally at the beginning of >> RangeVarGetRelid() [and then redo as necessary to get a stable >> answer], but that might have some performance consequence for >> transactions that repeatedly read the same tables. > > A user doing that should "LOCK bar.x" in the transaction that creates "foo.x", > giving a clean cutover. (I thought of documenting that somewhere, but it > seemed a tad esoteric.) In the absence of such a lock, an extra unconditional > call to AcceptInvalidationMessages() narrows the window in which his commands > parse as using the "wrong" table. However, commands that have already parsed > will still use the old table without interruption, with no particular bound on > when they may finish. I've failed to come up with a use case where the > narrower window for parse inconsistencies is valuable but the remaining > exposure is acceptable. There may very well be one I'm missing, though. > > While a mere "LOCK bar.x" is sufficient to get a clean cutover with respect to > parsing, it fails to invalidate plans. To really cover all bases, you need > some no-op action that invalidates "bar.x". For actual practical use, I'd > recommend something like: > > BEGIN; > ALTER TABLE bar.x RENAME TO x0; > ALTER TABLE bar.x0 RENAME TO x; > CREATE TABLE foo.x ... > COMMIT; > > Probably worth making it more intuitive to DTRT here. Well, what would be really nice is if it just worked. Care to submit an updated patch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On Wed, Jul 06, 2011 at 08:35:55PM -0400, Robert Haas wrote:
> On Wed, Jul 6, 2011 at 6:32 PM, Noah Misch <[hidden email]> wrote: > > While a mere "LOCK bar.x" is sufficient to get a clean cutover with respect to > > parsing, it fails to invalidate plans. To really cover all bases, you need > > some no-op action that invalidates "bar.x". For actual practical use, I'd > > recommend something like: > > > > BEGIN; > > ALTER TABLE bar.x RENAME TO x0; > > ALTER TABLE bar.x0 RENAME TO x; > > CREATE TABLE foo.x ... > > COMMIT; > > > > Probably worth making it more intuitive to DTRT here. > > Well, what would be really nice is if it just worked. > Care to submit an updated patch? Attached. I made the counter 64 bits wide, handled the nothing-found case per your idea, and improved a few comments cosmetically. I have not attempted to improve the search_path interposition case. We can recommend the workaround above, and doing better looks like an excursion much larger than the one represented by this patch. Thanks, nm -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On Wed, Jul 6, 2011 at 10:44 PM, Noah Misch <[hidden email]> wrote:
> Attached. I made the counter 64 bits wide, handled the nothing-found case per > your idea, and improved a few comments cosmetically. I have not attempted to > improve the search_path interposition case. We can recommend the workaround > above, and doing better looks like an excursion much larger than the one > represented by this patch. I looked at this some more and started to get uncomfortable with the whole idea of having RangeVarLockRelid() be a wrapper around RangeVarGetRelid(). This hazard exists everywhere the latter function gets called, not just in relation_open(). So it doesn't seem right to fix the problem only in those places. So I went through and incorporated the logic proposed for RangeVarLockRelid() into RangeVarGetRelid() itself, and then went through and examined all the callers of RangeVarGetRelid(). There are some, such as has_table_privilege(), where it's really impractical to take any lock, first because we might have no privileges at all on that table and second because that could easily lead to a massive amount of locking for no particular good reason. I believe Tom suggested that the right fix for these functions is to have them index-scan the system catalogs using the caller's MVCC snapshot, which would be right at least for pg_dump. And there are other callers that cannot acquire the lock as part of RangeVarGetRelid() for a variety of other reasons. However, having said that, there do appear to be a number of cases that are can be fixed fairly easily. So here's a (heavily) updated patch that tries to do that, along with adding comments to the places where things still need more fixing. In addition to the problems corrected by your last version, this fixes LOCK TABLE, ALTER SEQUENCE, ALTER TABLE .. RENAME, the whole-table variant of REINDEX, CREATE CONSTRAINT TRIGGER (which is flat-out wrong as it stands, since it acquires *no lock at all* on the table specified in the FROM clause, never mind the question of doing so atomically), CREATE RULE, and (partially) DROP TRIGGER and DROP RULE. Regardless of exactly how we decide to proceed here, it strikes me that there is a heck of a lot more work that could stand to be done in this area... :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| Powered by Nabble | Edit this page |
