Quantcast

Multiple index builds on same table - in one sweep?

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Multiple index builds on same table - in one sweep?

Chris Ruprecht
I have a table that I need to rebuild indexes on from time to time (records get loaded before indexes get build).

To build the indexes, I use 'create index ...', which reads the entire table and builds the index, one at a time.
I'm wondering if there is a way to build these indexes in parallel while reading the table only once for all indexes and building them all at the same time. Is there an index build tool that I missed somehow, that can do this?

Thanks,
Chris.



best regards,
chris
--
chris ruprecht
database grunt and bit pusher extraordinaíre


--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Multiple index builds on same table - in one sweep?

Tom Lane-2
Chris Ruprecht <[hidden email]> writes:
> I have a table that I need to rebuild indexes on from time to time (records get loaded before indexes get build).
> To build the indexes, I use 'create index ...', which reads the entire table and builds the index, one at a time.
> I'm wondering if there is a way to build these indexes in parallel while reading the table only once for all indexes and building them all at the same time. Is there an index build tool that I missed somehow, that can do this?

I don't know of any automated tool, but if you launch several CREATE
INDEX operations on the same table at approximately the same time (in
separate sessions), they should share the I/O required to read the
table.  (The "synchronized scans" feature guarantees this in recent
PG releases, even if you're not very careful about starting them at
the same time.)

The downside of that is that you need N times the working memory and
you will have N times the subsidiary I/O for sort temp files and writes
to the finished indexes.  Depending on the characteristics of your I/O
system it's not hard to imagine this being a net loss ... but it'd be
interesting to experiment.

                        regards, tom lane

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Multiple index builds on same table - in one sweep?

Chris Ruprecht
I'm running 2 tests now, one, where I'm doing the traditional indexing, in sequence. The server isn't doing anything else, so I should get pretty accurate results.
Test 2 will win all the create index sessions in separate sessions in parallel (echo "create index ..."|psql ... & ) once the 'serial build' test is done.

Maybe, in a future release, somebody will develop something that can create indexes as inactive and have a build tool build and activate them at the same time. Food for thought?
 
On Apr 9, 2011, at 13:10 , Tom Lane wrote:

> Chris Ruprecht <[hidden email]> writes:
>> I have a table that I need to rebuild indexes on from time to time (records get loaded before indexes get build).
>> To build the indexes, I use 'create index ...', which reads the entire table and builds the index, one at a time.
>> I'm wondering if there is a way to build these indexes in parallel while reading the table only once for all indexes and building them all at the same time. Is there an index build tool that I missed somehow, that can do this?
>
> I don't know of any automated tool, but if you launch several CREATE
> INDEX operations on the same table at approximately the same time (in
> separate sessions), they should share the I/O required to read the
> table.  (The "synchronized scans" feature guarantees this in recent
> PG releases, even if you're not very careful about starting them at
> the same time.)
>
> The downside of that is that you need N times the working memory and
> you will have N times the subsidiary I/O for sort temp files and writes
> to the finished indexes.  Depending on the characteristics of your I/O
> system it's not hard to imagine this being a net loss ... but it'd be
> interesting to experiment.
>
> regards, tom lane


--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Multiple index builds on same table - in one sweep?

Greg Smith-21
On 04/09/2011 01:23 PM, Chris Ruprecht wrote:
> Maybe, in a future release, somebody will develop something that can create indexes as inactive and have a build tool build and activate them at the same time. Food for thought?
>    

Well, the most common case where this sort of thing happens is when
people are using pg_restore to load a dump of an entire database.  In
that case, you can use "-j" to run more than one loader job in parallel,
which can easily end up doing a bunch of index builds at once,
particularly at the end.  That already works about as well as it can
because of the synchronized scan feature Tom mentioned.

I doubt you'll ever get much traction arguing for something other than
continuing to accelerate that path; correspondingly, making your own
index builds look as much like it as possible is a good practice.  Fire
up as many builds as you can stand in parallel and see how many you can
take given the indexes+data involved.  It's not clear to me how a create
as inactive strategy could improve on that.

There are some types of index build operations that bottleneck on CPU
operations, and executing several of those in parallel can be a win.  At
some point you run out of physical I/O, or the additional memory you're
using starts taking away too much from caching.  Once you're at that
point, it's better to build the indexes on another pass, even if it
requires re-scanning the table data to do it.  The tipping point varies
based on both system and workload, it's very hard to predict or automate.

--
Greg Smith   2ndQuadrant US    [hidden email]   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Multiple index builds on same table - in one sweep?

Scott Marlowe-2
On Sun, Apr 10, 2011 at 8:29 PM, Greg Smith <[hidden email]> wrote:

> On 04/09/2011 01:23 PM, Chris Ruprecht wrote:
>>
>> Maybe, in a future release, somebody will develop something that can
>> create indexes as inactive and have a build tool build and activate them at
>> the same time. Food for thought?
>>
>
> Well, the most common case where this sort of thing happens is when people
> are using pg_restore to load a dump of an entire database.  In that case,
> you can use "-j" to run more than one loader job in parallel, which can
> easily end up doing a bunch of index builds at once, particularly at the
> end.  That already works about as well as it can because of the synchronized
> scan feature Tom mentioned.

FYI, in 8.3.13 I get this for all but one index:

ERROR:  deadlock detected
DETAIL:  Process 24488 waits for ShareLock on virtual transaction
64/825033; blocked by process 27505.
Process 27505 waits for ShareUpdateExclusiveLock on relation 297369165
of database 278059474; blocked by process 24488.

I'll try it on a big server running 8.4 and see what happens.

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Multiple index builds on same table - in one sweep?

Scott Marlowe-2
On Sun, Apr 10, 2011 at 11:35 PM, Scott Marlowe <[hidden email]> wrote:

> On Sun, Apr 10, 2011 at 8:29 PM, Greg Smith <[hidden email]> wrote:
>> On 04/09/2011 01:23 PM, Chris Ruprecht wrote:
>>>
>>> Maybe, in a future release, somebody will develop something that can
>>> create indexes as inactive and have a build tool build and activate them at
>>> the same time. Food for thought?
>>>
>>
>> Well, the most common case where this sort of thing happens is when people
>> are using pg_restore to load a dump of an entire database.  In that case,
>> you can use "-j" to run more than one loader job in parallel, which can
>> easily end up doing a bunch of index builds at once, particularly at the
>> end.  That already works about as well as it can because of the synchronized
>> scan feature Tom mentioned.
>
> FYI, in 8.3.13 I get this for all but one index:
>
> ERROR:  deadlock detected
> DETAIL:  Process 24488 waits for ShareLock on virtual transaction
> 64/825033; blocked by process 27505.
> Process 27505 waits for ShareUpdateExclusiveLock on relation 297369165
> of database 278059474; blocked by process 24488.
>
> I'll try it on a big server running 8.4 and see what happens.

Same error on pg 8.4.6

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Multiple index builds on same table - in one sweep?

Greg Smith-21
In reply to this post by Scott Marlowe-2
Scott Marlowe wrote:
> FYI, in 8.3.13 I get this for all but one index:
>
> ERROR:  deadlock detected
> DETAIL:  Process 24488 waits for ShareLock on virtual transaction
> 64/825033; blocked by process 27505.
> Process 27505 waits for ShareUpdateExclusiveLock on relation 297369165
> of database 278059474; blocked by process 24488.
>  

Is that trying to build them by hand?  The upthread request here is
actually already on the TODO list at
http://wiki.postgresql.org/wiki/Todo and it talks a bit about what works
and what doesn't right now:

Allow multiple indexes to be created concurrently, ideally via a single
heap scan
-pg_restore allows parallel index builds, but it is done via
subprocesses, and there is no SQL interface for this.

This whole idea was all the rage on these lists circa early 2008, but
parallel restore seems to have satisfied enough of the demand in this
general area that it doesn't come up quite as much now.

--
Greg Smith   2ndQuadrant US    [hidden email]   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Multiple index builds on same table - in one sweep?

Scott Marlowe-2
On Mon, Apr 11, 2011 at 1:41 AM, Greg Smith <[hidden email]> wrote:

> Scott Marlowe wrote:
>>
>> FYI, in 8.3.13 I get this for all but one index:
>>
>> ERROR:  deadlock detected
>> DETAIL:  Process 24488 waits for ShareLock on virtual transaction
>> 64/825033; blocked by process 27505.
>> Process 27505 waits for ShareUpdateExclusiveLock on relation 297369165
>> of database 278059474; blocked by process 24488.
>>
>
> Is that trying to build them by hand?  The upthread request here is actually
> already on the TODO list at http://wiki.postgresql.org/wiki/Todo and it
> talks a bit about what works and what doesn't right now:

Yes, by hand.  It creates an entry for the index but lists but marks
it as INVALID

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Multiple index builds on same table - in one sweep?

Shaun Thomas
In reply to this post by Chris Ruprecht
On 04/09/2011 11:28 AM, Chris Ruprecht wrote:

> I'm wondering if there is a way to build these indexes in parallel
> while reading the table only once for all indexes and building them
> all at the same time. Is there an index build tool that I missed
> somehow, that can do this?

I threw together a very crude duo of shell scripts to do this. I've
attached them for you. To use them, you make a file named tablist.txt
which contains the names of all the tables you want to reindex, and then
you run them like this:

bash generate_rebuild_scripts.sh my_database 8
bash launch_rebuild_scripts.sh my_database

The first one in the above example would connect to my_database and
create eight scripts that would run in parallel, with indexes ordered
smallest to largest to prevent one script from getting stuck with
several large indexes while the rest got small ones. The second script
just launches them and makes a log directory so you can watch the progress.

I've run this with up to 16 concurrent threads without major issue. It
comes in handy.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
[hidden email]

______________________________________________

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email


--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

generate_rebuild_scripts.sh (1K) Download Attachment
launch_rebuild_scripts.sh (193 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Multiple index builds on same table - in one sweep?

Tom Lane-2
In reply to this post by Scott Marlowe-2
Scott Marlowe <[hidden email]> writes:
> On Mon, Apr 11, 2011 at 1:41 AM, Greg Smith <[hidden email]> wrote:
>> Scott Marlowe wrote:
>>> FYI, in 8.3.13 I get this for all but one index:
>>> ERROR:  deadlock detected

>> Is that trying to build them by hand?  The upthread request here is actually
>> already on the TODO list at http://wiki.postgresql.org/wiki/Todo and it
>> talks a bit about what works and what doesn't right now:

> Yes, by hand.  It creates an entry for the index but lists but marks
> it as INVALID

Are you trying to use CREATE INDEX CONCURRENTLY?  AFAIR that doesn't
support multiple index creations at the same time.  Usually you wouldn't
want that combination anyway, since the point of CREATE INDEX
CONCURRENTLY is to not prevent foreground use of the table while you're
making the index --- and multiple index creations are probably going to
eat enough I/O that you shouldn't be doing them during normal operations
anyhow.

Just use plain CREATE INDEX.

                        regards, tom lane

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Multiple index builds on same table - in one sweep?

Scott Marlowe-2
On Mon, Apr 11, 2011 at 12:39 PM, Tom Lane <[hidden email]> wrote:

> Scott Marlowe <[hidden email]> writes:
>> On Mon, Apr 11, 2011 at 1:41 AM, Greg Smith <[hidden email]> wrote:
>>> Scott Marlowe wrote:
>>>> FYI, in 8.3.13 I get this for all but one index:
>>>> ERROR:  deadlock detected
>
>>> Is that trying to build them by hand?  The upthread request here is actually
>>> already on the TODO list at http://wiki.postgresql.org/wiki/Todo and it
>>> talks a bit about what works and what doesn't right now:
>
>> Yes, by hand.  It creates an entry for the index but lists but marks
>> it as INVALID
>
> Are you trying to use CREATE INDEX CONCURRENTLY?  AFAIR that doesn't
> support multiple index creations at the same time.  Usually you wouldn't
> want that combination anyway, since the point of CREATE INDEX
> CONCURRENTLY is to not prevent foreground use of the table while you're
> making the index --- and multiple index creations are probably going to
> eat enough I/O that you shouldn't be doing them during normal operations
> anyhow.
>
> Just use plain CREATE INDEX.

I thought they'd stand in line waiting on each other.  I'll give it a try.

--
Sent via pgsql-performance mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Loading...