|
I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
contrib module. I think that this module would be the basis of further SQL/MED development for core, e.g. join-push-down and ANALYZE support. I attached three patches for this new FDW. They should be applied in the order below. I separated these patches so that first (or first two) can be committed separately. * fdw_helper_doc.patch provides documents for FDW developers about helper functions existing in 9.1, so this can be back-patched. * fdw_helper_funcs.patch provides additional helper functions which would make manipulation of FDW options easier. * pgsql_fdw.patch provides new FDW for external PG server. Here are details of pgsql_fdw. Name of the wrapper =================== I used the name "pgsql_fdw" for the wrapper and its derivatives. I think it would be better to leave contrib/dblink and built-in postgresql_fdw_validator for backward compatibility, and use new name for new wrapper. Or, it might be OK to rename postgresql_fdw_validator to dblink_validator or something, or fix dblink to use validator of new wrapper. I'm not sure that dblink should be alone or integrated with pgsql_fdw... Connection management ===================== The pgsql_fdw establishes a new connection when a foreign server is accessed first for the local session. Established connection is shared between all foreign scans in the local query, and shared between even scans in following queries. Connections are discarded when the current transaction aborts so that unexpected failure won't cause connection leak. This is implemented with resource owner mechanism. User can see active connections via pgsql_fdw_connections view, and discard arbitrary connection via pgsql_fdw_disconnect() function. These can be done from only same local session. If local role has changed via SET ROLE or SET SESSION AUTHENTICATION, pgsql_fdw ignores old role's connections and looks up appropriate connection for the new role from the pool. If there wasn't suitable one, pgsql_fdw establishes new connection. When local role has changed to old role again, pooled connection will be used again. Unlike contrib/dblink, one foreign server can have only one connection at a time for one local role. This is because pgsql_fdw doesn't support named connections. Cost estimation =============== The pgsql_fdw executes an EXPLAIN command on remote side for each PlanForeignScan call. Returned costs and rows are used as local estimation for the Path with adding connection costs and data transfer costs. SELECT optimization =================== To reduce amount of data transferred from remote server, references to unnecessary columns are replaced with NULL literal in remote query. WHERE clause push-down ====================== Some kind of qualifiers in WHERE clause are pushed down to remote server so that the query result can be reduced. Currently qualifiers which include any volatile or stable element can't be pushed down. Even with these limitations, most qualifiers would be pushed down in usual cases. Cursor mode =========== The pgsql_fdw switches the way to retrieve result records according to estimated result rows; use simple SELECT for small result, and use cursor with DECLARE/FETCH statements for large result. The threshold is default to 1000, and configurable with FDW option "min_cursor_rows". In cursor mode, number of rows fetched at once can be controlled by FDW option "fetch_count". EXPLAIN output ============== The pgsql_fdw shows a remote query used for each foreign scan node in the output of EXPLAIN command with title "Remote SQL". If pgsql_fdw decided to use cursor for the scan, DECLARE statement is shown. Regards, -- Shigeru Hanada -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
2011/10/25 Shigeru Hanada <[hidden email]>:
> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a > contrib module. I think that this module would be the basis of further > SQL/MED development for core, e.g. join-push-down and ANALYZE support. I have not looked at the code itself, but I wonder if we shouldn't consider making this a part of core-proper, not just a contrib module. The fact that it isn't *already* available in core surprises a lot of people... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
Magnus Hagander <[hidden email]> writes:
> 2011/10/25 Shigeru Hanada <[hidden email]>: >> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a >> contrib module. I think that this module would be the basis of further >> SQL/MED development for core, e.g. join-push-down and ANALYZE support. > I have not looked at the code itself, but I wonder if we shouldn't > consider making this a part of core-proper, not just a contrib module. > The fact that it isn't *already* available in core surprises a lot of > people... We've just spent a whole lot of blood and sweat on making the extension mechanism work nicely. I don't understand this urge to not use it. ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib. Once we do that its release schedule will get locked to core's --- wouldn't it be better to keep flexibility for now, while it's in such active development? regards, tom lane -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On Tue, Oct 25, 2011 at 14:08, Tom Lane <[hidden email]> wrote:
> Magnus Hagander <[hidden email]> writes: >> 2011/10/25 Shigeru Hanada <[hidden email]>: >>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a >>> contrib module. I think that this module would be the basis of further >>> SQL/MED development for core, e.g. join-push-down and ANALYZE support. > >> I have not looked at the code itself, but I wonder if we shouldn't >> consider making this a part of core-proper, not just a contrib module. >> The fact that it isn't *already* available in core surprises a lot of >> people... > > We've just spent a whole lot of blood and sweat on making the extension > mechanism work nicely. I don't understand this urge to not use it. We're back to the old discussion, I guess.. I'm happy to see it as an extension, but I think it should be included with the standard installation. Like we do with for example pl/pgsql (which I realize has a dependency on the backend anyway, so it can't be done another way easily) and pl/perl (which doesn't, AFAIK, so it's a better example) > ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib. > Once we do that its release schedule will get locked to core's --- > wouldn't it be better to keep flexibility for now, while it's in such > active development? I would be happy to keep it outside, and integrate it in the final CF for example :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
>> ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib.
>> Once we do that its release schedule will get locked to core's --- >> wouldn't it be better to keep flexibility for now, while it's in such >> active development? > > I would be happy to keep it outside, and integrate it in the final CF > for example :) > Right now, file_fdw is the only FDW module that we have in the core, however, it is inadequacy to proof the new concept of FDW feature to utilize external RDBMS, such as join push-down of foreign tables. I think the pgsql-fdw module also should be included in the core distribution as a basis of future enhancement, unless we don't need any working modules when an enhancement of FDW is proposed. Thanks, -- KaiGai Kohei <[hidden email]> -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
* Kohei KaiGai ([hidden email]) wrote:
> Right now, file_fdw is the only FDW module that we have in the core, Erm, guess I'm a bit confused why we've got that in core while not putting pgsql_fdw in core. This all gets back to previous discussions around 'recommended' contrib modules (which should really be installed by default on the filesystem through the distros, ala Debian's "recommends:" approach) and 'other' contrib modules. I'm in favor of making that distinction. I would still have pgsql_fdw, file_fdw, etc, be packaged more-or-less the same way and still use the CREATE EXTENTION framework, of course. It would be nice if we didn't have to lock the release schedule of those recommended modules to the core release schedule, or even to each other, but that's a separate issue, imv. Thanks, Stephen |
|
In reply to this post by Tom Lane-2
On Tue, Oct 25, 2011 at 3:08 PM, Tom Lane <[hidden email]> wrote:
> Magnus Hagander <[hidden email]> writes: >> 2011/10/25 Shigeru Hanada <[hidden email]>: >>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a >>> contrib module. I think that this module would be the basis of further >>> SQL/MED development for core, e.g. join-push-down and ANALYZE support. > >> I have not looked at the code itself, but I wonder if we shouldn't >> consider making this a part of core-proper, not just a contrib module. >> The fact that it isn't *already* available in core surprises a lot of >> people... > > We've just spent a whole lot of blood and sweat on making the extension > mechanism work nicely. I don't understand this urge to not use it. > > ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib. > Once we do that its release schedule will get locked to core's --- > wouldn't it be better to keep flexibility for now, while it's in such > active development? Simple question - do FDW internals need work? -- marko -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
In reply to this post by Magnus Hagander-2
(2011/10/25 19:15), Magnus Hagander wrote:
> 2011/10/25 Shigeru Hanada<[hidden email]>: >> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a >> contrib module. I think that this module would be the basis of further >> SQL/MED development for core, e.g. join-push-down and ANALYZE support. > > I have not looked at the code itself, but I wonder if we shouldn't > consider making this a part of core-proper, not just a contrib module. > The fact that it isn't *already* available in core surprises a lot of > people... Do you mean that pgsql_fdw should be a built-in extension like plpgsql so that it's available just after initdb? It would be accomplished with some more changes: * Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and install dynamically loadable module during "make install" for core. The pgsql_fdw_handler function can't be included into core binary because we must avoid liking libpq with server binary directly. This method is also used for libwalreceiver of replication module. * Create pgsql_fdw extension during initdb invocation, like plpgsql. These are not trivial, but not difficult so much. However, I think contrib would be the appropriate place for pgsql_fdw because it's (relatively) special feature. -- Shigeru Hanada -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
Shigeru Hanada <[hidden email]> writes:
> (2011/10/25 19:15), Magnus Hagander wrote: >> I have not looked at the code itself, but I wonder if we shouldn't >> consider making this a part of core-proper, not just a contrib module. >> The fact that it isn't *already* available in core surprises a lot of >> people... > Do you mean that pgsql_fdw should be a built-in extension like plpgsql > so that it's available just after initdb? If that was what he meant, I'd vote against it. There are way too many people who will *not* want their databases configured to be able to reach out onto the net. This feature should be something that has to be installed by explicit user action. regards, tom lane -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
In reply to this post by Shigeru Hanada-2
2011/10/26 Shigeru Hanada <[hidden email]>:
> (2011/10/25 19:15), Magnus Hagander wrote: >> 2011/10/25 Shigeru Hanada<[hidden email]>: >>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a >>> contrib module. I think that this module would be the basis of further >>> SQL/MED development for core, e.g. join-push-down and ANALYZE support. >> >> I have not looked at the code itself, but I wonder if we shouldn't >> consider making this a part of core-proper, not just a contrib module. >> The fact that it isn't *already* available in core surprises a lot of >> people... > > Do you mean that pgsql_fdw should be a built-in extension like plpgsql > so that it's available just after initdb? It would be accomplished with > some more changes: > > * Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and > install dynamically loadable module during "make install" for core. The > pgsql_fdw_handler function can't be included into core binary because we > must avoid liking libpq with server binary directly. This method is > also used for libwalreceiver of replication module. > * Create pgsql_fdw extension during initdb invocation, like plpgsql. > > These are not trivial, but not difficult so much. However, I think > contrib would be the appropriate place for pgsql_fdw because it's > (relatively) special feature. I agree. pgsql_fdw will be a nice feature, but there's no reason to think that everyone will want it installed by default, and there are some security reasons to think that they might not. On the flip side, pushing it out of contrib and onto pgfoundry or whatever makes it unnecessarily difficult to install, and not as many people will benefit from it. So contrib seems exactly right to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
In reply to this post by Stephen Frost
Stephen Frost <[hidden email]> writes:
> I'm in favor of making that distinction. I would still have pgsql_fdw, > file_fdw, etc, be packaged more-or-less the same way and still use the > CREATE EXTENTION framework, of course. We called that idea “core extension” at the latest hackers meeting, and Greg Smith had a patch with a first selections of extensions to package this way. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
In reply to this post by Robert Haas
2011/10/26 Robert Haas <[hidden email]>:
> 2011/10/26 Shigeru Hanada <[hidden email]>: >> (2011/10/25 19:15), Magnus Hagander wrote: >>> 2011/10/25 Shigeru Hanada<[hidden email]>: >>>> I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a >>>> contrib module. I think that this module would be the basis of further >>>> SQL/MED development for core, e.g. join-push-down and ANALYZE support. >>> >>> I have not looked at the code itself, but I wonder if we shouldn't >>> consider making this a part of core-proper, not just a contrib module. >>> The fact that it isn't *already* available in core surprises a lot of >>> people... >> >> Do you mean that pgsql_fdw should be a built-in extension like plpgsql >> so that it's available just after initdb? It would be accomplished with >> some more changes: >> >> * Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and >> install dynamically loadable module during "make install" for core. The >> pgsql_fdw_handler function can't be included into core binary because we >> must avoid liking libpq with server binary directly. This method is >> also used for libwalreceiver of replication module. >> * Create pgsql_fdw extension during initdb invocation, like plpgsql. >> >> These are not trivial, but not difficult so much. However, I think >> contrib would be the appropriate place for pgsql_fdw because it's >> (relatively) special feature. > > I agree. pgsql_fdw will be a nice feature, but there's no reason to > think that everyone will want it installed by default, and there are > some security reasons to think that they might not. On the flip side, > pushing it out of contrib and onto pgfoundry or whatever makes it > unnecessarily difficult to install, and not as many people will > benefit from it. So contrib seems exactly right to me. > as a contrib module. It will give us clear opportunity to test new features of FDW using RDBMS characteristics; such as join-push-down. However, it should be a separated discussion whether it shall be installed by the default. Thanks, -- KaiGai Kohei <[hidden email]> -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
In reply to this post by Tom Lane-2
On Wed, Oct 26, 2011 at 16:37, Tom Lane <[hidden email]> wrote:
> Shigeru Hanada <[hidden email]> writes: >> (2011/10/25 19:15), Magnus Hagander wrote: >>> I have not looked at the code itself, but I wonder if we shouldn't >>> consider making this a part of core-proper, not just a contrib module. >>> The fact that it isn't *already* available in core surprises a lot of >>> people... > >> Do you mean that pgsql_fdw should be a built-in extension like plpgsql >> so that it's available just after initdb? > > If that was what he meant, I'd vote against it. There are way too many > people who will *not* want their databases configured to be able to > reach out onto the net. This feature should be something that has to be > installed by explicit user action. That is not what I meant. I meant installed the shared library by defualt, but still require CREATE EXTENSION. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On 10/26/2011 12:47 PM, Magnus Hagander wrote: >> >> If that was what he meant, I'd vote against it. There are way too many >> people who will *not* want their databases configured to be able to >> reach out onto the net. This feature should be something that has to be >> installed by explicit user action. > That is not what I meant. > > I meant installed the shared library by defualt, but still require > CREATE EXTENSION. > I don't see why it should be different from other standard modules, such as citext or hstore, both of which have pretty wide use, and less possible security implications than this. cheers andrew -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On Wed, Oct 26, 2011 at 19:25, Andrew Dunstan <[hidden email]> wrote:
> > On 10/26/2011 12:47 PM, Magnus Hagander wrote: >>> >>> If that was what he meant, I'd vote against it. There are way too many >>> people who will *not* want their databases configured to be able to >>> reach out onto the net. This feature should be something that has to be >>> installed by explicit user action. >> >> That is not what I meant. >> >> I meant installed the shared library by defualt, but still require >> CREATE EXTENSION. >> > > I don't see why it should be different from other standard modules, such as > citext or hstore, both of which have pretty wide use, and less possible > security implications than this. As I stated earlier, it's really back to the old discussion of splitting up contrib. This would be the "additional module" part, but not the "example of how to do things" part of that... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
In reply to this post by Magnus Hagander-2
Magnus Hagander <[hidden email]> writes:
> On Wed, Oct 26, 2011 at 16:37, Tom Lane <[hidden email]> wrote: >> If that was what he meant, I'd vote against it. There are way too many >> people who will *not* want their databases configured to be able to >> reach out onto the net. This feature should be something that has to be >> installed by explicit user action. > That is not what I meant. > I meant installed the shared library by defualt, but still require > CREATE EXTENSION. Whether the shlib is installed by default is a decision for packagers to make, not us. At best we could make a recommendation. regards, tom lane -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
In reply to this post by Kohei KaiGai-3
(2011/10/26 23:57), Kohei KaiGai wrote:
> 2011/10/26 Robert Haas<[hidden email]>: >> I agree. pgsql_fdw will be a nice feature, but there's no reason to >> think that everyone will want it installed by default, and there are >> some security reasons to think that they might not. On the flip side, >> pushing it out of contrib and onto pgfoundry or whatever makes it >> unnecessarily difficult to install, and not as many people will >> benefit from it. So contrib seems exactly right to me. >> > I also agree. The pgsql_fdw will be worthful to locate in the main tree > as a contrib module. It will give us clear opportunity to test new > features of FDW using RDBMS characteristics; such as join-push-down. > However, it should be a separated discussion whether it shall be installed > by the default. (contrib or core extension, or something else), but not an external module. There are still some debatable issues, but they would be meaningless unless pgsql_fdw is qualified for a contrib module. So I'd like to continue the development of pgsql_fdw as contrib module, at least for a while. Please find attached a patch for pgsql_fdw. This patch needs first two patches attached to OP[1] to be applied. (Sorry. gathering patches from another post must be bothersome work. Should I create new CF items for fundamental patches?) [1] http://archives.postgresql.org/pgsql-hackers/2011-10/msg01329.php Changes done since last post are: * add colname FDW option support * allow some libpq options (authtype and tty) to be specified as server FDW options -- Shigeru Hanada * ポルトガル語 - 自動検出 * 英語 * 日本語 * 英語 * 日本語 <javascript:void(0);> -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
In reply to this post by Shigeru Hanada-2
2011/10/25 Shigeru Hanada <[hidden email]>:
> > Connection management > ===================== > The pgsql_fdw establishes a new connection when a foreign server is > accessed first for the local session. Established connection is shared > between all foreign scans in the local query, and shared between even > scans in following queries. Connections are discarded when the current > transaction aborts so that unexpected failure won't cause connection > leak. This is implemented with resource owner mechanism. > I have a doubt here, on sharing connection for each server. What if there are simultaneous scan on the same plan? Say, -> Nested Loop -> Foreign Scan to table T1 on server A -> Foreign Scan to table T2 on server A Okay, you are thinking about Foreign Join, so example above is too simple. But it is always possible to execute such a query if foreign scan nodes are separated far, isn't it? As far as I see from your explanation, scan T1 and scan T2 share the same connection. Now join node scans one row from left (T1) while asking rows from right (T2) without fetching all the rows from left. If T2 requests to server A, the connection's result (of T1) is discarded. Am I understand correctly? Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
|
On Sat, Oct 29, 2011 at 12:25:46AM -0700, Hitoshi Harada wrote:
> I have a doubt here, on sharing connection for each server. What if > there are simultaneous scan on the same plan? Say, > > -> Nested Loop > -> Foreign Scan to table T1 on server A > -> Foreign Scan to table T2 on server A > > Okay, you are thinking about Foreign Join, so example above is too > simple. But it is always possible to execute such a query if foreign > scan nodes are separated far, isn't it? As far as I see from your > explanation, scan T1 and scan T2 share the same connection. Now join > node scans one row from left (T1) while asking rows from right (T2) > without fetching all the rows from left. If T2 requests to server A, > the connection's result (of T1) is discarded. Am I understand > correctly? servers there is an overhead per tuple transmitted. So in the above case it might actually be quicker to do the nested loop locally. To handle the parallel case you might need to materialise in the inner loop, that would avoid the double scan. Or we could fix the protocol so you can stream multiple queries at once. Actually, you can already do this is you use DECLARE CURSOR for all the queries upfront and then FETCH as needed. That way you can do it all over one connection. Have a nice day, -- Martijn van Oosterhout <[hidden email]> http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer |
|
In reply to this post by Hitoshi Harada-2
Hitoshi Harada <[hidden email]> writes:
> I have a doubt here, on sharing connection for each server. What if > there are simultaneous scan on the same plan? Say, > -> Nested Loop > -> Foreign Scan to table T1 on server A > -> Foreign Scan to table T2 on server A > Okay, you are thinking about Foreign Join, so example above is too > simple. But it is always possible to execute such a query if foreign > scan nodes are separated far, isn't it? As far as I see from your > explanation, scan T1 and scan T2 share the same connection. Now join > node scans one row from left (T1) while asking rows from right (T2) > without fetching all the rows from left. If T2 requests to server A, > the connection's result (of T1) is discarded. Am I understand > correctly? I have not looked at the code, but ISTM the way that this has to work is that you set up a portal for each active scan. Then you can fetch a few rows at a time from any one of them. If you're doing this through libpq, it'd be necessary to implement each scan using a cursor. I'm not sure whether it'd be worth our time to add more functions to libpq to allow more-direct access to the protocol portal feature. regards, tom lane -- Sent via pgsql-hackers mailing list ([hidden email]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| Powered by Nabble | See how NAML generates this page |
