FOREIGN DATA WRAPPER ENHANCEMENTS
June 17, 2015
PostgreSQL Developers Unconference
Clustering Track
Shigeru HANADA, Etsuro Fujita
Who are we	
•  Shigeru HANADA
•  From Tokyo, Japan
•  Working on FDW since 2010
•  Implemented initial FDW API and postgres_fdw
•  Etsuro Fujita
•  From Tokyo, Japan
•  Working on Postgres for 10 years
•  Interested in FDW enhancements
Agenda	
•  Past enhancements proposed for 9.5
•  Inheritance support (Committed)
•  Join push-down (Committed)
•  Join push-down for postgres_fdw (Returned with feedback)
•  Update push-down (Returned with feedback)
•  Possible remote query optimization in 9.5
•  Ideas for further enhancement
•  Sort push-down
•  Aggregate push-down
•  More aggressive join push-down
•  Discussions
PAST ENHANCEMENTS PROPOSED FOR 9.5
Inheritance support	
•  Outline
•  Allow foreign tables to participate in an inheritance tree
•  A way to implement sharding
•  Example
postgres=# explain verbose select * from parent ;
                                QUERY PLAN
---------------------------------------------------------------------------
 Append  (cost=0.00..270.00 rows=2001 width=4)
   ->  Seq Scan on public.parent  (cost=0.00..0.00 rows=1 width=4)
         Output: parent.a
   ->  Foreign Scan on public.ft1  (cost=100.00..135.00 rows=1000 width=4)
         Output: ft1.a
         Remote SQL: SELECT a FROM public.t1
   ->  Foreign Scan on public.ft2  (cost=100.00..135.00 rows=1000 width=4)
         Output: ft2.a
         Remote SQL: SELECT a FROM public.t2
(9 rows)
Update push-down	
•  Outline
•  Send the whole UPDATE/DELETE statement to the remote side when it has the same semantics there
•  Example
Current:
postgres=# explain verbose update foo set a = a + 1 where a > 10;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Update on public.foo  (cost=100.00..139.78 rows=990 width=10)
   Remote SQL: UPDATE public.foo SET a = $2 WHERE ctid = $1
   ->  Foreign Scan on public.foo  (cost=100.00..139.78 rows=990 width=10)
         Output: (a + 1), ctid
         Remote SQL: SELECT a, ctid FROM public.foo WHERE ((a > 10)) FOR UPDATE
(5 rows)

Patched:
postgres=# explain verbose update foo set a = a + 1 where a > 10;
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Update on public.foo  (cost=100.00..139.78 rows=990 width=10)
   ->  Foreign Update on public.foo  (cost=100.00..139.78 rows=990 width=10)
         Remote SQL: UPDATE public.foo SET a = (a + 1) WHERE ((a > 10))
(3 rows)
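The practical difference between the two plans is the number of statements sent to the remote server. The following Python sketch counts them under a deliberately simplified model: it assumes the current plan issues one SELECT ... FOR UPDATE plus one parameterized UPDATE per matching row, and that the patched plan issues a single UPDATE. Cursor declaration, FETCH batching, and statement preparation are ignored, and the function name is hypothetical.

```python
def remote_statements(matching_rows: int, pushed_down: bool) -> int:
    """Count remote statements under a simplified model of the two plans."""
    if matching_rows < 0:
        raise ValueError("matching_rows must be non-negative")
    if pushed_down:
        # Patched plan: a single UPDATE ... WHERE ... runs entirely remotely.
        return 1
    # Current plan: one SELECT ... FOR UPDATE to find the rows, then one
    # "UPDATE ... WHERE ctid = $1" execution for each row that was found.
    return 1 + matching_rows


if __name__ == "__main__":
    estimated_rows = 990  # row estimate taken from the plans above
    print("current:", remote_statements(estimated_rows, pushed_down=False))
    print("patched:", remote_statements(estimated_rows, pushed_down=True))
```

Under this model the cost of the current approach grows with the number of updated rows, while the pushed-down statement stays constant.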
Update push-down, cont.	
•  Issues
•  FDW-APIs for update push-down
•  Called from nodeModifyTable.c or nodeForeignscan.c?
•  Update push-down for an update on a join
•  "UPDATE foo ... FROM bar ..." (both foo and bar are remote)
•  Further enhancements
•  INSERT/UPSERT push-down
Join push-down	
•  Outline
•  Join foreign tables on remote side, if it’s safe
•  Example
fdw=# EXPLAIN (VERBOSE) SELECT tbalance FROM pgbench_branches b JOIN pgbench_tellers t USING(bid);
                                QUERY PLAN
---------------------------------------------------------------------------
 Foreign Scan  (cost=100.00..101.00 rows=50 width=4)
   Output: t.tbalance
   Relations: (public.pgbench_branches b) INNER JOIN (public.pgbench_tellers t)
   Remote SQL: SELECT r.a1 FROM (SELECT l.a9 FROM (SELECT bid a9 FROM public.pgbench_branches) l) l (a1) INNER JOIN (SELECT r.a11, r.a10 FROM (SELECT bid a10, tbalance a11 FROM public.pgbench_tellers) r) r (a1, a2) ON ((l.a1 = r.a2))
(4 rows)
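The slide does not spell out what "safe" means. As a rough illustration only, the Python sketch below assumes a join can be pushed down when both foreign tables live on the same foreign server, are accessed through the same user mapping, and every join condition can be evaluated remotely. The `ForeignTable` class, its fields, and `can_push_down_join` are hypothetical names, not part of any FDW API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignTable:
    name: str
    server: str        # foreign server the table lives on
    user_mapping: str  # remote role used to access the table


def can_push_down_join(left: ForeignTable, right: ForeignTable,
                       conditions_shippable: bool) -> bool:
    """Return True if the join may be sent to the remote side (assumed rules)."""
    same_server = left.server == right.server
    same_user = left.user_mapping == right.user_mapping
    return same_server and same_user and conditions_shippable


if __name__ == "__main__":
    branches = ForeignTable("pgbench_branches", "srv1", "app")
    tellers = ForeignTable("pgbench_tellers", "srv1", "app")
    print(can_push_down_join(branches, tellers, conditions_shippable=True))
```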
Join push-down, cont.	
•  Issues
•  Implement postgres_fdw to handle join APIs
•  Centralize deparsing of the remote query
•  Should we use the parse tree rather than planner information to generate the join query?
•  A generic SQL deparser would help porting to FDWs for other RDBMSs
Possible remote query optimization in 9.5	
•  When we run the following query:
SELECT c.grade, max(s.score) max_score
FROM scores s LEFT JOIN classes c
ON c.class_id = s.class_id
WHERE c.subject = 'Math'
GROUP BY c.grade
HAVING max(s.score) > 50
ORDER BY c.grade DESC;
“scores” and “classes” are foreign tables
Possible remote query optimization in 9.5, cont.
•  Generate the remote query:
SELECT c.grade, s.score
FROM scores s LEFT JOIN classes c
ON c.class_id = s.class_id
WHERE c.subject = 'Math'
ORDER BY c.grade DESC;
•  We can push down the portions of the original query marked in red on the slide (the join, the WHERE clause, and ORDER BY); GROUP BY and HAVING stay local
Possible remote query optimization in 9.5	
postgres=# EXPLAIN SELECT c.grade, max(s.score) max_score
postgres-# FROM scores s LEFT JOIN classes c
postgres-# ON c.class_id = s.class_id
postgres-# WHERE c.subject= 'Math'
postgres-# GROUP BY c.grade
postgres-# HAVING max(s.score) > 50
postgres-# ORDER BY c.grade DESC;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 GroupAggregate  (cost=27.92..27.94 rows=1 width=8)
   Group Key: c.grade
   Filter: (max(s.score) > 50)
   ->  Sort  (cost=27.92..27.92 rows=1 width=8)
         Sort Key: c.grade DESC
         ->  Hash Join  (cost=20.18..27.91 rows=1 width=8)
               Hash Cond: (s.class_id = c.class_id)
               ->  Seq Scan on scores s  (cost=0.00..6.98 rows=198 width=8)
               ->  Hash  (cost=20.12..20.12 rows=4 width=8)
                     ->  Seq Scan on classes c  (cost=0.00..20.12 rows=4 width=8)
                           Filter: (subject = 'Math'::text)
(11 rows)
IDEAS FOR FURTHER ENHANCEMENT
Ideas for further enhancement	
•  Sort push-down
•  Aggregate push-down
•  More aggressive join push-down
•  2PC support (out of scope of this session)
•  Will be discussed in Ashutosh’s session on 19th Jun.
Sort push-down	
•  Outline
•  Mark a ForeignScan as sorted
•  Efficacy
•  Avoid unnecessary sort on local side
•  Use ForeignScan as a source of MergeJoin directly
•  How to implement
•  Add extra ForeignPath with pathkeys
•  Estimate costs of pre-sorted path
•  Sort result of a foreign scan
•  add ORDER BY, in RDBMS FDWs
•  choose pre-sorted file, in file-based FDWs
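The idea can be sketched in Python, outside the planner: an RDBMS FDW appends ORDER BY to the remote query for the pre-sorted path, and a merge join then consumes the two sorted inputs directly without a local sort. This is an illustrative model only. `deparse_scan` and `merge_join` are hypothetical helpers, and identifiers are assumed to be already safely quoted.

```python
from typing import Any, Callable, Iterable, List, Sequence, Tuple


def deparse_scan(table: str, columns: Sequence[str],
                 sort_keys: Sequence[str] = ()) -> str:
    """Build the remote query; ORDER BY is added only for a pre-sorted path."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if sort_keys:
        sql += " ORDER BY " + ", ".join(sort_keys)
    return sql


def merge_join(left: Iterable[tuple], right: Iterable[tuple],
               key: Callable[[tuple], Any]) -> List[Tuple[tuple, tuple]]:
    """Inner-join two inputs that are ALREADY sorted by key; no local sort."""
    lrows, rrows = list(left), list(right)
    out: List[Tuple[tuple, tuple]] = []
    i = j = 0
    while i < len(lrows) and j < len(rrows):
        lk, rk = key(lrows[i]), key(rrows[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key value.
            j_end = j
            while j_end < len(rrows) and key(rrows[j_end]) == lk:
                j_end += 1
            # Pair every left row in the matching run with that right run.
            while i < len(lrows) and key(lrows[i]) == lk:
                out.extend((lrows[i], r) for r in rrows[j:j_end])
                i += 1
            j = j_end
    return out


if __name__ == "__main__":
    print(deparse_scan("public.t1", ["a"], sort_keys=["a"]))
```

The merge join is only correct if both remote sides really return rows in the order the local server expects, which is why the ordering semantics matter.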
Sort push-down	
•  Issues
•  How can we limit candidates of sort keys?
•  No brute-force approach
•  Introduce FOREIGN INDEX to represent generic remote indexes?
•  Introduce FDW-specific catalogs?
•  Extract key information from ORDER BY, JOIN, GROUP BY?
•  How can we ensure that the semantics of ordering are identical?
•  Even between PostgreSQL servers, we have collation issues.
•  Is it OK to leave it to DBAs?
•  Limiting push-down to non-character data types seems like the way to go for a first cut.
•  Can we use pre-sorted join results as a sorted path?
•  A MergeJoin at the root of the remote plan means the result is sorted by the join key, but we cannot be certain of that plan even if we run EXPLAIN before the query.
•  Any ideas?
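One of the options above, extracting key information from ORDER BY, JOIN, and GROUP BY while skipping character types, could look roughly like the Python sketch below. It is a simplification: each column is treated as an independent single-column key, the set of character type names is an assumption, and the column types in the usage example are made up, except that `subject` appears as text in the plan shown earlier.

```python
from typing import Dict, List, Sequence

# Assumed set of character type names to exclude in a first cut.
CHARACTER_TYPES = {"text", "varchar", "bpchar", "char"}


def candidate_sort_keys(order_by: Sequence[str], join_keys: Sequence[str],
                        group_by: Sequence[str],
                        column_types: Dict[str, str]) -> List[str]:
    """Collect sort-key candidates from the query, skipping character columns."""
    candidates: List[str] = []
    for column in [*order_by, *join_keys, *group_by]:
        if column in candidates:
            continue  # already collected from an earlier clause
        if column_types[column] in CHARACTER_TYPES:
            continue  # ordering may depend on collation, so do not push down
        candidates.append(column)
    return candidates


if __name__ == "__main__":
    types = {"c.grade": "int4", "s.class_id": "int4",
             "c.class_id": "int4", "c.subject": "text"}
    print(candidate_sort_keys(["c.grade"], ["s.class_id", "c.class_id"],
                              ["c.grade"], types))
```

This avoids the brute-force enumeration of every column combination, at the price of missing orderings that a remote index could have provided.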
Aggregate push-down	
•  Outline
•  Replace an Aggregate/GroupAggregate/HashAggregate plan node with a ForeignScan that produces aggregated results
•  Efficacy
•  Reduce amount of data transferred
•  Off-load overheads of aggregation
•  How to implement
•  New FDW API for aggregation hooking
•  Implement API in each FDW
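To illustrate the data-transfer argument, the Python sketch below computes the max(score) per grade with a HAVING-style filter, as in the earlier example query, and compares how many rows would cross the network with and without push-down. The input rows are made up for illustration and the function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def max_score_by_grade(rows: Iterable[Tuple[int, int]],
                       having_min: int) -> Dict[int, int]:
    """Compute max(score) per grade, keeping groups with max(score) > having_min."""
    best: Dict[int, int] = {}
    for grade, score in rows:
        if grade not in best or score > best[grade]:
            best[grade] = score
    return {grade: top for grade, top in best.items() if top > having_min}


# Made-up (grade, score) rows standing in for the remote join result.
joined_rows = [(1, 40), (1, 75), (2, 90), (2, 55), (2, 30), (3, 45)]

result = max_score_by_grade(joined_rows, having_min=50)
rows_without_pushdown = len(joined_rows)  # every joined row crosses the network
rows_with_pushdown = len(result)          # only the aggregated groups do

if __name__ == "__main__":
    print(result, rows_without_pushdown, rows_with_pushdown)
```

The same computation happens either way. Push-down only changes where it runs and how many rows are returned.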
Aggregate push-down	
•  Issues
•  GROUP BY requires identical semantics for grouping keys on both sides.
•  This is similar to the issue with sort push-down.
•  How can we map functions to remote ones?
•  ROUTINE MAPPING is defined in the SQL standard, but it doesn’t seem well-designed.
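In the absence of a standard mechanism, an FDW could keep its own lookup table from local function names to remote ones and fall back to local aggregation when no mapping exists. The Python sketch below shows that fallback logic. `FUNCTION_MAP` and `map_aggregate` are hypothetical, and the three entries are only examples of aggregates defined by standard SQL.

```python
from typing import Optional

# Hypothetical mapping from local aggregate names to remote equivalents.
FUNCTION_MAP = {
    "max": "MAX",
    "min": "MIN",
    "count": "COUNT",
}


def map_aggregate(local_name: str, argument: str) -> Optional[str]:
    """Return the remote call text, or None if the aggregate must stay local."""
    remote_name = FUNCTION_MAP.get(local_name.lower())
    if remote_name is None:
        return None  # no known remote equivalent: do not push down
    return f"{remote_name}({argument})"


if __name__ == "__main__":
    print(map_aggregate("max", "s.score"))
    print(map_aggregate("my_custom_agg", "s.score"))
```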
More aggressive join push-down	
•  Outline
•  Send local data to the remote side and join it there, in one of the following ways:
•  VALUES expression in FROM clause
•  per-table replication, with logical replication, Slony-I, etc.
•  Efficacy
•  Reduce amount of data transferred from remote to local
•  Limited to cases where joining a small local table with a huge remote table produces a small result
More aggressive join push-down	
•  How to implement
•  Replace reference to a small local table with VALUES()
•  Use a remote replicated table as an alternative
•  Issues
•  How can we construct VALUES() expression?
•  How can we know a table is replicated on the remote side?	
SELECT *
FROM huge_remote_table h
JOIN
(VALUES (1, 'foo'), (2, 'bar')) AS s (id, name)
ON s.id;
The VALUES list is generated by scanning the small local table
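As one possible answer to the VALUES() construction question, the Python sketch below turns rows scanned from a small local table into a VALUES item for the FROM clause. It is a minimal illustration: only NULL, boolean, integer, and string values are handled, string quoting assumes standard_conforming_strings is on at the remote server, and the alias and column names are assumed to be safe identifiers. Both function names are hypothetical.

```python
from typing import Iterable, Sequence


def quote_literal(value: object) -> str:
    """Render a Python value as a SQL literal (None, bool, int, and str only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


def build_values(rows: Iterable[Sequence[object]], alias: str,
                 columns: Sequence[str]) -> str:
    """Build a "(VALUES ...) AS alias (cols)" item for a FROM clause."""
    tuples = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row width does not match column list")
        tuples.append("(" + ", ".join(quote_literal(v) for v in row) + ")")
    if not tuples:
        raise ValueError("VALUES needs at least one row")
    return f"(VALUES {', '.join(tuples)}) AS {alias} ({', '.join(columns)})"


if __name__ == "__main__":
    local_rows = [(1, "foo"), (2, "bar")]  # rows scanned from the local table
    print(build_values(local_rows, "s", ["id", "name"]))
```

The size of the generated statement grows with the local table, which is one reason the technique is limited to small local tables.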
DISCUSSIONS
