PostgreSQL is often regarded as the world’s most advanced open source database—and it’s on fire. Umur Cubukcu moves beyond the typical list of features in the next release to explore why so many new projects “just use Postgres” as their system of record (or system of engagement) at scale. Along the way, you’ll learn how PostgreSQL’s extension APIs are fueling innovations in relational databases.
Topics include a framework for thinking about modern workloads, the evolution of database infrastructure, extensibility for the database, and PostgreSQL as an ecosystem.
The State of Postgres | Strata San Jose 2018 | Umur Cubukcu
1. The State of Postgres
For Modern, Scalable Applications
Umur Cubukcu | Citus Data | Strata Data Conference 2018
@umurc | @citusdata | citusdata.com
2. About me & Citus Data
Citus Data Co-Founders, Left to Right
Ozgun Erdogan, Sumedh Pathak, Umur Cubukcu
Photo credit: Willy Johnson 2017
• Umur Cubukcu, Co-Founder & CEO of Citus Data
• Citus: Distributed PostgreSQL
• Founded 2011, HQ in SOMA
@umurc | @citusdata
github.com/citusdata/citus
3. Databases used to be simple (2008)
Umur Cubukcu | Strata Data Conference | March 2018 | The State of Postgres
[Figure: RDBMS landscape; workloads split into Operations (OLTP) and Analytics (OLAP), offerings split into Proprietary and Open Source]
4. Data Growth >> Silicon Growth…
Two challenges for the relational database changed the landscape:
(1) Data growth >> silicon growth: data doubles every 15 months, while Moore’s Law delivers 2x every 24 months
(2) Data with less structure (e.g. logs)
[Figure: data growth vs. Moore’s Law, 2004 to 2018]
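To make the gap between the two doubling rates on this slide concrete, here is a minimal arithmetic sketch. It assumes both curves compound smoothly over the chart's 2004 to 2018 axis; the function name and constants are illustrative and not taken from the talk.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months` for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


# Doubling periods quoted on the slide.
DATA_DOUBLING_MONTHS = 15
MOORE_DOUBLING_MONTHS = 24

# Span of the chart axis: 2004 through 2018 (assumed to be 14 full years).
months = (2018 - 2004) * 12

data = growth_factor(months, DATA_DOUBLING_MONTHS)
silicon = growth_factor(months, MOORE_DOUBLING_MONTHS)
gap = data / silicon  # how far data has outrun silicon over the period

print(f"data grew ~{data:,.0f}x, silicon ~{silicon:,.0f}x, gap ~{gap:.1f}x")
```

Under these assumptions the gap compounds to more than an order of magnitude, which is the pressure toward scaling out that the following slides address.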
6. Meanwhile: Short history of Postgres
Not the first time seeing similar challenges
(1) SQL or not? (1995)
• Post-Ingres
• Started life as object store
• Added SQL API in 1995
(2) Scaling out to handle data growth (2005)
• For analytics only: MPPs
• So many forks! AsterData, Netezza, ParAccel (Redshift), Greenplum
7. Introducing PostgreSQL Extension APIs (2011)
Amplifying vs. breaking the ecosystem
[Diagram: PostgreSQL exposes extension points that an extension (.so), installed with CREATE EXTENSION ..., plugs into]
• Planner
• Executor
• Custom scan
• Commit / abort
• Access methods
• Foreign tables
• Functions
• ...
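The idea of an extension plugging into the planner without forking the database can be pictured with a toy model. The sketch below is plain Python, not PostgreSQL's API: real extensions are C shared libraries, and every name here (standard_planner, plan, load_extension, ext_a, ext_b) is a hypothetical stand-in. It only illustrates the save-and-chain pattern, where each newly loaded extension wraps whatever hook was installed before it.

```python
from typing import Callable, Optional

Plan = str
PlannerFn = Callable[[str], Plan]


def standard_planner(query: str) -> Plan:
    """Stand-in for the database's built-in planner."""
    return f"SeqScan({query})"


# A global hook slot; None means no extension has installed a hook.
planner_hook: Optional[PlannerFn] = None


def plan(query: str) -> Plan:
    """Core entry point: defer to the hook if one is installed."""
    if planner_hook is not None:
        return planner_hook(query)
    return standard_planner(query)


def load_extension(name: str) -> None:
    """Hypothetical loader: remember the previous hook and chain to it."""
    global planner_hook
    prev_hook = planner_hook

    def hooked(query: str) -> Plan:
        # Call whatever was installed before us, or the built-in planner.
        if prev_hook is not None:
            inner = prev_hook(query)
        else:
            inner = standard_planner(query)
        return f"{name}[{inner}]"

    planner_hook = hooked


load_extension("ext_a")
load_extension("ext_b")
print(plan("SELECT 1"))  # prints "ext_b[ext_a[SeqScan(SELECT 1)]]"
```

Because each extension chains to the previous hook instead of replacing the core, several extensions can coexist on one unmodified database, which is the "amplifying vs. breaking the ecosystem" point.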
8. Addressing challenges to RDBMS
(1) To structure, or not to structure?
(2) Scaling out: compute & performance
9. Start from file system (Hadoop)
(-) Pay cost at query time
(-) Batch vs. real-time
(-) Indexes (Append only FS)
(+) Any data, any structure
(+) ’Infinitely’ scalable storage
(+) Write fast
10. Worry about only one access pattern
Semi-structured (JSON)
(-) No expressiveness for analytics
(-) No JOINs, data duplication
(-) Enforce structure at app layer
(+) Simple: Put & Get
(+) Scalable
11. Extend the database for JSON data
(1) To structure or not to structure?
Table "public.events"
 Column     | Type      | Sample Data
------------+-----------+-----------------------------
 user_id    | bigint    | 09288
 created_at | timestamp | 2018-03-08 00:57:12.6936+00
 payload    | jsonb     |
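The events table mixes typed columns (user_id, created_at) with a free-form jsonb payload. As a rough illustration of what that buys you, the sketch below models such rows in plain Python and filters them with a simplified containment check, loosely modeled on jsonb's containment operator. It is not PostgreSQL's implementation, and the payload contents are invented for the example because the slide does not show any.

```python
import json
from typing import Any


def contains(doc: Any, pattern: Any) -> bool:
    """Simplified containment test: does `doc` include everything in `pattern`?"""
    if isinstance(pattern, dict):
        return isinstance(doc, dict) and all(
            key in doc and contains(doc[key], value)
            for key, value in pattern.items()
        )
    if isinstance(pattern, list):
        return isinstance(doc, list) and all(
            any(contains(item, wanted) for item in doc) for wanted in pattern
        )
    return doc == pattern


# Rows mirror the events table: two typed columns plus a free-form payload.
# The payload values are made up for illustration.
events = [
    {"user_id": 9288, "created_at": "2018-03-08 00:57:12.6936+00",
     "payload": json.loads('{"type": "click", "tags": ["ad", "mobile"]}')},
    {"user_id": 9289, "created_at": "2018-03-08 00:58:01.0000+00",
     "payload": json.loads('{"type": "view"}')},
    {"user_id": 9290, "created_at": "2018-03-08 00:59:30.0000+00",
     "payload": json.loads('{"type": "click", "tags": ["organic"]}')},
]

# Structured filter on the semi-structured column.
clicks = [e["user_id"] for e in events if contains(e["payload"], {"type": "click"})]
mobile_clicks = [
    e["user_id"] for e in events
    if contains(e["payload"], {"type": "click", "tags": ["mobile"]})
]
print(clicks, mobile_clicks)
```

The rows keep a fixed schema where the structure is known, and the payload absorbs whatever varies per event, so the same table can be queried on both.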
12. Leverage indexing (and other fundamentals)
(2) Scaling compute & performance
• B-tree indexes
• GIN & GiST indexes
• Secondary indexes
• Full-text search
• Index-only scans
• Fitting indexes into memory
Not to forget: parallel queries, MVCC, and many more.
13. Push computations (and joins) down to many PostgreSQL instances
(2) Scaling compute & performance
[Diagram: the query "SELECT FROM events a JOIN users b" is sent as "SELECT FROM (a JOIN b)" fragments to Data Node 1, Data Node 2, ..., Data Node N. The events and users tables are split into shards (Events_101 to Events_104, Users_101 to Users_104, …), with same-numbered shards shown side by side, e.g. Events_102 next to Users_102.]
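The pushdown of a join to many nodes can be sketched as a toy in-memory model. This is not how Citus or PostgreSQL implement it: the shard count, the modulo placement rule, and all function names are assumptions made for illustration. The point it shows is that when events and users are sharded on the same column, each node can join its own pair of shards and the coordinator only concatenates the results.

```python
from collections import defaultdict

SHARD_COUNT = 4  # assumption: four shards per table


def shard_for(user_id: int) -> int:
    """Toy placement rule: modulo on the distribution column (a stand-in for hashing)."""
    return user_id % SHARD_COUNT


def distribute(rows, key):
    """Split rows into shards by the distribution column."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[key])].append(row)
    return shards


def local_join(event_shard, user_shard):
    """Runs on one node: join an events shard with its co-located users shard."""
    names = {u["user_id"]: u["name"] for u in user_shard}
    return [
        (e["event"], names[e["user_id"]])
        for e in event_shard
        if e["user_id"] in names
    ]


users = [{"user_id": i, "name": f"user{i}"} for i in range(1, 9)]
events = [
    {"user_id": uid, "event": ev}
    for uid, ev in [(1, "login"), (6, "click"), (3, "view"), (6, "logout")]
]

user_shards = distribute(users, "user_id")
event_shards = distribute(events, "user_id")

# Coordinator: run the same join on every shard pair, then concatenate.
result = []
for shard_id in range(SHARD_COUNT):
    result.extend(local_join(event_shards[shard_id], user_shards[shard_id]))

print(sorted(result))
```

Sorting the distributed result and comparing it with `local_join(events, users)` on the unsharded data gives the same rows, which is the property that makes the per-node join safe.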
15. PostgreSQL: Vibrant, global ecosystem
Sample PostgreSQL Extensions
citus
pgcrypto
pg_cron
pg_partman
postgresql-HLL
cstore_fdw
unaccent
cube
jdbc_fdw
pg_trgm
PostGIS
pg_buffercache
pg_prewarm
btree_gin
btree_gist
postgis_topology
pg_stat_statements
postgresql-unit
plpgsql
plv8
pg_telemetry
foreign data wrappers
…
[Figure: Integrations]
16. PostgreSQL on fire
Database adoption among developers
[Figure: chart comparing PostgreSQL, MySQL, MongoDB, and SQL Server + Oracle]
Source: % of database job postings that mention each specific technology, across 20K+ job posts on Hacker News, https://news.ycombinator.com
17. Winning Startups & Enterprises
Growing from already vast user base
PostgreSQL popularity = Hadoop + Mongo combined
[Figure: chart on a 0 to 100 scale comparing PG, Mongo, and Hadoop]
Source: Google Trends for the past 2 years
18. So there’s an elephant in the room
How does it all fit in with your stack?
19. Modern workloads are evolving
[Figure: RDBMS landscape; workloads split into Operations (OLTP) and Analytics (OLAP), offerings split into Proprietary and Open Source]
Improvement workloads
Application workloads
- Transactions
- Short requests
- In-app analytics
20. Modern databases serve 3 types of apps
[Figure axes: Time to action, Data volume; label: Application data]
Systems of record
• Core workloads, transactions
• Real-time data
• Millisecond latencies
Systems of engagement
• Drive engagement & revenue
• Real-time data, multiple sources
• Sub-second latencies
Systems of improvement
• Identify business process improvements
• Offline data, multiple sources
• Sub-minute / hour latencies, data analysts
21. PostgreSQL in your infrastructure stack
[Diagram: PostgreSQL alongside Application, Data, Spark, HDFS / S3, Kafka, and NoSQL]
• Standalone database
• Storage
• Compute
• Persistence layer for Spark
• Persistence layer for Kafka
• Adjacent to NoSQL
Note: Standard PostgreSQL connectors for all tools (e.g. ODBC / JDBC, PostgreSQL language bindings) are available for integrations.
22. Scaling the tables
23. Parting thoughts: PostgreSQL becoming the Linux of Databases
• Extensibility
• Versatility
• Ecosystem