Pivotal Greenplum: Postgres-Based. Multi-Cloud. Built for Analytics & AI - Greenplum Summit 2019

© Copyright 2019 Pivotal Software, Inc. All Rights Reserved.
Keaton Adams
Advisory Data Engineer
Pivotal Greenplum: Postgres-Based.
Multi-Cloud. Built for Analytics & AI

Parallel Load / Unload Features
1 of 25

Pivotal Greenplum
§ Launched in 2005 (14 years of proven technology!)
§ EMC Acquired in 2010
§ Pivotal Acquired in 2013
§ Massively Parallel Processing RDBMS
§ Open Source Core Based on PostgreSQL
§ Built w/ Pivotal Labs Practices
§ Over 1000 Person Years of R&D Invested
§ Hundreds of Global Customers in 34 countries
MPP
2 of 25

Pivotal Open Source Strategy
GOALS
§ Reduce Long Term Cost Structure
§ World Wide Technical Collaboration
§ Reduce Bespoke Technologies
§ Avoid Proprietary Pockets
§ Consistent Customer Interfaces
§ Combined Engineering Workforce
§ 300+ Engineers on Staff
Operational OLTP
Analytical MPP
3 of 25

A Modern Data Platform Must Be Built for Diverse
Analytics
4 of 25

Greenplum for
Kubernetes
Public Cloud / Private Cloud / Bare-Metal
Deploy Workloads on any Infrastructure
Other Kubernetes (on VMs or not)
Google Container Engine
Greenplum Building Blocks
• Pivotal blueprint + Dell reference hardware configs
• Superior price/performance; no expensive proprietary hardware
• The most performant way to run Greenplum on premises
• Certified and supported by Pivotal
New! New!
The same Greenplum in all environments, including hybrid deployments via Kubernetes
6 of 25

All Major Public Clouds: Fully Integrated Deployment
Bring Your Own License (BYOL) and Hourly
8 of 25

Greenplum Building Blocks
It's All Just Blocks! Simple yet elegant.
● Pivotal’s Greenplum-Optimized Engineered System to deliver unrivaled Price/Performance for Next-Generation Analytics and AI!
● Leverages state-of-the-art Dell Servers, Storage and Networking technologies.
● Simple AND Flexible Sizing and Scaling to fit enterprise scale workloads from small to huge.
● Cloud Inspired, On-Premise Experienced.
7 of 25

Greenplum Integrated
In-Database Analytics
GRAPHS
Analytical SQL, Aggregations,
Windowing, Short Queries with Indices
Enables Iterative Exploration!
9 of 25

Greenplum Procedural Language support
Containerized Execution
Current Computing Interfaces
§ User Defined Types
§ User Defined Functions
§ User Defined Aggregates
Foundational work for containerized Python and R
compute environments
10 of 25

Text Analytics: Indexing and Search with GPText
GPText SQL Warehousing + Text Analytics
§ Text Search
§ Integrate Text Functions with Structured Data Analytics
Internal or External Indexing
§ Text Search
§ MADlib integration for machine learning on text data
§ PL/Python and PL/Java integration for Natural Language Processing
Natural Language & AI Integration
§ Apache MADlib
§ PL/Python and PL/Java
§ OpenNLP & MADlib for machine learning
11 of 25

MPP Shared Nothing Architecture
Performance Through Parallelism
§ Master Host and Standby Master Host
§ Master coordinates work with Segment Hosts
§ Segment Host with one or more Segment Instances
§ Segment Instances process queries in parallel
§ Segment Hosts have their own CPU, disk and memory (shared nothing)
§ High speed interconnect for continuous pipelining of data processing
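The division of labor described on this slide can be sketched in a few lines of Python. This is a toy model only: the segment count, the `crc32` hash, and the function names are assumptions made for illustration and are not Greenplum's actual distribution hash or executor. Rows are spread across segments by a distribution key, each segment aggregates its own rows independently, and the master merges the partial results.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution key to a segment using a stable hash (not Greenplum's own)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Place every row on exactly one segment; segments share nothing."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return segments


def segment_partial_sum(segment_rows, group_field, value_field):
    """Work done locally on one segment: aggregate only the rows it owns."""
    partial = defaultdict(int)
    for row in segment_rows:
        partial[row[group_field]] += row[value_field]
    return dict(partial)


def master_merge(partials):
    """Work done on the master: combine the per-segment partial aggregates."""
    total = defaultdict(int)
    for partial in partials:
        for group, value in partial.items():
            total[group] += value
    return dict(total)


# 100 orders; every third order belongs to "west", the rest to "east".
rows = [
    {"order_id": i, "region": "east" if i % 3 else "west", "amount": i}
    for i in range(1, 101)
]
segments = distribute(rows, "order_id")
partials = [segment_partial_sum(s, "region", "amount") for s in segments]
result = master_merge(partials)
```

In a real cluster the per-segment step runs concurrently on separate hosts; here the list comprehension simply stands in for that parallel step.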
12 of 25

Vertical Partitioning
Dividing Data By Access Patterns
§ Physical separation of data to enable faster processing with WHERE predicates
§ Partitions that a query does not need are not processed
§ Facilitates data retention policies based on age
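A minimal sketch of the two ideas on this slide, skipping partitions a predicate cannot match and dropping whole partitions for retention, assuming monthly range partitions on a date column. The data layout and every function name here are hypothetical and stand in for what the database does internally.

```python
from datetime import date


def month_partitions(year):
    """Build 12 monthly range partitions: start inclusive, end exclusive."""
    parts = []
    for month in range(1, 13):
        start = date(year, month, 1)
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        parts.append({"start": start, "end": end, "rows": []})
    return parts


def insert(parts, row):
    """Route a row to the single partition whose range contains its date."""
    for part in parts:
        if part["start"] <= row["sale_date"] < part["end"]:
            part["rows"].append(row)
            return
    raise ValueError("no partition accepts this row")


def prune(parts, lo, hi):
    """Keep only partitions whose range overlaps the predicate [lo, hi)."""
    return [p for p in parts if p["start"] < hi and lo < p["end"]]


def scan(parts, lo, hi):
    """Scan only the surviving partitions and report how many were touched."""
    kept = prune(parts, lo, hi)
    rows = [r for p in kept for r in p["rows"] if lo <= r["sale_date"] < hi]
    return rows, len(kept)


def drop_older_than(parts, cutoff):
    """Retention by age: discard whole partitions that end on or before the cutoff."""
    return [p for p in parts if p["end"] > cutoff]


parts = month_partitions(2019)
for month in range(1, 13):
    insert(parts, {"sale_date": date(2019, month, 15), "amount": month * 10})

# Equivalent of: WHERE sale_date >= '2019-03-01' AND sale_date < '2019-05-01'
matched, partitions_scanned = scan(parts, date(2019, 3, 1), date(2019, 5, 1))
remaining = drop_older_than(parts, date(2019, 7, 1))
```

Only two of the twelve partitions are read for the two-month predicate, and the retention step removes six partitions without touching individual rows.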
13 of 25

Column-oriented / Row-oriented / External (HDFS, RDBMS, S3)
Columnar Store. Row Store. External Data Sources.
Logical table with partitioned physical storage
§ Row oriented is faster when returning the majority of columns
§ HEAP for many updates and deletes
§ Use Indexes for drill-through queries
§ Columnar storage compresses better
§ Optimized for retrieving a subset of the columns in a wide table
§ Compression by column: gzip (1-9), quicklz, Delta, RLE
§ Pivotal Extension Framework (PXF)
§ Kafka and Spark integration
§ Text, CSV, Avro, Parquet, etc.
§ Hadoop, S3 storage support
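To illustrate why column-oriented storage tends to compress well, here is a minimal sketch of run-length (RLE) and delta encoding applied to single columns. It shows the general techniques named on the slide, not Greenplum's on-disk format, and the sample table is invented for the example.

```python
def to_columns(rows):
    """Pivot row tuples into one list per column."""
    return [list(col) for col in zip(*rows)]


def rle_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def delta_encode(values):
    """Store the first value, then only the difference to the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum over the deltas."""
    out = []
    current = 0
    for i, delta in enumerate(deltas):
        current = delta if i == 0 else current + delta
        out.append(current)
    return out


# Hypothetical table: (event_time, status)
rows = [(1000, "A"), (1001, "A"), (1002, "A"), (1005, "B"), (1006, "B"), (1010, "A")]
event_time, status = to_columns(rows)

status_rle = rle_encode(status)
time_delta = delta_encode(event_time)
```

Stored row by row, the repeated status values and the slowly increasing timestamps are interleaved; stored by column, each becomes a short sequence of runs or small integers.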
14 of 25

GPORCA Optimizer
GOALS
§ Unbreakable DW SQL Optimizer
§ Optimize complex SQL to produce superior runtimes
2018 Accomplishments
§ Incremental Analyze via HyperLogLog, Rapid Distinct Value Aggregation
§ Improved Optimization Time, caching and early space pruning
§ Large Table Join, Join Order Optimization using Greedy Algorithm
§ Improved cost tuning to pick index joins when appropriate
§ Support Geospatial Workloads with GiST indexes
§ Improved cardinality estimation: Left joins and predicates on text columns
§ Complex Nested Subqueries: optimizing for co-location (without deadlocks)
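The "Incremental Analyze via HyperLogLog" item relies on a property of HyperLogLog sketches: two sketches can be merged into the sketch of the combined data without rescanning it. The sketch below is a generic textbook HyperLogLog, assuming 1024 registers and a SHA-1 based hash; it is not GPORCA's or Greenplum's implementation, and how the database stores or merges its sketches is not covered by the slide.

```python
import hashlib
import math

P = 10        # number of index bits (assumption for this sketch)
M = 1 << P    # 1024 registers


def _hash64(value):
    """Derive a stable 64-bit hash from any value."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hll_sketch(values):
    """Build the register array for a collection of values."""
    registers = [0] * M
    for value in values:
        h = _hash64(value)
        index = h >> (64 - P)                      # top P bits pick a register
        rest = h & ((1 << (64 - P)) - 1)           # remaining 54 bits
        rank = (64 - P) - rest.bit_length() + 1    # position of the leftmost 1-bit
        if rank > registers[index]:
            registers[index] = rank
    return registers


def hll_merge(a, b):
    """Merge two sketches by taking the register-wise maximum."""
    return [max(x, y) for x, y in zip(a, b)]


def hll_estimate(registers):
    """Estimate the number of distinct values from the registers."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)   # small-range correction
    return raw


# Two hypothetical partitions with overlapping values (20,000 distinct in total).
partition_a = range(0, 12000)
partition_b = range(8000, 20000)

merged = hll_merge(hll_sketch(partition_a), hll_sketch(partition_b))
full_scan = hll_sketch(range(0, 20000))
estimate = hll_estimate(merged)
```

Because the merged registers equal those of a full scan, a per-partition sketch only needs to be rebuilt for the partition that changed.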
15 of 25

Analytics across data of wide time range with PXF
Data is stored in different systems based on operational requirements
§ Can I work with data created 5 seconds ago?
§ Can I run a report on data from 5 months ago?
§ Can I inspect the data archived 5 years ago?
Data is available for analytics with Greenplum no matter where it resides!
In-memory data grid / RDBMS data / Data Lake
HOT / WARM / COLD

Greenplum-Kafka Connector
Features:
§ Continual data loading
§ Fast parallel loading via GP Data Segments
§ Resume on error, once-only loading
Benefits:
§ Lower complexity of data load
§ Lower latency from event to query
§ Easier to manage unexpected events
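One common way to get "resume on error, once-only loading" is to record the last loaded offset together with the loaded rows, so a failed batch leaves no trace and a replayed batch skips what is already present. The sketch below shows that general pattern with an in-memory stand-in for the table; the `Loader` class and its fields are hypothetical and do not describe the connector's internals.

```python
class Loader:
    """Toy loader that commits rows and the consumed offset as one unit."""

    def __init__(self):
        self.table = []       # stands in for the target table
        self.committed = {}   # Kafka partition -> next offset to load

    def load_batch(self, partition, messages, fail_at=None):
        """Load (offset, payload) pairs; fail_at simulates an error at that offset."""
        start = self.committed.get(partition, 0)
        staged = []
        next_offset = start
        for offset, payload in messages:
            if offset < start:
                continue  # already loaded by an earlier batch, skip the duplicate
            if fail_at is not None and offset == fail_at:
                # Nothing staged so far is made visible: the batch is all-or-nothing.
                raise RuntimeError("simulated failure at offset %d" % offset)
            staged.append(payload)
            next_offset = offset + 1
        # "Commit": rows and offset advance together.
        self.table.extend(staged)
        self.committed[partition] = next_offset
        return len(staged)


messages = [(i, "event-%d" % i) for i in range(10)]
loader = Loader()

first = loader.load_batch(0, messages[:4])            # loads offsets 0..3
try:
    loader.load_batch(0, messages[2:8], fail_at=6)    # fails part-way through
    failed = False
except RuntimeError:
    failed = True
rows_after_failure = len(loader.table)

retried = loader.load_batch(0, messages[2:8])         # resumes at offset 4
replayed = loader.load_batch(0, messages)             # full replay adds only 8 and 9
```

After the failure, the retry picks up at the first uncommitted offset, and replaying the entire stream produces no duplicate rows.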
17 of 25

Modern Enterprise : Heterogeneous Data Formats
{ semi-structured
data }
unstructured
data
raw data
structured data
18 of 25

Greenplum Command Center
§ Database Health Indicators
§ Real Time Query Metrics
§ Locking and Blocking Views
§ Visual Explain
§ System Resource Monitoring
§ Workload Management
19 of 25

GPCC - Workload Management
§ Greenplum Command Center provides additional workload management facilities built on Resource Groups
§ Provides simplified management
§ Assign queries to workloads based on query tags or GPDB roles
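Assigning queries to workloads by query tag or role amounts to evaluating a list of rules against each incoming query. The sketch below assumes first-match-wins ordering and a fallback workload; the rule format, the tag key `appName`, and all workload names are invented for illustration and are not Command Center's actual rule syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Rule:
    """A hypothetical assignment rule: optional role plus required tag values."""
    workload: str
    role: Optional[str] = None
    tags: Dict[str, str] = field(default_factory=dict)

    def matches(self, role: str, tags: Dict[str, str]) -> bool:
        """A rule matches when its role (if set) and all of its tags agree."""
        if self.role is not None and self.role != role:
            return False
        return all(tags.get(key) == value for key, value in self.tags.items())


def assign(rules: List[Rule], role: str, tags: Dict[str, str],
           default: str = "default_workload") -> str:
    """Return the workload of the first matching rule, or the default."""
    for rule in rules:
        if rule.matches(role, tags):
            return rule.workload
    return default


rules = [
    Rule(workload="etl", tags={"appName": "nightly_load"}),
    Rule(workload="reporting", role="analyst"),
]

tagged = assign(rules, "analyst", {"appName": "nightly_load"})
by_role = assign(rules, "analyst", {})
fallback = assign(rules, "dba", {"appName": "adhoc"})
```

Putting tag rules ahead of role rules lets a tagged batch job land in its own workload even when it runs under a role that would otherwise map elsewhere.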
20 of 25

Real-time query progress monitoring
22 of 25

Query Execution insights
23 of 25

Greenplum for Kubernetes
Capabilities
§ Private and Public Clouds
§ Flexible Efficient Scaling
§ Automation, Self-Healing
§ Deployment Experience
§ Quick
§ Consistently Repeatable
§ Pre-hardened, pre-networked
§ Service Discovery
Software Appliance Benefits
§ Docker image maintained by Pivotal
§ OS Support From Pivotal, Full Stack 1 Throat to Choke
§ Consistent logging and Monitoring Environments
§ Consistent Greenplum operational environments across public, private clouds
Alana: "Give me a Greenplum Cluster"
Cluster Alana: gpdb-alana:5432
25 of 25
