6. We have a few tricks.
● What if we had a table that recorded one row per client and tracked the count of transactions for each client?
id | client_id | count_transactions
1  | 131       | 34406416
2  | 132       | 10587625
3  | 133       | 85095
What if we wired this table up to a SQL parser?
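A minimal sketch of the idea in SQL (the table and column names here are hypothetical, not the actual schema): a per-client aggregate table can answer a count query without ever touching the transaction-level fact table.

  -- Hypothetical fact table: one row per transaction
  CREATE TABLE transactions (
    id         BIGINT,
    client_id  BIGINT,
    user_id    BIGINT,
    amount     DECIMAL(12, 2)
  );

  -- Hypothetical aggregate table: one row per client
  CREATE TABLE agg_client_transactions (
    id                 BIGINT,
    client_id          BIGINT,
    count_transactions BIGINT
  );

  -- This query over the fact table...
  SELECT client_id, count(*) FROM transactions GROUP BY client_id;

  -- ...can be answered from the much smaller aggregate table instead:
  SELECT client_id, count_transactions FROM agg_client_transactions;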
7. Mondrian!
● Robust aggregate table interface
● Auto-recognizes aggregate tables via naming convention (see the sketch after this list)
● Queries are directed to the correct table
● If aggregate tables are missed, falls back to the fact table
● Can define multiple aggregate tables per fact table
● Also has an intelligent segment cache
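A rough sketch of the naming-convention point (names are hypothetical; Mondrian's actual recognition rules are pattern-based and configurable, and its aggregate tables typically carry a fact_count column): several rollups of the same fact table, each named so a pattern like agg_*_transactions can match it.

  -- Rollup of the transactions fact table, one row per client
  CREATE TABLE agg_client_transactions (
    client_id          BIGINT,
    count_transactions BIGINT,
    fact_count         BIGINT  -- how many fact rows each aggregate row summarizes
  );

  -- A second, finer-grained rollup of the same fact table: one row per client per day
  CREATE TABLE agg_client_day_transactions (
    client_id          BIGINT,
    transaction_date   DATE,
    count_transactions BIGINT,
    fact_count         BIGINT
  );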
8. But there's a problem.
● SELECT count(distinct(user_id)) FROM transactions
● Aggregate tables rely on properties of addition operations
● distinct(set_1) + distinct(set_2) != distinct(set_1 + set_2) (a small worked example follows this list)
● We have no choice but to query our fact table.
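A tiny worked example of why that inequality bites (the data is made up): if client 131's transactions come from users {1, 2} and client 132's come from users {2, 3}, the per-client distinct counts sum to 4, but the true distinct count across both clients is 3, so a per-client aggregate table cannot answer the query.

  -- True answer: 3 (users 1, 2, 3)
  SELECT count(DISTINCT user_id) FROM transactions;

  -- Summing per-client distinct counts double-counts user 2 and returns 4:
  SELECT sum(distinct_users) FROM (
    SELECT client_id, count(DISTINCT user_id) AS distinct_users
    FROM transactions
    GROUP BY client_id
  ) per_client;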
11. Much Cluster, Many Computer
● All of these solutions are using distributed systems to query lots of data quickly
● Querying 100 million rows on a single computer is not fast on current hardware
● And we are projecting to have a lot more than 100 million rows this year
16. Nope!
● Vertica has no indexes
● Vertica has “projections”, which are similar to a materialized view
● Projections are transparent to the query (like an index)
● Projections are used to optimize JOIN, GROUP BY, and other sorts of queries (see the sketch after this list)
● Provides a tool to autobuild projections based on query analysis
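A minimal sketch of what a projection looks like (the table, column, and projection names are hypothetical): it is declared up front, pre-sorted and segmented across the cluster to serve GROUP BY queries, and queries continue to reference the base table.

  -- Hypothetical projection over the transactions table,
  -- sorted by client_id and spread across all nodes:
  CREATE PROJECTION transactions_by_client (
    client_id,
    user_id,
    amount
  ) AS
  SELECT client_id, user_id, amount
  FROM transactions
  ORDER BY client_id
  SEGMENTED BY HASH(client_id) ALL NODES;

  -- Queries keep targeting the base table; the planner picks the projection:
  SELECT client_id, count(*) FROM transactions GROUP BY client_id;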
17. Tradeoffs
Columnar              | Row Based
Slow single-row read  | Fast single-row read
Slow single-row write | Fast single-row write
Fast aggregates       | Slow aggregates
Compression (5-10x)   | No compression
18. Distributed
● Data split among servers
● Horizontal scaling
● Data is compressed, so it's stored in memory
● Node failure is tolerated
● Network IO is important
20. So you just drop it in, right?
● 6 or 7 gems needed updates
● Had to roll an ActiveRecord driver
● Arel saved us from a lot of pain
● Still some SQL problems (database drop, multirow insert)
● Lots of DevOps help needed
● Currently deployed to sand and qa, hitting production soon!