SlideShare a Scribd company logo
1 of 40
Download to read offline
Introduction to Presenter
www.altinity.com
Leading software and services
provider for ClickHouse
Major committer and community
sponsor in US and Western Europe
Robert Hodges - Altinity CEO
30+ years on DBMS plus
virtualization and security.
ClickHouse is DBMS #20
Goals of the talk
● Introduce scaling axes of ClickHouse clusters
● Dig into distributed clusters
○ Using shards to scale writes
○ Using replicas to scale reads
● Describe handy tricks as well as common performance bottlenecks
Non-Goals:
● Boost performance of single nodes (though that’s important, too)
● Teach advanced ClickHouse performance management
Introduction to ClickHouse
Understands SQL
Runs on bare metal to cloud
Stores data in columns
Parallel and vectorized execution
Scales to many petabytes
Is Open source (Apache 2.0)
Is WAY fast!
a b c d
a b c d
a b c d
a b c d
ClickHouse Cluster Model
Clickhouse nodes can scale vertically
Storage
CPU
RAM
Host
Clickhouse nodes can scale vertically
Storage
CPU
RAM
Host
Clusters introduce horizontal scaling
Shards
Replicas
Host Host Host
Host
Replicas help with
concurrency
Shards add
IOPs
Different sharding and replication patterns
Shard 1
Shard 3
Shard 2
Shard 4
All Sharded
Data sharded 4
ways without
replication
Replica 1
Replica 3
Replica 2
Replica 4
All Replicated
Data replicated 4
times without
sharding
Shard 1
Replica 1
Shard 1
Replica 2
Shard 2
Replica 1
Shard 2
Replica 2
Sharded and
Replicated
Data sharded 2
ways and
replicated 2 times
Clusters define sharding and replication layouts
/etc/clickhouse-server/config.d/remote_servers.xml:
<yandex>
<remote_servers>
<ShardedAndReplicated>
<shard>
<replica><host>10.0.0.71</host><port>9000</port></replica>
<replica><host>10.0.0.72</host><port>9000</port></replica>
<internal_replication>true</internal_replication>
</shard>
<shard>
. . .
</shard>
</ShardedAndReplicated>
</remote_servers>
</yandex>
How ClickHouse uses Zookeeper
INSERT
Replicate
ClickHouse Node 1
Table: ontime
(Parts)
ReplicatedMergeTree
:9009
:9000 ClickHouse Node 2
Table: ontime
(Parts)
ReplicatedMergeTree
:9009
:9000
zookeeper-1
ZNodes
:2181 zookeeper-2
ZNodes
:2181 zookeeper-3
ZNodes
:2181
Cluster Performance in Practice
Setting up airline dataset on ClickHouse
clickhouse-0
ontime
_shard
airports
ontime
clickhouse-1
ontime
_shard
airports
ontime
clickhouse-2
ontime
_shard
airports
ontime
clickhouse-3
ontime
_shard
airports
ontime
Distributed
table
(No data)
Sharded,
replicated
table
(Partial data)
Fully
replicated
table
(All data)
Define sharded, replicated fact table
CREATE TABLE IF NOT EXISTS `ontime_shard` ON CLUSTER '{cluster}' (
`Year` UInt16,
`Quarter` UInt8,
...
)
Engine=ReplicatedMergeTree(
'/clickhouse/{cluster}/tables/{shard}/airline_shards/ontime_shard',
'{replica}')
PARTITION BY toYYYYMM(FlightDate)
ORDER BY (FlightDate, `Year`, `Month`, DepDel15)
Trick: Use macros to enable consistent DDL
/etc/clickhouse-server/config.d/macros.xml:
<yandex>
<macros>
<all>0</all>
<cluster>demo</cluster>
<shard>0</shard>
<replica>clickhouse-0</replica>
</macros>
</yandex>
Define a distributed table to query shards
CREATE TABLE IF NOT EXISTS ontime ON CLUSTER `{cluster}`
AS airline_shards.ontime_shard
ENGINE = Distributed(
'{cluster}', airline_shards, ontime_shard, rand())
Define a fully replicated dimension table
CREATE TABLE IF NOT EXISTS airports ON CLUSTER 'all-replicated' (
AirportID String,
Name String,
...
)
Engine=ReplicatedMergeTree(
'/clickhouse/{cluster}/tables/{all}/airline_shards/airports',
'{replica}')
PARTITION BY tuple()
PRIMARY KEY AirportID
ORDER BY AirportID
Overview of insertion options
● Local versus vs distributed data insertion
○ Local – no need to sync, larger blocks, faster
○ Distributed – sharding by ClickHouse
○ CHProxy -- distributes transactions across nodes
● Asynchronous (default) vs synchronous insertions
○ insert_distributed_sync
○ insert_quorum, select_sequential_consistency – linearization at replica level
Distributed vs local inserts in action
ontime
_shard
ontime
Insert via
distributed
table
Insert directly
to shards
ontime
_shard
ontime
ontime
_shard
ontime
ontime
_shard
ontime
Data
Pipeline Data
Pipeline
Pipeline is shard-aware
Slower and more
error prone
(Cache)
Testing cluster loading trade-offs
● With adequate I/O, RAM, CPU all
load options have equal
performance
● Direct loading is fastest for high
volume feeds
● Loading via distributed table is
most complex
○ Resource-inefficient
○ Can fail or lose data due to
async insertion
○ May generate more parts
○ Requires careful monitoring
CRASH!!
Selecting the sharding key
Shard 2 Shard 3Shard 1
Random Key, e.g.,
cityHash64(x) % 3
Must query
all shards
Nodes are
balanced
Shard 3
Specific Key e.g.,
TenantId % 3
Unbalanced
nodes
Queries can
skip shards
Shard 2Shard 1
Easier to
add nodes
Hard to
add nodes
Bi-level sharding combines both approaches
cityHash64(x,
TenantId %3)
Shard 2 Shard 3Shard 1
Natural Key e.g., TenantId % 3
Shard 2Shard 1
cityHash64(x,
TenantId % 3)
cityHash64(x,
TenantId % 3)
Shard 2Shard 1
Tenant-Group-1 Tenant-Group-2 Tenant-Group-3
Implement any sharding scheme via macros
/etc/clickhouse-server/config.d/macros.xml:
<yandex>
<macros>
<all>0</all>
<cluster>demo</cluster>
<group>2</group>
<shard>0</shard>
<replica>clickhouse-0</replica>
</macros>
</yandex>
CREATE TABLE IF NOT EXISTS `ontime_shard` ON CLUSTER '{cluster}' (
. . .)
Engine=ReplicatedMergeTree(
'/clickhouse/{cluster}/tables/ {group}/{shard}/airline_shards/ontime_shard',
'{replica}')
Adding nodes and rebalancing data
● To add servers:
○ Configure and bring up ClickHouse
○ Add schema
○ Add server to cluster definitions in remote_servers.xml, propagate to other servers
● Random sharding schemes allow easier addition of shards
○ Common pattern for time series--allow data to rebalance naturally over time
○ Use replication to propagate dimension tables
● Keyed partitioning schemes do not rebalance automatically
○ You can move parts manually using ALTER TABLE FREEZE PARTITION/rsync/ALTER TABLE
ATTACH PARTITION
○ Other methods work too
How do distributed queries work?
ontime
_shard
ontime
Application
ontime
_shard
ontime
ontime
_shard
ontime
ontime
_shard
ontime
Application
Move WHERE and
heavy grouping work
to innermost level
Innermost
subselect is
distributed
AggregateState
computed
locally
Aggregates
merged on
initiator node
Read performance using distributed tables
● Best case performance is linear
with number of nodes
● For fast queries network latency
may dominate parallelization
ClickHouse pushes down JOINs by default
SELECT o.Dest d, a.Name n, count(*) c, avg(o.ArrDelayMinutes) ad
FROM airline_shards_4.ontime o
JOIN airline_shards_4.airports a ON (a.IATA = o.Dest)
GROUP BY d, n HAVING c > 100000 ORDER BY d DESC
LIMIT 10
SELECT Dest AS d, Name AS n, count() AS c, avg(ArrDelayMinutes) AS ad
FROM airline_shards_4.ontime_shard AS o
ALL INNER JOIN airline_shards_4.airports AS a ON a.IATA = o.Dest
GROUP BY d, n HAVING c > 100000 ORDER BY d DESC LIMIT 10
...Unless the left side is a subquery
SELECT d, Name n, c AS flights, ad
FROM
(
SELECT Dest d, count(*) c, avg(ArrDelayMinutes) ad
FROM airline_shard_4.ontime
GROUP BY d HAVING c > 100000
ORDER BY ad DESC
)
LEFT JOIN airports ON airports.IATA = d
LIMIT 10
Remote
Servers
Distributed product modes affect joins
● ‘Normal’ IN/JOIN – run subquery locally on every server
○ Many nodes – many queries, expensive for distributed
● GLOBAL IN/JOIN - run subquery on initiator, and pass results to every server
● Distributed_product_mode alters “normal” IN/JOIN behavior :
○ deny (default)
○ allow – run queries in ‘normal’ mode, distributed subquery runs on every server, if GLOBAL
keyword is not used
○ local – use local tables for subqueries
○ global – automatically rewrite queries to ‘GLOBAL’ mode
Examples of IN operator processing
select foo from T1 where a in (select a from T2)
1) Subquery runs on a local table
select foo from T1_local
where a in (select a from T2_local)
2) Subquery runs on every node
select foo from T1_local
where a in (select a from T2)
3) Subquery runs on initiator node
create temporary table tmp Engine = Set AS select a from T2
select foo from T1_local where a in tmp;
Distributed query limitations and advice
● If joined table is missing, pushdown will fail
● Releases prior to 20.1 do not push down row-level security predicates
● Fully qualify table names to avoid syntax errors
● Distributed join behavior still somewhat limited
Settings to control distributed query
● distributed_product_mode -- How to handle joins of 2 distributed tables
● enable_optimize_predicate_expression -- Push down predicates
● max_replica_delay_for_distributed_queries -- Maximum permitted delay on
replicas
● load_balancing -- Load balancing algorithm to choose replicas
● prefer_localhost_replica -- Whether to try to use local replica first for queries
● optimize_skip_unused_shards -- One of several settings to avoid shards if
possible
(Plus others…)
Advanced Topics
Capacity planning for clusters
(Based on CloudFlare approach, see Resources slide below)
1. Test capacity on single nodes first
a. Understand contention between INSERTs, background merges, and SELECTs
b. Understand single node scaling issues (e.g., mark cache sizes)
2. If you can support your design ceiling with a single shard, stop here
a. Ensure you have HA covered, though
3. Build the cluster
4. Test full capacity on the cluster
a. Add shards to handle INSERTs
b. Add replicas to handle SELECTs
Debugging slow node problems
Distributed queries are only as fast as the slowest node
Use the remote() table function to test performance across the cluster. Example:
SELECT sum(Cancelled) AS cancelled_flights
FROM remote('clickhouse-0', airline_shards_4.ontime_shard)
SELECT sum(Cancelled) AS cancelled_flights
FROM remote('clickhouse-1', airline_shards_4.ontime_shard)
. . .
A non-exhaustive list of things that go wrong
Zookeeper becomes a bottleneck (avoid excessive numbers of parts)
Choosing a bad partition key
Degraded systems
Insufficient monitoring
Wrap-up and Further Resources
Key takeaways
● Shards add read/write capacity over a dataset (IOPs)
● Replicas enable more concurrent reads
● Choose sharding keys and clustering patterns with care
● Insert directly to shards for best performance
● Distributed query behavior is more complex than MergeTree
● It’s a big distributed system. Plan for things to go wrong
Well-managed clusters are extremely fast! Check your setup if you are not getting
good performance.
Resources
● Altinity Blog
● Secrets of ClickHouse Query Performance -- Altinity Webinar
● ClickHouse Capacity Planning for OLAP Workloads by Mik Kocikowski of
CloudFlare
● ClickHouse Telegram Channel
● ClickHouse Slack Channel
Thank you!
Special Offer:
Contact us for a
1-hour consultation!
Contacts:
info@altinity.com
Visit us at:
https://www.altinity.com
Free Consultation:
https://blog.altinity.com/offer

More Related Content

What's hot

ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOAltinity Ltd
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...Altinity Ltd
 
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, CloudflareClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, CloudflareAltinity Ltd
 
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEODangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEOAltinity Ltd
 
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...Altinity Ltd
 
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesA Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesAltinity Ltd
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAltinity Ltd
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAltinity Ltd
 
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Altinity Ltd
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...Altinity Ltd
 
ClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic ContinuesClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic ContinuesAltinity Ltd
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouseAltinity Ltd
 
Altinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdfAltinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdfAltinity Ltd
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...Altinity Ltd
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovAltinity Ltd
 
Creating Beautiful Dashboards with Grafana and ClickHouse
Creating Beautiful Dashboards with Grafana and ClickHouseCreating Beautiful Dashboards with Grafana and ClickHouse
Creating Beautiful Dashboards with Grafana and ClickHouseAltinity Ltd
 
10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouserpolat
 
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...Altinity Ltd
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseAltinity Ltd
 

What's hot (20)

ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
 
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, CloudflareClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
 
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEODangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
 
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
 
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesA Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdf
 
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...
 
ClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic ContinuesClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic Continues
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
 
Altinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdfAltinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdf
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei Milovidov
 
Creating Beautiful Dashboards with Grafana and ClickHouse
Creating Beautiful Dashboards with Grafana and ClickHouseCreating Beautiful Dashboards with Grafana and ClickHouse
Creating Beautiful Dashboards with Grafana and ClickHouse
 
10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse
 
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouse
 

Similar to Webinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance

CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query ExecutionJ Singh
 
MongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo SeattleMongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo SeattleMongoDB
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Altinity Ltd
 
Google Cluster Innards
Google Cluster InnardsGoogle Cluster Innards
Google Cluster InnardsMartin Dvorak
 
Distributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache StormDistributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache Stormthe100rabh
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
CoreOS, or How I Learned to Stop Worrying and Love Systemd
CoreOS, or How I Learned to Stop Worrying and Love SystemdCoreOS, or How I Learned to Stop Worrying and Love Systemd
CoreOS, or How I Learned to Stop Worrying and Love SystemdRichard Lister
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle CoherenceBen Stopford
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinaloscon2007
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinaloscon2007
 
Getting the-best-out-of-d3 d12
Getting the-best-out-of-d3 d12Getting the-best-out-of-d3 d12
Getting the-best-out-of-d3 d12mistercteam
 
Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Jelena Zanko
 
Spark Structured APIs
Spark Structured APIsSpark Structured APIs
Spark Structured APIsKnoldus Inc.
 
Azure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep DiveAzure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep DiveAndre Essing
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsZvi Avraham
 
AWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAmazon Web Services
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0Joe Stein
 

Similar to Webinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance (20)

CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
 
MongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo SeattleMongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo Seattle
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
 
Google Cluster Innards
Google Cluster InnardsGoogle Cluster Innards
Google Cluster Innards
 
Distributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache StormDistributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache Storm
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
CoreOS, or How I Learned to Stop Worrying and Love Systemd
CoreOS, or How I Learned to Stop Worrying and Love SystemdCoreOS, or How I Learned to Stop Worrying and Love Systemd
CoreOS, or How I Learned to Stop Worrying and Love Systemd
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle Coherence
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinal
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinal
 
Getting the-best-out-of-d3 d12
Getting the-best-out-of-d3 d12Getting the-best-out-of-d3 d12
Getting the-best-out-of-d3 d12
 
Ibm db2 case study
Ibm db2 case studyIbm db2 case study
Ibm db2 case study
 
Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20
 
Spark Structured APIs
Spark Structured APIsSpark Structured APIs
Spark Structured APIs
 
Azure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep DiveAzure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep Dive
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming Models
 
Data race
Data raceData race
Data race
 
AWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing Performance
 
Accordion - VLDB 2014
Accordion - VLDB 2014Accordion - VLDB 2014
Accordion - VLDB 2014
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
 

More from Altinity Ltd

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxAltinity Ltd
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Altinity Ltd
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceAltinity Ltd
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfAltinity Ltd
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfAltinity Ltd
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Altinity Ltd
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Altinity Ltd
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfAltinity Ltd
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsAltinity Ltd
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache PinotAltinity Ltd
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Ltd
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...Altinity Ltd
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfAltinity Ltd
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...Altinity Ltd
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...Altinity Ltd
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...Altinity Ltd
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...Altinity Ltd
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...Altinity Ltd
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfAltinity Ltd
 
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...Altinity Ltd
 

More from Altinity Ltd (20)

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdf
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom Apps
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
 
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
 

Recently uploaded

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Recently uploaded (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Webinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance

  • 1.
  • 2. Introduction to Presenter www.altinity.com Leading software and services provider for ClickHouse Major committer and community sponsor in US and Western Europe Robert Hodges - Altinity CEO 30+ years on DBMS plus virtualization and security. ClickHouse is DBMS #20
  • 3. Goals of the talk ● Introduce scaling axes of ClickHouse clusters ● Dig into distributed clusters ○ Using shards to scale writes ○ Using replicas to scale reads ● Describe handy tricks as well as common performance bottlenecks Non-Goals: ● Boost performance of single nodes (though that’s important, too) ● Teach advanced ClickHouse performance management
  • 4. Introduction to ClickHouse Understands SQL Runs on bare metal to cloud Stores data in columns Parallel and vectorized execution Scales to many petabytes Is Open source (Apache 2.0) Is WAY fast! a b c d a b c d a b c d a b c d
  • 6. Clickhouse nodes can scale vertically Storage CPU RAM Host
  • 7. Clickhouse nodes can scale vertically Storage CPU RAM Host
  • 8. Clusters introduce horizontal scaling Shards Replicas Host Host Host Host Replicas help with concurrency Shards add IOPs
  • 9. Different sharding and replication patterns Shard 1 Shard 3 Shard 2 Shard 4 All Sharded Data sharded 4 ways without replication Replica 1 Replica 3 Replica 2 Replica 4 All Replicated Data replicated 4 times without sharding Shard 1 Replica 1 Shard 1 Replica 2 Shard 2 Replica 1 Shard 2 Replica 2 Sharded and Replicated Data sharded 2 ways and replicated 2 times
  • 10. Clusters define sharding and replication layouts /etc/clickhouse-server/config.d/remote_servers.xml: <yandex> <remote_servers> <ShardedAndReplicated> <shard> <replica><host>10.0.0.71</host><port>9000</port></replica> <replica><host>10.0.0.72</host><port>9000</port></replica> <internal_replication>true</internal_replication> </shard> <shard> . . . </shard> </ShardedAndReplicated> </remote_servers> </yandex>
  • 11. How ClickHouse uses Zookeeper INSERT Replicate ClickHouse Node 1 Table: ontime (Parts) ReplicatedMergeTree :9009 :9000 ClickHouse Node 2 Table: ontime (Parts) ReplicatedMergeTree :9009 :9000 zookeeper-1 ZNodes :2181 zookeeper-2 ZNodes :2181 zookeeper-3 ZNodes :2181
  • 13. Setting up airline dataset on ClickHouse clickhouse-0 ontime _shard airports ontime clickhouse-1 ontime _shard airports ontime clickhouse-2 ontime _shard airports ontime clickhouse-3 ontime _shard airports ontime Distributed table (No data) Sharded, replicated table (Partial data) Fully replicated table (All data)
  • 14. Define sharded, replicated fact table CREATE TABLE IF NOT EXISTS `ontime_shard` ON CLUSTER '{cluster}' ( `Year` UInt16, `Quarter` UInt8, ... ) Engine=ReplicatedMergeTree( '/clickhouse/{cluster}/tables/{shard}/airline_shards/ontime_shard', '{replica}') PARTITION BY toYYYYMM(FlightDate) ORDER BY (FlightDate, `Year`, `Month`, DepDel15)
  • 15. Trick: Use macros to enable consistent DDL /etc/clickhouse-server/config.d/macros.xml: <yandex> <macros> <all>0</all> <cluster>demo</cluster> <shard>0</shard> <replica>clickhouse-0</replica> </macros> </yandex>
  • 16. Define a distributed table to query shards CREATE TABLE IF NOT EXISTS ontime ON CLUSTER `{cluster}` AS airline_shards.ontime_shard ENGINE = Distributed( '{cluster}', airline_shards, ontime_shard, rand())
  • 17. Define a fully replicated dimension table CREATE TABLE IF NOT EXISTS airports ON CLUSTER 'all-replicated' ( AirportID String, Name String, ... ) Engine=ReplicatedMergeTree( '/clickhouse/{cluster}/tables/{all}/airline_shards/airports', '{replica}') PARTITION BY tuple() PRIMARY KEY AirportID ORDER BY AirportID
  • 18. Overview of insertion options ● Local versus vs distributed data insertion ○ Local – no need to sync, larger blocks, faster ○ Distributed – sharding by ClickHouse ○ CHProxy -- distributes transactions across nodes ● Asynchronous (default) vs synchronous insertions ○ insert_distributed_sync ○ insert_quorum, select_sequential_consistency – linearization at replica level
  • 19. Distributed vs local inserts in action ontime _shard ontime Insert via distributed table Insert directly to shards ontime _shard ontime ontime _shard ontime ontime _shard ontime Data Pipeline Data Pipeline Pipeline is shard-aware Slower and more error prone (Cache)
  • 20. Testing cluster loading trade-offs ● With adequate I/O, RAM, CPU all load options have equal performance ● Direct loading is fastest for high volume feeds ● Loading via distributed table is most complex ○ Resource-inefficient ○ Can fail or lose data due to async insertion ○ May generate more parts ○ Requires careful monitoring CRASH!!
  • 21. Selecting the sharding key Shard 2 Shard 3Shard 1 Random Key, e.g., cityHash64(x) % 3 Must query all shards Nodes are balanced Shard 3 Specific Key e.g., TenantId % 3 Unbalanced nodes Queries can skip shards Shard 2Shard 1 Easier to add nodes Hard to add nodes
  • 22. Bi-level sharding combines both approaches cityHash64(x, TenantId %3) Shard 2 Shard 3Shard 1 Natural Key e.g., TenantId % 3 Shard 2Shard 1 cityHash64(x, TenantId % 3) cityHash64(x, TenantId % 3) Shard 2Shard 1 Tenant-Group-1 Tenant-Group-2 Tenant-Group-3
  • 23. Implement any sharding scheme via macros /etc/clickhouse-server/config.d/macros.xml: <yandex> <macros> <all>0</all> <cluster>demo</cluster> <group>2</group> <shard>0</shard> <replica>clickhouse-0</replica> </macros> </yandex> CREATE TABLE IF NOT EXISTS `ontime_shard` ON CLUSTER '{cluster}' ( . . .) Engine=ReplicatedMergeTree( '/clickhouse/{cluster}/tables/ {group}/{shard}/airline_shards/ontime_shard', '{replica}')
  • 24. Adding nodes and rebalancing data ● To add servers: ○ Configure and bring up ClickHouse ○ Add schema ○ Add server to cluster definitions in remote_servers.xml, propagate to other servers ● Random sharding schemes allow easier addition of shards ○ Common pattern for time series--allow data to rebalance naturally over time ○ Use replication to propagate dimension tables ● Keyed partitioning schemes do not rebalance automatically ○ You can move parts manually using ALTER TABLE FREEZE PARTITION/rsync/ALTER TABLE ATTACH PARTITION ○ Other methods work too
  • 25. How do distributed queries work? ontime _shard ontime Application ontime _shard ontime ontime _shard ontime ontime _shard ontime Application Move WHERE and heavy grouping work to innermost level Innermost subselect is distributed AggregateState computed locally Aggregates merged on initiator node
  • 26. Read performance using distributed tables ● Best case performance is linear with number of nodes ● For fast queries network latency may dominate parallelization
  • 27. ClickHouse pushes down JOINs by default SELECT o.Dest d, a.Name n, count(*) c, avg(o.ArrDelayMinutes) ad FROM airline_shards_4.ontime o JOIN airline_shards_4.airports a ON (a.IATA = o.Dest) GROUP BY d, n HAVING c > 100000 ORDER BY d DESC LIMIT 10 SELECT Dest AS d, Name AS n, count() AS c, avg(ArrDelayMinutes) AS ad FROM airline_shards_4.ontime_shard AS o ALL INNER JOIN airline_shards_4.airports AS a ON a.IATA = o.Dest GROUP BY d, n HAVING c > 100000 ORDER BY d DESC LIMIT 10
  • 28. ...Unless the left side is a subquery SELECT d, Name n, c AS flights, ad FROM ( SELECT Dest d, count(*) c, avg(ArrDelayMinutes) ad FROM airline_shard_4.ontime GROUP BY d HAVING c > 100000 ORDER BY ad DESC ) LEFT JOIN airports ON airports.IATA = d LIMIT 10 Remote Servers
  • 29. Distributed product modes affect joins ● ‘Normal’ IN/JOIN – run subquery locally on every server ○ Many nodes – many queries, expensive for distributed ● GLOBAL IN/JOIN - run subquery on initiator, and pass results to every server ● Distributed_product_mode alters “normal” IN/JOIN behavior : ○ deny (default) ○ allow – run queries in ‘normal’ mode, distributed subquery runs on every server, if GLOBAL keyword is not used ○ local – use local tables for subqueries ○ global – automatically rewrite queries to ‘GLOBAL’ mode
  • 30. Examples of IN operator processing select foo from T1 where a in (select a from T2) 1) Subquery runs on a local table select foo from T1_local where a in (select a from T2_local) 2) Subquery runs on every node select foo from T1_local where a in (select a from T2) 3) Subquery runs on initiator node create temporary table tmp Engine = Set AS select a from T2 select foo from T1_local where a in tmp;
  • 31. Distributed query limitations and advice ● If joined table is missing, pushdown will fail ● Releases prior to 20.1 do not push down row-level security predicates ● Fully qualify table names to avoid syntax errors ● Distributed join behavior still somewhat limited
  • 32. Settings to control distributed query ● distributed_product_mode -- How to handle joins of 2 distributed tables ● enable_optimize_predicate_expression -- Push down predicates ● max_replica_delay_for_distributed_queries -- Maximum permitted delay on replicas ● load_balancing -- Load balancing algorithm to choose replicas ● prefer_localhost_replica -- Whether to try to use local replica first for queries ● optimize_skip_unused_shards -- One of several settings to avoid shards if possible (Plus others…)
  • 34. Capacity planning for clusters (Based on CloudFlare approach, see Resources slide below) 1. Test capacity on single nodes first a. Understand contention between INSERTs, background merges, and SELECTs b. Understand single node scaling issues (e.g., mark cache sizes) 2. If you can support your design ceiling with a single shard, stop here a. Ensure you have HA covered, though 3. Build the cluster 4. Test full capacity on the cluster a. Add shards to handle INSERTs b. Add replicas to handle SELECTs
  • 35. Debugging slow node problems Distributed queries are only as fast as the slowest node Use the remote() table function to test performance across the cluster. Example: SELECT sum(Cancelled) AS cancelled_flights FROM remote('clickhouse-0', airline_shards_4.ontime_shard) SELECT sum(Cancelled) AS cancelled_flights FROM remote('clickhouse-1', airline_shards_4.ontime_shard) . . .
  • 36. A non-exhaustive list of things that go wrong Zookeeper becomes a bottleneck (avoid excessive numbers of parts) Choosing a bad partition key Degraded systems Insufficient monitoring
  • 37. Wrap-up and Further Resources
  • 38. Key takeaways ● Shards add read/write capacity over a dataset (IOPs) ● Replicas enable more concurrent reads ● Choose sharding keys and clustering patterns with care ● Insert directly to shards for best performance ● Distributed query behavior is more complex than MergeTree ● It’s a big distributed system. Plan for things to go wrong Well-managed clusters are extremely fast! Check your setup if you are not getting good performance.
  • 39. Resources ● Altinity Blog ● Secrets of ClickHouse Query Performance -- Altinity Webinar ● ClickHouse Capacity Planning for OLAP Workloads by Mik Kocikowski of CloudFlare ● ClickHouse Telegram Channel ● ClickHouse Slack Channel
  • 40. Thank you! Special Offer: Contact us for a 1-hour consultation! Contacts: info@altinity.com Visit us at: https://www.altinity.com Free Consultation: https://blog.altinity.com/offer