SlideShare uma empresa Scribd logo
1 de 47
Baixar para ler offline
© 2022 Altinity, Inc.
A Day in the Life of
a ClickHouse Query
Intro to ClickHouse Internals
Robert Hodges & Altinity Engineering
1
10 February 2022
© 2022 Altinity, Inc.
Let’s make some introductions
ClickHouse support and services including Altinity.Cloud
Authors of Altinity Kubernetes Operator for ClickHouse
and other open source projects
Robert Hodges
Database geek with 30+ years
on DBMS systems. Day job:
Altinity CEO
Altinity Engineering
Database geeks with centuries
of experience in DBMS and
applications
2
© 2022 Altinity, Inc.
© 2021 Altinity, Inc.
Foundations
3
© 2022 Altinity, Inc.
Understands SQL
Runs on bare metal to cloud
Shared nothing architecture
Stores data in columns
Parallel and vectorized execution
Scales to many petabytes
Is Open source (Apache 2.0)
ClickHouse is a SQL Data Warehouse
It’s the core engine for
real-time analytics
ClickHouse
Event
Streams
ELT
Object
Storage
Interactive
Graphics
Dashboards
APIs
4
© 2022 Altinity, Inc.
If you understand the engine you can make it faster
ClickHouse has a simple execution model–there’s no magic
Any developer can understand how it works
Knowledge leads to faster and more efficient queries
5
(Another fast
engine!)
© 2022 Altinity, Inc.
© 2021 Altinity, Inc.
What happens
when you insert
data?
6
© 2022 Altinity, Inc.
Let’s create a table!
CREATE TABLE IF NOT EXISTS sdata (
DevId Int32,
Type String,
MDate Date,
MDatetime DateTime,
Value Float64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(MDate)
ORDER BY (DevId, MDatetime)
7
Table engine type
How to break data into
parts
How to index and sort
data in each part
Table columns
© 2022 Altinity, Inc.
Let’s now insert some data…
INSERT INTO sdata VALUES
(15, 'TEMP', '2018-01-01', '2018-01-01 23:29:55', 18.0),
(15, 'TEMP', '2018-01-01', '2018-01-01 23:30:56', 18.7)
8
(This is an example. Most people don’t
insert data this way!)
© 2022 Altinity, Inc.
Part in Storage
How does ClickHouse process an insert?
9
INSERT INTO sdata
VALUES
(15, 'TEMP', . . .),
(15, 'TEMP', . . .)
2 rows in set. Elapsed: 0.271 sec.
ClickHouse Server
Parse/Plan
Respond
Load
Part in RAM
Sort rows ( table ORDER BY )
© 2022 Altinity, Inc.
Part in Storage
How can we make this more efficient? Parallelize!
10
set max_insert_threads=4
insert into ontime_test
select * from ontime
where toYear(FlightDate)
between 2000 and 2001
2 rows in set. Elapsed: 0.271 sec.
ClickHouse Server
Parse/Plan
Respond
Load
Parts in RAM
© 2022 Altinity, Inc.
Parallelism affects speed and memory usage
11
insert into ontime_test
select * from ontime
where toYear(FlightDate)
between 2000 and 2001
set max_insert_threads=1
. . .
set max_insert_threads=2
. . .
set max_insert_threads=4
© 2022 Altinity, Inc.
OK, where did those awesome stats come from?
SELECT
event_time,
type,
is_initial_query,
query_duration_ms / 1000 AS duration,
read_rows,
read_bytes,
result_rows,
formatReadableSize(memory_usage) AS memory,
query
FROM system.query_log
WHERE (user = 'default') AND (type = 'QueryFinish')
ORDER BY event_time DESC
LIMIT 50
12
© 2022 Altinity, Inc.
What’s going on down there when you INSERT?
Table
Part
Index Columns
Sparse index finds rows by Carrier,
Origin, FlightDate
Columns sorted
by Carrier, Origin,
FlightDate
Rows in the part
all belong to
same Year
Part
Index Columns
Part 13
© 2022 Altinity, Inc.
Understanding what’s in a MergeTree part
/var/lib/clickhouse/data/airline/ontime
2017-07-01 AA
2017-07-01 EV
2017-07-01 UA
2017-07-02 AA
...
primary.idx
|
|
|
|
.mrk .bin
20170701_20170731_355_355_2/
(FlightDate, Carrier...) ActualElapsedTime Airline AirlineID...
|
|
|
|
.mrk .bin
|
|
|
|
.mrk .bin
Granule
Compressed
Block
Mark
14
© 2022 Altinity, Inc.
Why MergeTree? Because it merges!
Part
Index Columns
Part
Index Columns
Rewritten, Bigger Part
Index Columns
15
Update and delete also rewrite parts
© 2022 Altinity, Inc.
Bigger parts are more efficient!
● Pick a PARTITION BY that gives nice, fat partitions ( 1-300GB, < 1000
total parts per table)
○ Can’t decide? Partition by month.
● Insert large blocks of data to avoid lots of merges afterwards
○ ClickHouse is fine with tens of millions of rows!
● The simplest way to make blocks bigger is to batch input data
○ Avoid different partition keys in the same block
○ ClickHouse has parameters like max_insert_block_size but defaults are OK
○ Look at logs and actual part sizes to see if you need to do more
16
© 2022 Altinity, Inc.
How can I see how big table parts are?
SELECT
table, partition, name,
marks, rows, data_compressed_bytes,
data_uncompressed_bytes, bytes_on_disk
FROM system.parts
WHERE active
AND level=0
AND database = 'default'
AND table = 'ontime_test'
ORDER BY table DESC, partition ASC, name ASC
17
Part is in use
Part has not been merged
© 2022 Altinity, Inc.
Tips to optimized INSERT
Making INSERT faster
● Increase max_insert_threads (parallel creation of parts)
● Enable input_format_parallel_parsing to parallelize input parsing
○ Works for TSV/CSV/Values data
● Write bigger blocks (less merging afterwards)
Making INSERT less memory intensive
● Decrease max_insert_threads (reduces parts simultaneously in memory)
● Disable input_format_parallel_parsing
● Write smaller blocks (less memory required at INSERT time)
18
© 2022 Altinity, Inc.
© 2021 Altinity, Inc.
How do basic
queries work?
19
© 2022 Altinity, Inc.
Aggregation is a key feature of analytic queries
20
SELECT Carrier,
avg(DepDelay) AS Delay
FROM ontime
GROUP BY Carrier
ORDER BY Delay DESC
Aggregates group measurements for one more more dimensions
Aggregate
Function
Dimension
© 2022 Altinity, Inc.
How does ClickHouse process a query with aggregates?
21
SELECT Carrier,
avg(DepDelay)AS Delay
FROM ontime
GROUP BY Carrier
ORDER BY Delay DESC
┌─FlightDate─┬──────────────Delay─┐
│ 2016-12-17 │ 53.72058516196447 │
│ 1990-12-21 │ 43.87767499654839 │
│ 1990-12-22 │ 43.64346952962076 │
. . .
ClickHouse Server
Parse/Plan
Merge/Sort
Scan
In-RAM
Hash
Tables
Parts in Storage
© 2022 Altinity, Inc.
How can you compute an average in parallel?
22
= 2
Numerator=6
Denominator=3
Scan
Merge
1 2 3 1 3 5 0 5 0 0
Numerator=9
Denominator=3
Numerator=5
Denominator=4
6 + 9 + 5
3 + 3 + 4 Partial
Aggregate
© 2022 Altinity, Inc.
How does a ClickHouse thread do aggregation?
23
Merge/Sort
Scan Thread
Parts in Storage
AL => 4259/1070,
2385/415, …
DL => 20663/1198,
25166/2711, …
… Scan Thread
Hash Table
Other
Scan
Thread
Hash
Tables
Result
GROUP BY
Key
Partial
Aggregates
© 2022 Altinity, Inc.
We can now understand aggregation performance drivers
24
SELECT Carrier,
avg(DepDelay)AS Delay
FROM ontime
GROUP BY Carrier
ORDER BY Delay DESC
LIMIT 50
Simple aggregate, short
GROUP BY key with few values
SELECT Carrier, FlightDate,
avg(DepDelay) AS Delay,
uniqExact(TailNum) AS Aircraft
FROM ontime
GROUP BY Carrier, FlightDate
ORDER BY Delay DESC
LIMIT 50
More complex aggregates, longer
GROUP BY with more values
3.4 sec
2.4 GB RAM
0.84 sec
1.6 KB RAM
© 2022 Altinity, Inc.
Parallelism affects speed and memory usage
25
SELECT Origin, FlightDate,
avg(DepDelay) AS Delay,
uniqExact(TailNum) AS Aircraft
FROM ontime
WHERE Carrier='WN'
GROUP BY Origin, FlightDate
ORDER BY Delay DESC
LIMIT 5
SET max_threads = 1
. . .
SET max_threads = 4
. . .
SET max_threads = 16
© 2022 Altinity, Inc.
This is a good time to mention ClickHouse memory limits
26
SQL
Query
max_memory_usage
(Default=10Gb)
Single query limit
SQL
Query
max_memory_usage_for_user
(Default=Unlimited)
All queries for a user
SQL
Query
SQL
Query
SQL
Query
All memory on server
ClickHouse
Process
max_server _memory_usage
(Default=90% of available RAM)
© 2022 Altinity, Inc.
Tips to make aggregation queries faster
● Remove/exchange “heavy” aggregation functions
● Reduce the number of values in GROUP BY
● Increase max_threads (parallelism)
● Reduce I/O
○ Filter out unnecessary rows
○ Improve compression of data in storage
27
© 2022 Altinity, Inc.
Tips to reduce memory usage in aggregation queries
● Remove/exchange “heavy” aggregation functions
● Reduce number of values in GROUP BY
● Change max_threads value
● Dump aggregates to external storage
○ SET max_bytes_before_external_group_by > 0
● Filter out unnecessary rows
28
© 2022 Altinity, Inc.
© 2021 Altinity, Inc.
How do joins
work?
29
© 2022 Altinity, Inc.
JOIN combines data between tables
SELECT o.Dest,
any(a.Name) AS AirportName,
count(Dest) AS Flights
FROM ontime o JOIN airports a
ON a.IATA = o.Dest
GROUP BY Dest
ORDER BY Flights
DESC LIMIT 10
30
Join condition
“Left”
Table
“Right”
Table
© 2022 Altinity, Inc.
How does ClickHouse process a query with a join?
Left Side
Table
(Big)
Merged
Result
Filter
Right Side
Table
(Small)
In-RAM
Hash
Table
Load the right side table
Scan left side table in
parallel, computing
aggregates and adding
joined data
Merge and sort results
31
Keys and
column
values!
© 2022 Altinity, Inc.
Let’s look more deeply at what’s happening in the scan
. . .
IATA
. . .
airports
. . .
Dest
. . .
ontime
SELECT . . . FROM ontime o JOIN airports a ON a.IATA = o.Dest
32
S
c
a
n
ATL 576
1501
3302
…
Hartsfield Jackson Atlanta International Airport
Hartsfield Jackson Atlanta International Airport
Hartsfield Jackson Atlanta International Airport
…
ORD 255 Chicago O'Hare International Airport
Partial
aggregates
© 2022 Altinity, Inc.
It would be more efficient to join after aggregating
Left Side
Table
(Big)
Merge
Filter
Right Side
Table
(Small)
In-RAM
Hash
Table
Load the right side table
Scan left side table in
parallel, computing
aggregates
Merge, then join and sort
33
Join
© 2022 Altinity, Inc.
You can do exactly that with a subquery
SELECT o.Dest, any(a.Name) AS AirportName,
count(Dest) AS Flights
FROM ontime o
JOIN default.airports a ON a.IATA = o.Dest
GROUP BY Dest ORDER BY Flights
DESC LIMIT 10
SELECT o.Dest, a.Name AS AirportName, o.Flights
FROM (
SELECT Dest, count(Dest) AS Flights
FROM ontime GROUP BY Dest ) AS o
JOIN default.airports a ON a.IATA = o.Dest
ORDER BY Flights DESC LIMIT 10
34
2.71 sec
19.9 MB RAM
0.663 sec
1.58 KB RAM
© 2022 Altinity, Inc.
Simple ways to keep JOINs fast and efficient
● Keep the right side table(s) overall size small
● Minimize the columns joined from the right side
● Add filter conditions to the right side table to reduce rows
● JOIN after aggregation if possible
● Use a Dictionary instead of a JOIN
○ Dictionaries are just loaded once and can be shared across queries
35
Pro tip: The SQL IN operator is also a join under the covers.
© 2022 Altinity, Inc.
© 2021 Altinity, Inc.
How does a
distributed query
work?
36
© 2022 Altinity, Inc.
Example of a distributed data set with shards and replicas
clickhouse-0
ontime
_local
airports
ontime
clickhouse-1
ontime
_local
airports
ontime
clickhouse-2
ontime
_local
airports
ontime
clickhouse-3
ontime
_local
airports
ontime
Distributed
table
(No data)
Sharded,
replicated
table
(Partial data)
Fully
replicated
table
(All data)
37
© 2022 Altinity, Inc.
Distributed send subqueries to multiple nodes
ontime
_local
ontime
Application
ontime
_local
ontime
ontime
_local
ontime
ontime
_local
ontime
Application
Innermost
subselect is
distributed
AggregateState
computed
locally
Aggregates
merged on
initiator node
38
© 2022 Altinity, Inc.
Queries are pushed to all shards
SELECT Carrier, avg(DepDelay) AS Delay
FROM ontime
GROUP BY Carrier ORDER BY Delay DESC
SELECT Carrier, avg(DepDelay) AS Delay
FROM ontime_local
GROUP BY Carrier ORDER BY Delay DESC
39
© 2022 Altinity, Inc.
ClickHouse pushes down JOINs by default
SELECT o.Dest d, a.Name n, count(*) c, avg(o.ArrDelayMinutes) ad
FROM default.ontime o
JOIN default.airports a ON (a.IATA = o.Dest)
GROUP BY d, n HAVING c > 100000 ORDER BY d DESC
LIMIT 10
SELECT Dest AS d, Name AS n, count() AS c, avg(ArrDelayMinutes) AS
ad
FROM default.ontime_local AS o
ALL INNER JOIN default.airports AS a ON a.IATA = o.Dest
GROUP BY d, n HAVING c > 100000 ORDER BY d DESC LIMIT 10
40
© 2022 Altinity, Inc.
...Unless the left side “table” is a subquery
SELECT d, Name n, c AS flights, ad
FROM
(
SELECT Dest d, count(*) c, avg(ArrDelayMinutes) ad
FROM default.ontime
GROUP BY d HAVING c > 100000
ORDER BY ad DESC
) AS o
LEFT JOIN airports ON airports.IATA = o.d
LIMIT 10
Remote
Servers
41
© 2022 Altinity, Inc.
It’s more complex when multiple tables are distributed
select foo from T1 where a in (select a from T2)
distributed_product_mode=?
local
select foo
from T1_local
where a in (
select a
from T2_local)
allow
select foo
from T1_local
where a in (
select a
from T2)
global
create temporary table
tmp Engine = Set
AS select a from T2;
select foo from
T1_local where a in
tmp;
(Subquery runs on
local table)
(Subquery runs on
distributed table) (Subquery runs on initiator;
broadcast to local temp table)
42
© 2022 Altinity, Inc.
Tips to make distributed queries more efficient
● Think about where your data are located
● Move WHERE and heavy grouping work to left hand side of join
● Use a subquery to order joins after the remote scan
● Use the query_log to see what actually executes on the remote node(s)
43
© 2022 Altinity, Inc.
© 2021 Altinity, Inc.
Where to learn
more
44
© 2022 Altinity, Inc.
Where is the documentation?
ClickHouse official docs – https://clickhouse.com/docs/
Altinity Blog – https://altinity.com/blog/
Altinity Youtube Channel –
https://www.youtube.com/channel/UCE3Y2lDKl_ZfjaCrh62onYA
Altinity Knowledge Base – https://kb.altinity.com/
Meetups, other blogs, and external resources. Use your powers of Search!
45
© 2022 Altinity, Inc.
References for this talk
Altinity Knowledge Base – https://kb.altinity.com/
ClickHouse Source Code – https://github.com/ClickHouse/ClickHouse
Talks and Blog Articles -
● ClickHouse Deep Dive, Alexey Milovidov
● Про JOIN’ы (в ClickHouse) - Artyem Zuikov
● Модификаторы DISTINCT и ORDER BY для всех агрегатных функций -
Sofia Sergeevna Borzenkova
● ClickHouse Kernel Analysis - Storage Structure and Query Acceleration of
MergeTree - Alibaba Cloud
46
© 2022 Altinity, Inc.
Thank you!
Questions?
https://altinity.com
47

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouse
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
 
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
 
ClickHouse Intro
ClickHouse IntroClickHouse Intro
ClickHouse Intro
 
ClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic ContinuesClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic Continues
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
 
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesWebinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
 
ClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and howClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and how
 
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEODangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei Milovidov
 
[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse
 
ClickHouse Keeper
ClickHouse KeeperClickHouse Keeper
ClickHouse Keeper
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
 
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, CloudflareClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
 
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesA Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
 
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
 
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
 
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
 
Webinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance
Webinar: Strength in Numbers: Introduction to ClickHouse Cluster PerformanceWebinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance
Webinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance
 

Semelhante a A Day in the Life of a ClickHouse Query Webinar Slides

Developers' mDay 2017. - Bogdan Kecman Oracle
Developers' mDay 2017. - Bogdan Kecman OracleDevelopers' mDay 2017. - Bogdan Kecman Oracle
Developers' mDay 2017. - Bogdan Kecman Oracle
mCloud
 

Semelhante a A Day in the Life of a ClickHouse Query Webinar Slides (20)

Altinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdfAltinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdf
 
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
 
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
 
ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...
 
Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]
 
Data warehouse or conventional database: Which is right for you?
Data warehouse or conventional database: Which is right for you?Data warehouse or conventional database: Which is right for you?
Data warehouse or conventional database: Which is right for you?
 
Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"
 
How to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'rollHow to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'roll
 
Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with Postgres
 
Creating Beautiful Dashboards with Grafana and ClickHouse
Creating Beautiful Dashboards with Grafana and ClickHouseCreating Beautiful Dashboards with Grafana and ClickHouse
Creating Beautiful Dashboards with Grafana and ClickHouse
 
Hash join use memory optimization
Hash join use memory optimizationHash join use memory optimization
Hash join use memory optimization
 
Common Table Expressions (CTE) & Window Functions in MySQL 8.0
Common Table Expressions (CTE) & Window Functions in MySQL 8.0Common Table Expressions (CTE) & Window Functions in MySQL 8.0
Common Table Expressions (CTE) & Window Functions in MySQL 8.0
 
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platformcloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
 
PHP tips by a MYSQL DBA
PHP tips by a MYSQL DBAPHP tips by a MYSQL DBA
PHP tips by a MYSQL DBA
 
Pro Techniques for the SSAS MD Developer
Pro Techniques for the SSAS MD DeveloperPro Techniques for the SSAS MD Developer
Pro Techniques for the SSAS MD Developer
 
Partition and conquer large data in PostgreSQL 10
Partition and conquer large data in PostgreSQL 10Partition and conquer large data in PostgreSQL 10
Partition and conquer large data in PostgreSQL 10
 
Relational Database to Apache Spark (and sometimes back again)
Relational Database to Apache Spark (and sometimes back again)Relational Database to Apache Spark (and sometimes back again)
Relational Database to Apache Spark (and sometimes back again)
 
Developers’ mDay u Banjoj Luci - Bogdan Kecman, Oracle – MySQL Server 8.0
Developers’ mDay u Banjoj Luci - Bogdan Kecman, Oracle – MySQL Server 8.0Developers’ mDay u Banjoj Luci - Bogdan Kecman, Oracle – MySQL Server 8.0
Developers’ mDay u Banjoj Luci - Bogdan Kecman, Oracle – MySQL Server 8.0
 
Developers' mDay 2017. - Bogdan Kecman Oracle
Developers' mDay 2017. - Bogdan Kecman OracleDevelopers' mDay 2017. - Bogdan Kecman Oracle
Developers' mDay 2017. - Bogdan Kecman Oracle
 
15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance
 

Mais de Altinity Ltd

Mais de Altinity Ltd (20)

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdf
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom Apps
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
 
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
 

Último

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 

Último (20)

Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 

A Day in the Life of a ClickHouse Query Webinar Slides

  • 1. © 2022 Altinity, Inc. A Day in the Life of a ClickHouse Query Intro to ClickHouse Internals Robert Hodges & Altinity Engineering 1 10 February 2022
  • 2. © 2022 Altinity, Inc. Let’s make some introductions ClickHouse support and services including Altinity.Cloud Authors of Altinity Kubernetes Operator for ClickHouse and other open source projects Robert Hodges Database geek with 30+ years on DBMS systems. Day job: Altinity CEO Altinity Engineering Database geeks with centuries of experience in DBMS and applications 2
  • 3. © 2022 Altinity, Inc. © 2021 Altinity, Inc. Foundations 3
  • 4. © 2022 Altinity, Inc. Understands SQL Runs on bare metal to cloud Shared nothing architecture Stores data in columns Parallel and vectorized execution Scales to many petabytes Is Open source (Apache 2.0) ClickHouse is a SQL Data Warehouse It’s the core engine for real-time analytics ClickHouse Event Streams ELT Object Storage Interactive Graphics Dashboards APIs 4
  • 5. © 2022 Altinity, Inc. If you understand the engine you can make it faster ClickHouse has a simple execution model–there’s no magic Any developer can understand how it works Knowledge leads to faster and more efficient queries 5 (Another fast engine!)
  • 6. © 2022 Altinity, Inc. © 2021 Altinity, Inc. What happens when you insert data? 6
  • 7. © 2022 Altinity, Inc. Let’s create a table! CREATE TABLE IF NOT EXISTS sdata ( DevId Int32, Type String, MDate Date, MDatetime DateTime, Value Float64 ) ENGINE = MergeTree() PARTITION BY toYYYYMM(MDate) ORDER BY (DevId, MDatetime) 7 Table engine type How to break data into parts How to index and sort data in each part Table columns
  • 8. © 2022 Altinity, Inc. Let’s now insert some data… INSERT INTO sdata VALUES (15, 'TEMP', '2018-01-01', '2018-01-01 23:29:55', 18.0), (15, 'TEMP', '2018-01-01', '2018-01-01 23:30:56', 18.7) 8 (This is an example. Most people don’t insert data this way!)
  • 9. © 2022 Altinity, Inc. Part in Storage How does ClickHouse process an insert? 9 INSERT INTO sdata VALUES (15, 'TEMP', . . .), (15, 'TEMP', . . .) 2 rows in set. Elapsed: 0.271 sec. ClickHouse Server Parse/Plan Respond Load Part in RAM Sort rows ( table ORDER BY )
  • 10. © 2022 Altinity, Inc. Part in Storage How can we make this more efficient? Parallelize! 10 set max_insert_threads=4 insert into ontime_test select * from ontime where toYear(FlightDate) between 2000 and 2001 2 rows in set. Elapsed: 0.271 sec. ClickHouse Server Parse/Plan Respond Load Parts in RAM
  • 11. © 2022 Altinity, Inc. Parallelism affects speed and memory usage 11 insert into ontime_test select * from ontime where toYear(FlightDate) between 2000 and 2001 set max_insert_threads=1 . . . set max_insert_threads=2 . . . set max_insert_threads=4
  • 12. © 2022 Altinity, Inc. OK, where did those awesome stats come from? SELECT event_time, type, is_initial_query, query_duration_ms / 1000 AS duration, read_rows, read_bytes, result_rows, formatReadableSize(memory_usage) AS memory, query FROM system.query_log WHERE (user = 'default') AND (type = 'QueryFinish') ORDER BY event_time DESC LIMIT 50 12
  • 13. © 2022 Altinity, Inc. What’s going on down there when you INSERT? Table Part Index Columns Sparse index finds rows by Carrier, Origin, FlightDate Columns sorted by Carrier, Origin, FlightDate Rows in the part all belong to same Year Part Index Columns Part 13
  • 14. © 2022 Altinity, Inc. Understanding what’s in a MergeTree part /var/lib/clickhouse/data/airline/ontime 2017-07-01 AA 2017-07-01 EV 2017-07-01 UA 2017-07-02 AA ... primary.idx | | | | .mrk .bin 20170701_20170731_355_355_2/ (FlightDate, Carrier...) ActualElapsedTime Airline AirlineID... | | | | .mrk .bin | | | | .mrk .bin Granule Compressed Block Mark 14
  • 15. © 2022 Altinity, Inc. Why MergeTree? Because it merges! Part Index Columns Part Index Columns Rewritten, Bigger Part Index Columns 15 Update and delete also rewrite parts
  • 16. © 2022 Altinity, Inc. Bigger parts are more efficient! ● Pick a PARTITION BY that gives nice, fat partitions ( 1-300GB, < 1000 total parts per table) ○ Can’t decide? Partition by month. ● Insert large blocks of data to avoid lots of merges afterwards ○ ClickHouse is fine with tens of millions of rows! ● The simplest way to make blocks bigger is to batch input data ○ Avoid different partition keys in the same block ○ ClickHouse has parameters like max_insert_block_size but defaults are OK ○ Look at logs and actual part sizes to see if you need to do more 16
  • 17. © 2022 Altinity, Inc. How can I see how big table parts are? SELECT table, partition, name, marks, rows, data_compressed_bytes, data_uncompressed_bytes, bytes_on_disk FROM system.parts WHERE active AND level=0 AND database = 'default' AND table = 'ontime_test' ORDER BY table DESC, partition ASC, name ASC 17 Part is in use Part has not been merged
  • 18. © 2022 Altinity, Inc. Tips to optimized INSERT Making INSERT faster ● Increase max_insert_threads (parallel creation of parts) ● Enable input_format_parallel_parsing to parallelize input parsing ○ Works for TSV/CSV/Values data ● Write bigger blocks (less merging afterwards) Making INSERT less memory intensive ● Decrease max_insert_threads (reduces parts simultaneously in memory) ● Disable input_format_parallel_parsing ● Write smaller blocks (less memory required at INSERT time) 18
  • 19. © 2022 Altinity, Inc. © 2021 Altinity, Inc. How do basic queries work? 19
  • 20. © 2022 Altinity, Inc. Aggregation is a key feature of analytic queries 20 SELECT Carrier, avg(DepDelay) AS Delay FROM ontime GROUP BY Carrier ORDER BY Delay DESC Aggregates group measurements for one more more dimensions Aggregate Function Dimension
  • 21. © 2022 Altinity, Inc. How does ClickHouse process a query with aggregates? 21 SELECT Carrier, avg(DepDelay)AS Delay FROM ontime GROUP BY Carrier ORDER BY Delay DESC ┌─FlightDate─┬──────────────Delay─┐ │ 2016-12-17 │ 53.72058516196447 │ │ 1990-12-21 │ 43.87767499654839 │ │ 1990-12-22 │ 43.64346952962076 │ . . . ClickHouse Server Parse/Plan Merge/Sort Scan In-RAM Hash Tables Parts in Storage
  • 22. © 2022 Altinity, Inc. How can you compute an average in parallel? 22 = 2 Numerator=6 Denominator=3 Scan Merge 1 2 3 1 3 5 0 5 0 0 Numerator=9 Denominator=3 Numerator=5 Denominator=4 6 + 9 + 5 3 + 3 + 4 Partial Aggregate
  • 23. © 2022 Altinity, Inc. How does a ClickHouse thread do aggregation? 23 Merge/Sort Scan Thread Parts in Storage AL => 4259/1070, 2385/415, … DL => 20663/1198, 25166/2711, … … Scan Thread Hash Table Other Scan Thread Hash Tables Result GROUP BY Key Partial Aggregates
  • 24. © 2022 Altinity, Inc. We can now understand aggregation performance drivers 24 SELECT Carrier, avg(DepDelay)AS Delay FROM ontime GROUP BY Carrier ORDER BY Delay DESC LIMIT 50 Simple aggregate, short GROUP BY key with few values SELECT Carrier, FlightDate, avg(DepDelay) AS Delay, uniqExact(TailNum) AS Aircraft FROM ontime GROUP BY Carrier, FlightDate ORDER BY Delay DESC LIMIT 50 More complex aggregates, longer GROUP BY with more values 3.4 sec 2.4 GB RAM 0.84 sec 1.6 KB RAM
  • 25. © 2022 Altinity, Inc. Parallelism affects speed and memory usage 25 SELECT Origin, FlightDate, avg(DepDelay) AS Delay, uniqExact(TailNum) AS Aircraft FROM ontime WHERE Carrier='WN' GROUP BY Origin, FlightDate ORDER BY Delay DESC LIMIT 5 SET max_threads = 1 . . . SET max_threads = 4 . . . SET max_threads = 16
  • 26. © 2022 Altinity, Inc. This is a good time to mention ClickHouse memory limits 26 SQL Query max_memory_usage (Default=10Gb) Single query limit SQL Query max_memory_usage_for_user (Default=Unlimited) All queries for a user SQL Query SQL Query SQL Query All memory on server ClickHouse Process max_server _memory_usage (Default=90% of available RAM)
  • 27. © 2022 Altinity, Inc. Tips to make aggregation queries faster ● Remove/exchange “heavy” aggregation functions ● Reduce the number of values in GROUP BY ● Increase max_threads (parallelism) ● Reduce I/O ○ Filter out unnecessary rows ○ Improve compression of data in storage 27
  • 28. © 2022 Altinity, Inc. Tips to reduce memory usage in aggregation queries ● Remove/exchange “heavy” aggregation functions ● Reduce number of values in GROUP BY ● Change max_threads value ● Dump aggregates to external storage ○ SET max_bytes_before_external_group_by > 0 ● Filter out unnecessary rows 28
  • 29. © 2022 Altinity, Inc. © 2021 Altinity, Inc. How do joins work? 29
  • 30. © 2022 Altinity, Inc. JOIN combines data between tables SELECT o.Dest, any(a.Name) AS AirportName, count(Dest) AS Flights FROM ontime o JOIN airports a ON a.IATA = o.Dest GROUP BY Dest ORDER BY Flights DESC LIMIT 10 30 Join condition “Left” Table “Right” Table
  • 31. © 2022 Altinity, Inc. How does ClickHouse process a query with a join? Left Side Table (Big) Merged Result Filter Right Side Table (Small) In-RAM Hash Table Load the right side table Scan left side table in parallel, computing aggregates and adding joined data Merge and sort results 31 Keys and column values!
  • 32. © 2022 Altinity, Inc. Let’s look more deeply at what’s happening in the scan . . . IATA . . . airports . . . Dest . . . ontime SELECT . . . FROM ontime o JOIN airports a ON a.IATA = o.Dest 32 S c a n ATL 576 1501 3302 … Hartsfield Jackson Atlanta International Airport Hartsfield Jackson Atlanta International Airport Hartsfield Jackson Atlanta International Airport … ORD 255 Chicago O'Hare International Airport Partial aggregates
  • 33. © 2022 Altinity, Inc. It would be more efficient to join after aggregating Left Side Table (Big) Merge Filter Right Side Table (Small) In-RAM Hash Table Load the right side table Scan left side table in parallel, computing aggregates Merge, then join and sort 33 Join
  • 34. © 2022 Altinity, Inc. You can do exactly that with a subquery SELECT o.Dest, any(a.Name) AS AirportName, count(Dest) AS Flights FROM ontime o JOIN default.airports a ON a.IATA = o.Dest GROUP BY Dest ORDER BY Flights DESC LIMIT 10 SELECT o.Dest, a.Name AS AirportName, o.Flights FROM ( SELECT Dest, count(Dest) AS Flights FROM ontime GROUP BY Dest ) AS o JOIN default.airports a ON a.IATA = o.Dest ORDER BY Flights DESC LIMIT 10 34 2.71 sec 19.9 MB RAM 0.663 sec 1.58 KB RAM
  • 35. © 2022 Altinity, Inc. Simple ways to keep JOINs fast and efficient ● Keep the right side table(s) overall size small ● Minimize the columns joined from the right side ● Add filter conditions to the right side table to reduce rows ● JOIN after aggregation if possible ● Use a Dictionary instead of a JOIN ○ Dictionaries are just loaded once and can be shared across queries 35 Pro tip: The SQL IN operator is also a join under the covers.
  • 36. © 2022 Altinity, Inc. © 2021 Altinity, Inc. How does a distributed query work? 36
  • 37. © 2022 Altinity, Inc. Example of a distributed data set with shards and replicas clickhouse-0 ontime _local airports ontime clickhouse-1 ontime _local airports ontime clickhouse-2 ontime _local airports ontime clickhouse-3 ontime _local airports ontime Distributed table (No data) Sharded, replicated table (Partial data) Fully replicated table (All data) 37
  • 38. © 2022 Altinity, Inc. Distributed send subqueries to multiple nodes ontime _local ontime Application ontime _local ontime ontime _local ontime ontime _local ontime Application Innermost subselect is distributed AggregateState computed locally Aggregates merged on initiator node 38
  • 39. © 2022 Altinity, Inc. Queries are pushed to all shards SELECT Carrier, avg(DepDelay) AS Delay FROM ontime GROUP BY Carrier ORDER BY Delay DESC SELECT Carrier, avg(DepDelay) AS Delay FROM ontime_local GROUP BY Carrier ORDER BY Delay DESC 39
  • 40. © 2022 Altinity, Inc. ClickHouse pushes down JOINs by default SELECT o.Dest d, a.Name n, count(*) c, avg(o.ArrDelayMinutes) ad FROM default.ontime o JOIN default.airports a ON (a.IATA = o.Dest) GROUP BY d, n HAVING c > 100000 ORDER BY d DESC LIMIT 10 SELECT Dest AS d, Name AS n, count() AS c, avg(ArrDelayMinutes) AS ad FROM default.ontime_local AS o ALL INNER JOIN default.airports AS a ON a.IATA = o.Dest GROUP BY d, n HAVING c > 100000 ORDER BY d DESC LIMIT 10 40
  • 41. © 2022 Altinity, Inc. ...Unless the left side “table” is a subquery SELECT d, Name n, c AS flights, ad FROM ( SELECT Dest d, count(*) c, avg(ArrDelayMinutes) ad FROM default.ontime GROUP BY d HAVING c > 100000 ORDER BY ad DESC ) AS o LEFT JOIN airports ON airports.IATA = o.d LIMIT 10 Remote Servers 41
  • 42. © 2022 Altinity, Inc. It’s more complex when multiple tables are distributed select foo from T1 where a in (select a from T2) distributed_product_mode=? local select foo from T1_local where a in ( select a from T2_local) allow select foo from T1_local where a in ( select a from T2) global create temporary table tmp Engine = Set AS select a from T2; select foo from T1_local where a in tmp; (Subquery runs on local table) (Subquery runs on distributed table) (Subquery runs on initiator; broadcast to local temp table) 42
  • 43. © 2022 Altinity, Inc. Tips to make distributed queries more efficient ● Think about where your data are located ● Move WHERE and heavy grouping work to left hand side of join ● Use a subquery to order joins after the remote scan ● Use the query_log to see what actually executes on the remote node(s) 43
  • 44. © 2022 Altinity, Inc. © 2021 Altinity, Inc. Where to learn more 44
  • 45. © 2022 Altinity, Inc. Where is the documentation? ClickHouse official docs – https://clickhouse.com/docs/ Altinity Blog – https://altinity.com/blog/ Altinity Youtube Channel – https://www.youtube.com/channel/UCE3Y2lDKl_ZfjaCrh62onYA Altinity Knowledge Base – https://kb.altinity.com/ Meetups, other blogs, and external resources. Use your powers of Search! 45
  • 46. © 2022 Altinity, Inc. References for this talk Altinity Knowledge Base – https://kb.altinity.com/ ClickHouse Source Code – https://github.com/ClickHouse/ClickHouse Talks and Blog Articles - ● ClickHouse Deep Dive, Alexey Milovidov ● Про JOIN’ы (в ClickHouse) - Artyem Zuikov ● Модификаторы DISTINCT и ORDER BY для всех агрегатных функций - Sofia Sergeevna Borzenkova ● ClickHouse Kernel Analysis - Storage Structure and Query Acceleration of MergeTree - Alibaba Cloud 46
  • 47. © 2022 Altinity, Inc. Thank you! Questions? https://altinity.com 47