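ClickHouse Best Practices
Power Your Data
Sina, Gao Peng, October 2018
19 servers
Data volume: 30 billion records/day
8 million effective queries/day
Average query time: 200 ms
Core monitoring queries average 40 ms
Stored data: 1.5 trillion records
As of September 3, 2018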
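To put the daily totals above in per-second terms, here is a rough back-of-the-envelope sketch. It assumes the daily volume counts records and that load is spread evenly over the day and over all 19 servers; neither assumption comes from the slides, and real traffic will have peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

servers = 19
records_per_day = 30_000_000_000   # 30 billion/day, read as a record count (assumption)
queries_per_day = 8_000_000        # 8 million effective queries/day


def per_second(daily_total: int) -> float:
    """Convert a daily total into an average per-second rate."""
    return daily_total / SECONDS_PER_DAY


ingest_rate = per_second(records_per_day)
query_rate = per_second(queries_per_day)
ingest_rate_per_server = ingest_rate / servers  # assumes an even spread across servers

print(f"cluster ingest : {ingest_rate:,.0f} records/s")
print(f"per server     : {ingest_rate_per_server:,.0f} records/s")
print(f"query rate     : {query_rate:,.1f} queries/s")
```

These are averages only; they say nothing about burst capacity.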
• Why ClickHouse

• How it works

• Best practice
Content
MySQL DBA
About me
Data analysis
Big Data
AIOps
Data Science
Visualization
Analytics
APP, Net, DB, LB
Back End
Monitoring
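Network
Hosts, load balancing
Databases
Web
About Data
APP
CDN, cloud storage
DNS
• What data do we work with?
• Scale
• Timeliness
• Query requirements
• Ingestion
• Business status
• Known inside out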
About Data
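Figure: APM data for one application
About Data
• Business status
• Known inside out
Figure: quality monitoring for a backend service
• Data analysis yields insight
• Exploring unknown dimensions and combinations
About Data
Figure: quality data for a CDN (animated GIF)
CDN1 CDN2
• Data visualization
• Intuitive and efficient
About Data
Figure: quality data distribution of one service across data centers
Figure: CDN traffic distribution for users in one region
• Fine-grained operations
• Service quality that proves itself
About Data
Figure: Redis request analysis
• Powering AIOps
• Anomaly detection
About Data
Figure: threshold-free anomaly monitoring of response time for one domain
Figure: threshold-free anomaly monitoring of traffic for one domain
About Data
• Powering AIOps
• Root cause analysis
Figure: root-cause dimension analysis of 5XX errors
Figure: root-cause dimension investigation of a degraded KPI
Data
Let the business log freely

Massive, real-time,

multi-dimensional, multi-level

Fine-grained operations and AIOps

Data-driven business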
About Data
Why ClickHouse
DATA
Before ClickHouse
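• Typical architecture
Why ClickHouse
1. The pipeline is too long
2. ES querying, ES storage
3. Hive is too slow
4. Spark is too heavyweight
5. Real-time approaches require discarding the raw data
6. Few BI tools; analysts are overwhelmed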
Before ClickHouse
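Open-sourced by Yandex, the Russian search giant

Columnar storage

Clustering

Very high performance
Compression
Rich set of drivers
SQL
Linear scalability

PB scale
OLAP
Statistical functions
Updated in real time
Cross-data-center
Asynchronous replication
Eventual consistency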
Why ClickHouse
Why ClickHouse
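Figure: some production ClickHouse SQL queries (animated GIF)
Note: this setup is 4 low-spec servers, not the hardware described later
A relational data warehouse with enormous capacity and extremely fast SQL
Why ClickHouse
Why ClickHouse
With ClickHouse
• Near-real-time ingestion
• Raw data can be queried on demand
• Shorter data processing path
• ETL processes are easy to scale out and generalize
• Greatly reduced resource usage
Figure: data flow in the ClickHouse architecture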
Why ClickHouse
对⽐比 ES⽅方案 CK⽅方案 Hive⽅方案
数据接⼊入 较多开源插件 ⾃自研 ⾃自研
实时性 准实时 准实时 离线
数据查询
DSL语⾔言

SQL⽀支持有限
⽀支持复杂SQL

执⾏行行快
SQL规范,但是很慢
硬件成本 CPU密集型 CPU密集型 需要Hadoop⽣生态
存储成本 资源占⽤用多 ES的1/24 资源占⽤用⼀一般
通⽤用性 ⽐比较通⽤用 需要定义schema 需要定义schema
原始数据
⻓长期保留留成本⾼高

特殊场景需要预聚合
提供物化视图

内部聚合
可⻓长期保留留
How it works
• CK architecture
Data Storage
MergeTree
ReplicatedMergeTree
ReplacingMergeTree
SummingMergeTree
Log
Memory
Buffer
Client
PHP JDBC
Python
Golang
Data Type SQL
ODBC
Cluster
Access Quotas
Kafka
Distributed
MaterializedView
Node.js
R
Scala
Julia
Rust
C++
Perl
Ruby .NET
HTTP
Functions
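• Columnar storage, asynchronous merges, cluster mode

• Vectorized engine + SIMD, going beyond tight loops

• Limited support for delete operations

• Data compression, reducing I/O

• No transaction support

• Powerful table engines
How it works
• Why is CK so fast?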
How it works
Similar to an LSM tree, but with no in-memory table and no transaction log
• The "Merge" in MergeTree
Figure: LSM tree and LevelDB schematics
How it works
• The "Merge" in MergeTree
• block

• part

• partition
Partition
Parts
Parts
Parts
Figure: ClickHouse merge diagram
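The block, part and partition relationship can be sketched in a few lines. This is a simplified model and not ClickHouse's actual implementation: it assumes each inserted block becomes one sorted part, and that a background merge combines the parts of a single partition into a larger sorted part. The row shape and partition ids are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]          # (primary key, payload); hypothetical row shape
Part = List[Row]               # rows inside a part are kept sorted by key


def write_block(block: List[Row]) -> Part:
    """Each inserted block becomes a new part, sorted by primary key."""
    return sorted(block)


def merge_parts(parts: List[Part]) -> Part:
    """Merge several sorted parts into one larger sorted part (k-way merge)."""
    return list(heapq.merge(*parts))


# partition id -> list of parts; parts are only merged within one partition
partitions: Dict[str, List[Part]] = defaultdict(list)

partitions["201810"].append(write_block([(3, "c"), (1, "a")]))
partitions["201810"].append(write_block([(4, "d"), (2, "b")]))
partitions["201811"].append(write_block([(9, "z")]))

# A background merge replaces the small parts of a partition with one bigger part.
partitions["201810"] = [merge_parts(partitions["201810"])]
merged = partitions["201810"][0]
print(merged)
```

Because every part is already sorted, the merge is a sequential pass over its inputs, which is one reason this layout suits append-heavy workloads.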
How it works
• The "Tree" in MergeTree
Figure: how the MergeTree engine works
base_table
dist_table
base_table
dist_table
base_table
dist_table
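Read the actual data
Read; aggregate and return
ck-xx.sina.com.cn
Read the actual data
How it works
How the cluster works
Figure: ClickHouse cluster configuration
Figure: reading data from a ClickHouse cluster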
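The read path in the diagram (the distributed table fans the query out, each base_table reads its own data, and the results are aggregated and returned) can be modelled as a scatter-gather. The sketch below is an illustration only; the shard names and rows are hypothetical and no ClickHouse API is involved.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical shards: each holds the rows of its local base_table.
shards: Dict[str, List[dict]] = {
    "shard1": [{"domain": "a.com"}, {"domain": "b.com"}],
    "shard2": [{"domain": "a.com"}],
    "shard3": [{"domain": "b.com"}, {"domain": "b.com"}],
}


def local_count_by_domain(rows: List[dict]) -> Counter:
    """What each shard does: read its own data and pre-aggregate it."""
    return Counter(row["domain"] for row in rows)


def distributed_count_by_domain(cluster: Dict[str, List[dict]]) -> Counter:
    """What the node receiving the query does: fan out, then merge partial results."""
    total: Counter = Counter()
    for rows in cluster.values():
        total += local_count_by_domain(rows)
    return total


result = distributed_count_by_domain(shards)
print(dict(result))
```

Only the small partial aggregates travel between nodes, not the raw rows.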
ZK
base_table base_table base_table base_table
How it works
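Data replication
Multi-source, multi-master, multi-directional replication
Replicas exchange whatever data the others lack
Built-in detection mechanism
Built-in synchronization mechanism (physical replication)
Depends on ZK
Writes do not require a majority
Figure: ClickHouse data replication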
1-1
1-2
1-3
1-4
Cluster 1
N-1
N-2
N-3
N-4
Cluster N
……
How it works
Figure: ClickHouse data replication
• A frustrating replication architecture
How it works
Mastering ClickHouse data replication in 34 slides. Figure: ClickHouse data replication
Best practice
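CK architecture
Replication is not used for now; only sharding
Multiple small clusters
scale out, not scale up
Monitoring
Monitoring based on Prometheus + clickhouse_exporter
Watch read/write volume and merge activity
Add dashboards based on the system database
Management
Backup
sql_top tool for locating performance bottlenecks
Slow-query kill tool
Configuration file backup
Table schema backup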
Best practice
Figure: real-time ClickHouse SQL monitoring tool (animated GIF)
Best practice
max_execution_time
• A sad story
Best practice
• Resource limits • Quota limits
Best practice
• Monitoring
Figure: cluster node layout. Cluster 1X: nodes 11, 12, 13, 14; Cluster 3X: nodes 31 to 38; Cluster 4X: nodes 41 to 47
Best practice
Best practice
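Cluster 1X: nodes 11, 12, 13, 14
ck1x.xxxx.sina.com
ck11.xxxx.sina.com
ck12.xxxx.sina.com
ck13.xxxx.sina.com
ck14.xxxx.sina.com
Used for read/write load balancing
Used in the cluster configuration, for easy switchover
Figure: ClickHouse domain-name load balancing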
Best practice
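Hardware
1. Sensible hardware configuration, see below
2. For all-in-memory data, consider the number of memory modules
OS
1. Force some memory to stay free
2. Disable swap
3. Enable hyper-threading
Parameters
1. Increase the number of background merge threads: background_pool_size
2. Set other parameters explicitly wherever possible; do not rely on defaults
3. Increase the connection limit and the maximum number of concurrent queries
4. Increase the maximum memory available per user
5. Enable query_log
6. Log at trace level
7. Enable cluster DDL
Application
1. Write data with the Golang driver
2. Batch inserts starting at 50,000 to 100,000 rows
3. Forbid SELECT * in business queries
4. Choose the primary key sensibly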
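The batch-insert advice comes down to buffering rows on the client and sending them in large chunks. The sketch below shows only that batching logic, in Python for illustration (the deck itself recommends the Golang driver). `insert_batch` is a hypothetical stand-in for whatever call the real driver provides, and the 50,000 default is taken from the lower end of the range above.

```python
from typing import Callable, Iterable, Iterator, List

Row = tuple


def batches(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Group an incoming row stream into lists of at most batch_size rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:               # flush the final, possibly smaller, batch
        yield batch


def load(rows: Iterable[Row], insert_batch: Callable[[List[Row]], None],
         batch_size: int = 50_000) -> int:
    """Send rows through insert_batch, one call per batch; return the call count."""
    calls = 0
    for batch in batches(rows, batch_size):
        insert_batch(batch)
        calls += 1
    return calls


# Stand-in for a real driver call (hypothetical); it only records batch sizes.
sent_sizes: List[int] = []
calls = load(((i, "x") for i in range(120_000)), lambda b: sent_sizes.append(len(b)))
print(calls, sent_sizes)
```

A production loader would usually also flush on a timer so that a slow stream does not hold rows back indefinitely; that is left out here.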
Best practice
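CPU | E5-2650 v4, hyper-threading = 48 cores | Do not go below this model
Memory | 8 × 16 GB = 128 GB | The number of memory modules also matters
Disk | 12 × 4 TB RAID 5 = 40 TB | Ensure I/O performance: 1. number of disks 2. RAID stripe size
NIC | Dual gigabit, bond0 = 2000 Mbps | Write volume is high; watch for saturated bandwidth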
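The hardware figures can be cross-checked with simple arithmetic. The sketch below assumes a dual-socket server and that the slide's 40T is the usable RAID 5 capacity in binary units; both are assumptions, not statements from the deck.

```python
TB = 10**12   # decimal terabyte, as used on disk labels
TIB = 2**40   # binary tebibyte, as usually reported by the OS


def raid5_usable_bytes(disks: int, disk_bytes: int) -> int:
    """RAID 5 spends the capacity of one disk on parity."""
    return (disks - 1) * disk_bytes


usable = raid5_usable_bytes(disks=12, disk_bytes=4 * TB)
usable_tb = usable / TB        # 44.0 decimal TB
usable_tib = usable / TIB      # roughly 40 TiB

memory_gb = 8 * 16             # eight 16 GB modules

# Two bonded gigabit links; 8 bits per byte. This is a theoretical ceiling.
nic_mbps = 2 * 1000
nic_mbytes_per_s = nic_mbps / 8

# Assumption: dual-socket server. The E5-2650 v4 has 12 cores per socket,
# and hyper-threading doubles the logical core count.
logical_cores = 2 * 12 * 2

print(usable_tb, round(usable_tib, 1), memory_gb, nic_mbytes_per_s, logical_cores)
```

At about 250 MB/s of theoretical network throughput, the warning about saturating bandwidth under heavy writes is easy to believe.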
Best practice
Best practice
• Data file: 10 GB
• 32 GB of memory vs. 128 GB of memory
Figure: internal architecture of an Intel CPU
Best practice
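Materialized views
Aggregated data
Null engine
Cluster planning
Capacity planning
Compute planning
system database
query_log
merges
parts
replicas
processes
Dictionaries
Mapping relationships
Management
Scaling out, upgrades
Tips for dropping partitions
Virtual clusters
Special-case requirements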
Best practice
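• Materialized views
• An internal pipeline

• Greatly speeds up queries

• Aggregation load is spread over ingestion time rather than query time

• Mind the load from the extra writes

• The raw table can use the Null engine

• Choose a suitable engine for each materialized view
Raw data table
Data written
Materialized view 1, materialized view 2, materialized view 3
idc, domain, http_code
select ts, domain, count(*)
group by ts, domain
select ts, http_code, count(*)
group by ts, http_code
select ts, count(*)
group by ts
Figure: logical diagram of materialized views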
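The three views in the diagram can be modelled as counters that are updated at insert time. This is a conceptual sketch, not ClickHouse syntax; the view names and sample rows are hypothetical. It also shows why the raw table can be a Null-engine table: once every view has consumed the batch, the raw rows need not be stored.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One "materialized view" = a grouping key plus a running count per key.
VIEW_KEYS: Dict[str, Tuple[str, ...]] = {
    "by_ts_domain": ("ts", "domain"),
    "by_ts_http_code": ("ts", "http_code"),
    "by_ts": ("ts",),
}
views: Dict[str, Counter] = {name: Counter() for name in VIEW_KEYS}


def insert(rows: List[dict]) -> None:
    """Simulate an insert: the raw rows are not kept (as with a Null-engine
    source table); every view is updated from the inserted batch only."""
    for name, keys in VIEW_KEYS.items():
        for row in rows:
            views[name][tuple(row[k] for k in keys)] += 1


insert([
    {"ts": 1, "idc": "bj", "domain": "a.com", "http_code": 200},
    {"ts": 1, "idc": "sh", "domain": "a.com", "http_code": 500},
])
insert([{"ts": 2, "idc": "bj", "domain": "b.com", "http_code": 200}])

print(dict(views["by_ts"]))
```

Each insert does three times the aggregation work, which is the extra write load the slide warns about; in exchange, queries read a few pre-aggregated rows.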
Best practice
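• What about mapping-type data?
• Put it in a wide table: wastes space

• Create a separate table: requires joins and is hard to maintain

• Dictionaries
• File/HTTP/ODBC

• MySQL/PG/MongoDB/ClickHouse/MS-SQL

• Example
• The warehouse stores Chinese names, convenient for analysts

• MySQL stores the mapping to ISO codes

• At query time, map to ISO codes to draw maps

• Avoid using dictionaries in filter conditions
Figure: dictionary data in a MySQL table
type is a soft-delete flag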
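The example on this slide, storing readable names and translating to ISO codes only at output time, might look like the sketch below. The mapping and rows are hypothetical and stand in for the MySQL-backed dictionary; the codes are illustrative only. The lookup is applied after aggregation, in line with the advice to keep dictionaries out of filter conditions.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical mapping, standing in for the MySQL-backed dictionary.
# The ISO-style codes are here for illustration only.
ISO_CODE: Dict[str, str] = {"北京": "CN-BJ", "上海": "CN-SH", "广东": "CN-GD"}

# Hypothetical fact rows: the warehouse stores the Chinese name directly.
rows: List[dict] = [
    {"province": "北京"}, {"province": "广东"}, {"province": "北京"},
]

# 1. Filter and aggregate on the stored column (no dictionary involved).
counts = Counter(row["province"] for row in rows)

# 2. Translate only the few aggregated keys for the map widget.
by_iso = {ISO_CODE.get(name, "unknown"): n for name, n in counts.items()}
print(by_iso)
```

The lookup then runs once per output group instead of once per scanned row.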
Best practice
• System database
• Processes

• Replication status

• Merge status

• Table statistics

• Clusters

• query log
Figure: merge statistics
Figure: cluster details
Figure: compression ratio statistics
Figure: part count and size statistics
Figure: query log analysis
Best practice
Write back to MySQL
INSERT INTO FUNCTION
mysql('host:port', 'db', 'tb', 'user', 'passwd', 1)
SELECT xxx
Query MySQL data directly
CREATE TABLE xxxx
ENGINE = MergeTree ORDER BY id AS
SELECT *
FROM mysql('host:port', 'db', 'tb', 'user', 'password')
Query a data file directly
clickhouse-client
--query="SELECT partition, count() AS number_of_parts, formatReadableSize(sum(bytes)) AS sum_size FROM xxx WHERE xxxxx ;"
--external --file=test.sql --name=parts
--structure='partition UInt16,name String,table String,engine String'
-h 127.0.0.1
LIMIT 10 BY date
GROUP BY
xxxx
ORDER BY
xxxx
LIMIT 10 BY date, city;
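For readers unfamiliar with LIMIT n BY, it keeps the first n rows for each distinct combination of the listed expressions, after ordering. The sketch below emulates those semantics in plain Python on hypothetical rows; it is an illustration of the behaviour, not how ClickHouse executes the clause.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def limit_by(rows: Iterable[dict], n: int, keys: Tuple[str, ...]) -> List[dict]:
    """Keep the first n rows for each distinct combination of the key columns,
    preserving the incoming (already ordered) row order."""
    seen: Dict[tuple, int] = defaultdict(int)
    kept: List[dict] = []
    for row in rows:
        group = tuple(row[k] for k in keys)
        if seen[group] < n:
            seen[group] += 1
            kept.append(row)
    return kept


# Hypothetical rows, already sorted by hits descending (the ORDER BY step).
rows = [
    {"date": "2018-10-01", "city": "Beijing", "url": "/a", "hits": 9},
    {"date": "2018-10-01", "city": "Beijing", "url": "/b", "hits": 7},
    {"date": "2018-10-01", "city": "Shanghai", "url": "/a", "hits": 5},
    {"date": "2018-10-01", "city": "Beijing", "url": "/c", "hits": 3},
]
top = limit_by(rows, n=2, keys=("date", "city"))
print([(r["city"], r["url"]) for r in top])
```

This gives a top-N per group in one query, without a self-join or window function.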
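Other issues
Best practice
1. Writing to a distributed table after the remote local table had been dropped caused data to be retransmitted over and over
2. With many parts, mlocate's updatedb process caused high I/O
3. Repeated OOMs
4. Hangs
5. Garbled text when running mixed versions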
Best practice
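• Q1: DB::Exception: DDL background thread is not initialized
• Q2: DB::Exception: Memory limit (for query) exceeded
• Q3: DB::Exception: Unknown table function mysql
• Q4: DB::Exception: Merges are processing significantly slower than inserts
• Q5: Kafka engine
• Q6: Importing data from HDFS
• Q7: Multiple disks
• Q8: Data updates, custom partitioning
• Q9: A problem caused by writing to distributed tables: a surge in connections
• Q10: The maximum concurrency ClickHouse supports
• Q11: Scaling out and in
Future
• Support for more machine learning algorithms
• More complex joins
• Tableau integration
• Support for update
• Support for multiple disks
• Predicate pushdown
• Cluster management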
Summary
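Used well
Used poorly
Summary
Stop debating whether it can replace Hadoop

Stop holding it to OLTP standards

Stop benchmarking with meaningless SQL
"Pick the right tool and get home early"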

ClickHouse Beijing Meetup: ClickHouse Best Practice @Sina