Performance Optimization for All Flash based on aarch64
Huawei 2021/06/11
 Core count: 32/48/64 cores
 Frequency: 2.6/3.0 GHz
 Memory controller: 8 DDR4 controllers
 Interface: PCIe 4.0, CCIX, 100G RoCE, SAS/SATA
 Process: 7 nm
Kunpeng920 ARM-Based CPU
Industry’s Most Powerful ARM-Based CPU
Ceph solution based on Kunpeng920
Ceph technical architecture based on Kunpeng chips

(Stack diagram: TaiShan hardware; OS (openEuler, …); acceleration libraries for compression and crypto; Ceph on top.)
Optimizing performance through software and hardware collaboration: CPU, disk I/O, and network.

Improves CPU usage
• Turn CPU prefetching on or off
• Optimize the number of concurrent threads
• NUMA optimization
• Kernel 4K/64K page size

I/O performance optimization

Optimizes NIC performance
• Interrupt core binding
• MTU adjustment
• TCP parameter adjustment
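The NIC-side knobs above can be sketched as shell commands. The interface name `eth0` and all values below are assumptions for illustration, not measured recommendations:

```shell
# Sketch of the NIC tuning steps above (assumed interface eth0; values illustrative).

# MTU adjustment: jumbo frames reduce per-packet overhead on the storage network.
ip link set dev eth0 mtu 9000

# TCP parameter adjustment: enlarge socket buffers for high-bandwidth links.
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"

# Interrupt core binding: pin each of eth0's IRQs to cores on its local NUMA node.
node=$(cat /sys/class/net/eth0/device/numa_node)
cpus=$(cat /sys/devices/system/node/node${node}/cpulist)
for irq in $(grep eth0 /proc/interrupts | awk '{sub(":","",$1); print $1}'); do
    echo "$cpus" > /proc/irq/${irq}/smp_affinity_list
done
```

These are configuration commands that require root and the target hardware; they are a starting point to be validated per deployment.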
Table of Contents
 Data access across NUMA
 Multi-port NIC deployment
 DDR multi-channel deployment
 Messenger throttle set too low
 Long queue waiting time in the OSD
 Use 64K PageSize
 Optimized crc32c in RocksDB
① Data access across NUMA

(Diagram: left, data flows across NUMA nodes between the NIC and NVMe drives over PCIe; right, the data flow is completed within a single NUMA node.)
OSD NUMA-aware Deployment
 OSDs are deployed evenly across NUMA nodes to avoid cross-NUMA scheduling and memory-access overhead.
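One way to realize this deployment is Ceph's own `osd_numa_node` option (available since Nautilus), which pins an OSD's CPU and memory affinity to one node at startup; the OSD IDs and node numbers below are illustrative:

```shell
# Pin OSDs evenly across NUMA nodes (OSD IDs and node numbers are examples).
# osd_numa_node makes each OSD set its CPU and memory affinity at startup.
ceph config set osd.0 osd_numa_node 0
ceph config set osd.1 osd_numa_node 1
ceph config set osd.2 osd_numa_node 2
ceph config set osd.3 osd_numa_node 3

# Verify where each OSD landed:
ceph osd numa-status
```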
(Charts: NUMA-aware deployment vs. baseline at 4K and 8K block sizes. IOPS higher for Rand Write / Rand Read / 70/30 Rand RW: 9%, 4%, 11%, 8%, 11%, 8%. Latency lower: 8%, 3%, 10%, 8%, 10%, 7%.)
② Multi-port NIC deployment

 NIC interrupts and OSD processes are bound to the same NUMA node.
 NICs receive packets and OSDs call recvmsg in the same NUMA node.
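A sketch of checking that each port is local to the NUMA node of the OSDs it serves; the interface names and address are assumptions for a 4-port setup:

```shell
# Report the local NUMA node of each port so OSDs can be placed alongside it
# (interface names eth0-eth3 are assumptions for a 4-port setup).
for nic in eth0 eth1 eth2 eth3; do
    node=$(cat /sys/class/net/${nic}/device/numa_node)
    echo "${nic} is local to NUMA node ${node}"
done

# Then give each OSD an address on the NIC in its own node, e.g. (address illustrative):
ceph config set osd.0 public_addr 192.168.0.10
```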
(Charts: multi-port NIC deployment at 4K and 8K block sizes. IOPS higher for Rand Write / Rand Read / 70/30 Rand RW: 7%, 5%, 3%, 5%, 3%, 6%. Latency lower: 6%, 5%, 3%, 4%, 0%, 6%.)
(Diagram: left, NIC0–NIC3 all attached to NUMA0 while OSD groups run on NUMA0–NUMA3; right, NIC0–NIC3 each attached to a different NUMA node so every OSD group uses the NIC local to its own node.)
③ DDR multi-channel connection

(Chart: 16-channel vs. 12-channel DDR, 4 KB block size, IOPS higher — Rand Write 7%, Rand Read 11%.)

 With 16 DDR channels, random-write IOPS is 7% higher (and random-read IOPS 11% higher) than with 12 channels.
④ Messenger throttler

 Gradually increase osd_client_message_cap. At a value of 1000, IOPS increases by 3.7%.
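The throttle can be raised at runtime; 1000 matches the best measured point, and the default value mentioned in the comment is an assumption that varies by release:

```shell
# Raise the messenger message-count throttle to the measured sweet spot
# (the shipped default is release-dependent; check before changing).
ceph config set osd osd_client_message_cap 1000

# Confirm the active value on a running OSD (osd.0 is a placeholder):
ceph daemon osd.0 config get osd_client_message_cap
```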
(Chart: 4K random write IOPS gain vs. osd_client_message_cap from 0 to 1600 — 0.0%, 3.1%, 1.7%, 0.9%, 2.8%, 3.5%, 2.1%, 1.7%, 2.5%, 3.7%, 3.7%.)
⑤ Long queue waiting time in the OSD

 The average length of the OPQueue is less than 10 and its queuing time is short; the kv_queue and kv_committing queues are longer than 30 and their queuing time is long.
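Queue latencies like these can be read from a running OSD's perf counters; `osd.0` is a placeholder daemon name:

```shell
# Dump OSD and BlueStore perf counters and pull out the queueing latencies
# discussed above (osd.0 is a placeholder).
ceph daemon osd.0 perf dump > perf.json

# Relevant counters live under "osd" and "bluestore", e.g.
#   osd.op_before_dequeue_op_lat,
#   bluestore.state_kv_queued_lat, bluestore.state_kv_commiting_lat
grep -E 'op_before_dequeue_op_lat|state_kv_queued_lat|state_kv_commiting_lat' -A3 perf.json
```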
(Diagram: OSD write pipeline — the Messenger's msgr-workers enqueue ops into the OPQueue; OSD threads dequeue each op, submit data aio (bstore_aio) to the data disk, and enqueue into kv_queue; bstore_kv_sync swaps kv_queue into kv_committing and flushes the WAL+DB disk; bstore_kv_final drains kv_committing_to_finalize and runs onCommits via the ContextQueue. Measured shares: op_before_dequeue_op_lat 16%, state_kv_queued_lat 26%, state_kv_commiting_lat 27%.)
(Chart: latency breakdown — send 0%, recv 1%, dispatch/op_before_dequeue_op_lat 17%, op_w_prepare_latency 3%, state_aio_wait_lat 12%, state_kv_queued_lat 27%, kv_sync_lat 7%, kv_final_lat 3%, wait in queues 28%. state_kv_commiting_lat = kv_sync_lat + kv_final_lat + wait in queue = 38%.)
CPU partition affinity

(Charts: CPU partition affinity at 4K and 8K block sizes. IOPS higher for Rand Write / Rand Read / 70/30 Rand RW: 13%, 21%, 8%, 0%, 14%, 4%. Latency lower: 12%, 18%, 7%, 0%, 13%, 4%.)
 CPU partitioning separates the msgr-worker/tp_osd_tp threads from the bstore threads to achieve fair scheduling.
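One hedged way to implement this partition is `taskset` on the live thread IDs. The thread-name patterns match Ceph's defaults, while the CPU ranges are assumptions for one 64-core layout:

```shell
# Separate network/op threads from BlueStore kv threads onto disjoint CPU sets
# (CPU ranges 0-47 / 48-63 are illustrative, not a recommendation).
osd_pid=$(pgrep -o ceph-osd)

# msgr-worker and tp_osd_tp threads -> cores 0-47
for tid in $(ps -L -p "$osd_pid" -o tid=,comm= | awk '/msgr-worker|tp_osd_tp/ {print $1}'); do
    taskset -pc 0-47 "$tid"
done

# bstore_kv_sync, bstore_kv_final and other bstore threads -> cores 48-63
for tid in $(ps -L -p "$osd_pid" -o tid=,comm= | awk '/bstore/ {print $1}'); do
    taskset -pc 48-63 "$tid"
done
```

cgroup cpusets achieve the same separation persistently; taskset is shown here only because it maps directly onto the thread names above.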
⑥ Use 64K PageSize

 Use CEPH_PAGE_SHIFT for compatibility with various page sizes.

(Chart: 64K vs. 4K page size IOPS ratio at 4 KB block size — RandWrite 1.12, RandRead 1.17, RandReadWrite 1.15.)
 Compared with 4K pages, 64K pages reduce TLB misses and improve performance by 10%+.

Tips:
 Use small-page alignment to reduce memory waste in bufferlist::reserve.
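A quick way to see which kernel page size a node is running, and hence which value CEPH_PAGE_SHIFT will reflect:

```shell
# Print the kernel page size in bytes: 4096 on a 4K kernel, 65536 on a
# 64K aarch64 kernel.
getconf PAGESIZE
```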
64K PageSize Issues

 Write amplification issue
When bluefs_buffered_io is set to true, metadata is written using buffered I/O, and sync_file_range is called to write data to disk page by page in the kernel. The amplification factor is 2.46 with 4K pages and 5.46 with 64K pages.
When bluefs_buffered_io is set to false, metadata is written using direct I/O and sync_file_range is not called; the amplification factor is 2.29. Excessive writes shorten disk lifespan, so set bluefs_buffered_io to false.
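The recommended setting is a one-line config change:

```shell
# Switch BlueFS metadata writes to direct I/O to cut write amplification
# (2.46x/5.46x buffered on 4K/64K pages vs. 2.29x direct, per the measurements above).
ceph config set osd bluefs_buffered_io false
```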
 tcmalloc and kernel page size issue
When the tcmalloc page size is smaller than the kernel page size, memory usage keeps increasing until it approaches osd_memory_target and performance deteriorates significantly. Ensure that the tcmalloc page size is greater than or equal to the kernel page size.
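gperftools lets the tcmalloc page size be chosen at configure time; building with 64 KB internal pages (to match a 64K kernel) looks roughly like the following, where the version number is an assumption:

```shell
# Build tcmalloc (gperftools) with a 64 KB internal page size so it is not
# smaller than a 64K kernel page (version number is illustrative).
tar xf gperftools-2.9.1.tar.gz && cd gperftools-2.9.1
./configure --with-tcmalloc-pagesize=64   # page size given in KB
make -j"$(nproc)" && sudo make install
```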
⑦ Optimized crc32c in RocksDB

RocksDB's crc32c_arm64 is supported since Ceph Pacific. Backporting this feature to an earlier version improves performance by about 3%.
Questions?
THANK YOU

Mais conteúdo relacionado

Mais procurados

Ceph RBD Update - June 2021
Ceph RBD Update - June 2021Ceph RBD Update - June 2021
Ceph RBD Update - June 2021Ceph Community
 
BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InSage Weil
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureDanielle Womboldt
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화OpenStack Korea Community
 
Disaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoFDisaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoFShapeBlue
 
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...OpenStack Korea Community
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephScyllaDB
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Odinot Stanislas
 
ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortNAVER D2
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Sage Weil
 
Ceph - A distributed storage system
Ceph - A distributed storage systemCeph - A distributed storage system
Ceph - A distributed storage systemItalo Santos
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Community
 
Reference Architecture: Architecting Ceph Storage Solutions
Reference Architecture: Architecting Ceph Storage Solutions Reference Architecture: Architecting Ceph Storage Solutions
Reference Architecture: Architecting Ceph Storage Solutions Ceph Community
 
AF Ceph: Ceph Performance Analysis and Improvement on Flash
AF Ceph: Ceph Performance Analysis and Improvement on FlashAF Ceph: Ceph Performance Analysis and Improvement on Flash
AF Ceph: Ceph Performance Analysis and Improvement on FlashCeph Community
 
Ceph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion ObjectsCeph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion ObjectsKaran Singh
 

Mais procurados (20)

Ceph RBD Update - June 2021
Ceph RBD Update - June 2021Ceph RBD Update - June 2021
Ceph RBD Update - June 2021
 
BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year In
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
 
Disaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoFDisaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoF
 
Ceph on arm64 upload
Ceph on arm64   uploadCeph on arm64   upload
Ceph on arm64 upload
 
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Ceph as software define storage
Ceph as software define storageCeph as software define storage
Ceph as software define storage
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
 
ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-short
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)
 
Ceph - A distributed storage system
Ceph - A distributed storage systemCeph - A distributed storage system
Ceph - A distributed storage system
 
Ceph issue 해결 사례
Ceph issue 해결 사례Ceph issue 해결 사례
Ceph issue 해결 사례
 
Ceph
CephCeph
Ceph
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK
 
Reference Architecture: Architecting Ceph Storage Solutions
Reference Architecture: Architecting Ceph Storage Solutions Reference Architecture: Architecting Ceph Storage Solutions
Reference Architecture: Architecting Ceph Storage Solutions
 
AF Ceph: Ceph Performance Analysis and Improvement on Flash
AF Ceph: Ceph Performance Analysis and Improvement on FlashAF Ceph: Ceph Performance Analysis and Improvement on Flash
AF Ceph: Ceph Performance Analysis and Improvement on Flash
 
CephFS Update
CephFS UpdateCephFS Update
CephFS Update
 
Ceph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion ObjectsCeph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion Objects
 

Semelhante a Performance optimization for all flash based on aarch64 v2.0

Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong TangAccelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong TangCeph Community
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...In-Memory Computing Summit
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Community
 
Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage Ceph Community
 
Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage Ceph Community
 
Ceph Day Seoul - Ceph on All-Flash Storage
Ceph Day Seoul - Ceph on All-Flash Storage Ceph Day Seoul - Ceph on All-Flash Storage
Ceph Day Seoul - Ceph on All-Flash Storage Ceph Community
 
Accelerating Ceph with RDMA and NVMe-oF
Accelerating Ceph with RDMA and NVMe-oFAccelerating Ceph with RDMA and NVMe-oF
Accelerating Ceph with RDMA and NVMe-oFinside-BigData.com
 
Storage and performance, Whiptail
Storage and performance, Whiptail Storage and performance, Whiptail
Storage and performance, Whiptail Internet World
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Danielle Womboldt
 
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Community
 
3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdfhellobank1
 
CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] IO Visor Project
 
Redis on NVMe SSD - Zvika Guz, Samsung
 Redis on NVMe SSD - Zvika Guz, Samsung Redis on NVMe SSD - Zvika Guz, Samsung
Redis on NVMe SSD - Zvika Guz, SamsungRedis Labs
 
Ambedded - how to build a true no single point of failure ceph cluster
Ambedded - how to build a true no single point of failure ceph cluster Ambedded - how to build a true no single point of failure ceph cluster
Ambedded - how to build a true no single point of failure ceph cluster inwin stack
 
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash TechnologyCeph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash TechnologyCeph Community
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDKKernel TLV
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 

Semelhante a Performance optimization for all flash based on aarch64 v2.0 (20)

Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong TangAccelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
 
Ceph on rdma
Ceph on rdmaCeph on rdma
Ceph on rdma
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
 
Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage
 
Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage
 
Ceph Day Seoul - Ceph on All-Flash Storage
Ceph Day Seoul - Ceph on All-Flash Storage Ceph Day Seoul - Ceph on All-Flash Storage
Ceph Day Seoul - Ceph on All-Flash Storage
 
Accelerating Ceph with RDMA and NVMe-oF
Accelerating Ceph with RDMA and NVMe-oFAccelerating Ceph with RDMA and NVMe-oF
Accelerating Ceph with RDMA and NVMe-oF
 
Storage and performance, Whiptail
Storage and performance, Whiptail Storage and performance, Whiptail
Storage and performance, Whiptail
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
 
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
 
3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf
 
CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016]
 
Redis on NVMe SSD - Zvika Guz, Samsung
 Redis on NVMe SSD - Zvika Guz, Samsung Redis on NVMe SSD - Zvika Guz, Samsung
Redis on NVMe SSD - Zvika Guz, Samsung
 
Ambedded - how to build a true no single point of failure ceph cluster
Ambedded - how to build a true no single point of failure ceph cluster Ambedded - how to build a true no single point of failure ceph cluster
Ambedded - how to build a true no single point of failure ceph cluster
 
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash TechnologyCeph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
ceph-barcelona-v-1.2
ceph-barcelona-v-1.2ceph-barcelona-v-1.2
ceph-barcelona-v-1.2
 
Ceph barcelona-v-1.2
Ceph barcelona-v-1.2Ceph barcelona-v-1.2
Ceph barcelona-v-1.2
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Último (20)

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Performance optimization for all flash based on aarch64 v2.0

  • 1. Performance Optimization for All Flash based on aarch64 Huawei 2021/06/11
  • 2.  Core Count:32/48/64 cores  Frequency:2.6 /3.0GHz  Memory controller : 8 DDR4 controllers  Interface: PCIe 4.0, CCIX, 100G RoCE, SAS/SATA  Process:7nm Kunpeng920 ARM-Based CPU Industry’s Most Powerful ARM-Based CPU Ceph solution based on Kunpeng920 2 Ceph technical architecture based on Kunpeng chips TaiShan hardware OS Acceleration library Compression Crypto OpenEuler …
  • 3. Optimizes performance through software and hardware collaboration Disk IO CPU Network Improves CPU usage • Turn CPU prefetching on or off • Optimize the number of concurrent threads • NUMA optimization • kernel 4K/64K pagesize I/O performance optimization Optimizes NIC performance • Interrupt core binding • MTU adjustment • TCP parameter adjustment 3
  • 4. Table of Contents: ① Data access across NUMA; ② Multi-port NIC deployment; ③ DDR multi-channel deployment; ④ Messenger throttle low; ⑤ Long queue waiting time in the OSD; ⑥ Use 64K PageSize; ⑦ Optimized crc32c in RocksDB.
  • 5. ① Data access across NUMA. [Diagram: in the left layout, data flows across NUMA nodes between the NIC and the NVMe drives; in the right layout, each NUMA node has its own NIC and NVMe devices on local PCIe, so the data flow is completed within a single NUMA node.]
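Keeping the data flow inside one node presumes you know which NUMA node each NIC and NVMe controller hangs off; that can be read from sysfs. A sketch, with illustrative device names:

```shell
# NUMA node of an NVMe controller and a NIC (-1 means no affinity reported)
cat /sys/class/nvme/nvme0/device/numa_node
cat /sys/class/net/eth0/device/numa_node

# Overall topology: nodes, their CPUs, and memory per node
numactl --hardware
```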
  • 6. OSD NUMA-aware deployment: OSDs are deployed evenly across NUMA nodes to avoid cross-NUMA scheduling and memory-access overheads. [Charts: with NUMA-aware deployment, IOPS is 4–11% higher and latency 3–10% lower across 4K/8K random write, random read, and 70/30 random read/write.]
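One way to realize this placement is Ceph's per-OSD osd_numa_node option, or binding the daemon with numactl at start; a sketch (the OSD ids and node numbers are illustrative, and the slides do not say which mechanism was used):

```shell
# Option A: tell Ceph which NUMA node an OSD belongs to
ceph config set osd.0 osd_numa_node 0
ceph config set osd.1 osd_numa_node 1

# Option B: start the OSD with CPU and memory confined to one node
numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -f --id 0
```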
  • 7. ② Multi-port NIC deployment: NIC interrupts and OSD processes are bound to the same NUMA node, so NICs receive packets and OSDs call recvmsg within the same node. [Charts: IOPS 3–7% higher and latency 0–6% lower for 4K/8K random write, random read, and 70/30 random read/write. Diagram: one NIC per NUMA node (NIC0–NIC3 paired with NUMA0–NUMA3) instead of all four NICs attached to NUMA0.]
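Binding a NIC's interrupts to its local node is typically done by stopping irqbalance and writing each queue IRQ's affinity; a hedged sketch, assuming eth0 sits on a NUMA node whose CPUs are 0-31 (names and ranges are illustrative):

```shell
# irqbalance would undo manual placement, so stop it first
systemctl stop irqbalance

# Steer every eth0 queue interrupt to the NIC's local-node CPUs
for irq in $(awk '/eth0/ {sub(":", "", $1); print $1}' /proc/interrupts); do
    echo 0-31 > /proc/irq/${irq}/smp_affinity_list
done
```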
  • 8. ③ DDR multi-channel connection: populating 16 DDR channels instead of 12 raises 4KB-block IOPS by 7% for random write and 11% for random read.
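Whether every memory channel is actually populated can be checked from the DMI tables; a sketch (requires root):

```shell
# List populated DIMM slots; slots without a module report "No Module Installed"
dmidecode -t memory | grep "Size:" | grep -v "No Module Installed"
```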
  • 9. ④ Messenger throttler: gradually increase osd_client_message_cap; at a value of 1000, IOPS increases by 3.7%. [Chart: 4K random-write IOPS gain versus osd_client_message_cap from 0 to 1600, climbing irregularly from 0% and peaking at 3.7% around 1000.]
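The throttle can be raised cluster-wide or injected into running OSDs for a quick A/B test; a sketch using standard Ceph commands:

```shell
# Persist the messenger throttle at the sweet spot found above
ceph config set osd osd_client_message_cap 1000

# Or inject it into running OSDs without a restart
ceph tell osd.* injectargs '--osd_client_message_cap=1000'
```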
  • 10. ⑤ Long queue waiting time in the OSD: the average length of the OpQueue is less than 10 and its queuing time is short, while the kv_queue and kv_committing queues are longer than 30 entries and queue for much longer. [Diagram: write path Messenger (msgr-workers) → enqueue/dequeue OP in OpQueue → OSD threads → data disk via bstore_aio; kv_queue and kv_committing are swapped by bstore_kv_sync, flushed to the WAL+DB disk, then completed through kv_committing_to_finalize, bstore_kv_final, and onCommits. Latency breakdown: op_before_dequeue_op_lat ≈ 16–17%, state_kv_queued_lat ≈ 26–27%, op_w_prepare_latency 3%, state_aio_wait_lat 12%, and state_kv_commiting_lat ≈ 38% (kv_sync_lat 7% + kv_final_lat 3% + wait in queues 28%).]
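The per-stage latencies quoted above come from the OSD perf counters, which can be sampled on a running OSD through its admin socket; a sketch:

```shell
# Dump all perf counters for osd.0
ceph daemon osd.0 perf dump > perf.json

# Pull out the stages referenced on this slide
grep -E "op_before_dequeue_op_lat|state_kv_queued_lat|state_kv_commiting_lat" perf.json
```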
  • 11. CPU partition affinity: CPU partitioning separates the msgr-worker/tp_osd_tp threads from the bstore threads so that each group is scheduled fairly. [Charts: IOPS up 8–21% at 4K and 0–14% at 8K; latency down 7–18% at 4K and 0–13% at 8K, for random write, random read, and 70/30 random read/write.]
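The slides do not say how the partitioning was implemented; one way to sketch it is taskset over the OSD's threads, since Ceph names them. A hedged sketch, with illustrative CPU ranges (msgr-worker/tp_osd_tp on 0-23, bstore on 24-31):

```shell
PID=$(pgrep -f "ceph-osd.*--id 0")

# Pin the network and op-worker threads to one CPU partition...
for tid in $(ps -T -p "$PID" | awk '$NF ~ /msgr-worker|tp_osd_tp/ {print $2}'); do
    taskset -pc 0-23 "$tid"
done

# ...and the bluestore threads to another, so the groups stop competing
for tid in $(ps -T -p "$PID" | awk '$NF ~ /bstore/ {print $2}'); do
    taskset -pc 24-31 "$tid"
done
```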
  • 12. ⑥ Use 64K PageSize: use CEPH_PAGE_SHIFT for compatibility with various page sizes. Compared with 4K pages, 64K pages reduce TLB misses and improve performance by 10%+ (4KB-block IOPS ratios of 1.12/1.17/1.15 for random write/read/read-write). Tip: use small-page alignment in bufferlist::reserve to reduce memory waste.
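Whether the running kernel actually uses 64K pages can be checked before deploying; a sketch:

```shell
# 4096 on a 4K-page kernel, 65536 on a 64K-page aarch64 kernel
getconf PAGESIZE
```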
  • 13. 64K PageSize issues. Write amplification: when bluefs_buffered_io is true, metadata is written with buffered I/O and sync_file_range flushes data to disk in whole kernel pages, giving a write-amplification factor of 2.46 with 4K pages but 5.46 with 64K pages; when bluefs_buffered_io is false, metadata uses direct I/O, sync_file_range is not called, and the factor drops to 2.29. Because excess writes shorten SSD lifespan, set bluefs_buffered_io to false. Tcmalloc page-size issue: when the tcmalloc page size is smaller than the kernel page size, memory keeps growing until it approaches osd_memory_target and performance deteriorates significantly; ensure the tcmalloc page size is at least the kernel page size.
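The recommendation above translates into a one-line Ceph setting; a sketch:

```shell
# Avoid the 64K-page write amplification by switching BlueFS metadata writes
# from buffered I/O to direct I/O
ceph config set osd bluefs_buffered_io false
```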
  • 14. ⑦ Optimized crc32c in RocksDB: RocksDB's crc32c_arm64 (hardware-accelerated CRC32C) is supported in Ceph since Pacific; backporting this feature to earlier versions improves performance by about 3%.
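Whether the CPU exposes the hardware CRC32 instructions that crc32c_arm64 relies on can be checked from /proc/cpuinfo on aarch64; a sketch (prints nothing on CPUs without the feature, and the x86 flag is named differently):

```shell
# On Kunpeng 920 the aarch64 Features line includes "crc32"
grep -m1 -o "crc32" /proc/cpuinfo
```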