© 2021 Arm
Ceph on Arm64
Richael Zhuang
Arm
Experience Ceph Storage Ecosystem on Arm Server
2 © 2021 Arm
Agenda
• Overview
• Ceph common libs optimization
• Ceph storage acceleration
• SPDK
• Seastar
• Bring Ceph to the Cloud
3 © 2021 Arm
Ceph Storage Ecosystem on Arm64
[Stack diagram: Ceph on Arm64 — RADOS (OSDs, MONs) under LIBRADOS, RADOSGW, RBD and CEPHFS, with Ceph CSI on top; OSD object stores FileStore, BlueStore (SPDK NVMe) and SeaStore; messaging over TCP (kernel/DPDK), RDMA and NVMe-oF; the Seastar framework providing storage (Linux AIO, IO_URING) and networking; common libraries (hashing, CRC, UTF-8, ISA-L, ……); SPDK with ISA-L and OCF.]
• Covering multiple open source
communities
• Ceph, Seastar, SPDK, DPDK, ISA-L
• Ceph CSI, OpenStack
• Covering wide technical fields
• Cloud storage
• Virtualization, Containers
• User mode storage, network
acceleration
• NVMe, NVMe-oF, RDMA
• Software optimization
• Hardware acceleration
4 © 2021 Arm
Agenda
• Overview
• Ceph common libs optimization
• Ceph storage acceleration
• SPDK
• Seastar
• Bring Ceph to the Cloud
5 © 2021 Arm
Ceph Common Libs Optimization
• We keep trying to leverage Arm CPU features to optimize the Ceph common libs.
• It is not easy, as most Ceph code is already highly optimized
• But there is still room for improvement, especially on the Arm architecture
• Optimize UTF-8 string handling
• Up to 8x performance boost for string validation
• Up to 50% performance gain for string encoding
https://github.com/ceph/ceph/pull/27807, https://github.com/ceph/ceph/pull/27628
• Accelerate Arm CRC32 calculation with up to 3x performance boost (a sketch of the underlying intrinsics follows the charts)
https://github.com/ceph/ceph/pull/12977
[Charts: UTF-8 validation and encoding throughput on Arm (MB/s), original vs optimized.]
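For reference, a minimal sketch of the hardware CRC32C loop this kind of optimization builds on, using the ACLE intrinsics from arm_acle.h. This is illustrative only, not the code from the Ceph PR; the function name and 8-byte chunking are ours.

```cpp
// Build (sketch): g++ -O2 -march=armv8-a+crc crc32c.cpp
#include <arm_acle.h>
#include <cstdint>
#include <cstddef>
#include <cstring>

uint32_t crc32c_arm(uint32_t crc, const uint8_t* data, size_t len) {
    while (len >= 8) {               // one CRC32CX instruction per 8 bytes
        uint64_t v;
        std::memcpy(&v, data, 8);    // memcpy keeps unaligned loads legal
        crc = __crc32cd(crc, v);
        data += 8; len -= 8;
    }
    while (len--)                    // byte-at-a-time tail
        crc = __crc32cb(crc, *data++);
    return crc;
}
```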
6 © 2021 Arm
Ceph + ISA-L
• ISA-L offload support for compression, encryption and more in Ceph
• Added support for CRC, IGZIP, RAID, AES-GCM, multibuffer MD5/SHA1/SHA256/SM3/SHA512, multi-hash SHA1/SHA256 and RollingHash, using Arm-specific features and instructions
• Added multi-binary (runtime dispatch) support for arm64 (see the sketch below)
• Currently working on: AES-XTS, multi-hash SHA1+Murmur3
• Refer to the isa-l / isa-l_crypto repositories on GitHub for details; the code lives under aarch64/.
• Contact: Jerry.yu@arm.com
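As a rough illustration of what multi-binary support means at runtime, the sketch below queries the kernel's CPU-feature bits the way a dispatcher would before binding accelerated routines. The HWCAP names are standard Linux arm64 definitions; the dispatch comment describes the idea only, as ISA-L's own mechanism differs in detail.

```cpp
#include <sys/auxv.h>    // getauxval
#include <asm/hwcap.h>   // arm64 HWCAP_* bits
#include <cstdio>

int main() {
    unsigned long hw = getauxval(AT_HWCAP);
    std::printf("CRC32: %d  AES: %d  PMULL: %d  SHA2: %d\n",
                !!(hw & HWCAP_CRC32), !!(hw & HWCAP_AES),
                !!(hw & HWCAP_PMULL), !!(hw & HWCAP_SHA2));
    // A multi-binary dispatcher would bind function pointers here:
    // CPUs with the Crypto/CRC extensions get the armv8-optimized
    // routines, everything else falls back to the portable versions.
    return 0;
}
```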
7 © 2021 Arm
Ceph with 64K Kernel Page
• Arm supports 64K kernel pages; the larger page size may benefit Ceph by
• Improving the TLB hit rate and reducing page-table lookup effort
• A smaller page table, which occupies less memory
• Fewer page-table levels, hence faster VA->PA translation
• Test configuration
• Ceph cluster
– Ceph 15.2.11, SPDK (06d09c1108b1)
– 1 MON, 1 MGR, 3 OSDs
– One P4610 NVMe per OSD
– CPU: 2.8GHz multi-core, 128GB DDR4-3200
– Kernel: Linux 5.8.0
• Client: CPU: 2.8GHz multi-core, kernel: Linux 5.8.0
• Test tool: fio (v3.16)
• Test case
– Sequential read/write and random read/write with 4/16/64/256/4096K block sizes, at 4KB and 64KB kernel page sizes (a quick page-size check follows the diagram)
[Diagram: test topology — a client plus a Ceph cluster with 1 MON, 1 MGR and 3 OSDs.]
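Before trusting a 4KB-vs-64KB comparison, it is worth confirming which page size the kernel is actually running with; a trivial check (a sketch, equivalent to `getconf PAGESIZE`):

```cpp
#include <unistd.h>
#include <cstdio>

int main() {
    // Prints 4096 on a 4K-page kernel, 65536 on a 64K-page kernel.
    std::printf("page size: %ld bytes\n", sysconf(_SC_PAGESIZE));
    return 0;
}
```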
8 © 2021 Arm
Ceph with 64K kernel page – Benchmark
Ceph cluster side: 4 cores activated
Ceph RBD bandwidth (MiB/s), 4KB vs 64KB kernel page, by block size:

Read         4K      16K     64K     256K    4M
4KB page     86.22   247.2   1018.4  2573.5  2873.5
64KB page    89.14   268.9   1131.9  2724.1  3134.9

Write        4K      16K     64K     256K    4M
4KB page     219.2   445     661     862.1   934.1
64KB page    237.5   514.9   805.8   998.3   1025.9

Randread     4K      16K     64K     256K    4M
4KB page     129.9   472.8   1405.8  2646    2811.8
64KB page    138     511     1472    2867.4  3092.7

Randwrite    4K      16K     64K     256K    4M
4KB page     20.25   72.68   222.1   500.8   927.9
64KB page    21.45   79.32   256.3   593.9   1024.3

Gain with the 64KB page:
• Read: 3.39%~11.11%
• Write: 8.35%~21.91%
• Randread: 6.24%~9.99%
• Randwrite: 5.93%~15.4%
9 © 2021 Arm
Ceph with 64K kernel page – Benchmark
Ceph cluster side: 2 cores activated

Ceph RBD bandwidth (MiB/s), 4KB vs 64KB kernel page, by block size:

Read         4K      16K     64K     256K    4M
4KB page     79.56   222.2   692.4   1433.7  2135
64KB page    77.36   239     721.5   1547.4  2365.8

Write        4K      16K     64K     256K    4M
4KB page     174.5   318     422.8   472.3   506.4
64KB page    194.1   379.7   532.2   604.4   664.5

Randread     4K      16K     64K     256K    4M
4KB page     70.17   249.2   739.2   1463.6  2096.5
64KB page    74.34   270.3   793.4   1606.1  2299.3

Randwrite    4K      16K     64K     256K    4M
4KB page     10.33   34.94   109.3   262.7   506.4
64KB page    11.57   42.12   132.9   320.1   674

Gain with the 64KB page (per-bar labels from the slide):
• Read: 4.2%~10.81%
• Write: 11.83%~31.22%
• Randread: 5.94%~9.67%
• Randwrite: 12.1%~33.1%
10 © 2021 Arm
Other Optimizations – Investigating
• We are investigating new optimization points
• Leverage SVE/SVE2 (Scalable Vector Extension)
• Leverage non-temporal instructions to prevent cache pollution
• BlueStore: RocksDB compression lib optimization
11 © 2021 Arm
Agenda
• Overview
• Ceph common libs optimization
• Ceph storage acceleration
• SPDK
• Seastar
• Bring Ceph to the Cloud
12 © 2021 Arm
Ceph + SPDK
• SPDK can be used to accelerate the block service built on Ceph
• Use SPDK's user-space NVMe driver instead of the kernel NVMe driver in BlueStore
• Use the SPDK iSCSI target or NVMe-oF target to accelerate client IO performance against the Ceph cluster by introducing a caching solution
(https://www.snia.org/sites/default/files/SDC/2017/presentations/Storage_Architecture/Yang_Zie_Accelerate_block_service_built_on_Ceph_via_SPDK.pdf)
[Diagram: SPDK iSCSI and NVMe-oF targets export the block service via the SPDK Ceph RBD bdev module (using librbd/librados) to the Ceph RBD service; FileStore, BlueStore and KVStore backends, with BlueStore's metadata path (RocksDB / BlueRocksEnv / BlueFS) running on either the kernel NVMe driver or the SPDK user-space NVMe driver.]
13 © 2021 Arm
Optimization in SPDK
• Optimize base64 with Arm NEON intrinsics
• Up to 2.3x speedup for encoding, 1.7x speedup for decoding
• Memory barriers (see the first sketch after this list)
• Arm has a weak memory model
– It places fewer ordering constraints on the hardware and leaves a lot of scope for hardware optimization
• Use the most precise constraint available, so fewer instructions outside our code sequence are affected by our barriers:
– ISB > DSB > DMB
– Half barriers: load-acquire, store-release
– For C11: use the __atomic built-ins instead of the __sync built-ins for atomic add/sub/inc/dec/cmp… operations
▪ __sync built-ins enforce unnecessary full barriers in some use cases
▪ __atomic built-ins, which conform to the C11 memory model, offer finer memory-order control (__ATOMIC_RELAXED / __ATOMIC_ACQUIRE / __ATOMIC_RELEASE…)
• NVMe over Fabrics optimization
• In NVMe over TCP, leverage TCP's SO_INCOMING_CPU feature to distribute the processing of TCP connections to specific CPUs, exploiting CPU cache locality (see the second sketch below)
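A minimal sketch of the __sync-to-__atomic migration described above, assuming a simple counter-plus-flag pattern (the variable names are illustrative, not SPDK's): the relaxed add drops the full barrier, and the acquire/release pair is exactly the half-barrier publication the list mentions.

```cpp
#include <cstdint>

uint64_t hits;      // statistics counter: no ordering needed
int      ready;     // publication flag

void old_style() {
    __sync_fetch_and_add(&hits, 1);                  // implies a full barrier
}

void new_style() {
    __atomic_fetch_add(&hits, 1, __ATOMIC_RELAXED);  // no barrier at all
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);   // store-release half barrier
}

uint64_t reader() {
    while (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE)) // load-acquire pairs
        ;                                              // with the release above
    return __atomic_load_n(&hits, __ATOMIC_RELAXED);   // the add is now visible
}
```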
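And a sketch of the SO_INCOMING_CPU idea, assuming one worker thread per connection (SPDK's actual integration differs): query the CPU the kernel delivered the connection on, then keep processing it there.

```cpp
#include <sys/socket.h>
#include <pthread.h>
#include <sched.h>

// Keep a freshly accepted TCP connection on the CPU that received it.
void pin_to_incoming_cpu(int connfd) {
    int cpu = -1;
    socklen_t len = sizeof(cpu);
    if (getsockopt(connfd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len) != 0 || cpu < 0)
        return;                       // option unavailable: leave placement alone
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // Process the connection where its RX softirq work already ran,
    // so the socket state stays warm in that core's cache.
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```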
14 © 2021 Arm
Ceph + SPDK performance
Ceph RBD performance with qdepth=256, remote client — 4KB bandwidth in MiB/s (the chart also plots IOPS(K); at a fixed 4KB block size, IOPS tracks BW directly):

Workload     bluestore   bluestore+spdk
randread     228.3       229.7
randwrite    151.6       145.5
read         216.1       215.1
write        248         210.2
• Ceph cluster
• Ceph 15.2.11, SPDK (06d09c1108b1)
• 1 MON, 1 MGR, 3 OSDs
• One P4610 NVMe per OSD
• CPU: 2.8GHz, multi-core
• Kernel: Linux 5.8.0
• Client
• CPU: 2.8GHz multi-core, 128GB DDR4-3200
• Kernel: Linux 5.8.0
• Test tool
• fio (v3.16)
• Test case
• Sequential read/write and random read/write with 4KB block size
15 © 2021 Arm
Agenda
• Overview
• Ceph common libs optimization
• Ceph storage acceleration
• SPDK
• Seastar
• Bring Ceph to the Cloud
16 © 2021 Arm
Ceph OSD on Seastar
• Next-generation Ceph OSD
• Targeting fast networking and storage
• Based on Seastar, a high-performance server application framework
• Seastar
• Each core runs a shared-nothing, run-to-completion task scheduler (see the minimal sketch after the diagram)
• A natural fit for Arm architectures with many cores
[Diagram: a traditional application — kernel threads, a shared heap, the OS scheduler, epoll-based networking — versus a Seastar application: one shard per core with a dedicated heap, user-mode scheduler, memory management and networking; disk IO via Linux AIO/IO_URING, network IO via DPDK on per-shard NIC queues, bypassing the kernel.]
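A minimal Seastar sketch of this shard-per-core model, assuming a recent Seastar (header and API names as in its public docs; build flags omitted): one task runs on every shard, each against its own core and heap, with no locks and no shared state.

```cpp
#include <seastar/core/app-template.hh>
#include <seastar/core/smp.hh>
#include <seastar/core/future.hh>
#include <seastar/core/shard_id.hh>
#include <cstdio>

int main(int argc, char** argv) {
    seastar::app_template app;
    return app.run(argc, argv, [] {
        // Submit one task to every shard's run-to-completion scheduler.
        return seastar::smp::invoke_on_all([] {
            std::printf("hello from shard %u\n", seastar::this_shard_id());
            return seastar::make_ready_future<>();
        });
    });
}
```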
17 © 2021 Arm
Seastar work on Arm64
• Upgrade Seastar DPDK to leverage new hardware
• Seastar hacks DPDK mbuf to achieve zero copy between SeaStar heap and NIC;
Leverage VFIO+IOMMU to use IOVA, not PA(/proc/self/pagemap), for DMA;
Direct map(IOVA=VA) each core’s whole heap space at startup.
• Upgrade to new offload API
• Other updates:
• Fixed network stack crash issue on Arm
caused by "func(free(ptr), use(ptr))",
on x86, parameters are passed from right to left in stack pushing order;
on Arm, parameters are passed from left to right to leverage
abundant general purpose registers, free(ptr) will be called before use(ptr),
which leads to crash.
https://github.com/scylladb/seastar/commit/6d82ee6797fcfbe3b65cfae2b4468ee68efacd48
…
[Diagram: NIC DMA through the IOMMU with IOVA == VA into the Seastar heap, versus legacy DMA by PA looked up via /proc/$$/pagemap and the MMU's VA->PA mapping.]
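A distilled version of that crash pattern. Since free() returns void and could not literally appear as an argument, do_free() below is an illustrative stand-in; the point is that C and C++ leave argument evaluation order unspecified, so code must never rely on it.

```cpp
#include <cstdlib>

static int  use(char* p)     { return p[0]; }            // reads *p
static int  do_free(char* p) { std::free(p); return 0; } // frees the buffer
static void func(int, int)   {}

void buggy(char* ptr) {
    // x86 builds happened to evaluate right to left (use before do_free);
    // AArch64 builds evaluated left to right, turning this into a
    // use-after-free. Either way the order is unspecified.
    func(do_free(ptr), use(ptr));
}

void fixed(char* ptr) {
    int v = use(ptr);   // sequence the read explicitly...
    std::free(ptr);     // ...then release the memory
    func(0, v);
}
```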
18 © 2021 Arm
Seastar Benchmark On Arm
• Benchmarked HTTP throughput against Arm core count
• Throughput scales nearly linearly with core count
#cores   requests/sec
4        368,113
8        767,244
16       1,529,532
32       2,939,638
48       3,651,702
References: https://github.com/scylladb/seastar/wiki/HTTPD-benchmark
Tested on an Arm64 server with the Seastar httpd benchmark tool and DPDK 19.05
19 © 2021 Arm
Agenda
• Overview
• Ceph common libs optimization
• Ceph storage acceleration
• SPDK
• Seastar
• Bring Ceph to the Cloud
20 © 2021 Arm
Ceph in Cloud Storage
• Ceph as an OpenStack storage backend is mature on Arm
• Object storage (Swift), block storage (Cinder), image service (Glance)
• Support for Ceph as a Kubernetes container cloud storage backend on Arm
• Official support for some critical container images for Arm (e.g. the Kubernetes CSI sidecar images)
• Added Arm image support to the Ceph-CSI community, and added Arm jobs in the community CI
• Support for Rook on Arm64 (K8s + Rook + Ceph)
[Diagram: Kubernetes nodes hosting Ceph OSDs and a MON alongside application pods; Ceph CSI exposes RBD and CephFS to the apps through PVCs bound to PVs.]
21 © 2021 Arm
If you have any questions, please send an email to the following mailbox; we will respond as soon as possible.
richael.zhuang@arm.com
© 2021 Arm
Thank You
Danke
Gracias
谢谢
ありがとう
Asante
Merci
감사합니다
धन्यवाद
Kiitos
شكرًا
ধন্যবাদ
תודה