- 1. © 2021 Arm
Ceph on Arm64
Richael Zhuang
Arm
Experience Ceph Storage Ecosystem on Arm Server
- 2. 2 © 2021 Arm
Agenda
• Overview
• Ceph common libs optimization
• Ceph storage acceleration
• SPDK
• Seastar
• Bring Ceph to the Cloud
- 3. 3 © 2021 Arm
Ceph Storage Ecosystem on Arm64
[Diagram: Ceph storage ecosystem stack on Arm64 — Seastar framework (storage: Linux AIO, IO_URING; networking: TCP (kernel/DPDK), RDMA, NVMe-oF); common libraries (hashing, CRC, UTF-8, ISA-L, …); RADOS/LIBRADOS with RADOSGW, RBD, and CephFS interfaces; Ceph CSI; FileStore, BlueStore (SPDK NVMe), and SeaStore backends; OSD and MON daemons; ISA-L and OCF]
• Covering multiple open source communities
• Ceph, Seastar, SPDK, DPDK, ISA-L
• Ceph CSI, OpenStack
• Covering wide technical fields
• Cloud storage
• Virtualization, containers
• User-mode storage and network acceleration
• NVMe, NVMe-oF, RDMA
• Software optimization
• Hardware acceleration
- 4. 4 © 2021 Arm
Agenda
• Overview
• Ceph common libs optimization
• Ceph storage acceleration
• SPDK
• Seastar
• Bring Ceph to the Cloud
- 5. 5 © 2021 Arm
Ceph Common Libs Optimization
• We keep trying to leverage Arm CPU features to optimize the Ceph common libs.
• This is not easy, as most of the Ceph code is already highly optimized
• But there are still opportunities for improvement, especially on the Arm architecture
• Optimize UTF-8 string handling
• Up to 8x performance boost for string validation
• Up to 50% performance gain for string encoding
https://github.com/ceph/ceph/pull/27807, https://github.com/ceph/ceph/pull/27628
• Accelerate Arm CRC32 calculation with up to 3x performance boost
https://github.com/ceph/ceph/pull/12977
[Charts: UTF-8 validation and encoding throughput on Arm (MB/s), original vs. optimized]
- 6. 6 © 2021 Arm
Ceph + ISA-L
• ISA-L offload support for compression, encryption, and more in Ceph
• Added support for CRC, IGZIP, RAID, AES-GCM, multi-buffer MD5/SHA1/SHA256/SM3/SHA512, multi-hash SHA1/SHA256, and RollingHash using Arm-specific features and instructions
• Added multi-binary support for arm64
• Currently working on: AES-XTS, multi-hash SHA1+Murmur3
• Refer to the isa-l and isa-l_crypto repositories on GitHub for details; the code lives under aarch64/.
• Contact: Jerry.yu@arm.com
- 7. 7 © 2021 Arm
Ceph with 64K Kernel Page
• Arm supports a 64K kernel page size; a larger page size may benefit Ceph by:
• Improving the TLB hit rate and reducing page-table lookup effort
• A smaller page table that occupies less memory
• Fewer page-table levels, hence faster VA->PA translation
• Test configuration
• Ceph cluster
– Ceph 15.2.11, SPDK (06d09c1108b1)
– 1 MON, 1 MGR, 3 OSD
– One P4610 NVMe per OSD
– CPU: 2.8GHz, 128GB DDR4-3200, multi-core
– Kernel: Linux 5.8.0
• Client: CPU: 2.8GHz, multi-core, kernel: Linux 5.8.0
• Test tool: Fio (v3.16)
• Test case
– Sequential read/write and random read/write with 4/16/64/256/4096K block sizes, 4KB/64KB kernel page size
[Diagram: Ceph cluster topology — 1 MON, 1 MGR, 3 OSDs, plus a separate client node]
- 8. 8 © 2021 Arm
Ceph with 64K kernel page – Benchmark
Ceph cluster side: 4 cores activated

[Charts reconstructed as tables — Ceph RBD bandwidth in MiB/s, 4KB vs 64KB kernel page]

Read:        4K      16K     64K      256K     4M
4KB page     86.22   247.2   1018.4   2573.5   2873.5
64KB page    89.14   268.9   1131.9   2724.1   3134.9

Write:       4K      16K     64K     256K    4M
4KB page     219.2   445     661     862.1   934.1
64KB page    237.5   514.9   805.8   998.3   1025.9

Randread:    4K      16K     64K      256K     4M
4KB page     129.9   472.8   1405.8   2646     2811.8
64KB page    138     511     1472     2867.4   3092.7

Randwrite:   4K      16K     64K     256K    4M
4KB page     20.25   72.68   222.1   500.8   927.9
64KB page    21.45   79.32   256.3   593.9   1024.3

Bandwidth gains with the 64KB page:
• Read: 3.39%~11.11%
• Write: 8.35%~21.91%
• Randread: 6.24%~9.99%
• Randwrite: 5.93%~15.4%
- 9. 9 © 2021 Arm
Ceph with 64K kernel page – Benchmark
Ceph cluster side: 2 cores activated

[Charts reconstructed as tables — Ceph RBD bandwidth in MiB/s, 4KB vs 64KB kernel page]

Read:        4K      16K     64K     256K     4M
4KB page     79.56   222.2   692.4   1433.7   2135
64KB page    77.36   239     721.5   1547.4   2365.8

Write:       4K      16K     64K     256K    4M
4KB page     174.5   318     422.8   472.3   506.4
64KB page    194.1   379.7   532.2   604.4   664.5

Randread:    4K      16K     64K     256K     4M
4KB page     70.17   249.2   739.2   1463.6   2096.5
64KB page    74.34   270.3   793.4   1606.1   2299.3

Randwrite:   4K      16K     64K     256K    4M
4KB page     10.33   34.94   109.3   262.7   506.4
64KB page    11.57   42.12   132.9   320.1   674

Bandwidth gains with the 64KB page:
• Read: up to 10.81%
• Write: up to 31.22%
• Randread: up to 9.67%
• Randwrite: up to 33.1%
- 10. 10 © 2021 Arm
Other Optimizations – Investigating
• We are investigating new optimization points
• Leverage SVE/SVE2 (Scalable Vector Extension)
• Leverage Non-Temporal instructions to prevent cache pollution
• BlueStore: RocksDB compression lib optimization
- 11. 11 © 2021 Arm
Agenda
• Overview
• Ceph common libs optimization
• Ceph storage acceleration
• SPDK
• Seastar
• Bring Ceph to the Cloud
- 12. 12 © 2021 Arm
Ceph + SPDK
• SPDK can be used to accelerate the block service built on Ceph
• Use SPDK's user-space NVMe driver instead of the kernel NVMe driver in BlueStore
• Use the SPDK iSCSI target or NVMe-oF target to accelerate client IO performance on a Ceph cluster by introducing a caching solution
(https://www.snia.org/sites/default/files/SDC/2017/presentations/Storage_Architecture/Yang_Zie_Accelerate_block_service_built_on_Ceph_via_SPDK.pdf)
[Diagram: SPDK iSCSI and NVMe-oF targets export the block service through the SPDK Ceph RBD bdev module (librbd/librados) on top of the Ceph RBD service (FileStore/BlueStore/KVStore); BlueStore keeps metadata in RocksDB over BlueRocksEnv/BlueFS, backed either by the kernel NVMe driver or the SPDK user-space NVMe driver]
- 13. 13 © 2021 Arm
Optimization in SPDK
• Optimized base64 with Arm NEON intrinsics
• Up to 2.3x speedup for encoding, 1.7x speedup for decoding
• Memory barriers
• Arm has a weak memory model
– It places fewer constraints on ordering and gives the hardware a lot of scope for optimization
• Use the most precise constraint available so that fewer instructions outside our code sequence are affected by our barriers:
– ISB > DSB > DMB
– Half barriers: load-acquire, store-release
– For C11: use the __atomic built-ins instead of the __sync built-ins for atomic add/sub/inc/dec/cmp operations
▪ The __sync built-ins enforce unnecessary full barriers in some use cases
▪ The __atomic built-ins conform to the C11 memory model and offer finer memory-order control (__ATOMIC_RELAXED/__ATOMIC_ACQUIRE/__ATOMIC_RELEASE…)
• NVMe-oF optimization
• For NVMe over TCP, leverage TCP's SO_INCOMING_CPU option to distribute the processing of TCP connections to specific CPUs, aiming to exploit CPU cache locality.
- 14. 14 © 2021 Arm
Ceph + SPDK performance
Ceph RBD performance with qdepth=256, remote client — BW in MiB/s (the original chart also shows IOPS in thousands on a secondary axis):

             bluestore   bluestore+spdk
randread     228.3       229.7
randwrite    151.6       145.5
read         216.1       215.1
write        248         210.2
• Ceph cluster
• Ceph 15.2.11, SPDK (06d09c1108b1)
• 1 MON, 1 MGR, 3 OSD
• One P4610 NVMe per OSD
• CPU: 2.8GHz, multi-core
• Kernel: Linux 5.8.0
• Client
• CPU: 2.8GHz, 128GB DDR4-3200, multi-core
• Kernel: Linux 5.8.0
• Test tool
• Fio (v3.16)
• Test case
• Sequential read/write and random
read/write with 4KB block size
- 15. 15 © 2021 Arm
Agenda
• Overview
• Ceph common libs optimization
• Ceph storage acceleration
• SPDK
• Seastar
• Bring Ceph to the Cloud
- 16. 16 © 2021 Arm
Ceph OSD on Seastar
• Next generation Ceph OSD
• Targeting fast networking and storage
• Based on Seastar high-performance server application framework
• Seastar
• Each core runs a shared-nothing run-to-completion task scheduler
• Natural fit for Arm architecture with many cores
[Diagram: a traditional application (OS threads, kernel scheduler, shared memory and network stack, epoll, NIC queues) vs. a Seastar application — one shard per core with its own dedicated heap, user-mode scheduler, memory management and networking, doing disk and network IO via DPDK, Linux AIO, or IO_URING, bypassing the kernel]
- 17. 17 © 2021 Arm
Seastar work on Arm64
• Upgraded Seastar's DPDK to leverage new hardware
• Seastar hacks the DPDK mbuf to achieve zero copy between the Seastar heap and the NIC; leverages VFIO+IOMMU to use IOVA rather than PA (/proc/self/pagemap) for DMA; direct-maps (IOVA=VA) each core's whole heap space at startup
• Upgraded to the new offload API
• Other updates:
• Fixed a network-stack crash on Arm caused by "func(free(ptr), use(ptr))": on x86, arguments are evaluated right to left following the stack-push order, while on Arm they are evaluated left to right to leverage the abundant general-purpose registers, so free(ptr) runs before use(ptr) and leads to a crash.
https://github.com/scylladb/seastar/commit/6d82ee6797fcfbe3b65cfae2b4468ee68efacd48
…
[Diagram: DMA address translation — the legacy path resolves VA to PA via /proc/$$/pagemap and the MMU for NIC DMA; with VFIO, the NIC DMAs using IOVA translated by the IOMMU, and Seastar maps IOVA=VA]
- 18. 18 © 2021 Arm
Seastar Benchmark On Arm
• Benchmarked HTTP throughput vs. Arm core count
• Performance scales nearly linearly with core count
#cores requests/sec
4 368,113
8 767,244
16 1,529,532
32 2,939,638
48 3,651,702
[Chart: requests/s vs. #cores, showing the near-linear scaling of the table above]
References: https://github.com/scylladb/seastar/wiki/HTTPD-benchmark
Tested on an Arm64 server with the Seastar httpd benchmark tool and DPDK 19.05
- 19. 19 © 2021 Arm
Agenda
• Overview
• Ceph common libs optimization
• Ceph storage acceleration
• SPDK
• Seastar
• Bring Ceph to the Cloud
- 20. 20 © 2021 Arm
Ceph in Cloud Storage
• Ceph as an OpenStack storage backend is mature on Arm
• Object storage (Swift), block storage (Cinder), image service (Glance)
• Support for Ceph as a Kubernetes container cloud storage backend on Arm
• Official Arm support for some critical container images (e.g. the Kubernetes CSI sidecar images)
• Added Arm image support to the Ceph-CSI community and Arm jobs to the community CI
• Support for Rook on Arm64 (K8s + Rook + Ceph)
[Diagram: Kubernetes cluster with Ceph — OSDs and a MON running on K8s nodes, exposed via RBD/CephFS through Ceph CSI to application pods via PVC/PV bindings]
- 21. 21 © 2021 Arm
If you have any questions, please send an email to the following address and we will respond as soon as possible.
richael.zhuang@arm.com
- 22. © 2021 Arm
Thank You
Danke
Gracias
谢谢
ありがとう
Asante
Merci
감사합니다
धन्यवाद
Kiitos
اًشكر
ধন্যবাদ
תודה