Accelerating Cassandra Workloads on Ceph with All-Flash PCIe SSDs
Reddy Chagam – Principal Engineer, Storage Architect
Stephen L Blinick – Senior Cloud Storage Performance Engineer
Acknowledgments: Warren Wang, Anton Thaker (WalMart)
Orlando Moreno, Vishal Verma (Intel)
Intel technologies’ features and benefits depend on system configuration and may require
enabled hardware, software or service activation. Performance varies depending on system
configuration. No computer system can be absolutely secure. Check with your system
manufacturer or retailer or learn more at http://intel.com.
Software and workloads used in performance tests may have been optimized for performance
only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
measured using specific computer systems, components, software, operations and functions.
Any change to any of those factors may cause the results to vary. You should consult other
information and performance tests to assist you in fully evaluating your contemplated purchases,
including the performance of that product when combined with other products.
§ Configurations: Ceph v0.94.3 Hammer Release, CentOS 7.1, 3.10-229 Kernel, Linked with JEMalloc 3.6, CBT used for testing and data acquisition, OSD
System Config: Intel Xeon E5-2699 v3 2x@ 2.30 GHz, 72 cores w/ HT, 96GB, Cache 46080KB, 128GB DDR4, Each system with 4x P3700 800GB NVMe, partitioned into 4
OSD’s each, 16 OSD’s total per node, FIO Client Systems: Intel Xeon E5-2699 v3 2x@ 2.30 GHz, 72 cores w/ HT, 96GB, Cache 46080KB, 128GB DDR4, Single 10GbE
network for client & replication data transfer. FIO 2.2.8 with LibRBD engine. Tests run by Intel DCG Storage Group in Intel lab. Ceph configuration and CBT YAML file
provided in backup slides.
§ For more information go to http://www.intel.com/performance.
Intel, Intel Inside and the Intel logo are trademarks of Intel Corporation in the United States and
other countries. *Other names and brands may be claimed as the property of others.
© 2015 Intel Corporation.
Agenda
• The transition to flash and the impact of NVMe
• NVMe technology with Ceph
• Cassandra & Ceph – a case for storage convergence
• The all-NVMe high-density Ceph Cluster
• Raw performance measurements and observations
• Examining performance of a Cassandra DB-like workload
Evolution of Non-Volatile Memory Storage Devices
[Chart: storage device classes compared on 4K read latency, throughput, endurance, and IOPS]
• HDDs: ~ms 4K read latency, ~sub-100 MB/s throughput
• SATA/SAS SSDs: 100s of µs latency, ~100s of MB/s, <10 drive writes/day, 10s of K IOPS
• PCIe NVMe SSDs: 10s of µs latency, GB/s throughput, >10 drive writes/day, 100s of K IOPS
• 3D XPoint™ NVM SSDs and 3D XPoint™ DIMMs as the next step in the evolution
PCI Express® (PCIe), NVM Express™ (NVMe)
NVM plays a key role in delivering performance for latency-sensitive workloads
Ceph Workloads
[Chart: block and object workloads mapped by storage performance (IOPS, throughput) vs. storage capacity (PB)]
• Workloads span boot volumes, remote disks, VDI, databases, app storage, test & dev, enterprise dropbox, mobile content depot, big data, CDN, cloud DVR, HPC, and backup/archive
• NVM focus: the high-performance, latency-sensitive end of this spectrum
Ceph - NVM Usages
[Diagram: virtual-machine clients (guest VM application → Qemu/Virtio → RBD → RADOS) and bare-metal clients (application → kernel RBD driver → RADOS) speaking the RADOS protocol over 10GbE to a RADOS node running OSD, journal, and Filestore on a local file system]
NVM usages across the stack:
• Client side: caching with write-through
• OSD node: journaling, read cache, OSD data
Cassandra – What and Why?
[Diagram: Cassandra ring of partitions (p1…p20) serving a client]
• Cassandra is a column-oriented NoSQL DB with a CQL interface
 Each row has a unique key, which is used for partitioning
 No relations
 A row can have multiple columns – not necessarily the same number of columns per row
• Open source, distributed, decentralized, highly available, linearly scalable, multi-DC, …
• Used for analytics, real-time insights, fraud detection, IoT/sensor data, messaging, etc.
Use cases: http://www.planetcassandra.org/apachecassandra-use-cases/
• Ceph is a popular open source unified storage platform
• Many large-scale Ceph deployments are in production
• End customers prefer converged infrastructure that supports multiple workloads (e.g. analytics) to achieve CapEx and OpEx savings
• Several customers are asking to run Cassandra workloads on Ceph
Ceph and Cassandra Integration
[Diagram: multiple virtual machines, each running Cassandra on top of an application → RBD → RADOS stack via Qemu/Virtio, connected over an IP fabric to a Ceph storage cluster of SSD-backed OSDs and MONs]
Deployment Considerations
• Bootable Ceph volumes (OS & Cassandra data)
• Cassandra RBD data volumes
• Data protection (Cassandra or Ceph)
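To make the "Cassandra RBD data volumes" item concrete, here is a minimal sketch (not part of the measured setup) that uses the librados/librbd Python bindings to provision a dedicated data volume for one Cassandra node; the pool name "cassandra" and the image name are illustrative assumptions.

# Minimal sketch: provision an RBD data volume for a Cassandra node.
# Assumes the python-rados / python-rbd bindings and a reachable cluster;
# the pool and image names are hypothetical.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('cassandra')                              # pre-created pool
    try:
        rbd.RBD().create(ioctx, 'cassandra-data-node01', 200 * 1024**3)  # 200 GiB image
    finally:
        ioctx.close()
finally:
    cluster.shutdown()

The image would then be attached to the guest via Qemu/Virtio (as in the diagram above) and formatted for the Cassandra data directory; data protection can be left to Ceph (the 2x replicated pool used in this study) or handled by Cassandra, per the considerations above.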
Hardware Environment Overview
[Diagram: five SuperMicro 1028U OSD nodes, each running 16 Ceph OSDs across 4 NVMe devices, plus six FIO RBD clients and CBT/Zabbix monitoring hosts on FatTwin systems (4x dual-socket Xeon E5 v3), all on a single 10Gbps Ceph network (192.168.142.0/24)]
• OSD system config: 2x Intel Xeon E5-2699 v3 @ 2.30 GHz, 72 cores w/ HT, 96GB, 46080KB cache, 128GB DDR4
• Each system with 4x P3700 800GB NVMe, partitioned into 4 OSDs each, 16 OSDs total per node
• FIO client systems: 2x Intel Xeon E5-2699 v3 @ 2.30 GHz, 72 cores w/ HT, 96GB, 46080KB cache, 128GB DDR4
• Ceph v0.94.3 Hammer release, CentOS 7.1, 3.10-229 kernel, linked with JEMalloc 3.6
• CBT used for testing and data acquisition
• Single 10GbE network for client & replication data transfer, replication factor 2
• Intel Xeon E5 v3 18-core CPUs, Intel P3700 NVMe PCIe flash, easily serviceable NVMe drives
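For context on later slides, a quick back-of-envelope sketch of what this configuration adds up to (using only the numbers above; replication is the 2x factor used throughout the tests):

# Back-of-envelope sizing for the 5-node test bed described above.
nodes = 5
nvme_per_node = 4
osds_per_nvme = 4
drive_capacity_gb = 800                               # Intel DC P3700 800GB
replication = 2                                       # pool replication factor

total_drives = nodes * nvme_per_node                  # 20 NVMe devices
total_osds = total_drives * osds_per_nvme             # 80 OSDs
raw_tb = total_drives * drive_capacity_gb / 1000.0    # 16.0 TB raw
usable_tb = raw_tb / replication                      # 8.0 TB usable at 2x replication

print(total_drives, total_osds, raw_tb, usable_tb)

The 80-OSD and 4.8TB-dataset figures quoted in the performance slides below correspond to this layout; the 4.8TB dataset is roughly 60% of the usable capacity.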
Multi-partitioning flash devices
• High-performance NVMe devices are capable of high parallelism at low latency
• DC P3700 800GB raw performance: 460K read IOPS & 90K write IOPS at QD=128
• By using multiple OSD partitions, Ceph performance scales linearly
• Reduces lock contention within a single OSD process
• Lower latency at all queue depths, with the biggest impact on random reads
• Introduces the concept of multiple OSDs on the same physical device
• Conceptually similar CRUSH map data placement rules as managing disks in an enclosure
• High resiliency of "Data Center" class NVMe devices
• At least 10 drive writes per day
• Power-loss protection, full data path protection, device-level telemetry
[Diagram: a single NVMe device (NVMe1) partitioned into CephOSD1–CephOSD4]
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
Partitioning multiple OSDs per NVMe
• Multiple OSDs per NVMe result in higher performance, lower latency, and better CPU utilization
[Chart: Latency vs IOPS, 4K random read, 1/2/4 OSDs per device – 5 nodes, 20/40/80 OSDs, Intel DC P3700, dual-socket Xeon E5-2699 v3 / 128GB RAM / 10GbE, Ceph 0.94.3 w/ JEMalloc]
[Chart: Single-node CPU utilization, 4K random reads @ QD32, single/double/quad OSD per NVMe – 4/8/16 OSDs, Intel DC P3700, dual-socket Xeon E5-2699 v3 / 128GB RAM / 10GbE, Ceph 0.94.3 w/ JEMalloc]
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
4K Random Read & Write Performance Summary
First Ceph cluster to break 1 million 4K random IOPS
Workload pattern → max IOPS:
• 4K 100% random reads (2TB dataset): 1.35 million
• 4K 100% random reads (4.8TB dataset): 1.15 million
• 4K 100% random writes (4.8TB dataset): 200K
• 4K 70%/30% read/write OLTP mix (4.8TB dataset): 452K
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
4K Random Read & Write Performance and Latency
First Ceph cluster to break 1 million 4K random IOPS at ~1ms response time
[Chart: IO-depth scaling, latency vs IOPS for 4K random read, random write, 70/30 mix, and 4K random read with 2TB dataset – 5 nodes, 60 OSDs, dual-socket Xeon E5-2699 v3 / 128GB RAM / 10GbE, Ceph 0.94.3 w/ JEMalloc]
• 1M 100% 4K random read IOPS @ ~1.1ms
• 1.35M 4K random read IOPS w/ 2TB hot data
• 400K 70/30% (OLTP) 4K random IOPS @ ~3ms
• 171K 100% 4K random write IOPS @ 6ms
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
Sequential Performance (512KB)
• With a single 10GbE interface per OSD node, both writes and reads reach network line rate; the single interface is the bottleneck
• Higher throughput would be possible through NIC bonding or 40GbE connectivity
[Chart: 512K sequential bandwidth – 5 nodes, 80 OSDs, DC P3700, dual-socket Xeon E5-2699 v3 / 128GB RAM / 10GbE, Ceph 0.94.3 w/ JEMalloc]
• 100% write: 3,214 MB/s
• 100% read: 5,888 MB/s
• 70/30% read/write mix: 5,631 MB/s
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
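A quick sanity check of the line-rate claim above (assuming the single 10GbE port per OSD node, ~1,250 MB/s theoretical, shared by client and replication traffic):

# Per-node network load implied by the 512K sequential results.
nodes = 5
read_mb_s = 5888.0            # measured 100% read bandwidth (client-visible)
write_mb_s = 3214.0           # measured 100% write bandwidth (client-visible)
replication = 2               # each written byte crosses the cluster network twice

per_node_read = read_mb_s / nodes                   # ~1178 MB/s egress per node
per_node_write = write_mb_s * replication / nodes   # ~1286 MB/s ingress per node

print(per_node_read, per_node_write)                # both close to 10GbE line rate

Both figures land at roughly the capacity of one 10GbE port, which is consistent with the bullet above: more sequential throughput requires bonding or 40GbE.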
Cassandra-like workload
242K IOPS at < 2ms latency
• Based on a typical customer Cassandra workload profile
• 50% reads and 50% writes, predominantly 8K reads and 12K writes, FIO queue depth = 8
• IO-size breakdown – reads: 78% 8K, 19% 5K, 3% 7K; writes: 92% 12K, with the remaining ~8% spread across 33K, 115K, 50K, and 80K
[Chart: IOPS and average latency for the 50/50 read/write mix – 5 nodes, 80 OSDs, dual-socket Xeon E5-2699 v3 / 128GB RAM / 10GbE, Ceph 0.94.3 w/ JEMalloc]
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
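A back-of-envelope view of what 242K IOPS at this mix means for the cluster (using only the nominal 8K read / 12K write sizes above):

# Approximate client throughput and per-node network load for the mixed workload.
iops_total = 242_000
read_iops = write_iops = iops_total // 2       # 50/50 read/write mix

read_mb_s = read_iops * 8 / 1024.0             # ~945 MB/s of 8K reads
write_mb_s = write_iops * 12 / 1024.0          # ~1418 MB/s of 12K writes

nodes = 5
replication = 2                                # writes cross the network twice
per_node_net_mb_s = (read_mb_s + write_mb_s * replication) / nodes   # ~756 MB/s

print(read_mb_s, write_mb_s, per_node_net_mb_s)

Per-node network load stays well under 10GbE line rate, so for this mix the limiting factor is the small/medium-block write path rather than the network – consistent with the Filestore next-steps item on the summary slide.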
Summary & Conclusions
• Flash technology, including NVMe, enables new performance capabilities in small footprints
• Ceph and Cassandra provide a compelling case for feature-rich converged storage that can support latency-sensitive analytics workloads
• Using the latest standard high-volume servers and Ceph, you can now build an open, high-density, scalable, high-performance cluster that can handle a low-latency mixed workload
• Ceph performance improvements over recent releases are significant; today over 1 million random IOPS is achievable in 5U with ~1ms latency
• Next steps:
• Address small-block write performance, limited by the Filestore backend
• Improve long-tail latency for transactional workloads
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
Thank you!
Configuration Detail – ceph.conf
[global] – Authentication (default → tuned):
• auth_client_required: cephx → none
• auth_cluster_required: cephx → none
• auth_service_required: cephx → none
[global] – Debug logging (default → tuned):
• debug_lockdep: 0/1 → 0/0
• debug_context: 0/1 → 0/0
• debug_crush: 1/1 → 0/0
• debug_buffer: 0/1 → 0/0
• debug_timer: 0/1 → 0/0
• debug_filer: 0/1 → 0/0
• debug_objector: 0/1 → 0/0
• debug_rados: 0/5 → 0/0
• debug_rbd: 0/5 → 0/0
• debug_ms: 0/5 → 0/0
• debug_monc: 0/5 → 0/0
• debug_tp: 0/5 → 0/0
• debug_auth: 1/5 → 0/0
• debug_finisher: 1/5 → 0/0
• debug_heartbeatmap: 1/5 → 0/0
• debug_perfcounter: 1/5 → 0/0
• debug_rgw: 1/5 → 0/0
• debug_asok: 1/5 → 0/0
• debug_throttle: 1/1 → 0/0
Configuration Detail – ceph.conf (continued)
[global] – CBT specific (default → tuned):
• mon_pg_warn_max_object_skew: 10 → 10000
• mon_pg_warn_min_per_osd: 0 → 0
• mon_pg_warn_max_per_osd: 32768 → 32768
• osd_pg_bits: 8 → 8
• osd_pgp_bits: 8 → 8
[global] – RBD cache:
• rbd_cache: true → true
[global] – Other:
• mon_compact_on_trim: true → false
• log_to_syslog: false → false
• log_file: /var/log/ceph/$name.log → /var/log/ceph/$name.log
• perf: true → true
• mutex_perf_counter: false → true
• throttler_perf_counter: true → false
[mon] – CBT specific:
• mon_data: /var/lib/ceph/mon/ceph-0 → /home/bmpa/tmp_cbt/ceph/mon.$id
• mon_max_pool_pg_num: 65536 → 166496
• mon_osd_max_split_count: 32 → 10000
[osd] – Filestore parameters:
• filestore_wbthrottle_enable: true → false
• filestore_queue_max_bytes: 104857600 → 1048576000
• filestore_queue_committing_max_bytes: 104857600 → 1048576000
• filestore_queue_max_ops: 50 → 5000
• filestore_queue_committing_max_ops: 500 → 5000
• filestore_max_sync_interval: 5 → 10
• filestore_fd_cache_size: 128 → 64
• filestore_fd_cache_shards: 16 → 32
• filestore_op_threads: 2 → 6
[osd] – Mount parameters:
• osd_mount_options_xfs: rw,noatime,inode64,logbsize=256k,delaylog
• osd_mkfs_options_xfs: -f -i size=2048
[osd] – Journal parameters:
• journal_max_write_entries: 100 → 1000
• journal_queue_max_ops: 300 → 3000
• journal_max_write_bytes: 10485760 → 1048576000
• journal_queue_max_bytes: 33554432 → 1048576000
[osd] – Op tracker:
• osd_enable_op_tracker: true → false
[osd] – OSD client:
• osd_client_message_size_cap: 524288000 → 0
• osd_client_message_cap: 100 → 0
[osd] – Objecter:
• objecter_inflight_ops: 1024 → 102400
• objecter_inflight_op_bytes: 104857600 → 1048576000
[osd] – Throttles:
• ms_dispatch_throttle_bytes: 104857600 → 1048576000
[osd] – OSD number of threads:
• osd_op_threads: 2 → 32
• osd_op_num_shards: 5 → 5
• osd_op_num_threads_per_shard: 2 → 2
Configuration Detail - CBT YAML File
cluster:
  user: "bmpa"
  head: "ft01"
  clients: ["ft01", "ft02", "ft03", "ft04", "ft05", "ft06"]
  osds: ["hswNode01", "hswNode02", "hswNode03", "hswNode04", "hswNode05"]
  mons:
    ft02:
      a: "192.168.142.202:6789"
  osds_per_node: 8
  fs: xfs
  mkfs_opts: '-f -i size=2048 -n size=64k'
  mount_opts: '-o inode64,noatime,logbsize=256k'
  conf_file: '/home/bmpa/cbt/ceph_nvme_2partition_5node_hsw.conf'
  use_existing: False
  rebuild_every_test: False
  clusterid: "ceph"
  iterations: 1
  tmp_dir: "/home/bmpa/tmp_cbt"
  pool_profiles:
    2rep:
      pg_size: 4096
      pgp_size: 4096
      replication: 2
Configuration Detail - CBT YAML File (Continued)
benchmarks:
  librbdfio:
    time: 300
    ramp: 600
    vol_size: 81920
    mode: ['randrw']
    rwmixread: [0, 70, 100]
    op_size: [4096]
    procs_per_volume: [1]
    volumes_per_client: [10]
    use_existing_volumes: False
    iodepth: [4, 8, 16, 32, 64, 96, 128]
    osd_ra: [128]
    norandommap: True
    cmd_path: '/usr/bin/fio'
    pool_profile: '2rep'
    log_avg_msec: 250
Storage Node Diagram
Two CPU sockets: Socket 0 and Socket 1
• Socket 0: 2x NVMe, Intel X540-AT2 (10Gbps) NIC, 64GB memory (8x 8GB 2133 DIMMs)
• Socket 1: 2x NVMe, 64GB memory (8x 8GB 2133 DIMMs)
Explore additional optimizations using cgroups and IRQ affinity
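One way to start on the cgroups/affinity exploration above is to pin each ceph-osd process to the socket that owns its NVMe device. The sketch below is an illustrative assumption, not part of the tested configuration: it uses the psutil package, a hypothetical OSD-to-socket mapping, and an assumed core numbering that must be verified against lscpu on the actual hosts.

# Hedged sketch: pin ceph-osd processes to the CPU socket local to their NVMe.
# Core ranges assume a dual 18-core E5-2699 v3 with HT (72 logical CPUs) and a
# typical Linux numbering; verify with `lscpu` before use.
import psutil

SOCKET_CPUS = {
    0: list(range(0, 18)) + list(range(36, 54)),   # socket 0 cores + HT siblings (assumed)
    1: list(range(18, 36)) + list(range(54, 72)),  # socket 1 cores + HT siblings (assumed)
}

def socket_for_osd(osd_id):
    # Hypothetical mapping: OSDs 0-7 on socket-0 NVMe, OSDs 8-15 on socket-1 NVMe.
    return 0 if osd_id < 8 else 1

for proc in psutil.process_iter(['name', 'cmdline']):
    if proc.info['name'] != 'ceph-osd':
        continue
    cmdline = proc.info['cmdline'] or []
    for flag in ('-i', '--id'):                    # ceph-osd is started with -i/--id <n>
        if flag in cmdline:
            osd_id = int(cmdline[cmdline.index(flag) + 1])
            proc.cpu_affinity(SOCKET_CPUS[socket_for_osd(osd_id)])
            break

IRQ affinity for the NVMe and NIC interrupts (and cgroup-based CPU/memory limits) would be handled separately; this only covers process placement.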
High Performance Ceph Node Hardware Building Blocks
• Generally available server designs built for high density and high performance
• High-density 1U standard high-volume server
• Dual-socket 3rd-generation Xeon E5 (E5-2699 v3)
• 10 front-removable 2.5" form-factor drive slots with SFF-8639 connectors
• Multiple 10Gb network ports, additional slots for 40Gb networking
• Intel DC P3700 NVMe drives are available in the 2.5" drive form factor, allowing easier service in a datacenter environment
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS

Mais conteúdo relacionado

Mais procurados

Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureDanielle Womboldt
 
Vmware Tanzu Kubernetes Connect(Spanish)
Vmware Tanzu Kubernetes Connect(Spanish)Vmware Tanzu Kubernetes Connect(Spanish)
Vmware Tanzu Kubernetes Connect(Spanish)GabrielaRodriguez182401
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...DataStax
 
Introduction to Ansible
Introduction to AnsibleIntroduction to Ansible
Introduction to AnsibleKnoldus Inc.
 
Evolution of MongoDB Replicaset and Its Best Practices
Evolution of MongoDB Replicaset and Its Best PracticesEvolution of MongoDB Replicaset and Its Best Practices
Evolution of MongoDB Replicaset and Its Best PracticesMydbops
 
VMworld 2017 vSAN Network Design
VMworld 2017 vSAN Network Design VMworld 2017 vSAN Network Design
VMworld 2017 vSAN Network Design Cormac Hogan
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle CoherenceBen Stopford
 
OVN Controller Incremental Processing
OVN Controller Incremental ProcessingOVN Controller Incremental Processing
OVN Controller Incremental ProcessingHan Zhou
 
OVN - Basics and deep dive
OVN - Basics and deep diveOVN - Basics and deep dive
OVN - Basics and deep diveTrinath Somanchi
 
GIT: Content-addressable filesystem and Version Control System
GIT: Content-addressable filesystem and Version Control SystemGIT: Content-addressable filesystem and Version Control System
GIT: Content-addressable filesystem and Version Control SystemTommaso Visconti
 
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack UpPushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack UpJames Denton
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseAltinity Ltd
 
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...Brian Grant
 
Large scale overlay networks with ovn: problems and solutions
Large scale overlay networks with ovn: problems and solutionsLarge scale overlay networks with ovn: problems and solutions
Large scale overlay networks with ovn: problems and solutionsHan Zhou
 
Red hat ansible automation technical deck
Red hat ansible automation technical deckRed hat ansible automation technical deck
Red hat ansible automation technical deckJuraj Hantak
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
Understanding Storage I/O Under Load
Understanding Storage I/O Under LoadUnderstanding Storage I/O Under Load
Understanding Storage I/O Under LoadScyllaDB
 
15 Troubleshooting Tips and Tricks for database 21c - OGBEMEA KSAOUG
15 Troubleshooting Tips and Tricks for database 21c - OGBEMEA KSAOUG15 Troubleshooting Tips and Tricks for database 21c - OGBEMEA KSAOUG
15 Troubleshooting Tips and Tricks for database 21c - OGBEMEA KSAOUGSandesh Rao
 
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019Sean Cohen
 

Mais procurados (20)

Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
 
Vmware Tanzu Kubernetes Connect(Spanish)
Vmware Tanzu Kubernetes Connect(Spanish)Vmware Tanzu Kubernetes Connect(Spanish)
Vmware Tanzu Kubernetes Connect(Spanish)
 
Windows 2019
Windows 2019Windows 2019
Windows 2019
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
 
Introduction to Ansible
Introduction to AnsibleIntroduction to Ansible
Introduction to Ansible
 
Evolution of MongoDB Replicaset and Its Best Practices
Evolution of MongoDB Replicaset and Its Best PracticesEvolution of MongoDB Replicaset and Its Best Practices
Evolution of MongoDB Replicaset and Its Best Practices
 
VMworld 2017 vSAN Network Design
VMworld 2017 vSAN Network Design VMworld 2017 vSAN Network Design
VMworld 2017 vSAN Network Design
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle Coherence
 
OVN Controller Incremental Processing
OVN Controller Incremental ProcessingOVN Controller Incremental Processing
OVN Controller Incremental Processing
 
OVN - Basics and deep dive
OVN - Basics and deep diveOVN - Basics and deep dive
OVN - Basics and deep dive
 
GIT: Content-addressable filesystem and Version Control System
GIT: Content-addressable filesystem and Version Control SystemGIT: Content-addressable filesystem and Version Control System
GIT: Content-addressable filesystem and Version Control System
 
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack UpPushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouse
 
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
 
Large scale overlay networks with ovn: problems and solutions
Large scale overlay networks with ovn: problems and solutionsLarge scale overlay networks with ovn: problems and solutions
Large scale overlay networks with ovn: problems and solutions
 
Red hat ansible automation technical deck
Red hat ansible automation technical deckRed hat ansible automation technical deck
Red hat ansible automation technical deck
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
Understanding Storage I/O Under Load
Understanding Storage I/O Under LoadUnderstanding Storage I/O Under Load
Understanding Storage I/O Under Load
 
15 Troubleshooting Tips and Tricks for database 21c - OGBEMEA KSAOUG
15 Troubleshooting Tips and Tricks for database 21c - OGBEMEA KSAOUG15 Troubleshooting Tips and Tricks for database 21c - OGBEMEA KSAOUG
15 Troubleshooting Tips and Tricks for database 21c - OGBEMEA KSAOUG
 
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019
 

Destaque

Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsRed_Hat_Storage
 
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...Spark Summit
 
Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5Haoyuan Li
 
Linux Filesystems, RAID, and more
Linux Filesystems, RAID, and moreLinux Filesystems, RAID, and more
Linux Filesystems, RAID, and moreMark Wong
 
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...Spark Summit
 
The Hot Rod Protocol in Infinispan
The Hot Rod Protocol in InfinispanThe Hot Rod Protocol in Infinispan
The Hot Rod Protocol in InfinispanGalder Zamarreño
 
Advanced Data Retrieval and Analytics with Apache Spark and Openstack Swift
Advanced Data Retrieval and Analytics with Apache Spark and Openstack SwiftAdvanced Data Retrieval and Analytics with Apache Spark and Openstack Swift
Advanced Data Retrieval and Analytics with Apache Spark and Openstack SwiftDaniel Krook
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMfnothaft
 
ELC-E 2010: The Right Approach to Minimal Boot Times
ELC-E 2010: The Right Approach to Minimal Boot TimesELC-E 2010: The Right Approach to Minimal Boot Times
ELC-E 2010: The Right Approach to Minimal Boot Timesandrewmurraympc
 
Velox: Models in Action
Velox: Models in ActionVelox: Models in Action
Velox: Models in ActionDan Crankshaw
 
Naïveté vs. Experience
Naïveté vs. ExperienceNaïveté vs. Experience
Naïveté vs. ExperienceMike Fogus
 
SparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at ScaleSparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at Scalejeykottalam
 
SampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS StackSampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS Stackjeykottalam
 
A Curious Course on Coroutines and Concurrency
A Curious Course on Coroutines and ConcurrencyA Curious Course on Coroutines and Concurrency
A Curious Course on Coroutines and ConcurrencyDavid Beazley (Dabeaz LLC)
 
Lab 5: Interconnecting a Datacenter using Mininet
Lab 5: Interconnecting a Datacenter using MininetLab 5: Interconnecting a Datacenter using Mininet
Lab 5: Interconnecting a Datacenter using MininetZubair Nabi
 
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)Spark Summit
 
Best Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopBest Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopHortonworks
 

Destaque (20)

Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
 
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
 
Open Stack Cheat Sheet V1
Open Stack Cheat Sheet V1Open Stack Cheat Sheet V1
Open Stack Cheat Sheet V1
 
Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5Tachyon-2014-11-21-amp-camp5
Tachyon-2014-11-21-amp-camp5
 
Linux Filesystems, RAID, and more
Linux Filesystems, RAID, and moreLinux Filesystems, RAID, and more
Linux Filesystems, RAID, and more
 
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
Lessons Learned with Spark at the US Patent & Trademark Office-(Christopher B...
 
The Hot Rod Protocol in Infinispan
The Hot Rod Protocol in InfinispanThe Hot Rod Protocol in Infinispan
The Hot Rod Protocol in Infinispan
 
Advanced Data Retrieval and Analytics with Apache Spark and Openstack Swift
Advanced Data Retrieval and Analytics with Apache Spark and Openstack SwiftAdvanced Data Retrieval and Analytics with Apache Spark and Openstack Swift
Advanced Data Retrieval and Analytics with Apache Spark and Openstack Swift
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
 
ELC-E 2010: The Right Approach to Minimal Boot Times
ELC-E 2010: The Right Approach to Minimal Boot TimesELC-E 2010: The Right Approach to Minimal Boot Times
ELC-E 2010: The Right Approach to Minimal Boot Times
 
Velox: Models in Action
Velox: Models in ActionVelox: Models in Action
Velox: Models in Action
 
Naïveté vs. Experience
Naïveté vs. ExperienceNaïveté vs. Experience
Naïveté vs. Experience
 
SparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at ScaleSparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at Scale
 
SampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS StackSampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS Stack
 
OpenStack Cheat Sheet V2
OpenStack Cheat Sheet V2OpenStack Cheat Sheet V2
OpenStack Cheat Sheet V2
 
A Curious Course on Coroutines and Concurrency
A Curious Course on Coroutines and ConcurrencyA Curious Course on Coroutines and Concurrency
A Curious Course on Coroutines and Concurrency
 
Lab 5: Interconnecting a Datacenter using Mininet
Lab 5: Interconnecting a Datacenter using MininetLab 5: Interconnecting a Datacenter using Mininet
Lab 5: Interconnecting a Datacenter using Mininet
 
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
 
Best Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopBest Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache Hadoop
 
Python in Action (Part 2)
Python in Action (Part 2)Python in Action (Part 2)
Python in Action (Part 2)
 

Semelhante a Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS

Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...Michelle Holley
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Community
 
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Community
 
Ceph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in CephCeph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in CephCeph Community
 
Ceph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephCeph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephDanielle Womboldt
 
DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence
DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI ConvergenceDAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence
DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergenceinside-BigData.com
 
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...Red_Hat_Storage
 
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Community
 
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster Ceph Community
 
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Odinot Stanislas
 
Impact of Intel Optane Technology on HPC
Impact of Intel Optane Technology on HPCImpact of Intel Optane Technology on HPC
Impact of Intel Optane Technology on HPCMemVerge
 
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Community
 
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster Ceph Community
 
Ceph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Community
 
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph clusterCeph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph clusterCeph Community
 
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red_Hat_Storage
 
JetStor portfolio update final_2020-2021
JetStor portfolio update final_2020-2021JetStor portfolio update final_2020-2021
JetStor portfolio update final_2020-2021Gene Leyzarovich
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...Shuquan Huang
 
Presentation architecting a cloud infrastructure
Presentation   architecting a cloud infrastructurePresentation   architecting a cloud infrastructure
Presentation architecting a cloud infrastructurexKinAnx
 

Semelhante a Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS (20)

Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK
 
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
 
Ceph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in CephCeph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in Ceph
 
Ceph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephCeph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for Ceph
 
DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence
DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI ConvergenceDAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence
DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence
 
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
 
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
 
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
Ceph Day Seoul - Delivering Cost Effective, High Performance Ceph cluster
 
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
 
Impact of Intel Optane Technology on HPC
Impact of Intel Optane Technology on HPCImpact of Intel Optane Technology on HPC
Impact of Intel Optane Technology on HPC
 
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
 
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster
Ceph Day Taipei - Delivering cost-effective, high performance, Ceph cluster
 
Ceph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph cluster
 
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph clusterCeph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
 
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...
 
JetStor portfolio update final_2020-2021
JetStor portfolio update final_2020-2021JetStor portfolio update final_2020-2021
JetStor portfolio update final_2020-2021
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
 
Presentation architecting a cloud infrastructure
Presentation   architecting a cloud infrastructurePresentation   architecting a cloud infrastructure
Presentation architecting a cloud infrastructure
 

Último

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 

Último (20)

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS

  • 1. Reddy Chagam – Principal Engineer, Storage Architect Stephen L Blinick – Senior Cloud Storage Performance Engineer Acknowledgments: Warren Wang, Anton Thaker (WalMart) Orlando Moreno, Vishal Verma (Intel)
  • 2. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at http://intel.com. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. § Configurations: Ceph v0.94.3 Hammer Release, CentOS 7.1, 3.10-229 Kernel, Linked with JEMalloc 3.6, CBT used for testing and data acquisition, OSD System Config: Intel Xeon E5-2699 v3 2x@ 2.30 GHz, 72 cores w/ HT, 96GB, Cache 46080KB, 128GB DDR4, Each system with 4x P3700 800GB NVMe, partitioned into 4 OSD’s each, 16 OSD’s total per node, FIO Client Systems: Intel Xeon E5-2699 v3 2x@ 2.30 GHz, 72 cores w/ HT, 96GB, Cache 46080KB, 128GB DDR4, Single 10GbE network for client & replication data transfer. FIO 2.2.8 with LibRBD engine. Tests run by Intel DCG Storage Group iin Intel lab. Ceph configuration and CBT YAML file provided in backup slides. § For more information go to http://www.intel.com/performance. Intel, Intel Inside and the Intel logo are trademarks of Intel Corporation in the United States and other countries. *Other names and brands may be claimed as the property of others. © 2015 Intel Corporation. 12
  • 3. DCG Storage Group 13 Agenda • The transition to flash and the impact of NVMe • NVMe technology with Ceph • Cassandra & Ceph – a case for storage convergence • The all-NVMe high-density Ceph Cluster • Raw performance measurements and observations • Examining performance of a Cassandra DB like workload
  • 4. DCG Storage Group Evolution of Non-Volatile Memory Storage Devices PCIe NVMe 10s us >10 DW/day <10 DW/day 100s K 10s K PCIe NVMe GB/s SATA/SAS SSDs ~100s MB/s HDDs ~sub 100 MB/s SATA/SAS SSDs 100s us HDDs ~ ms IOPsEndurance 4K Read Latency PCI Express® (PCIe) NVM Express™ (NVMe) 3D XPoint™ DIMMs 3D XPoint NVM SSDs NVM Plays a Key Role in Delivering Performance for latency sensitive workloads
  • 5. DCG Storage Group 15 Ceph Workloads StoragePerformance (IOPS,Throughput) Storage Capacity (PB) Lower Higher LowerHigher Boot Volumes CDN Enterprise Dropbox Backup, Archive Remote Disks VDI App Storage BigData Mobile Content Depot Databases Block Object NVM Focus Test & Dev Cloud DVR HPC
  • 6. Ceph – NVM Usages
     [Diagram: two client paths to a Ceph OSD node over 10GbE – a virtual machine (guest VM application → Qemu/Virtio → RBD driver → RADOS) and a bare-metal host (application → RBD → RADOS), both speaking the RADOS protocol to the OSD, whose journal and filestore sit on a file system backed by NVM]
     NVM usages: client caching w/ write-through, journaling, read cache, OSD data
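     A minimal sketch of the journaling usage above, assuming Hammer-era ceph-disk tooling and hypothetical device names (the actual cluster layout is in the backup slides):

       # Hypothetical devices: data on /dev/sdb, journal on an NVMe partition.
       ceph-disk prepare /dev/sdb /dev/nvme0n1p1   # <data device> <journal device>
       ceph-disk activate /dev/sdb1
       # Client-side write-through caching is a ceph.conf setting on the client,
       # e.g. "rbd cache = true" with "rbd cache max dirty = 0" for write-through behaviour.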
  • 7. Cassandra – What and Why?
     [Diagram: Cassandra ring – a client writes keyed rows to partitions distributed around the ring]
     • Cassandra is a column-oriented NoSQL DB with a CQL interface
       - Each row has a unique key, which is used for partitioning
       - No relations
       - A row can have multiple columns – not necessarily the same number of columns per row
     • Open source, distributed, decentralized, highly available, linearly scalable, multi-DC, …
     • Used for analytics, real-time insights, fraud detection, IoT/sensor data, messaging, etc. Use cases: http://www.planetcassandra.org/apachecassandra-use-cases/
     • Ceph is a popular open source unified storage platform
     • Many large-scale Ceph deployments are in production
     • End customers prefer converged infrastructure that supports multiple workloads (e.g. analytics) to achieve CapEx and OpEx savings
     • Several customers are asking to run the Cassandra workload on Ceph
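     For context, a hedged CQL sketch of the data-model point above (keyspace and table names are hypothetical): the partition key in the primary key is what Cassandra hashes to place a row on the ring.

       cqlsh -e "
         CREATE KEYSPACE IF NOT EXISTS demo
           WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
         CREATE TABLE IF NOT EXISTS demo.sensor_readings (
           sensor_id  uuid,       -- partition key: determines ring placement
           ts         timestamp,
           value      double,
           PRIMARY KEY (sensor_id, ts)
         );"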
  • 8. Ceph and Cassandra Integration
     [Diagram: several virtual machines, each running the application and Cassandra over Qemu/Virtio → RBD → RADOS, connected through an IP fabric to a Ceph storage cluster of SSD-backed OSDs plus monitors]
     Deployment considerations:
     • Bootable Ceph volumes (OS & Cassandra data)
     • Cassandra RBD data volumes
     • Data protection (Cassandra or Ceph)
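     A minimal sketch, with hypothetical pool and image names, of provisioning an RBD data volume for one Cassandra node. In the deployment pictured above the volume is attached to the VM via Qemu/Virtio; the kernel-client mapping below is just the simplest way to illustrate the flow:

       ceph osd pool create cassandra 4096 4096        # pg_num pgp_num
       rbd create cassandra/node1-data --size 204800   # size in MB (~200 GiB)
       rbd map cassandra/node1-data                    # returns e.g. /dev/rbd0
       mkfs.xfs /dev/rbd0
       mkdir -p /var/lib/cassandra
       mount -o noatime /dev/rbd0 /var/lib/cassandra   # default Cassandra data dir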
  • 9. Ceph Storage Cluster – Hardware Environment Overview
     [Diagram: six FIO RBD client systems (FatTwin chassis, 4x dual-socket Xeon E5 v3) and a CBT/Zabbix monitoring host on the Ceph network (192.168.142.0/24, 10 Gbps), driving five SuperMicro 1028U OSD nodes; each node runs Ceph OSD 1–16 across NVMe 1–4]
     • OSD system config: Intel Xeon E5-2699 v3 2x @ 2.30 GHz, 72 cores w/ HT, 96GB, 46080KB cache, 128GB DDR4
     • Each system with 4x P3700 800GB NVMe, partitioned into 4 OSDs each, 16 OSDs total per node
     • FIO client systems: Intel Xeon E5-2699 v3 2x @ 2.30 GHz, 72 cores w/ HT, 96GB, 46080KB cache, 128GB DDR4
     • Ceph v0.94.3 Hammer release, CentOS 7.1, 3.10-229 kernel, linked with JEMalloc 3.6
     • CBT used for testing and data acquisition
     • Single 10GbE network for client & replication data transfer; replication factor 2
     • OSD nodes: Intel Xeon E5 v3 18-core CPUs, Intel P3700 NVMe PCIe flash, easily serviceable NVMe drives
  • 10. Multi-partitioning flash devices
     [Diagram: one NVMe device (NVMe1) hosting four Ceph OSDs]
     • High-performance NVMe devices are capable of high parallelism at low latency
       - DC P3700 800GB raw performance: 460K read IOPS & 90K write IOPS at QD=128
     • By using multiple OSD partitions, Ceph performance scales linearly (see the sketch after this slide)
       - Reduces lock contention within a single OSD process
       - Lower latency at all queue depths, with the biggest impact on random reads
     • Introduces the concept of multiple OSDs on the same physical device
       - Conceptually similar crushmap data placement rules as managing disks in an enclosure
     • High resiliency of “Data Center” class NVMe devices
       - At least 10 drive writes per day
       - Power-loss protection, full data path protection, device-level telemetry
     Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
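     A minimal sketch of one way to realize 4 OSDs per NVMe, assuming a hypothetical device name and equal GPT partitions; the tested cluster deployed its OSDs via CBT, so treat this as illustrative rather than the exact procedure:

       parted -s /dev/nvme0n1 mklabel gpt
       parted -s /dev/nvme0n1 mkpart osd1 0% 25%
       parted -s /dev/nvme0n1 mkpart osd2 25% 50%
       parted -s /dev/nvme0n1 mkpart osd3 50% 75%
       parted -s /dev/nvme0n1 mkpart osd4 75% 100%
       for p in 1 2 3 4; do
         ceph-disk prepare /dev/nvme0n1p${p}   # no separate journal device; journal lives with the data here
         ceph-disk activate /dev/nvme0n1p${p}
       done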
  • 11. Partitioning multiple OSDs per NVMe
     • Multiple OSDs per NVMe result in higher performance, lower latency, and better CPU utilization
     [Chart: latency vs. IOPS, 4K random read, 1/2/4 OSDs per device – 5 nodes, 20/40/80 OSDs, Intel DC P3700, Xeon E5-2699 v3 dual socket / 128GB RAM / 10GbE, Ceph 0.94.3 w/ JEMalloc]
     [Chart: single-node CPU utilization, 4K random reads @ QD32, 4/8/16 OSDs per node – single vs. double vs. quad OSD per NVMe]
     Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
  • 12. 4K Random Read & Write Performance Summary
     First Ceph cluster to break 1 million 4K random IOPS
     Workload pattern                                    Max IOPS
     4K 100% random reads (2TB dataset)                  1.35 million
     4K 100% random reads (4.8TB dataset)                1.15 million
     4K 100% random writes (4.8TB dataset)               200K
     4K 70%/30% read/write OLTP mix (4.8TB dataset)      452K
     Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
  • 13. 4K Random Read & Write Performance and Latency
     First Ceph cluster to break 1 million 4K random IOPS at ~1ms response time
     [Chart: IO-depth scaling, latency vs. IOPS for 100% 4K random read, 100% 4K random write, 70/30% 4K random OLTP mix, and 100% 4K random read on a 2TB dataset – 5 nodes, 60 OSDs, Xeon E5-2699 v3 dual socket / 128GB RAM / 10GbE, Ceph 0.94.3 w/ JEMalloc]
     • 1M 100% 4K random read IOPS @ ~1.1ms
     • 1.35M 4K random read IOPS w/ 2TB hot data
     • 400K 70/30% (OLTP) 4K random IOPS @ ~3ms
     • 171K 100% 4K random write IOPS @ 6ms
     Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
  • 14. Sequential performance (512KB)
     • With 10GbE per node, both writes and reads achieve line rate, bottlenecked by the single network interface on each OSD node.
     • Higher throughput would be possible through bonding or 40GbE connectivity.
     [Chart: 512K sequential bandwidth – 5 nodes, 80 OSDs, DC P3700, Xeon E5-2699 v3 dual socket / 128GB RAM / 10GbE, Ceph 0.94.3 w/ JEMalloc: 100% write 3,214 MB/s, 100% read 5,888 MB/s, 70/30% R/W mix 5,631 MB/s]
     Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
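     A rough sanity check of the line-rate claim, assuming ~1.25 GB/s of usable bandwidth per 10GbE link: 5 OSD nodes × ~1.25 GB/s ≈ 6.25 GB/s of aggregate cluster bandwidth, so 5,888 MB/s of reads is roughly 94% of line rate. For writes, replication factor 2 over the same single network means each client byte crosses the OSD nodes' NICs twice (client → primary, primary → replica), so 3,214 MB/s of client writes implies roughly 6.4 GB/s on the wire – again at about the interface limit.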
  • 15. Cassandra-like workload
     242K IOPS at < 2ms latency
     • Based on a typical customer Cassandra workload profile
     • 50% reads and 50% writes, predominantly 8K reads and 12K writes, FIO queue depth = 8
     [Chart: IOPS and latency for the 50/50 read/write mix – 5 nodes, 80 OSDs, Xeon E5-2699 v3 dual socket / 128GB RAM / 10GbE, Ceph 0.94.3 w/ JEMalloc]
     IO-size breakdown – reads: 78% 8K, 19% 5K, 3% 7K; writes: 92% 12K, 5% 33K, with the remaining ~3% spread across 115K, 50K and 80K
     Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
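     A hedged fio sketch of this mix using the librbd engine; the pool and image names are hypothetical, and the write-side split of the last ~3% across 115K/50K/80K is an illustrative guess since the slide does not give exact percentages:

       fio --name=cassandra-like \
           --ioengine=rbd --clientname=admin --pool=rbd --rbdname=cbt-librbdfio-0 \
           --rw=randrw --rwmixread=50 --iodepth=8 \
           --bssplit=8k/78:5k/19:7k/3,12k/92:33k/5:115k/1:50k/1:80k/1 \
           --time_based --runtime=300 --norandommap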
  • 16. Summary & Conclusions
     • Flash technology, including NVMe, enables new performance capabilities in small footprints
     • Ceph and Cassandra provide a compelling case for feature-rich converged storage that can support latency-sensitive analytics workloads
     • Using the latest standard high-volume servers and Ceph, you can now build an open, high-density, scalable, high-performance cluster that can handle a low-latency mixed workload
     • Ceph performance improvements over recent releases are significant, and today over 1 million random IOPS is achievable in 5U with ~1ms latency
     • Next steps:
       - Address small-block write performance, limited by the Filestore backend
       - Improve long-tail latency for transactional workloads
     Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any difference in system hardware or software design or configuration may affect actual performance. See configuration slides in backup for details on software configuration and test benchmark parameters.
  • 19. Configuration Detail – ceph.conf (parameter: default -> tuned)
     [global]
       Authentication:
         auth_client_required: cephx -> none
         auth_cluster_required: cephx -> none
         auth_service_required: cephx -> none
       Debug logging:
         debug_lockdep: 0/1 -> 0/0
         debug_context: 0/1 -> 0/0
         debug_crush: 1/1 -> 0/0
         debug_buffer: 0/1 -> 0/0
         debug_timer: 0/1 -> 0/0
         debug_filer: 0/1 -> 0/0
         debug_objector: 0/1 -> 0/0
         debug_rados: 0/5 -> 0/0
         debug_rbd: 0/5 -> 0/0
         debug_ms: 0/5 -> 0/0
         debug_monc: 0/5 -> 0/0
         debug_tp: 0/5 -> 0/0
         debug_auth: 1/5 -> 0/0
         debug_finisher: 1/5 -> 0/0
         debug_heartbeatmap: 1/5 -> 0/0
         debug_perfcounter: 1/5 -> 0/0
         debug_rgw: 1/5 -> 0/0
         debug_asok: 1/5 -> 0/0
         debug_throttle: 1/1 -> 0/0
  • 20. Configuration Detail – ceph.conf (continued; parameter: default -> tuned)
     [global]
       CBT specific:
         mon_pg_warn_max_object_skew: 10 -> 10000
         mon_pg_warn_min_per_osd: 0 -> 0
         mon_pg_warn_max_per_osd: 32768 -> 32768
         osd_pg_bits: 8 -> 8
         osd_pgp_bits: 8 -> 8
       RBD cache:
         rbd_cache: true -> true
       Other:
         mon_compact_on_trim: true -> false
         log_to_syslog: false -> false
         log_file: /var/log/ceph/$name.log -> /var/log/ceph/$name.log
         perf: true -> true
         mutex_perf_counter: false -> true
         throttler_perf_counter: true -> false
     [mon]
       CBT specific:
         mon_data: /var/lib/ceph/mon/ceph-0 -> /home/bmpa/tmp_cbt/ceph/mon.$id
         mon_max_pool_pg_num: 65536 -> 166496
         mon_osd_max_split_count: 32 -> 10000
     [osd]
       Filestore parameters:
         filestore_wbthrottle_enable: true -> false
         filestore_queue_max_bytes: 104857600 -> 1048576000
         filestore_queue_committing_max_bytes: 104857600 -> 1048576000
         filestore_queue_max_ops: 50 -> 5000
         filestore_queue_committing_max_ops: 500 -> 5000
         filestore_max_sync_interval: 5 -> 10
         filestore_fd_cache_size: 128 -> 64
         filestore_fd_cache_shards: 16 -> 32
         filestore_op_threads: 2 -> 6
       Mount parameters:
         osd_mount_options_xfs: rw,noatime,inode64,logbsize=256k,delaylog
         osd_mkfs_options_xfs: -f -i size=2048
       Journal parameters:
         journal_max_write_entries: 100 -> 1000
         journal_queue_max_ops: 300 -> 3000
         journal_max_write_bytes: 10485760 -> 1048576000
         journal_queue_max_bytes: 33554432 -> 1048576000
       Op tracker:
         osd_enable_op_tracker: true -> false
       OSD client:
         osd_client_message_size_cap: 524288000 -> 0
         osd_client_message_cap: 100 -> 0
       Objecter:
         objecter_inflight_ops: 1024 -> 102400
         objecter_inflight_op_bytes: 104857600 -> 1048576000
       Throttles:
         ms_dispatch_throttle_bytes: 104857600 -> 1048576000
       OSD number of threads:
         osd_op_threads: 2 -> 32
         osd_op_num_shards: 5 -> 5
         osd_op_num_threads_per_shard: 2 -> 2
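     Most of these values are set in ceph.conf before the daemons start. As a hedged illustration (the OSD id and option below are just examples), a running OSD's effective value can be checked over its admin socket, and options that allow runtime changes can be pushed with injectargs:

       ceph daemon osd.0 config show | grep filestore_op_threads
       ceph tell 'osd.*' injectargs '--filestore_op_threads 6'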
  • 21. Configuration Detail – CBT YAML File
     cluster:
       user: "bmpa"
       head: "ft01"
       clients: ["ft01", "ft02", "ft03", "ft04", "ft05", "ft06"]
       osds: ["hswNode01", "hswNode02", "hswNode03", "hswNode04", "hswNode05"]
       mons:
         ft02:
           a: "192.168.142.202:6789"
       osds_per_node: 8
       fs: xfs
       mkfs_opts: '-f -i size=2048 -n size=64k'
       mount_opts: '-o inode64,noatime,logbsize=256k'
       conf_file: '/home/bmpa/cbt/ceph_nvme_2partition_5node_hsw.conf'
       use_existing: False
       rebuild_every_test: False
       clusterid: "ceph"
       iterations: 1
       tmp_dir: "/home/bmpa/tmp_cbt"
       pool_profiles:
         2rep:
           pg_size: 4096
           pgp_size: 4096
           replication: 2
  • 22. Configuration Detail – CBT YAML File (continued)
     benchmarks:
       librbdfio:
         time: 300
         ramp: 600
         vol_size: 81920
         mode: ['randrw']
         rwmixread: [0, 70, 100]
         op_size: [4096]
         procs_per_volume: [1]
         volumes_per_client: [10]
         use_existing_volumes: False
         iodepth: [4, 8, 16, 32, 64, 96, 128]
         osd_ra: [128]
         norandommap: True
         cmd_path: '/usr/bin/fio'
         pool_profile: '2rep'
         log_avg_msec: 250
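     For reference, a hedged sketch of how a CBT run is typically launched from the head node; the archive directory and YAML file name are hypothetical (only the YAML contents above come from the deck):

       git clone https://github.com/ceph/cbt.git && cd cbt
       ./cbt.py --archive=/home/bmpa/results ../ceph_nvme_2partition_5node_hsw.yaml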
  • 23. Storage Node Diagram
     Two CPU sockets: Socket 0 and Socket 1
     • Socket 0: 2 NVMes, Intel X540-AT2 (10Gbps), 64GB (8x 8GB 2133 DIMMs)
     • Socket 1: 2 NVMes, 64GB (8x 8GB 2133 DIMMs)
     Explore additional optimizations using cgroups and IRQ affinity (see the sketch after this slide)
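     A hedged sketch of the cgroups/IRQ-affinity idea, not part of the tested configuration; the IRQ number and CPU ranges are hypothetical and depend on the node's actual topology:

       # Pin one OSD's process and memory to socket 0, which owns its NVMe device:
       numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0 --cluster ceph
       # Steer the X540 NIC's interrupts to socket-0 cores (IRQ 64 is hypothetical):
       echo 0-17,36-53 > /proc/irq/64/smp_affinity_list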
  • 24. High Performance Ceph Node – Hardware Building Blocks
     • Generally available server designs built for high density and high performance
     • High-density 1U standard high-volume server
       - Dual-socket 3rd generation Xeon E5 (2699 v3)
       - 10 front-removable 2.5” form-factor drive slots, 8639 connector
       - Multiple 10Gb network ports, additional slots for 40Gb networking
     • Intel DC P3700 NVMe drives are available in the 2.5” drive form factor
       - Allowing easier service in a datacenter environment