DSS

Data & Storage Services

Building an organic block storage service at CERN with Ceph

Dan van der Ster
Arne Wiebalck
Ceph Day 2013, London
9 October 2013

CERN IT Department
CH-1211 Geneva 23
Switzerland

www.cern.ch/it

DSS CERN’s mission and tools

● CERN studies the fundamental laws of nature
  ○ Why do particles have mass?
  ○ What is our universe made of?
  ○ Why is there no antimatter left?
  ○ What was matter like right after the “Big Bang”?
  ○ …

● The Large Hadron Collider (LHC)
  ○ Built in a 27km long tunnel, ~200m underground
  ○ Dipole magnets operated at -271°C (1.9K)
  ○ Particles do ~11’000 turns/sec, 600 million collisions/sec
  ○ ...

● Detectors
  ○ Four main experiments, each the size of a cathedral
  ○ DAQ systems processing PetaBytes/sec

● Worldwide LHC Computing Grid (WLCG)
  ○ Computer network to provide computing for LHC data analysis
  ○ CERN at the centre of 170 computing centres worldwide

DSS Big Data at CERN

● Physics Data on CASTOR/EOS
  ○ LHC experiments produce ~10GB/s, 25PB/year
● User Data on AFS/DFS
  ○ Home directories for 30k users
  ○ Physics analysis dev’t
  ○ Project spaces (applications)
● Service Data on AFS/NFS
  ○ Databases, admin applications
● Tape archival with CASTOR/TSM
  ○ RAW physics outputs
  ○ Desktop/Server backups

Service    Size      Files
AFS        240TB     1.9B
CASTOR     87.7PB    317M
EOS        19.8PB    160M

CERN developed CASTOR & EOS because until very recently our storage requirements were globally unique. Following the Google / Amazon / Facebook innovations, we are now trying to leverage community solutions.

DSS IT (R)evolution at CERN

Cloudifying CERN’s IT infrastructure ...
● Centrally-managed and uniform hardware
  ○ No more service-specific storage boxes
● OpenStack VMs for most services
  ○ Building for 100k nodes (mostly for batch processing)
● Attractive desktop storage services
  ○ Huge demand for a local Dropbox, Google Drive …
● Remote data centre in Budapest
  ○ More rack space and power, plus disaster recovery

… brings new storage requirements
● Block storage for OpenStack VMs
  ○ Images and volumes
● Backend storage for existing and new services
  ○ AFS, NFS, OwnCloud, Zenodo, ...
● Regional storage
  ○ Make use of the new data centre in Hungary
● Failure tolerance, data checksumming, easy to operate, security, ...

DSS Possible Solutions

GlusterFS
● Cloud team at CERN found it wasn’t stable enough
● Doesn’t offer a block device for physical machines

NFS (NetApp)
● Expensive
● Vendor lock-in

Ceph
● Interesting architecture (on paper)
● Offers almost all features we needed

In early 2013 we started investigating Ceph ...

DSS First steps

● Set up a small-scale test cluster (install sketch below)
  ○ 3 MON servers, 1 RADOS gateway (all VMs)
  ○ 8 OSD hosts with 4-5 disks each (ex-CASTOR)
  ○ Ceph 0.56.4 installed via yum install ceph on SLC6.4
  ○ Various clients: kernel rbd driver, OpenStack, AI monitoring, ...

[Diagram: 3 MONs and a RADOS GW (VMs), a row of OSD hosts, and various clients (CLT1, CLT2, ...)]

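For reference, a minimal sketch of how such a node could have been set up on SLC6.4; the repository URL and the decision to skip GPG checking are assumptions for illustration, not taken from the slides:

    # enable an upstream Ceph "bobtail" (0.56.x) yum repo and install the packages
    cat > /etc/yum.repos.d/ceph.repo <<'EOF'
    [ceph]
    name=Ceph packages for EL6
    baseurl=http://ceph.com/rpm-bobtail/el6/x86_64/
    gpgcheck=0
    EOF
    yum install -y ceph
    ceph --version   # expect 0.56.x
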
DSS Early testing

● Setup was easy
  ○ ~2 days for our 50TB testbed
● Passed our (simple) interface tests
  ○ RADOS, RBD, RADOS GW, CephFS
  ○ OpenStack/Cinder
● Passed our performance test
  ○ radosbench
● Passed our first functional tests (see the sketch below)
  ○ remove OSD, change replication size, delete object in pg, corrupt object in pg, …
● Passed our community expectations
  ○ very quick and helpful responses to issues we encountered

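To illustrate the kind of functional tests listed above, commands along these lines exercise them from the admin node; the OSD id, pool name and pg id are placeholders:

    ceph osd out 12                  # take an OSD out and watch the cluster re-balance
    ceph osd pool set rbd size 2     # change the replication size of a pool
    # after deleting or corrupting one object replica directly on an OSD's filesystem:
    ceph pg scrub 0.1f               # let scrub detect the inconsistency
    ceph pg repair 0.1f              # and repair it from the surviving replicas
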
DSS Issues during early testing

● ceph-deploy did not work for us at the time
● “2 rooms - 3 replicas - problem”
● “re-weight apocalypse”
  ○ wrong ratio of RAM to OSDs
● “flaky” server caused Ceph timeouts and constant re-balancing
  ○ taking out the server “fixed” the problem
  ○ root cause not understood (can a slow server slow down the cluster?)
● qemu-kvm RPM on RHEL derivative SLC needs patching
  ○ RPM provided by Inktank

DSS Issues during early testing

The results of this initial testing allowed us to convince management to support a more serious Ceph prototype ...

DSS 12 racks of disk server quads

DSS Our 3PB Ceph Cluster

48 OSD servers
  Dual Intel Xeon E5-2650, 32 threads incl. HT
  64GB RAM
  Dual 10Gig-E NICs, only one connected
  24x 3TB Hitachi disks (Eco drive, ~5900 RPM)
  3x 2TB Hitachi system disks, triple mirror

5 monitors
  Dual Intel Xeon L5640, 24 threads incl. HT
  48GB RAM
  Dual 1Gig-E NICs, only one connected
  3x 2TB Hitachi system disks, triple mirror

[root@p01001532971954 ~]# ceph osd tree | head -n2
# id    weight  type name      up/down reweight
-1      2883    root default

DSS Fully Puppetized Deployment

Fully puppetized deployment
● Big thanks to eNovance for their module!
  https://github.com/enovance/puppet-ceph/

Automated machine commissioning
● Add a server to the hostgroup (osd, mon, radosgw)
● OSD disks are detected, formatted, prepared, auth’d
● Auto-generated ceph.conf
● Last step is manual/controlled: service ceph start

We use mcollective for bulk operations on the servers (see the sketch below)
● Ceph rpm upgrades
● daemon restarts

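As a sketch of what those bulk operations can look like via MCollective's generic rpc application; the hostgroup fact used in the filter is an assumption about our Puppet setup, and the batch size is arbitrary:

    # restart the ceph daemons on the OSD hostgroup, 10 servers at a time
    mco rpc service restart service=ceph --with-fact hostgroup=ceph/osd --batch 10
    # upgrade the ceph RPM across the same hostgroup
    mco rpc package update package=ceph --with-fact hostgroup=ceph/osd
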
DSS Our puppet-ceph changes

● Yum repository support
● Don’t export the admin key
  ○ our puppet env is shared across CERN
  ○ (we get the key via k5 auth’d scp instead)
● New options:
  ○ osd default pool size, mon osd down out interval, osd crush location
● RADOS GW support (RHEL only)
  ○ https to be completed
● /dev/disk/by-path OSDs (see the example below)
  ○ better handling of disk replacements
● Unmanaged osd service
  ○ manual control of the daemon
● Other OSD fixes: delay mkfs, don’t mount the disks, …

Needs some cleanup before pushing back to eNovance

https://github.com/cernceph/puppet-ceph/

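Why by-path device names help: the OSD keeps pointing at the physical slot rather than at a kernel-assigned /dev/sdX name, so a swapped disk comes back under the same identifier. The path below is purely illustrative:

    # the same controller/slot always maps to the same by-path symlink,
    # even if the replacement disk enumerates as a different /dev/sdX
    $ ls -l /dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:3:0
    lrwxrwxrwx 1 root root 9 Oct  9  2013 /dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:3:0 -> ../../sdd
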
DSS Puppet-ceph TODO/Wish-list

We have some further puppet work in mind:
● Add arbitrary ceph.conf options
● Move the OSD journal to a separate partition
● SSD OSD journals
● Use the udev triggers for OSD creation

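For the journal items above, the end state in ceph.conf would be the standard "osd journal" option pointing at a partition or SSD device instead of a file on the data disk; the section name and device path here are only illustrative:

    [osd.12]
    osd journal = /dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:0:0-part2
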
DSS Ceph Configuration

11 data pools with 3 replicas each
● mostly test pools for a few different use-cases
● 1-4k pgs per pool; 19584 pgs total

Room/Rack in ceph.conf:
  osd crush location = room=0513-R-0050 rack=RJ35

Rack-wise replication:
  rule data {
      ruleset 0
      type replicated
      min_size 1
      max_size 10
      step take 0513-R-0050
      step chooseleaf firstn 0 type rack
      step emit
  }

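A pool gets attached to a rule like this with the ordinary CLI; the pool name and pg count below are placeholders, not one of the 11 pools mentioned above:

    ceph osd pool create mytestpool 4096          # create the pool with 4096 pgs
    ceph osd pool set mytestpool size 3           # 3 replicas
    ceph osd pool set mytestpool crush_ruleset 0  # use the rack-wise "data" rule (ruleset 0)
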
DSS Ceph Configuration

11 data pools with 3 replicas each
● mostly test pools for a few different use-cases
● 1-4k pgs per pool; 19584 pgs total

Room/Rack in ceph.conf:
  osd crush location = room=0513-R-0050 rack=RJ35

Rack-wise replication:
  rule data {
      ruleset 0
      type replicated
      min_size 1
      max_size 10
      step take 0513-R-0050
      step chooseleaf firstn 0 type rack
      step emit
  }

The resulting CRUSH tree (ceph osd tree excerpt):
  -1    2883    root default
  -2    2883        room 0513-R-0050
  -3    262.1           rack RJ35
  -15   65.52               host p05151113471870
  -16   65.52               host p05151113489275
  -17   65.52               host p05151113479552
  -18   65.52               host p05151113498803
  -4    262.1           rack RJ37
  -23   65.52               host p05151113507373
  -24   65.52               host p05151113508409
  -25   65.52               host p05151113521447
  -26   65.52               host p05151113525886
  ...

DSS Service Monitoring

A few monitoring helper scripts: https://github.com/cernceph/ceph-scripts

ceph-health-cron:
● report on the ceph health hourly

cephinfo:
● python API to the ceph JSON dumps

cern-sls:
● example usage of cephinfo.py
● compute and publish ceph availability and statistics

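The "ceph JSON dumps" wrapped by cephinfo are just the structured output of the standard CLI, along these lines (whether every subcommand accepted --format=json in the exact Ceph release of the day is an assumption on our part):

    ceph health                                         # one-line summary, as used by ceph-health-cron
    ceph osd dump --format=json | python -m json.tool   # cluster/OSD state as JSON
    ceph pg dump --format=json > /tmp/pg_dump.json      # per-pg statistics as JSON
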
DSS Initial Benchmarks

basic rados bench - saturate the network
  [root@p05151113471870 ~]# rados bench 30 -p test write -t 100
  Total writes made:      7596
  Write size:             4194304
  Bandwidth (MB/sec):     997.560
  Average Latency:        0.395118
  [root@p05151113471870 ~]# rados bench 30 -p test seq -t 100
  Total reads made:       7312
  Read size:              4194304
  Bandwidth (MB/sec):     962.649
  Average Latency:        0.411129

all-to-all rados bench

120M file test
  Wrote 120 million tiny files into RADOS to measure scalability by that dimension. No problems observed. Then we added one OSD server, and the rebalance took ages (~24hrs), which is probably to be expected.

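A sketch of how the all-to-all rados bench above can be driven from one head node; the client host names, pool and thread count are placeholders:

    # start overlapping write benchmarks from many clients at once
    for h in client01 client02 client03 client04; do
        ssh "$h" "rados bench 60 -p test write -t 32" &
    done
    wait   # then collect the per-client bandwidth reports
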
DSS Our Users

A few early adopters are helping us evaluate Ceph:
● OpenStack: usage for Glance images and Cinder volumes
● AFS/NFS: backend RBD storage for these commonly used fs’s
● CASTOR: high performance buffer of objects to be written to tape
● DPM: backend RBD storage for this high-energy-physics fs
● OwnCloud: S3 or CephFS backend for desktop synchronisation
● Zenodo: backend storage for data and publications sharing service

DSS OpenStack / Ceph Testing

We are still validating the OpenStack / Ceph integration
● Being a RedHat shop, we require the version of qemu-kvm patched by Inktank to support RBD
● Our workloads benefit from striping:
  ○ Gary McGilvary developed and pushed some patches to allow configurable striping via the OpenStack UI
● Our Grizzly cluster is using RBD (see the config sketch below)
  ○ Small problem related to ulimit, see the coming slide...
● For Cinder usage we are currently blocked:
  ○ Deployed Grizzly with cells to divide our large facilities
  ○ Grizzly cells don’t support Cinder
  ○ Belmiro Moreira backported the Havana code for Cinder/Cells; currently under test

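For reference, a minimal sketch of the Grizzly-era RBD wiring in the OpenStack config files; the pool names, client user and secret UUID are placeholders rather than our production values:

    # /etc/glance/glance-api.conf
    default_store = rbd
    rbd_store_pool = images
    rbd_store_user = glance

    # /etc/cinder/cinder.conf (once Cinder is unblocked in our cells setup)
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    rbd_pool = volumes
    rbd_user = cinder
    rbd_secret_uuid = 00000000-0000-0000-0000-000000000000
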
DSS Ceph as a Tape Buffer?

CASTOR holds much of our physics data
● 90PB total, 75PB on tape

Tapes write at 250MB/s; without striping, CASTOR disk servers cannot supply data at that rate.

Idea: put a Ceph buffer between the disk servers and the tape drives.

But… single threaded read performance:
  [root@p05151113471870 ~]# rados bench 10 -p test seq -t 1
  Total reads made:       612
  Read size:              4194304
  Bandwidth (MB/sec):     244.118
  Average Latency:        0.0163772

DSS Ceph as a Tape Buffer?

So our colleague Andreas Peters prototyped a striping RADOS object client: cephcp

  cephcp [--verbose] [-p|--pool <pool>] [-i|--id <id>] [-C|--config <config>]
         [-n|--stripes <n>] [-b|--blocksize <bytes>] <source-path> <target-path>
  <source> is file:<localpath>|- or ceph:<objectname>
  <target> is ceph:<objectname> or file:<localpath>|-

Upload:
  [root@p05151113471870 ~]# ./cephcp -p test -i admin -n 64 file:/root/1G.dat ceph:/root/1G.dat
  [cephcp] 1073741824 bytes copied in 1137.89 ms [ 943.63 MB/s ]

Download:
  [root@p05151113471870 ~]# ./cephcp -p test -i admin -n 64 ceph:/root/1G.dat file:/dev/null
  [cephcp] 1073741824 bytes copied in 1022.40 ms [ 1050.22 MB/s ]

Wiebalck / van der Ster -- Building an organic block storage service at CERN with Ceph
DSS Current Issues
Latency:
● Our best case write latency is presently 50ms
○

●

We tested an in-memory OSD and saw ~1ms latency
○

●

1 replica, journal as a file on the OSD
So our high latency comes from our journal

We need to put our journals on the blockdev directly (should
get ~12ms writes) or use SSDs (but we’re worried they’ll wear
out)

ulimits:
● With more than >1024 OSDs, we’re getting various errors
where clients cannot create enough processes to connect to
the OSDs
○

Internet
Services
CERN IT Department

●

failed ceph tell, failed glance image uploads

Our clients have been informed to increase ulimit -u to 4096,
but it would useful if ceph was somehow less process greedy.

CH-1211 Geneva 23
Switzerland

www.cern.ch/it

Wiebalck / van der Ster -- Building an organic block storage service at CERN with Ceph
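A sketch of the client-side ulimit workaround on an EL6-style system; the drop-in file name is our own choice and the 4096 value is simply the recommendation above:

    # EL6 ships /etc/security/limits.d/90-nproc.conf with a 1024 soft nproc limit;
    # override it with a later drop-in (file name illustrative)
    cat > /etc/security/limits.d/91-ceph-nproc.conf <<'EOF'
    *    soft    nproc    4096
    *    hard    nproc    4096
    EOF
    # or, for the current shell only:
    ulimit -u 4096
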
DSS Looking forward...

The killer app for Ceph at CERN would be to build upon it a general purpose network file system
● Would help us get rid of NetApp boxes
● Dare we dream that it may one day replace AFS?!

CephFS is advertised as not yet production quality, so we don’t advertise it to our users
● How far off is it?

To be generally usable we’d need:
● HA and load balancing (our AFS service gets accessed at 75kHz)
● All the goodies we get from AFS: quotas, ACLs, krb5, ...

DSS Conclusions

We are attracting various use-cases
● OpenStack images and volumes
● RBD backends for other storage services (AFS/NFS/DPM)
● Object storage for novel applications (tape buffer, Zenodo, OwnCloud)

We have very high hopes for Ceph at CERN!
● the design is correct
● the performance so far is adequate
● operationally it is very attractive

With CephFS or similar coming, the future of storage at CERN is starting to look rather ...

