Ontario Institute
for Cancer Research
Migrating 8.3PiB of
Ceph from Filestore
to Bluestore
October 23rd 2018
Why move to Bluestore?
● Supportability
● Lower latency
● Higher throughput
ONTARIO INSTITUTE FOR CANCER RESEARCH
Read more @ https://ceph.com/community/new-luminous-bluestore/
How?
1. Add Luminous repository
2. apt-get install ceph
Done!
Migration process for each storage node
Drain
● Drain data from all OSDs on the desired storage node
● Find the numerical range of OSDs (for example 684 to 719) and change the osd crush weight to 0
Convert
● Convert the OSDs on the desired storage node from Filestore to Bluestore
● *More detail in next few slides
Fill
● Refill the OSDs on the desired storage node
● Using the same range of OSDs from the Drain step, change the osd crush weight to the appropriate disk size
Draining
for i in $(seq 648 683); do ceph osd crush reweight osd.$i 0; done
● for loop to drain a server's worth of OSDs
● ~24 hours per server
● 1-2 servers draining at a time
● Multi-rack draining
● Wait for ‘ceph health’ to report HEALTH_OK
● Tunables
○ osd recovery max active 3 -> 4
○ osd max backfills 1 -> 16
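The drain step above can be sketched as a few shell helpers. This is a minimal sketch, not the script used in the talk: run_cmd, set_backfill_tunables, drain_range, wait_for_health_ok, DRY_RUN and POLL_SECONDS are hypothetical names introduced here, and the tunable values are simply the ones listed on the slide. It defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helpers for the drain step; names are illustrative only.

DRY_RUN=${DRY_RUN:-1}   # 1 = print commands, anything else = run them

run_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Raise backfill/recovery limits at runtime on all OSDs.
# $1 = osd max backfills, $2 = osd recovery max active
set_backfill_tunables() {
    run_cmd ceph tell 'osd.*' injectargs \
        "--osd-max-backfills $1 --osd-recovery-max-active $2"
}

# Set the CRUSH weight of osd.$1 .. osd.$2 to 0 so data moves off them.
drain_range() {
    for i in $(seq "$1" "$2"); do
        run_cmd ceph osd crush reweight "osd.$i" 0
    done
}

# Poll 'ceph health' until it reports HEALTH_OK.
wait_for_health_ok() {
    until ceph health | grep -q '^HEALTH_OK'; do
        sleep "${POLL_SECONDS:-60}"
    done
}
```

A possible invocation for one server, once the printed commands look right: DRY_RUN=0, then set_backfill_tunables 16 4, drain_range 648 683, wait_for_health_ok.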
Draining
144TB server case study
Majority drained in 3 hours
Long tail of 28 hours to complete
Draining
360TB server case study
Steady drain for 13 hours
Converting to Bluestore
Migrate bluestore script @ https://github.com/CancerCollaboratory/infrastructure
1. Stop the OSD process (systemctl stop ceph-osd@501.service)
2. Unmount the OSD (umount /dev/sdr1)
3. Zap the disk (ceph-disk zap /dev/sdr)
4. Mark the OSD as destroyed (ceph osd destroy 501 --yes-i-really-mean-it)
5. Prepare the disk as Bluestore (ceph-disk prepare --bluestore /dev/sdr --osd-id 501)
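The five steps above could be wrapped in one function per OSD. This is a sketch under stated assumptions, not the migrate script from the linked repository: convert_osd, run_cmd and DRY_RUN are hypothetical names, the data partition is assumed to be the first partition of the disk (as in the /dev/sdr1 example), and ceph-disk zap is given the device path because that command operates on a device rather than an OSD id. It prints the commands by default and stops at the first step that fails.

```shell
#!/bin/sh
# Hypothetical wrapper around the five conversion steps; illustrative only.

DRY_RUN=${DRY_RUN:-1}   # 1 = print commands, anything else = run them

run_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# convert_osd OSD_ID DEVICE
#   OSD_ID: numeric OSD id, e.g. 501
#   DEVICE: whole disk, e.g. /dev/sdr (data partition assumed to be ${DEVICE}1)
convert_osd() {
    osd_id=$1
    dev=$2
    run_cmd systemctl stop "ceph-osd@${osd_id}.service" &&
    run_cmd umount "${dev}1" &&
    run_cmd ceph-disk zap "$dev" &&
    run_cmd ceph osd destroy "$osd_id" --yes-i-really-mean-it &&
    run_cmd ceph-disk prepare --bluestore "$dev" --osd-id "$osd_id"
}
```

Calling convert_osd 501 /dev/sdr with DRY_RUN left at 1 prints the same five commands as the slide, which makes it easy to review the OSD-to-device mapping before anything destructive runs.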
Filling
for i in $(seq 648 683); do ceph osd crush reweight osd.$i 3.640; done
● for loop to fill a server's worth of OSDs
● ~24 hours per server
● 1-2 servers filling at a time
● Multi-rack filling
● Wait for ‘ceph health’ to report HEALTH_OK
● Monitoring caveat
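The weight of 3.640 in the fill loop is close to what a nominal 4 TB drive works out to in TiB (about 3.638), and 36 OSDs (648 to 683) at 4 TB each matches the 144TB server. Reading the weight as "disk size in TiB" is an assumption here, since the slides only say "appropriate disk size". The sketch below does that arithmetic and prints the fill commands for review; crush_weight_tib and fill_commands are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers for the fill step; illustrative only.

# crush_weight_tib BYTES: print a disk size in TiB with three decimals.
# 1 TiB = 1099511627776 bytes.
crush_weight_tib() {
    LC_ALL=C awk -v bytes="$1" 'BEGIN { printf "%.3f\n", bytes / 1099511627776 }'
}

# fill_commands FIRST LAST WEIGHT: print (not run) one reweight per OSD.
fill_commands() {
    for i in $(seq "$1" "$2"); do
        echo "ceph osd crush reweight osd.$i $3"
    done
}
```

For example, crush_weight_tib 4000000000000 prints 3.638, and fill_commands 648 683 3.640 prints the 36 commands the one-liner above would run; the output can be inspected and then piped to sh.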
Filling
144TB server case study
Filling
360TB server case study
Filling
Monitoring caveat
● Zabbix graphs built from zabbix-agent XFS disk usage
● Grafana w/ Graphite and ceph-mgr
Tracking & Monitoring of progress
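Progress can also be read from the cluster itself rather than from filesystem graphs. The sketch below assumes that ceph osd count-metadata osd_objectstore returns a small JSON object with "filestore" and "bluestore" counts; bluestore_percent and count_for are hypothetical helper names, and this is not necessarily how the dashboards in the talk were built.

```shell
#!/bin/sh
# Hypothetical progress helper; illustrative only.

# count_for NAME: read JSON on stdin and print the integer stored under "NAME".
count_for() {
    sed -n "s/.*\"$1\": *\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Print the share of OSDs that report bluestore as their objectstore.
bluestore_percent() {
    json=$(ceph osd count-metadata osd_objectstore) || return 1
    blue=$(printf '%s\n' "$json" | count_for bluestore)
    file=$(printf '%s\n' "$json" | count_for filestore)
    blue=${blue:-0}   # a missing key means no OSDs of that type
    file=${file:-0}
    total=$((blue + file))
    [ "$total" -gt 0 ] || return 1
    echo "$((blue * 100 / total))% of $total OSDs are Bluestore"
}
```

Run periodically (for example from cron), bluestore_percent gives a single number that does not depend on how each OSD's data device is mounted.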
How long did it take?
Start: End of July
Finish: Early September
+480TB of data uploaded during this time by researchers
+1PB of capacity added during migration (new nodes)
188TB of data served from the object store
Performance impact during migration
Issues
● Increased number of drive failures
○ 4 failures within a week at the end of the migration
● Ceph monmap growing to ~15GB
Funding for the Ontario Institute for Cancer Research
is provided by the Government of Ontario
