Riak is...
Riak Data Model
Basic Operations
Extras
When things go wrong
Situation
Solution
Hostnames: Aston, IPA, Highball, Gin, Framboise, ESB
riak-admin join
This can’t be good...
Quick, what do you do?
Control the situation
Monitor
...for signs of...
Stabilization
Now what?
But first
So what happened?!
Member Status
First let’s peek under the hood.

    $ riak-admin member_status
    ================================= Membership ==================================
    Status     Ring    Pending    Node
    -------------------------------------------------------------------------------
    valid       4.3%     16.8%    riak@aston
    valid      18.8%     16.8%    riak@esb
    valid      19.1%     16.8%    riak@framboise
    valid      19.5%     16.8%    riak@gin
    valid      19.1%     16.4%    riak@highball
    valid      19.1%     16.4%    riak@ipa
    -------------------------------------------------------------------------------
    Valid:6 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
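The imbalance in this listing (Aston holds 4.3% of the ring while 16.8% is pending) can be picked out mechanically. The following is a minimal sketch and not part of the original talk: `flag_imbalanced` is a hypothetical helper name, the 5-point threshold is an arbitrary assumption, and the script relies only on the four-column row layout shown on the slide.

```shell
#!/bin/sh
# flag_imbalanced [THRESHOLD]: read member_status-style rows on stdin and
# print "node ring pending" for every valid node whose pending share exceeds
# its current ring share by more than THRESHOLD percentage points.
# Hypothetical helper; THRESHOLD defaults to 5 (an assumed value).
flag_imbalanced() {
    threshold="${1:-5}"
    awk -v t="$threshold" '
        $1 == "valid" {
            ring = $2; pending = $3
            sub(/%/, "", ring)       # strip the percent signs
            sub(/%/, "", pending)
            if (pending - ring > t) print $4, ring, pending
        }'
}

# Rows copied from the slide; on a live node the input would instead come
# from `riak-admin member_status`.
sample='valid  4.3%  16.8%  riak@aston
valid  18.8%  16.8%  riak@esb
valid  19.1%  16.8%  riak@framboise
valid  19.5%  16.8%  riak@gin
valid  19.1%  16.4%  riak@highball
valid  19.1%  16.4%  riak@ipa'

printf '%s\n' "$sample" | flag_imbalanced 5
```

With the slide's numbers only Aston is reported, since it is the one node still waiting to receive most of its share.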
Relief
Relief
Relief
Eureka!
What’s the solution?
Hot Patch
We patched their live, production system while still under load.

(on all nodes)

    riak attach
    l(riak_kv_bitcask_backend).
    m(riak_kv_bitcask_backend).
    Module riak_kv_bitcask_backend compiled: Date: November 12 2011, Time: 04.18
    Compiler options:  [{outdir,"ebin"},
                        debug_info,warnings_as_errors,
                        {parse_transform,lager_transform},
                        {i,"include"}]
    Object file: /usr/lib/riak/lib/riak_kv-1.0.1/ebin/riak_kv_bitcask_backend.beam
    Exports:
    api_version/0     is_empty/1
    callback/3        key_counts/0
    delete/4          key_counts/1
    drop/1            module_info/0
    fold_buckets/4    module_info/1
    fold_keys/4       put/5
    fold_objects/4    start/2
    get/3             status/1
    ...
Bingo!
And the new code did what we expected.

    {ok, R} = riak_core_ring_manager:get_my_ring().
    [riak_core_vnode_master:get_vnode_pid(Partition, riak_kv_vnode) || {Partition,_} <- riak_core_ring:all_owners(R)].

    (riak@gin)19> [riak_core_vnode_master:get_vnode_pid(Partition, riak_kv_vnode) || {Partition,_} <- riak_core_ring:all_owners(R)].
    22:48:07.423 [notice] Unused data directories exist for partition "11417981541647679048466287755595961091061972992": "/data/riak/bitcask/11417981541647679048466287755595961091061972992"
    22:48:07.785 [notice] Unused data directories exist for partition "582317058624031631471780675535394015644160622592": "/data/riak/bitcask/582317058624031631471780675535394015644160622592"
    22:48:07.829 [notice] Unused data directories exist for partition "782131735602866014819940711258323334737745149952": "/data/riak/bitcask/782131735602866014819940711258323334737745149952"
    [{ok,<0.30093.11>},...
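The patched backend reports data directories that no longer correspond to a partition in use on the node. As a rough illustration only (this is not how riak_kv implements the check), the same comparison can be sketched in shell. `list_unused_dirs` and the file of owned partition ids are hypothetical, and the short ids in the demo are made up for readability.

```shell
#!/bin/sh
# list_unused_dirs DATA_DIR OWNED_FILE: print the name of every directory
# directly under DATA_DIR that is not listed in OWNED_FILE, which holds one
# owned partition id per line. Hypothetical helper for illustration.
list_unused_dirs() {
    data_dir="$1"
    owned_file="$2"
    for path in "$data_dir"/*/; do
        [ -d "$path" ] || continue          # glob matched nothing
        name=$(basename "$path")
        # -x: match the whole line, -F: treat the id as a fixed string
        grep -qxF "$name" "$owned_file" || printf '%s\n' "$name"
    done
}

# Demo in a scratch directory with made-up partition ids.
demo=$(mktemp -d)
mkdir "$demo/data" "$demo/data/111" "$demo/data/222" "$demo/data/333"
printf '%s\n' 222 > "$demo/owned.txt"
list_unused_dirs "$demo/data" "$demo/owned.txt"
rm -rf "$demo"
```

The output is only a list of candidates. As the talk goes on to describe, the directories were backed up before anything was deleted.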
Manual Cleanup
So we backed up those vnodes with unused data on Gin to another system and manually removed them.

    gin:/data/riak/bitcask$ ls manual_cleanup/
    11417981541647679048466287755595961091061972992
    782131735602866014819940711258323334737745149952
    582317058624031631471780675535394015644160622592
    gin:/data/riak/bitcask$ rm -rf manual_cleanup
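The slide shows the orphaned directories already gathered in `manual_cleanup/` before the final `rm -rf`. One way to get to that state is to move each directory aside rather than deleting it in place. This is a sketch under assumptions: `quarantine_partitions` is a hypothetical helper, and the talk does not say how the directories were actually moved or backed up.

```shell
#!/bin/sh
# quarantine_partitions BASE ID...: move each named partition directory under
# BASE into BASE/manual_cleanup so it can be backed up and inspected before
# it is removed. Missing directories are skipped with a warning.
quarantine_partitions() {
    base="$1"
    shift
    mkdir -p "$base/manual_cleanup" || return 1
    for id in "$@"; do
        if [ -d "$base/$id" ]; then
            mv "$base/$id" "$base/manual_cleanup/" || return 1
        else
            echo "skipping $id: no such directory" >&2
        fi
    done
}

# Demo in a scratch directory with made-up ids; 999 does not exist.
demo=$(mktemp -d)
mkdir "$demo/101" "$demo/202"
quarantine_partitions "$demo" 101 999
ls "$demo/manual_cleanup"
rm -rf "$demo"
```

Moving within the same filesystem is a rename, so it is quick and easy to undo, and the eventual `rm -rf manual_cleanup` only ever touches directories that were deliberately set aside.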
Gin’s Status Improves
Bedtime
Next Day’s Plan
Let’s Get Started
Gin Moves Data to IPA
Highball’s Turn
Highball was next lowest now that Gin was handing data off, time to restart it too.

on highball

    application:unset_env(riak_core, forced_ownership_handoff).
    application:set_env(riak_core, vnode_inactivity_timeout, 60000).
    application:set_env(riak_core, handoff_concurrency, 1).

on gin

    application:set_env(riak_core, handoff_concurrency, 4). % the default setting
    riak_core_vnode_manager:force_handoffs().
Rebalance Starts
and keeps going...
and going...
and going...
Rebalanced
Minimal Impact
Moral of the Story
Things break, Riak bends...
Thank You

Riak a successful failure


Editor's Notes

  1. Thank you to Greg Burd for most of these slides. He was going to give the presentation, but did not feel well enough to be here tonight.
  2. Gin had not removed that vnode’s data directory after sending it to Aston. We had confirmation: data was not being removed after transfers finished. This would eventually have eaten all the space on all nodes and halted the cluster.
  3. We already had a fix ready for 1.0.2 that would properly identify any orphaned vnodes, so why not simply use that? We tested it on our laptops, creating a close approximation of the customer’s environment.
  4. At this point it was late at night, the cluster was servicing requests as always, and customers had no idea anything was wrong. We all went to bed and didn’t reconvene for 12 hours.
  5. On Gin only, we reset the things we’d changed to their default values and then re-enabled handoffs.