ABOUT NETFLIX
NETFLIX
ACTIVE - ACTIVE
WHAT IS ACTIVE-ACTIVE
Also called dual active, this phrase describes a network of
independent processing nodes in which each node has access to a
replicated database, giving each node access to, and use of, a
single application. In an active-active system, all requests are
load balanced across all available processing capacity. When a
node fails, another node in the network takes its place.
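The definition above can be illustrated with a minimal sketch. This is not Netflix's routing layer; the class name `ActiveActivePool` and its methods are hypothetical, and the two region names simply stand in for "nodes". It shows the two properties named in the definition: every node takes traffic, and the survivors absorb the load when one node fails.

```python
from itertools import cycle


class ActiveActivePool:
    """Hypothetical round-robin balancer that skips nodes marked as failed."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_failed(self, node):
        self.healthy.discard(node)

    def mark_recovered(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def route(self):
        """Return the next healthy node, or raise if none is left."""
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


pool = ActiveActivePool(["us-east-1", "us-west-2"])
first_two = [pool.route(), pool.route()]      # both nodes serve traffic
pool.mark_failed("us-east-1")
after_failure = [pool.route(), pool.route()]  # the survivor takes every request
print(first_two, after_failure)
```

Round-robin is only one possible balancing policy; it is used here because it makes the failover behavior easy to see.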
DOES AN INSTANCE FAIL?
• It can, plan for it
• Bad code / configuration pushes
• Latent issues
• Hardware failure
• Test with Chaos Monkey
DOES A ZONE FAIL?
• Rarely, but it has happened before
• Routing issues
• DC-specific issues
• App-specific issues within a zone
• Test with Chaos Gorilla
DOES A REGION FAIL?
• Full region – unlikely, very rare
• Individual Services can fail region-wide
• Most likely cause: a region-wide configuration issue
• Test with Chaos Kong
EVERYTHING FAILS… EVENTUALLY
• Keep your services running by embracing isolation and
redundancy
• Construct a highly agile and highly available service
from ephemeral and assumed broken components
ISOLATION
• Changes in one region should not affect others
• Regional outage should not affect others
• Network partitioning between regions should not affect
functionality / operations
REDUNDANCY
• Make more than one (of pretty much everything)
• Specifically, distribute services across Availability
Zones and regions
HISTORY: X-MAS EVE 2012
• Netflix multi-hour outage
• US-East-1 regional Elastic Load Balancing issue
• “...data was deleted by a maintenance process
that was inadvertently run against the
production ELB state data”
ACTIVE-ACTIVE ARCHITECTURE
THE PROCESS
IDENTIFYING CLUSTERS FOR AA
SNITCH CHANGES
EC2Snitch: uses private IPs
EC2MultiRegionSnitch: uses public IPs
PRIAM.MULTIREGION.ENABLE = TRUE
tcp 7101-7101 [ ] [10.190.21.36/32, 10.232.200.17/32, 10.33.573.26/32,
10.20.151.165/32, 10.226.99.46/32, 10.244.143.193/32]
tcp 7103-7103 [ ] [54.196.221.136/32, 54.202.200.217/32, 54.203.57.226/32,
54.205.151.165/32, 54.226.99.46/32, 54.244.143.193/32]
SPIN UP NODES IN NEW REGION
us-east-1 us-west-2
APP
UPDATE KEYSPACE
Update keyspace <keyspace> with placement_strategy =
'NetworkTopologyStrategy'
and strategy_options = {us-east : 3, us-west-2 : 3};
(us-east : 3 = existing region and replication factor; us-west-2 : 3 = new region and replication factor)
REBUILD NEW REGION
Run – nodetool rebuild us-east-1 on all us-west-2 nodes
RUN NODETOOL REPAIR
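The extension steps above (update keyspace, rebuild the new region, repair) can be written down as an ordered plan. The sketch below only builds the command strings and runs nothing. The function name `extension_plan` and the host names are hypothetical, and running repair on the new-region nodes is an assumption, since the slide does not say which nodes to repair.

```python
def extension_plan(keyspace, source_dc, new_nodes, replication):
    """Return the ordered extension steps as printable strings (nothing is executed)."""
    options = ", ".join(f"{dc} : {rf}" for dc, rf in replication.items())
    steps = [
        f"Update keyspace {keyspace} with placement_strategy = "
        f"'NetworkTopologyStrategy' and strategy_options = {{{options}}};"
    ]
    # Rebuild streams existing data from the source region to each new node.
    steps += [f"nodetool -h {host} rebuild {source_dc}" for host in new_nodes]
    # Assumption: repair is then run on each node of the new region.
    steps += [f"nodetool -h {host} repair" for host in new_nodes]
    return steps


plan = extension_plan(
    keyspace="my_keyspace",                      # hypothetical keyspace name
    source_dc="us-east-1",
    new_nodes=["west-node-1", "west-node-2"],    # hypothetical host names
    replication={"us-east": 3, "us-west-2": 3},
)
for step in plan:
    print(step)
```

Keeping the plan as data makes it easy to review the order of operations before anything touches a production cluster.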
VALIDATION
BENCHMARKING GLOBAL CASSANDRA
WRITE INTENSIVE TEST OF CROSS-REGION REPLICATION CAPACITY
• 16 x hi1.4xlarge SSD nodes per zone = 96 total
• 192 TB of SSD in six locations, up and running Cassandra in 20 minutes
[Diagram: Cassandra replicas in Zones A, B, and C of US-West-2 (Oregon) and of US-East-1 (Virginia), with test load and validation load applied, interzone traffic within each region, and interregional traffic between the two regions]
• 1 million writes at CL.ONE (wait for one replica to ack)
• 1 million reads after 500 ms at CL.ONE with no data loss
• Interregional traffic: up to 9 Gbit/s, 83 ms
• 18 TB backups from S3
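The capacity figures for this benchmark can be cross-checked with a few lines of arithmetic. The constants come from the slide (16 nodes per zone, three zones in each of two regions, 192 TB of SSD); the variable names are illustrative only.

```python
NODES_PER_ZONE = 16
ZONES_PER_REGION = 3   # Zones A, B, and C
REGIONS = 2            # US-West-2 and US-East-1
TOTAL_SSD_TB = 192

locations = ZONES_PER_REGION * REGIONS         # the "six locations"
total_nodes = NODES_PER_ZONE * locations       # nodes across both regions
ssd_tb_per_node = TOTAL_SSD_TB / total_nodes   # SSD capacity per node

print(locations, total_nodes, ssd_tb_per_node)  # prints: 6 96 2.0
```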
TEST FOR THUNDERING HERD
TEST FOR RETRIES
FAILURE
RETRY
KEY METRICS USED
• 99th / 95th percentile read latency (Client & C*)
• Dropped Metrics on C*
• Exceptions on C*
• Heap Usage on C*
• CPU Usage (Client & C*)
• Threads Pending on C*
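For readers unfamiliar with the 95th and 99th percentile latency metrics listed above, here is a minimal sketch of how such a figure can be derived from raw samples. It uses the nearest-rank definition, which is one common choice; the deck does not say which method the monitoring tools used, and the sample data is made up.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


latencies_ms = list(range(1, 101))   # made-up samples: 1 ms through 100 ms
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(p95, p99)  # prints: 95 99
```

Tail percentiles are tracked instead of the mean because a small share of slow reads can go unnoticed in an average.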
CONFIGURATION FOR TEST
• 24-node C* cluster on SSDs
• 220 Client instances
• 70+ JMeter instances
C* IOPS
TOTAL READ IOPS
TOTAL WRITE IOPS
95th LATENCY
99th LATENCY
CHECK FOR CEILING
NETWORK PARTITION
us-east-1 us-west-2
TAKEAWAYS
REPAIRS AFTER EXTENSION ARE PAINFUL!!
TIME TO REPAIR DEPENDS ON
• Number of regions
• Number of replicas
• Data size
• Amount of entropy
ADJUST GC_GRACE AFTER EXTENSION
• Column Family Setting
• Defined in seconds
• Default: 10 days (864,000 seconds)
• Tweak gc_grace settings to
accommodate time taken to repair
• BEWARE of deleted columns
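Because gc_grace is defined in seconds and has to cover the time a repair takes, a small helper can make the conversion explicit. This is a sketch under stated assumptions: the function name `gc_grace_for_repair`, the 2x safety factor, and the choice never to go below the default are illustrative and not taken from the deck.

```python
DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 60 * 60   # 10 days = 864000 seconds


def gc_grace_for_repair(repair_hours, safety_factor=2.0):
    """Suggest a gc_grace value in seconds that covers the measured repair time.

    Assumptions: keep at least the default, and leave safety_factor times
    the repair duration as headroom.
    """
    needed = int(repair_hours * 3600 * safety_factor)
    return max(DEFAULT_GC_GRACE_SECONDS, needed)


print(gc_grace_for_repair(48))    # 2-day repair fits the default: 864000
print(gc_grace_for_repair(168))   # 7-day repair needs more: 1209600
```

The concern behind the "BEWARE of deleted columns" bullet is that a tombstone removed before repair has reached every replica can let deleted data reappear, so the window should outlast the repair.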
RUNBOOK
PLAN FOR CAPACITY
CONSISTENCY LEVEL
• Check the client for consistency level setting
• In a multi-region cluster, QUORUM is not the same as
LOCAL_QUORUM
• Recommended consistency levels: LOCAL_ONE
(CASSANDRA-6202) for reads and LOCAL_QUORUM for writes
• For region resiliency, avoid ALL or QUORUM calls
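The difference between QUORUM and LOCAL_QUORUM follows from simple arithmetic. The sketch below uses the replication factors from the keyspace update earlier in the deck (3 per region); a quorum is a majority of the replicas being counted, and the variable names are illustrative.

```python
def quorum(replicas):
    """A quorum is a majority of the replicas counted: floor(n / 2) + 1."""
    return replicas // 2 + 1


replication = {"us-east": 3, "us-west-2": 3}

# QUORUM counts replicas in every region; LOCAL_QUORUM counts only the local region.
global_quorum = quorum(sum(replication.values()))   # 4 of 6 replicas
local_quorum = quorum(replication["us-east"])       # 2 of 3 local replicas

# Suppose us-west-2 is unreachable from us-east.
reachable = replication["us-east"]
quorum_ok = reachable >= global_quorum              # False: 3 < 4
local_quorum_ok = reachable >= local_quorum         # True: 3 >= 2

print(global_quorum, local_quorum, quorum_ok, local_quorum_ok)
```

With three replicas in each of two regions, QUORUM always needs at least one replica in the other region, which is why those calls fail during a regional outage or network partition while LOCAL_QUORUM calls keep working.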
HOW DO WE KNOW IT WORKS?
CREATE CHAOS!!
Benchmark …
Time Consuming
But worth it!
Active Active - C* Behind the Scenes at Netflix
