Gimme More!
Supporting User Growth in a
Performant and Efficient Fashion
Arun Kejariwal (@arun_kejariwal), Winston Lee (@winstl)

Capacity Engineering @ Twitter

November 2013


@Twitter 1
User Experience
•  Anytime, Anywhere, Any device
q  5.2 billion mobile users by 2017 [1]
q  More than 10 billion mobile devices/connections by 2017 [1]
q  Worldwide mobile data traffic will reach 11.2 exabytes/month by 2017 (13x increase) [1]

•  Real-time performance








[1] http://newsroom.cisco.com/release/1135354 (Feb. 5, 2013)

@Twitter 2
Capacity Planning: Why bother?
•  Organic growth
q  Over 230M monthly active users [1]

•  User engagement
•  Evolving product landscape
q  Cards, Photos, Vines
§  Mobile video will increase 16-fold between 2012 and 2017 [2]

•  Events, planned or unplanned





[1] http://www.sec.gov/Archives/edgar/data/1418091/000119312513400028/d564001ds1a.htm
[2] http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-520862.html

@Twitter 3
Approaches to Capacity Planning
•  Throw hardware at the problem
o  How much?
o  What kind? (Inventory management etc.)
o  Operationally inefficient!

•  Reactive approach


Bottomline

Poor UX



@Twitter 4
Systematic Capacity Planning
•  Objectives
q  Check under-allocation
§  Performance
§  Availability
o  Adversely impacts user experience

q  Check over-allocation
§  Operational efficiency
o  Adversely impacts the bottom line

•  Determine capacity needed proactively via forecasting
q  Business metrics
q  System resource usage



@Twitter 5
Systematic Capacity Planning: Forecasting
•  Key questions
q  Which data?
§  Raw
§  Periodic Max
§  Moving average

q  Data granularity
§  Minutely
§  Daily
o  Depends

q  Which model?
§  Linear
§  Spline
§  Holt-Winters
§  ARIMA

Non-Trivial!
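The three candidate inputs listed under "Which data?" can all be derived from the same raw series. The following is a minimal sketch, assuming samples arrive as (timestamp, value) pairs with integer timestamps. The function names and the synthetic data are illustrative and do not come from the talk.

```python
from collections import defaultdict


def periodic_max(samples, period):
    """Collapse (timestamp, value) samples into one maximum per period.

    `period` is the bucket width in timestamp units, for example 1440
    when timestamps are minutes and each bucket is one day.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts // period].append(value)
    return [max(buckets[k]) for k in sorted(buckets)]


def moving_average(values, window):
    """Trailing moving average; the first window-1 points are dropped."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out


# Hypothetical minutely samples over three days: (minute index, load).
raw = [(m, 100 + (m % 1440) // 60 + m // 1440) for m in range(3 * 1440)]
daily_max = periodic_max(raw, period=1440)          # one point per day
hourly_smooth = moving_average([v for _, v in raw], window=60)
```

Periodic maxes keep the peaks that capacity has to absorb, while a moving average smooths them away, so the choice of input changes what the forecast is actually predicting.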



@Twitter 6
Good old Linear Regression
[Figure: Linear Regression based Forecast. Legend: Raw Data, Forecast]
Adjusted R-squared: 0.6062
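Adjusted R-squared, the fit statistic quoted on this slide, penalizes R-squared for the number of predictors. The following is a minimal sketch of an ordinary least-squares fit against a time index, assuming a single predictor (the day number). The data values are invented for illustration and are not the series from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def adjusted_r_squared(xs, ys, slope, intercept, n_predictors=1):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    n = len(xs)
    mean_y = sum(ys) / n
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical daily usage numbers.
days = list(range(10))
usage = [10, 12, 11, 15, 14, 18, 17, 21, 20, 24]
slope, intercept = fit_line(days, usage)
adj_r2 = adjusted_r_squared(days, usage, slope, intercept)
forecast_day_30 = intercept + slope * 30  # extrapolate the fitted line
```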

@Twitter 7
Linear Regression using periodic max
[Figure: Linear Regression Using Maxes based Forecast. Legend: Raw Data, Forecast]
Adjusted R-squared: 0.5673
Standard Error: 2.45x

@Twitter 8
Splines
•  Smooth Spline
q λ: penalty for “wiggliness”
[Figure: Spline based Fitting. Legend: Raw Data, Fitted]
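The role of λ can be made concrete with the standard smoothing-spline objective. The slide does not show the formula, so the notation here (observations y_i at times t_i, fitted function f) is an assumption:

```latex
\hat{f} \;=\; \arg\min_{f} \;
  \sum_{i=1}^{n} \bigl( y_i - f(t_i) \bigr)^2
  \;+\; \lambda \int \bigl( f''(t) \bigr)^2 \, dt
```

The first term rewards closeness to the data and the second penalizes curvature. As λ approaches 0 the fit interpolates the points; as λ grows the fit approaches a straight line.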

@Twitter 9
Splines
[Figure: Spline based Forecast. Legend: Raw Data, Forecast]

@Twitter 10
Splines
•  Sensitive to the nature of the time series at the boundary
[Figure: time series annotated with Boundary 1 and Boundary 2]

@Twitter 11
Splines – Take 2
[Figure: Spline based Forecast (Boundary 1). Legend: Raw Data, Forecast]
8.31x higher than the end of the time series

@Twitter 12
Splines – Take 3
[Figure: Spline based Forecast (Boundary 2). Legend: Raw Data, Forecast]
3.77x higher than the end of the time series

@Twitter 13
Holt-Winters
•  Triple exponential smoothing
q  Estimate of linear trend
q  Seasonal correction factors
[Figure: Holt-Winters based Fitting. Legend: Raw Data, Fitted]
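Triple exponential smoothing maintains three running estimates: a level, a linear trend, and one seasonal correction per position in the season. The following is a minimal sketch of the additive variant. The slides do not say whether the additive or multiplicative form was used, and the initialization scheme, the smoothing constants, and the synthetic data are all assumptions made for illustration.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast for `horizon` steps past the series.

    alpha, beta, gamma are the smoothing constants (each in [0, 1]) for
    the level, the trend, and the seasonal factors respectively.
    """
    m = season_len
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons to initialize")

    # Simple initialization (an assumption of this sketch):
    # level = mean of season 1, trend = mean per-step change between
    # seasons 1 and 2, seasonal = deviation of season 1 from its mean.
    level = sum(series[:m]) / m
    trend = (sum(series[m:2 * m]) - sum(series[:m])) / (m * m)
    seasonal = [series[i] - level for i in range(m)]

    for t in range(m, len(series)):
        y = series[t]
        s = seasonal[t % m]
        prev_level = level
        level = alpha * (y - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (y - level) + (1 - gamma) * s

    n = len(series)
    return [level + h * trend + seasonal[(n + h - 1) % m]
            for h in range(1, horizon + 1)]


# Hypothetical daily metric: linear growth plus a weekly pattern.
weekly_shape = [0, 2, 3, 3, 2, -4, -6]
daily = [100 + 0.5 * t + weekly_shape[t % 7] for t in range(28)]
forecast = holt_winters_additive(daily, season_len=7,
                                 alpha=0.3, beta=0.1, gamma=0.2, horizon=14)
```

In practice the three constants are usually fitted by minimizing the one-step-ahead squared error rather than picked by hand as they are here.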

@Twitter 14
Holt-Winters
[Figure: Holt-Winters based Forecast. Legend: Raw Data, Forecast, Upper 95% CI]

@Twitter 15
ARIMA
•  Auto-Regressive Integrated Moving Average 
q  (p, d , q)
Moving Average order
Integrated order
Autoregressive order
Autoregressive component
Moving Average component
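The equation from the slide did not survive extraction. For reference, a standard way to write an ARIMA(p, d, q) model is given below, using the backshift operator B (B y_t = y_{t-1}), white-noise errors ε_t, and an optional constant c. The symbol names are assumptions and sign conventions vary between texts.

```latex
\underbrace{\Bigl( 1 - \sum_{i=1}^{p} \phi_i B^{i} \Bigr)}_{\text{autoregressive component}}
(1 - B)^{d} \, y_t
\;=\; c \;+\;
\underbrace{\Bigl( 1 + \sum_{j=1}^{q} \theta_j B^{j} \Bigr)}_{\text{moving average component}}
\varepsilon_t
```

Here p counts the lagged values of the differenced series, d is the number of times the series is differenced, and q counts the lagged error terms.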

@Twitter 16
ARIMA
•  Fitting

[Figure: Auto ARIMA based Fitting. Legend: Raw Data, Fitted]

@Twitter 17
ARIMA – Take 1
[Figure: ARIMA based Forecast. Legend: Raw Data, Forecast, Upper 95% CI]
Model order (p, d, q)(P, D, Q)[m]: (0,1,1)(0,1,1)[7]

@Twitter 18
ARIMA – Take 2
[Figure: Auto ARIMA based Forecast. Legend: Raw Data, Forecast, Upper 95% CI]
Model order (p, d, q)(P, D, Q)[m]: (1,1,1)(2,0,0)[7]
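Written out, the two seasonal specifications used in Take 1 and Take 2 differ considerably. The following is a sketch in backshift notation (B y_t = y_{t-1}), with a seasonal period of 7. Constant or drift terms are omitted, and the coefficient symbols are assumptions because the fitted values are not given in the slides.

```latex
% (0,1,1)(0,1,1)[7]: one regular and one seasonal difference,
% one regular and one seasonal moving-average term
(1 - B)\,(1 - B^{7})\, y_t
  \;=\; (1 + \theta_1 B)\,(1 + \Theta_1 B^{7})\, \varepsilon_t

% (1,1,1)(2,0,0)[7]: one regular difference only,
% AR(1), MA(1), and two seasonal autoregressive terms
(1 - \phi_1 B)\,(1 - \Phi_1 B^{7} - \Phi_2 B^{14})\,(1 - B)\, y_t
  \;=\; (1 + \theta_1 B)\, \varepsilon_t
```

The first model removes weekly seasonality by differencing at lag 7, while the second keeps the series seasonally undifferenced and models the weekly pattern through autoregressive terms at lags 7 and 14.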

@Twitter 19
Impact of Outliers

@Twitter 20
Forecast without outlier
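The slides do not state how the outlier was identified or removed. One common option, shown here purely as an assumed illustration, is to flag points by their distance from the median in units of the median absolute deviation (MAD) and replace them before fitting. The function name, the threshold, and the data are hypothetical.

```python
from statistics import median


def replace_outliers(values, threshold=3.5):
    """Replace points whose modified z-score exceeds `threshold` with the median.

    The 0.6745 factor scales the MAD so that the score is comparable to a
    standard z-score when the data are roughly normal.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)  # no spread to measure against
    cleaned = []
    for v in values:
        score = 0.6745 * (v - med) / mad
        cleaned.append(med if abs(score) > threshold else v)
    return cleaned


# Hypothetical series with a single spike.
observed = [10, 11, 9, 10, 12, 100, 10, 11]
cleaned = replace_outliers(observed)
```

On a series with a strong trend or seasonality, a global median is too blunt; applying the same rule to the residuals of a fitted model would be a more reasonable starting point.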

@Twitter 21
Good “enough”?

@Twitter 22
Impact of “Corrections”

@Twitter 23
Implications of data characteristics
[Figure: ARIMA based Forecast. Legend: Raw Data, Forecast, Upper 95% CI]

@Twitter 24
Forecast without the boundary case
[Figure: ARIMA based Forecast - Without initial spike. Legend: Raw Data, Forecast, Upper 95% CI]

@Twitter 25
Forecast with truncation
[Figure: ARIMA based Forecast - Truncated and Without initial spike. Legend: Raw Data, Forecast, Upper 95% CI]

@Twitter 26
Lessons learned
•  Data fidelity
q  Anomalies
q  Absence of seasonality

•  Modeling
q  Never perfect
§  Assess forecasting error

q  Continuous refinement
§  Incoming data stream is dynamic
o  Organic growth
o  New products
o  Behavioral aspect
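One way to "assess forecasting error" is to hold out the most recent points, forecast them from the earlier data, and compare. The following is a minimal sketch using mean absolute percentage error (MAPE). The talk does not name an error metric, so both the metric and the naive baseline forecaster are assumptions for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, skipping zero-valued actuals."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    if not terms:
        raise ValueError("no non-zero actual values to compare against")
    return 100.0 * sum(terms) / len(terms)


def holdout_error(series, horizon, forecaster):
    """Fit on all but the last `horizon` points and score the forecast on them.

    `forecaster(train, horizon)` must return a list of `horizon` predictions.
    """
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    train, test = series[:-horizon], series[-horizon:]
    return mape(test, forecaster(train, horizon))


def naive_forecaster(train, horizon):
    """Baseline: repeat the last observed value."""
    return [train[-1]] * horizon


# Hypothetical metric; any forecasting model can be plugged in the same way.
history = [100, 104, 109, 113, 118, 124, 129, 135]
baseline_error = holdout_error(history, horizon=3, forecaster=naive_forecaster)
```

Re-running the same check as new data arrives is one way to act on the "continuous refinement" point: a model whose holdout error drifts upward is a candidate for refitting.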



@Twitter 27
Acknowledgements
•  Capacity Engineering Team
•  Management team

@Twitter 28
Join the Flock

Like problem solving? 

Like challenges? 

Be at the cutting edge

Make an impact

•  We are hiring!!
q  https://twitter.com/JoinTheFlock
q  https://twitter.com/jobs
q  Contact us: @arun_kejariwal, @winstl

@Twitter 29

More Related Content

What's hot

Deep Learning for Public Safety in Chicago and San Francisco
Deep Learning for Public Safety in Chicago and San FranciscoDeep Learning for Public Safety in Chicago and San Francisco
Deep Learning for Public Safety in Chicago and San FranciscoSri Ambati
 
Apache Spark and future of advanced analytics
Apache Spark and future of advanced analyticsApache Spark and future of advanced analytics
Apache Spark and future of advanced analyticsMuralidhar Somisetty
 
Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
Using Apache Pulsar to Provide Real-Time IoT Analytics on the EdgeUsing Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
Using Apache Pulsar to Provide Real-Time IoT Analytics on the EdgeDataWorks Summit
 
Data Philly Meetup - Big (Geo) Data
Data Philly Meetup - Big (Geo) DataData Philly Meetup - Big (Geo) Data
Data Philly Meetup - Big (Geo) DataAzavea
 
Event Processing Using Semantic Web Technologies
Event Processing Using Semantic Web TechnologiesEvent Processing Using Semantic Web Technologies
Event Processing Using Semantic Web TechnologiesMikko Rinne
 
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Spark Summit
 
Power of Splunk Search Processing Language (SPL) ...
Power of Splunk Search Processing Language (SPL)                             ...Power of Splunk Search Processing Language (SPL)                             ...
Power of Splunk Search Processing Language (SPL) ...Splunk
 

What's hot (8)

Deep Learning for Public Safety in Chicago and San Francisco
Deep Learning for Public Safety in Chicago and San FranciscoDeep Learning for Public Safety in Chicago and San Francisco
Deep Learning for Public Safety in Chicago and San Francisco
 
Apache Spark and future of advanced analytics
Apache Spark and future of advanced analyticsApache Spark and future of advanced analytics
Apache Spark and future of advanced analytics
 
Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
Using Apache Pulsar to Provide Real-Time IoT Analytics on the EdgeUsing Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
 
Data Philly Meetup - Big (Geo) Data
Data Philly Meetup - Big (Geo) DataData Philly Meetup - Big (Geo) Data
Data Philly Meetup - Big (Geo) Data
 
Event Processing Using Semantic Web Technologies
Event Processing Using Semantic Web TechnologiesEvent Processing Using Semantic Web Technologies
Event Processing Using Semantic Web Technologies
 
An Analytics Platform for Connected Vehicles
An Analytics Platform for Connected VehiclesAn Analytics Platform for Connected Vehicles
An Analytics Platform for Connected Vehicles
 
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
 
Power of Splunk Search Processing Language (SPL) ...
Power of Splunk Search Processing Language (SPL)                             ...Power of Splunk Search Processing Language (SPL)                             ...
Power of Splunk Search Processing Language (SPL) ...
 

Similar to Gimme More! Supporting User Growth in a Performant and Efficient Fashion

Real Time Analytics: Algorithms and Systems
Real Time Analytics: Algorithms and SystemsReal Time Analytics: Algorithms and Systems
Real Time Analytics: Algorithms and SystemsArun Kejariwal
 
Qualcomm Institute Winter IoT Program - Final Presentation
Qualcomm Institute Winter IoT Program - Final PresentationQualcomm Institute Winter IoT Program - Final Presentation
Qualcomm Institute Winter IoT Program - Final PresentationMookeunJi
 
This is not about Tweeting and Driving
This is not about Tweeting and DrivingThis is not about Tweeting and Driving
This is not about Tweeting and DrivingSylvain Carle
 
Analysing high throughput data in real time
Analysing high throughput data in real timeAnalysing high throughput data in real time
Analysing high throughput data in real timeHotstar
 
MongoDB World 2016: Scaling Targeted Notifications in the Music Streaming Wor...
MongoDB World 2016: Scaling Targeted Notifications in the Music Streaming Wor...MongoDB World 2016: Scaling Targeted Notifications in the Music Streaming Wor...
MongoDB World 2016: Scaling Targeted Notifications in the Music Streaming Wor...MongoDB
 
Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)
Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)
Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)Em Campbell-Pretty
 
Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)
Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)
Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)Context Matters
 
osisoft.ppt
osisoft.pptosisoft.ppt
osisoft.pptIwl Pcu
 
AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced AnalyticsAWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced AnalyticsAmazon Web Services
 
Building Larimer County's Road Event Status System (RESS) (NAGW 2016)
Building Larimer County's Road Event Status System (RESS) (NAGW 2016)Building Larimer County's Road Event Status System (RESS) (NAGW 2016)
Building Larimer County's Road Event Status System (RESS) (NAGW 2016)Gregg Turnbull, CGDSP, CGCIO
 
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and SupersetInteractive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and SupersetHortonworks
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Summit
 
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon RedshiftAmazon Web Services
 
AdhearsionConf 2013 Keynote
AdhearsionConf 2013 KeynoteAdhearsionConf 2013 Keynote
AdhearsionConf 2013 KeynoteMojo Lingo
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Data Con LA
 
Báo cáo xu hướng sử dụng kỉ thuật số của người tiêu dùng năm 2015
Báo cáo xu hướng sử dụng kỉ thuật số của người tiêu dùng năm 2015Báo cáo xu hướng sử dụng kỉ thuật số của người tiêu dùng năm 2015
Báo cáo xu hướng sử dụng kỉ thuật số của người tiêu dùng năm 2015Bui Thi Quynh Duong
 
How to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top ContendersHow to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top ContendersVoltDB
 
Data At Pollfish, Dec. 2015, Euangelos Linardos
Data At Pollfish, Dec. 2015, Euangelos LinardosData At Pollfish, Dec. 2015, Euangelos Linardos
Data At Pollfish, Dec. 2015, Euangelos LinardosEuangelos Linardos
 

Similar to Gimme More! Supporting User Growth in a Performant and Efficient Fashion (20)

Real Time Analytics: Algorithms and Systems
Real Time Analytics: Algorithms and SystemsReal Time Analytics: Algorithms and Systems
Real Time Analytics: Algorithms and Systems
 
Qualcomm Institute Winter IoT Program - Final Presentation
Qualcomm Institute Winter IoT Program - Final PresentationQualcomm Institute Winter IoT Program - Final Presentation
Qualcomm Institute Winter IoT Program - Final Presentation
 
Dash Wireframe
Dash WireframeDash Wireframe
Dash Wireframe
 
This is not about Tweeting and Driving
This is not about Tweeting and DrivingThis is not about Tweeting and Driving
This is not about Tweeting and Driving
 
Analysing high throughput data in real time
Analysing high throughput data in real timeAnalysing high throughput data in real time
Analysing high throughput data in real time
 
MongoDB World 2016: Scaling Targeted Notifications in the Music Streaming Wor...
MongoDB World 2016: Scaling Targeted Notifications in the Music Streaming Wor...MongoDB World 2016: Scaling Targeted Notifications in the Music Streaming Wor...
MongoDB World 2016: Scaling Targeted Notifications in the Music Streaming Wor...
 
Institutional GIS
Institutional GISInstitutional GIS
Institutional GIS
 
Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)
Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)
Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)
 
Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)
Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)
Scaling Agile Data Warehousing with the Scaled Agile Framework (SAFe)
 
osisoft.ppt
osisoft.pptosisoft.ppt
osisoft.ppt
 
AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced AnalyticsAWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
 
Building Larimer County's Road Event Status System (RESS) (NAGW 2016)
Building Larimer County's Road Event Status System (RESS) (NAGW 2016)Building Larimer County's Road Event Status System (RESS) (NAGW 2016)
Building Larimer County's Road Event Status System (RESS) (NAGW 2016)
 
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and SupersetInteractive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike Freedman
 
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
 
AdhearsionConf 2013 Keynote
AdhearsionConf 2013 KeynoteAdhearsionConf 2013 Keynote
AdhearsionConf 2013 Keynote
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
 
Báo cáo xu hướng sử dụng kỉ thuật số của người tiêu dùng năm 2015
Báo cáo xu hướng sử dụng kỉ thuật số của người tiêu dùng năm 2015Báo cáo xu hướng sử dụng kỉ thuật số của người tiêu dùng năm 2015
Báo cáo xu hướng sử dụng kỉ thuật số của người tiêu dùng năm 2015
 
How to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top ContendersHow to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top Contenders
 
Data At Pollfish, Dec. 2015, Euangelos Linardos
Data At Pollfish, Dec. 2015, Euangelos LinardosData At Pollfish, Dec. 2015, Euangelos Linardos
Data At Pollfish, Dec. 2015, Euangelos Linardos
 

More from Arun Kejariwal

Anomaly Detection At The Edge
Anomaly Detection At The EdgeAnomaly Detection At The Edge
Anomaly Detection At The EdgeArun Kejariwal
 
Serverless Streaming Architectures and Algorithms for the Enterprise
Serverless Streaming Architectures and Algorithms for the EnterpriseServerless Streaming Architectures and Algorithms for the Enterprise
Serverless Streaming Architectures and Algorithms for the EnterpriseArun Kejariwal
 
Sequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time SeriesArun Kejariwal
 
Sequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time SeriesArun Kejariwal
 
Model Serving via Pulsar Functions
Model Serving via Pulsar FunctionsModel Serving via Pulsar Functions
Model Serving via Pulsar FunctionsArun Kejariwal
 
Designing Modern Streaming Data Applications
Designing Modern Streaming Data ApplicationsDesigning Modern Streaming Data Applications
Designing Modern Streaming Data ApplicationsArun Kejariwal
 
Correlation Analysis on Live Data Streams
Correlation Analysis on Live Data StreamsCorrelation Analysis on Live Data Streams
Correlation Analysis on Live Data StreamsArun Kejariwal
 
Deep Learning for Time Series Data
Deep Learning for Time Series DataDeep Learning for Time Series Data
Deep Learning for Time Series DataArun Kejariwal
 
Correlation Analysis on Live Data Streams
Correlation Analysis on Live Data StreamsCorrelation Analysis on Live Data Streams
Correlation Analysis on Live Data StreamsArun Kejariwal
 
Live Anomaly Detection
Live Anomaly DetectionLive Anomaly Detection
Live Anomaly DetectionArun Kejariwal
 
Finding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactFinding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactArun Kejariwal
 
Statistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ TwitterStatistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ TwitterArun Kejariwal
 
Days In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy serviceDays In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy serviceArun Kejariwal
 
Techniques for Minimizing Cloud Footprint
Techniques for Minimizing Cloud FootprintTechniques for Minimizing Cloud Footprint
Techniques for Minimizing Cloud FootprintArun Kejariwal
 
A Tool for Practical Garbage Collection Analysis In the Cloud
A Tool for Practical Garbage Collection Analysis In the CloudA Tool for Practical Garbage Collection Analysis In the Cloud
A Tool for Practical Garbage Collection Analysis In the CloudArun Kejariwal
 

More from Arun Kejariwal (16)

Anomaly Detection At The Edge
Anomaly Detection At The EdgeAnomaly Detection At The Edge
Anomaly Detection At The Edge
 
Serverless Streaming Architectures and Algorithms for the Enterprise
Serverless Streaming Architectures and Algorithms for the EnterpriseServerless Streaming Architectures and Algorithms for the Enterprise
Serverless Streaming Architectures and Algorithms for the Enterprise
 
Sequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time Series
 
Sequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time Series
 
Model Serving via Pulsar Functions
Model Serving via Pulsar FunctionsModel Serving via Pulsar Functions
Model Serving via Pulsar Functions
 
Designing Modern Streaming Data Applications
Designing Modern Streaming Data ApplicationsDesigning Modern Streaming Data Applications
Designing Modern Streaming Data Applications
 
Correlation Analysis on Live Data Streams
Correlation Analysis on Live Data StreamsCorrelation Analysis on Live Data Streams
Correlation Analysis on Live Data Streams
 
Deep Learning for Time Series Data
Deep Learning for Time Series DataDeep Learning for Time Series Data
Deep Learning for Time Series Data
 
Correlation Analysis on Live Data Streams
Correlation Analysis on Live Data StreamsCorrelation Analysis on Live Data Streams
Correlation Analysis on Live Data Streams
 
Live Anomaly Detection
Live Anomaly DetectionLive Anomaly Detection
Live Anomaly Detection
 
Finding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactFinding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impact
 
Velocity 2015-final
Velocity 2015-finalVelocity 2015-final
Velocity 2015-final
 
Statistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ TwitterStatistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ Twitter
 
Days In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy serviceDays In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy service
 
Techniques for Minimizing Cloud Footprint
Techniques for Minimizing Cloud FootprintTechniques for Minimizing Cloud Footprint
Techniques for Minimizing Cloud Footprint
 
A Tool for Practical Garbage Collection Analysis In the Cloud
A Tool for Practical Garbage Collection Analysis In the CloudA Tool for Practical Garbage Collection Analysis In the Cloud
A Tool for Practical Garbage Collection Analysis In the Cloud
 

Recently uploaded

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Story boards and shot lists for my a level piece

Gimme More! Supporting User Growth in a Performant and Efficient Fashion

  • 1. Gimme More! Supporting User Growth in a Performant and Efficient Fashion
      Arun Kejariwal (@arun_kejariwal), Winston Lee (@winstl)
      Capacity Engineering @ Twitter, November 2013
  • 2. User Experience
      • Anytime, anywhere, any device
          • 5.2 billion mobile users by 2017 [1]
          • More than 10 billion mobile devices/connections by 2017 [1]
          • Worldwide mobile data traffic will reach 11.2 exabytes/month by 2017 (13x increase) [1]
      • Real-time performance
      [1] http://newsroom.cisco.com/release/1135354 (Feb. 5, 2013)
  • 3. Capacity Planning: Why bother?
      • Organic growth
          • Over 230M monthly active users [1]
      • User engagement
      • Evolving product landscape
          • Cards, Photos, Vines
              • Mobile video will increase 16-fold between 2012 and 2017 [2]
      • Events, planned or unplanned
      [1] http://www.sec.gov/Archives/edgar/data/1418091/000119312513400028/d564001ds1a.htm
      [2] http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-520862.html
  • 4. Approaches to Capacity Planning
      • Throw hardware at the problem
          • How much?
          • What kind? (Inventory management, etc.)
          • Operationally inefficient!
      • Reactive approach
      Slide callouts: bottom line, poor UX
  • 5. Systematic Capacity Planning
      • Objectives
          • Check under-allocation
              • Performance
              • Availability
              • Adversely impacts user experience
          • Check over-allocation
              • Operational efficiency
              • Adversely impacts the bottom line
      • Determine capacity needed proactively via forecasting
          • Business metrics
          • System resource usage
  • 6. Systematic Capacity Planning: Forecasting
      • Key questions
          • Which data?
              • Raw
              • Periodic max
              • Moving average
          • Data granularity
              • Minutely
              • Daily
              • Depends
          • Which model? (Non-trivial!)
              • Linear
              • Spline
              • Holt-Winters
              • ARIMA
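The "Which data?" options on slide 6 can be illustrated with a minimal Python sketch. It assumes the raw input is a list of (timestamp in seconds, value) samples; the function names, the bucket width, and the sample values are hypothetical and not taken from the talk.

```python
from collections import defaultdict
from statistics import mean


def periodic_max(samples, period):
    """Collapse (timestamp, value) samples into one max per period.

    `period` is the bucket width in seconds (86400 gives a daily max).
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts // period].append(value)
    return [max(buckets[key]) for key in sorted(buckets)]


def moving_average(values, window):
    """Trailing moving average; the first window-1 points are dropped."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


# Hypothetical minutely samples over two days: flat load with one spike per day.
minutely = [(minute * 60, 100.0) for minute in range(2 * 1440)]
minutely[600] = (600 * 60, 250.0)                  # spike on day 0
minutely[1440 + 30] = ((1440 + 30) * 60, 180.0)    # spike on day 1

daily_max = periodic_max(minutely, period=86400)
smoothed = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

The periodic max keeps the peaks that capacity has to absorb, while the moving average suppresses them, which is one reason the choice of input data changes the forecast.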
  • 7. Good old Linear Regression
      [Figure: Linear Regression based Forecast; series: Raw Data, Forecast]
      Adjusted R-squared: 0.6062
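For reference, a linear-regression forecast and its adjusted R-squared can be computed as in the following sketch. It is not the code behind slide 7: the series is synthetic, the regressor is simply the time index, and all function names are hypothetical.

```python
def fit_line(ys):
    """Ordinary least squares of ys against the time index 0..n-1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def adjusted_r_squared(ys, slope, intercept, predictors=1):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - predictors - 1)."""
    n = len(ys)
    y_mean = sum(ys) / n
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - predictors - 1)


def forecast(slope, intercept, n_observed, horizon):
    """Extend the fitted line `horizon` steps past the last observation."""
    return [intercept + slope * x for x in range(n_observed, n_observed + horizon)]


# Synthetic daily series: upward trend plus a weekend bump a line cannot capture.
series = [10 + 0.5 * day + (3 if day % 7 in (5, 6) else 0) for day in range(56)]
slope, intercept = fit_line(series)
adj_r2 = adjusted_r_squared(series, slope, intercept)
next_week = forecast(slope, intercept, len(series), horizon=7)
```

A straight line ignores seasonality entirely, so any weekly pattern ends up in the residuals and lowers the adjusted R-squared.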
  • 8. Linear Regression using periodic max
      [Figure: Linear Regression Using Maxes based Forecast; series: Raw Data, Forecast]
      Adjusted R-squared: 0.5673
      Standard error: 2.45x
  • 9. Splines
      • Smooth spline
          • λ: penalty for "wiggliness"
      [Figure: Spline based Fitting; series: Raw Data, Fitted]
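The role of λ on slide 9 can be made concrete with the usual textbook definition of a smoothing spline, given here as an assumption because the slide text does not preserve a formula. With observations y_i at times t_i, the fitted curve minimizes a penalized residual sum of squares:

```latex
\hat{f} \;=\; \arg\min_{f} \;
  \sum_{i=1}^{n} \bigl(y_i - f(t_i)\bigr)^2
  \;+\; \lambda \int \bigl(f''(t)\bigr)^2 \, dt
```

As λ approaches 0 the curve interpolates the data; as λ grows the curvature penalty dominates and the fit approaches a straight least-squares line.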
  • 10. Splines
      [Figure: Spline based Forecast; series: Raw Data, Forecast]
  • 11. Splines
      • Sensitive to the nature of the time series at the boundary
      [Figure annotations: Boundary 1, Boundary 2]
  • 12. Splines – Take 2
      [Figure: Spline based Forecast (Boundary 1); series: Raw Data, Forecast]
      Annotation: 8.31x higher than end of time series
  • 13. Splines – Take 3
      [Figure: Spline based Forecast (Boundary 2); series: Raw Data, Forecast]
      Annotation: 3.77x higher than end of time series
  • 14. Holt-Winters
      • Triple exponential smoothing
      [Equation annotations: estimate of linear trend, seasonal correction factors]
      [Figure: Holt-Winters based Fitting; series: Raw Data, Fitted]
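Triple exponential smoothing, as named on slide 14, maintains a level, a linear trend estimate, and one seasonal correction per position in the season. The sketch below uses the additive variant, which is an assumption (the slide does not say whether the additive or multiplicative form was used), and the initialization scheme, smoothing parameters, and data are hypothetical.

```python
def holt_winters_additive(ys, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast for `horizon` steps past the end of ys."""
    if len(ys) < 2 * period:
        raise ValueError("need at least two full seasons of data")

    # Initial trend: average per-step change between the first two seasons.
    first, second = ys[:period], ys[period : 2 * period]
    trend = (sum(second) - sum(first)) / period ** 2
    # The mean of the first season is the level at the centre of that season.
    mid_level = sum(first) / period
    centre = (period - 1) / 2
    # Seasonal corrections: what is left after removing level and trend.
    seasonal = [y - (mid_level + trend * (i - centre)) for i, y in enumerate(first)]
    # Move the level forward to the end of the first season.
    level = mid_level + trend * (period - 1 - centre)

    for t in range(period, len(ys)):
        prev_level = level
        s = seasonal[t % period]
        level = alpha * (ys[t] - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (ys[t] - level) + (1 - gamma) * s

    n = len(ys)
    return [
        level + h * trend + seasonal[(n - 1 + h) % period]
        for h in range(1, horizon + 1)
    ]


# Synthetic daily series: linear growth plus a fixed weekly pattern.
weekly = [-4, -2, 0, 1, 2, 5, -2]
history = [100 + 2 * t + weekly[t % 7] for t in range(35)]
predicted = holt_winters_additive(
    history, period=7, alpha=0.3, beta=0.1, gamma=0.2, horizon=7
)
```

On real traffic the smoothing parameters would normally be chosen by minimizing one-step-ahead error rather than fixed by hand as they are here.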
  • 15. Holt-Winters
      [Figure: Holt-Winters based Forecast; series: Raw Data, Forecast, Upper 95% CI]
  • 16. ARIMA
      • Auto-Regressive Integrated Moving Average
          • (p, d, q): autoregressive order, integrated order, moving average order
      [Equation annotations: autoregressive component, moving average component]
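The equation annotated on slide 16 did not survive in the transcript. The standard form of an ARIMA(p, d, q) model is given below as a stand-in, using the backshift operator B (where B y_t = y_{t-1}), a constant c, and white noise ε_t; the symbol names are the conventional ones and are assumed, not taken from the slide.

```latex
\underbrace{\Bigl(1 - \sum_{i=1}^{p} \phi_i B^{i}\Bigr)}_{\text{autoregressive component}}
(1 - B)^{d} \, y_t
\;=\;
c + \underbrace{\Bigl(1 + \sum_{j=1}^{q} \theta_j B^{j}\Bigr)}_{\text{moving average component}}
\varepsilon_t
```

Here p is the number of autoregressive lags, d is the number of times the series is differenced, and q is the number of lagged error terms.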
  • 17. ARIMA
      • Fitting
      [Figure: Auto ARIMA based Fitting; series: Raw Data, Fitted]
  • 18. ARIMA – Take 1
      [Figure: ARIMA based Forecast; series: Raw Data, Forecast, Upper 95% CI]
      Order (p, d, q)(P, D, Q)[m]: (0,1,1)(0,1,1)[7]
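In the order (0,1,1)(0,1,1)[7] on slide 18, the second triple is the seasonal part and [7] is the season length, so the series is differenced once at lag 1 and once at lag 7 before the moving-average terms are fitted. The sketch below shows only that differencing step on synthetic data; it does not fit the model, and the helper name is hypothetical.

```python
def difference(values, lag=1):
    """Return values[t] - values[t - lag] for every t >= lag."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


# Synthetic daily series: linear growth plus a fixed weekly pattern.
weekly = [-4, -2, 0, 1, 2, 5, -2]
series = [100 + 2 * t + weekly[t % 7] for t in range(28)]

# D = 1 with m = 7 removes the weekly pattern; d = 1 removes what is left of the trend.
deseasonalized = difference(series, lag=7)
stationary = difference(deseasonalized, lag=1)
```

For this idealized series the result is all zeros; on real data the remainder is the signal the moving-average terms are fitted to.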
  • 19. ARIMA – Take 2
      [Figure: Auto ARIMA based Forecast; series: Raw Data, Forecast, Upper 95% CI]
      Order (p, d, q)(P, D, Q)[m]: (1,1,1)(2,0,0)[7]
  • 24. Implications of data characteristics
      [Figure: ARIMA based forecast; series: Raw Data, Forecast, Upper 95% CI]
  • 25. Forecast without the boundary case
      [Figure: ARIMA based Forecast - Without initial spike; series: Raw Data, Forecast, Upper 95% CI]
  • 26. Forecast with truncation
      [Figure: ARIMA based Forecast - Truncated and Without initial spike; series: Raw Data, Forecast, Upper 95% CI]
  • 27. Lessons learned
      • Data fidelity
          • Anomalies
          • Absence of seasonality
      • Modeling
          • Never perfect
              • Assess forecasting error
          • Continuous refinement
              • Incoming data stream is dynamic
                  • Organic growth
                  • New products
                  • Behavioral aspect
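One common way to "assess forecasting error", as slide 27 recommends, is to hold back the most recent observations, forecast them from the rest, and compare. The sketch below uses mean absolute percentage error (MAPE) and a naive last-value forecaster; both are assumptions chosen for illustration, since the talk does not say which metric or baseline was used.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def holdout_error(series, holdout, forecaster):
    """Fit on all but the last `holdout` points and score the forecast on them."""
    train, test = series[:-holdout], series[-holdout:]
    return mape(test, forecaster(train, holdout))


def naive_forecaster(train, horizon):
    """Baseline: repeat the last observed value."""
    return [train[-1]] * horizon


# Hypothetical usage: any forecaster with the same (train, horizon) signature fits here.
error_pct = holdout_error([100, 100, 100, 110], holdout=1, forecaster=naive_forecaster)
```

Re-running this check as new data arrives is one way to carry out the continuous refinement the slide calls for.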
  • 28. Acknowledgements
      • Capacity Engineering team
      • Management team
  • 29. Join the Flock
      Like problem solving? Like challenges? Be at the cutting edge. Make an impact.
      • We are hiring!!
          • https://twitter.com/JoinTheFlock
          • https://twitter.com/jobs
          • Contact us: @arun_kejariwal, @winstl