Analyzing Oracle Performance Using Time Series Models
Chen (Gwen) Shapira
http://prodlife.wordpress.com
Why? Abnormal Data, Changes, Trends, SLAs
See: Techniques, Use Cases, Real Data
Techniques
Trend
Trend
Moving Average Trend
Remove Trend
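A minimal sketch of the moving average trend and its removal, assuming an equal-weight trailing window. The function names, the window size, and the sample `cpu` values are illustrative only and do not come from the talk.

```python
def moving_average(series, window):
    """Trailing moving average with equal weights; None until the window fills."""
    if window < 1:
        raise ValueError("window must be at least 1")
    trend = []
    running = 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            # Drop the point that just left the window.
            running -= series[i - window]
        trend.append(running / window if i >= window - 1 else None)
    return trend


def remove_trend(series, trend):
    """Subtract the trend wherever it is defined."""
    return [None if t is None else x - t for x, t in zip(series, trend)]


# Hypothetical CPU readings, for illustration only.
cpu = [10, 12, 11, 13, 15, 14, 16, 18]
trend = moving_average(cpu, window=4)
detrended = remove_trend(cpu, trend)
```

A trailing window lags behind the data, which is the delay the notes attribute to large windows; a centered window would avoid the lag but cannot be computed for the most recent points.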
Seasonality
Seasonal Effect
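The seasonal effect can be sketched as an average per hour of day, which is then subtracted from each sample. This is a simplified illustration: the `(hour, value)` input shape and the function names are assumptions, not something shown in the slides.

```python
from collections import defaultdict


def hourly_profile(samples):
    """samples: iterable of (hour_of_day, value). Returns {hour: mean value}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hour, value in samples:
        totals[hour] += value
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}


def deseasonalize(samples, profile):
    """Subtract the average for the matching hour from every sample."""
    return [value - profile[hour] for hour, value in samples]


# Hypothetical samples covering two hours of the day on two different days.
samples = [(0, 20.0), (1, 30.0), (0, 24.0), (1, 34.0)]
profile = hourly_profile(samples)
residual = deseasonalize(samples, profile)
```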
Components
More AutoCorrelation
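For reference, the sample autocorrelation at a given lag can be computed as below. This is a sketch of the standard estimator, not the talk's own tooling (the talk used R); the test series is made up, with a period of 4 standing in for a 24-hour cycle.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` points."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


# Hypothetical series that repeats every 4 points.
periodic = [0, 1, 0, -1] * 6
at_period = autocorrelation(periodic, 4)       # strong positive correlation
at_half_period = autocorrelation(periodic, 2)  # strong negative correlation
```

A peak at the lag that matches the cycle length is the signature of seasonality that the slides look for.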
X_t = 0.33 X_{t-1} + 0.07 X_{t-2} - 0.09 X_{t-3} + e
Test Model
Use Cases
Fake Incident
Detect By: Remove trend, Remove seasonality, Mark “normal data”, What’s left?
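The detection steps can be sketched roughly as follows. The sketch assumes the trend has already been removed, that the series starts at the beginning of a cycle, and that "normal data" means anything within three standard deviations of the seasonal average; the threshold and the sample data are illustrative choices, not values from the talk.

```python
from statistics import mean, pstdev


def find_anomalies(values, period, threshold=3.0):
    """Return indexes whose deseasonalized value lies outside the normal band."""
    # Seasonal profile: mean of all points sharing a position in the cycle.
    profile = [mean(values[pos::period]) for pos in range(period)]
    residual = [v - profile[i % period] for i, v in enumerate(values)]
    spread = pstdev(residual)
    if spread == 0:
        return []  # perfectly regular data, nothing left to flag
    return [i for i, r in enumerate(residual) if abs(r) > threshold * spread]


# Hypothetical load with a cycle of 4 points and one injected incident.
cpu = [10, 20, 30, 20] * 10
cpu[17] = 60
incidents = find_anomalies(cpu, period=4)
```

Note that the incident itself pulls the seasonal average upward, so with very few cycles a large outlier can partly hide itself; a median-based profile is one possible remedy.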
Spot the Incident
“I have seen the future and it is very much like the present, only longer” Kehlog Albran
Exponential Smoothing: Calculate moving average of future, Add seasonality
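A rough sketch of this forecasting step, following the procedure in the speaker notes: append the moving average of the most recent points as a new point, repeat, then add the seasonal average back. This is a rolling equal-weight average rather than textbook exponential smoothing, and the function name, the assumption that the series starts at position 0 of the cycle, and the sample numbers are all hypothetical.

```python
def roll_forward(deseasonalized, profile, steps, window=20):
    """Forecast `steps` points from a deseasonalized series and a seasonal profile."""
    if len(deseasonalized) < window:
        raise ValueError("need at least `window` points of history")
    period = len(profile)
    values = list(deseasonalized)
    forecast = []
    for _ in range(steps):
        # The new point is the mean of the last `window` points,
        # which may include earlier forecasts.
        level = sum(values[-window:]) / window
        position = len(values) % period  # slot in the cycle for the new point
        values.append(level)
        forecast.append(level + profile[position])
    return forecast


# Tiny hypothetical example: window of 2 and a 2-slot seasonal profile.
forecast = roll_forward([1.0, 3.0], profile=[10.0, 20.0], steps=2, window=2)
```

Because each forecast feeds the next one, errors accumulate, which is why accuracy drops the further ahead the forecast runs.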
AutoCorrelation: Use the model X_t = a X_{t-1} … to calculate X_{t+1}, X_{t+2}, …
Real Data 1: Redo Blocks per Hour
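As an illustration, the autoregressive model can be iterated forward by feeding each prediction back in as input. The sketch assumes the series is the zero-mean remainder left after removing trend and seasonality, and it sets the noise term to its expected value of zero. The coefficients are the ones from the model slide; the history values are made up.

```python
def ar_forecast(history, coefficients, steps):
    """Iterate X_t = a1*X_{t-1} + a2*X_{t-2} + ... with the noise term set to 0."""
    p = len(coefficients)
    if len(history) < p:
        raise ValueError("need at least as many points as coefficients")
    values = list(history)
    for _ in range(steps):
        # Most recent first: X_{t-1}, X_{t-2}, ..., X_{t-p}
        recent = values[-1:-p - 1:-1]
        values.append(sum(a * x for a, x in zip(coefficients, recent)))
    return values[len(history):]


coefficients = [0.33, 0.07, -0.09]  # from the fitted model shown earlier
history = [1.0, 2.0, 3.0]           # hypothetical recent remainder values
forecast = ar_forecast(history, coefficients, steps=2)
```

With coefficients this small the forecast decays toward zero quickly, meaning the prediction soon falls back to trend plus seasonality.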
Holiday
Seasonality
Abnormal Data
Real Data 2: CPU on DB Server
Seasonality?
Partial AutoCorrelation
Check Fit of Model
Prediction
Conclusions: Use moving average to describe trend; Look for seasonality; Predict with Exponential Smoothing; AutoCorrelation?; Seasonality-aware monitoring
Questions?

Database Performance Analysis with Time Series

Editor's Notes

  1. Time Series: data that is collected sequentially, usually at regular intervals. Time series are all around us: weather, stocks, CPU, disk space…
  2. Recognize abnormal data and send alerts. Recognize changes and be proactive. Analyze long-term trends for planning. Set realistic SLAs.
  3. One question we’ll keep asking ourselves: Which techniques are really useful?
  4. All kinds of data issues can prevent analysis. You can, and sometimes should, fix the data so analysis is possible. Replace missing data with average values (or maximum values where that makes sense). Remove outliers when it makes sense. Analyze the two sides of a discontinuity separately.
  5. Linear trend. Easy to fit and use, but rarely makes sense in real life.
  6. Moving Average requires picking a window size and weights. Small window: matches the data better, but may include noise. Large window: shows more of a general trend, but introduces a delay.
  7. Remove trend to allow analyzing other components.
  8. 50 degrees Fahrenheit is cold for August but hot for January. How about 60% CPU? Is it always OK or always a problem?
  9. Reminder: Correlation is a measure of the strength of the relation between two variables. How much do the variables change together?
  10. How does the data in our series correlate with itself? We see strong correlation between data points 24 hours apart.
  11. Average CPU for each hour. Similar to the charts of average monthly temperatures you sometimes see in tour guides.
  12. One chart to rule them all – data, trend, seasonality and all the rest.
  13. “All the rest” is not completely random – there is still some auto-correlation. Data correlates to points with a lag of one and two.
  14. R used the auto-correlations to model the data
  15. We test the model. We can see that the residuals no longer have auto-correlation, and the statistical test for the fit shows that the result is likely not random.
  16. I added a couple of hours with high CPU here. Can you spot them?
  17. After removing seasonality and average, we can clearly see the data point that is an outlier. It stands out.
  18. Calculate the moving average of the future by adding the moving average of the last 20 points as an additional point. Then use the last 19 real points and the new one to calculate another point, and so on. Obviously this gets less accurate the more you do it. Adding seasonality is a matter of adding the hourly average to the appropriate new points.
  19. Red: the model matched to existing data. Blue: predicted data. Green: 99% probability that we will not get data outside these lines.
  20. A bit like moving average but with very specific weights.
  21. Blue: predicted data. Green: 99% probability that we will not get data outside these lines.
  22. The redo data is very noisy, but adding a moving average trend allows us to see a point where redo generation drops. This happened to be Dec 20, when many users left for vacation.
  23. Correlation every 6 hours and stronger correlation every 24. These are the times we recalculate materialized views: a few views every 6 hours and a bunch every 24.
  24. Removing the seasonality allows us to notice abnormal data. Worth investigating – what was running at that time? Is it likely to happen again?
  25. Not exactly trend, but we do have changing levels of data.
  26. There are periodic correlations, but they are not regular, so it is not seasonality. This graph does indicate extremely strong auto-correlation.
  27. Partial Autocorrelation graph. This is similar to autocorrelation, but when we calculate the auto-correlation for lag 2, we remove the correlation already explained by lag 1, and so on. Using this graph we can see auto-correlation up to lag 17. Once the CPU climbs, it may take over 3 hours until it is back to normal!
  28. Checking that the AR(17) model fits.
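
The partial autocorrelation described in note 27 can be derived from ordinary autocorrelations with the Durbin-Levinson recursion. The sketch below is one standard way to do it and is not necessarily how the talk's charts were produced; the input is assumed to be a list of autocorrelations with lag 0 first, and the example values are a made-up AR(1)-style sequence.

```python
def pacf_from_acf(acf):
    """Partial autocorrelations for lags 1..len(acf)-1 via Durbin-Levinson.

    acf[k] is the autocorrelation at lag k, with acf[0] == 1.
    """
    pacf = []
    prev = []  # prev[j] holds the order-(k-1) coefficient for lag j+1
    for k in range(1, len(acf)):
        numerator = acf[k] - sum(prev[j] * acf[k - 1 - j] for j in range(k - 1))
        denominator = 1.0 - sum(prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = numerator / denominator
        # Update the lower-lag coefficients for the order-k model.
        current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


# Autocorrelations that halve at every lag, as in a pure lag-1 process.
partial = pacf_from_acf([1.0, 0.5, 0.25, 0.125])
```

In this example only lag 1 has a non-zero partial autocorrelation, because the correlation at lags 2 and 3 is fully explained by lag 1. On the CPU data, by contrast, the notes report significant values up to lag 17.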