2. What does Capacity do? A manager asks: “Can our database survive next year, or the new promotion program?” Performance tuning != capacity planning. A DBA is not quite the same as a capacity analyst.
3. What does Capacity do? How much headroom do we still have for further growth? How many days can we hold out without adding or upgrading hardware? What is the cost or impact to the site of adding or changing application code? What kind of platform/database/OS should we use for newly introduced applications? How do we survive sudden performance deviations?
4. Agenda: Resource Model and Theory; Response Time Analysis; Steps to do Capacity Analysis; Case Study.
6. Modeling - Making the Complex Simple. The world is much too complex for us to understand. Mathematical models; queuing theory; linear modeling; regression analysis; utilization; baseline. A model is not perfect; it is never 100% precise.
10. Queuing Theory. A request remains in the queue until it is its turn to be serviced. Common disciplines: FIFO or priority queue. Queue length. Wait times and wait events. CPU queue and I/O queue.
11. Response Time Drill Down. Response Time = Service Time + Queue Time (Rt = St + Qt). Queue time: CPU queue, memory queue, disk queue, network queue. Service time: CPU usr+sys, memory access, disk transfer, network transfer.
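The relation Rt = St + Qt can be illustrated with a small sketch. The slides do not say which queuing model is used, so this example assumes the textbook single-server M/M/1 queue, where Rt = St / (1 - U); the function names and the 10 ms service time are hypothetical.

```python
def response_time_ms(service_time_ms, utilization):
    """Response time Rt for an M/M/1 queue (an assumption, not from the slides)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return service_time_ms / (1.0 - utilization)


def queue_time_ms(service_time_ms, utilization):
    """Queue time Qt, obtained from Rt = St + Qt."""
    return response_time_ms(service_time_ms, utilization) - service_time_ms


if __name__ == "__main__":
    st = 10.0  # hypothetical service time in milliseconds
    for u in (0.25, 0.50, 0.75, 0.90):
        qt = queue_time_ms(st, u)
        rt = response_time_ms(st, u)
        print(f"U={u:.0%}  St={st:.1f} ms  Qt={qt:.1f} ms  Rt={rt:.1f} ms")
```

Under this assumption the service time stays constant while the queue time grows sharply as utilization approaches 100%, which is why response time is drilled down into the two parts.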
12. Utilization and Headroom. Headroom is the available usable resource. - Headroom = total capacity minus peak utilization and margin. - Applies to CPU, RAM, network, disk and OS. - Can be very complex to determine; it depends.
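A minimal sketch of the headroom definition, plus the "how many days can we hold" question from earlier in the deck. All numbers are hypothetical, and the constant daily growth rate is an assumption made only for illustration.

```python
import math


def headroom(total_capacity, peak_utilization, margin):
    """Headroom = total capacity - peak utilization - safety margin (same units)."""
    return total_capacity - peak_utilization - margin


def days_until_exhausted(headroom_units, growth_per_day):
    """Days until headroom is used up, assuming constant (linear) daily growth."""
    if growth_per_day <= 0:
        return math.inf  # no growth: headroom is never exhausted
    return max(headroom_units, 0.0) / growth_per_day


if __name__ == "__main__":
    # Hypothetical CPU figures, all expressed in percent of one host.
    h = headroom(total_capacity=100.0, peak_utilization=62.0, margin=20.0)
    days = days_until_exhausted(h, growth_per_day=0.25)
    print(f"headroom = {h:.1f} points, exhausted in about {days:.0f} days")
```

Real growth is rarely linear, so a figure like this is a first estimate to be revisited as new peak measurements arrive.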
13. CPU Capacity Measurements. CPU utilization is defined as busy time divided by elapsed time for each CPU. CPU time = CPU queue time + CPU usr+sys time. Processes wait on a run queue (causing high load averages), then run on a CPU in user and system mode. More CPUs reduce queue wait. Faster CPUs reduce usr+sys time.
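14. CPU Capacity Measurements. U = λ * St / M, where λ is the arrival rate (per second), St the service time per request and M the number of CPUs. CPU Utilization = arrival rate * cpu_time_exec(us) / POWER(10,6) / number_of_CPU. CPU Utilization = buffer_gets * buffer_gets_time_per_exec(us) / POWER(10,6) / number_of_CPU. The same form applies in many cases.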
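The spelled-out form of the utilization formula, which divides by the number of CPUs, can be sketched as follows. The function names and the sample workload (2000 executions per second, 1500 µs of CPU each, 8 CPUs) are hypothetical.

```python
def cpu_utilization(arrival_rate_per_sec, cpu_time_per_exec_us, num_cpus):
    """U = arrival rate * service time / number of CPUs (service time given in microseconds)."""
    service_time_sec = cpu_time_per_exec_us / 1_000_000  # same as / POWER(10,6)
    return arrival_rate_per_sec * service_time_sec / num_cpus


def max_arrival_rate(target_utilization, cpu_time_per_exec_us, num_cpus):
    """Invert the formula: the arrival rate that produces a given utilization."""
    service_time_sec = cpu_time_per_exec_us / 1_000_000
    return target_utilization * num_cpus / service_time_sec


if __name__ == "__main__":
    u = cpu_utilization(arrival_rate_per_sec=2000, cpu_time_per_exec_us=1500, num_cpus=8)
    limit = max_arrival_rate(target_utilization=0.75, cpu_time_per_exec_us=1500, num_cpus=8)
    print(f"utilization = {u:.1%}, arrival rate at 75% = {limit:.0f} exec/s")
```

Inverting the formula is what turns a measurement into a capacity statement: it gives the workload at which a chosen utilization ceiling would be reached.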
24. Data Collection. What kind of data to collect? When to collect it? Where to put it? How often to collect it? How long to keep it? How to interpret and present it? A picture is worth a thousand words. Scripting and automation are necessary.
25. Capacity Monitoring at the Database Level. Peak executions; sessions; shared pool usage; LIO/exec; PIO/exec; CPU_time/LIO; redo size; free memory; commits; disk space usage; ...
26. Risk Mitigation Strategies. A capacity analyst is not only a DBA. Tuning/fixing the issue: is it the DBA's or the SA's task? Balancing the existing workload. Upgrading and buying more CPU capacity. Splitting and sharding.
27. Steps to Take for Capacity Analysis. 1. Determine the question. 2. Gather workload data (what, how, and how often). 3. Characterize the workload data (map and interpret the data). 4. Develop and use an appropriate model (present your data, graph it). 5. Validate the forecast. 6. Forecast.
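Steps 3 to 6 can be sketched with the regression analysis mentioned earlier in the deck. This is only an illustration under stated assumptions: the sample points are made up, the helper names are hypothetical, and a straight-line relation between workload and CPU utilization is assumed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_abs_residual(xs, ys, slope, intercept):
    """Validation aid: largest gap between the measured and the fitted values."""
    return max(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))


def workload_at_utilization(slope, intercept, target):
    """Forecast: the workload at which the fitted line reaches the target utilization."""
    return (target - intercept) / slope


if __name__ == "__main__":
    # Made-up samples: executions per second vs. measured CPU utilization.
    exec_per_sec = [100, 200, 300, 400, 500]
    cpu_util = [0.12, 0.22, 0.31, 0.42, 0.51]

    slope, intercept = fit_line(exec_per_sec, cpu_util)
    error = max_abs_residual(exec_per_sec, cpu_util, slope, intercept)
    limit = workload_at_utilization(slope, intercept, target=0.75)
    print(f"utilization = {slope:.5f} * exec/s + {intercept:.3f}")
    print(f"largest residual = {error:.3f}")
    print(f"75% utilization reached at about {limit:.0f} exec/s")
```

With these made-up samples the fitted line reaches 75% utilization at roughly 743 executions per second. If the residuals are large, the linear model does not characterize the workload and the forecast should not be trusted.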
28. Case Study #1 – Delete Performance. Questions: How much data can we delete per day? Will the delete catch up during off-peak time? How many threads can we use for the delete? What is the main cost of the delete job?
29. Case Study #1 – Delete Performance
Delete performance is determined by the I/O response time and the physical reads (PIO) per row.

SNAP_TIME            EXEC_PER_SEC LIO Per Exec PIO Per Exec Rows Per Exec
-------------------- ------------ ------------ ------------ -------------
2011/02/10 15:49              .04     10194.44      2323.02          1000
2011/02/10 16:04              .05     10200.82         2322          1000
2011/02/10 16:19              .06     10198.03       1967.9         999.9
2011/02/10 16:34              .06     10201.81      1985.98          1000
2011/02/10 16:49              .06     10194.11      2088.38         999.9

Estimate, with a 6 ms I/O response time: 1 s / 6 ms / (2323/1000) = 1000000/6/2323 ≈ 71.7 rows per second.

The real case:
deletion started at: 2011-02-10 15:38:45
rows to delete: 1232177
rows deleted: 1232171
deletion ended at: 2011-02-10 20:47:51
1232171 / ((TO_DATE('2011-02-10 20:47:51','YYYY-MM-DD HH24:MI:SS') - TO_DATE('2011-02-10 15:38:45','YYYY-MM-DD HH24:MI:SS')) * 24*60*60) = 66.4386391 rows per second
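The estimate and the measured rate in this case study can be reproduced with a short sketch. It reads the "6m" in the slide as a 6 ms I/O response time, which is an assumption; the function names are hypothetical, and the remaining figures are taken from the slide.

```python
from datetime import datetime


def predicted_rows_per_sec(io_response_ms, pio_per_exec, rows_per_exec):
    """Rows deleted per second if each physical read costs io_response_ms."""
    ios_per_sec = 1000.0 / io_response_ms       # physical reads one stream can do per second
    pio_per_row = pio_per_exec / rows_per_exec  # physical reads needed per deleted row
    return ios_per_sec / pio_per_row


def observed_rows_per_sec(rows_deleted, started, ended):
    """Measured delete rate between two 'YYYY-MM-DD HH:MM:SS' timestamps."""
    fmt = "%Y-%m-%d %H:%M:%S"
    elapsed = datetime.strptime(ended, fmt) - datetime.strptime(started, fmt)
    return rows_deleted / elapsed.total_seconds()


if __name__ == "__main__":
    # 6 ms is the assumed reading of "6m"; 2323 PIO and 1000 rows per exec come from the first snapshot.
    predicted = predicted_rows_per_sec(io_response_ms=6.0, pio_per_exec=2323.0, rows_per_exec=1000.0)
    observed = observed_rows_per_sec(1232171, "2011-02-10 15:38:45", "2011-02-10 20:47:51")
    print(f"predicted: {predicted:.1f} rows/s, observed: {observed:.1f} rows/s")
    print(f"observed / predicted = {observed / predicted:.0%}")
```

Comparing the two rates is the validation step: the measured rate comes out a little below the estimate, which suggests the I/O-based model captures the main cost of the delete job.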
30. Case Study #2 – Using MySQL. What capacity analysis should we do to evaluate MySQL? MySQL version; machine; OS; local disk/SSD; kernel configuration; MySQL parameters. Attachment: MySQL InnoDB setup best practice.doc
31. Answers: What Capacity Needs to Do. Measure the capacity of the site correctly and accurately. Be able to predict the growth of the site and identify future performance problems. Define what balance is and find a strategy to keep a dynamic balance. Analyze the impact of system-level changes. Identify dangerous performance deviations.