DB2 Workload Manager Histograms


Learn how you can use the new workload management histograms feature in IBM® DB2® 9.5 for Linux®, UNIX®, and Windows® to better understand your workloads, determine the root cause of system slowdowns related to changes in workload, and easily track adherence to performance Service Level Agreements.

  1. DB2 9.5 Monitoring Performance Tuning and Problem Determination
  2. Agenda
      • What I’m going to talk about
          • Monitoring features new to DB2 9.5
          • Measuring against SLAs so that the system can be tuned to achieve them
          • Using new monitoring features for problem determination
      • What I’m not going to talk about
          • Monitoring features that existed prior to DB2 9.5
          • Tuning individual queries, bufferpool hit ratios, etc.
  3. New DB2 9.5 Monitoring Features
      • Types of Information
          • Statistics
              • Counters
              • High watermarks
              • Histograms
          • Activities
      • Types of Presentation
          • Table functions
          • Event monitors
  4. The Bell Curve
  5. The Bell Curve in Height
  6. The Power Law Curve
  7. If Height Were A Power Law
      • “If the average height of two hundred men was five foot ten; the most frequent (or modal) height would be held by dozens of men who were each only a foot tall…”
  8. If Height Were A Power Law
      • “…the median height would be two feet tall (a hundred men shorter than two feet, and a hundred taller)…”
  9. If Height Were A Power Law
      • “Most important, in such a distribution, the five tallest men would be 40, 50, 66, 100, and 200 feet tall respectively.”
      • – Clay Shirky, Here Comes Everybody
  10. “When we encounter a system like a database where there is no such thing as a representative query, the habits of mind that come from thinking about averages are not merely useless, they’re harmful.”
  11. Drawing A Histogram – Step 1
  12. Drawing A Histogram – Step 2
  13.
        Query  Response Time
        -----  -------------
            1           0.53
            2           1.21
            3           1.24
            4           1.36
            5           1.63
            6           2.41
            7           2.48
            8           2.77
            9           3.28
           10           3.67
           11           4.34
           12           4.53
           13           4.70
           14           4.73
           15           4.79
           16           5.38
           17           5.72
           18           5.75
           19           5.90
           20           6.20
  14.
        Bin Range  Frequency
        ---------  ---------
        [ 0 - 1 ]
        ( 1 - 2 ]
        ( 2 - 3 ]
        ( 3 - 4 ]
        ( 4 - 5 ]
        ( 5 - 6 ]
        ( 6 - 7 ]
  15.
        Bin Range  Frequency
        ---------  ---------
        [ 0 - 1 ]          1
        ( 1 - 2 ]
        ( 2 - 3 ]
        ( 3 - 4 ]
        ( 4 - 5 ]
        ( 5 - 6 ]
        ( 6 - 7 ]
  16.
        Bin Range  Frequency
        ---------  ---------
        [ 0 - 1 ]          1
        ( 1 - 2 ]          4
        ( 2 - 3 ]
        ( 3 - 4 ]
        ( 4 - 5 ]
        ( 5 - 6 ]
        ( 6 - 7 ]
  17.
        Bin Range  Frequency
        ---------  ---------
        [ 0 - 1 ]          1
        ( 1 - 2 ]          4
        ( 2 - 3 ]          3
        ( 3 - 4 ]
        ( 4 - 5 ]
        ( 5 - 6 ]
        ( 6 - 7 ]
  18.
        Bin Range  Frequency
        ---------  ---------
        [ 0 - 1 ]          1
        ( 1 - 2 ]          4
        ( 2 - 3 ]          3
        ( 3 - 4 ]          2
        ( 4 - 5 ]          5
        ( 5 - 6 ]          4
        ( 6 - 7 ]          1
  19.
        Bin Range  Frequency
        ---------  ---------
        [ 0 - 1 ]          1
        ( 1 - 2 ]          4
        ( 2 - 3 ]          3
        ( 3 - 4 ]          2
        ( 4 - 5 ]          5
        ( 5 - 6 ]          4
        ( 6 - 7 ]          1
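
The binning walked through on slides 13 to 19 can be sketched in a few lines of Python. This is only an illustration of the counting step, not anything DB2 does internally; the function names `bin_top` and `build_histogram` are made up for the example, and the response times are the 20 values from slide 13.

```python
import math
from collections import Counter

# Response times of the 20 queries listed on slide 13.
RESPONSE_TIMES = [0.53, 1.21, 1.24, 1.36, 1.63, 2.41, 2.48, 2.77, 3.28, 3.67,
                  4.34, 4.53, 4.70, 4.73, 4.79, 5.38, 5.72, 5.75, 5.90, 6.20]


def bin_top(value):
    """Return the upper edge of the unit-wide bin that holds value.

    Bins are (n-1, n], except the first, which is [0, 1] and so also holds 0.
    """
    return max(1, math.ceil(value))


def build_histogram(values):
    """Count the values in each bin, keyed by the bin's upper edge."""
    counts = Counter(bin_top(v) for v in values)
    # Include empty bins so the result covers the whole range.
    return {top: counts.get(top, 0) for top in range(1, max(counts) + 1)}


histogram = build_histogram(RESPONSE_TIMES)
for top, frequency in histogram.items():
    left = "[" if top == 1 else "("
    print(f"{left} {top - 1} - {top} ]  {frequency}")
```

The printed frequencies should match the completed table on slide 19.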
  20. Why Are Histograms Useful?
      • Reveal hidden distribution details
      • Detect outliers
      • Can be aggregated
  21. Reveal Hidden Distribution Details
      • Month 1 - all users are happy
          • Average response time: 19 seconds
          • Standard deviation: 6 seconds
  22. Reveal Hidden Distribution Details
      • Month 2 - some users are complaining
          • Average response time: 19 seconds
          • Standard deviation: 6 seconds
  23. Reveal Hidden Distribution Details
      • Month 1 - all users are happy
          • Average response time: 19 seconds
          • Standard deviation: 6 seconds
  24. Reveal Hidden Distribution Details
      • Month 2 - some users are complaining
          • Average response time: 19 seconds
          • Standard deviation: 6 seconds
  25. Outliers
  26. Detecting Outliers
      • Sometimes, outliers have a small, even negligible effect on the average
          • Adding the heaviest land animal on earth to a group of 1000 people changes the average weight by less than 10%, that is, little more than 10 pounds.
  27. Detecting Outliers
      • Sometimes, outliers have a significant effect
          • Adding the richest man on earth to a group of 1000 people changes the average net worth by 40,000%, that is, tens of millions of dollars.
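
A rough way to see the net-worth effect in numbers is the Python sketch below. The dollar figures are hypothetical, picked only so the result lands in the range the slide quotes (1000 people at $100,000 each, plus one person worth $40 billion); they are not taken from the talk.

```python
from statistics import mean, median

# Hypothetical figures, for illustration only.
group = [100_000] * 1000
with_outlier = group + [40_000_000_000]

mean_before = mean(group)
mean_after = mean(with_outlier)
increase_pct = 100 * (mean_after - mean_before) / mean_before

print(f"mean before:   {mean_before:,.0f}")
print(f"mean after:    {mean_after:,.0f}")
print(f"mean increase: {increase_pct:,.0f}%")
# The median ignores the single extreme value entirely.
print(f"median before: {median(group):,.0f}")
print(f"median after:  {median(with_outlier):,.0f}")
```

Under these assumptions the mean jumps by roughly 40,000% while the median does not move, which is why a single average can hide or exaggerate what most of the group looks like.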
  28. Net Worth Histogram
  29. Net Worth Histogram
  30. Aggregating Histograms
      • Aggregate short periods of time into longer periods
          • Daily response times for day-to-day decision-making
          • Monthly reports
  31. Aggregating Averages
      • In Innumeracy, John Allen Paulos gives an example of a sports anomaly involving batting averages.
  32. Batting Averages
      • “Babe Ruth hits for a higher batting average for the first half of the season and hits for a higher batting average in the second half of the season as well, …”
  33. Batting Averages
      • “…but Lou Gehrig ends up with a higher batting average for the season as a whole.”
  34. Batting Averages
        Player      Half    Average  At Bats
        ----------  ------  -------  -------
        Babe Ruth   First      .300      200
                    Second     .400      100
        Lou Gehrig  First      .290      100
                    Second     .390      200
  35. Batting Averages
        Player      Half    Average  At Bats  Average × At Bats
        ----------  ------  -------  -------  -----------------
        Babe Ruth   First      .300      200               60.0
                    Second     .400      100
        Lou Gehrig  First      .290      100
                    Second     .390      200
  36. Batting Averages
        Player      Half    Average  At Bats  Average × At Bats
        ----------  ------  -------  -------  -----------------
        Babe Ruth   First      .300      200               60.0
                    Second     .400      100               40.0
        Lou Gehrig  First      .290      100
                    Second     .390      200
  37. Batting Averages
        Player      Half    Average  At Bats  Average × At Bats
        ----------  ------  -------  -------  -----------------
        Babe Ruth   First      .300      200               60.0
                    Second     .400      100               40.0
        Lou Gehrig  First      .290      100               29.0
                    Second     .390      200               78.0
  38. Batting Averages
        Player      Half    Average  At Bats  Average × At Bats
        ----------  ------  -------  -------  -----------------
        Babe Ruth   First      .300      200               60.0
                    Second     .400      100               40.0
                    Total                300              100.0
        Lou Gehrig  First      .290      100               29.0
                    Second     .390      200               78.0
  39. Batting Averages
        Player      Half    Average  At Bats  Average × At Bats
        ----------  ------  -------  -------  -----------------
        Babe Ruth   First      .300      200               60.0
                    Second     .400      100               40.0
                    Total                300              100.0
        Lou Gehrig  First      .290      100               29.0
                    Second     .390      200               78.0
                    Total                300              107.0
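
The arithmetic on slides 34 to 39 can be checked with a short Python sketch. The averages and at-bat counts come from the slides; the function names `season_average` and `naive_average` are invented for the example, and hits are derived as average × at bats, as on the slides.

```python
# (average, at bats) per half season, as shown on slide 34.
SEASON = {
    "Babe Ruth":  {"First": (0.300, 200), "Second": (0.400, 100)},
    "Lou Gehrig": {"First": (0.290, 100), "Second": (0.390, 200)},
}


def season_average(halves):
    """Correct aggregate: total hits divided by total at bats."""
    hits = sum(avg * at_bats for avg, at_bats in halves.values())
    at_bats = sum(n for _, n in halves.values())
    return hits / at_bats


def naive_average(halves):
    """Incorrect aggregate: the plain mean of the two half-season averages."""
    return sum(avg for avg, _ in halves.values()) / len(halves)


for player, halves in SEASON.items():
    print(f"{player:<10}  naive {naive_average(halves):.3f}"
          f"  season {season_average(halves):.3f}")
```

Averaging the averages ranks Ruth first, but weighting by at bats puts Gehrig ahead for the season. An average on its own cannot be re-aggregated without the counts behind it.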
  40.
        Range (seconds)  January
        ---------------  -------
        [ 0 - 1 ]            438
        ( 1 - 2 ]           1350
        ( 2 - 3 ]           3238
        ( 3 - 4 ]           6049
        ( 4 - 5 ]           8802
        ( 5 - 6 ]           9974
        ( 6 - 7 ]           8802
        ( 7 - 8 ]           6049
        ( 8 - 9 ]           3238
        ( 9 - 10 ]          1350
        ( 10 - 11 ]          438
        ( 11 - 12 ]            0
        ( 12 - 13 ]            0
        ( 13 - 14 ]            0
        ( 14 - 15 ]            0
        ( 15 - 16 ]            0
        ( 16 - 17 ]            0
        ( 17 - 18 ]            0
        ( 18 - 19 ]            0
        ( 19 - 20 ]            0
        ( 20 - 21 ]            0
  41.
        Range (seconds)  January  February  Jan + Feb
        ---------------  -------  --------  ---------
        [ 0 - 1 ]            438         0        438
        ( 1 - 2 ]           1350         0       1350
        ( 2 - 3 ]           3238         0       3238
        ( 3 - 4 ]           6049         0       6049
        ( 4 - 5 ]           8802         0       8802
        ( 5 - 6 ]           9974         0       9974
        ( 6 - 7 ]           8802         0       8802
        ( 7 - 8 ]           6049         0       6049
        ( 8 - 9 ]           3238         0       3238
        ( 9 - 10 ]          1350         0       1350
        ( 10 - 11 ]          438       613       1051
        ( 11 - 12 ]            0      1890       1890
        ( 12 - 13 ]            0      4533       4533
        ( 13 - 14 ]            0      8469       8469
        ( 14 - 15 ]            0     12322      12322
        ( 15 - 16 ]            0     13963      13963
        ( 16 - 17 ]            0     12322      12322
        ( 17 - 18 ]            0      8469       8469
        ( 18 - 19 ]            0      4533       4533
        ( 19 - 20 ]            0      1890       1890
        ( 20 - 21 ]            0       613        613
  42.
        Range (seconds)  January  February  March  First Quarter
        ---------------  -------  --------  -----  -------------
        [ 0 - 1 ]            438         0      0            438
        ( 1 - 2 ]           1350         0      0           1350
        ( 2 - 3 ]           3238         0      0           3238
        ( 3 - 4 ]           6049         0      0           6049
        ( 4 - 5 ]           8802         0      0           8802
        ( 5 - 6 ]           9974         0    526          10500
        ( 6 - 7 ]           8802         0   1620          10422
        ( 7 - 8 ]           6049         0   3886           9935
        ( 8 - 9 ]           3238         0   7259          10497
        ( 9 - 10 ]          1350         0  10562          11912
        ( 10 - 11 ]          438       613  11968          13019
        ( 11 - 12 ]            0      1890  10562          12452
        ( 12 - 13 ]            0      4533   7259          11792
        ( 13 - 14 ]            0      8469   3886          12355
        ( 14 - 15 ]            0     12322   1620          13942
        ( 15 - 16 ]            0     13963    526          14489
        ( 16 - 17 ]            0     12322      0          12322
        ( 17 - 18 ]            0      8469      0           8469
        ( 18 - 19 ]            0      4533      0           4533
        ( 19 - 20 ]            0      1890      0           1890
        ( 20 - 21 ]            0       613      0            613
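
Unlike averages, histograms that share the same bins aggregate by simple addition, bin by bin. The Python sketch below reproduces the First Quarter column of slide 42 from the three monthly columns; the `aggregate` helper is a made-up name for the example, and each list holds the counts for bins [0 - 1] through (20 - 21] in order.

```python
# Monthly counts per one-second bin, from slides 40 to 42.
JANUARY = [438, 1350, 3238, 6049, 8802, 9974, 8802, 6049, 3238, 1350, 438] + [0] * 10
FEBRUARY = [0] * 10 + [613, 1890, 4533, 8469, 12322, 13963, 12322, 8469, 4533, 1890, 613]
MARCH = [0] * 5 + [526, 1620, 3886, 7259, 10562, 11968, 10562, 7259, 3886, 1620, 526] + [0] * 5


def aggregate(*histograms):
    """Add histograms bin by bin; they must all use the same bin boundaries."""
    if len({len(h) for h in histograms}) != 1:
        raise ValueError("histograms must have the same number of bins")
    return [sum(counts) for counts in zip(*histograms)]


first_quarter = aggregate(JANUARY, FEBRUARY, MARCH)
for low, count in enumerate(first_quarter):
    print(f"{low:>2} - {low + 1:<2}  {count}")
```

No information is lost in the sum: the quarterly histogram is exactly what would have been recorded had all three months been collected as one period.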
  43. DB2 WLM Histograms
      • Activity
          • Execution time
          • Queue time
          • Lifetime
          • Inter-arrival time
          • Estimated cost
      • Request
          • Execution time
  44. Lifecycle Of An Activity
  45. Inter-arrival Times
  46. Inter-arrival Times
  54. How Much Data Is Enough?
  55. How Much Data Is Enough?
  56. How Much Data Is Enough?
  57. How To Collect DB2 Histograms
      • Activity execution time
      • Activity queue time
      • Activity lifetime
        alter service class MANAGERS
          under MARKETING
          collect aggregate activity data BASE
        alter work action set MAPLOADS
          alter work action MAPLOADS
          collect aggregate activity data BASE
  58. How To Collect DB2 Histograms
      • Activity execution time
      • Activity queue time
      • Activity lifetime
      • Activity inter-arrival time
      • DML activity estimated cost
        alter service class MANAGERS
          under MARKETING
          collect aggregate activity data EXTENDED
        alter work action set MAPLOADS
          alter work action MAPLOADS
          collect aggregate activity data EXTENDED
  59. How To Collect DB2 Histograms
      • Request execution time
        alter service class MANAGERS
          under MARKETING
          collect aggregate request data BASE
  60. How To Collect DB2 Histograms
      • An event monitor must be active to receive the data and write it to a table, file or pipe
        create event monitor DB2STATISTICS
          for statistics write to table
        set event monitor DB2STATISTICS state 1
  61. Triggering A Collection
      • WLM_COLLECT_INT database configuration parameter
          • To collect once every 24 hours (1440 minutes)
            update db cfg using WLM_COLLECT_INT 1440
      • WLM_COLLECT_STATS stored procedure
          • To collect immediately
            call wlm_collect_stats()
  62. The HISTOGRAMBIN Table
        DESCRIBE TABLE HISTOGRAMBIN_DB2STATISTICS

                                        Data type                     Column
        Column name                     schema    Data type name      Length     Scale Nulls
        ------------------------------- --------- ------------------- ---------- ----- -----
        BIN_ID                          SYSIBM    INTEGER                      4     0 No
        BOTTOM                          SYSIBM    BIGINT                       8     0 No
        HISTOGRAM_TYPE                  SYSIBM    VARCHAR                     64     0 No
        NUMBER_IN_BIN                   SYSIBM    BIGINT                       8     0 No
        SERVICE_CLASS_ID                SYSIBM    INTEGER                      4     0 No
        STATISTICS_TIMESTAMP            SYSIBM    TIMESTAMP                   10     0 No
        TOP                             SYSIBM    BIGINT                       8     0 No
        WORK_ACTION_SET_ID              SYSIBM    INTEGER                      4     0 No
        WORK_CLASS_ID                   SYSIBM    INTEGER                      4     0 No

          9 record(s) selected.
  63. The HISTOGRAMBIN Table
  64. The HISTOGRAMBIN Table
  65. The HISTOGRAMBIN Table
  66. The HISTOGRAMBIN Table
  67. Visualizing A Histogram
        SELECT TOP, NUMBER_IN_BIN
        FROM HISTOGRAMBIN_DB2STATISTICS
  68. Visualizing A Histogram
        SELECT TOP, NUMBER_IN_BIN
        FROM HISTOGRAMBIN_DB2STATISTICS
        WHERE SERVICE_CLASS_ID = 13
  69. Visualizing A Histogram
        SELECT TOP, NUMBER_IN_BIN
        FROM HISTOGRAMBIN_DB2STATISTICS
        WHERE SERVICE_CLASS_ID = 13
          AND HISTOGRAM_TYPE = 'CoordActLifetime'
  70. Visualizing A Histogram
        SELECT TOP, NUMBER_IN_BIN
        FROM HISTOGRAMBIN_DB2STATISTICS
        WHERE SERVICE_CLASS_ID = 13
          AND HISTOGRAM_TYPE = 'CoordActLifetime'
          AND STATISTICS_TIMESTAMP = '2009-03-12-14.30.00.000000'
  71. Visualizing A Histogram
        SELECT TOP, NUMBER_IN_BIN
        FROM HISTOGRAMBIN_DB2STATISTICS
        WHERE SERVICE_CLASS_ID = 13
          AND HISTOGRAM_TYPE = 'CoordActLifetime'
          AND STATISTICS_TIMESTAMP = '2009-03-12-14.30.00.000000'
        ORDER BY TOP
  72. Visualizing A Histogram
        SELECT TOP, NUMBER_IN_BIN
        FROM HISTOGRAMBIN_DB2STATISTICS
        WHERE SERVICE_CLASS_ID = 13
          AND HISTOGRAM_TYPE = 'CoordActLifetime'
          AND STATISTICS_TIMESTAMP = '2009-03-12-14.30.00.000000'
        ORDER BY TOP

        TOP         NUMBER_IN_BIN
        ----------- -------------
                 -1             0
                  1             1
                  2             6
                  3            21
                  5           179
                  8           298
                 12           141
                 19            47
                 29             5
                 44             2
                 68             0
  73. Visualizing A Histogram
        WITH HISTOGRAMS AS
          (SELECT TOP, NUMBER_IN_BIN
           FROM HISTOGRAMBIN_DB2STATISTICS
           WHERE SERVICE_CLASS_ID = 13
             AND HISTOGRAM_TYPE = 'CoordActLifetime'
             AND STATISTICS_TIMESTAMP = '2009-03-12-14.30.00.000000')
        SELECT TOP,
               NUMBER_IN_BIN
        FROM HISTOGRAMS
        ORDER BY TOP
  74. Visualizing A Histogram
        WITH HISTOGRAMS AS
          (SELECT TOP, NUMBER_IN_BIN
           FROM HISTOGRAMBIN_DB2STATISTICS
           WHERE SERVICE_CLASS_ID = 13
             AND HISTOGRAM_TYPE = 'CoordActLifetime'
             AND STATISTICS_TIMESTAMP = '2009-03-12-14.30.00.000000')
        SELECT TOP,
               NUMBER_IN_BIN /
                 (SELECT MAX(NUMBER_IN_BIN) FROM HISTOGRAMS)
        FROM HISTOGRAMS
        ORDER BY TOP
  75. Visualizing A Histogram
        WITH HISTOGRAMS AS
          (SELECT TOP, NUMBER_IN_BIN
           FROM HISTOGRAMBIN_DB2STATISTICS
           WHERE SERVICE_CLASS_ID = 13
             AND HISTOGRAM_TYPE = 'CoordActLifetime'
             AND STATISTICS_TIMESTAMP = '2009-03-12-14.30.00.000000')
        SELECT TOP,
               CAST(60 * NUMBER_IN_BIN /
                 (SELECT MAX(NUMBER_IN_BIN) FROM HISTOGRAMS)
               AS INTEGER)
        FROM HISTOGRAMS
        ORDER BY TOP
  76. Visualizing A Histogram
        WITH HISTOGRAMS AS
          (SELECT TOP, NUMBER_IN_BIN
           FROM HISTOGRAMBIN_DB2STATISTICS
           WHERE SERVICE_CLASS_ID = 13
             AND HISTOGRAM_TYPE = 'CoordActLifetime'
             AND STATISTICS_TIMESTAMP = '2009-03-12-14.30.00.000000')
        SELECT TOP,
               SUBSTR(REPEAT('#',
                 CAST(60 * NUMBER_IN_BIN /
                   (SELECT MAX(NUMBER_IN_BIN) FROM HISTOGRAMS)
                 AS INTEGER)), 1, 60) AS GRAPH
        FROM HISTOGRAMS
        ORDER BY TOP
  77. Visualizing A Histogram
        WITH HISTOGRAMS AS
          (SELECT TOP, NUMBER_IN_BIN
           FROM HISTOGRAMBIN_DB2STATISTICS
           WHERE SERVICE_CLASS_ID = 13
             AND HISTOGRAM_TYPE = 'CoordActLifetime'
             AND STATISTICS_TIMESTAMP = '2009-03-12-14.30.00.000000')
        SELECT TOP,
               SUBSTR(REPEAT('#',
                 CAST(60 * NUMBER_IN_BIN /
                   (SELECT MAX(NUMBER_IN_BIN) FROM HISTOGRAMS)
                 AS INTEGER)), 1, 60) AS GRAPH
        FROM HISTOGRAMS
        ORDER BY TOP

        TOP         GRAPH
        ----------- ------------------------------------------------------------
                 -1
                  1
                  2 #
                  3 ####
                  5 ####################################
  78. Visualizing A Histogram
        TOP         GRAPH
        ----------- ------------------------------------------------------------
                 -1
                  1
                  2 #
                  3 ####
                  5 ####################################
                  8 ############################################################
                 12 ############################
                 19 #########
                 29 #
                 44
                 68
                103
                158
                241
                369
                562
                858
               1309
               1997
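
For readers who want to follow the scaling step outside the database, here is a rough Python equivalent of the GRAPH column. It assumes the bin counts shown on slide 72; `bar_lengths` is a hypothetical helper name, and integer division stands in for the CAST to INTEGER in the SQL.

```python
# (TOP, NUMBER_IN_BIN) pairs from the query output on slide 72.
BINS = [(-1, 0), (1, 1), (2, 6), (3, 21), (5, 179), (8, 298),
        (12, 141), (19, 47), (29, 5), (44, 2), (68, 0)]


def bar_lengths(bins, width=60):
    """Scale each count so that the fullest bin gets a bar of `width` characters."""
    peak = max(count for _, count in bins)
    if peak == 0:
        return [(top, 0) for top, _ in bins]
    # Integer division truncates, like CAST(... AS INTEGER) on a positive value.
    return [(top, width * count // peak) for top, count in bins]


for top, length in bar_lengths(BINS):
    print(f"{top:>11} {'#' * length}")
```

Multiplying by the width before dividing matters: dividing first would truncate every bin except the fullest one to zero, which is the problem the query on slide 74 runs into and the one on slide 75 fixes.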
  79. Controlling The Range Of A Histogram
        create histogram template LIFETIME_TEMPLATE high bin value 44
        alter service class SYSDEFAULTSUBCLASS
          under SYSDEFAULTUSERCLASS
          activity lifetime histogram template LIFETIME_TEMPLATE
        call wlm_collect_stats()
      • Note: You must call wlm_collect_stats after changing a histogram template for the change to take effect.
  80. Controlling The Range Of A Histogram
      • Duplicate TOP values now occur because bin sizes grow exponentially and are rounded to an integer.
      • To eliminate these duplicates, modify the query to group by TOP and change NUMBER_IN_BIN to SUM(NUMBER_IN_BIN). Note that this reduces the number of bins.
  81. Controlling The Range Of A Histogram
        SELECT TOP, SUM(NUMBER_IN_BIN) NUMBER_IN_BIN
        FROM HISTOGRAMBIN_DB2STATISTICS
        WHERE SERVICE_CLASS_ID = 13
          AND HISTOGRAM_TYPE = 'CoordActLifetime'
          AND STATISTICS_TIMESTAMP = '2009-03-12-14.30.00.000000'
        GROUP BY TOP
        ORDER BY TOP

        TOP         NUMBER_IN_BIN
        ----------- -------------
                 -1             0
                  1             1
                  2             6
                  3            21
                  4            89
                  5            90
                  6            91
                  7           112
                  8            96
                  9            82
  82. The Purpose Of The Infinite Bin
      • The infinite bin, or catch-all bin, is the bin whose TOP is -1
      • Alerts you when the template fails to cover the entire range of the data
      • To choose a better high bin value:
          • For lifetime, use COORD_ACT_LIFETIME_TOP high watermark
          • For estimated cost, use COST_ESTIMATE_TOP high watermark
  83. Why Do Bins Grow Exponentially?
  84. 6 hours / 40 bins = 9 minutes/bin
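
The contrast between equal-width and exponentially growing bins can be sketched as follows. This is a guess at the shape of the scheme, not DB2's documented algorithm: it assumes 40 finite bins, a high bin value of 21,600,000 (6 hours expressed in milliseconds), and bin tops computed as the rounded-down powers of a constant ratio. Under those assumptions the first tops agree with the TOP values shown in the query output on the slides.

```python
import math

SIX_HOURS_MS = 6 * 60 * 60 * 1000  # assumed high bin value: 21,600,000 ms


def exponential_tops(high_bin_value, bins=40):
    """Bin tops that grow by a constant ratio, rounded down to whole numbers."""
    return [math.floor(high_bin_value ** (i / bins)) for i in range(1, bins + 1)]


def linear_tops(high_bin_value, bins=40):
    """Equal-width bin tops, for comparison."""
    return [high_bin_value * i // bins for i in range(1, bins + 1)]


print("exponential:", exponential_tops(SIX_HOURS_MS)[:10])
print("linear:     ", linear_tops(SIX_HOURS_MS)[:3])
# A small high bin value makes rounded tops collide, as noted on slide 80.
print("high bin 44:", exponential_tops(44)[:10])
```

With equal widths, every activity shorter than nine minutes (540,000 ms) would land in the first bin, so sub-second and multi-minute queries would be indistinguishable. Exponential growth spends most of the bins on the short end, where most activities fall.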
  85. Service Level Agreements
  86. Measuring For An SLA
      • The activity lifetime histogram provides an easy way to measure against such an SLA. Here’s how:
          • Convert NUMBER_IN_BIN into a percentage
          • Convert the percentage into a cumulative percentage
  87. Measuring For An SLA
        WITH HISTOGRAMS AS
          (SELECT TOP, NUMBER_IN_BIN
           FROM HISTOGRAMBIN_DB2STATISTICS
           WHERE SERVICE_CLASS_ID = 13
             AND HISTOGRAM_TYPE = 'CoordActLifetime'
             AND STATISTICS_TIMESTAMP = '2009-03-12-14.30.00.000000')
        SELECT TOP,
               NUMBER_IN_BIN
        FROM HISTOGRAMS
        ORDER BY TOP
  88. Measuring For An SLA
        WITH HISTOGRAMS AS
          (SELECT TOP, NUMBER_IN_BIN
           FROM HISTOGRAMBIN_DB2STATISTICS
           WHERE SERVICE_CLASS_ID = 13
             AND HISTOGRAM_TYPE = 'CoordActLifetime'
             AND STATISTICS_TIMESTAMP = '2009-03-12-14.30.00.000000')
        SELECT TOP,
               CAST(100 * NUMBER_IN_BIN /
                    (SELECT CAST(SUM(NUMBER_IN_BIN) AS DOUBLE)
                     FROM HISTOGRAMS)
                    AS DECIMAL(9,2)) PERCENTAGE_IN_BIN
        FROM HISTOGRAMS
        ORDER BY TOP
  89. Measuring For An SLA
        WITH HISTOGRAMS AS
          (SELECT TOP, NUMBER_IN_BIN
           FROM HISTOGRAMBIN_DB2STATISTICS
           WHERE SERVICE_CLASS_ID = 13
             AND HISTOGRAM_TYPE = 'CoordActLifetime'
             AND STATISTICS_TIMESTAMP = '2009-03-12-14.30.00.000000')
        SELECT TOP,
               CAST(100 * NUMBER_IN_BIN /
                    (SELECT CAST(SUM(NUMBER_IN_BIN) AS DOUBLE)
                     FROM HISTOGRAMS)
                    AS DECIMAL(9,2)) PERCENTAGE_IN_BIN,
               CAST((SELECT 100 * SUM(NUMBER_IN_BIN)
                     FROM HISTOGRAMS
                     WHERE TOP <= OUTERHIST.TOP) /
                    (SELECT CAST(SUM(NUMBER_IN_BIN) AS DOUBLE)
                     FROM HISTOGRAMS)
                    AS DECIMAL(9,2)) CUMULATIVE_PERCENTAGE
        FROM HISTOGRAMS AS OUTERHIST
        WHERE TOP != -1
        ORDER BY TOP
  90. Measuring For An SLA
        TOP         PERCENTAGE_IN_BIN CUMULATIVE_PERCENTAGE
        ----------- ----------------- ---------------------
                  1              0.00                  0.00
                  2              0.09                  0.09
                  3              0.00                  0.09
                  5              0.09                  0.19
                  8              0.00                  0.19
                 12              0.09                  0.28
                 19              0.28                  0.57
                 29              0.09                  0.67
                 44              0.76                  1.44
                 68              0.67                  2.11
                103              1.34                  3.46
                158              1.53                  5.00
                241              1.92                  6.92
                369              2.98                  9.91
                562              1.63                 11.54
                858              2.11                 13.66
               1309              0.86                 14.53
               1997              2.02                 16.55
               3046              2.02                 18.57
               4647              3.84                 22.42
               7089              5.67                 28.10
              10813              7.12                 35.22
              16493              6.35                 41.57
              25157             12.41                 53.99
              38373             15.39                 69.39
              58532             13.95                 83.34
              89280             10.49                 93.84
  91. Measuring For An SLA
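
The two conversions (count to percentage, percentage to cumulative percentage) can also be sketched in Python. The input below reuses the bin counts from slide 72 rather than the data behind the SLA table, and the helper names are invented for the example. One deliberate choice, which is an assumption on my part: the infinite bin (TOP of -1) counts toward the total but never toward "finished within the threshold".

```python
# (TOP, NUMBER_IN_BIN) pairs from the query output on slide 72.
BINS = [(-1, 0), (1, 1), (2, 6), (3, 21), (5, 179), (8, 298),
        (12, 141), (19, 47), (29, 5), (44, 2), (68, 0)]


def cumulative_percentages(bins):
    """Return (top, percentage_in_bin, cumulative_percentage) for finite bins."""
    total = sum(count for _, count in bins)
    if total == 0:
        raise ValueError("histogram is empty")
    running = 0
    rows = []
    for top, count in sorted(b for b in bins if b[0] != -1):
        running += count
        rows.append((top, 100 * count / total, 100 * running / total))
    return rows


def percent_within(bins, threshold):
    """Percentage of activities in bins whose TOP is at or below threshold."""
    within = [cum for top, _, cum in cumulative_percentages(bins) if top <= threshold]
    return within[-1] if within else 0.0


for top, pct, cum in cumulative_percentages(BINS):
    print(f"{top:>11} {pct:>17.2f} {cum:>21.2f}")
print(f"within 12: {percent_within(BINS, 12):.2f}%")
```

An SLA of the form "N% of activities complete within T" is then a single lookup: find the row for T and compare its cumulative percentage with N. Because the answer can only be read at bin boundaries, T should coincide with a bin TOP.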
  92. For More Information
      • DB2 workload management histograms, Part 1: A gentle introduction to histograms
        https://www6.software.ibm.com/developerworks/offers/kits/db2/dbakit/articles/dm-0810mcdonald/
      • DB2 workload management histograms, Part 2: Understanding the six histograms of DB2 workload management
        https://www6.software.ibm.com/developerworks/offers/kits/db2/dbakit/articles/dm-0810mcdonald2/
      • DB2 workload management histograms, Part 3: Visualizing and deriving statistics from DB2 histograms using SQL
        https://www6.software.ibm.com/developerworks/offers/kits/db2/dbakit/articles/dm-0810mcdonald3/

Notes

  • Two of the most common applications of monitoring are performance tuning and problem determination. In these slides, we will see how histograms and other statistics can help with tuning exercises and how monitoring individual activities (e.g. queries) helps with problem determination.
  • This is a big topic, so I am going to limit this talk to the new functionality in DB2 for LUW 9.5 rather than cover the various monitoring features DB2 has gained over the years since its introduction. I am not going to talk about low-level details like bufferpool hit ratios, but high-level statistics like response time and throughput information. We aren&apos;t going to look at individual queries and analyze them with explains, but we will look at the system as a whole and see what steps can be taken when overall system performance is being impacted.
  • There are 3 types of new monitoring features. This talk is focused on histograms.
  • I'm going to start by talking about distributions, and I will start with the most famous one. The Bell Curve, or Gaussian, or Normal distribution has a special property: the mean, the median, and the mode all occur at the same spot on the curve, right in the middle.
  • Let's say you measure height. You have a sample of five men and they report their heights as follows: 5'10", 5'8", 5'10", 6'0", 5'10". The arithmetic mean of those heights is the sum of the 5 heights divided by 5, or 5'10". The median of those heights is the number that occurs in the middle when we order the heights from smallest to largest or largest to smallest, which is also 5'10". The mode is the most frequently occurring height, again 5'10". In a bell curve, the mode, median, and mean are all the same number. If you take a random sample of, say, 1000 people, the histogram of their heights would take the shape of a bell curve.
  • Another type of distribution is called the Power Law distribution, or long-tailed distribution. It has a very different set of properties than a bell curve. The smallest measurements are enormously more frequent than the larger ones. Also, the frequency of seeing a very large measurement doesn't fall off quickly towards zero as it would on a bell curve – it keeps going, which is where it gets the name “the long tail”.
  • Clay Shirky, in his book “Here Comes Everybody”, imagined what it would be like if height followed a power law instead of a bell curve and he sampled the heights of two hundred men.
  • We saw that, with a bell curve, the mean of 5'10" was a good representative height – most of us don't stray too far from that. However, when the distribution of heights in the population is a power law, the mean of 5'10" isn't very representative any more. Why am I talking about power laws? Because almost everything having to do with database workloads, such as the distribution of response times, or the times between arrivals of queries into a database system, takes the form of a power law, or something very close to it.
  • If averages aren't useful, what do we do? We need to see the whole distribution, which only a histogram can provide.
  • To draw a histogram, you draw a horizontal axis and you divide it up into class intervals. That's your measurement axis. For example, height would go along this axis. You could break it up into: 4 to just under 5 feet, 5 to just under 6 feet, 6 to just under 7 feet, etc. Then you draw your vertical axis, which measures the number of observations. We had 5 observations before.
  • Four were between 5 and just under 6 feet, so we would draw a box whose height is four over the 5 to just under 6 feet class interval. One was between 6 and just under 7 feet so we would draw a box with a height of 1 over the 6 to just under 7 feet class interval.
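The height example can be checked with a few lines of Python. This is only a sketch: the heights are the five from the example converted to inches, and `class_interval_counts` is a hypothetical helper name, not anything from DB2.

```python
from statistics import mean, median, mode

# The five heights from the example, in inches:
# 5'10", 5'8", 5'10", 6'0", 5'10"
heights_in = [70, 68, 70, 72, 70]


def class_interval_counts(values, width):
    """Count observations per class interval [k*width, (k+1)*width)."""
    counts = {}
    for v in values:
        lower = (v // width) * width  # lower edge of the interval holding v
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Mean, median, and mode all land on 70 inches (5'10").
summary = (mean(heights_in), median(heights_in), mode(heights_in))

# One-foot (12-inch) class intervals, keyed by lower edge in inches:
# 60 is "5 to just under 6 feet", 72 is "6 to just under 7 feet".
counts_by_foot = class_interval_counts(heights_in, 12)

print(summary)
print(counts_by_foot)
```

The dictionary it prints is the histogram in table form: four observations in the 5-foot interval and one in the 6-foot interval.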
  • Let's do another one. Here is some data on query response times that we'll turn into a histogram. They're sorted from smallest to largest.
  • On the right, we now have a table of class intervals that will fit all of the response times on the left. We just need to count the observations. How many queries on the left fell into the first bin [0 to 1]?
  • That's right: One.
  • How many fell into the range (1 to 2]? Four.
  • And (2 to 3]? Three.
  • And we keep going until we've finished building the histogram. That's the histogram in table form, but it is easier to read as a graph.
  • This is very little data, so it doesn't take the form of any particular curve. We'll come back to the importance of having enough data later.
  • I think we understand what a histogram is, so what is it good for? I'll talk about each of these advantages in detail in the next few slides.
  • Let's look at a scenario to illustrate why seeing the distribution is useful. Say we have a database and during the first month in which we examine it, all the users are happy with its performance. We look at both the average response time (19 seconds) and the standard deviation (6 seconds), and we assume that if we continue to get these numbers, users will be happy, but we are wrong.
  • Now we look at month 2. Users are complaining. What changed? Average response times are the same as in month 1, aren’t they?
  • Going back to month 1, we see the histogram is a bell curve centered at 19 seconds.
  • Now we see the problem. The average response time for half the users went up by over 30%; they’re the ones complaining. It improved just as much for the other half of the users. That's why it is useful to be able to see the distribution.
  • Another reason for having histograms is to detect outliers. In his book, The Black Swan, Nassim Nicholas Taleb wrote about rare events that have a big, relevant impact. He called these rare events black swans because, up until the 17th century, everyone knew that all swans were white. But then black swans were discovered in Australia and a perceived impossibility had come to pass.
  • Taleb distinguished between two kinds of outliers. The first kind has a small, almost negligible impact. For example, say you have a group of 1000 people and you measure their weights and you compute an average. And then you add an elephant, the heaviest land animal on earth, to the group. It changes the average by less than 10%.
  • The other type of outlier can have a significant effect on the mean, in effect making it useless as a measurement. Say you have the same thousand people and instead of measuring their weights, you measure their net worths. Then, instead of adding the heaviest land animal on earth, you add one of the richest men on earth. If their mean net worth had been, say, $200,000, adding Bill Gates to the group changes that mean to about $80 million – that's an increase of roughly 40,000%.
  • We can look at this in the form of a histogram. On the left, we see the net worths of the thousand people. Then we see a large number of empty bins.
  • Then we see Bill Gates's net worth on the far right. Take a look at that horizontal axis – that's an exponential scale. To get a useful idea of what a representative net worth is, we want to exclude Bill Gates from the calculation, and the histogram lets us do that. Another alternative might be to compute the median rather than the arithmetic mean. The problem is that, unlike a running average, where we only need to keep track of the average and the count, the median requires us to keep every data point. This becomes a problem when you're, say, trying to track the median query response time when you have hundreds of thousands of queries per day, sometimes thousands per second. A histogram gives you the ability to identify and potentially filter out outliers without the performance and storage penalty of keeping every data point around to compute the median.
  • A final advantage of histograms is that they are easy to aggregate together, much like averages are. So you can keep a histogram of the daily response times of your database activities and then, once a month, aggregate these histograms into one histogram for the month.
  • I mentioned that averages are easy to calculate, but you still have to be careful. In his book Innumeracy, John Allen Paulos gives an example of a common misconception in combining averages that gives rise to an apparent paradox involving batting averages.
  • Paulos then proceeds to show us how this can occur. He says that Ruth bats .300 in the first half compared to Gehrig's .290, then Ruth bats .400 in the second half compared to Gehrig's .390. The key is that Ruth has 200 at-bats in the first half compared to Gehrig's 100 and Gehrig has 200 at-bats in the second half compared to Ruth's 100.
  • So we multiply the average by the at-bats and we get 60 for the first half for Ruth.
  • And we get 40 for the second half for Ruth.
  • We get 29 and 78 for Gehrig.
  • Ruth sums to 100.
  • Gehrig sums to 107 for the same number of at-bats as Ruth, giving Gehrig the higher batting average.
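The arithmetic on these slides can be written as a short Python sketch. The function name `combined_average` is made up for illustration; the numbers are the ones quoted above.

```python
def combined_average(parts):
    """Combine per-period averages, weighting each by its count.

    parts is a list of (average, count) pairs.
    """
    total_hits = sum(avg * n for avg, n in parts)
    total_at_bats = sum(n for _, n in parts)
    return total_hits / total_at_bats


# (batting average, at-bats) for the first and second half of the season
ruth = [(0.300, 200), (0.400, 100)]    # 60 + 40 = 100 hits
gehrig = [(0.290, 100), (0.390, 200)]  # 29 + 78 = 107 hits

ruth_season = combined_average(ruth)
gehrig_season = combined_average(gehrig)

print(round(ruth_season, 3), round(gehrig_season, 3))
```

Ruth leads in each half, yet Gehrig comes out ahead for the season because each average has to be weighted by its at-bats.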
  • So, combining averages is fairly easy, you just have to remember to multiply by the count. But if you want to somehow combine anything more sophisticated than the average, such as standard deviation, you're out of luck. However, you can combine histograms and it is even easier than combining averages – you just add the observation counts. There is no multiplication necessary. Say you have a histogram of database query response times for January. It is a bell curve with an average between 5 and 6 seconds.
  • Then in February, you have a response time histogram with an average between 15 and 16, also a bell curve, but it is a taller one since there were more queries executed in February than January. If you combine them, you get the Jan + Feb column and adding these is easy since there is hardly any overlap, so in most cases, you are just adding a zero to the number from January or a zero to the number from February.
  • Then you add March. March is also a bell curve, with its average somewhere between 10 and 11 seconds. The combined histogram for the first quarter is now definitely not a bell curve, but it was easy to put together.
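Combining histograms really is just addition, as this Python sketch suggests. The monthly counts below are invented for illustration (they are not the numbers on the slides), and each histogram is represented as a dictionary keyed by the TOP of the bin.

```python
from collections import Counter


def combine_histograms(*histograms):
    """Add observation counts bin by bin; bins are keyed by their TOP value."""
    total = Counter()
    for histogram in histograms:
        total.update(histogram)  # Counter.update adds counts per key
    return dict(sorted(total.items()))


# Hypothetical monthly histograms: {bin top in seconds: number of queries}
january = {5: 20, 6: 30, 7: 10}
february = {15: 40, 16: 60, 17: 25}
march = {7: 5, 10: 35, 11: 45}

first_quarter = combine_histograms(january, february, march)
print(first_quarter)
```

Only the bin with a top of 7 overlaps here, so every other bin in the quarterly histogram is a monthly count plus zero.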
  • In support of the Workload Manager work in DB2 9.5, we added six histograms. Five are related to activities. Activities are a general concept we use in DB2 Workload Manager to talk about queries, loads, and DDL. I am going to speak in more detail on the next few slides about what each of these activity histograms measures. Also, one histogram exists for requests. Requests are the OPEN, FETCH, and CLOSE actions that make up a cursor activity, or they may not be part of an activity at all, such as a PREPARE request. Some requests are not externalized at all: for example, there are requests from one partition to another in systems that have the data partitioning feature.
  • Let's talk about the lifecycle of an activity. Here is an activity that illustrates the first three measurements that are made into histograms: execution time, queue time, and lifetime. This activity is a cursor activity: it gets opened, it fetches twice, and is closed. Between requests, there is a delay while DB2 waits for the client to submit a new request, such as the next FETCH request or the CLOSE request. This activity is also queued by the Workload Manager. We measure the queue time as the time between when the activity arrived into the system and when it comes off the queue to execute. In this case, the queue time was 20 seconds. The execution time is the time DB2 spends executing the activity, but not any idle time spent waiting for the next request from the client. So execution is just those four gray colored 3 second blocks for a total of 12 seconds. The lifetime is the total time an activity spends in the system, including the queue time, the execution time, and the idle time. Since the total idle time was also 12 seconds (4 + 4 + 4), the lifetime is 44 seconds (20 seconds of queue time + 12 of idle time + 12 of execution time). So those are the first 3 measurements for which we have histograms.
  • The fourth measurement is something called inter-arrival time. This is the time between the arrival of one activity into the system and the arrival of the next. It is the inverse of the arrival rate. In this example, we have three activities.
  • If we say activity 1 arrives at time 0 and activity 2 arrives at time 3, the inter-arrival time between activity 1 and activity 2 is 3. And between activity 2 and activity 3, the inter-arrival time is 12. So we have three activities arriving in a 15 second period. That's an average arrival rate of 0.2 activities per second, which converts to an average inter-arrival time of 5 seconds. (That figure counts all three arrivals in the 15 second window; the two measured gaps on their own average 7.5 seconds.)
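As a small Python sketch of this measurement, using the arrival times from the example (0, 3, and 15 seconds) and treating the 15 second window as given:

```python
def inter_arrival_times(arrivals):
    """Return the gaps between consecutive arrival times."""
    ordered = sorted(arrivals)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


arrivals = [0, 3, 15]   # arrival times of activities 1, 2, and 3, in seconds
window_seconds = 15     # observation period used in the example

gaps = inter_arrival_times(arrivals)           # one gap per pair of neighbours
arrival_rate = len(arrivals) / window_seconds  # activities per second

print(gaps, arrival_rate)
```

Each gap is one observation that would be counted in a bin of the inter-arrival time histogram.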
  • The last of the measurements for activity histograms is a measurement of estimated cost. This is the cost that the SQL compiler computes whenever you compile a DML activity, and by DML I mean SELECT, INSERT, UPDATE, DELETE and the variations on those. This cost is a weighted combination of the CPU and I/O resources that the activity is going to consume, as far as DB2 can tell from the access plan and assumptions about the cardinality of the tables. As such, it is highly dependent on table and index statistics being up-to-date. Let's assume you have a query with negligible I/O demands; on a single CPU, it might look like the diagram. Here we have a query with a cost of 7. Costs are generally in units of timerons, but let's assume that we have somehow converted timerons into estimated timeslices on the CPU. The blue vertical bars represent a timeslice. Query 1 demands 7 timeslices and it is the only thing running on the system, so its execution time is 7, indicated by the E=7 to the right of the query. We're assuming this isn't a cursor activity, so there is no idle time. The lifetime is exactly the same as its execution time, which is 7. This is indicated by the L=7. The queue time is zero, so we show Q=0 to the right of the query as well. What do you think will happen when our query has to share the one CPU with another query?
  • What happens is its lifetime gets longer because it has to share the CPU with query 2. Query 2 suffers in the same way. The CPU will round robin between the two queries. When one of them finishes (in this case, query 1 finishes first), the other query will get the CPU to itself. Notice that the lifetime and execution time of query 1 increased from 7 on the previous slide up to 12.
  • If we add another query, query 1's lifetime and execution time jump up to 16...
  • and then to 18 if we add a fourth query...
  • ... and then to 20 if we add a fifth. Query 1 now takes almost 3 times as long to finish as it did originally, when it finished in 7 timeslices.
  • Let's go back to two queries at the same time. Let's say that instead of trying to run both queries at the same time, we use Workload Manager to limit the concurrency to 1 and then we queue anyone else that comes in. Can you guess what happens to query 1? And to query 2?
  • Query 1 finishes in 7 timeslices because it arrived first and doesn't get queued. Then query 2 arrives while query 1 is still running. It has to queue until query 1 finishes, which is 6 timeslices later. Then it gets the CPU to itself, where it takes 9 timeslices to finish, for a total of 15. Notice that the sum of the queue time and execution time is the lifetime. I hope this explains the idealized concept of estimated cost and why you can't just assume estimated cost will tell you exactly how long your query will run without taking into account sharing of resources such as CPU resources.
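The queued case can be sketched as a tiny simulation in Python. This is an idealized model, not how DB2 schedules work: one CPU, a concurrency limit of 1, first-come-first-served, and no idle time. The arrival of query 2 at timeslice 1 is inferred from the 6 timeslices it spends queued; the function name is hypothetical.

```python
def run_with_concurrency_one(queries):
    """Simulate a concurrency limit of 1 on a single CPU.

    queries is a list of (name, arrival, cost) tuples, in timeslices.
    Returns {name: (queue_time, execution_time, lifetime)}.
    """
    results = {}
    cpu_free_at = 0
    for name, arrival, cost in sorted(queries, key=lambda q: q[1]):
        start = max(arrival, cpu_free_at)  # wait while the CPU is busy
        finish = start + cost              # runs alone, so execution == cost
        results[name] = (start - arrival, cost, finish - arrival)
        cpu_free_at = finish
    return results


timings = run_with_concurrency_one([
    ("query 1", 0, 7),  # arrives first with a cost of 7
    ("query 2", 1, 9),  # assumed to arrive one timeslice later, cost of 9
])

for name, (queued, executed, lifetime) in timings.items():
    print(f"{name}: Q={queued} E={executed} L={lifetime}")
```

In this model the lifetime is always the queue time plus the execution time, which matches the Q, E, and L values on the slide.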
  • I want to talk now about how much data you need to see the underlying distribution of your data. Here we have estimated cost histograms for two workloads that are identical in terms of their distribution. The individual costs are pseudo-random numbers. A pseudo-random number is generated by a pseudo-random number generator. A pseudo-random number generator takes a “seed” and generates one pseudo-random number after another. If you ask a pseudo-random number generator to generate a second set of numbers but you give it the same seed, you get the same set of numbers. If you give a pseudo-random number generator a different seed, you get a different set of pseudo-random numbers, but these numbers are still distributed the same way no matter what the seed is. In the histogram I used here, I could have used a bell curve for the distribution, but to show another example of a distribution, I used an exponential distribution. The first thing you should notice is that these histograms look nothing alike. That's because you have too little data – only 10 activities. Let's see what happens when you have 50 activities...
  • A little better. We're starting to converge on a shape, but they're still quite different. Let's see what 250 activities can do...
  • The histograms are almost identical now. Additionally, both are starting to look like an idealized exponential distribution. This gives us insight into how we can tell when we have enough data to make good comparisons of one histogram with another. Here is a trick for ensuring you have enough data to do such comparisons: Say you have two periods of time that you want to compare. Say the first time period is between 2:00 and 3:00 PM on a Thursday afternoon. Then you take the following hour, between 3:00 and 4:00 PM, and you measure histograms during each of these time periods and you compare them. If you get something like the first of the three examples, you double the size of the interval and you compare that to the next double-size interval. In our example, you would compare a 2:00 to 4:00 PM interval with a 4:00 to 6:00 PM interval. If they still don't match closely, double again. Eventually, the curves will either start to look similar as in the third example or else their distributions have changed in some way.
  • That's enough theory. Let's look at how we can use histograms in the DB2 for LUW Workload Manager. There are three steps to collecting histograms (and other aggregates like high watermarks and counters). First, alter the service class or work action set that you want to collect histograms on and specify the COLLECT AGGREGATE ACTIVITY DATA option. Use the BASE option to get the three basic histograms.
  • Use the EXTENDED option to get the original 3 histograms plus the inter-arrival and estimated cost histograms.
  • Finally, to collect the request histogram, specify the COLLECT AGGREGATE REQUEST DATA BASE clause. Note that there is no EXTENDED option.
  • The second step is to create an event monitor to write out the histograms and you need to set its state to 1 to activate it.
  • Finally, you need to say when you want a collection to happen. You can set it up to automatically collect periodically using the WLM_COLLECT_INT database configuration parameter. Here we have it collecting once a day. Otherwise, you can have it collect immediately using the WLM_COLLECT_STATS stored procedure.
  • Once you have turned on collection, activated your event monitor, and triggered a collection, a table or file having this structure is what you get. You can get it as a table or a file depending on the type of event monitor you create. Note that this is a histogram bin table rather than a histogram table. Each row in this table is a bin in a histogram.
  • The name after HISTOGRAMBIN is the name of the event monitor that you created in step 2.
  • Let's look at the columns in this table: You can uniquely identify the histogram that this bin belongs to by looking at the combination of service_class_id, statistics_timestamp and histogram_type, assuming you set your service class up as generating this histogram.
  • If, on the other hand, the histogram belongs to a work action set, you uniquely identify it by using histogram_type and statistics_timestamp with a work_action_set_id and a work_class_id. The histogram_type column has values like “CoordActLifetime” for the activity lifetime histogram or “CoordActQueueTime” for the activity queue time histogram.
  • So now that you know how to uniquely identify a histogram, you know how to write a WHERE clause for queries against this table, but what should you return in your output? It turns out you only need two columns: TOP and NUMBER_IN_BIN. You don't necessarily need the BOTTOM column because it is just the TOP of the previous bin. The NUMBER_IN_BIN is just the number of observations.
  • Let's try to visualize a histogram by querying the histogram bin table. You select the two columns from the previous slide, top and number_in_bin.
  • Let's say we want a service class's histogram, so we specify the service_class_id, say, 13. You can look up these numbers in the syscat.serviceclasses table or do a join with that table.
  • Say we want to look at query lifetimes and...
  • ...say we did a select distinct statistics_timestamp and we got back a timestamp of 2:30 PM on March 12, 2009.
  • Let's order it by the TOP column in ascending order.
  • This gives us an idea of the histogram shape, but it could be more visual.
  • Let's refactor this code a little bit first. We'll move the query into a WITH to make the main select simpler because we're going to complicate it a bit.
  • Let's replace number_in_bin with the proportion of the peak number_in_bin.
  • Now we multiply by the maximum number of columns that we want a bar in our bar graph to take up, in this case, 60 columns.
  • Here's the fun part: this idea comes from a blog post by Bernd Kuennen (http://154pm.blogspot.com/2008/01/generating-daily-usage-statistics.html).
  • And that gives us a histogram graph.
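The same scaling steps can be sketched outside the database. The Python below is not the SQL from the slides; it assumes you have already fetched (top, number_in_bin) rows, and the sample rows and the name `text_histogram` are made up for illustration.

```python
def text_histogram(bins, width=60, mark="#"):
    """Render (top, number_in_bin) rows as text bars.

    Each count is scaled to a proportion of the peak count and then
    multiplied by width, so the tallest bar is exactly width characters.
    """
    peak = max((count for _, count in bins), default=0)
    lines = []
    for top, count in sorted(bins):
        bar_len = count * width // peak if peak else 0
        lines.append(f"{top:>8} {mark * bar_len}")
    return lines


# Hypothetical rows from a histogram bin table: (top, number_in_bin)
rows = [(1, 0), (2, 3), (3, 6), (5, 12), (8, 9)]

lines = text_histogram(rows)
print("\n".join(lines))
```

The bin with the peak count gets a 60-character bar and every other bar is drawn in proportion to it.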
  • Here’s a better picture. Notice that, after the bin with a top of 29, everything else is empty. Only the first 8 bins have any data in them and there are 41 bins altogether. That’s a whole lot of empty bins. That is because the default is to fit 6 hours into the first 40 bins and the longest we ran was 29 milliseconds or under. As a result, every bin between 29 milliseconds and 6 hours is empty.
  • We can provide a tighter bound than 6 hours by modifying what is called the “template” of the histogram. We do this by creating a histogram template in the database with a new largest bin value to use instead of 6 hours. In this case, I chose 44 milliseconds. Then we tell the service class to use the template by altering it and specifying our template as the lifetime histogram template for that service class. Then we need to collect stats once more to make the change take effect.
  • You may have noticed that the bins aren’t all the same size. This is deliberate and I will talk about that in a bit, but one of the consequences that concerns us now is that when you reduce the top of the highest bin to something tiny like 44 milliseconds, several of your smallest bins now have TOP values that are “one point something”, and since they are all integers, they get rounded down to 1. So you get multiple bins with the same TOP. Only the first of those bins has any data put in it. It’s not pretty to see all those duplicates, so you can add a GROUP BY on the TOP column and put an aggregating function on the NUMBER_IN_BIN to make it look better.
  • Here we’ve added an aggregating function and a GROUP BY and we now see the histogram with a highest bin top of 44. Now almost every bin contains data. It can be useful to fit the histogram to your data to get better precision as we’ve done here, but remember to leave some room to cover any surprises.
  • If you don’t leave room, the surprises end up in what is called the “catch-all” bin. This bin is the one marked with a TOP of -1. A non-zero value in this bin alerts you that you need to modify your histogram template to have a higher top bin value. For help in choosing a good top bin value, you can use the COORD_ACT_LIFETIME_TOP high watermark for lifetime histograms and it should work fairly well for execution time histograms too. For estimated cost histograms, you can use the COST_ESTIMATE_TOP high watermark.
  • DB2 histogram bins are not all the same size – they increase exponentially. This is because of the nature of the data: the smallest activities finish in milliseconds, the longest in many hours. Imagine if all bins were the same size, with a high bin value of 6 hours.
  • 6 hours / 40 bins = 9 minutes per bin. The smallest queries had response times in tens of milliseconds, so the histogram we visualized earlier would have all its data in the first bin and nothing in the other bins.
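To see the difference between the two layouts, here is a Python sketch. It assumes the bin tops grow geometrically up to the high bin value and are truncated to integers; that rule is inferred from the TOP values shown in the output on these slides and is not a statement of DB2's documented algorithm. The function name is hypothetical.

```python
def exponential_bin_tops(high_bin_value, bins=40):
    """Assumed layout: top of bin i is high_bin_value ** (i / bins), truncated."""
    return [int(high_bin_value ** (i / bins)) for i in range(1, bins + 1)]


SIX_HOURS_MS = 6 * 60 * 60 * 1000  # default high bin value, in milliseconds

tops = exponential_bin_tops(SIX_HOURS_MS)

# With equal-width bins, every bin would be 9 minutes wide.
equal_width_ms = SIX_HOURS_MS // 40

print(tops[:12])       # the small bins resolve differences of a few milliseconds
print(equal_width_ms)  # 540000 ms, so a 29 ms query lands in the first bin
```

Under this assumption, the first twelve tops come out as 1, 2, 3, 5, 8, 12, 19, 29, 44, 68, 103, and 158, the same values that appear in the TOP column earlier.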
  • A service level agreement (SLA) formally defines the level of service in a service contract. It often contains performance metrics such as throughput or response time requirements. Sometimes, it is specified as just an average. Other times, it is expressed as “X% of activities must finish within Y minutes”, e.g. “95% of activities must finish within 25 seconds”. Remember this example SLA since we’ll come back to it later.
  • We can use the activity lifetime histogram to easily see if we are meeting our SLA. First we convert NUMBER_IN_BIN to a percentage of the total number in the histogram. Then we sum the counts in each bin with those of all the bins that came before it. That will give us a cumulative percentage.
  • Let’s go back to the query we used earlier and modify it to allow us to measure against an SLA.
  • Here we are scaling the NUMBER_IN_BIN into a PERCENTAGE_IN_BIN…
  • Then we add another column called CUMULATIVE_PERCENTAGE and we’ll filter out the “catch-all” bin.
  • Here’s what we get. Let’s see if activities are meeting our SLA. If you recall, our SLA was 95% of activities finish within 25 seconds.
  • We see in the text highlighted in blue that only 53.99% of activities are finishing within 25.157 seconds rather than 95%, so these activities are not meeting their SLA.
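The cumulative percentage calculation can also be sketched in Python. The bin counts below are invented, the function names are hypothetical, and one design choice is an assumption: the catch-all bin (TOP of -1) is left out of the output rows but still counted in the total, so activities that overflowed the histogram never count as meeting the SLA.

```python
def cumulative_percentages(bins):
    """Return (top, percentage_in_bin, cumulative_percentage) rows.

    bins is a list of (top, number_in_bin); the catch-all bin (top == -1)
    is counted in the total but filtered out of the rows.
    """
    total = sum(count for _, count in bins)
    if total == 0:
        return []
    rows, running = [], 0
    for top, count in sorted(b for b in bins if b[0] != -1):
        running += count
        rows.append((top, 100.0 * count / total, 100.0 * running / total))
    return rows


def meets_sla(bins, limit, required_pct):
    """True if at least required_pct of activities are in bins with top <= limit."""
    within = [cum for top, _, cum in cumulative_percentages(bins) if top <= limit]
    return bool(within) and within[-1] >= required_pct


# Hypothetical lifetime histogram: (top in milliseconds, number_in_bin)
bins = [(16493, 40), (25157, 10), (38373, 30), (58532, 20), (-1, 0)]

rows = cumulative_percentages(bins)
print(rows)
print(meets_sla(bins, 25157, 95))  # the bin closest to the 25 second target
```

Because bin tops rarely line up exactly with the SLA threshold, the check uses the nearest bin top, just as the slide reads off the bin with a TOP of 25157 for a 25 second target.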
  • If you found this talk interesting, there are three articles that I wrote on developerWorks on which I based these slides. If you remember anything about this talk, I hope that you remember that the distribution of your data is important, or else you’d see 200 ft giants walking around and most of us would be 1 foot tall. I hope you remember that when Bill Gates joins a crowd, he drastically changes the crowd’s net worth, but a histogram lets us get back to the real data. Finally, I hope you remember that histograms are as easy to combine together as the batting averages of two baseball greats.