VLDB Statistics Gathering Strategy


Something You Need to Know But Maybe You Don’t
About Me

• Technical Director @ Enmo Tech
• ACOUG Co-founder, President

• Interests:
   – My wife and my son
   – Database technology (all related)
   – World of Warcraft (online game)

• http://www.enmotech.com
• http://www.acoug.org
• http://www.dbform.com

云和恩墨 成就所托
Statistics Gathering

• Manually
  – dbms_stats.gather_table_stats


• Automatically
  – 10g: DBMS_SCHEDULER
  – 11g: DBMS_AUTO_TASK_ADMIN




What is Histogram
[Chart: sample column chart with five series (Value 1 to Value 5) over four categories (Column 1 to Column 4); y-axis from 0 to 12]



What is Histogram

• frequency
   [Chart: frequency histogram with one bar per distinct value (Distinct Value 1 to Distinct Value 5); bar labels 100, 200, 400, 500, 700; axis: Records#]
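
A frequency histogram keeps one bucket per distinct value, so the row count for any value can be read off directly. The sketch below is a minimal Python illustration of that idea, not Oracle's implementation. The function names are hypothetical, the cumulative-count layout is an assumption, and the sample counts (100, 200, 400, 500, 700) are taken from the chart labels.

```python
from collections import Counter
from itertools import accumulate


def frequency_histogram(values):
    """Return (value, cumulative_count) pairs, one per distinct value.

    Storing cumulative counts is an assumption made for this sketch.
    """
    counts = Counter(values)
    ordered = sorted(counts)
    running = accumulate(counts[v] for v in ordered)
    return list(zip(ordered, running))


def estimated_rows(histogram, value):
    """Rows estimated for `column = value`: this endpoint minus the previous one."""
    previous = 0
    for endpoint_value, cumulative in histogram:
        if endpoint_value == value:
            return cumulative - previous
        previous = cumulative
    return 0  # value not present in the histogram


# Illustrative column: five distinct values with the counts shown in the chart.
column = [1] * 100 + [2] * 200 + [3] * 400 + [4] * 500 + [5] * 700
hist = frequency_histogram(column)
print(hist)
print(estimated_rows(hist, 5))
```

Because every distinct value has its own bucket, the estimate for a value present in the data is exact in this sketch.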



What is Histogram

• height balanced
   [Chart: height-balanced histogram with ten buckets (Bucket 1 to Bucket 10) of equal height; bucket endpoint values 1, 2, 3, 4, 5, 5, 5, 6, 7, 8; axis: Records#]
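
A height-balanced histogram puts the same number of rows in every bucket and records only the value at which each bucket ends, so a value that closes several buckets stands out as popular. The following Python sketch illustrates this under stated assumptions: the function names are hypothetical, and the sample data is invented so that it reproduces the endpoint sequence 1, 2, 3, 4, 5, 5, 5, 6, 7, 8 from the chart.

```python
from collections import Counter


def height_balanced_endpoints(values, buckets):
    """Sort the values, cut them into `buckets` equal-sized slices, and
    return the last (largest) value of each slice.

    Assumes len(values) >= buckets.
    """
    ordered = sorted(values)
    n = len(ordered)
    # The last row of bucket b (1-based) sits at index b * n // buckets - 1.
    return [ordered[(b * n) // buckets - 1] for b in range(1, buckets + 1)]


def popular_values(endpoints):
    """Values that close more than one bucket are treated as popular."""
    return sorted(v for v, c in Counter(endpoints).items() if c > 1)


# Invented data: 100 rows, value 5 holds 30 of them, every other value 10.
column = []
for value in range(1, 9):
    column += [value] * (30 if value == 5 else 10)

endpoints = height_balanced_endpoints(column, 10)
print(endpoints)
print(popular_values(endpoints))
```

In this data the value 5 fills three buckets of ten rows each, which is why it appears three times among the endpoints.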



When Don't We Need a Histogram?


• The column is not used in queries

• The column values are evenly distributed

• Multiple execution plans are not needed




Histogram Impact




                               COST!!
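
A histogram affects cost through the row-count estimate the optimizer works from. The Python sketch below compares the estimate made without a histogram (every value assumed equally common) against the actual counts of a skewed column. The value names and counts are hypothetical, and the even-spread rule is a simplification for illustration, not a description of Oracle's full costing.

```python
def uniform_estimate(num_rows, num_distinct):
    """Estimate without a histogram: assume every value is equally common."""
    return num_rows / num_distinct


# Hypothetical per-value row counts for a skewed column (illustrative data).
actual_counts = {"A": 100, "B": 200, "C": 400, "D": 500, "E": 700}
num_rows = sum(actual_counts.values())
num_distinct = len(actual_counts)

flat = uniform_estimate(num_rows, num_distinct)
for value, actual in actual_counts.items():
    # With a frequency histogram the estimate would match `actual`.
    print(f"{value}: actual={actual} uniform_estimate={flat:.0f} error={flat - actual:+.0f}")
```

The further a value's real count is from the flat estimate, the more likely a plan costed on that estimate is the wrong one for it.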


How Many Choices
• method_opt=>

  –   for columns size skewonly [column_name]
  –   for columns size auto [column_name]
  –   for columns size repeat [column_name]
  –   for columns size 1 [column_name]

  – for all columns
  – for all indexed columns

• Related dictionary objects (slide callouts): dba_tab_modifications, COL_USAGE$




What Should We Do?

• Step 1
   – method_opt=>FOR ALL COLUMNS SIZE 1
• Step 2 (repeat for each column that needs a histogram)
   – method_opt=>FOR COLUMNS SIZE AUTO [COLUMN_NAME]
• Step 3
   – Use the automatic statistics gathering job
   – (10g) exec DBMS_STATS.SET_PARAM('METHOD_OPT', 'FOR ALL COLUMNS SIZE REPEAT');
   – (11g) exec DBMS_STATS.SET_GLOBAL_PREFS('METHOD_OPT', 'FOR ALL COLUMNS SIZE REPEAT');
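
The three steps can be written out as concrete statements. The Python sketch below only formats SQL*Plus text and does not connect to a database. The helper names and the column names STATUS and CITY are hypothetical, the schema and table come from later slides, and the last statement uses the 11g form (on 10g, DBMS_STATS.SET_PARAM takes its place).

```python
def gather_table_stats_call(owner, table, method_opt, granularity=None, partname=None):
    """Build an `exec dbms_stats.gather_table_stats(...)` line using
    named-parameter notation. Hypothetical helper: it only formats text."""
    args = [f"ownname => '{owner}'", f"tabname => '{table}'"]
    if partname is not None:
        args.append(f"partname => '{partname}'")
    if granularity is not None:
        args.append(f"granularity => '{granularity}'")
    args.append(f"method_opt => '{method_opt}'")
    return "exec dbms_stats.gather_table_stats(" + ", ".join(args) + ");"


def three_step_plan(owner, table, histogram_columns):
    """Step 1: no histograms. Step 2: one call per chosen column.
    Step 3: make later automatic runs keep the existing histograms."""
    statements = [gather_table_stats_call(owner, table, "FOR ALL COLUMNS SIZE 1")]
    for column in histogram_columns:
        statements.append(
            gather_table_stats_call(owner, table, f"FOR COLUMNS SIZE AUTO {column}")
        )
    statements.append(
        "exec dbms_stats.set_global_prefs('METHOD_OPT', 'FOR ALL COLUMNS SIZE REPEAT');"
    )
    return statements


# STATUS and CITY are invented column names used only for illustration.
plan = three_step_plan("KAMUS", "TAB_PART", ["STATUS", "CITY"])
for statement in plan:
    print(statement)
```

Generating the statements this way keeps the list of histogram columns in one place, so Step 2 can be repeated without retyping each call.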



What is Granularity

•   Applies to partitioned tables only
•   ALL
•   AUTO (default)
•   DEFAULT = GLOBAL AND PARTITION
•   GLOBAL
•   GLOBAL AND PARTITION
•   PARTITION
•   SUBPARTITION



Global Statistics

• Gather statistics
 exec dbms_stats.gather_table_stats('KAMUS', 'TAB_PART');


• Aggregate statistics (lower overhead)
 exec dbms_stats.gather_table_stats('KAMUS', 'TAB_PART', GRANULARITY => 'SUBPARTITION');


 Table TAB_PART, 200,000 rows
 Gathering statistics: Consistent Read = 23432
 Aggregating statistics: Consistent Read = 12036



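A Bad Case

• New data is loaded
• Gather statistics for the subpartition whose data changed
 exec dbms_stats.gather_table_stats('KAMUS', 'TAB_PART', GRANULARITY => 'SUBPARTITION', PARTNAME => 'P_20111206_BEIJING');


• The aggregated statistics are correct
• But what about the column statistics, such as NDV (number of distinct values)?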
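
The NDV question arises because row counts add up across subpartitions while distinct-value counts do not. The Python sketch below shows this with two invented subpartitions: from the per-subpartition NDVs alone, the table-level NDV can only be bounded, not computed. All names and data are hypothetical.

```python
def aggregate_num_rows(subpartition_rows):
    """Row counts are additive across subpartitions."""
    return sum(subpartition_rows)


def ndv_bounds(subpartition_ndvs):
    """Without the values themselves, the table-level NDV is only bounded:
    at least the largest subpartition NDV, at most their sum."""
    return max(subpartition_ndvs), sum(subpartition_ndvs)


# Hypothetical column values held in two subpartitions (illustrative data).
beijing = ["a", "b", "c", "a"]
shanghai = ["b", "c", "d", "d"]

rows = aggregate_num_rows([len(beijing), len(shanghai)])
low, high = ndv_bounds([len(set(beijing)), len(set(shanghai))])
true_ndv = len(set(beijing) | set(shanghai))
print(rows, (low, high), true_ndv)
```

Here both subpartitions report three distinct values, yet the table holds four, a figure that neither the sum nor the maximum of the parts gives.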




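Another Bad Case

• Add subpartitions
 ALTER TABLE TAB_PART
 ADD PARTITION P_20111208 VALUES LESS THAN (20111209);
• New data is loaded
• Gather statistics for the subpartition whose data changed
 exec dbms_stats.gather_table_stats('KAMUS', 'TAB_PART', GRANULARITY => 'SUBPARTITION', PARTNAME => 'P_20111208_BEIJING');



• What about the aggregated statistics?
• What about the column statistics?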
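
One way to reason about this case is to check whether every subpartition has statistics before trusting an aggregated figure. The Python sketch below encodes that rule as an assumption, namely that aggregation is only meaningful when no subpartition is missing statistics; verify the actual behavior on your Oracle version. The helper name, the row counts, and the SHANGHAI subpartition names are hypothetical.

```python
def can_aggregate(subpartition_stats):
    """Return (ok, missing). Aggregation is treated as possible only when
    every subpartition has statistics (num_rows is not None)."""
    missing = sorted(
        name for name, num_rows in subpartition_stats.items() if num_rows is None
    )
    return (not missing, missing)


# Only the two BEIJING names appear in the slides; the rest is invented.
stats = {
    "P_20111206_BEIJING": 1000,
    "P_20111206_SHANGHAI": 1200,
    "P_20111208_BEIJING": 900,    # statistics just gathered
    "P_20111208_SHANGHAI": None,  # newly added, never analyzed
}
ok, missing = can_aggregate(stats)
print(ok, missing)
```

In a real database the same information would come from the data dictionary (for example NUM_ROWS and LAST_ANALYZED in DBA_TAB_SUBPARTITIONS) rather than from a hand-written dictionary.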

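An Even Worse Case

• A newly created partitioned table contains no data
• Gather partition statistics
 exec dbms_stats.gather_table_stats('KAMUS', 'TAB_PART');


• New data is loaded
• Aggregate?
 • Gather statistics for the affected subpartition?
 • Gather statistics for all subpartitions?
 • Gather statistics for all partitions?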




Conclusion




• If you choose to gather only SUBPARTITION statistics, make sure the
  aggregated statistics are generated correctly.




Q & A


Editor's Notes

  1. granularity: Granularity of statistics to collect (only pertinent if the table is partitioned).
     'ALL' - gathers all (subpartition, partition, and global) statistics.
     'AUTO' - determines the granularity based on the partitioning type. This is the default value.
     'DEFAULT' - gathers global and partition-level statistics. This option is obsolete; it is still supported but is documented for legacy reasons only. Use 'GLOBAL AND PARTITION' for this functionality. Note that the default value is now 'AUTO'.
     'GLOBAL' - gathers global statistics.
     'GLOBAL AND PARTITION' - gathers the global and partition-level statistics. No subpartition-level statistics are gathered even if it is a composite partitioned object.
     'PARTITION' - gathers partition-level statistics.
     'SUBPARTITION' - gathers subpartition-level statistics.