
The Thinking Person’s Guide to Data Warehouse Design
Robin Schumacher, VP Products, Calpont

Agenda
- Building a logical design
- Transitioning to a physical design
- Monitoring and tuning the design

What is the key component for success? In other words, what you do with your MySQL Server, in terms of physical design, schema design, and performance design, will be the biggest factor in whether a BI system hits the mark.
* Philip Russom, “Next Generation Data Warehouse Platforms”, TDWI, 2009.

First: get and use a modeling tool

Simple reporting databases: just use the same design on a different box.
[Diagram labels: Application Servers, End Users, OLTP Database, Replication, ETL, Read Shard One, Reporting Database]

The logical design for analytics/data warehousing

Logical Design Considerations

Manual horizontal partitioning: a modeling technique for overcoming large data volumes
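A minimal sketch of manual horizontal partitioning, assuming a hypothetical `sales` fact table split into one physical table per year (`sales_2008`, `sales_2009`) and stitched back together with a `UNION ALL` view. Python's built-in `sqlite3` module stands in for MySQL only so the example runs anywhere; the table, column, and function names are illustrative, not from the original slides.

```python
import sqlite3

def build_year_partitions(conn, years):
    """Create one table per year (hypothetical sales_<year> tables) plus a
    UNION ALL view that presents them as a single logical table."""
    for year in years:
        conn.execute(
            f"CREATE TABLE sales_{year} (sale_id INTEGER PRIMARY KEY, "
            "sale_date TEXT NOT NULL, amount REAL NOT NULL)"
        )
    union = " UNION ALL ".join(f"SELECT * FROM sales_{y}" for y in years)
    conn.execute(f"CREATE VIEW sales AS {union}")

def insert_sale(conn, sale_id, sale_date, amount):
    """Route a row to the table for its year (sale_date is 'YYYY-MM-DD')."""
    year = sale_date[:4]
    conn.execute(
        f"INSERT INTO sales_{year} VALUES (?, ?, ?)", (sale_id, sale_date, amount)
    )

conn = sqlite3.connect(":memory:")
build_year_partitions(conn, [2008, 2009])
insert_sale(conn, 1, "2008-06-01", 10.0)
insert_sale(conn, 2, "2009-03-15", 25.0)
insert_sale(conn, 3, "2009-11-30", 5.0)

# Queries that know the year can hit one small table directly...
print(conn.execute("SELECT COUNT(*) FROM sales_2009").fetchone()[0])
# ...while the view still answers questions across all years.
print(conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0])
```

The routing logic in `insert_sale` lives in the application or ETL job, which is the main cost of doing this by hand: a row for a year with no table raises an error, and adding a year means creating a table and redefining the view.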

Manual vertical partitioning: a modeling technique for overcoming wide tables/rows
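A minimal sketch of manual vertical partitioning, assuming a hypothetical wide `customer` table: frequently queried columns stay in a narrow `customer_core` table, bulky or rarely used columns move to `customer_detail`, and both share the same primary key so they can be joined back 1:1. Again `sqlite3` stands in for MySQL, and all names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Narrow table: the columns most queries touch.
conn.execute(
    "CREATE TABLE customer_core ("
    " customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, region TEXT NOT NULL)"
)
# Wide table: bulky, rarely used columns, keyed 1:1 on the same primary key.
# (REFERENCES documents the relationship; SQLite only enforces it when
# PRAGMA foreign_keys = ON.)
conn.execute(
    "CREATE TABLE customer_detail ("
    " customer_id INTEGER PRIMARY KEY REFERENCES customer_core(customer_id),"
    " notes TEXT, profile_blob BLOB)"
)

rows = [(1, "Acme", "EMEA"), (2, "Globex", "APAC"), (3, "Initech", "EMEA")]
conn.executemany("INSERT INTO customer_core VALUES (?, ?, ?)", rows)
conn.executemany(
    "INSERT INTO customer_detail VALUES (?, ?, ?)",
    [(cid, "long free-text notes", b"\x00" * 1024) for cid, _, _ in rows],
)

# Typical analytic query: reads only the narrow table.
emea = conn.execute(
    "SELECT COUNT(*) FROM customer_core WHERE region = 'EMEA'"
).fetchone()[0]

# When the wide columns are needed, join the two halves back on the key.
full = conn.execute(
    "SELECT c.name, length(d.profile_blob) FROM customer_core c"
    " JOIN customer_detail d ON d.customer_id = c.customer_id"
    " WHERE c.customer_id = 2"
).fetchone()
print(emea, full)
```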

Pros and cons of manual partitioning

The bottom line on logical modeling

Transitioning to a physical design

SQL or NoSQL? Row or column database? How to scale? Should I worry about high availability? Index or not? How should I partition my data? Is sharding a good idea?

General list of top BI database design decisions

Divide & conquer is the best approach

Which technologies should you be looking at?
* Philip Russom, “Next Generation Data Warehouse Platforms”, TDWI, 2009.

Row or column-based engine?
Yes, column-based tables, if you have:
- Medium to very large data
- Very dynamic query patterns that change over time
- A need for very fast loads with little DML
- Queries that need only a subset of a table's columns
Yes, row-based tables, if you have:
- Small to medium data
- Known indexing needs that won't change
- Lots of single-row inserts/deletes
- Queries that need most of the columns in a table

Column vs. row orientation
A column-oriented architecture looks the same on the surface, but it stores data differently from legacy row-based databases: the values of each column are stored together, rather than all the values of each row.
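To make the storage difference concrete, here is a toy in-memory model that counts how many stored values a `SUM(amount)`-style query has to read under each layout. It is not how InfiniDB or any real column store lays out data on disk (it ignores compression, blocks, and metadata), and the table and function names are hypothetical.

```python
# A toy table of (order_id, customer, region, amount) rows.
rows = [
    (1, "Acme", "EMEA", 120.0),
    (2, "Globex", "APAC", 80.0),
    (3, "Initech", "EMEA", 45.5),
]
columns = ("order_id", "customer", "region", "amount")

# Row orientation: each record's values are stored together.
row_store = [list(r) for r in rows]

# Column orientation: each column's values are stored together.
column_store = {name: [r[i] for r in rows] for i, name in enumerate(columns)}

def sum_amount_row_store(store):
    """Reads every full row, even though only one field is needed.
    Returns (total, values_read)."""
    values_read = 0
    total = 0.0
    for record in store:
        values_read += len(record)
        total += record[3]
    return total, values_read

def sum_amount_column_store(store):
    """Reads only the 'amount' column. Returns (total, values_read)."""
    values = store["amount"]
    return sum(values), len(values)

print(sum_amount_row_store(row_store))        # (245.5, 12)
print(sum_amount_column_store(column_store))  # (245.5, 3)
```

The row layout reads all 12 values to sum 3 amounts, while the column layout reads only the 3 it needs; the gap grows with table width, which is why column stores suit queries that need only a subset of columns.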

Example: InfiniDB vs. a “leading” row DB
- InfiniDB took up 22% less space
- InfiniDB loaded data 22% faster
- InfiniDB's total query times were 65% lower
- InfiniDB's average query times were 59% lower
Note that the queries are not only faster but also more predictable.
* Tests run on a standalone machine: 16 CPUs, 16 GB RAM, CentOS 5.4, with 2 TB of raw data

Why not use both?

Most-used DW storage engines internal to MySQL: MyISAM, Archive, Memory, CSV
Also: Merge, for partitioning before MySQL 5.1

What about NoSQL options?

Partitioning: not ‘if’ but ‘how’
[Chart: 90% response time reduction]
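One common way partitioning cuts response time is partition pruning: the server skips partitions that cannot contain rows matching the query's predicate on the partitioning key. The sketch below models MySQL-style `RANGE` partitioning (`VALUES LESS THAN`) on a date key in plain Python; the partition names and boundaries are hypothetical, and in MySQL the pruning decision is made by the optimizer, not by application code.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical RANGE partitions: each holds keys strictly less than its
# upper bound and at least the previous partition's bound.
PARTITIONS = [
    ("p2007", date(2008, 1, 1)),
    ("p2008", date(2009, 1, 1)),
    ("p2009", date(2010, 1, 1)),
    ("p2010", date(2011, 1, 1)),
]
BOUNDS = [upper for _, upper in PARTITIONS]

def partition_for(key):
    """Return the partition a row with this key is stored in."""
    i = bisect_right(BOUNDS, key)
    if i == len(PARTITIONS):
        raise ValueError(f"no partition for {key}")
    return PARTITIONS[i][0]

def prune(low, high):
    """Partitions a query with `low <= key <= high` must scan."""
    first = bisect_right(BOUNDS, low)
    last = min(bisect_right(BOUNDS, high), len(PARTITIONS) - 1)
    return [name for name, _ in PARTITIONS[first:last + 1]]

print(partition_for(date(2009, 6, 30)))             # p2009
print(prune(date(2009, 3, 1), date(2009, 9, 30)))   # ['p2009']
print(prune(date(2008, 12, 1), date(2009, 1, 31)))  # ['p2008', 'p2009']
```

A query over six months of 2009 touches one partition out of four; the more selective the date predicate relative to the partition size, the larger the share of data that is never read.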

Partitioning: stripe your partitions
Note that striping works only with some engines (e.g., MyISAM, Archive) and only on certain operating systems (e.g., the option is ignored on Windows). You can use the REORGANIZE PARTITION command to move existing partitions to new devices.

Partitioning: smart data pruning
Most data warehouses run pruning operations that remove obsolete or unwanted data. Partitioning lets you remove obsolete data much more quickly and efficiently:

    mysql> alter table t1 drop partition p1;
    Query OK, 0 rows affected (0.03 sec)
    Records: 0 Duplicates: 0 Warnings: 0

DROP PARTITION is a DDL operation, which runs much faster than a DML DELETE.
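To see why dropping a partition is so much cheaper than deleting the same rows, consider a toy model (hypothetical names, plain Python, not MySQL internals) in which a partitioned table is a dictionary of partitions. The DELETE-style function must visit and test every row; the DROP PARTITION-style function discards one partition wholesale without looking at its rows.

```python
import copy

# Toy model: a partitioned table is a dict mapping partition name -> rows.
# Each row is (sale_date, amount); ISO date strings compare correctly as text.
table = {
    "p2007": [("2007-02-01", 10.0), ("2007-08-15", 12.5)],
    "p2008": [("2008-05-05", 7.0)],
    "p2009": [("2009-01-20", 3.0), ("2009-07-04", 8.0)],
}

def delete_before(tbl, cutoff):
    """DELETE-style purge: visit and test every row in every partition.
    Returns (rows_removed, rows_examined)."""
    examined = removed = 0
    for name, rows in tbl.items():
        kept = []
        for row in rows:
            examined += 1
            if row[0] < cutoff:
                removed += 1
            else:
                kept.append(row)
        # Reassigning an existing key while iterating is safe: size is unchanged.
        tbl[name] = kept
    return removed, examined

def drop_partition(tbl, name):
    """DROP PARTITION-style purge: discard one partition wholesale.
    No row is examined; len() on a list does not scan it.
    Returns (rows_removed, rows_examined)."""
    removed = len(tbl.pop(name))
    return removed, 0

t1, t2 = copy.deepcopy(table), copy.deepcopy(table)
print(delete_before(t1, "2008-01-01"))  # (2, 5)
print(drop_partition(t2, "p2007"))      # (2, 0)
print(sorted(t2))                       # ['p2008', 'p2009']
```

This shortcut only applies when the data to purge lines up with partition boundaries, which is a good reason to partition on the same date key used for purging.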

Index Creation and Placement

Optimizing for data loads


Monitoring and tuning the design

Three performance analysis methods
- Bottleneck analysis
- Workload analysis
- Ratio analysis

Bottleneck analysis

Workload analysis
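One common form of workload analysis asks which statements consume the most time overall. A minimal sketch, assuming a hypothetical list of `(statement, seconds)` pairs already parsed from a slow query log, with literal values replaced by `?` so repeated executions group together; statements are ranked by total time, with execution count and average time alongside.

```python
from collections import defaultdict

def top_statements(entries, n=3):
    """Aggregate (statement, seconds) samples and rank by total time.
    Returns (statement, executions, total_s, avg_s) tuples."""
    stats = defaultdict(lambda: [0, 0.0])  # statement -> [executions, total s]
    for statement, seconds in entries:
        stats[statement][0] += 1
        stats[statement][1] += seconds
    ranked = sorted(stats.items(), key=lambda kv: kv[1][1], reverse=True)
    return [
        (stmt, count, round(total, 3), round(total / count, 3))
        for stmt, (count, total) in ranked[:n]
    ]

# Hypothetical, already-normalized slow-log samples.
log = [
    ("SELECT SUM(amount) FROM sales WHERE region = ?", 4.0),
    ("SELECT * FROM customer WHERE customer_id = ?", 0.01),
    ("SELECT SUM(amount) FROM sales WHERE region = ?", 6.0),
    ("SELECT * FROM customer WHERE customer_id = ?", 0.03),
    ("SELECT region, COUNT(*) FROM sales GROUP BY region", 3.5),
]
for row in top_statements(log, n=2):
    print(row)
```

Ranking by total rather than average time surfaces cheap statements that run very often as well as expensive ones that run rarely.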

The pain of slow SQL
* Philip Russom, “Next Generation Data Warehouse Platforms”, TDWI, 2009.

Ratio analysis
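Ratio analysis computes derived percentages from server counters. As a minimal sketch, the functions below compute two ratios from counters that MySQL exposes through `SHOW GLOBAL STATUS`: the MyISAM key cache hit ratio (`1 - Key_reads / Key_read_requests`) and the share of implicit temporary tables created on disk (`Created_tmp_disk_tables / Created_tmp_tables`). The counter values are made up, and the choice of ratios is illustrative rather than taken from the slide.

```python
def ratio(numerator, denominator):
    """Safe ratio: returns None when the denominator is zero."""
    return None if denominator == 0 else numerator / denominator

def key_cache_hit_ratio(status):
    """MyISAM key cache hit ratio: 1 - Key_reads / Key_read_requests."""
    miss = ratio(status["Key_reads"], status["Key_read_requests"])
    return None if miss is None else 1.0 - miss

def tmp_disk_table_ratio(status):
    """Share of implicit temporary tables that had to be created on disk."""
    return ratio(status["Created_tmp_disk_tables"], status["Created_tmp_tables"])

# Hypothetical counter values, e.g. as read from SHOW GLOBAL STATUS.
status = {
    "Key_reads": 500,
    "Key_read_requests": 100_000,
    "Created_tmp_disk_tables": 30,
    "Created_tmp_tables": 120,
}
print(f"key cache hit ratio: {key_cache_hit_ratio(status):.3%}")
print(f"tmp tables on disk:  {tmp_disk_table_ratio(status):.1%}")
```

These counters are cumulative since server start, so a ratio computed from a single sample can mask recent changes; computing it from the deltas between two samples taken a few minutes apart is a common refinement.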

Conclusions

For more information: www.infinidb.org, www.calpont.com

