You find a column named EntityNum in a table you manage, but what data belongs in this column? Not every detail of usage is clear from just the SQL data type and constraints. What is the sensible range of values? What is the unit of measure? How do applications use the column? Who in the world knows? We need a way to add comments to the database schema, just as we write comments in application code to document how programmers should use it. But comments are useful only if they're correct and current, and if they're easy to read and to update. Schemadoc is an experimental tool to help with these goals.
2. Schema Can Be Unclear
CREATE TABLE AccountActivity (
  Id INT PRIMARY KEY,
  Name VARCHAR(100),    -- whose name?
  Status VARCHAR(20),   -- what are the values?
  EntityNum INT,        -- what's an entity? what's a number?
  `Order` INT           -- commerce, or ordinal?
);
www.percona.com
3. Comments
CREATE TABLE AccountActivity (
  Id INT PRIMARY KEY,
  Name VARCHAR(100) COMMENT 'person who placed the order',
  Status VARCHAR(20) COMMENT 'new, open, or closed',
  EntityNum INT COMMENT 'how many items in the order',
  `Order` INT COMMENT 'reference to the Orders table'
) COMMENT 'any change to e-commerce orders';
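Comments can also be added to a table that already exists. The statements below are a minimal sketch, assuming the AccountActivity table from this slide is already in place. Note that MySQL requires the full column definition to be restated when only the comment changes.

```sql
-- Sketch only: assumes the AccountActivity table from the slide already exists.
-- Changing a column comment means restating the whole column definition.
ALTER TABLE AccountActivity
  MODIFY COLUMN Status VARCHAR(20) COMMENT 'new, open, or closed';

-- The table comment can be changed on its own.
ALTER TABLE AccountActivity
  COMMENT = 'any change to e-commerce orders';

-- Check the result: SHOW FULL COLUMNS includes a Comment column.
SHOW FULL COLUMNS FROM AccountActivity;
SHOW CREATE TABLE AccountActivity;
```

Because the definition must be repeated, a typo in the data type or nullability would silently alter the column, so it is safest to copy the definition from SHOW CREATE TABLE first.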
4. Length
• Limits increased in MySQL 5.5.3:
• Per column: 1024 characters
• Per index: 1024 characters
• Per table: 2048 characters
• Per partition: 80 characters (MySQL 5.6 increases this to 1024)
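To see how much of these limits existing comments actually use, their lengths can be measured from INFORMATION_SCHEMA. This is a rough sketch; the schema name EcommerceDB is borrowed from the example queries in this deck, and the LIMIT of 10 is an arbitrary choice.

```sql
-- Longest column comments in one schema (schema name is an assumption).
SELECT TABLE_NAME, COLUMN_NAME,
       CHAR_LENGTH(COLUMN_COMMENT) AS comment_chars
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'EcommerceDB'
  AND COLUMN_COMMENT <> ''
ORDER BY comment_chars DESC
LIMIT 10;

-- Longest table comments. Views are excluded because their
-- TABLE_COMMENT is filled in by the server, not by the schema author.
SELECT TABLE_NAME,
       CHAR_LENGTH(TABLE_COMMENT) AS comment_chars
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'EcommerceDB'
  AND TABLE_TYPE = 'BASE TABLE'
  AND TABLE_COMMENT <> ''
ORDER BY comment_chars DESC
LIMIT 10;
```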
5. Why don’t we use comments?
SELECT TABLE_NAME, TABLE_COMMENT
FROM INFORMATION_SCHEMA.TABLES
WHERE (TABLE_SCHEMA, TABLE_NAME) = ('EcommerceDB', 'AccountActivity');
SELECT COLUMN_NAME, COLUMN_COMMENT
FROM INFORMATION_SCHEMA.COLUMNS
WHERE (TABLE_SCHEMA, TABLE_NAME) = ('EcommerceDB', 'AccountActivity')
ORDER BY ORDINAL_POSITION;
SELECT INDEX_NAME,
  CONCAT('(', GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX), ')') AS INDEX_COLUMNS,
  INDEX_COMMENT
FROM INFORMATION_SCHEMA.STATISTICS
WHERE (TABLE_SCHEMA, TABLE_NAME) = ('EcommerceDB', 'AccountActivity')
GROUP BY INDEX_NAME, INDEX_COMMENT;
SELECT PARTITION_NAME, PARTITION_COMMENT
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE (TABLE_SCHEMA, TABLE_NAME) = ('EcommerceDB', 'AccountActivity')
ORDER BY PARTITION_ORDINAL_POSITION;
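The same INFORMATION_SCHEMA tables can also show how much of a schema has no comments at all. The query below is a sketch under the same assumption as the queries above (a schema named EcommerceDB); it relies on COLUMN_COMMENT being an empty string for a column that has no comment.

```sql
-- Columns with no comment, grouped by table in definition order.
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'EcommerceDB'
  AND COALESCE(COLUMN_COMMENT, '') = ''
ORDER BY TABLE_NAME, ORDINAL_POSITION;

-- Share of commented columns per table.
SELECT TABLE_NAME,
       COUNT(*) AS total_columns,
       SUM(COALESCE(COLUMN_COMMENT, '') <> '') AS commented_columns
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'EcommerceDB'
GROUP BY TABLE_NAME
ORDER BY TABLE_NAME;
```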
6. “Is there a tool for MySQL like javadoc for Java code?”
12. But Wait, There’s More
• Parses SQL dump file
• Reads from a live MySQL instance
• Document many schemas
• Update one schema at a time
13. Schema Analysis
• Number of indexes and columns per data type
• Columns with same name but different type
• ENUM columns that aren’t NOT NULL
• Use of FLOAT or DOUBLE
• Tables with one index per column
• Tables with one index over all columns
• Tables with no indexes
• INT(N) other than default N
• Lack of INT UNSIGNED columns
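Several of these checks can be approximated by hand with queries against INFORMATION_SCHEMA. The queries below are illustrative sketches only, not Schemadoc's own implementation, and again assume a schema named EcommerceDB.

```sql
-- Columns with the same name but different types.
SELECT COLUMN_NAME,
       COUNT(DISTINCT COLUMN_TYPE) AS type_count,
       GROUP_CONCAT(DISTINCT COLUMN_TYPE ORDER BY COLUMN_TYPE) AS types_seen
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'EcommerceDB'
GROUP BY COLUMN_NAME
HAVING COUNT(DISTINCT COLUMN_TYPE) > 1;

-- ENUM columns that allow NULL.
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'EcommerceDB'
  AND DATA_TYPE = 'enum'
  AND IS_NULLABLE = 'YES';

-- Use of FLOAT or DOUBLE.
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'EcommerceDB'
  AND DATA_TYPE IN ('float', 'double');

-- Tables with no indexes: no matching row in STATISTICS.
SELECT t.TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES AS t
LEFT JOIN INFORMATION_SCHEMA.STATISTICS AS s
  ON s.TABLE_SCHEMA = t.TABLE_SCHEMA
 AND s.TABLE_NAME = t.TABLE_NAME
WHERE t.TABLE_SCHEMA = 'EcommerceDB'
  AND t.TABLE_TYPE = 'BASE TABLE'
  AND s.INDEX_NAME IS NULL;
```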
14. Schema Analysis
• Tables that look like Entity-Attribute-Value
• Tables that look like Polymorphic Associations
• Superfluous primary key
• Report table with least / most indexes
• IP addresses stored in VARCHAR
• Excessive use of VARCHAR(255), report actual max string length
...And supports plugins for other checks!
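Two of these reports are easy to sketch by hand. The queries below are an approximation for illustration, not Schemadoc's own code; the schema name EcommerceDB and the use of the Name column of AccountActivity as the column being measured are assumptions taken from the earlier slides.

```sql
-- Index count per table, most-indexed first (reverse the sort for least).
SELECT t.TABLE_NAME,
       COUNT(DISTINCT s.INDEX_NAME) AS index_count
FROM INFORMATION_SCHEMA.TABLES AS t
LEFT JOIN INFORMATION_SCHEMA.STATISTICS AS s
  ON s.TABLE_SCHEMA = t.TABLE_SCHEMA
 AND s.TABLE_NAME = t.TABLE_NAME
WHERE t.TABLE_SCHEMA = 'EcommerceDB'
  AND t.TABLE_TYPE = 'BASE TABLE'
GROUP BY t.TABLE_NAME
ORDER BY index_count DESC;

-- Candidate columns declared as VARCHAR(255).
SELECT TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'EcommerceDB'
  AND DATA_TYPE = 'varchar'
  AND CHARACTER_MAXIMUM_LENGTH = 255;

-- For any one string column, the longest value actually stored.
-- This scans the whole table, so run it with care on large tables.
SELECT MAX(CHAR_LENGTH(Name)) AS actual_max_length
FROM AccountActivity;
```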
16. Future of Schemadoc
• Integrate with Percona Toolkit library
• Read extra metadata (foreign keys)
• Show table sizes and growth rates
• Show SQL privileges (who has access?)
• CLI for adding / updating comments
• Batch-mode output
• Redesign HTML output with modern look & feel...
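Some of the metadata named in this roadmap is already exposed by INFORMATION_SCHEMA. As a hedged sketch of where a tool might read it (the schema name EcommerceDB is assumed, and this is not a description of how Schemadoc will do it):

```sql
-- Foreign keys: which columns reference which tables.
SELECT TABLE_NAME, COLUMN_NAME,
       REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = 'EcommerceDB'
  AND REFERENCED_TABLE_NAME IS NOT NULL
ORDER BY TABLE_NAME, CONSTRAINT_NAME, ORDINAL_POSITION;

-- Approximate table sizes. TABLE_ROWS is only an estimate for InnoDB.
SELECT TABLE_NAME, TABLE_ROWS,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 1) AS size_mb
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'EcommerceDB'
  AND TABLE_TYPE = 'BASE TABLE'
ORDER BY size_mb DESC;
```

Growth rates are not stored anywhere, so they would need periodic snapshots of the second query.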
20. Copyright 2012 Bill Karwin
www.slideshare.net/billkarwin
Released under a Creative Commons 3.0 License:
http://creativecommons.org/licenses/by-nc-nd/3.0/
You are free to share - to copy, distribute and
transmit this work, under the following conditions:
Attribution. You must attribute this work to Bill Karwin.
Noncommercial. You may not use this work for commercial purposes.
No Derivative Works. You may not alter, transform, or build upon this work.