A common belief in the enterprise software world is that MySQL cannot scale to large database sizes. The Internet industry has proved that it can. These days many of the Internet giants, processing billions of events every day, are based on MySQL. Most of these giants were able to turn MySQL into a mighty database machine by implementing Sharding.
What is Sharding? What kinds of Sharding can you implement? What are the best practices? All these issues will be addressed in this lecture by Moshe Kaplan from RockeTier, the performance experts.
Start Your Engines. Thank you. http://top-performance.blogspot.com
Editor's Notes
Initial setup of a u-Page from the toolbar:
- 14M users today
- 20M users (2009Q1)
- 600K downloads per day
- Moving from 1:N templates to 10-15% of users who update the page (from a 1:1000 templates/users ratio to a 1:7 templates/users ratio)
- Statistics: user changes: 1 per week; upload: 2 times a day
- Wants to use the default if the user did not modify the template. However, wants to support changes pushed from templates to users' pages, even if those pages were changed
- Have the same situation now, but now it's saved on the user's desktop

Current applicative cache:
- int: toolbar_id, list of int: widget_id[]
- int: widget_id, object: widget
- int: user_id, object: user

Table design:
- User id, tab id and user_tab_id are GUIDs, meaning that they can be distributed between databases
- The others are not (may be needed to support?)
- Expected XML schema size: 45KB (up to 500KB)

Why migration to MySQL will be problematic (they are already using the current SQL Server SP features):
- Maximal row size
  * http://dev.mysql.com/doc/refman/5.0/en/innodb-restrictions.html
  * The maximum row length, except for VARBINARY, VARCHAR, BLOB and TEXT columns, is slightly less than half of a database page. That is, the maximum row length is about 8000 bytes
- Scope identity
- For XML
- Bulk Insert from XML
- Rollback/Transaction
- BEGIN TRY

Applicative changes by the user:
- Move
- Delete
- Add
- Minimize
- Change internal parameters

Options to be considered:
- A large machine based on SQL Server
- Sharding based on MySQL or SQL Server
- Gigaspaces solution: felt that it's a large machine, and they prefer the database way

Other options:
- Saving only XML in the current way
- Save the summed page configuration in an XML, so little has to be read from the DB (Tab based)
- Write 20K files of 40KB each on my laptop HD: 149s
- Read XML: 109.5s
- Write to DB: 1789s
- Use serialization of .Net
- Save the XML on the disk in order to avoid variable length fields
- Use memcached to hold the hash of users?
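The table-design note above says that GUID keys can be distributed between databases. A minimal sketch of what that routing could look like, assuming a fixed list of shards and a simple hash-modulo scheme. The shard names and the `shard_for` helper are hypothetical and do not come from the original notes; the notes do not say which algorithm was eventually chosen.

```python
import uuid
import zlib

# Hypothetical shard names, used for illustration only.
SHARDS = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]


def shard_for(user_id: uuid.UUID, shards=SHARDS) -> str:
    """Map a GUID to one shard using a stable hash of its 16 raw bytes."""
    # zlib.crc32 gives the same value in every process and on every host,
    # so all application servers route a given user to the same database.
    bucket = zlib.crc32(user_id.bytes) % len(shards)
    return shards[bucket]


if __name__ == "__main__":
    user_id = uuid.uuid4()
    print(user_id, "->", shard_for(user_id))
```

Hash-modulo routing is the simplest choice, but changing the number of shards remaps most keys. A lookup table from user id to shard is a common alternative when shards are expected to be added later.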
Things to be considered:
- What horizontal sharding algorithm should be selected
- Hibernate Shards, provided by Google. Still in beta-testing phase
- For vertical sharding, which tables should be split into different databases
- How do you manage so many databases (distribute data and so on)
- There is not really an option to do that
- Defining optimal table sizes
- Retrieval of data from the disk vs. getting data from the tables: Tab (1 per displayed page), Zone (3 per displayed page), Widget instance (10-300 per displayed page + should be extracted with/out zones)
- OLAP solution to merge data

Toolbar design:
- Saving