3. Pixable in numbers
~5 million users
9 billion photos (~5 Terabytes)
35 million new photos a day
80 million categories
16 million writes/hour (~40GB/hour)
30 million reads/hour (~120GB/hour)
Logging and profiling
- 15k inserts/sec
7. Presentation
At Pixable we have migrated between different data storage
solutions. To accomplish this, we built a plugin-like data layer
that allows complete separation between application code and
data storage. In fact, our whole migration from MySQL to
MongoDB was performed over this layer, helping us move
chunks of data little by little while learning how the system
behaved under the new configuration. During the process, we
kept duplicate copies in MySQL and Mongo for a while, until
the transition was complete. All of this happened in a way that
was almost transparent to the application code, requiring very
few changes.
During this talk, we are going to show how we built this
architecture and how easy it is to integrate other data storages
(memcached, S3, etc.) into it. We will also share some tips
we've learned along the road, and the pros/cons of working
under this scheme.
8. Initial infrastructure
LAMMP (Linux-Apache-Memcached-MySQL-PHP)
Frontend → API → Backend → MySQL (User DB)
class user {
    public $id;
    public $first_name;
    public $last_name;

    public function getUser($id) {
        $sql = 'SELECT * FROM users WHERE id = ' . (int) $id;  // cast to avoid SQL injection
        $userRS = $this->db->fetchArray($sql);
        $user = $this->buildUser($userRS);
        return $user;
    }
}
9. Issues encountered
Limit on the number of DB connections to the master.
Not able to hit the DB hard without generating
lag on the slave servers.
Adding a field to an existing table with billions
of records would mean downtime for the app.
Adding new DB servers was slow (in some
cases it required app downtime) and high
in server costs.
…so we needed a DB engine that was easy to
grow, schema-less and low in server cost.
10. Solution found
MongoDB
Built-in sharding.
Replica sets provide automatic data cloning,
synchronization and PRIMARY failover.
Our data fits perfectly in the MongoDB document
paradigm.
Schema-less.
Easier to have many small machines; failures and
maintenance are less traumatic.
Background index creation.
…now we needed a way to start migration
without having downtime and data loss.
11. Implementing solution
Migrating code from classes/functions with
SQL queries all around the project code to
the new Flexible Plugin-like data layer.
Before:
Frontend → API Backend → MySQL User DS → MySQL (User DB)

After:
Frontend → API Backend → User Data Source → MySQL User DS → MySQL (User DB)
                                          → Mongo User DS → MongoDB (User DB)
12. Implementing solution – Step 1
User Data Source (Plugin manager)
• Gets the call from the backend for the user
data source.
• Evaluates the conditions we defined to decide
which data source to return, and looks up the
user in the correct data source.
• If the migration conditions are activated,
migrates the user to the new DB engine
(unless already migrated).
• Returns the data source selected by the
conditions above.
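The plugin-manager steps above can be sketched as follows. This is a minimal illustration, not Pixable's actual code: the class and method names (`InMemoryUserPlugin`, `shouldMigrate`, the rollout-percentage condition) are hypothetical stand-ins for the real plugins and conditions.

```php
<?php
// Minimal sketch of the plugin-manager logic; an in-memory plugin
// stands in for the real MySQL and MongoDB plugins.
class InMemoryUserPlugin
{
    private $users = array();
    public function hasUser($id)          { return isset($this->users[$id]); }
    public function getUser($id)          { return $this->hasUser($id) ? $this->users[$id] : null; }
    public function insertUser($id, $row) { $this->users[$id] = $row; }
}

class UserDataSource
{
    private $old;              // current engine (MySQL stand-in)
    private $new;              // target engine (MongoDB stand-in)
    private $rolloutPercent;   // how much of the user base to migrate

    public function __construct($old, $new, $rolloutPercent)
    {
        $this->old = $old;
        $this->new = $new;
        $this->rolloutPercent = $rolloutPercent;
    }

    // Condition: compare the last two digits of the user id against a
    // percentage we define (one of the conditions mentioned in the talk).
    public function shouldMigrate($id)
    {
        return ($id % 100) < $this->rolloutPercent;
    }

    // Returns the data source for this user, lazily migrating if needed.
    public function getUserDS($id)
    {
        if (!$this->shouldMigrate($id)) {
            return $this->old;                 // keep serving from the old engine
        }
        if (!$this->new->hasUser($id) && $this->old->hasUser($id)) {
            // First visit since the rollout: copy the user over.
            $this->new->insertUser($id, $this->old->getUser($id));
        }
        return $this->new;                     // serve from the new engine
    }
}
```

Raising the rollout percentage little by little is what lets chunks of data move gradually while watching how the new engine behaves.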
13. Implementing solution – Step 2
Building each DB engine plugin
        User Data Source
MySQL     MongoDB     Memcached     …
Plugin    Plugin      Plugin        Plugin

Requirements:
• All plugins have to implement the same set of public
methods/functions.
• All have to reply in the exact same data structure and format.
• All plugin constructors may accept another plugin as a parameter,
so we can chain them together if needed.
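One way to pin down those three requirements is a shared interface plus chainable constructors. The names below (`UserPluginInterface`, `ArrayUserPlugin`) are hypothetical; real plugins would talk to MySQL, MongoDB, Memcached, etc. behind the same interface.

```php
<?php
// Hypothetical interface capturing the requirements: one shared set of
// public methods and one reply format for every engine plugin.
interface UserPluginInterface
{
    public function getUser($id);            // must reply in the normalized user format
    public function insertUser($id, $user);
}

// A toy plugin that stores users in an array, wrapping an optional inner
// plugin so plugins can be chained (requirement 3).
class ArrayUserPlugin implements UserPluginInterface
{
    private $users = array();
    private $next;                           // optional chained inner plugin

    public function __construct(UserPluginInterface $next = null)
    {
        $this->next = $next;
    }

    public function getUser($id)
    {
        if (isset($this->users[$id])) {
            return $this->users[$id];        // found locally
        }
        // Miss: fall through to the chained plugin, if there is one.
        return $this->next !== null ? $this->next->getUser($id) : null;
    }

    public function insertUser($id, $user)
    {
        $this->users[$id] = $user;
        if ($this->next !== null) {
            $this->next->insertUser($id, $user);   // propagate the write
        }
    }
}
```

Because every plugin speaks the same interface and format, any one layer can be deactivated without the application noticing.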
14. Implementing solution – Step 3
Moving all SQL queries from different classes
methods/functions to the new Data Source infrastructure:
Old class code:

class user {
    public $id;
    public $first_name;
    public $last_name;

    public function getUser($id) {
        $sql = 'SELECT * FROM users WHERE id = ' . (int) $id;
        $userRS = $this->db->fetchArray($sql);
        $user = $this->buildUser($userRS);
        return $user;
    }
}

New class code:

class user {
    public $id;
    public $first_name;
    public $last_name;

    public function getUser($id) {
        $uDS = UserDataSource::getUserDS($id);
        $userRS = $uDS->getUser();
        $user = $this->buildUser($userRS);
        return $user;
    }
}
15. Example 1
Condition:
• Read operations are served from Memcached when the user is found there.
• Write operations go to both MySQL and MongoDB.

Backend → User Data Source → Memcached Plugin → MongoDB-MySQL DS → MySQL Plugin
                                                                 → MongoDB Plugin

Read operations are answered by the first plugin in the chain that finds
the user; write operations travel down the whole chain.
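The chain in Example 1 can be sketched with two toy plugins; the names (`MemcachedLikePlugin`, `ArrayStorePlugin`) are illustrative stand-ins, not Pixable's code, and the array store stands in for the MongoDB-MySQL DS.

```php
<?php
// Stands in for the MongoDB-MySQL DS at the bottom of the chain.
class ArrayStorePlugin
{
    public $reads = 0;               // counts how often the DB layer is hit
    private $users = array();

    public function getUser($id)
    {
        $this->reads++;
        return isset($this->users[$id]) ? $this->users[$id] : null;
    }

    public function insertUser($id, $user) { $this->users[$id] = $user; }
}

// Stands in for the Memcached plugin wrapping the DB data source.
class MemcachedLikePlugin
{
    private $cache = array();
    private $next;                   // chained inner plugin

    public function __construct($next) { $this->next = $next; }

    public function getUser($id)
    {
        if (isset($this->cache[$id])) {
            return $this->cache[$id];            // cache hit: DBs untouched
        }
        $user = $this->next->getUser($id);       // miss: go down the chain
        if ($user !== null) { $this->cache[$id] = $user; }
        return $user;
    }

    public function insertUser($id, $user)
    {
        $this->cache[$id] = $user;               // keep the cache warm
        $this->next->insertUser($id, $user);     // write through to the DBs
    }
}
```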
16. Example 2
Condition:
• Read and write to MySQL but use MongoDB as backup.
Backend → User Data Source → Memcached Plugin → MongoDB-MySQL DS → MySQL Plugin
                                                                 → MongoDB Plugin

Read operations are served from MySQL; write operations go to MySQL,
with a mirror copy written to MongoDB as backup.
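Example 2 boils down to mirrored writes. The sketch below uses hypothetical names (`BackupMirrorDS`, with arrays standing in for the two engines), not Pixable's actual code.

```php
<?php
// MySQL stays the engine serving reads and writes, but every write is
// mirrored to MongoDB so the new engine holds a full backup copy.
class BackupMirrorDS
{
    public $mysql = array();         // primary store (MySQL stand-in)
    public $mongo = array();         // backup store (MongoDB stand-in)

    public function getUser($id)
    {
        // Reads are served from MySQL only.
        return isset($this->mysql[$id]) ? $this->mysql[$id] : null;
    }

    public function insertUser($id, $user)
    {
        $this->mysql[$id] = $user;   // normal write path
        $this->mongo[$id] = $user;   // mirrored backup write
    }
}
```

Running in this mode for a while builds up a complete copy in the new engine before any reads are switched over.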
17. Example 3
Condition:
• Only new users should be migrated but use MongoDB as
backup for all read operations from existing users.
Backend → User Data Source → Memcached Plugin → MongoDB-MySQL DS → MySQL Plugin
                                                                 → MongoDB Plugin

Existing users keep reading and writing through MySQL, with MongoDB as
backup; new users are written to, and read from, MongoDB only.
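Example 3's "only new users migrate" rule can be sketched like this; `MigratingUserDS` and its array stand-ins are hypothetical names, not Pixable's code.

```php
<?php
// New users are created only in MongoDB, while existing users keep
// living in MySQL and are mirrored to MongoDB as a backup on write.
class MigratingUserDS
{
    public $mysql = array();         // old engine (MySQL stand-in)
    public $mongo = array();         // new engine (MongoDB stand-in)

    public function insertUser($id, $user)
    {
        if (!isset($this->mysql[$id])) {
            // Brand-new user: store only in the new engine.
            $this->mongo[$id] = $user;
            return;
        }
        // Existing user: MySQL stays authoritative, MongoDB is the backup.
        $this->mysql[$id] = $user;
        $this->mongo[$id] = $user;
    }

    public function getUser($id)
    {
        if (isset($this->mysql[$id])) {
            return $this->mysql[$id];            // existing user: MySQL first
        }
        // New users, or a MySQL miss: fall back to the MongoDB copy.
        return isset($this->mongo[$id]) ? $this->mongo[$id] : null;
    }
}
```

Under this condition the old engine's data set stops growing, so it can be drained and retired at leisure.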
18. Conclusion
Pros:
Separates your app's code from the data storage engines'
query languages.
New data engines can be added easily.
Lets you balance the load sent to each data engine.
As the company grows, one team can be dedicated to developing
and optimizing the data plugins, while another team develops
the application itself.
Cons:
Your app will generate more queries to the data engines.
You will have to write more lines of code when implementing
these plugins than when using only one data engine.
19. Final Recap
                App
                 |
         User Data Source
MySQL     MongoDB     Memcached     …
Plugin    Plugin      Plugin        Plugin
Nowadays, fast-changing markets and products force one to evolve along the way.
Q: How many of you started with LAMP? This is an easy query, but real queries are usually much more complicated than this; tear them down once the new plugin structure is applied.
To do so, we had to normalize all the objects to a single structure with default values and data types (dates to Unix time format, etc.), since all data layers have to reply to the app in exactly the same format, to guarantee compatibility if we decide to deactivate one of the layers.
Migration conditions can range from a simple true/false flag, to start migrating users as they come to the app, to more complex conditionals such as comparing the last two digits of a user ID against a percentage we define, to decide whether the user should be migrated to the new data source. Data sources can be chained.
Plugins IMPLEMENT an interface class, to enforce the minimal set of public functions. Data format: reply with an instance of the object, already built.
Backend calls insertUser(). The User DS checks the conditions and replies with the data source for the user. Memcached won't find it, so it calls the MongoMySQL DS. The MongoMySQL DS then runs the migration logic and determines that, because it's a new user, it will be stored only in MongoDB.
More queries will only happen while in the migration process.
You can chain them together. You can have different data sources in different geographic locations, for testing a new DB engine in a location where you have fewer users. We went from LAMMP to LADP (D = Data = multiple data storages).