Slides from a talk I gave at MongoNYC on using MongoDB with Drupal. I will most likely be doing this as a webcast and giving this presentation at Drupalcamp NYC 8 this July.
2. Drupal (FTW!) Drupal: Aspect-oriented (modular) “social-publishing” framework, written in php (pdo) that allows easy creation and integration of multiple data-rich social networking sites and robust web applications & services. Used on large high-performance websites. Roadmap anticipates future web (rdf, ggg) & emerging technologies (MongoDB!)
3. The Problem with SQL* Webapps > Once you have related data, you need joins > Indexing on joins does not work that well ...So you: > Introduce denormalisation > Build extra tables *(RDBMS)
4. The Problem with Schema > Changes broadly lock data, requiring downtime or putting excessive, short-term load on the system. > The data integrity constraints don’t quite support application integrity constraints. For example, there’s no standard way in SQL to require a column to contain only valid URLs. > Coordinating schema changes with application code changes is difficult, largely because schema changes lack convenient coupling with code.
8. ( Why Mongo ?) Tableless Queriless Schemaless Blazingly fast Faster Development times Nicer learning curves Code is trimmer Future-proof
9. Performance / Scaling Whitehouse.gov / direct engagement 15K/day contact requests 2M records in db 4GB db: replication risks MongoDB: 180M+ documents in 1 collection
10. Writing Performant Queries Sort column must be the last column used in the index. Range query must also be the last column in an index, Only use a range query or sort on one column. Conserve indexes by re-ordering columns used in straight = queries Never use Mongo's $ne or $nin operator's Never use Mongo's $exists operator http://jira.mongodb.org/browse/WEBSITE-12
11. Install Mongo public array authenticate (string $username, string $password ) public array command ( array $data ) __construct ( Mongo $conn , string $name ) public MongoCollection createCollection ( string $name [, bool $capped = FALSE [, int $size = 0 [, int $max = 0 ]]] ) public array createDBRef ( string $collection , mixed $a ) public array drop ( void ) public array dropCollection ( mixed $coll ) public array execute ( mixed $code [, array $args = array() ] ) public bool forceError ( void ) public MongoCollection __get ( string $name ) public array getDBRef ( array $ref ) public MongoGridFS getGridFS ([ string $prefix = "fs" ] ) public int getProfilingLevel ( void ) public array lastError ( void ) public array listCollections ( void ) public array prevError ( void ) public array repair ([ bool $preserve_cloned_files = FALSE [, bool $backup_original_files = FALSE ]] ) public array resetError ( void ) public MongoCollection selectCollection ( string $name ) public int setProfilingLevel ( int $level ) public string __toString ( void )
12. Real World Example list the nodes of a user ordered by comment count uid is stored in the node table and the comment count is in node_comment_statistics > thus query cannot be indexed (Comparison of dissimilar columns may prevent use of indexes if values cannot be compared directly without conversion.)
13. What's already in Drupal * mongodb: support library for the other modules (D7/D6) * mongodb_block: Store block information in mongodb. Very close to the core block API. * mongodb_cache: Store cache items in mongodb. * mongodb_session: Store sessions in mongodb. * mongodb_watchdog: Store watchdog messages in mongodb * mongodb_queue: DrupalQueueInterface implementation using mongodb. * mongodb_field_storage: Store the fields in mongodb.
20. mongodb_block function hook_block_view_alter(&$data, $block) { // Remove the contextual links on all blocks that provide them. if (is_array($data['content']) && isset($data['content']['#contextual_links'])) { unset($data['content']['#contextual_links']); } // Add a theme wrapper function defined by the current module to all blocks // provided by the "somemodule" module. if (is_array($data['content']) && $block->module == 'somemodule') { $data['content']['#theme_wrappers'][] = 'mymodule_special_block'; } }
21. Block rebuild Notice : Undefined variable: block_html_id in include() (line 4 of /var/www/Drupal/drupal-7.0-alpha4/themes/garland/block.tpl.php ). Notice : Undefined variable: block_html_id in include() (line 4 of /var/www/Drupal/drupal-7.0-alpha4/themes/garland/block.tpl.php ). Notice : Undefined variable: block_html_id in include() (line 4 of /var/www/Drupal/drupal-7.0-alpha4/themes/garland/block.tpl.php ). Notice : Undefined variable: block_html_id in include() (line 4 of /var/www/Drupal/drupal-7.0-alpha4/themes/garland/block.tpl.php ). Notice : Undefined variable: block_html_id in include() (line 4 of /var/www/Drupal/drupal-7.0-alpha4/themes/garland/block.tpl.php ).
22. Render main content block function mongodb_block_theme() { 'block' => array( 'render element' => 'elements', 'template' => 'block', 'path' => drupal_get_path('module', 'block'), ), } function mongodb_block_mongodb_block_info_alter(&$blocks) { // Enable the main content block. $blocks['system_main']['region'] = 'content'; $blocks['system_main']['weight'] = 0; $blocks['system_main']['status'] = 1; } function mongodb_block_rehash($redirect = FALSE) { $collection = mongodb_collection('block'); $theme = variable_get('theme_default', 'garland');
23. mongodb_field_storage don't : variable_set('field_storage_default', 'mongodb_field_storage'); instead : $conf['field_storage_default'] = 'mongodb_field_storage'; in settings.php ESP. for session/caching backends
25. Import all Nodes > MongoDB* (* in 14 l.o.c.) // Connect $mongo = new Mongo(); // Get the database (it is created automatically) $db = $mongo->testDatabase; // Get the collection for nodes (it is created automatically) $collection = $db->nodes; // Get a listing of all of the node IDs $r = db_query('SELECT nid FROM {node}'); // Loop through all of the nodes... while($row = db_fetch_object($r)) { print "Writing node $row->nid"; // Load each node and convert it to an array. $node = (array)node_load($row->nid); // Store the node in MongoDB $collection->save($node); }
26. Import all Nodes > MongoDB* (* in 14 l.o.c.) # drush script mongoimport.php # use testDatabase; # db.nodes.find( {title: /about/i} , {title: true}).limit(4);
27. Import all Nodes > MongoDB* (* in 14 l.o.c.) <?php // Connect $mongo = new Mongo(); // Write our search filter (same as shell example above) $filter = array( 'title' => new MongoRegex('/about/i'), ); // Run the query, getting only 5 results. $res = $mongo->quiddity->nodes->find($filter)->limit(5); // Loop through and print the title of each article. foreach ($res as $row) { print $row['title'] . PHP_EOL; } ?>
28. What's Next? Multiple DB servers – Data Persistance Query logging - Devel support Query builder – Views integration DBTNG – Full DB Abstraction MongoDB API
29. Query Logging Extend Mongo collection class Pass instance back from mongodb_collection Implement all collection methods
33. Where Mongo Won't Work Joining across Entities ex. return birthday from profile belonging to author of current node
34. Thanks! Comments & questions to: ForestMars @gmail.com ForestMars @googlewave.com Facebook, LinkedIn, etc. twitter: @elvetica (identica@forest)
Notas do Editor
However Schema-based RDBMS tend to be the worst offenders; SQL in particular has a number of issues: > MySQL / InnoDB has a very unfriendly method of doing schema changes: it rebuilds the table and blocks all writes until the new table is ready. > Big price – transactions. Worked so hard to get transactions in, wait, why are we working hard? > Rich feature set string functions you've never used > More abstractly, reality, or at least the reality of the Interwebs doesn't play nice with SQL as the former doesn't have a rigid schema that the later wants to enforce. A thousand blog posts bear witness to the list of reasons that essentially boil down to the big price you have to pay to support features of a system that reality doesn't map well to reality. And this is of course is one of the most important things these systems/machines/programs do: map reality. And while it may seem to be a fairly high level of abstraction to compare database models to reality ('itself') there is in fact a crucial dialogue in database programming between abstraction and actuality that is exemplified by what my signature quote <blockquote>In Theory, ...