This document discusses migrating 100,000 pages of content from a legacy CMS to Drupal. It provides an overview of planning the migration by asking important questions, using tools like the Migrate and Pathauto modules for importing content and rewriting URLs, dealing with issues like file management and testing the migrated site, and finally deploying the new Drupal site. The key steps are planning through asking the right questions, using modules and custom scripts for importing content and files and rewriting URLs, thorough testing, and a phased deployment approach.
Migration from Legacy CMS to Drupal
1. Drupal Migration: Migrating 100,000 pages of content from a legacy CMS to Drupal. Rachel Jaro, Solutions Architect at PrometSource, www.prometsource.com
2. Overview. We'll talk about: a successful migration recipe; common questions you should be asking before you start; the top 3 tools for migration in Drupal; issues (tools to use in URL rewriting, file management comparison in D6); testing; the deployment solution.
3. Data Migration “Data migration solutions extract data from a source system, correct errors, reformat, restructure and load the data into a replacement target system”. It sounds simple, but poorly managed data migration is the most common cause of failure in implementing a replacement system. -- Gershon Pick, March 2001
6. Plan: What to Ask. Node types (content separation, fields): Do you want to separate content into pages, articles, biographies, news, etc.? What fields are needed for each node? Who can access it? Do you really need that content type, or can we just use taxonomies for similar content?
7. Plan: What to Ask. Taxonomy (categorization, tags): Do you need to categorize nodes? Would you need different access? What kind of taxonomy groups or vocabularies would you need? Permissions (per node) and user roles: Who is going to use the site? What are their access rights?
8. Plan: What to Ask. New URL mapping: Do you need to make SEO-friendly URLs? Files, file permissions and file directory: Do you need an advanced file management or document management tool, or a simpler solution? How simple? Do you need access rights for each folder? Do you need a browser-type interface to access them? What kind of files do you need to store? Images, PDFs?
10. Requirements: Use CSV files to import data. Divide the migration into groups or sections. Map and replace each old URL with an SEO-friendly URL. Before: 05-200.htm
11. Data in CSV Example December 13, 2005 3:39:54 PM||||||||||December 13, 2005||||||||||Report Spotlights Need for Reform in Jackpot Jurisdictions||||||||||/press/releases/2005/december/||||||||||05-200||||||||||{UUID}|||||||||| Economics^^^^^^^^^^Economy |||||||||| <p>LoremIpsum is simply dummy text of the printing and typesetting industry. LoremIpsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. </p> <p>LoremIpsum is simply dummy text of the printing and typesetting industry. LoremIpsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. </p> $$$$$$$$$$ Separator: |||||||||| End of Row: $$$$$$$$$$
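A minimal sketch of how an export in this format could be split into records. The separator, end-of-row marker and the `^^^^^^^^^^` multi-value delimiter come from the slide; the column names in `FIELDS` and the function name `parse_export` are hypothetical, since the slide does not label the fields.

```python
FIELD_SEP = "|" * 10   # field separator shown on the slide
ROW_END = "$" * 10     # end-of-row marker shown on the slide
VALUE_SEP = "^" * 10   # delimiter between multiple values in one field

# Hypothetical column names: the slide does not name the fields.
FIELDS = ["created", "display_date", "title", "old_dir",
          "old_name", "uuid", "terms", "body"]


def parse_export(text):
    """Split a raw export into a list of dicts, one per row."""
    records = []
    for raw_row in text.split(ROW_END):
        if not raw_row.strip():
            continue  # skip the empty chunk after the last end-of-row marker
        # Limit the split so any separator-like run inside the body stays in the body.
        values = [v.strip() for v in raw_row.split(FIELD_SEP, len(FIELDS) - 1)]
        if len(values) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
        record = dict(zip(FIELDS, values))
        # The taxonomy field may hold several terms.
        record["terms"] = [t.strip() for t in record["terms"].split(VALUE_SEP) if t.strip()]
        records.append(record)
    return records
```

Raising on a wrong field count, rather than padding the row, makes malformed rows visible before anything is imported.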
12. Content Type Division. Example: CNN.com. Divide migration sequences into US, World, Politics, Justice, etc.
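One way to divide a migration into sequences is to group the source records by section before importing. The sketch below assumes each record carries its legacy directory in a hypothetical `old_dir` field and treats the first path segment as the section; the real grouping rule depends on how the legacy site is organized.

```python
from collections import defaultdict


def group_by_section(records):
    """Group records by the first segment of their legacy directory."""
    groups = defaultdict(list)
    for rec in records:
        # 'old_dir' is a hypothetical field name for the legacy directory path.
        parts = [p for p in rec["old_dir"].split("/") if p]
        section = parts[0] if parts else "(root)"
        groups[section].append(rec)
    return dict(groups)
```

Each group can then be imported and tested as its own batch.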
14. TW & Migrate Module Combo. TW (http://drupal.org/project/tw): supports the Migrate module by exposing the source data as views. Migrate (http://drupal.org/project/migrate): a flexible framework for migrating content.
15. Migrate Module Features: users browse their legacy data using views; support for creating Drupal nodes, users, and comments is included; hooks permit migration of other types of content; provides a dashboard for running mini migrations; Drush support.
16. Why I did not choose Migrate: Importing to MySQL was not an option, so CSV files were used instead. It could not map old URLs to new URLs.
22. File Management. Client requirements: intuitive; has WYSIWYG support; access control (upload, edit, delete, revise files by different roles); revision control (optional but good to have); limited time!
23. File Management Modules *DbFm was not included due to problems encountered during tests in D6
25. URL Rewriting Solution. Not recommended: .htaccess (too many URLs to handle, too much server load). Recommended: pathauto + path_redirect modules (automated alias settings, 301 redirects, set global redirect). Additional reference: http://acquia.com/blog/migrating-drupal-way-part-ii-saving-those-old-urls
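The old-to-new mapping behind the 301 redirects can be prepared before import. The sketch below is not pathauto's actual alias algorithm; it is a hypothetical stand-in that builds a title-based alias under the legacy directory and pairs it with the old `.htm` path. The field names `old_dir`, `old_name` and `title` are assumptions.

```python
import re


def slugify(title):
    """Lowercase the title and collapse non-alphanumeric runs into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_redirects(records, extension=".htm"):
    """Return {old_path: new_alias}; each pair would become one 301 redirect."""
    redirects = {}
    used = set()
    for rec in records:
        directory = rec["old_dir"].strip("/")
        old_path = "/".join(p for p in (directory, rec["old_name"] + extension) if p)
        base = "/".join(p for p in (directory, slugify(rec["title"])) if p)
        # Append a numeric suffix when two pages would get the same alias.
        alias, suffix = base, 0
        while alias in used:
            suffix += 1
            alias = f"{base}-{suffix}"
        used.add(alias)
        redirects[old_path] = alias
    return redirects
```

Keeping the map as plain data makes it easy to review with the client and to load into whichever redirect mechanism the site ends up using.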
27. Access control Alternative /default/files/PressReleases /default/files/Documents /default/files/International /default/files/International/America /default/files/International/England /default/files/International/Asia
28. Test, Test and did I say Test? Source: http://www.flickr.com/photos/paperpariah/2424107350/
29. Common problems: broken links; misconfigured pages; empty pages; invalid dates; file not found or orphan pages; page format. Test when CACHE is on.
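Some of these problems can be caught automatically before anyone clicks through the site. A minimal sketch, assuming each migrated record is a dict with hypothetical `created` and `body` fields, that dates use the format seen in the CSV example, and that `known_paths` is the set of paths that exist on the new site:

```python
import re
from datetime import datetime

# Matches dates such as "December 13, 2005 3:39:54 PM" (English month names assumed).
DATE_FORMAT = "%B %d, %Y %I:%M:%S %p"


def check_record(record, known_paths):
    """Return a list of problem descriptions for one migrated record."""
    problems = []
    body = record.get("body", "")
    if not body.strip():
        problems.append("empty page")
    try:
        datetime.strptime(record.get("created", ""), DATE_FORMAT)
    except ValueError:
        problems.append("invalid date")
    # Only site-internal links (starting with "/") are checked here.
    for href in re.findall(r'href="(/[^"#?]*)', body):
        if href not in known_paths:
            problems.append(f"broken link: {href}")
    return problems
```

A check like this complements manual testing and does not replace it; misconfigured pages and caching problems still need a real browser session.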
33. Deployment Mockup * shadow box is your migrated data’s production box * old CMS is still active at this time
36. Deployment. Pros: less risk, less stress; editors can continue daily data entry. Cons: URL rewriting can be tricky; updating the production box with new content can be an arduous task.
37. Deployment: Updating Production. Automation: SVN; Drush scripts to migrate content from the tester's box to the shadow box; Deploy (http://drupal.org/project/deploy). Manual: document configuration changes; document database changes.
38. Recap: SDLC + Agile; common questions you should be asking before you start; top 3 tools for migration in Drupal (TW & Migrate, node_import(), drush); issues (file management comparison in D6, tools to use in URL rewriting); testing; deployment solution.