The document discusses challenges related to migrating data from legacy systems to new applications and systems. It notes there are typically many source systems in various formats with incomplete or unknown information. Effective data migration requires understanding source systems, data mapping, quality analysis, and design of the migration process. It also stresses the importance of data governance and quality to ensure migrated data can be effectively used.
Establishing A Robust Data Migration Methodology - White Paper
Data Migration and MDM - DMM5
1. The Data Migration Challenge:
Elements including MDM
by Wael Elrifai
London - New York - Dubai - Mumbai - Hong Kong 2012
2. Understanding Migration
Assumptions
• Few source systems
• Specific data formats
• All data available
• Documented system interfaces
• Valid data
Truth
• Many more source systems
• Data in unknown formats
• Needed data is missing
• Unknown system interfaces
• Poor data quality
“Migration is not just about moving the data…
It’s about making the data work.”
Confidential - not for redistribution
3. These Application Projects have a Common Critical
Requirement: Migrating Data
• Application Implementation: from legacy into new application
• Application Upgrade: from previous to new version
• Application Instance Consolidation: from multiple instances to fewer
• M&A Integration: from acquired systems
• Legacy Retirement: from legacy into new systems
• Outsourcing: from company to outsourcer
4. Project Overview: Data Migration to ERP
• 200+ source systems
• Operating in 14 languages
• Different sets of users working in different regions with different
applications and languages
• Highly fragmented lines of business and regions
• No concept of Data Governance or Master Data Management
• No concept of Data Quality Analysis
5. Methodology: Practical Data Migration
[Diagram: methodology modules]
• Landscape Analysis (LA)
• Gap Analysis & Mapping (GAM)
• Migration Design & Execution (MDE)
• Legacy Decommissioning (LD)
• Migration Strategy & Governance (MSG)
• Data Quality Rules (DQR)
• Key Data Stakeholder Management (KDSM)
• System Retirement Plan (SRP)
Other diagram labels: Technical, Business Engagement, DMZ, Migration Controller, Profiling Tool, Data Quality Tool
6. Team Structure & Communications
• Primary Business Team located in Hong Kong
• 6 Business Analysts
• 2 Technical Coordinators
• Primary Development Team in Hong Kong
• 8 Developers
• Offshore Development Team in Mumbai, India
• 4 Developers
• Unique Aspects
• Agile/Scrum meetings conducted via Video Conference
• Email usage limited
• Assigned secretary with output immediately posted on Wiki for comments
• Team Lead makes final “closing comments” on each issue
7. Application Migration: The Anatomy of Failure
Long development times
• Often many months or even years without any 'visible' signs of
progress
• CAUSE: failure to properly decompose development into practical,
achievable and meaningful 'phases' and 'sprints'
Long development times – for individual ETL flows
• Due to extensive and repeated re-working of ETL code
• Resulting from failures in unit testing and user acceptance testing
• CAUSE: poor and inadequate design
Considerable variations in quality & efficiency of code
• Increasing time for new/other developers to modify code
• CAUSE: failure to define and firmly enforce standards
8. Application Migration: The Anatomy of Failure
Minimal attention to data cleansing or standardisation
• Leading to longer report development times
• And greater inconsistencies in reporting
• Effectively pushing data quality management to report developers
• AND information consumers
• CAUSE: failure to recognise importance and impact of employing
a systematic approach to managing data quality
Poor reliability
• Arising from 'unexpected' variations in structure or content of
incoming source files
• CAUSE: failure to cater for Murphy's Law – i.e. the most frequent
and most obvious causes of
9. Application Migration: The Anatomy of Failure
Poor performance
• CAUSE: failure to give due consideration to scale and complexity
of ETL processes – during the design stage
• CAUSE: failure to fully understand the underlying causes – when
performance problems become evident
• CAUSE: failure to routinely monitor performance or undertake
adequate capacity planning – to cater for gradual or step-change
increases in data volumes
10. Application Migration: The Anatomy of Success
[Diagram: labels recoverable from the figure]
• Entity Level Data Model Design
• 'Mapping' & ETL Phasing
• Templates
• Reusable Components
• Sprint
• Forensic Data Analysis
• Code Translations
• Detailed Functional Design
• Master Schedule
• Detailed Technical Design, including Peer Review
• Enforce Standards
• Technical Authority & Reusable Components
• Peer Review
• Build
• Technical Authority
• Unit Test
• System Test
• UAT
• Soft Go Live
• Hosted Go Live
• Go Live
11. Abstraction of Rules & Reusability
• Automated ETL mapping development based on source system metadata
• Automated data type verification for flat file data based on header information
•Consistent use of a single value mapping table abstracted to accommodate data
migration rules
• Single generic "run script" which operates based on a simple dependency
matrix
• This is more important in operational rather than data migration
situations, but becomes important when dependencies are complex
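The generic "run script" driven by a dependency matrix can be sketched as follows. This is a minimal illustration, not the tool used on the project: the job names, the `DEPENDENCIES` mapping and the `run_order` and `run_all` helpers are all hypothetical, and the ordering relies on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency matrix: each job maps to the jobs it must wait for.
DEPENDENCIES = {
    "load_customers": set(),
    "load_products": set(),
    "load_orders": {"load_customers", "load_products"},
    "load_invoices": {"load_orders"},
}


def run_order(dependencies):
    """Return a job order in which every job follows its prerequisites.

    TopologicalSorter raises graphlib.CycleError if the matrix is circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


def run_all(dependencies, runner):
    """Run each job once, in dependency order, through the supplied callable."""
    completed = []
    for job in run_order(dependencies):
        runner(job)
        completed.append(job)
    return completed


if __name__ == "__main__":
    run_all(DEPENDENCIES, lambda job: print(f"running {job}"))
```

Keeping the dependencies in data rather than in the script means a new ETL flow is added by editing the matrix, not the runner.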
12. Data Migration Guiding Principles
Creating Data Standards to Reduce Complexity
[Diagram, reconstructed from its labels]
Current State Environments
• Source Tables
• Source Attributes
• Upstream Sources
• Downstream Targets
• Create as-is Domain Model
• Create as-is Entity Model
Future State Environments
• Enterprise Apps Data Models
• ODS Data Models
Common Data Standards Enterprise Representation (ODS, DW, Customer, etc.)
• Create Domain Model
• Create Entity Model
• Create Entity Relationship Model
• Create Entity Attribute Model
Rationalize Domains and Entities across Current State and Future State Environments
Rationalize Attributes across Current State and Future State Environments
Initial Common Data Standards and creation of:
• Initial DQ Program
• Initial Data Ownership Model
• Initial Data Management
• Governance Processes
Map in all Application Environments to the Enterprise Standard
14. Data Governance - 14-step (sounds like a lot!) program
1. Review available documentation on process flow
2. Agree scope of work
3. Plan and schedule meetings
4. Produce initial definitions of DG framework
5. Assemble DG working group
6. Engage with Data Stewards
7. AS-IS business process analysis
8. AS-IS data analysis
9. Define TO-BE processes
10. Define TO-BE system requirements
11. Assemble business glossary
12. Introduce standardization of business-critical data items
13. Implement DG KPI tracking and DQ exception reporting
14. Conduct periodic audit of business processes
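Step 13 (KPI tracking and DQ exception reporting) might look like the sketch below. The rule names, record fields and the `check_records` helper are hypothetical and only illustrate the idea of named rules producing both an exception list and a pass-rate KPI per rule.

```python
from collections import Counter

# Hypothetical rules: name -> predicate returning True when the record passes.
RULES = {
    "customer_id_present": lambda r: bool(r.get("customer_id")),
    "country_is_two_letters": lambda r: (
        isinstance(r.get("country"), str)
        and len(r["country"]) == 2
        and r["country"].isalpha()
    ),
}


def check_records(records, rules):
    """Apply every rule to every record.

    Returns (exceptions, kpis): exceptions lists each failed rule with its
    row index; kpis maps each rule name to its pass rate between 0 and 1.
    """
    exceptions = []
    failures = Counter()
    for index, record in enumerate(records):
        for name, predicate in rules.items():
            if not predicate(record):
                exceptions.append({"row": index, "rule": name, "record": record})
                failures[name] += 1
    total = len(records)
    kpis = {
        name: (1 - failures[name] / total) if total else 1.0
        for name in rules
    }
    return exceptions, kpis


if __name__ == "__main__":
    sample = [
        {"customer_id": "C1", "country": "GB"},
        {"customer_id": "", "country": "GBR"},
    ]
    found, rates = check_records(sample, RULES)
    for item in found:
        print(item["row"], item["rule"])
    print(rates)
```

Tracking the pass rate per rule across repeated runs gives a simple trend that a governance working group can review.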
15. Master Data Management - Highlights
• DON'T FORGET! Your data migration tools may end up being the
real-time MDM Hub communication logic/tools as well, so design
them accordingly
• Simplified load tools that can be used by analysts
• Custom match/merge algorithms
• Gray's coding
• 14 languages including European, Middle Eastern (right-to-left), East
Asian
• Some transliteration rules built using statistical regression on 30m
customer records
• Match/merge algorithms with discrete variables and user interface
• Ability to allow users to target hotspots
• Variable "sliders" - Meshed variables for hotspot analysis allow
more flexible merge sensitivity
• Data analysis for predicting why false positives and false negatives
occur
• Role of each source
• Types of data that most often "fail"
• Google Maps/Address integration for matching (cloud), data
enhancement, and more
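The idea of match/merge with adjustable "sliders" can be illustrated with a weighted similarity score. This is a generic sketch, not the custom algorithm described above: it uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure, does not cover transliteration or multi-language handling, and the field names, weights and thresholds are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity between 0 and 1, from difflib's ratio."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a, rec_b, weights):
    """Weighted average of per-field similarities.

    The weights play the role of the "sliders": raising one makes the
    corresponding field count for more in the merge decision.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    weighted = sum(
        weight * similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in weights.items()
    )
    return weighted / total


def classify(score, merge_at=0.9, review_at=0.7):
    """Map a score to an action; both thresholds are illustrative defaults."""
    if score >= merge_at:
        return "merge"
    if score >= review_at:
        return "review"
    return "distinct"


if __name__ == "__main__":
    a = {"name": "Acme Ltd", "city": "London"}
    b = {"name": "ACME Ltd.", "city": "London"}
    score = match_score(a, b, {"name": 2, "city": 1})
    print(round(score, 3), classify(score))
```

A "review" band between the two thresholds is one way to route borderline pairs to a user interface instead of merging them automatically.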
16. Testing
• Custom “Black Box” testing tool designed
• Specialized for database tests
• Requires addition of some metadata columns to data model
• S_ID
• Batch_ID
• LOAD_TIME
• Automatic storage of test cases
• Test data
• Documentation on test being run
• User metadata
• Test metadata
• Sets database into a known state
• Can generate test data
• Single unified interface
• Fault-Fix workflow management
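A black-box database test built on these metadata columns could be sketched as below, using the standard `sqlite3` module. The `customer` table, the helper names and the interpretation of `S_ID` as a source-system identifier are assumptions for illustration; `load_batch` merely stands in for the ETL step under test.

```python
import sqlite3
from datetime import datetime, timezone


def reset_database(conn):
    """Put the database into a known state by recreating the target table."""
    conn.execute("DROP TABLE IF EXISTS customer")
    conn.execute(
        "CREATE TABLE customer ("
        " name TEXT NOT NULL,"
        " S_ID TEXT NOT NULL,"        # assumed: identifies the source system
        " Batch_ID INTEGER NOT NULL,"  # identifies the load run
        " LOAD_TIME TEXT NOT NULL)"    # UTC timestamp of the load
    )


def load_batch(conn, names, source_id, batch_id):
    """Stand-in for the ETL step under test; stamps rows with the metadata."""
    load_time = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO customer (name, S_ID, Batch_ID, LOAD_TIME)"
        " VALUES (?, ?, ?, ?)",
        [(name, source_id, batch_id, load_time) for name in names],
    )


def rows_in_batch(conn, batch_id):
    """Black-box check: count the rows a given batch produced."""
    cursor = conn.execute(
        "SELECT COUNT(*) FROM customer WHERE Batch_ID = ?", (batch_id,)
    )
    return cursor.fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    reset_database(connection)
    load_batch(connection, ["Acme", "Globex"], "CRM_HK", 1)
    print(rows_in_batch(connection, 1))
```

Because every row carries its batch identifier, a test can assert on the outcome of one load without knowing how the load was implemented.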
17. Documentation
• Automated
• Driven by
• Business requirements documented in
• Custom testing tool
• Wiki documentation
• ETL tool metadata
• Custom testing tool metadata
This is highly contingent on being able to enforce developer rules
about documentation within tools.
18. Risk Mitigation
Extract data early
• Data should be seen immediately. We've seen problems come up because
data didn't conform to expectations.
Convert data early
• Our existing build will allow for the first conversion to take place within
weeks for all objects.
Convert data often
• An iterative approach to both data quality and conversion allows for
repeated analysis. This should be driven by development schedules rather
than inversely by validation schedules that aren't related to development
time.
Use real data from the start
• Conversion team should have direct access to source systems, without a
dependency on another team to create extracts.
Seek to incorporate external and up-to-date information about your
Master Data
• Tools like Google's business services, D&B, Bloomberg and others can
help
19. Data Migration through Information Development
Lessons Learned
Prioritise Planning
• Define business priorities and start with quick wins
• Don't do everything at once – Deliver complex projects through an incremental
programme
• “Chunks” need to be appropriate, based on elements like homogeneity of front-
end, single sets of business users across geographies, language usage, etc.
Focus on the Areas of High Complexity
• Don't wait until the 11th hour to deal with Data Quality issues – Fix them early
• Follow the 80/20 rule for fixing data – Do this iteratively through multiple cycles
• Understand the sophistication required for Application Co-Existence and that in the
short term your systems will get more complex
Keep the Business Engaged
• Communicate continuously on the planned approach defined in the strategy. The overall
Blueprint is the communications document for the life of the programme
• Try not to be completely infrastructure-focused for long-running releases – Always
deliver some form of new business functionality
• Align the migration programme with analytical initiatives to give business users more
access to data
• Ensure that the Data Governance program has “teeth”
20. Questions?
Peak Consulting UK Headquarters
90 Long Acre, Covent Garden
London WC2E 9RZ
T: +44 (0)20 7849 3422
F: +44 (0)20 7990 9478
www.peakconsulting.eu