September 12 and 13, 2002
Satyam Technology Center
GDU, Surface Transport, Bhubaneswar
Database Migration – Approach & Planning
Keshav Tripathy, Pragjnyajeet Mohanty and Biraja Prasad Nath
GDU Surface Transport, Bhubaneswar.
Summary
Database migration is the process of moving the schema, data and applications associated
with the current system to a different technology or platform. It is one of the most important
tasks in taking a system live when an organization shifts from one platform to another
along with its database and associated applications. The authors examine some of the
critical issues that arise in the analysis and implementation of a database migration
project. The discussion is kept independent of any particular platform but assumes an
RDBMS architecture as the target. The paper is not a guide to implementing data migration;
it is a discussion of the various aspects associated with it.
Key Words
Database migration, Data Migration, Schema Migration, Extraction, Loading
Introduction
With the growing dominance of business reengineering efforts and enterprise-wide
application integration, organizations reach a stage where they have to move their
databases from multiple platforms to a single one, or from one platform to another, driven by
the technology and requirements best suited to the particular application. System evolution
challenges organizations to keep pace with rapidly growing technology and to capitalize on
the features it offers. Organizations also migrate when they realize that their existing
systems have performance and scalability limitations that cannot cater to their
ever-expanding business needs.
Database migration is the process of moving the data, schema and applications associated
with the current system to a different technology or platform. It is one of the most common,
and also one of the major, tasks in any application migration or porting effort, or in a move
towards an ERP or EAI environment.
It may be thought that when two systems maintain similar data, one would map to the other
with ease, but that is hardly ever the case. Owing to differences in system, design,
technology and implementation, many issues creep into the process of database migration,
which makes mapping the older system to the newer one a significant job. For example, moving
from a hierarchical database system based on de-normalization and redundant storage to an
RDBMS based on normalization can be a really arduous job and not a straightforward
transformation. The following are a few migration scenarios, which vary with the type of
source database:
Moving from a Hierarchical database to an RDBMS
Moving from a Network database system to an RDBMS
Moving from one RDBMS to another RDBMS
The actual implementation of a database migration project differs with the technology used
and/or the customer requirement, but the authors have drawn on their experience to come up
with a methodical approach to database migration projects that would lessen last-minute
surprises. Though this paper is not a step-by-step guide to database migration, it should
help in enhancing one’s understanding of it and of related issues. The paper aims to be
database and platform independent but assumes that the target database is an RDBMS.
Data Migration Vs Database Migration
Data migration and database migration are different, though database migration
encompasses data migration. Data migration is simply the movement of data from one
database (or file system) or platform to another. This may include extracting the data,
cleansing it and loading it into the target database. Database migration, in contrast,
means the movement of data together with the conversion of the various other structures and
objects associated with the database, viz.
Business Logic – Stored Procedures, Triggers, Packages, Functions
Schema – Tables, Views, Synonyms, Sequences, Indexes
Physical Data – Security, Users, Roles, Privileges
Database dependency of applications associated with the database
Hence data migration is a subset of the activities carried out in a database migration,
though data migration may also be taken up independently.
For example, a newly developed application may require as its source data that already
exists in another database, whether an RDBMS, Network or Hierarchical system. That data
has to be brought across for the new application to operate. In this case only the data is
moved from the existing database to the database used by the new application. This is data
migration. Database migration, on the other hand, is a shift from one type of database
system to an entirely new type of database system, or to a database system with entirely
new features and functionality. Here much more than the data is affected, such as the
schema and the associated applications. The data still needs to be moved, but there are
also changes to the database programs and to the database dependency of all associated
applications.
Why Database Migration
A natural question is why, when the existing systems are running on the current database,
it is necessary to move to another database at all. Organizations move because they
perceive better value in the newer system. The motivations for database migration are:
1. Technology changes – With the rapid change in technology, organizations wish to go
for the latest offerings with more benefits and features. Another scenario is that, as
technology advances, older systems become obsolete and may be left without support
from their vendors. In this case too organizations wish to move on to the latest technology
that suits their current needs and future plans. Business needs may also outgrow the
current system and technology, giving impetus to move on to a new system. For example,
a leading insurance company had a character-based application driven by an eight-year-old
Informix database. The business needs changed and the company wanted e-business and ERP
integration, with one vendor providing all the solutions. To take advantage of the
current technology and its capabilities, the company shifted to an Oracle 8i
database and a corporate intranet driven by Oracle 9iAS [5].
2. Database Consolidation – It may be asked why multiple databases would come up in a
single organization. One of the major reasons is that applications are developed on
an ad hoc basis: the most immediate solution is sought rather than one that takes
an overall perspective. Hence organizations usually end up with different applications
running on different databases. Having multiple databases is a logistical nightmare in
terms of maintenance and tracking of the system. Sometimes different sources feed the
multiple databases and these need to be synchronized. All this involves a lot of effort and
cost. Multiple systems also mean multiple licensing issues and support from various
vendors. Thus it makes sense to consolidate the data into a single sink if the system and
requirements permit.
A leading airline company had applications running on nine different database platforms
on diverse operating systems. With pressures for cost effectiveness and simplicity they
decided to migrate all their databases and associated applications to only two database
platforms [5].
3. Lower cost of ownership – Multiple databases require many skilled DBAs and many
application developers across the different platforms to develop and maintain the
applications and databases. In an effort to cut this cost and leverage the strengths of a
particular database, organizations are moving onto a single system wherever feasible.
4. System Optimization – During re-engineering of its business processes, an
organization may have to change its data storage strategy, making it
necessary to shift, migrate or consolidate databases running in different areas of the
business. Again, when businesses merge or organizations make acquisitions, they
find themselves with multiple databases and applications running on them. In this case
too, they would like to migrate the multiple databases to a single database.
A global company dealing mainly in defense, commercial electronics and aviation
technology merged with another electronics and aviation giant. The two companies were
working with different database platforms. After the merger, in order to consolidate
their business processes, they decided to migrate their databases to a single platform [5].
5. Upgrading from legacy systems – With large volumes of data stored in legacy
systems, organizations are thinking of moving to current systems and RDBMS platforms to
capitalize on the rich features and capabilities they offer. Moreover, support for legacy
systems is on the decline, so moving to contemporary systems makes business sense
in terms of support, service, upgrades and use of the latest technology.
Components of Database Migration
Database migration consists of three major components:
Schema Migration – This consists of mapping the source schema to the target schema and
migrating it. The schema is extracted from the source system and its equivalent is
created in the target system.
Data Migration – This is the part where the data is extracted from the source database,
checked for consistency and accuracy, cleansed if necessary and finally
loaded into the target system.
Application Migration – This consists of changing the database-dependent
areas of the application (function calls, data access methods etc.) so that the
input/output behavior of the converted application with the target database is
identical to that of the original application with the source database.
Network database to RDBMS Migration
When a Network database is migrated to an RDBMS [4],
the changes occur at three major levels:
Migration of database design and structure – Each record type in the network database
needs to be converted to a table in the RDBMS, and each set relationship has to be
converted to a foreign key definition in the RDBMS.
Migration of data
Migration of associated programs and JCLs
The migration methods from a Network database to an RDBMS may vary
according to the extent to which the data and application process flow are modified.
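The structural rule described above (one table per record type, one foreign key per set relationship) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RecordType` and `SetType` classes, the sample `department`/`employee` records, the column types and the naming of the foreign key column are all hypothetical and do not come from any particular network database product.

```python
from dataclasses import dataclass


@dataclass
class RecordType:
    name: str      # record type name, reused as the table name
    key: str       # field chosen as the primary key (an assumption of this sketch)
    columns: dict  # field name -> SQL type (types are illustrative)


@dataclass
class SetType:
    owner: str     # owner record type of the set
    member: str    # member record type of the set


def network_to_ddl(records, sets):
    """Return one CREATE TABLE statement per record type.

    Every set whose member is the record adds a foreign key column that
    references the primary key of the set's owner.
    """
    by_name = {rec.name: rec for rec in records}
    result = []
    for rec in records:
        cols = [f"{col} {sql_type}" for col, sql_type in rec.columns.items()]
        constraints = [f"PRIMARY KEY ({rec.key})"]
        for s in sets:
            if s.member != rec.name:
                continue
            owner = by_name[s.owner]
            fk_col = f"{owner.name}_{owner.key}"  # hypothetical naming rule
            cols.append(f"{fk_col} {owner.columns[owner.key]}")
            constraints.append(
                f"FOREIGN KEY ({fk_col}) REFERENCES {owner.name} ({owner.key})"
            )
        body = ",\n  ".join(cols + constraints)
        result.append(f"CREATE TABLE {rec.name} (\n  {body}\n);")
    return result


records = [
    RecordType("department", "dept_id",
               {"dept_id": "INTEGER", "dept_name": "VARCHAR(40)"}),
    RecordType("employee", "emp_id",
               {"emp_id": "INTEGER", "emp_name": "VARCHAR(40)"}),
]
sets = [SetType(owner="department", member="employee")]

statements = network_to_ddl(records, sets)
for statement in statements:
    print(statement)
```

In a real migration the choice of key and the handling of sets with multiple member types or optional membership would need case-by-case decisions that this sketch does not attempt.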
Hierarchical database to RDBMS Migration
For migration from a Hierarchical database to an RDBMS [2],
there are three major software solutions:
Language Interfaces – In this method either an SQL interface is provided to the
hierarchical database or a procedural, record-by-record interface is provided to the
relational database. When a procedural, record-by-record interface can be
provided to the relational database, only the data needs to be moved and the
existing applications need not change. The change of database is
transparent to the application in this case.
Source Code Conversion – In this solution the data is moved from the hierarchical
database to the RDBMS. Along with it, the source code of all the associated programs is
converted to work with the RDBMS.
Data Propagation – When an RDBMS and a hierarchical system are run
concurrently, data propagators are used to keep the two database systems synchronized.
The database migration in this case is done at three levels:
Mapping and migration of the keys from the hierarchical database system to the
RDBMS
Data migration
Migration of Hierarchical database calls to SQL calls
For example, for their database migration, Swiss Bank and IBM designed and
developed the IBM Data Propagator MVS/ESA, which supports interactive and batch data
propagation. This software migrates data from the hierarchical IMS to the relational DB2
without affecting existing applications. It supports forward and reverse data propagation,
which lets heterogeneous databases coexist [2].
RDBMS to RDBMS Migration
In an RDBMS to RDBMS migration, the applications have usually evolved over time and in
many cases the database schema will change. There are three levels involved:
Schema Migration
Data Migration
Query Transformation
It is important to develop methods and tools that support the encoding, elicitation,
enrichment and editing of the schema mapping. Based on the formulation of the schema
mapping, one needs to develop theories and tools for migrating data from one database to
another and for converting SQL written against one schema to SQL for the other schema.
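As a rough illustration of how a single schema mapping can drive both data migration and query transformation, the following Python sketch encodes a mapping as a dictionary and applies it to a row and to a simple column list. The table and column names (`emp`, `employee`, `empno` and so on) and the dictionary layout are hypothetical, and the query builder handles only a plain single-table SELECT; real query transformation is considerably harder.

```python
# Hypothetical schema mapping: source table -> target table and column renames.
SCHEMA_MAPPING = {
    "emp": {
        "target_table": "employee",
        "columns": {"empno": "employee_id", "ename": "full_name", "sal": "salary"},
    }
}


def map_row(source_table, row, mapping=SCHEMA_MAPPING):
    """Translate one source row (a dict) into the target table name and row."""
    entry = mapping[source_table]
    unmapped = set(row) - set(entry["columns"])
    if unmapped:
        # Surface gaps in the mapping instead of silently dropping data.
        raise KeyError(f"no mapping for columns: {sorted(unmapped)}")
    target_row = {entry["columns"][col]: value for col, value in row.items()}
    return entry["target_table"], target_row


def map_select(source_table, columns, mapping=SCHEMA_MAPPING):
    """Rewrite a plain single-table SELECT for the target schema."""
    entry = mapping[source_table]
    target_cols = ", ".join(entry["columns"][col] for col in columns)
    return f"SELECT {target_cols} FROM {entry['target_table']}"


target_table, target_row = map_row(
    "emp", {"empno": 7, "ename": "A. Rao", "sal": 1200}
)
query = map_select("emp", ["empno", "sal"])
print(target_table, target_row)
print(query)
```

Keeping the mapping in one data structure means that the data load and the converted queries cannot drift apart, which is the point of formulating the mapping explicitly.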
Tasks involved in a database migration project
The major tasks of a database migration can be classified as:
1. Source to Target Mapping – The first need is to map the various parameters of the
source database to the target database. Mapping includes the following:
Data Structure Mapping
Data Type Mapping
Internal Storage Mapping
Physical Storage Mapping
Column Mapping
Semantic Mapping
Index Mapping
A strategy has to be finalized for cases where the source attributes do not map exactly
to the target attributes. For example, if a data type in the source database has
no exact counterpart in the target database, the nature of the data
actually stored has to be examined to decide which data type in the target
environment is close enough to hold it.
2. Database Constraints Study – A study of the source database constraints must be
undertaken to find out the relationship between different tables and associated
constraints. The study helps to decide whether to implement the
constraints at the database level or at the application level.
It also helps in the data loading process, where it has to be ensured that none of the
constraints is violated.
3. Database Sizing – Sizing is the process of estimating the size parameters of the target
database, taking into consideration the corresponding parameters of the source database.
Sizing is done for the database, tables and indexes. It helps in allocating enough
space and setting the proper parameters in the target database.
4. Data Cleansing – Often the data needs to be cleaned when moving from an
existing system to a newer one. Data on the older system may prove to be inconsistent
when it is moved to the newer system. For example, an employee id in the older system
may be present in different tables or files with different lengths (say 10 and 15).
Before this employee id is loaded into the target database, a consistent field length
has to be chosen and it has to be decided which data to move.
The older system may also simply contain bad data. In this case too the data needs
to be fixed, and its authenticity established, before it is moved to the newer system.
Inconsistent data may take the form of the same data being stored in different
representations in different tables, files or records. For example, the data “Satyam
Computer Services Limited” may be stored as “Satyam”, “scsl” or “Satyam Computer
Services Limited” at various places in the source system. When this data is moved to the
target database its consistency has to be taken care of: a single form among the
above has to be finalized and moved to the target database. Inconsistent, incorrect or
“bad” data is problematic for the business process too. For example:
Inaccurate data caused an insurance company to raise its risk exposure too high and
suffer very expensive losses on many of the policies it wrote.
A manufacturer sold off what it thought was excess stock because of invalid data.
The company was actually short of stock, leading to thousands of unfilled orders,
unhappy customers, and lost revenue.
Data cleansing also includes cases where existing data needs to be reformatted
to fit the target environment. For example, Sybase stores datetime values with
sub-second (millisecond-level) precision, whereas the Oracle DATE type stores date and
time only to the second. So the Sybase data needs to be cleansed, here by deciding how
to handle the fractional seconds, when moving to the Oracle platform.
5. Data Feed – The various data sources that are going to feed the database have to be
considered: whether the data is coming from file systems, legacy
systems or some other data source. The data formats, the volume of data etc. also have to
be taken into account.
6. Conversion of database programs – All the database programs like stored procedures,
triggers, packages, functions etc need to be converted from the existing database
programs to the target database programs, to support the required business process.
7. Conversion of the Application – The application with database specific dependency
now needs to be converted/improved/enhanced keeping in view the desired capabilities
and features of the target database.
8. Data Extraction and Loading – Once the database is ready with proper sizing and the
nature of the data to be loaded is understood, the next stage is extraction of the
data from the source database and loading of the target database. Data is first extracted
from the source database. This can be done with utilities provided with the database (e.g.
BCP of Sybase with the “OUT” parameter) or by spooling the data from within the
database to flat files. Data loading can be done primarily in two ways: either by using
the data loading utilities/tools that come with all the major databases (SQL*Loader from
Oracle, BCP from Sybase with the “IN” parameter etc.) or by writing database programs
for the target database (e.g. stored procedures) to read the extracted data and load it
into the target database. A parallel run of both systems is then done and the data on the
new system is synchronized with the existing system.
9. Testing – A thorough testing methodology, based mainly on the before image and the
after image of the system, needs to be followed. The testing should be repeated and be as
exhaustive as possible, with the critical conditions taken into consideration.
10. Go Live – This is the final step, where the database with its data and associated
applications is live and in production.
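The data cleansing task in the list above (item 4) can be illustrated with a small Python sketch covering its two examples: settling on one canonical form of a name, and dropping fractional seconds from a timestamp before loading into a target that keeps time only to the second. The lookup table, the function names and the timestamp text format are assumptions made for illustration; in particular the format string describes a hypothetical extract file, not the native output of any database utility.

```python
from datetime import datetime

# Hypothetical lookup agreed with the business: every known variant maps to
# the single form that will be loaded into the target database.
CANONICAL_NAMES = {
    "satyam": "Satyam Computer Services Limited",
    "scsl": "Satyam Computer Services Limited",
    "satyam computer services limited": "Satyam Computer Services Limited",
}


def canonical_company(raw):
    """Return the agreed form of a company name, or fail for unknown variants."""
    key = " ".join(raw.split()).lower()  # collapse whitespace, ignore case
    if key not in CANONICAL_NAMES:
        # Unknown spellings are flagged for review rather than guessed at.
        raise ValueError(f"no canonical form agreed for {raw!r}")
    return CANONICAL_NAMES[key]


def truncate_to_seconds(raw, fmt="%Y-%m-%d %H:%M:%S.%f"):
    """Parse an extracted timestamp and drop the fractional seconds.

    Truncation is one possible policy; rounding would be another, and the
    choice belongs to the cleansing strategy.
    """
    return datetime.strptime(raw, fmt).replace(microsecond=0)


cleaned_name = canonical_company("  SCSL ")
cleaned_time = truncate_to_seconds("2002-09-12 10:15:30.457")
print(cleaned_name)
print(cleaned_time.isoformat(sep=" "))
```

Raising an error for an unmapped variant keeps the decision about "which form is authentic" with the analysts, as the text recommends, instead of burying it in the load program.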
Factors affecting the Database Migration
The general approach and planning for a database migration are by and large the same for
different sources and different targets, but they depend to some extent on the following
factors:
Whether the migration is from a legacy system to an RDBMS
Coupling of the applications with the database – It is simpler to migrate loosely coupled
applications than tightly coupled ones.
Whether the associated applications are off the shelf or custom built
Whether the change of database is accompanied by a change of the underlying operating system
Phases of Database Migration
The database migration projects can be divided into distinct phases [1]. Broadly the phases
can be defined as:
Strategy Definition Phase
Analysis Phase
Design Phase
Conversion/Migration Phase
Testing Phase
Implementation Phase
Strategy Definition Phase
During this stage the exact goal of the database migration project is defined and
finalized. Objectives and deliverables are clearly defined. At this stage there is a
macro-level view of the entire system, and it is decided which portions of the system will be
affected and touched during the conversion. A definitive plan is needed of
what needs to be changed and which systems need to be converted. When database
migration is not a stand-alone job but part of a bigger project such as system integration or
re-engineering work, the strategy phase of the database migration project should be done
concurrently with the strategy phase of the associated project. This gives a better and
firmer idea of how the overhauled or new system will finally work. However, the
importance of database migration is generally overlooked when it is part of a bigger project,
because within the overall project database migration sounds like an innocuous part,
though that is hardly ever the case.
The milestone and deliverable of this phase is a strategy document that presents the goals
of the overall migration effort and the reasons for the conclusions reached.
Analysis Phase
The analysis stage takes input from the strategy phase and expands on the strategy
document. By this time the goals of the project are known, so now it is determined what
needs to be done in which area. Here the “quality” of the data in the older system is assessed
and it is determined whether the data will go on to the newer system. The most
important aspect of the analysis phase is to examine all the database objects of the source
database and their equivalents in the target database. Schema extraction and analysis tools
may be used for this. A schema extraction tool queries the meta-data of the database, and
the logical structure of the database is found in this way. Inspection and analysis of the
database programs and the data then reveal the relationships (cardinality), aggregations,
computed data etc. among the data and data structures in the schema. There may be some
inconsistencies with respect to data types and their internal representation; these have to
be resolved with respect to the target database.
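A minimal sketch of schema extraction and data type resolution is given here in Python, using SQLite as a stand-in source database because it ships with the standard library. It reads the catalog (`sqlite_master` and `PRAGMA table_info`) and then applies a type map, reporting any column whose type has no agreed counterpart. The `TYPE_MAP` contents and the target type names are purely illustrative assumptions; a real project would query the source system's own catalog and use a mapping agreed during analysis.

```python
import sqlite3

# Illustrative source-to-target type map; anything absent needs a decision.
TYPE_MAP = {"INTEGER": "NUMBER(10)", "TEXT": "VARCHAR2(255)", "REAL": "FLOAT"}


def extract_schema(conn):
    """Read table and column meta-data from the SQLite catalog.

    Returns {table: [(column, declared_type, not_null, is_primary_key), ...]}.
    Table names are taken from the catalog and assumed to be simple identifiers.
    """
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        info = conn.execute(f"PRAGMA table_info({table})").fetchall()
        schema[table] = [
            (name, declared.upper(), bool(notnull), bool(pk))
            for _cid, name, declared, notnull, _default, pk in info
        ]
    return schema


def map_types(schema, type_map=TYPE_MAP):
    """Split columns into those with a target type and those left unresolved."""
    mapped, unresolved = {}, []
    for table, columns in schema.items():
        mapped[table] = []
        for name, declared, _notnull, _pk in columns:
            if declared in type_map:
                mapped[table].append((name, type_map[declared]))
            else:
                unresolved.append((table, name, declared))
    return mapped, unresolved


source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, joined TIMESTAMP)"
)
schema = extract_schema(source)
mapped, unresolved = map_types(schema)
print(mapped)
print("needs a decision:", unresolved)
```

The unresolved list is the useful output at this stage: each entry is a data type inconsistency that has to be settled before the design phase.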
In most cases the target system replaces or adds to the existing capabilities of the
current system. Moreover, when the newer system is not an exact mapping of the
older system, the gaps between the two systems have to be noted. If the new system
requires new data in some areas, it has to be decided how that data can be acquired or
generated from the older data.
During the analysis phase the following information has to be gathered about the current
system:
Hardware and operating system specifications
Table structures, constraints, table sizes etc.
Indexes and index sizes
Stored procedures, packages, functions and triggers, with their complexity
Data types used in the columns
Database maintenance schedules and associated scripts
Associated applications and their processing profile (online, batch etc.)
Any ERP/CRM applications involved (SAP, Siebel etc.)
Any custom code and code profile (language, development and testing)
Primary development languages and tools (C, Java, VB, PowerBuilder etc.) for the
developed applications
Any middleware used (such as Tuxedo or any application server)
Once the above information is gathered, the stage is set to map these systems onto the
target database.
Design Phase
This phase is where the findings of the analysis phase are validated. The mapping document
is prepared based on the inputs obtained from the analysis phase. This
phase should ideally involve a business analyst who has intimate knowledge of the system
and what it is expected to do. What needs to be done on the ground is finalized based on
the analyst’s inputs. The database schema for the target system is designed here, taking inputs
from the analysis phase. The changes to the database programs and the changes to the
database dependency of the associated applications are also finalized here. Based on the
earlier database and other parameters, the target database is also sized in this phase,
before the database is physically created. Appropriate space is specified for
the different database objects. Right sizing at the beginning saves a lot of trouble
during the final implementation and Go Live stage. Based on the information and
knowledge of the system gathered up to this phase, the following activities are done:
Choose a migration method
Build the migration plan
With the migration method chosen and the specifics of the migration plan settled, the
migration/conversion in the required areas is started.
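The sizing done in this phase is essentially arithmetic on figures collected from the source database. A back-of-the-envelope version is sketched below in Python; the formula, the parameter names and the default growth and overhead figures are assumptions chosen for illustration, not values from the paper or from any vendor's sizing guide.

```python
import math


def estimate_size_mb(row_count, avg_row_bytes,
                     annual_growth=0.20, years=3, overhead=0.30):
    """Estimate the space to allocate for one table, in whole megabytes.

    row_count and avg_row_bytes come from the source database; the growth
    rate, planning horizon and storage overhead (block headers, free space,
    and so on) are planning assumptions.
    """
    projected_rows = row_count * (1 + annual_growth) ** years
    total_bytes = projected_rows * avg_row_bytes * (1 + overhead)
    return math.ceil(total_bytes / (1024 * 1024))


# One million rows of about 200 bytes each, with the default assumptions.
employee_mb = estimate_size_mb(1_000_000, 200)
print(employee_mb)
```

The same calculation would be repeated per table and per index and summed to size the database as a whole.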
Conversion/Migration Phase
This is where the design document is taken as input and the actual conversion/migration of
the database is started. The actual conversion can be represented as shown in figure – 2.
The implementation can be broadly divided into three areas:
Database schema creation – Here the target database schema is created according to
the inputs from the design phase. The target database and the required database objects such
as tables, views, synonyms, indexes, sequences, users, roles etc. are created according to
the required schema and sizing of the database.
Data extraction and loading – The data from the source database is first extracted to
prepare for loading into the target database. Database programs may be written to load
the data, or the data loader utilities that come with the databases (e.g. SQL*Loader in
Oracle and BCP in Sybase) may be used. The actual data cleansing and manipulation
may be done here before the data is loaded into the target database.
Moving of the associated applications and database programs to the new system –
Databases come with associated database objects and programs (stored procedures,
triggers, functions, packages etc.) and applications. So in the migration work these
applications and database objects have to be changed to fit the target database.
Wherever third-party interfaces like ODBC/JDBC are used to connect the applications
with the database, the migration is fairly straightforward and simple. But where database
specific or native code is used for the database connectivity of the application, more effort
is required.
Unit testing is also done in this phase. This phase also includes the enhancement of the
testing strategy, which is to be implemented later. Taking inputs from the business
analysts and end users, and examining the programs themselves, a comprehensive testing
strategy and methodology should be arrived at to validate the migration work.
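The extract, cleanse and load sequence of this phase can be sketched end to end in Python. The example below is only a stand-in for the vendor utilities mentioned earlier: it uses two in-memory SQLite databases as source and target, a CSV text buffer as the flat file, and a caller-supplied `cleanse` function. The function names, the `employee` table and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3


def extract_to_flat_file(conn, table, out):
    """Write all rows of a table to a CSV stream, header first."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


def load_from_flat_file(conn, table, src, cleanse=lambda record: record):
    """Read the CSV stream, cleanse each record and insert it in one transaction."""
    records = [cleanse(record) for record in csv.DictReader(src)]
    if not records:
        return 0
    columns = list(records[0])
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    with conn:  # commits on success, rolls back if a constraint is violated
        conn.executemany(sql, [tuple(r[c] for c in columns) for r in records])
    return len(records)


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT)")
source.executemany("INSERT INTO employee VALUES (?, ?)",
                   [(1, " Asha "), (2, "Ravi")])

target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

flat_file = io.StringIO(newline="")
extracted = extract_to_flat_file(source, "employee", flat_file)
flat_file.seek(0)
loaded = load_from_flat_file(
    target, "employee", flat_file,
    cleanse=lambda record: {**record, "name": record["name"].strip()},
)
print(extracted, loaded)
```

Loading inside a single transaction means a constraint violation in the target leaves the table unchanged, which makes a failed load easy to repeat after the data is fixed.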
System and Integration Testing Phase
This is a very crucial part of the entire project. This phase involves the actual
programmers, business analysts and end users testing in tandem. The testing strategy
prepared during the conversion/migration phase comes in very handy here. At this stage the
aim is to capture any logical errors that might have crept in during the migration of the system.
The key points to keep in mind while testing are:
Has data loading moved all relevant data to the target database?
Is the proper data residing in the intended tables and fields?
Are the associated applications and database programs doing what they were intended
to do and manipulating the data properly?
Here the help of the business analysts and the end users is taken. Though the end users
may not be able to help at the previous stages, once they see the system physically they
can help in validating whether the converted system behaves as the previous system did. The
greater the involvement of the end users in this phase, the better the chances of warding off
possible errors and aberrations in the system. Users are well suited to testing the database
programs and associated applications, but they would find it very difficult to test the
authenticity of the data loaded into the target database. To test for proper data loading,
automated tools or utilities may be used. These utilities compare the data in the source and
target databases and report the discrepancies, if any, between them. It is then for the
developers and the business analysts to go through each discrepancy and decide whether it
falls within the data cleansing scope or is an error in data loading. Variations and
inconsistencies between the data on the target database and the source database may lead to
problems in the business process. It should not come as a surprise if at this stage, based
on the data discrepancies observed during testing, it is found that some activities that
needed to be taken care of earlier have been missed. A small iteration of the previous
stages is then done to rectify this.
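A comparison utility of the kind described above can be approximated in Python as follows. The sketch reads the same table from source and target, keys the rows on a chosen column and reports keys that are missing, unexpected or changed. It assumes both tables have the same column order and fit in memory; the `customer` table and its sample rows are invented for the example.

```python
import sqlite3


def snapshot(conn, table, key_column):
    """Return {key value: full row} for every row of the table."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    key_index = columns.index(key_column)
    return {row[key_index]: row for row in cursor}


def compare_tables(source, target, table, key_column):
    """List the keys that differ between the source and target copies."""
    src = snapshot(source, table, key_column)
    tgt = snapshot(target, table, key_column)
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


def make_db(rows):
    """Build a small in-memory database for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    return conn


source = make_db([(1, "Satyam"), (2, "scsl"), (3, "Acme")])
target = make_db([(1, "Satyam"), (2, "Satyam Computer Services Limited")])
report = compare_tables(source, target, "customer", "cust_id")
print(report)
```

In this run key 3 is a candidate loading error, while the change on key 2 is the sort of difference that may be explained by the cleansing rules; the tool only lists the discrepancies and the classification stays with the developers and analysts.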
Implementation Phase
Implementation is the stage where the target system goes live. It depends primarily on
the type of system and differs from case to case. The general approach is to have a
parallel run of the older and the new system. When the reliability of the new system is
assured, the older system is stopped. Before this there must be a comprehensive
backup of the data, and a recovery strategy must be formalized for any unforeseen scenario
in which reverting to the existing system is needed.
Migration Tools/Utilities
Because many similar tasks have to be performed in the course of a database migration,
many tools/utilities are available to assist in the process. Tools are generally used to
capture the meta-data from the source database and store it in a repository. From this
they generate the schema for the target database. They also help to capture all the database
programs such as stored procedures, packages, triggers, functions etc. from the source
database, store them in a central repository and convert them into target database programs
with minimal human intervention. The tools/utilities also help in the extraction and loading of
data. All this is done with scope for customization according to the requirements at every
stage of the migration [3].
Migration tools are available from the major database vendors and from other suppliers (for
example OMWB – Oracle Migration Workbench, Ispirer Chyfo, SQLPorter, and CRYSWARE for
migration from mainframes to RDBMS). Vendors also have their own consultancy services to
help in the process of migration. Tools that assist and support the above-mentioned phases
can prove helpful, depending on the end requirement.
Case Study
A case study of a database migration project is provided in Annexure –1.
Conclusion
Database migration is seldom achieved in a single effort, as there are a host of unique
factors associated with database migration itself and with each customer’s
requirements. Hence it is never a cut-and-dried affair but is different every time. The
authors believe the approach described here would minimize the iteration and rework involved
in a database migration project. The secret of a hassle-free database migration project is
to invest more time and energy in the analysis and design stages and to monitor each stage
very carefully from the first day.
References
[1] Data Migration Methodology, Web Qualify, Satyam Computer Services Limited.
[2] Hierarchical to Relational Database Migration – Andreas Meier, Rolf Dippolod, Jacky
Mercerat, Alex Muriset, Jean-Claude Untersinger, Robert Eckerlin and Flavio Ferrara, IEEE
Software, 1994, vol. IV, pp. 21-27.
[3] http://otn.oracle.com/tech/migration/workbench/content.html
[4] Migrating from CA-IDMS® to a Relational Database, Prince Software Inc.
[5] The Great Migration, Oracle Magazine, May/June 2002, Volume XVI, Issue 3, pp. 57-65.
About the Authors:
Keshav Tripathy – Has been with Satyam since February 2001 and currently works with GDU
Surface Transport as a Project Manager. He has been responsible for migrating legacy
applications to ERP as well as for database migrations, where he has handled migration
issues. His areas of interest are database design, data modeling and semantic query
optimization. He can be reached at Keshav_Tripathy@satyam.com
Biraja Prasad Nath – Has been with Satyam since June 1997 and currently works with GDU
Surface Transport as a Project Leader. He has worked extensively with Oracle, Sybase and
SQL Server databases and has handled a number of database migration projects. His areas of
interest are database design, data modeling and database tuning. He can be reached at
Biraja_Nath@satyam.com
Pragjnyajeet Mohanty – Has been working with Satyam since September 2000 and currently
works with GDU Surface Transport as a team member. He has worked on various modules of
database migration projects. His interests lie in both curricular and extracurricular
activities. He can be reached at Prag_Mohanty@satyam.com