Collaborate 2009 - Migrating a Data Warehouse from Microsoft SQL Server to Oracle 11g
1. Database
MIGRATING A DATA WAREHOUSE FROM MICROSOFT SQL SERVER TO ORACLE 11G
Dylan Kucera, Senior Manager – Data Architecture
Ontario Teachers’ Pension Plan
INTRODUCTION
IT infrastructure is often sized according to a 3 to 5 year growth projection, which is generally a sensible and cost-effective practice. A sign of success for any deployment is when demand begins to outstrip the capability of the technology after this period has passed. When looking at the organization's central Data Warehouse, a DBA or senior technology architect may foresee the need for a stronger technology capability; however, management may not be so easily convinced. Furthermore, core services such as an organization's central Data Warehouse can be difficult to replace with a different vendor solution once dozens or even hundreds of mission-critical applications are wired to the existing deployment.
With patience, extensive metrics gathering, and a strong business case, management buy-in may be attainable. This paper
outlines a number of hints that are worth considering while crafting a business case for a Data Warehouse migration to
present to IT management.
Oracle Database 11g provides a number of key technologies that allow for a gradual Data Warehouse migration strategy to
unfold over a period of staged deployments, minimizing and in some cases completely eliminating disruption to the end-user
experience. The main purpose of this paper is to outline these technologies and how they can be employed as a part of the
migration process. This paper is also meant to point out a number of pitfalls within these technologies, and how to avoid
them.
GAINING BUY-IN FOR A DATA WAREHOUSE MIGRATION
When it comes to pitching a Data Warehouse migration, patience isn't just a virtue; it is a requirement. Go in expecting the acceptance process to take a long time.
Armed with the knowledge that you will return to this topic with the management group a number of times, plan for these
iterations and make your first presentation about your timeline for evaluation. Unfold your message in a staged fashion.
Start thinking about metrics before anything else; management understands metrics better than technology. The challenge
with this, however, is to make sure the metrics encompass the entire problem at hand. If the bar is set too low because your
metrics do not capture the full scope of your Data Warehouse challenges, your goal of beginning a migration path may be
compromised as management may not understand the severity or urgency of the issues.
Remember to tie every message to management about the Data Warehouse to business requirements and benefits. As a
technologist you may naturally see the benefit to the business, but don’t expect management to make this leap with you.
Highlight ways in which the technology will help meet service levels or business goals.
1 Session 387
ARTICULATING BENEFITS OF A DATA WAREHOUSE MIGRATION
Benefits of a Data Warehouse migration must be tailored to the specific business requirements of the organization in
question. There are a number of general areas where Oracle Database 11g is particularly strong and will likely stand out as
providing significant benefit in any circumstance.
• Scalability
While RAC will be the obvious key point around scalability, RAC is actually only part of the solution. Consider how the locking model in Microsoft SQL Server causes a single uncommitted writer to block all readers of the same row until the writer commits. Oracle Database 11g, on the other hand, allows readers to proceed with reading all committed changes up to the point in time at which their query began. The latter is the more sensible behaviour in a large-scale Data Warehouse. Microsoft SQL Server will choose to perform a lock escalation during periods of peak load, in the worst case causing an implicit lock of the temporary space catalog, effectively blocking all work until the escalation is cleared. Oracle, on the other hand, has no concept of escalating a row-level lock. Again, the latter behaviour will provide superior service in a busy multi-user Data Warehouse. Also evaluate the mature workload balancing capabilities of Oracle Database 11g, which allow preferential treatment of priority queries.
• Availability
RAC is of course the key availability feature in Oracle 11g. Be sure also to consider Flashback capabilities, which can allow much faster recovery from data corruption than a traditional backup/restore model. Evaluate other availability issues in your environment; for example, perhaps your external stored procedures crash your SQL Server because they run in-process, unlike Oracle extprocs, which run safely out-of-process.
• Environment Capability
PL/SQL is a fully featured language based on Ada and as such may simplify development within your environment. The Package concept allows for code encapsulation and avoids global namespace bloat in large and complex solutions. Advanced Data Warehousing features such as Materialized Views may greatly simplify your ETL processes and increase responsiveness and reliability.
• Maintainability
Oracle Enterprise Manager is a mature and fully featured management console capable of centralizing the
management of a complex data warehouse infrastructure. Your current environment may involve some amount of
replication that was put in place to address scalability. Consider how RAC could lower maintenance costs or increase
data quality by eliminating data replication.
• Fit with strategic deployment
Perhaps your organization is implementing new strategic products or services that leverage the Oracle Database.
Should this be the case, be sure to align your recommendations to these strategies as this could be your strongest and
best understood justification.
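As one concrete illustration of the Environment Capability point above, a Materialized View can take over an aggregation step that would otherwise live in ETL code. The table and column names below are invented for illustration, and fast refresh on commit additionally requires a materialized view log on the base table:

```sql
-- Hypothetical daily aggregate maintained by Oracle rather than by ETL code.
-- (REFRESH FAST ON COMMIT also requires a materialized view log on
-- PLAY.VALUE_TABLE with ROWID and the referenced columns.)
CREATE MATERIALIZED VIEW PLAY.MV_DAILY_VALUE_TOTALS
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT FILE_DATE,
       COUNT(*)      AS ROW_COUNT,
       COUNT(VALUE_) AS VALUE_COUNT,
       SUM(VALUE_)   AS TOTAL_VALUE
  FROM PLAY.VALUE_TABLE
 GROUP BY FILE_DATE;
```

With this in place, queries against the aggregate stay current as the base table is loaded, with no separate summary-load job to schedule or reconcile.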
EXECUTING A DATA WAREHOUSE MIGRATION
If you are lucky enough to manage a Data Warehouse that has a single front-end such as an IT managed Business Intelligence
layer, then it may be possible for you to plan a “Big Bang” migration. More likely, however, your Data Warehouse has
dozens or hundreds of direct consumers, ranging from Business Unit developed Microsoft Access links to complex custom
legacy applications. Given this circumstance, a phased migration approach must be taken over a longer period of time. You
will need to expose a new Data Warehouse technology that can be built against while continuing to support the legacy Data
Warehouse containing a synchronized data set. This paper outlines three Oracle Database capabilities that are key to the
success of a seamless large-scale Data Warehouse Migration: Oracle Migration (Workbench), Transparent Gateway (Data
Gateway as of Oracle 11g), and Oracle Streams Heterogeneous Replication.
ORACLE MIGRATION (WORKBENCH)
The Oracle Migration features of Oracle’s SQL Developer (formerly known as Oracle Migration Workbench, hereafter
referred to as such for clarity) can help fast-track Microsoft Transact-SQL to Oracle PL/SQL code migration. Be aware, though, that a machine will only do a marginal job of translating your code. The translator doesn't know how to do things "better" with PL/SQL than was possible with Transact-SQL, and the resulting code almost certainly will not conform to your coding standards in terms of variable naming, formatting, or syntax. Ask yourself the difficult question of whether the effort and time saved by using Oracle Migration Workbench are worth the cost of compromising the quality of the new Data Warehouse code base.
Executing the Oracle Migration Workbench is as simple as downloading the necessary Microsoft SQL Server JDBC driver,
adding it to Oracle SQL Developer, creating a connection to the target SQL Server, and executing the “Capture Microsoft
SQL Server” function, as shown below.
Figure 1 : Oracle Migration – Capturing existing code
Once Oracle captures the model of the target SQL Server, you will be able to view all of the Transact-SQL code. The sample
below shows a captured Transact-SQL stored procedure that employs a temporary table and uses a number of Transact-SQL
functions such as “stuff” and “patindex”:
Figure 2 : Oracle Migration – Existing code captured
Using Oracle Migration Workbench to convert this Transact-SQL to Oracle PL/SQL produces this result:
Figure 3 : Oracle Migration – Translated Code
Notice that the temporary table is converted to the DDL necessary to create the analogous Oracle Global Temporary Table. The name, however, may be less than desirable, because a tt_ prefix does not necessarily conform to your naming standards. Furthermore, the Global Temporary Table is now global to the target schema and should probably have a better name than the Transact-SQL table "Working", which was isolated to the scope of a single stored procedure. Also notice that, because there are often subtle differences between the built-in Transact-SQL functions and similar PL/SQL functions, Oracle Migration Workbench creates a package of functions called "sqlserver_utilities" to replicate the behaviour of the Transact-SQL functions precisely. Again, this might not be the best choice for a new code base.
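A hand-reworked version of the generated temporary table might look like the following sketch; the name and column list here are invented for illustration and assume a (hypothetical) GTT_ naming standard:

```sql
-- Renamed from the generated tt_Working to fit a naming standard, and
-- given a meaningful name now that it is global to the schema rather
-- than scoped to one stored procedure.
CREATE GLOBAL TEMPORARY TABLE PLAY.GTT_SCHEDULE_WORKING (
    SESSION_ID NUMBER,
    TITLE      VARCHAR2(256)
) ON COMMIT DELETE ROWS;
```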
Oracle Migration Workbench can also be used to migrate tables, data, and other schema objects. Taking this approach, however, considerably limits your ability to rework the data model in the new Oracle Data Warehouse. Using Oracle Migration Workbench to migrate tables and data is not well suited to a "parallel support" model in which both the legacy Data Warehouse and the new Oracle Data Warehouse are kept in sync as applications are migrated. The remaining sections of this paper describe an alternate approach to table and data migration that provides a seamless and paced migration path.
TRANSPARENT GATEWAY (DATA GATEWAY)
Oracle Transparent Gateway (branded Data Gateway as of 11g; this paper uses Transparent Gateway to avoid confusion with
the general word “Data”) is an add-on product for Oracle Database that provides access to foreign data stores via Database
Links. Transparent Gateway is similar to Heterogeneous Services (included as part of the base Oracle Database license),
however, Transparent Gateway is built for specific foreign targets and as such enables features not available in Heterogeneous
Services such as foreign Stored Procedure calls and Heterogeneous Streams Replication.
VIEWS EMPLOYING TRANSPARENT GATEWAY
One way to fast-track the usefulness of your new Oracle Data Warehouse is to employ Views that link directly to the legacy data store. This approach can be used for key tables that will require planning and time to fully migrate, yet whose availability will greatly influence the adoption of the new Oracle Data Warehouse. Once you have settled on new table and column names that meet your standards, a View can be created similar to the following example:
CREATE OR REPLACE VIEW PLAY.VALUE_TABLE_SAMPLE AS
SELECT
"IDENTIFIER" AS ID_,
"VALUE" AS VALUE_,
FILE_DATE AS FILE_DATE
FROM
SampleLegacyTable@MSSQL;
Figure 4 : Oracle View to Legacy table
Perhaps your naming standards suggest that the prefix VIEW_ should be used for all Views. Keep in mind though that this
View is destined to become a physical table on Oracle once the data population process can be moved and a synchronization
strategy employed. This paper will assume some sort of ETL process is used for data population, but even transactional
tables can be considered for a staged migration using this approach so long as the locking model of the legacy system is
considered carefully.
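When that time comes, the stop-gap View can be promoted to a physical table without consumers changing their queries. The sketch below reuses the names from Figure 4; grants, indexing, and the ETL cut-over itself are environment-specific and omitted:

```sql
-- Materialize the current contents locally, then swap the View for the table.
CREATE TABLE PLAY.VALUE_TABLE_SAMPLE_NEW AS
SELECT ID_, VALUE_, FILE_DATE
  FROM PLAY.VALUE_TABLE_SAMPLE;

-- Brief outage window: consumers keep using the same name afterwards.
DROP VIEW PLAY.VALUE_TABLE_SAMPLE;
ALTER TABLE PLAY.VALUE_TABLE_SAMPLE_NEW RENAME TO VALUE_TABLE_SAMPLE;
```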
PITFALLS OF TRANSPARENT GATEWAY (VIEWS)
When developing queries against a View that uses Transparent Gateway such as the one shown above, it is important to
remember that these Views are meant as a stop-gap measure. Creating complex queries against these sorts of Views is a risky
venture. For example, consider the following query:
DECLARE
tDate DATE := DATE '2008-12-31';
BEGIN
INSERT INTO PLAY.TEMP_SAMPLE_7445
(ID_, NAME_, PREV_VALUE, CURR_VALUE,
VALUE_SUPPLIER, DATE_VALUE_CHANGED)
SELECT ID_, '', '', '', 'SAMPLE', MAX(FILE_DATE)
FROM PLAY.VALUE_TABLE_SAMPLE
WHERE FILE_DATE <= tDate
GROUP BY ID_;
END;
Figure 5 : Complex use of View can cause internal errors
This query, because it inserts into a table, selects constants, uses an aggregate function, filters using a variable, and employs a Group By clause, throws an ORA-03113: end-of-file on communication channel error (the alert log shows ORA-07445: exception encountered: core dump [intel_fast_memcpy.A()+18] [ACCESS_VIOLATION] [ADDR:0x115354414B] [PC:0x52A9DFE] [UNABLE_TO_READ] []).
While this particular problem is fixed in Oracle Database 11.1.0.6 patch 10 and 11.1.0.7 patch 7, getting the patch from Oracle took several months. The example is meant to illustrate that queries of increased complexity have a higher likelihood of failing or hanging. Used with that caveat in mind, Views over Transparent Gateway can still be a powerful tool to bridge data availability gaps in the short term.
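One hedged workaround when a complex query fails over the gateway is to stage the remote rows locally with a simple pass-through SELECT, so the aggregation runs entirely within Oracle. The sketch below reuses the names from Figure 5; PLAY.TEMP_STAGE_VALUES is a hypothetical local Global Temporary Table:

```sql
DECLARE
  tDate DATE := DATE '2008-12-31';
BEGIN
  -- Step 1: simple pass-through over the gateway, no aggregation or variables.
  INSERT INTO PLAY.TEMP_STAGE_VALUES (ID_, FILE_DATE)
  SELECT ID_, FILE_DATE
    FROM PLAY.VALUE_TABLE_SAMPLE;

  -- Step 2: the complex INSERT ... GROUP BY now touches only local data.
  INSERT INTO PLAY.TEMP_SAMPLE_7445
    (ID_, NAME_, PREV_VALUE, CURR_VALUE, VALUE_SUPPLIER, DATE_VALUE_CHANGED)
  SELECT ID_, '', '', '', 'SAMPLE', MAX(FILE_DATE)
    FROM PLAY.TEMP_STAGE_VALUES
   WHERE FILE_DATE <= tDate
   GROUP BY ID_;
END;
/
```

The extra staging step costs a full copy of the remote rows, so it suits modest volumes rather than bulk loads.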
STORED PROCEDURES EMPLOYING TRANSPARENT GATEWAY
In a similar vein to creating pass-through Views to quickly expose legacy data to the Oracle Data Warehouse, Stored Procedure wrappers can be created to provide an Oracle PL/SQL entry point for legacy stored procedures. This method can be particularly useful in preventing the creation of new application links directly to stored procedures within the legacy Data Warehouse when it is not possible to immediately migrate the logic contained within those procedures.
Consider the following Microsoft Transact-SQL stored procedure:
CREATE PROCEDURE dbo.GetScheduleForRange
@inStartDate DATETIME,
@inEndDate DATETIME
AS
SELECT DATE, DURATION, SESSION_ID, TITLE
FROM NorthWind..COLLABSCHED
WHERE DATE BETWEEN @inStartDate AND @inEndDate
Figure 6 : Transact SQL procedure
The following PL/SQL wrapper produces a simple yet effective Oracle entry point for the legacy procedure above:
CREATE OR REPLACE PROCEDURE PLAY.RPT_COLLABORATE_SCHEDULE_RANGE (
inStart_Date DATE,
inEnd_Date DATE,
RC1 IN OUT SYS_REFCURSOR) IS
tRC1_MS SYS_REFCURSOR;
tDate DATE;
tDuration NUMBER;
tSession_ID NUMBER;
tTitle VARCHAR2(256);
BEGIN
DELETE FROM PLAY.TEMP_COLLABORATE_SCHEDULE;
dbo.GetScheduleForRange@MSSQL(inStart_Date, inEnd_Date, tRC1_MS);
LOOP
FETCH tRC1_MS INTO tDate, tDuration, tSession_ID, tTitle;
EXIT WHEN tRC1_MS%NOTFOUND;
BEGIN
INSERT INTO PLAY.TEMP_COLLABORATE_SCHEDULE
(DATE_, DURATION, SESSION_ID, TITLE)
VALUES(tDate, tDuration, tSession_ID, tTitle);
END;
END LOOP;
CLOSE tRC1_MS;
OPEN RC1 FOR
SELECT DATE_, DURATION, SESSION_ID, TITLE
FROM PLAY.TEMP_COLLABORATE_SCHEDULE
ORDER BY SESSION_ID;
END RPT_COLLABORATE_SCHEDULE_RANGE;
Figure 7 : PL/SQL wrapper for legacy procedure
Regardless of the complexity of the body of the Transact-SQL stored procedure, a simple wrapper similar to the one above can be created using only knowledge of the required parameters, the structure of the result set, and a simple five-step formula:
1. Declare Variables for all Transact-SQL Result set columns
2. Call Transact-SQL Procedure
3. Fetch Result one row at a time
4. Insert row to Oracle Temporary Table
5. Open Ref Cursor result set
PITFALLS OF TRANSPARENT GATEWAY (STORED PROCEDURES)
Oracle Data Gateway for Microsoft SQL Server version 11.1.0.6 for Windows 32-bit contains a rather serious bug affecting calls to remote stored procedures that return result sets, when the caller actually attempts to retrieve the contents of the result set. Calling the procedure above using an ODBC driver:
{CALL PLAY.RPT_COLLABORATE_SCHEDULE_RANGE('2009-05-06 12:00:00', '2009-05-06 17:00:00')}
Results in ORA-06504: PL/SQL: Return types of Result Set variables or query do not match. This bug is not fixed until 11.1.0.7 Patch 7, which needs to be applied to the Gateway home (assuming the Gateway is installed in a different Oracle home than the Database).
ORACLE STREAMS AS AN ENABLER OF MIGRATION
Proxies for Views and Stored Procedures like the ones shown above can be helpful in making your new Oracle Data Warehouse useful in the early stages of a migration effort. How, then, can you begin to migrate tables and data to Oracle while still providing a transition period during which data is equally available in the legacy Data Warehouse? In any case you will need to start by developing a new load (ETL) process for the Oracle Data Warehouse. Perhaps you could simply leave the old ETL process running in parallel; with that approach, reconciliation would be a constant fear unless you have purchased an ETL tool that somehow guarantees both Data Warehouses are loaded or neither is. A more elegant approach that won't overload your ETL support people is to employ Oracle Streams Heterogeneous Replication.
Oracle Streams combined with Transparent Gateway allows for seamless Heterogeneous Replication back to the legacy Data Warehouse. Using this approach, the Data Warehouse staff need build and support only one ETL process, and DBAs support Oracle Streams like any other aspect of the database infrastructure.
ORACLE STREAMS – IF WE BUILD IT, WILL THEY COME?
Old habits die hard for Developers and Business users. Legacy systems have a way of surviving for a long time. How can
you motivate usage of the new Oracle Data Warehouse?
A strong set of metadata documentation describing how the new model replaces the old will be a key requirement in motivating a move toward the new Data Warehouse. Easy-to-read side-by-side tables showing the new vs. old data structures will be welcomed by your developers and users. Try to make these available in print as well as online, to cover everyone's preferred work habits. Be prepared to do a series of road-shows to present the new standards and a sample of the metadata. You will need to commit to keeping this documentation up to date as you grow your new Data Warehouse and migrate more of the legacy.
Occasionally your development group will find that it can no longer support a legacy application because it is written in a language or manner that no one completely understands any more. Make sure standards are put in place early, and have your Architecture Review staff enforce that any newly designed and engineered application accesses only the new Data Warehouse. Try to prioritize your warehouse migration according to the data assets such re-engineered applications require, to minimize exceptions and the number of proxy Views and Stored Procedures.
Some Data Warehouse access will never be motivated to migrate by anything other than a grass roots effort from the Data
Warehouse group. You may find that Business Unit developed applications have this characteristic. You should be planning
for a certain amount of Data Warehouse staff time that will be spent with owners of these (often smaller departmental)
solutions to help users re-target their data access to the new Data Warehouse.
Once in a while, a project will be sponsored that requires a significant overhaul of an application; so much so that the effort is
essentially a full re-write. Much like the circumstance of the development group refreshing the technology behind an
application, you want to be sure that the right members of the project working group are aware of the new Data Warehouse
standards. You should try to help them understand the benefits to the project in order to create an ally in assuring that the
proper Warehouse is targeted.
Finally, completely new solutions will be purchased or built. You should aim to be in the same position to have these
deployments target the new Oracle Data Warehouse as described in some of the situations above.
ORACLE STREAMS – BUILDING A HETEROGENEOUS STREAM
When building a Heterogeneous Streams setup, the traditional separated Capture and Apply model must be used. Much can be learned about the architecture of Oracle Streams by reading the Oracle Streams Concepts and Administration manual. In a nutshell, the Capture process is responsible for mining the archive logs and finding and queueing all DML that needs to be sent to the legacy Data Warehouse target; the Apply process takes from this queue and actually ships the data downstream to the legacy target.
In general, Streams is a memory-hungry process. Be prepared to allocate 2 to 4 gigabytes of memory to the Streams Pool. If you are employing RAC, explicitly split your Capture and Apply processes over multiple nodes to smooth memory usage across your environment. The value that Streams provides to your Data Warehouse migration strategy should more than pay for the memory resources it requires.
ORACLE STREAMS – CAPTURE PROCESS AND RULES
The Capture process is created the same way as any Homogeneous capture process would be and is well described in the
manual Oracle Streams Concepts and Administration. This paper will therefore not focus on the creation of the Capture process
further, except to show a script that can be used to create an example Capture process called “SAMPLE_CAPTURE” and a
Capture rule to capture the table “PLAY.COLLABORATE_SCHEDULE”:
BEGIN
DBMS_STREAMS_ADM.SET_UP_QUEUE(
queue_table => 'SAMPLE_STREAM_QT',
queue_name => 'SAMPLE_STREAM_Q',
queue_user => 'STRMADMIN'
);
END;
/
BEGIN
DBMS_CAPTURE_ADM.CREATE_CAPTURE(
queue_name => 'SAMPLE_STREAM_Q',
capture_name => 'SAMPLE_CAPTURE',
capture_user => 'STRMADMIN',
checkpoint_retention_time => 3
);
END;
/
BEGIN
DBMS_STREAMS_ADM.ADD_TABLE_RULES(
table_name => 'PLAY.COLLABORATE_SCHEDULE',
streams_type => 'CAPTURE',
streams_name => 'SAMPLE_CAPTURE',
queue_name => 'SAMPLE_STREAM_Q',
include_dml => true,
include_ddl => false,
include_tagged_lcr => false,
inclusion_rule => true
);
END;
/
Figure 9 : Oracle Streams – Standard Capture Rule
ORACLE STREAMS – TRANSPARENT GATEWAY CONFIGURATION
Before you begin building the Streams Apply process, a Transparent Gateway Database Link must be in place. The recommended configuration is to create a separate Database Link for your Streams processes, even if a Database Link to the same remote target is already available to applications and users. Doing so allows you to grant different permissions to the Streams user (e.g., the Streams link must be able to write to remote tables, while applications must not write to those same tables or the replication will fall out of sync!), and also provides flexibility to configure, upgrade, or patch the gateway for Streams differently than the gateway for applications and users.
Creating and configuring the Database Link for Streams is therefore like any other Database Link, except we will make it
owned by the database user STRMADMIN. This example shows a link named MSSQL_STREAMS_NORTHWIND that
links to the SQL Server Northwind database on a server named SQLDEV2:
#
# HS init parameters
#
HS_FDS_CONNECT_INFO=SQLDEV2//Northwind
HS_FDS_TRACE_LEVEL=OFF
HS_COMMIT_POINT_STRENGTH=0
HS_FDS_RESULTSET_SUPPORT=TRUE
HS_FDS_DEFAULT_OWNER=dbo
Figure 10 : Text file “initLDB_STREAMS_NORTHWIND.ora”
CREATE DATABASE LINK MSSQL_STREAMS_NORTHWIND
CONNECT TO STRMADMIN IDENTIFIED BY ********
USING 'LDB_STREAMS_NORTHWIND';
Figure 11 : DDL to create Database Link MSSQL_STREAMS_NORTHWIND
ORACLE STREAMS – APPLY PROCESS AND RULES
The Streams Apply Process is where the work to send rows to the Heterogeneous target occurs. Each step in the Apply
Process and Rules creation/configuration is worth looking at in some detail and so this paper will focus more closely on the
Apply Process configuration than previous steps.
When creating a Heterogeneous Apply Process, a Database Link is named. This means that in the design of your Streams
Topology, you will need to include at least one Apply Process for each “Database” on the target server. This is especially
important to consider when targeting Microsoft SQL Server or Sybase, as a Database in those environments is more like a
Schema in Oracle. Below is a script to create a sample Heterogeneous Apply process called
“SAMPLE_APPLY_NORTHWIND”:
BEGIN
DBMS_APPLY_ADM.CREATE_APPLY(
queue_name => 'SAMPLE_STREAM_Q',
apply_name => 'SAMPLE_APPLY_NORTHWIND',
apply_captured => TRUE,
apply_database_link => 'MSSQL_STREAMS_NORTHWIND'
);
END;
/
Figure 12 : Oracle Streams – Heterogeneous Apply
In a Heterogeneous Apply situation, the Apply Table Rule itself does not differ from a typical Streams Apply Table Rule.
Below is an example of an Apply Table Rule that includes the same table we captured in the sections above,
PLAY.COLLABORATE_SCHEDULE, as a part of the table rules for the Apply Process
SAMPLE_APPLY_NORTHWIND.
BEGIN
DBMS_STREAMS_ADM.ADD_TABLE_RULES(
table_name => 'PLAY.COLLABORATE_SCHEDULE',
streams_type => 'APPLY',
streams_name => 'SAMPLE_APPLY_NORTHWIND',
queue_name => 'SAMPLE_STREAM_Q',
include_dml => true,
include_ddl => false
);
END;
/
Figure 13 : Oracle Streams – Standard Apply Rule
ORACLE STREAMS – APPLY TRANSFORMS – TABLE RENAME
The Apply Table Rename transform is one of the most important steps in setting up Heterogeneous Streams because it is absolutely required unless you are applying to the same schema on the legacy Data Warehouse as the schema that owns the table in the new Oracle Data Warehouse. More likely, you have either redesigned your schemas to align with the current business model, or, in the case of a Microsoft SQL Server legacy, you have made Oracle Schemas out of the Databases on the SQL Server and the legacy owner of the tables is "dbo". You may also have taken the opportunity to create the table in the Oracle Data Warehouse using more accurate or standardized names. Below is an example of an Apply Table Rename transform that maps the new table PLAY.COLLABORATE_SCHEDULE to the legacy dbo.COLLABSCHED table in the Northwind database:
BEGIN
DBMS_STREAMS_ADM.RENAME_TABLE(
rule_name => 'COLLABORATE_SCHEDULE554',
from_table_name => 'PLAY.COLLABORATE_SCHEDULE',
to_table_name => '"dbo".COLLABSCHED',
step_number => 0,
operation =>'ADD');
END;
/
Figure 14 : Oracle Streams – Apply Table Rename rule
Notice that the rule name in this example is suffixed with the number 554. This number was chosen by Oracle in the Add Table Rule step. You will need to pull it out of the view DBA_STREAMS_RULES after executing the ADD_TABLE_RULES step, or write a more sophisticated script that captures the rule name using the overloaded ADD_TABLE_RULES procedure that returns it as an OUT parameter.
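A lookup along the following lines retrieves the generated rule name for the sample Apply process above. This is a sketch against the DBA_STREAMS_RULES view; verify the exact column names against your release's reference documentation:

```sql
-- Find the system-generated DML rule name for the table, so it can be
-- passed to RENAME_TABLE / RENAME_COLUMN.
SELECT RULE_NAME
  FROM DBA_STREAMS_RULES
 WHERE STREAMS_NAME = 'SAMPLE_APPLY_NORTHWIND'
   AND STREAMS_TYPE = 'APPLY'
   AND SCHEMA_NAME  = 'PLAY'
   AND OBJECT_NAME  = 'COLLABORATE_SCHEDULE'
   AND RULE_TYPE    = 'DML';
```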
One final note about the Rename Table transform: it is not possible to Apply to a Heterogeneous target table whose name is in mixed case. Microsoft SQL Server, for example, allows mixed-case table names. You will need to have your DBAs change the table names to upper case on the target before the Apply process will work. Luckily, Microsoft SQL Server is (under its default collation) case insensitive when it comes to using the tables, so while changing the table names to upper case may make a legacy "Camel Case" table list look rather ugly, nothing should functionally break as a result of this change.
ORACLE STREAMS – APPLY TRANSFORMS – COLUMN RENAME
The Column Rename transform is similar in nature to the Table Rename Transform. Notice in the example below how a
column is being renamed because the legacy table contains a column named “DATE” which is completely disallowed in
Oracle as a column name because DATE is a key word (data type). The same restriction applies to Column names as with
Table names in a Heterogeneous Apply configuration: All column names on the target must be in upper case. Again, this
should have no impact on your legacy code as systems that allow mixed case column names such as Microsoft SQL Server are
typically not case sensitive when using the column.
BEGIN
DBMS_STREAMS_ADM.RENAME_COLUMN(
rule_name => 'COLLABORATE_SCHEDULE554',
table_name => 'PLAY.COLLABORATE_SCHEDULE',
from_column_name => '"DATE_"',
to_column_name => '"DATE"',
value_type => '*',
step_number => 0,
operation => 'ADD');
END;
/
Figure 15 : Oracle Streams – Apply Column Rename rule
ORACLE STREAMS – EXERCISING THE STREAM
Assuming tables are set up in both the legacy and new Data Warehouses that differ in structure only by table name and one column name, the steps above are sufficient to put the Stream into action. Streams has no ability to synchronize tables that are already out of sync, so before starting the Stream you must ensure the table contents match exactly. Let's assume for now that you are starting with zero rows and plan to insert all the data after the Stream is set up. The screenshot below illustrates for this example that the legacy target table on Microsoft SQL Server is empty:
Figure 16 : Oracle Streams – Empty Microsoft SQL Server target table
Below is a rudimentary script showing seed data being inserted into the Oracle table. You would of course want a more sophisticated approach such as SQL*Loader; this sample is kept simple for the purposes of understanding and transparency:
SQL> INSERT INTO PLAY.COLLABORATE_SCHEDULE (DATE_,DURATION,SESSION_ID,TITLE) VALUES
('2009-05-06 08:30:00',60,359,'Oracle Critical Patch Updates: Insight and Understanding');
1 row inserted
SQL> INSERT INTO PLAY.COLLABORATE_SCHEDULE (DATE_,DURATION,SESSION_ID,TITLE) VALUES
('2009-05-06 11:00:00',60,237,'Best Practices for Managing Successful BI Implementations');
1 row inserted
SQL> INSERT INTO PLAY.COLLABORATE_SCHEDULE (DATE_,DURATION,SESSION_ID,TITLE) VALUES
('2009-05-06 12:15:00',60,257,'Best practices for deploying a Data Warehouse on Oracle Database 11g');
1 row inserted
SQL> INSERT INTO PLAY.COLLABORATE_SCHEDULE (DATE_,DURATION,SESSION_ID,TITLE) VALUES
('2009-05-06 13:30:00',60,744,'Business Intelligence Publisher Overview and Planned Features');
1 row inserted
SQL> INSERT INTO PLAY.COLLABORATE_SCHEDULE (DATE_,DURATION,SESSION_ID,TITLE) VALUES
('2009-05-06 15:15:00',60,387,'Migrating a Data Warehouse from Microsoft SQL Server to Oracle 11g');
1 row inserted
SQL> INSERT INTO PLAY.COLLABORATE_SCHEDULE (DATE_,DURATION,SESSION_ID,TITLE) VALUES
('2009-05-06 16:30:00',60,245,'Data Quality Heartburn? Get 11g Relief');
1 row inserted
SQL> COMMIT;
Commit complete
Figure 17 : Oracle Streams – Inserting to the new Oracle Data Warehouse Table
Allowing the Capture and Apply processes a few seconds to catch up, re-executing the query above against the legacy Data Warehouse shows that the rows have been replicated through Streams to the target.
Figure 18 : Oracle Streams – Populated Microsoft SQL Server target Table
ORACLE STREAMS - STREAMS SPEED AND SYNCHRONIZING TABLES IN ADVANCE
While the ability of Oracle Streams to seamlessly replicate data to a heterogeneous legacy target is phenomenal, Streams, and especially Heterogeneous Streams over Transparent Gateway, won't be knocking your socks off in terms of speed. At best, with today's hardware, you will see 500-600 rows per second flowing through to the target; in a fully built-up Data Warehouse, you're more likely to see 100-200 rows per second. Hopefully you'll be able to engineer your ETL processes so that this limited speed is not an issue, given the incremental nature of Data Warehouse loads. But suppose your Data Warehouse table needs to be seeded with 2 million rows of existing data. The smarter way to start in this case is to synchronize the tables before setting up the Stream. This approach comes with some extra considerations, outlined below.
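Whichever seeding method is used, a quick sanity check over the gateway can confirm the two sides match before the Capture process is started. This sketch reuses the link and table names from the examples above:

```sql
-- Compare row counts on both sides of the gateway before enabling Streams.
-- (A count match is necessary but not sufficient; spot-check key columns too.)
SELECT (SELECT COUNT(*)
          FROM PLAY.COLLABORATE_SCHEDULE) AS ORACLE_ROWS,
       (SELECT COUNT(*)
          FROM "dbo".COLLABSCHED@MSSQL_STREAMS_NORTHWIND) AS LEGACY_ROWS
  FROM DUAL;
```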
ORACLE STREAMS – EMPTY STRINGS VS. NULL VALUES
Microsoft SQL Server treats an empty string as distinct from a NULL value; Oracle, on the other hand, treats an empty string
as NULL. If you synchronize your tables outside of Streams, you must first ensure there are no empty strings in the Microsoft
SQL Server data. If you find a column that contains empty strings, some leg work may be required in advance to make sure no
consuming system will behave differently if it sees a NULL instead of an empty string.
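A pre-flight audit of the SQL Server side can be generated mechanically. The following is a minimal Python sketch; the table and column names are placeholders, and in a real environment you would enumerate the character columns from INFORMATION_SCHEMA.COLUMNS and run the generated queries with your preferred driver:

```python
# Sketch of a pre-flight audit for empty strings on the SQL Server side.
# Table and column names are illustrative placeholders.
def empty_string_audit(table, varchar_columns):
    """Return one COUNT query per column that flags empty (non-NULL) strings.

    In SQL Server, col = '' matches empty strings but not NULLs, which is
    exactly the population that would silently become NULL in Oracle.
    """
    return [
        "SELECT COUNT(*) FROM {t} WHERE {c} = ''".format(t=table, c=col)
        for col in varchar_columns
    ]

queries = empty_string_audit("dbo.COLLABORATE_SCHEDULE", ["TITLE"])
```

Any column whose audit query returns a non-zero count needs the consuming-system review described above before the sync.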
ORACLE STREAMS – SYNCHRONIZING TABLES CONTAINING FLOATS
One of the simplest ways to synchronize the new Oracle Data Warehouse table with the legacy table is an Insert/Select
statement that selects the data over Transparent Gateway and inserts it into the Oracle target. A set operation via
Transparent Gateway will, after all, run orders of magnitude faster than Streams operating row by row. Unfortunately, if your
data contains Float or Real columns in Microsoft SQL Server, this method will not work, due to a limitation in Transparent
Gateway. The limitation is best illustrated with an example. Below is a sample of a couple of floating point numbers being
inserted into a Microsoft SQL Server table. Note the final two digits of precision:
Figure 19 : Oracle Streams – Floats in Microsoft SQL Server
Now have a look at the very same table selected via Oracle Transparent Gateway. Notice how in either case, whether using the
default display precision or explicitly forcing Oracle to show 24 digits of precision, the last two digits are missing
compared to the Select run directly on SQL Server above:
Figure 20 : Oracle Streams – Floats over Transparent Gateway
A fact that is unintuitive and yet undeniably clear once you begin working with Heterogeneous Streams: when locating rows on
the target, Oracle Streams requires exactly the digits of precision that are missing from the Gateway Select. If you were to
sync up the table shown above to an equivalent Oracle table using an Insert/Select over Transparent Gateway, set up a Capture
and Apply process linking the tables, and finally delete from the Oracle side, the Streams Apply process would fail with a
“No Data Found” error when it went to find the SQL Server rows to delete.
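The underlying arithmetic can be sketched outside the database. A SQL Server Float is an IEEE-754 double, which needs up to 17 significant decimal digits to round-trip exactly; the Python sketch below (with an illustrative value, not the paper's data) shows how a value fetched at only 15 digits no longer matches the stored one:

```python
# Illustration of the precision mismatch: an IEEE-754 double needs up to
# 17 significant decimal digits to round-trip exactly. A fetch that
# exposes only 15 digits yields a value that no longer equals the stored
# double, so a lookup keyed on it finds nothing.
original = 0.1234567890123456789          # stored double (illustrative)
fifteen_digits = float("%.15g" % original)    # what a 15-digit fetch returns
seventeen_digits = float("%.17g" % original)  # full round-trip representation

assert seventeen_digits == original   # 17 digits preserve the value
assert fifteen_digits != original     # 15 digits do not: the row "disappears"
```

This is the same effect as the Apply process failing to find a row whose Float key it only knows at reduced precision.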
The most reliable way to synchronize the two sides in preparation for Streams is to extract the rows to a comma separated
value file, and then use SQL*Loader to import the data into Oracle. Below is an example of using Microsoft DTS to generate
the CSV file, along with proof that the CSV file contains all required digits of precision:
Figure 21 : Microsoft DTS used to extract seed data to CSV
Figure 22 : CSV file produced by DTS contains full precision
CONCLUSION
Committing to a new Data Warehouse technology is a difficult decision for an organization to make. The effort in executing
the migration is costly in terms of time and resources. Remember to respect these facts when making your case to
management. Remain confident in your recommendations and plan, but unfold these in a paced fashion that allows you time
to build your message and allows those around you the space to come to terms with the requirements.
While migration tools such as Oracle Migration Workbench can help migrate certain Data Warehouse assets, the bigger challenge
is executing a seamless migration over a period of time. Focus on a strategy that enables the new Oracle Data Warehouse while
maintaining reliable service on the legacy platform throughout your parallel-run period.
Employ tools such as Oracle Transparent Gateway or Oracle Heterogeneous Streams to enable your migration strategy, but
be prepared to weather the storm. Because these products are more niche than the core features of the Oracle Database,
limitations and product bugs will surface along the way.
Finally, old habits will be hard to break for your developers and business users. Be sure to consider the standards, metadata,
education, and mentoring that your consumers will require in order to make your new Oracle Data Warehouse deployment an
overwhelming success.