Loading a large volume of Master Data Management
data quickly: Using MDM Server maintenance services
batch
Neeraj Singh (sneeraj@in.ibm.com)
Advisory Software Engineer
IBM  

14 August 2009

Yongli An (yongli@ca.ibm.com)
MDM Performance Manager
IBM
The maintenance services for IBM InfoSphere™ Master Data Management Server solution
address the needs of clients in the first phase of implementing initial load solutions. Using
MDM, clients need to perform initial and delta loads, typically as a batch. This article focuses
on the maintenance transaction approach to perform initial loads, including an introduction,
installation, and setup. It also covers performance tuning tips and best practices. You can
leverage recommendations in this article as guidance in your own MDM Server initial load
solutions using maintenance services.

Introduction
IBM InfoSphere Master Data Management Server (MDM Server) is an enterprise application that
helps companies gain control of business information by enabling them to manage and maintain a
complete and accurate view of their master data. MDM Server provides a unified operational view
of their customers, accounts, and products, and it provides an environment that processes updates
to and from multiple channels. It aligns these front office systems with multiple back office systems
in real time, providing a single source of truth for master data.
The maintenance services for IBM InfoSphere Master Data Management (MDM) Server solution
is built to address the needs of clients in the first phase of implementing initial load solutions. At
this stage, clients deploy InfoSphere MDM Server for master data management, when data is
loaded into the MDM Server repository but most data changes are still coming from existing legacy
systems. With MDM Server, the client performs initial and delta loads, typically in a batch. Initial
load is the original movement of data from source systems into the MDM Server repository when
the repository is empty. Delta loads are regular (such as daily) data updates from source systems
into InfoSphere MDM Server.
There are two different approaches to loading data into InfoSphere MDM Server in batch.
The maintenance service batch approach loads data into InfoSphere MDM Server using the
maintenance services invoked by the Batch Processor. Alternatively, data can be loaded directly
into the database using DataStage jobs.
This article shares an IBM team's experience performing case studies focusing on the
Maintenance Transaction approach using InfoSphere MDM Server version 8.0.1.
The article starts with an introduction to MDM Server Maintenance Transactions. Then it goes
on to cover the basic installation and setup steps of the MDM Server environment, including
DB2® database server, WebSphere® Application Server, InfoSphere MDM Server, MDM Server
Maintenance Transactions, and batch processor. The article covers a high-level summary of key
performance results based on internal case studies. It concludes with a list of performance tuning
tips and best practices to get optimal performance while doing initial data load. Using this article,
you can leverage the IBM team's experience, and you can use recommendations as guidance in
your own InfoSphere MDM Server initial load solutions.

Introducing the MDM Server service batch approach
The MDM Server service batch approach loads data into MDM Server using the maintenance
transactions, invoked either by the Batch Processor or by any other batch framework. Because MDM Server
services process the data during load, this approach provides the best level of business data
validation. You can use the same set of maintenance transactions for both initial and delta loads.
To create the setup that uses this option, you need to install InfoSphere MDM Server capable of
running maintenance transactions. You also need to prepare the input data in a format that the
Batch Processor can consume.

What are maintenance transactions?
InfoSphere MDM Server creates a unique internal identifier for each record or business entity that
serves as its internal key. The regular InfoSphere MDM Server services expect the internal key to
be provided as part of the update service request, to ensure that services can identify the correct
business entity in the database. However, when data flows into InfoSphere MDM Server directly
from external applications such as legacy systems, the internal key is not known, and often the
nature of the data change is also not known.
Maintenance transactions address this problem. These transactions do not require the internal
key as part of the input. They also do not require the external system to specify whether this entity
needs to be added or updated in InfoSphere MDM Server. Instead of the internal key, maintenance
transactions expect the business key as part of the input, which is the unique identifier of the
business entity in external applications. Maintenance transactions use the business key provided
in the load operation to locate the correct instance of the business entity in the database. If an
existing entity is found, it is updated using the appropriate transaction, such as updateParty. If no
existing entity is found, a new entity is created in InfoSphere MDM Server using the appropriate
transaction, such as addParty.
There are many types of maintenance transactions, including maintainParty,
maintainPersonName, and maintainContractPlus. For a complete
list of the transactions and more details about them, refer to the
MDMRapidDeploymentPackage_CompositeMaintenanceServices.pdf document, available as part
of the EntryLevelMDM patch.
Maintenance transactions are not part of the default InfoSphere MDM Server 8.0.1 distribution and
installation. You need to obtain and install the EntryLevelMDM patch to use these transactions.
Note: Maintenance transactions are part of the default InfoSphere MDM Server 8.5 distribution. They
are provided with source code as part of the MDM Server Samples distribution archive. You need
to install them on top of an existing InfoSphere MDM Server 8.5 instance. See Resources for a link
to instructions. It's recommended that you get assets from the FTP site mentioned in the Get the
Installer section in this article to ensure you have the latest version.

Batch transaction processing
You can use maintenance transactions to load data using the MDM Server Batch Processor, or you
can invoke them like any other service exposed by MDM Server using the RMI or JMS messaging
mechanisms. This article focuses on the batch invocation method. InfoSphere MDM Server
provides two ways to perform batch transaction processing. You can use either the J2SE
Batch processor framework or the WebSphere Application Server eXtended Deployment batch
framework. This article focuses on the first option: the J2SE Batch Processor framework.
The J2SE Batch processor framework is a J2SE client application, and it is part of a default
InfoSphere MDM Server installation. The batch processor is a multi-threaded application that can
process large volumes of batch data. It can process multiple records from the same batch input
simultaneously, increasing the throughput. Additionally, you can run multiple instances of the batch
processor simultaneously, each one processing a separate batch input and pointing to the same
server or to different servers.
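One way to arrange several concurrent Batch Processor instances, sketched below, is to give each instance its own copy of the BatchProcessor directory so that each copy carries its own properties and log files. The directory layout and the assumption that each copy's input is selected through its own Batch.properties are illustrative rather than documented requirements; check runbatch.sh in your installation for any arguments your version expects.

# Hypothetical layout: one BatchProcessor copy per instance, each with its own
# Batch.properties (input file, submitter count) and its own logs.
MDM_HOME=/usr/IBM/MDM_801
for i in 1 2; do
  cp -r "$MDM_HOME/BatchProcessor" "$MDM_HOME/BatchProcessor_$i"
  # ...edit BatchProcessor_$i/properties/Batch.properties for this instance's input...
  ( cd "$MDM_HOME/BatchProcessor_$i/bin" && nohup ./runbatch.sh > batch_$i.out 2>&1 & )
done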
Each batch record in the batch input flows through the batch processor in the following sequence:
1. The reader consumer reads the record from the batch input. The submitter consumer sends it
to the request/response framework for parsing and processing.
2. The parser transforms the input request into one or more business objects.
3. After passing through business proxy, business processing and persistence logic are applied
to the business objects.
4. The application responses are sent to the constructor in order to construct the desired batch
output response.
5. The constructed response is returned to the batch processor.
6. The writer records the transaction outcome in the writer log, if necessary. For example,
FailedWriter logs any failed messages.
The batch processor is shipped with pre-built readers and writers that can be used as is. The
default reader expects the batch input to be XML data in which each line contains one XML
request. The default writer writes the response in XML format. You can also use the InfoSphere
MDM Server batch processor to process batch files containing messages in SIF format.
If your input data is not in one of these formats, you need to convert it to the required
format, or use a customized reader and parser. It is possible to customize many of the components
of the Batch Processor, but customization is not within the scope of this article.
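For example, if each request arrives as a pretty-printed XML file, a small script along the following lines can collapse the files into the one-request-per-line layout that the default reader expects. The directory and file names are placeholders only.

# Collapse each pretty-printed request file into a single line and concatenate the
# results into one batch input file (paths are illustrative).
for f in requests/*.xml; do
  tr -d '\n\r' < "$f" | sed 's/>[[:space:]]*</></g'
  echo
done > batch_input.xml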

Understanding software and hardware requirements
The following is a typical system topology for InfoSphere MDM Server deployment using
QualityStage from Information Server for Standardization and Matching:
• Application Server and InfoSphere MDM Server are installed on one physical box or LPAR
with the correct CPU capacity (Server1). The number of CPUs depends on the overall
throughput requirements.
• The database server is installed on another physical box or LPAR (Server2) with well-equipped I/O capacity.
• IIS Server should be installed either on the database server or on a third physical box or
LPAR (Server3) with adequate IO bandwidth.
• IIS Client is used to configure QS jobs, and it is installed on a Windows® computer.
To maximize performance for a given configuration, follow these general guidelines:
• The ratio of the number of CPUs on the InfoSphere MDM Server box to the database server can
range from 2:1 to 3:1. For example, if you have a database server with 4 CPUs, the recommended
number of CPUs on the MDM Server box is at least 8 in order to fully utilize the CPU
capacity of the database server.
• You should have 5 to 10 physical disk spindles available for each CPU on the database
server.
• The ratio of the number of CPUs on InfoSphere MDM Server and IIS server can range from
2:1 to 1:1. For example, if you have MDM Server with 8 CPUs, the recommended number of
CPUs on the IIS server box is between 4 and 8.
Note: You only need IIS server if you plan to use QualityStage for standardization and matching
(such as suspect processing). InfoSphere MDM Server default configuration does not use
QualityStage.

Exploring the example environment
This section briefly describes the example environment, including hardware and software
information, in each layer in the stack. It also describes the system topology used in the tests.

Software and hardware stack
• Server 1 (AppServer and InfoSphere MDM Server)
• Hardware
• Machine type: IBM 9116-561, PowerPC® POWER5™
• CPUs: 8 core Power5 with 16 threads, 1.5GHz , 64 bit
• Memory/IO: 32 GB RAM, 6 internal disks
• Software
• OS : AIX® Version 5300-06 (64 bit)
• WebSphere® Application Server ND 6.1.0.11 (32 bit)
• InfoSphere MDM Server 8.0.1 + EntryLevelMDM patch
• Server 2 (DB2® database Server)
• Hardware
• Machine type: IBM 9116-561, PowerPC POWER5
• CPUs: 8 core Power5 with 16 threads, 1.5GHz , 64 bit
• Memory/IO : 32 GB RAM, 6 internal disks + 40 external disks
• Software
• OS : AIX Version 5300-06 (64 bit)
• DB2® database server v9.5 (64 bit)
• Server 3 (Information Server)
• Hardware
• Machine type: IBM 9116-561, PowerPC POWER5
• CPUs: 8 core Power5 with 16 threads, 1.5GHz , 64 bit
• Memory/IO : 32 GB RAM, 6 internal disks
• Software
• OS : AIX Version 5300-06 (64 bit)
• IIS v8.0.1
• Server 4 (IIS Client - To configure QualityStage jobs, not needed while running the test)
• Hardware
• 32 bit x86 machine
• Software
• OS : Windows 2003 Server
• IIS client version 8.0.1 for Windows

System topology
For InfoSphere MDM Server to use QualityStage jobs for standardization and matching, you need
Server3 and Server4, as shown in Figure 1. For default standardization and matching algorithms
from InfoSphere MDM Server, Server1 and Server2 are sufficient.

Figure 1. System topology

Installing the components
The purpose of this section is to show the high-level steps required to get the needed software
installed in the test environment. The steps focus on the Maintenance services-related steps, while
briefly mentioning the prerequisite software installation, including WebSphere® Application Server,
DB2 database server, InfoSphere MDM Server, and InfoSphere Information Server.

Installation prerequisites
The prerequisite installations include WebSphere Application Server, DB2 database server, and
InfoSphere Information Server. For installation instructions, see each product's Information Center
in Resources.
1. On Server1, install IBM WebSphere Application Server Network Deployment, Version 6.1, and
upgrade it with Fixpack 11.
2. On Server2, install DB2 Database Server, Version 9.5.
3. On Server3, install IIS Server, Version 8.0.1.
4. On Server4 (Windows machine), install IIS client.

InfoSphere MDM Server Installation
For InfoSphere MDM Server installation, see Resources for a link to the information center. You
can install it on a standalone WebSphere Application Server or on a WebSphere Application
Server cluster.

Installation of Entry Level MDM Server patch for maintenance services
Follow the steps in this section to apply the Entry Level MDM (ELMDM) Server patch, which
enables you to use maintenance transactions.

These instructions assume that you have already installed InfoSphere MDM Server and have
applied all the required fixpacks. They are based on the software stack described in the
example environment section above.
Step 1. Get the installer.
Maintenance transactions are not part of the default installation of MDM Server, and they need
to be installed separately. If you have a service agreement with IBM, you can get the installer
for maintenance transactions by logging into the Secure File Transfer site and finding
https://testcase.boulder.ibm.com/www/prot/MDM_RDP/?T. At the time of writing, the latest installable package is
https://testcase.boulder.ibm.com/www/prot/MDM_RDP/MDMServer801_RDP801/ELMDM-20090407.tar.gz.
Contact your IBM service representative if you need help getting this
package.
For more instructions, see the chapter titled Installing Rapid Deployment Package for
MDM Server Maintenance transactions and MDM Customizations in the document
MDMRapidDeploymentPackage_InstallGuide.pdf. You can find this document under the directory
Docs when you uncompress the installer.
Step 2. Make required backups before installing.
The installer makes changes to the InfoSphere MDM Server Database. As a precaution, you
might want to make a backup of this database before running the installer. The installer creates
backup copies of files that it changes. These files are named *.beforeELMDM. However, they
get overwritten during subsequent installer runs. So before you invoke the installer again for any
reason, ensure you have moved the previous set of files to a safe place.
The files modified by the installer are:
• MDM Server home directory installable .ear file. For example, /usr/IBM/MDM_801/
installableApps/MDM.ear
• A set of files in the <MDM_Instance>.ear directory under WebSphere Application Server. For
example, /opt/IBM/WebSphere/AppServer/profiles/AppSrv1/installedApps/myHostCell01/
MDM_801.ear/
Step 3. Prepare the installer.
Complete the following steps to prepare the installer (a consolidated command-line sketch follows Listing 1).
a. Create a new base directory named setup.
b. Extract the installer (.tar.gz file) in this directory. It creates several directories, including one
named install.
c. Go to the DB2 scripts directory, setup/install/DB2.
d. Give execute permissions for all the scripts using the command chmod 755 *.sh
e. Connect to the InfoSphere MDM Server database and execute the SQL below. The schema
name is assumed to be mySchema.

Listing 1. SQL to execute
db2 "insert into mySchema.DataAssociation values
(25083715210700005,'a_name',current_timestamp,'a_description',null)"
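For reference, steps 3a through 3d map onto shell commands like the following. The installer file name is the one from Step 1, the DB2 scripts directory is the one named in step 3c, and the paths should be adjusted to your download.

mkdir setup
cd setup
# Extract the installer obtained in Step 1 (use gunzip | tar where tar has no -z option)
gunzip -c /path/to/ELMDM-20090407.tar.gz | tar -xf -
cd install/DB2          # the DB2 scripts directory from step 3c
chmod 755 *.sh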

Step 4. Customize a clustered environment.
This step is not required if your MDM Server is a standalone server. If you are installing ELMDM
on a Clustered MDM Server installation (MDM Server running on a cluster of WebSphere
Application Servers), make the following modifications in the scripts.
a. In setVariables.sh, add the line in Listing 2 at the beginning of the script. NAME_OF_SERVER
refers to the name of the WebSphere Application Server instance that is a member of the
cluster.

Listing 2. Added line
#add the line below
export SRV_NAME=NAME_OF_SERVER

b. In the scripts install_DisableHVL.sh, install_EnableHVL.sh, and install_ELPCustom.sh, make
the changes shown in Listing 3.

Listing 3. Changes to script files
#comment out the line below and replace with the new line as shown below
#$CURRENT/restartServer.sh $WAS_HOME $NODE_NAME $APP_NAME $ADMIN_USER $ADMIN_PASSWORD
#add the line below
$CURRENT/restartServer.sh $WAS_HOME $NODE_NAME $SRV_NAME $ADMIN_USER $ADMIN_PASSWORD

c. In the install_ELPTx.sh script, make the changes in Listing 4.

Listing 4. The install_ELPTx.sh script
#comment out the line below and replace with the new line as shown below
#$LOC/restartServer.sh $WAS_HOME $NODE_NAME $APP_NAME $ADMIN_USER $ADMIN_PASSWORD
#add the line below
$LOC/restartServer.sh $WAS_HOME $NODE_NAME $SRV_NAME $ADMIN_USER $ADMIN_PASSWORD

Step 5. Optionally modify the installer to help in debugging.
Complete the following steps to modify the installer to debug.
a. At the beginning of each script, add set -x
b. Add the verbose option to db2 calls by replacing all occurrences of db2 -tf with db2 -tvf in the
scripts below (a scripted version of this substitution follows the list):
• runsql.sh
• install_ELPCustom.sh
• install_EnableHVL.sh
• install_DisableHVL.sh
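The substitution in step b can be scripted; the following sketch rewrites the four scripts in place through a temporary file, using only plain POSIX tools so it also works where GNU sed's -i option is unavailable.

for f in runsql.sh install_ELPCustom.sh install_EnableHVL.sh install_DisableHVL.sh; do
  sed 's/db2 -tf/db2 -tvf/g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done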
Step 6. Set your environment variables.
Modify the setVariables.sh script according to your environment. The values given in Listing 5 are
examples. Read the comments and instructions embedded within the example.

Listing 5. Extract from the setVariables.sh script
export WAS_HOME=/opt/IBM/WebSphere/AppServer
export CELL_NAME=myhostCell01
#set the profile name used by WAS running MDM Server. such as AppSrv01 and Custom01
export NODE_NAME=Custom01
export APP_NAME=MDM_801
#The Name of the WebSphere Application Server running MDM Server,
#You will have this only if you followed Step 4 above
export SRV_NAME=Cluster_member1
export INSTALL_HOME=/usr/IBM/MDM_801
# IIS Server Version: Could be 801 or 81
export IIS_SRV_VERSION=801
export DB_NAME=MDMDB
export DB_USER=myDBuser
export DB_PASSWORD=myDBpassword
export TABLE_SPACE=TABLESPACE1
export INDEX_SPACE=INDEXSPACE1
export LONG_SPACE=LONGSPACE1

export TRIG=COMPOUND
export DEL_TRIG=TRUE
export APPLICATION_NAME='WebSphere Customer Center'
export APPLICATION_VERSION=8.0.1.0
export DEPLOY_NAME=MDM_801
#You need to set this only if you are integrating QualityStage with MDM Server.
#Please note the back slashes. The number 2809 here refers to the
#bootstrap port of WebSphere Application Server instance running IIS server.
export ISP_URL='iiop://myIISserver.mylab.ibm.com:2809'

Step 7. Execute the scripts.
a. Execute install_ELPTx.sh.
b. If you are integrating InfoSphere MDM Server with QualityStage, run the
install_ELPCustom.sh script as well.
Step 8. Check for errors.
Go through all the log files to ensure there are no errors.
Step 9. Repeat steps for a clustered environment.
If you are installing in a clustered environment, complete the steps below for each cluster member.
a. Reconfigure setVariables.sh to point to another cluster member.
b. Run the additionalClusterInstall.sh script.
c. If you are integrating InfoSphere MDM Server with QualityStage, run the
install_ELPCustom.sh script.

Note: As part of the install_ELPCustom.sh script, there are changes made to InfoSphere MDM
Server database. Some of these changes cannot be executed more than once (such as a DB
insert). Either ignore these errors during repeated execution of this script, or alter the script so that
it does not attempt to repeat the database operations.
Step 10. Configure the SIF parser.
Complete this step only if you want to use a SIF parser. Otherwise, skip to Step 11. The example
uses the default XML parser. To configure the batch processor to use the SIF parser, modify the
following:
a. In the DWLCommon_extention.property file, which is inside properties.jar in the server runtime
environment, set sif_compatibility_mode = on.
b. In the batch extension property file, set ParserAndExecConfiguration.Parser = SIF.
For more details, see the section SIF Parser in
MDMRapidDeploymentPackage_CompositeMaintenanceServices.pdf.
Step 11. Restart the InfoSphere MDM Server.
Restart the InfoSphere MDM Server, including all the servers in a cluster.
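On a standalone server, this restart can be done from the command line; for example (profile path, server name, and credentials below are placeholders for your environment):

# Stop and start the application server running InfoSphere MDM Server
/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/bin/stopServer.sh server1 -username wasadmin -password secret
/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/bin/startServer.sh server1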

Integration of InfoSphere MDM Server with QualityStage
If you want to use default standardization and matching algorithms from InfoSphere MDM
Server, these steps are not needed, and you can continue to Optimizing performance with key
configuration parameters. However, if you want InfoSphere MDM Server to use QualityStage for
standardization and matching, this section describes how to configure them.
These instructions assume the following:
• InfoSphere MDM Server is installed and all the required fixpacks are applied.
• EntryLevelMDM is installed.
• The IIS server and IIS client are installed. The version of the IIS client must be the same as
that of the IIS server.
• The software stack is similar to that described in the Software and hardware stack section of
the example environment.
See Resources to access the documentation for InfoSphere MDM Server and QS integration
(MDM Server Developers Guide, chapter titled Integrating IBM Information Server QualityStage
with IBM InfoSphere Master Data Management Server). The instructions in this article complement
those mentioned in the developer's guide. However, there are a few configuration changes
mentioned in this article that are helpful during the installation.
Step 1. Change security settings.

If global security is enabled on the WebSphere Application Server running IIS, the transaction
protocol security on that server must be disabled. To disable protocol security on a server,
complete the following steps in the administrative console:
a. In the administrative console, click Servers > Application Servers > server_name. The
properties of the application server are displayed in the content pane.
b. Under Container Settings, expand Container Services and click Transaction Service to
display the properties page for the transaction service.
c. Under Additional Properties, click Custom Properties.
d. On the Custom Properties page, click New.
e. Type DISABLE_PROTOCOL_SECURITY in the Name field, and type TRUE in the Value
field.
f. Click Apply or OK.
g. Click Save to save your changes to the master configuration.
h. Restart the server.
Optionally, if WebSphere Application Server application security is turned on for InfoSphere MDM
Server, the LTPA keys need to be shared between the MDM WebSphere Application Server cell
and the IIS WebSphere Application Server cell. For detailed instructions, refer to the WebSphere
Application Server Information Center (see Resources).
Step 2. Get the installer.
The installable components are part of the same bundle that you used while installing maintenance
services. You will find them in the QualityStage folder.
Step 3. Create the IIS project.
Use the IIS Administrator Client to connect to the IIS server. Create a new project called
ELMDMQS.
Step 4. Import the IIS project.
1. Log into the ELMDMQS project through the DataStage and QualityStage Designer.
2. Click Import > Datastage Components.
3. Browse to the ELMDMQS.dsx file under the EntryLevelMDMQualityStage folder you
extracted above.
4. Import the file.
Step 5. Provision imported rule sets.
You need to provision imported rule sets to the designer client before a job that uses them can be
compiled. Complete the following steps to provision imported rule sets.
a. In the Designer client, find the rule set within the repository tree ELMDMQS > ELMDMRT >
Standardization Rules > MDMQS.
b. Select the rule set by right-clicking and selecting Provision All from the menu, as shown in
Figure 2.

Figure 2. Provisioning rule sets

c. Repeat the steps for all the rulesets listed below.
• MDMQSStandardization RulesMDMCanadaCAADDRMDMCAADDR
• MDMQSStandardization RulesMDMCanadaCAAREAMDMCAAREA
• MDMQSStandardization RulesMDMUSAUSADDRMDMUSADDR
• MDMQSStandardization RulesMDMUSAUSAREAMDMUSAREA
• MDMQSStandardization RulesMNADKEYSMNADKEYS
• MDMQSStandardization RulesMNNAMEMNNAME
• MDMQSStandardization RulesMNNMKEYS
• MDMQSStandardization RulesMNPHONEMNPHONE
• MDMQSStandardization RulesMNSPOSTMNSPOST
Step 6. Prepare test data and configure parameters
a. Copy the provided test data (*.csv files and *.txt) into a directory on your IIS server (not the IIS
client) called /data01/ELMDMQS.
b. Open the parameter set ELMDMQS_Data_Directory under ELMDMQS > ELMDMRT > Parameter
Sets (in the Repository view of the designer).
c. Double-click on the Parameter set.
d. Go to the Values tab and set the value of the parameter DATADIR to the directory path into
which you just copied the test data (/data01/ELMDMQS/ in this example), as shown in Figure
3. Note the slash (/) at the end of the parameter value.

Figure 3. Parameter set

e. Under the ELMDMQS > ELMDMRT > Shared Containers folder, double-click to open the shared
container MDMQSPartySuspectReferenceMatchOrganization.
f. Set the file paths of the data set stages Data_Frequency and Reference_Frequency to the same
path that you provided for ELMDMQS_Data_Directory.DATADIR in the previous step, as
shown in Figure 4.

Figure 4. Edit input file path

g. Click OK to save the changes.
h. Close the stage, clicking Yes when it prompts you to save the changes in the stage.
i. Repeat the above steps for MDMQSPartySuspectReferenceMatchPerson.
Step 7. Compile the jobs.
a. Compile all the jobs inside the ELMDMQS > ELMDMRT > Jobs folder and its subfolders using
Tools > Multiple Job Compile from the Designer client's menu.
b. Follow the instructions in the wizard, and start compiling.
Note: Batch versions of the jobs can be found in the ELMDMQS > ELMDMRT > Jobs folder. Information
Services Director (ISD) versions of these jobs can be found in the ELMDMQS > ELMDMRT > Jobs > ISD
folder.
Step 8. Generate match frequency data.
a. Use the Director client to run the job ELMDMQS > ELMDMRT > Jobs >
MDMQS_Person_Match_Frequency_Generation to generate the match frequency
data. When completed, it generates the files PersonRefMatchTransFreq.txt and
PersonRefMatchCandFreq.txt, as shown in Figure 5.

Figure 5. Generating match frequency data

b. Similarly, run ELMDMQS > ELMDMRT > Jobs > MDMQS_Org_Match_Frequency_Generation to
generate the files OrgRefMatchTransFreq.txt and OrgRefMatchCandFreq.txt.
Step 9. Run the test jobs.
a. Use the director client to run the following batch jobs to test that they execute successfully on
your system before you use the ISD jobs:
• All jobs in ELMDMQS > ELMDMRT > Standardization Testing
• All jobs in ELMDMQS > ELMDMRT > Match Testing
b. After running the jobs, view the output in the Sequential file to check the results.
Step 10. Deploy services using ISD.
a. Log on to the IBM Information Server (IIS) console.
b. Click File > Import Information Services Project, and browse for the file
ELMDMQS_ISDProject.xml in the EntryLevelMDMQualityStage directory.
c. Keep all the default settings, and click Import.
d. Open the Information Service Application (ELMDMQS) contained in the imported project.
e. Click Develop, as shown in Figure 6.

Figure 6. Selecting the Develop icon

f. Click Information Services Application.
g. On the resulting screen, double-click the ELMDMQS application to open it.
h. Go into Edit mode.
i. In the Select a View window, click Services > ELMDMQSService, as shown in Figure 7.

Figure 7. Configuring jobs using ISD

j. In the expanded tree, select Operations, and double-click the operations one at a time to edit
each of them.

Figure 8. Checking the project name

k. Edit each of the operations as follows:
i. Ensure that the project name is correct, as shown in Box 1 in Figure 8. When you
created the new project using the administration client, if you chose ELMDMQS as the
name of the project, you can keep the defaults. If you specified another name, ensure
that the project name and the job names are correct. To check the project and job
names, click the Edit button, and browse to the project and job in the ISD folder.
ii. Ensure that the Group Arguments into Structure option is enabled for inputs, as shown in
Box 2 in Figure 8.
iii. Change the input data type according to Table 1 below, as shown in Box 3 in Figure 8.
iv. Check or uncheck the Accept array checkboxes according to Table 1, as shown in Box 4
in Figure 8 (the checkbox should show a checkmark if the table entry indicates Yes).
v. Check or uncheck the output data type and Accept array checkboxes on the output tab
according to Table 1.

Table 1. ISD job configuration

Operation name | Operation job name | Inputs accept array | Input data type | Outputs return array | Output data type
standardizeAddress | ISD_MDMQS_Address_Standardization | No | AddressInput | No | AddressOutput
elPersonMatch | ISD_MDMQS_Party_Suspect_Reference_Match_Person | Yes | ELPersonMatchInput | Yes | ELPersonMatchOutput
elOrgMatch | ISD_MDMQS_Party_Suspect_Reference_Match_Org | Yes | ELOrgMatchInput | Yes | ELOrgMatchOutput
standardizePhoneNumber | ISD_MDMQS_Phone_Standardization | No | PhoneNumberInput | No | PhoneNumberOutput
standardizeOrgName | ISD_MDMQS_Organization_Standardization | No | OrgNameInput | No | OrgNameOutput
standardizePersonName | ISD_MDMQS_Person_Standardization | No | PersonNameInput | No | PersonNameOutput

l. On the Provider Properties tab, modify the credentials according to your setup, as shown in
Figure 9.

Figure 9. Modifying your credentials

m. Save and close the application.
n. Deploy the application by clicking on the Develop menu. Figure 10 shows an example. Note
the highlighted box that shows Select the Application ELMDMQS.

Figure 10. Deploying the application

o. Click Deploy, as shown in the Figure 10.
p. Leave the defaults, and click Deploy to start the deployment.
Step 11. Set configuration values for QualityStage.
Note: This example integration is being done for an InfoSphere MDM Server installation on
which maintenance services are installed. During the installation of maintenance services, if you
ran install_ELPCustom.sh then you can skip to Optimizing performance with key configuration
parameters.
Set the configuration values according to Table 2 in order to properly communicate with the IIS-QS
server.

Table 2. Configuration modifications

Configuration name | Default value
/IBM/ThirdPartyAdapters/IIS/defaultCountry | 185
/IBM/ThirdPartyAdapters/IIS/initialContextFactory | Used in conjunction with the provider URL to obtain the JNDI registry initial context. A typical value for this element is com.ibm.websphere.naming.WsnInitialContextFactory.
/IBM/ThirdPartyAdapters/IIS/providerURL | iiop://<yourQSServer>:<QSServerBootstrapPort>. For example: iiop://myIIS.torolab.ibm.com:2809.
/IBM/Party/Standardizer/Name/className | com.ibm.mdm.thirdparty.integration.iis8.adapter.InfoServerStandardizerAdapter
/IBM/Party/Standardizer/Address/className | com.ibm.mdm.thirdparty.integration.iis8.adapter.InfoServerStandardizerAdapter

Step 12: Use QualityStage (QS) name and address standardization.
Use QS to standardize names and addresses that are entered into InfoSphere MDM Server. See
Standardizing name, address and phone number information in the MDM developer's guide (see
Resources) for more information.
Step 13: Using QualityStage in suspect duplicate processing.
QualityStage can be used with the InfoSphere MDM Server Suspect Duplicate Processing (SDP)
feature. See Configuring IBM Information Server QualityStage integration for SDP in the MDM
developer's guide (see Resources) for more information on using QualityStage with SDP.

Optimizing performance with key configuration parameters
After you install the InfoSphere MDM Server, tune the key configuration parameters for optimal
performance.

InfoSphere MDM Server and batch processor configuration
1. Increase the number of submitters to increase parallelism. Do this by editing the file
<MDM_installation_Folder>/BatchProcessor/properties/Batch.properties. On an 8-way MDM
Server box, 24 submitters are optimal.
2. Increase JVM heap settings for the batch processor. Do this by editing the file
<MDM_installation_Folder>/BatchProcessor/bin/runbatch.sh. For example: for 24 submitters,
512MB of heap is sufficient.
3. Reduce BatchProcessor logging by setting the threshold to ERROR. Do this by editing
<MDM_installation_Folder>/BatchProcessor/Log4J.properties and setting the logging
threshold to ERROR, if it is not already. For example: log4j.appender.file.Threshold=ERROR.
4. Reduce MDM Server logging by setting the threshold to ERROR. Do this by editing
Log4J.properties inside the properties.jar file at <WebSphere_Location>/profiles/
<ServerName>/installedApps/<CellName>/<InstanceName>/properties.jar.
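As a concrete illustration of items 2 and 3, the logging threshold can be checked and changed with standard text tools, and the heap setting is an ordinary JVM option; the exact property name that controls the submitter count in Batch.properties varies by release, so edit that file by hand rather than relying on a scripted name.

cd /usr/IBM/MDM_801/BatchProcessor        # MDM installation folder from the example setup

# Item 3: make sure the batch processor logs only errors
grep Threshold Log4J.properties
sed 's/^\(log4j\.appender\.file\.Threshold\)=.*/\1=ERROR/' Log4J.properties > Log4J.properties.tmp \
  && mv Log4J.properties.tmp Log4J.properties

# Item 2: in bin/runbatch.sh, a 512 MB heap (for example, -Xms512m -Xmx512m on the java
# command line) was sufficient for 24 submitters in these tests.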

WebSphere Application Server configuration
1. Increase the JDBC connection pool size to support the parallelism.
a. From the WebSphere Administration Console, go to Resources >JDBC > Data sources
> DWLCustomer > Connection pool properties
b. Increase the value for Maximum connections. The example setup uses 50.
2. Increase the prepared statement cache size.
a. The size of the prepared statement cache depends on the number of unique SQL
statements used in your application. For InfoSphere MDM Server, set it to 300 and
monitor the application to determine if the cache size needs to be increased.
b. It can be changed from the WebSphere Administration Console. Go to Resources
> JDBC > Data sources > DWLCustomer > Connection pools > WebSphere
Application Server data source properties.
3. Increase the EJB cache size. Do this by using the WebSphere Administration Console to go
to Servers > Application servers > [ServerName] > EJB Container Settings > EJB cache
settings. The example uses 4000.
4. Change the JVM heap size and GC policy.
a. From the WebSphere Administration Console, go to Servers > Application servers >
[ServerName] > Java and Process Management > Process Definition > Java Virtual
Machine.
b. Indicate the initial heap size as 512 MB and the maximum heap size as 1024 MB.
c. Use the gencon GC policy for better performance. To use this GC policy, specify
-Xgcpolicy:gencon under Generic JVM arguments. While testing the example using the
gencon GC policy, WebSphere Application Server sometimes generated unnecessary
heap dumps. To disable this behavior, do the following after the server is started:
i. From the WebSphere Administration Console, go to Servers > Application
servers > [ServerName] > Performance > Performance and Diagnostic Advisor
Configuration > Runtime (tab).
ii. Uncheck the check box (ensure the checkbox is empty) for Enable automatic heap
dump collection.

Database tuning (DB2)
Follow best practices and recommendations when setting up the database server, closely monitor
database performance, and tune the database as needed for optimal performance and productive
resource usage. This section briefly describes
several recommendations on configuring and tuning a DB2 database. The basic concepts also
apply to other types of databases.
• Typically it is recommended that you use one set of dedicated disks for DB2 transaction logs
and you use another set of dedicated disks for DB2 table spaces. If possible, it is even better
to use different disk controllers for DB2 transaction logs and DB2 table spaces, because this
gives you the flexibility to configure the disk controllers independently for different I/O patterns
to favor writes instead of a mix of writes and reads.
• Ensure read and write cache is enabled on the storage system. Monitor the cache
effectiveness, and configure the cache size properly.
• Properly plan the table spaces to ensure balanced I/O operations across all of the available
disks. This avoids hot spots in your database and avoids limiting your overall database
performance to the bandwidth of a few of the busiest disks. This maximizes the utilization of
all the I/O bandwidth available from all the physical disks.
• In addition to a well-planned table space layout over the I/O system, one of the configuration
parameters that affects performance most dramatically is the database buffer pool size. Pay
close attention to the overall buffer pool hit ratio, which tells you how often the needed data is
found in the buffer pools rather than read from the physical disks (which is very expensive).
• Strive for a buffer pool hit ratio of 80% or higher for data, and 90% or higher for indexes (an
example monitoring query appears at the end of this section). Typically in MDM Server
implementations, start with one big buffer pool for both data and indexes. If necessary, separate
data and indexes into two different buffer pools to help ensure a good index buffer pool hit ratio.
• Because an MDM Server enables a good amount of customization and extension, analyze
the most expensive SQLs from the database snapshot or other tools. Ensure that those SQLs
have optimal access plans with the best indexes in place.
Consider these recommendations together, because poor behavior in one area might be just a
symptom of another area that is incorrectly configured or misbehaving.
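To put numbers on the buffer pool hit ratio targets above, a query along these lines can be run against the DB2 snapshot administrative views (DB2 9.5 is assumed; verify the SYSIBMADM.SNAPBP view and its columns against your DB2 level, and substitute your own database name):

db2 connect to MDMDB
db2 "SELECT bp_name,
            DEC(100 * (1 - DOUBLE(pool_data_p_reads)  / NULLIF(pool_data_l_reads, 0)), 5, 2)  AS data_hit_pct,
            DEC(100 * (1 - DOUBLE(pool_index_p_reads) / NULLIF(pool_index_l_reads, 0)), 5, 2) AS index_hit_pct
       FROM sysibmadm.snapbp"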

Understanding performance test methodology used in the example
Input data preparation
The maintainContractPlus transaction was used for testing the example. Because the default
parser from the BatchProcessor was used, the input data had to be line-feed-delimited XML
transactions (one request per line).
The first step toward getting the input data set was to create seed-data. The seed-data was
generated using a home-grown, Java-based tool with key distributions based on U.S. Census
data (2000). Some realistic data was added to make the overall parties closely match a typical
MDM business scenario. The seed-data contained details such as name, gender, date of birth,
addresses.
As a second step, a template for the maintainContractPlus transaction was created. This template had
variables for key party details that needed to be filled in with the generated seed-data. Another
home-grown, Java-based tool was used to generate the XML transactions. One such transaction yielded
one person with one name, one address, one contract, and one contact method. Table 3 shows
the detailed profile of database tables populated by a single transaction. The example run used a
total of one million such records as one input data set, each record representing one party and its
associated attributes.

Suspect duplicate data preparation
The data generated in the example so far was primarily clean. A similar approach was used
to generate dirty data, which included 40% duplicates. This data set was used when Suspect
Duplicate Processing was turned on.
During the initial load, the input data might have duplicate entries, where details from one record
closely resemble those from another one. Such records are termed as suspect duplicates.
Depending on how closely two records match, suspect duplicates are assigned a match category.
To determine the match category, some critical data fields are used while comparing two records.
The critical data fields include first name, last name, address, date of birth, gender, and social
security number. Based on the comparison results, the suspect duplicates are assigned a match-score and a non-match-score, and then the match category is derived. Depending on the match
category, InfoSphere MDM Server takes appropriate actions for the suspect duplicates.
When testing the example, two sets of data were used:
• 100% clean data with no suspect duplicates in the input data set
• 60% clean data with 40% of the records as suspect duplicates.
The example test included 4 types of suspect duplicates in the 60% clean data set. Population
of each type of suspect duplicate was kept equal, and they were randomly distributed in the data
using home-grown, Java-based tools.
The details of this data set are shown in Table 3.

Table 3. Details of input data with suspects

sr# | Matching critical data details | Non-matching critical data details | Population | Weight (match/non-match score) | Match category
1 | Gender, FirstName, LastName, Address, DOB, SSN | None | 10% | 63/0 | A1
2 | Gender, FirstName, LastName, DOB, SSN | Address | 10% | 60/3 | A2
3 | Gender, Address, DOB, SSN | FirstName, LastName | 10% | 55/4 | A2
4 | Gender, Address, LastName, DOB | FirstName (and the SSN field is empty) | 10% | 46/1 | B

The scores and categories in Table 3 are calculated by InfoSphere MDM Server's deterministic
matching approach, which is the default implementation for party-matching. In contrast,
QualityStage matching offers a probabilistic matching approach, and it calculates only one
composite weight.

Data profile
Table 4 shows the population of InfoSphere MDM Server database tables when the two sets of
input data are loaded.

Table 4. Database population

Table name | 100% clean data | 60% clean data
ADDRESS | 1,000,000 | 700,000
ADDRESSGROUP | 1,000,000 | 900,000
CONTACT | 1,000,000 | 900,000
CONTACTMETHOD | 1,000,000 | 900,000
CONTACTMETHODGROUP | 1,000,000 | 900,000
CONTEQUIV | 1,000,000 | 1,000,000
CONTRACT | 1,000,000 | 1,000,000
CONTRACTCOMPONENT | 1,000,000 | 1,000,000
CONTRACTROLE | 1,000,000 | 1,000,000
IDENTIFIER | 1,000,000 | 900,000
LOBREL | 1,000,000 | 900,000
LOCATIONGROUP | 2,000,000 | 1,800,000
MISCVALUE | 1,000,000 | 1,000,000
PERSON | 1,000,000 | 900,000
PERSONNAME | 1,000,000 | 900,000
PERSONSEARCH | 1,000,000 | 900,000
SUSPECT | 0 | 300,000

Test methodology
Different tests were performed to check stability and scalability and to measure the overhead
associated with several commonly used features. All the tests were conducted in two solution
configurations:
• The MDM Server only solution, where InfoSphere MDM Server uses its own algorithm for
standardization and matching. In this case, IBM Information Server is not required.
• MDM Server + QS solution, where InfoSphere MDM Server uses QualityStage to do the
standardization and matching.
The methodology for all these tests was similar:
1. Set up the systems. Do the configuration and tuning of various components as mentioned in
previous sections.
2. Prepare a set of input data with 10,000 records using the approach described above.
3. Load these 10,000 records using one submitter in the batch processor. This is done
to avoid deadlocks while working with an empty database.
4. Perform a DB2 reorgchk on all the tables to update statistics.
5. Create a backup of the MDM Server database at this stage, and use it as the starting point
for all the tests (example commands for steps 4 and 5 appear after this list).
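Steps 4 and 5 translate into standard DB2 commands, for example (the database name comes from the example environment, and the backup directory is a placeholder):

db2 connect to MDMDB
db2 "reorgchk update statistics on table all"
db2 connect reset
db2 backup database MDMDB to /backup/mdmdb
# Before each test run (step 1 of the run procedure below):
db2 restore database MDMDB from /backup/mdmdb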
The following steps were used to run the example test:
1. Restore the database using the backup copy.
2. Change the database configuration if required for the test. For example, you may want to
switch OFF Suspect Duplicate Processing.
3. Restart WebSphere Application Server running InfoSphere MDM Server.
4. Run data collection scripts in the background to collect CPU statistics, I/O statistics, and
database snapshots (a sketch of such scripts follows this list).
5. Start the test to load the selected input dataset.
6. Collect the logs from InfoSphere MDM Server, WebSphere Application Server, and DB2
database server.
7. Derive response time and throughput from transactiondata.log as generated by InfoSphere
MDM Server.
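The background data collection mentioned in step 4 can be as simple as the sketch below; the sampling intervals, file names, and use of vmstat/iostat are illustrative, while the snapshot command is the standard DB2 one.

# CPU and I/O statistics, sampled every 30 seconds
nohup vmstat 30 > vmstat.out 2>&1 &
nohup iostat 30 > iostat.out 2>&1 &

# Periodic database snapshots while the load runs
while true; do
  db2 "get snapshot for database on MDMDB" > dbsnap_$(date +%Y%m%d_%H%M%S).out
  sleep 300
done &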

Measuring performance results
This section describes the performance measurements including the following:
• Results showing very stable performance throughput and response time
• Performance overhead of some commonly used features in the context of initial data loading
• Scalability of throughput

Test 1: Stability of throughput and response time
The purpose of this test is to show whether the throughput and response times remain stable as
the loading progresses and as the database size increases. This test also measures the system
resource usage pattern along the test. The data for throughput and response time is derived from
transactiondata.log, as generated by InfoSphere MDM Server.
Various tests were conducted for both MDM Server only and MDM Server + QS scenarios, and all
of them showed good stability. Table 5 shows the configuration settings for the first test.

Table 5. Test 1 configuration

Parameter | Value
Hardware/Software stack | As described in the example test environment
InfoSphere MDM Server heap size | Initial: 512MB; Max: 1024MB
InfoSphere MDM Server JVM GC policy | gencon
Number of submitters in batch processor | 24
Batch processor JVM memory | 512MB
ISD job configurations (applicable to MDM Server + QS scenario only) | Default
Type of transaction used | MaintainContractPlus
Total volume | 1 million parties and their associated records
Input data quality | 60% clean; 40% suspected duplicates of various types
Name standardization | ON (default)
Address standardization | ON (StandardFormatingIndicator set to N in the request XML)
Suspect duplicate processing | ON
History triggers | Enabled

Test 1 results: Stability results
Figure 11 shows the throughput and response times captured for the MDM Server only scenario.
The chart shows that throughput and response time are stable during the whole run duration. The
results for the MDM Server + QS scenario are similar.

Figure 11. Throughput and response time

Figure 12 shows that by configuring a sufficient number of submitters, almost all CPU resources
on the WebSphere Application Server running InfoSphere MDM Server can be used, and the system
does not have any other bottlenecks. Figure 12 also shows the resource usage on the other systems.

Figure 12. Resource usage

Test 2: Feature overheads
The purpose of these tests is to measure the overhead of four commonly used features of
InfoSphere MDM Server. Under this series of tests, the overheads of the following features were measured:
• Name standardization
• Address standardization
• Suspect duplicate processing
• History triggers

Overhead is expressed as a percentage reduction in throughput per unit of time when the
feature is enabled. For example, 5% overhead associated with a particular feature means that if
throughput was 100 transactions per second (TPS), it becomes 95 TPS due to overhead when the
feature is enabled. Throughput is measured as total data volume loaded / total time taken.
Various tests were conducted for both MDM Server only and MDM Server + QS scenarios,
enabling one or more features at a time. In the MDM Server + QS scenario, the overheads of
standardization and suspect duplicate processing should be higher because they involve extra
processing by QualityStage.
Table 6 shows the configuration settings for the second test.

Table 6. Test 2 configuration

Parameter | Value
Hardware/Software stack | As described in the example test environment
InfoSphere MDM Server heap size | Initial: 512MB; Max: 1024MB
InfoSphere MDM Server JVM GC policy | Default
Number of submitters in batch processor | 24
Batch processor JVM memory | 512MB
ISD job configurations (applicable to MDM Server + QS scenario only) | Default
Type of transaction used | MaintainContractPlus
Total volume | 1 million parties and their associated records
Input data quality | a) 100% clean; b) 60% clean

Following are some notes about the configuration:
• Name standardization was turned ON or OFF by setting /IBM/Party/
ExcludePartyNameStandardization/enabled to FALSE or TRUE, respectively.
• Address standardization was effectively switched ON or OFF by setting
StandardFormatingIndicator to N/Y in the transaction request XMLs.
• Suspect duplicate processing was switched ON or OFF by setting the following to TRUE or
FALSE respectively in the configuration table:
• /IBM/Party/SuspectProcessing/enabled
• /IBM/Party/SuspectProcessing/AddParty/returnSuspect

Test 2 results: Feature overheads
Standardization
Table 7 shows the overhead of standardization alone for the MDM Server only scenario.
Tests were conducted with both datasets (100% clean and 60% clean) when suspect duplicate
processing was switched ON. History triggers were enabled during these tests.

Table 7. Overhead of standardization

Overhead | SDP OFF | SDP ON (100% clean) | SDP ON (60% clean)
Overhead of name standardization | 2% | 3% | 3%
Overhead of address standardization | 2% | 2% | 0%
Overhead of name and address standardization | 4% | 3% | 2%

Note: With 60% clean data, there are fewer unique addresses. This can result in less overhead.
Suspect duplicate processing
Table 8 shows the overhead of suspect duplicate processing with and without standardization in
the MDM Server only scenario. Tests were conducted with both datasets (100% clean and 60%
clean). History triggers were enabled during these tests.

Table 8. Overhead of suspect duplicate processing

Overhead | 100% clean data | 60% clean data
Overhead of suspect duplicate processing | 3% | 20%
Overhead of suspect duplicate processing along with name and address standardization | 6% | 21%

History triggers
If history triggers are enabled, the IO requirement on the database server increases significantly
(nearly doubles). With enough IO bandwidth provided, the overhead is small (approximately 5%).

Test 3: Scalability tests
By definition, scalability is a measure of how well the throughput increases when more load is
put on the system. However, for the example test, the number of processors did not actually vary.
Instead, the number of parallel requests to InfoSphere MDM Server was changed by varying
the number of submitters in the batch processor. Data points were collected between 1 submitter
and 24 submitters, at which point the system was clearly saturated.
The test was conducted for both the MDM Server only and the MDM Server + QS scenarios. Tests
were conducted in different configurations, and all of them showed near linear scalability.
Table 9 shows the configuration settings for the third test.

Table 9. Test 3 configuration

Parameter | Value
Hardware/Software stack | As described in the example test environment
InfoSphere MDM Server heap size | Initial: 512MB; Max: 1024MB
InfoSphere MDM Server JVM GC policy | Default
Number of submitters in batch processor | Varied between 1 and 24
Batch processor JVM memory | 512MB
ISD job configurations (applicable to the MDM Server + QS scenario only) | Default
Type of transaction used | MaintainContractPlus
Total volume | 15,000 to 100,000 records
Input data quality | 60% clean
Name standardization | ON (default)
Address standardization | ON (StandardFormatingIndicator set to N in the request XML)
Suspect duplicate processing | ON
History triggers | Enabled

Test 3 results: Scalability results
Figure 13 shows the scalability for the MDM Server only scenario. As shown by the green line, the
throughput increases almost linearly with an increase in the number of submitters. The example
configuration utilized more than 90% of the CPU capacity on the server running InfoSphere MDM
Server. The results for MDM Server + QS are similar.

Figure 13. Scalability of InfoSphere MDM Server with SDP ON


Conclusion
Designed to provide flexibility in its deployments, developed on leading technology, and offering
unmatched performance and scalability, InfoSphere Master Data Management Server has
been the leading choice for a large number of organizations across a range of industries when
implementing their MDM solutions. As the leader, IBM has the largest number of successfully
deployed MDM implementations in the market today.
This article explained what maintenance services are and how to set them up in an InfoSphere
MDM Server environment. It provided enough detail about configuration and tuning that you can
get the maintenance services batch up and running with high performance. The article also covered
the steps for setting up Information Server QualityStage for standardization and matching, where
such a configuration is required. Key performance data points from several common scenarios were
described, and they show that maintenance services, when used for initial load, provide sustainable
high performance and excellent scalability. Finally, the article summarized performance overhead
measurements of some key features commonly used in MDM Server implementations. You might
find them useful for capacity planning an MDM Server system based on the chosen features and for
ensuring the required performance during initial load.

Acknowledgments
We would like to thank Lena Woolf, Berni Schiefer, and Karen Chouinard for their input and
suggestions. We would also like to thank the other MDM Server team members for their support
during this project.

Notices
©IBM Corporation 2009. All Rights Reserved.
THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED FOR INFORMATIONAL
PURPOSES ONLY. ALTHOUGH EFFORTS WERE MADE TO VERIFY THE COMPLETENESS
AND ACCURACY OF THE INFORMATION CONTAINED IN THIS DOCUMENT, IT IS PROVIDED
“AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS
INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH
ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR
OTHERWISE RELATED TO, THIS DOCUMENT OR ANY OTHER DOCUMENTATION. NOTHING
CONTAINED IN THIS DOCUMENT IS INTENDED TO, OR SHALL HAVE THE EFFECT OF
CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS
OR THEIR SUPPLIERS AND/OR LICENSORS); OR ALTERING THE TERMS AND CONDITIONS
OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
All performance data contained in this publication was obtained in the specific operating
environment and under the conditions described above and is presented as an illustration only.
Performance obtained in other operating environments may vary and customers should conduct
their own testing.

Performance is based on measurements and projections using standard IBM benchmarks in
a controlled environment. The actual throughput or performance that any user will experience
will vary depending upon many factors, including considerations such as the amount of
multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and
the workload processed. Therefore, no assurance can be given that an individual user will achieve
results similar to those stated here.
The information in this document concerning non-IBM products was obtained from the supplier(s)
of those products. IBM has not tested such products and cannot confirm the accuracy of the
performance, compatibility or any other claims related to non-IBM products. Questions about the
capabilities of non-IBM products should be addressed to the supplier(s) of those products.
The information contained in this publication is provided for informational purposes only. While
efforts were made to verify the completeness and accuracy of the information contained in this
publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this
information is based on IBM’s current product plans and strategy, which are subject to change
by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or
otherwise related to, this publication or any other materials. Nothing contained in this publication
is intended to, nor shall have the effect of, creating any warranties or representations from IBM or
its suppliers or licensors, or altering the terms and conditions of the applicable license agreement
governing the use of IBM software.
References in this publication to IBM products, programs, or services do not imply that they will
be available in all countries in which IBM operates. Product release dates and/or capabilities
referenced in this presentation may change at any time at IBM’s sole discretion based on market
opportunities or other factors, and are not intended to be a commitment to future product or feature
availability in any way. Nothing contained in these materials is intended to, nor shall have the effect
of, stating or implying that any activities undertaken by you will result in any specific sales, revenue
growth, savings or other results.


Resources
Learn
• See the IBM Redbook™ Master Data Management: Rapid Deployment Package for MDM for more instructions.
• Refer to the IBM InfoSphere MDM Server Information Center for more instructions.
• Refer to the WebSphere Application Server, Version 6.1 Information Center to install IBM
WebSphere Application Server Network Deployment, Version 6.1, and upgrade it with
Fixpack 11.
• Refer to the IBM DB2 Database for Linux®, UNIX®, and Windows Information Center to
install DB2 Database Server, Version 9.5.
• Refer to the IBM Information Server Information Center to install IIS Server, Version 8.0.1.
• Learn more from the IBM Redpaper WebSphere Customer Center: Understanding Performance.
• Discover DB2 Tuning Tips for OLTP Applications from this classic developerWorks article.
• Explore the Information Management Software for z/OS Solutions Information Center.
• Learn more about Information Management at the developerWorks Information Management
zone. Find technical documentation, how-to articles, education, downloads, product
information, and more.
• Stay current with developerWorks technical events and webcasts.
Get products and technologies
• Build your next development project with IBM trial software, available for download directly
from developerWorks.
Discuss
• Participate in the discussion forum for this content.
• Check out the developerWorks blogs and get involved in the developerWorks community.


About the authors
Neeraj Singh
Neeraj R Singh is currently a senior performance engineer working on Master
Data Management Server performance. He previously led the Java technologies test
team for functional, system, and performance testing as technical lead and test
project leader. He joined IBM in 2000 and holds a bachelor's degree in Electronics
and Communications Engineering.

Yongli An
Yongli An is an experienced performance engineer focusing on Master Data
Management products and solutions. He is also experienced in DB2 database server
and WebSphere performance tuning and benchmarking. He is an IBM Certified
Application Developer and Database Administrator - DB2 for Linux, UNIX, and
Windows. He joined IBM in 1998 and holds a bachelor's degree in Computer Science
and Engineering and a master's degree in Computer Science. Currently, Yongli is the
manager of the MDM performance and benchmarks team, focusing on Master Data
Management Server performance and benchmarks and helping customers achieve
optimal performance for their MDM systems.
© Copyright IBM Corporation 2009
(www.ibm.com/legal/copytrade.shtml)
Trademarks
(www.ibm.com/developerworks/ibm/trademarks/)


Mais conteúdo relacionado

Mais procurados

Backup Options for IBM PureData for Analytics powered by Netezza
Backup Options for IBM PureData for Analytics powered by NetezzaBackup Options for IBM PureData for Analytics powered by Netezza
Backup Options for IBM PureData for Analytics powered by NetezzaTony Pearson
 
Netezza Architecture and Administration
Netezza Architecture and AdministrationNetezza Architecture and Administration
Netezza Architecture and AdministrationBraja Krishna Das
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload managementBiju Nair
 
Oracle 11g data warehouse introdution
Oracle 11g data warehouse introdutionOracle 11g data warehouse introdution
Oracle 11g data warehouse introdutionAditya Trivedi
 
The IBM Netezza datawarehouse appliance
The IBM Netezza datawarehouse applianceThe IBM Netezza datawarehouse appliance
The IBM Netezza datawarehouse applianceIBM Danmark
 
Datastage parallell jobs vs datastage server jobs
Datastage parallell jobs vs datastage server jobsDatastage parallell jobs vs datastage server jobs
Datastage parallell jobs vs datastage server jobsshanker_uma
 
Teradata Unity
Teradata UnityTeradata Unity
Teradata UnityTeradata
 
IRJET - The 3-Level Database Architectural Design for OLAP and OLTP Ops
IRJET - The 3-Level Database Architectural Design for OLAP and OLTP OpsIRJET - The 3-Level Database Architectural Design for OLAP and OLTP Ops
IRJET - The 3-Level Database Architectural Design for OLAP and OLTP OpsIRJET Journal
 
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...Niraj Tolia
 
Software architecture case study - why and why not sql server replication
Software architecture   case study - why and why not sql server replicationSoftware architecture   case study - why and why not sql server replication
Software architecture case study - why and why not sql server replicationShahzad
 
Magic quadrant for data warehouse database management systems
Magic quadrant for data warehouse database management systems Magic quadrant for data warehouse database management systems
Magic quadrant for data warehouse database management systems divjeev
 
Performance tuning and optimization (ppt)
Performance tuning and optimization (ppt)Performance tuning and optimization (ppt)
Performance tuning and optimization (ppt)Harish Chand
 
Whitepaper : Working with Greenplum Database using Toad for Data Analysts
Whitepaper : Working with Greenplum Database using Toad for Data Analysts Whitepaper : Working with Greenplum Database using Toad for Data Analysts
Whitepaper : Working with Greenplum Database using Toad for Data Analysts EMC
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsDavid Portnoy
 
High Availability And Oracle Data Guard 11g R2
High Availability And Oracle Data Guard 11g R2High Availability And Oracle Data Guard 11g R2
High Availability And Oracle Data Guard 11g R2Mario Redón Luz
 

Mais procurados (20)

Backup Options for IBM PureData for Analytics powered by Netezza
Backup Options for IBM PureData for Analytics powered by NetezzaBackup Options for IBM PureData for Analytics powered by Netezza
Backup Options for IBM PureData for Analytics powered by Netezza
 
Netezza Architecture and Administration
Netezza Architecture and AdministrationNetezza Architecture and Administration
Netezza Architecture and Administration
 
IBM Netezza
IBM NetezzaIBM Netezza
IBM Netezza
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload management
 
Oracle 11g data warehouse introdution
Oracle 11g data warehouse introdutionOracle 11g data warehouse introdution
Oracle 11g data warehouse introdution
 
The IBM Netezza datawarehouse appliance
The IBM Netezza datawarehouse applianceThe IBM Netezza datawarehouse appliance
The IBM Netezza datawarehouse appliance
 
Tera data
Tera dataTera data
Tera data
 
Datastage parallell jobs vs datastage server jobs
Datastage parallell jobs vs datastage server jobsDatastage parallell jobs vs datastage server jobs
Datastage parallell jobs vs datastage server jobs
 
Fast Analytics
Fast Analytics Fast Analytics
Fast Analytics
 
Teradata Unity
Teradata UnityTeradata Unity
Teradata Unity
 
Migration from 8.1 to 11.3
Migration from 8.1 to 11.3Migration from 8.1 to 11.3
Migration from 8.1 to 11.3
 
58750024 datastage-student-guide
58750024 datastage-student-guide58750024 datastage-student-guide
58750024 datastage-student-guide
 
IRJET - The 3-Level Database Architectural Design for OLAP and OLTP Ops
IRJET - The 3-Level Database Architectural Design for OLAP and OLTP OpsIRJET - The 3-Level Database Architectural Design for OLAP and OLTP Ops
IRJET - The 3-Level Database Architectural Design for OLAP and OLTP Ops
 
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
 
Software architecture case study - why and why not sql server replication
Software architecture   case study - why and why not sql server replicationSoftware architecture   case study - why and why not sql server replication
Software architecture case study - why and why not sql server replication
 
Magic quadrant for data warehouse database management systems
Magic quadrant for data warehouse database management systems Magic quadrant for data warehouse database management systems
Magic quadrant for data warehouse database management systems
 
Performance tuning and optimization (ppt)
Performance tuning and optimization (ppt)Performance tuning and optimization (ppt)
Performance tuning and optimization (ppt)
 
Whitepaper : Working with Greenplum Database using Toad for Data Analysts
Whitepaper : Working with Greenplum Database using Toad for Data Analysts Whitepaper : Working with Greenplum Database using Toad for Data Analysts
Whitepaper : Working with Greenplum Database using Toad for Data Analysts
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse Platforms
 
High Availability And Oracle Data Guard 11g R2
High Availability And Oracle Data Guard 11g R2High Availability And Oracle Data Guard 11g R2
High Availability And Oracle Data Guard 11g R2
 

Semelhante a netezza-pdf

Gp Installation Presentation
Gp Installation PresentationGp Installation Presentation
Gp Installation Presentationguest2fc298
 
Gp Installation Presentation
Gp Installation PresentationGp Installation Presentation
Gp Installation Presentationddauphin
 
Mobile store management
Mobile store management Mobile store management
Mobile store management Rupendra Verma
 
How to boost performance of your rails app using dynamo db and memcached
How to boost performance of your rails app using dynamo db and memcachedHow to boost performance of your rails app using dynamo db and memcached
How to boost performance of your rails app using dynamo db and memcachedAndolasoft Inc
 
The technology of the business data lake
The technology of the business data lakeThe technology of the business data lake
The technology of the business data lakeCapgemini
 
What Is The Use Of Database Server------
What Is The Use Of Database Server------What Is The Use Of Database Server------
What Is The Use Of Database Server------Shattered Silicon
 
Computing And Information Technology Programmes Essay
Computing And Information Technology Programmes EssayComputing And Information Technology Programmes Essay
Computing And Information Technology Programmes EssayLucy Nader
 
Implementation of dbms
Implementation of dbmsImplementation of dbms
Implementation of dbmsPrashant Ranka
 
Service Models - Databas - Monitoring - Communication.ppt
Service Models - Databas - Monitoring - Communication.pptService Models - Databas - Monitoring - Communication.ppt
Service Models - Databas - Monitoring - Communication.pptMohammadArmanulHaque
 
“Salesforce Multi-tenant architecture”,
“Salesforce Multi-tenant architecture”,“Salesforce Multi-tenant architecture”,
“Salesforce Multi-tenant architecture”,Manik Singh
 
Performance tuning datasheet
Performance tuning datasheetPerformance tuning datasheet
Performance tuning datasheetGlobalSoftUSA
 
Steps for Implementing Dynamics GP Ecommerce
Steps for Implementing Dynamics GP EcommerceSteps for Implementing Dynamics GP Ecommerce
Steps for Implementing Dynamics GP EcommerceIES
 
Components and Advantages of DBMS
Components and Advantages of DBMSComponents and Advantages of DBMS
Components and Advantages of DBMSShubham Joon
 
DBA, LEVEL III TTLM Monitoring and Administering Database.docx
DBA, LEVEL III TTLM Monitoring and Administering Database.docxDBA, LEVEL III TTLM Monitoring and Administering Database.docx
DBA, LEVEL III TTLM Monitoring and Administering Database.docxseifusisay06
 

Semelhante a netezza-pdf (20)

Gp Installation Presentation
Gp Installation PresentationGp Installation Presentation
Gp Installation Presentation
 
Gp Installation Presentation
Gp Installation PresentationGp Installation Presentation
Gp Installation Presentation
 
Building a SaaS Style Application
Building a SaaS Style ApplicationBuilding a SaaS Style Application
Building a SaaS Style Application
 
Mobile store management
Mobile store management Mobile store management
Mobile store management
 
How to boost performance of your rails app using dynamo db and memcached
How to boost performance of your rails app using dynamo db and memcachedHow to boost performance of your rails app using dynamo db and memcached
How to boost performance of your rails app using dynamo db and memcached
 
Mdb dn 2016_12_single_view
Mdb dn 2016_12_single_viewMdb dn 2016_12_single_view
Mdb dn 2016_12_single_view
 
The technology of the business data lake
The technology of the business data lakeThe technology of the business data lake
The technology of the business data lake
 
What Is The Use Of Database Server------
What Is The Use Of Database Server------What Is The Use Of Database Server------
What Is The Use Of Database Server------
 
Computing And Information Technology Programmes Essay
Computing And Information Technology Programmes EssayComputing And Information Technology Programmes Essay
Computing And Information Technology Programmes Essay
 
Implementation of dbms
Implementation of dbmsImplementation of dbms
Implementation of dbms
 
Internship
InternshipInternship
Internship
 
Gp10 enus ins_07
Gp10 enus ins_07Gp10 enus ins_07
Gp10 enus ins_07
 
Service Models - Databas - Monitoring - Communication.ppt
Service Models - Databas - Monitoring - Communication.pptService Models - Databas - Monitoring - Communication.ppt
Service Models - Databas - Monitoring - Communication.ppt
 
“Salesforce Multi-tenant architecture”,
“Salesforce Multi-tenant architecture”,“Salesforce Multi-tenant architecture”,
“Salesforce Multi-tenant architecture”,
 
Performance tuning datasheet
Performance tuning datasheetPerformance tuning datasheet
Performance tuning datasheet
 
Steps for Implementing Dynamics GP Ecommerce
Steps for Implementing Dynamics GP EcommerceSteps for Implementing Dynamics GP Ecommerce
Steps for Implementing Dynamics GP Ecommerce
 
Components and Advantages of DBMS
Components and Advantages of DBMSComponents and Advantages of DBMS
Components and Advantages of DBMS
 
Data Base
Data BaseData Base
Data Base
 
DBA, LEVEL III TTLM Monitoring and Administering Database.docx
DBA, LEVEL III TTLM Monitoring and Administering Database.docxDBA, LEVEL III TTLM Monitoring and Administering Database.docx
DBA, LEVEL III TTLM Monitoring and Administering Database.docx
 
Final project cafe coffe
Final project cafe coffeFinal project cafe coffe
Final project cafe coffe
 

Último

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Último (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

netezza-pdf

  • 1. Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch Neeraj Singh (sneeraj@in.ibm.com) Advisory Software Engineer IBM   14 August 2009 Yongli An (yongli@ca.ibm.com) MDM Performance Manager IBM The maintenance services for IBM InfoSphere™ Master Data Management Server solution address the needs of clients in the first phase of implementing initial load solutions. Using MDM, clients need to perform initial and delta loads, typically as a batch. This article focuses on the maintenance transaction approach to perform initial loads, including an introduction, installation, and setup. It also covers performance tuning tips and best practices. You can leverage recommendations in this article as guidance in your own MDM Server initial load solutions using maintenance services. View more content in this series Introduction IBM InfoSphere Master Data Management Server (MDM Server) is an enterprise application that helps companies gain control of business information by enabling them to manage and maintain a complete and accurate view of their master data. MDM Server provides a unified operational view of their customers, accounts, and products, and it provides an environment that processes updates to and from multiple channels. It aligns these front office systems with multiple back office systems in real time, providing a single source of truth for master data. The maintenance services for IBM InfoSphere Master Data Management (MDM) Server solution is built to address the needs of clients in the first phase of implementing initial load solutions. At this stage, clients deploy InfoSphere MDM Server for master data management, when data is loaded into the MDM Server repository but most data changes are still coming from existing legacy systems. With MDM Server, the client performs initial and delta loads, typically in a batch. Initial load is the original movement of data from source systems into the MDM Server repository when © Copyright IBM Corporation 2009 Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch Trademarks Page 1 of 34
  • 2. developerWorks® ibm.com/developerWorks/ the repository is empty. Delta loads are regular (such as daily) data updates from source systems into InfoSphere MDM Server. There are two different approaches to loading data into InfoSphere MDM Server in batch. The maintenance service batch approach loads data into InfoSphere MDM Server using the maintenance services invoked by the Batch Processor. Alternatively, data can be loaded directly into the database using DataStage jobs. This article shares an IBM team's experience performing case studies focusing on the Maintenance Transaction approach using InfoSphere MDM Server version 8.0.1. The article starts with an introduction to MDM Server Maintenance Transactions. Then it goes on to cover the basic installation and setup steps of the MDM Server environment, including DB2® database server, WebSphere® Application Server, InfoSphere MDM Server, MDM Server Maintenance Transactions, and batch processor. The article covers a high-level summary of key performance results based on internal case studies. It concludes with a list of performance tuning tips and best practices to get optimal performance while doing initial data load. Using this article, you can leverage the IBM team's experience, and you can use recommendations as guidance in your own InfoSphere MDM Server initial load solutions. Introducing the MDM Server service batch approach The MDM Server service batch approach loads data into MDM Server using the maintenance transactions batch processor invokes or using any other batch framework. Because MDM Server services process the data during load, this approach provides the best level of business data validation. You can use the same set of maintenance transactions for both initial and delta loads. To create the setup that uses this option, you need to install InfoSphere MDM Server capable of running maintenance transactions. You also need to prepare the input data in a format that the Batch Processor can consume. What are maintenance transactions? InfoSphere MDM Server creates a unique internal identifier for each record or business entity that serves as its internal key. The regular InfoSphere MDM Server services expect the internal key to be provided as part of the update service request, to ensure that services can identify the correct business entity in the database. However, when data flows into InfoSphere MDM Server directly from external applications such as legacy systems, the internal key is not known, and often the nature of the data change is also not known. Maintenance transactions address this problem. These transactions do not require the internal key as part of the input. They also do not require the external system to specify whether this entity needs to be added or updated in InfoSphere MDM Server. Instead of the internal key, maintenance transactions expect the business key as part of the input, which is the unique identifier of the business entity in external applications. Maintenance transactions use the business key provided in the load operation to locate the correct instance of the business entity in the database. If an existing entity is found, it is updated using the appropriate transaction, such as updateParty. If no Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch Page 2 of 34
  • 3. ibm.com/developerWorks/ developerWorks® existing entity is found, a new entity is created in InfoSphere MDM Server using the appropriate transaction, such as addParty. There are many types of maintenance transactions, including maintainParty, maintainPersonName, and maintainContractPlus. For a complete list of the transactions and more details about them, refer to the MDMRapidDeploymentPackage_CompositeMaintenanceServices.pdf document, available as part of the EntryLevelMDM patch. Maintenance transactions are not part of default InfoSphere MDM Server 8.0.1 distribution and installation. You need to obtain and install EntryLevelMDM patch to use these transactions. Note: Maintenance transactions are part of default InfoSphere MDM Server 8.5 distribution. They are provided with source code as part of the MDM Server Samples distribution archive. You need to install them on top of an existing InfoSphere MDM Server 8.5 instance. See Resources for a link to instructions. It's recommended that you get assets from the FTP site mentioned in the Get the Installer section in this article to ensure you have the latest version. Batch transaction processing You can use maintenance transactions to load data using MDM Server Batch, or they can be invoked as any other service exposed by MDM Server using the RMI or JMS messaging mechanisms. This article focuses on the invocation batch method. InfoSphere MDM Server provides two ways to perform batch transaction processing. You can use either the J2SE Batch processor framework or the WebSphere Application Server eXtended Deployment batch framework. This article focuses on the first option: the J2SE Batch Processor framework. The J2SE Batch processor framework is a J2SE client application, and it is part of a default InfoSphere MDM Server installation. The batch processor is a multi-threaded application that can process large volumes of batch data. It can process multiple records from the same batch input simultaneously, increasing the throughput. Additionally, you can run multiple instances of the batch processor simultaneously, each one processing a separate batch input and pointing to the same server or to different servers. Each batch record in the batch input flows through the batch processor in the following sequence: 1. The reader consumer reads the record from the batch input. The submitter consumer sends it to the request/response framework for parsing and processing. 2. The parser transforms the input request into one or more business objects. 3. After passing through business proxy, business processing and persistence logic are applied to the business objects. 4. The application responses are sent to the constructor in order to construct the desired batch output response. 5. The constructed response is returned to the batch processor. 6. The writer records the transaction outcome in the writer log, if necessary. For example, FailedWriter logs any failed messages. Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch Page 3 of 34
  • 4. developerWorks® ibm.com/developerWorks/ The batch processor is shipped with pre-built readers and writers that can be used as is. The default reader expects the batch input is an XML data format where each line contains one XML request. The default writer writes the response in the XML format. You can also use the InfoSphere MDM Server batch processor to process batch files containing messages in SIF format. If your input data is not in the format specified above, you need to convert them to the required format, or use a customized reader and parser. It is possible to customize many of the components of the Batch Processor, but customization is not within the scope of this article. Understanding software and hardware requirements The following is a typical system topology for InfoSphere MDM Server deployment using QualityStage from Information Server for Standardization and Matching: • Application Server and InfoSphere MDM Server are installed on one physical box or LPAR with the correct CPU capacity (Server1). The number of CPUs depends on the overall throughput requirements. • The database server is installed on another physical box or LPAR (Server2) with wellequipped IO capacity. • IIS Server should be installed either on the database server or on a third physical box or LPAR (Server3) with adequate IO bandwidth. • IIS Client is used to configure QS jobs, and it is installed on a Windows® computer. To efficiently maximize the performance for the given configuration, follow the following general guidelines: • The ratio of the number of CPUs on InfoSphere MDM Server and DB server can range from 2:1 to 3:1. For example, if you have a database server with 4 CPUs, the recommended number of CPUs on the MDM Server box is at least 8 CPUs in order to well-utilize the CPU capacity on the database server. • You should have 5 to 10 physical disk spindles available for each CPU on the database server. • The ratio of the number of CPUs on InfoSphere MDM Server and IIS server can range from 2:1 to 1:1. For example, if you have MDM Server with 8 CPUs, the recommended number of CPUs on the IIS server box is between 4 and 8. Note: You only need IIS server if you plan to use QualityStage for standardization and matching (such as suspect processing). InfoSphere MDM Server default configuration does not use QualityStage. Exploring the example environment This section briefly describes the example environment, including hardware and software information, in each layer in the stack. It also describes the system topology used in the tests. Software and hardware stack • Server 1 (AppServer and InfoSphere MDM Server) Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch Page 4 of 34
  • 5. ibm.com/developerWorks/ developerWorks® • Hardware • Machine type: IBM 9116-561, PowerPC® POWER5™ • CPUs: 8 core Power5 with 16 threads, 1.5GHz , 64 bit • Memory/IO: 32 GB RAM, 6 internal disks • Software • OS : AIX® Version 5300-06 (64 bit) • WebSphere® Application Server ND 6.1.0.11 (32 bit) • InfoSphere MDM Server 8.0.1 + EntryLevelMDM patch • Server 2 (DB2® database Server) • Hardware • Machine type: IBM 9116-561, PowerPC POWER5 • CPUs: 8 core Power5 with 16 threads, 1.5GHz , 64 bit • Memory/IO : 32 GB RAM, 6 internal disks + 40 external disks • Software • OS : AIX Version 5300-06 (64 bit) • DB2® database server v9.5 (64 bit) • Server 3 (Information Server) • Hardware • Machine type: IBM 9116-561, PowerPC POWER5 • CPUs: 8 core Power5 with 16 threads, 1.5GHz , 64 bit • Memory/IO : 32 GB RAM, 6 internal disks • Software • OS : AIX Version 5300-06 (64 bit) • IIS v8.0.1 • Server 4 (IIS Client - To configure QualityStage jobs, not needed while running the test) • Hardware • 32 bit x86 machine • Software • OS : Windows 2003 Server • IIS client version 8.0.1 for Windows System topology For InfoSphere MDM Server to use QualityStage jobs for standardization and matching, you need Server3 and Server4, as shown in Figure 1. For default standardization and matching algorithms from InfoSphere MDM Server, Server1 and Server2 are sufficient. Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch Page 5 of 34
  • 6. developerWorks® ibm.com/developerWorks/ Figure 1. System topology Installing the components The purpose of this section is to show the high-level steps required to get the needed software installed in the test environment. The steps focus on the Maintenance services-related steps, while briefly mentioning the prerequisite software installation, including WebSphere® Application Server, DB2 database server, InfoSphere MDM Server, and InfoSphere Information Server. Installation prerequisites The prerequisite installations include WebSphere Application Server, DB2 database server, and InfoSphere Information Server. For installation instructions, see each product's Information Center in Resources. 1. On Server1, install IBM WebSphere Application Server Network Deployment, Version 6.1, and upgrade it with Fixpack 11. 2. On Server2, install DB2 Database Server, Version 9.5. 3. On Server3, install IIS Server, Version 8.0.1. 4. On Server4 (Windows machine), install IIS client. InfoSphere MDM Server Installation For InfoSphere MDM Server installation, see Resources for a link to the information center. You can install it on a standalone WebSphere Application Server or on a WebSphere Application Server cluster. Installation of Entry Level MDM Server patch for maintenance services Follow the steps in this section to apply the Entry Level MDM (ELMDM) Server patch, which enables you to use maintenance transactions. Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch Page 6 of 34
  • 7. ibm.com/developerWorks/ developerWorks® These instructions assume that you have already installed InfoSphere MDM Server and have applied all the required fixpacks. These instructions are based on software stack mentioned in the Test Environment section. Step 1. Get the installer. Maintenance transactions are not part of the default installation of MDM Server, and they need to be installed separately. If you have a service agreement with IBM, you can get the installer for maintenance transactions by logging into the Secure File Transfer site and finding https:// testcase.boulder.ibm.com/www/prot/MDM_RDP/?T. At the time of writing, the latest installable package is https://testcase.boulder.ibm.com/www/prot/MDM_RDP/MDMServer801_RDP801/ ELMDM-20090407.tar.gz. Contact your IBM service representative if you need help getting this package. For more instructions, see the chapter titled Installing Rapid Deployment Package for MDM Server Maintenance transactions and MDM Customizations in the document MDMRapidDeploymentPackage_InstallGuide.pdf. You can find this document under the directory Docs when you uncompress the installer. Step 2. Make required backups before installing. The installer makes changes to the InfoSphere MDM Server Database. As a precaution, you might want to make a backup of this database before running the installer. The installer creates backup copies of files that it changes. These files are named *.beforeELMDM. However, they get overwritten during subsequent installer runs. So before you invoke the installer again for any reason, ensure you have moved the previous set of files to a safe place. The files modified by the installer are: • MDM Server home directory installable .ear file. For example, /usr/IBM/MDM_801/ installableApps/MDM.ear • A set of files in the <MDM_Instance>.ear directory under WebSphere Application Server. For example, /opt/IBM/WebSphere/AppServer/profiles/AppSrv1/installedApps/myHostCell01/ MDM_801.ear/ Step 3. Prepare the installer. Complete the following steps to prepare the installer. a. Create a new base directory named setup. b. Extract the installer (.tar.gz file) in this directory. It creates several directories, including one named install. c. Go to directory setup/install/DB2 database server. d. Give execute permissions for all the scripts using the command chmod 755 *.sh e. Connect to the InfoSphere MDM Server database and execute the SQL below. The schema name is assumed to be mySchema. Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch Page 7 of 34
  • 8. developerWorks® ibm.com/developerWorks/ Listing 1. SQL to execute db2 "insert into mySchema.DataAssociation values (25083715210700005,'a_name',current_timestamp,'a_description',null)" Step 4. Customize a clustered environment. This step is not required if your MDM Server is a standalone server. If you are installing ELMDM on a Clustered MDM Server installation (MDM Server running on a cluster of WebSphere Application Servers), make the following modifications in the scripts. a. In setVariables.sh, add the line in Listing 2 at the beginning of the script. NAME_OF_SERVER refers to the name of the WebSphere Application Server instance that is a member of the cluster. Listing 2. Added line #add the line below export SRV_NAME=NAME_OF_SERVER b. In the scripts install_DisableHVL.sh, install_EnableHVL.sh, and install_ELPCustom.sh, make the changes shown in Listing 3. Listing 3. Changes to script files #comment out the line below and replace with the new line as shown below #$CURRENT/restartServer.sh $WAS_HOME $NODE_NAME $APP_NAME $ADMIN_USER $ADMIN_PASSWORD #add the line below $CURRENT/restartServer.sh $WAS_HOME $NODE_NAME $SRV_NAME $ADMIN_USER $ADMIN_PASSWORD c. In the install_ELPTx.sh script, make the changes in Listing 4. Listing 4. The install_EPLTx.sh script #comment out the line below and replace with the new line as shown below #$LOC/restartServer.sh $WAS_HOME $NODE_NAME $APP_NAME $ADMIN_USER $ADMIN_PASSWORD #add the line below $LOC/restartServer.sh $WAS_HOME $NODE_NAME $SRV_NAME $ADMIN_USER $ADMIN_PASSWORD Step 5. Optionally modify the installer to help in debugging. Complete the following steps to modify the installer to debug. a. At the beginning of each script, add set -x b. Add the verbose option to db2 calls by replacing all occurrences of db2 -tf with db2 -tvf in the scripts below: • runsql.sh • install_ELPCustom.sh • install_EnableHVL.sh • install_DisableHVL.sh Step 6. Set your environment variables Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch Page 8 of 34
  • 9. ibm.com/developerWorks/ developerWorks® Modify the setVariables.sh script according to your environment. The values given in Listing 5 are examples. Read the comments and instructions embedded within the example. Listing 5. Extract from the setVariables.sh script export WAS_HOME=/opt/IBM/WebSphere/AppServer export CELL_NAME=myhostCell01 #set the profile name used by WAS running MDM Server. such as AppSrv01 and Custom01 export NODE_NAME=Custom01 export APP_NAME=MDM_801 #The Name of the WebSphere Application Server running MDM Server, #You will have this only if you followed Step 4 above export SRV_NAME=Cluster_member1 export INSTALL_HOME=/usr/IBM/MDM_801 # IIS Server Version: Could be 801 or 81 export IIS_SRV_VERSION=801 export export export export export export DB_NAME=MDMDB DB_USER=myDBuser DB_PASSWORD=myDBpassword TABLE_SPACE=TABLESPACE1 INDEX_SPACE=INDEXSPACE1 LONG_SPACE=LONGSPACE1 export TRIG=COMPOUND export DEL_TRIG=TRUE export APPLICATION_NAME='WebSphere Customer Center' export APPLICATION_VERSION=8.0.1.0 export DEPLOY_NAME=MDM_801 #You need to set this only if you are integrating QualityStage with MDM Server. #Please note the back slashes. The number 2809 here refers to the #bootstrap port of WebSphere Application Server instance running IIS server. export ISP_URL='iiop://myIISserver.mylab.ibm.com:2809' Step 7. Execute the scripts. a. Execute install_ELPTx.sh. b. If you are integrating InfoSphere MDM Server with QualityStage, run the install_ELPCustom.sh script as well. Step 8. Check for errors. Go through all the log files to ensure there are no errors. Step 9. Repeat steps for a clustered environment. If you are installing in a clustered environment, complete the steps below for each cluster member. a. Reconfigure setVariables.sh to point to another cluster member. b. Run the additionalClusterInstall.sh script. c. If you are integrating InfoSphere MDM Server with QualityStage, run the install_ELPCustom.sh script. Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch Page 9 of 34
  • 10. developerWorks® ibm.com/developerWorks/ Note: As part of the install_ELPCustom.sh script, there are changes made to InfoSphere MDM Server database. Some of these changes cannot be executed more than once (such as a DB insert). Either ignore these errors during repeated execution of this script, or alter the script so that it does not attempt to repeat the database operations. Step 10. Configure the SIF parser. Complete this step only if you want to use a SIF parser. Otherwise, skip to Step 11. The example uses the default XML parser. To configure the batch processor to use the SIF parser, modify the following: a. In the DWLCommon_extention.property file, which is in properties.jar on server runtime environment, set sif_compatibility_mode = on. b. In batch extension property file, set ParserAndExecConfiguration.Parser = SIF. For more details, see the section SIF Parser in MDMRapidDeploymentPackage_CompositeMaintenanceServices.pdf. Step 11. Restart the InfoSphere MDM Server. Restart the InfoSphere MDM Server, including all the servers in a cluster. Integration of InfoSphere MDM Server with QualityStage If you want to use default standardization and matching algorithms from InfoSphere MDM Server, these steps are not needed, and you can continue to Optimizing performance with key configuration parameters. However, if you want InfoSphere MDM Server to use QualityStage for standardization and matching, this section describes how to configure them. These instructions assume the following: • InfoSphere MDM Server is installed and all the required fixpacks are applied. • EntryLevelMDM is installed. • The IIS server and IIS client are installed. The version of the IIS client must be the same as that of the IIS server. • The software stack is similar to that described in the Software and hardware stack section of the example environment. See Resources to access the documentation for InfoSphere MDM Server and QS integration (MDM Server Developers Guide, chapter titled Integrating IBM Information Server QualityStage with IBM InfoSphere Master Data Management Server). The instructions in this article complement those mentioned in the developer's guide. However, there are a few configuration changes mentioned in this article that are helpful during the installation. Step 1. Change security settings. Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch Page 10 of 34
  • 11. ibm.com/developerWorks/ developerWorks® If global security is enabled on the WebSphere Application Server running IIS, the transaction protocol security on that server must be disabled. To disable protocol security on a server, complete the following steps in the administrative console: a. In the administrative console, click Servers > Application Servers > server_name. The properties of the application server are displayed in the content pane. b. Under Container Settings, expand Container Services and click Transaction Service to display the properties page for the transaction service. c. Under Additional Properties, click Custom Properties. d. On the Custom Properties page, click New. e. Type DISABLE_PROTOCOL_SECURITY in the Name field, and type TRUE in the Value field. f. Click Apply or OK. g. Click Save to save your changes to the master configuration. h. Restart the server. Optionally, if WebSphere Application Server application security is turned on for InfoSphere MDM Server, the LTPA keys need to be shared between the MDM WebSphere Application Server cell and the IIS WebSphere Application Server cell. For detailed instructions, refer to the WebSphere Application Server Information Center (see Resources). Step 2. Get the installer. The installable components are part of the same bundle that you used while installing maintenance services. You will find them in the QualityStage folder. Step 3. Create the IIS project. Use the IIS Administrator Client to connect to the IIS server. Create a new project called ELMDMQS. Step 4. Import the IIS project. 1. Log into the ELMDMQS project through the DataStage and QualityStage Designer. 2. Click Import > Datastage Components. 3. Browse to the ELMDMQS.dsx file under the EntryLevelMDMQualityStage folder you extracted above. 4. Import the file. Step 5. Provision imported rule sets. You need to provision imported rule sets to the designer client before a job that uses them can be compiled. Complete the following steps to provision imported rule sets. a. In the Designer client, find the rule set within the repository tree ELMDMQS > ELMDMRT > Standardization Rules > MDMQS. Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch Page 11 of 34
  • 12. developerWorks® ibm.com/developerWorks/ b. Select the rule set by right-clicking and selecting Provision All from the menu, as shown in Figure 2. Figure 2. Provisioning rule sets c. Repeat the steps for all the rulesets listed below. • MDMQSStandardization RulesMDMCanadaCAADDRMDMCAADDR • MDMQSStandardization RulesMDMCanadaCAAREAMDMCAAREA • MDMQSStandardization RulesMDMUSAUSADDRMDMUSADDR • MDMQSStandardization RulesMDMUSAUSAREAMDMUSAREA • MDMQSStandardization RulesMNADKEYSMNADKEYS • MDMQSStandardization RulesMNNAMEMNNAME • MDMQSStandardization RulesMNNMKEYS • MDMQSStandardization RulesMNPHONEMNPHONE • MDMQSStandardization RulesMNSPOSTMNSPOST Step 6. Prepare test data and configure parameters a. Copy the provided test data (*.csv files and *.txt) into a directory on your IIS server (not the IIS client) called /data01/ELMDMQS. b. Open the parameter set ELMDMQS_Data_Directory under ELMDMQSELMDMRTParameter Sets (in the Repository view of the designer). c. Double-click on the Parameter set. d. Go to the Values tab and set the value of the parameter DATADIR to the directory path into which you just copied the test data (/data01/ELMDMQS/ in this example), as shown in Figure 3. Note the slash (/) at the end of the parameter value. Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch Page 12 of 34
Figure 3. Parameter set
e. Under the ELMDMQS > ELMDMRT > Shared Containers folder, double-click to open the shared container MDMQSPartySuspectReferenceMatchOrganization.
f. Set the file paths of the data set stages Data_Frequency and Reference_Frequency to the same path that you provided for ELMDMQS_Data_Directory.DATADIR in the previous step, as shown in Figure 4.
Figure 4. Edit input file path
g. Click OK to save the changes.
h. Close the stage, clicking Yes when prompted to save the changes in the stage.
i. Repeat the above steps for MDMQSPartySuspectReferenceMatchPerson.
Step 7. Compile the jobs.
a. Compile all the jobs inside the ELMDMQS > ELMDMRT > Jobs folder and its subfolders using Tools > Multiple Job Compile from the Designer client's menu.
b. Follow the instructions in the wizard, and start compiling.
Note: Batch versions of the jobs can be found in the ELMDMQS > ELMDMRT > Jobs folder. Information Services Director (ISD) versions of these jobs can be found in the ELMDMQS > ELMDMRT > Jobs > ISD folder.
Step 8. Generate match frequency data.
a. Use the Director client to run the job ELMDMQS > ELMDMRT > Jobs > MDMQS_Person_Match_Frequency_Generation to generate the match frequency data. When it completes, it generates the files PersonRefMatchTransFreq.txt and PersonRefMatchCandFreq.txt, as shown in Figure 5.
Figure 5. Generating match frequency data
b. Similarly, run ELMDMQS > ELMDMRT > Jobs > MDMQS_Org_Match_Frequency_Generation to generate the files OrgRefMatchTransFreq.txt and OrgRefMatchCandFreq.txt.
Step 9. Run the test jobs.
a. Use the Director client to run the following batch jobs to verify that they execute successfully on your system before you use the ISD jobs:
• All jobs in ELMDMQS > ELMDMRT > Standardization Testing
• All jobs in ELMDMQS > ELMDMRT > Match Testing
b. After running the jobs, view the output in the sequential file to check the results.
Step 10. Deploy services using ISD.
a. Log on to the IBM Information Server (IIS) console.
b. Click File > Import Information Services Project, and browse for the file ELMDMQS_ISDProject.xml in the EntryLevelMDMQualityStage directory.
c. Keep all the default settings, and click Import.
d. Open the Information Services Application (ELMDMQS) contained in the imported project.
e. Click Develop, as shown in Figure 6.
Figure 6. Selecting the Develop icon
f. Click Information Services Application.
g. On the resulting screen, double-click the ELMDMQS application to open it.
h. Go into Edit mode.
i. In the Select a View window, click Services > ELMDMQSService, as shown in Figure 7.
Figure 7. Configuring jobs using ISD
j. In the expanded tree, select Operations, and double-click the operations one at a time to edit each of them.
Figure 8. Checking the project name
k. Edit each of the operations as follows:
i. Ensure that the project name is correct, as shown in Box 1 in Figure 8. If you chose ELMDMQS as the name of the project when you created it using the administration client, you can keep the defaults. If you specified another name, ensure that the project name and the job names are correct. To check the project and job names, click the Edit button, and browse to the project and job in the ISD folder.
ii. Ensure that the Group Arguments into Structure option is enabled for inputs, as shown in Box 2 in Figure 8.
iii. Change the input data type according to Table 1 below, as shown in Box 3 in Figure 8.
iv. Check or uncheck the Accept array check boxes according to Table 1, as shown in Box 4 in Figure 8 (the check box should show a check mark if the table entry indicates Yes).
v. Set the output data type, and check or uncheck the Return array check boxes on the output tab according to Table 1.
Table 1. ISD job configuration
Operation name | Operation job name | Inputs accept array | Input data type | Outputs return array | Output data type
standardizeAddress | ISD_MDMQS_Address_Standardization | No | AddressInput | No | AddressOutput
elPersonMatch | ISD_MDMQS_Party_Suspect_Reference_Match_Person | Yes | ELPersonMatchInput | Yes | ELPersonMatchOutput
elOrgMatch | ISD_MDMQS_Party_Suspect_Reference_Match_Org | Yes | ELOrgMatchInput | Yes | ELOrgMatchOutput
standardizePhoneNumber | ISD_MDMQS_Phone_Standardization | No | PhoneNumberInput | No | PhoneNumberOutput
standardizeOrgName | ISD_MDMQS_Organization_Standardization | No | OrgNameInput | No | OrgNameOutput
standardizePersonName | ISD_MDMQS_Person_Standardization | No | PersonNameInput | No | PersonNameOutput
l. On the Provider Properties tab, modify the credentials according to your setup, as shown in Figure 9.
Figure 9. Modifying your credentials
m. Save and close the application.
n. Deploy the application from the Develop menu. Figure 10 shows an example; note the highlighted box showing the selected application ELMDMQS.
Figure 10. Deploying the application
o. Click Deploy, as shown in Figure 10.
p. Leave the defaults, and click Deploy to start the deployment.
Step 11. Set configuration values for QualityStage.
Note: This example integration is done for an InfoSphere MDM Server installation on which the maintenance services are installed. If you ran install_ELPCustom.sh during the installation of the maintenance services, you can skip to Optimizing performance with key configuration parameters.
Set the configuration values according to Table 2 so that InfoSphere MDM Server can communicate with the IIS-QS server. A sketch of one way to apply these values follows the table.
Table 2. Configuration modifications
Configuration name | Value
/IBM/ThirdPartyAdapters/IIS/defaultCountry | 185
/IBM/ThirdPartyAdapters/IIS/initialContextFactory | Used in conjunction with the provider URL to obtain the JNDI registry initial context. A typical value is com.ibm.websphere.naming.WsnInitialContextFactory.
/IBM/ThirdPartyAdapters/IIS/providerURL | iiop://<yourQSServer>:<QSServerBootstrapPort>. For example: iiop://myIIS.torolab.ibm.com:2809
/IBM/Party/Standardizer/Name/className | com.ibm.mdm.thirdparty.integration.iis8.adapter.InfoServerStandardizerAdapter
/IBM/Party/Standardizer/Address/className | com.ibm.mdm.thirdparty.integration.iis8.adapter.InfoServerStandardizerAdapter
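If you prefer to script these settings rather than enter them one at a time, the values in Table 2 can be applied directly to the MDM Server configuration repository. The following is a minimal sketch only; it assumes the configuration entries live in a CONFIGELEMENT table with NAME and VALUE columns and that the database is called MDMDB. Verify the table, column, and database names against your own installation before running anything like this, and restart the MDM Server application afterward so that any cached configuration is refreshed.

  db2 connect to MDMDB
  # Point the IIS adapter at your QualityStage server (host and bootstrap port are placeholders).
  db2 "UPDATE configelement SET value = 'iiop://myIIS.torolab.ibm.com:2809' WHERE name = '/IBM/ThirdPartyAdapters/IIS/providerURL'"
  db2 "UPDATE configelement SET value = 'com.ibm.websphere.naming.WsnInitialContextFactory' WHERE name = '/IBM/ThirdPartyAdapters/IIS/initialContextFactory'"
  db2 "UPDATE configelement SET value = '185' WHERE name = '/IBM/ThirdPartyAdapters/IIS/defaultCountry'"
  # Route name and address standardization through the Information Server adapter.
  db2 "UPDATE configelement SET value = 'com.ibm.mdm.thirdparty.integration.iis8.adapter.InfoServerStandardizerAdapter' WHERE name = '/IBM/Party/Standardizer/Name/className'"
  db2 "UPDATE configelement SET value = 'com.ibm.mdm.thirdparty.integration.iis8.adapter.InfoServerStandardizerAdapter' WHERE name = '/IBM/Party/Standardizer/Address/className'"
  db2 connect reset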
Step 12. Use QualityStage (QS) name and address standardization.
Use QS to standardize the names and addresses that are entered into InfoSphere MDM Server. See Standardizing name, address and phone number information in the MDM developer's guide (see Resources) for more information.
Step 13. Use QualityStage in suspect duplicate processing.
QualityStage can be used with the InfoSphere MDM Server Suspect Duplicate Processing (SDP) feature. See Configuring IBM Information Server QualityStage integration for SDP in the MDM developer's guide (see Resources) for more information.
Optimizing performance with key configuration parameters
After you install InfoSphere MDM Server, tune the key configuration parameters for optimal performance.
InfoSphere MDM Server and batch processor configuration
1. Increase the number of submitters to increase parallelism. Do this by editing the file <MDM_installation_Folder>/BatchProcessor/properties/Batch.properties. On an 8-way MDM Server box, 24 submitters are optimal.
2. Increase the JVM heap settings for the batch processor. Do this by editing the file <MDM_installation_Folder>/BatchProcessor/bin/runbatch.sh. For example, for 24 submitters, 512MB of heap is sufficient.
3. Reduce BatchProcessor logging by setting the threshold to ERROR. Do this by editing <MDM_installation_Folder>/BatchProcessor/Log4J.properties and setting the logging threshold to ERROR, if it is not already. For example: log4j.appender.file.Threshold=ERROR.
4. Reduce MDM Server logging by setting the threshold to ERROR. Do this by editing Log4J.properties inside the properties.jar file at <WebSphere_Location>/profiles/<ServerName>/installedApps/<CellName>/<InstanceName>/properties.jar.
A sketch of these batch processor edits follows this list.
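As a quick reference, items 2 and 3 above boil down to the following file contents. The heap flag is the standard JVM option; the exact property key that controls the number of submitters in Batch.properties varies by release, so copy it from the Batch.properties file shipped with your installation rather than from this sketch.

  # <MDM_installation_Folder>/BatchProcessor/Log4J.properties (excerpt after the item 3 edit)
  log4j.appender.file.Threshold=ERROR

  # <MDM_installation_Folder>/BatchProcessor/bin/runbatch.sh (item 2): make sure the existing
  # java invocation in the script carries a larger maximum heap, for example:
  #   java -Xmx512m ...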
WebSphere Application Server configuration
1. Increase the JDBC connection pool size to support the parallelism.
a. From the WebSphere Administration Console, go to Resources > JDBC > Data sources > DWLCustomer > Connection pool properties.
b. Increase the value for Maximum connections. The example setup uses 50.
2. Increase the prepared statement cache size.
a. The size of the prepared statement cache depends on the number of unique SQL statements used in your application. For InfoSphere MDM Server, set it to 300 and monitor the application to determine whether the cache size needs to be increased further.
b. The value can be changed from the WebSphere Administration Console. Go to Resources > JDBC > Data sources > DWLCustomer > Connection pools > WebSphere Application Server data source properties.
3. Increase the EJB cache size. Using the WebSphere Administration Console, go to Servers > Application servers > [ServerName] > EJB Container Settings > EJB cache settings. The example uses 4000.
4. Change the JVM heap size and GC policy.
a. From the WebSphere Administration Console, go to Servers > Application servers > [ServerName] > Java and Process Management > Process Definition > Java Virtual Machine.
b. Set the initial heap size to 512 MB and the maximum heap size to 1024 MB.
c. Use the gencon GC policy for better performance. To use this GC policy, specify -Xgcpolicy:gencon under Generic JVM arguments. While testing the example with the gencon GC policy, WebSphere Application Server sometimes generated unnecessary heap dumps. To disable this behavior, do the following after the server is started:
i. From the WebSphere Administration Console, go to Servers > Application servers > [ServerName] > Performance > Performance and Diagnostic Advisor Configuration > Runtime (tab).
ii. Clear the check box for Enable automatic heap dump collection.
Database tuning (DB2)
Follow best practices and recommendations when setting up the database server, monitor your database performance closely, and tune the database as needed for optimal performance and productive resource usage. This section briefly describes several recommendations for configuring and tuning a DB2 database; the basic concepts also apply to other types of databases. A monitoring sketch follows these recommendations.
• Typically it is recommended that you use one set of dedicated disks for the DB2 transaction logs and another set of dedicated disks for the DB2 table spaces. If possible, it is even better to use different disk controllers for the transaction logs and the table spaces, because this gives you the flexibility to configure the controllers independently for different I/O patterns: writes only for the logs rather than a mix of writes and reads.
• Ensure that the read and write cache is enabled on the storage system. Monitor the cache effectiveness, and configure the cache size appropriately.
• Plan the table spaces to ensure balanced I/O across all of the available disks. This avoids hot spots in the database, avoids limiting overall database performance to the bandwidth of a few of the busiest disks, and maximizes the utilization of all the I/O bandwidth available from the physical disks.
• In addition to a well-planned table space layout over the I/O system, one of the configuration parameters that affects performance most dramatically is the database buffer pool size. Pay close attention to the overall buffer pool hit ratio, which tells you how often DB2 must go to the physical disks (which is very expensive) for data that is not found in the database buffer pools.
• Strive for a buffer pool hit ratio of 80% or higher for data, and 90% or higher for indexes. Typically in MDM Server implementations, start with one big buffer pool for both data and indexes. If necessary, separate data and indexes into two different buffer pools to help ensure a good index buffer pool hit ratio.
• Because MDM Server enables a good amount of customization and extension, analyze the most expensive SQL statements from the database snapshot or other tools, and ensure that those statements have optimal access plans with the best indexes in place.
Consider these recommendations together to achieve the performance you need, because the behavior of one area might be just a symptom of another incorrectly configured or misbehaving area.
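One way to keep an eye on the buffer pool hit ratios mentioned above is to compute them from a DB2 buffer pool snapshot. The sketch below assumes a DB2 9.5 command line, a database named MDMDB, and a single default buffer pool; the buffer pool size, page size, and table space name are placeholders to adapt before use.

  db2 connect to MDMDB
  # Take a buffer pool snapshot and pick out the logical/physical read counters.
  db2 "GET SNAPSHOT FOR BUFFERPOOLS ON MDMDB" | grep -E "logical reads|physical reads"
  # Data hit ratio  = 1 - (data physical reads  / data logical reads)
  # Index hit ratio = 1 - (index physical reads / index logical reads)
  # If the index hit ratio stays low, a separate index buffer pool can help, for example:
  db2 "CREATE BUFFERPOOL BP_INDEX IMMEDIATE SIZE 50000 PAGESIZE 4K"
  db2 "ALTER TABLESPACE <your_index_tablespace> BUFFERPOOL BP_INDEX"
  db2 connect reset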
Understanding the performance test methodology used in the example
Input data preparation
The maintainContractPlus transaction was used for testing the example. Because the default parser from the BatchProcessor was used, the input data had to be formatted as line-feed-delimited XML transactions.
The first step toward building the input data set was to create seed data. The seed data was generated using a home-grown, Java-based tool, with key distributions based on U.S. Census data (2000). Some realistic data was added so that the overall parties closely matched a typical MDM business scenario. The seed data contained details such as name, gender, date of birth, and addresses.
As a second step, a template for the maintainContractPlus transaction was created. This template had variables for the key party details, which were filled in with the generated seed data. Another home-grown, Java-based tool was used to generate the XML transactions (an illustrative sketch of this template-fill step appears at the end of this section). Each such transaction yielded one person with one name, one address, one contract, and one contact method. The example run used a total of one million such records as one input data set, each record representing one party and its associated attributes. Table 4 in the Data profile section below shows which database tables these transactions populate.
Suspect duplicate data preparation
The data generated so far was primarily clean. A similar approach was used to generate dirty data, which included 40% duplicates. This data set was used when suspect duplicate processing was turned on.
During an initial load, the input data might contain duplicate entries, where the details of one record closely resemble those of another. Such records are termed suspect duplicates. Depending on how closely two records match, suspect duplicates are assigned a match category. To determine the match category, certain critical data fields are compared between the two records: first name, last name, address, date of birth, gender, and social security number. Based on the comparison results, the suspect duplicates are assigned a match score and a non-match score, and the match category is then derived. Depending on the match category, InfoSphere MDM Server takes the appropriate actions for the suspect duplicates.
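The template-fill step described under Input data preparation can be pictured with a tiny stand-in for the home-grown generator. Everything in this sketch is hypothetical: the template file name, the placeholder tokens (@@FIRSTNAME@@ and so on), and the seed file layout are inventions for illustration only, not the format used by the actual IBM tooling or the maintainContractPlus request schema.

  # party_template.xml holds a maintainContractPlus request with placeholder tokens
  # such as @@FIRSTNAME@@, @@LASTNAME@@, @@DOB@@ (hypothetical names).
  # seed.csv holds one party per line: firstname,lastname,dob (hypothetical layout).
  while IFS=, read -r first last dob; do
    sed -e "s/@@FIRSTNAME@@/$first/" \
        -e "s/@@LASTNAME@@/$last/" \
        -e "s/@@DOB@@/$dob/" party_template.xml | tr -d '\n'
    echo    # one line-feed-delimited XML transaction per party
  done < seed.csv > maintain_contract_input.txt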
When testing the example, two sets of input data were used:
• 100% clean data, with no suspect duplicates in the input data set
• 60% clean data, with 40% of the records as suspect duplicates
The example test included four types of suspect duplicates in the 60% clean data set. The population of each type of suspect duplicate was kept equal, and the duplicates were randomly distributed in the data using home-grown, Java-based tools. The details of this data set are shown in Table 3.
Table 3. Details of input data with suspects
sr# | Matching critical data details | Non-matching critical data details | Population | Weight (match/non-match score) | Match category
1 | Gender, FirstName, LastName, Address, DOB, SSN | None | 10% | 63/0 | A1
2 | Gender, FirstName, LastName, DOB, SSN | Address | 10% | 60/3 | A2
3 | Gender, Address, DOB, SSN | FirstName, LastName | 10% | 55/4 | A2
4 | Gender, Address, LastName (SSN and DOB fields are empty) | FirstName | 10% | 46/1 | B
The scores and categories in Table 3 are calculated by InfoSphere MDM Server's deterministic matching approach, which is the default implementation for party matching. In contrast, QualityStage matching offers a probabilistic matching approach and calculates only one composite weight.
Data profile
Table 4 shows the population of the InfoSphere MDM Server database tables after each of the two input data sets is loaded. The sketch after the table shows one way to spot-check these counts.
Table 4. Database population
Table name | 100% clean data | 60% clean data
ADDRESS | 1,000,000 | 700,000
ADDRESSGROUP | 1,000,000 | 900,000
CONTACT | 1,000,000 | 900,000
CONTACTMETHOD | 1,000,000 | 900,000
CONTACTMETHODGROUP | 1,000,000 | 900,000
CONTEQUIV | 1,000,000 | 1,000,000
CONTRACT | 1,000,000 | 1,000,000
CONTRACTCOMPONENT | 1,000,000 | 1,000,000
CONTRACTROLE | 1,000,000 | 1,000,000
IDENTIFIER | 1,000,000 | 900,000
LOBREL | 1,000,000 | 900,000
LOCATIONGROUP | 2,000,000 | 1,800,000
MISCVALUE | 1,000,000 | 1,000,000
PERSON | 1,000,000 | 900,000
PERSONNAME | 1,000,000 | 900,000
PERSONSEARCH | 1,000,000 | 900,000
SUSPECT | 0 | 300,000
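The counts in Table 4 can be verified with a few row counts once a load has completed. This is only a quick spot-check sketch; it assumes you connect as the MDM database user whose default schema owns these tables, and that the database is called MDMDB (adjust the names and add a schema qualifier if your setup differs).

  db2 connect to MDMDB
  # Compare a few row counts against Table 4 after loading one data set.
  db2 "SELECT COUNT(*) FROM CONTACT"
  db2 "SELECT COUNT(*) FROM PERSON"
  db2 "SELECT COUNT(*) FROM CONTRACT"
  db2 "SELECT COUNT(*) FROM SUSPECT"
  db2 connect reset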
Test methodology
Different tests were performed to check stability and scalability and to measure the overhead associated with several commonly used features. All the tests were conducted in two solution configurations:
• The MDM Server only solution, where InfoSphere MDM Server uses its own algorithms for standardization and matching. In this case, IBM Information Server is not required.
• The MDM Server + QS solution, where InfoSphere MDM Server uses QualityStage to do the standardization and matching.
The methodology for all of these tests was similar:
1. Set up the systems. Configure and tune the various components as described in the previous sections.
2. Prepare a set of input data with 10,000 records using the approach described above.
3. Load the input data with 10,000 records using 1 submitter in the batch processor. This is done to avoid deadlocks while working with an empty database.
4. Perform DB2 reorgchk on all the tables to update statistics.
5. Create a backup of the MDM Server database at this stage, and use it as the starting point for all the tests.
The following steps were then used to run each example test (a command-line sketch of the database preparation and data collection steps follows this list):
1. Restore the database from the backup copy.
2. Change the database configuration if required for the test. For example, you may want to switch OFF suspect duplicate processing.
3. Restart the WebSphere Application Server running InfoSphere MDM Server.
4. Run data collection scripts in the background to collect CPU statistics, I/O statistics, and database snapshots.
5. Start the test to load the selected input data set.
6. Collect the logs from InfoSphere MDM Server, WebSphere Application Server, and the DB2 database server.
7. Derive the response time and throughput from the transactiondata.log generated by InfoSphere MDM Server.
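The database preparation and monitoring steps above map onto standard DB2 and operating system commands. The following is a minimal sketch only; the database name MDMDB, the backup path, the output file names, and the 30-second sampling interval are placeholders for this example and should be adapted to your environment.

  # Steps 4 and 5 of the setup: refresh statistics, then take the baseline backup.
  db2 connect to MDMDB
  db2 reorgchk update statistics on table all
  db2 connect reset
  db2 backup db MDMDB to /backup/mdm_baseline

  # Step 1 of each run: restore the baseline before the next test.
  db2 restore db MDMDB from /backup/mdm_baseline replace existing

  # Step 4 of each run: collect CPU/IO statistics and a database snapshot in the background.
  vmstat 30 > vmstat.out 2>&1 &
  iostat 30 > iostat.out 2>&1 &
  db2 connect to MDMDB
  db2 "GET SNAPSHOT FOR DATABASE ON MDMDB" > db_snapshot.out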
Measuring performance results
This section describes the performance measurements, including the following:
• Results showing very stable throughput and response time
• The performance overhead of some commonly used features in the context of initial data loading
• The scalability of throughput
Test 1: Stability of throughput and response time
The purpose of this test is to show whether the throughput and response times remain stable as the loading progresses and as the database size increases. This test also measures the system resource usage pattern over the course of the run. The data for throughput and response time is derived from transactiondata.log, as generated by InfoSphere MDM Server. Various tests were conducted for both the MDM Server only and MDM Server + QS scenarios, and all of them showed good stability. Table 5 shows the configuration settings for the first test.
Table 5. Test 1 configuration
Parameter | Value
Hardware/Software stack | As described in the example test environment
InfoSphere MDM Server heap size | Initial: 512MB; Max: 1024MB
InfoSphere MDM Server JVM GC policy | gencon
Number of submitters in batch processor | 24
Batch processor JVM memory | 512MB
ISD job configurations (MDM Server + QS scenario only) | Default
Type of transaction used | maintainContractPlus
Total volume | 1 million parties and their associated records
Input data quality | 60% clean, 40% suspected duplicates of various types
Name standardization | ON (default)
Address standardization | ON (StandardFormatingIndicator set to N in the request XML)
Suspect duplicate processing | ON
History triggers | Enabled
Test 1 results: Stability results
Figure 11 shows the throughput and response times captured for the MDM Server only scenario. The chart shows that throughput and response time are stable for the whole duration of the run. The results for the MDM Server + QS scenario are similar.
Figure 11. Throughput and response time
Figure 12 shows that by configuring a sufficient number of submitters, almost all of the CPU resources on the WebSphere Application Server running InfoSphere MDM Server can be used, and the system does not have any other bottlenecks. Figure 12 also shows the resource usage on the other systems.
Figure 12. Resource usage
Test 2: Feature overheads
The purpose of these tests is to measure the overhead of four commonly used InfoSphere MDM Server features:
• Name standardization
• Address standardization
• Suspect duplicate processing
• History triggers
Overhead is expressed as the percentage reduction in throughput when the feature is enabled. For example, a 5% overhead associated with a particular feature means that if throughput was 100 transactions per second (TPS) with the feature disabled, it becomes 95 TPS when the feature is enabled. Throughput is measured as total data volume loaded divided by total time taken.
Various tests were conducted for both the MDM Server only and MDM Server + QS scenarios, enabling one or more features at a time. In the MDM Server + QS scenario, the overheads of standardization and suspect duplicate processing are expected to be higher because they involve extra processing by QualityStage. Table 6 shows the configuration settings for the second test.
Table 6. Test 2 configuration
Parameter | Value
Hardware/Software stack | As described in the example test environment
InfoSphere MDM Server heap size | Initial: 512MB; Max: 1024MB
InfoSphere MDM Server JVM GC policy | Default
Number of submitters in batch processor | 24
Batch processor JVM memory | 512MB
ISD job configurations (MDM Server + QS scenario only) | Default
Type of transaction used | maintainContractPlus
Total volume | 1 million parties and their associated records
Input data quality | a) 100% clean; b) 60% clean
Following are some notes about the configuration (a sketch of how these settings were toggled follows this list):
• Name standardization was turned ON or OFF by setting /IBM/Party/ExcludePartyNameStandardization/enabled to FALSE or TRUE, respectively.
• Address standardization was effectively switched ON or OFF by setting StandardFormatingIndicator to N or Y in the transaction request XMLs.
• Suspect duplicate processing was switched ON or OFF by setting the following to TRUE or FALSE, respectively, in the configuration table:
• /IBM/Party/SuspectProcessing/enabled
• /IBM/Party/SuspectProcessing/AddParty/returnSuspect
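As an illustration only, the configuration-table toggles in the list above could be applied with statements along the following lines. This sketch again assumes the configuration entries live in a CONFIGELEMENT table with NAME and VALUE columns in a database called MDMDB; verify the actual table and column names, match the value casing already used in your table, and restart the MDM Server application after changing them. The address standardization toggle is not shown because it is controlled by the StandardFormatingIndicator element in each request XML rather than by the configuration table.

  db2 connect to MDMDB
  # Switch suspect duplicate processing OFF for a test run (set the values to TRUE to switch it back ON).
  db2 "UPDATE configelement SET value = 'FALSE' WHERE name = '/IBM/Party/SuspectProcessing/enabled'"
  db2 "UPDATE configelement SET value = 'FALSE' WHERE name = '/IBM/Party/SuspectProcessing/AddParty/returnSuspect'"
  # Switch name standardization OFF by excluding it (set to FALSE to re-enable standardization).
  db2 "UPDATE configelement SET value = 'TRUE' WHERE name = '/IBM/Party/ExcludePartyNameStandardization/enabled'"
  db2 connect reset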
Test 2 results: Feature overheads
Standardization
Table 7 shows the overhead of standardization in the MDM Server only scenario, measured with suspect duplicate processing switched OFF and ON; with SDP ON, both data sets (100% clean and 60% clean) were used. History triggers were enabled during these tests.
Table 7. Overhead of standardization
Overhead | SDP OFF | SDP ON (100% clean) | SDP ON (60% clean)
Name standardization | 2% | 3% | 3%
Address standardization | 2% | 2% | 0%
Name and address standardization | 4% | 3% | 2%
Note: With 60% clean data there are fewer unique addresses, which can result in less overhead.
Suspect duplicate processing
Table 8 shows the overhead of suspect duplicate processing, with and without standardization, in the MDM Server only scenario. Tests were conducted with both data sets (100% clean and 60% clean). History triggers were enabled during these tests.
Table 8. Overhead of suspect duplicate processing
Overhead | 100% clean data | 60% clean data
Suspect duplicate processing | 3% | 20%
Suspect duplicate processing along with name and address standardization | 6% | 21%
History triggers
If history triggers are enabled, the I/O requirement on the database server increases significantly (it nearly doubles). With enough I/O bandwidth provided, the overhead is small (approximately 5%).
Test 3: Scalability tests
By definition, scalability is a measure of how well throughput increases when more load is put on the system. For the example test, the number of processors was not varied. Instead, the number of parallel requests to InfoSphere MDM Server was changed by varying the number of submitters in the batch processor. Data points were collected between 1 submitter and 24 submitters, at which point the system was clearly saturated. The test was conducted for both the MDM Server only and MDM Server + QS scenarios. Tests were conducted in different configurations, and all of them showed near-linear scalability. Table 9 shows the configuration settings for the third test.
Table 9. Test 3 configuration
Parameter | Value
Hardware/Software stack | As described in the example test environment
InfoSphere MDM Server heap size | Initial: 512MB; Max: 1024MB
InfoSphere MDM Server JVM GC policy | Default
Number of submitters in batch processor | Varied between 1 and 24
Batch processor JVM memory | 512MB
ISD job configurations (MDM Server + QS scenario only) | Default
Type of transaction used | maintainContractPlus
Total volume | 15,000 to 100,000 records
Input data quality | 60% clean
Name standardization | ON (default)
Address standardization | ON (StandardFormatingIndicator set to N in the request XML)
Suspect duplicate processing | ON
History triggers | Enabled
Test 3 results: Scalability results
Figure 13 shows the scalability for the MDM Server only scenario. As shown by the green line, throughput increases almost linearly with the number of submitters. The example configuration utilized more than 90% of the CPU capacity on the server running InfoSphere MDM Server. The results for MDM Server + QS are similar.
Figure 13. Scalability of InfoSphere MDM Server with SDP ON
Conclusion
Designed to provide flexibility in its deployments, developed on leading technology, and offering unmatched performance and scalability, InfoSphere Master Data Management Server has been the leading choice for a large number of organizations across a range of industries when implementing their MDM solutions. As the leader, IBM has the largest number of successfully deployed MDM implementations in the market today.
This article explained what the maintenance services are and how to set them up in an InfoSphere MDM Server environment, with enough configuration and tuning detail to get maintenance service batch up and running with high performance. It also covered the steps for setting up Information Server QualityStage for standardization and matching, where that configuration is required. The key performance data points from the common scenarios described here show that the maintenance services, when used for initial load, provide sustainable high performance and excellent scalability. Finally, the article summarized the performance overhead of some key features commonly used in MDM Server implementations; these measurements are useful for capacity planning an MDM Server system based on the chosen features and for ensuring the required performance during initial load.
Acknowledgments
We would like to thank Lena Woolf, Berni Schiefer, and Karen Chouinard for their input and suggestions. We would also like to thank the other MDM Server team members for their support during this project.
Notices
© IBM Corporation 2009. All Rights Reserved.
THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. ALTHOUGH EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS DOCUMENT, IT IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM'S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS DOCUMENT OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS DOCUMENT IS INTENDED TO, OR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
All performance data contained in this publication was obtained in the specific operating environment and under the conditions described above and is presented as an illustration only. Performance obtained in other operating environments may vary, and customers should conduct their own testing.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
The information in this document concerning non-IBM products was obtained from the supplier(s) of those products. IBM has not tested such products and cannot confirm the accuracy of the performance, compatibility or any other claims related to non-IBM products. Questions about the capabilities of non-IBM products should be addressed to the supplier(s) of those products.
The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM's current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
References in this publication to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM's sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth, savings or other results.
Resources
Learn
• See the IBM Redbook™ Master Data Management: Rapid Deployment Package for MDM for more instructions.
• Refer to the IBM InfoSphere MDM Server Information Center for more instructions.
• Refer to the WebSphere Application Server, Version 6.1 Information Center to install IBM WebSphere Application Server Network Deployment, Version 6.1, and upgrade it with Fixpack 11.
• Refer to the IBM DB2 Database for Linux®, UNIX®, and Windows Information Center to install DB2 Database Server, Version 9.5.
• Refer to the IBM Information Server Information Center to install IIS Server, Version 8.0.1.
• Learn more from the IBM Redpaper WebSphere Customer Center: Understanding Performance.
• Discover DB2 Tuning Tips for OLTP Applications from this classic developerWorks article.
• Explore the Information Management Software for z/OS Solutions Information Center.
• Learn more about Information Management at the developerWorks Information Management zone. Find technical documentation, how-to articles, education, downloads, product information, and more.
• Stay current with developerWorks technical events and webcasts.
Get products and technologies
• Build your next development project with IBM trial software, available for download directly from developerWorks.
Discuss
• Participate in the discussion forum for this content.
• Check out the developerWorks blogs and get involved in the developerWorks community.
About the authors
Neeraj Singh
Neeraj R. Singh is a senior performance engineer working on Master Data Management Server performance. He previously led the Java technologies test team for functional, system, and performance testing as technical lead and test project leader. He joined IBM in 2000 and holds a bachelor's degree in Electronics and Communications Engineering.
Yongli An
Yongli An is an experienced performance engineer focusing on Master Data Management products and solutions. He is also experienced in DB2 database server and WebSphere performance tuning and benchmarking, and he is an IBM Certified Application Developer and Database Administrator - DB2 for Linux, UNIX, and Windows. He joined IBM in 1998 and holds a bachelor's degree in Computer Science and Engineering and a master's degree in Computer Science. Currently Yongli is the manager of the MDM performance and benchmarks team, focusing on Master Data Management Server performance and benchmarks and helping customers achieve optimal performance for their MDM systems.
© Copyright IBM Corporation 2009 (www.ibm.com/legal/copytrade.shtml)
Trademarks (www.ibm.com/developerworks/ibm/trademarks/)