DS314SVR
STUDENT GUIDE
Copyright © 2002 Ascential Software Corporation
Version 6.0: 09/01/02
Copyright
This document and the software described herein are the property of Ascential Software
Corporation and its licensors and contain confidential trade secrets. All rights to this
publication are reserved. No part of this document may be reproduced, transmitted,
transcribed, stored in a retrieval system or translated into any language, in any form or by any
means, without prior permission from Ascential Software Corporation.

Copyright © 2002 Ascential Software Corporation. All rights Reserved

Ascential Software Corporation reserves the right to make changes to this document and the
software described herein at any time and without notice. No warranty is expressed or
implied other than any contained in the terms and conditions of sale.

                              Ascential Software Corporation
                                   50 Washington Street
                              Westboro, MA 01581-1021 USA
                                  Phone: (508) 366-3888
                                   Fax: (508) 389-8749

Ardent, Axielle, DataStage, Iterations, MetaBroker, MetaStage, and uniVerse are registered
trademarks of Ascential Software Corporation. Pick is a registered trademark of Pick
Systems. Ascential Software is not a licensee of Pick Systems. Other trademarks and
registered trademarks are the property of the respective trademark holder.


09-01-2002




Table of Contents
Module 1: Introduction to DataStage
Module 2: Installing DataStage
Module 3: Configuring Projects
Module 4: Designing and Running Jobs
Module 5: Working with Metadata
Module 6: Working with Relational Data
Module 7: Constraints and Derivations
Module 8: Creating BASIC Expressions
Module 9: Troubleshooting
Module 10: Defining Lookups
Module 11: Aggregating Data
Module 12: Job Control
Module 13: Working with Plug-Ins
Module 14: Scheduling and Reporting
Module 15: Optimizing Job Performance
Module 16: Putting It All Together




Module 1



Introduction to DataStage
Module 1 – Introduction to DataStage                           DataStage 314Svr




Ascential Software provides the enterprise with a full-featured data integration
platform that can take data from any source and load it into any target. Sources
can range from customer relationship systems to legacy systems to data
warehouses -- in fact, any system that houses data. Target systems, likewise, can
consist of data in warehouses, real-time systems, Web services -- any application
that houses data.

Depending on your needs, source data can undergo scrutiny and transformation
through several stages:
    1. Data profiling -- a discovery process where relevant information for target
       enterprise applications is gathered
    2. Data quality -- a preparation process where data can be cleansed and
       corrected
    3. Extract, Transform, Load -- a transformation process where data is
       enriched and loaded into the target

Underlying these processes is an application framework that allows you to:
   1. Utilize parallel processing for maximum performance
   2. Manage and share metadata amongst all the stages

Overlaying all of this is a command and control structure that allows you to tailor
your environment to your specific needs.
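As a rough illustration of the three stages listed above (profiling, quality, and extract/transform/load), the following Python sketch runs them over a small in-memory data set. It is not DataStage code. The function names, the sample rows, and the cleansing rules are all invented for this example.

```python
import csv
import io

# Hypothetical source data; in practice this would come from an operational system.
SOURCE_CSV = """customer_id,name,amount
1, Alice ,10.50
2,Bob,
3,carol,7.25
"""

def extract(text):
    """Read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def profile(rows):
    """Data profiling: count empty values per column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if value.strip() == "":
                counts[column] = counts.get(column, 0) + 1
    return counts

def cleanse(rows):
    """Data quality: trim whitespace, normalize names, default missing amounts to 0."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "customer_id": int(row["customer_id"]),
            "name": row["name"].strip().title(),
            "amount": float(row["amount"]) if row["amount"].strip() else 0.0,
        })
    return cleaned

def load(rows, target):
    """Load: append the transformed rows to the target (a list stands in for a table)."""
    target.extend(rows)
    return len(rows)

rows = extract(SOURCE_CSV)
missing = profile(rows)          # which columns have gaps
warehouse = []
loaded = load(cleanse(rows), warehouse)
```

Profiling runs before cleansing so that the gaps it reports describe the source data rather than the corrected data.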

DataStage Essentials   Module 1 – Introduction to DataStage








A data warehouse is a central database that integrates data from many operational
sources within an organization. The data is transformed, summarized, and
organized to support business analysis and report generation.
•   Repository of data
•   Optimized for analysis
•   Supports business:
    − Projections
    − Comparisons
    − Assessments
•   Extracted from operational sources
    − Integrated
    − Summarized
    − Filtered
    − Cleansed
    − Denormalized
    − Historical






Data marts are like data warehouses but smaller in scope. Frequently an
organization will have both an enterprise-wide data warehouse and data marts that
extract data from it for specialized purposes.
•   Like data warehouses but smaller in scope
•   Organize data from a single subject area or department
•   Solve a small set of business requirements
•   Are cheaper and faster to build than a data warehouse
•   Distribute data away from the data warehouse








DataStage is a comprehensive tool for the fast, easy creation and maintenance of
data marts and data warehouses. It provides the tools you need to build, manage,
and expand them. With DataStage, you can build solutions faster and give users
access to the data and reports they need.
With DataStage you can:
•   Design the jobs that extract, integrate, aggregate, load, and transform the data
    for your data warehouse or data mart.
•   Create and reuse metadata and job components.
•   Run, monitor, and schedule these jobs.
•   Administer your development and execution environments.








DataStage is client/server software. The server stores all DataStage objects and
metadata in a repository, which consists of the UniVerse RDBMS. The clients
interface with the server.
The clients run on Windows 95 or later (Windows 98, NT, 2000). The server runs
on Windows NT 4.0 and Windows 2000. Most versions of UNIX are supported.
See the installation release notes for details.
The DataStage client components are:

 Component               Description
 Administrator           Administers DataStage projects and conducts
                         housekeeping on the server

 Designer                Creates DataStage jobs that are compiled into
                         executable programs

 Director                Used to run and monitor the DataStage jobs

 Manager                 Allows you to view and edit the contents of the
                         repository








True or False? The DataStage Server and clients must be running on the
same machine.
Answer: False. Typically, there are many client machines, each accessing the
same DataStage Server running on a separate machine. The Server can be
running on Windows NT or UNIX. The clients can be running on a variety of
Windows platforms.








Use the Administrator to specify general server defaults, add and delete projects,
and set project properties. The Administrator also provides a command
interface to the UniVerse repository.
Use the Administrator Project Properties window to:
•   Set job monitoring limits and other Director defaults on the General tab.
•   Set user group privileges on the Permissions tab.
•   Enable or disable server-side tracing on the Tracing tab.
•   Specify a user name and password for scheduling jobs on the Schedule tab.
•   Specify hashed file stage read and write cache sizes on the Tunables tab.
General server defaults can be set on the Administrator DataStage
Administration window (not shown):
•   Change license information.
•   Set server connection timeout.
The DataStage Administrator is discussed in detail in a later module.








Use the Manager to store and manage reusable metadata for the jobs you define in
the Designer. This metadata includes table and file layouts and routines for
transforming extracted data.
Manager is also the primary interface to the DataStage repository. In addition to
table and file layouts, it displays the routines, transforms, and jobs that are defined
in the project. Custom routines and transforms can also be created in Manager.








The DataStage Designer allows you to use familiar graphical point-and-click
techniques to develop processes for extracting, cleansing, transforming,
integrating and loading data into warehouse tables.
The Designer provides a “visual data flow” method to easily interconnect and
configure reusable components.
Use Designer to:
•   Specify how the data is extracted.
•   Specify data transformations.
•   Decode (denormalize) data going into the data mart using reference lookups.
    − For example, if the sales order records contain customer IDs, you can look
      up the name of the customer in the CustomerMaster table.
    − This avoids the need for a join when users query the data mart, thereby
      speeding up the access.
•   Aggregate data.
•   Split data into multiple outputs on the basis of defined constraints.
You can easily move between the Director, Designer, and Manager by selecting
commands in the Tools menu.
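The reference lookup described above (decoding a customer ID into a customer name) can be sketched in Python as follows. This only illustrates the idea and is not how Designer implements it. The table contents, column names, and the "UNKNOWN" default for a failed lookup are assumptions made for the example.

```python
# Hypothetical CustomerMaster reference table, keyed by customer ID.
customer_master = {
    "C001": "Acme Corp",
    "C002": "Globex",
}

# Hypothetical sales order records that carry only the customer ID.
sales_orders = [
    {"order_id": 1, "customer_id": "C001", "amount": 250.0},
    {"order_id": 2, "customer_id": "C002", "amount": 75.0},
    {"order_id": 3, "customer_id": "C999", "amount": 10.0},
]

def decode_orders(orders, lookup, default="UNKNOWN"):
    """Add the customer name to each order using a reference lookup."""
    decoded = []
    for order in orders:
        row = dict(order)  # copy so the source rows are not modified
        row["customer_name"] = lookup.get(order["customer_id"], default)
        decoded.append(row)
    return decoded

mart_rows = decode_orders(sales_orders, customer_master)
```

Because each row in mart_rows already carries the customer name, a query against the data mart does not need to join to the customer table.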








Use the Director to validate, run, schedule, and monitor your DataStage jobs.
You can also gather statistics as the job runs.








•   Define your project’s properties: Administrator
•   Open (attach to) your project
•   Import metadata that defines the format of data stores your jobs will read from
    or write to: Manager
•   Design the job: Designer
    − Define data extractions (reads)
    − Define data flows
    − Define data integration
    − Define data transformations
    − Define data constraints
    − Define data loads (writes)
    − Define data aggregations
•   Compile and debug the job: Designer
•   Run and monitor the job: Director








All your work is done in a DataStage project. Before you can do anything other
than some general administration, you must open (attach to) a project.
Projects are created during the installation process. You can add further
projects after installation on the Projects tab of Administrator.
A project is associated with a directory. The project directory is used by
DataStage to store your jobs and other DataStage objects and metadata.
Projects are self-contained. Although multiple projects can be open at the same
time, they are separate environments. You can, however, import and export
objects between them.
Multiple users can be working in the same project at the same time. However,
DataStage will prevent multiple users from accessing the same job at the same
time.








True or False? DataStage Designer is used to build and compile your Extraction,
Transformation, and Load (ETL) jobs.
Answer: True. With Designer you can graphically build your job by placing
graphical components (called "stages") on a canvas. After you build it, your job
is compiled in Designer.


True or False? DataStage Manager is used to execute your jobs after you build
them.
Answer: False. DataStage Manager is your primary interface to the DataStage
repository. Use Manager to manage metadata and other DataStage objects.






True or False? DataStage Director is used to execute your jobs after they have
been built.
Answer: True. Use Director to validate and run your jobs. You can also
monitor the job while it is running.


True or False? DataStage Administrator is used to set global and project
properties.
Answer: True. You can set some global properties, such as connection timeout,
as well as project properties, such as permissions.




Module 2



Installing DataStage
Module 2 – Installing DataStage                   DataStage 314Svr








The DataStage server should be installed before the DataStage clients are
installed. The server can be installed on Windows NT (including Workstation
and Server), Windows 2000, or UNIX. This module describes the Windows NT
installation.
The exact system requirements depend on your version of DataStage. See the
installation CD for the latest system requirements.
To install the server you will need the installation CD and a license for the
DataStage server. The license contains the following information:
•   Serial number
•   Project count
    − The maximum number of projects you can have installed on the server.
      This includes new projects as well as previously created projects to be
      upgraded.
•   Expiration date
•   Authorization code
    − This information must be entered exactly as written in the license.






The installation wizard guides you through the following steps:
•   Enter license information
•   Specify server directories
•   Select program folder
•   Create new projects and/or upgrade existing projects












The DataStage services must be running on the server machine in order to run any
DataStage client applications. To start or stop the DataStage services in Windows
2000, open the DataStage Control Panel window in the Windows 2000 Control
Panel. Then click Start All Services (or Stop All Services). These services must
be stopped when installing or reinstalling DataStage.
UNIX note: In UNIX, these services are started and stopped using the uv.rc
script with the start or stop command option. The exact name of the script
varies by platform. For Sun Solaris, it is /etc/rc2.d/S99uv.rc.








The DataStage clients should be installed after the DataStage server is installed.
The clients can be installed on Windows 95, Windows 98, Windows NT, or
Windows 2000.
There are two editions of DataStage.
•   The Developer’s edition contains all the client applications (in addition to the
    server).
•   The Operator’s edition contains just the client applications needed to run and
    monitor DataStage jobs (in addition to the server), namely, the Director and
    Administrator.
To install the Developer’s edition you need a license for DataStage Developer.
To install the Operator’s edition you need a license for DataStage Director. The
license contains the following information:
•   Serial number
•   User limit
•   Expiration date
•   Authorization code
    − This information must be entered exactly as written in the license.


Module 3



Configuring Projects
Module 3 – Configuring Projects                   DataStage 314Svr








In DataStage all development work is done within a project. Projects are created
during installation and after installation using Administrator.
Each project is associated with a directory. The directory stores the objects (jobs,
metadata, custom routines, etc.) created in the project.
Before you can work in a project you must attach to it (open it).
You can set the default properties of a project using DataStage Administrator.








Click Properties on the DataStage Administration window to open the Project
Properties window. There are five active tabs. (The Mainframe tab is only
enabled if your license supports mainframe jobs.) The default is the General tab.
If you select the Enable job administration in Director box, you can perform
some administrative functions in Director without opening Administrator.
When a job is run in Director, events are logged describing the progress of the
job. For example, events are logged when a job starts, when it stops, and when it
aborts. The number of logged events can grow very large. The Auto-purge of
job log box allows you to specify conditions for purging these events.
You can limit the logged events either by number of days or number of job runs.
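The two purge conditions (by age in days or by number of job runs) can be sketched in Python like this. It is a simplified model, not DataStage's purge logic. The event layout is hypothetical, and the sketch assumes that run numbers increase with each new run.

```python
from datetime import datetime, timedelta

def purge_by_age(events, now, max_days):
    """Keep only events logged within the last max_days days."""
    cutoff = now - timedelta(days=max_days)
    return [e for e in events if e["time"] >= cutoff]

def purge_by_runs(events, max_runs):
    """Keep only events that belong to the most recent max_runs job runs."""
    # Assumption: a higher run number means a more recent run.
    recent_runs = sorted({e["run"] for e in events}, reverse=True)[:max_runs]
    return [e for e in events if e["run"] in recent_runs]

now = datetime(2002, 9, 1, 12, 0)
log = [
    {"run": 1, "time": now - timedelta(days=10), "msg": "Job started"},
    {"run": 1, "time": now - timedelta(days=10), "msg": "Job finished"},
    {"run": 2, "time": now - timedelta(days=3), "msg": "Job started"},
    {"run": 2, "time": now - timedelta(days=3), "msg": "Job aborted"},
    {"run": 3, "time": now - timedelta(hours=1), "msg": "Job started"},
]

kept_by_age = purge_by_age(log, now, max_days=7)    # drops run 1
kept_by_runs = purge_by_runs(log, max_runs=1)       # keeps run 3 only
```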








Use the Permissions tab to set user group permissions for accessing and using
DataStage. All DataStage users must belong to a recognized user role before they
can log on to DataStage. This helps to prevent unauthorized access to DataStage
projects.
There are three DataStage user roles:
•   DataStage Developer, who has full access to all areas of a DataStage project.
•   DataStage Operator, who can run and manage released DataStage jobs.
•   <None>, who does not have permission to log on to DataStage.
UNIX note: In UNIX, the groups displayed are defined in /etc/group.








The Tracing tab is used to enable and disable server-side tracing.
The default is for server-side tracing to be disabled. When you enable it,
information about server activity is recorded for any clients that subsequently
attach to the project. This information is written to trace files. Users with in-depth
knowledge of the system software can use it to help identify the cause of a client
problem. If tracing is enabled, users receive a warning message whenever they
invoke a DataStage client.
Warning: Tracing causes a lot of server system overhead. This should only be
used to diagnose serious problems.








Use the Schedule tab to specify a user name and password for running scheduled
jobs in the selected project. If no user is specified here, the job runs under the
same user name as the system scheduler.








On the Tunables tab, you can specify the sizes of the memory caches used when
reading rows in hashed files and when writing rows to hashed files. Hashed files
are mainly used for lookups and are discussed in a later module.
Active-to-Active link performance settings will be covered in detail in a later
module in this course.




Module 4



Designing and Running Jobs
Module 4 – Designing and Running Jobs             DataStage 314Svr








A job is an executable DataStage program. In DataStage, you can design and run
jobs that perform many useful data warehouse tasks, including data extraction,
data conversion, data aggregation, data loading, etc.
DataStage jobs are:
•   Designed and built in Designer.
•   Scheduled, invoked, and monitored in Director.
•   Executed under the control of DataStage.








In this module, you will go through the whole process with a simple job, except
for the first bullet. In this module you will manually define the metadata.








In the center right is the Designer canvas. On it you place stages and links from
the Tools Palette on the right. On the bottom left is the Repository window,
which displays the branches in Manager. Items in Manager, such as jobs and
table definitions, can be dragged to the canvas area. Click View>Repository to
display the Repository window.
Click View>Property Browser to display the Property Browser window. This
window displays the properties of objects selected on the canvas.








The toolbar at the top contains quick access to the main functions of Designer.








The tool palette contains icons that represent the components you can add to your
job design.
Most of the stages shown here are automatically installed when you install
DataStage. You can also install additional stages called plug-ins for special
purposes. For example, there is a plug-in called sort that can be used to sort data.
Plug-ins are discussed in a later module.








There are two kinds of stages:
Passive stages define read and write access to data sources and repositories.
•   Sequential
•   ODBC
•   Hashed
Active stages define how data is filtered and transformed.
•   Transformer
•   Aggregator
•   Sort plug-in








True or False? The Sequential stage is an active stage.
Answer: False. The Sequential stage is considered a passive stage because it is
used to extract or load sequential data from a file. It is not used to transform or
modify data.








The Sequential stage is used to extract data from a sequential file or to load data
into a sequential file.
The main things you need to specify when editing the sequential file stage are the
following:
•   Path and name of file
•   File format
•   Column definitions
•   If the sequential stage is being used as a target, specify the write action:
    Overwrite the existing file or append to it.
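The difference between the two write actions can be illustrated with ordinary file modes in Python. This is an analogy rather than DataStage code. The function name, the file name, and the comma-delimited format are chosen only for the example.

```python
import os
import tempfile

def write_rows(path, rows, action="overwrite"):
    """Write comma-delimited rows to a sequential file.

    action="overwrite" replaces any existing file; action="append" adds to it.
    Either way the file is created if it does not exist.
    """
    if action not in ("overwrite", "append"):
        raise ValueError("action must be 'overwrite' or 'append'")
    mode = "w" if action == "overwrite" else "a"
    with open(path, mode, newline="") as f:
        for row in rows:
            f.write(",".join(str(v) for v in row) + "\n")

def read_lines(path):
    """Return the file's contents as a list of lines."""
    with open(path) as f:
        return f.read().splitlines()

target = os.path.join(tempfile.mkdtemp(), "target.txt")
write_rows(target, [(1, "a"), (2, "b")], action="overwrite")
write_rows(target, [(3, "c")], action="append")
after_append = read_lines(target)        # all three rows are present
write_rows(target, [(4, "d")], action="overwrite")
after_overwrite = read_lines(target)     # only the last row remains
```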








Defining a sequential target stage is similar to defining a sequential source stage.
You are defining the format of the data flowing into the stage, that is, from the
input links. Define each input link listed in the Input name box.
You are defining the file the job will write to. If the file doesn’t exist, it will be
created. Specify whether to overwrite or append the data in the Update action
set of buttons.
Filter command (General tab): Here you can specify a filter program for
processing the file you are extracting data from. This feature can be used, for
example, to unzip a compressed file before reading it. You can type in or browse
for the filter program, and specify any command line arguments it requires in the
text box. This text box is enabled only if you have selected the Stage uses filter
commands checkbox on the Stage page General tab. Note that, if you specify a
filter command, data browsing is not available so the View Data button is
disabled.
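The idea of passing a compressed file through a filter program before reading it can be sketched in Python as follows. The sketch assumes that the filter reads the file on standard input and writes the result to standard output, which this guide does not state. The filter here is a stand-in one-line Python script, not a DataStage component, and the file name is hypothetical.

```python
import gzip
import os
import subprocess
import sys
import tempfile

# Stand-in filter program: a one-line Python script that decompresses gzip
# data from standard input to standard output. A real job might name an
# operating-system utility here instead.
FILTER_COMMAND = [
    sys.executable, "-c",
    "import gzip, sys; "
    "sys.stdout.buffer.write(gzip.decompress(sys.stdin.buffer.read()))",
]

def read_through_filter(path, command):
    """Feed the file to the filter program and return the filtered lines."""
    with open(path, "rb") as source:
        result = subprocess.run(command, stdin=source,
                                stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("utf-8").splitlines()

# Create a small compressed source file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "orders.txt.gz")
with gzip.open(path, "wb") as f:
    f.write(b"1,widget\n2,gadget\n")

lines = read_through_filter(path, FILTER_COMMAND)
```

The reader only ever sees the filter's output, which is consistent with data browsing being unavailable when a filter command is set: the raw file on disk is not in the format the column definitions describe.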

On the Format tab, you can specify a different format for the target file than you
specified for the source file.
If the target file doesn’t exist, you will not (of course!) be able to view its data
until after the job runs. If you click the View data button, DataStage will return a
“Failed to open …” error.






The column definitions you defined in the source stage for a given (output) link
will appear already defined in the target stage for the corresponding (input) link.
Think of a link as like a pipe. What flows in one end flows out the other end.
The format going in is the same as the format going out.
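The pipe analogy can be made concrete with a small Python sketch in which one set of column definitions, attached to the link, is used by both the source and the target. The stage functions and column names are hypothetical and only model the idea.

```python
# Column definitions entered once, on the source stage's output link.
# The names and types are hypothetical.
link_columns = [("order_id", int), ("product", str)]

def source_stage(lines, columns):
    """Passive source: parse each delimited line using the link's columns."""
    for line in lines:
        values = line.split(",")
        yield {name: cast(value) for (name, cast), value in zip(columns, values)}

def target_stage(rows, columns):
    """Passive target: format each row using the same link columns."""
    names = [name for name, _ in columns]
    return [",".join(str(row[name]) for name in names) for row in rows]

raw = ["1,widget", "2,gadget"]
written = target_stage(source_stage(raw, link_columns), link_columns)
```

Since nothing between the two stages changes the rows, what the target writes matches what the source read.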








The Transformer stage is the primary active stage. Other active stages perform
more specialized types of transformations.
In the Transformer stage you can specify:
•   Column mappings
•   Derivations
•   Constraints
A column mapping maps an input column to an output column. Values are
passed directly from the input column to the output column.
Derivations calculate the values to go into output columns based on values in zero
or more input columns.
Constraints specify the conditions under which incoming rows will be written to
output links.
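The three Transformer concepts can be shown together in a short Python sketch. It is only a model of the ideas, not the code a compiled job contains. The link names, column names, and the threshold of 100 are invented for the example.

```python
def transformer(rows):
    """Route each input row to zero or more output links.

    Returns a dict mapping hypothetical link names to lists of output rows.
    """
    outputs = {"high_value": [], "low_value": []}
    for row in rows:
        out = {
            # Column mapping: the value is passed straight through.
            "order_id": row["order_id"],
            # Derivation based on two input columns.
            "total": row["quantity"] * row["unit_price"],
            # Derivation based on zero input columns (a constant).
            "source_system": "ORDERS",
        }
        # Constraints: conditions that decide which link receives the row.
        if out["total"] >= 100:
            outputs["high_value"].append(out)
        if 0 < out["total"] < 100:
            outputs["low_value"].append(out)
    return outputs

orders = [
    {"order_id": 1, "quantity": 10, "unit_price": 15.0},
    {"order_id": 2, "quantity": 2, "unit_price": 5.0},
    {"order_id": 3, "quantity": 0, "unit_price": 9.0},
]
links = transformer(orders)
```

Order 3 has a total of zero and satisfies neither constraint, so it is written to no output link.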








Notice the following elements of the transformer:
The top, left pane displays the columns of the input links. If there are multiple
input links, multiple sets of columns are displayed.
The top, right pane displays the contents of the output links. We haven’t defined
any fields here yet. If there are multiple output links, multiple sets of columns are
displayed.
For now, ignore the Stage Variables window in the top, right pane. This will be
discussed in a later module.
The bottom area shows the column definitions (metadata) for the input and output
links.
If there are multiple input and/or output links, there will be multiple tabs.








Add one or more Annotation stages to the canvas to document your job.
An Annotation stage works like a text box with various formatting options. You
can optionally show or hide the Annotation stages by pressing a button on the
toolbar.
There are two Annotation stages. The Description Annotation stage is discussed
in a later module.








Type the text in the box. Then specify the various options including:
•   Text font and color
•   Text box color
•   Vertical and horizontal text justification








Before you can run your job, you must compile it. This generates executable code
that can be run by the DataStage Server engine. To compile a job, click
File>Compile or click the Compile button on the toolbar. The Compile Job
window displays the status of the compile.
If an error occurs:
•   Click Show Error to identify the stage where the error occurred.
•   Click More to retrieve more information about the error.








As you know, you run your jobs in Director. You can open Director from within
Designer by clicking Tools>Run Director.
In a similar way, you can move between Director, Manager, and Designer.
There are two methods for running a job:
•   Run it immediately.
•   Schedule it to run at a later time or date.
To run a job immediately:
•   Select the job in the Job Status view. The job must have been compiled.
•   Click Job>Run Now or click the Run Now button in the toolbar. The Job
    Run Options window is displayed.








This shows the Director Job Status view. To run a job, select it and then click
Job>Run Now.
Other views available:
   •   Job log – view messages from job run
   •   Schedule – view dates and times job is scheduled to run






The Job Run Options window is displayed when you click Job>Run Now.
This window allows you to stop the job after:
•   A certain number of rows.
•   A certain number of warning messages.
You can validate your job before you run it. Validation performs some checks
that are necessary in order for your job to run successfully. These include:
•   Verifying that connections to data sources can be made.
•   Verifying that files can be opened.
•   Verifying that SQL statements used to select data can be prepared.
Click Run to run the job after it is validated. The Status column displays the
status of the job run.
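
The two stop conditions can be pictured with a short Python sketch. This is only an illustration of the idea, not DataStage code: the function name `run_job`, the status strings, and the use of `ValueError` to stand in for a warning are all assumptions made for this example.

```python
def run_job(rows, process, row_limit=None, warning_limit=None):
    """Process rows until the input ends or one of the limits is reached."""
    processed = 0
    warnings = 0
    for row in rows:
        if row_limit is not None and processed >= row_limit:
            return ("stopped at row limit", processed, warnings)
        try:
            process(row)
        except ValueError:
            # In this sketch a ValueError stands in for a warning message.
            warnings += 1
            if warning_limit is not None and warnings >= warning_limit:
                return ("stopped at warning limit", processed, warnings)
        processed += 1
    return ("finished", processed, warnings)


def check_row(row):
    """Hypothetical row handler that warns on odd values."""
    if row % 2:
        raise ValueError(f"odd value {row}")


by_rows = run_job(range(10), check_row, row_limit=3)
by_warnings = run_job(range(10), check_row, warning_limit=2)
no_limits = run_job(range(10), check_row)
```

With no limits set, the sketch simply runs to the end of the input.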








Click the Log button in the toolbar to view the job log. The job log records
events that occur during the execution of a job.
These events include control events, such as the starting, finishing, and aborting
of a job; informational messages; warning messages; error messages; and
program-generated messages.




Module 5



Working with Meta Data




DataStage Manager is a graphical tool for managing the contents of your
DataStage project repository, which contains metadata and other DataStage
components such as jobs and routines.
Metadata is “data about data” that describes the formats of sources and targets.
This includes general format information such as whether the record columns are
delimited and, if so, the delimiting character. It also includes the specific column
definitions.
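
To make the two parts of metadata concrete, here is a minimal Python sketch (not DataStage code). The class and field names (`TableDefinition`, `ColumnDef`, `delimiter`, `quote_char`) and the sample data are hypothetical and exist only for this illustration.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ColumnDef:
    """One column definition: a name plus a simple type label."""
    name: str
    sql_type: str


@dataclass
class TableDefinition:
    """General format information plus the specific column definitions."""
    delimiter: str = ","
    quote_char: str = '"'
    columns: list = field(default_factory=list)

    def parse(self, text):
        """Read delimited text and return one dict per record."""
        reader = csv.reader(io.StringIO(text),
                            delimiter=self.delimiter,
                            quotechar=self.quote_char)
        names = [c.name for c in self.columns]
        return [dict(zip(names, record)) for record in reader]


# Sample metadata and data, made up for this illustration.
customers_def = TableDefinition(
    delimiter="|",
    columns=[ColumnDef("CustID", "Integer"), ColumnDef("Name", "VarChar")],
)
rows = customers_def.parse("1|Smith\n2|Jones\n")
```

The format information (the delimiter) tells the reader how to split each record, and the column definitions tell it what each field is called.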








The left pane contains the project tree. There are eight main branches, but you
can create subfolders under each. Select a folder in the project tree to display its
contents. In this example, a folder named DS304 has been created that contains
some of the jobs in the project.
Data Elements branch: Lists the built-in and custom data elements. (Data
elements are extensions of data types, and are discussed in a later module.)
Jobs branch: Lists the jobs in the current project.
Routines branch: Lists the built-in and custom routines.
Routines are blocks of DataStage BASIC code that can be called within a job.
(Routines are discussed in a later module.)
Shared Containers branch: Shared Containers encapsulate sets of DataStage
components into a single stage. (Shared Containers are discussed in a later
module.)
Stage Types branch: Lists the types of stages that are available within a job.
Built-in stages include the sequential and transformer stages you used in
Designer.
Table Definitions branch: Lists the table definitions available for loading into a
job.






Transforms branch: Lists the built-in and custom transforms. Transforms are
functions you can use within a job for data conversion. Transforms are discussed
in a later module.








DataStage Manager manages two different types of objects:
•   Metadata describing sources and targets:
    − Called table definitions in Manager. These are not to be confused with
      relational tables. DataStage table definitions are used to describe the
      format and column definitions of any type of source: sequential,
      relational, hashed file, etc.
    − Table definitions can be created in Manager or Designer and they can also
      be imported from the sources or targets they describe.
•   DataStage components
    − Every object in DataStage (jobs, routines, table definitions, etc.) is stored
      in the DataStage repository. Manager is the interface to this repository.
    − DataStage components, including whole projects, can be exported from
      and imported into Manager.








Any set of DataStage objects stored in the Manager repository, including whole
projects, can be exported to a file. This export file can then be imported back
into DataStage.
Import and export can be used for many purposes, including:
•   Backing up jobs and projects.
•   Maintaining different versions of a job or project.
•   Moving DataStage objects from one project to another. Just export the
    objects, move to the other project, then re-import them into the new project.
•   Sharing jobs and projects between developers. The export files, when zipped,
    are small and can be easily emailed from one developer to another.








Click Export>DataStage Components in Manager to begin the export process.
Any object in Manager can be exported to a file. Use this procedure to back up
your work or to move DataStage objects from one project to another.
Select the types of components to export. You can select either the whole project
or a portion of the objects in the project.
Specify the name and path of the file to export to. By default, objects are
exported to a text file in a special format with the extension dsx.
Alternatively, you can export the objects to an XML document.
The directory you export to is on the DataStage client, not the server.








True or False? You can export DataStage objects such as jobs, but you can't
export metadata, such as field definitions of a sequential file.
True: Incorrect. Metadata describing files and relational tables is stored as
"Table Definitions". Table definitions can be exported and imported like any
other DataStage object.
False: Correct! Metadata describing files and relational tables is stored as
"Table Definitions". Table definitions can be exported and imported like any
other DataStage object.








True or False? The directory you export to is on the DataStage client
machine, not on the DataStage server machine.
True: Correct! The directory you select for export must be addressable by your
client machine.
False: Incorrect. The directory you select for export must be addressable by your
client machine.








To import DataStage components, click Import>DataStage Components.
Select the file to import. Click Import all to begin the import process or Import
selected to view a list of the objects in the import file. You can import selected
objects from the list. Select the Overwrite without query button to overwrite
objects with the same name without warning.








Table definitions define the formats of a variety of data files and tables. These
definitions can then be used and reused in your jobs to specify the formats of data
stores.
For example, you can import the format and column definitions of the
Customers.txt file. You can then load this into the sequential source stage of a
job that extracts data from the Customers.txt file.
You can load this same metadata into other stages that access data with the same
format. In this sense the metadata is reusable. It can be used with any file or data
store with the same format.
If the column definitions are similar to what you need, you can modify the
definitions and save the table definition under a new name.
You can also use the same table definition for different types of data stores with
the same format. For example, you can import a table definition from a sequential
file and use it to specify the format for an ODBC table. In this sense the metadata
is “loosely coupled” with the data whose format it defines.
You can import and define several different kinds of table definitions, including
sequential files, ODBC data sources, UniVerse tables, and hashed files.
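
The idea of loosely coupled metadata can be sketched in Python, using the standard `sqlite3` module as a stand-in for an ODBC data source. This is an illustration under stated assumptions, not how DataStage stores table definitions: the `COLUMNS` list, the helper functions, and the sample data are all hypothetical.

```python
import csv
import io
import sqlite3

# Column definitions shared by both data stores (names and types are made up).
COLUMNS = [("CustID", "INTEGER"), ("Name", "TEXT")]


def read_sequential(text, columns):
    """Apply the column names to comma-delimited text."""
    names = [name for name, _ in columns]
    return [dict(zip(names, rec)) for rec in csv.reader(io.StringIO(text))]


def create_table(conn, table, columns):
    """Apply the same definitions to a relational table."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    conn.execute(f"CREATE TABLE {table} ({cols})")


records = read_sequential("1,Smith\n2,Jones\n", COLUMNS)

conn = sqlite3.connect(":memory:")  # in-memory database standing in for ODBC
create_table(conn, "Customers", COLUMNS)
conn.executemany("INSERT INTO Customers VALUES (:CustID, :Name)", records)
loaded = conn.execute(
    "SELECT CustID, Name FROM Customers ORDER BY CustID"
).fetchall()
```

The same list of column definitions describes both the delimited text and the relational table, which is the sense in which the metadata is independent of any one data store.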








To start the import, click Import>Table Definitions>Sequential File
Definitions. The Import Meta Data (Sequential) window is displayed.
Select the directory containing the sequential files. The Files box is then
populated with the files you can import.
Select the file to import.
Select or specify a category (folder) to import into.
•   The format is: <Category>\<Sub-category>
•   <Category> is the first-level sub-folder under Table Definitions.
•   <Sub-category> is (or becomes) a sub-folder under <Category>.








In Manager, select the category (folder) that contains the table definition.
Double-click the table definition to open the Table Definition window.
Click the Columns tab to view and modify any column definitions. Select the
Format tab to edit the file format specification.




Module 6



Working with Relational Data




You can perform the same tasks with relational data that you can with sequential
data. You can extract, filter, and transform data from relational tables.
You can also load data into relational tables.
Although you can work with many relational databases through native drivers
(including UniVerse, UniData, and Oracle), you can access many more relational
databases using ODBC.
In the ODBC stage, you can build a query against one or more tables in the
database interactively, type the query yourself, or paste in an existing query.








Before you can access data through ODBC you must define an ODBC data
source. In Windows NT, this can be done using the (32 bit) ODBC Data Source
Administrator in the Control Panel.
The ODBC Data Source Administrator has several tabs. For use with DataStage,
you should define your data sources on the System DSN tab (not User DSN).
You can install drivers for most of the common relational database systems from
the DataStage installation CD.
Click Add to define a new data source. When you click Add, a list of available
drivers is displayed. Select the appropriate driver and then click Finish.
Different relational databases have different requirements. As an example, we
will define a Microsoft Access data source.
•   Type the name of the data source in the Data Source Name box.
•   Click Select to define a connection to an existing database. Type the name
    and location of the database.
•   Click Create to define a connection to a new database.








Importing table definitions from ODBC databases is similar to importing
sequential file definitions. Click Import>Table Definitions>ODBC Table
Definitions in Manager to start the process.
The DSN list displays the data sources that are defined for the DataStage Server.
Select the data source you want to import from and, if necessary, provide a user
name and password.
The Import Metadata window is displayed. It lists all tables in the database that
are available for import. Select one or more tables and a category to import to,
and then click OK.








Extracting data from a relational table is similar to extracting data from a
sequential file except that you use an ODBC stage instead of a sequential stage.
In this example, we’ll extract data from a relational table and load it into a
sequential file.








Specify the ODBC data source name in the Data source name box on the
General tab of the ODBC stage.
You can click the Get SQLInfo button to retrieve the quote character and schema
delimiters from the ODBC database.








Specify the table name on the General tab of the Outputs tab.
Select Generated query to define the SQL SELECT statement interactively using
the Columns and Selection tabs. Select User-defined SQL query to write your
own SQL SELECT statement to send to the database.








Load the table definitions from Manager on the Columns tab. The procedure is
the same as for sequential files.
When you click Load, the Select Columns window is displayed. Select the
columns from which data is to be extracted.








Optionally, specify a WHERE clause and other additional SQL clauses on the
Selection tab.
Other clauses can be anything else you wish to add to the SELECT statement, such
as ORDER BY.
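
The following Python sketch shows how a table name, a column list, a WHERE clause, and other clauses could combine into one SELECT statement. It is an assumption about the general shape of a generated query, not the code DataStage uses: `build_select`, the table, and the data are hypothetical, and `sqlite3` stands in for the ODBC database.

```python
import sqlite3


def build_select(table, columns, where=None, other_clauses=None):
    """Assemble a SELECT statement from the separately entered pieces."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    if other_clauses:
        sql += f" {other_clauses}"
    return sql


sql = build_select("Customers", ["CustID", "Name"],
                   where="CustID > 1", other_clauses="ORDER BY Name")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustID INTEGER, Name TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [(1, "Smith"), (2, "Jones"), (3, "Adams")])
result = conn.execute(sql).fetchall()
```

Both optional pieces are simply appended in order, so leaving them empty yields a plain SELECT of the chosen columns.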








The View SQL tab enables you to view the SELECT statement that will be used
to select the data from the table.
The SQL displayed is read-only. Click View Data to test the SQL statement
against the database.








If you want to define your own SQL query, click User-defined SQL query on
the General tab and then write or paste the query into the SQL for primary
inputs box on the SQL Query tab.
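
One common reason to write your own query is to join several tables from the same data source. The sketch below runs such a query from Python with `sqlite3` standing in for the ODBC database. The tables, columns, and data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustID INTEGER, Amount REAL);
    INSERT INTO Customers VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO Orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# A hand-written query of the kind you might paste into the stage.
USER_DEFINED_SQL = """
    SELECT c.Name, o.OrderID, o.Amount
    FROM Customers c
    JOIN Orders o ON o.CustID = c.CustID
    ORDER BY o.OrderID
"""
joined = conn.execute(USER_DEFINED_SQL).fetchall()
```

Each result row carries columns from both tables, so the output link's column definitions would need to match the columns listed in the SELECT.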








Editing an ODBC target stage is similar to editing an ODBC source stage. It
includes the following tasks:
•   Specify the data source containing the target table.
•   Specify the name of the table.
•   Select the update action. You can choose from a variety of INSERT and/or
    UPDATE actions.
•   Optionally, create the table.
•   Load the column definitions from the Manager table definition.








Some of the options are different in the ODBC stage when it is used as a target.
Select the type of action to perform from the Update action list.
You can optionally have DataStage create the target table or you can load to an
existing table.
On the View SQL tab you can view the SQL statement used to insert the data into
the target table.








On the Edit DDL tab you can generate and modify the CREATE TABLE
statement used to create the target table.
If you make any changes to column definitions, you need to regenerate the
CREATE TABLE statement by clicking the Create DDL button.








Transaction Handling: Allows you to specify a transaction isolation level for
read data. The isolation level specifies how potential conflicts between
transactions (i.e., dirty reads, nonrepeatable reads, and phantom reads) are handled.

By default, all the rows are written to the target table before a COMMIT. In the
Rows per transaction box, you can specify a specific number of rows to write
before the COMMIT.
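
The effect of a rows-per-transaction setting can be sketched in Python with `sqlite3`. This is a minimal illustration of the commit pattern, not the stage's implementation: `load_rows`, the `Target` table, and the convention that 0 means a single commit at the end are assumptions for this example.

```python
import sqlite3


def load_rows(conn, rows, rows_per_transaction=0):
    """Insert rows, committing every N rows (0 means one commit at the end)."""
    commits = 0
    for count, row in enumerate(rows, start=1):
        conn.execute("INSERT INTO Target VALUES (?, ?)", row)
        if rows_per_transaction and count % rows_per_transaction == 0:
            conn.commit()
            commits += 1
    if conn.in_transaction:
        conn.commit()  # commit whatever is left over
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Target (ID INTEGER, Name TEXT)")
data = [(i, f"row{i}") for i in range(1, 11)]
commit_count = load_rows(conn, data, rows_per_transaction=4)
```

Smaller batches mean that less work is lost if the load fails partway through, at the cost of more commits.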








True or False? Using a single ODBC stage, you can only extract data from a
single table.
True: Incorrect. You can join data from multiple tables within a single data
source.
False: Correct! You can join data from multiple tables within a single data
source.








The ORAOCI8 plug-in lets you rapidly and efficiently prepare and load streams
of tabular data from any DataStage stage (for example, the ODBC stage, the
Sequential File stage, and so forth) to and from tables of the target Oracle
database. The Oracle client on Windows NT or UNIX uses SQL*Net to access an
Oracle server on Windows NT or UNIX.








The plug-in appears like any other stage in the Designer work area. It can extract
or write data contained in Oracle tables.
Features:
   •   Each ORAOCI8 plug-in stage is a passive stage that can have any number
       of input, output, and reference output links.
   •   Input links specify the data you are writing, which is a stream of rows to
       be loaded into an Oracle database. You can specify the data on an input
       link using an SQL statement constructed by DataStage or a user-defined
       SQL statement.
   •   Output links specify the data you are extracting, which is a stream of
       rows to be read from an Oracle database. You can also specify the data on
       an output link using an SQL statement constructed by DataStage or a
       user-defined SQL statement.
   •   Each reference output link represents a row that is key read from an
       Oracle database (that is, it reads the record using the key field in the
       WHERE clause of the SQL SELECT statement).
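
A key read on a reference link amounts to a single-row lookup by key. The Python sketch below illustrates the idea with `sqlite3` standing in for Oracle. The `Products` table, the `key_read` helper, and the sample stream are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProdID INTEGER PRIMARY KEY, Descr TEXT)")
conn.executemany("INSERT INTO Products VALUES (?, ?)",
                 [(1, "Widget"), (2, "Gadget")])


def key_read(conn, prod_id):
    """Fetch the single row matching the key, or None if there is no match."""
    return conn.execute(
        "SELECT ProdID, Descr FROM Products WHERE ProdID = ?", (prod_id,)
    ).fetchone()


# Each row in the main stream triggers one lookup on the reference table.
stream = [{"OrderID": 10, "ProdID": 2}, {"OrderID": 11, "ProdID": 9}]
enriched = []
for row in stream:
    match = key_read(conn, row["ProdID"])
    enriched.append({**row, "Descr": match[1] if match else None})
```

In this sketch a failed lookup leaves the looked-up column empty rather than dropping the row.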








General Tab
This tab is displayed by default. It contains the following fields:

Table name. This required field is editable when the update action is not
User-defined SQL (otherwise, it is read-only). It is the name of the target Oracle
table the data is written to. The table must already exist or be created by choosing
Generate DDL from the Create table action list. You must have insert, update, or
delete privileges, depending on the input mode. You must specify Table name if you
do not specify User-defined SQL. There is no default. Click … (Browse button) to
browse the Repository to select the table.

Update action. Specifies which SQL statements are used to update the target
table. Some update actions require key columns to update or delete rows. There is
no default. Choose the option you want from the list.

Clear table then insert rows. Deletes the contents of the table and adds the new
rows, with slower performance because of transaction logging.

Truncate table then insert rows. Truncates the table with no transaction logging
and faster performance.






Insert rows without clearing. Inserts the new rows in the table.

Delete existing rows only. Deletes existing rows in the target table that have
identical keys in the source files.

Replace existing rows completely. Deletes the existing rows, then adds the new
rows to the table.

Update existing rows only. Updates the existing data rows. Any rows in the data
that do not exist in the table are ignored.

Update existing rows or insert new rows. Updates the existing data rows before
adding new rows. It is faster to update first when you have a large number of
records.

Insert new rows or update existing rows. Inserts the new rows before updating
existing rows. It is faster to insert first if you have only a few records.

User-defined SQL. Writes the data using a user-defined SQL statement,
which overrides the default SQL statement generated by the stage. If you
choose this option, you enter the SQL statement on the SQL tab.

User-defined SQL file. Reads the contents of the specified file to write
the data.
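
The difference between the two combined update actions is the order in which the statements are attempted. The Python sketch below shows one way to express each order, with `sqlite3` standing in for Oracle. It is an assumption about the general pattern, not the plug-in's actual logic: the function names and the `Target` table are hypothetical.

```python
import sqlite3


def update_or_insert(conn, row):
    """Try UPDATE first; INSERT only if no existing row matched the key."""
    cur = conn.execute("UPDATE Target SET Name = ? WHERE ID = ?",
                       (row[1], row[0]))
    if cur.rowcount == 0:
        conn.execute("INSERT INTO Target (ID, Name) VALUES (?, ?)", row)


def insert_or_update(conn, row):
    """Try INSERT first; fall back to UPDATE if the key already exists."""
    try:
        conn.execute("INSERT INTO Target (ID, Name) VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        conn.execute("UPDATE Target SET Name = ? WHERE ID = ?",
                     (row[1], row[0]))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Target (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Target VALUES (1, 'old')")

update_or_insert(conn, (1, "updated"))   # key exists, so the row is updated
update_or_insert(conn, (2, "new"))       # key missing, so the row is inserted
insert_or_update(conn, (2, "changed"))   # key exists, so the insert falls back to update
insert_or_update(conn, (3, "added"))     # key missing, so the row is inserted
final = conn.execute("SELECT ID, Name FROM Target ORDER BY ID").fetchall()
```

Both orders leave the table in the same state. What differs is how many statements succeed on the first attempt, which depends on whether most incoming keys already exist in the target.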

Transaction Isolation. Provides the necessary concurrency control between
transactions in the job and other transactions. Use one of the following transaction
isolation levels:
    • Read committed. Takes exclusive locks on modified data and sharable
        locks on all other data. Each query executed by a transaction sees only
        data that was committed before the query (not the transaction) began.
        Oracle queries never read dirty (uncommitted) data. This is the default.
    • Serializable. Takes exclusive locks on modified data and sharable locks
        on all other data. Serializable transactions see only the changes that were
        committed at the time the transaction began.
Note: If Enable transaction grouping is selected on the Transaction Handling tab,
only the Transaction Isolation value for the first link is used for the entire group.

Array size. Specifies the number of rows to be transferred in one call between
DataStage and Oracle before they are written. Enter a positive integer to indicate
how many rows Oracle writes to the database at a time. The default value is 1, that
is, each row is written in a separate statement. Larger numbers use more memory on
the client to cache the rows, but they minimize server round trips and maximize
performance by executing fewer statements. If this number is too large, the client
may run out of memory.

Transaction size. This field exists for backward compatibility, but it is ignored
for version 3.0 and later of the plug-in. The transaction size for new jobs is now
handled by Rows per transaction on the Transaction Handling tab.

Create table action. Creates the target table in the specified database if
Generate DDL is selected. It uses the column definitions on the Columns tab, the
table name, and the TABLESPACE and STORAGE properties for the target table. The
generated CREATE TABLE statement includes the TABLESPACE and STORAGE keywords,
which indicate the location where the table is created and the storage expression
for the Oracle storage_clause. You must have CREATE TABLE privileges on your
schema. You can also specify your own CREATE TABLE SQL statement. You must enter
the storage clause in Oracle format. (Use the User-defined DDL tab on the SQL tab
for a complex statement.)




Module 7



Constraints and Derivations




A constraint specifies the condition under which data flows through a link. For
example, suppose you want to split the data in the jobs file into separate files
based on the job level.
We need to define a constraint on each link so that only jobs within a certain level
range are written to each file.








Click the Constraints button in the toolbar at the top of the Transformer Stage
window to open the Transformer Stage Constraints window.
The Transformer Stage Constraints window lists all the links out of the
transformer. Double-click the cell next to a link to create the constraint.
•   Rows that are not written out to previous links are written to a rejects link.
•   A row of data is sent down all the links whose constraints it satisfies.
•   If there is no constraint on a (non-rejects) link, all rows will be sent down the
    link.








This shows the Constraints window. Constraints are defined for each of the top
three links. The Reject Row box is selected for the last link. All rows that fail to
satisfy the constraints on the top three links will be sent down this link.
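
The routing rules described for constraints and reject links can be sketched in Python. This is an illustration of the rules as stated in this module, not the Transformer's implementation: `route_row`, the link names, and the column names `min_lvl` and `max_lvl` are hypothetical.

```python
def route_row(row, links):
    """Return the names of the output links that receive this row.

    `links` is an ordered list of (name, constraint, is_reject) tuples.
    A constraint of None means the link has no constraint.
    """
    targets = []
    for name, constraint, is_reject in links:
        if is_reject:
            # A reject link receives the row only if no earlier link took it.
            if not targets:
                targets.append(name)
        elif constraint is None or constraint(row):
            targets.append(name)
    return targets


# Hypothetical links that split jobs by level range, with a rejects link last.
LINKS = [
    ("LowLevel", lambda r: r["min_lvl"] < 100, False),
    ("MidLevel", lambda r: 100 <= r["min_lvl"] < 200, False),
    ("HighLevel",
     lambda r: r["min_lvl"] >= 200 and r["max_lvl"] is not None, False),
    ("Rejects", None, True),
]

low = route_row({"min_lvl": 50, "max_lvl": 90}, LINKS)
rejected = route_row({"min_lvl": 250, "max_lvl": None}, LINKS)
```

Because the reject check looks only at the links that come before it, the sketch also shows why the rejects link belongs last in the link ordering.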








True or False? A constraint specifies a condition under which incoming rows
of data will be written to an output link.
True: Correct! You can separately define a constraint for each output link. If no
constraint is written for a particular output link, then all rows will be written to
that link.
False: Incorrect. You can separately define a constraint for each output link. If
no constraint is written for a particular output link, then all rows will be written to
that link.








True or False? A Rejects link can be placed anywhere in the link ordering.
True: Incorrect. A Rejects link should be placed last in the link ordering, if it is
to get every row that doesn't satisfy any of the other constraints.
False: Correct! A Rejects link should be placed last in the link ordering, if it is
to get every row that doesn't satisfy any of the other constraints.








A derivation is an expression that specifies the value to be moved into a target
column (field).
Every target column must have a derivation. The simplest derivation is an input
column. The value in the input column is moved to the target column.
To construct a derivation for a target column, double-click the derivation cell
next to the target column.
Derivations are constructed in the same way that constraints are constructed:
•   Type constants.
•   Type or enter operators from the Operator shortcut menu.
•   Type or enter operands from the Operand shortcut menu.
What’s the difference between derivations and constraints?
•   Constraints apply to links; derivations apply to columns.
•   Constraints are conditions, either true or false; derivations specify a value to
    go into a target column.








In this example the concatenation of several fields is moved into the FullName
target field.
The colon (:) is the concatenation operator. You can insert this from the Operator
menu or type it in.








True or False? If the constraint for a particular link is not satisfied, then the
derivations defined for that link are not executed.
True: Correct! Constraints have precedence over derivations. Derivations in an
output link are only executed if the constraint is satisfied.
False: Incorrect. Constraints have precedence over derivations. Derivations in
an output link are only executed if the constraint is satisfied.








You can create stage variables for use in your column derivations and constraints.
Stage variables store values without writing them out to a target file or table.
They can be used in expressions just like constants, input columns, and other
operands.
Stage variables retain their values across reads. This allows them to be used as
counters and accumulators. You can also use them to compare a current input
value to a previous input value.
To create a new stage variable, click the right mouse button over the Stage
Variables window and then click Append New Stage Variable (or Insert New
Stage Variable).
After you create it, you specify a derivation for it in the same way as for columns.
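The way stage variables retain their values across reads can be modeled with a short Python sketch. This is not DataStage code: the function name flag_changes, the Customer column, and the two variables row_count and prev_customer are hypothetical names chosen for the example.

```python
def flag_changes(rows):
    """Yield (row_count, customer, is_new_customer) for each input row."""
    row_count = 0          # plays the role of a stage variable used as a counter
    prev_customer = None   # plays the role of a stage variable holding the previous value
    for row in rows:
        row_count += 1
        # Compare the current input value to the value kept from the previous row.
        is_new = row["Customer"] != prev_customer
        yield row_count, row["Customer"], is_new
        prev_customer = row["Customer"]   # remembered for the next row


rows = [{"Customer": "A"}, {"Customer": "A"}, {"Customer": "B"}]
for result in flag_changes(rows):
    print(result)
# (1, 'A', True)
# (2, 'A', False)
# (3, 'B', True)
```

Neither variable is written to the output on its own; each only feeds the values that are written.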








Expressions in a Transformer stage are executed in the following order:
•   Derivations in stage variables are executed before constraints. This allows
    them to be used in constraints.
•   Next, constraints are executed.
•   Then column derivations are executed.
•   Derivations in higher columns are executed before those in lower columns.
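The execution order can be sketched as a small Python model of one pass over a single row. It is an illustration under stated assumptions, not the Transformer implementation: process_row, the link names, and the column names are hypothetical, and the persistence of stage variables across rows is left out to keep the sketch short.

```python
def process_row(row, stage_vars, links):
    """Apply one Transformer-style pass to a single input row.

    stage_vars: ordered list of (name, derivation(row, sv))
    links:      ordered list of (name, constraint(row, sv) or None,
                                 ordered list of (column, derivation(row, sv)))
    """
    sv = {}
    # 1. Stage variable derivations run first, so constraints can use them.
    for name, derive in stage_vars:
        sv[name] = derive(row, sv)

    output = {}
    for link_name, constraint, columns in links:
        # 2. The link's constraint is checked next; no constraint means all rows.
        if constraint is not None and not constraint(row, sv):
            continue   # constraint not satisfied: derivations are skipped
        # 3. Column derivations run last, from the top column down.
        output[link_name] = {col: derive(row, sv) for col, derive in columns}
    return output


stage_vars = [("FullName", lambda row, sv: row["fname"] + " " + row["lname"])]
links = [
    ("LongNames", lambda row, sv: len(sv["FullName"]) > 8,
     [("Name", lambda row, sv: sv["FullName"].upper())]),
    ("AllRows", None,
     [("Name", lambda row, sv: sv["FullName"])]),
]

print(process_row({"fname": "Ann", "lname": "Lee"}, stage_vars, links))
# {'AllRows': {'Name': 'Ann Lee'}}
```

The LongNames constraint reads the FullName stage variable, which is only possible because the stage variable was derived first.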








Note the output link reordering icon available on the toolbar from within the
Transformer stage.








To get to the link ordering screen, open the Transformer stage, then click the
output link execution order icon. The link ordering screen appears. Select a link
and use the arrow buttons to reposition it in the execution order.








True or False? Derivations for stage variables are executed before derivations
for any output link columns.
True: Correct! So you can be sure that the derivations for any of the stage
variables referenced in column derivations will have already been executed.
False: Incorrect. The derivations for stage variables are executed first. So you
can be sure that the derivations for any of the stage variables referenced in column
derivations will have already been executed.




Module 8



Creating Basic Expressions




DataStage BASIC is a form of BASIC that has been customized to work with
DataStage.
In the previous module you learned how to define constraints and derivations.
Derivations and constraints are written using DataStage BASIC.
Job control routines, which are discussed in a later module, are also written in
DataStage BASIC.
This module will not attempt to teach you BASIC programming. Our focus is on
what you need to know in order to construct complex DataStage constraints and
derivations.








For more information about BASIC operators than is provided here, search for
“BASIC Operators” in Help. You can insert these operators from the Operators
menu (except for the IF operator, which is on the Operands menu).
•   Arithmetic operators: -, +, *, /
•   Relational operators: =, <, >, <=, >=
•   Logical operators: AND, OR, NOT
•   IF operator:
    − IF min_lvl < 0 THEN “Out of Range” ELSE “In Range”
•   Concatenation operator (:)
    − “The employee’s name is ” : lname : “, ” : fname
•   Substring operator ([start, length]). First character is 1 (not 0).
    − “APPL3245”[1, 4] → “APPL”
    − “APPL3245”[5, 2] → “32”
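For readers more familiar with zero-based languages, the substring, concatenation, and IF operators can be mirrored in Python. The helper names substring, concat, and range_label are hypothetical and exist only for this sketch; the sample values come from the examples above.

```python
def substring(text, start, length):
    """Model of text[start, length]: start is 1-based, not 0-based."""
    return text[start - 1:start - 1 + length]


def concat(*parts):
    """Model of the ':' operator: join the operands as strings."""
    return "".join(str(p) for p in parts)


def range_label(min_lvl):
    """Model of: IF min_lvl < 0 THEN "Out of Range" ELSE "In Range"."""
    return "Out of Range" if min_lvl < 0 else "In Range"


print(substring("APPL3245", 1, 4))   # APPL
print(substring("APPL3245", 5, 2))   # 32
print(concat("The employee's name is ", "Smith", ", ", "Ann"))
print(range_label(-1))               # Out of Range
```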








For more information about BASIC functions than is provided here, look up
Alphabetical List of BASIC Functions and Statements in Help. BASIC functions
include the standard Pick BASIC functions. Click Function from the Operands
menu to insert a function.
Here are a few of the more common functions:
•   TRIM(string), TRIM(string, character), TRIMF, TRIMB
    − TRIM(“ xyz       ”) → “xyz”
•   LEN(string)
•   UPCASE(string), DOWNCASE(string)
•   ICONV, OCONV
    − ICONV is used to convert values to an internal format
    − OCONV is used to convert values from an internal format
    − Very powerful functions. Often used for date and time conversions and
      manipulations.
    − These functions are discussed later in the module.








For more information about BASIC system variables than is provided here, look
up System Variables in Help. Click System Variable from the Operands menu
to insert a system variable.
•   @DATE, @TIME            Date/time the job started
    − @YEAR, @MONTH, @DAY   Extracted from @DATE
•   @INROWNUM               Row counter for the incoming link
•   @OUTROWNUM              Row counter for the outgoing link
•   @LOGNAME                User logon name
•   @NULL                   NULL value
•   @TRUE, @FALSE           Logical true and false values
•   @WHO                    Name of the current project








True or False? TRIM is a system variable.
True: Incorrect. TRIM is a DataStage function that removes surrounding spaces
in a character string.
False: Correct! TRIM is a DataStage function that removes surrounding spaces
in a character string.








True or False? @INROWNUM is a DataStage function.
True: Incorrect. System variables all begin with the @-sign. @INROWNUM is
a system variable that contains the number of the last row read from the input
link.
False: Correct! System variables all begin with the @-sign. @INROWNUM is
a system variable that contains the number of the last row read from the input
link.








DataStage is supplied with a number of functions you can use to obtain
information about your jobs and projects. You can insert these functions into
derivations.
DS functions and macros are discussed in a later module.








DS (DataStage) routines are defined in DataStage Manager. There are several
types of DS routines. The type you can insert into your derivations and
constraints is the Transform Function type. A DS Transform Function routine
consists of a predefined block of BASIC statements that takes one or more
arguments and returns a single value.
You can define your own routines, but there are also a number of pre-built
routines that are supplied with DataStage.
The pre-built routines include a number of routines for manipulating dates, such
as ConvertMonth, QuarterTag, and Timestamp.








Data elements are extended data types. For example, a phone number is a kind of
string. You could define a data element called PHONE.NUMBER to precisely
define this type.
Data elements are defined in DataStage Manager. A number of built-in types are
supplied with DataStage. For example, MONTH.TAG represents a string of the
form “YYYY-MM”.








DS Transforms are similar to DS Transform Function routines. They take one or
more arguments and return a single value. There are two primary differences:
•   The argument(s) and return value have specific data elements associated with
    them. In this sense, they transform data from one data element type to
    another data element type.
•   Unlike DS routines, they do not consist of blocks of BASIC statements.
    Rather, they consist of a single (though possibly very complex) BASIC
    expression.
You can define your own DS Transforms, but there are also a number of pre-built
transforms that are supplied with DataStage.
The pre-built transforms include a number of transforms for manipulating strings
and dates.








Date manipulation in DataStage can be done in several ways:
•   Using the Iconv and Oconv functions with the “D” conversion code.
•   Using the built-in date Transforms.
•   Using the built-in date routines.
•   Using routines in the DataStage Software Development Kit (SDK).
The SDK routines are covered in another DataStage course. Your instructor can
provide further details. The SDK routines are installed in the Routines\sdk
folder in Manager.








For detailed help on Iconv and Oconv, see their entries in the Alphabetical List
of BASIC Functions and Statements in Help.
Use Iconv to convert a string date in a variety of formats to the internal DataStage
integer format. Use Oconv to convert an internal date to a string date in a variety
of formats. Use these two functions together to convert a string date from one
format to another.
The internal format for a date is based on a reference date of December 31, 1967,
which is day 0. Dates before are negative integers; dates after are positive
integers.
Use the “D” conversion code to specify the format of the date to be converted to
an internal date by Iconv or the format of the date to be output by Oconv.
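The internal date format is easy to reproduce outside DataStage, which can help when checking expected values. The following Python sketch assumes nothing beyond the reference date given above; the names DAY_ZERO, to_internal, and from_internal are hypothetical.

```python
from datetime import date, timedelta

DAY_ZERO = date(1967, 12, 31)   # internal date 0, as stated above


def to_internal(d):
    """Return the day number of a date relative to day 0."""
    return (d - DAY_ZERO).days


def from_internal(day_number):
    """Return the calendar date for an internal day number."""
    return DAY_ZERO + timedelta(days=day_number)


print(to_internal(date(1967, 12, 31)))   # 0
print(to_internal(date(1967, 12, 30)))   # -1
print(to_internal(date(1968, 1, 10)))    # 10
print(from_internal(9177))               # 1993-02-14
```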








For detailed help (more than you probably want), see D Code under Iconv or
Oconv in Help.
“D4-MDY[2,2,4]”
•   D          Date conversion code
•   4          Number of digits in year
•   -          Separator
•   MDY        Ordering is month, day, year
•   [2,2,4]    Number of digits for M,D,Y, respectively
Note:
•   The number in brackets for “Y” (namely 4) overrides the number following
    “D”.
•   Iconv ignores some of the characters.
    − Any separator will do.
    − Number of characters is ignored if there are separators.








•   Iconv(“12-31-67”, “D2-MDY[2,2,2]”)      → 0
•   Iconv(“12311967”, “D MDY[2,2,4]”)       → 0
•   Iconv(“31-12-1967”, “D-DMY[2,2,4]”) → 0
•   Oconv(0, “D2-MDY[2,2,4]”)               → “12-31-1967”
•   Oconv(0, “D2/DMY[2,2,2]”)               → “31/12/67”
•   Oconv(10, “D/YDM[4,2,A10]”)             → “1968/10/JANUARY”
    − This example illustrates the use of an additional formatting option. The
      “A10” option says to express the month name alphabetically, with a
      length of 10 characters.
•   Oconv( Iconv(“12-31-67”, “D2-MDY[2,2,2]”), “D/YDM[4,2,A10]”)
       → “1967/31/DECEMBER”
    − This example shows how to convert from one string representation to
      another.
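The two-step conversion in the last example (string to internal date, then internal date to a different string) can be modeled in Python. This is a rough sketch, not a reimplementation of Iconv or Oconv: the functions iconv_mdy and oconv_ydm are hypothetical, they handle only the two formats shown, and the rule that a two-digit year belongs to the 1900s is an assumption made for the example.

```python
from datetime import date, timedelta

DAY_ZERO = date(1967, 12, 31)
MONTHS = ("JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
          "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER")


def iconv_mdy(text, century=1900):
    """Parse month, day, year separated by any non-digit characters.

    Input without separators (for example "12311967") is not handled.
    The century used for two-digit years is an assumption.
    """
    digits = "".join(ch if ch.isdigit() else " " for ch in text).split()
    month, day, year = (int(part) for part in digits)
    if year < 100:
        year += century
    return (date(year, month, day) - DAY_ZERO).days


def oconv_ydm(day_number):
    """Format an internal day number as YYYY/DD/MONTHNAME."""
    d = DAY_ZERO + timedelta(days=day_number)
    return f"{d.year:04d}/{d.day:02d}/{MONTHS[d.month - 1]}"


print(iconv_mdy("12-31-67"))               # 0
print(oconv_ydm(10))                       # 1968/10/JANUARY
print(oconv_ydm(iconv_mdy("12-31-67")))    # 1967/31/DECEMBER
```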








DataStage provides a number of built-in transforms you can use for date
conversions.
The following data elements are used with the built-in transforms:


 Data element                String format                    Example
 DATE.TAG                    YYYY-MM-DD                       1999-02-24
 WEEK.TAG                    YYYYWnn                          1999W06
 MONTH.TAG                   YYYY-MM                          1999-02
 QUARTER.TAG                 YYYYQn                           1999Q4
 YEAR.TAG                    YYYY                             1999








True or False? You can use Oconv to convert a string date from one format
to another.
True: Incorrect. Oconv by itself can't do this. You would first use Iconv to
convert the input string into a day integer. Then you can use Oconv to convert the
day integer into the output string.
False: Correct! Oconv by itself can't do this. You would first use Iconv to
convert the input string into a day integer. Then you can use Oconv to convert the
day integer into the output string.








The transforms can be grouped into the following categories:
•   String to day number
    − Formatted string → internal date integer
•   Day number to date string
    − Internal date integer → formatted string
•   Date string to date string
    − DATE.TAG string → formatted string








The following transforms convert strings of the specified format (MONTH.TAG,
QUARTER.TAG, …) to an internal date representing the first or last day of the
period.
 Function              Tag                  Description
 MONTH.FIRST,          MONTH.TAG            Returns a numeric internal date
 MONTH.LAST                                 corresponding to the first/last day
                                            of a month
 QUARTER.FIRST,        QUARTER.TAG          Returns a numeric internal date
 QUARTER.LAST                               corresponding to the first/last day
                                            of a quarter
 WEEK.FIRST,           WEEK.TAG             Returns a numeric internal date
 WEEK.LAST                                  corresponding to the first day
                                            (Monday) / last day (Sunday) of a
                                            week
 YEAR.FIRST,           YEAR.TAG             Returns a numeric internal date
 YEAR.LAST                                  corresponding to the first/last day
                                            of a year





Examples:
MONTH.FIRST(“1993-02”) → 9164
MONTH.LAST(“1993-02”) → 9191
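The two example results can be checked with a short Python model of the month transforms. The functions month_first and month_last are hypothetical stand-ins written for this sketch; they are not the built-in transforms themselves.

```python
import calendar
from datetime import date

DAY_ZERO = date(1967, 12, 31)   # internal date 0


def month_first(month_tag):
    """'YYYY-MM' -> internal day number of the first day of that month."""
    year, month = (int(part) for part in month_tag.split("-"))
    return (date(year, month, 1) - DAY_ZERO).days


def month_last(month_tag):
    """'YYYY-MM' -> internal day number of the last day of that month."""
    year, month = (int(part) for part in month_tag.split("-"))
    last_day = calendar.monthrange(year, month)[1]   # number of days in the month
    return (date(year, month, last_day) - DAY_ZERO).days


print(month_first("1993-02"))   # 9164
print(month_last("1993-02"))    # 9191
```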








The following functions convert internal dates to strings in various formats
(DATE.TAG, MONTH.TAG, …).
 Function                Argument type        Description
 DATE.TAG                Internal date        Converts internal date to string in
                                              DATE.TAG format
 MONTH.TAG               Internal date        Converts internal date to string in
                                              MONTH.TAG format
 QUARTER.TAG             Internal date        Converts internal date to string in
                                              QUARTER.TAG format
 WEEK.TAG                Internal date        Converts internal date to string in
                                              WEEK.TAG format


Examples:
MONTH.TAG(9177) → “1993-02”
DATE.TAG(9177)        → “1993-02-14”
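Going the other way, from an internal date to a tag string, can be modeled as follows. The Python function names are hypothetical and only mirror the formats listed in the data element table; the quarter calculation assumes calendar quarters.

```python
from datetime import date, timedelta

DAY_ZERO = date(1967, 12, 31)   # internal date 0


def date_tag(day_number):
    """Internal day number -> 'YYYY-MM-DD'."""
    return (DAY_ZERO + timedelta(days=day_number)).isoformat()


def month_tag(day_number):
    """Internal day number -> 'YYYY-MM'."""
    d = DAY_ZERO + timedelta(days=day_number)
    return f"{d.year:04d}-{d.month:02d}"


def quarter_tag(day_number):
    """Internal day number -> 'YYYYQn' (calendar quarters assumed)."""
    d = DAY_ZERO + timedelta(days=day_number)
    return f"{d.year:04d}Q{(d.month - 1) // 3 + 1}"


print(month_tag(9177))     # 1993-02
print(date_tag(9177))      # 1993-02-14
print(quarter_tag(9177))   # 1993Q1
```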







The following functions convert strings in DATE.TAG format to strings in
various other formats (DAY.TAG, MONTH.TAG, …).
 Function                 Tag               Description
 TAG.TO.MONTH             DATE.TAG          Convert DATE.TAG to MONTH.TAG
 TAG.TO.QUARTER           DATE.TAG          Convert DATE.TAG to QUARTER.TAG
 TAG.TO.WEEK              DATE.TAG          Convert DATE.TAG to WEEK.TAG
 TAG.TO.DAY               DATE.TAG          Convert DATE.TAG to DAY.TAG


Examples:
TAG.TO.MONTH(“1993-02-14”)          → “1993-02”
TAG.TO.QUARTER(“1993-02-14”) → “1993Q1”
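Because both the input and the output are strings, the two examples can be modeled without touching the internal date format at all. The Python functions below are hypothetical illustrations of the string-to-string category, assuming calendar quarters; they say nothing about how the built-in transforms are actually written.

```python
def tag_to_month(date_tag):
    """'YYYY-MM-DD' -> 'YYYY-MM'."""
    year, month, _day = date_tag.split("-")
    return f"{year}-{month}"


def tag_to_quarter(date_tag):
    """'YYYY-MM-DD' -> 'YYYYQn', where n is 1 to 4."""
    year, month, _day = date_tag.split("-")
    return f"{year}Q{(int(month) - 1) // 3 + 1}"


print(tag_to_month("1993-02-14"))     # 1993-02
print(tag_to_quarter("1993-02-14"))   # 1993Q1
```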




Module 9



Troubleshooting




Events are logged to the job log file when a job is validated, run, or reset. You
can use the log file to troubleshoot jobs that fail during validation or a run.
Various entries are written to the log, including when:
•   The job starts
•   The job finishes
•   An active stage starts
•   An active stage finishes
•   Rows are rejected (yellow icons)
•   Errors occur (red icons)
•   DataStage informational reports are logged
•   User-invoked messages are displayed








The event window shows the events that are logged for a job during its run.
The job log contains the following information:
 Column Name          Description
 Occurred             Time the event occurred
 On date              Date the event occurred
 Type                 Info: Informational. No action required.
                      Warning: An error occurred. Investigate the cause of the
                      warning, as this may indicate a serious error.
                      Fatal: A fatal error occurred.
                      Control: The job starts and finishes.
                      Reject: Rejected rows are output.
                      Reset: A job or the log is reset.
 Event                A message describing the event. The system displays the
                      first line of the message. If a message has an ellipsis (…)
                      at the end, it contains more than one line. You can view
                      the full message in the Event Detail window.





Clearing the log
To clear the log, click Job>Clear Log.








Double-click on an event to open the Event Detail window. This window gives
you more information.
When an active stage finishes, DataStage logs an informational message that
describes how many rows were read in to the stage and how many were written.
This provides you with valuable information that can indicate possible errors.








The Monitor can be used to display information about a job while it is running.
To start the Monitor, click Tools>New Monitor. Once in Monitor, click the right
mouse button and then select Show links to display information about each of the
input and output links.








When you are testing a job, you can save time by limiting the number of rows and
warnings.








Server-side tracing is enabled in Administrator. It is designed to help
customer support analysts troubleshoot serious problems. When enabled, it logs a
record to a trace file whenever DataStage clients interact with the server.
Caution: Because of the overhead caused by server-side tracing, it should only be
used when working with customer support.








DataStage provides a debugger for testing and debugging your job designs. The
debugger runs within Designer. With the DataStage debugger you can:
•   Set breakpoints on job links, including conditional breakpoints.
•   Step through your job link-by-link or row-by-row.
•   Watch the values going into link columns.








To begin debugging a program, click View>Debug Bar to display the debug
toolbar. The toolbar provides access to all of the debugging functions.

[Figure: the debug toolbar. Buttons shown: Go, Next link, Next row, Stop, Debug
job parameters, Edit breakpoints, Toggle breakpoint, Clear breakpoints, View job
log, Debug window.]



 Button                   Description
 Go                       Start/continue debugging.
 Next Link                The job continues until the next action occurs on
                          the link.
 Next Row                 The job continues until the next row is processed
                          or until another link with a breakpoint is
                          encountered.
 Stop Job                 Stops the job at the point it is at. Click Go to
                          continue.
 Job Parameters           Set limits on rows and warnings.
 Edit Breakpoints         Displays the Edit Breakpoints window, in which
                          you can edit existing breakpoints.
 Toggle Breakpoint        Set or clear a breakpoint on a selected link.
 Clear All Breakpoints    Removes breakpoints from all links.
 View job log             Open Director and view the job log.
 Debug Window             Show/hide the Debug Window, which displays
                          link column values.








To set a breakpoint on a link, select the link and then click the Toggle
Breakpoint button. A black circle appears on the link.








Click the Edit Breakpoints button to open the Edit Breakpoints window.
Existing breakpoints are listed in the lower pane.
To set a condition for a breakpoint, select the breakpoint and then specify the
condition in the upper pane. You can either specify the number of rows to pass
before breaking or specify an expression; execution breaks when the expression
is true.








Click the Debug Window button to open the Debug Window.
•   The top pane lists all the columns defined for all links.
•   The Local Data column lists the data currently in the column.
•   The Current Break box at the top of the window lists the link where
    execution stopped.
•   To add a column to the lower pane (where it is isolated), select the column
    and then click Add Watch.
•   If a breakpoint is set, execution stops at that link when a row is written to the
    link.








You can step through row-by-row or link-by-link.
•   Next Row extracts a row of data and stops at the next link with a breakpoint
    that the row is written to.
    − For example, if a breakpoint is set on the MexicoCustomersOut link,
      execution stops at the MexicoCustomersOut link when a Mexican
      customer is read.
    − If a breakpoint is not set on the MexicoCustomersOut link, execution
      will not stop at the MexicoCustomersOut link when a Mexican customer
      is read.
    − Execution will stop at the CustomersIn link (even if there is no
      breakpoint set on it) because all rows are read through that link.
•   Next Link stops at the next link that data is written to.




Module 10



Defining Lookups




A hashed file is a file that distributes records into one or more evenly sized
groups based on a primary key. The primary key value is processed by a "hashing
algorithm" to determine the location of the record.
The number of groups in the file is referred to as its modulus.
In this example, there are 5 groups (modulus 5).
Hashed files are used for reference lookups in DataStage because of their fast
performance. The hashing algorithm determines the group the record is in. The
groups contain a small number of records, so the record can be quickly located
within the group.
If write caching is enabled, DataStage does not write hashed file records directly
to disk. Instead it caches the records in memory, and writes the cached records to
disk when the cache is full. This improves performance. You can specify the
size of the cache on the Tunables tab in Administrator.
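The idea behind hashed lookups and write caching can be illustrated with a toy Python class. Everything in it is an assumption made for the sketch: the class name HashedFile, the use of crc32 as the hashing algorithm, the in-memory lists standing in for disk groups, and a cache size counted in records. It shows the principle only and does not describe how DataStage stores hashed files.

```python
import zlib


class HashedFile:
    """Toy hashed file: records are spread over `modulus` groups by key."""

    def __init__(self, modulus=5, cache_size=3):
        self.groups = [dict() for _ in range(modulus)]   # stands in for disk
        self.cache = {}                                   # pending writes
        self.cache_size = cache_size
        self.flushes = 0

    def _group(self, key):
        # crc32 stands in for the real hashing algorithm (an assumption).
        return zlib.crc32(str(key).encode("utf-8")) % len(self.groups)

    def write(self, key, record):
        # With write caching, records go to memory first.
        self.cache[key] = record
        if len(self.cache) >= self.cache_size:
            self.flush()

    def flush(self):
        # The cache is written out in one go when it is full.
        for key, record in self.cache.items():
            self.groups[self._group(key)][key] = record
        self.cache.clear()
        self.flushes += 1

    def lookup(self, key):
        if key in self.cache:
            return self.cache[key]
        # Only one group has to be searched, whatever the size of the file.
        return self.groups[self._group(key)].get(key)


hf = HashedFile(modulus=5, cache_size=3)
for cust_id in range(1, 8):
    hf.write(cust_id, {"CustID": cust_id})

print(hf.flushes)      # 2
print(hf.lookup(7))    # {'CustID': 7}
print(hf.lookup(2))    # {'CustID': 2}
print(hf.lookup(99))   # None
```

Seven writes cause only two flushes here, and each lookup inspects a single group rather than the whole file.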




                                                                              10 - 3
58750024 datastage-student-guide
58750024 datastage-student-guide
58750024 datastage-student-guide
58750024 datastage-student-guide


  • 3. Copyright
    This document and the software described herein are the property of Ascential Software Corporation and its licensors and contain confidential trade secrets. All rights to this publication are reserved. No part of this document may be reproduced, transmitted, transcribed, stored in a retrieval system or translated into any language, in any form or by any means, without prior permission from Ascential Software Corporation.
    Copyright © 2002 Ascential Software Corporation. All rights reserved.
    Ascential Software Corporation reserves the right to make changes to this document and the software described herein at any time and without notice. No warranty is expressed or implied other than any contained in the terms and conditions of sale.
    Ascential Software Corporation
    50 Washington Street
    Westboro, MA 01581-1021 USA
    Phone: (508) 366-3888
    Fax: (508) 389-8749
    Ardent, Axielle, DataStage, Iterations, MetaBroker, MetaStage, and uniVerse are registered trademarks of Ascential Software Corporation. Pick is a registered trademark of Pick Systems. Ascential Software is not a licensee of Pick Systems. Other trademarks and registered trademarks are the property of the respective trademark holder.
    09-01-2002
  • 5. Table of Contents
      Module 1: Introduction to DataStage
      Module 2: Installing DataStage
      Module 3: Configuring Projects
      Module 4: Designing and Running Jobs
      Module 5: Working with Metadata
      Module 6: Working with Relational Data
      Module 7: Constraints and Derivations
      Module 8: Creating BASIC Expressions
      Module 9: Troubleshooting
      Module 10: Defining Lookups
      Module 11: Aggregating Data
      Module 12: Job Control
      Module 13: Working with Plug-Ins
      Module 14: Scheduling and Reporting
      Module 15: Optimizing Job Performance
      Module 16: Putting It All Together
  • 8. Module 1 – Introduction to DataStage
    Ascential software provides the enterprise with a full featured data integration platform that can take data from any source and load it into any target.
    Sources can range from customer relationship systems to legacy systems to data warehouses -- in fact, any system that houses data. Target systems, likewise, can consist of data in warehouses, real-time systems, Web services -- any application that houses data.
    Depending on your needs, source data can undergo scrutiny and transformation through several stages:
      1. Data profiling -- a discovery process where relevant information for target enterprise applications is gathered
      2. Data quality -- a preparation process where data can be cleansed and corrected
      3. Extract, Transform, Load -- a transformation process where data is enriched and loaded into the target
    Underlying these processes is an application framework that allows you to:
      1. Utilize parallel processing for maximum performance
      2. Manage and share metadata amongst all the stages
    Overlaying all of this is a command and control structure that allows you to tailor your environment to your specific needs.
  • 10. A data warehouse is a central database that integrates data from many operational sources within an organization. The data is transformed, summarized, and organized to support business analysis and report generation.
      • Repository of data
      • Optimized for analysis
      • Supports business:
          − Projections
          − Comparisons
          − Assessments
      • Extracted from operational sources
          − Integrated
          − Summarized
          − Filtered
          − Cleansed
          − Denormalized
          − Historical
  • 11. Data marts are like data warehouses but smaller in scope. Frequently an organization will have both an enterprise-wide data warehouse and data marts that extract data from it for specialized purposes.
      • Like data warehouses but smaller in scope
      • Organize data from a single subject area or department
      • Solve a small set of business requirements
      • Are cheaper and faster to build than a data warehouse
      • Distribute data away from the data warehouse
  • 12. DataStage is a comprehensive tool for the fast, easy creation and maintenance of data marts and data warehouses. It provides the tools you need to build, manage, and expand them. With DataStage, you can build solutions faster and give users access to the data and reports they need.
    With DataStage you can:
      • Design the jobs that extract, integrate, aggregate, load, and transform the data for your data warehouse or data mart.
      • Create and reuse metadata and job components.
      • Run, monitor, and schedule these jobs.
      • Administer your development and execution environments.
  • 13. DataStage is client/server software. The server stores all DataStage objects and metadata in a repository, which consists of the UniVerse RDBMS. The clients interface with the server.
    The clients run on Windows 95 or later (Windows 98, NT, 2000). The server runs on Windows NT 4.0 and Windows 2000; most versions of UNIX are also supported. See the installation release notes for details.
    The DataStage client components are:
      Component      Description
      Administrator  Administers DataStage projects and conducts housekeeping on the server
      Designer       Creates DataStage jobs that are compiled into executable programs
      Director       Used to run and monitor the DataStage jobs
      Manager        Allows you to view and edit the contents of the repository
  • 14. True or False? The DataStage Server and clients must be running on the same machine.
      True: Incorrect. Typically, there are many client machines each accessing the same DataStage Server running on a separate machine. The Server can be running on Windows NT or UNIX. The clients can be running on a variety of Windows platforms.
      False: Correct! Typically, there are many client machines each accessing the same DataStage Server running on a separate machine. The Server can be running on Windows NT or UNIX. The clients can be running on a variety of Windows platforms.
  • 15. Use the Administrator to specify general server defaults, add and delete projects, and to set project properties. The Administrator also provides a command interface to the UniVerse repository.
    Use the Administrator Project Properties window to:
      • Set job monitoring limits and other Director defaults on the General tab.
      • Set user group privileges on the Permissions tab.
      • Enable or disable server-side tracing on the Tracing tab.
      • Specify a user name and password for scheduling jobs on the Schedule tab.
      • Specify hashed file stage read and write cache sizes on the Tunables tab.
    General server defaults can be set on the Administrator DataStage Administration window (not shown):
      • Change license information.
      • Set server connection timeout.
    The DataStage Administrator is discussed in detail in a later module.
  • 16. Use the Manager to store and manage reusable metadata for the jobs you define in the Designer. This metadata includes table and file layouts and routines for transforming extracted data.
    Manager is also the primary interface to the DataStage repository. In addition to table and file layouts, it displays the routines, transforms, and jobs that are defined in the project. Custom routines and transforms can also be created in Manager.
  • 17. The DataStage Designer allows you to use familiar graphical point-and-click techniques to develop processes for extracting, cleansing, transforming, integrating and loading data into warehouse tables. The Designer provides a “visual data flow” method to easily interconnect and configure reusable components.
    Use Designer to:
      • Specify how the data is extracted.
      • Specify data transformations.
      • Decode (denormalize) data going into the data mart using reference lookups.
          − For example, if the sales order records contain customer IDs, you can look up the name of the customer in the CustomerMaster table.
          − This avoids the need for a join when users query the data mart, thereby speeding up the access.
      • Aggregate data.
      • Split data into multiple outputs on the basis of defined constraints.
    You can easily move between the Director, Designer, and Manager by selecting commands in the Tools menu.
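
The reference-lookup and constraint ideas on this slide can be sketched in plain Python. This is an illustration only, not DataStage code: the `customer_master` dictionary stands in for the CustomerMaster table, and the field names, the sample rows, and the 1000.0 threshold are all assumptions made for the example.

```python
# Illustrative sketch only: plain Python, not DataStage code.
# customer_master stands in for the CustomerMaster reference table (assumed data).
customer_master = {
    "C001": "Acme Corp",
    "C002": "Globex",
}

# Hypothetical sales order rows that carry only the customer ID.
orders = [
    {"order_id": 1, "customer_id": "C001", "amount": 250.0},
    {"order_id": 2, "customer_id": "C002", "amount": 1800.0},
    {"order_id": 3, "customer_id": "C999", "amount": 40.0},
]


def decode(order, reference):
    """Return a copy of the order with the customer name looked up and added."""
    row = dict(order)
    # A failed lookup yields None instead of raising an error.
    row["customer_name"] = reference.get(order["customer_id"])
    return row


def split(rows, threshold=1000.0):
    """Route each row to one of three outputs based on simple constraints."""
    large, small, rejects = [], [], []
    for row in rows:
        if row["customer_name"] is None:
            rejects.append(row)  # the reference lookup failed
        elif row["amount"] >= threshold:
            large.append(row)
        else:
            small.append(row)
    return large, small, rejects


decoded = [decode(order, customer_master) for order in orders]
large, small, rejects = split(decoded)
```

Because each decoded row already carries the customer name, a query against the target does not need to join back to the customer table, which is the benefit the slide describes.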
  • 18. Use the Director to validate, run, schedule, and monitor your DataStage jobs. You can also gather statistics as the job runs.
  • 19. • Define your project’s properties: Administrator
      • Open (attach to) your project
      • Import metadata that defines the format of data stores your jobs will read from or write to: Manager
      • Design the job: Designer
          − Define data extractions (reads)
          − Define data flows
          − Define data integration
          − Define data transformations
          − Define data constraints
          − Define data loads (writes)
          − Define data aggregations
      • Compile and debug the job: Designer
      • Run and monitor the job: Director
  • 20. All your work is done in a DataStage project. Before you can do anything, other than some general administration, you must open (attach to) a project.
    Projects are created during and after the installation process. You can add projects after installation on the Projects tab of Administrator.
    A project is associated with a directory. The project directory is used by DataStage to store your jobs and other DataStage objects and metadata.
    You must open (attach to) a project before you can do any work in it.
    Projects are self-contained. Although multiple projects can be open at the same time, they are separate environments. You can, however, import and export objects between them.
    Multiple users can be working in the same project at the same time. However, DataStage will prevent multiple users from accessing the same job at the same time.
  • 21. True or False? DataStage Designer is used to build and compile your Extraction, Transformation, and Load (ETL) jobs.
      True: Correct! With Designer you can graphically build your job by placing graphical components (called "stages") on a canvas. After you build it, your job is compiled in Designer.
      False: Incorrect. With Designer you can graphically build your job by placing graphical components (called "stages") on a canvas. After you build it, your job is compiled in Designer.
    True or False? DataStage Manager is used to execute your jobs after you build them.
      True: Incorrect. DataStage Manager is your primary interface to the DataStage repository. Use Manager to manage metadata and other DataStage objects.
      False: Correct! DataStage Manager is your primary interface to the DataStage repository. Use Manager to manage metadata and other DataStage objects.
  • 22. True or False? DataStage Director is used to execute your jobs after they have been built.
      True: Correct! Use Director to validate and run your jobs. You can also monitor the job while it is running.
      False: Incorrect. Use Director to validate and run your jobs. You can also monitor the job while it is running.
    True or False? DataStage Administrator is used to set global and project properties.
      True: Correct! You can set some global properties such as connection timeout, as well as project properties, such as permissions.
      False: Incorrect. You can set some global properties such as connection timeout, as well as project properties, such as permissions.
  • 26. Module 2 – Installing DataStage
  • 27. The DataStage server should be installed before the DataStage clients are installed. The server can be installed on Windows NT (including Workstation and Server), Windows 2000, or UNIX. This module describes the Windows NT installation.
    The exact system requirements depend on your version of DataStage. See the installation CD for the latest system requirements.
    To install the server you will need the installation CD and a license for the DataStage server. The license contains the following information:
      • Serial number
      • Project count
          − The maximum number of projects you can have installed on the server. This includes new projects as well as previously created projects to be upgraded.
      • Expiration date
      • Authorization code
          − This information must be entered exactly as written in the license.
  • 28. The installation wizard guides you through the following steps:
      • Enter license information
      • Specify server directories
      • Select program folder
      • Create new projects and/or upgrade existing projects
  • 31. The DataStage services must be running on the server machine in order to run any DataStage client applications. To start or stop the DataStage services in Windows 2000, open the DataStage Control Panel window in the Windows 2000 Control Panel. Then click Start All Services (or Stop All Services). These services must be stopped when installing or reinstalling DataStage.
    UNIX note: In UNIX, these services are started and stopped using the uv.rc script with the stop or start command options. The exact name varies by platform. For SUN Solaris, it is /etc/rc2.d/S99uv.rc.
  • 32. The DataStage clients should be installed after the DataStage server is installed. The clients can be installed on Windows 95, Windows 98, Windows NT, or Windows 2000.
    There are two editions of DataStage.
      • The Developer’s edition contains all the client applications (in addition to the server).
      • The Operator’s edition contains just the client applications needed to run and monitor DataStage jobs (in addition to the server), namely, the Director and Administrator.
    To install the Developer’s edition you need a license for DataStage Developer. To install the Operator’s edition you need a license for DataStage Director. The license contains the following information:
      • Serial number
      • User limit
      • Expiration date
      • Authorization code
          − This information must be entered exactly as written in the license.
  • 36. Module 3 – Configuring Projects
  • 37. In DataStage all development work is done within a project. Projects are created during installation and after installation using Administrator.
    Each project is associated with a directory. The directory stores the objects (jobs, metadata, custom routines, etc.) created in the project.
    Before you can work in a project you must attach to it (open it). You can set the default properties of a project using DataStage Administrator.
  • 39. Click Properties on the DataStage Administration window to open the Project Properties window. There are five active tabs. (The Mainframe tab is only enabled if your license supports mainframe jobs.) The default is the General tab.
    If you select the Enable job administration in Director box, you can perform some administrative functions in Director without opening Administrator.
    When a job is run in Director, events are logged describing the progress of the job. For example, events are logged when a job starts, when it stops, and when it aborts. The number of logged events can grow very large. The Auto-purge of job log box allows you to specify conditions for purging these events. You can limit the logged events either by number of days or number of job runs.
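
The two auto-purge conditions described on this slide (by age in days or by number of job runs) can be sketched as follows. This is a hypothetical illustration in plain Python, not how DataStage implements purging: the event layout (`run`, `time`, `msg`), the function names, and the sample data are assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical job log events; the field names are assumptions for this sketch.
events = [
    {"run": 1, "time": datetime(2002, 8, 1), "msg": "Job started"},
    {"run": 1, "time": datetime(2002, 8, 1), "msg": "Job finished"},
    {"run": 2, "time": datetime(2002, 8, 28), "msg": "Job started"},
    {"run": 2, "time": datetime(2002, 8, 28), "msg": "Job finished"},
    {"run": 3, "time": datetime(2002, 8, 31), "msg": "Job started"},
    {"run": 3, "time": datetime(2002, 8, 31), "msg": "Job aborted"},
]


def purge_by_days(events, now, max_age_days):
    """Keep only the events logged within the last max_age_days days."""
    cutoff = now - timedelta(days=max_age_days)
    return [e for e in events if e["time"] >= cutoff]


def purge_by_runs(events, max_runs):
    """Keep only the events that belong to the most recent max_runs job runs."""
    if max_runs < 1:
        raise ValueError("max_runs must be at least 1")
    recent = set(sorted({e["run"] for e in events})[-max_runs:])
    return [e for e in events if e["run"] in recent]


now = datetime(2002, 9, 1)
kept_by_days = purge_by_days(events, now, max_age_days=7)
kept_by_runs = purge_by_runs(events, max_runs=1)
```

Either policy bounds the size of the log; the choice depends on whether jobs run on a regular schedule (days) or irregularly (runs).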
  • 40. Use this page to set user group permissions for accessing and using DataStage. All DataStage users must belong to a recognized user role before they can log on to DataStage. This helps to prevent unauthorized access to DataStage projects.
    There are three roles of DataStage user:
      • DataStage Developer, who has full access to all areas of a DataStage project.
      • DataStage Operator, who can run and manage released DataStage jobs.
      • <None>, who does not have permission to log on to DataStage.
    UNIX note: In UNIX, the groups displayed are defined in /etc/group.
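
The three roles listed on this slide can be modeled as a small permission check. The sketch below is an assumption-laden illustration in plain Python, not DataStage's own mechanism: the group names `dsdev` and `dsops` and the helper functions are hypothetical, and a group with no assigned role is treated as `<None>`.

```python
from enum import Enum


class Role(Enum):
    DEVELOPER = "DataStage Developer"
    OPERATOR = "DataStage Operator"
    NONE = "<None>"


# Hypothetical mapping of operating-system groups to roles (assumed names).
group_roles = {
    "dsdev": Role.DEVELOPER,
    "dsops": Role.OPERATOR,
}


def role_for(group):
    """Return the role assigned to a group; unassigned groups get <None>."""
    return group_roles.get(group, Role.NONE)


def can_log_on(role):
    """Only users with a recognized role may log on."""
    return role is not Role.NONE


def can_run_jobs(role):
    """Developers and Operators can both run jobs."""
    return role in (Role.DEVELOPER, Role.OPERATOR)


def can_edit_jobs(role):
    """Only Developers have full access to all areas of a project."""
    return role is Role.DEVELOPER
```

For example, `can_edit_jobs(role_for("dsops"))` evaluates to `False`, while `can_run_jobs(role_for("dsops"))` evaluates to `True`.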
  • 41. This tab is used to enable and disable server-side tracing. The default is for server-side tracing to be disabled. When you enable it, information about server activity is recorded for any clients that subsequently attach to the project. This information is written to trace files. Users with in-depth knowledge of the system software can use it to help identify the cause of a client problem. If tracing is enabled, users receive a warning message whenever they invoke a DataStage client.
    Warning: Tracing causes a lot of server system overhead. This should only be used to diagnose serious problems.
  • 42. Use the Schedule tab to specify a user name and password for running scheduled jobs in the selected project. If no user is specified here, the job runs under the same user name as the system scheduler.
  • 43. DataStage Essentials Module 3 – Configuring Projects On the Tunables tab, you can specify the sizes of the memory caches used when reading rows in hashed files and when writing rows to hashed files. Hashed files are mainly used for lookups and are discussed in a later module. Active-to-Active link performance settings will be covered in detail in a later module in this course. 3-9
  • 44. Module 3 – Configuring Projects DataStage 314Svr 3 - 10 Copyright © 2002 Ascential Software Corporation 09/01/02
  • 45. Module 4 Designing and Running Jobs
  • 47. A job is an executable DataStage program. In DataStage, you can design and run jobs that perform many useful data warehouse tasks, including data extraction, data conversion, data aggregation, data loading, etc. DataStage jobs are: • Designed and built in Designer. • Scheduled, invoked, and monitored in Director. • Executed under the control of DataStage.
  • 48. In this module, you will go through the whole process with a simple job, except for the first bullet. In this module you will manually define the metadata.
  • 49. In the center right is the Designer canvas. On it you place stages and links from the Tools Palette on the right. On the bottom left is the Repository window, which displays the branches in Manager. Items in Manager, such as jobs and table definitions, can be dragged to the canvas area. Click View>Repository to display the Repository window. Click View>Property Browser to display the Property Browser window. This window displays the properties of objects selected on the canvas.
  • 50. The toolbar at the top contains quick access to the main functions of Designer.
  • 51. The tool palette contains icons that represent the components you can add to your job design. Most of the stages shown here are automatically installed when you install DataStage. You can also install additional stages called plug-ins for special purposes. For example, there is a plug-in called sort that can be used to sort data. Plug-ins are discussed in a later module.
  • 52. There are two kinds of stages. Passive stages define read and write access to data sources and repositories: • Sequential • ODBC • Hashed. Active stages define how data is filtered and transformed: • Transformer • Aggregator • Sort plug-in.
  • 53. True or False? The Sequential stage is an active stage. True: Incorrect. The Sequential stage is considered a passive stage because it is used to extract or load sequential data from a file. It is not used to transform or modify data. False: Correct! The Sequential stage is considered a passive stage because it is used to extract or load sequential data from a file. It is not used to transform or modify data.
  • 55. The Sequential stage is used to extract data from a sequential file or to load data into a sequential file. The main things you need to specify when editing the Sequential stage are the following: • Path and name of file • File format • Column definitions • If the Sequential stage is being used as a target, the write action: overwrite the existing file or append to it.
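As a rough analogy only, the same four settings (file path, format, column definitions, and write action) appear in this Python sketch. The column names, the `write_rows` and `read_rows` helpers, and the file name are all hypothetical; nothing here is DataStage code.

```python
import csv
import os
import tempfile

# Hypothetical column definitions for the example file.
COLUMNS = ["job_id", "job_desc", "min_lvl"]


def write_rows(path, rows, write_action="overwrite", delimiter=","):
    """Write rows to a delimited file, replacing or extending existing content."""
    if write_action not in ("overwrite", "append"):
        raise ValueError("write_action must be 'overwrite' or 'append'")
    mode = "w" if write_action == "overwrite" else "a"
    with open(path, mode, newline="") as f:
        csv.writer(f, delimiter=delimiter).writerows(rows)


def read_rows(path, delimiter=","):
    """Read the file back as dictionaries keyed by the column definitions."""
    with open(path, newline="") as f:
        return [dict(zip(COLUMNS, rec)) for rec in csv.reader(f, delimiter=delimiter)]


path = os.path.join(tempfile.mkdtemp(), "jobs.txt")
write_rows(path, [["1", "Editor", "25"]])
write_rows(path, [["2", "Designer", "75"]], write_action="append")
after_append = read_rows(path)
write_rows(path, [["3", "Publisher", "150"]], write_action="overwrite")
after_overwrite = read_rows(path)
```

Appending keeps the earlier row and adds the new one; overwriting leaves only the rows from the last write.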
  • 57. Defining a sequential target stage is similar to defining a sequential source stage. You are defining the format of the data flowing into the stage, that is, from the input links. Define each input link listed in the Input name box. You are also defining the file the job will write to. If the file doesn’t exist, it will be created. Specify whether to overwrite or append the data in the Update action set of buttons. On the General tab, the Filter command box lets you specify a filter program for processing the file you are extracting data from. This feature can be used, for example, to unzip a compressed file before reading it. You can type in or browse for the filter program, and specify any command line arguments it requires in the text box. This text box is enabled only if you have selected the Stage uses filter commands checkbox on the Stage page General tab. Note that, if you specify a filter command, data browsing is not available, so the View Data button is disabled. On the Format tab, you can specify a different format for the target file than you specified for the source file. If the target file doesn’t exist, you will not (of course!) be able to view its data until after the job runs. If you click the View data button, DataStage will return a “Failed to open …” error.
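The idea behind a filter command, passing the raw file through a program before the stage parses it, can be sketched in Python. This is an analogy rather than DataStage behavior: `read_through_filter` is a hypothetical helper, and `gzip.decompress` stands in for an external unzip program.

```python
import csv
import gzip
import io
import os
import tempfile


def read_through_filter(path, filter_func):
    """Pass the raw file bytes through filter_func, then parse the result as CSV."""
    with open(path, "rb") as f:
        filtered = filter_func(f.read())
    return list(csv.reader(io.StringIO(filtered.decode("utf-8"))))


# Create a small compressed source file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "customers.txt.gz")
with gzip.open(path, "wt", newline="") as f:
    f.write("1,Smith\n2,Jones\n")

# The filter decompresses the data before any rows are parsed.
rows = read_through_filter(path, gzip.decompress)
```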
  • 58. The column definitions you defined in the source stage for a given (output) link will appear already defined in the target stage for the corresponding (input) link. Think of a link as a pipe. What flows in one end flows out the other end. The format going in is the same as the format going out.
  • 59. The Transformer stage is the primary active stage. Other active stages perform more specialized types of transformations. In the Transformer stage you can specify: • Column mappings • Derivations • Constraints A column mapping maps an input column to an output column. Values are passed directly from the input column to the output column. Derivations calculate the values to go into output columns based on values in zero or more input columns. Constraints specify the conditions under which incoming rows will be written to output links.
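The three concepts can be told apart in a few lines of Python. This is an illustrative sketch, not how the Transformer stage is implemented; the column names and the threshold of 50 are made up for the example.

```python
def transform(rows):
    """Apply a constraint, a column mapping, and a derivation to each input row."""
    for row in rows:
        # Constraint: only rows meeting the condition are written to the output.
        if row["min_lvl"] < 50:
            continue
        yield {
            # Column mapping: the value is passed through unchanged.
            "job_id": row["job_id"],
            # Derivation: the value is computed from input columns.
            "lvl_range": row["max_lvl"] - row["min_lvl"],
        }


input_rows = [
    {"job_id": 1, "min_lvl": 10, "max_lvl": 40},
    {"job_id": 2, "min_lvl": 75, "max_lvl": 175},
]
output_rows = list(transform(input_rows))
```

The first row fails the constraint and is dropped; the second passes, keeps its `job_id`, and gains a derived `lvl_range` of 100.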
  • 60. Notice the following elements of the transformer: The top, left pane displays the columns of the input links. If there are multiple input links, multiple sets of columns are displayed. The top, right pane displays the contents of the output links. We haven’t defined any fields here yet. If there are multiple output links, multiple sets of columns are displayed. For now, ignore the Stage Variables window in the top, right pane. This will be discussed in a later module. The bottom area shows the column definitions (metadata) for the input and output links. If there are multiple input and/or output links, there will be multiple tabs.
  • 62. Add one or more Annotation stages to the canvas to document your job. An Annotation stage works like a text box with various formatting options. You can show or hide the Annotation stages by pressing a button on the toolbar. There are two types of Annotation stage. The Description Annotation stage is discussed in a later module.
  • 63. Type the text in the box. Then specify the various options including: • Text font and color • Text box color • Vertical and horizontal text justification
  • 64. Before you can run your job, you must compile it. This generates executable code that can be run by the DataStage Server engine. To compile a job, click File>Compile or click the Compile button on the toolbar. The Compile Job window displays the status of the compile. If an error occurs: Click Show Error to identify the stage where the error occurred. Click More to retrieve more information about the error.
  • 65. As you know, you run your jobs in Director. You can open Director from within Designer by clicking Tools>Run Director. In a similar way, you can move between Director, Manager, and Designer. There are two methods for running a job: • Run it immediately. • Schedule it to run at a later time or date. To run a job immediately: • Select the job in the Job Status view. The job must have been compiled. • Click Job>Run Now or click the Run Now button in the toolbar. The Job Run Options window is displayed.
  • 66. This shows the Director Job Status view. To run a job, select it and then click Job>Run Now. Other views available: • Job log – view messages from job run • Schedule – view dates and times job is scheduled to run
  • 67. The Job Run Options window is displayed when you click Job>Run Now. This window allows you to stop the job after: • A certain number of rows. • A certain number of warning messages. You can validate your job before you run it. Validation performs some checks that are necessary in order for your job to run successfully. These include: • Verifying that connections to data sources can be made. • Verifying that files can be opened. • Verifying that SQL statements used to select data can be prepared. Click Run to run the job after it is validated. The Status column displays the status of the job run.
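The row and warning limits can be thought of as two counters checked while the job runs. The Python sketch below is an assumption about the general idea, not a description of the DataStage engine: `run_job`, the status strings, and the use of `None` to represent a row that raises a warning are all hypothetical.

```python
def run_job(rows, row_limit=None, warning_limit=None):
    """Process rows until the input ends or one of the limits is reached."""
    processed = 0
    warnings = 0
    for row in rows:
        if row_limit is not None and processed >= row_limit:
            return processed, warnings, "stopped: row limit"
        if row is None:  # hypothetical stand-in for a row that raises a warning
            warnings += 1
            if warning_limit is not None and warnings >= warning_limit:
                return processed, warnings, "stopped: warning limit"
            continue
        processed += 1
    return processed, warnings, "finished"


by_rows = run_job([1, 2, 3, 4], row_limit=2)
by_warnings = run_job([1, None, None, 2], warning_limit=2)
unlimited = run_job([1, 2])
```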
  • 68. Click the Log button in the toolbar to view the job log. The job log records events that occur during the execution of a job. These events include control events, such as the starting, finishing, and aborting of a job; informational messages; warning messages; error messages; and program-generated messages.
  • 74. Module 5 – Working with Meta Data
  • 75. DataStage Manager is a graphical tool for managing the contents of your DataStage project repository, which contains metadata and other DataStage components such as jobs and routines. Metadata is “data about data” that describes the formats of sources and targets. This includes general format information such as whether the record columns are delimited and, if so, the delimiting character. It also includes the specific column definitions.
  • 77. The left pane contains the project tree. There are eight main branches, but you can create subfolders under each. Select a folder in the project tree to display its contents. In this example, a folder named DS304 has been created that contains some of the jobs in the project. Data Elements branch: Lists the built-in and custom data elements. (Data elements are extensions of data types, and are discussed in a later module.) Jobs branch: Lists the jobs in the current project. Routines branch: Lists the built-in and custom routines. Routines are blocks of DataStage BASIC code that can be called within a job. (Routines are discussed in a later module.) Shared Containers branch: Shared Containers encapsulate sets of DataStage components into a single stage. (Shared Containers are discussed in a later module.) Stage Types branch: Lists the types of stages that are available within a job. Built-in stages include the sequential and transformer stages you used in Designer. Table Definitions branch: Lists the table definitions available for loading into a job.
  • 78. Transforms branch: Lists the built-in and custom transforms. Transforms are functions you can use within a job for data conversion. Transforms are discussed in a later module.
  • 79. DataStage Manager manages two different types of objects: • Metadata describing sources and targets: − Called table definitions in Manager. These are not to be confused with relational tables. DataStage table definitions are used to describe the format and column definitions of any type of source: sequential, relational, hashed file, etc. − Table definitions can be created in Manager or Designer and they can also be imported from the sources or targets they describe. • DataStage components − Every object in DataStage (jobs, routines, table definitions, etc.) is stored in the DataStage repository. Manager is the interface to this repository. − DataStage components, including whole projects, can be exported from and imported into Manager.
  • 80. Any set of DataStage objects stored in the Manager Repository, including whole projects, can be exported to a file. This export file can then be imported back into DataStage. Import and export can be used for many purposes, including: • Backing up jobs and projects. • Maintaining different versions of a job or project. • Moving DataStage objects from one project to another. Just export the objects, move to the other project, then re-import them into the new project. • Sharing jobs and projects between developers. The export files, when zipped, are small and can be easily emailed from one developer to another.
  • 81. Click Export>DataStage Components in Manager to begin the export process. Any object in Manager can be exported to a file. Use this procedure to back up your work or to move DataStage objects from one project to another. Select the types of components to export. You can select either the whole project or a portion of the objects in the project. Specify the name and path of the file to export to. By default, objects are exported to a text file in a special format, with the extension dsx. Alternatively, you can export the objects to an XML document. The directory you export to is on the DataStage client, not the server.
  • 82. True or False? You can export DataStage objects such as jobs, but you can't export metadata, such as field definitions of a sequential file. True: Incorrect. Metadata describing files and relational tables is stored as "Table Definitions". Table definitions can be exported and imported like any other DataStage objects. False: Correct! Metadata describing files and relational tables is stored as "Table Definitions". Table definitions can be exported and imported like any other DataStage objects.
  • 83. True or False? The directory you export to is on the DataStage client machine, not on the DataStage server machine. True: Correct! The directory you select for export must be addressable by your client machine. False: Incorrect. The directory you select for export must be addressable by your client machine.
  • 85. To import DataStage components, click Import>DataStage Components. Select the file to import. Click Import all to begin the import process or Import selected to view a list of the objects in the import file. You can import selected objects from the list. Select the Overwrite without query button to overwrite objects with the same name without warning.
  • 88. Table definitions define the formats of a variety of data files and tables. These definitions can then be used and reused in your jobs to specify the formats of data stores. For example, you can import the format and column definitions of the Customers.txt file. You can then load this into the sequential source stage of a job that extracts data from the Customers.txt file. You can load this same metadata into other stages that access data with the same format. In this sense the metadata is reusable. It can be used with any file or data store with the same format. If the column definitions are similar to what you need you can modify the definitions and save the table definition under a new name. You can also use the same table definition for different types of data stores with the same format. For example, you can import a table definition from a sequential file and use it to specify the format for an ODBC table. In this sense the metadata is “loosely coupled” with the data whose format it defines. You can import and define several different kinds of table definitions including: Sequential files, ODBC data sources, UniVerse tables, hashed files.
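A minimal Python sketch of the "loosely coupled" idea: the format description lives apart from the data and can be applied to any source that shares the format. `CUSTOMERS_DEF` and `load_with_definition` are hypothetical, and the columns are invented; the real layout of Customers.txt is not given here.

```python
import csv
import io

# Hypothetical table definition: format information plus column definitions.
CUSTOMERS_DEF = {
    "delimiter": ",",
    "columns": [("cust_id", int), ("name", str), ("credit", float)],
}


def load_with_definition(text, table_def):
    """Parse delimited text into typed rows using a table definition."""
    reader = csv.reader(io.StringIO(text), delimiter=table_def["delimiter"])
    return [
        {name: cast(value) for (name, cast), value in zip(table_def["columns"], rec)}
        for rec in reader
    ]


customers = load_with_definition("1,Smith,500.0\n2,Jones,125.5\n", CUSTOMERS_DEF)
# The same definition works for any other source with the same format.
archive = load_with_definition("9,Brown,0\n", CUSTOMERS_DEF)
```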
  • 89. To start the import, click Import>Table Definitions>Sequential File Definitions. The Import Meta Data (Sequential) window is displayed. Select the directory containing the sequential files. The Files box is then populated with the files you can import. Select the file to import. Select or specify a category (folder) to import into. • The format is: <Category><Sub-category> • <Category> is the first-level sub-folder under Table Definitions. • <Sub-category> is (or becomes) a sub-folder under the type.
  • 90. In Manager, select the category (folder) that contains the table definition. Double-click the table definition to open the Table Definition window. Click the Columns tab to view and modify any column definitions. Select the Format tab to edit the file format specification.
  • 95. Module 6 Working with Relational Data
  • 97. You can perform the same tasks with relational data that you can with sequential data. You can extract, filter, and transform data from relational tables. You can also load data into relational tables. Although you can work with many relational databases through native drivers (including UniVerse, UniData, and Oracle), you can access many more relational databases using ODBC. In the ODBC stage, you can build a query against one or more tables in the database interactively, type the query yourself, or paste in an existing query.
  • 98. Before you can access data through ODBC you must define an ODBC data source. In Windows NT, this can be done using the (32 bit) ODBC Data Source Administrator in the Control Panel. The ODBC Data Source Administrator has several tabs. For use with DataStage, you should define your data sources on the System DSN tab (not User DSN). You can install drivers for most of the common relational database systems from the DataStage installation CD. Click Add to define a new data source. When you click Add a list of available drivers is displayed. Select the appropriate driver and then click Finish. Different relational databases have different requirements. As an example, we will define a Microsoft Access data source. • Type the name of the data source in the Data Source Name box. • Click Select to define a connection to an existing database. Type the name and location of the database. • Click Create to define a connection to a new database.
  • 100. Importing table definitions from ODBC databases is similar to importing sequential file definitions. Click Import>Table Definitions>ODBC Table Definitions in Manager to start the process. The DSN list displays the data sources that are defined for the DataStage Server. Select the data source you want to import from and, if necessary, provide a user name and password. The Import Metadata window is displayed. It lists all tables in the database that are available for import. Select one or more tables and a category to import to, and then click OK.
  • 101. Extracting data from a relational table is similar to extracting data from a sequential file except that you use an ODBC stage instead of a sequential stage. In this example, we’ll extract data from a relational table and load it into a sequential file.
  • 102. Specify the ODBC data source name in the Data source name box on the General tab of the ODBC stage. You can click the Get SQLInfo button to retrieve the quote character and schema delimiters from the ODBC database.
  • 103. Specify the table name on the General tab of the Outputs tab. Select Generated query to define the SQL SELECT statement interactively using the Columns and Selection tabs. Select User-defined SQL query to write your own SQL SELECT statement to send to the database.
  • 104. Load the table definitions from Manager on the Columns tab. The procedure is the same as for sequential files. When you click Load, the Select Columns window is displayed. Select the columns data is to be extracted from.
  • 105. Optionally, specify a WHERE clause and other additional SQL clauses on the Selection tab. Other clauses can be anything else you wish to add to the SELECT statement, such as ORDER BY.
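How a generated query is assembled from the selected columns, the WHERE clause, and the other clauses can be sketched in Python against an in-memory SQLite database. This is an assumption about the general shape only: `build_select` is a hypothetical helper, the `jobs` table and its columns are invented, and plain string assembly like this is for illustration, not for untrusted input.

```python
import sqlite3


def build_select(table, columns, where=None, other=None):
    """Assemble a SELECT statement from its parts."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    if other:
        sql += f" {other}"
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id INTEGER, job_desc TEXT, min_lvl INTEGER)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [(1, "Editor", 25), (2, "Designer", 75), (3, "Publisher", 150)],
)

sql = build_select(
    "jobs", ["job_id", "job_desc"], where="min_lvl >= 50", other="ORDER BY job_id DESC"
)
rows = conn.execute(sql).fetchall()
```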
  • 106. The View SQL tab enables you to view the SELECT statement that will be used to select the data from the table. The SQL displayed is read-only. Click View Data to test the SQL statement against the database.
  • 107. If you want to define your own SQL query, click User-defined SQL query on the General tab and then write or paste the query into the SQL for primary inputs box on the SQL Query tab.
  • 109. Editing an ODBC target stage is similar to editing an ODBC source stage. It includes the following tasks: • Specify the data source containing the target table. • Specify the name of the table. • Select the update action. You can choose from a variety of INSERT and/or UPDATE actions. • Optionally, create the table. • Load the column definitions from the Manager table definition.
  • 110. Some of the options are different in the ODBC stage when it is used as a target. Select the type of action to perform from the Update action list. You can optionally have DataStage create the target table or you can load to an existing table. On the View SQL tab you can view the SQL statement used to insert the data into the target table.
  • 111. On the Edit DDL tab you can generate and modify the CREATE TABLE statement used to create the target table. If you make any changes to column definitions, you need to regenerate the CREATE TABLE statement by clicking the Create DDL button.
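Why the statement has to be regenerated after a column change becomes clear once DDL is viewed as a function of the column definitions. The Python sketch below is illustrative only: `create_table_ddl` and the column tuples are hypothetical, and the SQL DataStage actually generates will differ by database.

```python
import sqlite3


def create_table_ddl(table, columns):
    """Build a CREATE TABLE statement from (name, sql_type, nullable) tuples."""
    parts = [
        f"{name} {sql_type}{'' if nullable else ' NOT NULL'}"
        for name, sql_type, nullable in columns
    ]
    return f"CREATE TABLE {table} ({', '.join(parts)})"


columns = [("job_id", "INTEGER", False), ("job_desc", "VARCHAR(50)", True)]
ddl = create_table_ddl("jobs", columns)

# Changing a column definition means the statement must be regenerated;
# the old text in `ddl` does not update itself.
columns.append(("min_lvl", "INTEGER", True))
ddl_regenerated = create_table_ddl("jobs", columns)

conn = sqlite3.connect(":memory:")
conn.execute(ddl_regenerated)
created_columns = [row[1] for row in conn.execute("PRAGMA table_info(jobs)")]
```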
  • 112. Transaction Handling: Allows you to specify a transaction isolation level for read data. The isolation level specifies how potential conflicts between transactions (i.e., dirty reads, nonrepeatable reads, and phantom reads) are handled. By default, all the rows are written to the target table before a COMMIT. In the Rows per transaction box, you can specify the number of rows to write before each COMMIT.
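The effect of a rows-per-transaction setting can be sketched with SQLite in Python. This is an analogy, not the ODBC stage itself: `load_rows` and the `target` table are hypothetical, and treating 0 as "commit once at the end" is an assumption made for the example.

```python
import sqlite3


def load_rows(conn, rows, rows_per_transaction=0):
    """Insert rows, committing every rows_per_transaction rows.

    A value of 0 (an assumption of this sketch) means one commit at the end.
    Returns the number of commits issued.
    """
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO target VALUES (?, ?)", row)
        pending += 1
        if rows_per_transaction and pending == rows_per_transaction:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, val TEXT)")
rows = [(i, f"row {i}") for i in range(5)]
commits_default = load_rows(conn, rows)
commits_batched = load_rows(conn, rows, rows_per_transaction=2)
total = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

Committing in smaller batches means a failure part-way through loses less uncommitted work, at the cost of more commits.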
  • 114. True or False? Using a single ODBC stage, you can only extract data from a single table. True: Incorrect. You can join data from multiple tables within a single data source. False: Correct! You can join data from multiple tables within a single data source.
  • 117. The ORAOCI8 plug-in lets you rapidly and efficiently prepare and load streams of tabular data from any DataStage stage (for example, the ODBC stage, the Sequential File stage, and so forth) to and from tables of the target Oracle database. The Oracle client on Windows NT or UNIX uses SQL*Net to access an Oracle server on Windows NT or UNIX.
  • 118. The plug-in appears as any other stage on the Designer work area. It can extract or write data contained in Oracle tables. Features: • Each ORAOCI8 plug-in stage is a passive stage that can have any number of input, output, and reference output links. • Input links specify the data you are writing, which is a stream of rows to be loaded into an Oracle database. You can specify the data on an input link using an SQL statement constructed by DataStage or a user-defined SQL statement. • Output links specify the data you are extracting, which is a stream of rows to be read from an Oracle database. You can also specify the data on an output link using an SQL statement constructed by DataStage or a user-defined SQL statement. • Each reference output link represents a row that is key read from an Oracle database (that is, it reads the record using the key field in the WHERE clause of the SQL SELECT statement).
  • 119. General tab. This tab is displayed by default. It contains the following fields: • Table name. This required field is editable when the update action is not User-defined SQL (otherwise, it is read-only). It is the name of the target Oracle table the data is written to, and the table must exist or be created by choosing Generate DDL from the Create table action list. You must have insert, update, or delete privileges, depending on input mode. You must specify Table name if you do not specify User-defined SQL. There is no default. Click … (Browse button) to browse the Repository to select the table. • Update action. Specifies which SQL statements are used to update the target table. Some update actions require key columns to update or delete rows. There is no default. Choose the option you want from the list: − Clear table then insert rows. Deletes the contents of the table and adds the new rows, with slower performance because of transaction logging. − Truncate table then insert rows. Truncates the table with no transaction logging and faster performance.
  • 120. Module 6 – Working with Relational Data
    − Insert rows without clearing. Inserts the new rows in the table.
    − Delete existing rows only. Deletes existing rows in the target table that have identical keys in the source files.
    − Replace existing rows completely. Deletes the existing rows, then adds the new rows to the table.
    − Update existing rows only. Updates the existing data rows. Any rows in the data that do not exist in the table are ignored.
    − Update existing rows or insert new rows. Updates the existing data rows before adding new rows. It is faster to update first when you have a large number of records.
    − Insert new rows or update existing rows. Inserts the new rows before updating existing rows. It is faster to insert first if you have only a few records.
    − User-defined SQL. Writes the data using a user-defined SQL statement, which overrides the default SQL statement generated by the stage. If you choose this option, you enter the SQL statement on the SQL tab.
    − User-defined SQL file. Reads the contents of the specified file to write the data.
    • Transaction Isolation. Provides the necessary concurrency control between transactions in the job and other transactions. Use one of the following transaction isolation levels:
      − Read committed. Takes exclusive locks on modified data and sharable locks on all other data. Each query executed by a transaction sees only data that was committed before the query (not the transaction) began. Oracle queries never read dirty (uncommitted) data. This is the default.
      − Serializable. Takes exclusive locks on modified data and sharable locks on all other data. Serializable transactions see only the changes that were committed at the time the transaction began.
      Note: If Enable transaction grouping is selected on the Transaction Handling tab, only the Transaction Isolation value for the first link is used for the entire group.
    • Array size. Specifies the number of rows to be transferred in one call between DataStage and Oracle before they are written. Enter a positive integer to indicate how many rows Oracle writes to the database at a time. The default value is 1, that is, each row is written in a separate statement. Larger numbers use more memory on the client to cache the rows. This minimizes server round trips and maximizes performance by executing fewer statements. If this number is too large, the client may run out of memory.
  • 121.
    • Transaction size. This field exists for backward compatibility, but it is ignored for version 3.0 and later of the plug-in. The transaction size for new jobs is now handled by Rows per transaction on the Transaction Handling tab.
    • Create table action. Creates the target table in the specified database if Generate DDL is selected. It uses the column definitions in the Columns tab and the table name and the TABLESPACE and STORAGE properties for the target table. The generated Create Table statement includes the TABLESPACE and STORAGE keywords, which indicate the location where the table is created and the storage expression for the Oracle storage_clause. You must have CREATE TABLE privileges on your schema. You can also specify your own CREATE TABLE SQL statement. You must enter the storage clause in Oracle format. (Use the User-defined DDL tab on the SQL tab for a complex statement.)
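The effect of Array size can be pictured with a small Python sketch. It is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for Oracle, and the write_in_batches helper, the jobs table, and its columns are hypothetical names, not part of the plug-in. Each executemany call plays the role of one transfer of array_size rows.

```python
import sqlite3
from itertools import islice


def write_in_batches(conn, rows, array_size=1):
    """Send rows to the database array_size rows per call; return the call count."""
    if array_size < 1:
        raise ValueError("array_size must be a positive integer")
    calls = 0
    remaining = iter(rows)
    while True:
        # Cache up to array_size rows on the client before sending them.
        batch = list(islice(remaining, array_size))
        if not batch:
            break
        conn.executemany("INSERT INTO jobs (job_id, min_lvl) VALUES (?, ?)", batch)
        calls += 1
    conn.commit()
    return calls


# Hypothetical target table; sqlite3 is used only so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, min_lvl INTEGER)")
rows = [(i, i * 10) for i in range(1, 11)]

calls = write_in_batches(conn, rows, array_size=4)
row_count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(calls, row_count)  # 10 rows in batches of 4 take 3 calls
```

With array_size=1 the same ten rows would take ten calls, which is the trade-off the text describes: fewer round trips in exchange for more client memory per batch.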
  • 123. Module 7 Constraints and Derivations
  • 125. DataStage Essentials Module 7 – Constraints and Derivations A constraint specifies the condition under which data flows through a link. For example, suppose you want to split the data in the jobs file into separate files based on the job level. We need to define a constraint on each link so that only jobs within a certain level range are written to each file. 7-3
  • 126. Click the Constraints button in the toolbar at the top of the Transformer Stage window to open the Transformer Stage Constraints window. The Transformer Stage Constraints window lists all the links out of the transformer. Double-click on the cell next to a link to create the constraint.
    • Rows that are not written out to any previous link are written to a rejects link.
    • A row of data is sent down all the links whose constraints it satisfies.
    • If there is no constraint on a (non-rejects) link, all rows will be sent down the link.
  • 127. DataStage Essentials Module 7 – Constraints and Derivations This shows the Constraints window. Constraints are defined for each of the top three links. The Reject Row box is selected for the last link. All rows that fail to satisfy the top three links will be sent down this link. 7-5
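The routing rules for constraints and the rejects link can be sketched in a few lines of Python. This is an illustration of the rules as stated in this module, not of how the Transformer stage is implemented. The link names, the level ranges, and the route_row helper are hypothetical.

```python
def route_row(row, links):
    """Return the names of every output link the row is sent down.

    links is an ordered list of (name, constraint, is_reject) tuples.
    A constraint of None means the link has no constraint.
    """
    taken = []
    for name, constraint, is_reject in links:
        if is_reject:
            # A rejects link only receives rows that no earlier link accepted,
            # which is why it belongs last in the link ordering.
            if not taken:
                taken.append(name)
        elif constraint is None or constraint(row):
            # A row is sent down every link whose constraint it satisfies.
            taken.append(name)
    return taken


# Hypothetical links that split jobs by level, with a rejects link last.
links = [
    ("LowLevel", lambda r: r["min_lvl"] < 50, False),
    ("MidLevel", lambda r: 50 <= r["min_lvl"] < 100, False),
    ("HighLevel", lambda r: 100 <= r["min_lvl"] < 200, False),
    ("Rejects", None, True),
]

print(route_row({"min_lvl": 25}, links))   # ['LowLevel']
print(route_row({"min_lvl": 250}, links))  # ['Rejects']
```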
  • 128. Module 7 – Constraints and Derivations DataStage 314Svr True or False? A constraint specifies a condition under which incoming rows of data will be written to an output link True: Correct! You can separately define a constraint for each output link. If no constraint is written for a particular output link, then all rows will be written to that link. False: Incorrect. You can separately define a constraint for each output link. If no constraint is written for a particular output link, then all rows will be written to that link. 7-6 Copyright © 2002 Ascential Software Corporation 09/01/02
  • 129. DataStage Essentials Module 7 – Constraints and Derivations True or False? A Rejects link can be placed anywhere in the link ordering. True: Incorrect. A Rejects link should be placed last in the link ordering, if it is to get every row that doesn't satisfy any of the other constraints. False: Correct! A Rejects link should be placed last in the link ordering, if it is to get every row that doesn't satisfy any of the other constraints. 7-7
  • 131. DataStage Essentials Module 7 – Constraints and Derivations A derivation is an expression that specifies the value to be moved into a target column (field). Every target column must have a derivation. The simplest derivation is an input column. The value in the input column is moved to the target column. To construct a derivation for a target column double-click on the derivation cell next to the target column. Derivations are constructed in the same way that constraints are constructed: • Type constants. • Type or enter operators from Operator shortcut menu. • Type or enter operands from Operand shortcut menu. What’s the difference between derivations and constraints? • Constraints apply to links; derivations apply to columns. • Constraints are conditions, either true or false; derivations specify a value to go into a target column. 7-9
  • 132. Module 7 – Constraints and Derivations DataStage 314Svr In this example the concatenation of several fields is moved into the FullName target field. The colon (:) is the concatenation operator. You can insert this from the Operator menu or type it in. 7 - 10 Copyright © 2002 Ascential Software Corporation 09/01/02
  • 133. True or False? If the constraint for a particular link is not satisfied, then the derivations defined for that link are not executed. True: Correct! Constraints have precedence over derivations. Derivations in an output link are only executed if the constraint is satisfied. False: Incorrect. Constraints have precedence over derivations. Derivations in an output link are only executed if the constraint is satisfied.
  • 135. DataStage Essentials Module 7 – Constraints and Derivations You can create stage variables for use in your column derivations and constraints. Stage variables store values without writing them out to a target file or table. They can be used in expressions just like constants, input columns, and other operands. Stage variables retain their values across reads. This allows them to be used as counters and accumulators. You can also use them to compare a current input value to a previous input value. To create a new stage variable, click the right mouse button over the Stage Variables window and then click Append New Stage Variable (or Insert New Stage Variable). After you create it, you specify a derivation for it in the same way as for columns. 7 - 13
  • 136. Module 7 – Constraints and Derivations DataStage 314Svr This lists the execution order: • Derivations in stage variables are executed before constraints. This allows them to be used in constraints. • Next constraints are executed. • Then column derivations are executed. • Derivations in higher columns are executed before lower columns. 7 - 14 Copyright © 2002 Ascential Software Corporation 09/01/02
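The execution order can be pictured with a short Python sketch for a single output link. It only illustrates the ordering described above and the fact that stage variables keep their values between rows. The stage variable names (RowCount, PrevLevel, LevelChanged), the column names, and the run_transformer helper are hypothetical.

```python
def run_transformer(rows):
    """Process rows in the order: stage variables, constraint, column derivations."""
    # Stage variables keep their values from one input row to the next.
    stage = {"RowCount": 0, "PrevLevel": None, "LevelChanged": False}
    output = []
    for row in rows:
        # 1. Stage variable derivations run first.
        stage["RowCount"] += 1                                         # counter
        stage["LevelChanged"] = row["min_lvl"] != stage["PrevLevel"]   # compare with previous row
        stage["PrevLevel"] = row["min_lvl"]
        # 2. The link constraint runs next and may use stage variables.
        if not stage["LevelChanged"]:
            continue  # constraint not satisfied, so no column derivations run
        # 3. Column derivations run last, only for rows that passed the constraint.
        output.append({
            "Seq": stage["RowCount"],
            "FullName": row["fname"] + " " + row["lname"],
        })
    return output


rows = [
    {"fname": "Ann", "lname": "Lee", "min_lvl": 10},
    {"fname": "Bob", "lname": "Ray", "min_lvl": 10},
    {"fname": "Cy", "lname": "Fox", "min_lvl": 20},
]
result = run_transformer(rows)
print(result)
```

Only the first and third rows reach the output here, because the second row has the same level as the row before it.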
  • 137. DataStage Essentials Module 7 – Constraints and Derivations Note the output link reordering icon available on the toolbar from within the Transformer stage. 7 - 15
  • 138. Module 7 – Constraints and Derivations DataStage 314Svr To get to the link ordering screen, open the transformer stage, then click on the output link execution order icon. The above screen will appear. Select a link and use the arrow buttons to reposition a link in the execution order. 7 - 16 Copyright © 2002 Ascential Software Corporation 09/01/02
  • 139. True or False? Derivations for stage variables are executed before derivations for any output link columns. True: Correct! So you can be sure that the derivations for any of the stage variables referenced in column derivations will have already been executed. False: Incorrect. The derivations for stage variables are executed first. So you can be sure that the derivations for any of the stage variables referenced in column derivations will have already been executed.
  • 143. Module 8 Creating Basic Expressions
  • 144. Module 8 – Creating Basic Expressions DataStage 304 8-2 Copyright © 2002 Ascential Software Corporation 03/01/02
  • 145. DataStage Essentials Module 8 – Creating Basic Expressions DataStage BASIC is a form of BASIC that has been customized to work with DataStage. In the previous module you learned how to define constraints and derivations. Derivations and constraints are written using DataStage BASIC. Job control routines, which are discussed in a later module, are also written in DataStage BASIC. This module will not attempt to teach you BASIC programming. Our focus is on what you need to know in order to construct complex DataStage constraints and derivations. 8-3
  • 146. For more information about BASIC operators than is provided here, search for "BASIC Operators" in Help. You can insert these operators from the Operators menu (except for the IF operator, which is on the Operands menu).
    • Arithmetic operators: -, +, *, /
    • Relational operators: =, <, >, <=, >=
    • Logical operators: AND, OR, NOT
    • IF operator:
      − IF min_lvl < 0 THEN "Out of Range" ELSE "In Range"
    • Concatenation operator (:)
      − "The employee's name is " : lname : ", " : fname
    • Substring operator ([start, length]). First character is 1 (not 0).
      − "APPL3245"[1, 4] → "APPL"
      − "APPL3245"[5, 2] → "32"
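If you are used to languages that count from 0, the substring operator is the easiest one to get wrong. The following Python sketch mimics the 1-based [start, length] behavior along with the IF and concatenation examples above. The helper names are hypothetical, and the sketch assumes start is 1 or greater.

```python
def substring(text, start, length):
    """Mimic the BASIC text[start, length] operator, where the first character is 1."""
    begin = start - 1  # shift from 1-based to Python's 0-based indexing
    return text[begin:begin + length]


def level_check(min_lvl):
    # Same logic as: IF min_lvl < 0 THEN "Out of Range" ELSE "In Range"
    return "Out of Range" if min_lvl < 0 else "In Range"


def employee_label(lname, fname):
    # Same logic as: "The employee's name is " : lname : ", " : fname
    return "The employee's name is " + lname + ", " + fname


print(substring("APPL3245", 1, 4))  # APPL
print(substring("APPL3245", 5, 2))  # 32
print(level_check(-1))              # Out of Range
```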
  • 147. For more information about BASIC functions than is provided here, look up Alphabetical List of BASIC Functions and Statements in Help. BASIC functions include the standard Pick BASIC functions. Click Function from the Operands menu to insert a function. Here are a few of the more common functions:
    • TRIM(string), TRIM(string, character), TRIMF, TRIMB
      − TRIM(" xyz ") → "xyz"
    • LEN(string)
    • UPCASE(string), DOWNCASE(string)
    • ICONV, OCONV
      − ICONV is used to convert values to an internal format.
      − OCONV is used to convert values from an internal format.
      − Very powerful functions. Often used for date and time conversions and manipulations.
      − These functions are discussed later in the module.
  • 148. For more information about BASIC system variables than is provided here, look up System Variables in Help. Click System Variable from the Operands menu to insert a system variable.
    • @DATE, @TIME: date/time the job started
      − @YEAR, @MONTH, @DAY: extracted from @DATE
    • @INROWNUM: row counter, incoming link
    • @OUTROWNUM: row counter, outgoing link
    • @LOGNAME: user logon name
    • @NULL: NULL value
    • @TRUE, @FALSE
    • @WHO: name of current project
  • 149. DataStage Essentials Module 8 – Creating Basic Expressions True or False? TRIM is a system variable. True: Incorrect. TRIM is a DataStage function that removes surrounding spaces in a character string. False: Correct! TRIM is a DataStage function that removes surrounding spaces in a character string. 8-7
  • 150. Module 8 – Creating Basic Expressions DataStage 304 True or False? @INROWNUM is a DataStage function. True: Incorrect. System variables all begin with the @-sign. @INROWNUM is a system variable that contains the number of the last row read from the input link. False: Correct! System variables all begin with the @-sign. @INROWNUM is a system variable that contains the number of the last row read from the input link. 8-8 Copyright © 2002 Ascential Software Corporation 03/01/02
  • 151. DataStage Essentials Module 8 – Creating Basic Expressions DataStage is supplied with a number of functions you can use to obtain information about your jobs and projects. You can insert these functions into derivations. DS functions and macros are discussed in a later module. 8-9
  • 152. DS (DataStage) routines are defined in DataStage Manager. There are several types of DS routines. The type you can insert into your derivations and constraints is the Transform Function type. A DS Transform Function routine consists of a predefined block of BASIC statements that takes one or more arguments and returns a single value. You can define your own routines, but there are also a number of pre-built routines that are supplied with DataStage. The pre-built routines include a number of routines for manipulating dates, such as ConvertMonth, QuarterTag, and Timestamp.
  • 154. Module 8 – Creating Basic Expressions DataStage 304 Data elements are extended data types. For example, a phone number is a kind of string. You could define a data element called PHONE.NUMBER to precisely define this type. Data elements are defined in DataStage Manager. A number of built-in types are supplied with DataStage. For example MONTH.TAG represents a string of the form “YYYY-MM”. 8 - 12 Copyright © 2002 Ascential Software Corporation 03/01/02
  • 156. Module 8 – Creating Basic Expressions DataStage 304 DS Transforms are similar to DS Transform Function routines. They take one or more arguments and return a single value. There are two primary differences: • The argument(s) and return value have specific data elements associated with them. In this sense, they transform data from one data element type to another data element type. • Unlike DS routines, they do not consist of blocks of BASIC statements. Rather, they consist of a single (though possibly very complex) BASIC expression. You can define your own DS Transforms, but there are also a number of pre-built transforms that are supplied with DataStage. The pre-built transforms include a number of routines for manipulating strings and dates. 8 - 14 Copyright © 2002 Ascential Software Corporation 03/01/02
  • 160. Date manipulation in DataStage can be done in several ways:
    • Using the Iconv and Oconv functions with the "D" conversion code.
    • Using the built-in date Transforms.
    • Using the built-in date routines.
    • Using routines in the DataStage Software Development Kit (SDK).
    Using routines in the DataStage Software Development Kit (SDK) is covered in another DataStage course. Your instructor can provide further details. The SDK routines are installed in the Manager Routines\sdk folder.
  • 161. For detailed help on Iconv and Oconv, see their entries in the Alphabetical List of BASIC Functions and Statements in Help. Use Iconv to convert a string date in a variety of formats to the internal DataStage integer format. Use Oconv to convert an internal date to a string date in a variety of formats. Use these two functions together to convert a string date from one format to another. The internal format for a date is based on a reference date of December 31, 1967, which is day 0. Dates before are negative integers; dates after are positive integers. Use the "D" conversion code to specify the format of the date to be converted to an internal date by Iconv or the format of the date to be output by Oconv.
  • 162. For detailed help (more than you probably want), see D Code under Iconv or Oconv in Help.
    "D4-MDY[2,2,4]"
    • D: date conversion code
    • 4: number of digits in year
    • -: separator
    • MDY: ordering is month, day, year
    • [2,2,4]: number of digits for M, D, Y, respectively
    Note:
    • The number in brackets for "Y" (namely 4) overrides the number following "D".
    • Iconv ignores some of the characters.
      − Any separator will do.
      − Number of characters is ignored if there are separators.
  • 163.
    • Iconv("12-31-67", "D2-MDY[2,2,2]") → 0
    • Iconv("12311967", "D MDY[2,2,4]") → 0
    • Iconv("31-12-1967", "D-DMY[2,2,4]") → 0
    • Oconv(0, "D2-MDY[2,2,4]") → "12-31-1967"
    • Oconv(0, "D2/DMY[2,2,2]") → "31/12/67"
    • Oconv(10, "D/YDM[4,2,A10]") → "1968/10/JANUARY"
      − This example illustrates the use of an additional formatting option. The "A10" option says to express the month name alphabetically, with a length of 10 characters.
    • Oconv(Iconv("12-31-67", "D2-MDY[2,2,2]"), "D/YDM[4,2,A10]") → "1967/31/DECEMBER"
      − This example shows how to convert from one string representation to another.
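The internal date format is easier to reason about once you see it as a day count from December 31, 1967. The Python sketch below is a rough analog of the Iconv/Oconv round trip using the standard datetime module. It is not DataStage BASIC: the to_internal and from_internal helpers are hypothetical, and they take Python strftime format strings rather than "D" conversion codes. The sketch uses four-digit years on input because Python's %y directive reads the two-digit year 67 as 2067.

```python
from datetime import date, datetime, timedelta

DAY_ZERO = date(1967, 12, 31)  # internal date 0, as described in the text


def to_internal(text, fmt):
    """Rough analog of Iconv with a "D" code: date string -> day number."""
    return (datetime.strptime(text, fmt).date() - DAY_ZERO).days


def from_internal(day_number, fmt):
    """Rough analog of Oconv with a "D" code: day number -> date string."""
    return (DAY_ZERO + timedelta(days=day_number)).strftime(fmt)


print(to_internal("31-12-1967", "%d-%m-%Y"))  # 0
print(from_internal(0, "%d/%m/%y"))           # 31/12/67
print(to_internal("12-30-1967", "%m-%d-%Y"))  # -1, dates before day 0 are negative

# Converting from one string format to another takes both steps.
print(from_internal(to_internal("12-31-1967", "%m-%d-%Y"), "%Y/%d/%m"))  # 1967/31/12
```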
  • 165. DataStage provides a number of built-in transforms you can use for date conversions. The following data elements are used with the built-in transforms:
    Data element   String format   Example
    DATE.TAG       YYYY-MM-DD      1999-02-24
    WEEK.TAG       YYYYWnn         1999W06
    MONTH.TAG      YYYY-MM         1999-02
    QUARTER.TAG    YYYYQn          1999Q4
    YEAR.TAG       YYYY            1999
  • 166. Module 8 – Creating Basic Expressions DataStage 304 True or False? You can use Oconv to convert a string date from one format to another. True: Incorrect. Oconv by itself can't do this. You would first use Iconv to convert the input string into a day integer. Then you can use Oconv to convert the day integer into the output string. False: Correct! Oconv by itself can't do this. You would first use Iconv to convert the input string into a day integer. Then you can use Oconv to convert the day integer into the output string. 8 - 24 Copyright © 2002 Ascential Software Corporation 03/01/02
  • 167. DataStage Essentials Module 8 – Creating Basic Expressions The transforms can be grouped into the following categories: • String to day number − Formatted string → internal date integer • Day number to date string − Internal date integer → formatted string • Date string to date string − DATE.TAG string → formatted string 8 - 25
  • 168. The following transforms convert strings of the specified format (MONTH.TAG, QUARTER.TAG, …) to an internal date representing the first or last day of the period.
    Function                       Tag           Description
    MONTH.FIRST, MONTH.LAST        MONTH.TAG     Returns a numeric internal date corresponding to the first/last day of a month
    QUARTER.FIRST, QUARTER.LAST    QUARTER.TAG   Returns a numeric internal date corresponding to the first/last day of a quarter
    WEEK.FIRST, WEEK.LAST          WEEK.TAG      Returns a numeric internal date corresponding to the first day (Monday) / last day (Sunday) of a week
    YEAR.FIRST, YEAR.LAST          YEAR.TAG      Returns a numeric internal date corresponding to the first/last day of a year
  • 169. Examples:
    • MONTH.FIRST("1993-02") → 9164
    • MONTH.LAST("1993-02") → 9191
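You can check these two results by hand with a short Python sketch. It assumes only what the module states: a MONTH.TAG string has the form YYYY-MM, and the internal date counts days from December 31, 1967. The month_first and month_last helpers are hypothetical stand-ins, not the built-in transforms themselves.

```python
import calendar
from datetime import date

DAY_ZERO = date(1967, 12, 31)  # internal date 0


def month_first(month_tag):
    """Internal date of the first day of a YYYY-MM month."""
    year, month = (int(part) for part in month_tag.split("-"))
    return (date(year, month, 1) - DAY_ZERO).days


def month_last(month_tag):
    """Internal date of the last day of a YYYY-MM month."""
    year, month = (int(part) for part in month_tag.split("-"))
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return (date(year, month, last_day) - DAY_ZERO).days


print(month_first("1993-02"))  # 9164
print(month_last("1993-02"))   # 9191
```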
  • 170. The following functions convert internal dates to strings in various formats (DATE.TAG, MONTH.TAG, …).
    Function      Argument type   Description
    DATE.TAG      Internal date   Converts internal date to string in DATE.TAG format
    MONTH.TAG     Internal date   Converts internal date to string in MONTH.TAG format
    QUARTER.TAG   Internal date   Converts internal date to string in QUARTER.TAG format
    WEEK.TAG      Internal date   Converts internal date to string in WEEK.TAG format
    Examples:
    • MONTH.TAG(9177) → "1993-02"
    • DATE.TAG(9177) → "1993-02-14"
  • 171. The following functions convert strings in DATE.TAG format to strings in various other formats (DAY.TAG, MONTH.TAG, …).
    Function         Tag        Description
    TAG.TO.MONTH     DATE.TAG   Convert DATE.TAG to MONTH.TAG
    TAG.TO.QUARTER   DATE.TAG   Convert DATE.TAG to QUARTER.TAG
    TAG.TO.WEEK      DATE.TAG   Convert DATE.TAG to WEEK.TAG
    TAG.TO.DAY       DATE.TAG   Convert DATE.TAG to DAY.TAG
    Examples:
    • TAG.TO.MONTH("1993-02-14") → "1993-02"
    • TAG.TO.QUARTER("1993-02-14") → "1993Q1"
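Because a DATE.TAG string always has the form YYYY-MM-DD, the string-to-string conversions come down to simple string and integer operations. The Python sketch below reproduces the two examples. The tag_to_month and tag_to_quarter helpers are hypothetical illustrations, not the built-in transforms.

```python
def tag_to_month(date_tag):
    """YYYY-MM-DD -> YYYY-MM (the MONTH.TAG format)."""
    return date_tag[:7]


def tag_to_quarter(date_tag):
    """YYYY-MM-DD -> YYYYQn (the QUARTER.TAG format)."""
    year, month, _day = date_tag.split("-")
    quarter = (int(month) - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, and so on
    return f"{year}Q{quarter}"


print(tag_to_month("1993-02-14"))    # 1993-02
print(tag_to_quarter("1993-02-14"))  # 1993Q1
```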
  • 176. Module 9 – Troubleshooting DataStage 314Svr 9-2 Copyright © 2002 Ascential Software Corporation 09/01/02
  • 177. DataStage Essentials Module 9 – Troubleshooting Events are logged to the job log file when a job is validated, run, or reset. You can use the log file to troubleshoot jobs that fail during validation or a run. Various entries are written to the log, including when: • The job starts • The job finishes • An active stage starts • An active stage finishes • Rows are rejected (yellow icons) • Errors occur (red icons) • DataStage informational reports are logged • User-invoked messages are displayed 09 - 3
  • 178. The event window shows the events that are logged for a job during its run. The job log contains the following information:
    Column Name   Description
    Occurred      Time the event occurred.
    On date       Date the event occurred.
    Type          Info. Informational. No action required.
                  Warning. An error occurred. Investigate the cause of the warning, as this may indicate a serious error.
                  Fatal. A fatal error occurred.
                  Control. The job starts and finishes.
                  Reject. Rejected rows are output.
                  Reset. A job or the log is reset.
    Event         A message describing the event. The system displays the first line of the message. If a message has an ellipsis (…) at the end, it contains more than one line. You can view the full message in the Event Detail window.
  • 179. DataStage Essentials Module 9 – Troubleshooting Clearing the log To clear the log, click Job>Clear Log. 09 - 5
  • 180. Module 9 – Troubleshooting DataStage 314Svr Double-click on an event to open the Event Detail window. This window gives you more information. When an active stage finishes, DataStage logs an informational message that describes how many rows were read in to the stage and how many were written. This provides you with valuable information that can indicate possible errors. 9-6 Copyright © 2002 Ascential Software Corporation 09/01/02
  • 181. DataStage Essentials Module 9 – Troubleshooting The Monitor can be used to display information about a job while it is running. To start the Monitor, click Tools>New Monitor. Once in Monitor, click the right mouse button and then select Show links to display information about each of the input and output links. 09 - 7
  • 182. Module 9 – Troubleshooting DataStage 314Svr When you are testing a job, you can save time by limiting the number of rows and warnings. 9-8 Copyright © 2002 Ascential Software Corporation 09/01/02
  • 183. DataStage Essentials Module 9 – Troubleshooting Server side tracing is enabled in Administrator. It is designed to be used to help customer support analysts troubleshoot serious problems. When enabled, it logs a record to a trace file whenever DataStage clients interact with the server. Caution: Because of the overhead caused by server side tracing it should only be used when working with customer support. 09 - 9
  • 185. DataStage Essentials Module 9 – Troubleshooting DataStage provides a debugger for testing and debugging your job designs. The debugger runs within Designer. With the DataStage debugger you can: • Set breakpoints on job links, including conditional breakpoints. • Step through your job link-by-link or row-by-row. • Watch the values going into link columns. 09 - 11
  • 186. To begin debugging a program, click View>Debug Bar to display the debug toolbar. The toolbar provides access to all of the debugging functions.
    [Figure: debug toolbar. Button labels: Stop, Toggle breakpoint, Next link, Debug job parameters, View job log, Clear breakpoints, Go, Debug window, Next row, Edit breakpoints]
    Button                  Description
    Go                      Start/continue debugging.
    Next Link               The job continues until the next action occurs on the link.
    Next Row                The job continues until the next row is processed or until another link with a breakpoint is encountered.
  • 187.
    Stop Job                Stops the job at the point it is at. Click Go to continue.
    Job Parameters          Set limits on rows and warnings.
    Edit Breakpoints        Displays the Edit Breakpoints window, in which you can edit existing breakpoints.
    Toggle Breakpoint       Set or clear a breakpoint on a selected link.
    Clear All Breakpoints   Removes breakpoints from all links.
    View job log            Open Director and view the job log.
    Debug Window            Show/hide the Debug Window, which displays link column values.
  • 188. Module 9 – Troubleshooting DataStage 314Svr To set a breakpoint on a link, select the link and then click the Toggle Breakpoint button. A black circle appears on the link. 9 - 14 Copyright © 2002 Ascential Software Corporation 09/01/02
  • 189. DataStage Essentials Module 9 – Troubleshooting Click the Edit Breakpoints button to open the Edit Breakpoints window. Existing breakpoints are listed in the lower pane. To set a condition for a breakpoint, select the breakpoint and then specify the condition in the above pane. You can either specify the number of rows before breaking or specify an expression to break upon when it’s true. 09 - 15
  • 190. Module 9 – Troubleshooting DataStage 314Svr Click the Debug Window button to open the Debug Window. • The top pane lists all the columns defined for all links. • The Local Data column lists the data currently in the column. • The Current Break box at the top of the window lists the link where execution stopped. • To add a column to the lower pane (where it is isolated), select the column and then click Add Watch. • If a breakpoint is set, execution stops at that link when a row is written to the link. 9 - 16 Copyright © 2002 Ascential Software Corporation 09/01/02
  • 191. DataStage Essentials Module 9 – Troubleshooting You can step through row-by-row or step-by-step. • Next Row extracts a row of data and stops at the next link with a breakpoint that the row is written to. − For example, if a breakpoint is set on the MexicoCustomersOut link, execution stops at the MexicoCustomersOut link when a Mexican customer is read. − If a breakpoint is not set on the MexicoCustomersOut link, execution will not stop at the MexicoCustomersOut link when a Mexican customer is read. − Execution will stop at the CustomersIn link (even if there is no breakpoint set on it) because all rows are read through that link. • Next Link stops at the next link that data is written to. 09 - 17
  • 194. Module 10 – Defining Lookups DataStage 314Svr 10 - 2 Copyright © 2002 Ascential Software Corporation 09/01/02
  • 195. A hashed file is a file that distributes records in one or more evenly-sized groups based on a primary key. The primary key value is processed by a "hashing algorithm" to determine the location of the record. The number of groups in the file is referred to as its modulus. In this example, there are 5 groups (modulus 5). Hashed files are used for reference lookups in DataStage because of their fast performance. The hashing algorithm determines the group the record is in. The groups contain a small number of records, so the record can be quickly located within the group. If write caching is enabled, DataStage does not write hashed file records directly to disk. Instead it caches the records in memory, and writes the cached records to disk when the cache is full. This improves performance. You can specify the size of the cache on the Tunables tab in Administrator.
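The idea behind a hashed file lookup can be sketched in Python. This is a conceptual illustration only: the HashedFile class is hypothetical, it keeps everything in memory, and zlib.crc32 is a stand-in for the hashing algorithm, which this guide does not describe. What it shows is how the key alone determines the group, so a lookup only has to search one small group.

```python
import zlib


class HashedFile:
    """Toy in-memory model of a hashed file with a fixed modulus."""

    def __init__(self, modulus=5):
        self.modulus = modulus
        self.groups = [dict() for _ in range(modulus)]

    def _group_for(self, key):
        # crc32 is an assumed stand-in for the real hashing algorithm.
        return zlib.crc32(str(key).encode("utf-8")) % self.modulus

    def write(self, key, record):
        # The primary key decides which group holds the record.
        self.groups[self._group_for(key)][key] = record

    def lookup(self, key):
        # Only the one group the key hashes to is searched.
        return self.groups[self._group_for(key)].get(key)


hashed = HashedFile(modulus=5)
for i in range(20):
    hashed.write(f"CUST{i:03d}", {"customer_no": i})

print(hashed.lookup("CUST007"))
print(hashed.lookup("CUST999"))  # no such key, so the lookup finds nothing
```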