11. DataStage Essentials Module 1 – Introduction to DataStage
Data marts are like data warehouses but smaller in scope. Frequently an
organization will have both an enterprise-wide data warehouse and data marts that
extract data from it for specialized purposes.
• Like data warehouses but smaller in scope
• Organize data from a single subject area or department
• Solve a small set of business requirements
• Are cheaper and faster to build than a data warehouse
• Distribute data away from the data warehouse
13. DataStage Essentials Module 1 – Introduction to DataStage
DataStage is client/server software. The server stores all DataStage objects and
metadata in a repository, which is built on the UniVerse RDBMS. The clients
interface with the server.
The clients run on Windows 95 or later (Windows 98, NT, 2000). The server runs
on Windows NT 4.0 and Windows 2000. Most versions of UNIX are supported.
See the installation release notes for details.
The DataStage client components are:
Component       Description
Administrator   Administers DataStage projects and conducts housekeeping on the server
Designer        Creates DataStage jobs that are compiled into executable programs
Director        Used to run and monitor the DataStage jobs
Manager         Allows you to view and edit the contents of the repository
15. DataStage Essentials Module 1 – Introduction to DataStage
Use the Administrator to specify general server defaults, add and delete projects,
and to set project properties. The Administrator also provides a command
interface to the UniVerse repository.
• Use the Administrator Project Properties window to:
• Set job monitoring limits and other Director defaults on the General tab.
• Set user group privileges on the Permissions tab.
• Enable or disable server-side tracing on the Tracing tab.
• Specify a user name and password for scheduling jobs on the Schedule tab.
• Specify hashed file stage read and write cache sizes on the Tunables tab.
General server defaults can be set on the Administrator DataStage
Administration window (not shown):
• Change license information.
• Set server connection timeout.
The DataStage Administrator is discussed in detail in a later module.
17. DataStage Essentials Module 1 – Introduction to DataStage
The DataStage Designer allows you to use familiar graphical point-and-click
techniques to develop processes for extracting, cleansing, transforming,
integrating and loading data into warehouse tables.
The Designer provides a “visual data flow” method to easily interconnect and
configure reusable components.
Use Designer to:
• Specify how the data is extracted.
• Specify data transformations.
• Decode (denormalize) data going into the data mart using reference lookups.
− For example, if the sales order records contain customer IDs, you can look
up the name of the customer in the CustomerMaster table.
− This avoids the need for a join when users query the data mart, thereby
speeding up access.
• Aggregate data.
• Split data into multiple outputs on the basis of defined constraints.
You can easily move between the Director, Designer, and Manager by selecting
commands in the Tools menu.
19. DataStage Essentials Module 1 – Introduction to DataStage
• Define your project’s properties: Administrator
• Open (attach to) your project
• Import metadata that defines the format of data stores your jobs will read from
or write to: Manager
• Design the job: Designer
− Define data extractions (reads)
− Define data flows
− Define data integration
− Define data transformations
− Define data constraints
− Define data loads (writes)
− Define data aggregations
• Compile and debug the job: Designer
• Run and monitor the job: Director
21. DataStage Essentials Module 1 – Introduction to DataStage
DataStage Designer is used to build and compile your Extraction,
Transformation, and Load (ETL) jobs.
True: Correct! With Designer you can graphically build your job by placing
graphical components (called "stages") on a canvas. After you build it, your job
is compiled in Designer.
False: Incorrect. With Designer you can graphically build your job by placing
graphical components (called "stages") on a canvas. After you build it, your job
is compiled in Designer.
DataStage Manager is used to execute your jobs after you build them.
True: Incorrect. DataStage Manager is your primary interface to the DataStage
repository. Use Manager to manage metadata and other DataStage objects.
False: Correct! DataStage Manager is your primary interface to the DataStage
repository. Use Manager to manage metadata and other DataStage objects.
27. DataStage Essentials Module 2 – Installing DataStage
The DataStage server should be installed before the DataStage clients are
installed. The server can be installed on Windows NT (including Workstation
and Server), Windows 2000, or UNIX. This module describes the Windows NT
installation.
The exact system requirements depend on your version of DataStage. See the
installation CD for the latest system requirements.
To install the server you will need the installation CD and a license for the
DataStage server. The license contains the following information:
• Serial number
• Project count
− The maximum number of projects you can have installed on the server.
This includes new projects as well as previously created projects to be
upgraded.
• Expiration date
• Authorization code
− This information must be entered exactly as written in the license.
31. DataStage Essentials Module 2 – Installing DataStage
The DataStage services must be running on the server machine in order to run any
DataStage client applications. To start or stop the DataStage services in Windows
2000, open the DataStage Control Panel window in the Windows 2000 Control
Panel. Then click Start All Services (or Stop All Services). These services must
be stopped when installing or reinstalling DataStage.
UNIX note: In UNIX, these services are started and stopped using the uv.rc
script with the stop or start command options. The exact name varies by platform.
For Sun Solaris, it is /etc/rc2.d/S99uv.rc.
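For example, on Sun Solaris the commands would be as follows (a sketch; the
script name and path vary by platform):

    /etc/rc2.d/S99uv.rc stop
    /etc/rc2.d/S99uv.rc start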
37. DataStage Essentials Module 3 – Configuring Projects
In DataStage all development work is done within a project. Projects are created
during installation and after installation using Administrator.
Each project is associated with a directory. The directory stores the objects (jobs,
metadata, custom routines, etc.) created in the project.
Before you can work in a project you must attach to it (open it).
You can set the default properties of a project using DataStage Administrator.
39. DataStage Essentials Module 3 – Configuring Projects
Click Properties on the DataStage Administration window to open the Project
Properties window. There are five active tabs. (The Mainframe tab is only
enabled if your license supports mainframe jobs.) The default is the General tab.
If you select the Enable job administration in Director box, you can perform
some administrative functions in Director without opening Administrator.
When a job is run in Director, events are logged describing the progress of the
job. For example, events are logged when a job starts, when it stops, and when it
aborts. The number of logged events can grow very large. The Auto-purge of
job log box allows you to specify conditions for purging these events.
41. DataStage Essentials Module 3 – Configuring Projects
This tab is used to enable and disable server-side tracing.
The default is for server-side tracing to be disabled. When you enable it,
information about server activity is recorded for any clients that subsequently
attach to the project. This information is written to trace files. Users with in-depth
knowledge of the system software can use it to help identify the cause of a client
problem. If tracing is enabled, users receive a warning message whenever they
invoke a DataStage client.
Warning: Tracing causes a lot of server system overhead. This should only be
used to diagnose serious problems.
43. DataStage Essentials Module 3 – Configuring Projects
On the Tunables tab, you can specify the sizes of the memory caches used when
reading rows in hashed files and when writing rows to hashed files. Hashed files
are mainly used for lookups and are discussed in a later module.
Active-to-Active link performance settings will be covered in detail in a later
module in this course.
47. DataStage Essentials Module 4 – Designing and Running Jobs
A job is an executable DataStage program. In DataStage, you can design and run
jobs that perform many useful data warehouse tasks, including data extraction,
data conversion, data aggregation, data loading, etc.
DataStage jobs are:
• Designed and built in Designer.
• Scheduled, invoked, and monitored in Director.
• Executed under the control of DataStage.
49. DataStage Essentials Module 4 – Designing and Running Jobs
In the center right is the Designer canvas. On it you place stages and links from
the tool palette on the right. On the bottom left is the Repository window,
which displays the branches in Manager. Items in Manager, such as jobs and
table definitions, can be dragged to the canvas area. Click View>Repository to
display the Repository window.
Click View>Property Browser to display the Property Browser window. This
window displays the properties of objects selected on the canvas.
51. DataStage Essentials Module 4 – Designing and Running Jobs
The tool palette contains icons that represent the components you can add to your
job design.
Most of the stages shown here are automatically installed when you install
DataStage. You can also install additional stages called plug-ins for special
purposes. For example, there is a plug-in called sort that can be used to sort data.
Plug-ins are discussed in a later module.
53. DataStage Essentials Module 4 – Designing and Running Jobs
True or False? The Sequential stage is an active stage.
True: Incorrect. The Sequential stage is considered a passive stage because it is
used to extract or load sequential data from a file. It is not used to transform or
modify data.
False: Correct! The Sequential stage is considered a passive stage because it is
used to extract or load sequential data from a file. It is not used to transform or
modify data.
55. DataStage Essentials Module 4 – Designing and Running Jobs
The Sequential stage is used to extract data from a sequential file or to load data
into a sequential file.
The main things you need to specify when editing the sequential file stage are the
following:
• Path and name of file
• File format
• Column definitions
• If the sequential stage is being used as a target, specify the write action:
Overwrite the existing file or append to it.
57. DataStage Essentials Module 4 – Designing and Running Jobs
Defining a sequential target stage is similar to defining a sequential source stage.
You are defining the format of the data flowing into the stage, that is, from the
input links. Define each input link listed in the Input name box.
You are defining the file the job will write to. If the file doesn’t exist, it will be
created. Specify whether to overwrite or append the data using the Update action
buttons.
Filter command (General tab): here you can specify a filter program for
processing the file you are extracting data from. This feature can be used, for
example, to unzip a compressed file before reading it. You can type in or browse
for the filter program, and specify any command line arguments it requires in the
text box. This text box is enabled only if you have selected the Stage uses filter
commands checkbox on the Stage page General tab. Note that if you specify a
filter command, data browsing is not available, so the View Data button is
disabled.
On the Format tab, you can specify a different format for the target file than you
specified for the source file.
If the target file doesn’t exist, you will not (of course!) be able to view its data
until after the job runs. If you click the View data button, DataStage will return a
“Failed to open …” error.
59. DataStage Essentials Module 4 – Designing and Running Jobs
The Transformer stage is the primary active stage. Other active stages perform
more specialized types of transformations.
In the Transformer stage you can specify:
• Column mappings
• Derivations
• Constraints
A column mapping maps an input column to an output column. Values are
passed directly from the input column to the output column.
Derivations calculate the values to go into output columns based on values in zero
or more input columns.
Constraints specify the conditions under which incoming rows will be written to
output links.
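For example, with a hypothetical input link named OrdersIn, the three kinds of
specifications might look like this (expressions of this kind are written in
DataStage BASIC, covered in a later module):

    Column mapping:  OrdersIn.CustomerID                     (value passed through unchanged)
    Derivation:      OrdersIn.Quantity * OrdersIn.UnitPrice  (computed value for an output column)
    Constraint:      OrdersIn.Quantity > 0                   (rows flow down the link only when true)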
63. DataStage Essentials Module 4 – Designing and Running Jobs
Type the text in the box. Then specify the various options including:
• Text font and color
• Text box color
• Vertical and horizontal text justification
65. DataStage Essentials Module 4 – Designing and Running Jobs
As you know, you run your jobs in Director. You can open Director from within
Designer by clicking Tools>Run Director.
In a similar way, you can move between Director, Manager, and Designer.
There are two methods for running a job:
• Run it immediately.
• Schedule it to run at a later time or date.
To run a job immediately:
• Select the job in the Job Status view. The job must have been compiled.
• Click Job>Run Now or click the Run Now button in the toolbar. The Job
Run Options window is displayed.
67. DataStage Essentials Module 4 – Designing and Running Jobs
The Job Run Options window is displayed when you click Job>Run Now.
This window allows you to stop the job after:
• A certain number of rows.
• A certain number of warning messages.
You can validate your job before you run it. Validation performs some checks
that are necessary in order for your job to run successfully. These include:
• Verifying that connections to data sources can be made.
• Verifying that files can be opened.
• Verifying that SQL statements used to select data can be prepared.
Click Run to run the job after it is validated. The Status column displays the
status of the job run.
75. DataStage Essentials Module 5 – Working with Meta Data
DataStage Manager is a graphical tool for managing the contents of your
DataStage project repository, which contains metadata and other DataStage
components such as jobs and routines.
Metadata is “data about data” that describes the formats of sources and targets.
This includes general format information such as whether the record columns are
delimited and, if so, the delimiting character. It also includes the specific column
definitions.
77. DataStage Essentials Module 5 – Working with Meta Data
The left pane contains the project tree. There are eight main branches, but you
can create subfolders under each. Select a folder in the project tree to display its
contents. In this example, a folder named DS304 has been created that contains
some of the jobs in the project.
Data Elements branch: Lists the built-in and custom data elements. (Data
elements are extensions of data types, and are discussed in a later module.)
Jobs branch: Lists the jobs in the current project.
Routines branch: Lists the built-in and custom routines.
Routines are blocks of DataStage BASIC code that can be called within a job.
(Routines are discussed in a later module.)
Shared Containers branch: Shared Containers encapsulate sets of DataStage
components into a single stage. (Shared Containers are discussed in a later
module.)
Stage Types branch: Lists the types of stages that are available within a job.
Built-in stages include the sequential and transformer stages you used in
Designer.
Table Definitions branch: Lists the table definitions available for loading into a
job.
79. DataStage Essentials Module 5 – Working with Meta Data
DataStage Manager manages two different types of objects:
• Metadata describing sources and targets:
− Called table definitions in Manager. These are not to be confused with
relational tables. DataStage table definitions are used to describe the
format and column definitions of any type of source: sequential,
relational, hashed file, etc.
− Table definitions can be created in Manager or Designer and they can also
be imported from the sources or targets they describe.
• DataStage components
− Every object in DataStage (jobs, routines, table definitions, etc.) is stored
in the DataStage repository. Manager is the interface to this repository.
− DataStage components, including whole projects, can be exported from
and imported into Manager.
81. DataStage Essentials Module 5 – Working with Meta Data
Click Export>DataStage Components in Manager to begin the export process.
Any object in Manager can be exported to a file. Use this procedure to back up
your work or to move DataStage objects from one project to another.
Select the types of components to export. You can select either the whole project
or select a portion of the objects in the project.
Specify the name and path of the file to export to. By default, objects are
exported to a text file in a special format. By default, the extension is dsx.
Alternatively, you can export the objects to an XML document.
The directory you export to is on the DataStage client, not the server.
83. DataStage Essentials Module 5 – Working with Meta Data
True or False? The directory you export to is on the DataStage client
machine, not on the DataStage server machine.
True: Correct! The directory you select for export must be addressable by your
client machine.
False: Incorrect. The directory you select for export must be addressable by your
client machine.
85. DataStage Essentials Module 5 – Working with Meta Data
To import DataStage components, click Import>DataStage Components.
Select the file to import. Click Import all to begin the import process or Import
selected to view a list of the objects in the import file. You can import selected
objects from the list. Select the Overwrite without query button to overwrite
objects with the same name without warning.
89. DataStage Essentials Module 5 – Working with Meta Data
To start the import, click Import>Table Definitions>Sequential File
Definitions. The Import Meta Data (Sequential) window is displayed.
Select the directory containing the sequential files. The Files box is then
populated with the files you can import.
Select the file to import.
Select or specify a category (folder) to import into.
• The format is: <Category><Sub-category>
• <Category> is the first-level sub-folder under Table Definitions.
• <Sub-category> is (or becomes) a sub-folder under the type.
97. DataStage Essentials Module 6 – Working with Relational Data
You can perform the same tasks with relational data that you can with sequential
data. You can extract, filter, and transform data from relational tables.
You can also load data into relational tables.
Although you can work with many relational databases through native drivers
(including UniVerse, UniData, and Oracle), you can access many more relational
databases using ODBC.
In the ODBC stage, you can build a query against one or more tables in the
database interactively, type the query yourself, or paste in an existing query.
101. DataStage Essentials Module 6 – Working with Relational Data
Extracting data from a relational table is similar to extracting data from a
sequential file except that you use an ODBC stage instead of a sequential stage.
In this example, we’ll extract data from a relational table and load it into a
sequential file.
103. DataStage Essentials Module 6 – Working with Relational Data
Specify the table name on the General tab of the Outputs page.
Select Generated query to define the SQL SELECT statement interactively using
the Columns and Selection tabs. Select User-defined SQL query to write your
own SQL SELECT statement to send to the database.
105. DataStage Essentials Module 6 – Working with Relational Data
Optionally, specify a WHERE clause and other additional SQL clauses on the
Selection tab.
Other clauses can be anything else you wish to add to the SELECT statement,
such as ORDER BY.
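For example, with a hypothetical Sales table, entering SalesAmount > 1000 as
the WHERE clause and ORDER BY CustomerID as an other clause would produce a
generated query similar to:

    SELECT CustomerID, SalesAmount
    FROM Sales
    WHERE SalesAmount > 1000
    ORDER BY CustomerID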
107. DataStage Essentials Module 6 – Working with Relational Data
If you want to define your own SQL query, click User-defined SQL query on
the General tab and then write or paste the query into the SQL for primary
inputs box on the SQL Query tab.
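For example, you might paste in a query such as the following (the table and
column names are hypothetical):

    SELECT o.OrderID, o.OrderDate, c.CustomerName
    FROM Orders o, Customers c
    WHERE o.CustomerID = c.CustomerID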
109. DataStage Essentials Module 6 – Working with Relational Data
Editing an ODBC target stage is similar to editing an ODBC source stage. It
includes the following tasks:
• Specify the data source containing the target table.
• Specify the name of the table.
• Select the update action. You can choose from a variety of INSERT and/or
UPDATE actions.
• Optionally, create the table.
• Load the column definitions from the Manager table definition.
111. DataStage Essentials Module 6 – Working with Relational Data
On the Edit DDL tab you can generate and modify the CREATE TABLE
statement used to create the target table.
If you make any changes to column definitions, you need to regenerate the
CREATE TABLE statement by clicking the Create DDL button.
117. DataStage Essentials Module 6 – Working with Relational Data
The ORAOCI8 plug-in lets you rapidly and efficiently prepare and load streams
of tabular data from any DataStage stage (for example, the ODBC stage, the
Sequential File stage, and so forth) to and from tables of the target Oracle
database. The Oracle client on Windows NT or UNIX uses SQL*Net to access an
Oracle server on Windows NT or UNIX.
119. DataStage Essentials Module 6 – Working with Relational Data
General Tab
This tab is displayed by default. It contains the following fields:
Table name. This required field is editable when the update action is not
User-defined SQL (otherwise, it is read-only). It is the name of the target Oracle
table the data is written to, and the table must exist or be created by choosing
Generate DDL from the Create table action list. You must have insert, update, or
delete privileges, depending on input mode. You must specify Table name if you
do not specify User-defined SQL. There is no default. Click … (Browse button) to
browse the Repository to select the table.
Update action. Specifies which SQL statements are used to update the target
table. Some update actions require key columns to update or delete rows. There is
no default. Choose the option you want from the list.
Clear table then insert rows. Deletes the contents of the table and adds the new
rows, with slower performance because of transaction logging.
Truncate table then insert rows. Truncates the table with no transaction logging
and faster performance.
121. DataStage Essentials Module 6 – Working with Relational Data
rows. This minimizes server round trips and maximizes performance by
executing fewer statements. If this number is too large, the client may run
out of memory.
• Transaction size. This field exists for backward compatibility, but it is
ignored for version 3.0 and later of the plug-in. The transaction size for
new jobs is now handled by Rows per transaction on the Transaction
Handling tab.
• Create table action. Creates the target table in the specified database if
Generate DDL is selected. It uses the column definitions in the Columns
tab and the table name and the TABLESPACE and STORAGE properties
for the target table. The generated Create Table statement includes the
TABLESPACE and STORAGE keywords, which indicate the location
where the table is created and the storage expression for the Oracle
storage_clause. You must have CREATE TABLE privileges on your
schema. You can also specify your own CREATE TABLE SQL statement.
You must enter the storage clause in Oracle format. (Use the User-defined
DDL tab on the SQL tab for a complex statement.)
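A generated statement might look like the following sketch (the table, column,
tablespace, and storage values are hypothetical):

    CREATE TABLE SALES_FACT (
        ORDER_ID  NUMBER(10) NOT NULL,
        AMOUNT    NUMBER(12,2)
    )
    TABLESPACE USERS
    STORAGE (INITIAL 1M NEXT 1M)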
125. DataStage Essentials Module 7 – Constraints and Derivations
A constraint specifies the condition under which data flows through a link. For
example, suppose you want to split the data in the jobs file into separate files
based on the job level.
We need to define a constraint on each link so that only jobs within a certain level
range are written to each file.
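For example, assuming the input link is named JobsIn and carries a JobLevel
column (hypothetical names), the constraints on the three links might be:

    JobsIn.JobLevel <= 3
    JobsIn.JobLevel > 3 And JobsIn.JobLevel <= 6
    JobsIn.JobLevel > 6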
127. DataStage Essentials Module 7 – Constraints and Derivations
This shows the Constraints window. Constraints are defined for each of the top
three links. The Reject Row box is selected for the last link. All rows that fail to
satisfy the constraints on the top three links will be sent down this link.
129. DataStage Essentials Module 7 – Constraints and Derivations
True or False? A Rejects link can be placed anywhere in the link ordering.
True: Incorrect. A Rejects link should be placed last in the link ordering, if it is
to get every row that doesn't satisfy any of the other constraints.
False: Correct! A Rejects link should be placed last in the link ordering, if it is
to get every row that doesn't satisfy any of the other constraints.
131. DataStage Essentials Module 7 – Constraints and Derivations
A derivation is an expression that specifies the value to be moved into a target
column (field).
Every target column must have a derivation. The simplest derivation is an input
column. The value in the input column is moved to the target column.
To construct a derivation for a target column double-click on the derivation cell
next to the target column.
Derivations are constructed in the same way that constraints are constructed:
• Type constants.
• Type or select operators from the Operator shortcut menu.
• Type or select operands from the Operand shortcut menu.
What’s the difference between derivations and constraints?
• Constraints apply to links; derivations apply to columns.
• Constraints are conditions, either true or false; derivations specify a value to
go into a target column.
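For example, each of the following is a valid derivation for a target column (the
EmpsIn link and column names are hypothetical):

    EmpsIn.LastName          (simple column mapping)
    TRIM(EmpsIn.LastName)    (function applied to an input column)
    EmpsIn.Salary * 1.05     (arithmetic on an input column)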
133. DataStage Essentials Module 7 – Constraints and Derivations
True or False? If the constraint for a particular link is not satisfied, then the
derivations defined for that link are not executed.
True: Correct! Constraints have precedence over derivations. Derivations in an
output link are only executed if the constraint is satisfied.
False: Incorrect. Constraints have precedence over derivations. Derivations in
an output link are only executed if the constraint is satisfied.
135. DataStage Essentials Module 7 – Constraints and Derivations
You can create stage variables for use in your column derivations and constraints.
Stage variables store values without writing them out to a target file or table.
They can be used in expressions just like constants, input columns, and other
operands.
Stage variables retain their values across reads. This allows them to be used as
counters and accumulators. You can also use them to compare a current input
value to a previous input value.
To create a new stage variable, click the right mouse button over the Stage
Variables window and then click Append New Stage Variable (or Insert New
Stage Variable).
After you create it, you specify a derivation for it in the same way as for columns.
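For example, the following sketch (with hypothetical names) uses stage variables
as a row counter and to detect a change in the incoming key. Because stage
variables are evaluated from top to bottom for each row, svPrevKey must be listed
after svIsNewKey so that the comparison sees the previous row’s value:

    svRowCount  derivation:  svRowCount + 1
    svIsNewKey  derivation:  If CustIn.CustID <> svPrevKey Then @TRUE Else @FALSE
    svPrevKey   derivation:  CustIn.CustID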
137. DataStage Essentials Module 7 – Constraints and Derivations
Note the output link reordering icon available on the toolbar from within the
Transformer stage.
139. DataStage Essentials Module 7 – Constraints and Derivations
Derivations for stage variables are executed before derivations for any
output link columns.
True: Correct! So you can be sure that the derivations for any of the stage
variables referenced in column derivations will have already been executed.
False: Incorrect. The derivations for stage variables are executed first. So you
can be sure that the derivations for any of the stage variables referenced in column
derivations will have already been executed.
145. DataStage Essentials Module 8 – Creating Basic Expressions
DataStage BASIC is a form of BASIC that has been customized to work with
DataStage.
In the previous module you learned how to define constraints and derivations.
Derivations and constraints are written using DataStage BASIC.
Job control routines, which are discussed in a later module, are also written in
DataStage BASIC.
This module will not attempt to teach you BASIC programming. Our focus is on
what you need to know in order to construct complex DataStage constraints and
derivations.
147. DataStage Essentials Module 8 – Creating Basic Expressions
For more information about BASIC functions than is provided here, look up the
Alphabetical List of BASIC Functions and Statements in Help. BASIC functions
include the standard Pick BASIC functions. Click Function from the Operands
menu to insert a function.
Here are a few of the more common functions:
• TRIM(string), TRIM(string, character), TRIMF, TRIMB
− TRIM(“ xyz ” ) → “xyz”
• LEN(string)
• UPCASE(string), DOWNCASE(string)
• ICONV, OCONV
− ICONV is used to convert values to an internal format
− OCONV is used to convert values from an internal format
− Very powerful functions. Often used for date and time conversions and
manipulations.
− These functions are discussed later in the module.
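For example:

    LEN("DataStage")  → 9
    UPCASE("xyz")     → "XYZ"
    DOWNCASE("XYZ")   → "xyz"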
149. DataStage Essentials Module 8 – Creating Basic Expressions
True or False? TRIM is a system variable.
True: Incorrect. TRIM is a DataStage function that removes surrounding spaces
in a character string.
False: Correct! TRIM is a DataStage function that removes surrounding spaces
in a character string.
151. DataStage Essentials Module 8 – Creating Basic Expressions
DataStage is supplied with a number of functions you can use to obtain
information about your jobs and projects. You can insert these functions into
derivations.
DS functions and macros are discussed in a later module.
161. DataStage Essentials Module 8 – Creating Basic Expressions
For detailed help on Iconv and Oconv, see their entries in the Alphabetical List
of BASIC Functions and Statements in Help.
Use Iconv to convert a string date in a variety of formats to the internal DataStage
integer format. Use Oconv to convert an internal date to a string date in a variety
of formats. Use these two functions together to convert a string date from one
format to another.
The internal format for a date is based on a reference date of December 31, 1967,
which is day 0. Dates before are negative integers; dates after are positive
integers.
Use the “D” conversion code to specify the format of the date to be converted to
an internal date by Iconv or the format of the date to be output by Oconv.
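For example, a sketch that converts a string date from ISO style to US style by
way of the internal format (11378 is the internal day number, counting
December 31, 1967 as day 0):

    InternalDate = Iconv("1999-02-24", "D-YMD[4,2,2]")  ;* → 11378
    USDate = Oconv(InternalDate, "D2/")                 ;* → "02/24/99"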
165. DataStage Essentials Module 8 – Creating Basic Expressions
DataStage provides a number of built-in transforms you can use for date
conversions.
The following data elements are used with the built-in transforms:
Data element String format Example
DATE.TAG YYYY-MM-DD 1999-02-24
WEEK.TAG YYYYWnn 1999W06
MONTH.TAG YYYY-MM 1999-02
QUARTER.TAG YYYYQn 1999Q4
YEAR.TAG YYYY 1999
167. DataStage Essentials Module 8 – Creating Basic Expressions
The transforms can be grouped into the following categories:
• String to day number
− Formatted string → internal date integer
• Day number to date string
− Internal date integer → formatted string
• Date string to date string
− DATE.TAG string → formatted string
171. DataStage Essentials Module 8 – Creating Basic Expressions
The following functions convert strings in DATE.TAG format to strings in
various other formats (DAY.TAG, MONTH.TAG, …).
Function          Tag        Description
TAG.TO.MONTH      DATE.TAG   Convert DATE.TAG to MONTH.TAG
TAG.TO.QUARTER    DATE.TAG   Convert DATE.TAG to QUARTER.TAG
TAG.TO.WEEK       DATE.TAG   Convert DATE.TAG to WEEK.TAG
TAG.TO.DAY        DATE.TAG   Convert DATE.TAG to DAY.TAG
Examples:
TAG.TO.MONTH(“1993-02-14”) → “1993-02”
TAG.TO.QUARTER(“1993-02-14”) → “1993Q1”
177. DataStage Essentials Module 9 – Troubleshooting
Events are logged to the job log file when a job is validated, run, or reset. You
can use the log file to troubleshoot jobs that fail during validation or a run.
Various entries are written to the log, including when:
• The job starts
• The job finishes
• An active stage starts
• An active stage finishes
• Rows are rejected (yellow icons)
• Errors occur (red icons)
• DataStage informational reports are logged
• User-invoked messages are displayed
181. DataStage Essentials Module 9 – Troubleshooting
The Monitor can be used to display information about a job while it is running.
To start the Monitor, click Tools>New Monitor. Once in Monitor, click the right
mouse button and then select Show links to display information about each of the
input and output links.
183. DataStage Essentials Module 9 – Troubleshooting
Server-side tracing is enabled in Administrator. It is designed to help customer
support analysts troubleshoot serious problems. When enabled, it logs a
record to a trace file whenever DataStage clients interact with the server.
Caution: Because of the overhead caused by server-side tracing, it should only be
used when working with customer support.
185. DataStage Essentials Module 9 – Troubleshooting
DataStage provides a debugger for testing and debugging your job designs. The
debugger runs within Designer. With the DataStage debugger you can:
• Set breakpoints on job links, including conditional breakpoints.
• Step through your job link-by-link or row-by-row.
• Watch the values going into link columns.
187. DataStage Essentials Module 9 – Troubleshooting
or until another link with a breakpoint is encountered.
Stop Job                Stops the job at its current point. Click Go to continue.
Job Parameters          Sets limits on rows and warnings.
Edit Breakpoints        Displays the Edit Breakpoints window, in which you can
                        edit existing breakpoints.
Toggle Breakpoint       Sets or clears a breakpoint on the selected link.
View job log            Opens Director and views the job log.
Clear All Breakpoints   Removes breakpoints from all links.
Debug Window            Shows/hides the Debug Window, which displays link
                        column values.
189. DataStage Essentials Module 9 – Troubleshooting
Click the Edit Breakpoints button to open the Edit Breakpoints window.
Existing breakpoints are listed in the lower pane.
To set a condition for a breakpoint, select the breakpoint and then specify the
condition in the upper pane. You can either specify a number of rows to pass
before breaking or specify an expression that triggers the break when it is true.
191. DataStage Essentials Module 9 – Troubleshooting
You can step through your job row-by-row or link-by-link.
• Next Row extracts a row of data and stops at the next link with a breakpoint
that the row is written to.
− For example, if a breakpoint is set on the MexicoCustomersOut link,
execution stops at the MexicoCustomersOut link when a Mexican
customer is read.
− If a breakpoint is not set on the MexicoCustomersOut link, execution
will not stop at the MexicoCustomersOut link when a Mexican customer
is read.
− Execution will stop at the CustomersIn link (even if there is no
breakpoint set on it) because all rows are read through that link.
• Next Link stops at the next link that data is written to.
195. DataStage Essentials Module 10 – Defining Lookups
A hashed file is a file that distributes records among one or more evenly sized
groups based on a primary key. The primary key value is processed by a "hashing
algorithm" to determine the location of the record.
The number of groups in the file is referred to as its modulus.
In this example, there are 5 groups (modulus 5).
Hashed files are used for reference lookups in DataStage because of their fast
performance. The hashing algorithm determines the group the record is in. The
groups contain a small number of records, so the record can be quickly located
within the group.
If write caching is enabled, DataStage does not write hashed file records directly
to disk. Instead it caches the records in memory and writes the cached records to
disk when the cache is full. This improves performance. You can specify the
size of the cache on the Tunables tab in Administrator.
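The idea behind the hashing algorithm can be sketched in DataStage BASIC as
follows (an illustration of the modulus concept only, not DataStage’s actual
algorithm):

    Key = "CUST042"
    Hash = 0
    FOR I = 1 TO LEN(Key)
       Hash = Hash * 31 + SEQ(Key[I,1])  ;* build a number from the key's characters
    NEXT I
    Group = MOD(Hash, 5)  ;* modulus 5: the record lands in one of groups 0 through 4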