Data Warehousing Lab Manual
Engr. Muhammad Waseem
SUBJECT: DATA WAREHOUSING & MINING
SUBJECT CODE: (CS-401)
LIST OF PRACTICALS
LAB NO:1 Understanding Teradata
LAB NO:2 Creating Database and Users
LAB NO:3 Creating the Tables in the Database
LAB NO:4 To be familiar with Teradata SQL Assistant
LAB NO:5 Execute the different data manipulation queries
LAB NO:6 To be familiar with the visual Explain
LAB NO:7 Generating reports using Teradata Warehouse Miner 5.3.0 Express
LAB NO:8 Histograms generation using Teradata Warehouse Miner 5.3.0 Express
LAB NO:9 Connecting database with VB
LAB NO:10 Loading of data using Fastload utility
LAB NO:11 To be familiar with schemas
LAB NO:12 Teradata Warehouse Builder Visual Interface
LAB NO:13 Generating frequency diagram of data using Warehouse Miner
LAB NO:14 To become familiar with Teradata Parallel Transporter Wizard 13.0
LAB NO:15 Creating a job script by using Teradata Parallel Transporter Wizard 13.0
Engr. Shakeel Ahmed Shaikh
LAB TASK # 01
Understanding Teradata
Object
The purpose of this lab is to introduce you to Teradata.
Introduction
Teradata provides solutions for data warehousing. TERADATA is a
registered trademark of NCR International, Inc. The Teradata Tools and Utilities
are a group of products designed to work with the Teradata RDBMS.
Tools
• Teradata Service Control
• Teradata Administrator
Theory
The Teradata RDBMS is a complete relational database management system
composed of hardware and software. The system can use either of two attachment
methods to connect to other computer systems as illustrated in the following table:
This attachment method… Allows the system to be attached…
Channel Directly to an I/O channel of a mainframe computer.
Network To intelligent workstations through a Local Area Network (LAN).
With the Teradata RDBMS, you can access, store, and operate on data using
Teradata Structured Query Language (Teradata SQL).
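As a minimal illustration, a Teradata SQL session might run statements such as the following. The table and column names here are hypothetical, not part of this lab's setup:

```sql
-- Hypothetical example: read and modify rows in a table named Employee.
SELECT LastName, DeptNo
FROM Employee
WHERE DeptNo = 401;

UPDATE Employee
SET DeptNo = 402
WHERE LastName = 'Smith';
```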
Teradata Service Control
It is used to start and stop the Teradata server.
Teradata Administrator
Teradata Administrator is a Windows-based tool that interfaces with the Teradata
Database Data Dictionary to perform database administration tasks. Teradata
Administrator enables you to view the hierarchy of databases and users on a
Teradata system. You can then display information about these objects, create
new objects, or perform other maintenance functions on the system.
Procedure
Starting the Teradata server
To start the Teradata server click on the start menu then Programs>> Teradata
Database Express 13.0>>Teradata Service Control
You will see the following window.
You can see that Teradata is currently Down/Stopped. To start Teradata, click
on Start Teradata!, then wait about two minutes. When the Teradata server has
started, its status will be shown in the same window.
The databases and the users in the database
You can check all databases, users, tables, views, etc. by using the database
browser. To start the database browser, start the Teradata Administrator program via:
Start>>Programs>>Teradata Administrator 13.0
In the 'please select a data source' window, select tdadmin and click on the OK
button.
When you click OK, you will see the Teradata Administrator window.
Result
Now we know how to start the Teradata Server and how to check the different
databases and users in the database.
LAB TASK # 02
Creating Database and Users
Object
To create a database and its Users
Introduction
In Teradata, a database is always created inside another database. The DBC
database is the parent of all databases. Users are created for databases to
perform different operations on the data.
Tools
• Teradata Service Control
• Teradata Administrator 13.0
Theory
When the Teradata RDBMS software is installed, the Teradata RDBMS contains
the following system users/databases:
• DBC
• SysAdmin
• SystemFE
Initially, the special system user DBC owns all space in the Teradata
Database. Some space is assigned from DBC to the system users and databases
named SysAdmin, SystemFE, Crashdumps, and Sys_Calendar.
Everyone higher in the hierarchy is a parent or owner. Everyone lower in the
hierarchy is a child.
Every object has one and only one creator. The creator is the user who executes
the CREATE statement.
The GRANT statement enables you to grant any of the privileges you have to
another user. For example, when logged on as user DBC you need to grant all the
privileges retained by DBC to your new DBAdmin user:
GRANT ALL ON ALL TO DBADMIN;
The GIVE statement enables you to transfer ownership of a database or user to a
non-owner. GIVE transfers to the recipient not only the specified database or user
space, but also all of the databases, users, and objects owned by that database or
user.
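For example, assuming the Mydatabase and DBAdmin objects used elsewhere in this manual, ownership could be transferred with:

```sql
-- Transfers Mydatabase and everything it owns to DBAdmin.
-- GIVE moves ownership; it does not grant access privileges.
GIVE Mydatabase TO DBAdmin;
```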
Permanent space is allocated to a Teradata user or database for creating:
• Data tables
• Permanent journals
• Table headers of global temporary tables (one header row per table)
• Secondary indexes (SIs)
• Join indexes ( JIs)
• Hash indexes ( HIs)
• Stored procedures
• Triggers
Procedure
Start the Teradata server using the Teradata Service Control and start the Teradata
Administrator 13.0. To start the Teradata Administrator 13.0 click on Start button:
Start>>Programs>>Teradata Administrator 13.0
In the 'please select a data source' window, select tdadmin and click on the OK
button.
You will see the following window.
When you click OK, you will see the Teradata Administrator window.
To create the database click on the diamond as shown in the center of fig below
You will see the following window.
Type the entries as shown in the table below
Database Name Mydatabase
Owner Dbc
Perm Space 10
Spool Space 10
Temp Space 10
For Perm Space, Spool Space, and Temp Space, select the appropriate option.
Click on the create button, the database will be created. See the status bar
message.
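The same database can also be created from SQL instead of the GUI. A sketch, assuming the sizes in the table above are meant in megabytes:

```sql
-- Equivalent CREATE DATABASE statement (sizes assumed to be 10 MB each).
CREATE DATABASE Mydatabase FROM dbc
AS PERM = 10E6,
   SPOOL = 10E6,
   TEMPORARY = 10E6;
```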
To create the user click on the user icon as shown at the top of fig below
You will see the following window.
Type the entries as shown in the table below :
User Name Ahmed
Owner Mydatabase
Password Ahmed
Perm Space 10
Spool Space 10
Temp Space 10
Account
Default Database mydatabase
Click on the create button, the user will be created.
You can see the database and its user in the Teradata Administrator 13.0 as shown
in the fig below.
Finally, grant the privileges to the user Ahmed.
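The equivalent SQL for the GUI steps above, assuming the same values as the table and megabyte units, would be roughly:

```sql
-- Sketch of the CREATE USER and GRANT statements behind the GUI steps
-- (sizes assumed to be 10 MB each).
CREATE USER Ahmed FROM Mydatabase
AS PASSWORD = Ahmed,
   PERM = 10E6,
   SPOOL = 10E6,
   TEMPORARY = 10E6,
   DEFAULT DATABASE = Mydatabase;

GRANT ALL ON Mydatabase TO Ahmed;
```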
Result
Now we are familiar with Teradata Administrator 13.0 and we can create databases
and users.
LAB TASK # 03
Creating the tables in the database
Object
To create the tables in the database using BTEQ
Introduction
Tables created in a relational database management system store the data.
They are created within a database by a user. Tables consist of rows and
columns: rows store the records, and columns store data of the same type.
Tools
• Teradata Service Control
• BTEQ
Theory
BTEQ is used to execute SQL queries. You can start many BTEQ sessions
at one time.
BTEQ is an abbreviation of Basic Teradata Query. It is a general-purpose,
command-based program that allows users on a workstation to communicate with
one or more Teradata RDBMS systems, and to format reports for both print and
screen output. Using BTEQ you can submit SQL queries to the Teradata RDBMS.
BTEQ formats the results and returns them to the screen, a file, or to a designated
printer.
A BTEQ session provides a quick and easy way to access a Teradata RDBMS. In
a BTEQ session, you can do the following:
• Enter Teradata SQL statements to view, add, modify, and delete data.
• Enter BTEQ commands.
• Enter operating system commands.
• Create and use Teradata stored procedures from a BTEQ session.
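For illustration, a short BTEQ session combining these kinds of input might look like the following. The TDPID and credentials are placeholders; your system's values will differ:

```sql
.LOGON demotdat/dbc,dbc
SELECT DATE;
.OS dir
.LOGOFF
```

Lines beginning with a period are BTEQ commands; everything else is passed to Teradata as SQL.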
Procedure
Start BTEQ: click on the Start menu, then Programs>>Teradata Client>>BTEQ.
In the BTEQ window type the following commands to logon.
.logon
UserId: Asad
Password: Lodhi
The session is shown in the figure below.
Now you can create tables by executing the following SQL command.
CREATE TABLE Event (
account_id char(12) CHARACTER SET LATIN,
account_type char(2) CHARACTER SET LATIN,
Complaint_id integer ,
Complaint_detail varchar(50) CHARACTER SET LATIN,
Actions_taken varchar(20) CHARACTER SET LATIN,
Remarks varchar(50) CHARACTER SET LATIN
);
When you execute the above command in BTEQ, the table will be created and the
following message will be displayed.
*** Table has been created.
*** Total elapsed time was 1 second.
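You can verify the new table from the same BTEQ session; a sketch with illustrative values (the column list follows the CREATE TABLE above):

```sql
-- Insert one sample row and read it back (values are illustrative).
INSERT INTO Event
VALUES ('ACC000000001', '01', 1, 'Line dead', 'Technician sent', 'Resolved');

SELECT account_id, Complaint_id, Actions_taken
FROM Event;
```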
Result
We are familiar with the BTEQ and we can create the tables using this utility.
LAB TASK # 04
Teradata SQL Assistant
Object
To be familiar with Teradata SQL Assistant
Introduction
Designed to provide a simple way to execute and manage your queries against a
Teradata, or other ODBC compliant database, SQL Assistant stores your queries
for easy re-use, and provides you with an audit trail that shows the steps that
produced your current results.
Tools
• Teradata Service Control
• Teradata SQL Assistant 6.1
Theory
There are several tools for executing SQL queries, but Teradata SQL
Assistant 6.1 is the easiest visual query tool. Any kind of query
can be executed using this utility. You can create new databases, tables, views,
macros, etc. The data present in the tables can also be manipulated.
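For instance, besides plain queries, you could create a view or a macro from the query window. The object names Event_v and CountComplaints below are hypothetical; the thesis.Event table is defined in Lab 3:

```sql
-- A view restricting Event to the columns an analyst needs.
CREATE VIEW thesis.Event_v AS
SELECT account_id, Complaint_id, Complaint_detail
FROM thesis.Event;

-- A macro that counts complaints for a given account.
CREATE MACRO thesis.CountComplaints (acct CHAR(12)) AS (
  SELECT COUNT(*) FROM thesis.Event WHERE account_id = :acct;
);

EXECUTE thesis.CountComplaints ('ACC000000001');
```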
Procedure
Start SQL Assistant
To start SQL Assistant click on the following menu from the Windows Start
bar: Start >> Programs >> Teradata SQL Assistant 6.1
You will see the following window. Now connect it with the DemoTDAT.
Now type any valid SQL query and execute it by pressing F5 key. The results will
be displayed. The results of a query are shown in the following figure.
Result
Now we are familiar with the Teradata SQL Assistant 6.1. We can execute any
query using this utility.
LAB TASK # 05
Data manipulation
Object
Execute the different data manipulation queries.
Introduction
Data manipulation statements affect one or more table rows.
Tools
• Teradata Service Control
• BTEQ
• Teradata SQL Assistant 6.1
Theory
Some of the data manipulation statements and their purposes are given in the
following table.
Command Purpose
ABORT Terminates the current transaction.
BEGIN TRANSACTION Defines the beginning of a single logical transaction.
CALL Invokes a stored procedure.
CHECKPOINT Places a flag in a journal table that can be used to coordinate
transaction recovery.
COMMENT Adds or replaces an object comment.
COMMIT Terminates the current ANSI SQL transaction and commits all
changes made within it.
DELETE Removes rows from a table.
ECHO Returns a fixed character string to the requestor.
END TRANSACTION Defines the end of a single logical transaction.
EXECUTE (Macro Form) Performs a macro.
EXPLAIN Modifier Reports a summary of the plan generated by the SQL query
optimizer: the steps Teradata would perform to resolve a request.
The request itself is not processed.
GROUP BY Clause Groups result rows of a SELECT query by the values in one or
more columns.
HAVING Clause Specifies a conditional expression that must be satisfied by the
rows in a SELECT query to be included in the resulting data.
INCLUDE
INSERT Adds new rows to a named table by directly specifying the row
data to be inserted (valued form) or by retrieving the new row
data from another table (selected form).
LOCKING Modifier Locks a database, table, view, or row hash, overriding the default
usage lock that Teradata places on a database, table, view, or row
hash in response to a request.
MERGE Merges a source row into a target table based on whether any
target rows satisfy a specified matching condition with the source
row.
ORDER BY Clause Specifies how result data in a SELECT statement is to be ordered.
QUALIFY Clause Eliminates rows from a SELECT query based on the value of a
computation.
ROLLBACK Terminates and rolls back the current transaction.
SAMPLE Clause Selects rows from a SELECT query for further processing to
satisfy a conditional expression expressed in a WHERE clause.
SELECT Returns selected rows in the form of a result table.
SELECT INTO Returns a selected row from a table and assigns its values to host
variables.
UPDATE (Searched Form) Modifies field values in existing rows of a table.
UPDATE (Upsert Form) Updates column values in a specified row and, if the row does not
exist, inserts it into the table with a specified set of initial column
values.
USING Row Descriptor Defines one or more variable parameter names.
WHERE Clause Selects rows in a SELECT query that satisfy a conditional
expression.
WITH Clause Specifies summary lines and breaks (grouping conditions) that
determine how selected results from a SELECT query are
returned (typically used for subtotals).
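To make the table concrete, a few of these statements applied to the thesis.Event table from Lab 3 might look like this (the values are illustrative):

```sql
-- INSERT (valued form): add one row.
INSERT INTO thesis.Event (account_id, account_type, Complaint_id)
VALUES ('ACC000000002', '02', 7);

-- UPDATE (searched form): modify existing rows.
UPDATE thesis.Event
SET Actions_taken = 'Escalated'
WHERE Complaint_id = 7;

-- DELETE: remove rows matching a condition.
DELETE FROM thesis.Event
WHERE Remarks = 'Duplicate entry';
```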
Procedure
Open any query execution tool; here we are using Teradata SQL Assistant
6.1.
To start SQL Assistant click on the following menu from the Windows
Start bar: Start >> Programs >> Teradata SQL Assistant 6.1.
Connect the Teradata SQL Assistant 6.1 with DemoTDAT by clicking on the
connection button.
Type the following SQL select query in the Query window and press F5 to
execute it.
select Complaint_detail as vent_f_Compalint_detail,b,rank(b) as rnk from (select Complaint_detail ,count(*)
as cnt from thesis.Event group by 1) as foo(Complaint_detail ,b) qualify rnk< 25;
You will see the following result.
Execute the next sample query, shown below.
select Actions_taken as vent_f_Actions_taken,b,rank(b) as rnk from (select Actions_taken ,count(*) as
cnt from thesis.Event group by 1) as foo(Actions_taken ,b) qualify rnk< 25;
After executing above query you will find the following result.
Execute the next sample query, shown below.
select account_type as Eventaccount_type,b,rank(b) as rnk from (select account_type ,count(*) as cnt
from thesis.Event group by 1) as foo(account_type ,b) qualify rnk< 25;
After executing above query you will find the following result.
Execute the next sample query, shown below.
select Complaint_detail as EventCompalint_detail,b,rank(b) as rnk from (select Complaint_detail ,count(*)
as cnt from thesis.Event group by 1) as foo(Complaint_detail ,b) qualify rnk< 25;
After executing above query you will find the following result.
Execute the next sample query, shown below.
select account_id as Eventaccount_id,b,rank(b) as rnk from (select account_id ,count(*) as cnt from
thesis.Event group by 1) as foo(account_id ,b) qualify rnk< 25;
After executing above query you will find the following result.
Execute the next sample query, shown below.
select Complaint_id as EventComapint_id,b,rank(b) as rnk from (select complaint_id ,count(*) as cnt from
thesis.Event group by 1) as foo(Complaint_id ,b) qualify rnk< 25;
After executing above query you will find the following result.
Execute the next sample query, shown below.
select Actions_taken as EventActions_taken,b,rank(b) as rnk from (select Actions_taken ,count(*) as
cnt from thesis.Event group by 1) as foo(Actions_taken ,b) qualify rnk< 25;
After executing above query you will find the following result.
Execute the next sample query, shown below.
explain select * from thesis.event;
After executing above query you will find the following result.
Explanation
-------------------------------------------------------------------------
1) First, we lock a distinct thesis."pseudo table" for read on a
RowHash to prevent global deadlock for thesis.event.
2) Next, we lock thesis.event for read.
3) We do an all-AMPs RETRIEVE step from thesis.event by way of an
all-rows scan with no residual conditions into Spool 1
(group_amps), which is built locally on the AMPs. The input table
will not be cached in memory, but it is eligible for synchronized
scanning. The size of Spool 1 is estimated with low confidence to
be 12,936 rows. The estimated time for this step is 0.31 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.31 seconds.
Similarly, we can execute the other commands in the query editor of Teradata
SQL Assistant 6.1.
Result
We are familiar with the data manipulation statements.
LAB TASK # 06
Visual Explain
Object
To be familiar with the visual Explain
Introduction
The Visual Explain Demo provides a visual depiction of the execution plan
chosen by the Teradata Database Optimizer to access data.
Tools
• Teradata Service Control
• Teradata Visual Explain 3.0
Theory
The visual explain does visual depiction by turning the output text of the
EXPLAIN modifier into a series of easily readable icons. We will use 7 queries
in the Visual Explain lab.
Procedure
Start the Visual Explain and Compare Utility
Start>>Programs>>Teradata>>Visual Explain 3.0
• Connect to the Teradata Database by clicking the green connect icon
(looks like a plug)
• Highlight “DemoTDAT”
• Click OK
• Click on File>>Open Plan from Database…
• Under Selection fill in the Database name: QCD
• Click on Browse QCD… button. Note: Make sure Query Tag field is
blank.
• A list of seven queries appears; click the checkbox for the first query.
• Select the first item and click ADD; the entry now appears on the right-hand
side.
• Click OPEN, the query plan will load
• The visual plan now appears
• A summary will appear on top of the plan, click the X in the upper-right
corner to close it.
• Moving the mouse over the plan components will display various pieces
of information about the plan
Result
Now we are familiar with the visual explain.
LAB TASK # 07
Generating reports using the Miner
Object
Generating the frequency diagrams of our data using the Miner
Introduction
Compute frequency of column values or multi-column combined values.
Optionally, compute frequency of values for pairs of columns in a single column
list or two column lists.
Tools
• Teradata Service Control
• Teradata Warehouse Miner
Theory
Frequency analysis is designed to count the occurrence of individual data values
in columns that contain categorical data. It can be useful in understanding the
meaning of a particular data element, and it may point out the need to recode
some of the data values found, either permanently or in the course of building an
analytic data set. This function can also be useful in analyzing combinations of
values occurring in two or more columns.
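Underneath, a frequency analysis amounts to a grouped count. A sketch against the Employee table and Deptno column used in the procedure of this lab:

```sql
-- Count how often each Deptno value occurs, most frequent first.
SELECT Deptno,
       COUNT(*) AS frequency
FROM Employee
GROUP BY Deptno
ORDER BY frequency DESC;
```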
Procedure
To generate the frequency diagrams, start the Teradata Warehouse Miner by
clicking its icon on the desktop.
Connect it with the thesis database.
Start the new project by clicking the new project icon or from the File menu.
Now click the Project menu>>Add new Analysis.
In the Analysis window, from categories pane select Descriptive Statistics and
click on the Frequency icon in the Analysis pane
and then press the OK button.
• Select the Employee table from available tables.
• Select Deptno column.
• Click the right arrow to move the Deptno into the selected columns,
as shown in the fig below.
Start the report generation by clicking the run button or using shortcut F5. The
status can be seen in the execution status pane, as shown below.
The resultant report can be viewed by clicking on the Results icon in the
frequency window.
Output can be viewed in three ways (Data, Graph, SQL), as shown below. Click on
the Graph icon in the Frequency window.
The resultant graph will be displayed as shown below.
Result
We are familiar with frequency diagrams and know how to generate them.
LAB TASK # 08
Histograms generation
Object
To generate the histograms of data
Introduction
Determine the distribution of a numeric column(s) giving counts with optional
overlay counts and statistics. Optionally sub-bin numeric column(s) and
determine data "spikes" giving additional counts as an option.
Tools
• Teradata Service Control
• Teradata Warehouse Miner
Theory
Histogram analysis is designed to study the distribution of continuous numeric
values in a column by providing the data necessary to create a histogram graph.
This type of analysis is sometimes also referred to as binning because it counts the
occurrence of values in a series of numeric ranges called bins.
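The same binning can be sketched in plain SQL. The table name, the numeric column amount (assumed INTEGER so the division truncates), and the 100-unit bin width below are all assumptions for illustration:

```sql
-- Assign each row to a 100-unit-wide bin, then count rows per bin.
SELECT (amount / 100) * 100      AS bin_start,
       (amount / 100) * 100 + 99 AS bin_end,
       COUNT(*)                  AS bin_count
FROM thesis.SomeNumericTable
GROUP BY 1, 2
ORDER BY 1;
```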
Procedure
To generate the histograms, start the Teradata Warehouse Miner by
clicking its icon on the desktop.
Connect it with the thesis database.
Start the new project by clicking the new project icon or from the File menu. Now
click the Project menu>>Add>>Descriptive Statistics.
Then click on the Histogram icon in the Add Descriptive Statistics Function
window and press the OK button.
Select the event table from available tables, and move the account_id into the
selected Bin columns and Aliases as shown in the fig below.
Start the report generating by pressing F5 or Run>>Start F5.
The resultant report can be checked by clicking on the Results icon in the new
project window.
Click on the Histogram Graph icon in the Analysis Results window.
The resultant graph will be displayed as shown below.
Result
We are familiar with histogram diagrams and know how to generate them.
LAB TASK # 09
Connecting database with VB
Object
To connect the database with VB
Introduction
Visual Basic is a language used to build different kinds of software; 90% of
VB applications involve databases.
Tools
1. Teradata Service Control
2. Visual Basic 6.0
Theory
We use the Adodc1 control to connect with a database. We define the data sources,
and then connect the Adodc1 with those data sources. Following are the drivers
which VB supports for databases.
Procedure
To connect any database with VB, we must set up a driver for the data source. To do
this, click on the Data Sources icon in the Control Panel.
Following window will be displayed.
Click on the add button.
In the Create New Data Source window, select Teradata and click on the Finish button.
Fill in the entries as shown above, or according to your requirements, then click on
the OK button.
Click yes for the warning message.
You can check Event data source name having driver of Teradata in the ODBC Data Source
Administrator window. Close the ODBC Data Source Administrator window.
Start a new Standard EXE project in Visual Basic, and select the Adodc1 data control
from the components. Place the Adodc1 control on the form and right-click on it.
Select Properties from the popup menu.
Select Event in the Use ODBC Data Source Name under the General tab.
Write the user name and password under the Authentication tab.
Select 2 - adCmdTable in Command Type and Event in Table or Stored Procedure
Name under the RecordSource tab.
Click ok to close the properties page.
Select the DataGrid control from the components and place it on the VB form. Make it
bigger so that the data can be seen easily.
Now set the DataSource property of the grid control to Adodc1.
Press F5 to run the VB project; the data will be shown in the grid control as shown below.
Result
We have learned from this lab how to view the data placed in a data warehouse using VB.
LAB TASK # 10
Loading of data
Object
To load the data in the tables using the fast load utility
Introduction
To load data we have utilities like FastLoad, BTEQ, TPump, MultiLoad, and
tbuild. FastLoad is used to load data into empty tables.
Theory
FastLoad is a command-driven utility you can use to quickly load large amounts
of data in an empty table on a Teradata Relational Database Management System
(RDBMS).
You can load data from:
• Disk or tape files on a channel-attached client system
• Input files on a network-attached workstation
• Special input module (INMOD) routines you write to select, validate, and
preprocess input data
• Any other device providing properly formatted source data
FastLoad uses multiple sessions to load data. However, it loads data into only one
table on a Teradata RDBMS per job. If you want to load data into more than one
table in an RDBMS, you must submit multiple FastLoad jobs—one for each table.
Procedure
To start FastLoad, click on the Start menu, then Programs>>Teradata
Client>>FastLoad.
You will see the following Fastload screen.
The following script is used to create one table and then load data into that table from
a flat file. Run the commands of the script in FastLoad; you will see the data loaded
into the table.
LOGON dbc/dbc , dbc;
.set record unformatted
DATABASE thesis;
DROP table Event;
drop table Event_error1;
drop table Event_error2;
CREATE TABLE Event (
account_id char(12) CHARACTER SET LATIN,
account_type char(2) CHARACTER SET LATIN,
Complaint_id integer ,
Complaint_detail varchar(50) CHARACTER SET LATIN,
Actions_taken varchar(20) CHARACTER SET LATIN,
Remarks varchar(50) CHARACTER SET LATIN
);
DEFINE
account_id (char(12)),
account_type (char(2)),
Complaint_id (char(5)),
Complaint_detail (char(50)),
Actions_taken (char(20)),
Remarks (char(50)),
newline3 (char(2))
FILE= c:flevents.txt;
BEGIN LOADING
Event
ERRORFILES
Event_ERROR1,Event_ERROR2
CHECKPOINT 10000;
INSERT INTO Event
(
account_id ,
account_type ,
Complaint_id ,
Complaint_detail ,
Actions_taken ,
Remarks
)
VALUES
(
:account_id ,
:account_type ,
:Complaint_id ,
:Complaint_detail ,
:Actions_taken ,
:Remarks
);
END LOADING
;
.LOGOFF;
Here we are loading the data from the events.txt file. You can retrieve that data using
BTEQ or any other utility. Here is the sample data.
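For example, the load can be checked from BTEQ with a simple count and sample:

```sql
-- Count the loaded rows and inspect a few of them.
SELECT COUNT(*) FROM thesis.Event;

SELECT * FROM thesis.Event SAMPLE 10;
```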
Result
Now we are familiar with loading data.
LAB TASK # 12
Teradata Warehouse Builder Visual Interface
Object
To run the job scripts in the Teradata Warehouse Builder Visual Interface.
Introduction
The warehouse builder is used to create the warehouse. We can create the DBMS,
Schema, Table, TableSet, Operator, Logger, LogView, and Job by using the
Teradata Warehouse Builder Visual Interface.
Tools
1. Teradata Service Control
2. Teradata Warehouse Builder Visual Interface
Theory
This demo script will show you how to familiarize yourself with the application
and run a predefined data load job. In this short lesson we will:
 Start the Teradata Warehouse Builder Visual Interface application
 Run a predefined data load job.
 Check to see data was loaded into Warehouse as a result of a job using
Teradata Administrator (WinDDI)
The following is a graphical representation of the tasks that will be performed in
Demo script 1. The other scripts are graphically represented at the end of this
demo.
[Diagram: a Data Connector reading a flat file and an ODBC Operator feed a
Union All step, whose output the Update Operators load into Teradata.]
Procedure
Start the Teradata Warehouse Builder Visual Interface application
To start the Teradata Warehouse Builder Visual Interface, click on the following
menu from the Windows Start bar:
Start >> Programs >> Teradata Client >> Teradata Warehouse Builder Visual
Interface or double click on the desktop shortcut.
The Teradata Warehouse Builder application window will open:
 Click “+” sign at the left of Job. A list of predefined jobs will be shown.
 Click “+” sign at the left of “Demo_01_setup” job. This will show nothing at this point.
 Click “Demo_01_setup” to highlight this job then right mouse click.
 Select “Submit” to run this job.
 Enter a name like “run-01” and click OK button.
 Answer OK to the pop-up window saying job is being started.
 If the following window does not appear, expand the “Demo_01_SETUP” job and click
on the run-01 selection. Under the job you will see the name of the running job, and
in the Job Output window a message saying the job has started. Next click on the
“Job Details” tab.
 You will see the details of the setup tasks that are being performed. Wait until it
terminates then click back on the “Job Output” tab.
 Back in the Job State output window you will see the summary of the completed tasks.
Also, note that the icon for the job you just ran is now a checker flag indicating the job
has finished.
 Next select “DEMO_01” job to highlight it.
 Again right mouse click and select submit to run this new job.
 Like before fill in a job name like “run-01” and click on OK. You’ll also click on OK to
close the message box.
 If you don’t see any job information to the right, click the “+” sign to the left of
the job name to reveal the running job underneath.
 Like before click on the “Job Details” tab to see the task being performed. For this job
you will see much more information. This job loads data from flat file and merges it with
a record read from Teradata via ODBC connection.
 Watch the job as it executes the various stages. This particular job will take several
minutes to complete as it loads approximately 100,000 rows.
 Back on the “Job Output” tab you will see summary of all the steps completed.
 Now let's have a look at the warehouse to see if the rows were indeed loaded. Bring
up Teradata Administrator via Start>>Programs>>Teradata Administrator 6.0.
 Double click DemoTDAT then OK to access the data. If you are not familiar with
Teradata Administrator, run through its demo script first.
 Navigate down the left hand side until you find “twbdemo” database/user. Double click
on “twbdemo”.
 Next select the “twb_target_table”, which is where we just loaded data from
“Demo_01”.
 Right mouse click and select “Row Count”.
 You will see the count of 100,001 rows. 100,000 rows were added from flat files and 1
row came from the ODBC connection to “twb_source_table”.
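The same check can be made in SQL instead of the Row Count menu item; a sketch:

```sql
-- Should return 100001 if the demo job completed successfully.
SELECT COUNT(*) AS loaded_rows
FROM twbdemo.twb_target_table;
```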
Result
We are familiar with the Teradata warehouse builder visual interface.
LAB TASK # 13
Generating the Frequency Diagrams using the Miner
Object
Generating the frequency diagrams of our data using the Miner
Introduction
Compute frequency of column values or multi-column combined values.
Optionally, compute frequency of values for pairs of columns in a single column
list or two column lists.
Tools
1. Teradata Service Control
2. Teradata Warehouse Miner
Theory
Frequency analysis is designed to count the occurrence of individual data values
in columns that contain categorical data. It can be useful in understanding the
meaning of a particular data element, and it may point out the need to recode
some of the data values found, either permanently or in the course of building an
analytic data set. This function can also be useful in analyzing combinations of
values occurring in two or more columns.
Procedure
To generate the frequency diagrams, start the Teradata Warehouse Miner by
clicking its icon on the desktop.
Connect it with the thesis database.
Start the new project by clicking the new project icon or from the File menu.
Now click the Project menu>>Add new Analysis.
In the Analysis window, from categories pane select Descriptive Statistics and
click on the Frequency icon in the Analysis pane
and then press the OK button.
 Select the Employee table from available tables.
 Select Deptno column.
 Click the right arrow to move the Deptno into the selected columns,
as shown in the fig below.
Start the report generation by clicking the run button or using shortcut F5. The
status can be seen in the execution status pane, as shown below.
The resultant report can be viewed by clicking on the Results icon in the
frequency window.
Output can be viewed in three ways (Data, Graph, SQL), as shown below. Click on
the Graph icon in the Frequency window.
The resultant graph will be displayed as shown below.
Result
We are familiar with frequency diagrams and know how to generate them.
LAB TASK # 11
Implementing Schemas
Object
To be familiar with schemas
Introduction
A schema is a set of metadata definitions about the columns and rows of a data
source or destination object, such as:
 Data types and column sizes
 Precision, scale, and null-value indicators
 Database tables, columns and rows
Tools
1. Teradata Service Control
2. BTEQ
3. Tbuild
4. Teradata Administrator 6.0
Theory
Teradata WB uses schema definitions, which are similar to SQL’s table
definitions. The schema definitions used in Teradata WB:
 Represent virtual tables. They do not have to correspond to any actual
tables in the Teradata RDBMS.
 Contain column definitions: names and data types.
 Act as reusable templates.
 Describe the contents of various data sources and targets, such as files,
relational tables, etc.
 Are similar to record layout definitions used by the Teradata load and
unload utilities.
Procedure
Run the following code in the BTEQ utility to drop any existing tables and create the Event table.
.LOGON dbc/Asad,Lodhi;
DATABASE thesis;
DROP TABLE RL_Event;
DROP TABLE Event;
DROP TABLE Event_error1;
DROP TABLE Event_error2;
CREATE TABLE Event (
  account_id       CHAR(12)    CHARACTER SET LATIN,
  account_type     CHAR(2)     CHARACTER SET LATIN,
  Complaint_id     INTEGER,
  Complaint_detail VARCHAR(50) CHARACTER SET LATIN,
  Actions_taken    VARCHAR(20) CHARACTER SET LATIN,
  Remarks          VARCHAR(50) CHARACTER SET LATIN
);
.LOGOFF;
Then execute the following job script with the tbuild utility:
DEFINE JOB PRODUCT_SOURCE_LOAD
DESCRIPTION 'LOAD PRODUCT DEFINITION TABLE'
(
DEFINE SCHEMA PRODUCT_SOURCE_SCHEMA
DESCRIPTION 'PRODUCT INFORMATION SCHEMA'
(
account_id char(12),
account_type char(2),
Complaint_id char(5),
Complaint_detail char(50),
Actions_taken char(20),
Remarks char(50),
newline3 char(2)
);
DEFINE OPERATOR LOAD_OPERATOR ()
DESCRIPTION 'TERADATA WB LOAD OPERATOR'
TYPE CONSUMER
INPUT SCHEMA *
EXTERNAL NAME 'libldop'
ALLOW PARALLEL MULTIPHASE
MSGCATALOG 'pcommon'
ATTRIBUTES
(
VARCHAR PauseAcq ,
INTEGER ErrorLimit = 50,
INTEGER BufferSize ,
INTEGER TenacityHours,
INTEGER TenacitySleep,
INTEGER MaxSessions = 2,
INTEGER MinSessions,
INTEGER RowInterval,
VARCHAR TdpID = 'dbc',
VARCHAR UserName = 'Asad',
VARCHAR UserPassword = 'Lodhi',
VARCHAR AccountID,
VARCHAR TargetTable = 'Event',
VARCHAR ErrorTable1 = 'Event_ERROR1',
VARCHAR ErrorTable2 = 'Event_ERROR2',
VARCHAR LogTable = 'RL_Event',
VARCHAR PrivateLogName ,
VARCHAR WorkingDatabase = 'thesis'
) ;
DEFINE OPERATOR DATACON
DESCRIPTION 'TERADATA WB DATACONNECTOR OPERATOR'
TYPE PRODUCER
OUTPUT SCHEMA PRODUCT_SOURCE_SCHEMA
EXTERNAL NAME 'libdtac'
ALLOW PARALLEL MULTIPHASE
MSGCATALOG 'pdatacon'
ATTRIBUTES
(
VARCHAR AccessModuleName ,
VARCHAR PrivateLogName ,
VARCHAR DirectoryPath = 'c:\fl',
VARCHAR FileName = 'events.txt',
VARCHAR IndicatorMode = 'N',
VARCHAR OpenMode = 'read' ,
VARCHAR Format = 'UNFORMATTED'
) ;
APPLY
(
'INSERT INTO Event (
account_id ,
account_type ,
Complaint_id ,
Complaint_detail ,
Actions_taken ,
Remarks ) VALUES (
:account_id ,
:account_type ,
:Complaint_id ,
:Complaint_detail ,
:Actions_taken ,
:Remarks );
')
TO OPERATOR ( LOAD_OPERATOR() [1])
SELECT * FROM OPERATOR
( DATACON());
) ;
tbuild is the utility used to implement schemas on tables, execute jobs, and build operators.
Together, the two scripts above define and load a complete warehouse table.
You can check the loaded data using any query execution tool.
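For example, a quick sanity check in BTEQ or SQL Assistant after the job completes might be:

```sql
-- Row counts for the target table and the two error tables
SELECT COUNT(*) FROM thesis.Event;
SELECT COUNT(*) FROM thesis.Event_error1;
SELECT COUNT(*) FROM thesis.Event_error2;
```

A nonzero count in either error table indicates rows that were rejected during the load.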
Result
We are now familiar with schemas and know how to develop a warehouse.
LAB TASK # 14
Object
To become familiar with Teradata Parallel Transporter Wizard 13.0
Tools
1. Teradata Service Control
2. Tbuild
3. Teradata Parallel Transporter Wizard 13.0
Introduction
Teradata PT is an object-oriented client application that provides scalable, high-speed, parallel
data:
 Extraction
 Loading
 Updating
These capabilities can be extended with customizations or with third-party products. Teradata PT
uses and expands on the functionality of the traditional Teradata extract and load utilities, that is,
FastLoad, MultiLoad, FastExport, and TPump, also known as the standalone utilities.
Teradata PT supports:
• Process-specific operators: Teradata PT jobs are run using operators. These are discrete
object-oriented modules that perform specific extraction, loading, and updating
processes.
• Access modules: These are software modules that give Teradata PT access to various
data stores.
• A parallel execution structure: Teradata PT can simultaneously load data from multiple
and dissimilar data sources into, and extract data from, Teradata Database. In addition,
Teradata PT can execute multiple instances of an operator to run multiple and concurrent
loads and extracts and perform inline updating of data. Teradata PT maximizes
throughput performance through scalability and parallelism.
Basic Processing
Teradata PT can load data into, and export data from, any accessible database object in the
Teradata Database or other data store using Teradata PT operators or access modules.
Multiple targets are possible in a single Teradata PT job. A data target or destination for a
Teradata PT job can be any of the following:
• Databases (both relational and non-relational)
• Database servers
• Data storage devices
• File objects, texts, and comma separated values (CSV)
When job scripts are submitted, Teradata PT can do the following:
• Analyze the statements in the job script.
• Initialize its internal components.
• Create, optimize, and execute a parallel plan for completing the job by:
• Creating instances of the required operator objects.
• Creating a network of data streams that interconnect the operator instances.
• Coordinating the execution of the operators.
• Coordinate checkpoint and restart processing.
• Restart the job automatically when the Teradata Database signals restart.
• Terminate the processing environments.
Between the data source and destination, Teradata PT jobs can:
• Retrieve, store, and transport specific data objects using parallel data streams.
• Merge or split multiple parallel data streams.
• Duplicate data streams for loading multiple targets.
• Filter, condition, and cleanse data.
Teradata PT Parallel Environment
Although the traditional Teradata standalone utilities offer load and extract functions, these
utilities are limited to a serial environment. Figure given below illustrates the parallel
environment of Teradata PT.
(Figure: Traditional Teradata Utilities vs. Teradata Parallel Transporter)
Teradata PT uses data streams that act as a pipeline between operators. With data streams, data
basically flows from one operator to another.
Teradata PT supports the following types of parallelism:
• Pipeline Parallelism
• Data Parallelism
Pipeline Parallelism
Teradata PT pipeline parallelism is achieved by connecting operator instances through data
streams during a single job:
• An export operator extracts data from a data source and writes it to the data stream.
• A filter operator reads data from the data stream, processes it, and writes it to another data
stream.
• A load operator starts writing data to a target as soon as data is available from the data
stream.
All three operators, each running its own process, can operate independently and concurrently.
As the figure shows, data sources and destinations for Teradata PT jobs can include:
• Databases (both relational and non-relational)
• Database servers
• Data storage devices, such as tapes or DVD readers
• File objects, such as images, pictures, voice, and text
Data Parallelism
The figure below shows how larger quantities of data can be processed by partitioning source
data into a number of separate sets, with each partition handled by a separate instance of an
operator.
Teradata PT Data Parallelism
Verifying the Teradata PT Version
To verify the version of Teradata PT you are running, issue a tbuild command (on the command
line) with no options specified, as follows:
tbuild
Switching Versions
Multiple versions of Teradata Warehouse Builder (Teradata WB) and Teradata PT can be
installed.
Result
We are now familiar with Teradata Parallel Transporter Wizard 13.0.
LAB TASK # 15
Object
Creating a job script by using Teradata Parallel Transporter Wizard 13.0.
Tools
1. Teradata Service Control
2. Tbuild
3. Teradata Parallel Transporter Wizard 13.0
Introduction
Creating a job script requires that you define the job components in the declarative section of the
job script, and then apply them in the executable section of the script to accomplish the desired
extract, load, or update tasks. The object definition statements in the declarative section of the
script can be in any order as long as they appear prior to being referenced by another object.
The following sections describe how to define the components of a Teradata PT job script.
• Defining the Job Header and Job Name
• Defining a Schema
• Defining Operators
• Coding the Executable Section
• Defining Job Steps
Defining the Job Header and Job Name
A Teradata PT script starts with an optional header that contains general information about the
job, and the required DEFINE JOB statement that names and describes the job, as shown in
Figure.
Job Header and Job Name
Consider the following when creating the job header and assigning the job name.
• The Script Name shown in the job header is optional and is there for quick reference. It can be
the same as the jobname or it can be the filename for the script.
• The jobname shown in the DEFINE JOB statement is required. It is best to use a descriptive
name; in the case of the example script, something like “Two Source Bulk Update.”
Note that the jobname shown in the DEFINE JOB statement is not necessarily the same as the
“jobname” specified in the tbuild statement when launching the job, although it can be. The
tbuild statement might specify something like “Two Source Bulk Update ddmmyy” to
differentiate a specific run of the job.
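The referenced figure is not reproduced here, so the following sketch shows what such a header and DEFINE JOB opening might look like (the script name and comments are illustrative):

```
/* Script Name: two_source_bulk_update.txt          */
/* Description: bulk-updates a target table from    */
/*              two flat-file sources               */
DEFINE JOB Two_Source_Bulk_Update
DESCRIPTION 'Two Source Bulk Update'
(
   /* schema, operator, and APPLY definitions follow */
);
```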
Defining a Schema
Teradata PT requires that the job script describe the structure of the data to be processed, that
is the columns in table rows or fields in file records. This description is called the schema.
Schemas are created using the DEFINE SCHEMA statement.
The value following the keyword SCHEMA in a DEFINE OPERATOR statement identifies the
schema that the operator will use to process job data. Schemas specified in operator definitions
must have been previously defined in the job script. To determine how many schemas you must
define, observe the following guidelines on how and why schemas are referenced in operator
definitions (except standalone operators):
• The schema referenced in a producer operator definition describes the structure of the
source data.
• The schema referenced in a consumer operator definition describes the structure of the data that
will be loaded into the target. The consumer operator schema can be coded as SCHEMA * (a
deferred schema), which means that it will accept the schema of the output data from the
producer.
• You can use the same schema for multiple operators.
• You cannot use multiple schemas within a single operator, except in filter operators, which use
two schemas (input and output).
• The column names in a schema definition in a Teradata PT script do not have to match the
actual column names of the target table, but their data types must match exactly. Note that
when a Teradata PT job is processing character data in the UTF-16 character set, all
CHAR(m) and VARCHAR(n) schema columns will have byte count values m and n,
respectively, that are twice the character count values in the corresponding column definitions of
the DBS table. Because of this, m and n must be even numbers.
The following is an example of a schema definition:
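The example image is not reproduced here; a representative schema definition, modeled on the job script from Lab Task 11, is:

```
DEFINE SCHEMA PRODUCT_SOURCE_SCHEMA
DESCRIPTION 'PRODUCT INFORMATION SCHEMA'
(
   account_id       CHAR(12),
   account_type     CHAR(2),
   Complaint_id     CHAR(5),
   Complaint_detail CHAR(50),
   Actions_taken    CHAR(20),
   Remarks          CHAR(50)
);
```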
Defining Operators
Choosing operators for use in a job script is based on the type of data source, the
characteristics of the target tables, and the specific operations to be performed.
Teradata PT scripts can contain one or more of the following operator types.
• Producer operators “produce” data streams after reading data from data sources.
• Consumer operators “consume” data from data streams and write it to target tables or files.
• Filter operators read data from input data streams, perform operations on the data or filter it,
and write it to output data streams. Filter operators are optional.
• Standalone operators issue Teradata SQL statements or host operating system commands to
set up or clean up jobs; they do not read from, or write to, the data stream.
Coding the Executable Section
After defining the Teradata PT script objects required for a job, you must code the executable
(processing) statement to specify which objects the script will use to execute the job tasks and
the order in which the tasks will be executed. The APPLY statement may also include data
transformations by including filter operators or through the use of derived columns in its
SELECT FROM.
A job script must always contain at least one APPLY statement, and if the job contains
multiple steps, each step must have an APPLY statement.
Coding the APPLY Statement
An APPLY statement typically contains two parts, which must appear in the order shown:
1. A DML statement (such as INSERT, UPDATE, or DELETE) that is applied TO the consumer
operator that will write the data to the target, as shown in Figure below. The statement may also
include a conditional CASE or WHERE clause.
2. For most jobs, the APPLY statement also includes the read activity, which uses a SELECT
FROM statement to reference the producer operator. If the APPLY statement uses a standalone
operator, it does not need the SELECT FROM statement.
Note: In Figure below, the SELECT statement also contains the UNION ALL statement to
combine the rows from two SELECT operations against separate sources, each with its own
operator.
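Since the figure is not shown, here is a sketch of such an APPLY statement; the operator names (LOAD_OPERATOR, READER_1, READER_2) and the column list are hypothetical:

```
APPLY
(
   'INSERT INTO Event (account_id, account_type)
    VALUES (:account_id, :account_type);'
)
TO OPERATOR ( LOAD_OPERATOR() )
SELECT * FROM OPERATOR ( READER_1() )
UNION ALL
SELECT * FROM OPERATOR ( READER_2() );
```

The UNION ALL combines the rows from the two producer operators into a single stream that the consumer operator loads.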
Defining Job Steps
Job steps are units of execution in a Teradata PT job. Using job steps is optional, but when
used, they can execute multiple operations within a single Teradata PT job. Job steps are
subject to the following rules:
• A job must have at least one step, but jobs with only one step do not need to use the STEP
syntax.
• Each job step contains an APPLY statement that specifies the operation to be performed
and the operators that will perform it.
• Most job steps involve the movement of data from one or more sources to one or more
targets, using a minimum of one producer and one consumer operator.
• Some job steps may use a single standalone operator, such as:
• DDL operator, for setup or cleanup operations in the Teradata Database.
• The Update operator, for bulk delete of data from the Teradata Database.
• OS Command operator, for operating system tasks such as file backup.
Using Job Steps
Job steps are executed in the order in which they appear within the DEFINE JOB statement.
Each job step must complete before the next step can begin. For example, the first job step
could execute a DDL operator to create a target table. The second step could execute a Load
operator to load the target table. A final step could then execute a cleanup operation.
The following is a sample of implementing multiple job steps:
DEFINE JOB multi-step
(
DEFINE SCHEMA...;
DEFINE SCHEMA...;
DEFINE OPERATOR...;
DEFINE OPERATOR...;
STEP first_step
(
APPLY...; /* DDL step */
);
STEP second_step
(
APPLY...; /* DML step */
);
STEP third_step
(
APPLY...; /* DDL step */
);
);
Starting a Job from a Specified Job Step
You can start a job from step one or from an intermediate step. The tbuild -s command
option allows you to specify the step from which the job should start, identifying it by either
the step name, as specified in the job STEP syntax, or by the implicit step number, such as 1, 2,
3, and so on. Job execution begins at the specified job step, skipping the job steps that precede
it in the script.
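For example, assuming the multi-step script above is saved as multi_step.txt, either of the following commands (the job name my_job is illustrative) would start the job at its second step, by name or by number:

```
tbuild -f multi_step.txt -s second_step my_job
tbuild -f multi_step.txt -s 2 my_job
```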
Result
We have now created a job script by using Teradata Parallel Transporter Wizard 13.0.

Mais conteúdo relacionado

Mais procurados

Centralised and distributed database
Centralised and distributed databaseCentralised and distributed database
Centralised and distributed database
Santosh Singh
 
Data warehouse and Decision support system
Data warehouse  and Decision support system Data warehouse  and Decision support system
Data warehouse and Decision support system
Enaam Alotaibi
 
Inside open metadata—the deep dive
Inside open metadata—the deep diveInside open metadata—the deep dive
Inside open metadata—the deep dive
DataWorks Summit
 
Less09 managing undo data
Less09 managing undo dataLess09 managing undo data
Less09 managing undo data
Imran Ali
 

Mais procurados (20)

Final exam in advance dbms
Final exam in advance dbmsFinal exam in advance dbms
Final exam in advance dbms
 
Centralised and distributed database
Centralised and distributed databaseCentralised and distributed database
Centralised and distributed database
 
Data warehouse and Decision support system
Data warehouse  and Decision support system Data warehouse  and Decision support system
Data warehouse and Decision support system
 
Distributed database
Distributed databaseDistributed database
Distributed database
 
Database backup & recovery
Database backup & recoveryDatabase backup & recovery
Database backup & recovery
 
Difference between star schema and snowflake schema
Difference between star schema and snowflake schemaDifference between star schema and snowflake schema
Difference between star schema and snowflake schema
 
DBMS an Example
DBMS an ExampleDBMS an Example
DBMS an Example
 
security and privacy in dbms and in sql database
security and privacy in dbms and in sql databasesecurity and privacy in dbms and in sql database
security and privacy in dbms and in sql database
 
Inside open metadata—the deep dive
Inside open metadata—the deep diveInside open metadata—the deep dive
Inside open metadata—the deep dive
 
Cloud database
Cloud databaseCloud database
Cloud database
 
Database Chapter 1
Database Chapter 1Database Chapter 1
Database Chapter 1
 
OLTP vs OLAP
OLTP vs OLAPOLTP vs OLAP
OLTP vs OLAP
 
RAID
RAIDRAID
RAID
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
Database administration and security
Database administration and securityDatabase administration and security
Database administration and security
 
Data Lake: A simple introduction
Data Lake: A simple introductionData Lake: A simple introduction
Data Lake: A simple introduction
 
Conceptual vs. Logical vs. Physical Data Modeling
Conceptual vs. Logical vs. Physical Data ModelingConceptual vs. Logical vs. Physical Data Modeling
Conceptual vs. Logical vs. Physical Data Modeling
 
Data warehouse physical design
Data warehouse physical designData warehouse physical design
Data warehouse physical design
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Less09 managing undo data
Less09 managing undo dataLess09 managing undo data
Less09 managing undo data
 

Destaque

TYBSC IT SEM 6 PROJECT MANAGEMENT NOTES
TYBSC IT SEM 6 PROJECT MANAGEMENT NOTESTYBSC IT SEM 6 PROJECT MANAGEMENT NOTES
TYBSC IT SEM 6 PROJECT MANAGEMENT NOTES
WE-IT TUTORIALS
 
Oracle 11g data warehouse introdution
Oracle 11g data warehouse introdutionOracle 11g data warehouse introdution
Oracle 11g data warehouse introdution
Aditya Trivedi
 
TYBSC IT SEM 6 IPR/CL
TYBSC IT SEM 6 IPR/CLTYBSC IT SEM 6 IPR/CL
TYBSC IT SEM 6 IPR/CL
WE-IT TUTORIALS
 

Destaque (19)

Data Warehousing Practical for T.Y.I.T.
Data Warehousing Practical for T.Y.I.T.Data Warehousing Practical for T.Y.I.T.
Data Warehousing Practical for T.Y.I.T.
 
TYBSC IT SEM 6 PROJECT MANAGEMENT NOTES
TYBSC IT SEM 6 PROJECT MANAGEMENT NOTESTYBSC IT SEM 6 PROJECT MANAGEMENT NOTES
TYBSC IT SEM 6 PROJECT MANAGEMENT NOTES
 
Oracle 11g data warehouse introdution
Oracle 11g data warehouse introdutionOracle 11g data warehouse introdution
Oracle 11g data warehouse introdution
 
TYBSC IT SEM 6 GIS
TYBSC IT SEM 6 GISTYBSC IT SEM 6 GIS
TYBSC IT SEM 6 GIS
 
TYBSC IT SEM 6 IPR/CL
TYBSC IT SEM 6 IPR/CLTYBSC IT SEM 6 IPR/CL
TYBSC IT SEM 6 IPR/CL
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)
Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)
Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)
 
Cyber law and ipc codes
Cyber law and ipc codesCyber law and ipc codes
Cyber law and ipc codes
 
Cyber
CyberCyber
Cyber
 
Software Project Managment
Software Project ManagmentSoftware Project Managment
Software Project Managment
 
Cyber Crime
Cyber Crime Cyber Crime
Cyber Crime
 
Linux practicals T.Y.B.ScIT
Linux practicals T.Y.B.ScITLinux practicals T.Y.B.ScIT
Linux practicals T.Y.B.ScIT
 
Core java tutorial
Core java tutorialCore java tutorial
Core java tutorial
 
Cyber laws with case studies
Cyber laws with case studiesCyber laws with case studies
Cyber laws with case studies
 
Advanced java practical semester 6_computer science
Advanced java practical semester 6_computer scienceAdvanced java practical semester 6_computer science
Advanced java practical semester 6_computer science
 
Oracle: DW Design
Oracle: DW DesignOracle: DW Design
Oracle: DW Design
 
Oracle Data Warehouse
Oracle Data WarehouseOracle Data Warehouse
Oracle Data Warehouse
 
Case study on cyber crime
Case study on cyber crimeCase study on cyber crime
Case study on cyber crime
 
Data Mining And Data Warehousing Laboratory File Manual
Data Mining And Data Warehousing Laboratory File ManualData Mining And Data Warehousing Laboratory File Manual
Data Mining And Data Warehousing Laboratory File Manual
 

Semelhante a Data warehousing labs maunal

Generic steps in informatica
Generic steps in informaticaGeneric steps in informatica
Generic steps in informatica
Bhuvana Priya
 
Managing SQLserver for the reluctant DBA
Managing SQLserver for the reluctant DBAManaging SQLserver for the reluctant DBA
Managing SQLserver for the reluctant DBA
Concentrated Technology
 

Semelhante a Data warehousing labs maunal (20)

The best Teradata RDBMS introduction a quick refresher
The best Teradata RDBMS introduction a quick refresherThe best Teradata RDBMS introduction a quick refresher
The best Teradata RDBMS introduction a quick refresher
 
Teradata client4
Teradata client4Teradata client4
Teradata client4
 
Db2 tutorial
Db2 tutorialDb2 tutorial
Db2 tutorial
 
Whitepaper tableau for-the-enterprise-0
Whitepaper tableau for-the-enterprise-0Whitepaper tableau for-the-enterprise-0
Whitepaper tableau for-the-enterprise-0
 
Tableau powerpoint
Tableau powerpointTableau powerpoint
Tableau powerpoint
 
fast_bitcoin_data_mining
fast_bitcoin_data_miningfast_bitcoin_data_mining
fast_bitcoin_data_mining
 
Generic steps in informatica
Generic steps in informaticaGeneric steps in informatica
Generic steps in informatica
 
Visual Basic.Net & Ado.Net
Visual Basic.Net & Ado.NetVisual Basic.Net & Ado.Net
Visual Basic.Net & Ado.Net
 
Data Warehousing (Practical Questions Paper) [CBSGS - 75:25 Pattern] {2015 Ma...
Data Warehousing (Practical Questions Paper) [CBSGS - 75:25 Pattern] {2015 Ma...Data Warehousing (Practical Questions Paper) [CBSGS - 75:25 Pattern] {2015 Ma...
Data Warehousing (Practical Questions Paper) [CBSGS - 75:25 Pattern] {2015 Ma...
 
Project Presentation
Project PresentationProject Presentation
Project Presentation
 
Managing SQLserver for the reluctant DBA
Managing SQLserver for the reluctant DBAManaging SQLserver for the reluctant DBA
Managing SQLserver for the reluctant DBA
 
154090896 installation-of-oracle-database-12c
154090896 installation-of-oracle-database-12c154090896 installation-of-oracle-database-12c
154090896 installation-of-oracle-database-12c
 
td1.ppt
td1.ppttd1.ppt
td1.ppt
 
Tableau Basic Questions
Tableau Basic QuestionsTableau Basic Questions
Tableau Basic Questions
 
Database development connection steps
Database development connection stepsDatabase development connection steps
Database development connection steps
 
Tableau Server Basics
Tableau Server BasicsTableau Server Basics
Tableau Server Basics
 
Mule Tcat server
Mule  Tcat serverMule  Tcat server
Mule Tcat server
 
Ebook6
Ebook6Ebook6
Ebook6
 
Sql interview question part 6
Sql interview question part 6Sql interview question part 6
Sql interview question part 6
 
Ebook6
Ebook6Ebook6
Ebook6
 

Mais de Education

Mais de Education (12)

A friendly introduction to differential equations
A friendly introduction to differential equationsA friendly introduction to differential equations
A friendly introduction to differential equations
 
High-order Assembly Language/Shuttle (HAL/S)
High-order Assembly Language/Shuttle (HAL/S)High-order Assembly Language/Shuttle (HAL/S)
High-order Assembly Language/Shuttle (HAL/S)
 
assembly language programming and organization of IBM PC" by YTHA YU
assembly language programming and organization of IBM PC" by YTHA YUassembly language programming and organization of IBM PC" by YTHA YU
assembly language programming and organization of IBM PC" by YTHA YU
 
Program security chapter 3
Program security chapter 3Program security chapter 3
Program security chapter 3
 
Network security chapter 1,2
Network security chapter  1,2Network security chapter  1,2
Network security chapter 1,2
 
Lecture 7
Lecture 7Lecture 7
Lecture 7
 
Lecture 6
Lecture 6Lecture 6
Lecture 6
 
Lecture 5
Lecture 5Lecture 5
Lecture 5
 
Lecture 4
Lecture 4Lecture 4
Lecture 4
 
Lecture 3
Lecture 3Lecture 3
Lecture 3
 
Lecture 2
Lecture 2Lecture 2
Lecture 2
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 

Último

result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
Tonystark477637
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Dr.Costas Sachpazis
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Último (20)

result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICSUNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
 

Data warehousing labs maunal

  • 1. Data Warehousing Lab Manual Engr. Muhammad Waseem
  • 2. SUBJECT: DATAWARE HOUSING & MINING SUBJECT CODE: (CS-401) LIST OF PRACTICALS LAB NO:1 Understanding Teradata LAB NO:2 Creating Database and Users LAB NO:3 Creating the Tables in the Database LAB NO:4 To be familiar with Teradata SQL Assistant LAB NO:5 Execute the different data manipulation queries LAB NO:6 To be familiar with the visual Explain LAB NO:7 Generating reports using Teradata Warehouse Miner 5.3.0 Express LAB NO:8 Histograms generation using Teradata Warehouse Miner 5.3.0 Express LAB NO:9 Connecting database with VB LAB NO:10 Loading of data using Fastload utility LAB NO:11 To be familiar with schemas LAB NO:12 Teradata Warehouse Builder Visual Interface LAB NO:13 Generating frequency diagram of data using Warehouse Miner LAB NO:14 To become familiar with Teradata Parallel Transporter Wizard 13.0 LAB NO:15 Creating a job script by using Teradata Parallel Transporter Wizard 3.0 Engr. Shakeel Ahmed Shaikh
  • 3. LAB TASK # 01 Understanding Teradata
Object
The purpose of this lab is to introduce you to Teradata.
Introduction
Teradata provides solutions for data warehousing. TERADATA is a registered trademark of NCR International, Inc. The Teradata Tools and Utilities are a group of products designed to work with the Teradata RDBMS.
Tools
• Teradata Service Control
• Teradata Administrator
Theory
The Teradata RDBMS is a complete relational database management system composed of hardware and software. The system can use either of two attachment methods to connect to other computer systems, as illustrated in the following table:
This attachment method… | Allows the system to be attached…
Channel | Directly to an I/O channel of a mainframe computer.
Network | To intelligent workstations through a Local Area Network (LAN).
With the Teradata RDBMS, you can access, store, and operate on data using Teradata Structured Query Language (Teradata SQL).
Teradata Service Control is used to start and stop the Teradata server.
Teradata Administrator is a Windows-based tool that interfaces with the Teradata Database Data Dictionary to perform database administration tasks. It enables you to view the hierarchy of databases and users on a Teradata system; you can then display information about these objects, create new objects, or perform other maintenance functions on the system.
Procedure
Starting the Teradata server: click on the Start menu, then Programs >> Teradata Database Express 13.0 >> Teradata Service Control. You will see the following window.
  • 4. You can see that Teradata is currently Down/Stopped. To start it, click Start Teradata! and wait about two minutes. When the Teradata server has started, its status will be shown in the same window.
The databases and the users in the database
You can check all databases, users, tables, views, etc. by using the database browser. To start the database browser, launch the Teradata Administrator program via Start >> Programs >> Teradata Administrator 13.0. In the 'please select a data source' window, select tdadmin and click on the OK button.
  • 5. When you click OK, the Teradata Administrator window will appear.
Result
Now we know how to start the Teradata server and how to check the different databases and users in the database.
  • 6. LAB TASK # 02 Creating Database and Users
Object
To create a database and its users.
Introduction
In Teradata, a database is always created inside another database. The dbc database is the parent of all databases. Users are created for the databases to perform different operations on the data.
Tools
• Teradata Service Control
• Teradata Administrator 6.0
Theory
When the Teradata RDBMS software is installed, the Teradata RDBMS contains the following system users/databases:
• DBC
• SysAdmin
• SystemFE
Initially, the special system user DBC owns all space in the Teradata Database. Some space is assigned from DBC to the system users and databases named SysAdmin, SystemFE, Crashdumps, and Sys_Calendar. Everyone higher in the hierarchy is a parent or owner; everyone lower in the hierarchy is a child. Every object has one and only one creator: the user who executes the CREATE statement. The GRANT statement enables you to grant any of the privileges you have to another user. For example, when logged on as user DBC you need to grant all the privileges retained by DBC to your new DBAdmin user:
GRANT ALL ON ALL TO DBADMIN;
The GIVE statement enables you to transfer ownership of a database or user to a non-owner. GIVE transfers to the recipient not only the specified database or user space, but also all of the databases, users, and objects owned by that database or user. Permanent space is allocated to a Teradata user or database for creating:
• Data tables
• Permanent journals
• Table headers of global temporary tables (one header row per table)
• Secondary indexes (SIs)
  • 7. • Join indexes (JIs)
• Hash indexes (HIs)
• Stored procedures
• Triggers
Procedure
Start the Teradata server using the Teradata Service Control, then start Teradata Administrator 13.0 via Start >> Programs >> Teradata Administrator 13.0. In the 'please select a data source' window, select tdadmin and click on the OK button. You will see the following window. When you click OK, the Teradata Administrator window will appear.
  • 8. To create the database, click on the diamond icon as shown in the center of the figure below. You will see the following window.
  • 9. Type the entries as shown in the table below:
Database Name: Mydatabase
Owner: Dbc
Perm Space: 10
Spool Space: 10
Temp Space: 10
For Perm Space, Spool Space, and Temp Space select the option. Click on the Create button and the database will be created; see the status bar message. To create the user, click on the user icon as shown at the top of the figure below. You will see the following window.
  • 10. Type the entries as shown in the table below:
User Name: Ahmed
Owner: Mydatabase
Password: Ahmed
Perm Space: 10
Spool Space: 10
Temp Space: 10
Account:
Default Database: mydatabase
Click on the Create button and the user will be created. You can see the database and its user in Teradata Administrator 13.0, as shown in the figure below.
  • 11. Finally, grant the privileges to the user Ahmed.
Result
Now we are familiar with Teradata Administrator 13.0 and we can create databases and users.
  • 12. LAB TASK # 03 Creating the tables in the database
Object
To create the tables in the database using BTEQ.
Introduction
The tables created in a relational database management system store the data. They are created within a database by a user. Tables consist of rows and columns: the rows store the records, and the columns store data of the same type.
Tools
• Teradata Service Control
• BTEQ
Theory
BTEQ is used to execute SQL queries; you can start many BTEQ sessions at one time. BTEQ is an abbreviation of Basic Teradata Query. It is a general-purpose, command-based program that allows users on a workstation to communicate with one or more Teradata RDBMS systems, and to format reports for both print and screen output. Using BTEQ you can submit SQL queries to the Teradata RDBMS. BTEQ formats the results and returns them to the screen, a file, or a designated printer. A BTEQ session provides a quick and easy way to access a Teradata RDBMS. In a BTEQ session, you can do the following:
• Enter Teradata SQL statements to view, add, modify, and delete data.
• Enter BTEQ commands.
• Enter operating system commands.
• Create and use Teradata stored procedures from a BTEQ session.
Procedure
To start BTEQ, click on the Start menu, then Program >> Teradata Client >> BTEQ. In the BTEQ window type the following command to log on:
.logon
UserId: Asad
Password: Lodhi
The session is shown in the figure below. Now you can create tables by executing the following SQL command:
CREATE TABLE Event
(
account_id char(12) CHARACTER SET LATIN,
account_type char(2) CHARACTER SET LATIN,
Complaint_id integer,
Complaint_detail varchar(50) CHARACTER SET LATIN,
Actions_taken varchar(20) CHARACTER SET LATIN,
Remarks varchar(50) CHARACTER SET LATIN
);
  • 13. When you execute the above command in BTEQ, the table will be created and the following message will be displayed:
*** Table has been created.
*** Total elapsed time was 1 second.
Result
We are familiar with BTEQ and we can create tables using this utility.
  • 14. LAB TASK # 04 Teradata SQL Assistant
Object
To be familiar with Teradata SQL Assistant.
Introduction
Designed to provide a simple way to execute and manage your queries against a Teradata or other ODBC-compliant database, SQL Assistant stores your queries for easy re-use and provides you with an audit trail that shows the steps that produced your current results.
Tools
• Teradata Service Control
• Teradata SQL Assistant 6.1
Theory
There are several tools for executing SQL queries, but Teradata SQL Assistant 6.1 is the visual and easiest query-submitting tool. Any kind of query can be executed using this utility. You can create new databases, tables, views, macros, etc. The data present in the tables can also be manipulated.
Procedure
To start SQL Assistant, click on the following menu from the Windows Start bar: Start >> Programs >> Teradata SQL Assistant 6.1. You will see the following window. Now connect it with the DemoTDAT.
  • 15. Now type any valid SQL query and execute it by pressing the F5 key. The results will be displayed; the results of a query are shown in the following figure.
  • 16. Result
Now we are familiar with Teradata SQL Assistant 6.1. We can execute any query using this utility.
LAB TASK # 05 Data manipulation
Object
Execute the different data manipulation queries.
Introduction
  • 17. Data manipulation statements affect one or more table rows.
Tools
• Teradata Service Control
• BTEQ
• Teradata SQL Assistant 6.1
Theory
Some of the data manipulation statements and their purposes are given in the following table:
ABORT: Terminates the current transaction.
BEGIN TRANSACTION: Defines the beginning of a single logical transaction.
CALL: Invokes a stored procedure.
CHECKPOINT: Places a flag in a journal table that can be used to coordinate transaction recovery.
COMMENT: Adds or replaces an object comment.
COMMIT: Terminates the current ANSI SQL transaction and commits all changes made within it.
DELETE: Removes rows from a table.
ECHO: Returns a fixed character string to the requestor.
END TRANSACTION: Defines the end of a single logical transaction.
EXECUTE (Macro Form): Performs a macro.
EXPLAIN Modifier: Reports a summary of the plan generated by the SQL query optimizer: the steps Teradata would perform to resolve a request. The request itself is not processed.
GROUP BY Clause: Groups result rows of a SELECT query by the values in one or more columns.
HAVING Clause: Specifies a conditional expression that must be satisfied by the rows in a SELECT query to be included in the resulting data.
INCLUDE
INSERT: Adds new rows to a named table by directly specifying the row data to be inserted (valued form) or by retrieving the new row data from another table (selected form).
LOCKING Modifier: Locks a database, table, view, or row hash, overriding the default usage lock that Teradata places on a database, table, view, or row hash in response to a request.
MERGE: Merges a source row into a target table based on whether any target rows satisfy a specified matching condition with the source row.
ORDER BY Clause: Specifies how result data in a SELECT statement is to be ordered.
QUALIFY Clause: Eliminates rows from a SELECT query based on the value of a computation.
ROLLBACK: Terminates and rolls back the current transaction.
  • 18. SAMPLE Clause: Selects rows from a SELECT query for further processing to satisfy a conditional expression expressed in a WHERE clause.
SELECT: Returns selected rows in the form of a result table.
SELECT INTO: Returns a selected row from a table and assigns its values to host variables.
UPDATE (Searched Form): Modifies field values in existing rows of a table.
UPDATE (Upsert Form): Updates column values in a specified row and, if the row does not exist, inserts it into the table with a specified set of initial column values.
USING Row Descriptor: Defines one or more variable parameter names.
WHERE Clause: Selects rows in a SELECT query that satisfy a conditional expression.
WITH Clause: Specifies summary lines and breaks (grouping conditions) that determine how selected results from a SELECT query are returned (typically used for subtotals).
Procedure
Open any query-executing tool; here we are using Teradata SQL Assistant 6.1. To start SQL Assistant, click on the following menu from the Windows Start bar: Start >> Programs >> Teradata SQL Assistant 6.1. Connect Teradata SQL Assistant 6.1 with DemoTDAT by clicking on the connection button. Type the following SQL SELECT query in the query window and press F5 to execute it:
select Complaint_detail as vent_f_Compalint_detail, b, rank(b) as rnk
from (select Complaint_detail, count(*) as cnt from thesis.Event group by 1) as foo(Complaint_detail, b)
qualify rnk < 25;
You will see the following result. Execute the next sample query, shown below:
select Actions_taken as vent_f_Actions_taken, b, rank(b) as rnk
from (select Actions_taken, count(*) as cnt from thesis.Event group by 1) as foo(Actions_taken, b)
qualify rnk < 25;
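The pattern used by these queries, counting occurrences of each value in a derived table, ranking the counts, and keeping only the top-ranked rows with QUALIFY, can be sketched in plain Python. This is a minimal stand-in, not Teradata itself: the column name follows the manual, while the sample rows are invented for illustration, and `rank` here mimics Teradata's older `RANK(b)` syntax (rank 1 for the largest count, ties sharing a rank).

```python
from collections import Counter

# Hypothetical sample of the Complaint_detail column of thesis.Event.
complaint_details = [
    "billing issue", "billing issue", "line down",
    "billing issue", "line down", "slow speed",
]

# Derived table foo(Complaint_detail, b): one row per value with its count.
foo = Counter(complaint_details)

# RANK(b) descending with competition ranking: the first position of a
# count in the sorted list gives its rank, so ties share a rank.
counts_desc = sorted(foo.values(), reverse=True)
def rank(b):
    return counts_desc.index(b) + 1  # 1-based

# QUALIFY rnk < 25: keep only rows whose computed rank passes the filter.
result = [(value, b, rank(b)) for value, b in foo.items() if rank(b) < 25]
result.sort(key=lambda row: row[2])
print(result)  # [('billing issue', 3, 1), ('line down', 2, 2), ('slow speed', 1, 3)]
```

The QUALIFY clause filters on the ranked result the same way WHERE filters raw rows, which is why the rank must be computed before the filter is applied.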
  • 19. After executing the above query you will see the following result. Execute the next sample query, shown below:
select account_type as Eventaccount_type, b, rank(b) as rnk
from (select account_type, count(*) as cnt from thesis.Event group by 1) as foo(account_type, b)
qualify rnk < 25;
After executing the above query you will see the following result. Execute the next sample query, shown below:
select Complaint_detail as EventCompalint_detail, b, rank(b) as rnk
from (select Complaint_detail, count(*) as cnt from thesis.Event group by 1) as foo(Complaint_detail, b)
qualify rnk < 25;
After executing the above query you will see the following result. Execute the next sample query, shown below:
select account_id as Eventaccount_id, b, rank(b) as rnk
from (select account_id, count(*) as cnt from thesis.Event group by 1) as foo(account_id, b)
qualify rnk < 25;
After executing the above query you will see the following result.
  • 20. Execute the next sample query, shown below:
select Complaint_id as EventComapint_id, b, rank(b) as rnk
from (select complaint_id, count(*) as cnt from thesis.Event group by 1) as foo(Complaint_id, b)
qualify rnk < 25;
After executing the above query you will see the following result. Execute the next sample query, shown below:
select Actions_taken as EventActions_taken, b, rank(b) as rnk
from (select Actions_taken, count(*) as cnt from thesis.Event group by 1) as foo(Actions_taken, b)
qualify rnk < 25;
After executing the above query you will see the following result. Execute the next sample query, shown below:
explain select * from thesis.event;
After executing the above query you will see the following result.
Explanation
-------------------------------------------------------------------------
1) First, we lock a distinct thesis."pseudo table" for read on a RowHash to prevent global deadlock for thesis.event.
  • 21. 2) Next, we lock thesis.event for read.
3) We do an all-AMPs RETRIEVE step from thesis.event by way of an all-rows scan with no residual conditions into Spool 1 (group_amps), which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 1 is estimated with low confidence to be 12,936 rows. The estimated time for this step is 0.31 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.31 seconds.
Similarly, we can execute other commands in the query editor of Teradata SQL Assistant 6.1.
Result
We are familiar with the data manipulation statements.
  • 22. LAB TASK # 06 Visual Explain
Object
To be familiar with Visual Explain.
Introduction
The Visual Explain demo provides a visual depiction of the execution plan chosen by the Teradata Database Optimizer to access data.
Tools
• Teradata Service Control
• Teradata Visual Explain 3.0
Theory
Visual Explain turns the output text of the EXPLAIN modifier into a series of easily readable icons. We will use 7 queries in the Visual Explain lab.
Procedure
Start the Visual Explain and Compare utility: Start >> Programs >> Teradata >> Visual Explain 3.0.
• Connect to the Teradata Database by clicking the green connect icon (looks like a plug).
• Highlight "DemoTDAT".
• Click OK.
  • 23. • Click on File >> Open Plan from Database…
• Under Selection fill in the Database name: QCD.
• Click on the Browse QCD… button. Note: Make sure the Query Tag field is blank.
• A list of seven queries appears; click the checkbox for the first query. Select the first item and click ADD; the entry now appears on the right-hand side.
• Click OPEN; the query plan will load.
• The visual plan now appears.
• A summary will appear on top of the plan; click the X in the upper-right corner to close it.
• Moving the mouse over the plan components will display various pieces of information about the plan.
  • 24. Result
Now we are familiar with Visual Explain.
  • 25. LAB TASK # 07 Generating reports using the Miner
Object
Generating the frequency diagrams of our data using the Miner.
Introduction
Compute the frequency of column values or multi-column combined values. Optionally, compute the frequency of values for pairs of columns in a single column list or two column lists.
Tools
• Teradata Service Control
• Teradata Warehouse Miner
Theory
Frequency analysis is designed to count the occurrence of individual data values in columns that contain categorical data. It can be useful in understanding the meaning of a particular data element, and it may point out the need to recode some of the data values found, either permanently or in the course of building an analytic data set. This function can also be useful in analyzing combinations of values occurring in two or more columns.
Procedure
To generate the frequency diagrams, start the Teradata Warehouse Miner by clicking its icon on the desktop. Connect it with the thesis database. Start a new project by clicking the new project icon or from the File menu. Now click Project menu >> Add New Analysis. In the Analysis window, select Descriptive Statistics from the Categories pane, click on the Frequency icon in the Analysis pane, and then press the OK button.
  • 26. • Select the Employee table from the available tables.
• Select the Deptno column.
• Click the right arrow to move Deptno into the selected columns, as shown in the figure below.
  • 27. Start the report generation by clicking the Run icon or using the shortcut F5. The status can be seen in the execution status pane, as shown below. The resultant report can be viewed by clicking on the Results icon in the Frequency window. The output can be viewed in three ways (Data, Graph, SQL), as shown below. Click on the Graph icon in the Frequency window; the resultant graph will be displayed as shown below.
Result
We are familiar with frequency diagrams and we know how to generate them.
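The frequency analysis the Miner runs on a categorical column such as Deptno amounts to counting each distinct value and its share of all rows. A minimal Python sketch of that computation, using invented Deptno values (this only mirrors the Data view of the result, not the Miner itself):

```python
from collections import Counter

# Hypothetical Deptno values from an Employee table.
deptnos = [10, 20, 10, 30, 20, 10, 40]

total = len(deptnos)
freq = Counter(deptnos)

# One output row per distinct value: (value, count, percent of all rows),
# most frequent first, like the Data view of a Frequency result.
report = [(value, count, round(100.0 * count / total, 2))
          for value, count in freq.most_common()]
for value, count, pct in report:
    print(f"Deptno {value}: {count} rows ({pct}%)")
```

Sorting by descending count is what makes the dominant categories, and any values that may need recoding, stand out at a glance.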
  • 28. LAB TASK # 08 Histograms generation
Object
To generate histograms of the data.
Introduction
Determine the distribution of one or more numeric columns, giving counts with optional overlay counts and statistics. Optionally sub-bin numeric columns and determine data "spikes", giving additional counts as an option.
Tools
• Teradata Service Control
• Teradata Warehouse Miner
Theory
Histogram analysis is designed to study the distribution of continuous numeric values in a column by providing the data necessary to create a histogram graph. This type of analysis is sometimes also referred to as binning because it counts the occurrence of values in a series of numeric ranges called bins.
Procedure
To generate the histograms, start the Teradata Warehouse Miner by clicking its icon on the desktop. Connect it with the thesis database. Start a new project by clicking the new project icon or from the File menu. Now click Project menu >> Add >> Descriptive Statistics, then click on the Histogram icon in the Add Descriptive Statistics Function window and press the OK button.
  • 29. Select the Event table from the available tables, and move account_id into the selected Bin Columns and Aliases, as shown in the figure below. Start the report generation by pressing F5 or Run >> Start.
  • 30. The resultant report can be checked by clicking on the Results icon in the new project window. Click on the Histogram Graph icon in the Analysis Results window; the resultant graph will be displayed as shown below.
Result
We are familiar with histogram diagrams and we know how to generate them.
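The binning that histogram analysis performs, counting how many values fall into each of a series of equal-width numeric ranges, can be sketched in a few lines of Python. The column values here are invented for illustration; this is only a stand-in for what the Miner computes before drawing the graph:

```python
def histogram(values, bins, lo, hi):
    """Count how many values fall into each of `bins` equal-width ranges
    spanning [lo, hi]; a value exactly at hi is counted in the last bin,
    and values outside the range are ignored."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    return counts

# A hypothetical numeric column, binned into 4 ranges over [0, 100].
values = [5, 12, 37, 41, 44, 68, 90, 99]
print(histogram(values, bins=4, lo=0, hi=100))  # [2, 3, 1, 2]
```

Each entry of the returned list is one bar of the histogram graph, which is exactly the data the Histogram Graph view renders.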
  • 31. LAB TASK # 09 Connecting database with VB
Object
To connect the database with VB.
Introduction
Visual Basic is a language used to build different kinds of software; around 90% of VB applications work with databases.
Tools
1. Teradata Service Control
2. Visual Basic 6.0
Theory
We use the Adodc1 control to connect with the database. We define the data sources, and then connect the Adodc1 control with those data sources. The following are the drivers which VB supports for databases.
  • 32. Procedure
To connect any database with VB, we should set up the driver for the data source. To do this, click on the Data Sources icon in the Control Panel. The following window will be displayed. Click on the Add button.
  • 33. In the Create New Data Source window, select Teradata and click on the Finish button.
  • 34. Fill in the entries as shown above, or according to your requirements, then click on the OK button. Click Yes for the warning message. You can see the Event data source name, with the Teradata driver, in the ODBC Data Source Administrator window. Close the ODBC Data Source Administrator window. Start a new Standard EXE project in Visual Basic and select the Adodc data control from the components. Place the Adodc1 control on the form and right-click on it. Select Properties from the popup menu. Select Event in "Use ODBC Data Source Name" under the General tab.
  • 35. Enter the user name and password under the Authentication tab. Select 2 - adCmdTable in the Command Type and Event in the Table or Stored Procedure Name under the RecordSource tab. Click OK to close the properties page. Select the DataGrid control from the components and place it on the VB form. Make it bigger so that the data can be seen easily.
  • 36. Now set the data source property of the grid control to Adodc1. Press F5 to run the VB project; the data will be shown in the grid control as shown below.
Result
In this lab we learned how to view the data placed in a data warehouse using VB.
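The steps above follow the same connect / open a recordset / display pattern any client language uses against a driver-managed data source. As a rough sketch of that flow, here Python's built-in sqlite3 module stands in for the ODBC driver and Adodc1, and a made-up Event table with a couple of invented rows stands in for the warehouse data:

```python
import sqlite3

# In-memory database standing in for the "Event" ODBC data source.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Event (
    account_id TEXT, account_type TEXT, Complaint_id INTEGER)""")
conn.executemany("INSERT INTO Event VALUES (?, ?, ?)",
                 [("A001", "R", 1), ("A002", "C", 2)])

# Equivalent of setting RecordSource to the Event table and binding
# a grid: fetch the rows, then render them one per line.
cursor = conn.execute(
    "SELECT account_id, account_type, Complaint_id FROM Event")
rows = cursor.fetchall()
for row in rows:
    print(row)
conn.close()
```

The VB DataGrid does the final rendering step automatically once its data source property points at the connected control; the loop here plays that role.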
  • 37. LAB TASK # 10 Loading of data
Object
To load the data into the tables using the FastLoad utility.
Introduction
To load data we have utilities such as FastLoad, BTEQ, TPump, MultiLoad, and tbuild. FastLoad is used to load data into empty tables.
Theory
FastLoad is a command-driven utility you can use to quickly load large amounts of data into an empty table on a Teradata Relational Database Management System (RDBMS). You can load data from:
• Disk or tape files on a channel-attached client system
• Input files on a network-attached workstation
• Special input module (INMOD) routines you write to select, validate, and preprocess input data
• Any other device providing properly formatted source data
FastLoad uses multiple sessions to load data. However, it loads data into only one table on a Teradata RDBMS per job. If you want to load data into more than one table in an RDBMS, you must submit multiple FastLoad jobs, one for each table.
Procedure
To start FastLoad, click on the Start menu, then Program >> Teradata Client >> FastLoad. You will see the following FastLoad screen.
  • 38. The following script is used to create one table and then load data into that table from a flat file. Run the commands of the script in FastLoad; you will see the data loaded into the table.
LOGON dbc/dbc,dbc;
.set record unformatted
DATABASE thesis;
DROP TABLE Event;
DROP TABLE Event_error1;
DROP TABLE Event_error2;
CREATE TABLE Event
(
account_id char(12) CHARACTER SET LATIN,
account_type char(2) CHARACTER SET LATIN,
Complaint_id integer,
Complaint_detail varchar(50) CHARACTER SET LATIN,
Actions_taken varchar(20) CHARACTER SET LATIN,
  • 39. Remarks varchar(50) CHARACTER SET LATIN
);
DEFINE
account_id (char(12)),
account_type (char(2)),
Complaint_id (char(5)),
Complaint_detail (char(50)),
Actions_taken (char(20)),
Remarks (char(50)),
newline3 (char(2))
FILE= c:flevents.txt;
BEGIN LOADING Event
ERRORFILES Event_ERROR1, Event_ERROR2
CHECKPOINT 10000;
INSERT INTO Event
(
account_id,
account_type,
Complaint_id,
Complaint_detail,
Actions_taken,
Remarks
)
VALUES
  • 40. (
:account_id,
:account_type,
:Complaint_id,
:Complaint_detail,
:Actions_taken,
:Remarks
);
END LOADING;
.LOGOFF;
Here we are loading the data from the events.txt file. You can retrieve that data using BTEQ or any other utility. Here is the sample data.
Result
Now we are familiar with loading data.
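The DEFINE section of the script above describes each input record as a sequence of fixed-width character fields. A small Python sketch of how such a record layout slices one line of the flat file; the field names and widths come from the DEFINE, while the sample record text is invented:

```python
# Field layout taken from the FastLoad DEFINE: (name, width) in record order.
LAYOUT = [
    ("account_id", 12), ("account_type", 2), ("Complaint_id", 5),
    ("Complaint_detail", 50), ("Actions_taken", 20), ("Remarks", 50),
]

def parse_record(line):
    """Slice one fixed-width record into a dict of trimmed field values."""
    fields, pos = {}, 0
    for name, width in LAYOUT:
        fields[name] = line[pos:pos + width].strip()
        pos += width
    return fields

# A made-up record padded out to the declared widths.
record = ("ACC000000001" + "R " + "00042" +
          "No dial tone".ljust(50) + "Visit scheduled".ljust(20) +
          "Resolved".ljust(50))
parsed = parse_record(record)
print(parsed["account_id"], parsed["Complaint_id"], parsed["Actions_taken"])
```

This is why the DEFINE widths must match the file exactly: a wrong width shifts every later field, which is the kind of row FastLoad diverts into the error tables.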
  • 41. LAB TASK # 12 Teradata Warehouse Builder Visual Interface
Object
To run the job scripts in the Teradata Warehouse Builder Visual Interface.
Introduction
The Warehouse Builder is used to create the warehouse. We can create the DBMS, Schema, Table, TableSet, Operator, Logger, LogView, and Job by using the Teradata Warehouse Builder Visual Interface.
Tools
1. Teradata Service Control
2. Teradata Warehouse Builder Visual Interface
Theory
This demo script will show you how to familiarize yourself with the application, collect and retrieve recommendations for a table, and execute the recommendations. In this short lesson we will:
• Start the Teradata Warehouse Builder Visual Interface application
• Run a predefined data load job
• Check that data was loaded into the warehouse as a result of the job, using Teradata Administrator (WinDDI)
The following is a graphical representation of the tasks performed in demo script 1 (the other scripts are represented graphically at the end of this demo): a flat file is read by a Data Connector operator, a Teradata source is read via an ODBC operator, and the two streams are combined with Union All and loaded into Teradata by Update operators.
  • 42. Procedure
Start the Teradata Warehouse Builder Visual Interface application. To start it, click on the following menu from the Windows Start bar: Start >> Programs >> Teradata Client >> Teradata Warehouse Builder Visual Interface, or double-click the desktop shortcut. The Teradata Warehouse Builder application window will open.
• Click the "+" sign at the left of Job. A list of predefined jobs will be shown.
• Click the "+" sign at the left of the "Demo_01_setup" job. This will show nothing at this point.
• Click "Demo_01_setup" to highlight this job, then right-click.
• Select "Submit" to run this job.
  • 43. Enter a name like "run-01" and click the OK button.
  • 44. Answer OK to the pop-up window saying the job is being started. If the following window does not appear, expand "Demo_01_setup" and click on the run-01 selection. Under the job you will see the name of the running job, and in the Job Output window a message saying the job has started. Next click on the "Job Details" tab. You will see the details of the setup tasks that are being performed. Wait until it terminates, then click back on the "Job Output" tab.
  • 45. Back in the Job State output window you will see the summary of the completed tasks. Also note that the icon for the job you just ran is now a checkered flag, indicating the job has finished. Next select the "DEMO_01" job to highlight it.
  • 46. Again right-click and select Submit to run this new job. As before, fill in a job name like "run-01" and click OK. Also click OK to close the message box. If you don't see any job information to the right, click the "+" sign to the left of the job name to reveal the running job underneath. As before, click on the "Job Details" tab to see the tasks being performed. For this job you will see much more information. This job loads data from a flat file and merges it with a record read from Teradata via an ODBC connection.
  • 47. Watch the job as it executes the various stages. This particular job will take several minutes to complete, as it loads approximately 100,000 rows.
  • 48. Back on the "Job Output" tab you will see a summary of all the steps completed. Now let's have a look at the warehouse to see whether the rows were indeed loaded. Bring up Teradata Administrator via Start >> Programs >> Teradata Administrator 6.0. Double-click DemoTDAT, then OK to access the data. If you are not familiar with Teradata Administrator, run through its demo script first. Navigate down the left-hand side until you find the "twbdemo" database/user. Double-click on "twbdemo".
  • 49. Next select "twb_target_table", which is where we just loaded data from "Demo_01". Right-click and select "Row Count".
  • 50. You will see a count of 100,001 rows: 100,000 rows were added from the flat files and 1 row came from the ODBC connection to "twb_source_table".
Result
We are familiar with the Teradata Warehouse Builder Visual Interface.
  • 51. LAB TASK # 13 Generating the Frequency Diagrams using the Miner
Object
Generating the frequency diagrams of our data using the Miner.
Introduction
Compute the frequency of column values or multi-column combined values. Optionally, compute the frequency of values for pairs of columns in a single column list or two column lists.
Tools
1. Teradata Service Control
2. Teradata Warehouse Miner
Theory
Frequency analysis is designed to count the occurrence of individual data values in columns that contain categorical data. It can be useful in understanding the meaning of a particular data element, and it may point out the need to recode some of the data values found, either permanently or in the course of building an analytic data set. This function can also be useful in analyzing combinations of values occurring in two or more columns.
Procedure
To generate the frequency diagrams, start the Teradata Warehouse Miner by clicking its icon on the desktop. Connect it with the thesis database. Start a new project by clicking the new project icon or from the File menu. Now click Project menu >> Add New Analysis.
  • 52. In the Analysis window, select Descriptive Statistics from the Categories pane, click on the Frequency icon in the Analysis pane, and then press the OK button.
• Select the Employee table from the available tables.
• Select the Deptno column.
• Click the right arrow to move Deptno into the selected columns, as shown in the figure below.
  • 53. Start the report generation by clicking the Run icon or using the shortcut F5. The status can be seen in the execution status pane, as shown below. The resultant report can be viewed by clicking on the Results icon in the Frequency window.
  • 54. The output can be viewed in three ways (Data, Graph, SQL), as shown below. Click on the Graph icon in the Frequency window; the resultant graph will be displayed as shown below.
Result
We are familiar with frequency diagrams and we know how to generate them.
  • 55. LAB TASK # 11 Implementing Schemas
Object
To be familiar with schemas.
Introduction
A schema is a set of metadata definitions about the columns and rows of a data source or destination object, such as:
• Data types and column sizes
• Precision, scale, and null-value indicators
• Database tables, columns and rows
Tools
1. Teradata Service Control
2. BTEQ
3. Tbuild
4. Teradata Administrator 6.0
Theory
Teradata WB uses schema definitions, which are similar to SQL's table definitions. The schema definitions used in Teradata WB:
• Represent virtual tables. They do not have to correspond to any actual tables in the Teradata RDBMS.
• Contain column definitions: names and data types.
• Act as reusable templates.
• Describe the contents of various data sources and targets, such as files, relational tables, etc.
• Are similar to record layout definitions used by the Teradata load and unload utilities.
Procedure
Run the following code in the BTEQ utility to create the tables:
.LOGON dbc/Asad,Lodhi;
DATABASE thesis;
DROP TABLE RL_Event;
  • 56. DROP TABLE Event;
DROP TABLE Event_error1;
DROP TABLE Event_error2;
CREATE TABLE Event
(
account_id char(12) CHARACTER SET LATIN,
account_type char(2) CHARACTER SET LATIN,
Complaint_id integer,
Complaint_detail varchar(50) CHARACTER SET LATIN,
Actions_taken varchar(20) CHARACTER SET LATIN,
Remarks varchar(50) CHARACTER SET LATIN
);
.LOGOFF;
Then execute the following code in the tbuild utility:
DEFINE JOB PRODUCT_SOURCE_LOAD
DESCRIPTION 'LOAD PRODUCT DEFINITION TABLE'
(
DEFINE SCHEMA PRODUCT_SOURCE_SCHEMA
DESCRIPTION 'PRODUCT INFORMATION SCHEMA'
(
account_id char(12),
account_type char(2),
Complaint_id char(5),
Complaint_detail char(50),
Actions_taken char(20),
• 57. Remarks char(50),
newline3 char(2)
);
DEFINE OPERATOR LOAD_OPERATOR ()
DESCRIPTION 'TERADATA WB LOAD OPERATOR'
TYPE CONSUMER
INPUT SCHEMA *
EXTERNAL NAME 'libldop'
ALLOW PARALLEL MULTIPHASE
MSGCATALOG 'pcommon'
ATTRIBUTES
(
VARCHAR PauseAcq,
INTEGER ErrorLimit = 50,
INTEGER BufferSize,
INTEGER TenacityHours,
INTEGER TenacitySleep,
INTEGER MaxSessions = 2,
INTEGER MinSessions,
INTEGER RowInteval,
VARCHAR TdpID = 'dbc',
VARCHAR UserName = 'Asad',
VARCHAR UserPassword = 'Lodhi',
VARCHAR AccountID,
• 58. VARCHAR TargetTable = 'Event',
VARCHAR ErrorTable1 = 'Event_ERROR1',
VARCHAR ErrorTable2 = 'Event_ERROR2',
VARCHAR LogTable = 'RL_Event',
VARCHAR PrivateLogName,
VARCHAR WorkingDatabase = 'thesis'
);
DEFINE OPERATOR DATACON
DESCRIPTION 'TERADATA WB DATACONNECTOR OPERATOR'
TYPE PRODUCER
OUTPUT SCHEMA PRODUCT_SOURCE_SCHEMA
EXTERNAL NAME 'libdtac'
ALLOW PARALLEL MULTIPHASE
MSGCATALOG 'pdatacon'
ATTRIBUTES
(
VARCHAR AccessModuleName,
VARCHAR PrivateLogName,
VARCHAR DirectoryPath = 'c:\fl',
VARCHAR FileName = 'events.txt',
VARCHAR IndicatorMode = 'N',
VARCHAR OpenMode = 'read',
VARCHAR Format = 'UNFORMATTED'
);
• 59. APPLY
('INSERT INTO Event
(
account_id,
account_type,
Complaint_id,
Complaint_detail,
Actions_taken,
Remarks
)
VALUES
(
:account_id,
:account_type,
:Complaint_id,
:Complaint_detail,
:Actions_taken,
:Remarks
);')
TO OPERATOR ( LOAD_OPERATOR() [1] )
SELECT * FROM OPERATOR ( DATACON() );
);
• 60. tbuild is a utility that is used to implement schemas on tables, execute jobs, and build operators. The two scripts above define and create a complete warehouse. You can check the loaded data using any query-execution tool.

Result
Now we are familiar with schemas and we know how to develop a warehouse.
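As an illustration, the tbuild script above would typically be submitted from the command line like this; the script file name and job name here are assumptions for this sketch, not values given in the lab:

```
tbuild -f product_source_load.txt event_load
```

The -f option names the file containing the job script, and the optional trailing argument names this particular run of the job.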
• 61. LAB TASK # 14

Object
To become familiar with Teradata Parallel Transporter Wizard 13.0.

Tools
1. Teradata Service Control
2. Tbuild
3. Teradata Parallel Transporter Wizard 13.0

Introduction
Teradata PT is an object-oriented client application that provides scalable, high-speed, parallel data:
• Extraction
• Loading
• Updating

These capabilities can be extended with customizations or with third-party products. Teradata PT uses and expands on the functionality of the traditional Teradata extract and load utilities, that is, FastLoad, MultiLoad, FastExport, and TPump, also known as the standalone utilities.

Teradata PT supports:
• Process-specific operators: Teradata PT jobs are run using operators. These are discrete object-oriented modules that perform specific extraction, loading, and updating processes.
• Access modules: These are software modules that give Teradata PT access to various data stores.
• A parallel execution structure: Teradata PT can simultaneously load data from multiple and dissimilar data sources into, and extract data from, Teradata Database. In addition, Teradata PT can execute multiple instances of an operator to run multiple and concurrent
• 62. loads and extracts and perform inline updating of data. Teradata PT maximizes throughput performance through scalability and parallelism.

Basic Processing
Teradata PT can load data into, and export data from, any accessible database object in the Teradata Database or other data store, using Teradata PT operators or access modules. Multiple targets are possible in a single Teradata PT job. A data target or destination for a Teradata PT job can be any of the following:
• Databases (both relational and non-relational)
• Database servers
• Data storage devices
• File objects, texts, and comma-separated values (CSV)

When job scripts are submitted, Teradata PT can do the following:
• Analyze the statements in the job script.
• Initialize its internal components.
• Create, optimize, and execute a parallel plan for completing the job by:
  • Creating instances of the required operator objects.
  • Creating a network of data streams that interconnect the operator instances.
  • Coordinating the execution of the operators.
• Coordinate checkpoint and restart processing.
• Restart the job automatically when the Teradata Database signals restart.
• Terminate the processing environments.

Between the data source and destination, Teradata PT jobs can:
• Retrieve, store, and transport specific data objects using parallel data streams.
• Merge or split multiple parallel data streams.
• Duplicate data streams for loading multiple targets.
• Filter, condition, and cleanse data.

Teradata PT Parallel Environment
Although the traditional Teradata standalone utilities offer load and extract functions, these utilities are limited to a serial environment. The figure given below illustrates the parallel environment of Teradata PT.
• 63. Traditional Teradata Utilities
Teradata Parallel Transporter

Teradata PT uses data streams that act as a pipeline between operators. With data streams, data basically flows from one operator to another. Teradata PT supports the following types of parallelism:
• Pipeline parallelism
• Data parallelism

Pipeline Parallelism
Teradata PT pipeline parallelism is achieved by connecting operator instances through data streams during a single job:
• An export operator on the left extracts data from a data source and writes it to the data stream.
• A filter operator extracts data from the data stream, processes it, then writes it to another data stream.
• A load operator starts writing data to a target as soon as data is available from the data stream.

All three operators, each running its own process, can operate independently and concurrently. As the figure shows, data sources and destinations for Teradata PT jobs can include:
• 64. • Databases (both relational and non-relational)
• Database servers
• Data storage devices, such as tapes or DVD readers
• File objects, such as images, pictures, voice, and text

Data Parallelism
The figure given below shows how larger quantities of data can be processed by partitioning the source data into a number of separate sets, with each partition handled by a separate instance of an operator.
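The producer, filter, and consumer flow described above is not specific to Teradata. As a loose analogy only (plain Python with threads and queues, not Teradata PT code), pipeline parallelism can be sketched like this, where each stage runs concurrently and begins work as soon as rows arrive on its input stream:

```python
# Illustrative sketch of pipeline parallelism: three stages connected by
# queues, mirroring export -> filter -> load operators joined by data streams.
import threading
import queue

SENTINEL = None  # marks the end of the data stream

def producer(out_q, rows):
    # "Export" stage: read rows from a source and write them to the stream.
    for row in rows:
        out_q.put(row)
    out_q.put(SENTINEL)

def filter_stage(in_q, out_q):
    # Filter stage: pass through only rows that satisfy a condition
    # (keeping even values is an arbitrary rule for this sketch).
    while (row := in_q.get()) is not SENTINEL:
        if row % 2 == 0:
            out_q.put(row)
    out_q.put(SENTINEL)

def consumer(in_q, target):
    # "Load" stage: write rows to the target as soon as they are available.
    while (row := in_q.get()) is not SENTINEL:
        target.append(row)

def run_pipeline(rows):
    q1, q2, target = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=producer, args=(q1, rows)),
        threading.Thread(target=filter_stage, args=(q1, q2)),
        threading.Thread(target=consumer, args=(q2, target)),
    ]
    for t in threads:
        t.start()   # all three stages run concurrently
    for t in threads:
        t.join()
    return target

print(run_pipeline(range(10)))   # -> [0, 2, 4, 6, 8]
```

The point of the sketch is that the load stage does not wait for the export stage to finish; rows flow through all three stages at once, which is exactly what the Teradata PT data streams enable.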
• 65. Teradata PT Data Parallelism

Verifying the Teradata PT Version
To verify the version of Teradata PT you are running, issue a tbuild command (on the command line) with no options specified, as follows:

tbuild

Switching Versions
Multiple versions of Teradata Warehouse Builder (Teradata WB) and Teradata PT can be installed.

Result
We have now become familiar with Teradata Parallel Transporter Wizard 13.0.
• 66. LAB TASK # 15

Object
Creating a job script by using Teradata Parallel Transporter Wizard 13.0.

Tools
1. Teradata Service Control
2. Tbuild
3. Teradata Parallel Transporter Wizard 13.0

Introduction
Creating a job script requires that you define the job components in the declarative section of the job script, and then apply them in the executable section of the script to accomplish the desired extract, load, or update tasks. The object definition statements in the declarative section of the script can be in any order, as long as they appear prior to being referenced by another object. The following sections describe how to define the components of a Teradata PT job script:
• Defining the Job Header and Job Name
• Defining a Schema
• Defining Operators
• Coding the Executable Section
• Defining Job Steps

Defining the Job Header and Job Name
A Teradata PT script starts with an optional header that contains general information about the job, and the required DEFINE JOB statement that names and describes the job, as shown in the figure.
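A sketch of such a header and DEFINE JOB statement follows; the original figure is not reproduced here, and the script file name is a hypothetical one, while the job name echoes the "Two Source Bulk Update" example discussed in this lab:

```
/* Script Name: two_source_update.txt   (optional header) */
/* Purpose:     Bulk update from two sources              */

DEFINE JOB Two_Source_Bulk_Update
DESCRIPTION 'Two Source Bulk Update'
(
    /* schema, operator, and APPLY definitions go here */
);
```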
• 67. Job Header and Job Name

Consider the following when creating the job header and assigning the job name:
• The script name shown in the job header is optional, and is there for quick reference. It can be the same as the jobname or it can be the filename for the script.
• The jobname shown in the DEFINE JOB statement is required. It is best to use a descriptive name; in the case of the example script, something like "Two Source Bulk Update". Note that the jobname shown in the DEFINE JOB statement is not necessarily the same as the "jobname" specified in the tbuild statement when launching the job, although it can be. The tbuild statement might specify something like "Two Source Bulk Updateddmmyy" to differentiate a specific run of the job.

Defining a Schema
Teradata PT requires that the job script describe the structure of the data to be processed, that is, the columns in table rows or the fields in file records. This description is called the schema. Schemas are created using the DEFINE SCHEMA statement. The value following the keyword SCHEMA in a DEFINE OPERATOR statement identifies the schema that the operator will use to process job data. Schemas specified in operator definitions must have been previously defined in the job script. To determine how many schemas you must
• 68. define, observe the following guidelines on how and why schemas are referenced in operator definitions (except standalone operators):
• The schema referenced in a producer operator definition describes the structure of the source data.
• The schema referenced in a consumer operator definition describes the structure of the data that will be loaded into the target. The consumer operator schema can be coded as SCHEMA * (a deferred schema), which means that it will accept the schema of the output data from the producer.
• You can use the same schema for multiple operators.
• You cannot use multiple schemas within a single operator, except in filter operators, which use two schemas (input and output).
• The column names in a schema definition in a Teradata PT script do not have to match the actual column names of the target table, but their data types must match exactly.

Note that when a Teradata PT job is processing character data in the UTF-16 character set, all CHAR(m) and VARCHAR(n) schema columns will have byte count values m and n, respectively, that are twice the character count values in the corresponding column definitions of the DBS table. Because of this, m and n must be even numbers. The following is an example of a schema definition:

Defining Operators
Choosing operators for use in a job script is based on the type of data source, the characteristics of the target tables, and the specific operations to be performed. Teradata PT scripts can contain one or more of the following operator types:
• Producer operators "produce" data streams after reading data from data sources.
• Consumer operators "consume" data from data streams and write it to target tables or files.
• Filter operators read data from input data streams, perform operations on the data or filter it, and write it to output data streams. Filter operators are optional.
• Standalone operators issue Teradata SQL statements or host operating system commands to set up or clean up jobs; they do not read from, or write to, the data stream.

Coding the Executable Section
After defining the Teradata PT script objects required for a job, you must code the executable (processing) statement to specify which objects the script will use to execute the job tasks and the order in which the tasks will be executed. The APPLY statement may also include data
• 69. transformations by including filter operators or through the use of derived columns in its SELECT FROM clause. A job script must always contain at least one APPLY statement, and if the job contains multiple steps, each step must have an APPLY statement.

Coding the APPLY Statement
An APPLY statement typically contains two parts, which must appear in the order shown:
1. A DML statement (such as INSERT, UPDATE, or DELETE) that is applied TO the consumer operator that will write the data to the target, as shown in the figure below. The statement may also include a conditional CASE or WHERE clause.
2. For most jobs, the APPLY statement also includes the read activity, which uses a SELECT FROM statement to reference the producer operator. If the APPLY statement uses a standalone operator, it does not need the SELECT FROM statement.
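To make this concrete, here is a sketch (adapted from the Lab 11 script, not from the figures missing above) of a schema definition together with a minimal APPLY statement of the two-part form just described; the column subset and operator names follow that earlier script:

```
DEFINE SCHEMA EVENT_SCHEMA
DESCRIPTION 'EVENT RECORD LAYOUT'
(
    account_id   CHAR(12),
    account_type CHAR(2),
    Remarks      VARCHAR(50)
);

/* ... operator definitions referencing EVENT_SCHEMA ... */

APPLY
    ('INSERT INTO Event (account_id, account_type, Remarks)
      VALUES (:account_id, :account_type, :Remarks);')
TO OPERATOR ( LOAD_OPERATOR() )
SELECT * FROM OPERATOR ( DATACON() );
```

The quoted INSERT is the DML part applied TO the consumer, and the SELECT FROM names the producer that supplies the rows.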
• 70. Note: In the figure below, the SELECT statement also contains the UNION ALL statement to combine the rows from two SELECT operations against separate sources, each with its own operator.

Defining Job Steps
Job steps are units of execution in a Teradata PT job. Using job steps is optional, but when used, they can execute multiple operations within a single Teradata PT job. Job steps are subject to the following rules:
• A job must have at least one step, but jobs with only one step do not need to use the STEP syntax.
• Each job step contains an APPLY statement that specifies the operation to be performed and the operators that will perform it.
• Most job steps involve the movement of data from one or more sources to one or more targets, using a minimum of one producer and one consumer operator.
• Some job steps may use a single standalone operator, such as:
  • The DDL operator, for setup or cleanup operations in the Teradata Database.
  • The Update operator, for bulk deletes of data from the Teradata Database.
  • The OS Command operator, for operating system tasks such as file backup.

Using Job Steps
Job steps are executed in the order in which they appear within the DEFINE JOB statement. Each job step must complete before the next step can begin. For example, the first job step could execute a DDL operator to create a target table. The second step could execute a Load operator to load the target table. A final step could then execute a cleanup operation. The following is a sample of implementing multiple job steps:

DEFINE JOB multi-step
(
DEFINE SCHEMA...;
DEFINE SCHEMA...;
• 71. DEFINE OPERATOR...;
DEFINE OPERATOR...;
STEP first_step
(
APPLY...; /* DDL step */
);
STEP second_step
(
APPLY...; /* DML step */
);
STEP third_step
(
APPLY...; /* DDL step */
);
);

Starting a Job from a Specified Job Step
You can start a job from step one or from an intermediate step. The tbuild -s command option allows you to specify the step from which the job should start, identifying it by either the step name, as specified in the job STEP syntax, or by the implicit step number, such as 1, 2, 3, and so on. Job execution begins at the specified job step, skipping the job steps that precede it in the script.

Result
We have now created a job script by using Teradata Parallel Transporter Wizard 13.0.
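For example, assuming the multi-step job above were saved as multi_step_job.txt (a hypothetical file name), either of the following commands would start execution at the second step, by step name or by implicit step number:

```
tbuild -f multi_step_job.txt -s second_step
tbuild -f multi_step_job.txt -s 2
```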