We offer online IT training with placement support and project assistance on different platforms. Real-time industry consultants provide quality training for IT professionals, corporate clients and students. InformaticaTrainingClasses offers extensive Informatica online training together with placement support. We also help with resume preparation and conduct mock interviews.
Emphasis is given to the topics that are essential and most used in real-time projects. Informatica Training Classes is an online training leader in effective and efficient IT training. We have always focused on providing effective, competent training to students and professionals who are eager to enrich their technical skills.
Training Features at Informatica training classes:
We believe that online training has to be measured by three major aspects: quality, content, and the relationship between trainer and student. Beyond the online classes themselves, the material we provide is kept in tune with the latest IT training standards, so a student need not worry about whether the training is outdated.
Course content:
• Basics of data warehousing concepts
• Power center components
• Informatica concepts and overview
• Sources
• Targets
• Transformations
• Advanced Informatica concepts
Please visit us for demo classes. We have regular batches and weekend batches.
Informatica online training classes
Phone: (404)-900-9988
Email: info@informaticatrainingclasses.com
Web: http://www.informaticatrainingclasses.com
Data warehouse introduction by InformaticaTrainingClasses
1. INTRODUCTION TO DATA WAREHOUSING BY INFORMATICATRAININGCLASSES
2. DATA WAREHOUSE
Maintain historic data
Analysis to get better understanding of business
Better Decision making
Definition: A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organizational decision making.
-- Bill Inmon, Building the Data Warehouse, 1996
3. SUBJECT ORIENTED
• Data warehouse is organized around subjects such as sales,
product, customer.
• It focuses on modeling and analysis of data for decision
makers.
• Excludes data not useful in decision support process.
4. INTEGRATED
• Data Warehouse is constructed by integrating multiple
heterogeneous sources.
• Data preprocessing is applied to ensure consistency.
[Figure: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the data warehouse]
5. NON-VOLATILE
• Mostly, data once recorded will not be updated.
• A data warehouse requires only two data-access operations:
- Incremental loading of data
- Access of data
6. TIME VARIANT
• Provides information from a historical perspective, e.g. the past 5-10 years
• Every key structure contains, either implicitly or explicitly, an element of time
7. WHY DATA WAREHOUSE?
Problem Statement:
• ABC Pvt Ltd is a company with branches in the USA, UK, Canada and India.
• The Sales Manager wants a quarterly sales report across the branches.
• Each branch has a separate operational system where sales
transactions are recorded.
8. WHY DATA WAREHOUSE?
[Figure: the Sales Manager gets the quarterly sales figure for each branch (USA, UK, CANADA, INDIA) and manually calculates the sales figure across branches]
What if he needs a daily sales report across the branches?
9. WHY DATA WAREHOUSE?
Solution:
• Extract sales information from each database.
• Store the information in a common repository at a single site.
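As a rough illustration of the solution above, the following Python sketch copies each branch's sales into one common list and totals them by quarter. The branch data, the field names and the helpers `consolidate` and `quarterly_totals` are assumptions made for this example only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical per-branch extracts: (sale_date, amount in a common currency).
branch_sales = {
    "USA": [(date(2010, 1, 15), 1200.0), (date(2010, 4, 2), 800.0)],
    "UK": [(date(2010, 2, 20), 500.0)],
    "CANADA": [(date(2010, 5, 11), 300.0)],
    "INDIA": [(date(2010, 3, 30), 700.0), (date(2010, 6, 1), 400.0)],
}


def quarter_of(day):
    """Return a label such as '2010-Q1' for a date."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def consolidate(sources):
    """Copy every branch's rows into one common list (the single-site repository)."""
    return [
        {"branch": branch, "sale_date": day, "amount": amount}
        for branch, rows in sources.items()
        for day, amount in rows
    ]


def quarterly_totals(repository):
    """Sum sales per quarter across all branches."""
    totals = defaultdict(float)
    for row in repository:
        totals[quarter_of(row["sale_date"])] += row["amount"]
    return dict(totals)


repository = consolidate(branch_sales)
print(quarterly_totals(repository))
```

Once the rows sit in one place, a daily report is only a change of grouping key rather than another round of manual collection.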
10. WHY DATA WAREHOUSE?
[Figure: the USA, UK, CANADA and INDIA systems feed the data warehouse, which the Sales Manager accesses through query and analysis tools]
11. CHARACTERISTICS OF DATA WAREHOUSE
Relational / Multidimensional database
Query and Analysis rather than transaction
Historical data from transactions
Consolidates Multiple data sources
Separates query load from transactions
Mostly non volatile
Large amount of data, on the order of TBs
12. WHEN WE SAY LARGE - WE MEAN IT!
• Terabytes -- 10^12 bytes: Yahoo! – 300 Terabytes and growing
• Petabytes -- 10^15 bytes: Geographic Information Systems
• Exabytes -- 10^18 bytes: National Medical Records
• Zettabytes -- 10^21 bytes: Weather images
• Yottabytes -- 10^24 bytes: Intelligence Agency Videos
13. OLTP VS DATA WAREHOUSE (OLAP)
Aspect                      | OLTP       | Data Warehouse (OLAP)
Indexes                     | Few        | Many
Data                        | Normalized | Generally de-normalized
Joins                       | Many       | Some
Derived data and aggregates | Rare       | Common
14. DATA WAREHOUSE ARCHITECTURE
[Figure: operational systems and flat files feed ETL (Extract, Transform and Load), which loads the data warehouse; the warehouse feeds the sales, inventory and generic data marts, which serve analysis, data mining and reporting]
15. ETL
ETL stands for Extract, Transform and Load
Data is distributed across different sources
– Flat files, Streaming Data, DB Systems, XML, JSON
Data can be in different formats
– CSV, Key Value Pairs
Different units and representation
– Country: IN or India
– Date: 20 Nov 2010 or 20101120
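The differing representations listed above can be reconciled with small normalization helpers. The sketch below is one possible approach in Python; the lookup table `COUNTRY_CODES`, the list `DATE_FORMATS` and both function names are assumptions for illustration. Month-name parsing with `%b` depends on the locale and is assumed here to be English.

```python
from datetime import datetime

# Hypothetical lookup table; a real one would cover every country in the sources.
COUNTRY_CODES = {"india": "IN", "in": "IN", "united states": "US", "us": "US"}

# Date layouts assumed to appear in the source systems.
DATE_FORMATS = ("%d %b %Y", "%Y%m%d")


def normalize_country(value):
    """Map a country name or code to a two-letter code."""
    key = value.strip().lower()
    if key not in COUNTRY_CODES:
        raise ValueError(f"unknown country: {value!r}")
    return COUNTRY_CODES[key]


def normalize_date(value):
    """Try each known layout and return the date as an ISO 8601 string."""
    for layout in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), layout).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


print(normalize_country("India"), normalize_date("20 Nov 2010"))
```

Raising an error on unknown values, rather than passing them through, keeps inconsistent records out of the warehouse until the lookup table is extended.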
16. ETL FUNCTIONS
Extract
– Collect data from different sources
– Parse data
– Remove unwanted data
Transform
– Project
– Generate Surrogate keys
– Encode data
– Join data from different sources
– Aggregate
Load
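To make the three functions concrete, here is a minimal Python sketch of an extract, transform and load pass over a few text lines. The input layout (customer, country, amount), the table names `dim_customer` and `fact_sales`, and the in-memory dictionary standing in for the target are all assumptions for this example.

```python
from itertools import count


def extract(raw_lines):
    """Parse comma-separated lines and drop blank ones (remove unwanted data)."""
    rows = []
    for line in raw_lines:
        if not line.strip():
            continue
        customer, country, amount = [part.strip() for part in line.split(",")]
        rows.append({"customer": customer, "country": country, "amount": float(amount)})
    return rows


def transform(rows):
    """Generate a surrogate key per customer and aggregate amounts per key."""
    next_key = count(1)
    customer_keys = {}
    totals = {}
    for row in rows:
        natural_key = row["customer"]
        if natural_key not in customer_keys:
            customer_keys[natural_key] = next(next_key)
        key = customer_keys[natural_key]
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return customer_keys, totals


def load(target, customer_keys, totals):
    """Write the results into the target, here a plain dictionary."""
    target["dim_customer"] = {key: name for name, key in customer_keys.items()}
    target["fact_sales"] = dict(totals)
    return target


raw = ["alice,IN,10", "", "bob,US,5", "alice,IN,2.5"]
warehouse = load({}, *transform(extract(raw)))
print(warehouse)
```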
17. ETL STEPS
• The first step in the ETL process is mapping the data between
source systems and target database.
• The second step is cleansing of source data in staging area.
• The third step is transforming cleansed source data.
• The fourth step is loading into the target system.
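The four steps can be sketched as a chain of small Python functions. The column names in `COLUMN_MAPPING`, the cleansing rules and the list used as the target are hypothetical; they only show the order in which the steps are applied.

```python
# Step 1 (mapping): hypothetical source column names mapped to target column names.
COLUMN_MAPPING = {"cust_nm": "customer_name", "ctry": "country", "amt": "amount"}


def apply_mapping(source_row, mapping):
    """Rename source columns to the target columns defined by the mapping."""
    return {target: source_row[source] for source, target in mapping.items()}


def cleanse(row):
    """Step 2 (cleansing): resolve spacing and letter-case inconsistencies."""
    cleaned = dict(row)
    cleaned["customer_name"] = " ".join(row["customer_name"].split()).title()
    cleaned["country"] = row["country"].strip().upper()
    return cleaned


def transform(row):
    """Step 3 (transforming): convert the amount from text to a number."""
    converted = dict(row)
    converted["amount"] = round(float(row["amount"]), 2)
    return converted


def run_etl(source_rows, target):
    """Step 4 (loading): append each processed row to the target."""
    for source_row in source_rows:
        target.append(transform(cleanse(apply_mapping(source_row, COLUMN_MAPPING))))
    return target


source = [{"cust_nm": "  john   SMITH ", "ctry": " in ", "amt": "12.50"}]
print(run_etl(source, []))
```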
[Figure: sample data before and after ETL processing]
18. ETL GLOSSARY
Mapping:
Defining relationship between source and target objects.
Cleansing:
The process of resolving inconsistencies in source data.
Transformation:
The process of manipulating data. Any manipulation beyond
copying is a transformation. Examples include aggregating and
integrating data from multiple sources.
Staging Area:
A place where data is processed before entering the warehouse.
19. DIMENSION
Categorizes the data. For example - time, location, etc.
A dimension can have one or more attributes. For example
- day, week and month are attributes of time dimension.
Role of dimensions in data warehousing:
- Slice and dice
- Filter by dimensions
20. TYPES OF DIMENSIONS
• Conformed Dimension - A dimension that is shared across fact tables.
• Junk Dimension - A junk dimension is a convenient grouping of flags
and indicators. For example, payment method, shipping method.
• Degenerate Dimension - A dimension key that has no attributes and
hence does not have its own dimension table. For example,
transaction number, invoice number. Values of such a dimension are
mostly unique within a fact table.
• Role Playing Dimensions - A role playing dimension is a
dimension that plays different roles in fact tables depending on the
context. For example, the Date dimension can be used for the ordered
date, shipment date, and invoice date.
• Slowly Changing Dimensions - Dimensions that have data that
changes slowly, rather than changing on a time-based, regular
schedule.
21. TYPES OF SLOWLY CHANGING DIMENSION
• Type 1 - The Type 1 methodology overwrites old data with new data, and
therefore does not track historical data at all.
• Type 2 - The Type 2 method tracks historical data by creating multiple records
for a given value in dimension table with separate surrogate keys.
• Type 3 - The Type 3 method tracks changes using separate columns. Whereas
Type 2 had unlimited history preservation, Type 3 has limited history
preservation, as it's limited to the number of columns we designate for storing
historical data.
• Type 4 - The Type 4 method is usually referred to as using "history tables",
where one table keeps the current data, and an additional table is used to keep
a record of all changes.
Types 1, 2 and 3 are commonly used.
Some books also talk about Types 0 and 6.
http://en.wikipedia.org/wiki/Slowly_changing_dimension
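The difference between Type 1 and Type 2 is easiest to see side by side. The Python sketch below applies the same change (a customer moving city) with each method. The column names `valid_from`, `valid_to` and `is_current` are a common convention assumed for this example; they are not prescribed by the definitions above.

```python
from datetime import date


def apply_type1(dimension, customer_id, new_city):
    """Type 1: overwrite the attribute in place, so no history survives."""
    for row in dimension:
        if row["customer_id"] == customer_id:
            row["city"] = new_city
    return dimension


def apply_type2(dimension, customer_id, new_city, change_date):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    next_key = max((row["customer_key"] for row in dimension), default=0) + 1
    for row in dimension:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["is_current"] = False
            row["valid_to"] = change_date
    dimension.append({
        "customer_key": next_key,      # new surrogate key
        "customer_id": customer_id,    # natural key stays the same
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })
    return dimension


def new_dimension():
    """Hypothetical customer dimension with a single current row."""
    return [{"customer_key": 1, "customer_id": "C100", "city": "Pune",
             "valid_from": date(2009, 1, 1), "valid_to": None, "is_current": True}]


type1 = apply_type1(new_dimension(), "C100", "Mumbai")
type2 = apply_type2(new_dimension(), "C100", "Mumbai", date(2010, 11, 20))
print(len(type1), len(type2))
```

After the change, the Type 1 table still has one row, while the Type 2 table has two rows for the same customer, each with its own surrogate key.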
22. FACTS
Facts are values that can be examined and analyzed.
For Example - Page Views, Unique Users, Pieces Sold,
Profit.
Fact and measure are synonymous.
Types of facts:
– Additive - Measures that can be added across all
dimensions.
– Non Additive - Measures that cannot be added across
all dimensions.
– Semi Additive - Measures that can be added across
some dimensions but not across others.
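A small Python sketch can illustrate the semi-additive and non-additive cases. The examples chosen here, an account balance and a profit margin, are assumptions for illustration, as are the sample figures and function names.

```python
# Hypothetical daily snapshots: (day, account, balance).
balances = [
    ("2010-11-01", "A", 100.0), ("2010-11-01", "B", 50.0),
    ("2010-11-02", "A", 120.0), ("2010-11-02", "B", 50.0),
]

# Hypothetical sales rows; revenue and profit are additive measures.
sales = [
    {"revenue": 200.0, "profit": 50.0},
    {"revenue": 100.0, "profit": 10.0},
]


def total_balance_on(day):
    """Semi-additive: balances can be summed across accounts for one day."""
    return sum(balance for d, _, balance in balances if d == day)


def average_balance_for(account):
    """Across time a balance is averaged instead, because summing days is meaningless."""
    values = [balance for _, acc, balance in balances if acc == account]
    return sum(values) / len(values)


def overall_margin(rows):
    """Non-additive: a ratio is recomputed from summed components, never summed."""
    return sum(r["profit"] for r in rows) / sum(r["revenue"] for r in rows)


print(total_balance_on("2010-11-02"), average_balance_for("A"), overall_margin(sales))
```

Summing the two per-row margins (0.25 and 0.10) would give 0.35, which is not the margin of the combined sales; recomputing from the additive totals gives the correct figure.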
23. HOW TO STORE DATA?
Facts and Dimensions:
1. Select the business process to model
2. Declare the grain of the business process
3. Choose the dimensions that apply to each fact table row
4. Identify the numeric facts that will populate each fact table
row
24. DIMENSION TABLE
Contains attributes of dimensions e.g. month is an attribute
of Time dimension.
Can also have foreign keys to another dimension table
Usually identified by a unique integer primary key called
a surrogate key
25. FACT TABLE
Contains Facts
Foreign keys to dimension tables
Primary Key: usually composite key of all FKs
26. TYPES OF SCHEMA USED IN DATA WAREHOUSE
Star Schema
Snowflake Schema
Fact Constellation Schema
27. STAR SCHEMA
Multi-dimensional Data
Dimension and Fact Tables
A fact table with pointers to Dimension tables
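A star schema can be tried out with Python's built-in `sqlite3` module. The sketch below builds one fact table that points to two dimension tables and runs a typical query. The table names, column names and sample rows are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- The fact table holds the measure plus foreign keys to each dimension.
CREATE TABLE fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold INTEGER,
    PRIMARY KEY (time_key, product_key)
);
INSERT INTO dim_time VALUES (1, '2010-11-20', '2010-11'), (2, '2010-12-05', '2010-12');
INSERT INTO dim_product VALUES (1, 'Toaster', 'Kitchen Appliances'),
                               (2, 'Kettle', 'Kitchen Appliances');
INSERT INTO fact_sales VALUES (1, 1, 3), (1, 2, 2), (2, 1, 4);
""")

# Join the fact table to its dimensions and aggregate by dimension attributes.
rows = conn.execute("""
SELECT t.month, p.name, SUM(f.units_sold)
FROM fact_sales AS f
JOIN dim_time AS t ON t.time_key = f.time_key
JOIN dim_product AS p ON p.product_key = f.product_key
GROUP BY t.month, p.name
ORDER BY t.month, p.name
""").fetchall()

for row in rows:
    print(row)
```

The composite primary key on the fact table is made up of its foreign keys, and every query follows the same shape: join the facts to the dimensions, then filter or group by dimension attributes.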
29. SNOWFLAKE SCHEMA
An extension of star schema in which the dimension tables
are partly or fully normalized.
Dimension table hierarchies broken down into simpler
tables.
31. FACT CONSTELLATION SCHEMA
• A fact constellation schema allows dimension tables to be
shared between fact tables.
• This Schema is used mainly for the aggregate fact tables,
OR where we want to split a fact table for better
comprehension.
For example, a separate fact table for daily, weekly and
monthly reporting requirement.
32. FACT CONSTELLATION SCHEMA
In this example, the dimension tables for time, item, and location are
shared between both the sales and shipping fact tables.
34. DRILL DOWN
[Figure: grid with Time and Product axes]
Category, e.g. Home Appliances
Sub Category, e.g. Kitchen Appliances
Product, e.g. Toaster
35. ROLL UP
Calendar hierarchy: Year, Quarter, Month, Day
Fiscal hierarchy: Fiscal Year, Fiscal Quarter, Fiscal Month, Fiscal Week, Day
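Rolling up means re-aggregating the same facts at a coarser level of a hierarchy. The Python sketch below rolls daily sales up to month, quarter and year on the calendar hierarchy only; fiscal calendars are not modeled. The sample data and the helpers `level_key` and `roll_up` are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily facts: (day, units sold).
daily_sales = [
    (date(2010, 1, 15), 10),
    (date(2010, 2, 3), 5),
    (date(2010, 4, 20), 8),
    (date(2010, 11, 20), 7),
]


def level_key(day, level):
    """Return the label of the given hierarchy level for a day."""
    if level == "day":
        return day.isoformat()
    if level == "month":
        return f"{day.year}-{day.month:02d}"
    if level == "quarter":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    if level == "year":
        return str(day.year)
    raise ValueError(f"unknown level: {level!r}")


def roll_up(rows, level):
    """Sum units at the requested level of the calendar hierarchy."""
    totals = defaultdict(int)
    for day, units in rows:
        totals[level_key(day, level)] += units
    return dict(totals)


for level in ("month", "quarter", "year"):
    print(level, roll_up(daily_sales, level))
```

Drilling down is the reverse direction: calling `roll_up` with a finer level such as "month" after viewing "year".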
36. SLICE & DICE
[Figure: grid with Time and Product axes, sliced to Product = Toaster across Time]
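As a minimal sketch, slicing and dicing can be written as two filters over a list of fact rows. Here a slice fixes one dimension to a single value (Product = Toaster), and a dice keeps a sub-cube by restricting several dimensions to sets of values. The sample cube, the extra region dimension and the function names are assumptions for this example.

```python
# Hypothetical cube stored as flat rows: three dimensions and one measure.
cube = [
    {"time": "2010-Q1", "product": "Toaster", "region": "USA", "units": 10},
    {"time": "2010-Q1", "product": "Kettle", "region": "USA", "units": 4},
    {"time": "2010-Q2", "product": "Toaster", "region": "UK", "units": 6},
    {"time": "2010-Q2", "product": "Kettle", "region": "INDIA", "units": 3},
]


def slice_cube(rows, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [row for row in rows if row[dimension] == value]


def dice_cube(rows, **allowed):
    """Dice: keep rows whose value in each named dimension is in the allowed set."""
    return [
        row for row in rows
        if all(row[dimension] in values for dimension, values in allowed.items())
    ]


toaster_slice = slice_cube(cube, "product", "Toaster")
sub_cube = dice_cube(cube, time={"2010-Q1", "2010-Q2"}, region={"USA", "UK"})
print(len(toaster_slice), len(sub_cube))
```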
37. PIVOTING
• Also called rotation
• Rotate on an axis
• Interchange rows and columns
[Figure: a grid with Time and Product axes pivoted to Region and Product axes]
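Interchanging rows and columns can be sketched in a few lines of Python. The nested dictionary layout (row label, then column label, then value) and the sample figures are assumptions for this example.

```python
def pivot(table):
    """Interchange rows and columns of a nested dict {row: {column: value}}."""
    rotated = {}
    for row_label, cells in table.items():
        for column_label, value in cells.items():
            rotated.setdefault(column_label, {})[row_label] = value
    return rotated


# Hypothetical report: time periods as rows, products as columns.
by_time = {
    "2010-Q1": {"Toaster": 10, "Kettle": 4},
    "2010-Q2": {"Toaster": 6, "Kettle": 3},
}

by_product = pivot(by_time)
print(by_product)
```

Pivoting twice returns the original layout, which is a quick way to check that no cell was lost in the rotation.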
38. ADVANTAGES OF DATA WAREHOUSE
• One consistent data store for reporting, forecasting, and
analysis
• Easier and timely access to data
• Scalability
• Trend analysis and detection
• Drill down analysis
39. DISADVANTAGES OF DATA WAREHOUSE
• Preparation may be time consuming.
• High associated cost
40. CASE STUDY: WHY DATA WAREHOUSE
• G2G Courier Pvt. Ltd. is an established brand in the courier
industry. It has its own network in the main cities and has
subcontracted rural areas across the country to
various partners.
• The President of the company wants to look deep into the
financial health of the company and different performance
aspects.
41. CHALLENGES
• Apart from G2G's own transaction system, each partner has
its own system, which makes the data very heterogeneous.
• The granularity of data also differs between systems, e.g.
minute accuracy versus day accuracy.
• To analyze metrics like revenue and timely delivery
across geographical locations and partners, we need
a unified system.
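One way to handle the granularity mismatch is to bring every source to the coarsest common grain before combining them. The Python sketch below truncates minute-level timestamps to days so they can be added to day-level partner data. The sample records and function names are assumptions for illustration, not G2G's actual systems.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical feed with minute accuracy: (timestamp, deliveries).
g2g_events = [
    (datetime(2010, 11, 20, 9, 15), 3),
    (datetime(2010, 11, 20, 17, 40), 2),
    (datetime(2010, 11, 21, 8, 5), 4),
]

# Hypothetical partner feed with day accuracy: (day, deliveries).
partner_events = [
    (date(2010, 11, 20), 6),
    (date(2010, 11, 21), 1),
]


def to_day(stamp):
    """Reduce a timestamp to day grain; datetime is checked first because it subclasses date."""
    return stamp.date() if isinstance(stamp, datetime) else stamp


def deliveries_per_day(*sources):
    """Combine any number of sources at the common day grain."""
    totals = defaultdict(int)
    for source in sources:
        for stamp, deliveries in source:
            totals[to_day(stamp)] += deliveries
    return dict(totals)


unified = deliveries_per_day(g2g_events, partner_events)
print(unified)
```

Choosing the day grain loses the minute detail from G2G's own system, so that detail would have to be kept separately if it is still needed for other analysis.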