Accenture Ab Initio Training 1
Introduction to
Ab Initio
Prepared By : Ashok Chanda
Ab Initio Session 1
• Introduction to DWH
• Explanation of DW Architecture
• Operating System / Hardware Support
• Introduction to ETL Process
• Introduction to Ab Initio
• Explanation of Ab Initio Architecture
What Is a Data Warehouse?
 A data warehouse is a copy of transaction data
specifically structured for querying and
reporting.
 A data warehouse is a subject-oriented,
integrated, time-variant and non-volatile
collection of data in support of management's
decision making process.
 A data warehouse is a central repository for all
or significant parts of the data that an
enterprise's various business systems collect.
Data Warehouse: Definitions
• A data warehouse is a database geared towards the business intelligence requirements of an organization. The data warehouse integrates data from the various operational systems and is typically loaded from these systems at regular intervals. Data warehouses contain historical information that enables analysis of business performance over time.
• A data warehouse can also be described as a collection of databases combined with a flexible data extraction system.
Data Warehouse
• A data warehouse can be normalized or denormalized. It can be a relational database, multidimensional database, flat file, hierarchical database, object database, etc. Data warehouse data often gets changed, and data warehouses often focus on a specific activity or entity.
Why Use a Data Warehouse?
 Data Exploration and Discovery
 Integrated and Consistent data
 Quality assured data
 Easily accessible data
 Production and performance awareness
 Access to data in a timely manner
Simplified Data Warehouse Architecture
Data Warehouse Architecture
 Data Warehouses can be architected in many different
ways, depending on the specific needs of a
business. The model shown below is the "hub-and-
spokes" Data Warehousing architecture that is popular in
many organizations.
 In short, data is moved from databases used in
operational systems into a data warehouse staging area,
then into a data warehouse and finally into a set of
conformed data marts. Data is copied from one
database to another using a technology called ETL
(Extract, Transform, Load).
The ETL Process
 Capture
 Scrub or Data cleansing
 Transform
 Load and Index
ETL Technology
 ETL Technology is an important component of the Data
Warehousing Architecture. It is used to copy data from
Operational Applications to the Data Warehouse Staging
Area, from the DW Staging Area into the Data
Warehouse and finally from the Data Warehouse into a
set of conformed Data Marts that are accessible by
decision makers.
 The ETL software extracts data, transforms values of
inconsistent data, cleanses "bad" data, filters data and
loads data into a target database. The scheduling of
ETL jobs is critical. Should there be a failure in one ETL
job, the remaining ETL jobs must respond appropriately.
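The flow described above (extract, cleanse, transform, load, with later jobs reacting to an earlier failure) can be sketched in Python. This is only an illustration, not how any particular ETL tool works: the function names `extract`, `transform`, `load` and `run_jobs`, the sample field names, and the rule that a job is skipped when its upstream job did not succeed are all assumptions made for the example.

```python
def extract(source_rows):
    """Extract: copy source rows unchanged into a staging list."""
    return [dict(row) for row in source_rows]


def transform(staged_rows):
    """Transform: make inconsistent values consistent and drop "bad" rows."""
    clean = []
    for row in staged_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleanse: skip rows whose amount is not numeric
        clean.append({"region": row["region"].strip().upper(), "amount": amount})
    return clean


def load(clean_rows, target):
    """Load: add amounts to a target table (a dict keyed by region)."""
    for row in clean_rows:
        target[row["region"]] = target.get(row["region"], 0.0) + row["amount"]
    return target


def run_jobs(jobs):
    """Run (name, depends_on, action) jobs in order.

    A job is skipped when the job it depends on did not finish with "ok",
    which is one simple way for remaining jobs to respond to a failure.
    """
    status = {}
    for name, depends_on, action in jobs:
        if depends_on is not None and status.get(depends_on) != "ok":
            status[name] = "skipped"
            continue
        try:
            action()
            status[name] = "ok"
        except Exception:
            status[name] = "failed"
    return status
```

Calling `load(transform(extract(rows)), {})` runs the three steps in sequence. `run_jobs` shows the scheduling side: if the staging job fails, the warehouse and data mart jobs are skipped rather than run on incomplete data.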

Data Warehouse Staging Area
• The Data Warehouse Staging Area is a temporary location where data from source systems is copied. A staging area is mainly required in a Data Warehousing Architecture for timing reasons. In short, all required data must be available before data can be integrated into the Data Warehouse.
• Due to varying business cycles, data processing cycles, hardware and network resource limitations, and geographical factors, it is not feasible to extract all the data from all operational databases at exactly the same time.
Examples: Staging Area
• For example, it might be reasonable to extract sales data on a daily basis; however, daily extracts might not be suitable for financial data that requires a month-end reconciliation process. Similarly, it might be feasible to extract "customer" data from a database in Singapore at noon Eastern Standard Time, but this would not be feasible for "customer" data in a Chicago database.
• Data in the Data Warehouse can be either persistent (i.e. remains around for a long period) or transient (i.e. only remains around temporarily).
• Not all businesses require a Data Warehouse Staging Area. For many businesses it is feasible to use ETL to copy data directly from operational databases into the Data Warehouse.
Data Warehouse
• The purpose of the Data Warehouse in the overall Data Warehousing Architecture is to integrate corporate data. It contains the "single version of truth" for the organization, carefully constructed from data stored in disparate internal and external operational databases.
• The amount of data in the Data Warehouse is massive. Data is stored at a very granular level of detail. For example, every "sale" that has ever occurred in the organization is recorded and related to dimensions of interest. This allows data to be sliced and diced, summed and grouped in many different ways.
Data Warehouse
• Contrary to popular opinion, the Data Warehouse does not contain all the data in the organization. Its purpose is to provide key business metrics that are needed by the organization for strategic and tactical decision making.
• Decision makers don't access the Data Warehouse directly. Access is provided through various front-end Data Warehouse Tools that read data from subject-specific Data Marts.
• The Data Warehouse can be either "relational" or "dimensional". This depends on how the business intends to use the information.
Data Warehouse Environment
In addition to a
relational/multidimensional database, a
data warehouse environment often
consists of an ETL solution, an OLAP
engine, client analysis tools, and other
applications that manage the process of
gathering data and delivering it to
business users.
Data Mart
 A subset of a data warehouse, for use by a
single department or function.
 A repository of data gathered from operational
data and other sources that is designed to serve
a particular community of knowledge workers.
 A subset of the information contained in a data
warehouse.
 Data marts have the same definition as the data
warehouse (see below), but data marts have a
more limited audience and/or data content.
Data Mart
• ETL (Extract, Transform, Load) jobs extract data from the Data Warehouse and populate one or more Data Marts for use by groups of decision makers in the organization. The Data Marts can be dimensional (star schemas) or relational, depending on how the information is to be used and what "front end" Data Warehousing Tools will be used to present the information.
• Each Data Mart can contain different combinations of tables, columns and rows from the Enterprise Data Warehouse. For example, a business unit or user group that doesn't require a lot of historical data might only need transactions from the current calendar year in the database. The Personnel Department might need to see all details about employees, whereas data such as "salary" or "home address" might not be appropriate for a Data Mart that focuses on Sales.
Star Schema
 The star schema is perhaps the simplest data
warehouse schema.
 It is called a star schema because the entity-
relationship diagram of this schema resembles a
star, with points radiating from a central table.
 The center of the star consists of a large fact
table and the points of the star are the
dimension tables.
Star Schema – continued
 A star schema is characterized by one or
more very large fact tables that contain
the primary information in the data
warehouse, and a number of much
smaller dimension tables (or lookup
tables), each of which contains
information about the entries for a
particular attribute in the fact table.
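A small star schema can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_date` and so on) and the sample rows are hypothetical, chosen only to show one fact table joined to its dimension tables in a typical star query.

```python
import sqlite3


def build_star_schema():
    """Create an in-memory star schema: one fact table, two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT
        );
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER
        );
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            units INTEGER,
            revenue REAL
        );
        INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware');
        INSERT INTO dim_product VALUES (2, 'Manual', 'Books');
        INSERT INTO dim_date VALUES (10, 2023, 1);
        INSERT INTO dim_date VALUES (11, 2023, 2);
        INSERT INTO fact_sales VALUES (1, 10, 3, 30.0);
        INSERT INTO fact_sales VALUES (1, 11, 2, 20.0);
        INSERT INTO fact_sales VALUES (2, 10, 1, 15.0);
    """)
    return conn


def revenue_by_category(conn):
    """A star query: join the fact table to one dimension and aggregate."""
    query = """
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """
    return conn.execute(query).fetchall()
```

Each dimension is one join away from the fact table, which is what keeps star queries simple.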
Advantages of Star Schemas
 Provide a direct and intuitive mapping between
the business entities being analyzed by end
users and the schema design.
 Provide highly optimized performance for typical
star queries.
 Are widely supported by a large number of
business intelligence tools, which may anticipate
or even require that the data-warehouse schema
contain dimension tables
 Star schemas are used for both simple data
marts and very large data warehouses.
Star schema
 Diagrammatic representation of star
schema
Snowflake Schema
 The snowflake schema is a more complex
data warehouse model than a star
schema, and is a type of star schema.
 It is called a snowflake schema because
the diagram of the schema resembles a
snowflake.
 Snowflake schemas normalize dimensions
to eliminate redundancy.
Snowflake Schema - Example
 That is, the dimension data has been grouped
into multiple tables instead of one large table.
For example, a product dimension table in a star
schema might be normalized into a products
table, a product_category table, and a
product_manufacturer table in a snowflake
schema. While this saves space, it increases the
number of dimension tables and requires more
foreign key joins. The result is more complex
queries and reduced query performance.
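The extra joins can be seen in a short `sqlite3` sketch. The table names `products`, `product_category` and `product_manufacturer` follow the example above; the `fact_sales` table, the column names and the sample rows are assumptions added for illustration.

```python
import sqlite3


def build_snowflake():
    """Create a product dimension normalized into three tables, plus a fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product_category (
            category_id INTEGER PRIMARY KEY, category_name TEXT
        );
        CREATE TABLE product_manufacturer (
            manufacturer_id INTEGER PRIMARY KEY, manufacturer_name TEXT
        );
        CREATE TABLE products (
            product_id INTEGER PRIMARY KEY,
            product_name TEXT,
            category_id INTEGER REFERENCES product_category(category_id),
            manufacturer_id INTEGER REFERENCES product_manufacturer(manufacturer_id)
        );
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES products(product_id),
            revenue REAL
        );
        INSERT INTO product_category VALUES (1, 'Hardware');
        INSERT INTO product_category VALUES (2, 'Books');
        INSERT INTO product_manufacturer VALUES (1, 'Acme');
        INSERT INTO products VALUES (10, 'Widget', 1, 1);
        INSERT INTO products VALUES (11, 'Manual', 2, 1);
        INSERT INTO fact_sales VALUES (10, 30.0);
        INSERT INTO fact_sales VALUES (10, 20.0);
        INSERT INTO fact_sales VALUES (11, 15.0);
    """)
    return conn


def revenue_by_category_and_manufacturer(conn):
    """Reaching category and manufacturer names now takes three joins."""
    query = """
        SELECT c.category_name, m.manufacturer_name, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN products AS p ON p.product_id = f.product_id
        JOIN product_category AS c ON c.category_id = p.category_id
        JOIN product_manufacturer AS m ON m.manufacturer_id = p.manufacturer_id
        GROUP BY c.category_name, m.manufacturer_name
        ORDER BY c.category_name
    """
    return conn.execute(query).fetchall()
```

In a star schema the category and manufacturer names would sit directly in a single product dimension table, so the same report would need only one join.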
Diagrammatic representation
for Snowflake Schema
Fact Table
The central table in a star schema is called the fact table. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.
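A composite primary key built from the foreign keys can be sketched as follows. The tables `dim_product`, `dim_store` and `fact_sales` and the helper `try_insert` are hypothetical names used only for this example.

```python
import sqlite3


def build_fact_table():
    """Create a fact table whose primary key is the pair of its foreign keys."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE fact_sales (
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            store_id INTEGER NOT NULL REFERENCES dim_store(store_id),
            units_sold INTEGER NOT NULL,           -- the fact column
            PRIMARY KEY (product_id, store_id)     -- composite key of foreign keys
        );
        INSERT INTO dim_product VALUES (1, 'Widget');
        INSERT INTO dim_store VALUES (7, 'Chicago');
        INSERT INTO dim_store VALUES (8, 'Singapore');
        INSERT INTO fact_sales VALUES (1, 7, 5);
    """)
    return conn


def try_insert(conn, product_id, store_id, units_sold):
    """Return True if the row was inserted, False if the key already exists."""
    try:
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?)",
            (product_id, store_id, units_sold),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

A second row for the same product and store is rejected, while the same product in a different store is accepted, because uniqueness is defined by the combination of foreign keys.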
What happens during the ETL process?
• During extraction, the desired data is identified and extracted from many different sources, including database systems and applications. Depending on the source system's capabilities (for example, operating system resources), some transformations may take place during this extraction process. The size of the extracted data varies from hundreds of kilobytes up to gigabytes, depending on the source system and the business situation. After extraction, the data has to be physically transported to the target system or an intermediate system for further processing.
Examples of Second-Generation ETL Tools
• PowerMart 4.5 – Informatica Corporation
  - Pioneer due to market share
• Ardent DataStage – Ardent Software, Inc.
  - General-purpose tool oriented to data marts
• Sagent Data Mart Solution 3.0 – Sagent Technology
  - Progressively integrated with Microsoft
• Ab Initio 2.2 – Ab Initio Software
  - A kit of tools that can be used to build applications
• Tapestry 2.1 – D2K, Inc.
  - End-to-end data warehousing solution from a single vendor
What to look for in ETL tools
 Use optional data cleansing tool to clean-up source data
 Use extraction/transformation/load tool to retrieve,
cleanse, transform, summarize, aggregate, and load data
 Use modern, engine-driven technology for fast, parallel
operation
 Goal: define 100% of the transform rule with point and
click interface
 Support development of logical and physical data models
 Generate and manage central metadata repository
 Open metadata exchange architecture to integrate central
metadata with local metadata.
 Support metadata standards
 Provide end users access to metadata in business terms
Operating System / Hardware
Support
 This section discusses how a DBMS utilizes
OS/hardware features such as parallel
functionality, SMP/MPP support, and
clustering. These OS/hardware features
greatly extend the scalability and improve
performance. However, managing an
environment with these features is difficult
and expensive.
Parallel Functionality
 The introduction and maturation of parallel
processing environments are key enablers of
increasing database sizes, as well as providing
acceptable response times for storing, retrieving,
and administrating data. DBMS vendors are
continually bringing products to market that take
advantage of multi-processor hardware
platforms. These products can perform table
scans, backups, loads, and queries in parallel.
Parallel Features
An overview of typical parallel functionality is given below:
• Queries: Parallel queries can enhance scalability for many query operations.
• Data load: Performance is always a serious issue when loading large databases. Meeting response time requirements is the overriding factor for determining the best load method and should be a key part of a performance benchmark.
• Create table as select: This feature makes it possible to create aggregated tables in parallel.
• Index creation: Parallel index creation exploits the benefits of parallel hardware by distributing the workload generated by a large index creation across a large number of processors.
Which parallel processor configuration, SMP or MPP?
• SMP and clustered SMP environments have the flexibility and ability to scale in small increments.
• SMP environments are often useful for the large but static data warehouse, where the data cannot be easily partitioned due to the unpredictable nature of how the data is joined over multiple tables for complex searches and ad-hoc queries.
Which parallel processor configuration, SMP or MPP?
 MPP works well in environments where growth is potentially
unlimited, access patterns to the database are predictable, and the
data can be easily partitioned across different MPP nodes with
minimal data accesses crossing between them. This often occurs in
large OLTP environments, where transactions are generally small
and predictable, as opposed to decision support and data
warehouse environments, where multiple tables can be joined in
unpredictable ways.
 In fact, data warehousing and decision support are the areas most
vendors of parallel hardware platforms and DBMSs are targeting.
 MPP does not scale well if heavy data warehouse database accesses
must cross MPP nodes, causing I/O bottlenecks over the MPP
interconnect, or if multiple MPP nodes are continually locked for
concurrent record updates.
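The idea of partitioning data so that work rarely crosses nodes can be illustrated with a small Python sketch. It assumes a simple hash partitioning scheme: the function names `partition_by_key` and `local_totals`, the use of `zlib.crc32` as the hash, and the sample record layout are all choices made for this example, not features of any specific MPP product.

```python
import zlib
from collections import defaultdict


def partition_by_key(records, key, num_nodes):
    """Assign each record to a node based on a stable hash of its key value.

    All records with the same key value land on the same node, so a per-key
    aggregation never needs data from another node.
    """
    partitions = [[] for _ in range(num_nodes)]
    for record in records:
        node = zlib.crc32(str(record[key]).encode("utf-8")) % num_nodes
        partitions[node].append(record)
    return partitions


def local_totals(partition, key, value):
    """Aggregate one node's partition using only that node's records."""
    totals = defaultdict(float)
    for record in partition:
        totals[record[key]] += record[value]
    return dict(totals)
```

When the partitioning key matches the access pattern, each node can compute its totals independently and the results are simply combined. A query that joins on some other column would force records to move between nodes, which is the interconnect bottleneck described above.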
A Multi-CPU Computer (SMP)
A Network of Multi-CPU Nodes
A Network of Networks
Parallel Computer Architecture
• Computers come in many “shapes and sizes”:
  - Single-CPU, Multi-CPU
  - Network of single-CPU computers
  - Network of multi-CPU computers
• Multi-CPU machines are often called SMPs (for Symmetric Multi Processors).
• Specially-built networks of machines are often called MPPs (for Massively Parallel Processors).
Introduction to Ab
Initio
History of Ab Initio
• Ab Initio Software Corporation was founded in the mid-1990s by Sheryl Handler, the former CEO of Thinking Machines Corporation (TMC), after TMC filed for bankruptcy. In addition to Handler, other former TMC people involved in the founding of Ab Initio included Cliff Lasser, Angela Lordi, and Craig Stanfill.
• Ab Initio is known for being very secretive in the way it runs its business, but its software is widely regarded as top notch.
History of Ab Initio
• The Ab Initio software is a fourth-generation, graphical user interface (GUI)-based parallel processing tool for data analysis, batch processing, and data manipulation that is used mainly to extract, transform and load data.
• The Ab Initio software is a suite of products that together provide a platform for robust data processing applications. The core Ab Initio products are:
  - The Co>Operating System
  - The Component Library
  - The Graphical Development Environment
What Does “Ab Initio” Mean?
 Ab Initio is Latin for “From the Beginning.”
 From the beginning our software was designed to
support a complete range of business applications, from
simple to the most complex. Crucial capabilities like
parallelism and checkpointing can’t be added after the
fact.
 The Graphical Development Environment and a powerful
set of components allow our customers to get valuable
results from the beginning.
Ab Initio’s focus
 “Moving Data”
 move small and large volumes of data in an
efficient manner
 deal with the complexity associated with business
data
 High Performance
 scalable solutions
 Better productivity
Ab Initio’s Software
 Ab Initio software is a general-purpose
data processing platform for mission-
critical applications such as:
 Data warehousing
 Batch processing
 Click-stream analysis
 Data movement
 Data transformation
Applications of Ab Initio
Software
 Processing just about any form and volume of data.
 Parallel sort/merge processing.
 Data transformation.
 Rehosting of corporate data.
 Parallel execution of existing applications.
Ab Initio Provides For:
 Distribution - a platform for applications to
execute across a collection of processors within
the confines of a single machine or across
multiple machines.
 Reduced Run Time Complexity - the ability for
applications to run in parallel on any
combination of computers where the Ab Initio
Co>Operating System is installed from a single
point of control.
Applications of Ab Initio
Software in terms of Data
Warehouse
 Front end of Data Warehouse:
 Transformation of disparate sources
 Aggregation and other preprocessing
 Referential integrity checking
 Database loading
 Back end of Data Warehouse:
 Extraction for external processing
 Aggregation and loading of Data Marts
Ab Initio or Informatica: Powerful ETL
• Parallelism: Informatica and Ab Initio both support parallelism, but Informatica supports only one type while Ab Initio supports three. In Informatica, the developer needs to define partitions in the Server Manager to achieve parallelism. In Ab Initio, the tool itself takes care of parallelism, in three forms: component parallelism, data parallelism, and pipeline parallelism.
• Scheduling: Unlike Informatica, Ab Initio has no scheduler; you need to schedule jobs through a script or run them manually.
• File handling: Ab Initio supports different types of text files, meaning you can read the same file with different structures, which is not possible in Informatica. Ab Initio is also more user friendly than Informatica.
• Administration: Ab Initio doesn't need a dedicated administrator; a UNIX or NT admin will suffice, whereas other ETL tools do involve administrative work.
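The three kinds of parallelism named above can be illustrated with plain Python threads. This is a conceptual sketch only: it does not show how Ab Initio implements them, the function names are hypothetical, and Python threads here demonstrate the structure of each form rather than real speed-ups.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of a record stream


def clean(record):
    """A stand-in processing step: trim and lowercase a text record."""
    return record.strip().lower()


def data_parallel(records, workers=2):
    """Data parallelism: the same step runs on separate partitions at once."""
    partitions = [records[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda part: [clean(r) for r in part], partitions))
    return sorted(r for part in results for r in part)


def component_parallel(customers, orders):
    """Component parallelism: different steps run at once on different data."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        cleaned = pool.submit(lambda: [clean(c) for c in customers])
        total = pool.submit(sum, orders)
        return cleaned.result(), total.result()


def pipeline_parallel(records):
    """Pipeline parallelism: the downstream step consumes records while the
    upstream step is still producing them."""
    channel = queue.Queue(maxsize=1)
    output = []

    def producer():
        for record in records:
            channel.put(clean(record))
        channel.put(_DONE)

    def consumer():
        while True:
            item = channel.get()
            if item is _DONE:
                break
            output.append(item.upper())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return output
```

In the pipeline case the queue holds at most one record, so the consumer has to start working before the producer has finished, which is the defining property of pipeline parallelism.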
Ab Initio or Informatica: Powerful ETL – continued
• Error handling: In Ab Initio you can attach error and reject files to each transformation and capture and analyze the messages and data separately. Informatica has one huge log, which is very inefficient when working on a large process with numerous points of failure.
• Robust transformation language: Informatica is very basic as far as transformations go. Without going into a function-by-function comparison, Ab Initio appears to be much more robust.
• Instant feedback: On execution, Ab Initio tells you how many records have been processed, rejected, etc., and gives detailed performance metrics for each component. Informatica has a debug mode, but it is slow and difficult to adapt to.
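The idea of per-transformation reject and error outputs, together with record counts, can be sketched in Python. This is an analogy rather than Ab Initio's actual mechanism: `run_transformation` and `parse_amount` are hypothetical names, and in-memory lists stand in for the reject and error files.

```python
def run_transformation(records, transform):
    """Apply `transform` to each record, keeping failures separate.

    Returns the good output, the rejected input records, the matching error
    messages, and simple counts that can be reported after the run.
    """
    output, rejects, errors = [], [], []
    for record in records:
        try:
            output.append(transform(record))
        except Exception as exc:
            rejects.append(record)  # the data that failed
            errors.append(f"{type(exc).__name__}: {exc}")  # why it failed
    counts = {
        "read": len(records),
        "written": len(output),
        "rejected": len(rejects),
    }
    return output, rejects, errors, counts


def parse_amount(record):
    """Example transformation: convert the amount field to a number."""
    return {"id": record["id"], "amount": float(record["amount"])}
```

Because rejected data and error messages are captured per transformation, a failure can be traced to one step and one record instead of being searched for in a single large log.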
Both tools are fundamentally different
Which one to use depends on the work at hand and on the existing infrastructure and resources available. Informatica is an engine-based ETL tool: the power of this tool is in its transformation engine, and the code that it generates after development cannot be seen or modified. Ab Initio is a code-based ETL tool: it generates ksh, bat, etc. code, which can be modified to achieve any goals that cannot be taken care of through the ETL tool itself.
Ab Initio Product Architecture
[Architecture diagram; labels recovered from the figure:]
• User Applications
• Development Environments: GDE, Shell
• Component Library, 3rd Party Components, User-defined Components
• The Ab Initio Co>Operating® System
• Native Operating System (Unix, Windows, OS/390)
• Ab Initio EME
Ab Initio Architecture: Explanation
• The Ab Initio Co>Operating System unites a network of computing resources (CPUs, storage disks, programs, datasets) into a production-quality data processing system with scalable performance and mainframe-class reliability.
• The Co>Operating System is layered on top of the native operating systems of the collection of servers. It provides a distributed model for process execution, file management, debugging, process monitoring, and checkpointing. A user may perform all these functions from a single point of control.
Co>Operating System Services
 Parallel and distributed application execution
 Control
 Data Transport
 Transactional semantics at the application level.
 Checkpointing.
 Monitoring and debugging.
 Parallel file management.
 Metadata-driven components.
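Checkpointing, one of the services listed above, can be illustrated with a generic Python sketch. It is not a description of how the Co>Operating System implements checkpoints: the function `run_with_checkpoints`, the JSON checkpoint file, and the notion of named phases are assumptions made for this example.

```python
import json
import os


def run_with_checkpoints(phases, checkpoint_path):
    """Run (name, action) phases in order, recording each completed phase.

    If a phase raises, the checkpoint file keeps the names of the phases that
    already finished, so a later call resumes after the last completed phase
    instead of starting over. Returns the names of phases run in this call.
    """
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "r", encoding="utf-8") as handle:
            done = json.load(handle)

    executed = []
    for name, action in phases:
        if name in done:
            continue  # finished in an earlier run
        action()
        done.append(name)
        executed.append(name)
        with open(checkpoint_path, "w", encoding="utf-8") as handle:
            json.dump(done, handle)

    # Every phase completed, so the checkpoint is no longer needed.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return executed
```

The point of the sketch is that recovery state is written as the job progresses. That is why such a capability is hard to add to a system after the fact.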
Ab Initio: What We Do
 Ab Initio software helps you build large-scale data
processing applications and run them in parallel
environments. Ab Initio software consists of two main
programs:
 Co>Operating System:
which your system administrator installs on a host Unix
or Windows NT server, as well as on processing
computers.
 The Graphical Development Environment (GDE):
which you install on your PC (GDE Computer) and
configure to communicate with the host.
The Ab Initio Co>Operating® System
The Co>Operating System:
• Runs across a variety of operating systems and hardware platforms, including OS/390 on mainframe, Unix, and Windows.
• Supports distributed and parallel execution.
• Can provide scalability proportional to the hardware resources provided.
• Supports platform-independent data transport.
The Ab Initio Co>Operating® System – continued
The Ab Initio Co>Operating System depends on parallelism to connect (i.e., cooperate with) diverse databases. It extracts, transforms and loads data to and from Teradata and other data sources.
[Diagram: GDE clients above the Co>Operating System layer, which sits on top of any OS (Solaris, AIX, NT, Linux, NCR).]
• The same Co>Operating System commands work on any OS.
• Graphs can be moved from one OS to another without any changes.
The Ab Initio Co>Operating System Runs on:
• Sun Solaris
• IBM AIX
• Hewlett-Packard HP-UX
• Siemens Pyramid Reliant UNIX
• IBM DYNIX/ptx
• Silicon Graphics IRIX
• Red Hat Linux
• Windows NT 4.0 (x86)
• Windows NT 2000 (x86)
• Compaq Tru64 UNIX
• IBM OS/390
• NCR MP-RAS
Connectivity to Other Software
• Common, high performance database interfaces:
  - IBM DB2, DB2/PE, DB2 EEE, UDB, IMS
  - Oracle, Informix XPS, Sybase, Teradata, MS SQL Server 7
  - OLE-DB
  - ODBC
• Other software packages:
  - Connectors to many other third party products
  - Trillium, ERwin, Siebel, etc.
Ab Initio Co>Operating System
Ab Initio Software Corporation, headquartered in Lexington, MA, develops software solutions that process vast amounts of data (well into the terabyte range) in a timely fashion by employing many (often hundreds) of server processors in parallel. Major corporations worldwide use Ab Initio software in mission-critical, enterprise-wide data processing systems. Together, Teradata and Ab Initio deliver:
• End-to-end solutions for integrating and processing data throughout the enterprise
• Software that is flexible, efficient, and robust, with unlimited scalability
• Professional and highly responsive support
The Co>Operating System executes your application by creating and managing the processes and data flows that the components and arrows represent.
Graphical Development
Environment GDE
The GDE
The Graphical Development Environment (GDE) provides a graphical user interface to the services of the Co>Operating System. The GDE:
• Enables you to create applications by dragging and dropping components.
• Allows you to perform point-and-click operations on executable flowcharts, which the Co>Operating System can execute directly.
• Provides graphical monitoring of running applications, which allows you to quantify data volumes and execution times and helps spot opportunities for improving performance.
The Graph Model
The Component Library
• The Component Library consists of reusable software modules for sorting, data transformation, database loading, etc. The components adapt at runtime to the record formats and business rules controlling their behavior.
• Ab Initio products have helped reduce a project’s development and research time significantly.
Components
• Components may run on any computer running the Co>Operating System.
• Different components do different jobs.
• The particular work a component accomplishes depends upon its parameter settings.
• Some parameters are data transformations, that is, business rules to be applied to one or more inputs to produce a required output.
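The notion that a component's work depends on its parameters, and that a parameter can itself be a transformation, can be sketched in Python. This is an analogy rather than Ab Initio's component model: `make_component` and the two example rules are hypothetical names.

```python
def make_component(rule):
    """Build a component whose behavior is set entirely by its `rule` parameter."""
    def component(records):
        return [rule(record) for record in records]
    return component


def add_full_name(record):
    """Example business rule: derive full_name from first and last name."""
    return {**record, "full_name": f"{record['first']} {record['last']}"}


def uppercase_last(record):
    """A different rule for the same generic component."""
    return {**record, "last": record["last"].upper()}
```

The same generic code does two different jobs: `make_component(add_full_name)` adds a derived field, while `make_component(uppercase_last)` rewrites an existing one.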
3rd Party Components
EME
• The Enterprise Meta>Environment (EME) is a high-performance object-oriented storage system that inventories and manages various kinds of information associated with Ab Initio applications. It provides storage for all aspects of your data processing system, from design information to operations data.
• The EME also provides a rich store for the applications themselves, including data formats and business rules. It acts as a hub for data and definitions. Integrated metadata management provides a global and consolidated view of the structure and meaning of applications and data, information that is usually scattered throughout your business.
Benefits of EME
The Enterprise Meta>Environment provides a rich store for applications and all of their associated information, including:
• Technical metadata: application-related business rules, record formats and execution statistics.
• Business metadata: user-defined documentation of job functions, roles and responsibilities.
Metadata is data about data and is critical to understanding and driving your business processes and computational resources. Storing and using metadata is as important to your business as storing and using data.
EME: Ab Initio Relevance
• By integrating technical and business metadata, you can grasp the entirety of your data processing, from operational to analytical systems.
• The EME is a completely integrated environment. The following figure shows how it fits into the high-level architecture of Ab Initio software.
Stepwise explanation of Ab Initio Architecture
• You construct your application from building blocks called components, manipulating them through the Graphical Development Environment (GDE).
• You check in your applications to the EME.
• The EME and GDE use the underlying functionality of the Co>Operating System to perform many of their tasks. The Co>Operating System unites the distributed resources into a single “virtual computer” to run applications in parallel.
• Ab Initio software runs on Unix, Windows NT, and MVS operating systems.
Stepwise explanation of Ab Initio Architecture – continued
• Ab Initio connector applications extract metadata from third-party metadata sources into the EME or extract it from the EME into a third-party destination.
• You view the results of project and application dependency analysis through a Web user interface. You also view and edit your business metadata through a Web user interface.
EME: User Constituencies Served
The EME addresses the metadata needs of three different constituencies:
• Business Users
• Developers
• System Administrators
EME: User Constituencies Served – continued
• Business users are interested in exploiting data for analysis, in particular with regard to databases, tables and columns.
• Developers tend to be oriented towards applications, needing to analyze the impact of potential program changes.
• System administrators and production personnel want job status information and run statistics.
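The kind of impact analysis developers need can be illustrated with a small dependency-graph walk in Python. This is a generic sketch, not the EME's own analysis: the function `impacted_by`, the `dependents` mapping and the object names in the usage note are hypothetical.

```python
from collections import deque


def impacted_by(change, dependents):
    """Return every object that depends, directly or indirectly, on `change`.

    `dependents` maps an object name to the names of objects that use it.
    """
    seen = set()
    pending = deque([change])
    while pending:
        current = pending.popleft()
        for downstream in dependents.get(current, []):
            if downstream not in seen:
                seen.add(downstream)
                pending.append(downstream)
    return sorted(seen)
```

For example, if a record format is used by a load job, and that job feeds a data mart build and a daily report, then `impacted_by` on the record format returns all three, which is the list a developer would review before changing it.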
EME Interfaces
We can create and manage the EME through three interfaces:
• GDE
• Web User Interface
• Air Utility
Thank You
End of Session 1

Unlocking the Secrets of Affiliate Marketing.pdfUnlocking the Secrets of Affiliate Marketing.pdf
Unlocking the Secrets of Affiliate Marketing.pdfOnline Income Engine
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyEthan lee
 
Understanding the Pakistan Budgeting Process: Basics and Key Insights
Understanding the Pakistan Budgeting Process: Basics and Key InsightsUnderstanding the Pakistan Budgeting Process: Basics and Key Insights
Understanding the Pakistan Budgeting Process: Basics and Key Insightsseri bangash
 
A DAY IN THE LIFE OF A SALESMAN / WOMAN
A DAY IN THE LIFE OF A  SALESMAN / WOMANA DAY IN THE LIFE OF A  SALESMAN / WOMAN
A DAY IN THE LIFE OF A SALESMAN / WOMANIlamathiKannappan
 
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfPaul Menig
 
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Delhi Call girls
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMRavindra Nath Shukla
 
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptxB.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptxpriyanshujha201
 
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...amitlee9823
 
Call Girls In Holiday Inn Express Gurugram➥99902@11544 ( Best price)100% Genu...
Call Girls In Holiday Inn Express Gurugram➥99902@11544 ( Best price)100% Genu...Call Girls In Holiday Inn Express Gurugram➥99902@11544 ( Best price)100% Genu...
Call Girls In Holiday Inn Express Gurugram➥99902@11544 ( Best price)100% Genu...lizamodels9
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Dave Litwiller
 
Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...Roland Driesen
 
Pharma Works Profile of Karan Communications
Pharma Works Profile of Karan CommunicationsPharma Works Profile of Karan Communications
Pharma Works Profile of Karan Communicationskarancommunications
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Lviv Startup Club
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...anilsa9823
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...lizamodels9
 
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒anilsa9823
 
Creating Low-Code Loan Applications using the Trisotech Mortgage Feature Set
Creating Low-Code Loan Applications using the Trisotech Mortgage Feature SetCreating Low-Code Loan Applications using the Trisotech Mortgage Feature Set
Creating Low-Code Loan Applications using the Trisotech Mortgage Feature SetDenis Gagné
 

Último (20)

M.C Lodges -- Guest House in Jhang.
M.C Lodges --  Guest House in Jhang.M.C Lodges --  Guest House in Jhang.
M.C Lodges -- Guest House in Jhang.
 
Unlocking the Secrets of Affiliate Marketing.pdf
Unlocking the Secrets of Affiliate Marketing.pdfUnlocking the Secrets of Affiliate Marketing.pdf
Unlocking the Secrets of Affiliate Marketing.pdf
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
 
Understanding the Pakistan Budgeting Process: Basics and Key Insights
Understanding the Pakistan Budgeting Process: Basics and Key InsightsUnderstanding the Pakistan Budgeting Process: Basics and Key Insights
Understanding the Pakistan Budgeting Process: Basics and Key Insights
 
A DAY IN THE LIFE OF A SALESMAN / WOMAN
A DAY IN THE LIFE OF A  SALESMAN / WOMANA DAY IN THE LIFE OF A  SALESMAN / WOMAN
A DAY IN THE LIFE OF A SALESMAN / WOMAN
 
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdf
 
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSM
 
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptxB.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
 
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
 
Call Girls In Holiday Inn Express Gurugram➥99902@11544 ( Best price)100% Genu...
Call Girls In Holiday Inn Express Gurugram➥99902@11544 ( Best price)100% Genu...Call Girls In Holiday Inn Express Gurugram➥99902@11544 ( Best price)100% Genu...
Call Girls In Holiday Inn Express Gurugram➥99902@11544 ( Best price)100% Genu...
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
 
Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...
 
Pharma Works Profile of Karan Communications
Pharma Works Profile of Karan CommunicationsPharma Works Profile of Karan Communications
Pharma Works Profile of Karan Communications
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
 
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
 
Creating Low-Code Loan Applications using the Trisotech Mortgage Feature Set
Creating Low-Code Loan Applications using the Trisotech Mortgage Feature SetCreating Low-Code Loan Applications using the Trisotech Mortgage Feature Set
Creating Low-Code Loan Applications using the Trisotech Mortgage Feature Set
 

DataWarehousingandAbInitioConcepts.ppt

  • 1. Accenture Ab Initio Training 1 Introduction to Ab Initio Prepared By : Ashok Chanda
  • 2. Accenture Ab Initio Training 2 Ab initio Session 1  Introduction to DWH  Explanation of DW Architecture  Operating System / Hardware Support  Introduction to ETL Process  Introduction to Ab Initio  Explanation of Ab Initio Architecture
  • 3. Accenture Ab Initio Training 3 What is Data Warehouse  A data warehouse is a copy of transaction data specifically structured for querying and reporting.  A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.  A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect.
  • 4. Accenture Ab Initio Training 4 Data Warehouse-Definitions  A data warehouse is a database geared towards the business intelligence requirements of an organization. The data warehouse integrates data from the various operational systems and is typically loaded from these systems at regular intervals. Data warehouses contain historical information that enables analysis of business performance over time. A collection of databases combined with a flexible data extraction system.
  • 5. Accenture Ab Initio Training 5 Data Warehouse  A data warehouse can be normalized or denormalized. It can be a relational database, multidimensional database, flat file, hierarchical database, object database, etc. Data warehouse data often gets changed. And data warehouses often focus on a specific activity or entity.
  • 6. Accenture Ab Initio Training 6 Why Use a Data Warehouse?  Data Exploration and Discovery  Integrated and Consistent data  Quality assured data  Easily accessible data  Production and performance awareness  Access to data in a timely manner
  • 7. Accenture Ab Initio Training 7 Simplified Datawarehouse Architecture
  • 8. Accenture Ab Initio Training 8 Data warehouse Architecture  Data Warehouses can be architected in many different ways, depending on the specific needs of a business. The model shown below is the "hub-and-spokes" Data Warehousing architecture that is popular in many organizations.  In short, data is moved from databases used in operational systems into a data warehouse staging area, then into a data warehouse and finally into a set of conformed data marts. Data is copied from one database to another using a technology called ETL (Extract, Transform, Load).
  • 9. Accenture Ab Initio Training 9
  • 10. Accenture Ab Initio Training 10 The ETL Process  Capture  Scrub or Data cleansing  Transform  Load and Index
  • 11. Accenture Ab Initio Training 11 ETL Technology  ETL Technology is an important component of the Data Warehousing Architecture. It is used to copy data from Operational Applications to the Data Warehouse Staging Area, from the DW Staging Area into the Data Warehouse and finally from the Data Warehouse into a set of conformed Data Marts that are accessible by decision makers.  The ETL software extracts data, transforms values of inconsistent data, cleanses "bad" data, filters data and loads data into a target database. The scheduling of ETL jobs is critical. Should there be a failure in one ETL job, the remaining ETL jobs must respond appropriately. 
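The capture, scrub, transform and load steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular ETL product works: the source rows, the `COUNTRY_CODES` mapping, the `sales` table and all function names are assumptions made up for the example, and an in-memory SQLite database stands in for the target.

```python
import sqlite3

# Hypothetical source rows standing in for an operational system extract.
SOURCE_ROWS = [
    {"id": "1", "country": "usa", "amount": "10.50"},
    {"id": "2", "country": "U.S.A.", "amount": "bad"},  # unparseable amount
    {"id": "3", "country": "Singapore", "amount": "7"},
]

# Hypothetical mapping used to make inconsistent values consistent.
COUNTRY_CODES = {"usa": "US", "u.s.a.": "US", "singapore": "SG"}


def extract():
    """Capture: return the raw rows from the source."""
    return list(SOURCE_ROWS)


def cleanse(rows):
    """Scrub: separate rows with a valid amount from rejected rows."""
    clean, rejects = [], []
    for row in rows:
        try:
            float(row["amount"])
            clean.append(row)
        except ValueError:
            rejects.append(row)
    return clean, rejects


def transform(rows):
    """Transform: convert types and standardize the country values."""
    return [
        (int(r["id"]), COUNTRY_CODES[r["country"].lower()], float(r["amount"]))
        for r in rows
    ]


def load(conn, rows):
    """Load and index: insert rows into the target table, then index it."""
    conn.execute("CREATE TABLE sales (id INTEGER, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_sales_country ON sales (country)")
    conn.commit()


def run_etl(conn):
    """Run the steps in order and return the rejected rows for inspection."""
    clean, rejects = cleanse(extract())
    load(conn, transform(clean))
    return rejects
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads the two valid rows and returns the one rejected row, so bad data is set aside rather than silently dropped.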
  • 12. Accenture Ab Initio Training 12 Data Warehouse Staging Area  The Data Warehouse Staging Area is a temporary location where data from source systems is copied. A staging area is mainly required in a Data Warehousing Architecture for timing reasons. In short, all required data must be available before data can be integrated into the Data Warehouse.  Due to varying business cycles, data processing cycles, hardware and network resource limitations and geographical factors, it is not feasible to extract all the data from all Operational databases at exactly the same time.
  • 13. Accenture Ab Initio Training 13 Examples- Staging Area  For example, it might be reasonable to extract sales data on a daily basis; however, daily extracts might not be suitable for financial data that requires a month-end reconciliation process. Similarly, it might be feasible to extract "customer" data from a database in Singapore at noon eastern standard time, but this would not be feasible for "customer" data in a Chicago database.  Data in the Data Warehouse can be either persistent (i.e. remains around for a long period) or transient (i.e. only remains around temporarily).  Not all businesses require a Data Warehouse Staging Area. For many businesses it is feasible to use ETL to copy data directly from operational databases into the Data Warehouse.
  • 14. Accenture Ab Initio Training 14 Data warehouse  The purpose of the Data Warehouse in the overall Data Warehousing Architecture is to integrate corporate data. It contains the "single version of truth" for the organization that has been carefully constructed from data stored in disparate internal and external operational databases.  The amount of data in the Data Warehouse is massive. Data is stored at a very granular level of detail. For example, every "sale" that has ever occurred in the organization is recorded and related to dimensions of interest. This allows data to be sliced and diced, summed and grouped in unimaginable ways.
  • 15. Accenture Ab Initio Training 15 Data Warehouse  Contrary to popular opinion, the Data Warehouse does not contain all the data in the organization. Its purpose is to provide key business metrics that are needed by the organization for strategic and tactical decision making.  Decision makers don't access the Data Warehouse directly. This is done through various front-end Data Warehouse Tools that read data from subject-specific Data Marts.  The Data Warehouse can be either "relational" or "dimensional". This depends on how the business intends to use the information.
  • 16. Accenture Ab Initio Training 16 Data Warehouse Environment In addition to a relational/multidimensional database, a data warehouse environment often consists of an ETL solution, an OLAP engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.
  • 17. Accenture Ab Initio Training 17 Data Mart  A subset of a data warehouse, for use by a single department or function.  A repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers.  A subset of the information contained in a data warehouse.  Data marts have the same definition as the data warehouse (see below), but data marts have a more limited audience and/or data content.
  • 18. Accenture Ab Initio Training 18 Data Mart  ETL (Extract Transform Load) jobs extract data from the Data Warehouse and populate one or more Data Marts for use by groups of decision makers in the organization. The Data Marts can be Dimensional (Star Schemas) or relational, depending on how the information is to be used and what "front end" Data Warehousing Tools will be used to present the information.  Each Data Mart can contain different combinations of tables, columns and rows from the Enterprise Data Warehouse. For example, a business unit or user group that doesn't require a lot of historical data might only need transactions from the current calendar year in the database. The Personnel Department might need to see all details about employees, whereas data such as "salary" or "home address" might not be appropriate for a Data Mart that focuses on Sales.
  • 19. Accenture Ab Initio Training 19 Star Schema  The star schema is perhaps the simplest data warehouse schema.  It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points radiating from a central table.  The center of the star consists of a large fact table and the points of the star are the dimension tables.
  • 20. Accenture Ab Initio Training 20 Star Schema – continued  A star schema is characterized by one or more very large fact tables that contain the primary information in the data warehouse, and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table.
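A small star schema can be sketched with Python's built-in `sqlite3` module to make the fact/dimension split concrete. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns and the sample rows are all hypothetical, chosen only for illustration; the composite primary key over the foreign keys is one common convention, not a requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Small dimension (lookup) tables: one row per product / per date.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);

-- Central fact table: a measure plus foreign keys to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product (product_id),
    date_id    INTEGER REFERENCES dim_date (date_id),
    amount     REAL,
    PRIMARY KEY (product_id, date_id)
);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
INSERT INTO fact_sales VALUES (1, 10, 5.0), (1, 11, 7.0), (2, 11, 3.0);
""")


def sales_by_product_and_year(connection):
    """A typical star query: join the fact table to its dimensions and aggregate."""
    return connection.execute("""
        SELECT p.product_name, d.year, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.product_name, d.year
        ORDER BY p.product_name, d.year
    """).fetchall()
```

Each dimension is reached from the fact table with a single join, which is what gives the diagram its star shape.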
  • 21. Accenture Ab Initio Training 21 Advantages of Star Schemas  Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.  Provide highly optimized performance for typical star queries.  Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data-warehouse schema contain dimension tables  Star schemas are used for both simple data marts and very large data warehouses.
  • 22. Accenture Ab Initio Training 22 Star schema  Diagrammatic representation of star schema
  • 23. Accenture Ab Initio Training 23 Snowflake Schema  The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema.  It is called a snowflake schema because the diagram of the schema resembles a snowflake.  Snowflake schemas normalize dimensions to eliminate redundancy.
  • 24. Accenture Ab Initio Training 24 Snowflake Schema - Example  That is, the dimension data has been grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance.
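The product example above can be sketched with `sqlite3` as follows. The table names follow the slide (`products`, `product_category`, `product_manufacturer`); the column names and sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The product dimension is normalized into three tables.
CREATE TABLE product_category (
    category_id INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE product_manufacturer (
    manufacturer_id INTEGER PRIMARY KEY,
    manufacturer_name TEXT
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES product_category (category_id),
    manufacturer_id INTEGER REFERENCES product_manufacturer (manufacturer_id)
);

INSERT INTO product_category VALUES (1, 'Tools');
INSERT INTO product_manufacturer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO products VALUES (1, 'Hammer', 1, 1), (2, 'Wrench', 1, 2);
""")


def describe_products(connection):
    """Rebuild the flat product view; note the two extra foreign key joins."""
    return connection.execute("""
        SELECT p.product_name, c.category_name, m.manufacturer_name
        FROM products AS p
        JOIN product_category AS c ON c.category_id = p.category_id
        JOIN product_manufacturer AS m ON m.manufacturer_id = p.manufacturer_id
        ORDER BY p.product_id
    """).fetchall()
```

The category name 'Tools' is stored once instead of once per product, which is the space saving; the price is the additional joins needed to read a complete product description.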
  • 25. Accenture Ab Initio Training 25 Diagrammatic representation for Snowflake Schema
  • 26. Accenture Ab Initio Training 26 Fact Table The central table in a star schema is called the fact table. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.
  • 27. Accenture Ab Initio Training 27 What happens during the ETL process?  During extraction, the desired data is identified and extracted from many different sources, including database systems and applications. Depending on the source system's capabilities (for example, operating system resources), some transformations may take place during this extraction process. The size of the extracted data varies from hundreds of kilobytes up to gigabytes, depending on the source system and the business situation. After extracting data, it has to be physically transported to the target system or an intermediate system for further processing.
  • 28. Accenture Ab Initio Training 28 Examples of Second-Generation ETL Tools  Powermart 4.5 – Informatica Corporation  Pioneer due to market share  Ardent DataStage – Ardent Software, Inc.  General-purpose tool oriented to data marts  Sagent Data Mart Solution 3.0 – Sagent Technology  Progressively integrated with Microsoft  Ab Initio 2.2 – Ab Initio Software  A kit of tools that can be used to build applications  Tapestry 2.1 – D2K, Inc  End-to-end data warehousing solution from a single vendor
  • 29. Accenture Ab Initio Training 29 What to look for in ETL tools  Use optional data cleansing tool to clean-up source data  Use extraction/transformation/load tool to retrieve, cleanse, transform, summarize, aggregate, and load data  Use modern, engine-driven technology for fast, parallel operation  Goal: define 100% of the transform rule with point and click interface  Support development of logical and physical data models  Generate and manage central metadata repository  Open metadata exchange architecture to integrate central metadata with local metadata.  Support metadata standards  Provide end users access to metadata in business terms
  • 30. Accenture Ab Initio Training 30 Operating System / Hardware Support  This section discusses how a DBMS utilizes OS/hardware features such as parallel functionality, SMP/MPP support, and clustering. These OS/hardware features greatly extend the scalability and improve performance. However, managing an environment with these features is difficult and expensive.
  • 31. Accenture Ab Initio Training 31 Parallel Functionality  The introduction and maturation of parallel processing environments are key enablers of increasing database sizes, as well as providing acceptable response times for storing, retrieving, and administrating data. DBMS vendors are continually bringing products to market that take advantage of multi-processor hardware platforms. These products can perform table scans, backups, loads, and queries in parallel.
  • 32. Accenture Ab Initio Training 32 Parallel Features An overview of typical parallel functionality is given below:  Queries — Parallel queries can enhance scalability for many query operations  Data load — Performance is always a serious issue when loading large databases. Meeting response time requirements is the overriding factor for determining the best load method and should be a key part of a performance benchmark  Create table as select — This feature makes it possible to create aggregated tables in parallel  Index creation — Parallel index creation exploits the benefits of parallel hardware by distributing the workload of creating a large index across a large number of processors.
  • 33. Accenture Ab Initio Training 33 Which parallel processor configuration, SMP or MPP?  SMP and clustered SMP environments have the flexibility and ability to scale in small increments.  SMP environments are often useful for the large, but static data warehouse, where the data cannot be easily partitioned, due to the unpredictable nature of how the data is joined over multiple tables for complex searches and ad-hoc queries.
  • 34. Accenture Ab Initio Training 34 Which parallel processor configuration, SMP or MPP ?  MPP works well in environments where growth is potentially unlimited, access patterns to the database are predictable, and the data can be easily partitioned across different MPP nodes with minimal data accesses crossing between them. This often occurs in large OLTP environments, where transactions are generally small and predictable, as opposed to decision support and data warehouse environments, where multiple tables can be joined in unpredictable ways.  In fact, data warehousing and decision support are the areas most vendors of parallel hardware platforms and DBMSs are targeting.  MPP does not scale well if heavy data warehouse database accesses must cross MPP nodes, causing I/O bottlenecks over the MPP interconnect, or if multiple MPP nodes are continually locked for concurrent record updates.
  • 35. Accenture Ab Initio Training 35 A Multi-CPU Computer (SMP)
  • 36. Accenture Ab Initio Training 36 A Network of Multi-CPU Nodes
  • 37. Accenture Ab Initio Training 37 A Network of Networks
  • 38. Accenture Ab Initio Training 38 Parallel Computer Architecture  Computers come in many “shapes and sizes”:  Single-CPU, Multi-CPU  Network of single-CPU computers  Network of multi-CPU computers  Multi-CPU machines are often called SMPs (for Symmetric Multi Processors).  Specially-built networks of machines are often called MPPs (for Massively Parallel Processors).
  • 39. Accenture Ab Initio Training 39 Introduction to Ab Initio
  • 40. Accenture Ab Initio Training 40 History of Ab Initio  Ab Initio Software Corporation was founded in the mid-1990s by Sheryl Handler, the former CEO at Thinking Machines Corporation, after TMC filed for bankruptcy. In addition to Handler, other former TMC people involved in the founding of Ab Initio included Cliff Lasser, Angela Lordi, and Craig Stanfill.  Ab Initio is known for being very secretive in the way that they run their business, but their software is widely regarded as top notch.
  • 41. Accenture Ab Initio Training 41 History of Ab Initio  The Ab Initio software is a fourth-generation, graphical user interface (GUI)-based parallel processing tool for data analysis, batch processing and data manipulation that is used mainly to extract, transform and load data.  The Ab Initio software is a suite of products that together provide a platform for robust data processing applications. The core Ab Initio products are: the Co>Operating System, the Component Library and the Graphical Development Environment.
  • 42. Accenture Ab Initio Training 42 What Does “Ab Initio” Mean?  Ab Initio is Latin for “From the Beginning.”  From the beginning our software was designed to support a complete range of business applications, from simple to the most complex. Crucial capabilities like parallelism and checkpointing can’t be added after the fact.  The Graphical Development Environment and a powerful set of components allow our customers to get valuable results from the beginning.
  • 43. Accenture Ab Initio Training 43 Ab Initio’s focus  “Moving Data”  move small and large volumes of data in an efficient manner  deal with the complexity associated with business data  High Performance  scalable solutions  Better productivity
  • 44. Accenture Ab Initio Training 44 Ab Initio’s Software  Ab Initio software is a general-purpose data processing platform for mission-critical applications such as:  Data warehousing  Batch processing  Click-stream analysis  Data movement  Data transformation
  • 45. Accenture Ab Initio Training 45 Applications of Ab Initio Software  Processing just about any form and volume of data.  Parallel sort/merge processing.  Data transformation.  Rehosting of corporate data.  Parallel execution of existing applications.
  • 46. Accenture Ab Initio Training 46 Ab Initio Provides For:  Distribution - a platform for applications to execute across a collection of processors within the confines of a single machine or across multiple machines.  Reduced Run Time Complexity - the ability for applications to run in parallel on any combination of computers where the Ab Initio Co>Operating System is installed from a single point of control.
  • 47. Accenture Ab Initio Training 47 Applications of Ab Initio Software in terms of Data Warehouse  Front end of Data Warehouse:  Transformation of disparate sources  Aggregation and other preprocessing  Referential integrity checking  Database loading  Back end of Data Warehouse:  Extraction for external processing  Aggregation and loading of Data Marts
  • 48. Accenture Ab Initio Training 48 Ab Initio or Informatica- Powerful ETL  Informatica and Ab Initio both support parallelism, but Informatica supports only one type of parallelism whereas Ab Initio supports three. In Informatica the developer needs to define partitions in Server Manager to achieve parallelism; in Ab Initio the tool itself takes care of parallelism. The three types of parallelism in Ab Initio are: 1. Component parallelism 2. Data parallelism 3. Pipeline parallelism.  Ab Initio has no scheduler like Informatica's; you need to schedule through a script or run manually.  Ab Initio supports different types of text files, meaning you can read the same file with different structures, which is not possible in Informatica. Ab Initio is also more user friendly than Informatica, so there are many differences between Informatica and Ab Initio.  Ab Initio doesn't need a dedicated administrator; a UNIX or NT admin will suffice, whereas other ETL tools do have administrative work.
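The ideas behind data parallelism and pipeline parallelism can be illustrated in plain Python. This is a general sketch of the two concepts and not Ab Initio code or its API: the function names, the round-robin partitioning and the doubling transformation are all assumptions for the example. Note also that the generator-based pipeline runs in a single thread, so it shows the record-at-a-time flow between stages rather than true concurrency.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n):
    """Split records round-robin into n partitions."""
    return [records[i::n] for i in range(n)]


def transform(record):
    """Hypothetical per-record transformation."""
    return record * 2


def process_partition(part):
    """Apply the same transformation to every record of one partition."""
    return [transform(r) for r in part]


def data_parallel(records, n=3):
    """Data parallelism: the same step runs on each partition concurrently."""
    parts = partition(records, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(process_partition, parts))
    # Gather the partition outputs back into one list.
    return [r for part in results for r in part]


def pipeline(records):
    """Pipeline-style flow: the filter stage consumes each record as soon as
    the transform stage yields it, without waiting for the full data set."""
    doubled = (transform(r) for r in records)
    return (r for r in doubled if r > 4)
```

For example, `data_parallel([1, 2, 3, 4, 5, 6])` processes three partitions and returns the doubled values grouped by partition, and `list(pipeline([1, 2, 3, 4]))` yields only the doubled values greater than 4.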
  • 49. Accenture Ab Initio Training 49 Ab Initio or Informatica - Powerful ETL - continued
      - Error handling: In Ab Initio you can attach error and reject files to each transformation, then capture and analyze the messages and the data separately. Informatica has one huge log, which is very inefficient when working on a large process with numerous points of failure.
      - Robust transformation language: Informatica is very basic as far as transformations go. Without going into a function-by-function comparison, Ab Initio's transformation language appears much more robust.
      - Instant feedback: On execution, Ab Initio tells you how many records have been processed, rejected, and so on, and gives detailed performance metrics for each component. Informatica has a debug mode, but it is slow and difficult to adapt to.
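The idea of separate reject and error outputs per transformation, plus record counts as feedback, can be sketched in plain Python. This is an illustration only, not Ab Initio's mechanism: `run_component` and `to_cents` are hypothetical names, and the choice of which exceptions count as a reject is an assumption.

```python
def run_component(transform, records):
    """Apply transform to each record, keeping good output, rejected
    input records, and error messages in three separate collections."""
    outputs, rejects, errors = [], [], []
    for record in records:
        try:
            outputs.append(transform(record))
        except (KeyError, ValueError) as exc:
            rejects.append(record)  # the offending data
            errors.append(f"{type(exc).__name__}: {exc}")  # the message
    return outputs, rejects, errors


def to_cents(record):
    """Hypothetical business rule: convert a decimal amount to whole cents."""
    return {"id": record["id"], "cents": round(float(record["amount"]) * 100)}


records = [{"id": 1, "amount": "12.50"}, {"id": 2, "amount": "n/a"}, {"id": 3}]
outputs, rejects, errors = run_component(to_cents, records)

# Per-component feedback, analogous to processed/rejected counts.
counts = {"processed": len(outputs), "rejected": len(rejects)}
```

Because rejected records and their messages are kept apart, each can be inspected on its own instead of being searched for in a single combined log.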
  • 50. Accenture Ab Initio Training 50 Both tools are fundamentally different. Which one to use depends on the work at hand and on the existing infrastructure and available resources. Informatica is an engine-based ETL tool: its power lies in its transformation engine, and the code it generates after development cannot be seen or modified. Ab Initio is a code-based ETL tool: it generates ksh, bat, or similar code, which can be modified to achieve any goals that cannot be met through the ETL tool itself. Ab Initio doesn't need a dedicated administrator; a UNIX or NT administrator will suffice, whereas other ETL tools do involve administrative work.
  • 51. Accenture Ab Initio Training 51 Ab Initio Product Architecture [Figure: layered architecture diagram. Labels: User Applications; Development Environments (GDE, Shell); Component Library, 3rd Party Components, User-defined Components; The Ab Initio Co>Operating® System; Native Operating System (Unix, Windows, OS/390); Ab Initio EME.]
  • 52. Accenture Ab Initio Training 52 Ab Initio Architecture - Explanation
      - The Ab Initio Co>Operating System unites a network of computing resources (CPUs, storage disks, programs, datasets) into a production-quality data processing system with scalable performance and mainframe-class reliability.
      - The Co>Operating System is layered on top of the native operating systems of the collection of servers. It provides a distributed model for process execution, file management, debugging, process monitoring, and checkpointing. A user may perform all these functions from a single point of control.
  • 53. Accenture Ab Initio Training 53 Co>Operating System Services
      - Parallel and distributed application execution
          - Control
          - Data transport
      - Transactional semantics at the application level
      - Checkpointing
      - Monitoring and debugging
      - Parallel file management
      - Metadata-driven components
  • 54. Accenture Ab Initio Training 54 Ab Initio: What We Do
      - Ab Initio software helps you build large-scale data processing applications and run them in parallel environments. It consists of two main programs:
          - The Co>Operating System, which your system administrator installs on a host Unix or Windows NT server, as well as on the processing computers.
          - The Graphical Development Environment (GDE), which you install on your PC (the GDE computer) and configure to communicate with the host.
  • 55. Accenture Ab Initio Training 55 The Ab Initio Co>Operating® System
      - The Co>Operating System:
          - Runs across a variety of operating systems and hardware platforms, including OS/390 on mainframe, Unix, and Windows.
          - Supports distributed and parallel execution.
          - Can provide scalability proportional to the hardware resources provided.
          - Supports platform-independent data transport.
  • 56. Accenture Ab Initio Training 56 The Ab Initio Co>Operating® System-Continued The Ab Initio Co>Operating System depends on parallelism to connect (i.e., cooperate with) diverse databases. It extracts, transforms and loads data to and from Teradata and other data sources.
  • 57. Accenture Ab Initio Training 57 [Figure: several GDE clients in the top layer sit above the Co>Operating System layer, which runs on any OS (Solaris, AIX, NT, Linux, NCR).] The same Co>Operating System commands work on any OS, and graphs can be moved from one OS to another without any changes.
  • 58. Accenture Ab Initio Training 58 The Ab Initio Co>Operating System Runs on:
      - Sun Solaris
      - IBM AIX
      - Hewlett-Packard HP-UX
      - Siemens Pyramid Reliant UNIX
      - IBM DYNIX/ptx
      - Silicon Graphics IRIX
      - Red Hat Linux
      - Windows NT 4.0 (x86)
      - Windows NT 2000 (x86)
      - Compaq Tru64 UNIX
      - IBM OS/390
      - NCR MP-RAS
  • 59. Accenture Ab Initio Training 59 Connectivity to Other Software
      - Common, high-performance database interfaces:
          - IBM DB2, DB2/PE, DB2EEE, UDB, IMS
          - Oracle, Informix XPS, Sybase, Teradata, MS SQL Server 7
          - OLE-DB
          - ODBC
      - Other software packages:
          - Connectors to many other third-party products
          - Trillium, ERwin, Siebel, etc.
  • 60. Accenture Ab Initio Training 60 Ab Initio Co>Operating System
      Ab Initio Software Corporation, headquartered in Lexington, MA, develops software solutions that process vast amounts of data (well into the terabyte range) in a timely fashion by employing many (often hundreds of) server processors in parallel. Major corporations worldwide use Ab Initio software in mission-critical, enterprise-wide data processing systems.
      Together, Teradata and Ab Initio deliver:
      - End-to-end solutions for integrating and processing data throughout the enterprise
      - Software that is flexible, efficient, and robust, with unlimited scalability
      - Professional and highly responsive support
      The Co>Operating System executes your application by creating and managing the processes and data flows that the components and arrows represent.
  • 61. Accenture Ab Initio Training 61 Graphical Development Environment GDE
  • 62. Accenture Ab Initio Training 62 The GDE
      The Graphical Development Environment (GDE) provides a graphical user interface to the services of the Co>Operating System. The GDE:
      - Enables you to create applications by dragging and dropping components.
      - Lets you perform point-and-click operations on executable flowcharts. The Co>Operating System can execute these flowcharts directly.
      - Provides graphical monitoring of running applications, so you can quantify data volumes and execution times and spot opportunities for improving performance.
  • 63. Accenture Ab Initio Training 63 The Graph Model
  • 64. Accenture Ab Initio Training 64 The Component Library
      - The Component Library is a set of reusable software modules for sorting, data transformation, database loading, and so on. The components adapt at runtime to the record formats and business rules controlling their behavior.
      - Ab Initio products have helped reduce a project's development and research time significantly.
  • 65. Accenture Ab Initio Training 65 Components
      - Components may run on any computer running the Co>Operating System.
      - Different components do different jobs.
      - The particular work a component accomplishes depends on its parameter settings.
      - Some parameters are data transformations, that is, business rules applied to one or more inputs to produce a required output.
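The statement that a component's work depends on its parameter settings, with some parameters being transformations, can be sketched in Python as follows. The classes `MapComponent` and `FilterComponent`, the `add_tax` rule, and the 25% rate are all hypothetical names and values chosen for illustration; they are not Ab Initio components or syntax.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

Record = dict


@dataclass
class MapComponent:
    """Hypothetical component whose 'transform' parameter is a business rule."""
    transform: Callable[[Record], Record]

    def run(self, records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            yield self.transform(record)


@dataclass
class FilterComponent:
    """Hypothetical component whose 'select' parameter decides what passes."""
    select: Callable[[Record], bool]

    def run(self, records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            if self.select(record):
                yield record


def add_tax(record: Record) -> Record:
    """Business rule supplied as a parameter: add a gross amount at 25% tax."""
    return {**record, "gross": record["net"] * 1.25}


orders = [{"id": 1, "net": 100.0}, {"id": 2, "net": 5.0}]

# The same component classes do different work depending on their parameters.
large_only = FilterComponent(select=lambda r: r["net"] >= 50)
with_tax = MapComponent(transform=add_tax)
result = list(with_tax.run(large_only.run(orders)))
```

Swapping in a different `transform` or `select` function changes what the component does without changing the component itself, which is the point the slide makes about parameters.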
  • 66. Accenture Ab Initio Training 66 3rd Party Components
  • 67. Accenture Ab Initio Training 67 EME
      - The Enterprise Meta>Environment (EME) is a high-performance, object-oriented storage system that inventories and manages various kinds of information associated with Ab Initio applications. It provides storage for all aspects of your data processing system, from design information to operations data.
      - The EME also provides a rich store for the applications themselves, including data formats and business rules. It acts as a hub for data and definitions. Integrated metadata management provides a global, consolidated view of the structure and meaning of applications and data, information that is usually scattered throughout your business.
  • 68. Accenture Ab Initio Training 68 Benefits of EME
      The Enterprise Meta>Environment provides a rich store for applications and all of their associated information, including:
      - Technical metadata: application-related business rules, record formats, and execution statistics.
      - Business metadata: user-defined documentation of job functions, roles, and responsibilities.
      Metadata is data about data, and it is critical to understanding and driving your business processes and computational resources. Storing and using metadata is as important to your business as storing and using data.
  • 69. Accenture Ab Initio Training 69 EME - Ab Initio Relevance
      - By integrating technical and business metadata, you can grasp the entirety of your data processing, from operational to analytical systems.
      - The EME is a completely integrated environment. The following figure shows how it fits into the high-level architecture of Ab Initio software.
  • 70. Accenture Ab Initio Training 70
  • 71. Accenture Ab Initio Training 71 Stepwise explanation of Ab Initio Architecture
      - You construct your application from building blocks called components, manipulating them through the Graphical Development Environment (GDE).
      - You check in your applications to the EME.
      - The EME and GDE use the underlying functionality of the Co>Operating System to perform many of their tasks. The Co>Operating System unites the distributed resources into a single "virtual computer" to run applications in parallel.
      - Ab Initio software runs on Unix, Windows NT, and MVS operating systems.
  • 72. Accenture Ab Initio Training 72 Stepwise explanation of Ab Initio Architecture - continued
      - Ab Initio connector applications extract metadata from third-party metadata sources into the EME, or extract it from the EME into a third-party destination.
      - You view the results of project and application dependency analysis through a web user interface. You also view and edit your business metadata through a web user interface.
  • 73. Accenture Ab Initio Training 73 EME: User Constituencies Served
      The EME addresses the metadata needs of three different constituencies:
      - Business users
      - Developers
      - System administrators
  • 74. Accenture Ab Initio Training 74 EME: User Constituencies Served
      - Business users are interested in exploiting data for analysis, in particular with regard to databases, tables, and columns.
      - Developers tend to be oriented towards applications and need to analyze the impact of potential program changes.
      - System administrators and production personnel want job status information and run statistics.
  • 75. Accenture Ab Initio Training 75 EME Interfaces
      We can create and manage the EME through three interfaces:
      - GDE
      - Web User Interface
      - Air Utility
  • 76. Accenture Ab Initio Training 76 Thank You End of Session 1
