2. Learning Outcomes
8.1 Describe the roles and purposes of data
warehouses and data marts in an
organization
8.2 Compare the multidimensional nature of
data warehouses (and data marts) with
the two-dimensional nature of databases
3. Learning Outcomes
8.3 Identify the importance of ensuring the
cleanliness of information throughout an
organization
8.4 Explain the relationship between
business intelligence and a data
warehouse
4. History of Data Warehousing
• Data warehouses extend the transformation of
data into information
• In the 1990s, executives became less
concerned with the day-to-day business
operations and more concerned with overall
business functions
• The data warehouse provided the ability to
support decision making without disrupting the
day-to-day operations
5. Data Warehouse Fundamentals
• Data warehouse – a logical collection of
information – gathered from many different
operational databases – that supports business
analysis activities and decision-making tasks
• The primary purpose of a data warehouse is to
aggregate information throughout an
organization into a single repository for
decision-making purposes
6. Data Warehouse Fundamentals
• Extraction, transformation, and loading
(ETL) – a process that extracts information from
internal and external databases, transforms the
information using a common set of enterprise
definitions, and loads the information into a data
warehouse
• Data mart – contains a subset of data
warehouse information
8. Multidimensional Analysis
and Data Mining
• Databases contain information in a series
of two-dimensional tables
• In a data warehouse and data mart,
information is multidimensional: it
contains layers of columns and rows
– Dimension – a particular attribute of
information
9. Multidimensional Analysis
and Data Mining
• Cube – common term for the
representation of multidimensional
information
10. Multidimensional Analysis
and Data Mining
• Data mining – the process of analyzing data to
extract information not offered by the raw data
alone
• To perform data mining, users need data-mining
tools
– Data-mining tool – uses a variety of techniques to
find patterns and relationships in large volumes of
information and infers rules that predict future
behavior and guide decision making
11. Information Cleansing or Scrubbing
• An organization must maintain high-
quality data in the data warehouse
• Information cleansing or scrubbing – a
process that weeds out and fixes or
discards inconsistent, incorrect, or
incomplete information
16. Business Intelligence
• Business intelligence – information that
people use to support their decision-
making efforts
• Principal BI enablers include:
– Technology
– People
– Culture
17. OPENING CASE STUDY QUESTIONS
It Takes A Village to Write an Encyclopedia
1. Determine how Wikipedia could use a data
warehouse to improve its business operations
2. Explain why Wikipedia must cleanse or scrub
the information in its data warehouse
3. Explain how a company could use information
from Wikipedia to gain business intelligence
18. CHAPTER EIGHT CASE
Mining the Data Warehouse
• According to a Merrill Lynch survey in
2006, business intelligence software and
data-mining tools were at the top of the
technology spending list of CIOs
• Ben & Jerry’s, California Pizza Kitchen,
and Noodles & Company are using
business intelligence and data mining in
new and exciting ways
19. Chapter Eight Case Questions
1. Explain how Ben & Jerry’s is using
business intelligence tools to remain
successful and competitive in a
saturated market
2. Identify why information cleansing and
scrubbing is critical to California Pizza
Kitchen’s business intelligence tool’s
success
20. Chapter Eight Case Questions
3. Illustrate why 100 percent accurate and
complete information is impossible for
Noodles & Company to obtain
4. Describe how each of the companies above is
using BI from their data warehouse to gain a
competitive advantage
22. UNIT CLOSING CASE ONE
Harrah’s – Gambling Big on Technology
1. Identify the effects poor information might have
on Harrah’s service-oriented business strategy
2. Summarize how Harrah’s uses database
technologies to implement its service-oriented
strategy
3. Harrah’s was one of the first casino companies
to find value in offering rewards to customers
who visit multiple Harrah’s locations. Describe
the effects on the company if it did not build
any integrations among the databases located
at each of its casinos
23. UNIT CLOSING CASE ONE
Harrah’s – Gambling Big on Technology
4. Estimate the potential impact to Harrah’s
business if there is a security breach in its
customer information
5. Explain the business effects if Harrah’s fails to
use data-mining tools to gather business
intelligence
6. Identify three different types of data marts
Harrah’s might want to build to help it analyze
its operational performance
24. UNIT CLOSING CASE ONE
Harrah’s – Gambling Big on Technology
7. Predict what might occur if Harrah’s fails to clean or
scrub its information before loading it into its data
warehouse
8. How could Harrah’s use data mining to increase
revenue?
25. UNIT CLOSING CASE TWO
Searching for Revenue - Google
1. Determine if Google’s search results are
examples of transactional information or
analytical information
2. Describe the ramifications on Google’s
business if the search information it presented
to its customers was of low quality
3. Explain how the Web site
RateMyProfessors.com solved its problem of
poor information
26. UNIT CLOSING CASE TWO
Searching for Revenue - Google
4. Identify the different types of entity classes that might be
stored in Google’s indexing database
5. Identify how Google could use a data warehouse to improve
its business
6. Explain why Google would need to scrub and cleanse the
information in its data warehouse
7. Identify a data mart that Google’s marketing and sales
department might use to track and analyze its AdWords
revenue
Editor's Notes
CLASSROOM OPENER
GREAT BUSINESS DECISIONS – Bill Inmon – The Father of the Data Warehouse
Bill Inmon is recognized as the "father of the data warehouse" and co-creator of the "Corporate Information Factory." He has 35 years of experience in database technology management and data warehouse design. He is known globally for his seminars on developing data warehouses and has been a keynote speaker for every major computing association and many industry conferences, seminars, and tradeshows. As an author, Bill has written about a variety of topics on the building, usage, and maintenance of the data warehouse and the Corporate Information Factory. He has written more than 650 articles, many of which have been published in major computer journals such as Datamation, ComputerWorld, DM Review, and Byte Magazine. Bill currently publishes a free weekly newsletter for the Business Intelligence Network and has been a major contributor since its inception. http://www.b-eye-network.com/home/
8.1 Describe the roles and purposes of data warehouses and data marts in an organization
The primary purpose of data warehouses and data marts is to perform analytical processing, or OLAP. The insights into organizational information that can be gained from analytical processing are instrumental in setting strategic directions and goals.
8.2 Compare the multidimensional nature of data warehouses (and data marts) with the two-dimensional nature of databases
Databases contain information in a series of two-dimensional tables, which means that you can only ever view two dimensions of information at one time. In a data warehouse and data mart, information is multidimensional: it contains layers of columns and rows. Each layer in a data warehouse or data mart represents information according to an additional dimension. Dimensions could include such things as products, promotions, stores, category, region, stock price, date, time, and even the weather. The ability to look at information from different dimensions can add tremendous business insight.
8.3 Identify the importance of ensuring the cleanliness of information throughout an organization
An organization must maintain high-quality information in the data warehouse. Information cleansing and scrubbing is a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information. Without high-quality information, the organization will be unable to make good business decisions.
8.4 Explain the relationship between business intelligence and a data warehouse
A data warehouse is an enabler of business intelligence. The purpose of a data warehouse is to pull all kinds of disparate information into a single location where it is cleansed and scrubbed for analysis.
What is the primary difference between a database and a data warehouse?
The primary difference is that a database stores information for a single application, whereas a data warehouse stores information from multiple databases, or multiple applications, and external information such as industry information. This enables cross-functional analysis, industry analysis, market analysis, etc., all from a single repository. Data warehouses support only analytical processing (OLAP).
The ETL process gathers data from the internal and external databases and passes it to the data warehouse. The ETL process also gathers data from the data warehouse and passes it to the data marts.
The data warehouse modeled in the above figure compiles information from internal databases (transactional/operational databases) and external databases through ETL. It then sends subsets of information to the data marts through the ETL process.
Ask your students to distinguish between a data warehouse and a data mart.
Ans: A data warehouse has an enterprisewide organizational focus, while a data mart focuses on a subset of information for a given business unit such as finance.
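The flow described above can be sketched in a few lines of Python for classroom use. This is a minimal illustration, not a real ETL tool: the source rows, field names, and the in-memory "warehouse" list are all hypothetical, and the common definition chosen here (lowercase region, numeric amount, one customer_id field) is an assumption made for the example.

```python
# Hypothetical source rows; every name and value here is illustrative only.
internal_sales = [
    {"cust": "C1", "region": "north", "amount": "120.50", "unit": "finance"},
    {"cust": "C2", "region": "South", "amount": "80", "unit": "marketing"},
]
external_sales = [
    {"customer_id": "C3", "region": "NORTH ", "amount_usd": 42.0, "unit": "finance"},
]


def extract():
    """Extract: pull raw rows from every source, tagged with their origin."""
    for row in internal_sales:
        yield "internal", row
    for row in external_sales:
        yield "external", row


def transform(source, row):
    """Transform: map each source's field names and formats onto one common definition."""
    if source == "internal":
        customer, amount = row["cust"], float(row["amount"])
    else:
        customer, amount = row["customer_id"], float(row["amount_usd"])
    return {
        "customer_id": customer,
        "region": row["region"].strip().lower(),
        "amount": amount,
        "unit": row["unit"],
    }


def load(rows):
    """Load: store the conformed rows in the warehouse (a plain list in this sketch)."""
    return list(rows)


warehouse = load(transform(source, row) for source, row in extract())

# A data mart holds a subset of warehouse information for one business unit.
finance_mart = [row for row in warehouse if row["unit"] == "finance"]
```

Students can compare `warehouse` (all three rows, in one format) with `finance_mart` (only the finance rows) to see the enterprisewide versus business-unit focus.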
Each layer in a data warehouse or data mart represents information according to an additional dimension. Dimensions could include such things as products, promotions, stores, category, region, stock price, date, time, and weather.
Why is the ability to look at information based on different dimensions critical to a business's success?
Ans: The ability to look at information from different dimensions can add tremendous business insight. By slicing and dicing the information, a business can uncover unexpected insights.
Users can slice and dice the cube to drill down into the information.
• Cube A represents store information (the layers), product information (the rows), and promotion information (the columns)
• Cube B represents a slice of information displaying promotion II for all products at all stores
• Cube C represents a slice of information displaying promotion III for product B at store 2
CLASSROOM EXERCISE
Analyzing Multiple Dimensions of Information
Jump! is a company that specializes in making sports equipment, primarily basketballs, footballs, and soccer balls. The company currently sells to four primary distributors and buys all of its raw materials and manufacturing materials from a single vendor. Break your students into groups and ask them to develop a single cube of information that would give the company the greatest insight into its business (or business intelligence) given the following choices:
• Product A, B, C, and D
• Distributor X, Y, and Z
• Promotion I, II, and III
• Sales
• Season
• Date/Time
• Salesperson Karen and John
• Vendor Smithson
Remember that students can pick only three dimensions of information for the cube, so they need to pick the best three:
• Product
• Sales
• Promotion
These give the three most business-critical pieces of information.
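For instructors who want to show slicing in code, Cubes A, B, and C can be approximated with a dictionary keyed by (store, product, promotion). This is only a sketch under stated assumptions: the store, product, and promotion labels and the cell values are made up, and `slice_cube` is a hypothetical helper written for this example, not part of any OLAP product.

```python
stores = ["Store 1", "Store 2", "Store 3"]
products = ["Product A", "Product B", "Product C"]
promotions = ["Promo I", "Promo II", "Promo III"]

# Cube A: one cell per (store, product, promotion) combination.
# The cell values are placeholders, not real sales figures.
cube = {}
for s, store in enumerate(stores):
    for p, product in enumerate(products):
        for q, promotion in enumerate(promotions):
            cube[(store, product, promotion)] = 100 * s + 10 * p + q


def slice_cube(cube, store=None, product=None, promotion=None):
    """Keep the cells that match every dimension value that was supplied."""
    wanted = (store, product, promotion)
    return {
        key: value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


# Cube B: promotion II for all products at all stores.
cube_b = slice_cube(cube, promotion="Promo II")

# Cube C: promotion III for product B at store 2.
cube_c = slice_cube(cube, store="Store 2", product="Product B", promotion="Promo III")
```

Fixing one dimension leaves a two-dimensional slice (nine cells here); fixing all three leaves a single cell.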
Data mining can begin at a summary information level (coarse granularity) and progress through increasing levels of detail (drilling down), or the reverse (drilling up). Data-mining tools include query tools, reporting tools, multidimensional analysis tools, statistical tools, and intelligent agents.
Ask your students to provide an example of what an accountant might discover through the use of data-mining tools.
Ans: An accountant could drill down into the details of all expenses and revenues and find valuable business intelligence, ranging from which employees are spending the most money on long-distance phone calls to which customers are returning the most products.
Could the data warehousing team at Enron have discovered the accounting inaccuracies that caused the company to go bankrupt? If they did spot them, what should the team have done?
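Drilling down and drilling up can be illustrated by totaling the same records at different granularities. The sketch below assumes hypothetical expense records with ISO-formatted dates; the amounts are invented and `summarize` is an illustrative helper, not a feature of any particular data-mining tool.

```python
from collections import defaultdict

# Hypothetical expense records: (date, employee, amount). Values are invented.
expenses = [
    ("2006-01-15", "Karen", 120.0),
    ("2006-01-20", "John", 80.0),
    ("2006-02-03", "Karen", 50.0),
    ("2007-01-09", "John", 200.0),
]


def summarize(records, level):
    """Total the amounts at a chosen granularity: 'year', 'month', or 'day'."""
    # ISO dates sort and truncate cleanly: YYYY, YYYY-MM, YYYY-MM-DD.
    width = {"year": 4, "month": 7, "day": 10}[level]
    totals = defaultdict(float)
    for date, _employee, amount in records:
        totals[date[:width]] += amount
    return dict(totals)


by_year = summarize(expenses, "year")    # coarse granularity
by_month = summarize(expenses, "month")  # drilling down one level
by_day = summarize(expenses, "day")      # finest level in this sketch
```

Moving from `by_year` to `by_month` to `by_day` is drilling down; reading the same sequence in reverse is drilling up.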
This is an excellent time to return to the information learned in Chapter 6 on high-quality and low-quality information. What would happen if the information contained in the data warehouse was only about 70 percent accurate? Would you use this information to make business decisions? Is it realistic to assume that an organization could get to a 100% accuracy level on information contained in its data warehouse? No, it is too expensive.
Taking a look at customer information highlights why information cleansing and scrubbing is necessary. Customer information exists in several operational systems. In each system, all details of this customer information could change, from the customer ID to contact information. Determining which contact information is accurate and correct for this customer depends on the business process that is being executed.
Ask your students if they have ever received more than one piece of identical mail, such as a flyer, catalog, or application. If so, ask them why this might have occurred. Could it have occurred because their name was in many different disparate systems? What is the cost to the business of sending multiple identical marketing materials to the same customers?
• Expense
• Risk of alienating customers
Information cleansing allows an organization to fix these types of inconsistencies and clean the data in the data warehouse.
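A small cleansing pass can make the "fix or discard" idea concrete. The sketch below is one possible approach, not a standard algorithm: the records are hypothetical, and treating a normalized e-mail address as the matching key is an assumption made for the example (real cleansing tools use richer matching rules).

```python
# Hypothetical customer rows from three operational systems.
records = [
    {"system": "sales", "name": "Pat Smith ", "email": "PAT.SMITH@EXAMPLE.COM", "phone": "555-0100"},
    {"system": "billing", "name": "pat smith", "email": "pat.smith@example.com", "phone": ""},
    {"system": "service", "name": "Lee Wong", "email": "", "phone": ""},
]


def cleanse(records):
    """Merge rows that share a normalized e-mail; set aside rows with no e-mail."""
    cleaned = {}
    discarded = []
    for rec in records:
        email = rec["email"].strip().lower()
        if not email:
            # Incomplete: no key to match on, so discard it for manual review.
            discarded.append(rec)
            continue
        entry = cleaned.setdefault(email, {"name": "", "email": email, "phone": ""})
        # Fix inconsistent formatting and fill blanks from whichever system has a value.
        if not entry["name"]:
            entry["name"] = rec["name"].strip().title()
        if not entry["phone"]:
            entry["phone"] = rec["phone"].strip()
    return list(cleaned.values()), discarded


customers, rejected = cleanse(records)
```

The two Pat Smith rows collapse into one customer, which is exactly what prevents the duplicate mailings discussed above.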
Why do you think most businesses cannot achieve 100% accurate and complete information? If they had to choose a percentage for acceptable information, what would it be and why? Some companies are willing to go as low as 20% complete just to find business intelligence. Few organizations will go below 50% accurate because the information is useless if it is not accurate. Achieving perfect information is almost impossible. The more complete and accurate an organization wants its information to be, the more it costs. The tradeoff in pursuing perfect information lies in accuracy versus completeness. Accurate information means it is correct, while complete information means there are no blanks. Most organizations determine a percentage high enough to make good decisions at a reasonable cost, such as 85% accurate and 65% complete.
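To make the two percentages tangible, completeness and accuracy can be computed on a toy data set. This is a sketch under assumptions: completeness is taken as the share of non-blank values (following the "no blanks" definition above), while accuracy is measured against a hypothetical verified copy of the same rows, which in practice is the expensive part to obtain.

```python
def completeness(rows):
    """Share of field values that are not blank."""
    values = [value for row in rows for value in row.values()]
    filled = [value for value in values if value not in ("", None)]
    return len(filled) / len(values)


def accuracy(rows, verified):
    """Share of non-blank values that match a verified reference copy."""
    checked = 0
    correct = 0
    for row, truth in zip(rows, verified):
        for field, value in row.items():
            if value in ("", None):
                continue  # blanks count against completeness, not accuracy
            checked += 1
            if value == truth[field]:
                correct += 1
    return correct / checked


# Hypothetical warehouse rows and a hand-verified reference copy.
rows = [
    {"name": "Pat Smith", "city": "Denver", "phone": "", "zip": "80202"},
    {"name": "Lee Wong", "city": "Bouldr", "phone": "555-0101", "zip": ""},
]
verified = [
    {"name": "Pat Smith", "city": "Denver", "phone": "555-0100", "zip": "80202"},
    {"name": "Lee Wong", "city": "Boulder", "phone": "555-0101", "zip": "80301"},
]

pct_complete = completeness(rows)        # 6 of 8 values are filled
pct_accurate = accuracy(rows, verified)  # 5 of 6 filled values are correct
```

Students can then debate whether 75% complete and roughly 83% accurate would be good enough for a given decision.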
Technology
Even the smallest company with BI software can do sophisticated analyses today that were unavailable to the largest organizations a generation ago. The largest companies today can create enterprisewide BI systems that compute and monitor metrics on virtually every variable important for managing the company. How is this possible? The answer is technology, the most significant enabler of business intelligence.
People
Understanding the role of people in BI allows organizations to systematically create insight and turn these insights into actions. Organizations can improve their decision making by having the right people making the decisions. This usually means a manager who is in the field and close to the customer rather than an analyst rich in data but poor in experience. In recent years “business intelligence for the masses” has been an important trend, and many organizations have made great strides in providing sophisticated yet simple analytical tools and information to a much larger user population than previously possible.
Culture
A key responsibility of executives is to shape and manage corporate culture. The extent to which the BI attitude flourishes in an organization depends in large part on the organization’s culture. Perhaps the most important step an organization can take to encourage BI is to measure the performance of the organization against a set of key indicators. The actions of publishing what the organization thinks are the most important indicators, measuring these indicators, and analyzing the results to guide improvement display a strong commitment to BI throughout the organization.
1. Determine how Wikipedia could use a data warehouse to improve its business operations.
Wikipedia could use a data warehouse to build a repository of information from sources all over the world. The data warehouse could be used to perform detailed analysis on subject matters ranging from history to medicine.
2. Explain why Wikipedia must cleanse or scrub the information in its data warehouse.
Wikipedia must maintain high-quality information in its data warehouse. Information cleansing and scrubbing is a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information. Without high-quality information, Wikipedia will be unable to offer customers accurate and complete information.
3. Explain how a company could use information from Wikipedia to gain business intelligence.
Business intelligence comes from such things as environmental scanning and market analysis. A company could use information from Wikipedia as external information in its data warehouse that could help it analyze new trends and technologies.
1. Explain how Ben & Jerry’s is using business intelligence tools to remain successful and competitive in a saturated market.
Ben & Jerry’s tracks the ingredients and life of each pint in a data warehouse. If a consumer calls in with a complaint, the consumer affairs staff matches up the pint with the supplier whose milk, eggs, cherries, etc. did not meet the organization’s near-obsession with quality.
2. Identify why information cleansing and scrubbing is critical to California Pizza Kitchen’s business intelligence tool’s success.
Financial statements must be as accurate and complete as possible. There have been too many instances in the past where shoddy financial statements have led to financial crises such as Enron and WorldCom. It does not matter how good or how many BI tools California Pizza Kitchen uses; if the core data is dirty, the results will be inaccurate.
3. Illustrate why 100 percent accurate and complete information is impossible for Noodles & Company to obtain.
Noodles & Company will never have 100 percent accurate and complete information. Perfect information is pricey, and achieving it is almost impossible. The more complete and accurate an organization wants its information to be, the more it costs. The tradeoff in pursuing perfect information lies in accuracy versus completeness. Accurate information means it is correct, while complete information means there are no blanks. Most organizations determine a percentage high enough to make good decisions at a reasonable cost, such as 85% accurate and 65% complete.
4. Describe how each of the companies above is using BI from their data warehouse to gain a competitive advantage.
Ben & Jerry’s is using BI to improve quality. Customers know that a pint of Ben & Jerry’s ice cream is of the highest quality. California Pizza Kitchen and Noodles & Company are using BI to improve financial analysis capabilities. Both companies can now receive more accurate and complete financial views of their businesses.
1. Identify the effects low-quality information might have on Harrah’s service-oriented business strategy.
Using the wrong information can lead to making the wrong decision. Making the wrong decision can cost time, money, and even reputations. Business decisions are only as good as the information used to make the decision. Low-quality information leads to low-quality business decisions. High-quality information can significantly improve the chances of making a good business decision and directly affect an organization’s bottom line. Harrah’s must use high-quality information whenever it is making business decisions, especially decisions that affect its service-oriented business strategy.
2. Summarize how Harrah’s uses database technologies to implement its service-oriented strategy.
Harrah’s implements a service-oriented strategy called Total Rewards. Total Rewards allows Harrah’s to give every single customer the appropriate amount of personal attention, whether it’s leaving sweets in the hotel room or offering free meals. Total Rewards works by providing each customer with an account and a corresponding card that the player swipes each time he or she plays a casino game. The program collects information, via a database, on the amount of time the customers gamble, their total winnings and losses, and their betting strategies. Customers earn points based on the amount of time they spend gambling, which they can then exchange for comps such as free dinners, hotel rooms, tickets to shows, and even cash.
3. Harrah’s was one of the first casino companies to find value in offering rewards to customers who visit multiple Harrah’s locations. Describe the effects on the company if it did not build any integrations among the databases located at each of its casinos.
Without database integration among its hotels and casinos, Harrah’s would be unable to determine what a customer’s true value is to the company.
For example, a customer who spends $500,000 at one casino might be treated like royalty. This same customer could visit another Harrah’s location, but since the information is not integrated, the new location would have no idea that it had a high-rolling customer on the premises and might not treat the customer accordingly.
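The integration point can be demonstrated with two isolated per-property tables and one combined view. Everything in this sketch is hypothetical: the property names, card numbers, and spend figures are invented for illustration and do not describe Harrah’s actual systems.

```python
from collections import Counter

# Hypothetical per-property spend tables keyed by rewards card number.
property_a = {"card-001": 500_000, "card-002": 1_200}
property_b = {"card-002": 300, "card-003": 75}


def integrated_value(*databases):
    """Add up each customer's spend across every property database."""
    totals = Counter()
    for spend_by_card in databases:
        totals.update(spend_by_card)  # Counter.update adds values per key
    return dict(totals)


# Without integration, Property B sees no history for the high roller.
local_view = property_b.get("card-001", 0)

# With integration, the customer's true value is visible everywhere.
company_view = integrated_value(property_a, property_b)["card-001"]
```

Comparing `local_view` with `company_view` shows why the second location would fail to recognize the customer.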
4. Estimate the potential impact to Harrah’s business if there is a security breach in its customer information.
Some customers have concerns regarding Harrah’s information collection strategy since they want to keep their gambling information private. If there was a security violation and sensitive customer information was compromised, Harrah’s would risk losing its customers’ trust and their business.
5. Explain the business effects if Harrah’s fails to use data-mining tools to gather business intelligence.
Having terabytes of data without any way to analyze the data makes the data useless. Harrah’s must use data-mining tools to sift through the massive amounts of data in its warehouse to uncover the business intelligence that has given it a competitive advantage over its competitors.
6. Identify three different types of data marts Harrah’s might want to build to help it analyze its operational performance.
Answers to this question will vary. Potential answers include (1) customers’ spending habits across properties, (2) repeat customer spending habits at a single location, and (3) dealer sales at a location and across locations.
7. Predict what might occur if Harrah’s fails to clean or scrub its information before loading it into its data warehouse.
Harrah’s must maintain high-quality information in its data warehouse. Information cleansing and scrubbing is a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information. Without high-quality information, Harrah’s will be unable to make good business decisions and operate its service-oriented strategy. Potential business effects resulting from low-quality information include:
• Inability to accurately track customers
• Difficulty identifying valuable customers
• Inability to identify selling opportunities
• Marketing to nonexistent customers
• Difficulty tracking revenue due to inaccurate invoices
• Inability to build strong customer relationships, which increases buyer power
8. How could Harrah’s use data mining to increase revenue?
Harrah’s can use data mining to uncover customer patterns to ensure it is taking advantage of customer relationship management strategies with its customers. It could also use data mining to uncover patterns in food, drink, and room availability to optimize its supply chain.
1. Determine if Google’s search results are examples of transactional information or analytical information.
From the customer’s perspective, Google’s search results are an example of analytical information: customers use the information to make a decision or perform an analysis. From Google’s perspective, each search result is an example of transactional information since serving results is its primary business process.
2. Describe the ramifications on Google’s business if the search information it presented to its customers was of low quality.
Displaying links that do not work, links that have nothing to do with the query, or multiple duplicate links will cause customers to switch to a different search engine. If Google’s search results were of low quality, it would quickly lose business. Since providing search results is Google’s primary line of business, it must display high-quality search results.
3. Explain how the Web site RateMyProfessors.com solved its problem of poor information.
The developers of the Web site turned to Google’s API to create an automatic verification tool. If Google finds enough mentions in conjunction with a new professor or university to be added to the database, then the tool considers the information valid and posts it to the Web site.
4. Identify the different types of entity classes that might be stored in Google’s indexing database.
Entity classes could include: DOCUMENT TITLE SEARCH TERM WORD LOCATION WEB PAGE
5. Identify how Google could use a data warehouse to improve its business.
Google could use a data warehouse to contain not only internal organization information, but also external information such as market trends, competitor information, and industry trends. Google could then analyze its business across markets, among its competitors, and throughout different industries.
6. Explain why Google would need to scrub and cleanse the information in its data warehouse.
Google must maintain high-quality information in its data warehouse. Information cleansing and scrubbing is a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information. Without high-quality information, Google will be unable to make good business decisions.
7. Identify a data mart that Google’s marketing and sales department might use to track and analyze its AdWords revenue.
One potential data mart might include information broken down by industry (products, telecommunications, health care, energy, travel, human services) and tracked against revenue by company. This would tell Google which industries are using AdWords and which industries are untapped. It would also tell Google which customers in each industry are taking advantage of AdWords and perhaps would benefit from a specialized marketing plan, and which customers are not yet taking advantage of AdWords and might be interested in learning about the product.