2. [Redo to include Google Cert content: a more methodical process, especially the 6 steps of data
analysis]
Understand the basic skills required to be a good data analyst. Rate yourself (1-10, 10
highest) on the following prerequisite skills for your first posting to the discussion forum (3
points)
Intellectual curiosity
Business acumen
Teamwork and presentation skills
Basic mathematical and statistical problem solving
Programming mindset (able to code when needed)
Strong grasp of data structures; able to extract (SQL) and manipulate data (data wrangling)
Good sense of recognizing and visualizing patterns
[Ability to use all the tools -
Describe what you already know that’s related to Data Analytics (e.g. Programming, Machine
Learning, Statistics, Operations Management) in the week’s discussion forum (2 points).
Explain why analytics has become a critical and exciting part of doing business
Explain different flavors of data analytics and the fundamentals of the art and science of
data-analytic thinking.
Recognize examples of opportunities for business analytics from classic examples, and different
business domains and practices
3. Traditional Business Intelligence (BI)
Watch BI Video https://www.youtube.com/watch?v=LFnewuBsYiY
Data is valuable to business and companies are capturing only
a fraction of the potential value from data for competitive
advantage
Data analytics are changing the basis for competition.
Leading companies are not only improving core operations, they
are building new business models
Network effects (something gains additional value as more
people use it) are benefitting companies that exploit their data
Recent advances in machine learning are enabling companies to
solve problems that could not be solved before.
5. 1958 Applied Mathematics and Statistics: First Credit Scoring System
1989 FICO Score
Post World War II Operations Research – Data-Driven Analysis
1970s Decision Support Systems
1980s Data Warehouse and Business Intelligence
1970s Relational Databases (Codd)
Online Analytical Processing (OLAP)
6. 1959 Machine Learning
1989 Data Mining and Knowledge Discovery in Databases
1997 Big Data
Naur 1966 Datalogy
1974 Principles of Data Science (Naur)
2005 Business Analytics = Data Science in Business
Tukey 1962 Future of Data Analysis
Tukey 1977 Exploratory Data Analysis (EDA)
1997 Data Science
9. Business Intelligence (BI)
A broad category of applications, technologies, and processes for gathering, storing,
accessing, and analyzing data to help business users make better decisions (Watson 2009)
Data visualization and reporting for understanding “what happened and what is happening”
using charts, tables and dashboards (Shmueli 2018)
Business Analytics
Extensive use of data, statistical and quantitative analysis, explanatory and predictive
models, and fact-based management to drive decisions and actions (Davenport & Harris
2007)
Practice and art of bringing quantitative data to bear on decision-making. Includes BI and
other sophisticated techniques (Shmueli 2018)
Knowledge Discovery in Databases (KDD)
The overall process of discovering useful knowledge from data (Fayyad et al. 1996, AI Magazine)
Data Mining is one of the steps in KDD
The application of specific algorithms for extracting patterns from data. The other steps in
the KDD process (data preparation, selection, cleaning, incorporating prior knowledge, and
proper interpretation) are essential to ensure useful knowledge is derived from the data
Data Science
Area of practice and study … that encompasses the techniques, tools, technologies, and
processes for making sense out of big data (Watson 2014)
Big Data
Big data is the term for a collection of data sets so large and complex that it becomes difficult
to process using on-hand database management tools or traditional data processing
applications. (Wikipedia)
11. Different kinds of business analytics
The traditional BI (link)
The novel – Google Flu Trends Video (link)
The really novel - Moneyball (link)
Even more novel – Climate.com Fieldview – [link]
Netflix $1M winner (link)
Machine learning focus -- automate, in other words, make
many, many, many decisions simultaneously
Analytics focus -- don't know how many decisions you want to
make before you begin? Looking for inspiration? Encountering
the unknown
Statistics focus – Looking for rigor and protecting managers
from making wrong decisions
12. Hurricane Frances was on its way, barreling across the
Caribbean, threatening a direct hit on Florida’s Atlantic coast.
Residents made for higher ground, but far away, in
Bentonville, Ark., executives at Wal-Mart Stores decided that
the situation offered a great opportunity for one of their
newest data-driven weapons ... predictive technology.
A week ahead of the storm’s landfall, Linda M. Dillman, Wal-
Mart’s chief information officer, pressed her staff to come up
with forecasts based on what had happened when Hurricane
Charley struck several weeks earlier. Backed by the trillions of
bytes’ worth of shopper history that is stored in Wal-Mart’s
data warehouse, she felt that the company could ‘start
predicting what’s going to happen, instead of waiting for it to
happen,’ as she put it. (Hays, New York Times, 2004)
13. In the 1980s, data science helped banks predict loan defaults
and offer different customers different interest rates based
on risk assessment (higher risk, higher interest rate)
No one thought of using it for credit cards
Uniform pricing (typically 1.7%)
Customers would not like the differential pricing
Founders of Signet Bank viewed it as an information service
rather than a banking service
Applied predictive modeling to offer different terms (pricing,
credit limits, low initial rate balance transfer, cash back,
loyalty points, etc)
Works because of 80-20 rule, 20% of customers account for
80% of profits (actually > 100%)
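The "actually > 100%" remark can look like a typo, so here is a minimal sketch of the arithmetic. The customer profits are made-up numbers for illustration, not Signet data, and `top_share` is a hypothetical helper. When some customers lose money for the bank, the most profitable 20% can earn more than the total profit.

```python
def top_share(profits, fraction=0.2):
    """Share of total profit earned by the most profitable `fraction` of customers."""
    ranked = sorted(profits, reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    return sum(ranked[:n_top]) / sum(ranked)

# Hypothetical annual profit per customer (negative = the bank loses money).
profits = [500, 400, 50, 30, 20, 10, -20, -40, -90, -110]

share = top_share(profits)
print(f"Top 20% of customers earn {share:.0%} of total profit")
```

With these invented figures the top two customers earn 900 against a total of 750, so their share exceeds 100%. The unprofitable customers are what push the ratio past 1.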
14. By modeling profitability – make better offers to “best”
customers (skim the cream)
Credit card returns > commercial business
No existing model – had to build their own
Bought data – upfront losses
Conducted experiments by offering different terms to random
customers
Initially, the default rate soared (from a 2.9% average to 6%)
After several years, charge-off rates dropped
Signet Bank's credit card operation became Capital One, one of
the largest credit card issuers in the industry
15. Analytics can also be used to disrupt and break down decision
making and create havoc.
Cambridge Analytica (founded by Robert Mercer and Steve
Bannon) weaponized information by profiling millions of
Facebook users without their permission and targeted them
with fake news and deceptive data to disrupt the US 2016
election and over 200 other elections around the world
(Wikipedia, “Cambridge Analytica”)
Alexander Nix and Cambridge Analytica
https://www.youtube.com/watch?v=n8Dd5aVXLCc
16. Watch video
https://www.youtube.com/watch?v=_5PY1swcEEs
Billy Beane, General Manager at Oakland Athletics
baseball team, faced with a limited budget,
assembled a competitive team of undervalued
talent with the help of Peter Brand, a young Yale
economics graduate harboring radical ideas.
In the end, the Oakland Athletics, the team that
finished the previous season with the worst record
in Major League Baseball, set a new American
League record by winning 20 consecutive games,
a first in the 103-year history of American League
baseball, and did so with one of the lowest budgets
in the league.
17. Answer this question by yourself:
What is the single most important concept you got from the business
stories about analytics so far?
Think about an answer by yourself
Discuss with your partner your response and persuade your partner why
you are correct
Answer this question by yourself
What makes the Business Intelligence story a different kind of
analytics than the Walmart Story?
A: BI applies spreadsheets whereas Walmart applied a different
technology
B: BI was about what is happening whereas Walmart’s story was about
what is going to happen
C: BI was about the grocery shopping whereas Walmart’s story was about
general retailing
D: BI is about business decision making whereas Walmart was about
predictive analytics
Discuss with your partner your response and persuade your partner
why you are correct
18. Using the pieces of information supplied about the origins of
analytics, Google the parts you find most interesting
Share with your partner your findings
Share with the class what you thought was most interesting
19. ANALYTICS AND AGRICULTURE – FROM CLIMATE CORP TO FIELDVIEW
Two former Google employees founded WeatherBill in 2006 to sell
weather insurance to ski resorts, large events and farmers
WeatherBill became Climate Corp in 2011 to provide federal
crop insurance. In 2013 Monsanto bought Climate Corp for $1.1 billion
Developed digital agriculture platforms to improve the
efficiency and productivity of farm operations
20. Discuss with your partner the differences between statistics
and analytics
Which reasons are better, yours or your partner’s?
Core ideas
Data exploration and visualization
Classification – predict which class (category) an item of interest
falls into
Prediction – predict the value of a continuous variable rather than
classify into a category (e.g. Zillow value of homes)
Association rules and recommender systems – discover patterns or
rules of “what goes with what” (e.g. Netflix, customers and movies)
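A rule of "what goes with what" is usually judged by its support (how often the items appear together) and its confidence (how often the consequent appears when the antecedent does). The sketch below computes both in plain Python. The baskets are invented for illustration, and the helper names `support` and `confidence` are hypothetical rather than taken from any particular library.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical market baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

rule_support = support(baskets, {"diapers", "beer"})
rule_confidence = confidence(baskets, {"diapers"}, {"beer"})
print(f"diapers -> beer: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Here three of the five baskets contain both items, and three of the four baskets with diapers also contain beer. Real recommender systems work at far larger scale and use more than these two measures.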
22. Analytics has been in use since the late 1800s, but Big Data
Analytics has only recently taken off because of:
Availability of high volume and variety of data from Internet sites,
online transactions, mobile devices and social media
Improvements in computational power and storage
Recent developments in more sophisticated software and algorithmic
techniques
Linking of separate databases and enhancement in SQL and NoSQL
technologies
The volume and variety of data have far outstripped the capacity to
manually analyze data
23. Where does the data come from?
Credit card transactions, loyalty cards, discount coupons, customer complaint calls,
plus (public) lifestyle studies
Target marketing
Find clusters of “model” customers who share the same characteristics: interest,
income level, spending habits, etc.
Determine customer purchasing patterns over time
Cross-market analysis
Associations/correlations between product sales, and prediction based on such
associations
Customer profiling
What types of customers buy what products (clustering or classification)
Customer requirement analysis
identifying the best products for different customers
predict what factors will attract new customers
Provision of summary information
multidimensional summary reports
statistical summary information (data central tendency and variation)
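As a small illustration of "statistical summary information", the sketch below reports central tendency (mean, median) and variation (standard deviation) per customer segment using Python's standard `statistics` module. The segments and spend figures are hypothetical.

```python
from statistics import mean, median, stdev

# Hypothetical monthly spend per customer, grouped by segment.
spend = {
    "students": [20, 25, 30, 25],
    "families": [120, 80, 100, 100],
}

def summarize(values):
    """Central tendency (mean, median) and variation (sample standard deviation)."""
    return {"mean": mean(values), "median": median(values), "stdev": stdev(values)}

summary = {segment: summarize(values) for segment, values in spend.items()}
for segment, stats in summary.items():
    print(segment, stats)
```

A multidimensional summary report would repeat this kind of calculation across further dimensions such as region or time period.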
24.
Andrew Pole was hired by Target in 2002. Using analytics, he figured out certain patterns in
Target’s sales. He identified about 25 products that, when analyzed together, allowed him to
assign each shopper a “pregnancy prediction” score. More important, he could also estimate
the shopper’s due date to within a small window, so Target could send coupons timed to very
specific stages of her pregnancy.
A Target manager received an angry phone call from a parent: “My daughter got this in the
mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and
cribs? Are you trying to encourage her to get pregnant?” The manager didn’t have any idea
what the man was talking about. He looked at the mailer. Sure enough, it was addressed to
the man’s daughter and contained advertisements for maternity clothing, nursery furniture
and pictures of smiling infants. The manager apologized and then called a few days later to
apologize again. On the phone, though, the father was somewhat abashed. “I had a talk with
my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been
completely aware of. She’s due in August. I owe you an apology.”
Which application of analytics is this?
25. Finance planning and asset evaluation
cash flow analysis and prediction
contingent claim analysis to evaluate assets
cross-sectional and time series analysis (financial-ratio, trend
analysis, etc.)
Resource planning
summarize and compare the resources and spending
Competition
monitor competitors and market directions
group customers into classes and a class-based pricing procedure
set pricing strategy in a highly competitive market
26. Approaches: Clustering & model construction for frauds,
outlier analysis
Applications: Health care, retail, credit card service, telecomm.
Auto insurance: ring of collisions
Money laundering: suspicious monetary transactions
Medical insurance
Professional patients, ring of doctors, and ring of references
Unnecessary or correlated screening tests
Telecommunications: phone-call fraud
Phone call model: destination of the call, duration, time of day or week.
Analyze patterns that deviate from an expected norm
Retail industry
Analysts estimate that 38% of retail shrink is due to dishonest
employees
Anti-terrorism
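One simple way to "analyze patterns that deviate from an expected norm" is a z-score rule: flag any value that lies more than a chosen number of standard deviations from the mean. The sketch below applies it to call durations. The data, the threshold of 3 and the helper name `flag_outliers` are assumptions for illustration, not the method of any particular fraud system.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=3.0):
    """Return the values whose z-score magnitude exceeds `threshold`."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical call durations in minutes for one phone account.
durations = [3, 4, 5, 4, 3, 5, 4, 6, 4, 5, 3, 95]

suspicious = flag_outliers(durations)
print(suspicious)
```

Only the 95-minute call is flagged in this sample. In practice the "norm" would also consider the other features of the phone call model, such as destination and time of day or week.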
27.
You might know your own credit score, which is a measure of your
creditworthiness, which in turn is used by banks, retailers and car distributors to
measure your ability to pay your debt obligations on time. This measure
determines if you can get a loan, a store credit card or buy a car or a house. Have
you wondered how the system got started? In 1899, brothers Cator and Guy
Woolford, who owned a grocery, compiled lists of customers based on their
creditworthiness. They saw an opportunity to start a new kind of business selling
information, so they set up the Retail Credit Company. They began selling their
information to insurance companies and other industries, and by the 1950s they
were selling it to the earliest credit card issuers, such as Diners Club, and quickly
became profitable. Because they collected sensitive data, including loans that
hadn't been repaid, overdue credit card payments, and multiple address changes
by people constantly trying to escape creditors, and were often sloppy in their
data collection, they got into legal trouble. They were sued by various parties;
one suit alleged that they collected information that "may include 'facts,
statistics, inaccuracies and rumors' ... about virtually every phase of a person's
life; his marital troubles, jobs, school history, childhood, sex life, and political
activities," and customers weren’t allowed to view that information. In 1975, they
changed their name to Equifax and began selling credit reports to people other
than businesses ….
28.
Earlier, in 1956, William Fair, a mathematician with degrees from Caltech,
Stanford and Berkeley, and Earl Isaac, an electrical engineer, set up a
management consulting company to fix the complex credit checking process.
Probably using multivariate analysis of some kind involving payment
history, loan utilization, credit history and credit mix, they launched their own
credit scoring system, which later became known as FICO. In 1986, Robert
Hecht-Nielsen developed a neural network application for his company, HNC
Software, Inc., to detect credit card fraud. In 2002, that company was
bought for $1 billion by FICO.
What application of analytics is this?
29. Forecasting and planning
Prevent overstocking and understocking
Reduce delivery time
Address customer needs
Inventory Management
Real-time capacity availability
Improve inventory distribution across warehouses
Transportation Management
Optimal routing
Streamline fulfillment
Sourcing
Optimize procurement and reduce purchasing costs
30.
In 1946, American Airlines began experimenting with electromechanical
reservation systems. It wasn’t until 1964 that, with the help of IBM, it launched
SABRE, the first real-time airline reservation system. In 1996, Sabre, working
with a Bell company and Random House, created Travelocity to take advantage of
Internet sales, and Microsoft entered the market with Expedia. In 2003, Oren
Etzioni, a computer science professor, found out in flight that most of the
passengers on the plane had paid less than he did for his seat even though he had
purchased it earlier. Frustrated, he set out to build the first airfare predictor,
Farecast. Using a sample of 12,000 price observations from travel websites, he
predicted whether the ticket price was likely to rise or fall, giving the purchaser
more information as to the best time to buy a ticket. Essentially, the application
would deny the airline industry millions of dollars of potential revenue. By 2008 he
was already working on other goods like hotel rooms, concert tickets and used cars.
Before he could complete his work, Microsoft bought Farecast for $115 million and
integrated it into its Bing search engine. By 2012, Bing was making the correct call
75 percent of the time and saving travelers an average of $50 per ticket. In 2014,
Microsoft killed its Bing Price Predictor, and Google has since outflanked it with its
own Google Flights.
What application of analytics is this?
31.
Many groups record data regarding aviation safety, including the National
Transportation Safety Board (NTSB) and the Federal Aviation Administration (FAA)
Integrating data from different sources as well as mining for patterns from a mix
of both structured fields and free text is a difficult task (Eric Bloedorn)
The goal of our initial analysis is to determine how data mining can be used to
improve airline safety by finding patterns that predict safety problems
33. Based on what you learned from the previous slides, write a
1-minute thesis on the participation forum about any one
application of analytics that interests you and define the
following: (3 points)
1) What were the indicators of a potential to disrupt that
area of industry?
2) What is the archetype of the disruption?
34. “The ability to take data – to be able to understand it, to process
it, to extract value from it, to visualize it, to communicate it – that’s
going to be a hugely important skill in the next decades, not only
at the professional level but even at the educational level for
elementary school kids, for high school kids, for college kids … I
keep saying that the sexy job in the next 10 years will be
statisticians, and I’m not kidding.” Hal Varian, Google, McKinsey
Quarterly, 2009.
Thomas Davenport’s “Data Scientist: The Sexiest Job of the 21st
Century,” HBR, 2012
Data analytic thinking involves viewing business problems from a
data perspective and understanding principles of extracting
useful knowledge from data.
As you get better at data-analytic thinking you will develop
intuition as to how and where to apply creativity and domain
knowledge.
35. Explore R Built-in datasets
Load the datasets package
Find the two different titanic datasets (Titanic vs titanic)
Stanford University
http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/
Amazon
http://aws.amazon.com/datasets?_encoding=UTF8&jiveRedirect=1
Big Machine Learning
http://blog.bigml.com/2013/02/28/data-data-data-thousands-of-public-data-sources/
Data Market
http://datamarket.com/
Data Source Handbook by O’Reilly
http://shop.oreilly.com/product/0636920018254.do
UCI Data Set
http://www.sgi.com/tech/mlc/db/
36. Spreadsheets
Query languages
Visualization
Data Mining
It’s all very technical! I know