Nik Rushdi Hassan
University of Minnesota Duluth
1
 [Redo to include Google Cert content: a more methodical process, especially the 6 steps of data
analysis]
 Understand the basic skills required to be a good data analyst. Rate yourself (1-10, 10
highest) on the following prerequisite skills for your first posting to the discussion forum (3
points)
 Intellectual curiosity
 Business acumen
 Teamwork and presentation skills
 Basic mathematical and statistical problem solving
 Programming mindset (able to code when needed)
 Strong grasp of data structures, be able to extract (SQL) and manipulate data (data wrangling)
 Good sense of recognizing and visualizing patterns
 [Ability to use all the tools -
 Describe what you already know that’s related to Data Analytics (e.g. Programming, Machine
Learning, Statistics, Operations Management) in the week’s discussion forum (2 points).
 Explain why analytics has become a critical and exciting part of doing business
 Explain the different flavors of data analytics and the fundamentals of the art and science of
data-analytic thinking.
 Recognize examples of opportunities for business analytics from classic examples, and different
business domains and practices
2
 Traditional Business Intelligence (BI)
 Watch BI Video https://www.youtube.com/watch?v=LFnewuBsYiY
 Data is valuable to business and companies are capturing only
a fraction of the potential value from data for competitive
advantage
 Data analytics are changing the basis for competition.
Leading companies are not only improving core operations, they
are also building new business models
 Network effects (something gains additional value as more
people use it) are benefitting companies that exploit their data
 Recent advances in machine learning are enabling companies to
solve problems that could not be solved before.
3
4
 Post World War II: Operations Research, data-driven analysis
 1958: Applied mathematics and statistics, first credit scoring system
 1970s: Relational databases (Codd)
 1970s: Decision support systems
 1980s: Data warehouse and business intelligence
 1989: FICO score
 Online Analytical Processing (OLAP)
5
 1959: Machine learning
 1962: Tukey, Future of Data Analysis
 1966: Naur, datalogy
 1974: Principles of Data Science (Naur)
 1977: Tukey, Exploratory Data Analysis (EDA)
 1989: Data mining and knowledge discovery in databases
 1997: Big data
 1997: Data science
 2005: Business analytics = data science in business
6
7
[Figure: Data Analytics in relation to Computer Science, Statistics, Mathematics, Engineering, MIS, and other disciplines]
8
 Business Intelligence (BI)
 A broad category of applications, technologies, and processes for gathering, storing,
accessing, and analyzing data to help business users make better decisions (Watson 2009)
 Data visualization and reporting for understanding “what happened and what is happening”
using charts, tables and dashboards (Shmueli 2018)
 Business Analytics
 Extensive use of data, statistical and quantitative analysis, explanatory and predictive
models, and fact-based management to drive decisions and actions (Davenport & Harris
2007)
 Practice and art of bringing quantitative data to bear on decision-making. Includes BI and
other sophisticated techniques (Shmueli 2018)
 Knowledge Discovery in Databases (KDD)
 The overall process of discovering useful knowledge from data (Fayyad et al 1996, AI Magazine)
 Data Mining is one of the steps in KDD
 The application of specific algorithms for extracting patterns from data. The other steps in
the KDD process (data preparation, selection, cleaning, incorporating prior knowledge, and
proper interpretation) are essential to ensure that useful knowledge is derived from the data
 Data Science
 Area of practice and study … that encompasses the techniques, tools, technologies, and
processes for making sense out of big data (Watson 2014)
 Big Data
 Big data is the term for a collection of data sets so large and complex that it becomes difficult
to process using on-hand database management tools or traditional data processing
applications. (Wikipedia)
9
10
(Fayyad et al 1996, AI Magazine)
 Different kinds of business analytics
 The traditional BI (link)
 The novel – Google Flu Trends Video (link)
 The really novel - Moneyball (link)
 Even more novel – Climate.com Fieldview – [link]
 Netflix $1M winner (link)
 Machine learning focus -- automate, in other words, make
many, many, many decisions simultaneously
 Analytics focus -- don't know how many decisions you want to
make before you begin? Looking for inspiration? Encountering
the unknown
 Statistics focus – Looking for rigor and protecting managers
from making wrong decisions
11
 Hurricane Frances was on its way, barreling across the
Caribbean, threatening a direct hit on Florida’s Atlantic coast.
Residents made for higher ground, but far away, in
Bentonville, Ark., executives at Wal-Mart Stores decided that
the situation offered a great opportunity for one of their
newest data-driven weapons ... predictive technology.
 A week ahead of the storm’s landfall, Linda M. Dillman, Wal-
Mart’s chief information officer, pressed her staff to come up
with forecasts based on what had happened when Hurricane
Charley struck several weeks earlier. Backed by the trillions of
bytes’ worth of shopper history that is stored in Wal-Mart’s
data warehouse, she felt that the company could ‘start
predicting what’s going to happen, instead of waiting for it to
happen,’ as she put it. (Hays, New York Times, 2004)
12
 In the 1980s, data science helped banks deal with loan defaults
by offering different customers different interest rates based
on risk assessment: the higher the risk, the higher the interest rate
 No one thought of using it for credit cards
 Uniform pricing (typically 1.7%)
 Customers would not like the differential pricing
 Richard Fairbank and Nigel Morris, working with Signet Bank, viewed
credit cards as an information service rather than a banking service
 Applied predictive modeling to offer different terms (pricing,
credit limits, low initial rate balance transfer, cash back,
loyalty points, etc)
 Works because of 80-20 rule, 20% of customers account for
80% of profits (actually > 100%)
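The note "actually > 100%" can look paradoxical. The short sketch below, using invented profit figures for ten hypothetical customers (not data from Signet or Capital One), shows how the top 20% of customers can account for more than 100% of total profit when some of the remaining customers lose money:

```python
# Hypothetical annual profit per customer (negative = the bank loses money).
# These figures are invented purely for illustration.
profits = [500, 400, 60, 40, 20, 10, -30, -50, -70, -130]

def top_share(values, fraction=0.2):
    """Share of total profit contributed by the most profitable `fraction` of customers."""
    ranked = sorted(values, reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    return sum(ranked[:n_top]) / sum(ranked)

share = top_share(profits)
print(f"Top 20% of customers produce {share:.0%} of total profit")
```

Here the two best customers earn 900 while the whole portfolio nets only 750, so their share is 120%. The unprofitable customers pull the total below what the best customers generate on their own.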
13
 By modeling profitability – make better offers to “best”
customers (skim the cream)
 Credit card returns > commercial business
 No existing model – had to build their own
 Bought data – upfront losses
 Conducted experiments by offering different terms to random
customers
 Initially the charge-off rate soared (from an average of 2.9% to about 6%)
 After several years, charge-off rates dropped
 Signet Bank's credit card operation was spun off as Capital One, now one of the
largest credit card issuers in the industry
14
 Analytics can also be used to disrupt and break down decision
making and create havoc.
 Cambridge Analytica (founded by Robert Mercer and Steve
Bannon) weaponized information by profiling millions of
Facebook users without their permission and targeted them
with fake news and deceptive data to disrupt the US 2016
election and over 200 other elections around the world
(Wikipedia, “Cambridge Analytica”)
 Alexander Nix and Cambridge Analytica
 https://www.youtube.com/watch?v=n8Dd5aVXLCc
15
 Watch video
https://www.youtube.com/watch?v=_5PY1swcEEs
 Billy Beane, General Manager of the Oakland Athletics
baseball team, faced with a limited budget,
assembled a competitive team of undervalued
talent with the help of Peter Brand, a young Yale
economics graduate harboring radical ideas.
 In the end, the Oakland Athletics set a new American
League record by winning 20 consecutive games, the
longest streak in the league's 103-year history,
with one of the lowest budgets in the league.
 Answer this question by yourself:
 What is the single most important concept you got from the business
stories about analytics so far?
 Think about an answer by yourself
 Discuss with your partner your response and persuade your partner why
you are correct
 Answer this question by yourself
 What makes the Business Intelligence story a different kind of
analytics than the Walmart Story?
 A: BI applies spreadsheets whereas Walmart applied a different
technology
 B: BI was about what is happening whereas Walmart’s story was about
what is going to happen
 C: BI was about the grocery shopping whereas Walmart’s story was about
general retailing
 D: BI is about business decision making whereas Walmart was about
predictive analytics
 Discuss with your partner your response and persuade your partner
why you are correct
17
 Using the pieces of information supplied about the origins of
analytics, Google the parts you find more interesting
 Share with your partner your findings
 Share with the class what you thought was most interesting
18
ANALYTICS AND AGRICULTURE: FROM CLIMATE CORP TO FIELDVIEW
 Two former Google employees founded WeatherBill in 2006 to sell
weather insurance to ski resorts, large events and farmers
 WeatherBill became Climate Corp in 2011 to provide federal
crop insurance. In 2013 Monsanto bought Climate Corp for $1.1 billion
 Developed digital agriculture platforms to improve the
efficiency and productivity of their operations
19
 Discuss with your partner the differences between statistics
and analytics
 Which reasons are better, yours or your partner’s?
 Core ideas
 Data exploration and visualization
 Classification – predict which class (category) item of interest falls
into
 Prediction – predict value of a continuous variable rather than
classify category (e.g. Zillow value of homes)
 Association rules and recommender systems – discover patterns or
rules of “what goes with what” (e.g. Netflix, customers and movies)
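The "what goes with what" idea behind association rules can be made concrete with two standard measures, support and confidence. The sketch below is illustrative only: the baskets are made up, and the helper names `support` and `confidence` are assumptions rather than part of any particular library.

```python
# Made-up market baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

rule_support = support({"diapers", "beer"}, baskets)
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)
print(f"support(diapers, beer) = {rule_support:.2f}")
print(f"confidence(diapers -> beer) = {rule_confidence:.2f}")
```

In this toy data, diapers and beer appear together in 2 of 5 baskets (support 0.40), and 2 of the 3 baskets containing diapers also contain beer (confidence about 0.67). Recommender systems build on the same counting idea at a much larger scale.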
20
21
 Analytics has been in use since the late 1800s, but Big Data
analytics has taken off only recently because of:
 Availability of high volume and variety of data from Internet sites,
online transactions, mobile devices and social media
 Improvements in computational power and storage
 Recent developments in more sophisticated software and algorithmic
techniques
 Linking of separate databases and enhancement in SQL and NoSQL
technologies
 The volume and variety of data have far outstripped the capacity to
manually analyze data
22
 Where does the data come from?
 Credit card transactions, loyalty cards, discount coupons, customer complaint calls,
plus (public) lifestyle studies
 Target marketing
 Find clusters of “model” customers who share the same characteristics: interest,
income level, spending habits, etc.
 Determine customer purchasing patterns over time
 Cross-market analysis
 Associations/correlations between product sales, and prediction based on such
associations
 Customer profiling
 What types of customers buy what products (clustering or classification)
 Customer requirement analysis
 Identify the best products for different customers
 Predict what factors will attract new customers
 Provision of summary information
 multidimensional summary reports
 statistical summary information (data central tendency and variation)
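As a small illustration of "statistical summary information (data central tendency and variation)", the sketch below summarizes a made-up list of monthly spending figures with Python's standard `statistics` module. The data are hypothetical and chosen so that one big spender shows why the mean and the median can tell different stories.

```python
import statistics

# Hypothetical monthly spending (in dollars) for seven customers.
spending = [120, 135, 150, 160, 175, 180, 900]

mean = statistics.mean(spending)      # central tendency, sensitive to extremes
median = statistics.median(spending)  # central tendency, robust to extremes
stdev = statistics.stdev(spending)    # variation (sample standard deviation)

print(f"mean={mean:.1f} median={median} stdev={stdev:.1f}")
```

The single 900-dollar customer pulls the mean (260) well above the median (160), which is why summary reports usually show more than one measure.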
23
24
Andrew Pole was hired by Target in 2002. Using analytics, he figured out certain patterns in
Target’s sales. He identified about 25 products that, when analyzed together, allowed him to
assign each shopper a “pregnancy prediction” score. More important, he could also estimate
the shopper’s due date to within a small window, so Target could send coupons timed to very
specific stages of her pregnancy.
A Target manager received an angry phone call from a parent: “My daughter got this in the
mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and
cribs? Are you trying to encourage her to get pregnant?” The manager didn’t have any idea
what the man was talking about. He looked at the mailer. Sure enough, it was addressed to
the man’s daughter and contained advertisements for maternity clothing, nursery furniture
and pictures of smiling infants. The manager apologized and then called a few days later to
apologize again. On the phone, though, the father was somewhat abashed. “I had a talk with
my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been
completely aware of. She’s due in August. I owe you an apology.”
Which application of analytics is this?
 Finance planning and asset evaluation
 cash flow analysis and prediction
 contingent claim analysis to evaluate assets
 cross-sectional and time series analysis (financial-ratio, trend
analysis, etc.)
 Resource planning
 summarize and compare the resources and spending
 Competition
 monitor competitors and market directions
 group customers into classes and a class-based pricing procedure
 set pricing strategy in a highly competitive market
25
 Approaches: Clustering & model construction for frauds,
outlier analysis
 Applications: Health care, retail, credit card service, telecomm.
 Auto insurance: ring of collisions
 Money laundering: suspicious monetary transactions
 Medical insurance
 Professional patients, ring of doctors, and ring of references
 Unnecessary or correlated screening tests
 Telecommunications: phone-call fraud
 Phone call model: destination of the call, duration, time of day or week.
Analyze patterns that deviate from an expected norm
 Retail industry
 Analysts estimate that 38% of retail shrink is due to dishonest
employees
 Anti-terrorism
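The phone-call model above looks for patterns that "deviate from an expected norm". One simple way to sketch that idea is a z-score check on call duration. The data, the `flag_outliers` name, and the threshold of 2 standard deviations are all assumptions for illustration; real fraud systems combine many more features (destination, time of day, and so on).

```python
import statistics

def flag_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds `threshold`."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical call durations in minutes for one subscriber.
durations = [3, 4, 5, 4, 6, 5, 3, 4, 5, 120]
suspicious = flag_outliers(durations)
print(suspicious)
```

Only the 120-minute call is flagged; the ordinary calls sit well within two standard deviations of the mean.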
26
27
You might know your own credit score, a measure of your creditworthiness that
banks, retailers and car dealers use to gauge your ability to pay your debt
obligations on time. This measure determines whether you can get a loan or a
store credit card, or buy a car or a house. Have you wondered how the system got
started? In 1899, brothers and grocery owners Cator and Guy Woolford compiled
lists of customers based on their creditworthiness. They saw an opportunity to
start a new kind of business selling information, so they set up the Retail
Credit Company. They began selling their information to insurance companies and
other industries and, by the 1950s, to the earliest credit card issuers such as
Diners Club, and quickly became profitable. Because they collected sensitive data
(loans that hadn't been repaid, overdue credit card payments, and multiple
address changes by people constantly trying to escape creditors) and were often
sloppy in their data collection, they got into legal trouble. They were sued by
various parties; one suit alleged that they collected information that "may
include 'facts, statistics, inaccuracies and rumors' ... about virtually every
phase of a person's life; his marital troubles, jobs, school history, childhood,
sex life, and political activities," and that customers weren't allowed to view
that information. In 1975, they changed their name to Equifax and began selling
credit reports to people other than businesses ….
28
Earlier, in 1956, William Fair, an engineer with degrees from Cal Tech,
Stanford and Berkeley, and Earl Isaac, a mathematician, set up a
management consulting company to fix the complex credit checking process.
Probably using some form of multivariate analysis involving payment
history, loan utilization, credit history and credit mix, they launched their own
credit scoring system, which later became known as FICO. In 1986, Robert
Hecht-Nielsen developed a neural network application to detect credit card
fraud at his company, HNC Software, Inc. In 2002, that company was
bought for $1 billion by FICO.
What application of analytics is this?
 Forecasting and planning
 Prevent overstocking and understocking
 Reduce delivery time
 Address customer needs
 Inventory Management
 Real-time capacity availability
 Improve inventory distribution across warehouses
 Transportation Management
 Optimal routing
 Streamline fulfillment
 Sourcing
 Optimize procurement and reduce purchasing costs
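As one very simple illustration of forecasting to prevent overstocking and understocking, the sketch below forecasts next week's demand as a moving average of recent weeks and compares it with stock on hand. The demand figures, function names, and window size are hypothetical; production systems would typically use richer models that account for seasonality, lead times, and safety stock.

```python
def moving_average_forecast(demand, window=3):
    """Forecast next-period demand as the mean of the last `window` periods."""
    if len(demand) < window:
        raise ValueError("not enough history for the chosen window")
    recent = demand[-window:]
    return sum(recent) / window

def reorder_quantity(forecast, on_hand):
    """Units to order so that stock covers the forecast (never negative)."""
    return max(0, forecast - on_hand)

weekly_demand = [100, 120, 110, 130, 150]  # hypothetical units sold per week
forecast = moving_average_forecast(weekly_demand)
order = reorder_quantity(forecast, on_hand=90)
print(forecast, order)
```

With the last three weeks averaging 130 units and 90 units in stock, the sketch suggests ordering 40 units; if stock already exceeded the forecast, it would order nothing.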
29
30
In 1946, American Airlines began experimenting with electromechanical
reservation systems. It wasn't until 1964 that, with the help of IBM, it launched
SABRE, the first real-time airline reservation system. In 1996, Sabre, working
with a Bell company and Random House, created Travelocity to take advantage of
Internet sales, and Microsoft entered the market with Expedia. In 2003, Oren
Etzioni, a computer science professor, found out mid-flight that most of the
passengers on the plane had paid less than he had for their seats, even though he
had purchased his earlier. Frustrated, he set out to build the first airfare predictor,
Farecast. Using a sample of 12,000 price observations from travel websites, he
predicted whether a ticket price was likely to rise or fall, giving the purchaser more
information about the best time to buy a ticket. Essentially, the application would
deny the airline industry millions of dollars of potential revenue. By 2008 he was
already working on other goods such as hotel rooms, concert tickets and used cars.
Before he could complete his work, Microsoft bought Farecast for $115 million and
integrated it into its Bing search engine. By 2012, Bing was calling the price
direction correctly 75 percent of the time and saving travelers an average of $50
per ticket. In 2014, Microsoft killed its Bing Price Predictor, and Google has since
outflanked it with its own Google Flights.
What application of analytics is this?
31
Many groups record data regarding aviation safety, including the National
Transportation Safety Board (NTSB) and the Federal Aviation Administration (FAA).
Integrating data from different sources, as well as mining for patterns from a mix
of both structured fields and free text, is a difficult task (Eric Bloedorn).
The goal of our initial analysis is to determine how data mining can be used to
improve airline safety by finding patterns that predict safety problems.
32
 Based on what you learned from the previous slides, write a 1-
minute thesis on the participation forum about any one
application of analytics that interests you and define the
following: (3 points)
 1) What were the indicators of a potential to disrupt that
area of industry?
 2) What is the archetype of the disruption?
33
 “The ability to take data – to be able to understand it, to process
it, to extract value from it, to visualize it, to communicate it – that’s
going to be a hugely important skill in the next decades, not only
at the professional level but even at the educational level for
elementary school kids, for high school kids, for college kids … I
keep saying that the sexy job in the next 10 years will be
statisticians, and I’m not kidding.” Hal Varian, Google, McKinsey
Quarterly, 2009.
 Thomas Davenport’s “Data Scientist: The Sexiest Job of the 21st
Century,” HBR, 2012
 Data analytic thinking involves viewing business problems from a
data perspective and understanding principles of extracting
useful knowledge from data.
 As you get better at data-analytic thinking you will develop
intuition as to how and where to apply creativity and domain
knowledge.
34
 Explore R Built-in datasets
 Load the datasets package
 Find the two different titanic datasets (Titanic vs titanic)
 Stanford University
 http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/
 Amazon
 http://aws.amazon.com/datasets?_encoding=UTF8&jiveRedirect=1
 Big Machine Learning
 http://blog.bigml.com/2013/02/28/data-data-data-thousands-of-public-data-sources/
 Data Market
 http://datamarket.com/
 Data Source Handbook by O’Reilly
 http://shop.oreilly.com/product/0636920018254.do
 UCI Data Set
 http://www.sgi.com/tech/mlc/db/
35
 Spreadsheets
 Query languages
 Visualization
 Data Mining
 It’s all very technical! I know
36

 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 

Wk02-Introduction to DA.pptx

  • 1. Nik Rushdi Hassan University of Minnesota Duluth 1
  • 2.  [Redo to include Google Cert content – more methodical process especially the 6 steps of data analysis Understand basic skills required to be a good data analyst. Rate yourself (1-10, 10 highest) on the following prerequisite skills for your first posting to the discussion forum (3 points)  Intellectual curiosity  Business acumen  Teamwork and presentation skills  Basic mathematical and statistical problem solving  Programming mindset (able to code when needed)  Strong grasp of data structures, be able to extract (SQL) and manipulate data (data wrangling)  Good sense of recognizing and visualizing patterns  [Ability to use all the tools -  Describe what you already know that’s related to Data Analytics (e.g. Programming, Machine Learning, Statistics, Operations Management) in the week’s discussion forum (2 points).  Explain why analytics has become a critical and exciting part of doing business  Explain different flavors of data analytics, the fundamentals of the art and science of data-analytic thinking.  Recognize examples of opportunities for business analytics from classic examples, and different business domains and practices 2
  • 3.  Traditional Business Intelligence (BI)  Watch BI Video https://www.youtube.com/watch?v=LFnewuBsYiY  Data is valuable to business, yet companies are capturing only a fraction of the potential value from data for competitive advantage  Data analytics are changing the basis for competition. Leading companies are not only improving core operations, they are building new business models  Network effects (something gains additional value as more people use it) are benefitting companies that exploit their data  Recent advances in machine learning are enabling companies to solve problems that could not be solved before. 3
  • 4. 4
  • 5. 1958 Applied Mathematics and Statistics: First Credit Scoring System 1989 FICO Score Post World War II Operations Research – Data-Driven Analysis 1970s Decision Support Systems 1980s Data Warehouse and Business Intelligence 1970s Relational Databases (Codd) Online Analytical Processing (OLAP) 5
  • 6. 1959 Machine Learning 1989 Data Mining and Knowledge Discovery in Databases 1997 Big Data Naur 1966 Datalogy 1974 Principles of Data Science (Naur) 2005 Business Analytics = Data Science in Business Tukey 1962 Future of Data Analysis Tukey 1977 Exploratory Data Analysis (EDA) 1997 Data Science 6
  • 8. 8
  • 9.  Business Intelligence (BI)  A broad category of applications, technologies, and processes for gathering, storing, accessing, and analyzing data to help business users make better decisions (Watson 2009)  Data visualization and reporting for understanding “what happened and what is happening” using charts, tables and dashboards (Shmueli 2018)  Business Analytics  Extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions (Davenport & Harris 2007)  Practice and art of bringing quantitative data to bear on decision-making. Includes BI and other sophisticated techniques (Shmueli 2018)  Knowledge Discovery in Databases (KDD)  The overall process of discovering useful knowledge from data (Fayyad et al. 1996, AI Magazine)  Data Mining is one of the steps in KDD  The application of specific algorithms for extracting patterns from data. The other steps in the KDD process (data preparation, selection, cleaning, incorporating prior knowledge, and proper interpretation) are essential to ensure useful knowledge is derived from the data  Data Science  Area of practice and study … that encompasses the techniques, tools, technologies, and processes for making sense out of big data (Watson 2014)  Big Data  Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. (Wikipedia) 9
  • 10. 10 (Fayyad et al 1996, AI Magazine)
  • 11.  Different kinds of business analytics  The traditional BI (link)  The novel – Google Flu Trends Video (link)  The really novel - Moneyball (link)  Even more novel – Climate.com Fieldview – [link]  Netflix $1M winner (link)  Machine learning focus -- automate, in other words, make many, many, many decisions simultaneously  Analytics focus -- don't know how many decisions you want to make before you begin? Looking for inspiration? Encountering the unknown  Statistics focus – Looking for rigor and protecting managers from making wrong decisions 11
  • 12.  Hurricane Frances was on its way, barreling across the Caribbean, threatening a direct hit on Florida’s Atlantic coast. Residents made for higher ground, but far away, in Bentonville, Ark., executives at Wal-Mart Stores decided that the situation offered a great opportunity for one of their newest data-driven weapons ... predictive technology.  A week ahead of the storm’s landfall, Linda M. Dillman, Wal-Mart’s chief information officer, pressed her staff to come up with forecasts based on what had happened when Hurricane Charley struck several weeks earlier. Backed by the trillions of bytes’ worth of shopper history that is stored in Wal-Mart’s data warehouse, she felt that the company could ‘start predicting what’s going to happen, instead of waiting for it to happen,’ as she put it. (Hays, New York Times, 2004) 12
  • 13.  In the 1980s data science helped predict bank loan defaults by offering different customers different interest rates based on risk assessment – the higher the risk, the higher the interest rate  No one thought of using it for credit cards  Uniform pricing (typically 1.7%)  Customers would not like the differential pricing  Founders of Signet Bank viewed it as an information service rather than a banking service  Applied predictive modeling to offer different terms (pricing, credit limits, low initial rate balance transfer, cash back, loyalty points, etc.)  Works because of the 80-20 rule: 20% of customers account for 80% of profits (actually > 100%) 13
  • 14.  By modeling profitability – make better offers to “best” customers (skim the cream)  Credit card returns > commercial business  No existing model – had to build their own  Bought data – upfront losses  Conducted experiments by offering different terms to random customers  Initially number of defaults soared (2.9% avg to 6%)  After several years, charge-off rates dropped  Signet Bank became Capital One – one of the largest credit card issuers in the industry 14
  • 15.  Analytics can also be used to disrupt and break down decision making and create havoc.  Cambridge Analytica (founded by Robert Mercer and Steve Bannon) weaponized information by profiling millions of Facebook users without their permission and targeted them with fake news and deceptive data to disrupt the US 2016 election and over 200 other elections around the world (Wikipedia, “Cambridge Analytica”)  Alexander Nix and Cambridge Analytica  https://www.youtube.com/watch?v=n8Dd5aVXLCc 15
  • 16.  Watch video https://www.youtube.com/watch?v=_5PY1swcEEs  Billy Beane, General Manager of the Oakland Athletics baseball team, faced with a limited budget, assembled a competitive team of undervalued talent with the help of Peter Brand, a young Yale economics graduate harboring radical ideas.  In the end, the Oakland Athletics, the team that finished the previous season with the worst record in Major League Baseball, set a new American League record by winning 20 consecutive games, a first in the 103-year history of American League baseball, and did so with one of the lowest budgets in the league.
  • 17.  Answer this question by yourself:  What is the single most important concept you got from the business stories about analytics so far?  Think about an answer by yourself  Discuss with your partner your response and persuade your partner why you are correct  Answer this question by yourself  What makes the Business Intelligence story a different kind of analytics than the Walmart Story?  A: BI applies spreadsheets whereas Walmart applied a different technology  B: BI was about what is happening whereas Walmart’s story was about what is going to happen  C: BI was about the grocery shopping whereas Walmart’s story was about general retailing  D: BI is about business decision making whereas Walmart was about predictive analytics  Discuss with your partner your response and persuade your partner why you are correct 17
  • 18.  Using the pieces of information supplied about the origins of analytics, Google the parts you find more interesting  Share with your partner your findings  Share with the class what you thought was most interesting 18
  • 19. ANALYTICS AND AGRICULTURE – FROM CLIMATE CORP TO FIELDVIEW  Two Google employees founded WeatherBill in 2006 to sell weather insurance to ski resorts, large events and farmers  WeatherBill became Climate Corp in 2011 to provide federal crop insurance. In 2013 Monsanto bought Climate Corp for $1.1 billion  Developed digital agriculture platforms to help farmers improve the efficiency and productivity of their operations 19
  • 20.  Discuss with your partner the differences between statistics and analytics  Which reasons are better, yours or your partner’s?  Core ideas  Data exploration and visualization  Classification – predict which class (category) item of interest falls into  Prediction – predict value of a continuous variable rather than classify category (e.g. Zillow value of homes)  Association rules and recommender systems – discover patterns or rules of “what goes with what” (e.g. Netflix, customers and movies) 20
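The "what goes with what" idea behind association rules can be sketched in a few lines. This is only a minimal illustration: the baskets and the helper name `rule_stats` are hypothetical, invented for this example, and real recommender systems such as Netflix's are far more elaborate. It computes two common measures for a rule "customers who buy A also buy B": support (how often A and B appear together) and confidence (how often B appears when A does).

```python
# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    total = len(baskets)
    # Baskets that contain the antecedent item.
    with_antecedent = sum(1 for b in baskets if antecedent in b)
    # Baskets that contain both items.
    with_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = with_both / total
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


support, confidence = rule_stats(baskets, "bread", "milk")
print(f"bread -> milk: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high support and high confidence is a candidate for a cross-selling or recommendation offer, although in practice it would still need to be checked against how popular the consequent item is on its own.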
  • 21. 21
  • 22.  Analytics have been in use since the late 1800s. But Big Data Analytics is only recently exploding because of:  Availability of high volume and variety of data from Internet sites, online transactions, mobile devices and social media  Improvements in computational power and storage  Recent developments in more sophisticated software and algorithmic techniques  Linking of separate databases and enhancement in SQL and NoSQL technologies  The volume and variety of data have far outstripped the capacity to manually analyze data 22
  • 23.  Where does the data come from?  Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies  Target marketing  Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.  Determine customer purchasing patterns over time  Cross-market analysis  Associations/correlations between product sales, & prediction based on such association  Customer profiling  What types of customers buy what products (clustering or classification)  Customer requirement analysis  identifying the best products for different customers  predict what factors will attract new customers  Provision of summary information  multidimensional summary reports  statistical summary information (data central tendency and variation) 23
  • 24. 24 Andrew Pole was hired by Target in 2002. Using analytics, he figured out certain patterns in Target’s sales. He identified about 25 products that, when analyzed together, allowed him to assign each shopper a “pregnancy prediction” score. More important, he could also estimate the shopper’s due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy. A Target manager received an angry phone call from a parent: “My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?” The manager didn’t have any idea what the man was talking about. He looked at the mailer. Sure enough, it was addressed to the man’s daughter and contained advertisements for maternity clothing, nursery furniture and pictures of smiling infants. The manager apologized and then called a few days later to apologize again. On the phone, though, the father was somewhat abashed. “I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.” Which application of analytics is this?
  • 25.  Finance planning and asset evaluation  cash flow analysis and prediction  contingent claim analysis to evaluate assets  cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)  Resource planning  summarize and compare the resources and spending  Competition  monitor competitors and market directions  group customers into classes and a class-based pricing procedure  set pricing strategy in a highly competitive market 25
  • 26.  Approaches: Clustering & model construction for frauds, outlier analysis  Applications: Health care, retail, credit card service, telecomm.  Auto insurance: ring of collisions  Money laundering: suspicious monetary transactions  Medical insurance  Professional patients, ring of doctors, and ring of references  Unnecessary or correlated screening tests  Telecommunications: phone-call fraud  Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm  Retail industry  Analysts estimate that 38% of retail shrink is due to dishonest employees  Anti-terrorism 26
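The phone-call example above ("analyze patterns that deviate from an expected norm") is a form of outlier analysis. A minimal sketch follows, assuming a simple z-score rule on call duration alone; the call data, the function name `flag_outliers`, and the threshold of 2 standard deviations are illustrative assumptions, not a method taken from the slides. Real fraud models would also use destination, time of day and other features.

```python
from statistics import mean, stdev


def flag_outliers(durations, threshold=2.0):
    """Return the durations lying more than `threshold` sample standard
    deviations away from the mean (a basic z-score rule)."""
    mu = mean(durations)
    sigma = stdev(durations)
    if sigma == 0:
        # All values identical: nothing deviates from the norm.
        return []
    return [d for d in durations if abs(d - mu) / sigma > threshold]


# Hypothetical call durations in minutes, invented for illustration.
calls = [3, 4, 5, 4, 3, 5, 4, 120]
print(flag_outliers(calls))
```

One design caveat: a single extreme value inflates the standard deviation itself, so robust alternatives (for example, rules based on the median) are often preferred when several outliers are expected.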
  • 27. 27 You might know your own credit score, a measure of your creditworthiness that banks, retailers and car distributors use to gauge your ability to pay your debt obligations on time. This measure determines if you can get a loan, a store credit card or buy a car or a house. Have you wondered how the system got started? In 1899, grocery-owner brothers Cator and Guy Woolford compiled lists of customers based on their creditworthiness. They saw an opportunity to start a new kind of business selling information, so they set up the Retail Credit Company. They began selling their information to insurance companies and other industries, and by the 1950s sold it to the earliest credit card issuers such as Diners Club, and quickly became profitable. Because they collected sensitive data that included loans that hadn't been repaid, overdue credit card payments, and multiple address changes by people constantly trying to escape creditors, and were often sloppy in their data collection, they got into legal trouble. They were sued by various parties; one suit alleged that they collected information that "may include 'facts, statistics, inaccuracies and rumors' ... about virtually every phase of a person's life; his marital troubles, jobs, school history, childhood, sex life, and political activities," and that customers weren’t allowed to view that information. In 1975, they changed their name to Equifax and began selling credit reports to people other than businesses ….
  • 28. 28 Earlier, in 1956, William Fair, a mathematician with degrees from Cal Tech, Stanford and Berkeley, and Earl Isaac, an electrical engineer, set up a management consulting company to fix the complex credit checking process. Probably using multivariate analysis of some kind involving payment history, loan utilization, credit history and credit mix, they launched their own credit scoring system, which later became known as FICO. In 1986, Robert Hecht-Nielsen developed a neural network application for his company, HNC Software, Inc., to detect credit card fraud. In 2002, that company was bought for $1 billion by FICO. What application of analytics is this?
  • 29.  Forecasting and planning  Prevent overstocking and understocking  Reduce delivery time  Address customer needs  Inventory Management  Real-time capacity availability  Improve inventory distribution across warehouses  Transportation Management  Optimal routing  Streamline fulfillment  Sourcing  Optimize procurement and reduce purchasing costs 29
  • 30. 30 In 1946, American Airlines began experimenting with electromechanical reservation systems. It wasn’t until 1964 that, with the help of IBM, it launched SABRE, the first real-time airline reservation system. In 1996, Sabre, working with a Bell company and Random House, created Travelocity to take advantage of Internet sales, and Microsoft entered the market with Expedia. In 2003, Oren Etzioni, a computer science professor, found out in flight that most of the passengers on the plane had paid less than he did for his seat even though he had purchased it earlier. Frustrated, he set out to build the first airfare predictor, Farecast. Using a sample of 12,000 price observations from travel websites, he predicted whether the ticket price was likely to rise or fall, giving the purchaser more information as to the best time to buy a ticket. Essentially the application would deny the airline industry millions of dollars of potential revenue. By 2008 he was already working on other goods like hotel rooms, concert tickets and used cars. Before he could complete his work, Microsoft bought Farecast for $115 million and integrated it into their Bing search engine. By 2012, Bing was calling price movements correctly 75 percent of the time and saving travelers an average of $50 per ticket. In 2014, Microsoft killed its Bing Price Predictor and Google has since outflanked it with its own Google Flights. What application of analytics is this?
  • 31. 31 Many groups record data regarding aviation safety, including the National Transportation Safety Board (NTSB) and the Federal Aviation Administration (FAA). Integrating data from different sources, as well as mining for patterns from a mix of both structured fields and free text, is a difficult task (Eric Bloedorn). The goal of our initial analysis is to determine how data mining can be used to improve airline safety by finding patterns that predict safety problems.
  • 32. 32
  • 33.  Based on what you learned from the previous slides, write a 1-minute thesis on the participation forum about any one application of analytics that interests you and define the following: (3 points)  1) What were the indicators of a potential to disrupt that area of industry?  2) What is the archetype of the disruption? 33
  • 34.  “The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids … I keep saying that the sexy job in the next 10 years will be statisticians, and I’m not kidding.” Hal Varian, Google, McKinsey Quarterly, 2009.  Thomas Davenport’s “Data Scientist: The Sexiest Job of the 21st Century,” HBR, 2012  Data-analytic thinking involves viewing business problems from a data perspective and understanding principles of extracting useful knowledge from data.  As you get better at data-analytic thinking you will develop intuition as to how and where to apply creativity and domain knowledge. 34
  • 35.  Explore R Built-in datasets  Load the datasets package  Find the two different titanic datasets (Titanic vs titanic)  Stanford University  http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/  Amazon  http://aws.amazon.com/datasets?_encoding=UTF8&jiveRedirect=1  Big Machine Learning  http://blog.bigml.com/2013/02/28/data-data-data-thousands-of-public-data-sources/  Data Market  http://datamarket.com/  Data Source Handbook by O’Reilly  http://shop.oreilly.com/product/0636920018254.do  UCI Data Set  http://www.sgi.com/tech/mlc/db/ 35
  • 36.  Spreadsheets  Query languages  Visualization  Data Mining  It’s all very technical! I know 36