In this presentation, let's look at what Data Science is and its applications. We discuss the most common use cases of Data Science.
I presented this at the LSPE-IN meetup held on 10th March 2018 at Walmart Global Technology Services.
Agenda…
• What is Data Science?
• Big Data Challenges
• Data Science vs Software Engineering
• Data Science Applications & Use cases
• Conclusion
What is Data Science?
Data Science is the discipline that combines computer science, statistics, machine
learning, visualization and human-computer interaction to collect, clean, integrate,
analyze, visualize and interact with data in order to create data products.
“Using data to make better decisions, optimize processes and improve products
and services.”
“What distinguishes data science itself from the tools and techniques
is the central goal of deploying effective decision-making models to a
production environment.”
– John Mount & Nina Zumel, Practical Data Science with R
Big Data Challenges
• Dealing with Data Growth
• Generating insights in a timely manner
• Integrating disparate data sources
• Validating Data
• Securing Big Data
• Organizational resistance
Data Science vs Software Engineering
Data science is data-driven decision making that helps the business
make good choices, whereas software engineering is the methodology
for developing software products against clearly defined
requirements.
Data Science Competence Groups - Research
Data Science Competence includes 5 areas/groups
• Data Analytics
• Data Science Engineering
• Domain Expertise
• Data Management
• Scientific Methods (or Business Process Management)
Scientific Methods
• Design Experiment
• Collect Data
• Analyse Data
• Identify Patterns
• Hypothesise Explanation
• Test Hypothesis
Business Operations
• Operations Strategy
• Plan
• Design & Deploy
• Monitor & Control
• Improve & Re-design
Data Science Competence Groups - Business
Business Process Operations/Stages
• Design
• Model/Plan
• Deploy & Execute
• Monitor & Control
• Optimise & Re-design
[Figure: Data Science competence groups diagram. Labels: Data Analytics, Algorithms, Analytic Systems, Engineering Competences, Domain Expertise, Data Management, Scientific Methods, Business Process Management, Research. Business process stages: Design, Modelling, Execution, Monitoring, Optimisation.]
Data Science Applications & Use cases
• RECOMMENDER SYSTEMS
• CREDIT SCORING
• DYNAMIC PRICING
• CUSTOMER CHURN
• FRAUD DETECTION
RECOMMENDER SYSTEMS
WHAT IS A RECOMMENDER SYSTEM?
A model that filters information to present users with a curated subset
of options they’re likely to find appealing
HOW DOES IT WORK?
Generally via a collaborative approach (based on users’ previous
behavior) or a content-based approach (based on discrete characteristics
assigned to items)
WHAT IS A REAL USE CASE?
Tendril uses recommendation models to match eligible customers with
new or existing energy products
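To make the collaborative approach concrete, here is a minimal sketch of user-based collaborative filtering. It is not how Tendril or any particular product does it: the ratings table, the user and item names, and the functions `cosine` and `recommend` are all hypothetical, and cosine similarity over co-rated items is just one common choice of similarity measure.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}. Names and values are illustrative only.
RATINGS = {
    "ana": {"plan_a": 5, "plan_b": 3, "plan_c": 4},
    "ben": {"plan_a": 4, "plan_b": 2, "plan_c": 5, "plan_d": 4},
    "cy": {"plan_a": 1, "plan_b": 5, "plan_d": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, top_n=1):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only score items the user has not seen
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {i: scores[i] / weights[i] for i in scores if weights[i] > 0}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

print(recommend("ana", RATINGS))
```

A content-based recommender would instead compare item attributes to a profile of what the user liked, without looking at other users at all.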
CREDIT SCORING
WHAT IS CREDIT SCORING?
A model that determines an applicant’s creditworthiness for a mortgage,
loan or credit card
HOW DOES IT WORK?
A set of decision management rules evaluates how likely an applicant is to
repay debts
WHAT IS A REAL USE CASE?
Ferratum Bank uses machine learning models to reach prospective
customers who may have been overlooked by traditional banking
institutions
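A rule-based credit decision can be sketched as a small points scorecard. Everything below is an assumption made for illustration: the `Applicant` fields, the rules, the point values and the approval cutoff are hypothetical and do not come from any real lender.

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    annual_income: float
    debt_to_income: float   # monthly debt payments divided by monthly income
    missed_payments: int    # count over the last 24 months
    years_employed: float

# Hypothetical rules: (description, predicate, points awarded when the rule passes).
RULES = [
    ("income at least 30k", lambda a: a.annual_income >= 30_000, 25),
    ("debt-to-income below 35%", lambda a: a.debt_to_income < 0.35, 30),
    ("no missed payments", lambda a: a.missed_payments == 0, 30),
    ("employed 2+ years", lambda a: a.years_employed >= 2, 15),
]
APPROVAL_THRESHOLD = 70  # assumed cutoff, not an industry standard

def score(applicant):
    """Sum the points of every rule the applicant satisfies."""
    return sum(points for _, rule, points in RULES if rule(applicant))

def decide(applicant):
    """Approve at or above the cutoff, otherwise refer for manual review."""
    return "approve" if score(applicant) >= APPROVAL_THRESHOLD else "refer"

print(decide(Applicant(45_000, 0.20, 0, 3)))
```

In practice the rules and weights are often derived from a fitted model rather than written by hand, which is where machine learning enters the picture.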
DYNAMIC PRICING
WHAT IS DYNAMIC PRICING?
Modeling price as a function of supply, demand, competitor pricing and
exogenous factors
HOW DOES IT WORK?
Generalized linear models and classification trees are popular
techniques for estimating the “right” price to maximize expected
revenue.
WHAT IS A REAL USE CASE?
Turo uses dynamic pricing models to suggest prices to the people who
list and rent out cars
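The idea of picking the price that maximizes expected revenue can be sketched as follows. The sketch assumes a logistic booking model (a generalized linear model with a logit link) has already been fitted; the coefficients `INTERCEPT` and `PRICE_COEF` are made-up values, and the grid search over candidate prices is only one simple way to find the maximum.

```python
from math import exp

# Hypothetical coefficients of an already fitted logistic booking model.
INTERCEPT = 4.0
PRICE_COEF = -0.08  # a higher price lowers the odds of a booking

def booking_probability(price):
    """Probability that a listing is booked at the given price."""
    return 1.0 / (1.0 + exp(-(INTERCEPT + PRICE_COEF * price)))

def expected_revenue(price):
    """Expected revenue = price times the probability of a booking."""
    return price * booking_probability(price)

def best_price(candidates):
    """Return the candidate price with the highest expected revenue."""
    return max(candidates, key=expected_revenue)

suggested = best_price(range(10, 101, 5))
print(suggested, round(expected_revenue(suggested), 2))
```

A real model would also include supply, competitor prices and exogenous factors as inputs, not price alone.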
CUSTOMER CHURN
WHAT IS CUSTOMER CHURN?
Predicting which customers are going to abandon a product or service
HOW DOES IT WORK?
Data scientists may consider using support vector machines, random
forest or k-nearest-neighbors algorithms
WHAT IS A REAL USE CASE?
EAB combines data from transcripts, standardized test scores,
demographics and more to identify students at risk of not graduating.
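As an illustration of one of the algorithms mentioned, here is a minimal k-nearest-neighbors churn classifier. The training rows, the two features (monthly logins and support tickets) and the function `predict_churn` are hypothetical, and the features are left unscaled to keep the sketch short; real data would normally be scaled first.

```python
from collections import Counter
from math import dist

# Hypothetical training data: (monthly_logins, support_tickets) -> churned?
TRAIN = [
    ((20, 0), False), ((18, 1), False), ((15, 0), False), ((22, 2), False),
    ((2, 4), True), ((1, 3), True), ((4, 5), True), ((3, 2), True),
]

def predict_churn(features, train=TRAIN, k=3):
    """Majority vote among the k training rows closest in Euclidean distance."""
    nearest = sorted(train, key=lambda row: dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(predict_churn((3, 3)))
print(predict_churn((19, 1)))
```

An odd `k` avoids tied votes in a two-class problem like this one.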
FRAUD DETECTION
WHAT IS FRAUD DETECTION?
Detecting and preventing fraudulent financial transactions from being
processed
HOW DOES IT WORK?
Fraud detection is a binary classification problem: “is this transaction
legitimate or not?”
WHAT IS A REAL USE CASE?
Via SMS Group uses a combination of complex data lookups and
decision algorithms written in R and implemented in PHP to assess
whether a loan applicant is fraudulent
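Framing fraud detection as binary classification can be sketched with a tiny logistic regression trained by batch gradient descent. Logistic regression is chosen here only because it is compact; the source does not say which classifier is used. The transactions, features, learning rate and epoch count are all illustrative assumptions.

```python
from math import exp

# Hypothetical transactions: (amount in thousands, 1 if new device else 0) -> 1 = fraud
DATA = [
    ((0.2, 0), 0), ((0.5, 0), 0), ((0.3, 1), 0), ((1.0, 0), 0),
    ((4.0, 1), 1), ((3.5, 1), 1), ((5.0, 0), 1), ((4.5, 1), 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train(data, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, y in data:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b

def is_fraud(x, w, b, threshold=0.5):
    """Answer 'is this transaction fraudulent?' by thresholding the probability."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= threshold

weights, bias = train(DATA)
print(is_fraud((0.4, 0), weights, bias), is_fraud((4.2, 1), weights, bias))
```

Real fraud data is heavily imbalanced, so the threshold is usually tuned against precision and recall rather than left at 0.5.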
Churn rate describes the rate at which customers abandon a product or service. Understanding customers’ likelihood to churn is particularly important for subscription-based businesses, ranging from traditional cable or gym memberships to recently popularized monthly subscription boxes.
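As a worked example, churn rate for a period is commonly computed as customers lost divided by customers at the start of the period. The function name and the figures below are illustrative, and the sketch assumes customers acquired during the period are not counted.

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of the starting customer base that left during the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start

# Hypothetical month: 200 subscribers at the start, 10 cancellations.
print(f"{churn_rate(200, 10):.1%}")
```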