2. Topics
• What is Data Science?
• Pre-requisites for Data Science.
• Data Science Tasks
• Data Science Life Cycle
• Data Scientist
• Common Data Quality Problems
• How to tackle a Data problem
3. Data Science
A science of:
■ Interactive analysis of data
■ Interactive retrieval of data
■ Interactive prediction based on foresight/intelligence.
■ Generalized definition: Data science is the science that uses
computer science, statistics, machine learning, visualization,
and human-computer interaction to collect, clean, integrate,
analyze, visualize, and interact with data in order to create
data products.
5. Pre-requisites for Data Science:
■ Computer Science: the study of both computer hardware and
software design. It encompasses both the study of theoretical
algorithms and the practical problems involved in
implementing them through computer hardware and software.
■ Statistics: a branch of mathematics dealing with the collection,
analysis, interpretation, and presentation of masses of
numerical data.
6. ■ Machine Learning: Machine
learning is an application of artificial
intelligence (AI) that provides systems
the ability to automatically learn and
improve from experience without
being explicitly programmed. Machine
learning focuses on the development
of computer programs that can access
data and use it to learn for themselves.
■ Visualization: the process of
representing data graphically and
interacting with these representations
in order to gain insight into the data.
7. Regular Data Science Tasks
• Data analysis
• What percentage of users come back to our site?
• Which products are usually bought together?
• Modeling/statistics
• How many cars are we going to sell next year?
• Which city is better for opening a new office?
• Engineering/prototyping
• A product that uses a prediction model
• Visualization of analytics
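The two data analysis questions above can be sketched in a few lines of Python. This is only an illustration: the visit log, the orders, and the function names `returning_user_percentage` and `most_common_pairs` are invented here and do not come from the slides, and "returning user" is assumed to mean a user with more than one recorded visit.

```python
from collections import Counter
from itertools import combinations

# Hypothetical visit log: (user_id, visit_date). Invented for illustration.
visits = [
    ("u1", "2024-01-01"), ("u1", "2024-01-05"),
    ("u2", "2024-01-02"),
    ("u3", "2024-01-03"), ("u3", "2024-01-09"),
    ("u4", "2024-01-04"),
]

def returning_user_percentage(visits):
    """Percentage of users with more than one visit (assumed definition)."""
    counts = Counter(user for user, _ in visits)
    if not counts:
        return 0.0
    returning = sum(1 for n in counts.values() if n > 1)
    return 100.0 * returning / len(counts)

# Hypothetical orders: each order is the set of products bought together.
orders = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def most_common_pairs(orders, top=3):
    """Count how often each pair of products appears in the same order."""
    pair_counts = Counter()
    for basket in orders:
        # Sort so that ("bread", "butter") and ("butter", "bread") match.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return pair_counts.most_common(top)

print(returning_user_percentage(visits))  # 50.0
print(most_common_pairs(orders, top=1))   # [(('bread', 'butter'), 3)]
```

Simple pair counting like this is only a starting point; on real data one would likely also compare each pair's count with how often the individual products sell on their own.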
9. What is a Data Scientist?
■ Data scientists serve the needs and solve the problems of data users. They use their
formidable skills in math, statistics and programming to clean, manage and organize
data. Then they apply all their analytic powers to uncover hidden solutions in the data.
12. ■ Subject Matter Expert (SME)
■ They possess domain knowledge with regard to the type of
problem and so are a source of professional wisdom.
■ Anomaly
■ Anomalies are best-case and worst-case scenarios. The main aim
is to reach “the center” for the information required regarding
the problem.
■ Risk and Uncertainty in data
■ Uncertainty can be minimized by validating the information
gained about the problem. Once uncertainty is reduced, risks
can be reduced as well, which makes good decisions possible.
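One possible reading of the anomaly point above is: set the extreme values aside and work from a central value. The sketch below takes that reading and uses the median as "the center" and Tukey's 1.5 × IQR rule to flag extremes. Both choices are assumptions made for illustration, not a method stated in the slides, and the function name `center_and_anomalies` and the sample data are hypothetical.

```python
import statistics

def center_and_anomalies(values, k=1.5):
    """Return the median and the values outside the IQR fences.

    Assumption: an "anomaly" is any value further than k * IQR
    from the first or third quartile (Tukey's rule).
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    anomalies = [v for v in values if v < low or v > high]
    return statistics.median(values), anomalies

# Hypothetical measurements with one very low and one very high value.
data = [48, 50, 51, 49, 52, 50, 47, 53, 5, 120]
center, anomalies = center_and_anomalies(data)
print(center)     # 50.0
print(anomalies)  # [5, 120]
```

The median is used rather than the mean because the two extreme values barely move it, which fits the idea of reaching the center despite the anomalies.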