The document discusses data science and the role of a data scientist. It defines data science as the extraction of knowledge from large amounts of structured and unstructured data. A data scientist is described as a hybrid of data hacker, analyst, communicator, and trusted advisor who analyzes and interprets complex digital data to assist businesses. The document outlines the responsibilities of a data scientist in conducting research, extracting and cleaning large volumes of data, developing algorithms and tools, and recommending strategies. It also lists common tools used by data scientists like machine learning, deep learning, and data visualization.
2. RAJEEV RANJAN DWIVEDI
INTEGRATED M.SC. STATISTICS
CENTRAL UNIVERSITY OF RAJASTHAN
3. DATA SCIENCE
Data science, also known as data-driven
science, is an interdisciplinary field that
uses scientific methods, processes, and
systems to extract knowledge or insights
from data in various forms, either
structured or unstructured, similar to
data mining.
Data science is the study of where
information comes from, what it
represents and how it can be turned
into a valuable resource in the creation
of business and IT strategies. Mining
large amounts of structured and
unstructured data to identify patterns
can help an organization rein in costs,
recognize new market opportunities and
increase the organization's competitive
advantage.
4. NOW THAT WE KNOW ABOUT DATA SCIENCE, WHAT IS A DATA SCIENTIST?
5. A HYBRID OF DATA HACKER, ANALYST, COMMUNICATOR, AND TRUSTED ADVISER.
6. DATA SCIENTIST
A PERSON EMPLOYED TO ANALYSE AND INTERPRET COMPLEX DIGITAL DATA,
SUCH AS THE USAGE STATISTICS OF A WEBSITE, ESPECIALLY IN ORDER TO
ASSIST A BUSINESS IN ITS DECISION-MAKING.
7. WHAT DOES A DATA SCIENTIST DO?
> Conduct undirected research and frame
open-ended industry questions
> Extract huge volumes of data from multiple
internal and external sources
> Employ sophisticated analytics programs,
machine learning and statistical methods to
prepare data for use in predictive and
prescriptive modelling
> Thoroughly clean and prune data to discard
irrelevant information
> Invent new algorithms to solve problems and
build new tools to automate work
> Recommend cost-effective changes to existing
procedures and strategies
> And many more...
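The cleaning and pruning step in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the slides: the sample data, the column names (`page`, `visits`, `debug_flag`) and the function `clean_rows` are all hypothetical, chosen to echo the website usage statistics mentioned earlier.

```python
import csv
import io

# Hypothetical raw export of website usage statistics (illustrative data only).
RAW_CSV = """page,visits,debug_flag
/home,120,x
/pricing,,y
/home,120,x
/about,abc,z
/blog,45,w
"""

def clean_rows(raw_text):
    """Keep only (page, visits), dropping malformed rows and exact duplicates."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    cleaned = []
    for row in reader:
        page = (row.get("page") or "").strip()
        visits = (row.get("visits") or "").strip()
        # Clean: discard rows whose visit count is missing or not a whole number.
        if not page or not visits.isdigit():
            continue
        record = (page, int(visits))
        # Deduplicate: skip records that have already been kept.
        if record in seen:
            continue
        seen.add(record)
        # Prune: the irrelevant debug_flag column is simply not carried over.
        cleaned.append({"page": record[0], "visits": record[1]})
    return cleaned

rows = clean_rows(RAW_CSV)
print(rows)
```

With the sample data above, only the `/home` and `/blog` rows survive: one row has no visit count, one has a non-numeric count, and one is a duplicate. Real pipelines would usually also log what was discarded and why.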
8. DATA SCIENTIST’S TOOLBOX
Data visualization: the presentation of data
in a pictorial or graphical format so it can be
easily analyzed.
Machine learning: a branch of artificial
intelligence in which algorithms learn patterns
from data instead of following hand-written rules.
Deep learning: an area of machine learning
that uses multi-layer neural networks to model
complex abstractions in data.
Pattern recognition: technology that
recognizes patterns in data (often used
interchangeably with machine learning).
Data preparation: the process of converting
raw data into another format so it can be
more easily consumed.
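Two of the toolbox items, data preparation and machine learning, can be illustrated together with a very small example. The sketch below assumes hypothetical raw records stored as text, converts them to numbers, and fits a straight line by ordinary least squares. The data and the names `RAW_RECORDS`, `prepare` and `fit_line` are invented for illustration; least squares is used here only as one of the simplest learning methods, not as the technique the slides have in mind.

```python
# Hypothetical raw records as text pairs: (input value, observed outcome).
RAW_RECORDS = [("1", "3"), ("2", "5"), ("3", "7"), ("4", "9")]

def prepare(records):
    """Data preparation: convert raw text fields into numeric pairs."""
    return [(float(x), float(y)) for x, y in records]

def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

points = prepare(RAW_RECORDS)
slope, intercept = fit_line(points)

# Use the fitted line to predict the outcome for an unseen input value.
prediction = slope * 5 + intercept
print(slope, intercept, prediction)
```

The fitted line is "learned" from the data rather than written by hand, which is the core idea behind the machine learning entry above. Deep learning replaces the straight line with a multi-layer neural network, but the prepare, fit and predict flow stays the same.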
9. “THE SEXY JOB IN THE
NEXT 10 YEARS WILL BE
STATISTICIANS. PEOPLE
THINK I’M JOKING, BUT
WHO WOULD’VE
GUESSED THAT
COMPUTER ENGINEERS
WOULD’VE BEEN THE
SEXY JOB OF THE
1990S?”
HAL R. VARIAN