Data scientist

Trieu Nguyen, System Thinker
DATA SCIENCE: MORE THAN MINING

“The sexiest job in the next 10 years will be statisticians.”
Hal Varian, Chief Economist, Google




While the concept of data science has been around for
decades, the role of data scientist has only recently become a
sought-after career, giving rise to a new generation of
data scientists.

This shift reflects the staggering growth of “big data.”
Technology innovation and the World Wide Web have produced
new types of data, such as user-generated content, along with
tools that can be used to interpret it.

Social media platforms such as Facebook (the largest social
network, valued at $52 billion) depend on data science to
create innovative, interactive features that encourage users
to get interested and stay that way, which is one reason the
field is considered so important.

But what does the term ‘Data Science’ really mean?




What is data science?
Data science can be broken down into four essential parts.



Mining data
Collecting and formatting the information

Statistics
Information analysis

Interpret
Representation or visualization in the form of presentations,
infographics, graphs or charts

Leverage
Implications of the data, application of the data, interaction
using the data and predictions formed from studying it




Defining a data scientist
A good data scientist understands the importance of:



Scouring
Their eyes search for information on the web, using:
   Vectorized operations
   Algorithmic strategizing
   APIs

Organization
Their voice asks questions about what they hope to accomplish
at the end of the project, setting information goals.

Extraction
They take the information they want and organize it using
formulas, in order to form educated, insightful conclusions
using these statistical and mathematical methods:
   Factor Analysis
   Regression Analysis
   Correlation
   Time Series Analysis

Expansion & Application
The appropriate data flows out of the person in the form of
keywords, Facebook “Likes” and other statistics.
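
Two of the methods listed under Extraction, correlation and regression analysis, can be sketched in a few lines of plain Python. This is only an illustration: the ad-spend and “Likes” figures are hypothetical and are not taken from the deck, and the helper names are chosen here for readability.

```python
from math import sqrt

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def linear_regression(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: weekly ad spend (x) and resulting "Likes" (y).
ad_spend = [1, 2, 3, 4, 5]
likes = [12, 19, 31, 42, 48]

r = pearson_correlation(ad_spend, likes)
slope, intercept = linear_regression(ad_spend, likes)
print(f"correlation: {r:.3f}")
print(f"fitted line: likes = {slope:.1f} * spend + {intercept:.1f}")
```

A correlation close to 1 says the two series move together; the fitted line turns that relationship into a rough prediction. Neither result, on its own, shows that one quantity causes the other.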




Creating new theories and predictions based upon the data
Ask questions to further expound upon the data beyond the
reaches of hard numbers or facts.

Apply the information in a useful, innovative manner to
applications whose success depends on data science.

Immediately process terabytes of data that flow in to prevent
pile-up and missed opportunities. For example, statistics
regarding holiday shopping trends are imperative around the
holiday season. If the statistics are processed and the
conclusions are drawn too late, the season has passed and the
information can no longer be utilized to its full potential.




Required skills for a data scientist
A successful data scientist must have a combination of skills that opens up
possibilities both for that individual and their team. Visualization processes are
often disjointed since each person is typically assigned to a specific part of the
project. The designer depends on the information architect. The information
architect depends on stats from the statistician, and so on. A true data scientist
should be skilled in multiple areas.


Hacking and Computer Science
Knowing how to take advantage of computers and the internet
to create data-mining formulas

Expertise in Mathematics, Statistics, Data Mining
Pulling important statistics and coherently organizing them
using mathematical prowess and computer formulas

Creativity & Insight
Knowing what statistics are important and how to leverage them




Dangers of data science
Statistics can be displayed in a misleading manner
Leading the pollee:
What type of question are you more likely
to answer “yes” to?




Should Americans be taxed so others can take advantage
of welfare and avoid working?
85% No

Should taxes support the government’s aid to those
who are unable to find work?
70% Yes




Facts that are left out
Including only the starting and ending points
of data makes the change seem more drastic.
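
The effect of showing only the endpoints can be checked with a small calculation. The monthly figures below are hypothetical, invented purely to illustrate the point: the series swings up and down but has no real trend, yet its first and last values alone suggest strong growth.

```python
# Hypothetical monthly figures, invented for illustration only.
monthly_values = [100, 138, 96, 141, 102, 137, 99, 140]

def percent_change(start, end):
    """Change from start to end, as a percentage of start."""
    return (end - start) / start * 100

# What a chart showing only the first and last points would report.
endpoint_change = percent_change(monthly_values[0], monthly_values[-1])

# A fairer summary: compare the average of the first half of the
# series with the average of the second half.
half = len(monthly_values) // 2
first_half_mean = sum(monthly_values[:half]) / half
second_half_mean = sum(monthly_values[half:]) / half
half_change = percent_change(first_half_mean, second_half_mean)

print(f"endpoints only: {endpoint_change:+.1f}%")
print(f"half-to-half average: {half_change:+.1f}%")
```

The endpoint comparison reports a 40% rise, while the half-to-half averages differ by less than 1%. Plotting every point, or at least summarizing all of them, guards against this distortion.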




A collage of carefully selected information
combined to induce a certain opinion
Selection bias occurs when an unrepresentative population
is sampled for a survey or study and the results are then
advertised to consumers as if they represented the total
population. An example is a toothpaste advertisement (the
graphic reads “9 of 10”), which shows how ‘studies’ can
often be weighted in a company's favor.
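
A short simulation shows how selection bias can produce a “9 of 10” headline. Everything here is hypothetical: the population, the loyalty club, and the preference rates are made up for illustration, and a fixed random seed is used only to make the run repeatable.

```python
import random

# Hypothetical population of 1,000 shoppers, invented for illustration.
# True means the shopper prefers the advertised brand.
loyalty_club = [True] * 180 + [False] * 20     # the brand's own customers
everyone_else = [True] * 120 + [False] * 680
population = loyalty_club + everyone_else

def share(sample):
    """Fraction of a sample that prefers the brand."""
    return sum(sample) / len(sample)

true_share = share(population)        # 300 of 1,000 shoppers
biased_share = share(loyalty_club)    # surveying only the club: 180 of 200

# A simple random sample of the same size, drawn from everyone.
rng = random.Random(0)
random_share = share(rng.sample(population, 200))

print(f"whole population:    {true_share:.0%}")
print(f"loyalty-club survey: {biased_share:.0%}")
print(f"random sample:       {random_share:.0%}")
```

The loyalty-club survey honestly reports that 9 of 10 respondents prefer the brand, but those respondents do not represent the population, where the share is 3 of 10. The random sample lands much closer to the true figure.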




Ironically, facts and stats can be used to
paint a very inaccurate — and damaging —
picture of a business, organization or
general topic.




Facts about data science

1790
The first big data collection project in history was the
U.S. Census, which started in 1790.




5MB vs. 32GB
When hard drives were first invented, a 5-megabyte drive
took up roughly the space of a luxury refrigerator. Today, a
32-gigabyte microSD card measures around 5/8 x 3/8 inch
and weighs about 0.5 grams.
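
The scale of that change is easy to work out. The sketch below assumes decimal units (1 MB = 10**6 bytes, 1 GB = 10**9 bytes); with binary units the ratio would come out slightly different.

```python
# Decimal units are an assumption here; the source does not say which it uses.
MB = 10**6
GB = 10**9

early_drive_bytes = 5 * MB    # the refrigerator-sized unit
micro_sd_bytes = 32 * GB      # the fingernail-sized card
micro_sd_grams = 0.5

capacity_ratio = micro_sd_bytes // early_drive_bytes
gb_per_gram = micro_sd_bytes / GB / micro_sd_grams

print(f"The card holds {capacity_ratio:,} times as much data.")
print(f"That works out to {gb_per_gram:.0f} GB per gram.")
```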




When collecting mass quantities of data, some human input is still needed
for remedial tasks. This need gave birth to crowdsourcing; the best-known
example is Amazon's Mechanical Turk.




Modern collection of big data is possible with cloud computing: spreading
the data across several physical resources that can be accessed remotely,
rather than concentrating it at one location.
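
One common way to spread records across several machines is to hash each record's key and use the hash to pick a machine. The sketch below is a simplified illustration of that idea, not a description of any particular cloud service; the node names and the sample records are hypothetical.

```python
import hashlib

# Hypothetical storage nodes standing in for separate physical machines.
NODES = ["node-a", "node-b", "node-c"]

def node_for(key):
    """Pick a node for a key by hashing it; the same key always maps to the same node."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

def distribute(records):
    """Group (key, value) records by the node that should store them."""
    placement = {node: [] for node in NODES}
    for key, value in records:
        placement[node_for(key)].append((key, value))
    return placement

# Hypothetical records: user IDs and the number of "Likes" each has made.
records = [(f"user-{i}", i * 3) for i in range(12)]
placement = distribute(records)

for node, stored in placement.items():
    print(node, len(stored), "records")
```

Because the node is derived from the key alone, any machine can work out where a record lives without consulting a central index. Real systems add replication and rebalancing on top of this, which the sketch leaves out.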



“The computing and processing of
data is literally 100 to 1,000 times
faster and cheaper than before.”
— Scott Yara, Greenplum
