Numerical Relativity as preparation for
Industrial Data Science:
a personal perspective
Ken Smith, CIO/CTO
APS April Meeting, 2014-04-06
Overview
• Who am I?
• What is data science?
• Why is it a viable (maybe even desirable) career option for physicists?
• How do you get started?
Note: all image attributions will appear at the end of the slide deck.
2
Who am I?
[Timeline figure, 2002 to 2014. Roles: grad student, lecturer, sr. scientist, architect, CIO. Fields: physics education, numerical relativity / astrophysics, machine learning, natural language processing, software architecture.]
3
Selected projects
• Automatically categorizing text documents into
topics based solely on content
• Improving entity (person, location, organization)
extraction techniques for large bodies of text within
the US Army
• Developing new tools for US Patent Examiners
within the USPTO
• Modeling and linking disparate datasets
associated with supply & maintenance of US Navy
systems
• Designing systems to organize and visualize the skills
mix of employees within a company
4
WHAT IS DATA SCIENCE?
SKILLS
TRENDS
ACTIVITIES
5
“I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s? The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades”
Hal Varian, Chief Economist, Google
January 2009
The sexiest job?
http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1
http://www.mckinsey.com/insights/innovation/hal_varian_on_how_the_web_challenges_managers
6
Data Science Skills & Disciplines
7
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Data Science post-Prism
8
http://joelgrus.com/2013/06/09/post-prism-data-science-venn-diagram
Trends: Data Storage
IBM 350 in 1956:
3.75 MB
6.4 kB/s data transfer
50 disk platters, each 24 in. in diameter
> 1 ton
Leased for $3200/mo
9
http://old-photos.blogspot.com/2011/06/hard-drive.html
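The IBM 350 figures on the slide support a quick back-of-the-envelope calculation that makes the contrast with modern storage concrete. The sketch below uses only the slide's numbers; the one assumption (ours, not the slide's) is decimal units, i.e. 1 MB = 1,000,000 bytes and 1 kB = 1,000 bytes. It computes how long it would take to stream the entire disk once at the quoted transfer rate, and the monthly lease cost per megabyte.

```python
# Back-of-the-envelope figures for the IBM 350 (1956), using the slide's numbers.
# Assumption: decimal units, 1 MB = 1,000,000 bytes and 1 kB = 1,000 bytes.

CAPACITY_BYTES = 3.75e6        # 3.75 MB total capacity
TRANSFER_BYTES_PER_S = 6.4e3   # 6.4 kB/s data transfer
LEASE_USD_PER_MONTH = 3200.0   # leased for $3200/mo


def full_read_seconds(capacity_bytes, rate_bytes_per_s):
    """Time to stream the entire disk once at the sustained transfer rate."""
    return capacity_bytes / rate_bytes_per_s


def lease_cost_per_mb_month(lease_usd, capacity_bytes):
    """Monthly lease cost divided by capacity expressed in (decimal) MB."""
    return lease_usd / (capacity_bytes / 1e6)


if __name__ == "__main__":
    t = full_read_seconds(CAPACITY_BYTES, TRANSFER_BYTES_PER_S)
    c = lease_cost_per_mb_month(LEASE_USD_PER_MONTH, CAPACITY_BYTES)
    print(f"Full read: {t:.0f} s (~{t / 60:.1f} min)")
    print(f"Lease cost: ${c:.2f} per MB per month")
```

Under these assumptions a full read takes just under ten minutes (3.75e6 / 6.4e3 ≈ 586 s), and the lease works out to roughly $853 per MB per month.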
Trends: Data Storage
10
http://www.mkomo.com/cost-per-gigabyte-update
Trends: Open Source Software
11
https://github.com/blog/1724-10-million-repositories
Trends: Quantified Self
The 2012 Feltron Report
12
http://feltron.com/ar12_02.html
Trends: Quantified Self
The 2012 Feltron Report
13
http://feltron.com/ar12_02.html
Trends: Quantified Self & Ubiquitous Sensors
14
Trends: Digital Exhaust
15
Father walks into a Minneapolis Target store: “My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”
The manager apologizes, then calls back a few days later to apologize again.
“I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.”
Data mining determined a set of signals that a pregnant shopper may be getting near her due date:
• larger quantities of unscented lotion
• supplements like calcium, magnesium, and zinc
• scent-free soap
• extra-big bags of cotton balls
• hand sanitizers
• washcloths
Trends: Targeted Marketing
http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html
16
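To make the idea of "a set of signals" concrete, here is a minimal, purely illustrative sketch of how signal products could be combined into a single score per shopper. The retailer's actual model is not described on the slide; the product keys, the weights in `SIGNAL_WEIGHTS`, and the 0.5 threshold below are all hypothetical, chosen only to show the mechanics of a weighted additive score with a cutoff.

```python
# Hypothetical sketch: score a shopper's basket against the signal products
# listed on the slide. The weights and threshold are invented for illustration;
# they are NOT the retailer's actual model.

SIGNAL_WEIGHTS = {
    "unscented lotion": 0.25,
    "calcium supplement": 0.15,
    "magnesium supplement": 0.15,
    "zinc supplement": 0.15,
    "scent-free soap": 0.10,
    "cotton balls (extra-big bag)": 0.10,
    "hand sanitizer": 0.05,
    "washcloths": 0.05,
}


def signal_score(basket):
    """Sum the weights of the distinct signal products present in a basket.

    Items that are not signal products contribute nothing; repeated items
    are counted once because the basket is converted to a set.
    """
    return sum(SIGNAL_WEIGHTS.get(item, 0.0) for item in set(basket))


def flag_shopper(basket, threshold=0.5):
    """Flag a basket whose combined signal score meets the (hypothetical) threshold."""
    return signal_score(basket) >= threshold


if __name__ == "__main__":
    basket = ["unscented lotion", "calcium supplement",
              "magnesium supplement", "scent-free soap", "bread"]
    print(round(signal_score(basket), 2), flag_shopper(basket))
```

A real system would learn weights from purchase histories (for example with a classifier) rather than hand-assign them; the additive form is used here only because it is the simplest way to show several weak signals adding up to one decision.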
“What differentiates data science from statistics is that data science is a holistic approach. We’re increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.”
What data scientists do
17
http://www.oreilly.com/data/free/what-is-data-science.csp
What does a data scientist do?
18
http://strata.oreilly.com/2013/09/data-analysis-just-one-component-of-the-data-science-workflow.html
WHY IS DATA SCIENCE VIABLE FOR PHYSICISTS?
19
“People often assume that data scientists need a background in computer science. In my experience, that hasn’t been the case: my best data scientists have come from very different backgrounds. The inventor of LinkedIn’s People You May Know was an experimental physicist. A computational chemist on my decision sciences team had solved a 100-year-old problem on energy states of water. An oceanographer made major impacts on the way we identify fraud. Perhaps most surprising was the neurosurgeon who turned out to be a wizard at identifying rich underlying trends in the data.”
DJ Patil, former Chief Scientist for LinkedIn
Where do data scientists come from?
http://radar.oreilly.com/2011/09/building-data-science-teams.html
20
Insight Data Science Fellows
21
http://insightdatascience.com/
An intensive six-week postdoctoral training fellowship bridging the gap between academia and data science
Projected Data Science Demand
22
https://www.mckinsey.com/~/media/McKinsey/dotcom/Insights%20and%20pubs/MGI/Research/Technology%20and%20Innovation/Big%20Data/MGI_big_data_exec_summary.ashx
Recent NSF data on employment at PhD award
23
http://www.nsf.gov/statistics/sed/digest/2012/
AIP Physics Career Statistics
24
http://aip.org/statistics/data-graphics/physics-phds-starting-salaries-classes-2009-2010
http://aip.org/statistics/physics-trends/physics-phds-1-year-later
What you have:
• Analytical/problem-solving mindset
• Presentation skills (oral, written, & graphical)
• Mathematical preparation
• Curiosity
• Understanding that reference frames can only ever be local
What you are missing:
• Sufficient training in statistics
– Regression beyond linear
– Classification techniques
– Machine learning
• SQL (Database)
• Information Visualization (psychology of design)
• Business/Finance acumen
Physics prep for Data Science
Warning: gross generalizations
25
Introduce statistical analysis techniques into the graduate (possibly undergraduate) core physics curriculum.
Make computer science courses available in high school. The ability to program is becoming a foundational skill, along with reading, writing, and arithmetic.
Curriculum Recommendations
26
http://www.amazon.com/Mathematical-Methods-Physicists-Fourth-Edition/dp/0120598159
http://csedweek.org/promote
HOW DO YOU GET STARTED?
27
28
http://nirvacana.com/thoughts/becoming-a-data-scientist/
• Insight Data Science Fellows Program
http://insightdatascience.com/
• Coursera: Stanford Machine Learning
https://www.coursera.org/course/ml
• Coursera: U. Washington Intro to Data Science
https://www.coursera.org/course/datasci
• Coursera: Princeton Algorithms Part I
https://www.coursera.org/course/algs4partI
• General Assembly Data Science
https://generalassemb.ly/education/data-science
Resources available
29
Learn and compete!
“Kaggle is the world’s largest community of data scientists. They compete with each other to solve complex data science problems, and the top competitors are invited to work on the most interesting and sensitive business problems from some of the world’s biggest companies through Masters competitions.”
www.kaggle.com/about
30
Twitter: @Ken_2scientists
http://www.atsid.com
http://slidesha.re/1idf43d
Thanks!
31
Image Sources
32
Slide Source
7 http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
8 http://joelgrus.com/2013/06/09/post-prism-data-science-venn-diagram
9 http://old-photos.blogspot.com/2011/06/hard-drive.html
10 http://www.mkomo.com/cost-per-gigabyte-update
11 https://github.com/blog/1724-10-million-repositories
12,13 http://feltron.com/ar12_02.html
14 http://www.fitbit.com
15 https://chrome.google.com/webstore/detail/collusion-for-chrome/ganlifbpkcplnldliibcbegplfmcfigp
18 http://strata.oreilly.com/2013/09/data-analysis-just-one-component-of-the-data-science-workflow.html
21 http://insightdatascience.com/
Image Sources
33
Slide Source
22 https://www.mckinsey.com/~/media/McKinsey/dotcom/Insights%20and%20pubs/MGI/Research/Technology%20and%20Innovation/Big%20Data/MGI_big_data_exec_summary.ashx
23 http://www.nsf.gov/statistics/sed/digest/2012/
24 http://aip.org/statistics/data-graphics/physics-phds-starting-salaries-classes-2009-2010
http://aip.org/statistics/physics-trends/physics-phds-1-year-later
26 http://www.amazon.com/Mathematical-Methods-Physicists-Fourth-Edition/dp/0120598159
http://csedweek.org/promote
28 http://nirvacana.com/thoughts/becoming-a-data-scientist/
30 http://www.kaggle.com/competitions