Invited talk presented by Ken Smith, CIO/CTO of Applied Technical Systems, at the American Physical Society's 2014 April Meeting in Savannah, GA
Abstract:
Much of the conversation in commercial enterprises these days revolves around industry buzzwords such as Big Data, Data Science, and being Data Driven. Beyond the hype surrounding these terms, there is a real, continuously growing movement for organizations to make better use of the data assets they have to inform decisions, strategy, and policy. This push is not unique to the commercial sector; governmental and academic organizations are also embracing such initiatives. The skills required to staff a Data Science project typically come from a number of disciplines, ranging from computer science, statistics, and modeling and simulation to information technology, but the emerging wisdom in the community is that the rigor and discipline of a scientific background often make for the best data scientists. In this talk, I will offer a personal perspective on making the transition from a career in computational physics (specifically Numerical Relativity) to a career in industry, where I have focused on helping organizations make more informed decisions through better access to, and analysis of, the data at their disposal. I will identify the skills and training that carry over from a background in physics, discuss the gaps in that preparation, hypothesize as to where this industry is headed, and offer a frank look at life outside of academia.
Numerical Relativity as preparation for Industrial Data Science: a personal perspective
1. Numerical Relativity as preparation for
Industrial Data Science:
a personal perspective
Ken Smith, CIO/CTO
APS April Meeting, 2014-04-06
2. Overview
• Who am I?
• What is data science?
• Why is it a viable (maybe even desirable) career option for physicists?
• How do you get started?
Note: all image attributions will appear at the end of the slide deck.
3. Who am I?
[Timeline figure, 2002 to 2014. Roles shown: grad student, lecturer, sr. scientist (two entries), architect, CIO. Fields shown: physics education, numerical relativity / astrophysics, machine learning, natural language processing, software architecture.]
4. Selected projects
• Automatically categorizing text documents into
topics based solely on content
• Improving entity (person, location, organization)
extraction techniques for large bodies of text within
the US Army
• Developing new tools for US Patent Examiners
within the USPTO
• Modeling and linking disparate datasets
associated with supply & maintenance of US Navy
systems
• Designing systems to organize and visualize skills
mix of employees within a company
6. “I keep saying the sexy job in the
next ten years will be statisticians.
People think I’m joking, but who
would’ve guessed that computer
engineers would’ve been the sexy
job of the 1990s? The ability to take
data—to be able to understand it, to
process it, to extract value from it, to
visualize it, to communicate it—that’s
going to be a hugely important skill in
the next decades”
Hal Varian, Chief Economist, Google
January 2009
The sexiest job?
http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1
http://www.mckinsey.com/insights/innovation/hal_varian_on_how_the_web_challenges_managers
7. Data Science Skills & Disciplines
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
9. Trends: Data Storage
IBM 350 in 1956:
3.75 MB
6.4 kB/s data transfer
(50) 24-in diameter disk
platters
> 1 ton
Leased for $3200/mo
http://old-photos.blogspot.com/2011/06/hard-drive.html
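A rough back-of-envelope check of the IBM 350 figures on this slide, sketched in Python. It assumes decimal units (1 MB = 10**6 bytes, 1 kB = 10**3 bytes) and treats the lease price as nominal 1956 dollars; the variable names are illustrative and not from the talk.

```python
# Figures quoted on the slide; unit conventions are an assumption.
CAPACITY_BYTES = 3.75e6        # 3.75 MB, assuming 1 MB = 10**6 bytes
TRANSFER_BYTES_PER_S = 6.4e3   # 6.4 kB/s, assuming 1 kB = 10**3 bytes
LEASE_USD_PER_MONTH = 3200.0   # nominal 1956 dollars
PLATTERS = 50

# Monthly lease cost per megabyte of capacity.
cost_per_mb_month = LEASE_USD_PER_MONTH / (CAPACITY_BYTES / 1e6)

# Time to stream the whole drive once at the quoted transfer rate.
full_read_seconds = CAPACITY_BYTES / TRANSFER_BYTES_PER_S

# Average capacity per platter, in kilobytes.
kb_per_platter = CAPACITY_BYTES / PLATTERS / 1e3

print(f"Lease cost: ${cost_per_mb_month:.2f} per MB per month")
print(f"Full read:  {full_read_seconds:.1f} s ({full_read_seconds / 60:.1f} min)")
print(f"Per platter: {kb_per_platter:.0f} kB")
```

Under these assumptions the lease works out to roughly $853 per megabyte per month, and reading the full drive once takes a little under ten minutes.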
16. Father walks into a Minneapolis
Target store: “My daughter got
this in the mail!” he said. “She’s
still in high school, and you’re
sending her coupons for baby
clothes and cribs? Are you trying
to encourage her to get
pregnant?”
Manager apologizes and calls
back a few days later to apologize
again.
“I had a talk with my daughter,” he
said. “It turns out there’s been
some activities in my house I
haven’t been completely aware of.
She’s due in August. I owe you an
apology.”
Data mining determined a set of
signals that a pregnant shopper
may be nearing her due date:
• larger quantities of unscented lotion
• supplements like calcium, magnesium and zinc
• scent-free soap
• extra-big bags of cotton balls
• hand sanitizers
• washcloths
Trends: Targeted Marketing
http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html
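One way to picture how purchase signals like those listed on this slide could be combined is a weighted score passed through a logistic function. The sketch below is purely illustrative: the weights, the bias, and the function name `signal_score` are invented for this example and are not taken from the article or from any retailer's actual model.

```python
import math

# Hypothetical weights, invented for illustration only.
SIGNAL_WEIGHTS = {
    "unscented lotion": 1.2,
    "calcium supplement": 0.8,
    "magnesium supplement": 0.8,
    "zinc supplement": 0.8,
    "scent-free soap": 1.0,
    "extra-big bag of cotton balls": 1.0,
    "hand sanitizer": 0.6,
    "washcloths": 0.6,
}
BIAS = -3.0  # hypothetical baseline so that an empty basket scores low


def signal_score(basket):
    """Map a basket of item names to a score in (0, 1).

    Each distinct signal item adds its weight once; unknown items add nothing.
    """
    total = BIAS + sum(SIGNAL_WEIGHTS.get(item, 0.0) for item in set(basket))
    return 1.0 / (1.0 + math.exp(-total))


# Example baskets (also hypothetical).
for basket in (
    ["bread", "milk"],
    ["unscented lotion", "washcloths"],
    list(SIGNAL_WEIGHTS),
):
    print(f"{signal_score(basket):.3f}  {basket}")
```

In practice such weights would be fitted to historical purchase data rather than chosen by hand; the point here is only that several weak signals can add up to a strong one.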
17. “What differentiates data
science from statistics is that
data science is a holistic
approach. We’re increasingly
finding data in the wild, and
data scientists are involved
with gathering data,
massaging it into a tractable
form, making it tell its story,
and presenting that story to
others.”
What data scientists do
http://www.oreilly.com/data/free/what-is-data-science.csp
18. What does a data scientist
do?
http://strata.oreilly.com/2013/09/data-analysis-just-one-component-of-the-data-science-workflow.html
20. “People often assume that data scientists need a
background in computer science. In my experience, that
hasn’t been the case: my best data scientists have come
from very different backgrounds. The inventor of
LinkedIn’s People You May Know was an experimental
physicist. A computational chemist on my decision
sciences team had solved a 100-year-old problem on
energy states of water. An oceanographer made major
impacts on the way we identify fraud. Perhaps most
surprising was the neurosurgeon who turned out to be a
wizard at identifying rich underlying trends in the data.”
DJ Patil, former Chief Scientist for LinkedIn
Where do data scientists come from?
http://radar.oreilly.com/2011/09/building-data-science-teams.html
21. Insight Data Science Fellows
http://insightdatascience.com/
An intensive six-week post-doctoral training
fellowship bridging the gap between academia and
data science
22. Projected Data Science Demand
https://www.mckinsey.com/~/media/McKinsey/dotcom/Insights%20and%20pubs/MGI/Research/Technology%20and%20Innovation/Big%20Data/MGI_big_data_exec_summary.ashx
23. Recent NSF data on
employment at PhD award
http://www.nsf.gov/statistics/sed/digest/2012/
24. AIP Physics Career Statistics
http://aip.org/statistics/data-graphics/physics-phds-starting-salaries-classes-2009-2010
http://aip.org/statistics/physics-trends/physics-phds-1-year-later
25. What you have:
• Analytical/problem-solving mindset
• Presentation skills (oral,
written, & graphical)
• Mathematical preparation
• Curiosity
• Understanding that
reference frames can
only ever be local
What you are missing:
• Sufficient training in
statistics
– Regression beyond linear
– Classification techniques
– Machine learning
• SQL (Database)
• Information Visualization
(psychology of design)
• Business/Finance
acumen
Physics prep for Data Science
Warning: gross generalizations
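As a small taste of the SQL gap named on this slide, here is a minimal sketch using Python's built-in `sqlite3` module. The `purchases` table, its columns, and its rows are hypothetical and exist only to show a typical aggregate query (`GROUP BY` with `HAVING`).

```python
import sqlite3

# In-memory database with a hypothetical purchases table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("alice", "lotion", 6.5),
        ("alice", "soap", 3.0),
        ("bob", "soap", 3.0),
        ("carol", "washcloths", 8.0),
        ("carol", "lotion", 6.5),
        ("carol", "soap", 3.0),
    ],
)

# Per-customer item count and spend, keeping customers with at least two
# purchases and sorting by total spend, largest first.
rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n_items, SUM(amount) AS total
    FROM purchases
    GROUP BY customer
    HAVING COUNT(*) >= 2
    ORDER BY total DESC
    """
).fetchall()

for customer, n_items, total in rows:
    print(f"{customer}: {n_items} items, ${total:.2f}")

conn.close()
```

The same grouping could be done with loops and dictionaries, but expressing it declaratively is the habit that database work rewards.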
26. Introduce statistical analysis
techniques into the graduate (possibly
undergraduate) core physics
curriculum.
Make computer science courses
available in high school. The
ability to program is becoming a
foundational skill along with
reading, writing, and arithmetic.
Curriculum
Recommendations
http://www.amazon.com/Mathematical-Methods-Physicists-Fourth-Edition/dp/0120598159
http://csedweek.org/promote
29. • Insight Data Science Fellows Program
http://insightdatascience.com/
• Coursera: Stanford Machine Learning
https://www.coursera.org/course/ml
• Coursera: U. Washington Intro to Data Science
https://www.coursera.org/course/datasci
• Coursera: Princeton Algorithms Part I
https://www.coursera.org/course/algs4partI
• General Assembly Data Science
https://generalassemb.ly/education/data-science
Resources available
30. Learn and compete!
“Kaggle is the world's largest
community of data
scientists. They compete
with each other to solve
complex data science
problems, and the top
competitors are invited to
work on the most interesting
and sensitive business
problems from some of the
world’s biggest companies
through Masters
competitions.”
www.kaggle.com/about