Data Science?!
what even...
David Coallier
@davidcoallier
Data Scientist

Engine Yard
And I cook..
A lot.
(n-1) items
Adapting.
Feedback.
Indifference.
Young mathematically
inclined minds
Young mathematically inclined minds

We knew everything.
First Bad Assumption.
So we asked “experts”.
Wrong Ingredients
Bad Data
Tasted like sh*t
From Our Results
We had questions.
Found Expertise
Not Online.
Data Scientific
Method
Find a Question
Your Hypothesis
Current Data

What do you have?
Features & Tests
Try it.
Analyse Results
Won’t be pretty.
Conversation

Framed. By. Data.
But....
Good Discussions
Imply good data scientists
[Venn diagram, after Drew Conway]
Three circles: Hacking Skills, Maths & Stats, Expertise
Hacking Skills + Maths & Stats: Machine Learning
Hacking Skills + Expertise: Danger Zone!!!
Maths & Stats + Expertise: Research
All three: Data Science
Business

Don’t need an MBA
In other words.
1. Hacking
2. Maths & Stats
3. Expertise
Apply Method
Data Scientific
1. Question
2. Current Data
3. Features/Tests
4. Analyse
5. Converse
Find a Question

Let’s imagine GitHub
Upgrade Repos
Affect users as little as possible
import csv
# Read every row of the repository export into a list
with open('repo1.csv', newline='') as f:
    content = list(csv.reader(f))
$$f(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{for } k \ge 0$$
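
One way the Poisson formula could serve the "upgrade repos, affect users as little as possible" question is to estimate, for each hour of the day, the chance that nobody commits. The sketch below is only an illustration under stated assumptions: the commit totals are made up, and `poisson_pmf` and `quietest_hour` are hypothetical helper names, not part of the talk.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean rate is lam."""
    return (lam ** k) * math.exp(-lam) / math.factorial(k)


def quietest_hour(commits_by_hour, days_observed):
    """Return (hour, P(no commits)) for the hour most likely to be idle."""
    best_hour, best_p_idle = None, -1.0
    for hour, total_commits in sorted(commits_by_hour.items()):
        lam = total_commits / days_observed   # average commits in that hour
        p_idle = poisson_pmf(0, lam)          # chance of zero commits
        if p_idle > best_p_idle:
            best_hour, best_p_idle = hour, p_idle
    return best_hour, best_p_idle


# Hypothetical totals: commits seen in each hour of the day over 10 days.
commits_by_hour = {2: 3, 3: 1, 4: 2, 14: 85, 15: 92}
hour, p_idle = quietest_hour(commits_by_hour, days_observed=10)
print(hour, round(p_idle, 3))
```

With these invented numbers the quietest slot is hour 3, where the average rate is 0.1 commits per hour. Real data would come from the repository export, and whether commits actually follow a Poisson process is itself a hypothesis to test.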
Converse

Present Findings
Iterate

Commits aren’t key.
KPIs are key

Indicators from experience
Questions

Super Important.
Just test it..
We are Human.

Emotional Connection
What next?

Second Hypothesis.
Focus on Data

Relevant to your KPIs.
Data gives you the what
Humans give you the why
Turn Information
Into

Actionable Insight
Create Discussions
Introspection Engines
Seeing, Feeling it
The brain sees.
Not regressions
Not p-values
Not slopes
Not F-statistics
Not coefficients
Question Data

Not Visualisations.
Toolbox

What do we use?
R
Modeling, Testing, Prototyping
RStudio

The IDE
lubridate
and zoo
Dealing with Dates...
yy/mm/dd
mm/dd/yy
YYYY-mm-dd HH:MM:ss TZ
yy-mm-dd
1363784094.513425
yy/mm
different timezone
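
The mix of formats above is why date handling gets its own slide. The talk uses lubridate and zoo in R; as a rough Python equivalent, the sketch below tries each format in turn and normalises everything to UTC. The format list, the numeric `%z` offset, the choice to treat naive dates as UTC, and the name `parse_timestamp` are all assumptions made for illustration.

```python
from datetime import datetime, timezone

# Candidate formats, tried in order. "yy/mm/dd" and "mm/dd/yy" are ambiguous
# for strings such as "03/04/13": the first matching format wins.
FORMATS = [
    "%Y-%m-%d %H:%M:%S %z",
    "%y/%m/%d",
    "%m/%d/%y",
    "%y-%m-%d",
    "%y/%m",
]


def parse_timestamp(text):
    """Parse one of the known date layouts into a UTC-aware datetime."""
    text = text.strip()
    try:
        # Epoch seconds such as 1363784094.513425
        return datetime.fromtimestamp(float(text), tz=timezone.utc)
    except ValueError:
        pass
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            # Assumption: dates without a timezone are already in UTC
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    raise ValueError(f"unrecognised date format: {text!r}")


for raw in ["13/03/20", "03/20/13", "2013-03-20 13:54:54 +0100", "1363784094.513425"]:
    print(raw, "->", parse_timestamp(raw).isoformat())
```

Every sample string resolves to 20 March 2013. The ambiguity between the first two layouts cannot be settled by code alone, which is where knowing the data source matters.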
reshape2

Reshape your Data
ggplot2

Visualise your Data
RCurl, RJSONIO
Find more Data
Hmisc

Miscellaneous useful functions
forecast

Can you guess?
garch

Generalized Autoregressive
Conditional Heteroskedasticity
quantmod

Statistical Financial Trading
library(quantmod)
getSymbols('AAPL')
barChart(AAPL)
addMACD()
xts

Extensible Time Series
igraph

Study Networks
maptools

Read & View Maps
map('state', region = c(row.names(USArrests)), col=cm.colors(16, 1)[floor(USArrests$Rape/max(USArrests$Rape)*28)], fill=T)
Python

Scientific Computing
SciPy
http://www.scipy.org
scipy.stats
scipy.stats
Descriptive Statistics
from scipy.stats import describe
s = [1, 2, 1, 3, 4, 5]
print(describe(s))
scipy.stats
Probability Distributions
Example
Poisson Distribution
$$f(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{for } k \ge 0$$
from scipy.stats import poisson
p = poisson.pmf([1, 2, 3, 4, 1, 2, 3], 2)
print(p.mean())
print(p.sum())
...
NumPy
http://www.numpy.org/
NumPy
Linear Algebra
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
import numpy as np
x = np.array([[1, 0], [0, 1]])
# eig returns the eigenvalues first, then the eigenvectors
val, vec = np.linalg.eig(x)
np.linalg.eigvals(x)
>>> np.linalg.eig(x)
(
array([ 1., 1.]),
array([
[ 1., 0.],
[ 0., 1.]
])
)
Matplotlib

Python Plotting
statsmodels
Advanced Statistics Modeling
NLTK

Natural Language Toolkit
scikit-learn

Machine Learning
from sklearn import tree
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
clf.predi...
PyBrain

... Machine Learning
PyMC
Bayesian Inference
Pattern

Web Mining for Python
NetworkX

Study Networks
MILK: Machine Learning
Pandas

easy-to-use data structures
from pandas import DataFrame
x = DataFrame([
    {"age": 26},
    {"age": 19},
    {"age": 21},
    {"age": 18}
])
print(x[x['age'] > 20].count())
...
Python vs R?

Different Purposes
Dogfooding

Data Scientific Method
Original Question
What is Data Science?
Back to you

For questioning
Presented an abridged version of my "What is data science" talk at #websummit 2013.

This talk goes over the required skill set as defined by Drew Conway in his famous Venn diagram, and outlines the Data Scientific Method put forward by Dr. Patil. The talk has two main parts; the second covers some of the packages and technologies we use, minus the storage part.


Data Science, what even?!

  1. 1. Data Science?! what even...
  2. 2. David Coallier @davidcoallier
  3. 3. Data Scientist Engine Yard
  4. 4. And I cook.. A lot.
  5. 5. (n-1) items
  6. 6. Adapting.
  7. 7. Feedback.
  8. 8. Indifference.
  9. 9. Young mathematically inclined minds
  10. 10. Young mathematically inclined minds We knew everything.
  11. 11. First Bad Assumption.
  12. 12. So we asked “experts”.
  13. 13. Wrong Ingredients
  14. 14. Bad Data
  15. 15. Tasted like sh*t
  16. 16. From Our Results We had questions.
  17. 17. Found Expertise Not Online.
  18. 18. Data Scientific Method
  19. 19. Find a Question Your Hypothesis
  20. 20. Current Data What do you have?
  21. 21. Features & Tests Try it.
  22. 22. Analyse Results Won’t be pretty.
  23. 23. Conversation Framed. By. Data.
  24. 24. But....
  25. 25. Good Discussions Imply good data scientists
  26. 26. Hacking Skills
  27. 27. Hacking Skills Maths & Stats
  28. 28. Hacking Skills Expertise Maths & Stats
  29. 29. Hacking Skills Machine Learning Danger Zone!!! Expertise Research Maths & Stats
  30. 30. Hacking Skills Data Science Expertise Maths & Stats
  31. 31. Hacking Skills Danger Zone!!! Machine Learning Data Science Maths & Stats Expertise Research
  32. 32. Business Don’t need an MBA
  33. 33. In other words.
  34. 34. 1. Hacking 2. Maths & Stats 3. Expertise
  35. 35. Apply Method Data Scientific
  36. 36. 1. Question 2. Current Data 3. Features/Tests 4. Analyse 5. Converse
  37. 37. Find a Question Let’s imagine GitHub
  38. 38. Upgrade Repos Affect users as little as possible
  39. 39. import csv; content = list(csv.reader(open('repo1.csv', newline='')))
  40. 40. $f(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$ for $k \ge 0$
  41. 41. Converse Present Findings
  42. 42. Iterate Commits aren’t key.
  43. 43. KPIs are key Indicators from experience
  44. 44. Questions Super Important.
  45. 45. Just test it..
  46. 46. We are Human. Emotional Connection
  47. 47. What next? Second Hypothesis.
  48. 48. Focus on Data Relevant to your KPIs.
  49. 49. Data gives you the what Humans give you the why
  50. 50. Turn Information
  51. 51. Into Actionable Insight
  52. 52. Create Discussions Introspection Engines
  53. 53. Seeing, Feeling it The brain sees.
  54. 54. Not regressions
  55. 55. Not p-values
  56. 56. Not slopes
  57. 57. Not F-statistics
  58. 58. Not coefficients
  59. 59. Question Data Not Visualisations.
  60. 60. Toolbox What do we use?
  61. 61. R Modeling, Testing, Prototyping
  62. 62. RStudio The IDE
  63. 63. lubridate and zoo Dealing with Dates...
  64. 64. yy/mm/dd mm/dd/yy YYYY-mm-dd HH:MM:ss TZ yy-mm-dd 1363784094.513425 yy/mm different timezone
  65. 65. reshape2 Reshape your Data
  66. 66. ggplot2 Visualise your Data
  67. 67. RCurl, RJSONIO Find more Data
  68. 68. Hmisc Miscellaneous useful functions
  69. 69. forecast Can you guess?
  70. 70. garch Generalized Autoregressive Conditional Heteroskedasticity
  71. 71. quantmod Statistical Financial Trading
  72. 72. getSymbols('AAPL') barChart(AAPL) addMACD()
  73. 73. xts Extensible Time Series
  74. 74. igraph Study Networks
  75. 75. maptools Read & View Maps
  76. 76. map('state', region = c(row.names(USArrests)), col=cm.colors(16, 1)[floor(USArrests$Rape/max(USArrests$Rape)*28)], fill=T)
  77. 77. Python Scientific Computing
  78. 78. SciPy http://www.scipy.org
  79. 79. scipy.stats
  80. 80. scipy.stats Descriptive Statistics
  81. 81. from scipy.stats import describe; s = [1, 2, 1, 3, 4, 5]; print(describe(s))
  82. 82. scipy.stats Probability Distributions
  83. 83. Example Poisson Distribution
  84. 84. $f(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$ for $k \ge 0$
  85. 85. from scipy.stats import poisson; p = poisson.pmf([1, 2, 3, 4, 1, 2, 3], 2)
  86. 86. print(p.mean()); print(p.sum()) ...
  87. 87. NumPy http://www.numpy.org/
  88. 88. NumPy Linear Algebra
  89. 89. $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
  90. 90. import numpy as np; x = np.array([[1, 0], [0, 1]]); val, vec = np.linalg.eig(x); np.linalg.eigvals(x)
  91. 91. >>> np.linalg.eig(x) ( array([ 1., 1.]), array([ [ 1., 0.], [ 0., 1.] ]) )
  92. 92. Matplotlib Python Plotting
  93. 93. statsmodels Advanced Statistics Modeling
  94. 94. NLTK Natural Language Toolkit
  95. 95. scikit-learn Machine Learning
  96. 96. from sklearn import tree X = [[0, 0], [1, 1]] Y = [0, 1] clf = tree.DecisionTreeClassifier() clf = clf.fit(X, Y) clf.predict([[2., 2.]]) >>> array([1])
  97. 97. PyBrain ... Machine Learning
  98. 98. PyMC Bayesian Inference
  99. 99. Pattern Web Mining for Python
  100. 100. NetworkX Study Networks
  101. 101. MILK: Machine Learning
  102. 102. Pandas easy-to-use data structures
  103. 103. from pandas import DataFrame; x = DataFrame([{"age": 26}, {"age": 19}, {"age": 21}, {"age": 18}]); print(x[x['age'] > 20].count()); print(x[x['age'] > 20].mean())
  104. 104. Python vs R? Different Purposes
  105. 105. Dogfooding Data Scientific Method
  106. 106. Original Question What is Data Science?
  107. 107. Back to you For questioning