# The Artful Business of Data Mining: Computational Statistics with Open Source Tools

This talk covers concepts of data mining and data analysis using open source tools, mainly Python and R, along with interesting libraries and the tools I have used and currently use at Engine Yard.

### The Artful Business of Data Mining: Computational Statistics with Open Source Tools

1. The Artful Business of Data Mining: Computational Statistics with Open Source Tools (Wednesday 20 March 13)
2. David Coallier, @davidcoallier
3. Data Scientist at Engine Yard (.com)
4. Find Data
5. Clean Data
6. Analyse Data?
7. Analyse Data
8. Question Data
9. Report Findings
10. Data Scientist
11. Data Janitor
12. Actual Tasks
13. “If your model is elegant, it’s probably wrong”
14. “The Times they are a-Changing” (Bob Dylan)
15. Python & R
16. SciPy: http://www.scipy.org
17. scipy.stats
18. scipy.stats: Descriptive Statistics
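19. Descriptive statistics with `describe`:

    ```python
    from scipy.stats import describe

    s = [1, 2, 1, 3, 4, 5]
    # Prints the number of observations, min/max, mean, variance, skewness and kurtosis
    print(describe(s))
    ```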
20. scipy.stats: Probability Distributions
21. Example: Poisson Distribution
22. $f(k; \lambda) = \dfrac{\lambda^{k} e^{-\lambda}}{k!}$ for $k \ge 0$
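23. Evaluating the Poisson pmf:

    ```python
    # poisson is an object inside scipy.stats, not a module, so import it with "from"
    from scipy.stats import poisson

    # Poisson pmf with rate 2, evaluated at each listed count
    p = poisson.pmf([1, 2, 3, 4, 1, 2, 3], 2)
    ```
24. Summarising the result:

    ```python
    print(p.mean())
    print(p.sum())
    ...
    ```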
25. NumPy: http://www.numpy.org/
26. NumPy: Linear Algebra
27. $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
28. Eigenvalues and eigenvectors:

    ```python
    import numpy as np

    x = np.array([
        [1, 0],
        [0, 1],
    ])
    # eig returns the eigenvalues first, then the eigenvectors
    val, vec = np.linalg.eig(x)
    np.linalg.eigvals(x)  # eigenvalues only
    ```
29. Output as shown on the slide:

    ```python
    >>> np.linalg.eig(x)
    (
        array([ 1.,  1.]),
        array([
            [ 1.,  0.],
            [ 0.,  1.]
        ])
    )
    ```
30. Matplotlib: Python Plotting
31. statsmodels: Advanced Statistics Modeling
32. NLTK: Natural Language Tool Kit
33. scikit-learn: Machine Learning
34. A decision tree classifier:

    ```python
    from sklearn import tree

    X = [[0, 0], [1, 1]]
    Y = [0, 1]
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(X, Y)
    clf.predict([[2., 2.]])  # array([1])
    ```
35. PyBrain: ... Machine Learning
36. PyMC: Bayesian Inference
37. Pattern: Web Mining for Python
38. NetworkX: Study Networks
39. MILK: MOAR machine LEARNING!
40. Pandas: easy-to-use data structures
41. Filtering a DataFrame:

    ```python
    from pandas import DataFrame

    x = DataFrame([
        {"age": 26},
        {"age": 19},
        {"age": 21},
        {"age": 18},
    ])
    print(x[x['age'] > 20].count())  # number of rows with age > 20
    print(x[x['age'] > 20].mean())   # mean age among those rows
    ```
42. R
43. RStudio: The IDE
44. lubridate and zoo: Dealing with Dates...
45. `yy/mm/dd`, `mm/dd/yy`, `YYYY-mm-dd HH:MM:ss TZ`, `yy-mm-dd`, `1363784094.513425`, `yy/mm`, different timezone
46. reshape2: Reshape your Data
47. ggplot2: Visualise your Data
48. RCurl, RJSONIO: Find more Data
49. Hmisc: Miscellaneous useful functions
50. forecast: Can you guess?
51. garch: And rugarch
52. quantmod: Statistical Financial Trading
53. xts: Extensible Time Series
54. igraph: Study Networks
55. maptools: Read & View Maps
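56. Colouring a US state map by arrest rate:

    ```r
    library(maps)  # provides map(); the slide does not show this line

    map('state',
        region = c(row.names(USArrests)),
        col = cm.colors(16, 1)[floor(USArrests$Rape / max(USArrests$Rape) * 28)],
        fill = TRUE)
    ```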
57. Storage
58. Oppose “big” Data
59. “Learn how to sample”
60. Experiments
61. What Do You Want to Answer?
62. Understand Your Audience
63. Scientific Reporting
64. Busy-ness: Time is money
65. Public Visualisation
66. Best Visualisation, Bad Data
67. Best Forecasting models... Bad Visualisation
68. (no text on slide)
69. (no text on slide)
70. Sean chaí
71. (no text on slide)
72. Feel it
73. (no text on slide)
74. (no text on slide)
75. (no text on slide)
76. “Don’t be scared of bar charts.”
77. Mathematical Statistics Engineering Business Economics Curiosity
78. davidcoallier.github.com, @davidcoallier on Twitter
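
Slides 21 to 24 evaluate a Poisson probability mass function through `scipy.stats`. As a minimal sketch using only the standard library, the closed form $\lambda^{k} e^{-\lambda} / k!$ can be written out directly, which should give the same values as `poisson.pmf` for the counts and rate used on the slide. The helper name `poisson_pmf` and the truncation of the support at 50 are assumptions made for illustration; they are not from the talk.

```python
import math


def poisson_pmf(k, lam):
    """Closed-form Poisson pmf: lam**k * exp(-lam) / k! (hypothetical helper)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 2  # rate parameter used on the slide
ks = [1, 2, 3, 4, 1, 2, 3]  # counts used on the slide
p = [poisson_pmf(k, lam) for k in ks]

# Over the full support the probabilities sum to 1 and the mean equals lam.
# Truncating at k = 50 is an arbitrary cutoff that is ample for lam = 2.
support = range(51)
total = sum(poisson_pmf(k, lam) for k in support)
mean = sum(k * poisson_pmf(k, lam) for k in support)

print([round(v, 4) for v in p])
print(round(total, 6), round(mean, 6))
```

Note that `p.sum()` on the slide adds up the pmf at a handful of repeated counts, so it is not expected to equal 1; only the sum over the whole support is.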
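
Slide 45 lists the mix of date formats that real data tends to contain, and the talk points to lubridate and zoo for handling them in R. A rough Python counterpart, using only the standard library, is to try a list of candidate formats in order and fall back to a Unix epoch. The function name `parse_timestamp`, the order of the formats, and the use of a numeric UTC offset (`%z`) in place of the slide's `TZ` are all assumptions for this sketch.

```python
from datetime import datetime, timezone

# Candidate formats, tried in order. The order is an assumption: a string such
# as "13/03/20" is ambiguous between yy/mm/dd and mm/dd/yy, so the first
# format that parses wins.
FORMATS = [
    "%Y-%m-%d %H:%M:%S %z",  # YYYY-mm-dd HH:MM:ss with a numeric UTC offset
    "%y/%m/%d",
    "%m/%d/%y",
    "%y-%m-%d",
    "%y/%m",
]


def parse_timestamp(text):
    """Return a datetime for `text`, or raise ValueError if nothing matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    # Fall back to a Unix epoch in seconds, e.g. "1363784094.513425".
    return datetime.fromtimestamp(float(text), tz=timezone.utc)


samples = [
    "13/03/20",
    "03/20/13",
    "2013-03-20 12:54:54 +0000",
    "13-03-20",
    "1363784094.513425",
    "13/03",
]
for raw in samples:
    print(raw, "->", parse_timestamp(raw).isoformat())
```

The results mix timezone-aware and naive datetimes, which mirrors the "different timezone" problem on the slide: before comparing values, each one still needs to be normalised to a single timezone.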