Data Science at Scale @ barricade.io

This talk describes the challenges with data science and how we run data analysis at scale at https://Barricade.io

  1. Data Science @ Scale
  2. @davidcoallier, part of an amazing team at Barricade.io
  3. Data Science is Hard
  4. Data Hacking is “Easy”
  5. Data Analysis is “Easy”
  6. Data Expertise is “Easy”
  7. Got all? Having the three is real hard!
  8. Is that it? Well don’t forget your purpose.
  9. You are not an economist. /ɪˈkɒnəmɪst/: Someone with all the answers, and none of the questions.
  10. The Data Scientific Method
  11. Find a question.
  12. Use the data you have
  13. Features & Tests
  14. Analyse Results. You will be sad.
  15. Conversate. Talk about your findings.
  16. Good chats imply egoless and collaborative data scientists.
  17. Recap.
  18. 1. Hacking 2. Maths & Stats 3. Expertise
  19. And
  20. 1. Question 2. Be Pragmatic 3. Features 4. Analyse 5. Share.
  21. A team! Rarely a single-person effort.
  22. An Example. Fraud Prevention - Business Prevention
  23. I knew better. Obviously… duh
  24. We didn’t share. Science has historically been shared.
  25. Not with p-values
  26. Empathise. Use human language, not lingo.
  27. For us at Barricade
  28. Doing this at scale is hard.
  29. We’re still small. About a billion data points a day.
  30. Humble Beginnings. Typically… a Queue and an API.
  31. This had issues. Hard to scale, hard to decouple, etc.
  32. Enter the Lambda Architecture.
  33. Speed Layer
  34. Batch Layer
  35. Speed Layer: U new behaviour from new data. Batch Layer: All classified behaviour since T.
  36. Serving Layer
  37. Speed Layer: U new behaviour from new data. Batch Layer: All classified behaviour since T. Serve Layer: Batch layer U Speed Layer.
  38. Cache Layer
  39. On Amazon AWS
  40. Identifying an Attack.
  41. Ahh! What’s that?
  42. Kafka Queue. Distributed messaging system; append-only log; consumers have offsets; partition for parallelism; replicate for redundancy; message order guaranteed, per-partition.
  43. Barricade Customer
  44. Questions?
  45. @davidcoallier @barricadeio
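
Slides 33 to 37 describe the Lambda Architecture as three views: a batch layer holding all classified behaviour since T, a speed layer holding new behaviour from new data, and a serving layer that is the union of the two. A minimal sketch of that idea follows, reading the "U" on the slides as set union. The `Behaviour` record, its field names, and the three function names are hypothetical and chosen only for illustration; nothing here is taken from Barricade's actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behaviour:
    """One classified observation. All field names are hypothetical."""
    source: str     # e.g. a client IP address
    label: str      # e.g. "benign" or "suspicious"
    timestamp: int  # seconds on some shared clock


def batch_view(events, since):
    """Batch layer: all classified behaviour since time T (`since`)."""
    return {e for e in events if e.timestamp >= since}


def speed_view(new_events, batch_horizon):
    """Speed layer: behaviour newer than the last completed batch run."""
    return {e for e in new_events if e.timestamp > batch_horizon}


def serving_view(batch, speed):
    """Serving layer: batch view U speed view, as a set union."""
    return batch | speed


history = [
    Behaviour("10.0.0.1", "benign", 100),
    Behaviour("10.0.0.2", "suspicious", 200),
]
recent = [
    Behaviour("10.0.0.2", "suspicious", 200),  # already covered by the batch run
    Behaviour("10.0.0.3", "suspicious", 250),  # arrived after the batch run
]

batch = batch_view(history, since=0)
speed = speed_view(recent, batch_horizon=200)
combined = serving_view(batch, speed)
print(sorted(e.source for e in combined))
```

Because the serving view is a plain union, a query sees both the slow, complete batch results and whatever arrived since the last batch run, without either layer needing to know about the other.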

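Slide 42 lists the Kafka properties the pipeline relies on: an append-only log, consumers that keep their own offsets, partitions for parallelism, and message order guaranteed per partition. The toy model below illustrates those properties in memory. It is not Kafka's client API: `ToyLog`, `ToyConsumer`, and the CRC32 key routing are assumptions made for the sketch, and replication is left out entirely.

```python
import zlib


class ToyLog:
    """In-memory stand-in for a partitioned, append-only log (not Kafka's API)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        """Route by key hash so one key always lands in the same partition."""
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))  # append only, never rewrite
        return index

    def read(self, partition, offset, max_messages=10):
        """Return messages from `offset` onward without removing them."""
        return self.partitions[partition][offset:offset + max_messages]


class ToyConsumer:
    """Keeps its own offset per partition; the log does not track readers."""

    def __init__(self, log):
        self.log = log
        self.offsets = [0] * len(log.partitions)

    def poll(self, partition):
        messages = self.log.read(partition, self.offsets[partition])
        self.offsets[partition] += len(messages)
        return messages


log = ToyLog(num_partitions=3)
p = log.append("customer-a", "GET /login")
log.append("customer-a", "POST /login")  # same key, so same partition

consumer = ToyConsumer(log)
first = consumer.poll(p)    # both messages, in append order
second = consumer.poll(p)   # nothing new since the stored offset

late_reader = ToyConsumer(log)
replayed = late_reader.poll(p)  # a new consumer starts from offset 0
```

Keeping the offset on the consumer side is what lets a second reader replay the same partition from the start, which is useful when a batch job and a real-time job both need the full stream.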