Sellers: 30 thousand active sellers
Buyers: 9 million active buyers
From maternity onward, Elo7 has been part of the growth of many babies and children
1 out of 4 brides organizes her wedding with Elo7
What is it like to be a data scientist at Elo7?
Cinthia Tanaka, PhD (Applied Math), Data Scientist @ Elo7
Samara Alves, MSc (Statistics), Data Scientist @ Elo7
Junior A. Koch, PhD (Mathematical Physics), Data Scientist @ Elo7
Renato Cordeiro (Computer Science), ML Engineer @ Elo7
Igor Bonadio, PhD (Computer Science), Lead Data Scientist @ Elo7
Research + product development
Autonomy
Teamwork
Study groups
Creativity
About me
2003-2007: Undergraduate degree in physics
2008-2014: Master's and PhD (experimental techniques)
2013-2017: Physics and Calculus instructor (programming and mathematics)
2018: Postdoc (theoretical mathematics ++)
2018: Decision to move into data science
2019: Elo7 (practical problems, collaboration, and research)
https://elo7.gupy.io/
Generative Models
Team Hightower
Junior A. Koch, PhD
Data Scientist @ Elo7
Elo7.dev Workshops - GAN School 1: 10/12/2019
I need data, lots of data
What is a generative model?
Traditional machine learning: we have the data and want a function or rule that categorizes it.
Generative approach: we want to learn the distribution of the data for each of the classes present.
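To make the contrast concrete, here is a minimal sketch of the generative approach on toy data. The data, the Gaussian model for each class, and the names `params`, `sample`, and `classify` are all assumptions made for illustration, not part of the workshop material. Once the distribution of each class is learned, it can be used both to draw new data and to categorize points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: one feature, two classes.
x0 = rng.normal(loc=20.0, scale=5.0, size=1000)   # class 0
x1 = rng.normal(loc=60.0, scale=10.0, size=1000)  # class 1

# Generative approach: estimate the distribution of each class.
# Assumption: each class is modeled as a single Gaussian.
params = {label: (x.mean(), x.std()) for label, x in {0: x0, 1: x1}.items()}

def sample(label, n):
    """Draw n new points from the learned distribution of one class."""
    mu, sigma = params[label]
    return rng.normal(mu, sigma, size=n)

def log_density(x, label):
    """Log of the Gaussian density fitted to the given class."""
    mu, sigma = params[label]
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def classify(x):
    """The learned distributions also yield a classifier (equal priors assumed)."""
    return int(log_density(x, 1) > log_density(x, 0))

new_points = sample(1, 5)  # five synthetic points for class 1
```

A purely discriminative model would only provide something like `classify`; it would have no counterpart to `sample`.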
What is a generative model? Example
[Figure: data distributions for product price, shipping price, buyer age, and monthly time on site]
[Figure: original data, the data distribution, and new data sampled from it]
Types of generative models
Generative models are hard
A common learning path when studying machine learning:
1. Basics
   1. Supervised, unsupervised, reinforcement
   2. Bias-variance trade-off
   3. Overfitting, underfitting
2. Gradient descent
3. Linear discriminant analysis (LDA)
4. Principal component analysis (PCA)
5. Learning vector quantization (LVQ)
6. Regularization methods
7. Kernel smoothing methods
8. Ensemble learning
9. Ordinary least squares
10. ...
Why (in my opinion)?
1. They require more modeling
2. It is hard to build a standard library
3. Each application has its own peculiarities
Association and causation
[Diagram: (X, y) -> f -> y' -> Loss -> minimize the error]
$y' = 2x_1 + x_2 - 3x_3$
Can we say that IF we increase $x_1$ and $x_2$, then $y'$ will increase as well?
Zeroth law of causal inference: association and causation are different.
In traditional ML we only learn a function that fits the data, and that does not mean we can infer that the chosen features cause the output variable.
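The point about association versus causation can be illustrated with a small simulation. This is a sketch under assumed conditions: the hidden variable `u`, the coefficients, and the helper `slope` are hypothetical and chosen only to make the effect visible. A fitted slope is clearly positive in observed data, yet it vanishes once the feature is set by intervention.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

def slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Hypothetical data-generating process: a hidden variable u drives both
# x1 and y, and x1 has no causal effect on y at all.
u = rng.normal(size=n)
x1_observed = u + rng.normal(size=n)
y = 3.0 * u + rng.normal(size=n)
slope_observed = slope(x1_observed, y)

# Intervention: we set x1 ourselves, independently of u. y is generated
# exactly as before, because x1 is not one of its causes.
x1_intervened = rng.normal(scale=np.sqrt(2.0), size=n)
slope_intervened = slope(x1_intervened, y)

print(f"slope in observed data:   {slope_observed:.2f}")
print(f"slope under intervention: {slope_intervened:.2f}")
```

The regression on observed data reports a strong association, but increasing `x1` by hand does not move `y`.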
How do we do causal inference?
[Diagram: (X, y) -> f -> y' -> Loss -> minimize the error, contrasted with learning $p(y, x_1, x_2, x_3)$]
We need to know the problem very well in order to choose good features, and then learn the conditional distributions for the chosen causal graph.
Note: we may do all of this and still find that it was not enough.
$p(y, x_1, x_2, x_3) = p(y \mid x_1, x_2, x_3)\, p(x_1, x_2, x_3)$
Example of a causal graph:
[Figure: causal graph over the nodes X1, X2, X3, and Y]
Wasserman, Larry. All of Statistics: A Concise Course in Statistical Inference. Springer Science & Business Media, 2013.
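The factorization of the joint distribution can be checked numerically and used for sampling. The sketch below assumes binary variables and random probability tables (`p_x`, `p_y_given_x`), which are hypothetical and not taken from the slides. It builds the joint from the two factors and then samples by drawing the features first and the output second.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tables for binary variables (not from the slides).
# p_x[i, j, k] = p(x1=i, x2=j, x3=k)
p_x = rng.random((2, 2, 2))
p_x /= p_x.sum()

# p_y_given_x[i, j, k, v] = p(y=v | x1=i, x2=j, x3=k)
p_y_given_x = rng.random((2, 2, 2, 2))
p_y_given_x /= p_y_given_x.sum(axis=-1, keepdims=True)

# Joint distribution from the factorization p(y, x) = p(y | x) p(x).
joint = p_y_given_x * p_x[..., None]

def sample_joint(n):
    """Ancestral sampling: draw x from p(x), then y from p(y | x)."""
    flat = rng.choice(p_x.size, size=n, p=p_x.ravel())
    x = np.stack(np.unravel_index(flat, p_x.shape), axis=1)
    p_y1 = p_y_given_x[x[:, 0], x[:, 1], x[:, 2], 1]
    y = (rng.random(n) < p_y1).astype(int)
    return x, y

x, y = sample_joint(50_000)
```

A causal graph refines this further by splitting $p(x_1, x_2, x_3)$ into conditionals that follow the edges of the graph; the single block used here makes no such assumption.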
Dependence between data features
Kocaoglu, Murat, et al. "CausalGAN: Learning causal implicit generative models with adversarial training." arXiv preprint arXiv:1709.02023 (2017).
Variational Autoencoder
[Figure: VAE objective (a minimization) and the model relating the latent variable $z_i$ to the observation $x_i$]
Kingma, Diederik P., and Max Welling. "Auto-encoding variational Bayes." arXiv preprint arXiv:1312.6114 (2013).
GANs and Adversarial Training
Why GANs?
Applications: CycleGAN
Zhu, Jun-Yan, et al. "Unpaired image-to-image translation using cycle-consistent adversarial networks." Proceedings of the IEEE International Conference on Computer Vision. 2017.
Applications: SRGAN
Ledig, Christian, et al. "Photo-realistic single image super-resolution using a generative adversarial network." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
Applications: InfoGAN
Chen, Xi, et al. "InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets." Advances in Neural Information Processing Systems. 2016.
Real or fake data?
Real or fake?
Generative Adversarial Networks
The GANfather - Ian Goodfellow, 2014
Yoshua Bengio
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
Let's play a game
Counterfeiter: the generator
Investigator: the discriminator
Adversarial Training
[Diagram: noise z -> G -> fake data X'; real data X and fake data X' -> D]
Step 1: real data
D(X) = P(X is real)
Step 2: fake data
D(X') = P(X' is real), 1 - D(X') = P(X' is fake)
Step 3: training the discriminator
max D(X), min D(X')
Step 4: training the generator
min D(X), max D(X') (only D(X') depends on G, so in practice the generator works on max D(X'))
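The four steps above can be sketched as one discriminator update followed by one generator update. This is a toy illustration under stated assumptions, not the workshop's code: the linear generator `G(z) = a*z + b`, the logistic discriminator `D(x) = sigmoid(w*x + c)`, the learning rate, and the data are all hypothetical, and the gradients are written out by hand so that only NumPy is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical toy models: G(z) = a*z + b and D(x) = sigmoid(w*x + c).
g = {"a": 1.0, "b": 0.0}
d = {"w": 0.1, "c": 0.0}
lr = 0.1

def G(z):
    return g["a"] * z + g["b"]

def D(x):
    return sigmoid(d["w"] * x + d["c"])

x_real = rng.normal(4.0, 1.0, size=256)  # Step 1: real data X
z = rng.normal(size=256)                 # noise z

def d_loss():
    # Discriminator: max D(X), min D(X')  <=>  minimize this loss.
    return -np.mean(np.log(D(x_real))) - np.mean(np.log(1.0 - D(G(z))))

def g_loss():
    # Generator: max D(X')  <=>  minimize this loss.
    return -np.mean(np.log(D(G(z))))

# Step 3: one gradient step on D with G frozen.
d_before = d_loss()
x_fake = G(z)                            # Step 2: fake data X'
grad_real = -(1.0 - D(x_real))           # d loss / d logit, real samples
grad_fake = D(x_fake)                    # d loss / d logit, fake samples
grad_w = np.mean(grad_real * x_real) + np.mean(grad_fake * x_fake)
grad_c = np.mean(grad_real) + np.mean(grad_fake)
d["w"] -= lr * grad_w
d["c"] -= lr * grad_c
d_after = d_loss()

# Step 4: one gradient step on G with D frozen.
g_before = g_loss()
grad_logit = -(1.0 - D(G(z)))            # d loss / d logit of D(X')
grad_a = np.mean(grad_logit * d["w"] * z)
grad_b = np.mean(grad_logit * d["w"])
g["a"] -= lr * grad_a
g["b"] -= lr * grad_b
g_after = g_loss()
```

In a full training loop the two updates alternate over many batches; here a single pass is enough to see each player's loss go down on its own turn.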
Adversarial Training: evolution of training
https://cs.stanford.edu/people/karpathy/gan/
[Figure: real data, fake data, and error as training progresses]
Loss - MinMax Game
D: tries to maximize the probability that X is classified as real and G(z) as fake
G: tries to minimize the probability that X is classified as real and G(z) as fake
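Written as a formula, the description above corresponds to the standard minimax value function of Goodfellow et al. (2014). The distribution names $p_{\text{data}}$ and $p_z$ are notation assumed here for the real data and the noise.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The first term does not depend on G, so the generator only influences the second one.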
The first challenge: instability
$J_D - J_G = 0$
Loss - Non-Saturating GAN (NSGAN)
D: tries to maximize the probability that X is classified as real and G(z) as fake
G: tries to maximize the probability that G(z) is classified as real
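A short numerical sketch shows why the generator objective is swapped. It assumes a discriminator of the form D = sigmoid(s) and picks a hypothetical logit `s = -6.0` for a fake sample that D rejects with confidence, which is typical early in training.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# s is the discriminator's logit for a fake sample, so D(G(z)) = sigmoid(s).
# A very negative s means D is confident that the sample is fake.
s = -6.0
d_fake = sigmoid(s)

# Minimax generator loss: log(1 - D(G(z))); its derivative w.r.t. s is -D(G(z)).
grad_minimax = -d_fake

# Non-saturating generator loss: -log D(G(z)); its derivative is -(1 - D(G(z))).
grad_non_saturating = -(1.0 - d_fake)

print(f"minimax gradient:        {grad_minimax:.4f}")
print(f"non-saturating gradient: {grad_non_saturating:.4f}")
```

With the minimax loss the gradient reaching the generator is close to zero exactly when the generator is doing badly, while the non-saturating loss keeps it close to its maximum magnitude.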
GAN Zoo
Hands-on...
Conditional GAN
CGAN
[Diagram: noise z and label y -> G -> fake data X'; real data X, fake data X', and label y -> D]
Step 1: real data
D(X, y) = P(X is real | y)
Step 2: fake data
D(X', y) = P(X' is real | y), 1 - D(X', y) = P(X' is fake | y)
Step 3: training the discriminator
max D(X|y), min D(X'|y)
Step 4: training the generator
max D(X'|y)
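In the diagram above, the label y enters both networks. One common way to do that, assumed here for illustration, is to one-hot encode y and concatenate it with the other input. The sizes and the helper `one_hot` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_classes, z_dim, x_dim, batch = 3, 4, 2, 5   # hypothetical sizes

def one_hot(labels, n_classes):
    """Turn integer labels into one-hot rows."""
    return np.eye(n_classes)[labels]

y = rng.integers(0, n_classes, size=batch)     # labels y
z = rng.normal(size=(batch, z_dim))            # noise z
x_real = rng.normal(size=(batch, x_dim))       # stand-in for real data X

# Generator input: noise z together with the label y.
g_input = np.concatenate([z, one_hot(y, n_classes)], axis=1)

# Discriminator input: a sample (real X or fake X') together with its label y.
d_input_real = np.concatenate([x_real, one_hot(y, n_classes)], axis=1)

print(g_input.shape, d_input_real.shape)
```

Because the discriminator sees the label, a fake sample has to be plausible for that specific y, which is what gives control over the generated class.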
Loss - MinMax Game
D: tries to maximize the probability that (X|y) is classified as real and (G(z|y)|y) as fake
G: tries to maximize the probability that (G(z|y)|y) is classified as real
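The two objectives described above can be written as expectations. This is a sketch in assumed notation: $p_{\text{data}}$ and $p_z$ stand for the real data and noise distributions, and the generator line uses the non-saturating form, since the text has G maximizing the probability of being classified as real.

```latex
\max_D \; \mathbb{E}_{(x, y) \sim p_{\text{data}}}\left[\log D(x \mid y)\right]
  + \mathbb{E}_{z \sim p_z,\, y}\left[\log\left(1 - D(G(z \mid y) \mid y)\right)\right]

\max_G \; \mathbb{E}_{z \sim p_z,\, y}\left[\log D(G(z \mid y) \mid y)\right]
```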
Advantages of CGAN
Less mode collapse
Greater control
Hands-on...
