
Armando Benitez -- Data x Design


https://www.picatic.com/dataxdesign

Published in: Data & Analytics


  1. Machine Learning Applications
     Armando Benitez, BMO Capital Markets
     Jul 18, 2016
  2. LHC - CERN
     • Located on the outskirts of Geneva (France - Switzerland)
     • 27 km in circumference
     • The tunnel is buried around 50 to 175 m underground
     Armando Benitez - @jabenitez - Data x Design - Jul 18, 2016
  3. Atlas Detector
     [Diagram labels: particle, detector, signal, amplifier, digitizer (010010), selection, storage, computers, trash, "Trig". Credit: Shabnam Jabeen (Kansas), 5/6/03]
  4. Multiple Algorithms in Parallel
     • Multivariate analysis methods: Boosted Decision Trees, Bayesian Neural Networks, Matrix Elements
     • Combine: use another ML algorithm to combine the results of the individual classifiers
     • Purpose: extract all possible information from the dataset
     • The combination produces an output from which all measurements are obtained
  5. Mobile Market Place
  6. Data Processing and Modelling
     • Inputs and storage: transaction-grade APIs + MQs; Data Lake (HBase, Cassandra, etc.)
     • Real-Time Layer: Stream Processing, Decision Engine (context, event, data)
     • Batch Processing Layer: Batch Processing, Model Generator (Feature Selection, Model Training, Model Evaluation, Model Assembly)
     • Data Science applications: 1. Fraud Detection 2. Search 3. Recommendations 4. Notifications 5. Ratings 6. Merchant Intelligence 7. Engagement Optimization 8. Marketing Optimization 9. App Personalization 10. Ad Network Support 11. Image / Speech Recognition
     • Workflow: Theory (Math, Algorithms), Proof-of-Concept (R, Python, Scala, C++), Spark Implementation (Scalability, Robustness), Platform Integration
  7. Fraud Detection
     • Very small number of fraud cases
     • Large number of good transactions
     • Many different “types” of anomalies, so it is hard for algorithms to learn from positive examples what the anomalies look like
     • Future anomalies may look nothing like any of the anomalous examples we’ve seen so far
  8. Personalization
     • Offers targeted to each user
     • Use browsing history and shopping habits to determine the products the user is most likely to buy
     • Similarity among users
     • Similarity among items
     • Catalog search results
  9. Incorporating ML into Design
     • Visual Inputs
     • Aural Inputs
     • Corporal Inputs
     • Environmental Inputs
  10. Creating Dialogue
      • Machine Learning algorithms are capable of discovering patterns in the data presented to them. How can we make use of this?
      • Find discovery opportunities that are only possible with the help of Machine Learning
      • Designers and programmers should establish a strong collaboration to find groundbreaking applications
      • Understand the rules to know which ones to bend or break
  11. Extra Slides
  12. Search Strategy
      • Traditional searches: Initial objects → Data Cleaning → Found it! (Signal to Bkg 20:1)
      • Small Signal Analysis: Initial objects → Data Cleaning → Machine Learning → Found it! (Signal to Bkg 1:20)
      [Figure: excerpts from DØ Note 5403 (version 4.1, June 5, 2007), "Observation of the heavy baryon Ξb", including invariant mass distributions (GeV/c², events per 0.05 GeV/c²). Using approximately 1.3 fb⁻¹ of DØ Run II data, the note reports a signal peak at a mass of 5.774 ± 0.011 (stat) GeV/c² with a significance of 5.53 and S/√B = 7.80.]
      [Figure: "9.4.2 Observed Results", tb+tqb DT output vs. event yield, DØ Run II preliminary, 2.3 fb⁻¹, p17+p20, e+μ channel, 1-2 b-tags, 2-4 jets.]
  13. Decision Trees
      • Task: separate signal from background
      • Issue: a single split on X or Y is not enough!
      • Solution: use a series of consecutive splits, generating a tree structure
      [Figure 8.1: 2D plane of a simple classification problem, and a Decision Tree solving the classification problem of signal and background.]
  14. Decision Trees
      • Split 1: on the X variable
      [Figure 8.1, annotated: events that failed C1 (F1) and events that passed C1 (P1)]
  15. Decision Trees
      • Split 2: recovers events that failed split 1
      • Repeat and continue the splitting process until the events are classified
      [Figure 8.1, annotated with region labels: "Passed C1", "F: C1 P: C2", "F: C1 F: C2"; split outcomes P1/F1, P2/F2]
  16. Decision Trees
      • After 4 splits: signal and background regions are separated! Done!
      • Toy model: only 2 variables, easy to determine cut values
      [Figure 8.1, annotated with region labels: "F: C1 P: C2", "P: C1,C2 F: C4", "P: C1,C3,C4", "F: C1,C2", "P: C1 F: C2"; split outcomes P1/F1 to P4/F4]
  17. A/B Testing
  18. Anomaly detection
      • Fit model on training set
      • On a cross-validation/test example, predict
      • Possible evaluation metrics:
        • True positive, false positive, false negative, true negative
        • Precision/Recall
        • F1-score
  19. Standard Model (SM)
      • The SM describes the world around us
      • Components:
        • 24 particles of matter
        • 4 mediators
      • Interactions of the particles are explained by the mediators
      • Does not include: gravity, dark matter and dark energy
  20. Identity Resolution
      • What? Identify products having similar properties (name, colour, size) as a unique product
      • Why? Recommender systems trained on these products would produce better, non-repetitive recommendations
      • How?
        • Classify pairs as match or non-match, based on how similar they are
        • Make use of known catalog features
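
The Identity Resolution slide describes classifying pairs of products as match or non-match from the similarity of their name, colour and size. The sketch below illustrates that idea under loud assumptions: the catalog records, the field names, the equal weighting of the three features and the 0.85 threshold are all hypothetical, and a fixed threshold stands in for whatever trained classifier the talk had in mind.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical catalog records; names and values are illustrative only.
CATALOG = [
    {"id": 1, "name": "Classic Cotton T-Shirt", "colour": "blue", "size": "M"},
    {"id": 2, "name": "classic cotton tshirt", "colour": "Blue", "size": "M"},
    {"id": 3, "name": "Leather Wallet", "colour": "brown", "size": "one size"},
]


def normalise(text):
    """Lower-case the text and keep only letters, digits and spaces."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch == " ").strip()


def pair_similarity(a, b):
    """Average of fuzzy name similarity (0..1) and exact colour/size agreement."""
    name_score = SequenceMatcher(
        None, normalise(a["name"]), normalise(b["name"])
    ).ratio()
    colour_score = 1.0 if normalise(a["colour"]) == normalise(b["colour"]) else 0.0
    size_score = 1.0 if normalise(a["size"]) == normalise(b["size"]) else 0.0
    return (name_score + colour_score + size_score) / 3


def is_match(a, b, threshold=0.85):
    """Assumed decision rule: a pair is a match if its similarity clears the threshold."""
    return pair_similarity(a, b) >= threshold


# Compare every pair of records and keep the ids of the pairs judged identical.
matches = [(a["id"], b["id"]) for a, b in combinations(CATALOG, 2) if is_match(a, b)]
```

Records judged to be the same product could then be merged before training a recommender, which is the "non-repetitive" benefit the slide points to. Comparing every pair grows quadratically with catalog size, so a real catalog would likely need some way of limiting which pairs are compared.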

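The Anomaly detection slide (18) lists true/false positives and negatives, precision/recall and the F1-score as possible evaluation metrics. As a minimal sketch of how those quantities relate, the code below computes them from scratch. The labels and predictions are made-up illustrative data, with 1 assumed to mean "anomaly" (for example, a fraudulent transaction) and 0 "normal".

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as the anomaly class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the anomaly class; 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Illustrative data only: 10 transactions, 2 of them anomalous.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

# A model that never flags anything is right on 8 of these 10 examples,
# yet it finds no anomalies at all, so its recall and F1 are zero.
always_normal = precision_recall_f1(y_true, [0] * len(y_true))
```

These metrics suit the setting on the Fraud Detection slide, where fraud cases are rare: plain accuracy rewards a model for labelling everything as normal, while precision, recall and F1 do not.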
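
The Decision Trees slides (13 to 16) describe separating signal from background with a series of consecutive splits on X and Y, where events that fail one cut are tested again by later cuts. The toy tree below is a sketch of that idea, not the figure from the talk: the cut values, the layout of the tree and the choice of a rectangular signal region are assumptions made for illustration, since the slides give no numbers.

```python
# Hypothetical cut values; the slides do not state any.
X1, X2, Y1, Y2 = 1.0, 2.0, 1.0, 2.0

# Each internal node tests "feature < cut"; leaves are class labels.
# Four consecutive splits isolate the box X1 <= x < X2, Y1 <= y < Y2 as signal.
TREE = {
    "feature": "x", "cut": X1,
    "pass": "Bkg",                      # x < X1
    "fail": {
        "feature": "x", "cut": X2,
        "pass": {                       # X1 <= x < X2
            "feature": "y", "cut": Y1,
            "pass": "Bkg",              # y < Y1
            "fail": {
                "feature": "y", "cut": Y2,
                "pass": "Signal",       # Y1 <= y < Y2
                "fail": "Bkg",          # y >= Y2
            },
        },
        "fail": "Bkg",                  # x >= X2
    },
}


def classify(event, node=TREE):
    """Walk the tree from the root until a leaf label is reached."""
    while isinstance(node, dict):
        branch = "pass" if event[node["feature"]] < node["cut"] else "fail"
        node = node[branch]
    return node


labels = [
    classify({"x": 1.5, "y": 1.5}),  # inside the box
    classify({"x": 0.5, "y": 1.5}),  # rejected by the first split on x
    classify({"x": 1.5, "y": 2.5}),  # rejected by the last split on y
]
```

No single cut on x or y would isolate the box, which is the issue slide 13 raises. In this toy case the cuts are written by hand; with more than two variables the cut values would normally be learned from labelled training events.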