Sri Ambati – CEO, 0xdata at MLconf ATL

"Comparing Variable Importance from Ensemble and Deep Learning Methods for AdTech Data" Variable importance brings interpretability to popular black-box modeling techniques. In this talk we compare the performance of popular ensemble techniques, such as Random Forest and Gradient Boosting, with GLM. We observe certain traits that are magnified by non-linear techniques such as Deep Learning but are otherwise missed by GBM or Random Forest. We describe H2O, an open-source, scalable machine learning package whose ease of use and speed make comparing models, picking the best of breed, and building ensembles more natural. H2O's implementations of these algorithms closely track popular open-source and textbook implementations.

H2O.ai
Open Source
Machine Learning
for Intelligent Applications
H2O.ai
Machine Intelligence

Time is the only non-renewable resource
Speed Matters!

Sampling
Law of Large Numbers

On Premise, On / Off Hadoop, On EC2
Per node: 2M row ingest/sec, 50M row regression/sec, 750M row aggregates/sec
Interfaces: Tableau, R, JSON, Scala, Java, Python
H2O Prediction Engine: SDK / API, Nano Fast Scoring Engine
Algorithms: Deep Learning, Regression, Trees, Boosting, Forests, Solvers, Gradients, Ensembles, Cluster, Classify
Engine: Query Processor, R-engine, In-Mem Map Reduce, Distributed fork/join, Memory Manager, Columnar Compression
Data sources: HDFS, S3, SQL, NoSQL, Excel
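
The per-node rates on this slide translate into rough wall-clock estimates. The sketch below assumes throughput scales perfectly linearly with node count, which is an idealization rather than anything the slide claims, and uses a hypothetical 1-billion-row dataset on 10 nodes; `estimated_seconds` is an illustrative helper.

```python
# Per-node rates quoted on the slide, in rows per second.
INGEST_ROWS_PER_SEC = 2_000_000
REGRESSION_ROWS_PER_SEC = 50_000_000
AGGREGATE_ROWS_PER_SEC = 750_000_000


def estimated_seconds(n_rows: int, rows_per_sec_per_node: int, n_nodes: int) -> float:
    """Wall-clock estimate ASSUMING throughput scales linearly with node count."""
    if n_nodes < 1:
        raise ValueError("need at least one node")
    return n_rows / (rows_per_sec_per_node * n_nodes)


if __name__ == "__main__":
    rows, nodes = 1_000_000_000, 10  # hypothetical dataset and cluster size
    for label, rate in [("ingest", INGEST_ROWS_PER_SEC),
                        ("regression", REGRESSION_ROWS_PER_SEC),
                        ("aggregate", AGGREGATE_ROWS_PER_SEC)]:
        print(f"{label:>10}: {estimated_seconds(rows, rate, nodes):8.3f} s")
```

With those inputs the estimates are 50 s to ingest, 2 s per regression pass, and roughly 0.13 s for an aggregate; real clusters will fall short of linear scaling.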

Infrastructure
Parallelism
Data Parallel: Chunking Express!
Algorithm Parallel: Parallel code blocks
Math Parallelism: ADMM, HogWild
Distribution
Zero-Serialization – endian wars have ended
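
The data-parallel idea on this slide (split each column into chunks, run the same code on every chunk, then combine the partial results) can be sketched as a small map/reduce. This is a minimal single-process illustration under stated assumptions: Python threads stand in for H2O's distributed fork/join, and `chunk`, `map_reduce`, `stats_map`, and `stats_reduce` are hypothetical helpers, not H2O's API. Because of Python's GIL, it shows the structure of the computation, not real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def chunk(column: Sequence[float], chunk_size: int) -> List[Sequence[float]]:
    """Split one column into contiguous chunks, the unit of data parallelism."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    return [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]


def map_reduce(column: Sequence[float], chunk_size: int,
               map_fn: Callable[[Sequence[float]], T],
               reduce_fn: Callable[[T, T], T], workers: int = 4) -> T:
    chunks = chunk(column, chunk_size)
    if not chunks:
        raise ValueError("empty column")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_fn, chunks))  # one map task per chunk
    return reduce(reduce_fn, partials)  # combine partial results pairwise


# Example: count, sum and sum of squares, enough for mean and variance in one pass.
def stats_map(c: Sequence[float]):
    return (len(c), sum(c), sum(x * x for x in c))


def stats_reduce(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


if __name__ == "__main__":
    col = [float(i) for i in range(1, 101)]
    n, s, ss = map_reduce(col, chunk_size=16, map_fn=stats_map, reduce_fn=stats_reduce)
    mean = s / n
    print(n, mean, ss / n - mean * mean)
```

For the column 1..100 split into chunks of 16, the reduced partials give a count of 100, a mean of 50.5, and a population variance of 833.25. The design choice is that each chunk returns a small, mergeable summary, so the reduce step never needs the raw rows.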

Scalable Machine Learning
For Smarter Applications

Programmable Internet

Programmable Devices

AdSense Sense

Correlation Causality

Data
Sensors
Devices
Events, signals, time series
Semi-structured data (JSON)
High velocity
High dimensions

Streaming Data
Historical Data
Scoring from prediction
Anomaly and Outliers Detection
Unsupervised Learning

Streaming Data
Historical Data
Anomaly and Outliers Detection
model
Scoring from prediction
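
The pattern on these slides, building a model from historical data and then scoring streaming data against it, can be sketched with a deliberately simple detector. This sketch swaps in a z-score model (mean and standard deviation of the history) as an assumption; it is not the model used in the talk, and `ZScoreModel`, `detect`, and the threshold of 3.0 are illustrative choices.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class ZScoreModel:
    """Model built from historical data: just its mean and standard deviation."""
    mean: float
    stdev: float

    @classmethod
    def fit(cls, historical: Iterable[float]) -> "ZScoreModel":
        data = list(historical)
        if len(data) < 2:
            raise ValueError("need at least two historical points")
        return cls(statistics.fmean(data), statistics.stdev(data))

    def score(self, x: float) -> float:
        """Distance from the historical mean, in standard deviations."""
        if self.stdev == 0:
            return 0.0 if x == self.mean else math.inf
        return abs(x - self.mean) / self.stdev


def detect(model: ZScoreModel, stream: Iterable[float],
           threshold: float = 3.0) -> Iterator[Tuple[float, float, bool]]:
    """Score each streaming value and flag it when it exceeds the threshold."""
    for x in stream:
        z = model.score(x)
        yield x, z, z > threshold


if __name__ == "__main__":
    model = ZScoreModel.fit([10.0, 11.0, 9.0, 10.0, 10.5, 9.5])  # hypothetical history
    for x, z, is_anomaly in detect(model, [10.2, 14.0]):
        print(f"{x:5.1f}  z={z:4.2f}  anomaly={is_anomaly}")
```

With the hypothetical history above, 10.2 scores about 0.28 standard deviations and passes, while 14.0 scores about 5.66 and is flagged.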

Streaming Data
Historical Data
Clustering / Unsupervised Learning
model
Scoring from prediction
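
For the clustering variant, a minimal k-means (Lloyd's algorithm) sketch shows the same split between fitting and scoring: fit centroids on historical points, then score each new point by its nearest centroid. k-means is our assumed stand-in for the unsupervised model on the slide; `kmeans` and `nearest` are hypothetical helpers.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Sequence[float], centroids: Sequence[Point]) -> int:
    """Index of the closest centroid (Euclidean distance): the scoring step."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: Sequence[Point], k: int, iters: int = 100, seed: int = 0) -> List[Point]:
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

For example, fitting `kmeans(history, 2)` on two well-separated blobs and then calling `nearest(new_point, centroids)` assigns each incoming point to a cluster; a point far from every centroid can also be treated as an outlier.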

https://developer.nest.com/documentation/api-reference/devices

Take Models to Production in Java

Onset of Rita

Common ensemble techniques
Bayesian Classifiers: an ensemble of all hypotheses in the hypothesis space
Bagging: each model votes with equal weight; each model is trained on a randomly drawn subset of the training set
Boosting: incrementally builds an ensemble by training each new model to emphasize the training instances that previous models misclassified
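
Bagging as described on this slide (bootstrap samples, one equal-weight vote per model) fits in a few lines. This is a minimal sketch assuming a one-feature decision stump as the base learner and binary labels; `fit_stump`, `bagging`, and `vote` are illustrative names, not a library API.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]   # (feature value, class label)
Model = Callable[[float], int]


def fit_stump(data: Sequence[Example]) -> Model:
    """Base learner: the single threshold on x with the fewest training errors."""
    labels = sorted({y for _, y in data})
    best = None
    for t in sorted({x for x, _ in data}):
        for left in labels:
            for right in labels:
                errors = sum((left if x <= t else right) != y for x, y in data)
                if best is None or errors < best[0]:
                    best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagging(data: Sequence[Example], n_models: int = 25, seed: int = 0) -> List[Model]:
    """Train each model on a bootstrap sample: n draws with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [fit_stump([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_models)]


def vote(models: Sequence[Model], x: float) -> int:
    """Each model casts one vote with equal weight; the majority label wins."""
    return Counter(m(x) for m in models).most_common(1)[0][0]
```

An odd number of models avoids ties between two classes. Random Forest adds one more ingredient on top of bagging, a random subset of features at each split, which a one-feature stump cannot show.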

Gradient Boosting Machine
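
The core loop of gradient boosting can be sketched for squared-error regression: each new tree is fit to the current residuals (the negative gradient of squared loss) and added with a shrinkage factor. This is a minimal sketch with depth-1 trees (stumps), 50 trees, and a learning rate of 0.1 as assumed defaults; it is not H2O's GBM, which, among other differences, grows deeper trees.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]
Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def fit_stump(X: Sequence[Row], r: Sequence[float]) -> Stump:
    """Least-squares regression stump fitted to the current residuals r."""
    best, best_sse = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = (j, t, lm, rm), sse
    if best is None:  # no valid split (all rows identical): fall back to a constant
        m = sum(r) / len(r)
        best = (0, float("inf"), m, m)
    return best


def predict_stump(s: Stump, row: Row) -> float:
    j, t, left_value, right_value = s
    return left_value if row[j] <= t else right_value


def gbm_fit(X: Sequence[Row], y: Sequence[float], n_trees: int = 50, learning_rate: float = 0.1):
    f0 = sum(y) / len(y)  # F_0: constant prediction (the mean)
    pred = [f0] * len(y)
    trees: List[Stump] = []
    for _ in range(n_trees):
        # Squared loss: the negative gradient is simply the residual y - F(x).
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, residuals)
        trees.append(s)
        pred = [pi + learning_rate * predict_stump(s, row) for pi, row in zip(pred, X)]
    return f0, learning_rate, trees


def gbm_predict(model, row: Row) -> float:
    f0, learning_rate, trees = model
    return f0 + learning_rate * sum(predict_stump(s, row) for s in trees)
```

On a step function (0 for x < 5, 10 otherwise) the residual shrinks by a factor of 0.9 per tree, so after 50 trees the predictions are within about 0.03 of the targets.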

Variable Importance Comparison
Gradient Boosting Machine, 50 trees
Random Forest, 50 trees
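
One way to produce side-by-side importance scores like these is sketched below. It swaps in permutation importance, a model-agnostic method that measures how much the error grows when one column is shuffled; as we understand H2O's documentation, its tree models instead derive importance from the squared-error reduction at splits, so the numbers will not match. `permutation_importance` and `scaled` are hypothetical helpers, and `predict` can be any fitted model's scoring function.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[Row], float], X: Sequence[Row],
                           y: Sequence[float], n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Average increase in MSE when one column is shuffled, breaking its link to y."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increase += mse(y, [predict(row) for row in X_perm]) - baseline
        scores.append(increase / n_repeats)
    return scores


def scaled(scores: Sequence[float]) -> List[float]:
    """Rescale so the most important variable is 1.0."""
    top = max(scores)
    return [s / top if top > 0 else 0.0 for s in scores]
```

In a toy check where the target depends only on the first column, shuffling the second column leaves the error unchanged, so its scaled importance is exactly 0.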

Generalized Linear Modeling – Variable Importance
GLM, Elastic Net (Binomial)
GLM, Elastic Net (Binomial)
Categorical expansion on Age
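
For a GLM, a common way to rank variables is by the magnitude of standardized coefficients, and categorical expansion gives each level of a factor such as Age its own indicator column and coefficient. The sketch below assumes that convention. The coefficients would come from an elastic-net logistic fit that is not shown, the penalty function uses the standard elastic-net form λ(α‖β‖₁ + (1 - α)/2 ‖β‖₂²), and all helper names and example values are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str], prefix: str) -> Dict[str, List[float]]:
    """Categorical expansion: one 0/1 indicator column per level."""
    return {f"{prefix}.{level}": [1.0 if v == level else 0.0 for v in values]
            for level in sorted(set(values))}


def elastic_net_penalty(beta: Sequence[float], lam: float, alpha: float) -> float:
    """lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + (1 - alpha) / 2 * l2)


def standardized_importance(columns: Dict[str, List[float]],
                            coefs: Dict[str, float]) -> Dict[str, float]:
    """|beta_j| * sd(x_j): coefficient magnitudes put on a comparable scale."""
    return {name: abs(coefs.get(name, 0.0)) * statistics.pstdev(col)
            for name, col in columns.items()}
```

Expanding Age into levels lets the model assign different weights to, say, the 18-25 and 41-65 buckets, which a single linear Age term cannot; the L1 part of the penalty can also drive individual levels to exactly zero.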

Variable Importance Comparison
Deep Learning (Tanh / 4-layer)
Deep Learning (Tanh / 3-layer)
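
Neural-network importances are usually derived from the weights. The sketch below implements a Gedeon-style measure under stated assumptions: for each layer, every incoming absolute weight is normalized by the total absolute weight arriving at that unit, the normalized matrices are chained from inputs to outputs, and each input's contributions are summed. As far as we know, H2O's Deep Learning documentation cites Gedeon's method, but this is not H2O's code; `gedeon_importance` and the example weights are hypothetical, and the activation function (Tanh on the slide) does not enter the calculation.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # w[i][j]: weight from unit i in one layer to unit j in the next


def normalize_columns(w: Matrix) -> Matrix:
    """Share of each incoming |weight| in the total |weight| arriving at unit j."""
    n_in, n_out = len(w), len(w[0])
    totals = [sum(abs(w[i][j]) for i in range(n_in)) for j in range(n_out)]
    return [[abs(w[i][j]) / totals[j] if totals[j] > 0 else 0.0 for j in range(n_out)]
            for i in range(n_in)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gedeon_importance(weights: Sequence[Matrix]) -> List[float]:
    """Chain normalized contributions layer by layer, sum over outputs, scale to max 1."""
    contrib = normalize_columns(weights[0])
    for w in weights[1:]:
        contrib = matmul(contrib, normalize_columns(w))
    raw = [sum(row) for row in contrib]
    top = max(raw)
    return [r / top if top > 0 else 0.0 for r in raw]
```

Comparing a 3-layer and a 4-layer network only changes the length of the `weights` list: each extra layer adds one more normalized matrix to the chain.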

Every generation needs to invent its own math.
Our data, our tools!

Power-Law

Code is incomplete without Community!
Open Source Matters!

Community
Committers: 30
Meetups: 90 in 12 months
Coverage
Conference speakers
Curriculum: Stanford, MIT, CSU, SUNY, SJSU, Purdue

Data Driven Decision Making is hard!
Courage Matters!

Thanks
Courtney, Nick & MLConf
for bringing us to ATL

Sparkling Water Application Life Cycle
(1) User submits the App (Sparkling App jar file) with spark-submit to the Spark cluster Master node (Spark Master JVM)
(2) App is distributed to the Spark cluster Worker nodes (Spark Worker JVMs)
(3) Spark Executor JVMs start for the App
(4) An H2O instance starts within each Executor JVM; together they form the Sparkling Water Cluster
(5) App's Scala main program runs

Sparkling Water Data Distribution
(1) Use Spark SQL to read data from a data source (e.g., HDFS) into a Spark RDD
(2) Convert the Spark RDD to an H2O RDD; the H2O RDD is column-based and highly compressed
(Not shown) Run modeling and prediction workflows with H2O
(3) Convert the H2O RDD (e.g., predictions) back to a Spark RDD
Each Spark Executor JVM in the Sparkling Water Cluster hosts an H2O instance.
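
Step (2) on this slide, turning a row-oriented RDD partition into compressed columns, can be sketched on a single partition. The encoding here (store the column minimum once, then each value as a small unsigned offset in the narrowest `array` type that fits) is an assumed, simplified scheme for illustration, not H2O's actual chunk format; `rows_to_columns` and `compress_int_column` are hypothetical helpers.

```python
from array import array
from typing import List, Sequence, Tuple


def rows_to_columns(rows: Sequence[Sequence]) -> List[list]:
    """Transpose a row-oriented partition (like an RDD's records) into one list per column."""
    return [list(col) for col in zip(*rows)]


def compress_int_column(values: Sequence[int]) -> Tuple[int, array]:
    """Store ints as a base plus unsigned offsets in the narrowest array type that fits."""
    if not values:
        raise ValueError("empty column")
    base = min(values)
    span = max(values) - base
    for code in ("B", "H", "I", "Q"):  # 1, 2, (usually) 4, and 8 bytes per item
        if span < 2 ** (8 * array(code).itemsize):
            return base, array(code, (v - base for v in values))
    raise OverflowError("value range does not fit in a 64-bit offset")


def decompress_int_column(base: int, offsets: array) -> List[int]:
    return [base + o for o in offsets]
```

For the integer column 1000, 1001, 1010 the offsets 0, 1, 10 fit in one byte each, so the column takes 3 bytes instead of 24 as 64-bit integers.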

Deployment options, each with data on HDFS: Standalone H2O, H2O on YARN, H2O in Hadoop MR
Hadoop distributions: Hortonworks, Cloudera, MapR, Intel

H2O – The Killer-App for Spark
Sparkling Water stack: MLlib, H2O, SQL / H2ORDD / HDFS = data
In-Memory: Big Data, Columnar
ML: 100x faster Algos
R: CRAN, API, fast engine
API: Spark API, Java MM
Community: Devs, Data Science

Examples

Fraud / No-fraud
1/1000 unbalanced
Click-Stream
Browse / Click / Buy
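
With fraud at roughly 1 in 1,000, accuracy alone is misleading, and a short calculation shows why. This sketch uses hypothetical data with exactly one fraud case in 1,000 records and a trivial model that never predicts fraud; the `balanced_class_weights` helper illustrates one common remedy (reweighting classes inversely to their frequency) and is our assumption, not a method from the talk.

```python
from typing import Dict, Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Counts (tp, fp, fn, tn) with 1 = fraud as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def report(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def balanced_class_weights(y: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n / (n_classes * n_c) so all classes contribute equally."""
    counts = {c: y.count(c) for c in set(y)}
    return {c: len(y) / (len(counts) * n_c) for c, n_c in counts.items()}


if __name__ == "__main__":
    y_true = [1] + [0] * 999        # hypothetical: 1 fraud in 1,000 transactions
    always_no_fraud = [0] * 1000    # trivial model that never predicts fraud
    print(report(y_true, always_no_fraud))
    print(balanced_class_weights(y_true))
```

The trivial model scores 99.9% accuracy while catching no fraud at all (recall 0.0). Balanced weighting gives the fraud case a weight of 500 and each non-fraud case about 0.5, so both classes contribute equally to the training loss.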

Propensity Models
Merchants-to-Users
Lifetime Value of Customer
Pricing Engines


