This document provides an overview of H2O.ai, a leading AI platform company. H2O.ai was founded in 2012, has raised $75 million in funding, and offers products including the open source H2O machine learning platform and the Driverless AI automated machine learning product. The document also covers H2O.ai's position as a Leader in the Gartner Magic Quadrant for data science and machine learning platforms, its team of 90 AI experts, and its offices in several countries. Finally, it outlines H2O.ai's machine learning capabilities and how customers can use its platform and products.
1. Machine Learning with H2O.ai on Google Cloud
Nicholas Png
Partnerships Software Engineer
nicholas@h2o.ai
2. Who is H2O.ai?
Company
● Founded in Silicon Valley in 2012
● Funded: $75m
● Investors: Wells Fargo, NVIDIA, Nexus Ventures, Paxion Ventures
Products
● H2O Open Source Machine Learning (14,000 organizations)
● H2O Driverless AI - Automated Machine Learning
Leadership
Leader in the Gartner Magic Quadrant for data science and machine learning platforms
Team
90 AI experts (including Kaggle Grandmasters and 5 of the world’s top 100 data scientists)
Global
Mountain View, London, Prague, India
3. Technology leader with most completeness of vision
Recognized for the mindshare, partner network and status as a quasi-industry standard for machine learning and AI
H2O.ai customers gave the highest overall score among all the vendors for sales relationship and account management, customer support (onboarding, troubleshooting, etc.) and overall service and support
H2O.ai is a Leader in the 2018 Gartner Data Science and Machine Learning Platforms Magic Quadrant
4. H2O.ai Product Suite
In-Memory, Distributed Machine Learning Algorithms with H2O Flow GUI
H2O AI Open Source Engine Integration with Spark
Lightning Fast machine learning on GPUs
100% open source – Apache V2 licensed
Built for data scientists – interface using R, Python on H2O Flow (interactive notebook interface)
We offer Enterprise Support subscriptions
Automatic feature engineering, machine learning and interpretability
Commercial Licensed (closed source)
Built for domain users, analysts & data scientists – GUI based interface for end-to-end data science
Fully automated machine learning from ingest to deployment
We offer user licenses on a per seat basis (annual subscription)
6. What is H2O?
Math Platform
Open source in-memory AI engine
● Parallelized and distributed algorithms
● GLM, Random Forest, GBM, Deep Learning, etc.
Tech and API
Easy to use and adopt
● Written in Java - perfect for Java programmers
● Install is lightweight
● REST API (Java) - run H2O from R, Python, WebUI
Big data
More data or better models? BOTH
● Use all of your data - model without sampling
● More data + better models = better predictions
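The REST API point above means that any client able to send HTTP requests can talk to a running H2O node. The following is a minimal sketch using only the Python standard library. It assumes a node is already running on the default port 54321 and that the cluster status is served at the path /3/Cloud; both are assumptions about the deployment and are not stated in this deck, and the helper names are made up for illustration.

```python
import json
import urllib.request


def cloud_status_url(host="localhost", port=54321):
    """Build the cluster-status URL (assumed endpoint path: /3/Cloud)."""
    return f"http://{host}:{port}/3/Cloud"


def fetch_cloud_status(host="localhost", port=54321, timeout=5.0):
    """Send a GET request to a running H2O node and return the parsed JSON."""
    url = cloud_status_url(host, port)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs a reachable H2O node, so it is left commented out here):
# status = fetch_cloud_status()
# print(sorted(status.keys()))
```

The R and Python clients and the web UI sit on top of the same kind of request and response exchange, which is why the engine can be driven from several languages.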
7. Algorithms on H2O
Clustering
• K-Means (Auto-K)
Dimension reduction
• Principal Component Analysis
• Generalized Low Rank Models
Word embedding
• Word2Vec
Time series
• iSAX
Machine Learning tuning
• Hyperparameter Search
• Early Stopping
Statistical analysis
• Linear Models (GLM)
• Naïve Bayes
Ensembles
• Random Forest
• Distributed Trees
• Gradient Boosting Machine
• Stacking / Super Learner
Deep Neural Networks
• NLP
• Autoencoder
• Anomaly Detection
• Deep Features
• CNN, RNN (Deep Water)
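Hyperparameter search and early stopping, listed under machine learning tuning on this slide, can be illustrated with a toy example. The sketch below is plain Python and does not use H2O. The stopping rule (stop when the last few scores no longer improve on the earlier best by more than a tolerance) is a simplified stand-in chosen for illustration, not H2O's exact criterion, and the "model" is just gradient descent on a one-dimensional quadratic.

```python
import random


def should_stop(scores, rounds=3, tolerance=1e-3):
    """Return True when the best of the last `rounds` scores does not beat
    the best earlier score by more than `tolerance` (lower is better)."""
    if len(scores) <= rounds:
        return False
    best_before = min(scores[:-rounds])
    best_recent = min(scores[-rounds:])
    return best_recent > best_before - tolerance


def train_with_early_stopping(learning_rate, max_iters=200):
    """Toy training loop: gradient descent on f(w) = (w - 3)^2."""
    w, losses = 0.0, []
    for _ in range(max_iters):
        w -= learning_rate * 2.0 * (w - 3.0)
        losses.append((w - 3.0) ** 2)
        if should_stop(losses):
            break
    return losses[-1], len(losses)


def random_search(n_trials=20, seed=0):
    """Sample learning rates at random and keep the lowest final loss."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        lr = rng.uniform(0.01, 0.9)
        loss, iterations = train_with_early_stopping(lr)
        if best is None or loss < best["loss"]:
            best = {"learning_rate": lr, "loss": loss, "iterations": iterations}
    return best


print(random_search())
```

Early stopping keeps each trial cheap, so the search can afford to try more candidate settings within the same budget.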
8. Simplified typical machine learning pipeline
Data integration → Data quality and transformation → Modeling table (Features + Target) → Model building → Model
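The pipeline stages on this slide can be sketched as a chain of small functions, one per stage. This is an illustrative toy in plain Python, not H2O code: the record layout (fields named x and y), the function names, and the choice of a one-variable least-squares line as the model are all assumptions made for the example.

```python
from statistics import mean


def integrate(*sources):
    """Data integration: concatenate records from several sources."""
    return [row for source in sources for row in source]


def clean(rows):
    """Data quality and transformation: drop incomplete rows, cast to float."""
    return [
        {"x": float(r["x"]), "y": float(r["y"])}
        for r in rows
        if r.get("x") is not None and r.get("y") is not None
    ]


def modeling_table(rows):
    """Modeling table: separate the feature column from the target column."""
    features = [r["x"] for r in rows]
    target = [r["y"] for r in rows]
    return features, target


def build_model(features, target):
    """Model building: fit y = intercept + slope * x by least squares."""
    x_bar, y_bar = mean(features), mean(target)
    sxx = sum((x - x_bar) ** 2 for x in features)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(features, target))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


source_a = [{"x": 1, "y": 3}, {"x": 2, "y": 5}]
source_b = [{"x": 3, "y": 7}, {"x": 4, "y": None}]  # last row is incomplete
model = build_model(*modeling_table(clean(integrate(source_a, source_b))))
print(model(10))
```

Each stage takes the previous stage's output as its only input, which is what lets a platform automate or replace individual stages without touching the rest.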
9. High level architecture
Production Environments
Job, Fluid Vector Frame, MRTask, Distributed K/V Store, Distributed Fork/Join, Non-Blocking Hash Table
Distributed In-Memory Processing
REST / JSON
Parse, Exploratory Analysis, Feature Engineering, ML Algorithms, Model Evaluation, Scoring, Data/Model Export
SQL, NFS, Local, GCS, HDFS
POJO
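The MRTask and Distributed Fork/Join labels in this architecture point to a map/reduce style of computation: data is split into chunks, each chunk is processed independently, and the partial results are combined. The sketch below shows that pattern in plain Python with a thread pool on one machine. It is an analogy only and says nothing about how H2O implements these components; the function names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(values, chunk_size):
    """Cut one long column into fixed-size chunks."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def map_chunk(chunk):
    """Map step: compute a partial (sum, count) for one chunk."""
    return (sum(chunk), len(chunk))


def reduce_partials(left, right):
    """Reduce step: merge two partial results into one."""
    return (left[0] + right[0], left[1] + right[1])


def distributed_mean(values, chunk_size=4, workers=3):
    """Mean of a column computed as a map over chunks plus a reduce."""
    if not values:
        raise ValueError("values must not be empty")
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total, count = reduce(reduce_partials, partials, (0.0, 0))
    return total / count


print(distributed_mean(list(range(1, 11))))
```

Because the reduce step is associative, partial results can be merged in any grouping, which is the property that allows the same computation to be spread over many nodes.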
11. Driverless AI delivers “Expert data scientist in a box”
Created and supported by world renowned AI experts
Empowers companies to accomplish AI and ML with a single platform
Performs the function of an expert data scientist and adds more power to both novice and expert teams
Details and highlights insights and interpretability with easy to understand results and visualizations
21 day free trial for Driverless AI
12. Typical enterprise machine learning workflow
Driverless AI
Data integration → Data quality and transformation → Modeling table (Features + Target) → Model building → Model
13. Why Driverless AI for Enterprise AI adoption
Data is a team sport
Lack of AI talent
● ~100 data science experts in the world
● “US alone faces a shortage of 190,000 people with analytical expertise.”
Time to insights slow
● Weeks to hours: time for a data scientist to build a model
Lack of trust in AI
● Black box models
Driverless AI delivers
● Your digital data scientist
● Automatic Feature Engineering with GPU accelerated machine learning
● Explainable and Interpretable AI
14. Accuracy
Automatic feature engineering to increase accuracy - AlphaGo for AI
Automatic Kaggle Grandmaster recipes in a box for solving a wide variety of use-cases
Automatic machine learning to find and tune the right ensemble of models
15. Kaggle Grandmaster Out of the Box
Auto feature generation: Original features → Generated features
Feature transformations
● Automatic Text Handling
● Frequency Encoding
● Cross Validation Target Encoding
● Truncated Singular Value Decomposition
● Clustering and more
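Two of the feature transformations named on this slide, frequency encoding and cross validation target encoding, are easy to show on a small categorical column. The sketch below is plain Python and is not Driverless AI's implementation. The fold assignment (row index modulo the number of folds) and the fallback to the global mean for categories unseen in the training folds are simplifying assumptions made for the example.

```python
from collections import Counter, defaultdict


def frequency_encode(categories):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(categories)
    n = len(categories)
    return [counts[c] / n for c in categories]


def cv_target_encode(categories, targets, n_folds=2):
    """Out-of-fold target encoding: each row gets the mean target of its
    category computed on the other folds only, which limits target leakage."""
    n = len(categories)
    global_mean = sum(targets) / n
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums, counts = defaultdict(float), defaultdict(int)
        for i in range(n):
            if i % n_folds != fold:  # rows used to compute category means
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
        for i in range(n):
            if i % n_folds == fold:  # held-out rows receive the encoding
                c = categories[i]
                encoded[i] = sums[c] / counts[c] if counts[c] else global_mean
    return encoded


colors = ["red", "blue", "red", "red", "blue", "green"]
clicked = [1, 0, 1, 0, 1, 1]
print(frequency_encode(colors))
print(cv_target_encode(colors, clicked))
```

Both functions turn a text column into a numeric one that a model can consume. The cross validation variant never lets a row see its own target value, which is the reason for the fold split.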
17. H2O.ai + Kubeflow
Diagram labels: Kubernetes / Kubeflow; YARN; H2O.ai Driverless AI; H2O Distributed In-Memory; Model Building; CPU; SQL, NFS, GCS
18. H2O deployment options
Model training: H2O Flow, H2O Cluster (H2O can run anywhere: desktop, cloud, on-prem; Hadoop and Spark environments supported)
Model management: Model Repository, Save Model, Load Model (Store models in H2O Steam, git, HDFS, S3, etc.)
Model deployment: POJO (java file), MOJO (zip file), C++ MOJO Library, Java MOJO Library
Language bindings: Java, R, Py, .NET, ... (Add any language with C/C++ binding support)
Apps
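The save and load steps on this slide separate training from scoring: a trained model is written out as a file, stored, and later loaded by an application that does not need the training environment. The sketch below imitates that idea in plain Python with a zip archive. It is not the MOJO format; the deck does not describe the archive layout, so the file name model.json and its fields are invented for illustration.

```python
import io
import json
import zipfile


def save_model(weights, bias):
    """Pack model parameters into an in-memory zip archive and return bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        payload = json.dumps({"weights": weights, "bias": bias})
        archive.writestr("model.json", payload)  # hypothetical file name
    return buffer.getvalue()


def load_model(artifact_bytes):
    """Rebuild a scoring function from the archive without any training code."""
    with zipfile.ZipFile(io.BytesIO(artifact_bytes)) as archive:
        params = json.loads(archive.read("model.json"))

    def score(row):
        weighted = sum(w * x for w, x in zip(params["weights"], row))
        return params["bias"] + weighted

    return score


artifact = save_model(weights=[0.5, -1.0], bias=2.0)
score = load_model(artifact)
print(score([4.0, 1.0]))
```

Keeping the artifact self-describing is what lets it be stored in a repository, versioned, and loaded from different languages, as the language bindings on this slide suggest.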
19. High Level Deployment Pipeline - Spark
Storage: BigQuery, NFS, Local, Cloud Storage, HDFS
● Initial data stored on HDFS or Google BigQuery
Data Munging: Google Dataproc
● Ingest data into Spark running on Google Dataproc.
● Use Sparkling Water for preliminary modeling and data munging.
● Current data pipeline can be added here
Driverless AI: Compute Engine
● Save munged data to structured data file
● Ingest data file into Driverless AI
● Automatic feature engineering
● Automatic visualizations
● Complete model pipeline exported as MOJO (.zip)
● Generate high performance model, ensemble XGBoost + TF + RuleFit
Inference: Compute Engine
● Deploy MOJO file to serve real-time inference (millisecond response times)
● Additional logic can be placed before or after calling the MOJO
20. High Level Deployment Pipeline - BigQuery
Storage: BigQuery, NFS, Local, Cloud Storage, HDFS
● Initial data stored on HDFS or Google BigQuery
Data Munging: Google BigQuery
● Perform data cleaning and data munging in Google BigQuery.
Driverless AI: Compute Engine
● Driverless AI has an integrated connector with GBQ for direct data ingest via SQL queries
● Automatic feature engineering
● Automatic visualizations
● Complete model pipeline exported as MOJO (.zip)
● Generate high performance models, ensemble XGBoost + TF + RuleFit
Inference: Compute Engine
● Deploy MOJO file to serve real-time inference (millisecond response times)
● Additional logic can be placed before or after calling the MOJO
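The note that additional logic can be placed before or after calling the MOJO describes a wrapper around the scoring call. The sketch below shows one way such a wrapper could look in plain Python. The scorer is passed in as an ordinary callable, and fake_scorer is a hypothetical stand-in, not a real MOJO; the field names, threshold, and decision labels are likewise invented for the example.

```python
def make_scoring_service(score_row, required_fields, threshold=0.5):
    """Wrap a row-scoring callable with input checks and a final decision."""

    def handle(request):
        # Logic before the model call: validate and order the inputs.
        missing = [f for f in required_fields if f not in request]
        if missing:
            return {"error": "missing fields: " + ", ".join(missing)}
        row = [float(request[f]) for f in required_fields]

        # The model call itself (stand-in for the exported scoring pipeline).
        probability = score_row(row)

        # Logic after the model call: map the score to a business decision.
        decision = "review" if probability >= threshold else "approve"
        return {"probability": probability, "decision": decision}

    return handle


def fake_scorer(row):
    """Hypothetical stand-in model: a linear score clipped to [0, 1]."""
    return max(0.0, min(1.0, 0.1 * row[0] + 0.2 * row[1]))


service = make_scoring_service(fake_scorer, ["amount", "age"])
print(service({"amount": 5, "age": 1}))
print(service({"amount": 1}))
```

Keeping validation and decision rules outside the model artifact means they can change without retraining or re-exporting the model.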