H2O tutorial
1. H2O – Tutorial and Demo
A brief intro to H2O
Ralph Schlosser
https://github.com/bwv988
July 2017
Ralph Schlosser H2O – Tutorial and Demo July 2017 1 / 12
2. Overview
Agenda
H2O at a glance
H2O projects
Architecture
Data input and output
ML algorithms
Demo
Links
Git repo: https://github.com/bwv988/h2o-tutorial
Demo: http://rpubs.com/bwv988/h2odemo
3. H2O at a glance
“H2O is an open source, in-memory, distributed, fast, and scalable
machine learning and predictive analytics platform that allows you to
build machine learning models on big data and provides easy
productionalization of those models in an enterprise environment.”
— H2O.ai documentation
4. H2O at a glance
Open source software, backed by a commercial company.
Implemented from scratch in Java.
Multiple language bindings, or APIs: Java, Scala, R, Python.
Big data: Interfaces to Spark, Hadoop.
Deep Learning: Has abstraction layer to call TensorFlow, MXNet, Caffe
back-ends.
H2O company advisers include Rob Tibshirani and Trevor Hastie of
ESL fame. ;)
5. H2O projects
H2O: Core project.
Sparkling Water: Execute H2O workloads on a Spark cluster.
Deep Water (preview): Call TensorFlow, MXNet, Caffe from within
H2O.
Flow: Integral part of H2O; web-based, notebook-style, interactive,
“point-and-click” UI for H2O.
Steam: Cluster management and collaboration; manage models, data,
and teams.
Ralph Schlosser H2O – Tutorial and Demo July 2017 5 / 12
6. Architecture: Highlights
The H2O cloud consists of one or more nodes.
Each node runs as a separate JVM process.
Employs three-layered architecture: Language, Algorithms,
Infrastructure.
Language:
Native language support for Java, Scala.
REST API for other languages.
Algorithms
Has own implementation of many common ML algorithms.
Separate slide for this.
Infrastructure
Manage distributed data sets.
Manage distributed (parallel) computations: MapReduce.
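The MapReduce idea from the Infrastructure layer can be sketched in a few lines. This is a toy, single-process illustration in plain Python, not H2O's Java implementation: the function names (`split_into_chunks`, `map_chunk`, `reduce_partials`, `distributed_mean`) are made up for this example, and the "nodes" are simply list slices.

```python
from functools import reduce


def split_into_chunks(values, n_chunks):
    """Partition a list into roughly equal chunks, one per pretend node."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def map_chunk(chunk):
    """Map step: each node summarizes only its own chunk as (sum, count)."""
    return (sum(chunk), len(chunk))


def reduce_partials(a, b):
    """Reduce step: combine two partial results into one."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(values, n_chunks=3):
    """Mean of a non-empty list, computed from per-chunk partial results."""
    partials = [map_chunk(c) for c in split_into_chunks(values, n_chunks)]
    total, count = reduce(reduce_partials, partials, (0, 0))
    return total / count


if __name__ == "__main__":
    print(distributed_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

The point of the pattern is that only the small `(sum, count)` pairs need to move between nodes, while the raw data stays where it is.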
8. Architecture: How R (or Python) interfaces with H2O
No actual H2O computations or data operations are done in R.
# Example: Load data from HDFS into H2O.
library(h2o)
h2o.init()  # Connect to (or start) an H2O instance.
h2o_df <- h2o.importFile("hdfs://give/me/data.csv")
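Why the client does no real work can be shown with a toy model. The sketch below is plain Python with hypothetical classes (`ToyServer`, `FrameHandle`); it is not the H2O client API and it skips the REST layer entirely. It only illustrates the idea that the R or Python object is a handle holding a key, while rows and computations live on the server side.

```python
class ToyServer:
    """Stands in for the H2O cluster: it owns the data and does the work."""

    def __init__(self):
        self._frames = {}

    def import_rows(self, key, rows):
        """Store rows under a key; only the key travels back to the client."""
        self._frames[key] = rows
        return key

    def column_mean(self, key, column):
        """Compute a column mean where the data lives."""
        rows = self._frames[key]
        return sum(r[column] for r in rows) / len(rows)


class FrameHandle:
    """Client-side object: holds a key, never the rows themselves."""

    def __init__(self, server, key):
        self._server = server
        self.key = key

    def mean(self, column):
        # Delegate the computation; the client only receives the result.
        return self._server.column_mean(self.key, column)


if __name__ == "__main__":
    server = ToyServer()
    key = server.import_rows("data.hex", [{"x": 1.0}, {"x": 3.0}])
    handle = FrameHandle(server, key)
    print(handle.key, handle.mean("x"))
```

In this picture, `h2o_df` in the R example plays the role of `FrameHandle`: a small reference to a data set that stays in the H2O cloud.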
9. Data input and output
Many formats and sources are supported. Automatic data schema
discovery for most scenarios.
Formats
CSV
ORC – Optimized Row Columnar, new in Hadoop & Hive
SVMLight – Sparse data format
ARFF – Attribute Relation File Format, from Weka
XLS, XLSX
Avro
Parquet
Sources
Local files
Remote files
HDFS
S3
Alluxio
JDBC
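To give a feel for what automatic schema discovery involves, here is a deliberately simplified sketch in plain Python. It is an assumption-laden toy, not H2O's parser: the helper names (`guess_type`, `infer_schema`), the three type labels, and the "sample the first rows" strategy are all chosen for illustration only.

```python
import csv
import io


def guess_type(values):
    """Return 'int', 'real', or 'string' for a list of raw text cells."""

    def all_parse(caster):
        try:
            for v in values:
                caster(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "int"
    if all_parse(float):
        return "real"
    return "string"


def infer_schema(csv_text, sample_rows=100):
    """Guess a column type for each header name from a sample of rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    sample = [row for _, row in zip(range(sample_rows), reader)]
    columns = list(zip(*sample))  # transpose rows into columns
    return {name: guess_type(col) for name, col in zip(header, columns)}


if __name__ == "__main__":
    text = "id,price,city\n1,9.5,Dublin\n2,12,Cork\n"
    print(infer_schema(text))
```

A real parser also has to guess separators, headers, missing-value markers, and categorical columns, which is why discovery works for most scenarios rather than all of them.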
10. ML algorithms
H2O provides highly optimized, from-scratch implementations of
classical as well as modern techniques on top of its distributed,
in-memory processing engine.
Deep Learning: Native implementation of a multi-layer, feed-forward
ANN. [1]
Distributed Random Forest
GLM
GBM
k-Means clustering
PCA
... and many more: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science.html
[1] CNN and RNN only through 3rd party integrations, e.g. TensorFlow.
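For readers new to the term, "multi-layer, feed-forward" means that inputs flow through one layer after another with no loops. The sketch below shows a single forward pass in plain Python. It is not H2O's implementation: the function names, the ReLU and sigmoid activations, and the weights in the usage example are assumptions made for illustration, and no training is shown.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: weights[j] holds the weights of neuron j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Element-wise rectifier used as the hidden-layer activation."""
    return [max(0.0, v) for v in values]


def sigmoid(v):
    """Squash a single value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input -> hidden layer (ReLU) -> single output neuron (sigmoid)."""
    hidden = relu(dense(x, hidden_w, hidden_b))
    (logit,) = dense(hidden, out_w, out_b)
    return sigmoid(logit)


if __name__ == "__main__":
    # Made-up weights: two inputs, two hidden neurons, one output.
    p = forward([1.0, 2.0],
                hidden_w=[[1.0, 0.0], [0.0, -1.0]], hidden_b=[0.0, 0.0],
                out_w=[[1.0, 1.0]], out_b=[-1.0])
    print(p)
```

Convolutional and recurrent networks add structure beyond this simple layer-by-layer flow.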
11. Demo
Supervised learning example in R.
Flow demo.
Steam demo.
http://rpubs.com/bwv988/h2odemo
12. References
H2O website: https://www.h2o.ai/
Some pictures “stolen” from H2O’s documentation:
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/welcome.html
Erin LeDell’s excellent presentation:
https://www.stat.berkeley.edu/~ledell/docs/h2o_hpccon_oct2015.pdf
Darren Cook: “Practical Machine Learning with H2O”:
https://www.amazon.com/Practical-Machine-Learning-H2O-Techniques/dp/149196460X
Example code is available in GitHub:
https://github.com/DarrenCook/h2o