This session was recorded in NYC on October 22nd, 2019 and can be viewed here: https://www.youtube.com/watch?v=xAhQAYV5_PY&list=PLNtMya54qvOE3AvWRCNF2tybxNobUbAYp&index=3&t=2s
Bio: Prithvi is Chief of Technology, Applications at H2O.ai. Prithvi leads the design and development of “Q”, H2O.ai’s high scale exploratory data analysis and analytical application development platform.
Prithvi has been with H2O.ai since its early days and has been responsible for several products including Driverless AI (our flagship automatic machine learning platform), Steam (distributed cluster management, model management and deployment for H2O), H2O.js (JavaScript transpiler for H2O’s distributed runtime), Play (on-demand cloud provisioning system for H2O), Flow (a hybrid GUI/REPL/Notebook for H2O) and Lightning (statistical graphics for H2O).
Bio: Shivam Bansal is a Data Scientist at H2O.ai and a Kaggle Grandmaster in the Kernels section. He is a three-time winner of Kaggle’s Data Science for Good competition and the winner of multiple other offline AI and data science competitions.
Shivam has extensive cross-industry, hands-on experience in building data science products. He has helped clients in the Insurance, Healthcare, Banking, and Retail domains solve unstructured data science problems by building end-to-end pipelines and solutions.
5. What is analytics?
Analytics = “Information resulting from the systematic analysis of data or statistics”
[Diagram: Analytics / ML / AI Workflow, with stages Ingest, Prep, Visualize, Model, Deploy, Refine]
The three levels of analytical information consumption:
- AI / ML / Data Science: Data Scientist / ML Engineer
- “BI” / Internal Applications: Internal / Business User
- Consumer / End-user Applications: Customer / Data Consumer
6. What does it take to build this?
Every analytical application needs to:
— Ingest, store and retrieve data.
— Prepare or transform data.
— Handle user inputs (forms / UI).
— Filter or search through data.
— Display visualizations.
— Create or use ML models.
— Allow collaboration / sharing.
— Make all this fast, fun and easy!
Applies to:
- Data Science / ML / AI
- Business Intelligence
- End-user Applications
[Diagram: Typical Analytical Application Architecture. Labels: Raw Data; Ingest, Prep, Model, Score, Refine (Analytics / ML / AI); Database holding Transformed Data, Operational Metrics, Model Metrics & Predictions; Front End / User Interface with Visualization, Forms, Search / Filter, Collaboration. Your application logic goes here.]
7. Needs specialized skills
[Diagram: Typical Analytical Application Architecture, annotated with the roles needed to build it]
- Data Scientist
- Data Visualization Specialist
- Database Developer
- Front-end Developer
- Application Engineer
- Data Engineer
8. Every stage of analytics requires interactive ad-hoc data exploration and visualization.
Every analytical application is in fact a bespoke data visualization application!
Visualization everywhere
[Diagram: Analytics / ML / AI Workflow. Stages: Ingest, Prep, Visualize, Model, Deploy, Refine. Data: Raw Data, Transformed Data, Operational Metrics, Model Metrics & Predictions.]
9. Retrofit AI on BI?
No! The state of the art has advanced!
- Back-end: “BI” is too manual / reactive / Q&A driven
  - Reports / dashboards are not enough: they need to be live, proactive, predictive
  - ML algorithms are better, faster, cheaper at finding insights
- Front-end: drag-and-drop “BI” mental models are clunky to use
  - Search is a simpler paradigm: get to results quickly
  - Everyone understands and uses search every day
  - More powerful with predictive / recommendation capabilities
AI + BI need to work as a cohesive whole, not as an afterthought.
10. Conclusion
Building beautiful, usable predictive apps is hard.
Doing all this quickly is harder.
Doing all this without a diverse set of skills is insanely hard!
11. Questions
— How do we simplify this process?
— How do we ease development of AI/ML applications?
— How do we rapidly experiment / prototype new ideas?
— How do we lower development costs?
— How do we reduce time to market?
Can we empower data scientists to quickly prototype and deliver interactive predictive applications directly to business users?
13. H2O Q
- Provides:
  - Large-scale analytical data storage
  - High-performance analytical search + superior UX
  - Beautiful, high-scale, ad-hoc, interactive, automatic visualizations
  - Point-and-click ad-hoc data prep
  - Automatic Machine Learning
  - Extensible back-end and front-end
- Using 100% pure Python!
  - No front-end programming!
  - No need to reason about client-server / distributed architecture.
- Deploy apps in hours or minutes, not months, weeks or days!
14. H2O Q System Architecture
[Diagram: H2O Q System Architecture. Components shown: Frontend, Q Server, Q Store, Metadata Store, Q Apps (Python), Q App API, Q App UI, Q App Scheduler + Workflow Engine, AI/ML, Driverless AI, EDA, Visualization, Automatic Insights, AutoInsights, Formula Editor, Transformation Editor, Notebook Editor, Formula Parser, Formula Translator, Query Parser, Query Translator, Fuzzy Matcher, Typeahead, Typeahead Index (Hot), Typeahead Index (Cold), Pipelines, Connectors, External Data Sources, Tables, Views & Transformation Pipelines, Tables, Views, Statistics, Notebooks, Visualizations, App Data, Your App.]
15. H2O Q System Architecture
- Q Core: Building Blocks for AI Applications.
- Q Apps: Your AI Applications and Extensions.
16. (1) Q Store
- Distributed analytical database
- Column store
- Parallel, vectorized query execution
- Linearly scalable
- Optimized for analytical queries
- No pre-aggregation required
- Fast!
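As a rough illustration of the column-store idea above, here is a minimal Python sketch. It is not Q Store's implementation: `ColumnTable` and `sum_where` are hypothetical names, and the thread pool only stands in for whatever parallel, vectorized execution the real engine uses. The point is that an analytical query touches only the columns it references and computes its aggregate on the fly, with no pre-aggregated table.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor


class ColumnTable:
    """Toy column store: each column is one contiguous typed array."""

    def __init__(self, **columns):
        lengths = {len(c) for c in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns
        self.n_rows = lengths.pop() if lengths else 0

    def sum_where(self, value_col, filter_col, predicate, chunk_size=4):
        """Sum value_col over rows whose filter_col value passes predicate.

        Only the two referenced columns are scanned. The scan is split into
        chunks that are processed in parallel and then combined.
        """
        values = self.columns[value_col]
        flags = self.columns[filter_col]

        def scan(start):
            end = min(start + chunk_size, self.n_rows)
            return sum(values[i] for i in range(start, end) if predicate(flags[i]))

        starts = range(0, self.n_rows, chunk_size)
        with ThreadPoolExecutor() as pool:
            return sum(pool.map(scan, starts))


table = ColumnTable(
    price=array("d", [10.0, 20.0, 30.0, 40.0, 50.0]),
    quantity=array("l", [1, 5, 2, 8, 3]),
)
# Ad-hoc aggregate: total price of rows with quantity >= 3.
total = table.sum_where("price", "quantity", lambda q: q >= 3)
```

Because each chunk is independent, the same query shape could be spread over more workers or machines, which is the intuition behind the "linearly scalable" claim.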
26. (3) Exploratory Data Analysis
- Not a charting library!
- Tight integration with Q Store + Search
- High scale visualization (tested ~5M marks)
- Unique 2-phase incremental rendering: fast static pass followed by JIT interactivity.
- Based on Leland Wilkinson's Grammar of Graphics
- Advised by Leland Wilkinson!
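To make the Grammar of Graphics point concrete, here is a small Python sketch of the general idea: a plot is described declaratively as data, an aesthetic mapping, a statistic and a geometry, and the statistic reduces raw rows to marks before anything is drawn. `PlotSpec` and `compute_marks` are hypothetical names for illustration only; the slides do not show Q's actual visualization API.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PlotSpec:
    """Declarative plot description in the spirit of the Grammar of Graphics."""
    data: dict                      # column name -> list of values
    mapping: dict                   # aesthetic ("x", "y") -> column name
    geom: str                       # e.g. "point" or "bar"
    stat: str = "identity"          # statistic applied before drawing
    params: dict = field(default_factory=dict)


def compute_marks(spec):
    """Turn a spec into the list of marks a renderer would draw."""
    xs = spec.data[spec.mapping["x"]]
    if spec.stat == "bin":
        # Binning collapses many rows into a few marks, one per bin.
        width = spec.params.get("binwidth", 1)
        counts = Counter((x // width) * width for x in xs)
        return [{"x": start, "y": n} for start, n in sorted(counts.items())]
    ys = spec.data[spec.mapping["y"]]
    return [{"x": x, "y": y} for x, y in zip(xs, ys)]


spec = PlotSpec(
    data={"age": [21, 25, 34, 38, 39, 52]},
    mapping={"x": "age"},
    geom="bar",
    stat="bin",
    params={"binwidth": 10},
)
marks = compute_marks(spec)
```

Swapping `stat` or `geom` changes the chart without touching the data, which is what separates a grammar from a fixed catalog of chart types.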
49. (4) Self-service Data Prep: Pipelines
52. Qlang stdlib: 125+ parallel/distributed functions
- Math: abs acos asin atan atan2 cos cosh cot coth degrees div exp ln log pi power radians sign sin sinh sqrt square tan tanh ceiling floor round
- Conditional: if
- Aggregate: avg count countd corr covar covarp max median min percentile stdev stdevp sum var varp
- String: contains endswith find findnth left len lower ltrim ltrim_this mid regexp_replace regexp_match regexp_extract regexp_extract_nth replace right rtrim space split startswith str trim upper
- Date: now today date datetime dateadd day month year datediff datepart datename isdate usec_to_timestamp timestamp_to_usec
- Conversion: makedate makedatetime maketime datetrunc ascii char float int
- Misc: attr first ifnull index isnull last max min size zn host tld parse_url parse_url_query
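The slides list the function names but not their syntax or exact semantics, so the following Python sketch is only a guess at what a few of them might do, assuming they behave like the same-named functions in common BI formula languages (for example, `zn` returning zero for a null). The `map_column` helper is hypothetical and runs serially, whereas the slide describes the real functions as parallel/distributed.

```python
from datetime import date


def zn(value):
    """Assumed semantics: return 0 when the value is null, else the value."""
    return 0 if value is None else value


def ifnull(value, fallback):
    """Assumed semantics: return fallback when the value is null."""
    return fallback if value is None else value


def left(text, n):
    """Assumed semantics: the first n characters of text."""
    return text[:n]


def datediff(part, start, end):
    """Assumed semantics: difference between two dates in the given part."""
    if part == "day":
        return (end - start).days
    if part == "year":
        return end.year - start.year
    raise ValueError(f"unsupported date part: {part}")


def map_column(func, column, *args):
    """Apply a scalar function to every value of a column (serial stand-in)."""
    return [func(value, *args) for value in column]


revenue = [120.0, None, 75.5]
total = sum(map_column(zn, revenue))
codes = map_column(left, ["NYC-01", "SFO-22"], 3)
days = datediff("day", date(2019, 10, 1), date(2019, 10, 22))
```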
59. (5) Build Predictive Models
- H2O's flagship Driverless AI under the hood: industry-leading automatic machine learning
- For everything else, just import your favorite ML libraries in Q Apps!
63. Q Apps
- Q’s extensibility mechanism
- Build interactive UI apps
- Authored in 100% Python
- No HTML/Javascript required
64. Built for extensibility
[Diagram: a Q App sits between the UI and Q Store, and can call your favorite ML libraries!]
Steps: (1) Prompt for inputs, (2) Read Data, (3) Analyze, (4) Write Data, (5) Output Results
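The five steps in this diagram can be sketched in plain Python. This is not the Q App API, which the slides do not show: `FakeUI`, `FakeStore` and `run_app` are hypothetical stand-ins, and the "analysis" is a simple average where a real app would call an ML library.

```python
from statistics import mean


class FakeStore:
    """In-memory stand-in for a table store (hypothetical, not Q Store)."""

    def __init__(self, tables):
        self.tables = dict(tables)

    def read(self, name):
        return self.tables[name]

    def write(self, name, rows):
        self.tables[name] = rows


class FakeUI:
    """Stand-in for the app's user interface (hypothetical)."""

    def __init__(self, inputs):
        self.inputs = inputs
        self.outputs = []

    def prompt(self, field_name):
        return self.inputs[field_name]

    def show(self, result):
        self.outputs.append(result)


def run_app(ui, store):
    table_name = ui.prompt("table")                   # 1. Prompt for inputs
    column = ui.prompt("column")
    rows = store.read(table_name)                     # 2. Read data
    average = mean(row[column] for row in rows)       # 3. Analyze
    scored = [dict(row, above_average=row[column] > average) for row in rows]
    store.write(table_name + "_scored", scored)       # 4. Write data
    ui.show({"average": average, "rows_written": len(scored)})  # 5. Output results


store = FakeStore({"sales": [{"region": "east", "amount": 100},
                             {"region": "west", "amount": 300}]})
ui = FakeUI({"table": "sales", "column": "amount"})
run_app(ui, store)
```

Keeping the UI and the store behind small interfaces like these means the analysis step in the middle is ordinary Python and can be swapped for any library.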
65. Q Apps
- Apps run in parallel, managed by a scheduler
- Full-fledged workflow engine: app workflows can run for days/weeks/months and are hydrated/dehydrated just-in-time, so they are cheap to run!
- Apps run venv-isolated: no Docker / Kubernetes required, light on resources, runs on your laptop!
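One way to picture hydrating and dehydrating a long-running workflow is the Python sketch below. It is an assumption about the general technique, not a description of Q's workflow engine: the step names and the JSON state format are made up for illustration. Between steps the workflow exists only as a serialized payload, so it holds no process or memory while it waits.

```python
import json

# Hypothetical workflow: an ordered list of step names.
STEPS = ["ingest", "train", "report"]


def dehydrate(state):
    """Serialize workflow state so nothing needs to stay in memory."""
    return json.dumps(state)


def hydrate(payload):
    """Restore workflow state from its serialized form."""
    return json.loads(payload)


def run_next_step(payload):
    """Wake the workflow, run exactly one step, and put it back to sleep."""
    state = hydrate(payload)
    if state["next_step"] < len(STEPS):
        step = STEPS[state["next_step"]]
        state["log"].append(step)   # a real step would do its work here
        state["next_step"] += 1
    return dehydrate(state)


payload = dehydrate({"next_step": 0, "log": []})
for _ in STEPS:
    # Days or weeks could pass between iterations; only the payload persists.
    payload = run_next_step(payload)
final_state = hydrate(payload)
```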