Analytics and Machine Learning with
Red Hat Infrastructure
Kyle Bader, Senior Architect
Sean Pryor, AI Developer
Sherard Griffin, Senior Manager, Open Data Hub
BOSTON, 2019
● PROBLEM STATEMENT
○ Multi-tenant data analytics and machine learning
○ Shared data context
○ Sensitive data can’t leave the country, data governance restrictions
● DATA STRUCTURES
○ Shared data context with Ceph
○ Preparing your data
■ Structured data with Hive Metastore*
■ Semi-structured data
■ Data processing jobs
■ Spark
○ AI/ML
■ Features/Labels/other important terms
■ Background on AI and how it works
■ TensorFlow
● DATA PLATFORM ARCHITECTURE
○ Open Data Hub (Spark, Ceph, JupyterHub, TensorFlow)
○ Follow-up slides for them to learn more
■ ISVs
■ ODH
■ Frameworks
■ Other talks, etc.
PROBLEM STATEMENT
ANALYTICS AND ML CHALLENGES
EXPLOSIVE GROWTH
in analytics teams and analytic tools
MULTIPLE TEAMS COMPETING
for use of the same big data resources
CONGESTION
in busy analytic clusters causing frustration and missed SLAs
HADOOP
SPARK
HIVE
PRESTO
IMPALA
KAFKA
NIFI
TENSORFLOW
PYTORCH
OPTIONS TO ADDRESS CHALLENGES
#1 Get a bigger cluster for many teams to share
#2 Give each team its own dedicated cluster, each with copies of PBs of data
#3 Give teams the ability to spin up and spin down clusters that share a common data store
MULTI-WORKLOAD TENANCY
SHARED DATA CONTEXT
HIT SERVICE-LEVEL AGREEMENTS
Give teams their own compute clusters.
ELIMINATE IDLE RESOURCES
By right-sizing decoupled compute and storage.
BUY 10s OF PBs INSTEAD OF 100s
Share data sets across clusters instead of duplicating them.
INCREASE AGILITY
With spin-up/spin-down clusters.
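The capacity argument behind sharing data sets instead of duplicating them is simple arithmetic. Below is a minimal sketch, assuming hypothetical figures (20 teams, a 5 PB common data set, 1 PB of private data per team). None of these numbers come from the talk, and replication or erasure-coding overhead is ignored.

```python
def dedicated_capacity_pb(teams, shared_dataset_pb, private_pb_per_team):
    """Dedicated clusters: every team stores its own copy of the common data set."""
    return teams * (shared_dataset_pb + private_pb_per_team)


def shared_capacity_pb(teams, shared_dataset_pb, private_pb_per_team):
    """Shared data store: one copy of the common data set plus each team's private data."""
    return shared_dataset_pb + teams * private_pb_per_team


# Hypothetical inputs, chosen only to illustrate the order-of-magnitude gap.
TEAMS = 20
SHARED_PB = 5
PRIVATE_PB = 1

duplicated = dedicated_capacity_pb(TEAMS, SHARED_PB, PRIVATE_PB)
shared = shared_capacity_pb(TEAMS, SHARED_PB, PRIVATE_PB)

print(f"dedicated clusters: {duplicated} PB, shared data store: {shared} PB")
```

With these assumed inputs the dedicated-cluster layout needs 120 PB of usable capacity and the shared layout needs 25 PB.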
HYBRID CLOUD ANALYTICS AND ML
OPERATOR FRAMEWORK
Provides a managed-service-like experience
STATEFUL STORAGE SERVICES
Object, block, and file interfaces
DEVICE PLUGIN
GPU acceleration
LOCAL PVS
High performance scratch storage
DATA STRUCTURES
CLEANING AND CONFORMING
SEMI-STRUCTURED DATA
● Sources
○ Stateless applications
○ Sensors
● Common formats
○ CSV, JSON, XML
○ ORC, Avro, Parquet
DATA PROCESSING
● Variety of sources and formats
● Schema detection
● Distributed streaming and batch ETL
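Schema detection can be illustrated on a few JSON records. This is a toy sketch in plain Python, not the algorithm any particular engine uses; the sensor fields and the `infer_schema` helper are hypothetical.

```python
import json


def infer_schema(records):
    """Return {field: sorted list of Python type names} seen across all records."""
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return {field: sorted(types) for field, types in schema.items()}


# Hypothetical newline-delimited JSON, as a sensor or stateless app might emit.
raw_lines = [
    '{"sensor_id": "a1", "temp_c": 21.5}',
    '{"sensor_id": "a2", "temp_c": 22, "status": "ok"}',
]

records = [json.loads(line) for line in raw_lines]
schema = infer_schema(records)
print(schema)
```

A field that shows more than one type (here `temp_c` is seen as both `float` and `int`) or that is missing from some records (`status`) is exactly what a cleaning and conforming step has to reconcile before the data is cataloged.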
STRUCTURED DATA
● Cataloged into databases and tables
● External locations map to object URIs
● Table and column statistics
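The three points above can be sketched as a tiny in-memory catalog. This is an illustration only, not the Hive Metastore API; the `Catalog` and `Table` classes, the bucket name, and the statistics are all hypothetical. The `s3a://` scheme is the one used by the Hadoop S3A connector.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    location: str          # object URI where the external data lives
    columns: dict          # column name -> type name
    row_count: int = 0     # table-level statistic
    column_stats: dict = field(default_factory=dict)  # per-column statistics


class Catalog:
    """Maps database -> table name -> table metadata."""

    def __init__(self):
        self._databases = {}

    def register(self, database, table):
        self._databases.setdefault(database, {})[table.name] = table

    def lookup(self, database, name):
        return self._databases[database][name]


catalog = Catalog()
catalog.register(
    "sales",
    Table(
        name="orders",
        location="s3a://example-bucket/warehouse/orders/",  # hypothetical URI
        columns={"order_id": "bigint", "amount": "double"},
        row_count=1000,
        column_stats={"amount": {"min": 0.5, "max": 950.0}},
    ),
)

orders = catalog.lookup("sales", "orders")
print(orders.location, orders.row_count)
```

Because the catalog stores only metadata and a URI, several compute clusters can resolve the same table name to the same objects without copying the data.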
MODELING AND SERVING
Pipeline stages shown in the diagram: Select Model, Select Features, Model Training, Model Evaluation, Model Tuning, Trained Models, Model Serving & Scoring
Frameworks shown: Keras, Microsoft Cognitive Toolkit, Horovod
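The training, evaluation, tuning, and scoring stages can be walked through on a toy problem. This sketch uses plain Python rather than any of the frameworks named in the deck so that it stays self-contained; the data, the one-parameter model, and the candidate learning rates are all assumptions made for illustration.

```python
def train(xs, ys, learning_rate, epochs=200):
    """Model training: fit y ~ w * x by gradient descent on mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


def evaluate(w, xs, ys):
    """Model evaluation: mean squared error on held-out data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data following y = 2x; the feature is x, the label is y.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
test_x, test_y = [5.0, 6.0], [10.0, 12.0]

# Model tuning: try several learning rates and keep the one with the lowest test error.
results = {}
for lr in (0.001, 0.01, 0.05):
    w = train(train_x, train_y, lr)
    results[lr] = (w, evaluate(w, test_x, test_y))

best_lr = min(results, key=lambda lr: results[lr][1])
best_w = results[best_lr][0]


def score(x):
    """Serving and scoring: apply the trained model to a new input."""
    return best_w * x


print(best_lr, best_w, score(7.0))
```

In a real project the evaluation used for tuning and the final test would use separate data splits; a single held-out set is used here only to keep the sketch short.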
DATA PLATFORM ARCHITECTURE
OPEN DATA HUB
Collaborate on a Data & AI platform for the Hybrid Cloud
● Open source community for AI-as-a-service platform
● Cloud-agnostic - AI for the Hybrid Cloud
● No cloud vendor lock-in
● OpenDataHub.io
Internal Use Cases
AT RED HAT’S CORE PROCESSES
Sentiment analysis and entity detection on customer engagements, support tickets, marketing surveys and more. Trained on the specific Red Hat product terminology.
Improve Red Hat’s core Engineering and Operations processes by applying analytics, machine learning, and AI.
- rules
- heuristics
- ML
CONTAINERIZED APPS
AWS, Microsoft Azure, OpenStack, Datacenter, Laptop
CORE DEPLOYMENT
● Container platform
● Certified Kubernetes
● Hybrid cloud
● Unified, distributed storage
● RESTful gateway
● S3 and Swift compatible
● Radanalytics.io community
● Unified analytics engine
● Large-scale data
● Runs on Kubernetes
● Multi-user Jupyter
● Used for data science and research
Available Now at OpenDataHub.io
Add-Ons
● Part of Open Data Hub
● Set of deployed pre-defined AI models available to use
● Monitoring and alerting toolkit
● Records numeric time series data
● Used to diagnose problems
● Analytics platform for all metrics
● Query, visualize and alert on metrics
● Deploying machine learning models on Kubernetes
● Expose models via REST and gRPC
● Full model lifecycle management
Available Now at OpenDataHub.io
Open Data Hub AI Library
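Exposing a model over REST can be sketched with the Python standard library alone. This is not the API of any add-on listed above; the `/predict` path, the `instances` and `predictions` field names, and the one-weight "model" are hypothetical, and gRPC is not shown.

```python
import io
import json

WEIGHT = 2.0  # hypothetical trained model: y = WEIGHT * x


def predict(features):
    return [WEIGHT * x for x in features]


def app(environ, start_response):
    """WSGI app: POST /predict with {"instances": [...]} returns {"predictions": [...]}."""
    if environ["REQUEST_METHOD"] != "POST" or environ["PATH_INFO"] != "/predict":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length))
    body = json.dumps({"predictions": predict(payload["instances"])}).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]


def call(wsgi_app, method, path, payload):
    """Invoke the WSGI app in-process, without opening a socket."""
    raw = json.dumps(payload).encode("utf-8")
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    }
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], json.loads(body)


status, response = call(app, "POST", "/predict", {"instances": [1.0, 2.5]})
print(status, response)
```

To serve the same app over HTTP for local experiments, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library is enough; a production deployment on Kubernetes would sit behind a proper model-serving layer.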
RUNNING AT RED HAT
PLANNED RELEASES
Highlights
January 2019: Version 0.1 - Initial ODH Release
- OCP 3.10 and 3.11 support
- JupyterHub + Spark + Ceph-nano deployment
April 2019: Operator Support + Monitoring
- OCP 4.0+ support
- Open Data Hub operator
- AI Library
- Rook for Ceph deployment
- Two Sigma BeakerX integration
- JupyterHub with GPU support
- Prometheus deployment with Spark monitoring
July 2019: Data Engineering Additions
- Cloudera Hue deployment
- Spark SQL Thrift Server deployment
- Argo deployment
- MLflow deployment
- Kubeflow integration
- Kafka (Strimzi) deployment
- Seldon-core deployment
October 2019: To be determined
AI AND MACHINE LEARNING IN THIS LAB
AI IN THIS LAB
WHAT NEXT?
● Try Open Data Hub yourself!
○ https://try.openshift.com
○ https://gitlab.com/opendatahub/opendatahub-operator
● Building the Next Generation of Innovation Together
○ Thursday at 8:30 AM
● Kaleidoscope of Innovation: AI and Machine Learning on OpenShift
○ Part 1: Thursday at 2:00 PM
○ Part 2: Thursday at 3:15 PM
Red Hat data analytics infrastructure solution
red.ht/videos-RHDAIS
MACHINE LEARNING CYCLE
Ingest, Prepare, Preprocess, Discover, Develop, Train, Test, Deploy
MKL-DNN
cuDNN
