SlideShare uma empresa Scribd logo
1 de 25
Baixar para ler offline
Hadoop and Spark – Perfect Together
Arun C. Murthy (@acmurthy)
Co-Founder, Hortonworks
® ®
© Hortonworks Inc. 2015. All Rights Reserved
Data Operating System
Enable all data and applications
TO BE
accessible and shared
BY
any end-users
© Hortonworks Inc. 2015. All Rights Reserved
Data Operating System: Open Enterprise Hadoop
YARN: data operating system
Governance Security
Operations
Resource management
Data access: batch, interactive, real-time
Storage
Commodity Appliance Cloud
Built on a centralized architecture of
shared enterprise services:
Scalable tiered storage
Resource and workload management
Trusted data governance & metadata management
Consistent operations
Comprehensive security
Developer APIs and tools
Hadoop/YARN-powered data operating system
100% open source, multi-tenant data platform for
any application, any data set, anywhere.
© Hortonworks Inc. 2015. All Rights Reserved
Elegant Developer APIs
DataFrames, Machine Learning, and SQL
Made for Data Science
All apps need to get predictive at scale and fine granularity
Democratize Machine Learning
Spark is doing to ML on Hadoop what Hive did for SQL on
Hadoop
Community
Broad developer, customer and partner interest
Realize Value of Data Operating System
A key tool in the Hadoop toolbox
Why We Love Spark at Hortonworks
Storage
YARN: Data Operating System
Governance Security
Operations
Resource Management
© Hortonworks Inc. 2015. All Rights Reserved
Resource Management
YARN for multi-tenant, diverse workloads with predictable SLAs
Tiered Memory Storage
HDFS in-memory tier – External BlockStore for RDD Cache
SparkSQL & Hive for SQL
Interop with modern Metastore/HS2, optimized ORC support,
advanced analytics e.g. Geospatial
Spark & NoSQL
Deep integration with HBase via DataSources/Catalyst for
Predicate/Aggregate Pushdown
Connect The Dots – Algorithms to Use-Cases
Higher-level ML Abstractions – E.g. OneVsRest
Validation, Tuning, Pipeline assembly, Model export/retraining …
Spark and Hadoop – How Can We Do Better?
Storage
YARN: Data Operating System
Governance Security
Operations
Resource Management
© Hortonworks Inc. 2015. All Rights Reserved
Ease of Use
Apache Zeppelin for interactive notebooks
Metadata & Governance
Apache Atlas for metadata & Apache Falcon support for
Spark pipelines
Security & Operations
Apache Ranger managed authorization and deployment/
management via Apache Ambari
Deployable Anywhere
Linux, Windows, on-premises or cloud
Self-Service Spark in the Cloud
Easy launch of Data Science clusters via Cloudbreak and
Ambari – for Azure, AWS, GCP, OpenStack, Docker
Spark and Hadoop – How Can We Do Better?
Storage
YARN: Data Operating System
Governance Security
Operations
Resource Management
© Hortonworks Inc. 2015. All Rights Reserved
Let’s Talk Real-World Use-Cases
© Hortonworks Inc. 2015. All Rights Reserved
Real-time events and alerts application
Last Year: Real-time and Interactive Application
Microsoft
Excel
Truck sensors
Real-time and interactive
data processing
running SIMULTANEOUSLY on YARN
Trucking company datasets stored in HDFS
© Hortonworks Inc. 2015. All Rights Reserved
Chief	
  Data	
  Officer’s	
  Needs	
  
Increase	
  safety	
  and	
  reduce	
  liabili1es	
  
An1cipate	
  driver	
  viola1ons	
  BEFORE	
  they	
  	
  	
  
happen	
  and	
  take	
  precau1onary	
  ac1ons	
  
Applica5on	
  Team’s	
  Response	
  
Enrich	
  app	
  with	
  weather	
  data	
  and	
  driver	
  profiles	
  
Explore	
  data	
  for	
  predic1ve	
  model	
  features	
  
Train	
  and	
  generate	
  predic1ve	
  model	
  
Plug	
  in	
  model	
  to	
  original	
  applica1on	
  so	
  it	
  can	
  predict	
  
driver	
  viola1ons	
  in	
  real-­‐1me	
  
Truck Fleet Use Case: Real-time, Predictive App
© Hortonworks Inc. 2015. All Rights Reserved
Data Scientist: Explore Data & Build Model in Cloud
Click-thru Demo
Provision data science
environment in the cloud
Use data science
notebook to explore data
Run algorithms to create
predictive model
Cloudbreak
1. Choose a cloud
2. Pick the Spark
blueprint
3. Launch HDP
Microsoft Azure
© Hortonworks Inc. 2015. All Rights Reserved
Login to launch.hortonworks.com
which is a self-service portal for
launching HDP clusters to the cloud
© Hortonworks Inc. 2015. All Rights Reserved
Select a cloud provider, then start the
process of creating your cluster
© Hortonworks Inc. 2015. All Rights Reserved
Name the cluster, choose your region,
and pick your blueprint…in this case,
we want “hdp-spark-cluster” for our
data science work
© Hortonworks Inc. 2015. All Rights Reserved
We clicked “create cluster” and
Cloudbreak is now provisioning our
Spark environment on Azure
© Hortonworks Inc. 2015. All Rights Reserved
We can now access Zeppelin which is
a data science notebook for Spark
that’s similar to iPython notebook
© Hortonworks Inc. 2015. All Rights Reserved
Let’s look at our data. We can see
eventType, if the driver’s certified, how
many hours driven, as well as weather
data such as foggy, rainy, etc.
© Hortonworks Inc. 2015. All Rights Reserved
Let’s start asking questions of our
data; such as, does fatigue cause
violations?
© Hortonworks Inc. 2015. All Rights Reserved
Let’s view the data in a pie chart
graphic to see how violations look by
hours driven.
© Hortonworks Inc. 2015. All Rights Reserved
How are violations impacted by fog?
© Hortonworks Inc. 2015. All Rights Reserved
Does location have an impact on
incidents?
© Hortonworks Inc. 2015. All Rights Reserved
OK, we’ve learned enough about the
data and what features we want to
include in our model. So we’ll run a
logistic regression on training data.
© Hortonworks Inc. 2015. All Rights Reserved
Let’s run our code
© Hortonworks Inc. 2015. All Rights Reserved
Let’s look at our model.
Next step is to hand the model off to
the Enterprise Architect to integrate
into our real-time application.
© Hortonworks Inc. 2015. All Rights Reserved
Apps on YARN
Trucking company datasets stored in HDFS
Real-time and Predictive Application Architecture
Your BI Tool
Predictive application
Truck sensors
App alerts
(ActiveMQ)
Messages
SQL Stream NoSQLML
Use
Model
© Hortonworks Inc. 2015. All Rights Reserved
HDP and Spark…
Perfect Together

Mais conteúdo relacionado

Mais procurados

Real-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionReal-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in Action
DataWorks Summit
 
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark ClustersDownscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Databricks
 

Mais procurados (20)

Log I am your father
Log I am your fatherLog I am your father
Log I am your father
 
Real-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in ActionReal-Time Robot Predictive Maintenance in Action
Real-Time Robot Predictive Maintenance in Action
 
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
 
Spark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan SaldichSpark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan Saldich
 
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache SparkReal-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
 
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
 
Shifting Data Science into High Gear
Shifting Data Science into High GearShifting Data Science into High Gear
Shifting Data Science into High Gear
 
Disrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the CloudDisrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the Cloud
 
How Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-ShmaHow Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-Shma
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
 
Realtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIORealtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIO
 
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg SchadSmack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
Smack Stack and Beyond—Building Fast Data Pipelines with Jorg Schad
 
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark ClustersDownscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
 
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
 
LinkedIn
LinkedInLinkedIn
LinkedIn
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
 
Apache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing SystemApache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing System
 
Simplify and Scale Data Engineering Pipelines with Delta Lake
Simplify and Scale Data Engineering Pipelines with Delta LakeSimplify and Scale Data Engineering Pipelines with Delta Lake
Simplify and Scale Data Engineering Pipelines with Delta Lake
 
Building a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with RBuilding a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with R
 

Destaque

Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Spark Summit
 
禽流感和人流感簡介
禽流感和人流感簡介禽流感和人流感簡介
禽流感和人流感簡介
honan4108
 
13-07-2015 Greenlight (Visualisations removed)
13-07-2015 Greenlight (Visualisations removed)13-07-2015 Greenlight (Visualisations removed)
13-07-2015 Greenlight (Visualisations removed)
Marius Lazauskas
 
Targets as tools, not talismans - Don Williams
Targets as tools, not talismans - Don WilliamsTargets as tools, not talismans - Don Williams
Targets as tools, not talismans - Don Williams
HELIGLIASA
 
Dimension política de las redes sociales
Dimension política de las redes socialesDimension política de las redes sociales
Dimension política de las redes sociales
Cristina Juesas
 
名人美食經
名人美食經名人美食經
名人美食經
honan4108
 
Consulta respecto a la Constitucionalidad de Norma Relacionada con la Pensión...
Consulta respecto a la Constitucionalidad de Norma Relacionada con la Pensión...Consulta respecto a la Constitucionalidad de Norma Relacionada con la Pensión...
Consulta respecto a la Constitucionalidad de Norma Relacionada con la Pensión...
Dra. Roxana Silva Ch.
 
Radiation reactors
Radiation reactorsRadiation reactors
Radiation reactors
jmocherman
 

Destaque (20)

Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...
 
Spark + Flashblade: Spark Summit East talk by Brian Gold
Spark + Flashblade: Spark Summit East talk by Brian GoldSpark + Flashblade: Spark Summit East talk by Brian Gold
Spark + Flashblade: Spark Summit East talk by Brian Gold
 
禽流感和人流感簡介
禽流感和人流感簡介禽流感和人流感簡介
禽流感和人流感簡介
 
Devoxx 2013 - David Tillemans - Security Test Automation in Software Developm...
Devoxx 2013 - David Tillemans - Security Test Automation in Software Developm...Devoxx 2013 - David Tillemans - Security Test Automation in Software Developm...
Devoxx 2013 - David Tillemans - Security Test Automation in Software Developm...
 
13-07-2015 Greenlight (Visualisations removed)
13-07-2015 Greenlight (Visualisations removed)13-07-2015 Greenlight (Visualisations removed)
13-07-2015 Greenlight (Visualisations removed)
 
Targets as tools, not talismans - Don Williams
Targets as tools, not talismans - Don WilliamsTargets as tools, not talismans - Don Williams
Targets as tools, not talismans - Don Williams
 
Dimension política de las redes sociales
Dimension política de las redes socialesDimension política de las redes sociales
Dimension política de las redes sociales
 
Código de Planificación y Finanzas Públicas Ecuador
Código de Planificación y Finanzas Públicas Ecuador Código de Planificación y Finanzas Públicas Ecuador
Código de Planificación y Finanzas Públicas Ecuador
 
Drongo: Zoeken in Audiovisuele Documenten
Drongo: Zoeken in Audiovisuele DocumentenDrongo: Zoeken in Audiovisuele Documenten
Drongo: Zoeken in Audiovisuele Documenten
 
Content Curation; or how to be an Information Hero
Content Curation; or how to be an Information HeroContent Curation; or how to be an Information Hero
Content Curation; or how to be an Information Hero
 
名人美食經
名人美食經名人美食經
名人美食經
 
Виховна робота
Виховна робота Виховна робота
Виховна робота
 
Measuring change presentation
Measuring change presentationMeasuring change presentation
Measuring change presentation
 
Engage All The Things: Rethinking Online Engagement
Engage All The Things: Rethinking Online EngagementEngage All The Things: Rethinking Online Engagement
Engage All The Things: Rethinking Online Engagement
 
Consulta respecto a la Constitucionalidad de Norma Relacionada con la Pensión...
Consulta respecto a la Constitucionalidad de Norma Relacionada con la Pensión...Consulta respecto a la Constitucionalidad de Norma Relacionada con la Pensión...
Consulta respecto a la Constitucionalidad de Norma Relacionada con la Pensión...
 
Radiation reactors
Radiation reactorsRadiation reactors
Radiation reactors
 
Content Marketing Master Class - San Francisco: Epilogue
Content Marketing Master Class - San Francisco: EpilogueContent Marketing Master Class - San Francisco: Epilogue
Content Marketing Master Class - San Francisco: Epilogue
 
2
22
2
 
Acuril 2016: Transition to customer focused Information services
Acuril 2016: Transition to customer focused Information servicesAcuril 2016: Transition to customer focused Information services
Acuril 2016: Transition to customer focused Information services
 

Semelhante a Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)

Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
skumpf
 
Cw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-clouderaCw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-cloudera
inevitablecloud
 

Semelhante a Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks) (20)

Hadoop and Spark – Perfect Together
Hadoop and Spark – Perfect TogetherHadoop and Spark – Perfect Together
Hadoop and Spark – Perfect Together
 
Spark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun MurthySpark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun Murthy
 
Spark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's KeynoteSpark Summit EMEA - Arun Murthy's Keynote
Spark Summit EMEA - Arun Murthy's Keynote
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...
Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...
Webinar Nebula&Scalr : Increasing Business Agility with Real-time Processing ...
 
Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...
Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...
Webinar: Increasing Business Agility with Real-time Processing with Apache Ha...
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
 
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
 
Cw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-clouderaCw13 big data and apache hadoop by amr awadallah-cloudera
Cw13 big data and apache hadoop by amr awadallah-cloudera
 
OOP 2014
OOP 2014OOP 2014
OOP 2014
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
 
Applying systems thinking to AWS enterprise application migration
Applying systems thinking to AWS enterprise application migrationApplying systems thinking to AWS enterprise application migration
Applying systems thinking to AWS enterprise application migration
 

Mais de Spark Summit

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 

Mais de Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Último

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
amitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Último (20)

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 

Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)

  • 1. Hadoop and Spark – Perfect Together Arun C. Murthy (@acmurthy) Co-Founder, Hortonworks ® ®
  • 2. © Hortonworks Inc. 2015. All Rights Reserved Data Operating System Enable all data and applications TO BE accessible and shared BY any end-users
  • 3. © Hortonworks Inc. 2015. All Rights Reserved Data Operating System: Open Enterprise Hadoop YARN: data operating system Governance Security Operations Resource management Data access: batch, interactive, real-time Storage Commodity Appliance Cloud Built on a centralized architecture of shared enterprise services: Scalable tiered storage Resource and workload management Trusted data governance & metadata management Consistent operations Comprehensive security Developer APIs and tools Hadoop/YARN-powered data operating system 100% open source, multi-tenant data platform for any application, any data set, anywhere.
  • 4. © Hortonworks Inc. 2015. All Rights Reserved Elegant Developer APIs DataFrames, Machine Learning, and SQL Made for Data Science All apps need to get predictive at scale and fine granularity Democratize Machine Learning Spark is doing to ML on Hadoop what Hive did for SQL on Hadoop Community Broad developer, customer and partner interest Realize Value of Data Operating System A key tool in the Hadoop toolbox Why We Love Spark at Hortonworks Storage YARN: Data Operating System Governance Security Operations Resource Management
  • 5. © Hortonworks Inc. 2015. All Rights Reserved Resource Management YARN for multi-tenant, diverse workloads with predictable SLAs Tiered Memory Storage HDFS in-memory tier – External BlockStore for RDD Cache SparkSQL & Hive for SQL Interop with modern Metastore/HS2, optimized ORC support, advanced analytics e.g. Geospatial Spark & NoSQL Deep integration with HBase via DataSources/Catalyst for Predicate/Aggregate Pushdown Connect The Dots – Algorithms to Use-Cases Higher-level ML Abstractions – E.g. OneVsRest Validation, Tuning, Pipeline assembly, Model export/retraining … Spark and Hadoop – How Can We Do Better? Storage YARN: Data Operating System Governance Security Operations Resource Management
  • 6. © Hortonworks Inc. 2015. All Rights Reserved Ease of Use Apache Zeppelin for interactive notebooks Metadata & Governance Apache Atlas for metadata & Apache Falcon support for Spark pipelines Security & Operations Apache Ranger managed authorization and deployment/ management via Apache Ambari Deployable Anywhere Linux, Windows, on-premises or cloud Self-Service Spark in the Cloud Easy launch of Data Science clusters via Cloudbreak and Ambari – for Azure, AWS, GCP, OpenStack, Docker Spark and Hadoop – How Can We Do Better? Storage YARN: Data Operating System Governance Security Operations Resource Management
  • 7. © Hortonworks Inc. 2015. All Rights Reserved Let’s Talk Real-World Use-Cases
  • 8. © Hortonworks Inc. 2015. All Rights Reserved Real-time events and alerts application Last Year: Real-time and Interactive Application Microsoft Excel Truck sensors Real-time and interactive data processing running SIMULTANEOUSLY on YARN Trucking company datasets stored in HDFS
  • 9. © Hortonworks Inc. 2015. All Rights Reserved Chief  Data  Officer’s  Needs   Increase  safety  and  reduce  liabili1es   An1cipate  driver  viola1ons  BEFORE  they       happen  and  take  precau1onary  ac1ons   Applica5on  Team’s  Response   Enrich  app  with  weather  data  and  driver  profiles   Explore  data  for  predic1ve  model  features   Train  and  generate  predic1ve  model   Plug  in  model  to  original  applica1on  so  it  can  predict   driver  viola1ons  in  real-­‐1me   Truck Fleet Use Case: Real-time, Predictive App
  • 10. © Hortonworks Inc. 2015. All Rights Reserved Data Scientist: Explore Data & Build Model in Cloud Click-thru Demo Provision data science environment in the cloud Use data science notebook to explore data Run algorithms to create predictive model Cloudbreak 1. Choose a cloud 2. Pick the Spark blueprint 3. Launch HDP Microsoft Azure
  • 11. © Hortonworks Inc. 2015. All Rights Reserved Login to launch.hortonworks.com which is a self-service portal for launching HDP clusters to the cloud
  • 12. © Hortonworks Inc. 2015. All Rights Reserved Select a cloud provider, then start the process of creating your cluster
  • 13. © Hortonworks Inc. 2015. All Rights Reserved Name the cluster, choose your region, and pick your blueprint…in this case, we want “hdp-spark-cluster” for our data science work
  • 14. © Hortonworks Inc. 2015. All Rights Reserved We clicked “create cluster” and Cloudbreak is now provisioning our Spark environment on Azure
  • 15. © Hortonworks Inc. 2015. All Rights Reserved We can now access Zeppelin which is a data science notebook for Spark that’s similar to iPython notebook
  • 16. © Hortonworks Inc. 2015. All Rights Reserved Let’s look at our data. We can see eventType, if the driver’s certified, how many hours driven, as well as weather data such as foggy, rainy, etc.
  • 17. © Hortonworks Inc. 2015. All Rights Reserved Let’s start asking questions of our data; such as, does fatigue cause violations?
  • 18. © Hortonworks Inc. 2015. All Rights Reserved Let’s view the data in a pie chart graphic to see how violations look by hours driven.
  • 19. © Hortonworks Inc. 2015. All Rights Reserved How are violations impacted by fog?
  • 20. © Hortonworks Inc. 2015. All Rights Reserved Does location have an impact on incidents?
  • 21. © Hortonworks Inc. 2015. All Rights Reserved OK, we’ve learned enough about the data and what features we want to include in our model. So we’ll run a logistic regression on training data.
  • 22. © Hortonworks Inc. 2015. All Rights Reserved Let’s run our code
  • 23. © Hortonworks Inc. 2015. All Rights Reserved Let’s look at our model. Next step is to hand the model off to the Enterprise Architect to integrate into our real-time application.
  • 24. © Hortonworks Inc. 2015. All Rights Reserved Apps on YARN Trucking company datasets stored in HDFS Real-time and Predictive Application Architecture Your BI Tool Predictive application Truck sensors App alerts (ActiveMQ) Messages SQL Stream NoSQLML Use Model
  • 25. © Hortonworks Inc. 2015. All Rights Reserved HDP and Spark… Perfect Together