Enviar pesquisa
Carregar
Next Generation Tooling for building streaming analytics app
•
Transferir como PPTX, PDF
•
2 gostaram
•
244 visualizações
G
gvetticaden
Seguir
Presentation at Berlin Data Summit 2018
Leia menos
Leia mais
Tecnologia
Denunciar
Compartilhar
Denunciar
Compartilhar
1 de 23
Baixar agora
Recomendados
Curing the Kafka Blindness
Curing the Kafka Blindness
gvetticaden
Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Ac...
Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Ac...
Flink Forward
Trouble with Performance Debugging? Not Anymore with Choreo, the AI-Assisted ...
Trouble with Performance Debugging? Not Anymore with Choreo, the AI-Assisted ...
WSO2
Flink Forward San Francisco 2018: Xingzhong Xu - "Scaling Uber’s Realtime Opt...
Flink Forward San Francisco 2018: Xingzhong Xu - "Scaling Uber’s Realtime Opt...
Flink Forward
Hybrid API Management with Kong - Ivan Rylach, Kong Summit, 2020
Hybrid API Management with Kong - Ivan Rylach, Kong Summit, 2020
Ivan Rylach
Toward Hybrid Cloud Serverless Transparency with Lithops Framework
Toward Hybrid Cloud Serverless Transparency with Lithops Framework
LibbySchulze
Meetup #5 API Testing World
Meetup #5 API Testing World
Malang QA Community
The Netflix API for a global service
The Netflix API for a global service
Katharina Probst
Recomendados
Curing the Kafka Blindness
Curing the Kafka Blindness
gvetticaden
Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Ac...
Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Ac...
Flink Forward
Trouble with Performance Debugging? Not Anymore with Choreo, the AI-Assisted ...
Trouble with Performance Debugging? Not Anymore with Choreo, the AI-Assisted ...
WSO2
Flink Forward San Francisco 2018: Xingzhong Xu - "Scaling Uber’s Realtime Opt...
Flink Forward San Francisco 2018: Xingzhong Xu - "Scaling Uber’s Realtime Opt...
Flink Forward
Hybrid API Management with Kong - Ivan Rylach, Kong Summit, 2020
Hybrid API Management with Kong - Ivan Rylach, Kong Summit, 2020
Ivan Rylach
Toward Hybrid Cloud Serverless Transparency with Lithops Framework
Toward Hybrid Cloud Serverless Transparency with Lithops Framework
LibbySchulze
Meetup #5 API Testing World
Meetup #5 API Testing World
Malang QA Community
The Netflix API for a global service
The Netflix API for a global service
Katharina Probst
Data Driven API Testing: Best Practices for Real-World Testing Scenarios
Data Driven API Testing: Best Practices for Real-World Testing Scenarios
SmartBear
4 Major Advantages of API Testing
4 Major Advantages of API Testing
QASource
Expanding beyond SPL -- More language support in IBM Streams V4.1
Expanding beyond SPL -- More language support in IBM Streams V4.1
lisanl
Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021
Lalit Panwar
Mule soft mcia-level-1 Dumps
Mule soft mcia-level-1 Dumps
Armstrongsmith
Github Projects Overview and IBM Streams V4.1
Github Projects Overview and IBM Streams V4.1
lisanl
5 Pillars of Building Enterprise0grade APIs
5 Pillars of Building Enterprise0grade APIs
WSO2
Kong Summit 2021 - Opening Keynote
Kong Summit 2021 - Opening Keynote
Ivan Rylach
Vizag mulesoft-meetup-6-anypoint-datagraph--v2
Vizag mulesoft-meetup-6-anypoint-datagraph--v2
Ravi Tamada
The Paved PaaS to Microservices at Netflix (IAS2017 Nanjing)
The Paved PaaS to Microservices at Netflix (IAS2017 Nanjing)
Yunong Xiao
GraphQL Munich Meetup #1 - How We Use GraphQL At Commercetools
GraphQL Munich Meetup #1 - How We Use GraphQL At Commercetools
Nicola Molinari
Dynatrace
Dynatrace
Purnima Kurella
Introduction to APIs & how to automate APIs testing with selenium web driver?
Introduction to APIs & how to automate APIs testing with selenium web driver?
BugRaptors
Introduction to Anypoint Runtime Fabric on Amazon Elastic Kubernetes Service ...
Introduction to Anypoint Runtime Fabric on Amazon Elastic Kubernetes Service ...
Anoop Ramachandran
GraphQL Europe Recap
GraphQL Europe Recap
Philipp Sporrer
Deep Dive on CI/CD NYC Meet Up Group
Deep Dive on CI/CD NYC Meet Up Group
NeerajKumar1965
Virtual Flink Forward 2020: Apache Flink Worst Wractices - Konstantin Knauf
Virtual Flink Forward 2020: Apache Flink Worst Wractices - Konstantin Knauf
Flink Forward
The Magic Behind Faster API Development, Testing and Delivery with API Virtua...
The Magic Behind Faster API Development, Testing and Delivery with API Virtua...
SmartBear
MuleSoft Surat Virtual Meetup#17 - Automated Code Review
MuleSoft Surat Virtual Meetup#17 - Automated Code Review
Jitendra Bafna
2.3.anypoint exchange
2.3.anypoint exchange
Prakash Chakravarthi
Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...
DataWorks Summit
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
Hortonworks
Mais conteúdo relacionado
Mais procurados
Data Driven API Testing: Best Practices for Real-World Testing Scenarios
Data Driven API Testing: Best Practices for Real-World Testing Scenarios
SmartBear
4 Major Advantages of API Testing
4 Major Advantages of API Testing
QASource
Expanding beyond SPL -- More language support in IBM Streams V4.1
Expanding beyond SPL -- More language support in IBM Streams V4.1
lisanl
Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021
Lalit Panwar
Mule soft mcia-level-1 Dumps
Mule soft mcia-level-1 Dumps
Armstrongsmith
Github Projects Overview and IBM Streams V4.1
Github Projects Overview and IBM Streams V4.1
lisanl
5 Pillars of Building Enterprise0grade APIs
5 Pillars of Building Enterprise0grade APIs
WSO2
Kong Summit 2021 - Opening Keynote
Kong Summit 2021 - Opening Keynote
Ivan Rylach
Vizag mulesoft-meetup-6-anypoint-datagraph--v2
Vizag mulesoft-meetup-6-anypoint-datagraph--v2
Ravi Tamada
The Paved PaaS to Microservices at Netflix (IAS2017 Nanjing)
The Paved PaaS to Microservices at Netflix (IAS2017 Nanjing)
Yunong Xiao
GraphQL Munich Meetup #1 - How We Use GraphQL At Commercetools
GraphQL Munich Meetup #1 - How We Use GraphQL At Commercetools
Nicola Molinari
Dynatrace
Dynatrace
Purnima Kurella
Introduction to APIs & how to automate APIs testing with selenium web driver?
Introduction to APIs & how to automate APIs testing with selenium web driver?
BugRaptors
Introduction to Anypoint Runtime Fabric on Amazon Elastic Kubernetes Service ...
Introduction to Anypoint Runtime Fabric on Amazon Elastic Kubernetes Service ...
Anoop Ramachandran
GraphQL Europe Recap
GraphQL Europe Recap
Philipp Sporrer
Deep Dive on CI/CD NYC Meet Up Group
Deep Dive on CI/CD NYC Meet Up Group
NeerajKumar1965
Virtual Flink Forward 2020: Apache Flink Worst Wractices - Konstantin Knauf
Virtual Flink Forward 2020: Apache Flink Worst Wractices - Konstantin Knauf
Flink Forward
The Magic Behind Faster API Development, Testing and Delivery with API Virtua...
The Magic Behind Faster API Development, Testing and Delivery with API Virtua...
SmartBear
MuleSoft Surat Virtual Meetup#17 - Automated Code Review
MuleSoft Surat Virtual Meetup#17 - Automated Code Review
Jitendra Bafna
2.3.anypoint exchange
2.3.anypoint exchange
Prakash Chakravarthi
Mais procurados
(20)
Data Driven API Testing: Best Practices for Real-World Testing Scenarios
Data Driven API Testing: Best Practices for Real-World Testing Scenarios
4 Major Advantages of API Testing
4 Major Advantages of API Testing
Expanding beyond SPL -- More language support in IBM Streams V4.1
Expanding beyond SPL -- More language support in IBM Streams V4.1
Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft mcia-level-1 Dumps
Mule soft mcia-level-1 Dumps
Github Projects Overview and IBM Streams V4.1
Github Projects Overview and IBM Streams V4.1
5 Pillars of Building Enterprise0grade APIs
5 Pillars of Building Enterprise0grade APIs
Kong Summit 2021 - Opening Keynote
Kong Summit 2021 - Opening Keynote
Vizag mulesoft-meetup-6-anypoint-datagraph--v2
Vizag mulesoft-meetup-6-anypoint-datagraph--v2
The Paved PaaS to Microservices at Netflix (IAS2017 Nanjing)
The Paved PaaS to Microservices at Netflix (IAS2017 Nanjing)
GraphQL Munich Meetup #1 - How We Use GraphQL At Commercetools
GraphQL Munich Meetup #1 - How We Use GraphQL At Commercetools
Dynatrace
Dynatrace
Introduction to APIs & how to automate APIs testing with selenium web driver?
Introduction to APIs & how to automate APIs testing with selenium web driver?
Introduction to Anypoint Runtime Fabric on Amazon Elastic Kubernetes Service ...
Introduction to Anypoint Runtime Fabric on Amazon Elastic Kubernetes Service ...
GraphQL Europe Recap
GraphQL Europe Recap
Deep Dive on CI/CD NYC Meet Up Group
Deep Dive on CI/CD NYC Meet Up Group
Virtual Flink Forward 2020: Apache Flink Worst Wractices - Konstantin Knauf
Virtual Flink Forward 2020: Apache Flink Worst Wractices - Konstantin Knauf
The Magic Behind Faster API Development, Testing and Delivery with API Virtua...
The Magic Behind Faster API Development, Testing and Delivery with API Virtua...
MuleSoft Surat Virtual Meetup#17 - Automated Code Review
MuleSoft Surat Virtual Meetup#17 - Automated Code Review
2.3.anypoint exchange
2.3.anypoint exchange
Semelhante a Next Generation Tooling for building streaming analytics app
Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...
DataWorks Summit
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
Hortonworks
SAM—streaming analytics made easy
SAM—streaming analytics made easy
DataWorks Summit
Streamline - Stream Analytics for Everyone
Streamline - Stream Analytics for Everyone
DataWorks Summit/Hadoop Summit
Unlocking insights in streaming data
Unlocking insights in streaming data
Carolyn Duby
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
DataWorks Summit
Vnv kumar performance testing
Vnv kumar performance testing
Vinay Kumar
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made Easy
DataWorks Summit
Schema Registry & Stream Analytics Manager
Schema Registry & Stream Analytics Manager
Sriharsha Chintalapani
Streaming analytics manager
Streaming analytics manager
Sriharsha Chintalapani
Virtual Flink Forward 2020: Keynote: The Evolution of Data Infrastructure at ...
Virtual Flink Forward 2020: Keynote: The Evolution of Data Infrastructure at ...
Flink Forward
Breaking the Monolith
Breaking the Monolith
VMware Tanzu
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Kelly Kohlleffel
Neha Arora_Resume
Neha Arora_Resume
Neha Arora
Vijaybabu_Resume
Vijaybabu_Resume
vijay balakrishnan
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...
Toralf Richter
Full-Stack Observability for IoT Event Stream Data Processing at Penske
Full-Stack Observability for IoT Event Stream Data Processing at Penske
VMware Tanzu
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf
Jim Dowling
Prasanth
Prasanth
Prasanth K
Linux Akraino Blueprint
Linux Akraino Blueprint
Liz Warner
Semelhante a Next Generation Tooling for building streaming analytics app
(20)
Next gen tooling for building streaming analytics apps: code-less development...
Next gen tooling for building streaming analytics apps: code-less development...
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
SAM—streaming analytics made easy
SAM—streaming analytics made easy
Streamline - Stream Analytics for Everyone
Streamline - Stream Analytics for Everyone
Unlocking insights in streaming data
Unlocking insights in streaming data
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Vnv kumar performance testing
Vnv kumar performance testing
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made Easy
Schema Registry & Stream Analytics Manager
Schema Registry & Stream Analytics Manager
Streaming analytics manager
Streaming analytics manager
Virtual Flink Forward 2020: Keynote: The Evolution of Data Infrastructure at ...
Virtual Flink Forward 2020: Keynote: The Evolution of Data Infrastructure at ...
Breaking the Monolith
Breaking the Monolith
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Neha Arora_Resume
Neha Arora_Resume
Vijaybabu_Resume
Vijaybabu_Resume
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...
APIs and Services for Fleet Management - Talks given @ APIDays Berlin and Ba...
Full-Stack Observability for IoT Event Stream Data Processing at Penske
Full-Stack Observability for IoT Event Stream Data Processing at Penske
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf
Prasanth
Prasanth
Linux Akraino Blueprint
Linux Akraino Blueprint
Último
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
Florian Wilhelm
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
carlostorres15106
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
Fwdays
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
ScyllaDB
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
Sergiu Bodiu
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
Fwdays
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
charlottematthew16
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
Miki Katsuragi
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Patryk Bandurski
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
2toLead Limited
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
BookNet Canada
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
Alex Barbosa Coqueiro
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
gvaughan
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
SeasiaInfotech2
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
Dubai Multi Commodity Centre
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
RankYa
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Safe Software
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
Fwdays
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
comworks
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
Scott Keck-Warren
Último
(20)
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
Next Generation Tooling for building streaming analytics app
1.
1 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Next Gen Tooling for Building Stream Analytics App Berlin Talk George Vetticaden VP of Emerging Products at Hortonworks
2.
2 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Streaming Analytics Manager (SAM) What is it? Helps users build and deploy complex stream analytics apps without writing code using graphical interface An Open Source ASF Licensed tool – Project Name: Streamline, https://github.com/hortonworks/streamline Key Design Principles Build stream analytics apps w/o specialized skillsets. Support multiple underlining streaming engine (Storm, Spark Streaming, Flink) Extensibility – Provide SDK to plug in custom sources/sinks/processors/udfs Schema is a first class citizen
3.
3 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Who Uses SAM?
4.
4 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved SAM is All about Doing Real-Time Analytics on the Stream Real-Time Prescriptive Analytics Real-Time Analytics Real-Time Predictive Analytics Real-Time Descriptive Analytics What should we do right now? What could happen now/soon? What is happening right now?
5.
5 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Showcasing SAM with an Use Case
6.
6 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Trucking company w/ large fleet of international trucks A truck generates millions of events for a given route; an event could be: 'Normal' events: starting / stopping of the vehicle ‘Violation’ events: speeding, excessive acceleration and breaking, unsafe tail distance ‘Speed’ Events: The speed of a driver that comes in every minute. Company uses an application that monitors truck locations and violations from the truck/driver in real- time Route? Truck? Driver? Analysts query a broad history to understand if today’s violations are part of a larger problem with specific routes, trucks, or drivers
7.
7 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Two Streaming Sources: Geo Event and Speed Stream eventType truckId driverId driverName longitude eventTime routeId eventSource 2018-01-22 15:00:58.493|1516654858493| truck_geo_event |40| 23 | G Vetticaden | 1090292248 | Peoria to Ceder Rapids Route 2 | Lane Departure| 40.7 | -89.52 |1| route latitude correlationId speed truckId driverId driverName eventTime routeIdeventSource 2018-01-22 15:00:58.673|1516654858673| truck_speed_event|40| 23 | G Vetticaden| 1090292248 | Peoria to Ceder Rapids Route 2 | 73 | route Each Truck emits different event stream – Truck Geo Event – Truck Speed Event Truck Geo Event: Truck Speed Event: eventTimeLong eventTimeLong
8.
8 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Common Streaming Analytics Requirements Streaming Analytics Requirement # Requirement Description Req. #1 Create streams consuming from the two Kafka topics that NiFi delivered the enriched geo and speed streams to. Req. #2 Join the streams of the Geo and Speed sensors over a time based aggregation window. Req. #3 Apply rules on the stream to filter on events of interest. Req. #4 Calculate the average speed of driver over 3 minute window and create alert for speeding driver Req. #5 Enrich the stream with features required for a machine learning (ML) model. The enrichment entails performing lookups for driver HR info, hours/miles driven in the past week, weather info. Req. #6 Normalize the events in the stream to feed into the PMML model. Req. #7 Execute a predictive logistical regression model on the stream built with Spark ML to predict if a driver is in danger going to commit a violation. Req. #8 Alert and feed into real-time dashboard if model predicted a violation.
9.
9 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Implementing these 8 Requirements in Storm – Complex and Lots of Java Code
10.
10 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Implementing this with Spark Structured Streaming – Complex and Lots of Scala Code
11.
11 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Implementing this with SAM – Easy as Pie
12.
12 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Demo Implement Requirements 1-4 Streaming Analytics Requirement # Requirement Description Req. #1 Create streams consuming from the two Kafka topics that NiFi delivered the enriched geo and speed streams to. Req. #2 Join the streams of the Geo and Speed sensors over a time based aggregation window. Req. #3 Apply rules on the stream to filter on events of interest. Req. #4 Calculate the average speed of driver over 3 minute window and create alert for speeding driver Req. #5 Enrich the stream with features required for a machine learning (ML) model. The enrichment entails performing lookups for driver HR info, hours/miles driven in the past week, weather info. Req. #6 Normalize the events in the stream to feed into the PMML model. Req. #7 Execute a predictive logistical regression model on the stream built with Spark ML to predict if a driver is in danger going to commit a violation. Req. #8 Alert and feed into real-time dashboard if model predicted a violation.
13.
13 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Demo Implementing Streaming Reqs 1 - 4 with SAM
14.
14 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Implement Requirements 5-8 Streaming Analytics Requirement # Requirement Description Req. #1 Create streams consuming from the two Kafka topics that NiFi delivered the enriched geo and speed streams to. Req. #2 Join the streams of the Geo and Speed sensors over a time based aggregation window. Req. #3 Apply rules on the stream to filter on events of interest. Req. #4 Calculate the average speed of driver over 3 minute window and create alert for speeding driver Req. #5 Enrich the stream with features required for a machine learning (ML) model. The enrichment entails performing lookups for driver HR info, hours/miles driven in the past week, weather info. Req. #6 Normalize the events in the stream to feed into the PMML model. Req. #7 Execute a predictive logistical regression model on the stream built with Spark ML to predict if a driver is in danger going to commit a violation. Req. #8 Alert and feed into real-time dashboard if model predicted a violation.
15.
15 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Real-Time Predictive Analytics Question: No violation events but what might happen that I need to be worried about? My data science team has a model that can predict that based on – Weather – Roads – Driver HR info like driver certification status, wagePlan – Driver timesheet info like hours, and miles logged over the last week
16.
16 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Building the Predictive Model Identify suitable ML algorithms to train a model – we will use classification algorithms as we have labeled events data 2 Transform enriched events data to a format that is friendly to Spark MLlib – many ML libs expect training data in a certain format 3 Train a logistic classification Spark model on YARN, with above events as training input, and iterate to fine tune generated model 4 Explore small subset of events to identify predictive features and make a hypothesis. E.g. hypothesis: “foggy weather causes driver violations” 1
17.
17 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Logistical Regression Model
18.
18 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Scoring the Predictive Model using SAM Use SAM’s enrich/custom processors to enrich the event with the features required for the model6 Enrich with Features Use SAM’s projection/custom processors to transform/normalize the streaming event and the features required for the model 7 Transform/Normalize Use SAM’s PMML processor to score the model for each stream event with its required features8 Score Model Use SAM’s rule and notification processors to alert, notify and take action using the results of the model9 Alert / Notify / Action Export the Spark Mllib model and import into the SAM’s Model Registry 5 Model Registry
19.
19 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Demo Implementing Streaming Reqs 5 - 8
20.
20 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved SAM “Test” Mode Key Highlights Persona User: Developer Allows Developers to test SAM app locally without deploying to cluster SAM Test Mode Lifecycle 1. Create Named Test 2. Mock out sources (e.g: Kafka) with test data 3. Execute the Test Case 4. Visualize Validate the data as it traverses through the SAM DAG Visualization 5. Download the Test Case Results Can do steps 1-5 via UI or REST Enables writing automated test (Junit)
21.
21 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Build Automated Junit Tests and CI & CD Pipelines with SAM
22.
22 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Visualizing Output of Test Mode is Great…But want CI/CD Key Highlights What Users Really Want? – Automated Unit Tests – Write Junit tests for streaming apps like their traditional apps – Continuous Integration - Incorporate streaming apps into their CI Envs with Jenkins – Continuous Delivery – Deliver to business new features in a continuous fashion. SAM Addresses these needs. How? – SAM has first class support for REST. – Everything in UI can be done with SAM REST. This includes SAM Test Mode. SAM REST allows you to – Create service pools, create envs, create test case, setup test data, download test results, validate, deploy app to diff envs
23.
23 © Hortonworks
Inc. 2011 – 2016. All Rights Reserved Demo Automated JUnit Tests / CI / CD
Notas do Editor
s
s
Baixar agora