Around the world, businesses are turning to AI to transform the way they operate and serve their customers. Before they can implement these technologies, however, companies must move from batch analytics to real-time decisions, which means rapidly accessing and analyzing the relevant information amid a sea of data. Yaron will explain how to make Spark handle multivariate real-time, historical, and event data simultaneously to provide immediate and intelligent responses. He will present several time-sensitive use cases, including fraud detection, outage prevention, and customer recommendations, to demonstrate how to perform predictive analytics and real-time actions with Spark.
Speaker: Yaron Ekshtein
2. Agenda
§ Current data-science and analytics challenges
§ A continuous and cloud native architecture
§ What does Serverless have to do with it?
§ Use cases
§ Summary and Q&A
3. The Surprising Truth About What it Takes to Build a Machine Learning Product
Josh Cogan, Google
Source: https://medium.com/thelaunchpad/the-ml-surprise-f54706361a6c
4. The Data-Driven Business Challenge
From Reactive to Proactive and Intelligent
Value of Data
Time to Action
Real-time Minutes Days
Interactive
Event-Driven
Batch
5. Evolve Into an Agile Cloud-Native Architecture
Your Business Logic
Consume
Innovate
Cloud Storage and Databases
Any Containerized Microservice
6. Today: Intelligent App Pipeline is Complex and Siloed
Multiple Management
Interfaces:
Collection and
Exploration
ML Development
and Training
Deployment & Serving
(cloud or edge)
Stream Processing
ETL and Batch ML Training Jobs
Interactive Data Science ML model
Interactive app
Data and
Compute:
Data and
Compute:
Data and
Compute:
Data Engineers
App Developers, Data Engineers
Data Scientists
Data Sources
Data Lakes/Warehouses
Reports and Dashboards
Triggers and
Interaction
7. A Continuous Pipeline, Focused On Production
Real-time and historical data
Train and Test
ML Models
Deploy with
Serverless
Collect, Explore
and Tag Data
Monitor
Triggers and
Interactions
Data Sources
Develop, Monitor, Deploy
Microservices
8. Nuclio: Taking Serverless to The Next Level
Open-source Serverless for compute & data intensive tasks
Extreme Performance
§ Zero copy, buffer reuse
§ Up to 400K events/sec/proc
§ GPU Support
Function Workers
Event Listeners
Shard 1 Workers
Workers
Shard 2
Shard 3
Shard 4 Workers
Advanced Data & AI Features
DB, MQ, File
Functions
§ Auto-rebalance, checkpoints
§ Any trigger source
§ Simple integration
§ Data bindings
§ Shared volumes
§ Context cache
Statefulness
nuclio processor
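As a minimal sketch of what a function worker runs, the block below follows the `handler(context, event)` entry-point shape used by Nuclio's Python runtime. The payload fields (`sensor`, `value`), the 100.0 threshold, and the `_StubLogger` / `invoke_locally` helpers are assumptions made for illustration so the example runs without a Nuclio deployment; they are not part of the talk.

```python
import json
from types import SimpleNamespace


def handler(context, event):
    """Nuclio-style entry point: invoked once per event by a function worker."""
    # event.body carries the raw payload delivered by the trigger (bytes here).
    reading = json.loads(event.body)
    context.logger.info("processing sensor %s" % reading["sensor"])
    # Hypothetical rule: flag readings above a fixed threshold.
    alert = reading["value"] > 100.0
    return json.dumps({"sensor": reading["sensor"], "alert": alert})


# --- Local stand-ins (hypothetical) so the sketch runs without a Nuclio runtime ---
class _StubLogger:
    def info(self, message):
        print(message)


def invoke_locally(payload):
    """Build a fake context and event, call the handler, and decode its reply."""
    context = SimpleNamespace(logger=_StubLogger())
    event = SimpleNamespace(body=json.dumps(payload).encode("utf-8"))
    return json.loads(handler(context, event))
```

Calling `invoke_locally({"sensor": "s1", "value": 120.5})` exercises the same code path a trigger would, which keeps the business logic testable outside the platform.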
11. Demo: Voice Driven Real-Time Analytics
Voice
Query
SQL APIAI
Update
Locations SMART HOME
DEVICE
GOOGLE
MAP
SERVICE
WEB UI (REACT)
SQL Query
12. Use Case: Real-Time Analysis of Financial Data
RT Tweet
Sentiment
Analysis
Tick feed
Analysis
& Tagging
Real-time Dashboard
News Stream
viewer
World Trading Data
Data Exploration
& RT Analysis
• Enriched tweet stream
• Stocks tables
• Stocks + sentiment TSDB
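One way to picture the "Stocks + sentiment" output is a join of the tick feed with tweet sentiment per symbol and minute. The sketch below is an illustration only: the word lexicon is a toy stand-in for the real-time sentiment analysis named on the slide, and the record fields (`symbol`, `ts`, `text`, `price`) are assumed.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real pipeline would use a trained sentiment model.
POSITIVE = {"beat", "surge", "upgrade", "strong"}
NEGATIVE = {"miss", "plunge", "downgrade", "weak"}


def score_tweet(text):
    """Return (positive hits - negative hits) for the words in the tweet."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def enrich_ticks(ticks, tweets):
    """Attach the mean tweet sentiment for the same symbol and minute to each tick."""
    buckets = defaultdict(list)
    for tweet in tweets:
        key = (tweet["symbol"], tweet["ts"] // 60)
        buckets[key].append(score_tweet(tweet["text"]))
    enriched = []
    for tick in ticks:
        scores = buckets.get((tick["symbol"], tick["ts"] // 60), [])
        # Ticks with no matching tweets get a neutral sentiment of 0.0.
        sentiment = sum(scores) / len(scores) if scores else 0.0
        enriched.append({**tick, "sentiment": sentiment})
    return enriched
```

Each enriched record keeps the original tick fields and adds a `sentiment` value, which is the shape a time-series table keyed by symbol and timestamp would store.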
13. Auto-Healing Network Operations
Predict network outages and avoid them in real-time
§ Cross-correlating real-time data from multiple sources with historical data
§ AI-based predictions trigger pre-programmed actions that fix evolving problems in the network
§ Implemented within weeks
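The loop described above (compare live data with history, then trigger a pre-programmed action) can be sketched in a few lines. This is a simplified stand-in, not the system from the talk: a z-score against the historical baseline replaces the AI-based prediction, and the metric names and remediation strings in `ACTIONS` are hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, z_threshold=3.0):
    """True when `latest` deviates from the historical mean by more than
    z_threshold sample standard deviations."""
    if len(history) < 2:
        return False  # not enough history to form a baseline
    sigma = stdev(history)
    if sigma == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / sigma > z_threshold


# Hypothetical remediation registry: metric name -> pre-programmed action.
ACTIONS = {
    "link_errors": lambda device: "reroute traffic away from %s" % device,
    "cpu_load": lambda device: "restart control-plane process on %s" % device,
}


def react(device, metric, history, latest):
    """Return the action to take for an anomalous reading, or None."""
    if metric in ACTIONS and is_anomalous(history, latest):
        return ACTIONS[metric](device)
    return None
```

Keeping detection (`is_anomalous`) separate from the action registry means the detector can later be swapped for a trained model without touching the remediation logic.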
14. Demo: Predictive Netops Using Serverless + Spark
NLP processing
Of real-time
router logs
NetFlow
data
Exploration &
Correlation
ML Training,
Model export
Failure & Anomaly
prediction
Real-time DB
Real-time
telemetry
Serverless
Spark
Auto-deploy
15. Real-time Data and AI for Airport Operations
Real-time Database
NoSQL + K/V tables + TSDB
Ingest and Process Data for Intelligent Apps
Staff
roster
Vehicle
Telemetry
Passenger
status
Flight Status
Baggage
status
Flight
Schedule
Events Streams
Scheduled batch
Push / Pull via
REST API
Insights
BI style dashboards
& alerts
Real-time Apps
Dashboards
alerts and actions
Intelligent Apps
Other AI/ML
Systems
Leading Airport Ground Operations uses AI to react faster to schedule changes
§ Quicker ground handling response to flight re-scheduling
§ Operational efficiency and visibility
16. Time Series Vectors
(Avg, Min/Max, Stdev per sensor)
Process
Sensor Data
• ML Models
• Machine Metadata
• Environmental data
Real-time dashboard
Real-time
Alerts
Predicted
Alerts
Aggregate using
Time Series APIs
Every 6
hours
Every 15
minutes
Devices & Machines
Predict Upload to
Cloud
Query
APIs
Stream
Trigger
NoSQL & Time
Series API
intelligent edge
Web
hook
Update ML
Model
Example: Predictive Maintenance Based on Real-time + Historical Data
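The "Time Series Vectors (Avg, Min/Max, Stdev per sensor)" step can be illustrated with plain Python. The sketch assumes readings arrive as `(sensor, timestamp_seconds, value)` tuples and that the 15-minute interval on the slide is the aggregation window; both are assumptions, and a production pipeline would use the platform's time-series APIs rather than in-memory grouping.

```python
from collections import defaultdict
from statistics import mean, pstdev

WINDOW_SECONDS = 15 * 60  # assumed aggregation window (15 minutes)


def time_series_vectors(readings):
    """Group (sensor, ts, value) readings into fixed windows and return
    {(sensor, window_start): {"avg", "min", "max", "stdev"}}."""
    windows = defaultdict(list)
    for sensor, ts, value in readings:
        window_start = ts - ts % WINDOW_SECONDS  # align to the window boundary
        windows[(sensor, window_start)].append(value)
    return {
        key: {
            "avg": mean(values),
            "min": min(values),
            "max": max(values),
            "stdev": pstdev(values),  # population std dev; 0.0 for one sample
        }
        for key, values in windows.items()
    }
```

The resulting per-sensor vectors are the kind of compact features an ML model can consume for prediction, instead of the raw sensor stream.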
17. Summary
Build continuous, AI-driven and proactive apps faster
§ Focus on using data, not collecting it
§ Adopt a continuous data and integration approach
§ Consolidate cloud-native microservices architecture
§ Use Serverless – for faster agile results
My Email: yarone@Iguazio.com