SlideShare uma empresa Scribd logo
1 de 35
ML on Big Data: Real Time
Analysis on Time Series
Machine Learning
on Big Data
YASWANTH YADLAPALLI
Topics
 Business use case
 Training phase of the algorithm
 Tech stack
 Real time implementation
 Demonstration on a force sensor
Data Model
 We are currently working on these data models :
 Unstructured data
 Structured Data
 Time series Data
 For this talk we are going to concentrate on Time series data
Problem Statement
 To build a reactive application which trains on limited amount of data.
Business use case
 Main use case is in preventive maintenance systems.
 Calendar based maintenance schedules and holding excessive inventory to reduce
downtime all lead to inefficiencies and increase costs.
 Recent failures in machinery of oil rigs, car manufacturing plants have cost their
respective industries millions of dollars in down time and repairs.
 Condition Based Monitoring systems are implemented with the goal of
eliminating unplanned downtime and reducing operations cost by maintaining the
proper equipment at the proper time.
 As they say a stitch in time saves nine.
Sample Data
Our solution
Our solution
Our solution
Time series analytics
 Any analytics algorithm should be a mathematical model that should:
 Data Compression: Compact representation of data
 Signal Processing: extracting signal(sequences) even in presence of noise
 Prediction: using model predict the future values of time series
Terminology
 Patterns
 Block of graph where values are within a
range
 Patterns are grown from pairs of sequential
points till the block conform given
thresholds
 Clusters
 Similar type of patterns
Terminology
 Sequences
 A recurring series of patterns belonging to
a set of clusters.
 Concepts
 Sequences which are tagged as relevant to
the user.
 Knowledge Base
 Inference drawn from concepts.
 This is the compressed representation of
the time series
Phases
 Training phase
 Objective is to build a Knowledge base.
 Bulk historical data is given as input.
 Parameters of the algorithm are fine tuned to match the use case.
 Concepts are identified and assigned an action.
 Validation Phase
 Bulk
 Bulk data is given.
 Patterns are found and classified according to knowledge base.
 Used to identify and tag scenarios over a known timeline.
Phases
 Decision phase
 Real Time
 For example a Kafka source is provided.
 Received data is processed in batches.
 Patterns spanning multiples batches are stitched.
 If a sequence is identified as a concept, the specified action is triggered.
Architecture
Source
•Google Drive
•Kafka
•Local file
Ingestion
•Filters
•Transformation
•Manipulation
•Materialization
Computation
•Patterns
•Clusters
•Sequences
•Concepts
Actions
•SMS
•Email
Tech Architecture
Ui Server FrontendBackend Server
Benchmarks
0
2
4
6
8
10
12
0 10 20 30 40 50 60
Duration(Hours)
Size (GB)
Duration vs Size
Driver memory (GB) Total executor memory (GB) Slave cores (procs)
Real Time Analysis
on Time Series
ROHITH YERAVOTHULA
Training phase output
 Knowledge Base properties:
 Data Compression: Compact representation of data
 Signal Processing: extracting signal(sequences) even in presence of noise
 Prediction: using model predict the future values of time series
Real time system
 Light weight Computation framework
 Ability to handle 3V’s (Volume, Velocity and Variety) of Big Data
 Computation framework with micro batch processing architecture
Tech Architecture
Ui Server FrontendBackend Server
Data Source
 Data source that can keep the data from the source and ingest into computation
framework which can
 Take Advantage of distributed computation framework
 Store data in a fault tolerant manner
Tech Architecture
Ui Server FrontendBackend Server
Using Spark and Kafka
Frontend
Via Server
Connecting with IoT
 Connect Mobile accelerometer to AWS IoT and stream data.
 Train the system to predict an user’s behavior using accelerometer data.
Mobile IoT architecture
600 ms
5 sec
3 sec
2 sec
4 sec
Frontend
Via Server
Bottlenecks
 Small File Issues: writing and reading huge number of small files.
 Sharing data between batches.
Fix: Small Files Problem
 Implemented a in memory queue to hold data for several batches and then
compile everything into a single file and write to storage system
 Can also serve UI requests from in-memory queue.
 This eliminates the extra read calls from storage system to serve UI requests
 Allows the writes in first place to be asynchronous
Why Share data between batches
 In Real time data ingestion, data can be broken into different batches depending
upon the batch size we choose
 We need to take care of signals overflowing across batches
Sharing Data between batches
 UpdateStateByKey
 ssc.remember()
 Spark Accumulators
Mobile IoT architecture
600 ms
5 sec
3 sec
2 sec
4 sec
Frontend
Via Server
Mobile IoT architecture (updated)
600 ms
5 sec
1 sec
async
Frontend
Demo QUICK DEMONSTRATION
WITH FORCE SENSOR
“
”
Keep calm and ask questions
Q & A session

Mais conteúdo relacionado

Mais procurados

Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Petr Zapletal
 
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Summit
 
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series LibraryFrustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series LibraryIlya Ganelin
 
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Spark Summit
 
Adding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug GrallAdding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug GrallSpark Summit
 
An Introduction to Distributed Search with Datastax Enterprise Search
An Introduction to Distributed Search with Datastax Enterprise SearchAn Introduction to Distributed Search with Datastax Enterprise Search
An Introduction to Distributed Search with Datastax Enterprise SearchPatricia Gorla
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksDatabricks
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's comingDatabricks
 
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...DataStax
 
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...DataStax
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1Joe Stein
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Spark Summit
 
Kafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroringKafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroringAnant Rustagi
 
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...Spark Summit
 
End-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache SparkEnd-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache SparkDatabricks
 
Scaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on DatabricksScaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on DatabricksDatabricks
 
Structuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and StreamingStructuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and StreamingDatabricks
 

Mais procurados (20)

Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017
 
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
 
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series LibraryFrustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library
 
Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
 
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
 
Adding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug GrallAdding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug Grall
 
An Introduction to Distributed Search with Datastax Enterprise Search
An Introduction to Distributed Search with Datastax Enterprise SearchAn Introduction to Distributed Search with Datastax Enterprise Search
An Introduction to Distributed Search with Datastax Enterprise Search
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming
 
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
Scalable Data Modeling by Example (Carlos Alonso, Job and Talent) | Cassandra...
 
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
 
Kafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroringKafka Lambda architecture with mirroring
Kafka Lambda architecture with mirroring
 
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
 
End-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache SparkEnd-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache Spark
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
 
Scaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on DatabricksScaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on Databricks
 
Structuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and StreamingStructuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and Streaming
 

Destaque

Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
Joining Large data at Scale
Joining Large data at ScaleJoining Large data at Scale
Joining Large data at ScaleSigmoid
 
SORT & JOIN IN SPARK 2.0
SORT & JOIN IN SPARK 2.0SORT & JOIN IN SPARK 2.0
SORT & JOIN IN SPARK 2.0Sigmoid
 
Spark 1.6 vs Spark 2.0
Spark 1.6 vs Spark 2.0Spark 1.6 vs Spark 2.0
Spark 1.6 vs Spark 2.0Sigmoid
 
Machine Learning on Big Data
Machine Learning on Big DataMachine Learning on Big Data
Machine Learning on Big DataMax Lin
 
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...Spark Summit
 
Hive function-cheat-sheet
Hive function-cheat-sheetHive function-cheat-sheet
Hive function-cheat-sheetDr. Volkan OBAN
 
2017.03.13 Financialisation as a Strategic Action Field: An Historically Info...
2017.03.13 Financialisation as a Strategic Action Field: An Historically Info...2017.03.13 Financialisation as a Strategic Action Field: An Historically Info...
2017.03.13 Financialisation as a Strategic Action Field: An Historically Info...NUI Galway
 
2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat...
2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat...2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat...
2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat...NUI Galway
 
Can we know the future? By John Wilkins
Can we know the future?  By John WilkinsCan we know the future?  By John Wilkins
Can we know the future? By John WilkinsAdam Ford
 
Tokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
Tokyo Cassandra Summit 2014: Tunable Consistency by Al TobeyTokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
Tokyo Cassandra Summit 2014: Tunable Consistency by Al TobeyDataStax Academy
 
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...StampedeCon
 
Hexawise Software Test Design Tool - "Vendor Meets User" at CAST Software Tes...
Hexawise Software Test Design Tool - "Vendor Meets User" at CAST Software Tes...Hexawise Software Test Design Tool - "Vendor Meets User" at CAST Software Tes...
Hexawise Software Test Design Tool - "Vendor Meets User" at CAST Software Tes...Justin Hunter
 
Using Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with HadoopUsing Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with HadoopJames Kebinger
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data ManagementAlbert Bifet
 
Time Series Data in a Time Series World
Time Series Data in a Time Series WorldTime Series Data in a Time Series World
Time Series Data in a Time Series WorldMapR Technologies
 
R data mining-Time Series Analysis with R
R data mining-Time Series Analysis with RR data mining-Time Series Analysis with R
R data mining-Time Series Analysis with RDr. Volkan OBAN
 

Destaque (20)

Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Joining Large data at Scale
Joining Large data at ScaleJoining Large data at Scale
Joining Large data at Scale
 
SORT & JOIN IN SPARK 2.0
SORT & JOIN IN SPARK 2.0SORT & JOIN IN SPARK 2.0
SORT & JOIN IN SPARK 2.0
 
Spark 1.6 vs Spark 2.0
Spark 1.6 vs Spark 2.0Spark 1.6 vs Spark 2.0
Spark 1.6 vs Spark 2.0
 
Machine Learning on Big Data
Machine Learning on Big DataMachine Learning on Big Data
Machine Learning on Big Data
 
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
 
R-Shiny Cheat sheet
R-Shiny Cheat sheetR-Shiny Cheat sheet
R-Shiny Cheat sheet
 
Hive function-cheat-sheet
Hive function-cheat-sheetHive function-cheat-sheet
Hive function-cheat-sheet
 
2017.03.13 Financialisation as a Strategic Action Field: An Historically Info...
2017.03.13 Financialisation as a Strategic Action Field: An Historically Info...2017.03.13 Financialisation as a Strategic Action Field: An Historically Info...
2017.03.13 Financialisation as a Strategic Action Field: An Historically Info...
 
2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat...
2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat...2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat...
2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat...
 
Can we know the future? By John Wilkins
Can we know the future?  By John WilkinsCan we know the future?  By John Wilkins
Can we know the future? By John Wilkins
 
The Big 5- Twitter
The Big 5- TwitterThe Big 5- Twitter
The Big 5- Twitter
 
Tokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
Tokyo Cassandra Summit 2014: Tunable Consistency by Al TobeyTokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
Tokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
 
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
 
Hexawise Software Test Design Tool - "Vendor Meets User" at CAST Software Tes...
Hexawise Software Test Design Tool - "Vendor Meets User" at CAST Software Tes...Hexawise Software Test Design Tool - "Vendor Meets User" at CAST Software Tes...
Hexawise Software Test Design Tool - "Vendor Meets User" at CAST Software Tes...
 
Using Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with HadoopUsing Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with Hadoop
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
Follow up SPARK
Follow up SPARKFollow up SPARK
Follow up SPARK
 
Time Series Data in a Time Series World
Time Series Data in a Time Series WorldTime Series Data in a Time Series World
Time Series Data in a Time Series World
 
R data mining-Time Series Analysis with R
R data mining-Time Series Analysis with RR data mining-Time Series Analysis with R
R data mining-Time Series Analysis with R
 

Semelhante a ML on Big Data: Real-Time Analysis on Time Series

AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...GeeksLab Odessa
 
The AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewThe AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewAmazon Web Services
 
Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Riccardo Zamana
 
Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...
Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...
Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...MSAdvAnalytics
 
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKChoose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKSungmin Kim
 
AWS Summit 2018 Summary
AWS Summit 2018 SummaryAWS Summit 2018 Summary
AWS Summit 2018 SummaryAshish Mrig
 
Intelligent Monitoring
Intelligent MonitoringIntelligent Monitoring
Intelligent MonitoringIntelie
 
Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...
Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...
Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...RightScale
 
Everything comes in 3's
Everything comes in 3'sEverything comes in 3's
Everything comes in 3'sdelagoya
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataDataWorks Summit/Hadoop Summit
 
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...Maginatics
 
Waters Grid & HPC Course
Waters Grid & HPC CourseWaters Grid & HPC Course
Waters Grid & HPC Coursejimliddle
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathYahoo Developer Network
 
Amazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxAmazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxRenjithPillai26
 
Proposal for google summe of code 2016
Proposal for google summe of code 2016 Proposal for google summe of code 2016
Proposal for google summe of code 2016 Mahesh Dananjaya
 
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...Jamie Kinney
 
A guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update ConferenceA guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update ConferenceEldert Grootenboer
 
Windows Azure - Uma Plataforma para o Desenvolvimento de Aplicações
Windows Azure - Uma Plataforma para o Desenvolvimento de AplicaçõesWindows Azure - Uma Plataforma para o Desenvolvimento de Aplicações
Windows Azure - Uma Plataforma para o Desenvolvimento de AplicaçõesComunidade NetPonto
 
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...Niraj Tolia
 

Semelhante a ML on Big Data: Real-Time Analysis on Time Series (20)

AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
 
The AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewThe AWS Big Data Platform – Overview
The AWS Big Data Platform – Overview
 
Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020
 
Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...
Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...
Cortana Analytics Workshop: Real-Time Data Processing -- How Do I Choose the ...
 
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKChoose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
 
AWS Summit 2018 Summary
AWS Summit 2018 SummaryAWS Summit 2018 Summary
AWS Summit 2018 Summary
 
Intelligent Monitoring
Intelligent MonitoringIntelligent Monitoring
Intelligent Monitoring
 
Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...
Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...
Harness the Power of the Cloud for Grid Computing and Batch Processing Applic...
 
Everything comes in 3's
Everything comes in 3'sEverything comes in 3's
Everything comes in 3's
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
 
Waters Grid & HPC Course
Waters Grid & HPC CourseWaters Grid & HPC Course
Waters Grid & HPC Course
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
Amazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxAmazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptx
 
Proposal for google summe of code 2016
Proposal for google summe of code 2016 Proposal for google summe of code 2016
Proposal for google summe of code 2016
 
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Serv...
 
A guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update ConferenceA guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update Conference
 
Windows Azure - Uma Plataforma para o Desenvolvimento de Aplicações
Windows Azure - Uma Plataforma para o Desenvolvimento de AplicaçõesWindows Azure - Uma Plataforma para o Desenvolvimento de Aplicações
Windows Azure - Uma Plataforma para o Desenvolvimento de Aplicações
 
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
 
DataPalooza: ML & IoT Workshop
DataPalooza: ML & IoT WorkshopDataPalooza: ML & IoT Workshop
DataPalooza: ML & IoT Workshop
 

Mais de Sigmoid

Monitoring and tuning Spark applications
Monitoring and tuning Spark applicationsMonitoring and tuning Spark applications
Monitoring and tuning Spark applicationsSigmoid
 
Structured Streaming Using Spark 2.1
Structured Streaming Using Spark 2.1Structured Streaming Using Spark 2.1
Structured Streaming Using Spark 2.1Sigmoid
 
Real-Time Stock Market Analysis using Spark Streaming
 Real-Time Stock Market Analysis using Spark Streaming Real-Time Stock Market Analysis using Spark Streaming
Real-Time Stock Market Analysis using Spark StreamingSigmoid
 
Levelling up in Akka
Levelling up in AkkaLevelling up in Akka
Levelling up in AkkaSigmoid
 
Expression Problem: Discussing the problems in OOPs language & their solutions
Expression Problem: Discussing the problems in OOPs language & their solutionsExpression Problem: Discussing the problems in OOPs language & their solutions
Expression Problem: Discussing the problems in OOPs language & their solutionsSigmoid
 
Building bots to automate common developer tasks - Writing your first smart c...
Building bots to automate common developer tasks - Writing your first smart c...Building bots to automate common developer tasks - Writing your first smart c...
Building bots to automate common developer tasks - Writing your first smart c...Sigmoid
 
Failsafe Hadoop Infrastructure and the way they work
Failsafe Hadoop Infrastructure and the way they workFailsafe Hadoop Infrastructure and the way they work
Failsafe Hadoop Infrastructure and the way they workSigmoid
 
WEBSOCKETS AND WEBWORKERS
WEBSOCKETS AND WEBWORKERSWEBSOCKETS AND WEBWORKERS
WEBSOCKETS AND WEBWORKERSSigmoid
 
Angular js performance improvements
Angular js performance improvementsAngular js performance improvements
Angular js performance improvementsSigmoid
 
Building high scalable distributed framework on apache mesos
Building high scalable distributed framework on apache mesosBuilding high scalable distributed framework on apache mesos
Building high scalable distributed framework on apache mesosSigmoid
 
Equation solving-at-scale-using-apache-spark
Equation solving-at-scale-using-apache-sparkEquation solving-at-scale-using-apache-spark
Equation solving-at-scale-using-apache-sparkSigmoid
 
Introduction to apache nutch
Introduction to apache nutchIntroduction to apache nutch
Introduction to apache nutchSigmoid
 
Approaches to text analysis
Approaches to text analysisApproaches to text analysis
Approaches to text analysisSigmoid
 
Graph computation
Graph computationGraph computation
Graph computationSigmoid
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big dataSigmoid
 
Dashboard design By Anu Vijayan
Dashboard design By Anu VijayanDashboard design By Anu Vijayan
Dashboard design By Anu VijayanSigmoid
 
Introduction to Spark R with R studio - Mr. Pragith
Introduction to Spark R with R studio - Mr. Pragith Introduction to Spark R with R studio - Mr. Pragith
Introduction to Spark R with R studio - Mr. Pragith Sigmoid
 
Spark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. JyotiskaSpark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. JyotiskaSigmoid
 
Tale of Kafka Consumer for Spark Streaming
Tale of Kafka Consumer for Spark StreamingTale of Kafka Consumer for Spark Streaming
Tale of Kafka Consumer for Spark StreamingSigmoid
 
Sparkstreaming with kafka and h base at scale (1)
Sparkstreaming with kafka and h base at scale (1)Sparkstreaming with kafka and h base at scale (1)
Sparkstreaming with kafka and h base at scale (1)Sigmoid
 

Mais de Sigmoid (20)

Monitoring and tuning Spark applications
Monitoring and tuning Spark applicationsMonitoring and tuning Spark applications
Monitoring and tuning Spark applications
 
Structured Streaming Using Spark 2.1
Structured Streaming Using Spark 2.1Structured Streaming Using Spark 2.1
Structured Streaming Using Spark 2.1
 
Real-Time Stock Market Analysis using Spark Streaming
 Real-Time Stock Market Analysis using Spark Streaming Real-Time Stock Market Analysis using Spark Streaming
Real-Time Stock Market Analysis using Spark Streaming
 
Levelling up in Akka
Levelling up in AkkaLevelling up in Akka
Levelling up in Akka
 
Expression Problem: Discussing the problems in OOPs language & their solutions
Expression Problem: Discussing the problems in OOPs language & their solutionsExpression Problem: Discussing the problems in OOPs language & their solutions
Expression Problem: Discussing the problems in OOPs language & their solutions
 
Building bots to automate common developer tasks - Writing your first smart c...
Building bots to automate common developer tasks - Writing your first smart c...Building bots to automate common developer tasks - Writing your first smart c...
Building bots to automate common developer tasks - Writing your first smart c...
 
Failsafe Hadoop Infrastructure and the way they work
Failsafe Hadoop Infrastructure and the way they workFailsafe Hadoop Infrastructure and the way they work
Failsafe Hadoop Infrastructure and the way they work
 
WEBSOCKETS AND WEBWORKERS
WEBSOCKETS AND WEBWORKERSWEBSOCKETS AND WEBWORKERS
WEBSOCKETS AND WEBWORKERS
 
Angular js performance improvements
Angular js performance improvementsAngular js performance improvements
Angular js performance improvements
 
Building high scalable distributed framework on apache mesos
Building high scalable distributed framework on apache mesosBuilding high scalable distributed framework on apache mesos
Building high scalable distributed framework on apache mesos
 
Equation solving-at-scale-using-apache-spark
Equation solving-at-scale-using-apache-sparkEquation solving-at-scale-using-apache-spark
Equation solving-at-scale-using-apache-spark
 
Introduction to apache nutch
Introduction to apache nutchIntroduction to apache nutch
Introduction to apache nutch
 
Approaches to text analysis
Approaches to text analysisApproaches to text analysis
Approaches to text analysis
 
Graph computation
Graph computationGraph computation
Graph computation
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big data
 
Dashboard design By Anu Vijayan
Dashboard design By Anu VijayanDashboard design By Anu Vijayan
Dashboard design By Anu Vijayan
 
Introduction to Spark R with R studio - Mr. Pragith
Introduction to Spark R with R studio - Mr. Pragith Introduction to Spark R with R studio - Mr. Pragith
Introduction to Spark R with R studio - Mr. Pragith
 
Spark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. JyotiskaSpark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. Jyotiska
 
Tale of Kafka Consumer for Spark Streaming
Tale of Kafka Consumer for Spark StreamingTale of Kafka Consumer for Spark Streaming
Tale of Kafka Consumer for Spark Streaming
 
Sparkstreaming with kafka and h base at scale (1)
Sparkstreaming with kafka and h base at scale (1)Sparkstreaming with kafka and h base at scale (1)
Sparkstreaming with kafka and h base at scale (1)
 

Último

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Último (20)

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

ML on Big Data: Real-Time Analysis on Time Series

  • 1. ML on Big Data: Real Time Analysis on Time Series
  • 2. Machine Learning on Big Data YASWANTH YADLAPALLI
  • 3. Topics  Business use case  Training phase of the algorithm  Tech stack  Real time implementation  Demonstration on a force sensor
  • 4. Data Model  We are currently working on these data models :  Unstructured data  Structured Data  Time series Data  For this talk we are going to concentrate on Time series data
  • 5. Problem Statement  To build a reactive application which trains on limited amount of data.
  • 6. Business use case  Main use case is in preventive maintenance systems.  Calendar based maintenance schedules and holding excessive inventory to reduce downtime all lead to inefficiencies and increase costs.  Recent failures in machinery of oil rigs, car manufacturing plants have cost their respective industries millions of dollars in down time and repairs.  Condition Based Monitoring systems are implemented with the goal of eliminating unplanned downtime and reducing operations cost by maintaining the proper equipment at the proper time.  As they say a stitch in time saves nine.
  • 11. Time series analytics  Any analytics algorithm should be a mathematical model that should:  Data Compression: Compact representation of data  Signal Processing: extracting signal(sequences) even in presence of noise  Prediction: using model predict the future values of time series
  • 12. Terminology  Patterns  Block of graph where values are within a range  Patterns are grown from pairs of sequential points till the block conform given thresholds  Clusters  Similar type of patterns
  • 13. Terminology  Sequences  A recurring series of patterns belonging to a set of clusters.  Concepts  Sequences which are tagged as relevant to the user.  Knowledge Base  Inference drawn from concepts.  This is the compressed representation of the time series
  • 14. Phases  Training phase  Objective is to build a Knowledge base.  Bulk historical data is given as input.  Parameters of the algorithm are fine tuned to match the use case.  Concepts are identified and assigned an action.  Validation Phase  Bulk  Bulk data is given.  Patterns are found and classified according to knowledge base.  Used to identify and tag scenarios over a known timeline.
  • 15. Phases  Decision phase  Real Time  For example a Kafka source is provided.  Received data is processed in batches.  Patterns spanning multiples batches are stitched.  If a sequence is identified as a concept, the specified action is triggered.
  • 17. Tech Architecture Ui Server FrontendBackend Server
  • 18. Benchmarks 0 2 4 6 8 10 12 0 10 20 30 40 50 60 Duration(Hours) Size (GB) Duration vs Size Driver memory (GB) Total executor memory (GB) Slave cores (procs)
  • 19. Real Time Analysis on Time Series ROHITH YERAVOTHULA
  • 20. Training phase output  Knowledge Base properties:  Data Compression: Compact representation of data  Signal Processing: extracting signal(sequences) even in presence of noise  Prediction: using model predict the future values of time series
  • 21. Real time system  Light weight Computation framework  Ability to handle 3V’s (Volume, Velocity and Variety) of Big Data  Computation framework with micro batch processing architecture
  • 22. Tech Architecture Ui Server FrontendBackend Server
  • 23. Data Source  Data source that can keep the data from the source and ingest into computation framework which can  Take Advantage of distributed computation framework  Store data in a fault tolerant manner
  • 24. Tech Architecture Ui Server FrontendBackend Server
  • 25. Using Spark and Kafka Frontend Via Server
  • 26. Connecting with IoT  Connect Mobile accelerometer to AWS IoT and stream data.  Train the system to predict an user’s behavior using accelerometer data.
  • 27. Mobile IoT architecture 600 ms 5 sec 3 sec 2 sec 4 sec Frontend Via Server
  • 28. Bottlenecks  Small File Issues: writing and reading huge number of small files.  Sharing data between batches.
  • 29. Fix: Small Files Problem  Implemented a in memory queue to hold data for several batches and then compile everything into a single file and write to storage system  Can also serve UI requests from in-memory queue.  This eliminates the extra read calls from storage system to serve UI requests  Allows the writes in first place to be asynchronous
  • 30. Why Share data between batches  In Real time data ingestion, data can be broken into different batches depending upon the batch size we choose  We need to take care of signals overflowing across batches
  • 31. Sharing Data between batches  UpdateStateByKey  ssc.remember()  Spark Accumulators
  • 32. Mobile IoT architecture 600 ms 5 sec 3 sec 2 sec 4 sec Frontend Via Server
  • 33. Mobile IoT architecture (updated) 600 ms 5 sec 1 sec async Frontend
  • 35. “ ” Keep calm and ask questions Q & A session

Notas do Editor

  1. In this section I am going go give brief introduction of architecture and business use case of our system. Our goal is to make sense of any given sensor data may it be a pressure sensor in a valve or a camera on a self-driving car. By which we may be able to take smart decisions or make predictions about the future.
  2. Unstructured data doesn't have relations between columns.
  3. Whatever may be the data source, we want to make a generalized solution which can handle any type of variation and enable the user to get a specialized system for this own use case
  4. Assume there is oil rig with 10 machine and 100 sensors each. Say, we know that a component in a machine needs maintainance every 3 months, but in many real life situations the component may breakdown pre maturely, which may cause the company millions in down time. Having a person monitor all the sensor outputs and determine whether any component needs maintainance is not a viable solution. Our system is built to handle this use case.
  5. Please have a look this data from a pressure sensor in a valve. Say as an user we know that the first anomaly is caused by miss orientation of the spring and second is caused when seal of the valve is broken. Can you suggest any methods to isolate these 2 phenomenon. Most of the traditional approaches wouldn't take in to consideration if a new type pattern emerges. next slide our solution.
  6. loss of similarity
  7. Knowledge base is inferences drawn from the given data.
  8. This is the pipeline all the phases of our application go through. First you provide a data source, currently you can upload a local file in your computer or select it from your google chrome, formats supported are CSV and TSV. Then the data is ingested using a schema provided. You can type cast variable, join columns from multiple files, etc. Using this ingested data as our time series we can compute PCSC,
  9. Simple example for say y=sin(x) time series model Prediction on time series data is one of the use case for real time time series data analytics talk 1 explains about how did we train the system and teach it the make decisions. We use the trained system for real time analytics We need some streaming or live computation framework
  10. Spark streaming is a micro batch processing architecture Collects stream data into small batches and processes it Job Creation and scheduling overhead is in order of milliseconds Batch interval can be as small as 1 Second
  11. We can’t rely on original data source as it cannot provide recent data once lost
  12. Apache kafka is a Distributed streaming platform Publish and subscribe to streams of records. Store streams of records in a fault-tolerant way
  13. Streams of time series data will be pumping data into kafka Spark will connect to kafka brokers and consume data Processes data and store to database UI server will pull data for visualization Spark is the computation layer while kafka acts as data source for streaming data We now have a streaming end to end application :  a data source to stream, a compute framework and a storage system We can connect any real time streaming source One such demo: simple aws iot with moble sensors
  14. Every batch is writing a lot of small files into storage system (HDFS) We use parquet as it is one of the best compressed fomat of data available Spark parquet format small files writing is adding up to extra overhead Reading several small files from Storage to serve UI requests is also adding up to delay
  15. Sharing data between batches
  16. State maintain
  17. Make one line Basic problem with live streaming is data will be broken into batches. Our mathematical model can’t rely upon a batch, needs to wait on for next batch to see if data is overflown