1© Cloudera, Inc. All rights reserved.
Harnessing Data within Hadoop
in the Connected Brewery:
Kafka, Spark Streaming, and
Kudu
Jason Hubbard
Jason.hubbard@cloudera.com
Cloudera
Internet of Things (IoT)
IoT Markets - 2020
• $1.7 trillion in value
• 20% annual growth
• 30 billion things
• 250 million connected vehicles
Source: IDC & Gartner estimates
IoT Will Drive An Explosion of Data…
• Data expected to explode to 44 ZB (44 trillion GB) by 2020 (Source: IDC)
• 80% of data will be unstructured
Value is maximized when data is combined with other sources
• The value of data is multiplied when you combine and correlate it with other data from relevant sources
• 40%: improvement in value that can be unlocked by combining data from multiple IoT applications and sources
• Interoperability would significantly improve performance by combining sensor data from different machines and systems to provide decision makers with an integrated view of performance
Source: McKinsey Global Institute analysis
The IoT Ecosystem
Diagram labels: Consumer, Industrial, IoT Gateway, Data Center, Data Analytics, Cloud
Sensors/Things
• To grow by 50X
• Drop in prices by 70% in last 5 years
• Data characteristics: unstructured, intermittent, volume & variety
Gateway
• Data routing
• Edge processing
• Edge storage
Data Storage, Processing & Analytics
• IoT data characteristics: more processing in the cloud, analytics on the cloud
• IoT data characteristics: distributed data processing, cloud & on-premise
• IoT data analytics: key to value creation, combine data from multiple sources & types, drive business insights
IoT Attributes
• Low powered devices, possibly battery powered
• Highly Distributed
• Gateway/controller, possibly on a mesh network
• Compact messages
IoT Challenges
• Multiple protocols (Z-Wave, Zigbee, Thread, etc.)
• Distributed, low-power devices may mean data arrives from multiple locations
• Devices may power off to save battery or move out of range of the controller, so late data must be handled
• Calibration between devices may be limited
• Very fast and bursty traffic
• Low bandwidth last mile
Use Cases
• Yes, Contrived
• But a good excuse to:
• Brew Beer
• Buy more sensors and microprocessors
• Sorry Wife
Use Case - Calibration
• Sensors need to continually be calibrated
• Calibration takes resources and down time
• Instead use historical raw data
• Calibrate on known values
• For temperature sensors use boiling point and triple point
• Temperature sensor is typically linear between these points
• Fit a curve instead
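The calibration idea above can be sketched in a few lines. This is a minimal illustration, not the presenter's code: the function names are hypothetical, the reference values (0.01 °C for the triple point of water, roughly 100 °C for boiling at standard atmospheric pressure) are assumptions, and a straight-line least-squares fit stands in for "fit a curve".

```python
# Assumed reference temperatures in degrees Celsius (not taken from the slides).
TRIPLE_POINT_C = 0.01    # triple point of water
BOILING_POINT_C = 100.0  # boiling point, assuming standard atmospheric pressure


def two_point_calibration(raw_triple, raw_boiling):
    """Return (gain, offset) so that calibrated = gain * raw + offset.

    raw_triple and raw_boiling are the sensor's raw readings taken at the
    two known reference temperatures.
    """
    gain = (BOILING_POINT_C - TRIPLE_POINT_C) / (raw_boiling - raw_triple)
    offset = TRIPLE_POINT_C - gain * raw_triple
    return gain, offset


def least_squares_line(raw, reference):
    """Fit reference = gain * raw + offset over many historical readings."""
    n = len(raw)
    mean_x = sum(raw) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in raw)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(raw, reference))
    gain = sxy / sxx
    return gain, mean_y - gain * mean_x


def apply_calibration(raw_value, gain, offset):
    """Convert one raw reading into a calibrated temperature."""
    return gain * raw_value + offset
```

With historical raw data already stored, the fit can be recomputed offline whenever new known-value readings arrive, without taking the sensor out of service.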
Use Case - Optimize Models
• A Kalman filter estimates a variable in the presence of noise
• Need to know accuracy of sensor
• Usually published by manufacturer but generalized
• Accuracy can degrade over time
• PID Controller
• 3 parameters control performance
• Parameters different for each application
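Both models above expose parameters that historical data can help tune. The sketch below is illustrative only: a scalar Kalman filter whose `meas_var` argument plays the role of the sensor accuracy, and a textbook PID controller with its three gains. All names and default values are assumptions, not taken from the talk.

```python
def kalman_1d(measurements, meas_var, process_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a slowly varying value.

    meas_var    -- variance of the sensor noise (the "accuracy of sensor")
    process_var -- how much the true value is assumed to drift per step
    x0, p0      -- initial estimate and its variance
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var      # predict: uncertainty grows between readings
        k = p / (p + meas_var)   # Kalman gain: trust in the new reading
        x = x + k * (z - x)      # update the estimate toward the reading
        p = (1.0 - k) * p        # uncertainty shrinks after the update
        estimates.append(x)
    return estimates


class PID:
    """Textbook PID controller with proportional, integral, derivative gains."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measured, dt):
        """Return the control output for one time step of length dt."""
        error = setpoint - measured
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0     # no previous sample on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

If the manufacturer's published accuracy is too general, `meas_var` could instead be estimated from stored raw readings taken at a known steady state, and the PID gains could be tuned per application by replaying historical data.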
Use Case - Predictive Maintenance
• No, not just for heavy machinery
• Sensors fail too
• Can save money by not replacing too early
• More importantly, schedule downtime
• Better model with more data: the same sensors in the same application across many factories
Technologies
• Apache Kafka
• Messaging Framework – Scalable, Fault Tolerant
• Pub/Sub
• Retains Data
• Apache Spark
• General Purpose Distributed Processing Framework
• Multiple Components including Streaming
• Streaming continually processes data
• Apache Kudu
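To make "Pub/Sub" and "Retains Data" concrete, here is a toy in-memory model of a retained log with per-consumer offsets. It is not Kafka's API; the class and method names are hypothetical and exist only to show why two independent consumers can each read the full stream.

```python
class RetainedLog:
    """Toy stand-in for one retained topic partition (not Kafka's API)."""

    def __init__(self):
        self._records = []   # records are kept after being read
        self._offsets = {}   # consumer group -> next offset to read

    def publish(self, record):
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the next batch for this group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Rewind or fast-forward a group, e.g. to replay after a failure."""
        self._offsets[group] = offset
```

Because reading does not remove records, a streaming job and a separate audit consumer can each see every sensor reading, and either can rewind and reprocess.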
Kudu for IoT
Why it matters
Kudu use cases
Kudu is best for use cases requiring a simultaneous combination of
sequential and random reads and writes
• Machine data analytics
• Examples: IoT, connected cars, network threat detection
• Workload: Inserts, scans, lookups
• Time series
• Examples: Streaming market data, fraud detection / prevention, risk monitoring
• Workload: Inserts, updates, scans, lookups
• Online reporting
• Example: Operational data store (ODS)
• Workload: Inserts, updates, scans, lookups
How would we build the Analytics System Today?
• HDFS Excels at:
• Full table scans
• Ad-hoc analytics
Pipeline: Sensors → Kafka / pub-sub (events) → Ingest → HDFS → Analyst
HDFS layout: today's partition, yesterday's partition, historic data
Ingest loop:
1. Have we accumulated enough data?
2. Flush into HDFS
Handling Late Arriving Data
HDFS (Storage)
• /cars/01-13/
• /cars/01-14/
• /cars/01-15/
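The buffer-then-flush ingest and the late-data problem can be sketched together. This is a hypothetical in-memory model: `PartitionedIngest`, the flush threshold, and the list standing in for HDFS writes are assumptions. Only the `/cars/MM-DD/` path layout comes from the slide.

```python
from collections import defaultdict
from datetime import datetime


class PartitionedIngest:
    """Buffer events per day partition and flush once enough accumulate."""

    def __init__(self, flush_threshold, root="/cars"):
        self.flush_threshold = flush_threshold
        self.root = root
        self.buffers = defaultdict(list)  # partition path -> pending records
        self.flushed = []                 # (path, records) stand-ins for HDFS files

    def partition_path(self, event_time):
        """Partition by the event's own timestamp, not its arrival time."""
        return f"{self.root}/{event_time:%m-%d}/"

    def add(self, event_time, payload):
        path = self.partition_path(event_time)
        self.buffers[path].append(payload)
        # Step 1: have we accumulated enough data?
        if len(self.buffers[path]) >= self.flush_threshold:
            self.flush(path)              # Step 2: flush into storage

    def flush(self, path):
        records = self.buffers.pop(path, [])
        if records:
            self.flushed.append((path, records))
```

A reading that arrives two days late lands in the buffer for an older partition, where it may never reach the threshold. It either waits or is flushed as a small extra file that later needs compaction, which is the pain point this slide illustrates.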
Hybrid big data analytics pipeline
Before Kudu
Pipeline: Sensors → Kafka / pub-sub (events) → Consumer → HBase (random reads)
HBase → snapshot & convert to Parquet → HDFS (storage) → Analytics → Analyst
Extra step: compact late-arriving data
Hybrid big data analytics pipeline
After Kudu
Pipeline: Sensors → Kafka / pub-sub (events) → Consumer → Kudu
Kudu → random reads and analytics → Analyst
Kudu supports a simultaneous combination of sequential and random reads and writes
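Why a single mutable, keyed store removes the snapshot and compaction steps can be shown with a toy table. This is not the Kudu client API; `KeyedTable` and the `(sensor_id, ts)` key are hypothetical, chosen only to show late rows and corrections landing in place while scans stay ordered.

```python
class KeyedTable:
    """Toy table keyed by (sensor_id, ts); not the Kudu client API."""

    def __init__(self):
        self._rows = {}

    def upsert(self, sensor_id, ts, value):
        """Insert a new row or overwrite the row with the same key."""
        self._rows[(sensor_id, ts)] = value

    def get(self, sensor_id, ts):
        """Random read of a single row by primary key."""
        return self._rows.get((sensor_id, ts))

    def scan(self, sensor_id, start_ts, end_ts):
        """Sequential read of one sensor's rows with start_ts <= ts < end_ts."""
        return [
            (ts, value)
            for (sid, ts), value in sorted(self._rows.items())
            if sid == sensor_id and start_ts <= ts < end_ts
        ]
```

A late reading is just another upsert: it becomes visible to the next scan without a separate batch job to merge it into older files.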
What Kudu is *NOT*
• Not a SQL interface itself
• It’s just the storage layer
• Not an application that runs on HDFS
• It’s an alternative, native Hadoop storage engine
• Not a replacement for HDFS or HBase
• Select the right storage for the right use case
Kudu Trade-Offs (vs HBase)
• Random updates will be slower
• HBase model allows random updates without incurring a disk seek
• Kudu requires a key lookup before update, Bloom lookup before insert
• Single-row reads may be slower
• Columnar design is optimized for scans
• Future: may introduce “column groups” for applications where single-row
access is more important
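The "Bloom lookup before insert" trade-off can be illustrated with a generic Bloom filter. The slides do not describe Kudu's internal implementation, so this is only a sketch of the general technique: the class, the sizes, and the `insert_row` helper are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def insert_row(existing_keys, bloom, key):
    """Insert key unless it already exists.

    The cheap Bloom check runs first; the (notionally expensive) exact
    lookup in existing_keys only happens when the filter says "maybe".
    """
    if bloom.might_contain(key) and key in existing_keys:
        return False  # duplicate primary key, reject the insert
    existing_keys.add(key)
    bloom.add(key)
    return True
```

For mostly new keys, as with time-series sensor data, the filter usually answers "definitely absent", so the exact lookup is skipped. An update must always locate the existing row, which fits the slide's point that random updates cost more than inserts.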
Demo

 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Último (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

IoT Connected Brewery

  • 1. © Cloudera, Inc. All rights reserved. Harnessing Data within Hadoop in the Connected Brewery: Kafka, Spark Streaming, and Kudu. Jason Hubbard, Cloudera (Jason.hubbard@cloudera.com)
  • 2. Internet of Things (IoT). IoT Markets - 2020: $1.7 trillion in value; 20% annual growth; 30 billion things; 250 million connected vehicles. Source: IDC & Gartner estimates.
  • 3. IoT Will Drive an Explosion of Data… Data expected to explode to 44 ZB by 2020 (44 trillion GB!). 80% of data will be unstructured. Source: IDC.
  • 4. Value is maximized when data is combined with other sources. The value of data is multiplied when you combine and correlate it with other data from relevant sources. 40%: improvement in value that can be unlocked by combining data from multiple IoT applications and sources. Interoperability would significantly improve performance by combining sensor data from different machines and systems to provide decision makers with an integrated view of performance. Source: McKinsey Global Institute analysis.
  • 5. The IoT Ecosystem. Diagram labels: Consumer, Industrial, Sensors/Things, IoT Gateway, Data Center, Cloud, Data Analytics. Sensors/Things • To grow by 50X • Drop in prices by 70% in last 5 years. Data Characteristics • Unstructured • Intermittent • Volume & variety. Gateway • Data routing • Edge processing • Edge storage. Data Storage, Processing & Analytics. IoT Data Characteristics • More processing in the cloud • Analytics on the cloud. IoT Data Characteristics • Distributed data processing • Cloud & on-premise. IoT Data Analytics • Key to value creation • Combine data from multiple sources & types • Drive business insights
  • 6. IoT Attributes • Low-powered devices, possibly battery powered • Highly distributed • Gateway/controller, possibly mesh network • Compact messages
  • 7. IoT Challenges • Multiple protocols (Z-Wave, Zigbee, Thread, etc.) • Distributed, low-power devices may mean data coming from multiple locations • Devices may power off to save battery or be away from the controller, so late data must be handled • Calibration between devices may be limited • Very fast and bursty traffic • Low-bandwidth last mile
  • 8. Use Cases • Yes, contrived • But a good excuse to: • Brew beer • Buy more sensors and microprocessors • Sorry, wife
  • 9. Use Case - Calibration • Sensors need to be calibrated continually • Calibration takes resources and downtime • Instead, use historical raw data • Calibrate on known values • For temperature sensors, use the boiling point and triple point • A temperature sensor is typically linear between these points • Fit a curve instead
  • 10. Use Case - Optimize Models • A Kalman filter is used to estimate a variable in the presence of noise • Need to know the accuracy of the sensor • Usually published by the manufacturer, but generalized • Accuracy can degrade over time • PID controller • Three parameters control performance • Parameters differ for each application
  • 11. Use Case - Predictive Maintenance • No, not just for heavy machinery • Sensors fail too • Can save money by not replacing too early • More importantly, schedule downtime • Better model with more data: sensors in the same application across many factories
  • 12. Technologies • Apache Kafka • Messaging framework: scalable, fault tolerant • Pub/sub • Retains data • Apache Spark • General-purpose distributed processing framework • Multiple components, including Streaming • Streaming continually processes data • Apache Kudu
  • 13. Kudu for IoT: Why it matters
  • 14. Kudu use cases. Kudu is best for use cases requiring a simultaneous combination of sequential and random reads and writes • Machine data analytics • Examples: IoT, connected cars, network threat detection • Workload: inserts, scans, lookups • Time series • Examples: streaming market data, fraud detection/prevention, risk monitoring • Workload: inserts, updates, scans, lookups • Online reporting • Example: operational data store (ODS) • Workload: inserts, updates, scans, lookups
  • 15. How would we build the analytics system today? • HDFS excels at: • Full table scans • Ad-hoc analytics. Diagram labels: Sensors, Kafka / pub-sub events, Ingest, Today’s Partition, Yesterday’s Partition, Historic Data, Analyst. Ingest steps: 1. Have we accumulated enough data? 2. Flush into HDFS
  • 16. Handling Late-Arriving Data. Diagram labels: HDFS (storage) with partitions /cars/01-13/, /cars/01-14/, /cars/01-15/
  • 17. Hybrid big data analytics pipeline, before Kudu. Diagram labels: Sensors, Kafka / pub-sub events, HBase, Consumer, HDFS (storage), Random Reads, Analyst, Analytics, Snapshot & Convert to Parquet, Compact late-arriving data
  • 18. Hybrid big data analytics pipeline, after Kudu. Diagram labels: Sensors, Kafka / pub-sub events, Kudu, Consumer, Random Reads, Analyst, Analytics. Kudu supports a simultaneous combination of sequential and random reads and writes
  • 19. What Kudu is *NOT* • Not a SQL interface itself • It’s just the storage layer • Not an application that runs on HDFS • It’s an alternative, native Hadoop storage engine • Not a replacement for HDFS or HBase • Select the right storage for the right use case
  • 20. Kudu Trade-Offs (vs HBase) • Random updates will be slower • HBase model allows random updates without incurring a disk seek • Kudu requires a key lookup before update, Bloom lookup before insert • Single-row reads may be slower • Columnar design is optimized for scans • Future: may introduce “column groups” for applications where single-row access is more important
  • 21. Demo
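Slides 9 and 10 describe calibrating a temperature sensor against known reference points and then estimating the true value in the presence of noise with a Kalman filter. The two steps might be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code from the talk: the raw readings, the variances, and every function and class name are hypothetical, and the boiling point is approximated as 100.0 °C at standard pressure.

```python
from dataclasses import dataclass

TRIPLE_POINT_C = 0.01    # triple point of water, in degrees Celsius
BOILING_POINT_C = 100.0  # boiling point at standard pressure (approximation)


def two_point_calibration(raw_at_triple, raw_at_boiling):
    """Return (gain, offset) mapping raw readings to degrees Celsius.

    Assumes the sensor is linear between the two reference points.
    """
    gain = (BOILING_POINT_C - TRIPLE_POINT_C) / (raw_at_boiling - raw_at_triple)
    offset = TRIPLE_POINT_C - gain * raw_at_triple
    return gain, offset


@dataclass
class ScalarKalman:
    """One-dimensional Kalman filter for a slowly changing value."""
    estimate: float     # current best estimate
    variance: float     # uncertainty of that estimate
    process_var: float  # how much the true value may drift per step (assumed)
    sensor_var: float   # sensor noise variance (assumed, e.g. from a datasheet)

    def update(self, measurement):
        # Predict: the value is modelled as constant, so only uncertainty grows.
        self.variance += self.process_var
        # Correct: blend prediction and measurement by their relative certainty.
        gain = self.variance / (self.variance + self.sensor_var)
        self.estimate += gain * (measurement - self.estimate)
        self.variance *= 1.0 - gain
        return self.estimate


# Hypothetical raw readings recorded at the two reference points.
gain, offset = two_point_calibration(raw_at_triple=1.2, raw_at_boiling=98.4)

# Hypothetical raw readings from a mash tun, calibrated and then smoothed.
raw_readings = [66.1, 65.7, 66.4, 65.9, 66.2]
calibrated = [gain * r + offset for r in raw_readings]
kf = ScalarKalman(estimate=calibrated[0], variance=1.0,
                  process_var=1e-4, sensor_var=0.25)
smoothed = [kf.update(c) for c in calibrated]
```

The `sensor_var` value is the "accuracy of the sensor" the slide says you need to know. If historical raw data at more than two known values is available, a fitted curve could replace the two-point line, which is what the last bullet on slide 9 suggests.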

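Slides 15 and 16 show date-partitioned HDFS directories such as /cars/01-13/ and raise the problem of late-arriving data. One way to picture the problem is to route each event by its own timestamp rather than by its arrival time, so that a late reading lands in an older partition that may already have been flushed. The sketch below assumes a hypothetical event shape, hypothetical sensor names and dates, and hypothetical helper names; only the /cars/MM-DD/ path layout comes from the slides.

```python
from collections import defaultdict
from datetime import date, datetime


def partition_path(event_time, root="/cars"):
    """Build a /cars/MM-DD/ style path from the event's own timestamp."""
    return f"{root}/{event_time:%m-%d}/"


def route_events(events, today):
    """Group events by partition path and report partitions touched by late data."""
    partitions = defaultdict(list)
    late_partitions = set()
    for event in events:
        path = partition_path(event["event_time"])
        partitions[path].append(event)
        # An event dated before today belongs to a partition that was
        # already considered complete, so that partition must be revisited.
        if event["event_time"].date() < today:
            late_partitions.add(path)
    return dict(partitions), late_partitions


# Hypothetical events; the second one arrives two days late.
events = [
    {"sensor": "fermenter-1", "event_time": datetime(2016, 1, 15, 9, 30), "temp_c": 19.8},
    {"sensor": "fermenter-1", "event_time": datetime(2016, 1, 13, 23, 55), "temp_c": 20.4},
    {"sensor": "fermenter-2", "event_time": datetime(2016, 1, 15, 9, 31), "temp_c": 18.9},
]
partitions, late = route_events(events, today=date(2016, 1, 15))
```

With files that are written once, every path in `late` would have to be rewritten or compacted, which appears to be the "Compact late-arriving data" step in the before-Kudu pipeline on slide 17.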
Editor's Notes

  1. Let's start by taking a look at the market potential for IoT. Billions of devices, including everything from cars, homes, airplanes, parking meters, factories, oil rigs, and heavy machinery to wearables, will be connected to the internet and, more importantly, will be interconnected, enabling businesses to work smarter, faster, and more profitably. The market potential for IoT means significant growth and some big numbers. By 2020 we are talking anywhere from 30 to 50 billion connected things, depending on who you talk to. There will be around a quarter billion connected cars on the roads. And it is estimated that IoT will generate ~1.7 trillion US dollars in value in the next 4-5 years, with an approximate growth rate of 20% YoY.
  2. Data is the key to IoT: all of the ability to gain insights comes out of this data. IoT isn’t just about the things or connecting these objects to the Internet; IoT is really going to be all about the data. With 30 billion things connected, IoT will drive an explosion of data. The amount of data on the planet is set to grow 10-fold to around 44 ZB. If you are wondering how much that is, it is about 44 trillion GB of data. Not only that, over 80% of that data is going to be unstructured or semi-structured. So the question then becomes: how can you effectively manage, store, process, analyze, and drive insights from all of the data that IoT is going to generate?
  3. Data coming in from just one sensor has value, but limited value. The real value of this data can be exploited by combining it with data from other IoT sensors or with internal and external data. For example, it is good to know from sensors that the brake pads in your car need to be replaced, but auto manufacturers are taking it to the next level. They want to combine that data with other data about the customer, including the make and model of the car, where the customer lives, and how he or she likes to shop, and then send a targeted offer to the customer: here is an offer for your brake pad change at your favorite body shop, and here is a coupon for 15% off your brake pad replacement service. McKinsey estimates that situations in which two or more IoT systems must work together can account for about 40 percent of the total value that can be unlocked by the Internet of Things. For example, interoperability would significantly improve performance by combining sensor data from different machines and systems to provide decision makers with an integrated view of performance across an entire factory or oil rig. While most use cases involve an immediate response (e.g., when a sensor detects a water leak), the bigger value may be in analyzing historical data or combining it with other data sets.
  4. There has been a 30-70% drop in the price of MEMS sensors in the past five years (McKinsey research). Data types are diverse, from intermittent sensor readings of temperature and pressure to real-time location data or live video streams for video analytics. Given the flexible, scalable nature of cloud-based infrastructure and the fact that machine data often originates off premises, we expect a lot of IoT data to be stored and processed in the cloud. The ideal IoT data platform can be deployed either on premises or in a public, hybrid, or private cloud environment. It should be possible to administer the platform via both a web-based interface and API calls. Gateways collect, aggregate, and optionally process the data generated by the devices. The gateway can also accept and route commands sent from the backend to the respective device. The gateway is responsible for authenticating and authorizing the devices to participate in the workflow. It ensures secure communication between the devices and the centralized command center. The gateway is capable of dealing with multiple protocols and data formats. In response to edge analytics: having access to all of your data is important, but with access comes responsibility, and you need a strategy about which data needs to be collected at the atomic level, which data needs to be rolled up and aggregated, and which data needs to be used to run your business. We are not saying that every bit of the data generated by every sensor needs to make its way back to the data center. For some data it might make sense to collect, store, interpret, and respond to it locally. You will have to decide what happens at the edge, at the core, and perhaps in between.
For example, rather than send all sensor data to a central location, an edge device or software solution may send a summary of the data or trigger an automatic alert based on a threshold-level status change. However, there are a few things you need to keep in mind: 1) you need to ensure you are not building up hundreds of different data silos that sit out there while you lack a centralized, comprehensive view of the business, which is a huge step backwards from both a business and an IT perspective; and 2) security and governance: do you want sensitive data and customer data sitting in thousands of edge sensors or gateways, significantly increasing your risk of a breach?
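The edge behavior described in the last note, sending a summary instead of every reading and alerting only on a threshold-level status change, might look roughly like the Python sketch below. The window of readings, the threshold, and the function names are assumptions made for illustration; nothing here is taken from a specific gateway product.

```python
from statistics import mean


def summarize_window(readings):
    """Reduce a window of raw readings to one compact summary record."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
    }


def threshold_alerts(readings, threshold):
    """Return an alert only when the status changes, not for every reading.

    Each alert is (index, new_status, value). The status starts as NORMAL.
    """
    alerts = []
    above = False
    for index, value in enumerate(readings):
        now_above = value > threshold
        if now_above != above:
            alerts.append((index, "HIGH" if now_above else "NORMAL", value))
            above = now_above
    return alerts


# Hypothetical fermenter temperatures (degrees Celsius) collected at the edge.
window = [18.5, 18.7, 19.1, 22.4, 22.9, 21.0, 19.0]
summary = summarize_window(window)                   # one record sent upstream
alerts = threshold_alerts(window, threshold=21.5)    # two status changes
```

Seven readings collapse into one summary record and two alerts, which is the bandwidth saving the note is after. The raw readings stay at the edge in this sketch, so the caveats in the note about data silos and sensitive data on edge devices still apply.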