"Effective data processing and streaming architectures for real-time IoT applications", Erik Schmiegelow, CEO at Hivemind
YouTube Link: https://www.youtube.com/watch?v=DrUYbJCRa6U
Watch more from Data Natives 2015 here: http://bit.ly/1OVkK2J
Visit the conference website to learn more: www.datanatives.io
About the Author:
Since 1996, Erik Schmiegelow has worked as a software architect and consultant, building large data processing platforms for companies such as NTT DoCoMo, Royal Mail, Siemens, E-Plus, Allianz and T-Mobile; until 2001 he was CTO at the Cologne-based digital agency denkwerk. In 2007 he founded the telecommunications consulting agency Itellity, followed by Hivemind Technologies in 2014. Hivemind Technologies is a solutions and services company focussed on big data analytics and stream processing technologies for web, social data and industrial applications. Erik studied computer science in Hamburg.
2. Agenda
I. Intro – IoT Use Cases
II. Challenges
III. Effective Architectures for IoT
3. I. Intro – IoT
IoT – More than a buzzword, and not that new, either.
- First implementation as early as 1982, in an internet-connected Coke vending machine at Carnegie Mellon University
- The rise of RFID, later NFC and other technologies popularised the concept in the late 1990s and 2000s
- Device numbers are exploding and are expected to reach 50 billion by 2020
4. I. Intro – Use Cases
IoT turns formerly unintelligent, isolated devices into something smart by interconnecting them:
- Smart wearables: sport bands, health sensors, etc.
- Smart homes: auto-sensing shades, bulbs, heating
- Smart transportation: vehicle and parking monitoring, dynamic routing and telematics are all powered by IoT devices
- Smart manufacturing: sensors in factories monitor supply chains, temperature and other metrics
5. I. Intro – Anatomy of IoT
IoT devices consist of two key elements:
I. A network interface, which enables communication with other devices and with a data processing centre
II. A computing device, which collects data from sensors
IoT devices are usually small embedded devices with minimal computing capacity and low energy requirements.
Smartphone apps are an exception, usually in NFC/beacon use cases (airports, malls, etc.)
6. II. Challenges
Since IoT devices can’t do much on their own, they must rely on a supporting data centre for data submission (unidirectional) and feedback (bidirectional).
Multiple devices will send their data at any time and in any order.
Devices will expect real-time updates in return.
Data synchronisation happens within the data centre.
7. II. Challenges – Time and Order
Data synchronisation in real time
Data centres must be able to handle incoming events that arrive unsorted and out of order.
Data centres must offer near-instantaneous processing and updates for IoT devices that expect feedback.
This mandates near-real-time, stream-oriented processing as offered by modern Big Data architectures.
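Stream processors commonly deal with out-of-order events using event-time watermarks. The following is a minimal Python sketch of that idea, assuming each event carries its own timestamp and that events arriving more than a fixed `allowed_lateness` behind the newest one may be set aside. `Event` and `ReorderBuffer` are hypothetical names chosen for illustration, not the API of any framework named in this talk.

```python
import heapq
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Event:
    # Events are ordered by timestamp only; device_id and value are ignored in comparisons.
    timestamp: float
    device_id: str = field(compare=False)
    value: float = field(compare=False)


class ReorderBuffer:
    """Buffers out-of-order events and releases them in timestamp order once the
    watermark (highest timestamp seen so far minus an allowed lateness) passes them."""

    def __init__(self, allowed_lateness: float) -> None:
        self.allowed_lateness = allowed_lateness
        self._heap: List[Event] = []
        self._max_seen = float("-inf")
        self.late: List[Event] = []  # events that arrived after the watermark had passed them

    def _watermark(self) -> float:
        return self._max_seen - self.allowed_lateness

    def push(self, event: Event) -> List[Event]:
        """Accept one event; return the events that are now safe to emit, oldest first."""
        if event.timestamp < self._watermark():
            # Too late to be placed in order: set aside for separate handling.
            self.late.append(event)
            return []
        heapq.heappush(self._heap, event)
        self._max_seen = max(self._max_seen, event.timestamp)
        ready = []
        while self._heap and self._heap[0].timestamp <= self._watermark():
            ready.append(heapq.heappop(self._heap))
        return ready

    def flush(self) -> List[Event]:
        """Emit everything still buffered, e.g. on shutdown."""
        ready = []
        while self._heap:
            ready.append(heapq.heappop(self._heap))
        return ready


buf = ReorderBuffer(allowed_lateness=2.0)
emitted = []
for ts in [1, 3, 2, 5, 4, 9]:  # arrival order differs from event-time order
    emitted += buf.push(Event(ts, "sensor-1", 0.0))
emitted += buf.flush()
print([e.timestamp for e in emitted])  # [1, 2, 3, 4, 5, 9]
```

The trade-off is latency against completeness: a larger allowed lateness puts more events in order, but delays every emitted result by that amount.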
8. III. Effective Architectures
The fundamental principles of an effective data processing architecture for IoT devices are:
1. Asynchronous processing of incoming events (Message Broker)
2. Constant, streamed computation of events (Stream Processing)
3. Near-real-time delivery of aggregated data sets (Data Store)
9. III. Message Brokering
Message Brokers provide an asynchronous queue for incoming messages. Producers write to the queue independently of the (possibly multiple) consumers that pick up the messages from it.
Typical broker implementations for IoT would be Apache Kafka for high throughput, or MQTT-compliant brokers.
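To illustrate how a broker decouples producers from consumers, here is a toy in-memory log loosely modeled on the append-only log with per-consumer-group offsets used by Kafka. It is an illustrative assumption only: `InMemoryLog`, `produce` and `poll` are hypothetical names, not Kafka's client API, and the class is neither thread-safe nor persistent.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class InMemoryLog:
    """Toy broker: each topic is an append-only list; each consumer group keeps its own offset."""

    def __init__(self) -> None:
        self._topics: Dict[str, List[bytes]] = defaultdict(list)
        self._offsets: Dict[Tuple[str, str], int] = defaultdict(int)

    def produce(self, topic: str, message: bytes) -> int:
        """Append a message and return its offset; the producer never waits for consumers."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def poll(self, topic: str, group: str, max_messages: int = 10) -> List[bytes]:
        """Return up to max_messages unread messages for this group and advance its offset."""
        log = self._topics[topic]
        start = self._offsets[(topic, group)]
        batch = log[start:start + max_messages]
        self._offsets[(topic, group)] = start + len(batch)
        return batch


broker = InMemoryLog()
for reading in (b"t=21.5", b"t=21.7", b"t=22.0"):
    broker.produce("sensor-readings", reading)

print(broker.poll("sensor-readings", "dashboard", max_messages=2))  # [b't=21.5', b't=21.7']
print(broker.poll("sensor-readings", "archiver"))  # all three messages
print(broker.poll("sensor-readings", "dashboard"))  # [b't=22.0']
```

Because each consumer group tracks its own offset, a new consumer can be added later and still read the full history without the producers being affected.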
10. III. Stream Processing
Incoming events are consumed by a stream processing pipeline that continuously aggregates them and computes results.
The results are stored in persistent storage.
Typical Stream Processing frameworks are Apache Spark, Apache Flink and Apache Storm.
[Diagram: Message Broker → Stream Processing → RT Storage]
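A common form of continuous aggregation is the tumbling (fixed-size, non-overlapping) time window. The sketch below assumes integer timestamps in seconds and uses a plain Python dict as a stand-in for the real-time storage; `TumblingWindowAverage` is a hypothetical name, and frameworks such as Spark or Flink provide their own windowing APIs instead.

```python
from collections import defaultdict
from typing import Dict, Tuple

WindowKey = Tuple[str, int]  # (device_id, window_start)


class TumblingWindowAverage:
    """Continuously maintains per-device averages over fixed, non-overlapping windows.
    `store` stands in for the real-time storage (e.g. a key-value store)."""

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        self._sums: Dict[WindowKey, float] = defaultdict(float)
        self._counts: Dict[WindowKey, int] = defaultdict(int)
        self.store: Dict[WindowKey, float] = {}

    def process(self, device_id: str, timestamp: int, value: float) -> None:
        # Assign the event to the window that contains its timestamp.
        window_start = timestamp - timestamp % self.window_seconds
        key = (device_id, window_start)
        self._sums[key] += value
        self._counts[key] += 1
        # Upsert the current aggregate so readers always see the latest state.
        self.store[key] = self._sums[key] / self._counts[key]


agg = TumblingWindowAverage(window_seconds=60)
for device, ts, temp in [("a", 5, 20.0), ("a", 30, 22.0), ("b", 10, 18.0), ("a", 65, 25.0)]:
    agg.process(device, ts, temp)
print(agg.store)  # {('a', 0): 21.0, ('b', 0): 18.0, ('a', 60): 25.0}
```

Writing the running aggregate on every event keeps the store current at the cost of more writes; the alternative of writing only when a window closes reduces writes but delays results.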
11. III. Near Real Time Delivery
Feedback to IoT devices is delivered either as immediate responses or through a REST interface; both are connected to the persistent storage written by the stream processing layer.
The type of storage depends heavily on the use case: caches, relational databases, key-value stores and other NoSQL databases are all valid choices.
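A REST interface over the store can be as small as a single GET route. The following sketch uses only Python's standard library; the `/devices/<device_id>` path layout, the `route` and `make_handler` helpers and the shape of the stored JSON are assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Dict, Tuple


def route(path: str, store: Dict[str, dict]) -> Tuple[int, str]:
    """Resolve GET /devices/<device_id> against the real-time store.
    Returns an HTTP status code and a JSON body."""
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "devices":
        state = store.get(parts[1])
        if state is not None:
            return 200, json.dumps(state)
    return 404, json.dumps({"error": "not found"})


def make_handler(store: Dict[str, dict]) -> type:
    """Bind the store to a minimal standard-library request handler class."""

    class DeviceStateHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            status, body = route(self.path, store)
            payload = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return DeviceStateHandler


store = {"thermostat-7": {"avg_temp": 21.0, "window_start": 0}}
print(route("/devices/thermostat-7", store))  # (200, '{"avg_temp": 21.0, "window_start": 0}')
print(route("/devices/unknown", store))  # (404, '{"error": "not found"}')
```

To serve it locally, `HTTPServer(("127.0.0.1", 8080), make_handler(store)).serve_forever()` from `http.server` would bind the handler to a port; keeping the routing in a pure function makes it testable without a network.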
12. III. Long Term Storage
In some use cases, raw device data must be kept in long-term storage, either for model calculation or for full recomputation.
In such instances, batch processing on top of inexpensive, scalable storage is the weapon of choice. Popular implementations are Spark or Flink on top of either Hadoop’s HDFS or linearly scalable NoSQL stores such as Cassandra, Elasticsearch or MongoDB.
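The batch side can be sketched as a job that rescans the full raw archive and rebuilds an aggregate from scratch, which is what makes recomputation possible after a bug fix or a model change. In this sketch a local JSON-lines file stands in for HDFS or a NoSQL store; the event fields (`device_id`, `ts` in seconds, `value`) and the function names are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, Tuple


def archive(events: Iterable[dict], path: str) -> None:
    """Append raw events as JSON lines; existing records are never modified."""
    with open(path, "a", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")


def recompute_daily_max(path: str) -> Dict[Tuple[str, int], float]:
    """Batch job: rescan the whole raw archive and rebuild the aggregate from scratch."""
    result: Dict[Tuple[str, int], float] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            e = json.loads(line)
            key = (e["device_id"], e["ts"] // 86400)  # (device, day number)
            result[key] = max(result.get(key, float("-inf")), e["value"])
    return result


with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw-events.jsonl")
    archive([{"device_id": "a", "ts": 100, "value": 3.5},
             {"device_id": "a", "ts": 90000, "value": 1.0},
             {"device_id": "a", "ts": 200, "value": 7.0}], raw_path)
    print(recompute_daily_max(raw_path))  # {('a', 0): 7.0, ('a', 1): 1.0}
```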
13. III. Architectural overview
[Architecture diagram. Components: IoT devices; Message Broker; Batch Layer / Long-Term Storage on HDFS data nodes; Apache Spark (YARN) Streaming Processing; Apache Cassandra; REST Interface]
14. III. Key Learnings
1. Decouple IoT event delivery using asynchronous Message Brokers
2. Continuously process incoming events with a streaming framework and update your state in a real-time store
3. Provide your devices with a lightweight query interface to that store
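The three learnings can be combined in a single-process sketch: device threads enqueue readings (an in-process `queue.Queue` substituting for a real Message Broker), one processor thread folds them into a running mean per device (standing in for the streaming framework and the real-time store), and `query` is the lightweight read path. All names here are hypothetical, and this only illustrates the decoupling; it is not a distributed design.

```python
import queue
import threading
from typing import Dict, List, Optional, Tuple

Reading = Tuple[str, float]  # (device_id, value)
STOP = None  # sentinel telling the processor to shut down

events: "queue.Queue[Optional[Reading]]" = queue.Queue()
latest: Dict[str, Tuple[int, float]] = {}  # device_id -> (count, running mean)
lock = threading.Lock()


def device(device_id: str, values: List[float]) -> None:
    # 1. Producers only enqueue; they never wait for processing to finish.
    for v in values:
        events.put((device_id, v))


def processor() -> None:
    # 2. A single consumer continuously folds events into the real-time state.
    while True:
        item = events.get()
        if item is STOP:
            break
        device_id, value = item
        with lock:
            count, mean = latest.get(device_id, (0, 0.0))
            count += 1
            mean += (value - mean) / count  # incremental mean update
            latest[device_id] = (count, mean)


def query(device_id: str) -> Optional[float]:
    # 3. Lightweight read path for devices: the current mean, if any.
    with lock:
        state = latest.get(device_id)
    return None if state is None else state[1]


consumer = threading.Thread(target=processor)
consumer.start()
producers = [threading.Thread(target=device, args=("a", [1.0, 2.0, 3.0])),
             threading.Thread(target=device, args=("b", [10.0, 20.0]))]
for p in producers:
    p.start()
for p in producers:
    p.join()
events.put(STOP)
consumer.join()
print(query("a"), query("b"), query("c"))  # 2.0 15.0 None
```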