In recent years, the number of connected devices and real-time data sources has grown explosively. As a result, data is produced continuously and at an accelerating rate. Businesses can no longer wait hours or days to use this data. To gain the most valuable insights, they must use it immediately so they can react quickly to new information. In this workshop, you learn how to take advantage of streaming data sources to analyze and react in near real time. You are presented with several requirements for a real-world streaming data scenario and tasked with creating a solution that satisfies them using services such as Amazon Kinesis, AWS Lambda, and Amazon SNS.
3. Data is produced continuously
• Mobile Apps
• Web Clickstream
• Application Logs
• Metering Records
• IoT Sensors
• Smart Buildings
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
5. Amazon Kinesis makes it easy to work with real-time streaming data
Amazon Kinesis Streams
• For technical developers
• Collect and stream data for ordered, replayable, real-time processing
Amazon Kinesis Firehose
• For all developers, data scientists
• Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, Amazon ES
Amazon Kinesis Analytics
• For all developers, data scientists
• Easily analyze data streams using standard SQL queries
6. Amazon Kinesis Streams
• Reliably ingest and durably store streaming data at low cost
• Build custom real-time applications to process streaming data
7. Amazon Kinesis Firehose
• Reliably ingest and deliver batched, compressed, and encrypted data to S3, Amazon Redshift, and Amazon ES
• Point-and-click setup with zero administration and seamless elasticity
8. Amazon Kinesis Analytics
• Interact with streaming data in real time using SQL
• Build fully managed and elastic stream processing applications that process data for real-time visualizations and alarms
9. Amazon Kinesis Data Producers
SDKs
• Publish directly from application code via PutRecord and PutRecords APIs
Kinesis Agent
• Tail log files and forward lines as messages to Kinesis Streams
Kinesis Producer Library (KPL)
• Background process aggregates and batches messages
• Producer application calls the KPL's addUserRecord method
Third-party and open source
• Log4j appender
• Flume and Fluentd source libraries
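As a rough illustration of the SDK path above, the following Python sketch batches events for the PutRecords API. It assumes a boto3-style Kinesis client is passed in, that payloads are JSON, and that events carry a `station_id` field to use as the partition key; the field name and the stream name in the usage note are hypothetical and not taken from the workshop.

```python
import json


def build_put_records_entries(events, key_field):
    """Turn dict events into the Records entries that PutRecords expects."""
    entries = []
    for event in events:
        entries.append({
            # Data must be bytes; JSON encoding is an assumption about the payload.
            "Data": json.dumps(event).encode("utf-8"),
            # Records that share a partition key are routed to the same shard.
            "PartitionKey": str(event[key_field]),
        })
    return entries


def chunked(entries, size=500):
    """Split entries into batches; PutRecords accepts at most 500 records per call."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def publish(client, stream_name, events, key_field="station_id"):
    """Send events with PutRecords and return how many records were rejected."""
    failed = 0
    for batch in chunked(build_put_records_entries(events, key_field)):
        response = client.put_records(StreamName=stream_name, Records=batch)
        # PutRecords is not all-or-nothing, so a caller should inspect
        # FailedRecordCount and retry the rejected records.
        failed += response.get("FailedRecordCount", 0)
    return failed
```

With boto3 installed and credentials configured, a call might look like `publish(boto3.client("kinesis"), "toll-events", events)`. Retrying rejected records is left out to keep the sketch short.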
10. Amazon Kinesis Data Consumers
Direct API access
• Custom application, using GetShardIterator and GetRecords APIs
• Application is responsible for shard processing, checkpointing, and handling reshard operations
Kinesis Client Library (KCL)
• Open source library, available in several languages
• Manages stream checkpointing
• Manages shard-worker relationships on reshard, or consumer instance scaling
AWS Lambda
• Serverless stream processing
• Lambda function is invoked only when messages exist on stream
• One Lambda function instance per shard
Third-party and open source
• Spark Streaming
• Storm Spout
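To make the direct API option above more concrete, here is a minimal Python sketch of a single-shard polling loop built on GetShardIterator and GetRecords. It assumes a boto3-style Kinesis client is passed in. The `handle` callback, the poll limit, and the in-memory checkpoint are illustrative choices; a production consumer would also need durable checkpoints and reshard handling, which is the work the KCL takes over.

```python
import time


def read_shard(client, stream_name, shard_id, handle, max_polls=10, idle_sleep=1.0):
    """Poll one shard and return the sequence number of the last record handled."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest retained record
    )["ShardIterator"]

    last_sequence = None
    for _ in range(max_polls):
        if iterator is None:
            break  # no next iterator: the shard was closed and is fully read
        response = client.get_records(ShardIterator=iterator, Limit=100)
        for record in response["Records"]:
            handle(record["Data"])
            # Only kept in memory here; a real consumer would persist this
            # checkpoint so it can resume after a restart.
            last_sequence = record["SequenceNumber"]
        iterator = response.get("NextShardIterator")
        if not response["Records"]:
            time.sleep(idle_sleep)  # back off when caught up with the stream
    return last_sequence
```

Passing the client in as a parameter keeps the loop easy to exercise against a stub before it is pointed at a real stream.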
13. Workshop Questions
1. Utilization: What is the busiest toll station?
2. Promotions: Who are the most active users?
3. Support: Detect failing toll sensors.
17. Connect to streaming source
• Streaming data sources include Kinesis Firehose or Kinesis Streams
• Input formats include JSON, CSV, variable column, and unstructured text
• Each input has a schema; the schema is inferred, but you can edit it
• Reference data sources (S3) for data enrichment
Amazon Kinesis Analytics Core Concepts
18. Write SQL code
• Build streaming applications with one or more SQL statements
• Robust SQL support and advanced analytic functions
• Extensions to the SQL standard to work seamlessly with streaming data
• Support for at-least-once processing semantics
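Windowed aggregation is a typical example of what streaming extensions to SQL make possible. The Python sketch below is not Kinesis Analytics SQL. It only mimics, over an in-memory list, what a tumbling-window count grouped by station would compute for the "busiest toll station" question. The `timestamp` (epoch seconds) and `station_id` field names are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Count events per window start and station, like a tumbling-window GROUP BY."""
    windows = defaultdict(Counter)
    for event in events:
        # Align each timestamp down to the start of its fixed, non-overlapping window.
        window_start = event["timestamp"] - event["timestamp"] % window_seconds
        windows[window_start][event["station_id"]] += 1
    return windows


def busiest_station_per_window(events, window_seconds=60):
    """Return {window_start: (station_id, count)} for the busiest station per window."""
    counts_by_window = tumbling_window_counts(events, window_seconds)
    return {
        start: counts.most_common(1)[0]
        for start, counts in sorted(counts_by_window.items())
    }
```

A streaming engine emits each window's result as the window closes instead of scanning a finished list, but the grouping logic is the same.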
19. Continuously deliver SQL results
• Send processed data to multiple destinations
• Amazon S3, Amazon Redshift, Amazon ES (through Firehose)
• Streams (with AWS Lambda integration for custom destinations)
• End-to-end processing latency as low as sub-second
• Separation of processing and data delivery
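The "Streams with AWS Lambda" destination can be sketched as a small Python Lambda handler that reads results from the output stream and hands each row to a custom destination. This is a sketch under assumptions: it assumes the application writes JSON rows, and the `deliver` callback is a hypothetical stand-in for whatever the destination is (for example, an SNS publish call).

```python
import base64
import json


def decode_kinesis_event(event):
    """Yield the JSON payload of each record in a Lambda Kinesis event."""
    for record in event["Records"]:
        # Lambda delivers the Kinesis record data base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        yield json.loads(payload)


def make_handler(deliver):
    """Build a Lambda handler that passes each decoded row to `deliver`."""
    def handler(event, context):
        delivered = 0
        for row in decode_kinesis_event(event):
            deliver(row)
            delivered += 1
        return {"delivered": delivered}
    return handler
```

One possible wiring, with a hypothetical `TOPIC_ARN`, is `handler = make_handler(lambda row: sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(row)))`, where `sns` is `boto3.client("sns")`. Keeping delivery behind a callback mirrors the separation of processing and data delivery noted on the slide.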