If you would like to know more about the AWS Chicago Summit, please use the following link to register: http://amzn.to/1RooPPL
Amazon Kinesis is a fully managed, cloud-based service for real-time data processing over large, distributed data streams. AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you. AWS Lambda can run code in response to data in Amazon Kinesis streams, making it easy to build big data applications that respond quickly to new information. In this webinar, we will cover key Kinesis and Lambda features, walk through sample use cases for stream processing, and discuss best practices on using the services together. We'll then demonstrate setting up an Amazon Kinesis stream and an associated Lambda function to capture and perform custom computations on click-stream data, all without setting up any infrastructure.
Learning Objectives: • Understand key Amazon Kinesis and AWS Lambda features • Learn how to set up a streaming data capture and processing framework using AWS Lambda • Learn sample use cases, best practices, and tips on using AWS Lambda with Amazon Kinesis
Who Should Attend: • Developers, DevOps Engineers, IT Operations Professionals
2. Amazon Kinesis: A managed service for streaming data ingestion and processing
[Architecture diagram]
• Millions of sources producing 100s of terabytes per hour
• A front end performs authentication and authorization
• Durable, highly consistent storage replicates data across three data centers (availability zones)
• An ordered stream of events supports multiple readers
• Downstream destinations: real-time dashboards and alarms; machine learning algorithms or sliding-window analytics; aggregate analysis in Hadoop or a data warehouse; aggregate and archive to S3
• Inexpensive: $0.028 per million puts
3. Benefits of Amazon Kinesis for stream data ingestion and continuous processing
Real-time ingest:
• Highly scalable
• Durable
• Elastic
• Replay-able reads
Continuous processing:
• Elastic
• Load balancing of incoming streams
• Fault tolerance with checkpoint / replay
• Multiple processing apps can run in parallel
• Data movement into stores / processing engines
Managed service with low end-to-end latency
4. AWS Lambda: A compute service that runs
your code in response to events
Lambda functions: Stateless, event-driven code execution
Triggered by events:
• Put to an Amazon S3 bucket
• Record in an Amazon Kinesis stream
• Direct sync and async invocations
Makes it easy to
• Build back-end services that perform at scale
• Perform data-driven auditing, analysis, and notification
5. High performance at any scale; cost-effective and efficient
“Productivity focused compute platform to build powerful, dynamic, modular applications in the cloud”
Benefits of AWS Lambda for building a serverless data processing engine:
1. No infrastructure to manage: Focus on business logic, not infrastructure. You upload code; AWS Lambda handles everything else.
2. Pay only for what you use: Lambda automatically matches capacity to your request rate. Purchase compute in 100 ms increments.
3. Bring your own code: Run code in a choice of standard languages. Use threads, processes, files, and shell scripts normally.
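The pay-per-use model described above comes down to simple arithmetic. Below is a minimal sketch, assuming the launch-era rates of $0.20 per million requests and $0.00001667 per GB-second (verify against current pricing), with duration billed in 100 ms increments:

```python
import math

def lambda_cost(invocations, avg_duration_ms, memory_mb,
                per_million_requests=0.20, per_gb_second=0.00001667):
    """Estimate a Lambda bill (rates are assumptions, not quotes)."""
    # Duration is rounded up to the next 100 ms increment.
    billed_ms = math.ceil(avg_duration_ms / 100) * 100
    # Compute is charged per GB-second of *configured* memory.
    gb_seconds = invocations * (billed_ms / 1000) * (memory_mb / 1024)
    request_cost = (invocations / 1_000_000) * per_million_requests
    return request_cost + gb_seconds * per_gb_second

# One million 100 ms invocations at the minimum 128 MB memory setting:
print(round(lambda_cost(1_000_000, 100, 128), 6))
```

Note that this sketch ignores the free tier (the first 1M requests per month), so treat it as an upper bound.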
6. What you can do with Kinesis+Lambda
Kinesis captures the stream; Lambda processes it (data input → Lambda action → data output):
• IT application activity → Audit → SNS
• Metering records → Condense → Redshift
• Change logs → Backup → S3
• Financial data → Store → RDS
• Transaction orders → Process → SQS
• Server health metrics → Monitor → EC2
• User clickstream → Analyze → EMR
• IoT device data → Respond → Backend endpoint
• Custom data → Custom action → Custom application
7. Today’s demo: Workflow of a simple real-time data analytics setup
[Workflow diagram: Amazon Kinesis → AWS Lambda → Amazon SNS / Amazon CloudWatch]
8. Create a different Lambda function for each task, associated with the same Kinesis stream:
• Log to CloudWatch Logs
• Push to SNS
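As a sketch of what the logging task might look like, here is a minimal handler (names are illustrative; the event shape follows the standard Kinesis record format, anything a function prints is captured in CloudWatch Logs, and the SNS task would instead call `publish` on a boto3 SNS client):

```python
import base64
import json

def lambda_handler(event, context):
    # Lambda hands the function a batch of Kinesis records; each
    # payload arrives base64-encoded under record["kinesis"]["data"].
    processed = 0
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # print output lands in CloudWatch Logs automatically.
        print("clickstream event:", payload)
        processed += 1
    return {"records_processed": processed}
```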
9. Demo: Real-time processing of Amazon Kinesis data streams with AWS Lambda
10. Things to remember when creating a Kinesis
stream
• Streams are made of Shards
• Each Shard ingests data up to 1MB/sec
• Each Shard emits data up to 2MB/sec
• All data is stored for 24 hours; you can replay data inside that 24-hour window
• A Partition Key is supplied by the producer and used to distribute PUTs across Shards
• A unique Sequence Number is returned to the producer upon a successful PUT call
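The partition-key mechanics can be illustrated in a few lines: Kinesis takes the MD5 hash of the partition key and maps the 128-bit result onto the shards' hash-key ranges. A sketch assuming evenly split shards:

```python
import hashlib

def shard_for_key(partition_key: str, shard_count: int) -> int:
    # The MD5 hash of the partition key, read as a 128-bit integer;
    # with evenly split shards, each shard owns an equal slice of
    # that hash-key space.
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    slice_size = 2 ** 128 // shard_count
    return min(hash_value // slice_size, shard_count - 1)
```

Because the hash is deterministic, records with the same partition key always land on the same shard, which is what makes per-shard ordering possible.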
11. Attaching a Lambda function to a Kinesis stream
• Shards: One Lambda function is invoked concurrently per Kinesis shard
• Increasing shards causes more Lambda functions to be invoked concurrently
• Each individual shard is processed in order
[Diagram: Source → Kinesis shards → pollers → Lambda functions → destinations 1 and 2. Scale Kinesis by adding shards; Lambda scales automatically.]
12. Performance tuning Kinesis as an event source
• Batch size: Number of records that AWS
Lambda will retrieve from Kinesis at the
time of invoking your function
• Increasing batch size will cause fewer
Lambda function invocations with more
data processed per function
• Starting Position: The position in the stream where Lambda starts reading
• Set to “Trim Horizon” to process all retained records in order, oldest first (FIFO)
• Set to “Latest” to read only the most recent data, starting from the tip of the stream
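Both knobs live on the event source mapping itself. A sketch using the AWS CLI (the function name and stream ARN are placeholders):

```shell
# Larger --batch-size => fewer invocations, more records per call.
# TRIM_HORIZON reads from the oldest retained record; LATEST from the tip.
aws lambda create-event-source-mapping \
  --function-name my-clickstream-processor \
  --event-source-arn arn:aws:kinesis:us-east-1:123456789012:stream/my-stream \
  --batch-size 100 \
  --starting-position TRIM_HORIZON
```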
13. Best practices for creating Lambda functions
• Memory: CPU and disk are allocated in proportion to the memory configured
• Increasing memory makes your code execute faster (if CPU-bound)
• Increasing memory allows larger record sizes to be processed
• Timeout: Increasing the timeout allows longer-running functions, but means a longer wait in case of errors
• Retries: For Kinesis, Lambda retries without limit (until the data expires)
• Permission model: Lambda pulls data from Kinesis, so no invocation role is needed, only an execution role
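Memory and timeout are per-function settings; a CLI sketch (the function name and values are illustrative):

```shell
# More memory buys proportionally more CPU; a longer timeout permits
# longer-running functions at the cost of waiting longer on failures.
aws lambda update-function-configuration \
  --function-name my-clickstream-processor \
  --memory-size 512 \
  --timeout 60
```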
14. Monitoring and Debugging Lambda functions
• Monitoring: available in Amazon CloudWatch Metrics
• Invocation count
• Duration
• Error count
• Throttle count
• Debugging: available in Amazon CloudWatch Logs
• All Metrics
• Custom logs
• RAM consumed
• Search for log events
15. Customers running real-time data stream
processing on Kinesis+Lambda
[Diagram: Kinesis stream → AWS Lambda → aggregate statistics / real-time analytics]
“I want to apply custom logic to process data being uploaded through my Kinesis stream.”
• Client activity tracking
• Metrics generation
• Data cleansing
• Log filtering
• Indexing and searching
• Log routing
• Live alarms and notifications
16. Three Next Steps
1. Create your first Kinesis stream. You can configure hundreds of
thousands of data producers to continuously put data into an
Amazon Kinesis stream. For example, data from website
clickstreams, application logs, and social media feeds.
2. Create and test your first Lambda function. With AWS Lambda,
there are no new languages, tools, or frameworks to learn. You can
use any third party library, even native ones. And the first 1M
requests each month are on us!
3. Use AWS Lambda to process Amazon Kinesis streams … no infrastructure to manage, and you can set up real-time analytics in minutes!
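Step 1 can be sketched with the AWS CLI (the stream name and payload are placeholders; note that AWS CLI v2 additionally needs `--cli-binary-format raw-in-base64-out` to pass raw `--data`):

```shell
# One shard handles 1 MB/sec in and 2 MB/sec out; the put-record call
# returns the shard ID and a unique sequence number.
aws kinesis create-stream --stream-name my-first-stream --shard-count 1
aws kinesis put-record \
  --stream-name my-first-stream \
  --partition-key user-42 \
  --data '{"page": "/home"}'
```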
17. AWS Summit – Chicago: An exciting, free cloud conference designed to educate and inform new
customers about the AWS platform, best practices and new cloud services.
Details
• July 1, 2015
• Chicago, Illinois
• @ McCormick Place
Featuring
• New product launches
• 36+ sessions, labs, and bootcamps
• Executive and partner networking
Registration is now open
• Come and see what AWS and the cloud can do for you.
18. - If you are interested in learning more about how to navigate the cloud to grow
your business - then attend the AWS Summit Chicago, July 1st.
- Register today to learn from technical sessions led by AWS engineers, hear best
practices from AWS customers and partners, and participate in some of the 30+
paid sessions and labs.
- Simply go to https://aws.amazon.com/summits/chicago/?trkcampaign=summit_chicago_bootcamps&trk=Webinar_slide to register today.
- Registration is FREE.
19. Thank you!
Visit http://aws.amazon.com/kinesis,
the AWS Big Data blog, and the
Kinesis forum to learn more and get
started using Kinesis.
Visit http://aws.amazon.com/lambda,
the AWS Compute blog, and the
Lambda forum to learn more and
get started using Lambda.