Supercharge Your Big Data Infrastructure
with Amazon Kinesis:
Learn to Build Real-time Streaming Big Data Processing Applications
Marvin Theimer, AWS
November 15, 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Amazon Kinesis
Managed Service for Real-Time Processing of Big Data
[Architecture diagram: data sources send records to an AWS endpoint, which writes them to a stream of shards (Shard 1, Shard 2, … Shard N) spanning three Availability Zones. Four applications read the stream: App.1 (Aggregate & De-Duplicate), App.2 (Metric Extraction), App.3 (Sliding Window Analysis), and App.4 (Machine Learning). The data stores shown are S3, DynamoDB, and Redshift.]
Talk outline
• A tour of Kinesis concepts in the context of a
Twitter Trends service
• Implementing the Twitter Trends service

• Kinesis in the broader context of a Big Data ecosystem
A tour of Kinesis concepts in
the context of a Twitter Trends
service
twitter-trends.com website
[Diagram: twitter-trends.com running on Elastic Beanstalk]
Too big to handle on one box
The solution: streaming map/reduce
[Diagram: several workers each compute a local "My top-10" list; the local lists are merged into a "Global top-10" list for twitter-trends.com]
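The streaming map/reduce idea can be sketched in plain Java, with no Kinesis dependency. This is only an illustration under stated assumptions: the class and method names (`StreamingTopN`, `localTop`, `globalTop`, `topN`) are hypothetical, tweets are plain strings, and each hashtag is assumed to be counted by exactly one worker, so merging the local lists gives an exact global list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StreamingTopN {
    /** "Map" step: one worker counts the hashtags it sees and keeps its n most frequent. */
    static Map<String, Integer> localTop(List<String> tweets, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tweet : tweets) {
            for (String word : tweet.split("\\s+")) {
                if (word.startsWith("#")) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return topN(counts, n);
    }

    /** "Reduce" step: combine the workers' local lists and keep the n most frequent overall. */
    static Map<String, Integer> globalTop(List<Map<String, Integer>> localLists, int n) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> local : localLists) {
            local.forEach((tag, count) -> total.merge(tag, count, Integer::sum));
        }
        return topN(total, n);
    }

    /** Returns the n largest entries, highest count first, ties broken alphabetically. */
    static Map<String, Integer> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        Map<String, Integer> top = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(n, entries.size()))) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }
}
```

If a hashtag could reach more than one worker, merging truncated local lists would only approximate the global ranking; routing each topic to a single worker avoids that.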
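Core concepts
[Diagram: data records, each carrying a partition key, flow through a stream into a shard. Within the shard each data record has a sequence number (14, 17, 18, 21, 23 in the example). A worker processes the shard and produces a "My top-10" list, which feeds the "Global top-10" list on twitter-trends.com.]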
How this relates to Kinesis
[Diagram: the same picture with two parts labeled: Kinesis, and the Kinesis application that feeds twitter-trends.com]
Core Concepts recapped
• Data record ~ tweet
• Stream ~ all tweets (the Twitter Firehose)
• Partition key ~ Twitter topic (every tweet record belongs to exactly one of these)
• Shard ~ all the data records belonging to a set of Twitter topics that will get grouped together
• Sequence number ~ assigned to each data record when it is first ingested
• Worker ~ processes the records of a shard in sequence number order
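To make the partition key to shard relationship concrete, here is a minimal sketch. Kinesis hashes each partition key with MD5 into a 128-bit integer and assigns every shard a range of that hash space. The sketch assumes the hash space is split evenly among the shards, which need not hold after resharding, and the names `PartitionKeyRouter`, `hashKey`, and `shardFor` are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class PartitionKeyRouter {
    /** MD5 of the partition key, read as an unsigned 128-bit integer. */
    static BigInteger hashKey(String partitionKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1 => treat the bytes as unsigned
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    /**
     * Index of the shard that owns the key, ASSUMING shardCount shards that each
     * own an equal, contiguous slice of the 2^128 hash space.
     */
    static int shardFor(String partitionKey, int shardCount) {
        // floor(hash * shardCount / 2^128) is always in [0, shardCount)
        return hashKey(partitionKey)
                .multiply(BigInteger.valueOf(shardCount))
                .shiftRight(128)
                .intValueExact();
    }
}
```

Because the mapping depends only on the key, every tweet for a given topic lands in the same shard and is therefore seen by the same worker.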
Using the Kinesis API directly
twitter-trends.com

Shard worker (reads one shard from Kinesis):
    iterator = getShardIterator(shardId, LATEST);
    while (true) {
        [records, iterator] = getNextRecords(iterator, maxRecsToReturn);
        process(records);
    }

    process(records): {
        for (record in records) {
            updateLocalTop10(record);
        }
        if (timeToDoOutput()) {
            writeLocalTop10ToDDB();
        }
    }

Global top-10 aggregator:
    while (true) {
        localTop10Lists = scanDDBTable();
        updateGlobalTop10List(localTop10Lists);
        sleep(10);
    }
Challenges with using the Kinesis API directly
• Manual creation of workers and assignment to shards
• How many workers per EC2 instance?
• How many EC2 instances?
[Diagram: Kinesis, the Kinesis application, and twitter-trends.com]
Using the Kinesis Client Library
[Diagram: the Kinesis application reads from Kinesis and uses a shard mgmt table; results feed twitter-trends.com]
Elasticity and load balancing using Auto Scaling groups
[Diagram: the application's hosts run in an Auto Scaling group, reading from Kinesis and sharing the shard mgmt table]
Dynamic growth and hot spot management through stream resharding
[Diagram: same layout, with the Auto Scaling group, Kinesis, and the shard mgmt table]
The challenges of fault tolerance
[Diagram: same layout, with three failures marked X]
Fault tolerance support in KCL
[Diagram: the application spans Availability Zones (1 and 3 are labeled); one failure is marked X]
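Checkpoint, replay design pattern
[Diagram: Host A reads Shard-i from Kinesis (sequence numbers 0, 2, 3, 5, 8, 10, 14, 17, 18, 21, 23) and periodically saves its local top-10, for example {#Obama: 10235, #Congress: 9910, …}, in a table keyed by Shard ID. A second table with columns Shard ID, Lock, and Seq num records which host holds the lock on Shard-i and the last checkpointed sequence number. Host A fails (X); Host B acquires the lock and resumes Shard-i from the checkpointed sequence number.]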
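The checkpoint and replay pattern can be simulated in memory. The sketch below is not the KCL implementation: `Store` is a hypothetical stand-in for the durable tables, `Rec` for a data record, and the crash is simulated by returning early. It shows that a second host resuming from the last checkpoint ends with the same counts as an uninterrupted run.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CheckpointReplay {
    /** Durable state shared by all hosts (hypothetical stand-in for the tables on the slide). */
    static final class Store {
        long checkpointSeq = -1;                             // last sequence number covered
        Map<String, Integer> savedCounts = new HashMap<>();  // state as of that checkpoint
    }

    /** A data record: sequence number plus one hashtag. */
    static final class Rec {
        final long seq;
        final String tag;
        Rec(long seq, String tag) { this.seq = seq; this.tag = tag; }
    }

    /**
     * Processes the records after the stored checkpoint, saving state and checkpoint
     * together every `interval` records. Returns early, without saving, once
     * `crashAfter` records have been processed (Integer.MAX_VALUE means no crash).
     */
    static void run(List<Rec> shard, Store store, int interval, int crashAfter) {
        long start = store.checkpointSeq;
        Map<String, Integer> counts = new HashMap<>(store.savedCounts); // restore saved state
        long lastSeq = start;
        int processed = 0;
        for (Rec r : shard) {
            if (r.seq <= start) continue;        // already covered by the checkpoint
            if (processed == crashAfter) return; // crash: work since the last checkpoint is lost
            counts.merge(r.tag, 1, Integer::sum);
            lastSeq = r.seq;
            processed++;
            if (processed % interval == 0) {
                store.savedCounts = new HashMap<>(counts);
                store.checkpointSeq = lastSeq;
            }
        }
        // End of input: save whatever has accumulated since the last checkpoint.
        store.savedCounts = new HashMap<>(counts);
        store.checkpointSeq = lastSeq;
    }
}
```

Records between the last checkpoint and the crash are processed twice, once by each host, so the saved state and the checkpoint have to advance together for the replay to be harmless.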
Catching up from delayed processing
[Diagram: Host A has failed (X); Host B takes over Shard-i from Kinesis and works through the backlog of records (sequence numbers 2 through 23)]
How long until we've caught up?
Shard throughput SLA:
- 1 MBps in
- 2 MBps out
=> Catch-up time ~ down time
Unless your application can't process 2 MBps on a single host
=> Provision more shards
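The "catch-up time ~ down time" rule follows from simple arithmetic: while a worker is down, a backlog builds at the ingest rate, and once it restarts the backlog shrinks at the read rate minus the ingest rate. A small sketch, with the hypothetical names `CatchUp` and `catchUpSeconds`, and assuming the shard stays at its full ingest rate throughout:

```java
final class CatchUp {
    /**
     * Seconds needed to drain the backlog built up during an outage.
     * Returns positive infinity when the consumer is no faster than the producer,
     * because the backlog then never shrinks.
     */
    static double catchUpSeconds(double downtimeSeconds, double ingestMBps, double consumeMBps) {
        if (consumeMBps <= ingestMBps) {
            return Double.POSITIVE_INFINITY;
        }
        double backlogMB = downtimeSeconds * ingestMBps; // data that arrived while down
        double netDrainMBps = consumeMBps - ingestMBps;  // new data keeps arriving during catch-up
        return backlogMB / netDrainMBps;
    }
}
```

With the per-shard figures on the slide (1 MBps in, 2 MBps out), a 600-second outage takes 600 seconds to clear. A worker that can only process 1.5 MBps would need 1200 seconds, which is why spreading the load over more shards shortens recovery.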
Concurrent processing applications
[Diagram: two applications, twitter-trends.com and spot-the-revolution.org, read the same Shard-i from Kinesis concurrently, each at its own position in the sequence of records (0 through 23)]
Why not combine applications?
[Diagram: two applications read from Kinesis]
• twitter-trends.com: output state & checkpoint every 10 secs
• twitter-stats.com: output state & checkpoint every hour
Implementing twitter-trends.com
Creating a Kinesis Stream
How many shards?
Additional considerations:
- How much processing is needed per data record/data byte?
- How quickly do you need to catch up if one or more of your workers stalls or fails?
You can always dynamically reshard to adjust to changing workloads
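A first estimate of the shard count can be derived from the per-shard throughput figures quoted earlier in the talk (1 MBps in, 2 MBps out). The sketch below is a rough starting point, not an official sizing formula: `ShardSizing` and `shardsNeeded` are hypothetical names, and the read figure is assumed to be the combined read rate of all applications consuming the stream.

```java
final class ShardSizing {
    static final double WRITE_MBPS_PER_SHARD = 1.0; // "1 MBps in" from the talk
    static final double READ_MBPS_PER_SHARD = 2.0;  // "2 MBps out" from the talk

    /** Smallest shard count that covers both the peak write and the peak read rate. */
    static int shardsNeeded(double peakWriteMBps, double peakReadMBps) {
        int forWrites = (int) Math.ceil(peakWriteMBps / WRITE_MBPS_PER_SHARD);
        int forReads = (int) Math.ceil(peakReadMBps / READ_MBPS_PER_SHARD);
        return Math.max(1, Math.max(forWrites, forReads)); // a stream has at least one shard
    }
}
```

For example, 3.2 MBps of writes and 12 MBps of combined reads gives 6 shards, because reads are the tighter limit. The considerations listed above (processing cost per record and catch-up time) can push the number higher.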
Twitter Trends Shard Processing Code
public class TwitterTrendsShardProcessor implements IRecordProcessor {
    public TwitterTrendsShardProcessor() { … }

    // Called once, before any records from this shard are delivered.
    @Override
    public void initialize(String shardId) { … }

    // Called repeatedly with batches of records from the shard.
    @Override
    public void processRecords(List<Record> records,
            IRecordProcessorCheckpointer checkpointer) { … }

    // Called when this processor stops handling the shard.
    @Override
    public void shutdown(IRecordProcessorCheckpointer checkpointer,
            ShutdownReason reason) { … }
}
Twitter Trends Shard Processing Code
public class TwitterTrendsShardProcessor implements IRecordProcessor {
    // Hashtag -> occurrences since the last checkpoint.
    private Map<String, AtomicInteger> hashCount = new HashMap<>();

    // Tweets processed since the last checkpoint.
    private long tweetsProcessed = 0;

    @Override
    public void processRecords(List<Record> records,
            IRecordProcessorCheckpointer checkpointer) {
        computeLocalTop10(records);
        // Count every record in the batch, not one per call.
        tweetsProcessed += records.size();
        if (tweetsProcessed >= 2000) {
            emitToDynamoDB(checkpointer);
        }
    }
}
Twitter Trends Shard Processing Code
    private void computeLocalTop10(List<Record> records) {
        for (Record r : records) {
            String tweet = new String(r.getData().array());
            // Split on spaces and tabs.
            String[] words = tweet.split("[ \t]+");
            for (String word : words) {
                if (word.startsWith("#")) {
                    if (!hashCount.containsKey(word)) hashCount.put(word, new AtomicInteger(0));
                    hashCount.get(word).incrementAndGet();
                }
            }
        }
    }

    private void emitToDynamoDB(IRecordProcessorCheckpointer checkpointer) {
        // Persist the counts first, then record how far this worker has read.
        persistMapToDynamoDB(hashCount);
        try {
            checkpointer.checkpoint();
        } catch (IOException | KinesisClientLibDependencyException | InvalidStateException
                | ThrottlingException | ShutdownException e) {
            // Error handling
        }
        // Start a fresh counting window.
        hashCount = new HashMap<>();
        tweetsProcessed = 0;
    }
Kinesis as a gateway into the Big Data ecosystem
Leveraging the Big Data ecosystem
[Diagram: three consumers are shown: a Twitter trends anonymized archive, Twitter trends statistics, and the Twitter trends processing application behind twitter-trends.com]
Connector framework
public interface IKinesisConnectorPipeline<T> {
    IEmitter<T> getEmitter(KinesisConnectorConfiguration configuration);
    IBuffer<T> getBuffer(KinesisConnectorConfiguration configuration);
    ITransformer<T> getTransformer(KinesisConnectorConfiguration configuration);
    IFilter<T> getFilter(KinesisConnectorConfiguration configuration);
}
Transformer & Filter interfaces
public interface ITransformer<T> {
    public T toClass(Record record) throws IOException;
    public byte[] fromClass(T record) throws IOException;
}

public interface IFilter<T> {
    public boolean keepRecord(T record);
}
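As an illustration of how small a filter can be, here is a sketch that keeps only tweets mentioning at least one hashtag. To stay self-contained it declares its own copy of `IFilter` with the signature shown on the slide and filters plain strings. `HashtagFilter` is a hypothetical name, and a real pipeline would filter its own record type instead.

```java
// Local copy of the interface from the slide, so the sketch compiles on its own.
interface IFilter<T> {
    boolean keepRecord(T record);
}

/** Keeps only tweets that contain at least one hashtag. */
final class HashtagFilter implements IFilter<String> {
    @Override
    public boolean keepRecord(String tweet) {
        if (tweet == null) {
            return false;
        }
        for (String word : tweet.split("\\s+")) {
            // A lone "#" is not a hashtag, so require at least one more character.
            if (word.length() > 1 && word.startsWith("#")) {
                return true;
            }
        }
        return false;
    }
}
```

Records rejected by the filter never reach the buffer, so filtering early keeps the downstream stages small.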
Tweet transformer for Amazon S3 archival
public class TweetTransformer implements ITransformer<TwitterRecord> {
    private final JsonFactory fact =
            JsonActivityFeedProcessor.getObjectMapper().getJsonFactory();

    @Override
    public TwitterRecord toClass(Record record) {
        try {
            return fact.createJsonParser(
                    record.getData().array()).readValueAs(TwitterRecord.class);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
Tweet transformer (cont.)
    @Override
    public byte[] fromClass(TwitterRecord r) {
        // yyyy = calendar year, mm = minutes, ss = seconds.
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        StringBuilder b = new StringBuilder();
        b.append(r.getActor().getId()).append("|")
                .append(sanitize(r.getActor().getDisplayName())).append("|")
                .append(r.getActor().getFollowersCount()).append("|")
                .append(r.getActor().getFriendsCount()).append("|")
                ...
                .append('\n');
        // Strip runs of three Unicode replacement characters.
        return b.toString().replaceAll("\ufffd\ufffd\ufffd", "").getBytes();
    }
}
Buffer interface
public interface IBuffer<T> {
    public long getBytesToBuffer();
    public long getNumRecordsToBuffer();
    public boolean shouldFlush();
    public void consumeRecord(T record, int recordBytes, String sequenceNumber);

    public void clear();
    public String getFirstSequenceNumber();
    public String getLastSequenceNumber();

    public List<T> getRecords();
}
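One plausible way to satisfy this interface is a buffer that asks to be flushed once either a record-count or a byte-count threshold is reached. The sketch below declares a local copy of `IBuffer` with the signatures from the slide so that it compiles on its own. `ThresholdBuffer` is a hypothetical name and this is not the connector library's own implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local copy of the interface from the slide, so the sketch compiles on its own.
interface IBuffer<T> {
    long getBytesToBuffer();
    long getNumRecordsToBuffer();
    boolean shouldFlush();
    void consumeRecord(T record, int recordBytes, String sequenceNumber);
    void clear();
    String getFirstSequenceNumber();
    String getLastSequenceNumber();
    List<T> getRecords();
}

/** Buffers records until either the byte limit or the record limit is reached. */
final class ThresholdBuffer<T> implements IBuffer<T> {
    private final long bytesToBuffer;
    private final long numRecordsToBuffer;
    private final List<T> records = new ArrayList<>();
    private long byteCount = 0;
    private String firstSequenceNumber;
    private String lastSequenceNumber;

    ThresholdBuffer(long bytesToBuffer, long numRecordsToBuffer) {
        this.bytesToBuffer = bytesToBuffer;
        this.numRecordsToBuffer = numRecordsToBuffer;
    }

    @Override public long getBytesToBuffer() { return bytesToBuffer; }
    @Override public long getNumRecordsToBuffer() { return numRecordsToBuffer; }

    @Override
    public boolean shouldFlush() {
        return records.size() >= numRecordsToBuffer || byteCount >= bytesToBuffer;
    }

    @Override
    public void consumeRecord(T record, int recordBytes, String sequenceNumber) {
        if (records.isEmpty()) {
            firstSequenceNumber = sequenceNumber; // start of the buffered range
        }
        lastSequenceNumber = sequenceNumber;      // end of the buffered range
        records.add(record);
        byteCount += recordBytes;
    }

    @Override
    public void clear() {
        records.clear();
        byteCount = 0;
        firstSequenceNumber = null;
        lastSequenceNumber = null;
    }

    @Override public String getFirstSequenceNumber() { return firstSequenceNumber; }
    @Override public String getLastSequenceNumber() { return lastSequenceNumber; }
    @Override public List<T> getRecords() { return Collections.unmodifiableList(records); }
}
```

Tracking the first and last sequence numbers lets the caller name or checkpoint the flushed batch by the exact range of records it contains.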
Tweet aggregation buffer for Amazon Redshift
...
    @Override
    public void consumeRecord(TwitterAggRecord record, int recordSize, String sequenceNumber) {
        // Track the range of sequence numbers covered by the buffer.
        if (buffer.isEmpty()) {
            firstSequenceNumber = sequenceNumber;
        }
        lastSequenceNumber = sequenceNumber;
        TwitterAggRecord agg = null;
        // Keep one aggregate per hashtag, created the first time the hashtag is seen.
        if (!buffer.containsKey(record.getHashTag())) {
            agg = new TwitterAggRecord();
            buffer.put(record.getHashTag(), agg);
            byteCount.addAndGet(agg.getRecordSize());
        }
        agg = buffer.get(record.getHashTag());
        agg.aggregate(record);
    }
Tweet Amazon Redshift aggregation transformer
public class TweetTransformer implements ITransformer<TwitterAggRecord> {
    private final JsonFactory fact =
            JsonActivityFeedProcessor.getObjectMapper().getJsonFactory();

    @Override
    public TwitterAggRecord toClass(Record record) {
        try {
            return new TwitterAggRecord(
                    fact.createJsonParser(
                            record.getData().array()).readValueAs(TwitterRecord.class));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
Tweet aggregation transformer (cont.)
    @Override
    public byte[] fromClass(TwitterAggRecord r) {
        // yyyy = calendar year, mm = minutes, ss = seconds.
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        StringBuilder b = new StringBuilder();
        b.append(r.getActor().getId()).append("|")
                .append(sanitize(r.getActor().getDisplayName())).append("|")
                .append(r.getActor().getFollowersCount()).append("|")
                .append(r.getActor().getFriendsCount()).append("|")
                ...
                .append('\n');
        // Strip runs of three Unicode replacement characters.
        return b.toString().replaceAll("\ufffd\ufffd\ufffd", "").getBytes();
    }
}
Emitter interface
public interface IEmitter<T> {
    List<T> emit(UnmodifiableBuffer<T> buffer) throws IOException;
    void fail(List<T> records);
    void shutdown();
}
Example Amazon Redshift connector
[Diagram: Kinesis, Amazon S3, and Amazon Redshift]
Example Amazon Redshift connector – v2
[Diagram: two Kinesis streams, Amazon S3, and Amazon Redshift]
Amazon Kinesis: Key Developer Benefits
Easy Administration
Managed service for real-time streaming
data collection, processing and analysis.
Simply create a new stream, set the desired
level of capacity, and let the service handle
the rest.

Real-time Performance
Perform continual processing on streaming
big data. Processing latencies fall to a few
seconds, compared with the minutes or
hours associated with batch processing.

Elastic
Seamlessly scale to match your data
throughput rate and volume. You can easily
scale up to gigabytes per second. The service
will scale up or down based on your
operational or business needs.

S3, Redshift, & DynamoDB Integration
Reliably collect, process, and transform all of
your data in real-time & deliver to AWS data
stores of choice, with Connectors for
Amazon S3, Amazon Redshift, and Amazon
DynamoDB.

Build Real-time Applications
Client libraries that enable developers to
design and operate real-time streaming data
processing applications.

Low Cost
Cost-efficient for workloads of any scale. You
can get started by provisioning a small
stream, and pay low hourly rates only for
what you use.
Please give us your feedback on this
presentation

BDT311
As a thank you, we will select prize
winners daily for completed surveys!

You can follow us at
http://aws.amazon.com/kinesis
@awscloud #kinesis
Implementing spot-the-revolution.org
spot-the-revolution.org – version 1
[Diagram: the Twitter Firehose, a Kafka cluster, ZooKeeper, and the spot-the-revolution Storm application; a failure (X) and cost ($$) are called out]
spot-the-revolution.org – Kinesis version
[Diagram: an anonymized archive, 1-minute statistics, and a Storm connector feeding the spot-the-revolution Storm application on EC2]
Generalizing the design pattern
Clickstream analytics example
[Diagram: a clickstream archive, aggregate clickstream statistics, and a clickstream processing application performing clickstream trend analysis]
Simple metering & billing example
[Diagram: a metering record archive, incremental bill computation, a billing mgmt service, and billing auditors]
Mais conteúdo relacionado

Mais procurados

Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisAmazon Web Services
 
AWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis WebinarAWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis WebinarAmazon Web Services
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysisAmazon Web Services
 
AWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache StormAWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache StormAmazon Web Services
 
Getting Started with Real-Time Analytics
Getting Started with Real-Time AnalyticsGetting Started with Real-Time Analytics
Getting Started with Real-Time AnalyticsAmazon Web Services
 
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsDay 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsAmazon Web Services
 
(BDT306) How Hearst Publishing Manages Clickstream Analytics with AWS
(BDT306) How Hearst Publishing Manages Clickstream Analytics with AWS(BDT306) How Hearst Publishing Manages Clickstream Analytics with AWS
(BDT306) How Hearst Publishing Manages Clickstream Analytics with AWSAmazon Web Services
 
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv Amazon Web Services
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time AnalyticsAmazon Web Services
 
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Chris Fregly
 
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...Amazon Web Services
 
Building A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSBuilding A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSAmazon Web Services
 
FSI301 An Architecture for Trade Capture and Regulatory Reporting
FSI301 An Architecture for Trade Capture and Regulatory ReportingFSI301 An Architecture for Trade Capture and Regulatory Reporting
FSI301 An Architecture for Trade Capture and Regulatory ReportingAmazon Web Services
 
AWS re:Invent 2016: How Toyota Racing Development Makes Racing Decisions in R...
AWS re:Invent 2016: How Toyota Racing Development Makes Racing Decisions in R...AWS re:Invent 2016: How Toyota Racing Development Makes Racing Decisions in R...
AWS re:Invent 2016: How Toyota Racing Development Makes Racing Decisions in R...Amazon Web Services
 
Real-Time Streaming Data Solution on AWS with Beeswax
Real-Time Streaming Data Solution on AWS with BeeswaxReal-Time Streaming Data Solution on AWS with Beeswax
Real-Time Streaming Data Solution on AWS with BeeswaxAmazon Web Services
 
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)Amazon Web Services
 
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...Amazon Web Services
 

Mais procurados (20)

Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
 
AWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis WebinarAWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis Webinar
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysis
 
AWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache StormAWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache Storm
 
AWS Kinesis Streams
AWS Kinesis StreamsAWS Kinesis Streams
AWS Kinesis Streams
 
Getting Started with Real-Time Analytics
Getting Started with Real-Time AnalyticsGetting Started with Real-Time Analytics
Getting Started with Real-Time Analytics
 
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsDay 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
 
(BDT306) How Hearst Publishing Manages Clickstream Analytics with AWS
(BDT306) How Hearst Publishing Manages Clickstream Analytics with AWS(BDT306) How Hearst Publishing Manages Clickstream Analytics with AWS
(BDT306) How Hearst Publishing Manages Clickstream Analytics with AWS
 
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
 
Real-Time Event Processing
Real-Time Event ProcessingReal-Time Event Processing
Real-Time Event Processing
 
AWS Real-Time Event Processing
AWS Real-Time Event ProcessingAWS Real-Time Event Processing
AWS Real-Time Event Processing
 
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
 
Building A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSBuilding A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWS
 
FSI301 An Architecture for Trade Capture and Regulatory Reporting
FSI301 An Architecture for Trade Capture and Regulatory ReportingFSI301 An Architecture for Trade Capture and Regulatory Reporting
FSI301 An Architecture for Trade Capture and Regulatory Reporting
 
AWS re:Invent 2016: How Toyota Racing Development Makes Racing Decisions in R...
AWS re:Invent 2016: How Toyota Racing Development Makes Racing Decisions in R...AWS re:Invent 2016: How Toyota Racing Development Makes Racing Decisions in R...
AWS re:Invent 2016: How Toyota Racing Development Makes Racing Decisions in R...
 
Real-Time Streaming Data Solution on AWS with Beeswax
Real-Time Streaming Data Solution on AWS with BeeswaxReal-Time Streaming Data Solution on AWS with Beeswax
Real-Time Streaming Data Solution on AWS with Beeswax
 
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
 
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...
 

Destaque

5分でできる ebfly
5分でできる ebfly5分でできる ebfly
5分でできる ebflyKazuyuki Honda
 
Spot Instance + Spark + MLlibで実現する簡単低コスト機械学習
Spot Instance + Spark + MLlibで実現する簡単低コスト機械学習Spot Instance + Spark + MLlibで実現する簡単低コスト機械学習
Spot Instance + Spark + MLlibで実現する簡単低コスト機械学習Katsushi Yamashita
 
CloudTrail でログとれ〜る
CloudTrail でログとれ〜るCloudTrail でログとれ〜る
CloudTrail でログとれ〜るHokuto Hoshi
 
20140329 modern logging and data analysis pattern on .NET
20140329 modern logging and data analysis pattern on .NET20140329 modern logging and data analysis pattern on .NET
20140329 modern logging and data analysis pattern on .NETTakayoshi Tanaka
 
Kinesis Analyticsの適用できない用途と、Kinesis Firehoseの苦労話
Kinesis Analyticsの適用できない用途と、Kinesis Firehoseの苦労話Kinesis Analyticsの適用できない用途と、Kinesis Firehoseの苦労話
Kinesis Analyticsの適用できない用途と、Kinesis Firehoseの苦労話Sotaro Kimura
 
Fluentdのお勧めシステム構成パターン
Fluentdのお勧めシステム構成パターンFluentdのお勧めシステム構成パターン
Fluentdのお勧めシステム構成パターンKentaro Yoshida
 
fluentd を利用した大規模ウェブサービスのロギング
fluentd を利用した大規模ウェブサービスのロギングfluentd を利用した大規模ウェブサービスのロギング
fluentd を利用した大規模ウェブサービスのロギングYuichi Tateno
 
Kafkaを活用するためのストリーム処理の基本
Kafkaを活用するためのストリーム処理の基本Kafkaを活用するためのストリーム処理の基本
Kafkaを活用するためのストリーム処理の基本Sotaro Kimura
 

Destaque (10)

5分でできる ebfly
5分でできる ebfly5分でできる ebfly
5分でできる ebfly
 
Spot Instance + Spark + MLlibで実現する簡単低コスト機械学習
Spot Instance + Spark + MLlibで実現する簡単低コスト機械学習Spot Instance + Spark + MLlibで実現する簡単低コスト機械学習
Spot Instance + Spark + MLlibで実現する簡単低コスト機械学習
 
20140418 aws-casual-network
20140418 aws-casual-network20140418 aws-casual-network
20140418 aws-casual-network
 
AWS Casual2 LT
AWS Casual2 LTAWS Casual2 LT
AWS Casual2 LT
 
CloudTrail でログとれ〜る
CloudTrail でログとれ〜るCloudTrail でログとれ〜る
CloudTrail でログとれ〜る
 
20140329 modern logging and data analysis pattern on .NET
20140329 modern logging and data analysis pattern on .NET20140329 modern logging and data analysis pattern on .NET
20140329 modern logging and data analysis pattern on .NET
 
Kinesis Analyticsの適用できない用途と、Kinesis Firehoseの苦労話
Kinesis Analyticsの適用できない用途と、Kinesis Firehoseの苦労話Kinesis Analyticsの適用できない用途と、Kinesis Firehoseの苦労話
Kinesis Analyticsの適用できない用途と、Kinesis Firehoseの苦労話
 
Fluentdのお勧めシステム構成パターン
Fluentdのお勧めシステム構成パターンFluentdのお勧めシステム構成パターン
Fluentdのお勧めシステム構成パターン
 
fluentd を利用した大規模ウェブサービスのロギング
fluentd を利用した大規模ウェブサービスのロギングfluentd を利用した大規模ウェブサービスのロギング
fluentd を利用した大規模ウェブサービスのロギング
 
Kafkaを活用するためのストリーム処理の基本
Kafkaを活用するためのストリーム処理の基本Kafkaを活用するためのストリーム処理の基本
Kafkaを活用するためのストリーム処理の基本
 

Semelhante a Build Real-Time Big Data Apps with Amazon Kinesis

Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteRoger Barga
 
Fraud Detection with Amazon Machine Learning on AWS
Fraud Detection with Amazon Machine Learning on AWSFraud Detection with Amazon Machine Learning on AWS
Fraud Detection with Amazon Machine Learning on AWSAmazon Web Services
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaAlluxio, Inc.
 
The Scout24 Data Platform (A Technical Deep Dive)
The Scout24 Data Platform (A Technical Deep Dive)The Scout24 Data Platform (A Technical Deep Dive)
The Scout24 Data Platform (A Technical Deep Dive)RaffaelDzikowski
 
Fraud Detection and Prevention on AWS using Machine Learning
Fraud Detection and Prevention on AWS using Machine LearningFraud Detection and Prevention on AWS using Machine Learning
Fraud Detection and Prevention on AWS using Machine LearningAmazon Web Services
 
Self-adaptive container monitoring with performance-aware Load-Shedding policies
Self-adaptive container monitoring with performance-aware Load-Shedding policiesSelf-adaptive container monitoring with performance-aware Load-Shedding policies
Self-adaptive container monitoring with performance-aware Load-Shedding policiesNECST Lab @ Politecnico di Milano
 
Credit Agricole: Powering banking apps with the Elastic Stack
Credit Agricole: Powering banking apps with the Elastic StackCredit Agricole: Powering banking apps with the Elastic Stack
Credit Agricole: Powering banking apps with the Elastic StackElasticsearch
 
Evolution of a big data project
Evolution of a big data projectEvolution of a big data project
Evolution of a big data projectMichael Peacock
 
Highway to heaven - Microservices Meetup Munich
Highway to heaven - Microservices Meetup MunichHighway to heaven - Microservices Meetup Munich
Highway to heaven - Microservices Meetup MunichChristian Deger
 
Amazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxAmazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxRenjithPillai26
 
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKChoose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKSungmin Kim
 
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleSriram Krishnan
 
ENT227_IoT + Cloud enables Enterprise Digital Transformation
ENT227_IoT + Cloud enables Enterprise Digital TransformationENT227_IoT + Cloud enables Enterprise Digital Transformation
ENT227_IoT + Cloud enables Enterprise Digital TransformationAmazon Web Services
 
Comment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitablesComment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitablesElasticsearch
 
[Velocity Conf 2017 NY] How Twitter built a framework to improve infrastructu...
[Velocity Conf 2017 NY] How Twitter built a framework to improve infrastructu...[Velocity Conf 2017 NY] How Twitter built a framework to improve infrastructu...
[Velocity Conf 2017 NY] How Twitter built a framework to improve infrastructu...Vinu Charanya
 
Stream Processing in SmartNews #jawsdays
Stream Processing in SmartNews #jawsdaysStream Processing in SmartNews #jawsdays
Stream Processing in SmartNews #jawsdaysSmartNews, Inc.
 
Modern real-time streaming architectures
Modern real-time streaming architecturesModern real-time streaming architectures
Modern real-time streaming architecturesArun Kejariwal
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudAmazon Web Services
 
Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...
Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...
Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...nimak
 

Semelhante a Build Real-Time Big Data Apps with Amazon Kinesis (20)

Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 Keynote
 
Fraud Detection with Amazon Machine Learning on AWS
Fraud Detection with Amazon Machine Learning on AWSFraud Detection with Amazon Machine Learning on AWS
Fraud Detection with Amazon Machine Learning on AWS
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at Helixa
 
The Scout24 Data Platform (A Technical Deep Dive)
The Scout24 Data Platform (A Technical Deep Dive)The Scout24 Data Platform (A Technical Deep Dive)
The Scout24 Data Platform (A Technical Deep Dive)
 
Fraud Detection and Prevention on AWS using Machine Learning
Fraud Detection and Prevention on AWS using Machine LearningFraud Detection and Prevention on AWS using Machine Learning
Fraud Detection and Prevention on AWS using Machine Learning
 
Self-adaptive container monitoring with performance-aware Load-Shedding policies
Self-adaptive container monitoring with performance-aware Load-Shedding policiesSelf-adaptive container monitoring with performance-aware Load-Shedding policies
Self-adaptive container monitoring with performance-aware Load-Shedding policies
 
Credit Agricole: Powering banking apps with the Elastic Stack
Credit Agricole: Powering banking apps with the Elastic StackCredit Agricole: Powering banking apps with the Elastic Stack
Credit Agricole: Powering banking apps with the Elastic Stack
 
Evolution of a big data project
Evolution of a big data projectEvolution of a big data project
Evolution of a big data project
 
Highway to heaven - Microservices Meetup Munich
Highway to heaven - Microservices Meetup MunichHighway to heaven - Microservices Meetup Munich
Highway to heaven - Microservices Meetup Munich
 
Build Real-Time Big Data Apps with Amazon Kinesis

  • 1. Supercharge Your Big Data Infrastructure with Amazon Kinesis: Learn to Build Real-time Streaming Big data Processing Applications Marvin Theimer, AWS November 15, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 2. Amazon Kinesis: Managed Service for Real-Time Processing of Big Data [Architecture diagram: data sources send records to an AWS endpoint; the stream is held as Shard 1, Shard 2 … Shard N across three Availability Zones; App.1 (aggregate & de-duplicate), App.2 (metric extraction), App.3 (sliding window analysis) and App.4 (machine learning) consume the stream; S3, DynamoDB and Redshift appear as data stores.]
  • 3. Talk outline • A tour of Kinesis concepts in the context of a Twitter Trends service • Implementing the Twitter Trends service • Kinesis in the broader context of a Big Data ecosystem
  • 4. A tour of Kinesis concepts in the context of a Twitter Trends service
  • 6. Too big to handle on one box twitter-trends.com
  • 7. The solution: streaming map/reduce twitter-trends.com My top-10 My top-10 My top-10 Global top-10
  • 8. Core concepts [Diagram labels: data record, stream, partition key, shard, worker, sequence number (14, 17, 18, 21, 23); each worker produces a "My top-10" list, and the lists feed the global top-10 on twitter-trends.com]
  • 9. How this relates to Kinesis twitter-trends.com Kinesis Kinesis application
  • 10. Core Concepts recapped • Data record ~ tweet • Stream ~ all tweets (the Twitter Firehose) • Partition key ~ Twitter topic (every tweet record belongs to exactly one of these) • Shard ~ all the data records belonging to a set of Twitter topics that will get grouped together • Sequence number ~ each data record gets one assigned when first ingested. • Worker ~ processes the records of a shard in sequence number order
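How a partition key ends up in a shard can be sketched as follows. Kinesis hashes each partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The class name `PartitionKeyRouter` and the even split of the key space across shards are assumptions made for illustration; real streams can have uneven ranges after resharding.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of any AWS SDK.
final class PartitionKeyRouter {
    // Size of the 128-bit hash key space: 2^128.
    private static final BigInteger KEY_SPACE = BigInteger.ONE.shiftLeft(128);

    // Hash a partition key to a non-negative 128-bit integer using MD5.
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1 keeps the value non-negative
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Assumption: shardCount shards split the key space into equal contiguous ranges.
    static int shardFor(String partitionKey, int shardCount) {
        BigInteger rangeSize = KEY_SPACE.divide(BigInteger.valueOf(shardCount));
        int index = hashKey(partitionKey).divide(rangeSize).intValue();
        // The last shard absorbs the remainder left by integer division.
        return Math.min(index, shardCount - 1);
    }
}
```

Because the mapping depends only on the key, every record for a given Twitter topic lands in the same shard, which is what lets one worker keep a complete count for that topic.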
  • 11. Using the Kinesis API directly (twitter-trends.com)
      Worker:
        iterator = getShardIterator(shardId, LATEST);
        while (true) {
          [records, iterator] = getNextRecords(iterator, maxRecsToReturn);
          process(records);
        }
        process(records): {
          for (record in records) {
            updateLocalTop10(record);
          }
          if (timeToDoOutput()) {
            writeLocalTop10ToDDB();
          }
        }
      Global aggregator:
        while (true) {
          localTop10Lists = scanDDBTable();
          updateGlobalTop10List(localTop10Lists);
          sleep(10);
        }
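The reduce step of the pseudocode above (`updateGlobalTop10List`) might look like the sketch below. It is an in-memory stand-in: `TopTenMerger`, `topN` and `mergeLocalLists` are hypothetical names, and the DynamoDB scan is replaced by a plain list of maps. It also assumes, as in the talk, that records are partitioned by topic; if a hashtag's counts were spread over shards and truncated to local top-10 lists, the merged result would only be approximate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the "global top-10" merge.
final class TopTenMerger {
    // Return the n highest counts; ties are broken alphabetically so output is deterministic.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    // Sum the per-shard lists, then keep the overall top n.
    static List<Map.Entry<String, Integer>> mergeLocalLists(
            List<Map<String, Integer>> localLists, int n) {
        Map<String, Integer> combined = new HashMap<>();
        for (Map<String, Integer> local : localLists) {
            for (Map.Entry<String, Integer> e : local.entrySet()) {
                combined.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return topN(combined, n);
    }
}
```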
  • 12. Challenges with using the Kinesis API directly: manual creation of workers and assignment to shards. How many EC2 instances? How many workers per EC2 instance? [Diagram: twitter-trends.com, Kinesis, Kinesis application]
  • 13. Using the Kinesis Client Library [Diagram: twitter-trends.com, Kinesis, shard mgmt table, Kinesis application]
  • 14. Elasticity and load balancing using Auto Scaling groups [Diagram: Auto Scaling group, Kinesis, twitter-trends.com, shard mgmt table]
  • 15. Dynamic growth and hot spot management through stream resharding [Diagram: Auto Scaling group, twitter-trends.com, Kinesis, shard mgmt table]
  • 16. The challenges of fault tolerance [Diagram: Kinesis, three failures marked X, twitter-trends.com, shard mgmt table]
  • 17. Fault tolerance support in KCL [Diagram: Availability Zone 1, Availability Zone 3, Kinesis, one failure marked X, twitter-trends.com, shard mgmt table]
  • 18. Checkpoint, replay design pattern [Diagram: Host A fails (X) while processing Shard-i, whose records carry sequence numbers 2, 3, 5, 8, 10, 14, 17, 18, 21, 23; Host B takes over the shard and replays from the checkpoint. One table maps Shard ID to its local top-10, e.g. {#Obama: 10235, #Congress: 9910, #Seahawks: 9835, …}; the shard management table has columns Shard ID, Lock and Seq num.]
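The checkpoint and replay pattern can be simulated in memory, as in the sketch below. Everything here is a stand-in chosen for illustration: `CheckpointReplayDemo`, `Rec` and `Checkpoint` are hypothetical names, the "durable store" is a plain object rather than DynamoDB, and a crash is simulated by returning early. The point it shows is that persisting state and sequence number together lets a second host resume without losing or double-counting records.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory simulation; no AWS calls are made.
final class CheckpointReplayDemo {
    // One record of a shard: a sequence number plus a payload (here a hashtag).
    static final class Rec {
        final long seq;
        final String tag;
        Rec(long seq, String tag) { this.seq = seq; this.tag = tag; }
    }

    // Stand-in for the durable store: saved counts plus the checkpointed sequence number.
    static final class Checkpoint {
        Map<String, Integer> counts = new HashMap<>();
        long seq = -1; // nothing processed yet
    }

    // Resume after the stored checkpoint, saving state and checkpoint together every
    // `interval` records. Returns early (a simulated crash) once `crashAfter` records
    // have been handled in this run; pass Integer.MAX_VALUE for a run without a crash.
    static void run(List<Rec> shard, Checkpoint store, int interval, int crashAfter) {
        Map<String, Integer> local = new HashMap<>(store.counts); // restore saved state
        long startSeq = store.seq;
        long lastSeq = store.seq;
        int handled = 0;
        for (Rec r : shard) {
            if (r.seq <= startSeq) continue;   // already covered by the checkpoint
            if (handled == crashAfter) return; // crash: work since the last checkpoint is lost
            local.merge(r.tag, 1, Integer::sum);
            lastSeq = r.seq;
            handled++;
            if (handled % interval == 0) {
                store.counts = new HashMap<>(local);
                store.seq = lastSeq;
            }
        }
        // Final save once the available records are exhausted.
        store.counts = new HashMap<>(local);
        store.seq = lastSeq;
    }
}
```

Running "Host A" with a crash and then "Host B" against the same `Checkpoint` yields the same counts as a single uninterrupted run, at the cost of reprocessing the records handled after the last checkpoint.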
  • 19. Catching up from delayed processing. How long until we've caught up? Shard throughput SLA: 1 MBps in, 2 MBps out => catch-up time ~ down time, unless your application can't process 2 MBps on a single host => provision more shards. [Diagram: Host A fails (X) on Shard-i; Host B resumes from the checkpoint and works through the backlog.]
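The "catch-up time ~ down time" estimate follows from simple arithmetic: a backlog of (down time × ingest rate) drains at (read rate − ingest rate), because new data keeps arriving while the worker catches up. The sketch below assumes the shard is written at a steady rate; `CatchUpEstimator` and its parameters are hypothetical names.

```java
// Hypothetical helper for back-of-the-envelope estimates.
final class CatchUpEstimator {
    // downSeconds: how long the worker was stalled
    // inMBps:      steady ingest rate into the shard
    // readMBps:    read limit of the shard (2 MBps in the talk)
    // processMBps: what a single host can actually process
    static double catchUpSeconds(double downSeconds, double inMBps,
                                 double readMBps, double processMBps) {
        double drainMBps = Math.min(readMBps, processMBps);
        if (drainMBps <= inMBps) {
            return Double.POSITIVE_INFINITY; // the backlog never shrinks
        }
        double backlogMB = downSeconds * inMBps;
        return backlogMB / (drainMBps - inMBps);
    }
}
```

With 1 MBps in and 2 MBps out, a 600-second outage takes about 600 seconds to clear. A host that manages only 1.5 MBps needs about 1,200 seconds, and one that manages 1 MBps never catches up, which is the case where more shards are needed.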
  • 21. Why not combine applications? [Diagram: one Kinesis stream feeds two applications. twitter-trends.com: output state & checkpoint every 10 secs. twitter-stats.com: output state & checkpoint every hour.]
  • 24. How many shards? Additional considerations: - How much processing is needed per data record/data byte? - How quickly do you need to catch up if one or more of your workers stalls or fails? You can always dynamically reshard to adjust to changing workloads
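A first-cut shard count can be derived from the per-shard limits quoted earlier in the talk (1 MBps in, 2 MBps out). The sketch below is only a lower bound under stated assumptions: `ShardSizing` is a hypothetical name, every consuming application is assumed to read each record once, and all applications are assumed to share a shard's read limit. It leaves out the per-record processing cost and catch-up headroom that the slide lists as additional considerations.

```java
// Hypothetical sizing helper; the limits are those quoted in the talk.
final class ShardSizing {
    static final double SHARD_IN_MBPS = 1.0;
    static final double SHARD_OUT_MBPS = 2.0;

    static int shardsNeeded(double ingestMBps, int consumerApps) {
        // Shards required to absorb the writes.
        int forWrites = (int) Math.ceil(ingestMBps / SHARD_IN_MBPS);
        // Shards required so that all applications together can read everything.
        int forReads = (int) Math.ceil(ingestMBps * consumerApps / SHARD_OUT_MBPS);
        return Math.max(1, Math.max(forWrites, forReads));
    }
}
```

For example, 5 MBps of ingest with one application is write-bound at 5 shards, while the same ingest with three applications is read-bound at 8 shards.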
  • 25. Twitter Trends Shard Processing Code
      class TwitterTrendsShardProcessor implements IRecordProcessor {
        public TwitterTrendsShardProcessor() { … }
        @Override
        public void initialize(String shardId) { … }
        @Override
        public void processRecords(List<Record> records,
            IRecordProcessorCheckpointer checkpointer) { … }
        @Override
        public void shutdown(IRecordProcessorCheckpointer checkpointer,
            ShutdownReason reason) { … }
      }
  • 26. Twitter Trends Shard Processing Code
      class TwitterTrendsShardProcessor implements IRecordProcessor {
        private Map<String, AtomicInteger> hashCount = new HashMap<>();
        private long tweetsProcessed = 0;
        @Override
        public void processRecords(List<Record> records,
            IRecordProcessorCheckpointer checkpointer) {
          computeLocalTop10(records);
          // Count tweets, not calls: each batch may hold many records.
          tweetsProcessed += records.size();
          if (tweetsProcessed >= 2000) {
            emitToDynamoDB(checkpointer);
          }
        }
      }
  • 27. Twitter Trends Shard Processing Code
      private void computeLocalTop10(List<Record> records) {
        for (Record r : records) {
          String tweet = new String(r.getData().array());
          // Split on runs of spaces and tabs.
          String[] words = tweet.split("[ \t]+");
          for (String word : words) {
            if (word.startsWith("#")) {
              if (!hashCount.containsKey(word))
                hashCount.put(word, new AtomicInteger(0));
              hashCount.get(word).incrementAndGet();
            }
          }
        }
      }
      private void emitToDynamoDB(IRecordProcessorCheckpointer checkpointer) {
        persistMapToDynamoDB(hashCount);
        try {
          checkpointer.checkpoint();
        } catch (IOException | KinesisClientLibDependencyException
            | InvalidStateException | ThrottlingException | ShutdownException e) {
          // Error handling
        }
        hashCount = new HashMap<>();
        tweetsProcessed = 0;
      }
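The counting and emit-threshold logic of the shard processor can be exercised without the Kinesis Client Library, as in the sketch below. `HashtagCounter` is a hypothetical stand-in: tweets are plain strings, and "emit" simply returns a snapshot of the counts instead of writing to DynamoDB and checkpointing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the shard processor's counting logic.
final class HashtagCounter {
    private final int emitThreshold;
    private Map<String, Integer> counts = new HashMap<>();
    private long tweetsSinceEmit = 0;

    HashtagCounter(int emitThreshold) {
        this.emitThreshold = emitThreshold;
    }

    // Count hashtags in a batch. Returns a snapshot of the counts when the number
    // of tweets seen since the last emit reaches the threshold, else empty.
    Optional<Map<String, Integer>> processBatch(List<String> tweets) {
        for (String tweet : tweets) {
            for (String word : tweet.trim().split("\\s+")) {
                if (word.startsWith("#")) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        tweetsSinceEmit += tweets.size();
        if (tweetsSinceEmit < emitThreshold) {
            return Optional.empty();
        }
        Map<String, Integer> snapshot = counts;
        counts = new HashMap<>(); // start a fresh window, as emitToDynamoDB does
        tweetsSinceEmit = 0;
        return Optional.of(snapshot);
    }
}
```

Counting tweets rather than calls matters because a single batch can contain many records; a per-call counter would emit far less often than intended.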
  • 28. Kinesis as a gateway into the Big Data eco-system
  • 29. Leveraging the Big Data eco-system Twitter trends anonymized archive Twitter trends statistics twitter-trends.com Twitter trends processing application
  • 30. Connector framework
      public interface IKinesisConnectorPipeline<T> {
        IEmitter<T> getEmitter(KinesisConnectorConfiguration configuration);
        IBuffer<T> getBuffer(KinesisConnectorConfiguration configuration);
        ITransformer<T> getTransformer(KinesisConnectorConfiguration configuration);
        IFilter<T> getFilter(KinesisConnectorConfiguration configuration);
      }
  • 31. Transformer & Filter interfaces
      public interface ITransformer<T> {
        public T toClass(Record record) throws IOException;
        public byte[] fromClass(T record) throws IOException;
      }
      public interface IFilter<T> {
        public boolean keepRecord(T record);
      }
  • 32. Tweet transformer for Amazon S3 archival
      public class TweetTransformer implements ITransformer<TwitterRecord> {
        private final JsonFactory fact =
            JsonActivityFeedProcessor.getObjectMapper().getJsonFactory();
        @Override
        public TwitterRecord toClass(Record record) {
          try {
            return fact.createJsonParser(
                record.getData().array()).readValueAs(TwitterRecord.class);
          } catch (IOException e) {
            throw new RuntimeException(e);
          }
        }
  • 33. Tweet transformer (cont.)
        @Override
        public byte[] fromClass(TwitterRecord r) {
          // yyyy = year, mm = minutes, ss = seconds (YYYY, MM and SS mean week year, month and milliseconds)
          SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
          StringBuilder b = new StringBuilder();
          b.append(r.getActor().getId()).append("|")
           .append(sanitize(r.getActor().getDisplayName())).append("|")
           .append(r.getActor().getFollowersCount()).append("|")
           .append(r.getActor().getFriendsCount()).append("|")
           ...
           .append('\n');
          return b.toString().replaceAll("\ufffd\ufffd\ufffd", "").getBytes();
        }
      }
  • 34. Buffer interface
      public interface IBuffer<T> {
        public long getBytesToBuffer();
        public long getNumRecordsToBuffer();
        public boolean shouldFlush();
        public void consumeRecord(T record, int recordBytes, String sequenceNumber);
        public void clear();
        public String getFirstSequenceNumber();
        public String getLastSequenceNumber();
        public List<T> getRecords();
      }
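A minimal buffer with the same method names as the interface above might look like this. It is a self-contained sketch: `SimpleBuffer` deliberately does not implement the connector library's `IBuffer` so that it compiles on its own, and the flush rule (flush once either the record count or the byte count reaches its limit) is an assumption about how such thresholds would typically be used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone buffer mirroring the IBuffer method names.
final class SimpleBuffer<T> {
    private final long bytesToBuffer;
    private final long numRecordsToBuffer;
    private final List<T> records = new ArrayList<>();
    private long byteCount = 0;
    private String firstSequenceNumber;
    private String lastSequenceNumber;

    SimpleBuffer(long bytesToBuffer, long numRecordsToBuffer) {
        this.bytesToBuffer = bytesToBuffer;
        this.numRecordsToBuffer = numRecordsToBuffer;
    }

    void consumeRecord(T record, int recordBytes, String sequenceNumber) {
        if (records.isEmpty()) {
            firstSequenceNumber = sequenceNumber; // marks the start of this batch
        }
        lastSequenceNumber = sequenceNumber;
        records.add(record);
        byteCount += recordBytes;
    }

    // Assumed rule: flush when either threshold is reached.
    boolean shouldFlush() {
        return !records.isEmpty()
                && (records.size() >= numRecordsToBuffer || byteCount >= bytesToBuffer);
    }

    void clear() {
        records.clear();
        byteCount = 0;
        firstSequenceNumber = null;
        lastSequenceNumber = null;
    }

    String getFirstSequenceNumber() { return firstSequenceNumber; }
    String getLastSequenceNumber() { return lastSequenceNumber; }
    List<T> getRecords() { return new ArrayList<>(records); } // defensive copy
}
```

Tracking the first and last sequence numbers of a batch gives the emitter what it needs to name the batch and to checkpoint once the batch has been written.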
  • 35. Tweet aggregation buffer for Amazon Redshift
      ...
      @Override
      public void consumeRecord(TwitterAggRecord record, int recordSize,
          String sequenceNumber) {
        if (buffer.isEmpty()) {
          firstSequenceNumber = sequenceNumber;
        }
        lastSequenceNumber = sequenceNumber;
        TwitterAggRecord agg = null;
        if (!buffer.containsKey(record.getHashTag())) {
          agg = new TwitterAggRecord();
          // Key the new aggregate by hashtag so the lookup below finds it.
          buffer.put(record.getHashTag(), agg);
          byteCount.addAndGet(agg.getRecordSize());
        }
        agg = buffer.get(record.getHashTag());
        agg.aggregate(record);
      }
  • 36. Tweet Amazon Redshift aggregation transformer
      public class TweetTransformer implements ITransformer<TwitterAggRecord> {
        private final JsonFactory fact =
            JsonActivityFeedProcessor.getObjectMapper().getJsonFactory();
        @Override
        public TwitterAggRecord toClass(Record record) {
          try {
            return new TwitterAggRecord(
                fact.createJsonParser(
                    record.getData().array()).readValueAs(TwitterRecord.class));
          } catch (IOException e) {
            throw new RuntimeException(e);
          }
        }
  • 37. Tweet aggregation transformer (cont.)
        @Override
        public byte[] fromClass(TwitterAggRecord r) {
          // yyyy = year, mm = minutes, ss = seconds (YYYY, MM and SS mean week year, month and milliseconds)
          SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
          StringBuilder b = new StringBuilder();
          b.append(r.getActor().getId()).append("|")
           .append(sanitize(r.getActor().getDisplayName())).append("|")
           .append(r.getActor().getFollowersCount()).append("|")
           .append(r.getActor().getFriendsCount()).append("|")
           ...
           .append('\n');
          return b.toString().replaceAll("\ufffd\ufffd\ufffd", "").getBytes();
        }
      }
  • 38. Emitter interface
      public interface IEmitter<T> {
        List<T> emit(UnmodifiableBuffer<T> buffer) throws IOException;
        void fail(List<T> records);
        void shutdown();
      }
  • 39. Example Amazon Redshift connector [Diagram: Kinesis, Amazon S3, Amazon Redshift]
  • 40. Example Amazon Redshift connector – v2 [Diagram: two Kinesis streams, Amazon S3, Amazon Redshift]
  • 41. Amazon Kinesis: Key Developer Benefits
      - Easy Administration: Managed service for real-time streaming data collection, processing and analysis. Simply create a new stream, set the desired level of capacity, and let the service handle the rest.
      - Real-time Performance: Perform continual processing on streaming big data. Processing latencies fall to a few seconds, compared with the minutes or hours associated with batch processing.
      - Elastic: Seamlessly scale to match your data throughput rate and volume. You can easily scale up to gigabytes per second. The service will scale up or down based on your operational or business needs.
      - S3, Redshift, & DynamoDB Integration: Reliably collect, process, and transform all of your data in real-time & deliver to AWS data stores of choice, with Connectors for Amazon S3, Amazon Redshift, and Amazon DynamoDB.
      - Build Real-time Applications: Client libraries that enable developers to design and operate real-time streaming data processing applications.
      - Low Cost: Cost-efficient for workloads of any scale. You can get started by provisioning a small stream, and pay low hourly rates only for what you use.
  • 42. Please give us your feedback on this presentation BDT311 As a thank you, we will select prize winners daily for completed surveys! You can follow us at http://aws.amazon.com/kinesis @awscloud #kinesis
  • 44. spot-the-revolution.org – version 1 [Diagram: Twitter Firehose, Kafka cluster ($$), ZooKeeper, spot-the-revolution Storm application, one failure marked X]
  • 45. spot-the-revolution.org – Kinesis version Anonymized archive 1-minute statistics spot-the-revolution Storm application on EC2 Storm connector
  • 47. Clickstream analytics example Clickstream archive Aggregate clickstream statistics Clickstream trend analysis Clickstream processing application
  • 48. Simple metering & billing example Metering record archive Incremental bill computation Billing mgmt service Billing auditors