Transforming Mobile Push Notifications with Big Data
Dennis Waldron, Data Engineering 
Pablo Varela, Systems Engineering
Who is Plumbee? 
● 12.8M Installs 
● 209K Daily Active Users 
● 818K Monthly Active Users 
● Social Games Studio 
● Mirrorball Slots & Bingo 
● Facebook Canvas, iOS
Data Providers 
In-house data = 99.9% of all data
In Total: 
● 98TB (907 days of data) 
● All stored in Amazon S3 
Daily: 
● 78GB compressed 
● ~450M events/day 
● 4,800 events/second (peak)
Architecture - Overview
Diagram: End Users (Desktop & Mobile) -> Application/Game Servers -> Events (JSON) -> SQS Analytics Queue -> Log Aggregators -> Events (JSON) -> Amazon S3 (Simple Storage Service) -> Daily Batch Processing (DataPipeline, Amazon EMR (Elastic MapReduce)) -> Aggregates -> Amazon Redshift -> Analytics (SQL Queries) -> Plumbee Employees
Game Data
Diagram: End Users (Desktop & Mobile) -> Application/Game Servers (Amazon Web Services) -> GENERATES -> Game Data
● Collect everything!
● RPC events intercepted by annotated endpoints. (Requests)
● All mutating state changes recorded:
○ DynamoDB, MySQL, Memcache (Blob Updates)
● Custom Telemetry (Other):
○ Client: click tracking, loading time statistics, GPU data...
○ Server: promotions, transactions, Facebook user data...
Chart labels: RPC, 77%, 9%, OTHER 15%; stores shown: MySQL, MemCache, DynamoDB
Game Data - Example RPC Endpoint Annotation 
/** 
* Example annotation 
*/ 
@SQSRequestLog(requestMessage = SpinRequest.class) 
@RequestMapping("/spin")
public SpinResponse spin(SpinRequest spinRequest) { 
… 
}
Example Event - userStats 
● All events are recorded in JSON. 
● Structure: 
○ Headers 
○ Categorization Data (metadata) 
○ Payload (message) 
● Important Headers: 
○ timestamp 
○ testVariant 
○ plumbeeUid
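A minimal sketch of how an event with this structure might be assembled. Only the header names `timestamp`, `testVariant`, and `plumbeeUid` come from the slide; the `metadata` and `payload` key names, the `build_event` helper, and the sample values are assumptions made for illustration.

```python
import json
import time


def build_event(plumbee_uid, test_variant, event_type, sub_type, payload, timestamp=None):
    """Assemble an event as headers + categorization metadata + payload.

    The nesting and the 'metadata'/'payload' key names are hypothetical;
    the slide only names the three header fields.
    """
    return {
        # Headers named on the slide.
        "timestamp": int(time.time() * 1000) if timestamp is None else timestamp,
        "testVariant": test_variant,
        "plumbeeUid": plumbee_uid,
        # Categorization data (metadata), hypothetical field names.
        "metadata": {"type": event_type, "subType": sub_type},
        # Payload (message): the event-specific body.
        "payload": payload,
    }


event = build_event(466264, "A", "rpc", "rpc-spin", {"bet": 100}, timestamp=1416268800000)
line = json.dumps(event, sort_keys=True)  # one JSON document per line
print(line)
```

Keeping the headers at the top level means a downstream job can read `plumbeeUid` or `timestamp` without knowing anything about the payload of a particular event type.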
Architecture - Collection
(Diagram as in Architecture - Overview: Application/Game Servers, SQS Analytics Queue, Log Aggregators, Amazon S3, Amazon EMR, Amazon Redshift.)
Data Collection (I) - PUT
Diagram: Application/Game Servers (Producers) -> Events (JSON) -> SQS Queue -> Log Aggregators (Consumers)
What is SQS (Simple Queue Service)? 
A cloud-based message queue for transmitting 
messages between producers and consumers 
SQS Provides: 
● ACK/FAIL semantics 
● Unlimited number of messages 
● Scales transparently 
● Buffer zone
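The ACK/FAIL semantics listed above can be illustrated with a toy in-memory queue. This is a sketch of the idea only, not the SQS API: the `ToyQueue` class and its method names are hypothetical. In SQS itself, a message that has been received but not deleted becomes visible again once its visibility timeout expires.

```python
import itertools
from collections import OrderedDict


class ToyQueue:
    """In-memory stand-in for a queue with ACK/FAIL semantics (not the SQS API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._visible = OrderedDict()   # messages waiting for a consumer
        self._in_flight = {}            # received but not yet acknowledged

    def put(self, body):
        self._visible[next(self._ids)] = body

    def get(self):
        """Hand out the oldest visible message, or None if there is none."""
        if not self._visible:
            return None
        msg_id, body = self._visible.popitem(last=False)
        self._in_flight[msg_id] = body
        return msg_id, body

    def ack(self, msg_id):
        """Consumer succeeded: the message is removed for good."""
        del self._in_flight[msg_id]

    def fail(self, msg_id):
        """Consumer failed: the message becomes visible again for a retry."""
        self._visible[msg_id] = self._in_flight.pop(msg_id)


queue = ToyQueue()
queue.put('{"plumbeeUid": 466264}')
msg_id, body = queue.get()
queue.fail(msg_id)                   # e.g. the write to storage failed
retry_id, retry_body = queue.get()
queue.ack(retry_id)                  # the second attempt succeeded
```

Because a message only disappears after an explicit acknowledgement, a consumer that crashes mid-write does not lose the event; the queue acts as the buffer zone between producers and consumers.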
Data Collection (II) - GET
Diagram: SQS Queue -> Apache Flume (Consumers) -> Amazon S3 (Simple Storage Service)
What is Apache Flume?
A distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data
S3 Data: 
● Partitioned by: date / type / sub_type 
● Compressed with: Snappy 
● Aggregated in 512MB chunks
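A small sketch of how a date / type / sub_type partition prefix could be derived from an event. The exact key layout used in the bucket is not shown on the slide, so the `partition_prefix` helper and the `YYYY-MM-DD/type/sub_type/` format are assumptions.

```python
from datetime import datetime, timezone


def partition_prefix(timestamp_ms, event_type, event_sub_type):
    """Return a date/type/sub_type prefix for an event (layout is hypothetical)."""
    day = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
    return "{}/{}/{}/".format(day, event_type, event_sub_type)


prefix = partition_prefix(1416268800000, "rpc", "rpc-spin")
print(prefix)
```

Partitioning this way lets a batch job read only the objects under one date and event type instead of scanning the whole bucket.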
Data Collection (III) - Flume
Flume Agent: SQS Queue -> Source (Custom) -> Channel (File Based) -> Sink (HDFS) -> S3 Bucket
Transactions: A + B + C = Flow
● Pluggable component architecture
● Durability via transactions
● File channel uses Elastic Block Store (EBS) volumes (network attached storage)
○ Protects against hardware failure
● SQS Flume Plugin: https://github.com/plumbee/flume-sqs-source
Architecture - Processing
(Diagram as in Architecture - Overview: Application/Game Servers, SQS Analytics Queue, Log Aggregators, Amazon S3, Amazon EMR, Amazon Redshift.)
Extract, Transform, Load 
● Daily activity 
● Orchestrated by Amazon DataPipeline 
● Includes generation of reports 
● Configured with JSON 
What is DataPipeline? 
A cloud-based data workflow service that 
helps you process and move data between 
different AWS services 
RESOURCE COMMAND SCHEDULE
Extract & Transform (I) 
What is Elastic MapReduce?
Cloud-based MapReduce implementation to process vast amounts of data, built on top of the open-source Hadoop framework.
Two phases: 
● Map() Procedure -> Filtering & Sorting 
● Reduce() -> Summary operation 
RAW DATA: Penguin, Horse, Cake, Cake, Penguin, Penguin, Penguin, Horse, Horse
MAP() -> SORTED QUEUES: [Cake, Cake] [Horse, Horse, Horse] [Penguin, Penguin, Penguin, Penguin]
REDUCE() -> RESULT: Cake: 2, Horse: 3, Penguin: 4
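The two phases can be sketched in plain Python using the nine raw records from the slide. This is a single-process illustration of the idea, not Hadoop code; the function names `map_phase`, `shuffle`, and `reduce_phase` are made up for the example.

```python
from itertools import groupby

raw_data = ["Penguin", "Horse", "Cake", "Cake", "Penguin",
            "Penguin", "Penguin", "Horse", "Horse"]


def map_phase(records):
    """Emit a (key, 1) pair for every record."""
    return [(record, 1) for record in records]


def shuffle(pairs):
    """Sort by key and group, so each key's values end up together."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    return {key: [value for _, value in group]
            for key, group in groupby(ordered, key=lambda pair: pair[0])}


def reduce_phase(grouped):
    """Summarize each key's values; here the summary is a count."""
    return {key: sum(values) for key, values in grouped.items()}


result = reduce_phase(shuffle(map_phase(raw_data)))
print(result)
```

In a real cluster the map and reduce steps run on many machines at once, and the framework performs the sort and grouping between them.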
Extract & Transform (II) 
What is Hive? 
An open-source Apache project which provides a SQL-like interface to summarize, query, and analyze large datasets by leveraging Hadoop’s MapReduce infrastructure.
● Not really SQL, HQL -> HiveQL
● No transactions, materialized views, limited subquery support, ...
SELECT plumbeeuid, 
COUNT(*) AS spins 
FROM eventlog 
-- Partitioned data access 
WHERE event_date = '2014-11-18' 
AND event_type = 'rpc' 
AND event_sub_type = 'rpc-spin' 
-- Aggregation 
GROUP BY plumbeeuid; 
Table: Eventlog 
● Mounted on top of raw data 
● SerDe provides JSON parsing 
● Target data via partition filters
Extract & Transform (III) 
● Hive has limitations! 
○ Speed, JSON 
● Most of our transformations use: 
Streaming MapReduce Jobs 
What is Streaming? 
“A Hadoop utility that allows you to create 
and run MapReduce jobs using any 
executable script as a mapper or reducer” 
map():
    import json
    import sys
    for line in sys.stdin:
        data = json.loads(line)
        # Emit "key<TAB>1" for every input event.
        print(str(data['plumbeeUid']) + '\t' + '1')
Emits key-value pairs:
    466264 => 1, 376166 => 1
    983131 => 1, 466264 => 1
Hadoop sorts and shuffles the data, making sure matching keys are processed by a single reducer!
reduce():
    import sys
    from collections import defaultdict
    results = defaultdict(int)
    for line in sys.stdin:
        plumbee_uid, count = line.split('\t')
        results[plumbee_uid] += int(count)
    print(dict(results))
Input: JSON rpc-spin data
Result:
{ 466264: 2, 376166: 1, 983131: 1 }
Results 
Load (I) - Problem
Diagram: Raw S3 JSON Data -> EMR Transformation (Hive & Streaming Jobs) -> Aggregated Data; label: 5.4TB
EMR Transformed data:
● Referred to as aggregates
● Stored in S3
● Accessible via EMR cluster
Problem
● We don’t run long-lived EMR clusters.
EMR:
● Requires specialist knowledge
● Is slow, both processing and booting (“offline”)
Use Amazon Redshift for fast “online” data access
Load (II) - Redshift
What is Redshift?
A column-oriented database which uses Massively Parallel Processing (MPP) techniques to support analytics-style, SQL-based workloads across large datasets.
Power comes from:
● Query parallelization
● Column-oriented design
Redshift Provides:
● Low latency JDBC and ODBC access
● Fault Tolerance
● Automated Backups
Redshift (x3 nodes): 0.33s
EMR (x20 nodes): 135.46s
Load (II) - Column-Oriented Databases
Row-oriented Database - MySQL
ID | First Name | Last Name | Country
1  | Penguin    | Situation | GB
2  | Cheese     | Labs      | US
3  | Horse      | Barracks  | GB
● Easy to add/modify records
● Could read irrelevant data.
● Great for fast lookups (OLTP)
Column-oriented Database - Redshift
ID | First Name | Last Name | Country
1  | Penguin    | Situation | GB
2  | Cheese     | Labs      | US
3  | Horse      | Barracks  | GB
● Only read in relevant data
● Adding rows requires multiple updates to column data.
● Great for aggregation queries (OLAP)
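The trade-off can be sketched with the sample table held in two layouts. This is an in-memory illustration only and says nothing about how MySQL or Redshift store data on disk; the function names are hypothetical.

```python
rows = [
    {"id": 1, "first_name": "Penguin", "last_name": "Situation", "country": "GB"},
    {"id": 2, "first_name": "Cheese", "last_name": "Labs", "country": "US"},
    {"id": 3, "first_name": "Horse", "last_name": "Barracks", "country": "GB"},
]

# Column-oriented layout: one list per column, aligned by position.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def count_by_country_rows(table):
    """Row layout: every whole record is visited to read one field."""
    counts = {}
    for record in table:
        counts[record["country"]] = counts.get(record["country"], 0) + 1
    return counts


def count_by_country_columns(table):
    """Column layout: only the 'country' column is touched."""
    counts = {}
    for country in table["country"]:
        counts[country] = counts.get(country, 0) + 1
    return counts


def append_row_columns(table, record):
    """Adding one record means one update per column."""
    for name, value in record.items():
        table[name].append(value)


print(count_by_country_rows(rows))
print(count_by_country_columns(columns))
```

Both functions return the same counts; the difference is how much data each has to walk through, which is why the column layout suits aggregation queries and the row layout suits record-at-a-time updates.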
Architecture - Revisit
(Diagram as in Architecture - Overview: Application/Game Servers, SQS Analytics Queue, Log Aggregators, Amazon S3, Amazon EMR, Amazon Redshift.)
Q&A
Targeted Push Notifications
Mirrorball Slots: Kingdom of Riches
Mirrorball Slots: Challenges
● recurring timed event
● collect symbols from non-winning spins
● get free coins if enough symbols are collected
Some players ask for notifications
Use Cases
Building blocks
Data Collection
Data Collection 
Players 
Amazon Redshift
Architecture - Overview
Diagram: Trigger -> Publisher -> Segmentation Workers -> Batch Processors -> Amazon SNS -> Players
Targeting: Amazon Redshift, Amazon S3
Mobile Push: Batch Processors, Amazon SNS
User Targeting
User targeting 
Run SQL queries directly against Redshift 
SQL Query 
Amazon Redshift User Segment
User targeting: Query example 
-- Target all mobile users 
SELECT plumbee_uid, arn 
FROM mobile_user
User targeting: Query example (II) 
-- Target lapsed users (1 week lapse)
SELECT plumbee_uid, arn
FROM mobile_user
WHERE last_play_time < DATEADD(day, -7, GETDATE())
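The lapsed-user query can be tried locally with SQLite standing in for Redshift. The table and column names come from the slides; the sample rows, the placeholder ARNs, and the fixed "now" are assumptions, and the cutoff is computed in Python because SQLite's date functions differ from Redshift's.

```python
import sqlite3
from datetime import datetime, timedelta

now = datetime(2014, 11, 18, 12, 0, 0)   # fixed "now" so the example is repeatable

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mobile_user (plumbee_uid INTEGER, arn TEXT, last_play_time TEXT)")
conn.executemany(
    "INSERT INTO mobile_user VALUES (?, ?, ?)",
    [
        # Hypothetical users; the ARNs are placeholders, not real endpoints.
        (466264, "arn:placeholder:1", (now - timedelta(days=1)).isoformat(" ")),
        (376166, "arn:placeholder:2", (now - timedelta(days=10)).isoformat(" ")),
        (983131, "arn:placeholder:3", (now - timedelta(days=30)).isoformat(" ")),
    ],
)

# Users whose last play is more than 7 days before "now" form the segment.
cutoff = (now - timedelta(days=7)).isoformat(" ")
segment = conn.execute(
    "SELECT plumbee_uid, arn FROM mobile_user WHERE last_play_time < ? ORDER BY plumbee_uid",
    (cutoff,),
).fetchall()
print(segment)
```

Each row of the segment pairs a user with the endpoint ARN needed to reach that user's device.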
Demo (I) 
Mobile MBS Notifications
Architecture - Mobile Push
(Diagram as in the targeted push Architecture - Overview: Trigger, Publisher, Segmentation Workers, Batch Processors, Amazon SNS, Players, with Amazon Redshift and Amazon S3.)
Amazon Simple Notification Service
What is SNS? 
“Amazon Simple Notification Service (Amazon 
SNS) is a fast, flexible, fully managed push 
messaging service”
Amazon SNS
Amazon SNS
Amazon SNS: Device Registration
Diagram: Players -> (register device) -> Game Servers -> (register) -> Amazon SNS
Game Servers -> (event) -> SQS Analytics Queue -> Amazon Redshift
Amazon SNS: ARN Retrieval 
private String getArnForDeviceEndpoint(String platformApplicationArn, String deviceToken) {
    CreatePlatformEndpointRequest request =
        new CreatePlatformEndpointRequest()
            .withPlatformApplicationArn(platformApplicationArn)
            .withToken(deviceToken);
    CreatePlatformEndpointResult result = snsClient.createPlatformEndpoint(request);
    return result.getEndpointArn();
}
Amazon SNS: Analytics Event 
private String registerEndpointForApplicationAndPlatform(final long plumbeeUid,
        String platformARN, String platformToken) {
    final String deviceEndpointARN = getArnForDeviceEndpoint(platformARN, platformToken);
    // Record the registration as an analytics event on the SQS analytics queue.
    sqsLogger.queueMessage(new HashMap<String, Object>() {{
        put("notification", "register");
        put("plumbeeUid", plumbeeUid);
        put("provider", platformName); // platformName is not defined in this excerpt
        put("endpoint", deviceEndpointARN);
    }}, null);
    return deviceEndpointARN;
}
Amazon SNS: Mobile Push 
private void publishMessage(UserData userData, String jsonPayload) {
    amazonSNS.publish(new PublishRequest()
        .withTargetArn(userData.getEndpoint())
        .withMessageStructure("json")
        .withMessage(jsonPayload));
}
Payload example 
{"default": "The 5 day Halloween Challenge has started today! Touch to play NOW!"}
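A sketch of how such a payload could be built. The slide shows only the `default` entry; the `APNS` and `GCM` entries are an assumption about what one might add for platform-specific formatting, and `build_sns_payload` is a hypothetical helper. With the "json" message structure, each platform entry is itself a JSON-encoded string.

```python
import json


def build_sns_payload(text):
    """Build the string passed as the message when the structure is "json"."""
    return json.dumps({
        # Fallback used for any platform without its own entry.
        "default": text,
        # Platform-specific entries are themselves JSON-encoded strings.
        "APNS": json.dumps({"aps": {"alert": text}}),
        "GCM": json.dumps({"data": {"message": text}}),
    })


payload = build_sns_payload("The 5 day Halloween Challenge has started today! Touch to play NOW!")
decoded = json.loads(payload)
print(decoded["default"])
```

Building the payload with `json.dumps` rather than string concatenation avoids escaping mistakes in the nested platform entries.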
Architecture - Orchestration
(Diagram as in the targeted push Architecture - Overview: Trigger, Publisher, Segmentation Workers, Batch Processors, Amazon SNS, Players, with Amazon Redshift and Amazon S3.)
Amazon Simple Workflow
What is Amazon SWF? 
“Amazon Simple Workflow (Amazon SWF) is a 
task coordination and state management 
service for cloud applications.”
What Amazon SWF provides 
● consistent execution state management 
● workflow executions and tasks tracking 
● non-duplicated dispatch of tasks 
● task routing and queuing 
● the AWS Flow Framework
Architecture - Orchestration
(Diagram as in the targeted push Architecture - Overview: Trigger, Publisher, Segmentation Workers, Batch Processors, Amazon SNS, Players, with Amazon Redshift and Amazon S3.)
Mobile Push: Scheduling
Diagram: Trigger -> Publish Service -> Amazon Simple Workflow
Mobile Push: Targeting
Diagram: Amazon SWF -> (query) -> Amazon EC2 Worker (Segmentation) -> (query) -> Amazon Redshift; (target users) -> Amazon S3
Mobile Push: Processing
Diagram: Amazon SWF -> (batch 1-N) -> Workers (Processing): Read data + push -> (publish) -> (push) -> End User
Mobile Push: Reporting
Diagram: Amazon SWF -> (send) -> Amazon EC2 Worker (Reporting) -> (send) -> Amazon SES
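The "batch 1-N" step in the processing stage can be sketched as splitting the target segment into fixed-size batches, one per processing task. The batch size, the `split_into_batches` helper, and the sample segment are assumptions; the slides do not say how batches are sized.

```python
def split_into_batches(users, batch_size):
    """Split the target segment into batches 1..N of at most batch_size users."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [users[i:i + batch_size] for i in range(0, len(users), batch_size)]


# Hypothetical segment of (plumbee_uid, endpoint ARN) pairs.
segment = [
    (466264, "arn:placeholder:1"),
    (376166, "arn:placeholder:2"),
    (983131, "arn:placeholder:3"),
]
batches = split_into_batches(segment, batch_size=2)
print(len(batches))
```

Smaller batches mean more tasks to coordinate but less work lost if a single task fails and has to be retried.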
Demo (II)
Q&A

Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 

Último (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

Transforming Mobile Push Notifications with Big Data

  • 1. Transforming Mobile Push Notifications with Big Data Dennis Waldron, Data Engineering Pablo Varela, Systems Engineering
  • 2. Who is Plumbee? ● 12.8M Installs ● 209K Daily Active Users ● 818K Monthly Active Users ● Social Games Studio ● Mirrorball Slots & Bingo ● Facebook Canvas, iOS
  • 3. Data Providers Inhouse data = 99.9% of all data In Total: ● 98TB (907 days of data) ● All stored in Amazon S3 Daily: ● 78GB compressed ● ~450M events/day ● 4,800 events/second (peak)
  • 4. Architecture - Overview: End Users (Desktop & Mobile) → Application/Game Servers → Events (JSON) → SQS Analytics Queue → Log Aggregators → Amazon S3 (Simple Storage Service) → Daily Batch Processing on Amazon EMR (Elastic MapReduce), orchestrated by DataPipeline → Aggregates → Amazon Redshift → Analytics (SQL Queries) by Plumbee Employees
  • 5. Game Data (Application/Game Servers on Amazon Web Services, End Users on Desktop & Mobile) ● Collect everything! ● RPC events intercepted by annotated endpoints (Requests) ● All mutating state changes recorded: ○ DynamoDB, MySQL, Memcache (Blob Updates) ● Custom Telemetry (Other): ○ Client: click tracking, loading time statistics, GPU data... ○ Server: promotions, transactions, Facebook user data... Diagram labels: MySQL, MemCache, DynamoDB; RPC 77%, 9%, OTHER 15%
  • 6. Game Data - Example RPC Endpoint Annotation /** Example annotation */ @SQSRequestLog(requestMessage = SpinRequest.class) @RequestMapping("/spin") public SpinResponse spin(SpinRequest spinRequest) { … }
  • 7. Example Event - userStats ● All events are recorded in JSON. ● Structure: ○ Headers ○ Categorization Data (metadata) ○ Payload (message) ● Important Headers: ○ timestamp ○ testVariant ○ plumbeeUid
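The three-part event structure described above can be sketched in Python. This is only an illustration: the slide names the headers `timestamp`, `testVariant`, and `plumbeeUid`, but the `metadata` and `message` field names, the flat placement of the headers, and the helper `build_event` are assumptions, not Plumbee's actual schema.

```python
import json
import time


def build_event(plumbee_uid, test_variant, event_type, event_sub_type, message):
    """Assemble one analytics event: headers, categorization data, payload."""
    return {
        # Headers named on the slide.
        "timestamp": int(time.time() * 1000),
        "testVariant": test_variant,
        "plumbeeUid": plumbee_uid,
        # Categorization data (hypothetical field names).
        "metadata": {"type": event_type, "subType": event_sub_type},
        # Payload (hypothetical field name).
        "message": message,
    }


event = build_event(466264, "control", "rpc", "rpc-spin", {"bet": 100})
# Events are recorded as JSON, one document per event.
print(json.dumps(event, sort_keys=True))
```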
  • 8. Architecture - Collection (overview diagram repeated): Application/Game Servers → Events (JSON) → SQS Analytics Queue → Log Aggregators → Amazon S3 (Simple Storage Service)
  • 9. Data Collection (I) - PUT: Application/Game Servers (producers) put Events (JSON) on the SQS Queue; Log Aggregators (consumers) read from it. What is SQS (Simple Queue Service)? A cloud-based message queue for transmitting messages between producers and consumers. SQS provides: ● ACK/FAIL semantics ● Unlimited number of messages ● Scales transparently ● Buffer zone
  • 10. Data Collection (II) - GET: Apache Flume consumers read from the SQS Queue and write to Amazon S3 (Simple Storage Service). What is Apache Flume? A distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. S3 Data: ● Partitioned by: date / type / sub_type ● Compressed with: Snappy ● Aggregated in 512MB chunks
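As a rough illustration of the date / type / sub_type partitioning, the sketch below builds an S3 key prefix for an event. The exact directory layout and the helper name `partition_prefix` are assumptions; the slides only state the partitioning order.

```python
from datetime import date


def partition_prefix(event_date, event_type, event_sub_type):
    """Return a key prefix partitioned by date / type / sub_type (assumed layout)."""
    return f"{event_date.isoformat()}/{event_type}/{event_sub_type}/"


prefix = partition_prefix(date(2014, 11, 18), "rpc", "rpc-spin")
print(prefix)
```

Partitioning this way lets a batch job read only the objects for the day and event type it needs instead of scanning the whole bucket.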
  • 11. Data Collection (III) - Flume Flume Agent: SQS Queue → Source (Custom) → Channel (File Based) → Sink (HDFS) → S3 Bucket; Transactions A + B + C = Flow ● Pluggable component architecture ● Durability via transactions ● File channel uses Elastic Block Store (EBS) volumes (network attached storage) ○ Protects against hardware failure ● SQS Flume Plugin: https://github.com/plumbee/flume-sqs-source
  • 12. Architecture - Processing (overview diagram repeated): Amazon S3 (Simple Storage Service) → Daily Batch Processing on Amazon EMR (Elastic MapReduce), orchestrated by DataPipeline → Aggregates → Amazon Redshift
  • 13. Extract, Transform, Load ● Daily activity ● Orchestrated by Amazon DataPipeline ● Includes generation of reports ● Configured with JSON What is DataPipeline? A cloud-based data workflow service that helps you process and move data between different AWS services RESOURCE COMMAND SCHEDULE
  • 14. Extract & Transform (I) What is Elastic MapReduce? A cloud-based MapReduce implementation, built on top of the open-source Hadoop framework, for processing vast amounts of data. Two phases: ● Map() procedure -> Filtering & Sorting ● Reduce() procedure -> Summary operation. Example: RAW DATA (a mix of Penguin, Horse, and Cake records) -> MAP() -> SORTED QUEUES (one per key) -> REDUCE() -> RESULT: Cake: 2, Horse: 3, Penguin: 4
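The Penguin/Horse/Cake example can be reproduced in a few lines of plain Python. This is a single-process sketch of the map, sort, and reduce steps, not Hadoop itself; the function names and the order of the raw records are illustrative.

```python
from itertools import groupby


def map_phase(records):
    """Emit a (key, 1) pair for every raw record."""
    for record in records:
        yield (record, 1)


def shuffle_sort(pairs):
    """Sort pairs by key so equal keys end up next to each other."""
    return sorted(pairs, key=lambda kv: kv[0])


def reduce_phase(sorted_pairs):
    """Sum the counts for each run of identical keys."""
    return {
        key: sum(count for _, count in group)
        for key, group in groupby(sorted_pairs, key=lambda kv: kv[0])
    }


raw = ["Penguin", "Horse", "Cake", "Cake", "Penguin",
       "Penguin", "Penguin", "Horse", "Horse"]
result = reduce_phase(shuffle_sort(map_phase(raw)))
print(result)
```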
  • 15. Extract & Transform (II) What is Hive? An open-source Apache project which provides a SQL-like interface to summarize, query, and analyze large datasets by leveraging Hadoop’s MapReduce infrastructure. ● Not really SQL: HQL -> HiveQL ● No transactions, no materialized views, limited subquery support, ... SELECT plumbeeuid, COUNT(*) AS spins FROM eventlog -- Partitioned data access WHERE event_date = '2014-11-18' AND event_type = 'rpc' AND event_sub_type = 'rpc-spin' -- Aggregation GROUP BY plumbeeuid; Table: Eventlog ● Mounted on top of raw data ● SerDe provides JSON parsing ● Target data via partition filters
  • 16. Extract & Transform (III) ● Hive has limitations! ○ Speed, JSON ● Most of our transformations use: Streaming MapReduce Jobs What is Streaming? “A Hadoop utility that allows you to create and run MapReduce jobs using any executable script as a mapper or reducer” map(), reading JSON rpc-spin data: for line in sys.stdin: data = json.loads(line); print(str(data['plumbeeUid']) + '\t' + '1') Emits key-value pairs: 466264 => 1, 376166 => 1, 983131 => 1, 466264 => 1. Hadoop sorts and shuffles the data, making sure matching keys are processed by a single reducer! reduce(): results = defaultdict(int), then for line in sys.stdin: plumbee_uid, count = line.split('\t'); results[plumbee_uid] += int(count), then print(results) Result: { 466264: 2, 376166: 1, 983131: 1 }
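A runnable Python 3 version of the mapper and reducer from this slide is sketched below. It takes any iterable of lines so it can be exercised locally; in a real Hadoop Streaming job each function would read `sys.stdin` in its own script and print its output. The sample events and the `sorted()` call standing in for Hadoop's sort-and-shuffle are assumptions for the demonstration.

```python
import json
from collections import defaultdict


def mapper(lines):
    """Emit 'plumbeeUid<TAB>1' for every JSON event line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        data = json.loads(line)
        yield f"{data['plumbeeUid']}\t1"


def reducer(lines):
    """Sum the counts per plumbeeUid from tab-separated key/value lines."""
    results = defaultdict(int)
    for line in lines:
        plumbee_uid, count = line.rstrip("\n").split("\t")
        results[plumbee_uid] += int(count)
    return dict(results)


events = [
    '{"plumbeeUid": 466264}',
    '{"plumbeeUid": 376166}',
    '{"plumbeeUid": 983131}',
    '{"plumbeeUid": 466264}',
]
# sorted() stands in for Hadoop's sort-and-shuffle between the two phases.
counts = reducer(sorted(mapper(events)))
print(counts)
```

Note that the keys come back as strings, because streaming jobs exchange plain text lines between mapper and reducer.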
  • 17. Load (I) - Problem: Raw S3 JSON Data -> EMR Transformation (Hive & Streaming Jobs), 5.4TB -> Aggregated Data. Transformed data: ● Referred to as aggregates ● Stored in S3 ● Accessible via EMR cluster. Problem: ● We don’t run long-lived EMR clusters. ● EMR requires specialist knowledge ● EMR is slow, both processing and booting: “offline”. Use Amazon Redshift for fast “online” data access
  • 18. Load (II) - Redshift What is Redshift? A column-oriented database which uses Massively Parallel Processing (MPP) techniques to support analytics-style, SQL-based workloads across large datasets. Power comes from: ● Query parallelization ● Column-oriented design Redshift provides: ● Low latency JDBC and ODBC access ● Fault tolerance ● Automated backups Redshift (x3 nodes): 0.33s vs. EMR (x20 nodes): 135.46s
  • 19. Load (II) - Column-Oriented Databases Example table (ID, First Name, Last Name, Country): (1, Penguin, Situation, GB), (2, Cheese, Labs, US), (3, Horse, Barracks, GB). Row-oriented Database - MySQL (values stored row by row): ● Easy to add/modify records ● Could read irrelevant data ● Great for fast lookups (OLTP) Column-oriented Database - Redshift (values stored column by column): ● Only reads in relevant data ● Adding rows requires multiple updates to column data ● Great for aggregation queries (OLAP)
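The trade-off can be seen with the slide's own three-record table. The sketch below holds the same data in a row layout and in a column layout using plain Python structures; it illustrates the idea only and is not how MySQL or Redshift store data internally.

```python
from collections import Counter

# Row-oriented: each record is stored together.
rows = [
    {"id": 1, "first_name": "Penguin", "last_name": "Situation", "country": "GB"},
    {"id": 2, "first_name": "Cheese", "last_name": "Labs", "country": "US"},
    {"id": 3, "first_name": "Horse", "last_name": "Barracks", "country": "GB"},
]

# Column-oriented: each column is stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def users_per_country_rows(rows):
    """Aggregation over rows: every full record is touched."""
    return Counter(row["country"] for row in rows)


def users_per_country_columns(columns):
    """Aggregation over columns: only the 'country' column is read."""
    return Counter(columns["country"])


print(users_per_country_rows(rows))
print(users_per_country_columns(columns))
```

Appending a record is one `rows.append(...)` in the row layout, but one append per column in the column layout, which mirrors the "multiple updates" bullet above.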
  • 20. Architecture - Revisit (overview diagram repeated): End Users (Desktop & Mobile) → Application/Game Servers → Events (JSON) → SQS Analytics Queue → Log Aggregators → Amazon S3 (Simple Storage Service) → Daily Batch Processing on Amazon EMR (Elastic MapReduce), orchestrated by DataPipeline → Aggregates → Amazon Redshift → Analytics (SQL Queries) by Plumbee Employees
  • 21. Q&A
  • 24. Mirrorball Slots: Challenges ● recurring timed event ● collect symbols from non-winning spins ● get free coins if enough symbols are collected
  • 25. Some players ask for notifications
  • 29. Data Collection Players Amazon Redshift
  • 30. Architecture - Overview Amazon Redshift Amazon S3 Trigger Publisher Segmentation Workers Batch Processors Amazon SNS Players Targeting Mobile Push
  • 32. User targeting Run SQL queries directly against Redshift SQL Query Amazon Redshift User Segment
  • 33. User targeting: Query example -- Target all mobile users SELECT plumbee_uid, arn FROM mobile_user
  • 34. User targeting: Query example (II) -- Target lapsed users (1 week lapse) SELECT plumbee_uid, arn FROM mobile_user WHERE last_play_time < (now - 7 days)
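The lapsed-user query above is written as pseudo-SQL (`now - 7 days`). A minimal runnable sketch follows, using Python's built-in `sqlite3` as a stand-in for Redshift. The column types, the sample rows, the placeholder ARNs, and the `lapsed_users` helper are all assumptions; only the table and column names come from the slide.

```python
import sqlite3
from datetime import datetime, timedelta


def lapsed_users(conn, now, days=7):
    """Return (plumbee_uid, arn) for users who last played more than `days` ago."""
    cutoff = (now - timedelta(days=days)).isoformat()
    cursor = conn.execute(
        "SELECT plumbee_uid, arn FROM mobile_user "
        "WHERE last_play_time < ? ORDER BY plumbee_uid",
        (cutoff,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mobile_user (plumbee_uid INTEGER, arn TEXT, last_play_time TEXT)"
)
now = datetime(2014, 11, 18, 12, 0, 0)
conn.executemany(
    "INSERT INTO mobile_user VALUES (?, ?, ?)",
    [
        # Placeholder ARNs, not real SNS endpoints.
        (466264, "arn:example:endpoint/a", (now - timedelta(days=10)).isoformat()),
        (376166, "arn:example:endpoint/b", (now - timedelta(days=1)).isoformat()),
    ],
)
segment = lapsed_users(conn, now)
print(segment)
```

Timestamps are stored as ISO 8601 strings here so that string comparison matches chronological order; a real warehouse table would more likely use a native timestamp column.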
  • 35. Demo (I) Mobile MBS Notifications
  • 36. Architecture - Mobile Push Amazon Redshift Amazon S3 Trigger Publisher Segmentation Workers Batch Processors Amazon SNS Players Targeting Mobile Push
  • 38. What is SNS? “Amazon Simple Notification Service (Amazon SNS) is a fast, flexible, fully managed push messaging service”
  • 41. Amazon SNS: Device Registration: Players register their device with the Game Servers; the Game Servers register it with Amazon SNS and send a registration event to the SQS Analytics Queue, which ends up in Amazon Redshift
  • 42. Amazon SNS: ARN Retrieval private String getArnForDeviceEndpoint(String platformApplicationArn, String deviceToken) { CreatePlatformEndpointRequest request = new CreatePlatformEndpointRequest() .withPlatformApplicationArn(platformApplicationArn) .withToken(deviceToken); CreatePlatformEndpointResult result = snsClient.createPlatformEndpoint(request); return result.getEndpointArn(); }
  • 43. Amazon SNS: Analytics Event private String registerEndpointForApplicationAndPlatform( final long plumbeeUid, String platformARN, String platformToken) { final String deviceEndpointARN = getArnForDeviceEndpoint( platformARN , platformToken ); sqsLogger.queueMessage( new HashMap<String, Object>() {{ put( "notification", "register"); put( "plumbeeUid", plumbeeUid ); put( "provider", platformName ); put( "endpoint", deviceEndpointARN ); }}, null); return deviceEndpointARN; }
  • 44. Amazon SNS: Mobile Push private void publishMessage(UserData userData, String jsonPayload) { amazonSNS.publish(new PublishRequest() .withTargetArn( userData.getEndpoint()) .withMessageStructure( "json") .withMessage( jsonPayload )); } Payload example {"default": "The 5 day Halloween Challenge has started today! Touch to play NOW!"}
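When a message is published with the `json` message structure, the payload is a JSON object with a required `default` entry and optional per-platform entries whose values are themselves JSON-encoded strings. The sketch below only builds such a payload and makes no AWS call; the helper name `build_sns_payload` and the optional `APNS` branch are illustrative additions, not part of the talk.

```python
import json


def build_sns_payload(text, include_apns=False):
    """Build the JSON string passed as the message of an SNS publish request."""
    payload = {"default": text}
    if include_apns:
        # Platform-specific entries are JSON documents encoded as strings.
        payload["APNS"] = json.dumps({"aps": {"alert": text}})
    return json.dumps(payload)


message = build_sns_payload(
    "The 5 day Halloween Challenge has started today! Touch to play NOW!"
)
print(message)
```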
  • 45. Architecture - Orchestration Amazon Redshift Amazon S3 Trigger Publisher Segmentation Workers Batch Processors Amazon SNS Players Targeting Mobile Push
  • 47. What is Amazon SWF? “Amazon Simple Workflow (Amazon SWF) is a task coordination and state management service for cloud applications.”
  • 48. What Amazon SWF provides ● consistent execution state management ● workflow executions and tasks tracking ● non-duplicated dispatch of tasks ● task routing and queuing ● the AWS Flow Framework
  • 49. Architecture - Orchestration Amazon Redshift Amazon S3 Trigger Publisher Segmentation Workers Batch Processors Amazon SNS Players Targeting Mobile Push
  • 50. Mobile Push: Scheduling Trigger Publish Service Amazon Simple Workflow
  • 51. Mobile Push: Targeting query query target users Amazon SWF Amazon EC2 Worker (Segmentation) Amazon Redshift Amazon S3
  • 52. Mobile Push: Processing batch 1-N publish push Workers (Processing) Amazon SWF Read data + push End User
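Splitting a user segment into batches 1-N for the processing workers can be sketched as follows. The batch size, the sample segment, and the `batches` helper are assumptions; the slides do not say how batches are sized or assigned.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical segment of (plumbee_uid, endpoint ARN) pairs.
segment = [(uid, f"arn:example:endpoint/{uid}") for uid in range(1, 6)]
for number, batch in enumerate(batches(segment, 2), start=1):
    print(number, [uid for uid, _ in batch])
```

Each batch can then be handed to a separate worker, which reads its slice and publishes one push per user.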
  • 53. Mobile Push: Reporting send send Amazon SWF Amazon EC2 Worker (Reporting) Amazon SES
  • 55. Q&A