Building a real-time, scalable
and intelligent programmatic
ad buying platform
Martín Bonamico
Juan Martín Pampliega
Agenda
1. Jampp
2. Adtech, RTB, clicks, installs, events
3. Initial Architecture
4. Initial Architecture Characteristics
5. Evolution of Data Needs
6. New Data Infrastructure - Stream Processing
7. Key Takeaways
Jampp and AdTech
Jampp is a leading mobile app
marketing and retargeting platform.
Founded in 2013, Jampp has offices in San
Francisco, London, Berlin, Buenos Aires, São
Paulo and Cape Town.
We help companies grow their business by
seamlessly acquiring, engaging & retaining
mobile app users.
Jampp’s platform combines machine
learning with big data for programmatic ad
buying which optimizes towards in-app
activity.
Our platform processes more than 200,000 RTB ad bid
requests per second (17+ billion per day),
which amounts to about 300 MB/s or 25 TB
of data per day.
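The per-day figures follow from the per-second rates. A quick check, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and a constant rate over the whole day, which the slide does not state:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_day(rate_per_second):
    """Scale a constant per-second rate to a per-day total."""
    return rate_per_second * SECONDS_PER_DAY


requests_per_day = per_day(200_000)      # bid requests per day
bytes_per_day = per_day(300 * 10**6)     # 300 MB/s in decimal megabytes

print(f"{requests_per_day / 1e9:.2f} billion requests/day")   # 17.28
print(f"{bytes_per_day / 1e12:.2f} TB/day")                   # 25.92
# Implied average payload per request under these assumptions.
print(f"{300 * 10**6 // 200_000} bytes per request")          # 1500
```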
How do programmatic ads work?
[Diagram: "Download App" flow. Labels: Source / Exchange; Jampp
Tracking Platform; AppStore / Google Play; App; Install; Postback
(shown twice).]
RTB: Real Time Bidding
Jampp Events
1. RTB:
a. Auction: the exchange asks if we want to bid for the
impression.
b. Bid/No-Bid: we answer with a bid price or a no-bid (in less than 80 ms).
c. Impression: the ad is displayed to the user.
2. Non-RTB:
a. Click: event that marks when the user clicks on the ad.
b. Install: installation of the app, registered on first app open.
c. Event: in-app events such as purchase, view or favorite.
Data @ Jampp
● Our platform started using RDBMSs and a
traditional Data Warehouse architecture on Amazon
Web Services.
● Data grew exponentially and data needs became
more complex.
● In the last year alone, in-app events grew by more than
2,500% and RTB bids by more than 500%.
● This made us evolve our architecture to be able to
effectively handle Big Data.
Initial Data Architecture
Initial Jampp Infrastructure
[Diagram labels: Cupper instances (C1, C2, ..., Cn) behind a Load
Balancer; Click, Install, Event and Click Redirect; MySQL;
Replicator; bidders (B1, B2, ..., Bn); Auctions, Bids and
Impressions; PostgreSQL; API (Pivots).]
Jampp Initial Systems: Bidder
● OpenRTB bidding system implementation that runs on
200+ virtual machines with 70 GB of RAM each.
● Strict latency requirements: less than 80 ms to answer a
request.
● Written in Cython and uses ZMQ for communication.
● Heavy use of coherent caching to comply with latency
requirements.
● Data is continually replicated and enriched from MySQL
by the replicator process.
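A minimal sketch of how a latency budget and an in-memory cache interact in a bid handler. This is not Jampp's bidder (which is written in Cython); the cache contents, request fields and pricing rule are hypothetical and do not follow the OpenRTB schema. Only the 80 ms budget comes from the slide.

```python
import time
from typing import Optional

LATENCY_BUDGET_S = 0.080  # 80 ms, from the slide

# Hypothetical in-memory cache, standing in for the data that the
# replicator process copies from MySQL.
CAMPAIGN_CACHE = {"app-123": {"max_bid": 1.50}}


def handle_bid_request(request: dict, received_at: float,
                       now=time.monotonic) -> Optional[dict]:
    """Return a bid, or None (no-bid) if nothing matches or time ran out."""
    # Cache lookup only: no database round trip on the request path.
    campaign = CAMPAIGN_CACHE.get(request["app_id"])
    if campaign is None:
        return None
    price = campaign["max_bid"]
    if price < request.get("floor", 0.0):
        return None
    # A late answer is worthless to the exchange, so prefer a no-bid.
    if now() - received_at > LATENCY_BUDGET_S:
        return None
    return {"id": request["id"], "price": price}
```

The clock is injected through `now` so that the deadline branch can be exercised in tests without sleeping.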
Jampp Initial Systems: Cupper
● Event tracking system written in Node.js.
● Tracks clicks, installs and in-app events (200+
million per day).
● Can be scaled horizontally (10 instances) and is
located behind a load balancer (ELB).
● Uses a MySQL database to store attributed events
and Kinesis to store organics.
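The split between attributed and organic events can be sketched as a routing step. The slides do not describe the attribution rule, so the one used here (an install is attributed when a prior click exists for the same device) is an assumption, as are all field names. Plain lists stand in for the MySQL table and the Kinesis stream.

```python
from typing import Dict, List


def route_install(install: dict,
                  clicks_by_device: Dict[str, dict],
                  attributed_sink: List[dict],
                  organic_sink: List[dict]) -> str:
    """Send an install to the attributed or the organic sink."""
    click = clicks_by_device.get(install["device_id"])
    if click is not None:
        # Assumed rule: a matching click makes the install attributed.
        attributed_sink.append({**install, "click_id": click["click_id"]})
        return "attributed"
    organic_sink.append(install)
    return "organic"


clicks = {"dev-1": {"click_id": "clk-9"}}
mysql_rows, kinesis_records = [], []   # stand-ins for the real stores
route_install({"device_id": "dev-1", "app": "a"}, clicks, mysql_rows, kinesis_records)
route_install({"device_id": "dev-2", "app": "a"}, clicks, mysql_rows, kinesis_records)
```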
Jampp Initial Systems: API
● PostgreSQL is used as the Data Warehouse database, in
addition to the bidder's own use of it.
● An API exposes the data for querying with a caching
layer.
● Fact tables are maintained with hourly, daily and
monthly granularity, and high-cardinality dimensions are
removed from large fact tables for data older than 15 days.
● Data is continually aggregated through an aggregation
process written in Python.
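The retention rule above can be illustrated with a small rollup. This is a sketch under assumptions, not the actual aggregation process: the row layout and the dimension name `publisher_app` are hypothetical, and only the 15-day threshold comes from the slide.

```python
from collections import Counter
from datetime import date

RETENTION_DAYS = 15  # from the slide


def roll_up(rows, today, high_cardinality=("publisher_app",)):
    """Aggregate to daily granularity, dropping high-cardinality
    dimensions for rows older than RETENTION_DAYS."""
    totals = Counter()
    for row in rows:
        dims = dict(row["dims"])
        if (today - row["day"]).days > RETENTION_DAYS:
            for name in high_cardinality:
                dims.pop(name, None)
        # Rows that now share the same dimensions collapse into one key.
        key = (row["day"], tuple(sorted(dims.items())))
        totals[key] += row["count"]
    return dict(totals)


rows = [
    {"day": date(2015, 1, 1), "dims": {"campaign": "c1", "publisher_app": "a"}, "count": 3},
    {"day": date(2015, 1, 1), "dims": {"campaign": "c1", "publisher_app": "b"}, "count": 4},
    {"day": date(2015, 2, 1), "dims": {"campaign": "c1", "publisher_app": "a"}, "count": 5},
]
facts = roll_up(rows, today=date(2015, 2, 2))
```

The two January rows collapse into a single fact once `publisher_app` is dropped, which is where the storage saving comes from.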
Evolution of the Data
Architecture
Emerging Needs
● Log forensics capabilities - as our systems and company
scale and we integrate with more outside systems.
● More historical and granular data for advanced analytics
and model training.
● The need arose to make the data readily available to
systems outside the traditional RDBMS. Some of these
systems are too demanding for an RDBMS to
handle easily.
Initial Evolution
[Diagram labels: Cupper instances (C1, C2, ..., Cn) behind a Load
Balancer; Click, Install, Event and Click Redirect; MySQL; (Ruby);
ELB Logs; a second C1, C2, ..., Cn group; EMR - Hadoop Cluster;
AirPal.]
New System Characteristics
● The new system was based on Amazon Elastic
MapReduce (EMR).
● Data imported hourly from RDBMSs with Sqoop.
● Logs are imported every 10 minutes from different
sources to S3 tables.
● Facebook PrestoDB and Apache Spark are used for
interactive log analysis and analytics.
New System Characteristics
● Scalable storage and processing capabilities using
HDFS, YARN and Hive for ETLs and data storage.
● Connectors from different languages like Python,
Julia and Java/Scala.
● Data archiving in S3 for long term storage and
enabling other data processing technologies.
Aspects that needed improvement
● Data was still imported in batch mode. The delay was larger
for MySQL data than with the Python replicator.
● EMR is not great for long-running clusters.
● The EMR cluster is not designed with strong
multi-user capabilities. It is better to have multiple
clusters with few users than a big one with many.
● Data still being accumulated in RDBMSs (clicks,
installs, events).
Final stage of the evolution
● Real-time event processing architecture based on
best practices for stream processing in AWS.
● Uses Amazon Kinesis for streaming data storage
and AWS Lambda for data processing.
● DynamoDB and Redis are used for temporary data
storage for enrichment and analytics.
● S3 gives us a Source of Truth for batch data
applications and Kinesis for stream processing.
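A minimal sketch of a Lambda function consuming a Kinesis stream, to make the processing step concrete. The record layout (`Records[].kinesis.data`, base64-encoded) is the one Lambda delivers for Kinesis event sources; the JSON payload fields and the `CAMPAIGNS` lookup table are hypothetical, with a plain dict standing in for DynamoDB or Redis.

```python
import base64
import json

# Hypothetical enrichment data; a real function would read it from
# DynamoDB or Redis rather than from a module-level dict.
CAMPAIGNS = {"c1": {"advertiser": "acme"}}


def handler(event, context):
    """Decode each Kinesis record and enrich it with campaign data."""
    enriched = []
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        extra = CAMPAIGNS.get(payload.get("campaign_id"), {})
        enriched.append({**payload, **extra})
    return enriched


# Local invocation with a hand-built event, no AWS account needed.
data = base64.b64encode(
    json.dumps({"type": "click", "campaign_id": "c1"}).encode()
).decode()
result = handler({"Records": [{"kinesis": {"data": data}}]}, None)
```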
Our Real-Time Architecture
Still, it isn’t perfect...
● There is no easy way to manage windows and out-of-order
data with AWS Lambda.
● Consistency of DynamoDB and S3.
● Price of AWS managed services for large event
volumes compared to self-maintained solutions.
● The ACID guarantees of RDBMSs are not an easy thing to part
with.
● SQL and indexes in RDBMSs make forensics easier.
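To show what "managing windows and out-of-order data" involves, here is a sketch of event-time tumbling windows with an allowed-lateness cutoff. The window size and lateness are hypothetical values, and the state lives in memory; since Lambda invocations do not share memory, a real function would have to keep this state in an external store, which is part of what makes the problem awkward there.

```python
from collections import defaultdict

WINDOW_S = 60             # hypothetical window size, in seconds
ALLOWED_LATENESS_S = 120  # hypothetical lateness tolerance


class TumblingCounter:
    """Count events per event-time window, tolerating bounded lateness."""

    def __init__(self):
        self.counts = defaultdict(int)  # window start -> event count
        self.watermark = 0              # highest event time seen so far
        self.dropped = 0                # events that arrived too late

    def add(self, event_time: int) -> None:
        self.watermark = max(self.watermark, event_time)
        if event_time < self.watermark - ALLOWED_LATENESS_S:
            self.dropped += 1           # older than the lateness cutoff
            return
        window_start = event_time - event_time % WINDOW_S
        self.counts[window_start] += 1


counter = TumblingCounter()
for t in (10, 70, 50, 400, 100):  # 50 arrives out of order, 100 too late
    counter.add(t)
```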
Benefits of the Evolution
● Enables the use of stream processing frameworks to
keep data as fresh as economically possible.
● Decouples data from processing to enable multiple Big
Data engines running on different
clusters/infrastructure.
● Easy on-demand scaling given by AWS managed tools
like AWS Lambda, Amazon DynamoDB and Amazon EMR.
● Monitoring, logs and alerts managed by Amazon
CloudWatch.
Big Data Technologies at Jampp
S3
HDFS
Hadoop/YARN
Lambda
DynamoDB
Key Takeaways
● Ad tech is a technologically intensive market which
exhibits the three Vs of Big Data.
● As the business's data needs grow in complexity, specialized
data systems need to be put in place.
● Using technologies that are meant to scale easily and are
managed by a third party can bring you peace of mind.
● Stream processing is fundamental in new Big Data Projects.
● There is currently no one tool that clearly fulfills all the
needs for scalable and correct stream processing.
References
http://radar.oreilly.com/2015/08/the-world-beyond-batch-streaming-101.html
http://radar.oreilly.com/2015/08/the-world-beyond-batch-streaming-102.html
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
http://blog.confluent.io/2015/01/29/making-sense-of-stream-processing/
JAMPP - AGRANDA 2015
http://44jaiio.sadio.org.ar/sites/default/files/agranda14-30.pdf
Questions?
geeks.jampp.com
We Are Hiring! - jampp.com/jobs.php
martin.bonamico@jampp.com
juan@jampp.com
