WELCOME
Alexey Kharlamov, VP Technology
WE MEASURE AND ENSURE QUALITY
•  Ad Fraud – We eliminate fraudulent impressions by adbots and make sure ads don’t show up on fraudulent web sites
•  Brand Safety – We make sure ads don’t show up in places that brands don’t want them
•  Viewability – We measure whether a person actually viewed an ad
INTEGRAL ENGINEERING BY THE NUMBERS
•  Ad impressions processed: 5+ billion/day
•  HTTP requests: 50+ billion/day
•  Data centers: 10+ (AWS and on-premises)
•  Data stored in clusters: 6+ petabytes
•  New data collected daily: 20+ terabytes
•  Hadoop cluster processing cores: ~20,000
H2O World, 11/10/2015
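To relate the daily totals above to per-second load, the arithmetic can be sketched as below. This is only an averaging exercise: the function name `per_second` is hypothetical, the inputs are the lower bounds quoted on the slide ("5+", "50+", "20+"), a terabyte is assumed to be 10^12 bytes, and real traffic will peak well above the daily average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(per_day: float) -> float:
    """Average per-second rate for a given daily total."""
    return per_day / SECONDS_PER_DAY


# Lower bounds taken from the slide; actual values are "N+".
impressions_per_sec = per_second(5e9)          # ad impressions
requests_per_sec = per_second(50e9)            # HTTP requests
ingest_mb_per_sec = per_second(20e12) / 1e6    # new data, decimal megabytes

print(f"{impressions_per_sec:,.0f} impressions/s on average")
print(f"{requests_per_sec:,.0f} HTTP requests/s on average")
print(f"{ingest_mb_per_sec:,.1f} MB/s of new data on average")
```

These averages come to roughly 58 thousand impressions, 579 thousand HTTP requests, and 231 MB of new data per second.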
AD FRAUD
NEARLY ALL AD FRAUD IS CAUSED BY BOT ACTIVITY
•  Ad Stacking – Placing multiple ads on top of one another in a single ad placement, with only the top ad in view
•  Illegal Bots – Compromised computers with breached security defenses, conceded to a third party
•  Pixel Stuffing – Stuffing an entire ad-supported site into a 1x1 pixel
FRAUD DETECTION
[Figure: site labels facebook, cnn, ebay, nothingtoseehere.com, thisisnotabotnet.com]
REQUIREMENTS
•  Quickly identify freshly activated bots
•  High accuracy of detection algorithms
•  Avoid transfer of personal information across borders
•  Withstand a single data center failure
BLOCKING AND MONITORING
5+ billion events per day
EVENT SESSIONIZATION
[Figure: timeline showing Transaction 1 and Transaction 2 inside a join window. Impression 1: Init, DT, DT, DT, Unload, then Emit. Impression 2: Init, DT, DT, DT, Timeout, then Emit. Impression 3: DT, DT, DT, Timeout, then Drop.]
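The sessionization timeline above might be sketched as follows. This is a minimal illustration, not the implementation from the talk: the class `Sessionizer`, the event kinds `init`, `dt` and `unload`, and the rule that a session with no `init` event is dropped at timeout are all assumptions read off the figure labels.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    first_seen: int                      # timestamp of the first event
    has_init: bool = False               # True once an "init" event arrived
    events: list = field(default_factory=list)


class Sessionizer:
    """Joins events per impression; emits or drops when a session closes."""

    def __init__(self, timeout: int):
        self.timeout = timeout           # assumed join-window length
        self.open = {}                   # impression_id -> Session
        self.emitted = []                # closed sessions that had an init
        self.dropped = []                # closed sessions without an init

    def on_event(self, impression_id, kind: str, ts: int) -> None:
        self.expire(ts)                  # close sessions that timed out
        session = self.open.setdefault(impression_id, Session(first_seen=ts))
        session.events.append(kind)
        if kind == "init":
            session.has_init = True
        if kind == "unload":             # explicit end of the impression
            self._close(impression_id)

    def expire(self, now: int) -> None:
        stale = [i for i, s in self.open.items()
                 if now - s.first_seen >= self.timeout]
        for impression_id in stale:
            self._close(impression_id)

    def _close(self, impression_id) -> None:
        session = self.open.pop(impression_id)
        target = self.emitted if session.has_init else self.dropped
        target.append((impression_id, session.events))


# Replay of the three impressions from the figure (timestamps are made up).
s = Sessionizer(timeout=10)
for ts, imp, kind in [(0, 1, "init"), (1, 2, "init"), (2, 1, "dt"),
                      (2, 3, "dt"), (3, 2, "dt"), (3, 3, "dt"),
                      (4, 1, "unload")]:
    s.on_event(imp, kind, ts)
s.expire(20)                             # time moves past the join window
print("emitted:", s.emitted)
print("dropped:", s.dropped)
```

In this sketch the window is measured from the first event of each impression; a window measured from the last event would be an equally plausible reading of the figure.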
DATA FLOW
[Figure: components InputTopic, Session Builder, QLogTopic, Fraud Detection, Hadoop, Model Training, Assets, Firewall]
INTRA-DC DATA PROCESSING
•  Local log aggregation and processing
•  Transfer over long links causes all sorts of synchronization problems
•  Intra-DC links are reliable, the Internet is NOT. We can keep data locality and log time coherence
•  A single firewall server failure is not a “stop-the-world” event. The data is still present on the Kafka cluster.
•  A completely autonomous system
•  Higher availability due to DC redundancy
DATA CENTER ARCHITECTURE
[Figure: Server 1 through Server N; labels: Front-End Server, STORM]
LOG SOURCING: TAILER AGENT
●  Non-invasive event sourcing
●  Decoupled data publication and event processing
●  Data fan-out
●  Hard latency requirements
●  <10ms response
●  Periodic checkpoints to recover after failure
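A tailer of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the agent described in the talk: `tail_once`, the JSON checkpoint file, and the `publish` callback (standing in for a Kafka producer) are hypothetical names. The idea is that the agent only reads a log file another process is already writing, so the request path of that process is not touched.

```python
import json
import os


def tail_once(log_path: str, checkpoint_path: str, publish) -> int:
    """Publish complete lines appended since the last checkpoint.

    Returns the byte offset stored in the new checkpoint.
    """
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            offset = json.load(f)["offset"]

    with open(log_path, "rb") as log:
        log.seek(offset)
        while True:
            line = log.readline()
            if not line.endswith(b"\n"):
                break                    # end of file or a half-written line
            publish(line.rstrip(b"\n").decode("utf-8"))
            offset = log.tell()          # advance only past complete lines

    # Write the checkpoint atomically so a crash cannot leave it corrupt.
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp_path, checkpoint_path)
    return offset
```

Calling `tail_once` on a timer gives periodic checkpoints. If the agent crashes after publishing but before the checkpoint is written, the same lines are published again on restart, so consumers of this sketch would need to tolerate duplicates.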
RECOVERY STRATEGY
•  Read logs in micro-batches and maintain state in memory
•  Reliable Processing
-  On a successful operation, write a checkpoint
-  On failure, return to the previous checkpoint
-  On catastrophic failure, rewind the data feed to a point before the problem started
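The first two recovery rules above can be sketched as a small loop. This is a simplified, single-process illustration with hypothetical names (`CheckpointedProcessor`, `handler`, `max_retries`); the checkpoint is held in memory here, whereas a real system would persist it. The catastrophic case (rewinding the feed to an earlier point) is not modelled.

```python
import copy


class CheckpointedProcessor:
    """Processes a log in micro-batches with checkpoint and rollback."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.state = {}                  # in-memory working state
        self.checkpoint = (0, {})        # (log offset, state snapshot)

    def run(self, log, handler, max_retries: int = 3):
        offset = self.checkpoint[0]
        retries = 0
        while offset < len(log):
            batch = log[offset:offset + self.batch_size]
            try:
                for record in batch:
                    handler(self.state, record)
            except Exception:
                retries += 1
                if retries > max_retries:
                    raise                # give up; an operator must rewind
                # On failure: return to the previous checkpoint.
                offset, snapshot = self.checkpoint
                self.state = copy.deepcopy(snapshot)
                continue
            # On success: write a checkpoint.
            offset += len(batch)
            self.checkpoint = (offset, copy.deepcopy(self.state))
            retries = 0
        return self.state


# Example: count records, with one transient failure on the record "c".
failures = {"c": 1}


def count(state, record):
    if failures.get(record, 0) > 0:
        failures[record] -= 1
        raise RuntimeError("transient failure")
    state[record] = state.get(record, 0) + 1


processor = CheckpointedProcessor(batch_size=2)
result = processor.run(["a", "b", "a", "c", "a"], count)
print(result)
```

Because the state is restored from the snapshot before the batch is retried, the partial update made before the failure is not counted twice.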
LOGICAL TIME
●  Wall-clock does not work
●  Load spikes
●  Recovery rewinds data feed to previous time
●  Logical clock
●  Maximum timestamp seen by Bolt
●  New messages with smaller timestamp are late
●  No clock synchronization
●  All bolts are in “weak synchrony”
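The logical clock described above is small enough to sketch directly. The class name `LogicalClock` and the method `observe` are hypothetical; the rule itself (the clock is the maximum timestamp seen so far, and a message with a smaller timestamp is late) is taken from the slide. Each bolt would hold its own instance, which is why no clock synchronization is needed.

```python
class LogicalClock:
    """Per-bolt clock driven only by the timestamps of incoming messages."""

    def __init__(self):
        self.now = None                  # maximum timestamp seen so far

    def observe(self, timestamp: int) -> bool:
        """Advance the clock; return True if the message is late."""
        if self.now is None or timestamp >= self.now:
            self.now = timestamp
            return False
        return True                      # smaller than the maximum: late


clock = LogicalClock()
late_flags = [clock.observe(t) for t in [100, 105, 103, 105, 110]]
print(late_flags, clock.now)
```

Since the clock only moves when data arrives, a load spike or a replay of older data slows or resets logical time along with the feed, which is the property wall-clock time lacks. How late messages are then handled (dropped, or accepted within some tolerance) is left open in this sketch.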
DEBUGGING AND MONITORING
•  Metrics recording and visualization is an essential component of the development cycle
-  Ease correlation of failure symptoms
-  Accelerate the build/deploy/debug cycle
-  Provide a trace for production issues
•  Monitor business metrics
-  This is the only thing you care about
-  Technical issues may or may not have consequences
•  Do it a lot
-  150K metrics/sec
GLOBAL CONFIGURATION
[Figure labels: EAST COAST, EUROPE, DC-X (each with Stream and Mirror); Kafka Backbone; CENTRAL (Spark, Hadoop)]
LESSONS LEARNED
•  Use staged roll-out
-  Start from minimal infrastructure for log delivery
•  Do not try to build a fortress
-  It is much easier to build a system that accepts limited data loss
•  Minimize persistent state
-  Slows the system down
-  Expensive to maintain
•  Hardware matters
THANK YOU!
alexey@integralads.com

Real-Time Fraud Detection with Storm and Kafka