No bid left behind
My day-to-day: handling a resilient real-time bidding platform in a JVM environment.
Marc de Palol
Trovit
Hey hi,
• Studied here (good to be back).
• Did some research on supercomputing.
• Moved to London, discovered Hadoop & data-intensive systems.
• Came back, still doing the ‘Data Engineering’ stuff.
A classified-ads search engine for property, jobs, cars, products and holiday rentals:
• 180 million ads
• 170 TB in the cluster
• 65 million unique visitors / 170 million visits
• 10 apps (iOS, Android)
• Cool office in Barcelona.
Have a look at http://www.trovit.es
Real-Time Bidding
It’s about selling ads:
• On a per-impression basis.
• Through a programmatic, instantaneous auction.
We are using ‘DoubleClick Ad Exchange’ (Google):
• Responses must arrive in under 100 ms.
• If 15% of our responses are invalid or timed out, we progressively stop receiving bid requests.
Currently 10,000 QPS.
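Since a timed-out response counts against that 15%, one option is to always answer inside the budget, even if the answer is "no bid". A minimal sketch, assuming a hypothetical `computeBid` supplier and assuming the exchange accepts an explicit no-bid answer as a valid response; the 80 ms budget is an assumed safety margin, not a figure from the talk. It relies on `CompletableFuture.completeOnTimeout` (Java 9+).

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical wrapper: answer within the budget, or fall back to "no bid". */
final class DeadlineBidder {
    // Assumed budget, kept below the exchange's 100 ms limit to leave room for network time.
    static final long DEFAULT_BUDGET_MS = 80;

    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    /** Returns the bid in micros, or Optional.empty() meaning "no bid". */
    Optional<Long> bidWithin(Supplier<Long> computeBid, long budgetMs) {
        return CompletableFuture
                .supplyAsync(() -> Optional.of(computeBid.get()), pool)
                // Too slow: complete with "no bid" rather than letting the response time out.
                .completeOnTimeout(Optional.empty(), budgetMs, TimeUnit.MILLISECONDS)
                // A failure (or a null bid) also becomes "no bid" instead of an invalid response.
                .exceptionally(t -> Optional.empty())
                .join();
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

Note that a timed-out computation keeps running in the pool; a production version would also need to cancel or bound that work.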
This system literally spends money, so it must be rock solid.
Our system is coded carefully, with love and tests.
Still, sh*t happens.
Resiliency
The ability to recover from unexpected errors.
The ability to sleep at night.
Detect, Recover, Warn
• Detect: Monitoring
• Recover: Resiliency Patterns
• Warn: Notifications
Monitoring, in a sensible way
• Logging with ‘mailAppender’ (log4j’s SMTPAppender)
# Attach the appender to a logger (here the root logger), or it is never used.
log4j.rootLogger=INFO, mail
log4j.appender.mail=org.apache.log4j.net.SMTPAppender
log4j.appender.mail.SMTPHost=localhost
log4j.appender.mail.From=Error <error-bla@trovit.com>
log4j.appender.mail.To=tech@trovit.com, ceo@trovit.com
log4j.appender.mail.Subject=[ERROR] WE ARE GOING TO DIE
log4j.appender.mail.layout=org.apache.log4j.PatternLayout
log4j.appender.mail.threshold=ERROR
Caveat: you probably won’t get that e-mail when you’ve hit an OOM, since sending it needs memory too.
Let’s talk about OOM for a minute.
🚫 ps ax | grep java
👍 JVMOpts="-XX:OnOutOfMemoryError=/usr/local/bin/slack-msg.sh"
Some cool ideas for improving memory usage
• byte[] serialization in objects ❗
• Varying Memory Conditions ❗
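One possible reading of "byte[] serialization in objects", sketched under assumptions: keep a record's fields packed in a single `byte[]` and decode them on access, instead of holding a graph of small objects (boxed numbers, a `String` and its backing array). `CompactAd` and its field layout are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical compact ad record: all fields live in one byte[] to cut object count and GC work. */
final class CompactAd {
    // Layout: [long adId][int bidMicros][int titleLength][title bytes, UTF-8]
    private final byte[] data;

    CompactAd(long adId, int bidMicros, String title) {
        byte[] titleBytes = title.getBytes(StandardCharsets.UTF_8);
        this.data = ByteBuffer.allocate(Long.BYTES + Integer.BYTES * 2 + titleBytes.length)
                .putLong(adId)
                .putInt(bidMicros)
                .putInt(titleBytes.length)
                .put(titleBytes)
                .array();
    }

    long adId() {
        return ByteBuffer.wrap(data).getLong(0);
    }

    int bidMicros() {
        return ByteBuffer.wrap(data).getInt(Long.BYTES);
    }

    String title() {
        int length = ByteBuffer.wrap(data).getInt(Long.BYTES + Integer.BYTES);
        // Decodes on every call: memory is traded for CPU.
        return new String(data, Long.BYTES + Integer.BYTES * 2, length, StandardCharsets.UTF_8);
    }

    int sizeInBytes() {
        return data.length;
    }
}
```

The trade-off is CPU for memory: every accessor decodes, and `title()` allocates a new `String` on each call, so this pays off for data that is held for a long time but read rarely.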
• Logging with ‘mailAppender’
  • Bad when OOM.
• Heartbeat
  • Doing some real work.
• Supervision with actors
  • If you’re using Akka.
  • Control flow != data flow.
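A heartbeat only tells you something if it exercises the real path. A minimal sketch, assuming the bidding pipeline can be called as a hypothetical `Function<String, Long>` (request id to bid in micros) and that a missing or negative bid means something is broken; how the status reaches Nagios or Sentry is left out.

```java
import java.util.function.Function;

/** Hypothetical heartbeat that pushes a canned request through the real bidding path. */
final class RealWorkHeartbeat {
    enum Status { OK, SLOW, BROKEN }

    private final Function<String, Long> bidPipeline; // assumed signature: request id -> bid in micros
    private final long maxLatencyMs;

    RealWorkHeartbeat(Function<String, Long> bidPipeline, long maxLatencyMs) {
        this.bidPipeline = bidPipeline;
        this.maxLatencyMs = maxLatencyMs;
    }

    Status beat() {
        long start = System.nanoTime();
        try {
            Long bid = bidPipeline.apply("heartbeat-synthetic-request");
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (bid == null || bid < 0) {
                return Status.BROKEN; // the pipeline ran but produced garbage
            }
            return elapsedMs > maxLatencyMs ? Status.SLOW : Status.OK;
        } catch (RuntimeException e) {
            return Status.BROKEN; // e.g. a dependency is down
        }
    }
}
```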
Our monitoring:
• Nagios.
• Logging (to Sentry).
• Heartbeats with real work.
• Graphite comparisons.
Have graphs
Now we know that something is going wrong.
Recovery
Bad data in the system and/or errors in the system.
Data errors: roll back (when possible).
• Keep different versions in the DB.
• Keep the old version around.
• Know how to do a rollback.
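A minimal in-memory sketch of "keep the old version around and know how to roll back", assuming the data set is swapped in as a whole; `VersionedValue` is a hypothetical name, and a DB-backed version would keep the versions in tables instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Objects;

/** Hypothetical holder that keeps the last few versions of a data set so a bad load can be undone. */
final class VersionedValue<T> {
    private final Deque<T> versions = new ArrayDeque<>(); // head = current version
    private final int maxVersions;

    VersionedValue(T initial, int maxVersions) {
        if (maxVersions < 2) throw new IllegalArgumentException("need room for at least one old version");
        this.maxVersions = maxVersions;
        versions.push(Objects.requireNonNull(initial));
    }

    synchronized T current() {
        return versions.peek();
    }

    synchronized void publish(T next) {
        versions.push(Objects.requireNonNull(next));
        if (versions.size() > maxVersions) {
            versions.removeLast(); // forget the oldest version
        }
    }

    /** Reverts to the previous version; returns false if there is nothing older to go back to. */
    synchronized boolean rollback() {
        if (versions.size() < 2) return false;
        versions.pop();
        return true;
    }
}
```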
Checks & asserts with Google Guava:
import static com.google.common.base.Preconditions.*;

checkArgument(i >= 0,
    "Argument was %s but expected nonnegative", i);
checkArgument(i < j,
    "Expected i < j, but %s >= %s", i, j);
checkNotNull(myList,
    "List should not be null");
checkState(object.isValid(),
    "Object is not valid");
System errors
These happen mostly at the integration points between systems:
• Your code and the DB.
• Your code and the 3rd party library.
• Your code and the queue.
DBs, a necessary supervillain
• Lost connections.
• Timeouts.
• Can give you corrupted data.
• Can give you no data.
• Can give you too much data.
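Since the DB can hand back corrupted data, no data, or far too much data, it helps to sanity-check a load before using it. A sketch under assumptions: the `ResultGuard` class, its row bounds and the row predicate are hypothetical, and the bounds would come from what "normal" looks like for your tables.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sanity gate for rows loaded from the DB. */
final class ResultGuard<T> {
    private final int minRows;
    private final int maxRows;
    private final Predicate<T> rowIsValid;

    ResultGuard(int minRows, int maxRows, Predicate<T> rowIsValid) {
        this.minRows = minRows;
        this.maxRows = maxRows;
        this.rowIsValid = rowIsValid;
    }

    /** Returns the rows if they look sane; otherwise throws so the caller keeps its previous data. */
    List<T> check(List<T> rows) {
        if (rows == null || rows.size() < minRows) {
            throw new IllegalStateException("Suspiciously little data: "
                    + (rows == null ? "null" : rows.size() + " rows"));
        }
        if (rows.size() > maxRows) {
            throw new IllegalStateException("Suspiciously much data: " + rows.size() + " rows");
        }
        for (T row : rows) {
            if (!rowIsValid.test(row)) {
                throw new IllegalStateException("Corrupted row: " + row);
            }
        }
        return rows;
    }
}
```

Combined with keeping the previous version around, a rejected load simply means the bidder keeps serving the last good data.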
Circuit Breaker and its friend, the Bulkhead Pattern.
[Diagrams: “Circuit Breaker”; “Our Beloved CircuitBreakers”]
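The slides show the pattern as diagrams only. Below is a minimal, single-process circuit breaker in plain Java, a sketch and not the implementation used at Trovit: after `failureThreshold` consecutive failures the circuit opens and calls get the fallback immediately; after `openMillis`, a single trial call is let through (half-open), and its outcome closes or reopens the circuit. The clock is injected so the behavior can be tested.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Minimal circuit breaker sketch: CLOSED -> OPEN after repeated failures, HALF_OPEN trial after a cool-down. */
final class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis; injectable for tests

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get(); // fail fast: don't hammer a dependency that is already failing
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized boolean allowRequest() {
        switch (state) {
            case CLOSED:
                return true;
            case OPEN:
                if (clock.getAsLong() - openedAt >= openMillis) {
                    state = State.HALF_OPEN; // this caller becomes the single trial call
                    return true;
                }
                return false;
            default: // HALF_OPEN: a trial call is already in flight
                return false;
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN; // (re)open and restart the cool-down
            openedAt = clock.getAsLong();
        }
    }
}
```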
Bulkhead
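Likewise, a minimal bulkhead sketch, an assumption about how one might build it on `java.util.concurrent.Semaphore`: each dependency gets its own small set of permits, so a slow dependency can tie up at most `maxConcurrent` request threads. When the compartment is full, the caller gets the fallback right away instead of queueing.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Minimal bulkhead sketch: caps concurrent calls into one dependency. */
final class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // compartment full: shed load rather than wait
        }
        try {
            return action.get(); // exceptions propagate; wrap with a circuit breaker if needed
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

The two compose naturally: the bulkhead caps how many threads can be stuck on a dependency, and the circuit breaker stops calling it once it is clearly failing.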
Once the circuit breaker is open:
• Notify.
• Try again! Maybe.
  • Try to avoid DoS-ing your own system.
  • Retry with exponential backoff.
• Fail over.
• Restart.
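Retrying immediately, from every bidder instance at once, is one way to DoS your own system. A sketch of retry with exponential backoff and random ("full") jitter; `Backoff` is a hypothetical helper, and the base delay, cap and attempt limit are assumptions to tune.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

/** Hypothetical helper: retry with exponential backoff and full jitter. */
final class Backoff {
    private Backoff() {}

    /** Random delay in [0, min(cap, base * 2^attempt)] milliseconds. */
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long exponential = baseMillis << Math.min(attempt, 30); // bounded shift to avoid overflow
        long ceiling = Math.min(capMillis, exponential);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    static <T> T retry(Supplier<T> action, int maxAttempts, long baseMillis, long capMillis)
            throws InterruptedException {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    // Jitter spreads retries out so instances don't retry in lock-step.
                    Thread.sleep(delayMillis(attempt, baseMillis, capMillis));
                }
            }
        }
        throw last; // out of attempts: the caller decides whether to fail over, restart or notify
    }
}
```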
Some other bits and pieces:
• Tight coupling leads to fast propagation of errors.
• Event-driven stuff.
• Complete parameter checking.
• Avoid SPOFs (single points of failure). Pretty please.
• Stateless is better.
• Bounded queues!
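On bounded queues: a minimal sketch using `ArrayBlockingQueue.offer`, which returns `false` instead of blocking when the queue is full. `BoundedHandoff` is hypothetical; the point is that a slow consumer leads to counted drops rather than an ever-growing heap and, eventually, an OOM.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical bounded hand-off between request threads and a background consumer. */
final class BoundedHandoff<T> {
    private final BlockingQueue<T> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedHandoff(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Never blocks the caller; returns false (and counts a drop) when the queue is full. */
    boolean submit(T item) {
        boolean accepted = queue.offer(item);
        if (!accepted) {
            dropped.incrementAndGet(); // worth graphing: a rising count means the consumer can't keep up
        }
        return accepted;
    }

    /** Consumer side: blocks until an item is available. */
    T take() throws InterruptedException {
        return queue.take();
    }

    long dropped() {
        return dropped.get();
    }
}
```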
Your turn.
mdepalol@trovit.com
@lant
Image credit: http://www.maxisciences.com/destruction/wallpaper
