Sunday, 19 August 12
Triage
                       Dealing with errors in production
                             PyCon Australia 2012

                            Luke Cawood / @lwcd
                         Lars Yencken / @larsyencken




99designs



[Diagram: production stack, top to bottom: Balancer; Cache (x2); App (x4); Memcache, DB (x2), Queue; Worker]


[Diagram: production stack, top to bottom: Balancer; Cache; App (x4); Memcache, DB, Queue; Worker]


Errors



Hmmm....




Triage
                       • Improve signal-to-noise ratio by aggregating
                         similar errors
                       • Allow for claiming, resolving and ranking
                         errors in terms of importance
                       • Integration with GitHub and build tools
                       • Play with new tools and technology
                       • Provide an open source alternative to
                         commercial products in this space
Round 1 (Fight!)

                       • Errors continue to log directly to mongo
                       • Aggregation via incremental MapReduce
                       • Deliver a prototype in one day
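
The slides do not show the map and reduce functions, so the following is only a pure-Python illustration of the incremental idea: each new batch of errors is mapped and reduced, then re-reduced against the totals already stored. It is not MongoDB's mapReduce command, and the grouping fields (`type`, `file`, `line`) and summary fields are hypothetical.

```python
from collections import defaultdict


def map_error(error):
    """Map step: emit (group key, partial summary) for one raw error."""
    # Hypothetical grouping fields; the talk does not say which were used.
    key = (error["type"], error["file"], error["line"])
    return key, {"count": 1, "last_seen": error["timestamp"]}


def reduce_summaries(values):
    """Reduce step: fold partial summaries for one key into one summary.

    The output has the same shape as the input values, so a stored
    result can be fed back in on the next run.
    """
    return {
        "count": sum(v["count"] for v in values),
        "last_seen": max(v["last_seen"] for v in values),
    }


def incremental_map_reduce(totals, new_errors):
    """Fold one batch of new errors into the existing per-key totals."""
    emitted = defaultdict(list)
    for error in new_errors:
        key, value = map_error(error)
        emitted[key].append(value)
    for key, values in emitted.items():
        if key in totals:
            values.append(totals[key])  # re-reduce with the stored result
        totals[key] = reduce_summaries(values)
    return totals
```

Only the new batch is scanned on each run, which is the appeal of the incremental form; the cost still grows with the number of distinct keys touched per batch.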


Scalability Fatality!

                       • Worked fine during development
                       • Production load caused the MapReduce to
                         asplode!
                       • (Not that we have a lot of errors, right?!)


Round 2




(sub)zeroMQ

                       • Async error API using ZeroMQ pub/sub sockets
                       • MessagePack as error format (fast, binary)
                       • Aggregation in Python
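
A minimal sketch of what such a pipeline could look like, assuming the pyzmq and msgpack packages (both third-party). The topic name, the endpoint, and the choice of which side binds are assumptions for illustration, not details taken from the talk. The socket and the (un)packer are passed in so the two core helpers stay independent of the libraries.

```python
TOPIC = b"error"  # hypothetical topic prefix


def publish_error(socket, error, pack):
    """Fire-and-forget: send one error as [topic, packed payload]."""
    socket.send_multipart([TOPIC, pack(error)])


def receive_error(socket, unpack):
    """Block until one error arrives, then decode its payload."""
    _topic, payload = socket.recv_multipart()
    return unpack(payload)


def make_publisher(endpoint="tcp://127.0.0.1:5555"):
    """App side: wire publish_error to a real PUB socket and MessagePack."""
    import msgpack  # third-party
    import zmq      # third-party (pyzmq)

    sock = zmq.Context.instance().socket(zmq.PUB)
    sock.connect(endpoint)  # assumption: many apps connect to one aggregator
    return lambda error: publish_error(
        sock, error, lambda e: msgpack.packb(e, use_bin_type=True))


def make_subscriber(endpoint="tcp://127.0.0.1:5555"):
    """Aggregator side: wire receive_error to a real SUB socket."""
    import msgpack  # third-party
    import zmq      # third-party (pyzmq)

    sock = zmq.Context.instance().socket(zmq.SUB)
    sock.bind(endpoint)
    sock.setsockopt(zmq.SUBSCRIBE, TOPIC)  # only frames starting with TOPIC
    return lambda: receive_error(
        sock, lambda data: msgpack.unpackb(data, raw=False))
```

The publisher never waits for the aggregator, which keeps error reporting off the request path. The trade-off is that pub/sub gives no delivery guarantee: messages sent while no subscriber is connected are dropped.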




Aggregation Method

                       • Generate a hash in Python based on the error
                         document
                       • Query Mongo for the error hash
                       • Create or update the error document based
                         on the outcome of the query, incrementing
                         counters etc. where appropriate
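
The three steps above can be sketched as follows. A plain dict stands in for the Mongo collection so the example runs on its own, and the identifying fields and the shape of the aggregated document are hypothetical; the slides do not specify them.

```python
import hashlib
import json


def error_hash(error, fields=("type", "message", "file", "line")):
    """Hash only the fields that decide whether two errors are 'the same'."""
    identity = {f: error.get(f) for f in fields}
    canonical = json.dumps(identity, sort_keys=True)  # stable key order
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def aggregate(store, error):
    """Query by hash, then create or update the aggregated document."""
    key = error_hash(error)
    existing = store.get(key)  # stands in for a Mongo lookup on the hash
    if existing is None:
        store[key] = {
            "hash": key,
            "count": 1,
            "first_seen": error["timestamp"],
            "last_seen": error["timestamp"],
            "latest": error,
        }
    else:
        existing["count"] += 1
        existing["last_seen"] = error["timestamp"]
        existing["latest"] = error
    return store[key]
```

Which fields go into the hash decides how coarse the grouping is. Against a real database, the read followed by a write is two round trips per error and is not atomic when several workers aggregate at once.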



Scalability Fatality 2

                       • Multithreaded experiments
                       • Mongo optimisations
                        • There is no schema
                        • The cake is a lie
                       • Mongo ‘upsert’ rocks!

Updating like a boss
                       collection.update(criteria, document, upsert=False)
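
The signature above shows the default, `upsert=False`; passing `upsert=True` makes the server create the document when nothing matches the criteria. A hedged sketch of how the aggregation could collapse into one call follows. The field names are hypothetical, and it uses `update_one`, the PyMongo 3+ spelling, since the legacy `collection.update` shown on the slide was later deprecated and removed in PyMongo 4.

```python
def build_upsert(error_hash, error):
    """Return (criteria, update document) for one aggregated error."""
    criteria = {"hash": error_hash}
    update = {
        "$inc": {"count": 1},                                 # bump the counter
        "$set": {"last_seen": error["timestamp"], "latest": error},
        "$setOnInsert": {"first_seen": error["timestamp"]},   # only on create
    }
    return criteria, update


def record_error(collection, error_hash, error):
    """One round trip: create the document if missing, else update it."""
    criteria, update = build_upsert(error_hash, error)
    # `collection` is expected to be a pymongo Collection (or a stand-in
    # exposing the same update_one method).
    return collection.update_one(criteria, update, upsert=True)
```

Because the create-or-update decision happens on the server, the separate lookup disappears and concurrent workers no longer race between the read and the write.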




Outcomes & future



Outcomes

                       • Getting the ‘right’ level of grouping is hard
                       • What to do with errors that just won't go
                         away?
                       • Error occurrence count - what does this
                         tell us?



Future

                       • Easier installation, package on PyPI
                       • Better language support (plz halp)
                       • Drop-in replacement for Airbrake etc.
                       • Client-side logging (JavaScript)
                       • Email-style filters & actions - ifttt.com

Thanks
                       • 99designs for research and development time
                       • Contributors:
                           • Luke Cawood - Project lead
                           • Josh Benham - Developer
                           • Jamison Lu - Developer
                       • Additional assistance:
                           • Lars Yencken - Operations
                           • 99designs UX team




Thanks for listening!
                          https://github.com/lwc/triage



Triage: real-world error logging for web applications
