SlideShare uma empresa Scribd logo
1 de 33
Processing
Firefox Crash
Reports with
Python

laura@ mozilla.com
@lxt
Overview



• The basics
• The numbers
• Work process and tools
The basics
Socorro




          Very Large Array at Socorro, New Mexico, USA. Photo taken by Hajor, 08.Aug.2004. Released under cc.by.sa
          and/or GFDL. Source: http://en.wikipedia.org/wiki/File:USA.NM.VeryLargeArray.02.jpg
“Socorro has a lot of moving parts”
...
“I prefer to think of them as dancing parts”
Basic architecture (simplified)

              Collector   Crashmover




              cronjobs     HBase        Monitor




             Middleware   PostgreSQL   Processor




              Webapp
Lifetime of a crash


 • Raw crash submitted by user via POST (metadata json +
    minidump)

 • Collected to disk by collector (web.py WSGI app)
 • Moved to HBase by crashmover
 • Noticed in queue by monitor and assigned for processing
Processing


• Processor spins off minidumpstackwalk (MDSW)
• MDSW re-unites raw crash with symbols to generate a stack
• Processor generates a signature and pulls out other data
• Processor writes processed crash back to HBase and to
  PostgreSQL
Back end processing
 Large number of cron jobs, e.g.:

  • Calculate aggregates: Top crashers by signature, URL,
     domain

  • Process incoming builds from ftp server
  • Match known crashes to bugzilla bugs
  • Duplicate detection
  • Match up pairs of dumps (OOPP, content crashes, etc)
  • Generate extracts (CSV) for engineers to analyze
Middleware


• Moving all data access to be through REST API (by end of
   year)

• (Still some queries in webapp)
• Enable other front ends to data and us to rewrite webapp
   using Django in 2012

• In upcoming version (late 2011, 2012) each component will
   have its own API for status and health checks
Webapp


• Hard part here: how to visualize some of this data
• Example: nightly builds, moving to reporting in build time
   rather than clock time

• Code a bit crufty: rewrite in 2012
• Currently KohanaPHP, will be (likely) Django
Implementation details


 • Python 2.6 mostly (except PHP for the webapp)
 • PostgreSQL9.1, some stored procs in pgpl/sql
 • memcache for the webapp
 • Thrift for HBase access
 • HBase we use CDH3
Scale
A different type of scaling:
• Typical webapp: scale to millions of users without
   degradation of response time
• Socorro: less than a hundred users, terabytes of data.
Basic law of scale still applies:
The bigger you get, the more spectacularly you fail
Some numbers



• At peak we receive 2300 crashes per minute
• 2.5 million per day
• Median crash size 150k, max size 20MB (reject bigger)
• ~110TB stored in HDFS (3x replication, ~40TB of HBase data)
What can we do?

• Does betaN have more (null signature) crashes than other
   betas?
• Analyze differences between Flash versions x and y crashes
• Detect duplicate crashes
• Detect explosive crashes
• Find “frankeninstalls”
• Email victims of a malware-related crash
Implementation scale


• > 115 physical boxes (not cloud)
• Now up to 8 developers + sysadmins + QA + Hadoop ops/
  analysts

  • (Yes, hiring.   mozilla.org/careers)

• Deploy approximately weekly but could do continuous if
  needed
Managing complexity
Development process


• Fork
• Hard to install: use a VM (more in a moment)
• Pull request with bugfix/feature
• Code review
• Lands on master
Development process -2

• Jenkins polls github master, picks up changes
• Jenkins runs tests, builds a package
• Package automatically picked up and pushed to dev
• Wanted changes merged to release branch
• Jenkins builds release branch, manual push to stage
• QA runs acceptance on stage (Selenium/Jenkins + manual)
• Manual push same build to production
Note “Manual” deployment =

• Run a single script with the build as
   parameter

• Pushes it out to all machines and restarts
   where needed
ABSOLUTELY CRITICAL

All the machinery for continuous deployment
even if you don’t want to deploy continuously
Configuration management


• Some releases involve a configuration change
• These are controlled and managed through Puppet
• Again, a single line to change config the same way every
   time

• Config controlled the same way on dev and stage; tested the
   same way; deployed the same way
Virtualization


 • You don’t want to install HBase
 • Use Vagrant to set up a virtual machine
 • Use Jenkins to build a new Vagrant VM with each code build
 • Use Puppet to configure the VM the same way as production
 • Tricky part is still getting the right data
Virtualization - 2



• VM work at
       github.com/rhelmer/socorro-vagrant

• Also pulled in as a submodule to socorro
Upcoming


• ElasticSearch implemented for better search including
   faceted search, waiting on hardware to pref it on

• More analytics: automatic detection of explosive crashes,
   malware, etc

• Better queueing: looking at Sagrada queue
• Grand Unified Configuration System
Everything is open (source)


  Fork: https://github.com/mozilla/socorro
  Read/file/fix bugs: https://bugzilla.mozilla.org/
  Docs: http://www.readthedocs.org/docs/socorro
  Mailing list: https://lists.mozilla.org/listinfo/tools-socorro
  Join us in IRC: irc.mozilla.org #breakpad
Questions?




• Ask me (almost) anything, now or later
• laura@mozilla.com

Mais conteúdo relacionado

Mais procurados

Easily create dashboards to manage your databases with OVH
Easily create dashboards to manage your databases with OVH Easily create dashboards to manage your databases with OVH
Easily create dashboards to manage your databases with OVH OVHcloud
 
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedPerformance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedTim Callaghan
 
PyCon India 2012: Celery Talk
PyCon India 2012: Celery TalkPyCon India 2012: Celery Talk
PyCon India 2012: Celery TalkPiyush Kumar
 
Introduction to Systems Management with SaltStack
Introduction to Systems Management with SaltStackIntroduction to Systems Management with SaltStack
Introduction to Systems Management with SaltStackCraig Sebenik
 
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 Peopleconfluent
 
London devops logging
London devops loggingLondon devops logging
London devops loggingTomas Doran
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Docker, Inc.
 
Rust with-kafka-07-02-2019
Rust with-kafka-07-02-2019Rust with-kafka-07-02-2019
Rust with-kafka-07-02-2019Gerard Klijs
 
High Concurrency Architecture and Laravel Performance Tuning
High Concurrency Architecture and Laravel Performance TuningHigh Concurrency Architecture and Laravel Performance Tuning
High Concurrency Architecture and Laravel Performance TuningAlbert Chen
 
Testing Below the Application
Testing Below the ApplicationTesting Below the Application
Testing Below the ApplicationAsh Winter
 
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...HostedbyConfluent
 
Python performance profiling
Python performance profilingPython performance profiling
Python performance profilingJon Haddad
 
Hacking on WildFly 9
Hacking on WildFly 9Hacking on WildFly 9
Hacking on WildFly 9JBUG London
 
Rust kafka-5-2019-unskip
Rust kafka-5-2019-unskipRust kafka-5-2019-unskip
Rust kafka-5-2019-unskipGerard Klijs
 
Streaming and Messaging
Streaming and MessagingStreaming and Messaging
Streaming and MessagingXin Wang
 
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache KafkaKafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafkaconfluent
 
Log aggregation and analysis
Log aggregation and analysisLog aggregation and analysis
Log aggregation and analysisDhaval Mehta
 
Zabbix 3.2 presentation June 2017
Zabbix 3.2 presentation June 2017Zabbix 3.2 presentation June 2017
Zabbix 3.2 presentation June 2017Amirhossein Saberi
 

Mais procurados (20)

Easily create dashboards to manage your databases with OVH
Easily create dashboards to manage your databases with OVH Easily create dashboards to manage your databases with OVH
Easily create dashboards to manage your databases with OVH
 
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedPerformance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons Learned
 
PyCon India 2012: Celery Talk
PyCon India 2012: Celery TalkPyCon India 2012: Celery Talk
PyCon India 2012: Celery Talk
 
Introduction to Systems Management with SaltStack
Introduction to Systems Management with SaltStackIntroduction to Systems Management with SaltStack
Introduction to Systems Management with SaltStack
 
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
 
London devops logging
London devops loggingLondon devops logging
London devops logging
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
 
Rust with-kafka-07-02-2019
Rust with-kafka-07-02-2019Rust with-kafka-07-02-2019
Rust with-kafka-07-02-2019
 
High Concurrency Architecture and Laravel Performance Tuning
High Concurrency Architecture and Laravel Performance TuningHigh Concurrency Architecture and Laravel Performance Tuning
High Concurrency Architecture and Laravel Performance Tuning
 
Testing Below the Application
Testing Below the ApplicationTesting Below the Application
Testing Below the Application
 
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
 
Python performance profiling
Python performance profilingPython performance profiling
Python performance profiling
 
Apache pulsar
Apache pulsarApache pulsar
Apache pulsar
 
Hacking on WildFly 9
Hacking on WildFly 9Hacking on WildFly 9
Hacking on WildFly 9
 
Rust kafka-5-2019-unskip
Rust kafka-5-2019-unskipRust kafka-5-2019-unskip
Rust kafka-5-2019-unskip
 
London HUG 12/4
London HUG 12/4London HUG 12/4
London HUG 12/4
 
Streaming and Messaging
Streaming and MessagingStreaming and Messaging
Streaming and Messaging
 
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache KafkaKafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
 
Log aggregation and analysis
Log aggregation and analysisLog aggregation and analysis
Log aggregation and analysis
 
Zabbix 3.2 presentation June 2017
Zabbix 3.2 presentation June 2017Zabbix 3.2 presentation June 2017
Zabbix 3.2 presentation June 2017
 

Destaque

Everything you ever wanted to know about deployment but were afraid to ask
Everything you ever wanted to know about deployment but were afraid to askEverything you ever wanted to know about deployment but were afraid to ask
Everything you ever wanted to know about deployment but were afraid to asklauraxthomson
 
Classroom Blog Math Conf 08
Classroom Blog Math Conf 08Classroom Blog Math Conf 08
Classroom Blog Math Conf 08Darren Mosley
 
Credit Crisis in 30 slides
Credit Crisis in 30 slidesCredit Crisis in 30 slides
Credit Crisis in 30 slidesChrisJCook
 
Introducing the PetroTrust
Introducing the PetroTrustIntroducing the PetroTrust
Introducing the PetroTrustChrisJCook
 
Partnership Finance October 2010
Partnership Finance October 2010Partnership Finance October 2010
Partnership Finance October 2010ChrisJCook
 
Presentation to Combined Heat and Power Roadshow 30 05_13
Presentation to Combined Heat and Power Roadshow 30 05_13Presentation to Combined Heat and Power Roadshow 30 05_13
Presentation to Combined Heat and Power Roadshow 30 05_13ChrisJCook
 
St Margarets Partnership June 2010
St Margarets Partnership June 2010St Margarets Partnership June 2010
St Margarets Partnership June 2010ChrisJCook
 
Talk proposal get_accepted
Talk proposal get_acceptedTalk proposal get_accepted
Talk proposal get_acceptedlauraxthomson
 
Rewrite or refactor: When to declare technical bankruptcy
Rewrite or refactor: When to declare technical bankruptcyRewrite or refactor: When to declare technical bankruptcy
Rewrite or refactor: When to declare technical bankruptcylauraxthomson
 
Energy Pools - Scottish Energy Institute 11 11 2009
Energy Pools - Scottish Energy Institute 11 11 2009Energy Pools - Scottish Energy Institute 11 11 2009
Energy Pools - Scottish Energy Institute 11 11 2009ChrisJCook
 
Energy Pool 20 05 2009
Energy Pool 20 05 2009Energy Pool 20 05 2009
Energy Pool 20 05 2009ChrisJCook
 

Destaque (14)

Everything you ever wanted to know about deployment but were afraid to ask
Everything you ever wanted to know about deployment but were afraid to askEverything you ever wanted to know about deployment but were afraid to ask
Everything you ever wanted to know about deployment but were afraid to ask
 
Classroom Blog Math Conf 08
Classroom Blog Math Conf 08Classroom Blog Math Conf 08
Classroom Blog Math Conf 08
 
Credit Crisis in 30 slides
Credit Crisis in 30 slidesCredit Crisis in 30 slides
Credit Crisis in 30 slides
 
Introducing the PetroTrust
Introducing the PetroTrustIntroducing the PetroTrust
Introducing the PetroTrust
 
Partnership Finance October 2010
Partnership Finance October 2010Partnership Finance October 2010
Partnership Finance October 2010
 
Presentation to Combined Heat and Power Roadshow 30 05_13
Presentation to Combined Heat and Power Roadshow 30 05_13Presentation to Combined Heat and Power Roadshow 30 05_13
Presentation to Combined Heat and Power Roadshow 30 05_13
 
St Margarets Partnership June 2010
St Margarets Partnership June 2010St Margarets Partnership June 2010
St Margarets Partnership June 2010
 
Talk proposal get_accepted
Talk proposal get_acceptedTalk proposal get_accepted
Talk proposal get_accepted
 
Rewrite or refactor: When to declare technical bankruptcy
Rewrite or refactor: When to declare technical bankruptcyRewrite or refactor: When to declare technical bankruptcy
Rewrite or refactor: When to declare technical bankruptcy
 
Migrating big data
Migrating big dataMigrating big data
Migrating big data
 
Energy Pools - Scottish Energy Institute 11 11 2009
Energy Pools - Scottish Energy Institute 11 11 2009Energy Pools - Scottish Energy Institute 11 11 2009
Energy Pools - Scottish Energy Institute 11 11 2009
 
No Fracase En Su Negocio
No Fracase En Su NegocioNo Fracase En Su Negocio
No Fracase En Su Negocio
 
Money 3.0
Money 3.0Money 3.0
Money 3.0
 
Energy Pool 20 05 2009
Energy Pool 20 05 2009Energy Pool 20 05 2009
Energy Pool 20 05 2009
 

Semelhante a Crash reports pycodeconf

Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 
Tuenti Release Workflow v1.1
Tuenti Release Workflow v1.1Tuenti Release Workflow v1.1
Tuenti Release Workflow v1.1Tuenti
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopEvans Ye
 
DCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing Environment
DCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing EnvironmentDCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing Environment
DCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing EnvironmentDocker, Inc.
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging振东 刘
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQLKonstantin Gredeskoul
 
introduction to node.js
introduction to node.jsintroduction to node.js
introduction to node.jsorkaplan
 
Midwest PHP - Scaling Magento
Midwest PHP - Scaling MagentoMidwest PHP - Scaling Magento
Midwest PHP - Scaling MagentoMathew Beane
 
Spot Trading - A case study in continuous delivery for mission critical finan...
Spot Trading - A case study in continuous delivery for mission critical finan...Spot Trading - A case study in continuous delivery for mission critical finan...
Spot Trading - A case study in continuous delivery for mission critical finan...SaltStack
 
MySQL Infrastructure Testing Automation at GitHub
MySQL Infrastructure Testing Automation at GitHubMySQL Infrastructure Testing Automation at GitHub
MySQL Infrastructure Testing Automation at GitHubIke Walker
 
A Tale of 2 Systems
A Tale of 2 SystemsA Tale of 2 Systems
A Tale of 2 SystemsDavid Newman
 
Parallel and Asynchronous Programming - ITProDevConnections 2012 (English)
Parallel and Asynchronous Programming -  ITProDevConnections 2012 (English)Parallel and Asynchronous Programming -  ITProDevConnections 2012 (English)
Parallel and Asynchronous Programming - ITProDevConnections 2012 (English)Panagiotis Kanavos
 
KACE Agent Architecture and Troubleshooting Overview
KACE Agent Architecture and Troubleshooting OverviewKACE Agent Architecture and Troubleshooting Overview
KACE Agent Architecture and Troubleshooting OverviewDell World
 

Semelhante a Crash reports pycodeconf (20)

Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
JavaScript Event Loop
JavaScript Event LoopJavaScript Event Loop
JavaScript Event Loop
 
Tuenti Release Workflow v1.1
Tuenti Release Workflow v1.1Tuenti Release Workflow v1.1
Tuenti Release Workflow v1.1
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 
DCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing Environment
DCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing EnvironmentDCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing Environment
DCSF19 Transforming a 15+ Year Old Semiconductor Manufacturing Environment
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL
 
introduction to node.js
introduction to node.jsintroduction to node.js
introduction to node.js
 
Midwest PHP - Scaling Magento
Midwest PHP - Scaling MagentoMidwest PHP - Scaling Magento
Midwest PHP - Scaling Magento
 
Spot Trading - A case study in continuous delivery for mission critical finan...
Spot Trading - A case study in continuous delivery for mission critical finan...Spot Trading - A case study in continuous delivery for mission critical finan...
Spot Trading - A case study in continuous delivery for mission critical finan...
 
MySQL Infrastructure Testing Automation at GitHub
MySQL Infrastructure Testing Automation at GitHubMySQL Infrastructure Testing Automation at GitHub
MySQL Infrastructure Testing Automation at GitHub
 
A Tale of 2 Systems
A Tale of 2 SystemsA Tale of 2 Systems
A Tale of 2 Systems
 
Parallel and Asynchronous Programming - ITProDevConnections 2012 (English)
Parallel and Asynchronous Programming -  ITProDevConnections 2012 (English)Parallel and Asynchronous Programming -  ITProDevConnections 2012 (English)
Parallel and Asynchronous Programming - ITProDevConnections 2012 (English)
 
KACE Agent Architecture and Troubleshooting Overview
KACE Agent Architecture and Troubleshooting OverviewKACE Agent Architecture and Troubleshooting Overview
KACE Agent Architecture and Troubleshooting Overview
 

Último

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Último (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Crash reports pycodeconf

  • 2. Overview • The basics • The numbers • Work process and tools
  • 4. Socorro Very Large Array at Socorro, New Mexico, USA. Photo taken by Hajor, 08.Aug.2004. Released under cc.by.sa and/or GFDL. Source: http://en.wikipedia.org/wiki/File:USA.NM.VeryLargeArray.02.jpg
  • 5.
  • 6.
  • 7.
  • 8.
  • 9. “Socorro has a lot of moving parts” ... “I prefer to think of them as dancing parts”
  • 10. Basic architecture (simplified) Collector Crashmover cronjobs HBase Monitor Middleware PostgreSQL Processor Webapp
  • 11. Lifetime of a crash • Raw crash submitted by user via POST (metadata json + minidump) • Collected to disk by collector (web.py WSGI app) • Moved to HBase by crashmover • Noticed in queue by monitor and assigned for processing
  • 12. Processing • Processor spins off minidumpstackwalk (MDSW) • MDSW re-unites raw crash with symbols to generate a stack • Processor generates a signature and pulls out other data • Processor writes processed crash back to HBase and to PostgreSQL
  • 13. Back end processing Large number of cron jobs, e.g.: • Calculate aggregates: Top crashers by signature, URL, domain • Process incoming builds from ftp server • Match known crashes to bugzilla bugs • Duplicate detection • Match up pairs of dumps (OOPP, content crashes, etc) • Generate extracts (CSV) for engineers to analyze
  • 14. Middleware • Moving all data access to be through REST API (by end of year) • (Still some queries in webapp) • Enable other front ends to data and us to rewrite webapp using Django in 2012 • In upcoming version (late 2011, 2012) each component will have its own API for status and health checks
  • 15. Webapp • Hard part here: how to visualize some of this data • Example: nightly builds, moving to reporting in build time rather than clock time • Code a bit crufty: rewrite in 2012 • Currently KohanaPHP, will be (likely) Django
  • 16. Implementation details • Python 2.6 mostly (except PHP for the webapp) • PostgreSQL9.1, some stored procs in pgpl/sql • memcache for the webapp • Thrift for HBase access • HBase we use CDH3
  • 17. Scale
  • 18. A different type of scaling: • Typical webapp: scale to millions of users without degradation of response time • Socorro: less than a hundred users, terabytes of data.
  • 19. Basic law of scale still applies: The bigger you get, the more spectacularly you fail
  • 20. Some numbers • At peak we receive 2300 crashes per minute • 2.5 million per day • Median crash size 150k, max size 20MB (reject bigger) • ~110TB stored in HDFS (3x replication, ~40TB of HBase data)
  • 21. What can we do? • Does betaN have more (null signature) crashes than other betas? • Analyze differences between Flash versions x and y crashes • Detect duplicate crashes • Detect explosive crashes • Find “frankeninstalls” • Email victims of a malware-related crash
  • 22. Implementation scale • > 115 physical boxes (not cloud) • Now up to 8 developers + sysadmins + QA + Hadoop ops/ analysts • (Yes, hiring. mozilla.org/careers) • Deploy approximately weekly but could do continuous if needed
  • 24. Development process • Fork • Hard to install: use a VM (more in a moment) • Pull request with bugfix/feature • Code review • Lands on master
  • 25. Development process -2 • Jenkins polls github master, picks up changes • Jenkins runs tests, builds a package • Package automatically picked up and pushed to dev • Wanted changes merged to release branch • Jenkins builds release branch, manual push to stage • QA runs acceptance on stage (Selenium/Jenkins + manual) • Manual push same build to production
  • 26. Note “Manual” deployment = • Run a single script with the build as parameter • Pushes it out to all machines and restarts where needed
  • 27. ABSOLUTELY CRITICAL All the machinery for continuous deployment even if you don’t want to deploy continuously
  • 28. Configuration management • Some releases involve a configuration change • These are controlled and managed through Puppet • Again, a single line to change config the same way every time • Config controlled the same way on dev and stage; tested the same way; deployed the same way
  • 29. Virtualization • You don’t want to install HBase • Use Vagrant to set up a virtual machine • Use Jenkins to build a new Vagrant VM with each code build • Use Puppet to configure the VM the same way as production • Tricky part is still getting the right data
  • 30. Virtualization - 2 • VM work at github.com/rhelmer/socorro-vagrant • Also pulled in as a submodule to socorro
  • 31. Upcoming • ElasticSearch implemented for better search including faceted search, waiting on hardware to pref it on • More analytics: automatic detection of explosive crashes, malware, etc • Better queueing: looking at Sagrada queue • Grand Unified Configuration System
  • 32. Everything is open (source) Fork: https://github.com/mozilla/socorro Read/file/fix bugs: https://bugzilla.mozilla.org/ Docs: http://www.readthedocs.org/docs/socorro Mailing list: https://lists.mozilla.org/listinfo/tools-socorro Join us in IRC: irc.mozilla.org #breakpad
  • 33. Questions? • Ask me (almost) anything, now or later • laura@mozilla.com

Notas do Editor

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n
  25. \n
  26. \n
  27. \n
  28. \n
  29. \n
  30. \n
  31. \n
  32. \n
  33. \n