SlideShare uma empresa Scribd logo
1 de 39
How to Meta-Sumo
Using Logs for Agile Monitoring of
Production Services

Christian Beedgen
christian@sumologic.com
@raychaser
Who Am I?

     Co-Founder & CTO, Sumo Logic since 2010
      – Cloud-based log management and analytics
      – Applications, operations, security
     Server guy, Chief Architect, ArcSight, 2001 – 2009
      – Major SIEM player in the enterprise space
      – Log management for security and compliance




 2
Yo GlueCon!!@!!211




 3
Yo GlueCon!!@!!211




 4
5
What are Logs?

     The murmurings and whispers of your infrastructure
      – Devices & Services (ex. Security, Email, Authentication)
      – Applications (your code, or 3rd party applications)
     Written to disk by applications
      – Used during development by developers
      – And then often ignored in production
     Yes, Logs are Big Data
      – There’s a lot of Logs ( Volume)
      – A large Variety of formats, plus it’s real-time ( Velocity)
     Free-form text, usually one message per line
      – Scrolls by in your terminal
      – Makes you feel like you’re in the matrix


 6
Application Logging

     Just do it*
      – Use what’s common in your language (Log4J, …)
      – If in doubt, log it - you might need it later, and disk is cheap
     The obvious stuff
      – Always log a precise timestamp
      – Add a log level for filtering
     Building a distributed system?
      – Always add the process/service/module name
      – Add the host ID if you can
     Log the context
      – When processing a request, remember who it is from
      – Within the scope of the request, always log the context
     Bring it all together in a single place
      – Open Source options
      – Commercial options

 7
Just Do It*

     Actually, please do think first
      – Do not log passwords
      – Do not log credit card numbers
      – Do not log anything that’s PII




 8
Or Else…




 http://www.slideshare.net/christoferhoff/shit-my-cloud-evangelist-saysjust-not-to-my-cso




 9
Log Management




 10
Log Management




 11
12
This is the Meta part

      Our system to collect and analyze Logs is actually a
      distributed, cloud-based infrastructure that itself
      generates a ton of logs
      We are running a second, smaller instance that
      receives all the logs from the Prod system – we call
      this our Shadow system
      A majority of the management and monitoring of the
      Prod system is done via the Shadow system, which
      we are madly in love with



 13
My team




 14
My team




          No, really. And we also have a Panda.


 15
Example




 16
Example




      Timestamp with time zone!




 17
Example




      Timestamp with time zone!
      Log level




 18
Example




      Timestamp with time zone!
      Log level
      Host ID & module name (process/service)




 19
Example




      Timestamp with time zone!
      Log level
      Host ID & module name (process/service)
      Code location or class




 20
Example




      Timestamp with time zone!
      Log level
      Host ID & module name (process/service)
      Code location or class
      Authentication context



 21
Example




      Timestamp with time zone!
      Log level
      Host ID & module name (process/service)
      Code location or class
      Authentication context
      Key-value pairs

 22
What about structure?

      Should you rather use JSON, say?
       – Hey, it’s really up to you – machines love it
       – But do humans read the logs as well?
      Modern tools extract structure if there is structure
       – Even if the structure doesn’t conform to a protocol
       – So you don’t strictly have to use a structured format




 23
24
Sumo Pet Trick #1 – Basic Troubleshooting

      All errors in the application have an error instance ID




       UI always tries very hard to display the ID
        – User can easily copy/paste the error ID into a ticket


 25
Sumo Pet Trick #1 – Basic Troubleshooting

      Find AP2V3-HZICT-BP6UO in the logs!




 26
Sumo Pet Trick #2 – Everything breaks

      Distributed systems  lots of different components
       – There will be fail
       – The morning coffee routine…
      There is beauty in numbers
       – Pull out all the logs that contain the term “error” (or “fail”,
         or “exception”, or all of the above)
       – Then count by process/service/module, or host ID
      Even the most basic aggregation will create
      actionable insight



 27
Sumo Pet Trick #2 – Everything breaks




 28
Sumo Pet Trick #2 – Everything breaks




                Schedule
                  this!


 29
Sumo Pet Trick #2 – Everything breaks

      Old school can rock this too!




 30
Sumo Pet Trick #3 – API Choke Points

      What is my API doing?
       – Which APIs are being called?
       – Average execution times per API
       – Who’s calling?




 31
Sumo Pet Trick #3 – API Choke Points

      Number of calls by API




 32
Sumo Pet Trick #3 – API Choke Points

      Average execution time by API




 33
Sumo Pet Trick #3 – API Choke Points

      Number of API calls by user




 34
Sumo Pet Trick #4 – Which searches are running

      Log a session ID on start and stop
       – When there’s long-running operations
       – More than one node participates in generating the result
      For each session in the time range, did you get both
      the start and the stop message?




 35
Sumo Pet Trick #4 – Which searches are running




 36
Sumo Pet Trick #4 – Which searches are running




 37
https://www.sumologic.com/free-trial/



     Christian Beedgen
     christian@sumologic.com
     @raychaser



38
Thank you.
 Ok, and now we start drinking…




 39
http://iam.cat/cat-love-beer

Mais conteúdo relacionado

Semelhante a How to Meta-Sumo - Using Logs for Agile Monitoring of Production Services

Semelhante a How to Meta-Sumo - Using Logs for Agile Monitoring of Production Services (20)

Setting Up Sumo Logic - Apr 2017
Setting Up Sumo Logic - Apr 2017Setting Up Sumo Logic - Apr 2017
Setting Up Sumo Logic - Apr 2017
 
Dev opsdays 2018 - Observability, the practical approach
Dev opsdays 2018 - Observability, the practical approachDev opsdays 2018 - Observability, the practical approach
Dev opsdays 2018 - Observability, the practical approach
 
Observability, the practical approach - Anton Drukh - DevOpsDays Tel Aviv 2018
Observability, the practical approach - Anton Drukh - DevOpsDays Tel Aviv 2018Observability, the practical approach - Anton Drukh - DevOpsDays Tel Aviv 2018
Observability, the practical approach - Anton Drukh - DevOpsDays Tel Aviv 2018
 
Dev Ops for systems of record - Talk at Agile Australia 2015
Dev Ops for systems of record - Talk at Agile Australia 2015Dev Ops for systems of record - Talk at Agile Australia 2015
Dev Ops for systems of record - Talk at Agile Australia 2015
 
ThoughtWorks Tech Talks NYC: DevOops, 10 Ops Things You Might Have Forgotten ...
ThoughtWorks Tech Talks NYC: DevOops, 10 Ops Things You Might Have Forgotten ...ThoughtWorks Tech Talks NYC: DevOops, 10 Ops Things You Might Have Forgotten ...
ThoughtWorks Tech Talks NYC: DevOops, 10 Ops Things You Might Have Forgotten ...
 
Six Mistakes of Log Management 2008
Six Mistakes of Log Management 2008Six Mistakes of Log Management 2008
Six Mistakes of Log Management 2008
 
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
 
Using Sumo Logic - Apr 2018
Using Sumo Logic - Apr 2018Using Sumo Logic - Apr 2018
Using Sumo Logic - Apr 2018
 
How to not fail at security data analytics (by CxOSidekick)
How to not fail at security data analytics (by CxOSidekick)How to not fail at security data analytics (by CxOSidekick)
How to not fail at security data analytics (by CxOSidekick)
 
Packer Genetics: The selfish code
Packer Genetics: The selfish codePacker Genetics: The selfish code
Packer Genetics: The selfish code
 
Sumo Logic QuickStat - Apr 2017
Sumo Logic QuickStat - Apr 2017Sumo Logic QuickStat - Apr 2017
Sumo Logic QuickStat - Apr 2017
 
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
 
TDC 2015 - POA - Trilha PHP - Shit Happens
TDC 2015 - POA - Trilha PHP - Shit HappensTDC 2015 - POA - Trilha PHP - Shit Happens
TDC 2015 - POA - Trilha PHP - Shit Happens
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performance
 
Python for Machine Learning
Python for Machine LearningPython for Machine Learning
Python for Machine Learning
 
LogChaos: Challenges and Opportunities of Security Log Standardization
LogChaos: Challenges and Opportunities of Security Log StandardizationLogChaos: Challenges and Opportunities of Security Log Standardization
LogChaos: Challenges and Opportunities of Security Log Standardization
 
Monitoring As Code: How to Integrate App Monitoring Into Your Developer Cycle
Monitoring As Code: How to Integrate App Monitoring Into Your Developer CycleMonitoring As Code: How to Integrate App Monitoring Into Your Developer Cycle
Monitoring As Code: How to Integrate App Monitoring Into Your Developer Cycle
 
Splunk for ITOA Breakout Session
Splunk for ITOA Breakout SessionSplunk for ITOA Breakout Session
Splunk for ITOA Breakout Session
 
[Pinto] Is my SharePoint Development team properly enlighted?
[Pinto] Is my SharePoint Development team properly enlighted?[Pinto] Is my SharePoint Development team properly enlighted?
[Pinto] Is my SharePoint Development team properly enlighted?
 
An Introduction to Microservices
An Introduction to MicroservicesAn Introduction to Microservices
An Introduction to Microservices
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Último (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

How to Meta-Sumo - Using Logs for Agile Monitoring of Production Services

  • 1. How to Meta-Sumo Using Logs for Agile Monitoring of Production Services Christian Beedgen christian@sumologic.com @raychaser
  • 2. Who Am I? Co-Founder & CTO, Sumo Logic since 2010 – Cloud-based log management and analytics – Applications, operations, security Server guy, Chief Architect, ArcSight, 2001 – 2009 – Major SIEM player in the enterprise space – Log management for security and compliance 2
  • 5. 5
  • 6. What are Logs? The murmurings and whispers of your infrastructure – Devices & Services (ex. Security, Email, Authentication) – Applications (your code, or 3rd party applications) Written to disk by applications – Used during development by developers – And then often ignored in production Yes, Logs are Big Data – There’s a lot of Logs ( Volume) – A large Variety of formats, plus it’s real-time ( Velocity) Free-form text, usually one message per line – Scrolls by in your terminal – Makes you feel like you’re in the matrix 6
  • 7. Application Logging Just do it* – Use what’s common in your language (Log4J, …) – If in doubt, log it - you might need it later, and disk is cheap The obvious stuff – Always log a precise timestamp – Add a log level for filtering Building a distributed system? – Always add the process/service/module name – Add the host ID if you can Log the context – When processing a request, remember who it is from – Within the scope of the request, always log the context Bring it all together in a single place – Open Source options – Commercial options 7
  • 8. Just Do It* Actually, please do think first – Do not log passwords – Do not log credit card numbers – Do not log anything that’s PII 8
  • 12. 12
  • 13. This is the Meta part Our system to collect and analyze Logs is actually a distributed, cloud-based infrastructure that itself generates a ton of logs We are running a second, smaller instance that receives all the logs from the Prod system – we call this our Shadow system A majority of the management and monitoring of the Prod system is done via the Shadow system, which we are madly in love with 13
  • 15. My team No, really. And we also have a Panda. 15
  • 17. Example Timestamp with time zone! 17
  • 18. Example Timestamp with time zone! Log level 18
  • 19. Example Timestamp with time zone! Log level Host ID & module name (process/service) 19
  • 20. Example Timestamp with time zone! Log level Host ID & module name (process/service) Code location or class 20
  • 21. Example Timestamp with time zone! Log level Host ID & module name (process/service) Code location or class Authentication context 21
  • 22. Example Timestamp with time zone! Log level Host ID & module name (process/service) Code location or class Authentication context Key-value pairs 22
  • 23. What about structure? Should you rather use JSON, say? – Hey, it’s really up to you – machines love it – But do humans read the logs as well? Modern tools extract structure if there is structure – Even if the structure doesn’t conform to a protocol – So you don’t strictly have to use a structured format 23
  • 24. 24
  • 25. Sumo Pet Trick #1 – Basic Troubleshooting All errors in the application have an error instance ID UI always tries very hard to display the ID – User can easily copy/paste the error ID into a ticket 25
  • 26. Sumo Pet Trick #1 – Basic Troubleshooting Find AP2V3-HZICT-BP6UO in the logs! 26
  • 27. Sumo Pet Trick #2 – Everything breaks Distributed systems  lots of different components – There will be fail – The morning coffee routine… There is beauty in numbers – Pull out all the logs that contain the term “error” (or “fail”, or “exception”, or all of the above) – Then count by process/service/module, or host ID Even the most basic aggregation will create actionable insight 27
  • 28. Sumo Pet Trick #2 – Everything breaks 28
  • 29. Sumo Pet Trick #2 – Everything breaks Schedule this! 29
  • 30. Sumo Pet Trick #2 – Everything breaks Old school can rock this too! 30
  • 31. Sumo Pet Trick #3 – API Choke Points What is my API doing? – Which APIs are being called? – Average execution times per API – Who’s calling? 31
  • 32. Sumo Pet Trick #3 – API Choke Points Number of calls by API 32
  • 33. Sumo Pet Trick #3 – API Choke Points Average execution time by API 33
  • 34. Sumo Pet Trick #3 – API Choke Points Number of API calls by user 34
  • 35. Sumo Pet Trick #4 – Which searches are running Log a session ID on start and stop – When there’s long-running operations – More than one node participates in generating the result For each session in the time range, did you get both the start and the stop message? 35
  • 36. Sumo Pet Trick #4 – Which searches are running 36
  • 37. Sumo Pet Trick #4 – Which searches are running 37
  • 38. https://www.sumologic.com/free-trial/ Christian Beedgen christian@sumologic.com @raychaser 38
  • 39. Thank you. Ok, and now we start drinking… 39 http://iam.cat/cat-love-beer