Effective Monitoring for 
Demanding Operations 
Environments 
Rodrigue Chakode 
Nagios World Conference, Saint-Paul, MN, US 
2013-10-01
Background 
● Service: generic term for an IT function (e.g. the mysqld service) 
● Business Service/Process: a service that provides added value to 
business applications or to end users (e.g. a hosting service) 
● Check: a probe that detects the status of an IT service (e.g. a 
check for the mysqld service) 
● Abbreviations 
– BS: Business Service 
– BP: Business Process 
– BSM: Business Service Management 
– OSM: Open Source Monitoring 
– OSMS : Open Source Monitoring System/Software
Basic Monitoring Scheme 
Flat Display, no notion 
of business impact !
“Too many alarms kill alarm” 
S. Bortzmeyer
Today's IT infrastructures facts 
● Huge number of checks to handle 
– E.g. 100 hosts, 8 checks/host => 800 checks 
● False alerts are the bane of administrators 
– Not a matter of being a lazy admin 
No way for operators to be effective with flat 
display !
Challenges for effective monitoring 
● How does a failure actually impact your business?
Is there a disruption of services? 
RAID 0 
(striping) 
RAID 1 
(mirroring)
“prioritize and orchestrate work based on business 
needs” http://www.bmc.com/solutions/bsm/
Go beyond individual checks 
● Think business services 
– A failure doesn't necessarily mean a disruption of 
business applications or end-user services 
● Benefits of BSM 
– Reduce downtime by up to 75% 
– Deliver services up to 30% more efficiently 
– Credit: http://www.bmc.com/solutions/bsm/
Think relational services 
● A business service may depend on : 
– one or many IT services, and/or on 
– other business services 
– E.g. Streaming ← Web Server ← Databases ← 
Network ← Operating System ← Hardware 
Devices...
Service hierarchy and mapping
Service hierarchy and mapping 
Service map ISN'T 
Network map
Apply flexible incident management 
● Only select checks that impact your business 
services 
● Apply advanced severity calculation rules 
– Set how the severity of a node is computed from 
the severities of its children 
● And advanced status propagation rules 
– Set how the severity of a node is propagated to its 
parent
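
The two kinds of rules on this slide can be illustrated with a minimal Python sketch. It is not RealOpInsight's implementation: the `Severity` levels, the `Node` class, and the rule names (`worst`, `average`, `unchanged`, `decrease`, `increase`) are assumptions chosen for illustration. A calculation rule combines the severities reported by a node's children, and a propagation rule adjusts what a node reports to its parent.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    # Hypothetical severity scale, ordered from best to worst.
    NORMAL = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


def aggregate(severities, rule):
    """Calculation rule: combine the severities reported by the children."""
    if rule == "average":
        total, n = sum(severities), len(severities)
        # Integer arithmetic that rounds halves up to the worse level.
        return Severity((2 * total + n) // (2 * n))
    return max(severities)  # "worst": the most severe child wins


def propagate(severity, rule):
    """Propagation rule: adjust what a node reports to its parent."""
    if severity == Severity.NORMAL:
        return severity  # a healthy node is never escalated
    if rule == "decrease":
        return Severity(max(severity - 1, Severity.NORMAL))
    if rule == "increase":
        return Severity(min(severity + 1, Severity.CRITICAL))
    return severity  # "unchanged"


@dataclass
class Node:
    name: str
    calc_rule: str = "worst"
    prop_rule: str = "unchanged"
    own_severity: Severity = Severity.NORMAL  # only used by leaves (checks)
    children: List["Node"] = field(default_factory=list)

    def severity(self):
        if not self.children:
            return self.own_severity
        reported = [propagate(c.severity(), c.prop_rule) for c in self.children]
        return aggregate(reported, self.calc_rule)


# One of two redundant databases is down; its severity is decreased on the way up.
db1 = Node("db1", own_severity=Severity.CRITICAL, prop_rule="decrease")
db2 = Node("db2")
databases = Node("Databases", children=[db1, db2])
web = Node("Web Server")
streaming = Node("Streaming", children=[web, databases])
print(streaming.severity().name)  # prints MAJOR
```

With these assumed rules, the critical check on one database surfaces as a major (not critical) problem on the Streaming business service.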
Use cases 
● RAID 0 ● RAID 1 
● Redundant databases ● Merchant-site
Specialize your Operations Dashboards 
● Business service-centric/competency-centric 
● Deal with large/demanding environments 
– Just collect what is useful for each dashboard 
● Get insight in one shot
“takes the IT you already have, and adds to it 
the visibility and control of a unified platform” 
http://www.bmc.com/
Existing options 
● Basic features 
– Nagios BP Add-on, Shinken Business Rules 
– No service map, basic aggregation rules 
– Handling a huge number of services can be tricky
RealOpInsight 
● Powerful Dashboard Toolkit for BSM 
– Generic and versatile add-on supporting many OSM 
tools 
● Qt-based GUI application 
– Powerful and friendly interfaces 
– Cross platform (Linux, Windows, Mac OS X) 
● http://realopinsight.com 
“small and efficient and gets the job done” 
lukaswhite, SourceForge.net
Some Features 
● Effective Operations Management 
– Prioritize incidents based on business impact 
● Advanced customizable event processing rules 
– avg, high impact, decrease, increase... 
● Distributed monitoring made easy 
– Versatile, supports up to 10 monitoring backends simultaneously 
● Free, Open Source and Cross-platform 
– Windows, Linux, OS X 
● More comprehensible messages 
– e.g. “the CPU load on server <IP/hostname> is more than <threshold> 
percent” 
● System Tray Notifications
Tree View, Map and Events in one 
Console 
Service Tree 
● Tooltips 
● Focus 
● Service-related 
message 
filtering... 
Service Mapping 
● Tooltips, Zooming, Dragging 
and Scrolling, Focus, Service-related 
message filtering... 
Message Console 
● Trouble view filtering, Large 
font mode
Advanced Incident Management 
● Severity 
aggregation 
● Severity increasing 
● Severity decreasing 
● ...
Simple and Efficient Design 
● Service Views as XML files 
● Native WYSIWYG Editor 
● Dynamic Operations 
Console 
● Simple Integration
Distributed Monitoring/Unified Dashboard 
● Loosely-coupled scalable architecture 
– Status data retrieved through RPC APIs
Ngrt4nd-based Integration - How To 
● Specific daemon on Nagios server 
– See documentation 
● Relies on status.dat 
● ZeroMQ-based RPC APIs 
– Authenticated data retrieval 
● Not recommended 
– Not scalable, delayed status data
Livestatus-based Integration - How To 
● Xinetd TCP-based RPC over a native UNIX 
socket 
– Xinetd socket over the Livestatus NEB socket 
– /etc/xinetd.d/livestatus 
● Restart Xinetd 
– /etc/init.d/xinetd restart 
● Recommended 
– NEB, scalable, up-to-date data
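
Once xinetd exposes the Livestatus socket over TCP, any client can query it with a plain-text request. The sketch below is a minimal Python client under stated assumptions: the host and port are placeholders (the port is whatever is configured in /etc/xinetd.d/livestatus), and the helper names `build_query` and `query_livestatus` are hypothetical. The request itself uses the Livestatus query language: a `GET` line, optional headers, and a blank line to end the query.

```python
import json
import socket


def build_query(table="services", columns=("host_name", "description", "state")):
    """Build a Livestatus request; the trailing blank line terminates it."""
    lines = [
        f"GET {table}",
        "Columns: " + " ".join(columns),
        "OutputFormat: json",
    ]
    return "\n".join(lines) + "\n\n"


def query_livestatus(host, port, query, timeout=5.0):
    """Send one query over TCP and return the decoded JSON rows."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # tell the server the request is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


# Example call (placeholder address and port, not run here):
# rows = query_livestatus("nagios.example.org", 6557, build_query())
# for host_name, description, state in rows:
#     print(host_name, description, state)
```

Each returned row holds the requested columns in order, so the state of every service is available without parsing a status file.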
Source Settings 
Ngrt4nd 
– Monitor Web URL (optional) 
– Auth String 
– Server address 
– Listening port (1983 by default) 
– “Use Livestatus” must be disabled 
Livestatus 
– Monitor Web URL (optional) 
– Server address 
– Listening port 
– “Use Livestatus” must be enabled
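
The two groups of settings above can be modeled as a small Python structure. This is only a sketch of the fields listed on the slide: the class and function names are hypothetical, and the real settings are entered through the Configuration Manager, not through code.

```python
from dataclasses import dataclass
from typing import Optional

NGRT4ND_DEFAULT_PORT = 1983  # default listening port given on the slide


@dataclass
class SourceSettings:
    server_address: str
    port: int
    use_livestatus: bool
    auth_string: Optional[str] = None      # only used by ngrt4nd
    monitor_web_url: Optional[str] = None  # optional in both cases


def ngrt4nd_source(address, auth_string, port=NGRT4ND_DEFAULT_PORT, url=None):
    """ngrt4nd: needs an auth string, and "Use Livestatus" must be disabled."""
    return SourceSettings(address, port, False, auth_string, url)


def livestatus_source(address, port, url=None):
    """Livestatus: no auth string listed, and "Use Livestatus" must be enabled."""
    return SourceSettings(address, port, True, None, url)


# Placeholder address and auth string for illustration.
legacy = ngrt4nd_source("nagios.example.org", "secret")
print(legacy.port, legacy.use_livestatus)  # prints 1983 False
```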
Getting started in 3 steps 
● Run the Editor 
… and edit your service view configuration 
● Run the Configuration Manager 
… and set the access to the remote API 
● Run the Operations Console 
… and load the configuration file 
● Then fall in love!
Integration with Nagios 
Service in Nagios 
Service selection in RealOpInsight 
[SourceId:]host_name[/service_description] 
Set sources and API access 
ngrt4nd/Livestatus
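
The service selection pattern, an optional source prefix, a host name, and an optional service description, can be parsed as in the Python sketch below. The parser and the `CheckRef` type are hypothetical, and the sketch assumes host names contain neither `:` nor `/`; `Source0` is an illustrative source identifier.

```python
from typing import NamedTuple, Optional


class CheckRef(NamedTuple):
    source_id: Optional[str]
    host_name: str
    service_description: Optional[str]


def parse_check_ref(text):
    """Split 'SourceId:host_name/service_description' into its parts."""
    # Split on the first "/" so that ":" inside a service description is kept.
    head, _, service = text.partition("/")
    source_id = None
    host = head
    if ":" in head:
        source_id, host = head.split(":", 1)
    if not host:
        raise ValueError(f"missing host name in {text!r}")
    return CheckRef(source_id or None, host, service or None)


print(parse_check_ref("Source0:db01/mysqld"))
print(parse_check_ref("db01"))
```

A reference without a service part, such as `db01`, then denotes the host itself, and a reference without a prefix leaves the source unspecified.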
History: Experience Feedback 1/2 
● 2008 : the Idea 
● May 2010 : 1st lines of code 
● March 2011 (1st release, 1.0) 
– <30 downloads a month 
● May - August 2012 (version 2.0) 
– New architecture, GPLv3 License 
– SourceForge.net, Nagios Exchange 
– Windows Installer 
– 200 downloads a month
History: Experience Feedback 2/2 
● December 2012 (v2.1) 
– Continuous packaging for openSUSE, Fedora and Ubuntu 
● March 2013 (v2.2) 
– 600 downloads a month 
● May 2013 (v2.3) 
– Support for Livestatus API 
● July - September 2013 
– Nagios Affiliate 
– v2.4, adding support for distributed environments 
● Today 
– 7k+ downloads from 120+ countries
And the story continues..., Thanks 
● Web Edition (2014) 
@realopinsight
