Distributed Monitoring for Web Apps
Fernando Hönig
fernando.honig@intel.com




About me
    - From Córdoba, Argentina
    - System Administrator
    - Worked in IT companies for the last 7 years
    - Working in Intel IT since April 2011
    - Married to Jesica and father of Benjamin (2)




    * Other names and brands may be claimed as the property of others.
Third Party Vendors / Open Source
    - This presentation covers the solution we built rather than discussing third-party vendors.
    - All products used in it are open source.

Best Practices
    - With this presentation we would like to show IT@Intel processes and best practices.




Topics
    - Problem Overview
    - Distributed Infrastructure
    - Failover Solution
    - Webinject Architecture
    - API Feeds Integration
    - Multi Checks + NRPE
    - Notifications with SMS and VoIP
    - Q/A



Why do we need a Distributed Infrastructure?
    - More than 500 service checks per customer
    - Our customers' apps need to be reachable from different geographies (GEOs)
    - Checks run every 1 or 5 minutes
    - Redundancy / fast recovery

Why do we need a Centralized Dashboard?
    - ACLs for different customers and groups
    - Fast and simple additions, updates, and removals of services, commands, and hosts
    - Performance data stored in MySQL for external BI solutions


Centralized Dashboard / Nagios Distribution
    [Diagram labels: External Monitoring Provider #1, Region #1, Region #2, Region #3, ... Region #N, External Monitoring Provider #2]




Simple Distributed Nagios with NDOUtils

ndomod.cfg - Example
instance_name=Central
output_type=tcpsocket
output=127.0.0.1
tcp_port=5668
output_buffer_items=5000
file_rotation_interval=14400
file_rotation_timeout=60
reconnect_interval=15
reconnect_warning_interval=900
data_processing_options=-1
config_output_options=3

ndo2db.cfg - Example
ndo2db_user=nagios
ndo2db_group=nagios
socket_type=tcp
socket_name=/var/run/ndo.sock
tcp_port=5668
db_servertype=mysql
db_host=localhost
db_name=centstatus
db_port=3306
db_prefix=nagios_
db_user=username
db_pass=password
max_timedevents_age=1440
max_systemcommands_age=1440
max_servicechecks_age=1440
max_hostchecks_age=1440
max_eventhandlers_age=1440



How to enable a new distributed node?
       To enable a new Nagios node you just need to:

       • Install a VM with the latest Nagios/NRPE/NDOMod code.
       • Set up passwordless (key-based) SSH authentication for the nagios user and grant it the sudo rights it needs.
       • Enable that node in the centralized interface:
            • Create a new poller
            • Create a new nagios.cfg config for that poller
            • Create a new ndomod config for that poller
       • Enable the service checks that you need on that poller.



                                       All of this could be automated
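
As one illustration of that automation, the per-poller ndomod config could be generated from a template. This is only a sketch: the function name render_ndomod_cfg, the instance name Region1, and the address 192.0.2.10 are placeholders, and pointing `output` at the central ndo2db host is an assumption about how the pollers report back. The remaining values are copied from the ndomod.cfg example in this deck.

```shell
#!/bin/sh
# Hypothetical helper: print an ndomod.cfg for a remote poller.
# $1 = instance name for the poller (placeholder naming, e.g. "Region1")
# $2 = address of the central ndo2db host (placeholder)
render_ndomod_cfg() {
    instance="$1"
    central="$2"
    cat <<EOF
instance_name=${instance}
output_type=tcpsocket
output=${central}
tcp_port=5668
output_buffer_items=5000
file_rotation_interval=14400
file_rotation_timeout=60
reconnect_interval=15
reconnect_warning_interval=900
data_processing_options=-1
config_output_options=3
EOF
}

# Print the config for one poller; redirect to a file when deploying.
render_ndomod_cfg "Region1" "192.0.2.10"
```

Keeping the template in one function means every poller differs only in its instance name and target address.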




Remote Pollers Visualization



    Remote Services / Checks




Distributed with Failover




Failover Infrastructure Scripts / Master Side
#!/bin/sh
# Keep Apache and Nagios in step on the master: if one is down, stop the other.
# Adjust the process paths and init script names to match your distribution.

# PIDs of the running processes (empty when the process is not running)
apacherun=$(ps ax | grep /usr/sbin/httpd | grep -v grep | cut -c1-5 | paste -s -)
nagiosrun=$(ps ax | grep /usr/local/nagios/bin/nagios | grep -v grep | cut -c1-5 | paste -s -)

if [ -z "$nagiosrun" ]; then
    echo "Stopping Apache since Nagios is not running"
    /etc/init.d/apache2 stop
else
    echo "Nagios running"
fi

if [ -z "$apacherun" ]; then
    echo "Stopping Nagios since Apache is not running"
    /etc/init.d/nagios stop
else
    echo "Apache running"
fi
exit 0

NRPE Command
command[check_nagios_failover]=/usr/local/nagios/libexec/check_nagios -F /usr/local/nagios/var/status.dat -e 1 -C '/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg'




Failover Infrastructure Scripts / Failover Side
#!/bin/sh
# Ask the master (via NRPE) whether its Nagios is healthy
nagcmd=$(/usr/local/nagios/libexec/check_nrpe -H 172.16.0.1 -c check_nagios_failover)
now=$(date +%s)
commandfile='/usr/local/nagios/var/rw/nagios.cmd'
# PIDs of the local processes (empty when the process is not running)
apacherun=$(ps ax | grep /usr/sbin/httpd | grep -v grep | cut -c1-5 | paste -s -)
nagiosrun=$(ps ax | grep /usr/local/nagios/bin/nagios | grep -v grep | cut -c1-5 | paste -s -)
CHECK=$(echo "$nagcmd" | grep CRITICAL)

if [ -z "$CHECK" ]; then
    echo "Nagios Master is OK"
    # Master is healthy: make sure the failover services are stopped
    if [ -n "$apacherun" ]; then
        echo "Stopping Apache"
        /etc/init.d/apache2 stop
    fi
    if [ -n "$nagiosrun" ]; then
        echo "Stopping Nagios"
        /etc/init.d/nagios stop
    fi
else
    # Master reported CRITICAL: start the failover services
    if [ -z "$apacherun" ]; then
        echo "Starting Apache"
        /etc/init.d/apache2 start
    fi
    if [ -z "$nagiosrun" ]; then
        echo "Starting Nagios"
        /etc/init.d/nagios start
        sleep 15
    fi
    # Notify the on-call group through the Nagios external command file
    printf "[%s] SEND_CUSTOM_HOST_NOTIFICATION;nagiosmaster;2;nagiosfailover;Master Nagios seems to be down. Oncall Group Engaged. Nagios Failover in place.\n" "$now" > "$commandfile"
fi
exit 0
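
The notification step writes one line in the Nagios external command format, `[timestamp] COMMAND;arguments`. A minimal sketch of building that line with printf follows; the helper name format_custom_host_notification is hypothetical, while the host name, author, and options value 2 are the ones used on this slide.

```shell
#!/bin/sh
# Hypothetical helper: format a SEND_CUSTOM_HOST_NOTIFICATION external command.
# $1 = Unix timestamp, $2 = host name, $3 = author, $4 = comment
format_custom_host_notification() {
    printf '[%s] SEND_CUSTOM_HOST_NOTIFICATION;%s;2;%s;%s\n' "$1" "$2" "$3" "$4"
}

# Print the command line; in the failover script the output is redirected
# to the command file (/usr/local/nagios/var/rw/nagios.cmd on this slide).
format_custom_host_notification "$(date +%s)" nagiosmaster nagiosfailover \
    "Master Nagios seems to be down."
```

printf is used because echo does not expand format specifiers or, portably, the trailing newline escape.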




Web Apps Monitoring




How can we monitor Web Apps?

    [Diagram: Intel Web Apps]
    All the white holes are the areas that we are not monitoring with these checks.




How can we monitor Web Apps?

    [Diagram: Intel Web Apps]
    We are probably still not covering 100%, but the white holes are not as big as before. This is a continuous improvement process.

What’s Webinject?
      WebInject is a free tool for automated testing of web applications
      and web services. It can be used to test individual system
      components that have HTTP interfaces. WebInject offers real-time
      results display and may also be used for monitoring system
      response times.*
      * Source: www.webinject.org

     How does it work?
         Config -> TestCase -> Web Apps

     What do you receive after a check?
         OK or CRITICAL



Webinject Installation
     How to install it:

     From CPAN, in the cpan shell: install Webinject

     From ConSol Labs: http://labs.consol.de/lang/de/nagios/check_webinject/




Webinject Architecture
     Config File Model:

     <testcasefile>test.xml</testcasefile>
     <globalhttplog>no</globalhttplog>
     <reporttype>nagios</reporttype>
     <timeout>10</timeout>
     <useragent>Mozilla/5.0</useragent>
     <proxy>http://127.0.0.1:8080</proxy>




Webinject Architecture
     Test Case File Model:

     <testcases repeat="1">
     <case
         id="1"
         description1="Case Description"
         url="https://{BASEURL1}/id/health"
         method="post"
         posttype="application/x-www-form-urlencoded"
         postbody="string={BASEURL2}&id=1"
         parseresponse='"code":"|","string"|'
         verifypositive='"state":"{BASEURL2}"'
     />
     </testcases>




Webinject Architecture
     Test Case File Model:
     <testcases repeat="1">
     <case
         id="1"
         description1="Case Description"
         url="https://{BASEURL1}/id/health"
         method="post"
         posttype="application/x-www-form-urlencoded"
         postbody="string={BASEURL2}&id=1"
         parseresponse='"method":"|"'
         verifypositive='"state":"{BASEURL2}"'
     />
     <case
         id="2"
         description1="Validation"
         url="https://{BASEURL2}/v1/rollback"
         method="post"
         posttype="application/json"
         postbody='{"usermethod":"{PARSEDRESULT}","taxes":false}'
         verifypositive='"transactionMessage":"Transaction Approved"'
     />
     </testcases>
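
In the two-case example, parseresponse captures the text between a left and a right boundary in the first response, and the second case reuses it as {PARSEDRESULT}. The boundary idea can be sketched in plain shell. The helper parse_between and the sample response are hypothetical illustrations, not WebInject code or real output.

```shell
#!/bin/sh
# Hypothetical helper (not part of WebInject): print the text found between
# a left and a right boundary, similar in spirit to parseresponse.
# $1 = response body, $2 = left boundary, $3 = right boundary
parse_between() {
    body="$1"
    left="$2"
    right="$3"
    # Fail if the left boundary does not occur in the body
    case "$body" in
        *"$left"*) ;;
        *) return 1 ;;
    esac
    rest=${body#*"$left"}        # drop everything up to the first left boundary
    result=${rest%%"$right"*}    # drop everything from the first right boundary on
    printf '%s\n' "$result"
}

# Made-up response body for illustration
response='{"method":"visa","state":"ok"}'
parsed=$(parse_between "$response" '"method":"' '"')
echo "PARSEDRESULT=$parsed"
```

With the sample body this prints `PARSEDRESULT=visa`. If the right boundary is missing, the sketch returns everything after the left boundary.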




Webinject and Nagios Integration
         Command Definition:
         define command {
                   command_name webinject
                   command_line /usr/local/webinject/webinject.pl $ARG1$ -c $ARG2$
         }


         Service Definition:
         define service {
                   use generic-service
                   host_name MyApplication-server
                   service_description WebInject Test
                   is_volatile 0
                   check_period 24x7
                   max_check_attempts 3
                   normal_check_interval 1
                   retry_check_interval 1
                   contact_groups myapplication-admins
                   notification_interval 120
                   notification_period 24x7
                   notification_options w,u,c,r
                   check_command webinject! -s BASEURL1=url-domain.com!config_file.xml
                   }




External API Integration
       We use a WebInject SOAP call to get the availability of a test from an external monitoring provider.


         #!/bin/bash

         # Arguments
         # $1 = TestCase

         # WebInject API call
         webinject=$(/path/to/webinject.pl -c path/to/config.xml "$1")

         # Report CRITICAL to Nagios if WebInject flagged a failure
         if [[ "$webinject" == *CRITICAL* ]]; then
                echo "CRITICAL - $webinject"
                exit 2
         else
                echo "$webinject"
                exit 0
         fi
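
The wrapper above only distinguishes CRITICAL from everything else. Nagios plugins conventionally exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN), so a fuller mapping could look like the sketch below. The function name state_to_exit_code and the sample output strings are hypothetical, and matching a keyword anywhere in the line is a simplification.

```shell
#!/bin/sh
# Hypothetical helper: map a plugin output line to a Nagios plugin exit code
# (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) by the state keyword it contains.
# The most severe keyword wins; unrecognized output is treated as UNKNOWN.
state_to_exit_code() {
    case "$1" in
        *CRITICAL*) echo 2 ;;
        *WARNING*)  echo 1 ;;
        *UNKNOWN*)  echo 3 ;;
        *OK*)       echo 0 ;;
        *)          echo 3 ;;
    esac
}

# Made-up output lines for illustration
state_to_exit_code "WebInject CRITICAL - test case 1 failed"
state_to_exit_code "WebInject OK - all tests passed"
```

A wrapper script could then run `exit "$(state_to_exit_code "$webinject")"` instead of hard-coding 2 and 0.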




Multi Check + NRPE:
     Using check_multi + NRPE we can centralize the distributed execution of
     several scripts and set warning or critical thresholds based on the number of tests in each state.




Multi Check + NRPE Architecture:
     Multi Distributed Script

     • Bash script that calls check_multi with options
     • cmd file that executes the tests and validates the execution
     • NRPE remote command that executes the tests and gets the results




Multi Check Bash Script
     #!/bin/bash

     nagiospluginpath="/usr/local/nagios/libexec"

     # Run check_multi with the command file, passing one test case per location
     $nagiospluginpath/check_multi -f \
     $nagiospluginpath/check_multi_configs/path/webinject.cmd \
             -s TEST1="$1" \
             -s TEST2="$2" \
             -s TEST3="$3" \
             -s TEST4="$4" \
             -t 60 \
             -T 120




Multi Check Command Script

 # WebInject calls for multi tests

 command [ place1 ] = check_nrpe -H place1 -c external_webinject -a "$TEST1$"
 command [ place2 ] = check_nrpe -H place2 -c external_webinject -a "$TEST2$"
 command [ place3 ] = check_nrpe -H place3 -c external_webinject -a "$TEST3$"
 command [ place4 ] = check_nrpe -H place4 -c external_webinject -a "$TEST4$"

 state [ CRITICAL ] = COUNT(CRITICAL) > 3 || COUNT(WARNING)==COUNT(ALL) || COUNT(UNKNOWN)==COUNT(ALL)
 state [ WARNING  ] = COUNT(WARNING) > 0 || COUNT(CRITICAL) > 0 || COUNT(UNKNOWN) > 0
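
The two state rules can be read as a small decision function over the per-state counts. The sketch below re-implements them in plain shell purely for illustration. The function name multi_state is hypothetical, and it assumes the CRITICAL rule takes precedence over the WARNING rule when both match.

```shell
#!/bin/sh
# Hypothetical re-implementation of the two state rules above.
# $1 = COUNT(CRITICAL), $2 = COUNT(WARNING), $3 = COUNT(UNKNOWN), $4 = COUNT(ALL)
multi_state() {
    crit="$1"
    warn="$2"
    unk="$3"
    all="$4"
    if [ "$crit" -gt 3 ] || [ "$warn" -eq "$all" ] || [ "$unk" -eq "$all" ]; then
        echo CRITICAL
    elif [ "$warn" -gt 0 ] || [ "$crit" -gt 0 ] || [ "$unk" -gt 0 ]; then
        echo WARNING
    else
        echo OK
    fi
}

multi_state 4 0 0 4   # all four locations critical
multi_state 1 0 0 4   # one location critical
multi_state 0 0 0 4   # every location passing
```

With four locations, a single failing location therefore raises only a WARNING, and the service goes CRITICAL once all four agree.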




Multi Check Service Status:




Additional Notifications




Notifications with SMS




Notifications with VoIP Calls (Nagios calls you)




Complete Distributed Monitoring Solution




Q/A




Fernando Hönig
fernando.honig@intel.com
@fernandohonig
www.linkedin.com/in/fernandohonig

Legal Notices




This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.
Copyright © 2012, Intel Corporation. All rights reserved.




Mais conteúdo relacionado

Semelhante a Nagios distributed monitoring for web applications

Making Your Capistrano Recipe Book
Making Your Capistrano Recipe BookMaking Your Capistrano Recipe Book
Making Your Capistrano Recipe BookTim Riley
 
agri inventory - nouka data collector / yaoya data convertor
agri inventory - nouka data collector / yaoya data convertoragri inventory - nouka data collector / yaoya data convertor
agri inventory - nouka data collector / yaoya data convertorToshiaki Baba
 
linux_Commads
linux_Commadslinux_Commads
linux_Commadstastedone
 
Nko workshop - node js crud & deploy
Nko workshop - node js crud & deployNko workshop - node js crud & deploy
Nko workshop - node js crud & deploySimon Su
 
Configuring wifi in open embedded builds
Configuring wifi in open embedded buildsConfiguring wifi in open embedded builds
Configuring wifi in open embedded buildsMender.io
 
glance replicator
glance replicatorglance replicator
glance replicatoririx_jp
 
High Availability Architecture for Legacy Stuff - a 10.000 feet overview
High Availability Architecture for Legacy Stuff - a 10.000 feet overviewHigh Availability Architecture for Legacy Stuff - a 10.000 feet overview
High Availability Architecture for Legacy Stuff - a 10.000 feet overviewMarco Amado
 
Fosdem2012 sayer-sems-sbc
Fosdem2012 sayer-sems-sbcFosdem2012 sayer-sems-sbc
Fosdem2012 sayer-sems-sbcstefansayer
 
Asian Spirit 3 Day Dba On Ubl
Asian Spirit 3 Day Dba On UblAsian Spirit 3 Day Dba On Ubl
Asian Spirit 3 Day Dba On Ublnewrforce
 
Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper Omid Vahdaty
 
PuppetCamp SEA 1 - Puppet Deployment at OnApp
PuppetCamp SEA 1 - Puppet Deployment  at OnAppPuppetCamp SEA 1 - Puppet Deployment  at OnApp
PuppetCamp SEA 1 - Puppet Deployment at OnAppWalter Heck
 
Puppet Deployment at OnApp
Puppet Deployment at OnApp Puppet Deployment at OnApp
Puppet Deployment at OnApp Puppet
 
PuppetCamp SEA 1 - Puppet Deployment at OnApp
PuppetCamp SEA 1 - Puppet Deployment  at OnAppPuppetCamp SEA 1 - Puppet Deployment  at OnApp
PuppetCamp SEA 1 - Puppet Deployment at OnAppOlinData
 
Get expertise with mongo db
Get expertise with mongo dbGet expertise with mongo db
Get expertise with mongo dbAmit Thakkar
 
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.Nagios
 

Semelhante a Nagios distributed monitoring for web applications (20)

Linux Hardening - nullhyd
Linux Hardening - nullhydLinux Hardening - nullhyd
Linux Hardening - nullhyd
 
Making Your Capistrano Recipe Book
Making Your Capistrano Recipe BookMaking Your Capistrano Recipe Book
Making Your Capistrano Recipe Book
 
linux-namespaces.pdf
linux-namespaces.pdflinux-namespaces.pdf
linux-namespaces.pdf
 
agri inventory - nouka data collector / yaoya data convertor
agri inventory - nouka data collector / yaoya data convertoragri inventory - nouka data collector / yaoya data convertor
agri inventory - nouka data collector / yaoya data convertor
 
linux_Commads
linux_Commadslinux_Commads
linux_Commads
 
Nko workshop - node js crud & deploy
Nko workshop - node js crud & deployNko workshop - node js crud & deploy
Nko workshop - node js crud & deploy
 
Osol Pgsql
Osol PgsqlOsol Pgsql
Osol Pgsql
 
Configuring wifi in open embedded builds
Configuring wifi in open embedded buildsConfiguring wifi in open embedded builds
Configuring wifi in open embedded builds
 
glance replicator
glance replicatorglance replicator
glance replicator
 
High Availability Architecture for Legacy Stuff - a 10.000 feet overview
High Availability Architecture for Legacy Stuff - a 10.000 feet overviewHigh Availability Architecture for Legacy Stuff - a 10.000 feet overview
High Availability Architecture for Legacy Stuff - a 10.000 feet overview
 
Fosdem2012 sayer-sems-sbc
Fosdem2012 sayer-sems-sbcFosdem2012 sayer-sems-sbc
Fosdem2012 sayer-sems-sbc
 
Asian Spirit 3 Day Dba On Ubl
Asian Spirit 3 Day Dba On UblAsian Spirit 3 Day Dba On Ubl
Asian Spirit 3 Day Dba On Ubl
 
Storage managment using nagios
Storage managment using nagiosStorage managment using nagios
Storage managment using nagios
 
Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper
 
PuppetCamp SEA 1 - Puppet Deployment at OnApp
PuppetCamp SEA 1 - Puppet Deployment  at OnAppPuppetCamp SEA 1 - Puppet Deployment  at OnApp
PuppetCamp SEA 1 - Puppet Deployment at OnApp
 
Puppet Deployment at OnApp
Puppet Deployment at OnApp Puppet Deployment at OnApp
Puppet Deployment at OnApp
 
PuppetCamp SEA 1 - Puppet Deployment at OnApp
PuppetCamp SEA 1 - Puppet Deployment  at OnAppPuppetCamp SEA 1 - Puppet Deployment  at OnApp
PuppetCamp SEA 1 - Puppet Deployment at OnApp
 
Mongodb workshop
Mongodb workshopMongodb workshop
Mongodb workshop
 
Get expertise with mongo db
Get expertise with mongo dbGet expertise with mongo db
Get expertise with mongo db
 
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
 

Último

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Último (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024

Nagios distributed monitoring for web applications

  • 1. Distributed Monitoring for Web Apps. Fernando Hönig, fernando.honig@intel.com
  • 2. About me
      - From Córdoba, Argentina
      - System Administrator
      - Worked in IT companies for the last 7 years
      - Working at Intel IT since April 2011
      - Married to Jesica and father of Benjamin (2)
      * Other names and brands may be claimed as the property of others.
  • 3. Third Party Vendors / Open Source
      - This presentation covers the solution we built rather than third-party vendors.
      - All products used for this are open source.
      Best Practices
      - With this presentation we would like to show IT@Intel processes and best practices.
  • 4. Topics
      - Problem Overview
      - Distributed Infrastructure
      - Failover Solution
      - Webinject Architecture
      - API Feeds Integration
      - Multi Checks + NRPE
      - Notifications with SMS and VoIP
      - Q/A
  • 5. Why do we need a Distributed Infrastructure?
      - More than 500 service checks per customer
      - Our customers' apps need to be reached from different GEOs
      - Checks run every 1 or 5 minutes
      - Redundancy / fast recovery
      Why do we need a Centralized Dashboard?
      - ACLs for different customers and groups
      - Fast and simple additions, updates and removals of services, commands and hosts
      - Performance data stored in MySQL for external BI solutions
  • 6. Centralized Dashboard / Nagios Distribution
      [Diagram: centralized dashboard; Region #1, Region #2, Region #3 ... Region #N; External Monitoring Provider #1 and #2]
  • 7. Simple Distributed Nagios with NDOUtils
      Ndomod.cfg - Example
        instance_name=Central
        output_type=tcpsocket
        output=127.0.0.1
        tcp_port=5668
        output_buffer_items=5000
        file_rotation_interval=14400
        file_rotation_timeout=60
        reconnect_interval=15
        reconnect_warning_interval=900
        data_processing_options=-1
        config_output_options=3
      Ndo2db.cfg - Example
        ndo2db_user=nagios
        ndo2db_group=nagios
        socket_type=tcp
        socket_name=/var/run/ndo.sock
        tcp_port=5668
        db_servertype=mysql
        db_host=localhost
        db_name=centstatus
        db_port=3306
        db_prefix=nagios_
        db_user=username
        db_pass=password
        max_timedevents_age=1440
        max_systemcommands_age=1440
        max_servicechecks_age=1440
        max_hostchecks_age=1440
        max_eventhandlers_age=1440
  • 8. How to enable a new distributed node?
      To enable a new Nagios node you just need to:
      - Install a VM with the latest Nagios/NRPE/NDOMod code.
      - Set up passwordless SSH authentication for the nagios user and add some sudo rights.
      - Enable that node in the centralized interface:
          - Create a new poller
          - Create a new nagios.cfg config for that poller
          - Create a new ndomod config for that poller
          - Enable the service checks that you need on that poller
      All of this can be automated.
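The "create a new ndomod config for that poller" step lends itself to scripting. The following is a minimal sketch, not the tooling used at Intel: the function name `render_ndomod_cfg` and its arguments are hypothetical, and the keys and default values are simply copied from the ndomod.cfg example on slide 7.

```shell
#!/bin/bash
# Hypothetical helper: print an ndomod.cfg for a new poller.
# Keys and defaults are taken from the ndomod.cfg example in this deck;
# the function name and its arguments are illustrative assumptions.
render_ndomod_cfg() {
    local instance_name="$1"      # name shown for this poller, e.g. a region
    local central_host="$2"       # host where ndo2db is listening
    local tcp_port="${3:-5668}"   # must match tcp_port in ndo2db.cfg
    cat <<EOF
instance_name=${instance_name}
output_type=tcpsocket
output=${central_host}
tcp_port=${tcp_port}
output_buffer_items=5000
file_rotation_interval=14400
file_rotation_timeout=60
reconnect_interval=15
reconnect_warning_interval=900
data_processing_options=-1
config_output_options=3
EOF
}
```

A call such as `render_ndomod_cfg Region1 10.0.0.10 > ndomod-Region1.cfg` would then produce one file per poller; the host address here is a placeholder.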
  • 9. Remote Pollers Visualization: Remote Services / Checks
  • 10. Distributed with Failover
  • 11. Failover Infrastructure Scripts / Master Side
      #!/bin/sh
      # Keep Apache and Nagios in the same state on the master:
      # if either one is down, stop the other.
      # Note: the process path (httpd) and init script name (apache2) depend on the distribution.
      apacherun=`ps ax | grep /usr/sbin/httpd | grep -v grep | cut -c1-5 | paste -s -`
      nagiosrun=`ps ax | grep /usr/local/nagios/bin/nagios | grep -v grep | cut -c1-5 | paste -s -`
      if [ "$nagiosrun" = "" ]; then
          echo "Stopping Apache since Nagios is not running"
          /etc/init.d/apache2 stop
      else
          echo "Nagios running"
      fi
      if [ "$apacherun" = "" ]; then
          echo "Stopping Nagios since Apache is not running"
          /etc/init.d/nagios stop
      else
          echo "Apache running"
      fi
      exit 0

      NRPE Command
      command[check_nagios_failover]=/usr/local/nagios/libexec/check_nagios -F /usr/local/nagios/var/status.dat -e 1 -C '/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg'
  • 12. Failover Infrastructure Scripts / Failover Side
      #!/bin/sh
      # Ask the master (via NRPE) whether its Nagios process is healthy.
      nagcmd=`/usr/local/nagios/libexec/check_nrpe -H 172.16.0.1 -c check_nagios_failover`
      now=`date +%s`
      commandfile='/usr/local/nagios/var/rw/nagios.cmd'
      apacherun=`ps ax | grep /usr/sbin/httpd | grep -v grep | cut -c1-5 | paste -s -`
      nagiosrun=`ps ax | grep /usr/local/nagios/bin/nagios | grep -v grep | cut -c1-5 | paste -s -`
      CHECK=`echo $nagcmd | grep CRITICAL`
      if [ "$CHECK" = "" ] ; then
          # Master is healthy: keep the failover node passive.
          echo "Nagios Master is OK"
          if [ "$apacherun" ]; then
              echo "Stopping Apache"
              /etc/init.d/apache2 stop
          fi
          if [ "$nagiosrun" ]; then
              echo "Stopping Nagios"
              /etc/init.d/nagios stop
          fi
      else
          # Master reported CRITICAL: start the local services and notify on-call.
          if [ "$apacherun" = "" ]; then
              echo "Starting Apache"
              /etc/init.d/apache2 start
          fi
          if [ "$nagiosrun" = "" ]; then
              echo "Starting Nagios"
              /etc/init.d/nagios start
              sleep 15
          fi
          # printf (not echo) is needed so that %lu and \n are interpreted.
          printf "[%lu] SEND_CUSTOM_HOST_NOTIFICATION;nagiosmaster;2;nagiosfailover;Master Nagios seems to be down. Oncall Group Engaged. Nagios Failover in place.\n" $now > $commandfile
      fi
      exit 0
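The failover-side script decides its role by looking for the word CRITICAL in the check_nrpe output. One possible hardening, sketched here as an assumption rather than as part of the original solution, is to also consider the plugin's exit status, so that an unreachable master (where the output may not contain CRITICAL) still triggers a takeover. The function name `failover_role` is hypothetical; the exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/bash
# Hypothetical decision helper for the failover node.
# $1 = exit status of check_nrpe, $2 = its output line.
# Prints "takeover" when the master looks down, "standby" otherwise.
failover_role() {
    local nrpe_exit="$1"
    local nrpe_output="$2"
    # Exit status 2 (CRITICAL) or 3 (UNKNOWN), or the word CRITICAL in the
    # output, is treated as "master down". OK and WARNING keep us passive.
    if [ "$nrpe_exit" -ge 2 ] || [[ "$nrpe_output" == *CRITICAL* ]]; then
        echo "takeover"
    else
        echo "standby"
    fi
}
```

In the script on this slide, the exit status would have to be captured right after the check_nrpe call (for example `nagcmd=$(...); rc=$?`) and the script would need to run under bash rather than sh for the `[[ ]]` test.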
  • 13. Web Apps Monitoring
  • 14. How can we monitor Web Apps?
      [Figure: Intel Web Apps] All the white holes are areas that these checks do not monitor.
  • 15. How can we monitor Web Apps?
      [Figure: Intel Web Apps] We are probably still not covering 100%, but the white holes are not as big as before. This is a continuous improvement.
  • 16. What's WebInject?
      WebInject is a free tool for automated testing of web applications and web services. It can be used to test individual system components that have HTTP interfaces. WebInject offers real-time results display and may also be used for monitoring system response times.*
      * Source: www.webinject.org
      How does it work? [Diagram: Config, TestCase, Web Apps]
      What do you receive after a check? OK or CRITICAL.
  • 17. Webinject Installation
      How to install it:
      - From the CPAN Perl library, use: install Webinject
      - From Consol Labs: http://labs.consol.de/lang/de/nagios/check_webinject/
  • 18. Webinject Architecture
      Config File Model:
        <testcasefile>test.xml</testcasefile>
        <globalhttplog>no</globalhttplog>
        <reporttype>nagios</reporttype>
        <timeout>10</timeout>
        <useragent>Mozilla/5.0</useragent>
        <proxy>http://127.0.0.1:8080</proxy>
  • 19. Webinject Architecture
      Test Case File Model:
        <testcases repeat="1">
          <case
            id="1"
            description1="Case Description"
            url="https://{BASEURL1}/id/health"
            method="post"
            posttype="application/x-www-form-urlencoded"
            postbody="string={BASEURL2}&id=1"
            parseresponse='"code":"|","string"|'
            verifypositive='"state":"{BASEURL2}"'
          />
        </testcases>
  • 20. Webinject Architecture
      Test Case File Model:
        <testcases repeat="1">
          <case
            id="1"
            description1="Case Description"
            url="https://{BASEURL1}/id/health"
            method="post"
            posttype="application/x-www-form-urlencoded"
            postbody="string={BASEURL2}&id=1"
            parseresponse='"method":"|"'
            verifypositive='"state":"{BASEURL2}"'
          />
          <case
            id="2"
            description1="Validation"
            url="https://{BASEURL2}/v1/rollback"
            method="post"
            posttype="application/json"
            postbody='{"usermethod":"{PARSEDRESULT}","taxes":false}'
            verifypositive='"transactionMessage":"Transaction Approved"'
          />
        </testcases>
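In the two-case model, `parseresponse` names a left and a right boundary separated by `|`, and the text found between them in the first response becomes `{PARSEDRESULT}` in the second request. WebInject does this internally in Perl; the bash function below only illustrates the idea. The function name `parse_between` and the sample response are hypothetical.

```shell
#!/bin/bash
# Illustration of left|right boundary parsing, as used by parseresponse.
# $1 = response body, $2 = left boundary, $3 = right boundary.
# Prints the text between the first left boundary and the next right boundary.
parse_between() {
    local response="$1" left="$2" right="$3"
    # Fail if the left boundary is not present at all.
    [[ "$response" == *"$left"* ]] || return 1
    # Drop everything up to and including the first left boundary.
    local rest="${response#*"$left"}"
    # Fail if no right boundary follows.
    [[ "$rest" == *"$right"* ]] || return 1
    # Keep only the text before the first right boundary.
    printf '%s\n' "${rest%%"$right"*}"
}
```

With a made-up response such as `{"method":"credit","state":"ok"}`, the boundaries `"method":"` and `"` would select `credit`, which case 2 would then send as the `usermethod` value.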
  • 21. Webinject and Nagios Integration
      Command Definition:
        define command {
            command_name    webinject
            command_line    /usr/local/webinject/webinject.pl $ARG1$ -c $ARG2$
        }
      Service Definition:
        define service {
            use                     generic-service
            host_name               MyApplication-server
            service_description     WebInject Test
            is_volatile             0
            check_period            24x7
            max_check_attempts      3
            normal_check_interval   1
            retry_check_interval    1
            contact_groups          myapplication-admins
            notification_interval   120
            notification_period     24x7
            notification_options    w,u,c,r
            check_command           webinject! -s BASEURL1=url-domain.com!config_file.xml
        }
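The `check_command` value is split on `!`: the first field selects the command and the following fields fill `$ARG1$`, `$ARG2$` and so on in `command_line`. Nagios performs this substitution itself; the sketch below only mimics it so the resulting command line can be seen. The function name `expand_check_command` is hypothetical.

```shell
#!/bin/bash
# Mimic how Nagios fills $ARGn$ macros from a check_command string.
# $1 = command_line template, $2 = check_command value from the service.
expand_check_command() {
    local command_line="$1" check_command="$2"
    local -a parts
    local i
    # Field 0 is the command name; fields 1..n are ARG1..ARGn.
    IFS='!' read -r -a parts <<< "$check_command"
    for ((i = 1; i < ${#parts[@]}; i++)); do
        # Replace every literal $ARGi$ with the matching field.
        command_line="${command_line//\$ARG${i}\$/${parts[i]}}"
    done
    printf '%s\n' "$command_line"
}
```

Calling it with the template `'/usr/local/webinject/webinject.pl $ARG1$ -c $ARG2$'` and the service's `check_command` value shows that WebInject ends up being run with `-s BASEURL1=url-domain.com -c config_file.xml`.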
  • 22. External API Integration
      We use a WebInject SOAP call to get the availability of a test in an external monitoring provider.
      #!/bin/bash
      # Arguments
      # $1 = TestCase
      # WebInject API call
      webinject=`/path/to/webinject.pl -c path/to/config.xml $1`
      if [[ "$webinject" == *CRITICAL* ]] ; then
          echo "CRITICAL- $webinject"
          exit 2
      else
          echo "$webinject"
          exit 0
      fi
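The wrapper on this slide reports OK for anything that is not CRITICAL, including empty output when webinject.pl could not run. A slightly stricter mapping is sketched below, assuming the WebInject output contains a Nagios-style status word (OK, WARNING, CRITICAL). The function name `nagios_exit_code` is hypothetical; the numeric codes are the standard Nagios plugin return codes.

```shell
#!/bin/bash
# Map a plugin output line to a Nagios plugin exit code.
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
nagios_exit_code() {
    case "$1" in
        *CRITICAL*) echo 2 ;;
        *WARNING*)  echo 1 ;;
        *OK*)       echo 0 ;;
        # Empty or unrecognised output: report UNKNOWN rather than OK.
        *)          echo 3 ;;
    esac
}
```

In a wrapper this could be used as `echo "$webinject"; exit "$(nagios_exit_code "$webinject")"`.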
  • 23. Multi Check + NRPE
      Using check_multi + NRPE we can centralize the distributed execution of several scripts and set warning or critical thresholds based on the number of tests in each state.
  • 24. Multi Check + NRPE Architecture: Multi Distributed Script
      - A Bash script calls check_multi with options
      - A cmd file executes the tests and validates the execution
      - An NRPE remote command executes the checks and returns the results
  • 25. Multi Check Bash Script
      #!/bin/bash
      nagiospluginpath="/usr/local/nagios/libexec"
      $nagiospluginpath/check_multi -f $nagiospluginpath/check_multi_configs/path/webinject.cmd \
          -s TEST1="$1" -s TEST2="$2" -s TEST3="$3" -s TEST4="$4" -t 60 -T 120
  • 26. Multi Check Command Script
      # Web Inject Calls for Multi Tests
      command [ place1 ] = check_nrpe -H place1 -c external_webinject -a "$TEST1$"
      command [ place2 ] = check_nrpe -H place2 -c external_webinject -a "$TEST2$"
      command [ place3 ] = check_nrpe -H place3 -c external_webinject -a "$TEST3$"
      command [ place4 ] = check_nrpe -H place4 -c external_webinject -a "$TEST4$"
      state [ CRITICAL ] = COUNT(CRITICAL) > 3 || COUNT(WARNING)==COUNT(ALL) || COUNT(UNKNOWN)==COUNT(ALL)
      state [ WARNING ] = COUNT(WARNING) > 0 || COUNT(CRITICAL) > 0 || COUNT(UNKNOWN) > 0
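To see what the two `state` rules mean with four locations, the sketch below re-implements them as plain bash arithmetic. It is only a worked illustration of the expressions on this slide, assuming the more severe state wins when both rules match; check_multi evaluates the real rules itself. The function name `multi_state` is hypothetical.

```shell
#!/bin/bash
# Evaluate the slide's state rules from per-state counts.
# $1 = COUNT(CRITICAL), $2 = COUNT(WARNING), $3 = COUNT(UNKNOWN), $4 = COUNT(ALL)
multi_state() {
    local crit="$1" warn="$2" unknown="$3" all="$4"
    if (( crit > 3 || warn == all || unknown == all )); then
        # More than 3 locations critical, or every location warning/unknown.
        echo "CRITICAL"
    elif (( warn > 0 || crit > 0 || unknown > 0 )); then
        # At least one location is not OK.
        echo "WARNING"
    else
        echo "OK"
    fi
}
```

With four locations, `multi_state 3 0 0 4` gives WARNING and `multi_state 4 0 0 4` gives CRITICAL: a failure seen from up to three places only warns, and the service goes critical when all four places fail.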
  • 27. Multi Check Service Status
  • 28. Additional Notifications
  • 29. Notifications with SMS
  • 30. Notifications with VoIP Calls (Nagios calls you)
  • 31. Complete Distributed Monitoring Solution
  • 32. Q/A. Fernando Hönig, fernando.honig@intel.com, @fernandohonig, www.linkedin.com/in/fernandohonig
  • 33. Legal Notices
      This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. * Other names and brands may be claimed as the property of others. Copyright © 2012, Intel Corporation. All rights reserved.