Italian Conference on Nagios: Michael Medin on Windows Monitoring
1. CONFERENCE ON NAGIOS & OSS Monitoring
May 20th - Bolzano
Going where no man has gone before
2. CONFERENCE ON NAGIOS & OSS Monitoring
These slides represent the work and opinions
of the author and do not constitute official
positions of any organization sponsoring the
author's work.
This material has not been peer reviewed and
is presented here as-is with the permission of
the author.
The author assumes no liability for any
content or opinion expressed in this
presentation and/or use of content herein.
3. CONFERENCE ON NAGIOS & OSS Monitoring
Developer (not system manager)
◦ Not working with Nagios
Accidentally ended up in our NOC
◦ Hated BB
2003: The birth of NSClient++
◦ NSClient sucked (Broke Exchange)
◦ NRPE_NT was too hard to use
2004: The open source of NSClient++
◦ “just for fun”
2007: The rebirth of NSClient++
◦ A lot of users emailed me
◦ Got a lot of hits on the webpage
◦ Intense development led to 0.3.0!
2010: The Future
◦ 0.3.8 out now
◦ 0.4.x in development (scheduled for beta fall 2010)
4. CONFERENCE ON NAGIOS & OSS Monitoring
Agents
◦ An overview of your options
About NSClient++
◦ Quick Introduction
Monitoring
◦ Eventlog Checking
◦ WMI (Windows Management Instrumentation)
◦ Scripts
◦ Revisiting WMI
Q/A
5. CONFERENCE ON NAGIOS & OSS Monitoring
An overview of the options
6. CONFERENCE ON NAGIOS & OSS Monitoring
What is NSClient?
◦ A (pretty old) program
NSClient (or pNSClient)
◦ A (pretty limited) protocol
check_nt
◦ A (pretty incorrect) concept
”Windows monitoring”
What is it not?
◦ NSClient++!
◦ NSClient++ was written as a replacement for NSClient
◦ But has evolved much since then...
8. CONFERENCE ON NAGIOS & OSS Monitoring
I would use either:
◦ NSClient++
◦ NC_NET
I would not use:
◦ SNMP
Too complex to use (and limited on vanilla hardware)
◦ NSClient/NRPE_NT/OpMonAgent
Old, outdated and has limited functionality
◦ Agentless WMI
Limited functionality (and enforces centralized monitoring)
But...
◦ I am biased, so you might not want to take my word for it...
9. CONFERENCE ON NAGIOS & OSS Monitoring
Protocol    Method   Encryption  Auth  Payload        Args.    Multi Commands
NSClient    Active   No          Yes   Unlimited (1)  Yes (1)  Yes (1)
NRPE        Active   Yes         No    1024 (2)       Yes      No
NSCA        Passive  Yes         Yes   512 (2)        Yes      Yes
Future (3)  A/P/*    Yes         Yes   Unlimited      Yes      Yes
(1) The protocol supports it, but check_nt does not
(2) The NRPE payload can be extended by recompiling check_nrpe and configuring NSClient++
(3) A future protocol I am thinking of adding to NSClient++
NSClient++ supports all of them
10. CONFERENCE ON NAGIOS & OSS Monitoring
I would use:
◦ NRPE (check_nrpe)
For active checks (the server queries information)
◦ NSCA
For passive checks (the client pushes information)
I would not use:
◦ NSClient (check_nt)
Limited feature set
Be aware!
◦ None of them are safe (from a security perspective)!
◦ But then... Nothing really is...
12. CONFERENCE ON NAGIOS & OSS Monitoring
Internals:
◦ C++ using W32 API
◦ Around 40.000 lines of code
◦ Actively developed (unfortunately only by me)
◦ Modularized design philosophy
Runs on:
◦ NT4, w2k, XP, w2k3, Vista, w2k8, Windows 7 ...
◦ X86, x64, IA64 (I lack a compiler for that platform, but it works)
Current Version:
◦ 0.3.8 (out now, yesterday in fact)
◦ Don't use 0.2.7!
Most features require NRPE or NSCA
Documentation online (WIKI)
◦ http://nsclient.org
13. CONFERENCE ON NAGIOS & OSS Monitoring
Not supported by a commercial entity
◦ Donations welcome
◦ Sponsoring available (contact me for details)
Used by a lot of people (I think)
◦ Impossible to estimate any figures
Website has:
◦ Around 10-15.000 unique visitors per month
◦ Around 20-30.000 downloads per month
Please, Help out!
◦ Add documentation, report problems, ideas,
thoughts, etc, etc...
15. CONFERENCE ON NAGIOS & OSS Monitoring
Major simplification to the eventlog checker
generated > -2d AND severity = 'error'
Registry checks
Improvements to the file checker
Supports multi-language performance counters
“Automatic” volume support
Improved command line support
Simplified scripting with a new VB Helper
◦ Thanks op5!
Many more things…
16. CONFERENCE ON NAGIOS & OSS Monitoring
Rewritten ”core” using boost
◦ Means ”proper handling” (and fewer bugs?)
◦ Unix support
◦ Improved multitasking
◦ Etc.
New settings subsystem
◦ Registry, improved ini support, better loader, xml?
Filter-like API (in addition to options)
◦ “warn=any drive > 90% or c: > 80%”
New improved client with improved protocol
Better .net integration
Better customization support
CEP - Complex Event Processing?
◦ If anyone wants this let me know!
17. CONFERENCE ON NAGIOS & OSS Monitoring
NSClient++ is a command line program!
◦ nsclient++ /start (net start nsclientpp)
◦ nsclient++ /stop (net stop nsclientpp)
◦ nsclient++ /test
nsclient++ /test is your friend!
Configuration:
◦ notepad nsc.ini
Testing:
1. Local (nsclient++ /test)
2. From CLI (check_nrpe ...)
3. From Nagios (add command)
Works with “anything” (even non-Nagios things)
19. CONFERENCE ON NAGIOS & OSS Monitoring
In a galaxy far far away...
20. CONFERENCE ON NAGIOS & OSS Monitoring
The good:
◦ Powerful interface!
◦ Simple to use!
◦ out-of-the-box solution!
(on which you can expand)
The bad:
◦ Nothing! Really, I mean it!
But...
◦ …still a bit “experimental”
21. CONFERENCE ON NAGIOS & OSS Monitoring
Syntax is friendly and intuitive
Still experimental
◦ Should work though, so please try it
Based on SQL WHERE clauses
◦ generated > -2d AND severity = 'error'
Automatically detects version to use
◦ So no filter=newer option
22. CONFERENCE ON NAGIOS & OSS Monitoring
Like SQL ”Where” clauses
◦ severity = 'error'
◦ severity = 'error' OR severity = 'warning'
◦ severity = 'error' OR (id = 123 OR id = 345)
◦ severity = 'error' OR id IN (123, 345)
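To make the meaning of these clauses concrete, here is a minimal Python sketch that writes the last two filters as ordinary predicates over event records. It is only an illustration of the semantics, not how NSClient++ parses or evaluates filters; the `Event` record and the `matches_or` / `matches_in` functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for one eventlog record."""
    severity: str
    id: int


def matches_or(event: Event) -> bool:
    # severity = 'error' OR (id = 123 OR id = 345)
    return event.severity == "error" or (event.id == 123 or event.id == 345)


def matches_in(event: Event) -> bool:
    # severity = 'error' OR id IN (123, 345)
    return event.severity == "error" or event.id in (123, 345)


events = [
    Event("error", 1),
    Event("warning", 123),
    Event("informational", 345),
    Event("warning", 7),
]

hits = [e for e in events if matches_in(e)]
# The IN form is just a shorter spelling of the chained OR form.
same = hits == [e for e in events if matches_or(e)]
print(len(hits), same)
```

The fourth record is the only one rejected: it is neither an error nor one of the listed ids.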
23. CONFERENCE ON NAGIOS & OSS Monitoring
Field      Description
type       Type of error (Microsoft says this is severity):
           error, warning, info, auditSuccess or auditFailure
source     The name of the source of the event,
           i.e. the program that logged the message
generated  How long ago the message was generated
           (when it happened)
written    How long ago the message was written to the log (don't use)
strings    Message contents (faster)
message    Message text (slower)
id         Event id of the log message (unique together with source)
severity   Event severity (I think this is severity):
           success, informational, warning or error
24. CONFERENCE ON NAGIOS & OSS Monitoring
Operator  Safe  Meaning
=         eq    Equality
!=        ne    Not equal
>         gt    Greater than
<         lt    Less than
=>        ge    Greater than or equal
=<        le    Less than or equal
like            String similarity (substring matching)
25. CONFERENCE ON NAGIOS & OSS Monitoring
Name          Use                                       Example
convert(...)  Converts from one type to another.
              Usually not needed as types are inferred
neg(...)      Negate value                              -1 = neg(1)
                                                        yesterday = neg(tomorrow)
                                                        yesterday = -tomorrow
in ( ... )    Equal to any one item from a list         id in (1, 3, 4, 5)
26. CONFERENCE ON NAGIOS & OSS Monitoring
Option        Description
file          The “eventlog file” to open.
              Use multiple file options to check multiple files.
filter        Define the filter (there can only be one)
MaxWarn       Maximum hits before a warning state is issued.
MaxCrit       Maximum hits before a critical state is issued.
truncate      Length of returned data.
              Since NRPE (and NSClient++) has a limited capacity this
              is important. Usually 900 is a good value.
syntax        How to format the return data
unique        Only “one of each” record will be returned.
              (“count” (MaxWarn/MaxCrit) is not affected)
descriptions  Needed if you plan on using the %message% syntax option.
              (Will impact performance “severely”)
debug=true    Displays a lot more information about the check
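How MaxWarn, MaxCrit, unique and truncate interact can be sketched in a few lines of Python. This is a rough model under stated assumptions, not the NSClient++ implementation: it assumes a state is raised once the hit count reaches the limit (`>=`), and `state_from_hits` and `summarise` are hypothetical helper names.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def state_from_hits(hits, max_warn, max_crit):
    """Map a hit count to a Nagios state (assumes 'reached' means >=)."""
    if hits >= max_crit:
        return CRITICAL
    if hits >= max_warn:
        return WARNING
    return OK


def summarise(messages, max_warn, max_crit, truncate=900, unique=False):
    """Return (state, text) for a list of matching log messages."""
    hits = len(messages)  # the count is not affected by 'unique'
    # dict.fromkeys keeps the first occurrence of each message, in order.
    shown = list(dict.fromkeys(messages)) if unique else list(messages)
    text = ", ".join(shown)[:truncate]  # keep the reply within the payload limit
    return state_from_hits(hits, max_warn, max_crit), text


print(summarise(["disk full", "disk full", "login failed"], 1, 5, unique=True))
```

Three hits against MaxWarn=1 and MaxCrit=5 give a warning here, while the returned text lists each distinct message only once.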
27. CONFERENCE ON NAGIOS & OSS Monitoring
alias_event_log
Uses the following definition:
◦ file=application file=system
The files to check
◦ MaxWarn=1 MaxCrit=1
Every error is a warning
◦ "filter=generated gt -2d
Generated less than 2 days ago
◦ AND severity NOT IN ('success', 'informational')"
NOT a success or information message
◦ truncate=800 unique descriptions
Truncate returned data and make it look pretty
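The quoting in this definition matters: the filter contains spaces, so it has to reach the checker as one argument. The Python sketch below splits the definition with shell-style rules and groups the result into options. It uses `shlex` as a stand-in for NSClient++'s own argument handling, which may differ in detail, and `parse_options` is a hypothetical helper.

```python
import shlex

# The alias definition from the slide, written as one string.
definition = (
    "file=application file=system MaxWarn=1 MaxCrit=1 "
    "\"filter=generated gt -2d AND severity NOT IN ('success', 'informational')\" "
    "truncate=800 unique descriptions"
)


def parse_options(tokens):
    """Group key=value tokens; bare tokens such as 'unique' become flags."""
    options = {}
    for token in tokens:
        # Split on the first '=' only, so the filter text stays intact.
        key, sep, value = token.partition("=")
        options.setdefault(key, []).append(value if sep else True)
    return options


tokens = shlex.split(definition)
options = parse_options(tokens)

print(len(tokens))
print(options["file"])
print(options["filter"][0])
```

Without the surrounding double quotes the filter would be split at every space and arrive as many unrelated arguments.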
28. CONFERENCE ON NAGIOS & OSS Monitoring
Filtering is fairly straightforward
The ”parser” will do most of the work for you
◦ generated > -2d just works!
Enable debug=true to see what happens
Always always always debug in ”/test mode”
Check query times to optimize performance
There is a pretty ok guide on the wiki
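As an illustration of what a relative time such as -2d could mean, the Python sketch below turns it into an offset from "now" and evaluates generated > -2d for a given timestamp. This is an assumption about the semantics, not the actual NSClient++ parser: the slides only show the d suffix, so the other unit letters and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed unit suffixes; only "d" (days) appears in the slides.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_relative(text):
    """Turn a string such as '-2d' into a signed timedelta."""
    match = re.fullmatch(r"([+-]?)(\d+)([smhdw])", text.strip())
    if match is None:
        raise ValueError(f"not a relative time: {text!r}")
    sign, amount, unit = match.groups()
    delta = timedelta(**{_UNITS[unit]: int(amount)})
    return -delta if sign == "-" else delta


def generated_after(generated, relative, now):
    """Evaluate `generated > <relative>` against an explicit 'now'."""
    return generated > now + parse_relative(relative)


now = datetime(2010, 5, 20, 12, 0)
print(generated_after(datetime(2010, 5, 19, 9, 0), "-2d", now))
print(generated_after(datetime(2010, 5, 17, 9, 0), "-2d", now))
```

Passing `now` explicitly keeps the sketch deterministic; a real check would use the current time.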
29. CONFERENCE ON NAGIOS & OSS Monitoring
Start with “everything” and work your way down.
System, Application, etc etc
Reasonable start filter:
◦ generated > -2d AND severity NOT IN ('success', 'informational')
Need to customize it for your environment.
A good idea is to use more than one check
1. Check “all errors” send to /dev/null
2. Check “my service” send to admin@server
Don't overdo it (eventlog checking is slow)
30. CONFERENCE ON NAGIOS & OSS Monitoring
WMI - Windows Management
Instrumentation
31. CONFERENCE ON NAGIOS & OSS Monitoring
The purpose of WMI is to define a non-proprietary set of
environment-independent specifications which allow
management information to be shared between management
applications.
WMI prescribes enterprise management standards and related
technologies that work with existing management standards,
such as Desktop Management Interface (DMI) and SNMP.
WMI complements these other standards by providing a
uniform model. This model represents the managed
environment through which management data from any
source can be accessed in a common way.
…yada yada yada…
In short: A bit like SNMP but modern
◦ Though it is actually more than 10 years old
32. CONFERENCE ON NAGIOS & OSS Monitoring
Everything?
◦ Almost...
There are a lot of objects (tables)
◦ win32 has 450 objects
◦ Various services will add more (AD, SQL Server, ...)
You can:
◦ Read, write and work with “objects”.
But only read via the built-in commands of NSClient++
But you can not:
◦ Check your application (ish)
33. CONFERENCE ON NAGIOS & OSS Monitoring
Built-in commands are dangerous!
◦ No security, allows access to a lot of things!
◦ For instance you can enumerate the file system
34. CONFERENCE ON NAGIOS & OSS Monitoring
CheckWMI
◦ Check a result set
◦ Good for:
checking if we have more (or fewer) than n items...
CheckWMIValue
◦ Check a specific value
◦ Good for:
checking if a value is more or less than n
Custom Scripts
◦ For, I think, most things beyond the basics
◦ Also improves the security aspect
◦ Good for:
Everything
35. CONFERENCE ON NAGIOS & OSS Monitoring
WQL - WMI Query Language
◦ Based upon SQL
◦ Only select … (no update/insert/delete/DDL)
“Tables” are called objects in WMI
◦ An object usually corresponds to a logical “type”.
Example:
◦ select LoadPercentage from win32_Processor
Retrieves system load from the win32_Processor ”object”.
◦ select * from win32_Processor
Retrieves everything from the win32_Processor ”object”.
36. CONFERENCE ON NAGIOS & OSS Monitoring
Object Description
Win32_Fan Represents the properties of a fan device in the computer system.
Win32_TemperatureProbe Represents the properties of a temperature sensor (electronic thermometer).
Win32_DiskDrive Represents a physical disk drive as seen by a computer running the Windows operating system.
Win32_PhysicalMedia Represents any type of documentation or storage medium.
Win32_TapeDrive Represents a tape drive on a computer system running Windows.
Win32_BaseBoard Represents a baseboard (also known as a motherboard or system board).
Win32_BIOS Represents the attributes of the computer system's basic input or output services (BIOS).
Win32_IDEController Represents the capabilities of an Integrated Drive Electronics (IDE) controller device.
Win32_MemoryArray Represents the properties of the computer system memory array and mapped addresses.
Win32_OnBoardDevice Represents common adapter devices built into the motherboard (system board).
Win32_Processor Represents a device capable of interpreting a sequence of machine instructions on the computer.
Win32_SCSIController Represents a small computer system interface (SCSI) controller on a computer system running Windows.
Win32_USBControllerDevice Relates a USB controller and the CIM_LogicalDevice instances connected to it.
Win32_NetworkAdapter Represents a network adapter on a computer system running Windows.
Win32_Battery Represents a battery connected to the computer system.
Win32_PortableBattery Represents the properties of a portable battery, such as one used for a notebook computer.
Win32_PowerManagementEvent Represents power management events resulting from power state changes.
Win32_UninterruptiblePowerSupply Represents the capabilities and management capacity of an uninterruptible power supply (UPS).
Win32_Printer Represents a device connected to a computer system running Windows that is capable of reproducing a visual image on a medium.
Win32_PrintJob Represents a print job generated by a Windows-based application.
37. CONFERENCE ON NAGIOS & OSS Monitoring
Object Description
Win32_SystemDriver Represents the system driver for a base service.
Win32_Directory Represents a directory entry on a computer system running Windows.
Win32_DiskQuota Tracks disk space usage for NTFS file system volumes.
Win32_LogicalDisk Represents a data source that resolves to an actual local storage device.
Win32_Volume Represents an area of storage on a hard disk.
Win32_PageFileUsage Represents the file used for handling virtual memory file swapping on a computer system running Windows.
Win32_NetworkConnection Represents an active network connection in a Windows environment.
Win32_NTDomain Represents a Windows NT domain.
Win32_PingStatus Represents the values returned by the standard ping command.
Win32_ComputerSystem Represents a computer system operating in a Windows environment.
Win32_OperatingSystem Represents an operating system installed on a computer system running Windows.
Win32_Process Represents a sequence of events on a computer system running Windows.
Win32_ProcessStartup Represents the startup configuration of a computer system running Windows.
Win32_ScheduledJob Represents a job scheduled using the Windows NT schedule service.
Win32_BaseService Represents executable objects that are installed in a registry database maintained by the SCM.
Win32_Service Represents a service on a computer system running Windows.
Win32_LogonSession Describes the logon session or sessions associated with a user logged on to Windows 2000 or Windows NT.
Win32_UserAccount Represents information about a user account on a computer system running Windows.
Win32_UserInDomain Association class
Win32_WindowsProductActivation Contains properties and methods related to WPA.
Win32_NTEvent... Yes you can even check the eventlog!
38. CONFERENCE ON NAGIOS & OSS Monitoring
NSClient++ can execute WQL queries ”as is” and return the result.
◦ nsclient++ -noboot CheckWMI <query>
Sample use
◦ nsclient++ -noboot CheckWMI select * from win32_Processor
39. CONFERENCE ON NAGIOS & OSS Monitoring
A better option is
◦ WMI Administrative Tools
◦ Freely available from Microsoft
40. CONFERENCE ON NAGIOS & OSS Monitoring
1. Checking ”values”
Is Load above 50%
Use CheckWMIValue
2. Checking ”items”
Is load on more than 3 cores above 50%
Use CheckWMI
3. Checking ”custom things”
Check if load is above 50% and fewer than 5 queries are running on the database
Use Scripts
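The difference between the three cases can be sketched in Python over simulated query results. The rows below are made-up sample data standing in for the output of `select LoadPercentage from win32_Processor`; no WMI call is made, and the function names are hypothetical, not NSClient++ commands.

```python
# Simulated result rows, one per core (made-up sample data).
rows = [
    {"LoadPercentage": 35},
    {"LoadPercentage": 72},
    {"LoadPercentage": 64},
    {"LoadPercentage": 91},
]


def any_value_above(rows, column, limit):
    """Case 1, 'values': is any single value above the limit?"""
    return any(row[column] > limit for row in rows)


def items_above(rows, column, limit):
    """Case 2, 'items': how many rows are above the limit?"""
    return sum(1 for row in rows if row[column] > limit)


def custom_check(rows, running_queries):
    """Case 3, 'custom things': combine WMI data with another source."""
    return any_value_above(rows, "LoadPercentage", 50) and running_queries < 5


print(any_value_above(rows, "LoadPercentage", 50))  # is load above 50%?
print(items_above(rows, "LoadPercentage", 50) > 3)  # more than 3 cores above 50%?
print(custom_check(rows, running_queries=2))        # load and database combined
```

With this sample data the first question is answered yes and the second no, since only three cores are above 50%. The third case needs information that a single query cannot supply, which is why it falls to scripts.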
41. CONFERENCE ON NAGIOS & OSS Monitoring
Best way to start
Simple to use...
◦ ...if you know your WMI
A sample query:
◦ CheckWMIValue
"Query=Select * from win32_Processor"
MaxWarn=80 MaxCrit=90
Check:CPU=LoadPercentage
AliasCol=LoadPercentage
ShowAll=long
◦ (a bit like CheckCPU)
43. CONFERENCE ON NAGIOS & OSS Monitoring
External Scripts
◦ VB, Perl, Python, ...
◦ .exe files
◦ .net
◦ ...
Lua
◦ Lua is a simple programming language
◦ Used INSIDE NSClient++
◦ Very powerful, and simple
◦ A fairly new feature so feel free to suggest things
Modules
◦ Written in C++, Vb, .net, ...
◦ Very powerful, but much “harder”
44. CONFERENCE ON NAGIOS & OSS Monitoring
Configuration:
◦ [modules]
◦ CheckExternalScripts.dll
◦ ...
◦ [External Scripts]
◦ <alias>=<script>
<alias> is the command from nrpe
<script> is the command to execute
check_es_ok=scripts\ok.bat
◦ [Wrapped Scripts]
◦ <alias>=<script>
45. CONFERENCE ON NAGIOS & OSS Monitoring
Sample Code:
◦ @echo CRITICAL: Everything is not going to be ok!
◦ @exit 2
Exit statuses:
◦ 0=OK, 1=Warning, 2=Critical, 3=Unknown
NSC.ini syntax:
◦ [External Scripts]
◦ check_bat=scripts\check_test.bat
Or
◦ [Wrapped Scripts]
◦ check_test=check_test.bat
46. CONFERENCE ON NAGIOS & OSS Monitoring
Sample Code:
◦ Wscript.Echo "Everything might not be ok"
◦ Wscript.Quit(1)
Exit statuses:
◦ 0=OK, 1=Warning, 2=Critical, 3=Unknown
NSC.ini syntax:
◦ [External Scripts]
◦ check_test=cscript.exe /T:30 /NoLogo scripts\check_test.vbs
Or
◦ [Wrapped Scripts]
◦ check_test=check_test.vbs
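The same contract (print one line, exit with 0 to 3) applies to a script in any language. Below is a minimal Python sketch of such a check. The free-space check, its thresholds and the file name `check_free.py` are hypothetical examples, and how the script is registered in NSC.ini is not shown here.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_free_percent(free_percent, warn=20.0, crit=10.0):
    """Hypothetical check where a lower value is worse."""
    if free_percent < crit:
        state = CRITICAL
    elif free_percent < warn:
        state = WARNING
    else:
        state = OK
    # Text before '|' is the message, text after it is performance data.
    message = f"{LABELS[state]}: {free_percent:.0f}% free | free={free_percent:.0f}%"
    return state, message


def main(argv):
    """Print one status line and return the exit status for it."""
    try:
        free = float(argv[0])
    except (IndexError, ValueError):
        print("UNKNOWN: usage: check_free.py <free-percent>")
        return UNKNOWN
    state, message = check_free_percent(free)
    print(message)
    return state


# When saved as a script, finish with:
#   import sys; sys.exit(main(sys.argv[1:]))
```

Returning the status from `main` instead of exiting inside it keeps the check easy to test; bad input maps to 3 (Unknown) rather than to a failure state.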
48. CONFERENCE ON NAGIOS & OSS Monitoring
This is exactly as writing ”regular” Nagios
scripts.
Find Script on:
http://www.monitoringexchange.org
http://exchange.nagios.org
49. CONFERENCE ON NAGIOS & OSS Monitoring
Configuration:
◦ [modules]
◦ LUAScript.dll
◦ ...
◦ [LUA Scripts]
◦ <script>
scripts\test.lua
What, no alias?
◦ Not needed (happens inside the script)
50. CONFERENCE ON NAGIOS & OSS Monitoring
nscp.print('Loading test script...')
nscp.register('check_foo', 'foo')
function foo (command)
    nscp.print(command)
    code, msg, perf = nscp.execute('CheckCPU','time=5','MaxCrit=5')
    return code, 'hello from LUA: ' .. msg, perf
end
51. CONFERENCE ON NAGIOS & OSS Monitoring
The power of Lua scripts comes from:
◦ The ability to run and modify the result of other
commands
◦ The ability to run ”inside” NSClient++
◦ The simplicity of the language
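The first point, running another command and modifying its result, is a plain wrapper pattern. The Python sketch below models it with a small command registry. It is an analogy only: `register`, `execute` and the stand-in `check_cpu` are hypothetical and are not the NSClient++ Lua API.

```python
commands = {}


def register(name, handler):
    """Make a handler callable under a command name."""
    commands[name] = handler


def execute(name, *args):
    """Run a registered command and return (code, message, perfdata)."""
    return commands[name](*args)


def check_cpu(*args):
    # Stand-in for a built-in check; returns fixed sample values.
    return 0, "OK CPU load is fine", "cpu=12%"


def foo(*args):
    # Run another command, then change its message before returning it.
    code, msg, perf = execute("CheckCPU", "time=5", "MaxCrit=5")
    return code, "hello from wrapper: " + msg, perf


register("CheckCPU", check_cpu)
register("check_foo", foo)

print(execute("check_foo"))
```

The wrapper passes the status code and performance data through untouched and only rewrites the message, which is the same shape as a script that adds context to a built-in check.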
53. CONFERENCE ON NAGIOS & OSS Monitoring
' Default settings for your script.
threshold_warning = 50
threshold_critical = 20
' Create the NagiosPlugin object
Set np = New NagiosPlugin
' Define which args should be used
np.add_arg "warning", "warning threshold", 0
np.add_arg "critical", "critical threshold", 0
If Args.Exists("warning") Then threshold_warning = Args("warning")
If Args.Exists("critical") Then threshold_critical = Args("critical")
np.set_thresholds threshold_warning, threshold_critical
54. CONFERENCE ON NAGIOS & OSS Monitoring
Set colInstances = np.simple_WMI_CIMV2(".", "SELECT * FROM Win32_Battery")
For Each objInstance In colInstances
    WScript.Echo "Battery " & objInstance.Status _
        & " - Charge Remaining = " & objInstance.EstimatedChargeRemaining _
        & "% | charge=" & objInstance.EstimatedChargeRemaining
    return_code = np.escalate_check_threshold(return_code, objInstance.EstimatedChargeRemaining)
Next
np.nagios_exit "", return_code
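The idea of escalating a return code across several instances can be sketched in Python. This is an assumption about what the helper does, not its actual code: it treats lower charge as worse, uses the defaults from the script (warning 50, critical 20), and counts a value equal to a threshold as breaching it. The function names are hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def check_threshold(value, warning, critical):
    """Lower is worse, as with remaining battery charge."""
    if value <= critical:
        return CRITICAL
    if value <= warning:
        return WARNING
    return OK


def escalate(current, value, warning=50, critical=20):
    """Keep the worst state seen so far; a state never improves."""
    return max(current, check_threshold(value, warning, critical))


return_code = OK
for charge in (80, 45, 90):  # made-up charge percentages, one per battery
    return_code = escalate(return_code, charge)

print(return_code)
```

Because the states are ordered 0 < 1 < 2, taking the maximum is enough to make one bad battery decide the overall result.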
55. CONFERENCE ON NAGIOS & OSS Monitoring
Status Meaning
1 Other
2 Unknown
3 Idle
4 Printing
5 WarmUp
6 Stopped Printing
7 Offline
57. CONFERENCE ON NAGIOS & OSS Monitoring
michael@medin.name
http://www.linkedin.com/in/mickem
Information about NSClient++
http://nsclient.org
Slides, and examples at:
http://nsclient.org/nscp/conferances/2010/WPN/