This session will show a “case study” of how CWPS uses built-in Kaseya functionality to eliminate over a hundred and fifty superfluous tickets per day, while enlisting utilities to produce “actionable intelligence” for those tickets that need human intervention. The “building blocks” present in Kaseya will be detailed, and content takeaways will be provided for attendees’ forays into the world of “automatic remediation.” Special emphasis will be placed on auditing exceptions when executing automatic remediation, as well as “WWYAFLSDEDWTT?”— the meaning of which will be revealed during the session.
15. Eliminating Superfluous Tickets
Rebuilding Event Sets – Best Practices
• Use EID, Source & Description if possible
– Systems Management Pack also… doesn’t:
– Be aware of why this is a problem!
16. Eliminating Superfluous Tickets
Rebuilding Event Sets – Best Practices
• What’s worth waking up an Engineer?
– “Critical” priority
• What’s going to ruin someone’s day?
– “High” priority
• What needs to be addressed in a day or so?
– “Monitoring” priority
• What’s going to get you sued?
– “Auditing” priority
17. Eliminating Superfluous Tickets
Rebuilding Event Sets – Best Practices
• Removing superfluous Events:
– Which ticket queue was the Alert in?
• Locate the Event Set…
– What was the Event ID?
• Locate the exact Event in the Event Set…
– What’s *unique* about the Alert
• Modify the Event (or add another Event with
specific info to match the superfluous alert
ticket, and set it to Ignore):
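The matching idea on this slide can be sketched outside Kaseya. This is only an illustration, not how the VSA stores Event Sets: the names EventRule and should_alert are hypothetical, the wildcard syntax is Python's fnmatch, and the rule that an Ignore entry overrides an alerting entry is an assumption made for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass(frozen=True)
class EventRule:
    """One hypothetical event-set entry; '*' matches anything."""
    event_id: Optional[int]      # None means "any Event ID"
    source: str = "*"
    description: str = "*"
    ignore: bool = False         # True = suppress matching events

    def matches(self, event_id: int, source: str, description: str) -> bool:
        # Compare case-insensitively by lowering both pattern and value.
        return ((self.event_id is None or self.event_id == event_id)
                and fnmatchcase(source.lower(), self.source.lower())
                and fnmatchcase(description.lower(), self.description.lower()))


def should_alert(rules: List[EventRule], event_id: int,
                 source: str, description: str) -> bool:
    """Assumption: any matching Ignore rule wins over alerting rules."""
    hits = [r for r in rules if r.matches(event_id, source, description)]
    if any(r.ignore for r in hits):
        return False
    return bool(hits)


rules = [
    EventRule(7, "disk", "*bad block*"),                 # alert on bad blocks
    EventRule(7, "disk", "*harddisk1*", ignore=True),    # known-noisy disk
]
```

The more of EID, Source and Description an entry pins down, the less likely an Ignore entry is to hide an alert that matters.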
18. Eliminating Superfluous Tickets
Are you pondering what I’m pondering, Pinky?
• Before you add an Event… think!
Can I do something
more intelligent
with this alert?
19. Presentation Outline
Who’s That Guy?
Eliminating Superfluous Tickets
Automatic Remediation Building Blocks & Example Cases
Actionable Intelligence
Additional Resources & Brief Q&A
20. Automatic Remediation Case #1
Are hard disk errors actually legitimate problems?
Use the “Run Script” option first:
Note: “Run Script” can only remediate a machine that is online, so do not use
it for “Agent Offline” alerts.
22. Don’t get SMART with me!
I will pull this car over right now!
S.M.A.R.T. (Self-Monitoring, Analysis and Reporting
Technology), often written as SMART, is a monitoring system
for computer hard disk drives that detects and reports on
various indicators of reliability, in the hope of anticipating
failures. Reference: http://en.wikipedia.org/wiki/S.M.A.R.T.
23. Automatic Remediation Case #1
Are hard disk errors actually legitimate problems?
Is the disk fixed or removable?
• We don’t care about Jim Bob’s iPod…
Is SMART enabled on the disk?
• If not, can we enable it?
How’s the disk SMART health?
• If it’s “PASSED,” ignore the Event Log!
24. Use The Variables, Luke!
A true Jedi will always RTFM…
Event Log Alerts populate variables when a
particular Event is encountered
http://help.kaseya.com/WebHelp/EN/VSA/6030000/index.htm#4853.htm
Monitor Alarms populate variables when
the Counter/Service/Process goes “out of
acceptable operating range”
http://help.kaseya.com/WebHelp/EN/VSA/6030000/index.htm#1936.htm
25. Using Event Log Alert Variables
The data is there… let’s use it!
Within an Email     Within a Procedure   Description
<at>                #at#                 alert time
<cg>                #cg#                 event category
<cn>                #cn#                 computer name
<db-view.column>    not available        Include a view.column from the database. For example, to include the computer name of the machine generating the alert in an email, use <db-vMachine.ComputerName>
<ed>                #ed#                 event description
<ei>                #ei#                 event ID
<es>                #es#                 event source
<et>                #et#                 event time
<eu>                #eu#                 event user
<ev>                #ev#                 event set name
<gr>                #gr#                 group ID
<id>                #id#                 machine ID
<lt>                #lt#                 log type (Application, Security, System)
<tp>                #tp#                 event type (Error, Warning, Informational, Success Audit, or Failure Audit)
Note: #subject# and #body# are also available but wouldn’t fit on this slide
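To make the #xx# placeholders concrete, here is a small sketch of the substitution idea in Python. It is not Kaseya's implementation; the function name render, the sample values, and the choice to leave unknown placeholders untouched are all assumptions made for illustration.

```python
import re


def render(template: str, variables: dict) -> str:
    """Replace #name# placeholders; unknown names are left as written."""
    def substitute(match):
        return str(variables.get(match.group(1), match.group(0)))
    return re.sub(r"#([a-z]+)#", substitute, template)


# Hypothetical values for one Event Log Alert.
alert = {
    "id": "server01.root.client",
    "lt": "System",
    "es": "disk",
    "ei": "7",
    "ed": "The device has a bad block.",
}

line = render("#id#: #lt# log, #es# event #ei#: #ed#", alert)
print(line)
# prints: server01.root.client: System log, disk event 7: The device has a bad block.
```

Leaving an unknown placeholder visible in the result makes a typo in a variable name easy to spot in the Agent Procedure Log.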
26. Using Monitor Alarm Variables
The data is there… let’s use it!
Within an Email     Within a Procedure   Description
<ad>                #ad#                 alarm duration
<ao>                #ao#                 alarm operator
<at>                #at#                 alert time
<av>                #av#                 alarm threshold
<cg>                #cg#                 event category
<db-view.column>    not available        Include a view.column from the database. For example, to include the computer name of the machine generating the alert in an email, use <db-vMachine.ComputerName>
<dv>                #dv#                 SNMP device name
<gr>                #gr#                 group ID
<id>                #id#                 machine ID
<ln>                #ln#                 monitoring log object name
<lo>                #lo#                 monitoring log object type: counter, process, object
<lv>                #lv#                 monitoring log value
<mn>                #mn#                 monitor set name
not available       #subject#            subject text of the email message, if an email was sent in response to an alert
not available       #body#               body text of the email message, if an email was sent in response to an alert
27. Automatic Remediation Case #1
Are hard disk errors actually legitimate problems?
• Make an Event Set that catches disk
events only (excluding tape errors)
• Set the Re-Arm time to an hour (recommended)
30. Automatic Remediation Case #1
Are hard disk errors actually legitimate problems?
Let’s make some Agent Procedures!
• Use a “parent” procedure that calls
“child” procedures to better pinpoint
exceptions & iterate through all drives
• Use the Agent Procedure Log to log key
data points for later troubleshooting
• Perform output validation to confirm all
commands and third-party utilities being
employed produce expected output
31. Automatic Remediation Case #1
Assembling your utility belt
• SMART Health Checker:
– Checks SMART health of a fixed disk
– http://smartmontools.sourceforge.net
• Head.exe & Tail.exe
– Text & file manipulation
– http://unxutils.sourceforge.net/
• Built-in Windows utilities like DiskPart:
33. Automatic Remediation Case #1
Are hard disk errors actually legitimate problems?
What can we “key off of” in these events?
34. Automatic Remediation Case #1
The ciiiiircle of liiiife…
Errors inconsistently identify the drive, so
we’ll check all fixed disks in a loop:
1) Query disk #__ for type
2) Interpret results of query
• Removable disk? Skip to the next one…
• Fixed disk? Check SMART
• Unexpected return value? Fire a ticket!
3) Interpret SMART results
• Can’t enable SMART? Drive failure imminent? Fire a ticket!
• Unexpected return value? Fire a ticket!
4) Increment disk counter
• More than 10 disks? Fire a ticket!
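The loop can be sketched in Python to show the control flow. This is an illustration under stated assumptions, not the Agent Procedures themselves: check_all_disks, the status strings ("fixed", "removable", "passed", "failing", "cannot-enable") and the two callables are hypothetical stand-ins for the DiskPart and smartctl steps.

```python
from typing import Callable, List

MAX_DISKS = 10  # the loop covers disk numbers 0-9


def check_all_disks(present: List[int],
                    disk_type: Callable[[int], str],
                    smart_status: Callable[[int], str]) -> List[str]:
    """Return one ticket message per disk that needs human attention."""
    tickets = []
    for disk in range(MAX_DISKS):
        if disk not in present:
            continue
        kind = disk_type(disk)
        if kind == "removable":
            continue                      # skip to the next one
        if kind != "fixed":
            tickets.append(f"disk {disk}: unexpected disk type {kind!r}")
            continue
        status = smart_status(disk)
        if status == "passed":
            continue                      # healthy: no ticket
        if status in ("failing", "cannot-enable"):
            tickets.append(f"disk {disk}: SMART {status}")
        else:
            tickets.append(f"disk {disk}: unexpected SMART result {status!r}")
    # Assumption: any disk number beyond the loop's range raises a ticket.
    if any(d >= MAX_DISKS for d in present):
        tickets.append("more disks than the loop covers")
    return tickets
```

Passing the two checks in as callables keeps the loop testable without real hardware; anything that is neither clearly fine nor clearly skipped ends up as a ticket.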
36. Automatic Remediation Case #1
1) Turn local Event Log Alert variables into
Global variables:
2) Fire the “Initialization” Agent Procedure
3) Fire the “Copy Disk Utilities To Agent”
Agent Procedure
37. Automatic Remediation Case #1
Best Practices – Re-Using Common Agent Procedures
The “Initialization” Agent Procedure:
• Defines alert e-mail addresses:
• Copies over a common suite of utilities:
• Initializes the command output files:
38. Automatic Remediation Case #1
1) Defines abbreviated paths to the
DiskPart script files:
2) Copies the DiskPart script files,
smartctl.exe (x86 or x64) and
diskpart.exe (if Win2K) to the machine
39. Automatic Remediation Case #1
1) Uses DiskPart.exe to request a list of
physical disks on the machine:
2) Starts the Loop that checks disks 0-9 to
see if they’re present in above output:
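As an illustration of the “is disk N present in the output?” check, the sketch below pulls disk numbers out of DiskPart's “list disk” output. The sample text and the function name present_disks are assumptions; real output varies with Windows version and language, so a production check should validate the output first.

```python
import re
from typing import List

# Sample shaped like English-language "list disk" output (an assumption).
SAMPLE = """\
  Disk ###  Status         Size     Free     Dyn  Gpt
  --------  -------------  -------  -------  ---  ---
  Disk 0    Online          465 GB      0 B
  Disk 1    Online         7600 MB      0 B
"""

# A data row starts with "Disk", then a number; the header has "###" instead.
_DISK_ROW = re.compile(r"^[ \t]*Disk[ \t]+(\d+)[ \t]", re.MULTILINE)


def present_disks(list_disk_output: str) -> List[int]:
    """Return the physical disk numbers found in the output."""
    return [int(m.group(1)) for m in _DISK_ROW.finditer(list_disk_output)]


print(present_disks(SAMPLE))
# prints: [0, 1]
```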
40. Automatic Remediation Case #1
1) If there are 10+ physical
disks, procedure will alert the
global:monitoringAlertEMailAddress:
2) Fires the next Agent Procedure which
configures the query variables:
41. Automatic Remediation Case #1
1) Sets the disk count,
smartctl.exe query letter (uses letters
instead of disk numbers, no idea why), and
path to the DiskPart script for that disk:
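The letter mapping can be sketched as below, assuming the /dev/sdX device naming that smartmontools documents for Windows, where /dev/sda corresponds to physical drive 0. The function name smartctl_device is hypothetical.

```python
import string


def smartctl_device(disk_number: int) -> str:
    """Map a physical disk number to a smartctl device name (0 -> /dev/sda)."""
    if not 0 <= disk_number < len(string.ascii_lowercase):
        raise ValueError(f"no single-letter device for disk {disk_number}")
    return "/dev/sd" + string.ascii_lowercase[disk_number]


print(smartctl_device(0), smartctl_device(9))
# prints: /dev/sda /dev/sdj
```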
42. Automatic Remediation Case #1
1) Checks the disk type
and proceeds with the next Agent
Procedure only if it’s not a USB disk
2) If it is a USB disk, we brag about having
prevented a Superfluous Ticket:
44. Automatic Remediation Case #1
1) Checks if SMART is enabled
a. Attempts to enable it if not (next slide)
2) Formulates the SMART health check:
3) Runs the SMART health check command
and calls the next Agent Procedure to
interpret the output:
45. Automatic Remediation Case #1
1) Attempts to enable SMART using
smartctl.exe:
2) Failure to enable SMART is accounted for in
the next Agent Procedure
46. Automatic Remediation Case #1
1) Interprets the output of
smartctl.exe and:
a. …alerts if SMART can’t be enabled on a disk
with issues (next slide), or
b. …boasts if SMART reports that the drive has
a “healthy” SMART status (following slide)
2) Accounts for potential failure of
smartctl.exe by sending failures to
global:monitoringAlertEMailAddress
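A minimal sketch of the interpretation step, assuming the health line that “smartctl -H” prints for ATA disks (“SMART overall-health self-assessment test result: PASSED”). The names Health and interpret_health are hypothetical, and anything that does not contain a recognizable result is treated as UNKNOWN so that a broken or missing smartctl.exe is reported instead of being mistaken for a healthy disk.

```python
import re
from enum import Enum


class Health(Enum):
    PASSED = "passed"
    FAILING = "failing"
    UNKNOWN = "unknown"   # output validation failed: alert a human


_RESULT = re.compile(r"self-assessment test result:\s*(\S+)")


def interpret_health(output: str) -> Health:
    """Classify the text captured from a smartctl health check."""
    match = _RESULT.search(output)
    if match is None:
        return Health.UNKNOWN
    return Health.PASSED if match.group(1) == "PASSED" else Health.FAILING
```

Only PASSED suppresses the ticket; FAILING and UNKNOWN both go to a person, which matches the deck's emphasis on auditing exceptions.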
49. Automatic Remediation Case #1
Are hard disk errors actually legitimate problems?
• If SMART can be enabled, is the disk
healthy? Or is failure imminent?
50. Are We There Yet?
Who’s That Guy?
Eliminating Superfluous Tickets
Automatic Remediation Building Blocks & Example Cases
Actionable Intelligence
Additional Resources & Brief Q&A
54. Eliminating Superfluous Tickets
Rebuilding Event Sets – Important Questions
• Are all hard disk errors actually legitimate problems?
• Does that stopped Service need actual Human intervention or just a swift kick?
• Is the disk that’s low on drive space an iPod, iPad or pagefile-only volume?
• Do I need to run an “Update Lists By Scan” on this server to catch recently removed or deactivated Services?
• What is the answer to the Ultimate Question of Life, The Universe and Everything?
• Has the client told us to not monitor something?
58. Seriously… Are We There Yet?
Who’s That Guy?
Eliminating Superfluous Tickets
Automatic Remediation Building Blocks & Example Cases
Actionable Intelligence
Additional Resources & Brief Q&A
60. Using ALARM/Event Variables
The data is there… let’s use it!
Reference:
http://help.kaseya.com/WebHelp/EN/VSA/6030000/index.htm#1936.htm
http://help.kaseya.com/WebHelp/EN/VSA/6030000/index.htm#4853.htm
All presentation materials are available at:
61. There is a theory which states that
if ever anyone discovers exactly
what the Universe is for and why it
is here, it will instantly disappear
and be replaced by something even
more bizarre and inexplicable.
There is another theory, which
states that this has already
happened.
Editor's Notes
http://www.sxc.hu/photo/1383851/?forcedownload=1
Duck is from Microsoft clipart
Scrooge McDuck’s Vault = 3 square acres of Duckburg
Assuming each coin is silver-dollar sized, the vault contains $27 trillion US dollars
Does not include all of McDuck Industries
(image from DeviantArt)