5. Raw Event Search on Log Events
Splunk 1.0: Find the “Needle in the Haystack”
• Raw Event Search
6. Statistical Analysis on Log Events
Splunk 3.0 and 5.0: Scan through and report on many events
• Raw Event Search
• Optimization for Statistical Queries
7. Metric Analysis on Metric Data Points
Splunk 7.0: Perform statistical calculations
• Raw Event Search
• Optimization for Statistical Queries
• Optimization for Metrics Queries
9. Why Metrics?
… when you already use logs?
▶ Metrics
• Structured data
• Best way to observe a process or device
• Easy way to do monitoring
• You know what you want to measure
• e.g., performance, CPU, number of users, memory used, network latency, disk usage
▶ Events (e.g. Logs)
• Unstructured data
• Needle in the haystack
• Can tell you all about the “why”
• Answers questions you might not even have yet
• Very versatile
10. What Does a Metric Consist of?
Numerical data points captured over time that can be compressed, stored, processed and retrieved far more efficiently than events
• Time
• Metric Name: a dotted hierarchy (ABC.XYZ), e.g., system.cpu.idle
• Measure (aka Value): a numeric data point; different types, e.g., count, gauge, timing, sample
• Dimensions, e.g.:
  – Host (10.1.1.100, web01.splunk.com)
  – Region (us-east-1, emea-1, apac-2)
  – InstanceTypes (t2.medium, t2.large, m3.large)
13. Splunk Enterprise 7.0
The easiest way to aggregate, analyze and get answers from your machine data
MONITOR | INVESTIGATE | BUILD INTELLIGENCE
• Automate, collect, index and visualize your machine data in real time
• Discover insights from any machine data, structured or unstructured
• Analyze, predict and act on outcomes from your machine data
17. Project Waitomo
▶ Seamless Monitoring and Troubleshooting: metrics and logs in one unified experience
▶ Automated Investigations: find trends and root cause easier and faster, based on purpose-built workflows
▶ Expandable: start monitoring for free, expand to span across teams, use cases and large hybrid environments
▶ Install to Insight in Minutes: built for Infrastructure Monitoring, deploys in minutes and is easy to maintain
21. Save the Date 2018
October 1-4, 2018
▶ 8,750+ Splunk Enthusiasts
▶ 300+ Sessions
▶ 100+ Customer Speakers
Plus Splunk University:
▶ Three Days: September 29-October 1, 2018
▶ Get Splunk Certified for FREE!
▶ Get CPE credits for CISSP, CAP, SSCP
Walt Disney World Swan and Dolphin Resort in Orlando
conf.splunk.com
SAVE THE DATE!
24. Use Cases
IT Ops and Application Performance are driven by Metrics
▶ IT Ops & Application Performance: Metrics provide usage, performance and availability data (by OS, storage, apps, clouds, etc.)
• Trends can identify where there is a problem
• When trends and thresholds illustrate performance issues, other data sources are correlated to determine the root causes
25. Metrics – The New Way
Ingest metrics natively
▶ Metric Store: ability to ingest and store metric measurements at scale
▶ SPL: mstats, a tstats equivalent to query time series from metrics indexes
▶ Metrics Catalog: REST APIs to query lists of ingested metrics and dimensions
26. Metrics – Status Quo
Here: Windows Perfmon. Each event carries a timestamp, a metric name, a measurement value and dimensions:
▶ 06/29/2017 16:45:15.170 collection="Available Memory" object=Memory counter="Pages/sec" Value=264 host=10.0.8.156
▶ 06/29/2017 16:47:47.170 collection="MSExchangeIS_Mailbox" object="MSExchangeIS Mailbox" counter="Messages Submitted/sec" instance="_Total" Value=185.3656 host=10.0.8.156
27. Dimensions
Fields that help describe and add context to a metric
▶ For example, a metric named “cpu.usage” might have dimensions for host, IP address or asset location
▶ Use dimensions to split by and filter metric data, but not as a primary way to query the metric store
▶ Standard fields, such as host, source, sourcetype and index, can be treated as dimensions
▶ There is no limit to the number of dimensions you can have…
▶ That said, be mindful and consider best practices
▶ Examples
• Temp Sensor – Dimensions: time, latitude, longitude / Value: temperature
• Pressure Sensor – Dimensions: time, valve_id / Value: pressure (psi)
• IT Monitoring – Dimensions: time, host, pid / Value: cpu, memory
• Splunk Internal Metrics – Dimensions: time, user / Value: search_count
• Web Access – Dimensions: time, requester_ip, request_method, request_url / Value: request_duration, count
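To make splitting and filtering by dimensions concrete, here is a hedged sketch of an mstats search (the index name my_metrics and the region value us-east-1 are illustrative assumptions, not from the deck):

```spl
| mstats avg(_value) AS avg_cpu
    WHERE index=my_metrics AND metric_name="cpu.usage" AND region="us-east-1"
    span=5m BY host
```

The WHERE clause filters on a dimension (region), while BY splits the aggregated series per host.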
28. ▶ Customers want to aggregate, store, analyze and stream-process time-series metrics data in an efficient manner. Furthermore, this system has to scale to handle data rates that may be orders of magnitude larger than our current rates, and work seamlessly on Cloud and on-prem deployments.
▶ Luckily, our current technology stack does support ingestion, search and analytics over time-series data, and we can leverage a lot of the machinery we have already built. However, the use cases around a metrics data store differ from log data in some fundamental ways, to list a few:
• Metrics data is voluminous
• Metrics data is structured data with dimensions and a numerical measure field
• Lower latency and higher search concurrency requirements
▶ Currently, various customers and solutions engineers need to employ workarounds on our current system to satisfy the above requirements, but these are only stopgap measures that won't scale to the next level and often don't meet the latency/performance, TCO and scaling requirements.
Why Metrics Matter
Metrics support helps customers aggregate, store and analyze data more efficiently
29. Metrics versus Events
Two distinct machine data sources that have been hard to integrate…until now
Metrics
▶ Numbers describing a particular process or activity
▶ Measured over intervals of time – i.e., time series data
▶ Common metrics sources:
• System metrics (CPU, memory, disk)
• Infrastructure metrics (AWS CloudWatch)
• Web tracking scripts (Google Analytics)
• Application agents (APM, error tracking)
Events
▶ Immutable record of discrete events that happen over time
▶ Come in three forms: plain text, structured, binary
▶ Common event sources:
• System and server logs (syslog, journald)
• Firewall and intrusion detection system logs
• Social media feeds (Twitter…)
• Application, platform and server logs (log4j, log4net, Apache, MySQL, AWS)
Timestamp Metric Name Value Dimensions
1481050800 os.cpu.user 42.12345 hq:us-west-1
Sample Metric
[29/Aug/2017 08:47:05:316503] "POST /cart.do?uid=84e8d742-a31d69&action=remove&&product_id=BS-2&JSESSIONID=SD6SAL4FF1ADFF9 HTTP 1.1" 200 2569 "http://www.buttercupenterprises.com/product.screen?product_id=BS-2" "Mozilla/5.0 (Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2957.0 Safari/537.36" 98
Sample Log
Equivalent to 1 metric value
30. ▶ Millions of CPUs in data centers and billions of connected devices produce an ever-increasing amount of metrics data
• According to Gartner, the number of IoT endpoint devices (devices = metrics) will total 20.4 billion by 2020, up from 6.4 billion in 2016
• With more workloads moving to the cloud and more devices coming online every day, metrics data is a foundational and strategic data source. As structured, time-series data, metrics do not benefit from “schema-on-read” and are far more efficient than log data.
▶ Improved performance and scalability for monitoring and alerting
• With Splunk Enterprise 7.0, the performance of monitoring and alerting on metrics data is boosted by up to 200x vs. previous Splunk releases.
• When ingesting typical metrics payloads with supported metrics source types (collectd_http, statsd, metrics_csv), a metrics index requires about 50% less disk storage space compared to storing the same payload in an events index.
• Because metrics queries now return faster, monitoring in Enterprise 7.0 puts less strain on the deployment and uses fewer resources. In the past you didn’t have a choice: you had to use events or nothing. Now you can choose the right tool for your particular analytics task.
▶ Splunk is a real-time data analytics platform delivering a unified experience between logs and metrics
• Splunk metrics removes context-switching time between separate monitoring and troubleshooting tools by correlating metrics and logs, and provides flexibility to ingest these different data types in the most efficient way.
• This is a significant step toward end-to-end monitoring (starting with metrics) and investigation (pinpointing issues with events) in the same platform.
Metrics Boosts Splunk Enterprise
Boosts performance of monitoring and alerting on metrics by 200x.
Requires ~50% less disk space.
31. mstats
▶ New SPL command
▶ Optimized for fast retrieval of metrics aggregations (only aggregations on _value)
▶ Like tstats, it is a generating command that produces reports without transforming the events
▶ Unlike tstats, it can search both on-disk data (historical search) and in-memory data (real-time search)
▶ mstats cannot search event indexes; the tstats and search commands cannot search metrics indexes
▶ mstats is a reporting command
Syntax
| mstats <stats-function> …
    [WHERE index=<metric_index> AND metric_name=<metricname> …]
    [span=<timespan>] [BY <metricname|dimension>]
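A hedged example of the syntax above, assuming a metrics index named my_metrics that receives collectd CPU measurements (both names are illustrative, not from the deck):

```spl
| mstats avg(_value) AS avg_idle, max(_value) AS max_idle
    WHERE index=my_metrics AND metric_name="system.cpu.idle"
    span=1m BY host
```

This returns one time-bucketed aggregate series per host, without retrieving raw events.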
32. Metrics Catalog
▶ New SPL command: mcatalog
▶ Optimized to list catalog information (e.g., metric names, dimensions) of the metric store
Syntax
| mcatalog values(<field>) …
    [WHERE index=<metric_index> AND metric_name=<metricname> …]
    [BY <metricname|dimension>]
▶ New REST endpoints
• List metric names: /services/catalog/metricstore/metrics
• List dimension names: /services/catalog/metricstore/dimensions
• List dimension values: /services/catalog/metricstore/dimensions/{dimension-name}/values
▶ You can also use filters with these endpoints to limit results by index, dimension, and dimension values.
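For illustration, a hedged sketch of exploring the catalog with mcatalog (the index name my_metrics and the metric cpu.usage are assumptions):

```spl
| mcatalog values(metric_name) WHERE index=my_metrics
```

```spl
| mcatalog values(host) WHERE index=my_metrics AND metric_name="cpu.usage"
```

The first search lists every metric name ingested into the index; the second lists the values of one dimension (host) for a single metric.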