5. Raw Event Search on Log Events
Splunk 1.0: Find the “Needle in the Haystack”
• Raw Event Search
6. Statistical Analysis on Log Events
Splunk 3.0 and 5.0: Scan through and report on many events
• Raw Event Search
• Optimization for Statistical Queries
7. Metric Analysis on Metric Data Points
Splunk 7.0: Perform statistical calculations
• Raw Event Search
• Optimization for Statistical Queries
• Optimization for Metrics Queries
9. Why Metrics?
… when you already use logs?
▶ Metrics
• Structured data
• Best way to observe a process or device
• Easy way to do monitoring
• You know what you want to measure
• e.g., performance, CPU, number of users, memory used, network latency, disk usage
▶ Events (e.g. Logs)
• Unstructured data
• Needle in the haystack
• Can tell you all about the “why”
• Answers questions you might not even have yet
• Very versatile
10. What Does a Metric Consist of?
Numerical data points captured over time that can be compressed, stored, processed and retrieved far more efficiently than events
• Time
• Metric Name: a dotted hierarchy (ABC.XYZ), e.g., system.cpu.idle
• Measure (aka Value): a numeric data point; different types, e.g., count, gauge, timing, sample
• Dimensions, e.g.:
  – Host (10.1.1.100, web01.splunk.com)
  – Region (us-east-1, emea-1, apac-2)
  – InstanceTypes (t2.medium, t2.large, m3.large)
13. Splunk Enterprise 7.0
The easiest way to aggregate, analyze and get answers from your machine data
MONITOR | INVESTIGATE | BUILD INTELLIGENCE
• Automate, collect, index and visualize your machine data in real time
• Discover insights from any machine data, structured or unstructured
• Analyze, predict and act on outcomes from your machine data
17. Project Waitomo
▶ Seamless Monitoring and Troubleshooting: metrics and logs in one unified experience
▶ Automated Investigations: find trends and root cause easier and faster, based on purpose-built workflows
▶ Expandable: start monitoring for free, expand to span across teams, use cases and large hybrid environments
▶ Install to Insight in Minutes: built for Infrastructure Monitoring, deploys in minutes and is easy to maintain
21. Save the Date 2018
October 1-4, 2018
▶ 8,750+ Splunk Enthusiasts
▶ 300+ Sessions
▶ 100+ Customer Speakers
Plus Splunk University:
▶ Three Days: September 29-October 1, 2018
▶ Get Splunk Certified for FREE!
▶ Get CPE credits for CISSP, CAP, SSCP
Walt Disney World Swan and Dolphin Resort in Orlando
conf.splunk.com
SAVE THE DATE!
24. Use Cases
IT Ops and Application Performance are driven by Metrics
▶ IT Ops & Application Performance: Metrics provide usage, performance and availability data (by OS, storage, apps, clouds, etc.)
• Trends can identify where there is a problem
• When trends and thresholds illustrate performance issues, other data sources are correlated to determine the root causes
25. Metrics – The New Way
Ingest metrics natively
▶ Metric Store: ability to ingest and store metric measurements at scale
▶ SPL: mstats, a tstats equivalent to query time series from metrics indexes
▶ Metrics Catalog: REST APIs to query lists of ingested metrics and dimensions
26. Metrics – Status Quo
Here: Windows Perfmon. Each event carries a timestamp, a metric name, a measurement value and dimensions:
▶ 06/29/2017 16:45:15.170 collection="Available Memory" object=Memory counter="Pages/sec" Value=264 host=10.0.8.156
▶ 06/29/2017 16:47:47.170 collection="MSExchangeIS_Mailbox" object="MSExchangeIS Mailbox" counter="Messages Submitted/sec" instance="_Total" Value=185.3656 host=10.0.8.156
27. Dimensions
Fields that help describe and add context to a metric
▶ For example, a metric named “cpu.usage” might have dimensions for host, IP address or asset location
▶ Use dimensions to split by and filter metric data, but not as a primary way to query the metric store
▶ Standard fields, such as host, source, sourcetype and index, can be treated as dimensions
▶ There is no limit to the number of dimensions you can have…
▶ That said, be mindful and consider best practices
▶ Examples
• Temp Sensor – Dimensions: time, latitude, longitude / Value: temperature
• Pressure Sensor – Dimensions: time, valve_id / Value: pressure (psi)
• IT Monitoring – Dimensions: time, host, pid / Value: cpu, memory
• Splunk Internal Metrics – Dimensions: time, user / Value: search_count
• Web Access – Dimensions: time, requester_ip, request_method, request_url / Value: request_duration, count
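To make splitting and filtering by dimensions concrete, here is a hedged sketch of an mstats search (the index name my_metrics and the region value us-east-1 are illustrative assumptions, not from the deck):

```spl
| mstats avg(_value) AS avg_cpu
    WHERE index=my_metrics AND metric_name="cpu.usage" AND region="us-east-1"
    span=5m BY host
```

The WHERE clause filters on a dimension (region), while BY splits the aggregated series per host.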
28. ▶ Customers want to aggregate, store, analyze and stream-process time-series metrics data in an efficient manner. Furthermore, this system has to scale to handle data rates that may be orders of magnitude larger than our current rates, and work seamlessly on Cloud and on-prem deployments.
▶ Luckily, our current technology stack does support ingestion, search and analytics over time-series data, and we can leverage a lot of the machinery we have already built. However, the use cases around a metrics data store differ from log data in some fundamental ways, to list a few:
• Metrics data is voluminous
• Metrics data is structured data with dimensions and a numerical measure field
• Lower latency and higher search concurrency requirements
▶ Currently, various customers and solutions engineers need to employ workarounds on our current system to satisfy the above requirements, but these are only stopgap measures that won't scale to the next level and often don't meet the latency/performance, TCO and scaling requirements.
Why Metrics Matter
Metrics support helps customers aggregate, store and analyze data more efficiently
29. Metrics versus Events
Two distinct machine data sources that have been hard to integrate…until now
Metrics
▶ Numbers describing a particular process or activity
▶ Measured over intervals of time – i.e., time series data
▶ Common metrics sources:
• System metrics (CPU, memory, disk)
• Infrastructure metrics (AWS CloudWatch)
• Web tracking scripts (Google Analytics)
• Application agents (APM, error tracking)
Events
▶ Immutable record of discrete events that happen over time
▶ Come in three forms: plain text, structured, binary
▶ Common event sources:
• System and server logs (syslog, journald)
• Firewall and intrusion detection system logs
• Social media feeds (Twitter…)
• Application, platform and server logs (log4j, log4net, Apache, MySQL, AWS)
Timestamp Metric Name Value Dimensions
1481050800 os.cpu.user 42.12345 hq:us-west-1
Sample Metric
[29/Aug/2017 08:47:05:316503] "POST /cart.do?uid=84e8d742-a31d69&action=remove&&product_id=BS-2&JSESSIONID=SD6SAL4FF1ADFF9 HTTP 1.1" 200 2569 "http://www.buttercupenterprises.com/product.screen?product_id=BS-2" "Mozilla/5.0 (Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2957.0 Safari/537.36" 98
Sample Log
Equivalent to 1 metric value
30. ▶ Millions of CPUs in data centers and billions of connected devices produce an ever-increasing amount of metrics data
• According to Gartner, the number of IoT endpoint devices (devices = metrics) will total 20.4 billion by 2020, up from 6.4 billion in 2016
• With more workloads moving to the cloud and more devices coming online every day, metrics data is a foundational and strategic data source. As structured, time-series data, metrics do not benefit from “schema-on-read” and are far more efficient than log data.
▶ Improved performance and scalability for monitoring and alerting
• With Splunk Enterprise 7.0, the performance of monitoring and alerting on metrics data is boosted by up to 200x vs. previous Splunk releases.
• When ingesting typical metrics payloads with supported metrics source types (collectd_http, statsd, metrics_csv), a metrics index requires about 50% less disk storage space compared to storing the same payload in an events index.
• Because metrics queries now return faster, monitoring in Enterprise 7.0 puts less strain on the deployment and uses fewer resources. In the past you didn’t have a choice: you had to use events or nothing. Now you can choose the right tool for your particular analytics task.
▶ Splunk is a real-time data analytics platform delivering a unified experience between logs and metrics
• Splunk metrics removes context-switching time between separate monitoring and troubleshooting tools by correlating metrics and logs, and provides flexibility to ingest these different data types in the most efficient way.
• This is a significant step toward end-to-end monitoring (starting with metrics) and investigation (pinpointing issues with events) in the same platform.
Metrics Boosts Splunk Enterprise
Boosts performance of monitoring and alerting on metrics by 200x.
Requires ~50% less disk space.
31. mstats
▶ New SPL command
▶ Optimized for fast retrieval of metrics aggregations (only aggregations on _value)
▶ Like tstats, it is a generating command that produces reports without transforming the events
▶ Unlike tstats, it can search both on-disk data (historical search) and in-memory data (real-time search)
▶ mstats cannot search event indexes; the tstats and search commands cannot search metrics indexes
▶ mstats is a reporting command
Syntax
| mstats <stats-function> …
    [WHERE index=<metric_index> AND metric_name=<metricname> …]
    [span=<timespan>] [BY <metricname|dimension>]
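A hedged example of the syntax above, assuming a metrics index named my_metrics that receives collectd CPU measurements (both names are illustrative, not from the deck):

```spl
| mstats avg(_value) AS avg_idle, max(_value) AS max_idle
    WHERE index=my_metrics AND metric_name="system.cpu.idle"
    span=1m BY host
```

This returns one time-bucketed aggregate series per host, without retrieving raw events.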
32. Metrics Catalog
▶ New SPL command: mcatalog
▶ Optimized to list catalog information (e.g., metric names, dimensions) of the metric store
Syntax
| mcatalog values(<field>) …
    [WHERE index=<metric_index> AND metric_name=<metricname> …]
    [BY <metricname|dimension>]
▶ New REST endpoints
• List metric names: /services/catalog/metricstore/metrics
• List dimension names: /services/catalog/metricstore/dimensions
• List dimension values: /services/catalog/metricstore/dimensions/{dimension-name}/values
▶ You can also use filters with these endpoints to limit results by index, dimension, and dimension values.
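For illustration, a hedged sketch of exploring the catalog with mcatalog (the index name my_metrics and the metric cpu.usage are assumptions):

```spl
| mcatalog values(metric_name) WHERE index=my_metrics
```

```spl
| mcatalog values(host) WHERE index=my_metrics AND metric_name="cpu.usage"
```

The first search lists every metric name ingested into the index; the second lists the values of one dimension (host) for a single metric.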