Splunk is a platform for collecting, analyzing, and visualizing machine data. It provides real-time search and reporting across IT systems and infrastructure. Splunk indexes data from various sources without needing predefined schemas, and scales to handle large volumes of data from thousands of systems. The presentation covers an overview of the Splunk platform and how it can be used by developers, including custom visualizations, the Java SDK, and integrations with Spring applications.
2. About me
• Developer Evangelist at Splunk since July 2012
• Splunk Community Member
• Splunk for JMX
• SplunkJavaLogging
• SplunkBase – Apps and Answers
• Splunk Architect and Administrator
• Coder
• Been paying my mortgage developing Enterprise Java solutions most of my career
• Kia Ora
• I do not have a speech impediment; I am from Aotearoa, so please restrain all your
sheep, Lord of the Rings and Kim Dotcom heckles until beer o’clock!!
3. Agenda
• Overview of the Splunk platform
• Splunk for Developers
• Custom Visualization Demo
• Splunk Java SDK
• Spring Integration Splunk Extensions
• Integration Adaptors Demo
• Some other JVM/Java related tools
• SplunkJavaLogging
• Splunk for JMX
• Questions
5. So What is Splunk, Exactly?
• Splunk is an engine for machine data
• Provides visibility, reporting and search across all your IT systems and infrastructure
• Doesn’t lock you into a fixed schema
• It’s software – download and install it in 5 minutes, “freemium” model
• Runs on all modern platforms
• Open and extensible architecture
6. Indexes any Machine Data
• Capture events from logs in real time
• Run scripts to gather system metrics, connect to APIs and databases
• Listen to syslog, raw TCP/UDP, gather Windows events
• Universally indexes any data format so it doesn’t need adapters, “schema on the fly”
• Stream in data directly from your application code
• Decode binary data and feed in
Sources by category:
• Windows: Registry, Event logs, File system, sysinternals
• Linux/Unix: Configurations, Syslog, File system, ps, iostat, top
• Virtualization: Hypervisor, Guest OS, Guest Apps
• Applications: Web logs, Log4J, JMS, JMX, .NET events, Code and scripts
• Databases: Configurations, Audit/query logs, Tables, Schemas
• Network: Configurations, syslog, SNMP, netflow
7. Centralizes Data Across the Environment
• Splunk Universal Forwarder sends data to Splunk Indexer from remote systems
• Uses minimal system resources, easy to install and deploy
• Delivers secure, distributed, real-time universal data collection for tens of thousands of endpoints
Diagram: Splunk Forwarders on remote systems send data to a central Splunk indexing/search server.
8. Scales to TBs/day and Thousands of Users
• Automatic load balancing linearly scales indexing
• Distributed search and MapReduce linearly scales search and reporting
9. Provides Strong Machine Data Governance
• Provides comprehensive controls for data security, retention and integrity
• Single sign-on integration enables pass-through authentication of user credentials
10. Splunk and Apache Hadoop MR/HDFS
• Splunk is an implementation of the MapReduce algorithmic approach
• It is not the Apache Hadoop MapReduce (MR) product
• Splunk is not agnostic of its underlying data store; it is optimized for Splunk index files
• Real-time vs batch jobs
• Optimal for time-series data
• End-to-end integrated Big Data solution
• Fine-grained protection of access and data using role-based permissions
• Data retention and aging controls
• Users can submit “MapReduce” jobs without needing to know how to code a job
• Splunk Search Language vs Pig/Sawzall
• But why not get the best of both worlds?
• Splunk Hadoop Ops
• Splunk Hadoop Connect
• Shuttl (archiving to HDFS / S3)
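The map/reduce framing above can be pictured with a toy example. This hypothetical Java sketch mimics a distributed count by field value: each indexer produces partial counts over its own events (map) and the search head merges them (reduce). The class and method names are made up, and it illustrates the general pattern only, not how Splunk is implemented internally.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical illustration of map/reduce-style counting, not Splunk internals. */
public class MapReduceCountSketch {

    /** "Map" phase: one indexer counts the field values in its own partition of events. */
    public static Map<String, Long> partialCounts(List<String> partition) {
        return partition.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    /** "Reduce" phase: the search head merges the partial counts from every indexer. */
    public static Map<String, Long> merge(List<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> total.merge(key, count, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Each inner list stands in for the "status" field values held by one indexer.
        List<List<String>> indexers = List.of(
                List.of("200", "200", "404"),
                List.of("500", "200"),
                List.of("404", "404"));
        List<Map<String, Long>> partials = indexers.stream()
                .map(MapReduceCountSketch::partialCounts)
                .collect(Collectors.toList());
        // prints {200=3, 404=3, 500=1}
        System.out.println(new TreeMap<>(merge(partials)));
    }
}
```

Because only small partial results travel to the search head, adding indexers spreads the counting work without moving the raw events.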
11. Splunk Has Four Primary Functions
• Searching and Reporting (Search Head)
• Indexing and Search Services (Indexer)
• Local and Distributed Management (Deployment Server)
• Data Collection and Forwarding (Forwarder)
A Splunk install can be one or all roles…
12. Getting Data into Splunk
Agent and agent-less approach for flexibility.
Splunk Forwarder:
• Local File Monitoring: log files, config files, dumps and trace files (Unix, Linux and Windows hosts)
• Windows Inputs: event logs, performance counters, registry monitoring, Active Directory monitoring (Windows hosts)
• Scripted Inputs: shell scripts, custom parsers, batch loading
Agent-less Data Input:
• syslog, TCP/UDP: syslog-compatible hosts and network devices
• Mounted File Systems: \\hostname\mount
• WMI: event logs, performance, Active Directory (Windows hosts)
• Scripted Inputs: shell, code; virtual host perf, custom apps and scripted API connections
13. Universal Data Forwarder
Forward data without negatively impacting production performance.
• Delivers secure, distributed, real-time universal data collection for tens of thousands of endpoints
• Extends the Splunk data fabric to large-scale private cloud and desktop environments
• Uses minimal system resources, easy to install and deploy
– Less than half the memory and footprint of Splunk 4.1; under 1% of a single core
• Monitors files, changes and the system registry; captures metrics and status
Diagram: Universal Forwarder deployment collecting logs, messages, configurations, metrics and scripts, with central deployment management.
14. Horizontal Scaling
Load balanced search and indexing for massive, linear scale out.
Diagram: forwarder auto load balancing spreads data across indexers; distributed search queries all of them.
15. Multiple Datacenters
Index and store locally. Distribute searches to datacenters, networks & geographies.
Diagram: distributed search from headquarters across datacenters in London, Hong Kong, Tokyo and New York.
16. Send Data to Other Systems
Route raw data in real time or send alerts based on searches.
Diagram: destinations include the service desk, event console, problem investigation and SIEM.
17. High Availability / DR
Combine auto load balancing and data replication.
Diagram: Splunk Forwarders use auto load balancing and data cloning to send data to both a primary and a secondary cluster; distributed search spans both clusters.
18. Integrate External Data
Extend search with lookups to external data sources.
Sources: LDAP, AD, watch lists, CMDB, CRM/ERP
• Correlate IP addresses with locations, accounts with regions
19. Integrate Users and Roles
Integrate authentication with LDAP and Active Directory.
Map LDAP & AD groups to flexible Splunk roles. Define any search as a filter.
Diagram: LDAP/AD users and groups are mapped to Splunk roles (e.g. Problem Investigation); each role combines capabilities (Manage Users, Manage Indexes, Save Searches, Share Searches) with filters (e.g. NOT tag=PCI, App=ERP, …).
21. Deployment Monitoring
Keep tabs on your Splunk Enterprise deployment: licenses, sourcetypes, indexers and forwarders.
22. Real-time Search
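Diagram: the data pipeline.
• Data inputs (Monitor Input, TCP/UDP Input, Scripted Input) feed the Parsing Queue
• Parsing Pipeline: source and event typing; character set normalization; line breaking; timestamp identification; regex transforms
• The Index Queue feeds the Indexing Pipeline, which writes raw data and index files to the Index
• A Real-time Buffer feeds the Real-time Search Process directly from the pipeline, ahead of indexing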
23. Real-time Alerting
Example real-time alert search: source="/var/log/secure.log" "BAD SU"
Diagram: the same data pipeline as the Real-time Search slide, with the alert search running in the Real-time Search Process against events as they arrive.
24. New Approach to Heterogeneous Data
Universal Indexing:
• No data normalization
• Automatically handles timestamps
• Parsers not required
• Index every term & pattern “blindly”
• No attempt to “understand” up front
Search-time Knowledge:
• Knowledge applied at search-time
• No brittle schema to work around
• Multiple views into the same data
• Splunk helps find transactions, patterns and trends
Flexibility and Fast Time to Value:
• Normalization as it’s needed
• Faster implementation
• Easy search language
• Multiple views into the same data
25. Inside Universal Indexing
Automatic event boundary identification and automatic timestamp normalization enable accurate searching and trending by time across all data.
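To make timestamp normalization concrete, here is a minimal Java sketch that finds a timestamp in a raw event line and converts it to a UTC instant. The three recognized layouts, the class name and the rule of assuming UTC when the text carries no zone are all illustrative assumptions; Splunk's own timestamp recognition is far more extensive and configurable.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: find a timestamp in a raw event line and normalize it to UTC. */
public class TimestampNormalizer {

    /** One recognizable timestamp layout: where to find it and how to parse it. */
    private static final class Rule {
        final Pattern pattern;
        final DateTimeFormatter formatter;
        final boolean hasOffset; // false: the text carries no zone, so UTC is assumed

        Rule(String regex, DateTimeFormatter formatter, boolean hasOffset) {
            this.pattern = Pattern.compile(regex);
            this.formatter = formatter;
            this.hasOffset = hasOffset;
        }
    }

    private static final List<Rule> RULES = List.of(
            // ISO 8601 with offset, e.g. 2012-10-15T10:30:00Z
            new Rule("\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(?:Z|[+-]\\d{2}:\\d{2})",
                    DateTimeFormatter.ISO_OFFSET_DATE_TIME, true),
            // Apache access log, e.g. 15/Oct/2012:10:30:00 +0000
            new Rule("\\d{2}/[A-Z][a-z]{2}/\\d{4}:\\d{2}:\\d{2}:\\d{2} [+-]\\d{4}",
                    DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH), true),
            // Log4j ISO8601 layout, e.g. 2012-10-15 10:30:00,000 (no zone)
            new Rule("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}",
                    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS"), false));

    /** Returns the first timestamp found in the line, normalized to an Instant. */
    public static Optional<Instant> normalize(String line) {
        for (Rule rule : RULES) {
            Matcher m = rule.pattern.matcher(line);
            if (m.find()) {
                String text = m.group();
                Instant instant = rule.hasOffset
                        ? OffsetDateTime.parse(text, rule.formatter).toInstant()
                        : LocalDateTime.parse(text, rule.formatter).toInstant(ZoneOffset.UTC);
                return Optional.of(instant);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Two different layouts that describe the same moment
        System.out.println(normalize("2012-10-15 10:30:00,000 ERROR OrderService failed"));
        System.out.println(normalize("10.0.0.1 - - [15/Oct/2012:10:30:00 +0000] \"GET / HTTP/1.1\" 200"));
    }
}
```

Once every event carries a normalized instant, events from differently formatted sources can be sorted and trended on one time axis.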
26. Inside Search-time Knowledge Extraction
Automatically discovered fields and user-defined fields enable statistics and precise search on specific fields.
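Late binding, or “schema on the fly”, means fields are pulled out of the raw text when a search runs rather than when data is indexed. A minimal Java sketch of that idea, assuming events carry key=value pairs; the regex, class name and countBy helper are hypothetical, and Splunk's real field discovery is considerably richer.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of late-binding ("schema on the fly") field extraction. */
public class SearchTimeFields {

    // key=value or key="quoted value"; the raw event text itself is never rewritten
    private static final Pattern KV = Pattern.compile("(\\w+)=(?:\"([^\"]*)\"|(\\S+))");

    /** Extracts fields from a raw event at search time, not at index time. */
    public static Map<String, String> extract(String rawEvent) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = KV.matcher(rawEvent);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            fields.put(m.group(1), value);
        }
        return fields;
    }

    /** Roughly what a "count by <field>" report computes over the extracted fields. */
    public static Map<String, Integer> countBy(List<String> rawEvents, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String event : rawEvents) {
            String value = extract(event).get(field);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> events = List.of(
                "2012-10-15 10:30:00 level=ERROR user=alice msg=\"payment failed\"",
                "2012-10-15 10:31:00 level=INFO user=bob",
                "2012-10-15 10:32:00 level=ERROR user=bob");
        System.out.println(extract(events.get(0))); // {level=ERROR, user=alice, msg=payment failed}
        System.out.println(countBy(events, "level")); // {ERROR=2, INFO=1}
    }
}
```

Since nothing is decided at index time, a different extraction rule can be applied to the same stored events tomorrow without re-indexing them.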
27. Inside Search-time Knowledge Extraction
Searches saved as event types, plus tagging of event types, hosts and other fields, enable normalized reporting, knowledge sharing and granular access control.
29. Splunk & Developers
Layers: Custom SplunkUI (Splunk Apps), SDKs and the REST API on top of the Splunk machine data engine, alongside your existing applications.
The SDKs and REST API let you:
• Search, chart and graph
• Save and schedule searches as alerts
• Export search results
• Manage inputs and indexes
• Add & remove users and roles
Benefits:
• Accelerate development & testing
• Integrate data from Splunk into your existing IT environment for operational visibility
• Build custom solutions to deliver real-time business insights from Big Data
30. Splunk in the Developer Community
• Over 1,000 unique visitors per week to dev.splunk.com
• Over 500 followers on Twitter @splunkdev
• Over 350 enterprise developer trial licenses granted
32. How does Splunk Accelerate Dev/Test?
• Splunk frees you from upfront database design for analytics
• Late-binding schema
• Developers and QA/test engineers don’t have to ask IT/Ops to get logs off machines
• Role-based access to all data within one console, without having to log into production systems
• All events are indexed and accessible in real time in one place
• Ad hoc real-time monitoring and historical investigation, searchable from one place
• Correlations and insights across multiple tiers
• Splunk lets you find issues quickly, so you can fix issues quickly
• Integrate Splunk search results into testing assertions
33. StubHub & Splunk
“Splunk filled a vacuum we didn’t know we had.”
- Nathan Pratt, Tech Lead, Tools & Automation, StubHub
• Started with Site Operations to resolve issues
• Grew to engineers, QA and upper management in technology
• Engineering uses Splunk to investigate bugs; QA uses it during dev cycles
• Release requirement: projects are required to certify that all logs are Splunk-friendly
Screenshot: high-level view of application errors, used by site operations, engineering, and upper management
35. Integration into existing IT tools
The Splunk development platform is optimized for core enterprise developer skills.
Diagram layers: Splunk UI (Splunk Apps), SDKs, REST API, splunkd
• The REST API lets your application communicate directly with a Splunk instance for search, management and admin
• Provides full control to the developer
• Use any language or tool that supports HTTP
• SDKs provide broad coverage of the REST API in popular languages
• Log directly to Splunk from any app
• Build a UI on any web stack
• Integrate into existing infrastructure
36. Splunk REST API
• Exposes an API method for every feature in the product
• Whatever you can do in the UI – you can do through the API.
• Run searches
• Manage Splunk configurations
• API is RESTful
• Endpoints are served by splunkd
• Requests are GET, POST, and DELETE HTTP methods
• Responses are Atom XML Feeds
• JSON coming in 5.0
• Search results can be output in CSV/JSON/XML/Raw
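Because the API is plain HTTP, any language can drive it directly. The sketch below uses Java 11's java.net.http rather than the Splunk SDK and only builds the request for creating a search job. It assumes the /services/search/jobs endpoint on the management port (8089 by default), a session key already obtained by logging in through /services/auth/login, and the "Authorization: Splunk <session key>" header format; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch: build (but do not send) a splunkd REST request that creates a search job. */
public class SplunkRestRequests {

    /** URL-encodes parameters as an application/x-www-form-urlencoded body. */
    public static String formEncode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** POST /services/search/jobs, authenticated with a session key from an earlier login. */
    public static HttpRequest createSearchJob(String baseUrl, String sessionKey, String query) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("search", query);       // the search string, including the leading "search"
        params.put("output_mode", "json"); // assumes a Splunk version with JSON output (5.0+)
        return HttpRequest.newBuilder(URI.create(baseUrl + "/services/search/jobs"))
                .header("Authorization", "Splunk " + sessionKey)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(params)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = createSearchJob("https://somehost:8089", "SESSION_KEY",
                "search index=spring error OR exception | head 10");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(formEncode(Map.of("search", "search index=spring error")));
    }
}
```

Sending it with an HttpClient would return a search ID, which is then used to poll the job and fetch its results; the SDKs wrap exactly these steps.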
37. Developer Platform SDKs
• We want to make it as easy as possible for developers to build Big Data apps on
top of the Splunk platform
• Several different language offerings, Software Development Kits (SDKs)
• JavaScript, Java, Python, PHP, C# (private), Ruby (private)
• All Splunk functionality is accessible via our SDKs
• Get Data into Splunk
• Execute Splunk Searches, get data out of Splunk
• Manage Splunk
• Customized User Interfaces
38. Comcast & Splunk
Data sources:
• Content browsed, purchased and watched, all tracked by time and MAC address
• Customer profile and MAC address / device assignments
Correlate usage and profile data to analyze customer behavior:
• Revenues driven by content browsed
• Improving local content mix
• Better search results
• Tailor content promotion
39. Bosch & Splunk
• Splunking data sent from ARM-based devices
• Uses the Java SDK to send data to Splunk
Use cases: Healthcare Management, Evidence-based Telehealth, Cardiac Rhythm Monitoring
41. Splunk = Integrated, Enterprise-ready Big Data Platform
• No need to write MapReduce jobs, just get data into Splunk and analyze
• Splunk delivers real-time insight – like clickstream analysis, IT early-warning systems, security and fraud protection
• Late-binding schema allows for faster, more flexible data insight gathering
• Data collection is integrated
• Distributed architecture offers scale-out capabilities with access control
• Out-of-the-box reporting and analytics capabilities
• SDKs cover over 170 REST API endpoints
42. Socialize & Splunk
“Splunk eliminates the need to write large MapReduce jobs to get meaningful information out of our data. This means we can get powerful stats and information to our key stakeholders in a fraction of the time.”
- Isaac Mosquera, CTO, Socialize
43. Visualizing Splunk with the SDKs
• Splunkweb has rich, but sometimes limited, visualization options
• You can use the SDKs to extract data from Splunk using a search, and visualize it
• Real-time searches can be especially powerful
• Using the JavaScript SDK you can integrate with third-party charting libraries like Google Charts & D3
45. Realtime Twitter Visualization Demo
• Twitter feeds being “firehosed” into Splunk and searched over in real time
• Uses the Splunk JavaScript SDK to stream the real-time search results from Splunk into a totally customized web-based user interface
• Visualization of the most popular hashtags with an interactive pie chart, word cloud and geo heatmap using D3
Diagram: Splunk → JavaScript SDK → Browser
48. Get the Java SDK
• Open sourced under the Apache v2.0 license
• Clone from GitHub: git clone https://github.com/splunk/splunk-sdk-java.git
• Project-level support for the Eclipse and IntelliJ IDEs
• Prerequisites
• JRE 6+
• Ant (Maven support is in the works)
• Splunk installed
• Loads of code examples
• Project examples folder
• Unit tests
• http://dev.splunk.com
• http://gist.github.com/damiendallimore
• Comprehensive coverage of the REST API
49. Java SDK Class Model
Class hierarchy (base class → subclasses):
• HttpService → Service
• Resource → ResourceCollection → EntityCollection → InputCollection, SavedSearchCollection
• Resource → Entity → Application, Index, Input
• Collections use a common mechanism to create and remove entities
• Entities use a common mechanism to retrieve and update property values, and access entity metadata
• Service is a wrapper that facilitates access to all Splunk REST endpoints
50. Key Java SDK Use cases
• Connect and Authenticate
• Manage
• Input Events
• Search
51. Connect and Authenticate
import com.splunk.Service;
import java.util.HashMap;
import java.util.Map;

public static Service connectAndLoginToSplunkExample() {
    Map<String, Object> connectionArgs = new HashMap<String, Object>();
    connectionArgs.put("host", "somehost");
    connectionArgs.put("username", "spring");
    connectionArgs.put("password", "integration");
    connectionArgs.put("port", 8089); // splunkd management port
    connectionArgs.put("scheme", "https");
    // logs in and saves the session key, which is then sent in the HTTP Authorization header
    Service splunkService = Service.connect(connectionArgs);
    return splunkService;
}
53. Input Events
public static void logEventToSplunkExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    // Get a Receiver object (com.splunk.Receiver)
    Receiver receiver = splunkService.getReceiver();
    // Set the source and sourcetype (com.splunk.Args)
    Args logArgs = new Args();
    logArgs.put("source", "http-rest");
    logArgs.put("sourcetype", "spring-example");
    // Log an event into the spring index
    receiver.log("spring", logArgs, "SpringOne 2GX rocks");
}
• Other input transports
• HTTP REST Streaming
• Raw TCP Oneshot & Streaming
• Raw UDP & Syslog
54. Search
• Search query
• a set of commands and functions you use to retrieve events from an index or a real-time stream, e.g. "search index=spring error OR exception | head 10"
• Saved search
• a search query that has been saved to be used again and can be set up to run on a regular schedule
• Search job
• an instance of a completed or still-running search operation. Using a search ID you can access the results of the search when they become available. Job results are saved on the server for a period of time and can be retrieved
• Search modes
• Normal: asynchronous; poll the job for status and results
• Realtime: same as normal, but the stream is kept open and results are streamed in real time
• Blocking: synchronous; a job handle is returned when the search is completed
• Oneshot: synchronous; no job handle is returned, results are streamed
• Export: synchronous; not a search per se, doesn’t create a job, results are streamed oldest to newest
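The search modes above correspond to parameters of the splunkd search endpoints. A minimal Java sketch of that mapping, assuming the `exec_mode` values `normal`, `blocking` and `oneshot` on /services/search/jobs, the separate /services/search/jobs/export endpoint, and `search_mode=realtime` with `rt` time bounds for real-time searches. The enum itself is hypothetical; check the REST API reference for your Splunk version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical mapping from the search modes to splunkd REST parameters. */
public enum SearchMode {
    NORMAL("/services/search/jobs", Map.of("exec_mode", "normal")),
    REALTIME("/services/search/jobs", Map.of("exec_mode", "normal", "search_mode", "realtime",
            "earliest_time", "rt-5m", "latest_time", "rt")),
    BLOCKING("/services/search/jobs", Map.of("exec_mode", "blocking")),
    ONESHOT("/services/search/jobs", Map.of("exec_mode", "oneshot")),
    EXPORT("/services/search/jobs/export", Map.of());

    private final String endpoint;
    private final Map<String, String> fixedParams;

    SearchMode(String endpoint, Map<String, String> fixedParams) {
        this.endpoint = endpoint;
        this.fixedParams = fixedParams;
    }

    public String endpoint() {
        return endpoint;
    }

    /** Returns the parameters to POST: the query plus this mode's fixed settings. */
    public Map<String, String> params(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("search", query);
        params.putAll(fixedParams);
        return params;
    }

    /** True for modes whose HTTP call only returns once results are ready or streaming. */
    public boolean isSynchronous() {
        return this == BLOCKING || this == ONESHOT || this == EXPORT;
    }

    public static void main(String[] args) {
        for (SearchMode mode : values()) {
            System.out.println(mode + " -> POST " + mode.endpoint() + " " + mode.params("search index=spring"));
        }
    }
}
```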
55. Blocking Searches
public static void exportSearchExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    String searchQuery = "search error OR exception | head 10";
    Args queryArgs = new Args();
    queryArgs.put("earliest_time", "-1d@d");
    queryArgs.put("latest_time", "now");
    // perform the export; blocks here (returns a java.io.InputStream)
    InputStream stream = splunkService.export(searchQuery, queryArgs);
    processInputStream(stream);
}
public static void simpleSearchExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    String searchQuery = "search error OR exception | head 10";
    Args queryArgs = new Args();
    queryArgs.put("earliest_time", "-3d@d");
    queryArgs.put("latest_time", "-1d@d");
    // perform the search; blocks here
    InputStream stream = splunkService.search(searchQuery, queryArgs);
    processInputStream(stream);
}
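`processInputStream` is called but never defined in these examples. A minimal sketch of what it might do, assuming the caller just wants the results line by line (for example CSV or raw output); it does not parse XML or JSON. The class name and the Consumer-based overload are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Hypothetical helper standing in for the undefined processInputStream. */
public class ResultStreams {

    /** Reads a result stream line by line, passes each line on, and always closes the stream. */
    public static int processInputStream(InputStream stream, Consumer<String> lineHandler) {
        int lines = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineHandler.accept(line);
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Overload matching the one-argument calls in the examples: print every line. */
    public static int processInputStream(InputStream stream) {
        return processInputStream(stream, System.out::println);
    }

    public static void main(String[] args) {
        // A stand-in for a Splunk result stream
        InputStream fake = new ByteArrayInputStream(
                "_raw\nerror one\nexception two\n".getBytes(StandardCharsets.UTF_8));
        int count = processInputStream(fake);
        System.out.println(count + " lines read");
    }
}
```

Closing the stream in try-with-resources matters here: the stream is backed by an HTTP connection to splunkd, and leaving it open ties that connection up.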
56. Non Blocking Search
public static void searchJobExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    String outputMode = "csv"; // xml, json or csv
    // submit the job; create() returns immediately with a job handle
    Job job = splunkService.getJobs().create("search index=spring error OR fatal | head 10");
    // poll until the job has finished
    while (!job.isDone()) {
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and give up
            return;
        }
    }
    Args outputArgs = new Args();
    outputArgs.put("output_mode", outputMode);
    InputStream stream = job.getResults(outputArgs);
    processInputStream(stream, outputMode); // uses an XML stream parser, opencsv and gson
}
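The polling loop above waits forever if a search never finishes. A minimal Java sketch of a bounded alternative, with a doubling delay between checks and an overall timeout; the `JobPoller` class is hypothetical and takes any `BooleanSupplier`, so it does not depend on the SDK.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: poll a completion check until it succeeds or a deadline passes. */
public class JobPoller {

    /**
     * Polls isDone, sleeping between attempts and doubling the delay up to maxDelay.
     * Returns true if isDone reported completion before the timeout, false otherwise
     * (including when the calling thread is interrupted).
     */
    public static boolean awaitDone(BooleanSupplier isDone, Duration initialDelay,
                                    Duration maxDelay, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        long delayMillis = Math.max(1, initialDelay.toMillis());
        long maxMillis = Math.max(1, maxDelay.toMillis());
        while (!isDone.getAsBoolean()) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000;
            if (remainingMillis <= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(Math.min(delayMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
            delayMillis = Math.min(delayMillis * 2, maxMillis);
        }
        return true;
    }

    public static void main(String[] args) {
        int[] checks = {0};
        // Stand-in for job::isDone that reports completion on the third check
        BooleanSupplier fakeJob = () -> ++checks[0] >= 3;
        boolean done = awaitDone(fakeJob, Duration.ofMillis(10), Duration.ofMillis(100), Duration.ofSeconds(5));
        System.out.println("done=" + done + " after " + checks[0] + " checks"); // done=true after 3 checks
    }
}
```

Assuming `job` is the Job from the example, `JobPoller.awaitDone(job::isDone, Duration.ofMillis(500), Duration.ofSeconds(5), Duration.ofMinutes(2))` would replace the unbounded loop; the growing delay keeps polling cheap for long-running searches.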
57. Realtime Search
public static void realTimeSearchExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    Args queryArgs = new Args();
    queryArgs.put("earliest_time", "rt-5m");
    queryArgs.put("latest_time", "rt");
    // submit the job
    Job job = splunkService.getJobs().create("search index=spring exception OR error", queryArgs);
    …
}
58. Alternate JVM Languages
Scala, Groovy, Clojure, JavaScript (Rhino), JRuby, PHP (Quercus), Ceylon, Kotlin, Jython
We don’t need SDKs for these languages; we can just use the Java SDK!
59. Groovy
import com.splunk.Service
import com.splunk.ServiceInfo

class SplunkJavaSDKWrapper {
    static main(args) {
        // connect and login
        def connectionParameters = [host: "somehost", username: "spring", password: "integration"]
        Service service = Service.connect(connectionParameters)
        // get Splunk server info
        ServiceInfo info = service.getInfo()
        def splunkInfo = [:]
        for (key in info.keySet())
            splunkInfo.put(key, info.get(key))
        printSplunkInfo(splunkInfo)
    }
    static printSplunkInfo(splunkInfo) {
        println "Info"
        splunkInfo.each { key, value -> println key + " : " + value }
    }
}
60. Scala
import com.splunk.Service._
import scala.collection.mutable.HashMap
import scala.collection.JavaConversions._

object SplunkJavaSDKWrapper {
  def main(args: Array[String]) = {
    // connect and login
    val connectionArgs = HashMap[String, Object]("host" -> "somehost", "username" -> "me", "password" -> "foo")
    val service = connect(connectionArgs)
    // get Splunk server info
    val info = service.getInfo
    // Scala/Java conversion
    val javaSet = info.keySet
    val scalaSet = javaSet.toSet
    // print out Splunk server info
    for (key <- scalaSet)
      println(key + ":" + info.get(key))
  }
}
61. Spring Integration Splunk Extensions
Special thanks to Jianwei Li (Jarred) & Mark Pollack for creating this!
62. Spring Integration
• Spring Integration is an extension to core Spring
• Based on “Enterprise Integration Patterns” model
• Messaging model and Declarative Adaptors
• Makes it easier to build integration solutions
63. Spring Integration Splunk Adaptors
• Splunk Java SDK makes it easier to use the REST API
• Building on this, the Spring Integration Adaptors make it easier for Spring/Java developers to declaratively build data integration solutions and utilize the power of the Splunk platform
• https://github.com/SpringSource/spring-integration-extensions
• Inbound Adaptor
– Search and export the data from Splunk and push it into message channels
– Filter, transform, export to other destinations
• Outbound Adaptor
– Can consume data acquired by other Integration adaptors (Twitter, JDBC…) and push it into Splunk for indexing, searching and visualization
64. Spring Integration Splunk Inbound Adaptor
• Blocking, Non-blocking, Saved & Realtime Searches
• Exporting
69. SplunkJavaLogging
• A logging framework that lets developers integrate Splunk best-practice logging semantics into their code as seamlessly as possible and transport events directly to Splunk
• Custom handler/appender implementations (REST and raw TCP) for the 3 most prevalent Java logging frameworks in play, so you can Splunk events directly from your code:
• Logback
• Log4j
• java.util.logging
• Better handling of stack traces
• All code and examples are on GitHub
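Splunk's logging guidance for developers favors human-readable events that start with a timestamp and carry data as key=value pairs, which Splunk extracts as fields automatically at search time. A minimal Java formatter in that style, written for illustration only and not part of SplunkJavaLogging: the `KeyValueEvent` class and its field names are hypothetical. It quotes values, escapes embedded quotes, and records an exception as fields instead of a multi-line dump.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for Splunk-friendly key=value log events. */
public class KeyValueEvent {
    private final Instant timestamp;
    private final Map<String, String> pairs = new LinkedHashMap<>();

    public KeyValueEvent(Instant timestamp, String name) {
        this.timestamp = timestamp;
        pairs.put("name", name);
    }

    public KeyValueEvent add(String key, Object value) {
        pairs.put(key, String.valueOf(value));
        return this;
    }

    /** Records the exception type and message as fields rather than a multi-line dump. */
    public KeyValueEvent addThrowable(Throwable t) {
        add("exception_class", t.getClass().getName());
        return add("exception_message", t.getMessage());
    }

    /** Formats as: ISO timestamp, then key="value" pairs with backslashes and quotes escaped. */
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(timestamp.toString());
        pairs.forEach((key, value) -> sb.append(' ').append(key).append("=\"")
                .append(value.replace("\\", "\\\\").replace("\"", "\\\"")).append('"'));
        return sb.toString();
    }

    public static void main(String[] args) {
        KeyValueEvent event = new KeyValueEvent(Instant.parse("2012-10-15T10:30:00Z"), "order placed")
                .add("orderId", 42)
                .add("note", "said \"hi\"");
        System.out.println(event);
    }
}
```

Quoting every value keeps values with spaces intact as a single field when Splunk extracts them.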
71. Splunk for JMX
• SplunkBase App for monitoring JVM applications
• Out-of-the-box dashboards for JVM-level monitoring (java.lang domain)
• Memory, threading, GC, CPU etc.
• Very simple configuration to wire up monitoring of any MBeans from applications (Tomcat, JBoss, Cassandra, Coherence etc.)
• HotSpot, JRockit, IBM J9, OpenJDK
• Poll JMX attributes and operations, index data over time, correlate with other data
• Supports large-scale deployments of JVMs
• Extensible and customizable
• Many connectivity options
• RMI, IIOP
• Direct process attachment
• MX4J Hessian, Burlap and SOAP
• Freely available download from SplunkBase & all code is on GitHub
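Splunk for JMX does this polling for you; the sketch below only illustrates the underlying idea. It reads two standard java.lang platform MBean attributes through the local platform MBeanServer (no remote RMI or IIOP connection) and emits one key=value line per sample. The class name and output format are hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

/** Hypothetical sketch: poll java.lang MBean attributes and emit one key=value line per sample. */
public class JmxSampler {

    public static String sampleOnce() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            // HeapMemoryUsage is exposed as CompositeData with "used", "committed", "max", ...
            CompositeData heap = (CompositeData) server.getAttribute(
                    new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
            Object threadCount = server.getAttribute(
                    new ObjectName("java.lang:type=Threading"), "ThreadCount");
            return String.format("heapUsed=%s heapCommitted=%s threadCount=%s",
                    heap.get("used"), heap.get("committed"), threadCount);
        } catch (JMException e) {
            throw new IllegalStateException("JMX read failed", e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Three samples half a second apart; a real poller would forward these lines to Splunk.
        for (int i = 0; i < 3; i++) {
            System.out.println(System.currentTimeMillis() + " " + sampleOnce());
            Thread.sleep(500);
        }
    }
}
```

Indexed over time, lines like these become the memory and threading trends the app's dashboards chart.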
72. Learn More. Stay Connected.
At SpringOne 2GX:
• Come by our booth
• Splunk demos, Q & A
• SDK code
• Tee shirts!!
Web:
• Developer Platform: http://dev.splunk.com
• SplunkBase: http://splunk-base.splunk.com
• Twitter: @splunkdev, @damiendallimore
• Email: devinfo@splunk.com, ddallimore@splunk.com
• Blog: http://blogs.splunk.com/dev
• GitHub: http://github.com/splunk
• Splunk Live! Events and Online Videos at http://www.splunk.com