Splunk as a Big Data Platform for Developers (SpringOne 2GX)

A Big Data Platform for Developers
Damien Dallimore
Developer Evangelist at Splunk

© 2012 SpringOne 2GX. All rights reserved. Do not distribute without permission.

About me
•  Developer Evangelist at Splunk since July 2012
•  Splunk Community Member
•  Splunk for JMX
•  SplunkJavaLogging
•  SplunkBase – Apps and Answers
•  Splunk Architect and Administrator
•  Coder
•  Been paying my mortgage developing Enterprise Java solutions most of my career
•  Kia Ora
•  I do not have a speech impediment, I am from Aotearoa, so please restrain all your sheep, Lord of the Rings and Kim Dotcom heckles until beer o’clock!!

2

Agenda
•  Overview of the Splunk platform
•  Splunk for Developers
•  Custom Visualization Demo
•  Splunk Java SDK
•  Spring Integration Splunk Extensions
•  Integration Adaptors Demo
•  Some other JVM/Java related tools
•  SplunkJavaLogging
•  Splunk for JMX
•  Questions

3

So What is Splunk, Exactly?
•  Splunk is an engine for machine data
•  Provides visibility, reporting and search across all your IT systems and infrastructure
•  Doesn’t lock you into a fixed schema
•  It’s software – download and install it in 5 minutes, “freemium” model
•  Runs on all modern platforms
•  Open and extensible architecture

5

Indexes any Machine Data
•  Capture events from logs in real time
•  Run scripts to gather system metrics, connect to APIs and databases
•  Listen to syslog, raw TCP/UDP, gather Windows events
•  Universally indexes any data format so it doesn’t need adapters, “schema on the fly”
•  Stream in data directly from your application code
•  Decode binary data and feed in

•  Windows: Registry, Event logs, File system, sysinternals
•  Linux/Unix: Configurations, Syslog, File system, ps, iostat, top
•  Virtualization: Hypervisor, Guest OS, Guest Apps
•  Applications: Web logs, Log4J, JMS, JMX, .NET events, Code and scripts
•  Databases: Configurations, Audit/query logs, Tables, Schemas
•  Network: Configurations, syslog, SNMP, netflow

6

Centralizes Data Across the Environment
•  Splunk Universal Forwarder sends data to Splunk Indexer from remote systems
•  Uses minimal system resources, easy to install and deploy
•  Delivers secure, distributed, real-time universal data collection for tens of thousands of endpoints

[Diagram: Splunk Forwarders; Indexing/Search Server]

7

Scales to TBs/day and Thousands of Users
•  Automatic load balancing linearly scales indexing
•  Distributed search and MapReduce linearly scales search and reporting

8

Provides Strong Machine Data Governance
•  Provides comprehensive controls for data security, retention and integrity

•  Single sign-on integration enables pass-through authentication of user credentials

9

Splunk and Apache Hadoop MR/HDFS
•  Splunk is an implementation of the MapReduce algorithmic approach
•  It is not Apache Hadoop MapReduce (MR), the product
•  Splunk is not agnostic of its underlying data source; it is optimized for Splunk index files
•  Real time vs batch jobs
•  Optimal for time-series data
•  End-to-end integrated Big Data solution
•  Fine-grained protection of access and data using role-based permissions
•  Data retention and aging controls
•  Users can submit “MapReduce” jobs without needing to know how to code a job
•  Splunk Search Language vs Pig/Sawzall
•  But why not get the best of both worlds?
•  Splunk Hadoop Ops
•  Splunk Hadoop Connect
•  Shuttl (archiving to HDFS / S3)
10

Splunk Has Four Primary Functions
•  Searching and Reporting (Search Head)

•  Indexing and Search Services (Indexer)

•  Local and Distributed Management (Deployment Server)

•  Data Collection and Forwarding (Forwarder)

A Splunk install can be one or all roles…

11

Getting Data into Splunk
Agent and Agent-less Approach for Flexibility.

[Diagram: data inputs, covering both agent-less data input and the Splunk Forwarder]
•  Local file monitoring: log files, config files, dumps and trace files
•  syslog and TCP/UDP: syslog-compatible hosts and network devices
•  Windows inputs: Event Logs, performance counters, registry monitoring, Active Directory monitoring
•  Scripted inputs: shell scripts, custom parsers, batch loading
•  Mounted file systems (hostname\mount)
•  WMI: Event Logs, performance
•  Sources: Unix, Linux and Windows hosts; custom apps and scripted API connections

12

Universal Data Forwarder
Forward data without negatively impacting production performance.

•  Delivers secure, distributed, real-time universal data collection for tens of thousands of endpoints
•  Extends Splunk data fabric to large scale private cloud and desktop environments
•  Uses minimal system resources, easy to install and deploy
–  < half memory and footprint of Splunk 4.1; < 1% of single core

[Diagram: Universal Forwarder Deployment with Central Deployment Management, collecting logs, messages, configurations, metrics and scripts]
Monitor files, changes and the system registry; capture metrics and status.

13

Horizontal Scaling
Load balanced search and indexing for massive, linear scale out.

[Diagram: Forwarder; Auto Load Balancing; Distributed Search]

14

Multiple Datacenters
Index and store locally. Distribute searches to datacenters, networks & geographies.

[Diagram: Distributed Search from Headquarters across London, Hong Kong, Tokyo and New York]

15

Send Data to Other Systems
Route raw data in real time or send alerts based on searches.

[Diagram: Service Desk; Event Console; SIEM; Problem Investigation]

High Availability / DR
Combine auto load balancing and data replication.

[Diagram: Splunk Forwarders with Auto Load Balancing; Data Clone between Primary Cluster and Secondary Cluster; Distributed Search]

17

Integrate External Data
Extend search with lookups to external data sources.
[Diagram: external data sources: LDAP, AD; Watch Lists; CMDB; CRM/ERP]
Correlate IP addresses with locations, accounts with regions.

18

Integrate Users and Roles
Integrate authentication with LDAP and Active Directory.

[Diagram: LDAP, AD Users and Groups; Splunk Flexible Roles; Capabilities & Filters]
•  Capabilities: Manage Users, Manage Indexes, Save Searches, Share Searches
•  Filters: NOT tag=PCI, App=ERP, …

Map LDAP & AD groups to flexible Splunk roles. Define any search as a filter.
19

Centralized Licensing Management
Groups, Stacks, and Pools for Enterprise Deployments.


20

Deployment Monitoring
Keep Tabs On Your Splunk Enterprise Deployment.

[Screenshot: Licenses, Sourcetypes, Indexers, Forwarders]

21

Real-time Search

[Diagram: indexing and real-time search data flow]
•  Data inputs: Monitor Input, TCP/UDP Input, Scripted Input
•  Parsing Queue, Parsing Pipeline, Index Queue, Indexing Pipeline, Index (raw data and index files)
•  Parsing Pipeline: source and event typing, character set normalization, line breaking, timestamp identification, regex transforms
•  Real-time Buffer, Real-time Search Process

22

Real-time Alerting

source="/var/log/secure.log" "BAD SU"

[Diagram: indexing and real-time search data flow]
•  Data inputs: Monitor Input, TCP/UDP Input, Scripted Input
•  Parsing Queue, Parsing Pipeline, Index Queue, Indexing Pipeline, Index (raw data and index files)
•  Parsing Pipeline: source and event typing, character set normalization, line breaking, timestamp identification, regex transforms
•  Real-time Buffer, Real-time Search Process

23

New Approach to Heterogeneous Data
Universal Indexing
•  No data normalization
•  Automatically handles timestamps
•  Parsers not required
•  Index every term & pattern “blindly”
•  No attempt to “understand” up front

Search-time Knowledge
•  Knowledge applied at search-time
•  No brittle schema to work around
•  Multiple views into the same data

Flexibility and Fast Time to Value
•  Normalization as it’s needed
•  Faster implementation
•  Easy search language
•  Multiple views into the same data
•  Splunk helps find transactions, patterns and trends

24

Inside Universal Indexing

Automatic event boundary identification

Automatic timestamp normalization

...enable accurate searching and trending by time across all data:

25

Inside Search-time Knowledge Extraction
Automatically discovered fields

And user-defined fields

... enable statistics and precise search on specific fields:
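
To make the idea of search-time field extraction a little more concrete, here is a minimal sketch that pulls key=value pairs out of a raw event only when it is read, rather than forcing a schema at write time. This is an illustration of the concept, not Splunk's implementation; the class name, the regular expression and the sample log line are all made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SearchTimeFieldsSketch {

    // Matches key=value or key="quoted value" pairs anywhere in a raw event.
    private static final Pattern KEY_VALUE =
            Pattern.compile("(\\w+)=(?:\"([^\"]*)\"|(\\S+))");

    // Extracts fields from the raw text at read time; the raw event itself is left untouched.
    static Map<String, String> extractFields(String rawEvent) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher matcher = KEY_VALUE.matcher(rawEvent);
        while (matcher.find()) {
            // Group 2 holds a quoted value, group 3 an unquoted one.
            String value = matcher.group(2) != null ? matcher.group(2) : matcher.group(3);
            fields.put(matcher.group(1), value);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up log line used only for illustration.
        String raw = "2012-10-16 10:00:00 level=ERROR user=\"jane doe\" latency=42";
        extractFields(raw).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

Because the fields are derived on read, a different extraction rule can be applied later to the same stored event without re-ingesting it.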

26

Inside Search-time Knowledge Extraction
Searches saved as event types

Plus tagging of event types, hosts and other fields

... enable normalized reporting, knowledge sharing and granular access control.

27

Splunk & Developers

[Diagram: Splunk UI (Splunk Apps) and Custom/Existing Applications; SDKs; REST API; Machine Data Engine]
•  Search, chart and graph
•  Save and schedule searches as alerts
•  Export search results
•  Manage inputs and indexes
•  Add & remove users and roles

•  Accelerate development & testing
•  Integrate data from Splunk into your existing IT environment for operational visibility
•  Build custom solutions to deliver real-time business insights from Big Data

29

Splunk in the Developer Community

•  Over 1,000 unique visitors per week to dev.splunk.com
•  Over 500 followers on Twitter @splunkdev
•  Over 350 enterprise developer trial licenses granted

Accelerate development & testing

How does Splunk Accelerate Dev/Test?
•  Splunk frees you from upfront database design for analytics
•  late binding schema
•  Developers and QA/test engineers don’t have to ask IT/Ops to get
logs off machines
•  Role-based access to all data within one console without having to log into production systems
•  All events are indexed and accessible in real-time in one place.
•  Ad-Hoc real-time monitoring and historical investigation searchable from one place
•  Correlations and insights across multiple tiers.
•  Splunk lets you find issues quickly, so you can fix issues quickly
•  Integrate Splunk search results into testing assertions

32

StubHub & Splunk
“Splunk filled a vacuum we didn’t know we had.”
- Nathan Pratt, Tech Lead, Tools & Automation, StubHub

Engineering uses Splunk to investigate bugs
QA uses it during dev cycles

•  Started with Site Operations to resolve issues
•  Grew to engineers, QA, upper management in technology
•  Release requirement – Projects are required to certify that all logs are Splunk-friendly

High-level view of application errors - used by site operations, engineering, and upper management

33

Integrate Splunk into your IT environment

Integration into existing IT tools
The Splunk development platform is optimized for core enterprise developer skills

[Diagram: Your application and Splunk UI (Splunk Apps); SDKs; REST API; splunkd]

REST API communicates directly with a Splunk instance for search, management and admin
•  Provides full control to the developer
•  Use any language or tool that supports HTTP

SDKs provide broad coverage of the REST API in popular languages
•  Log directly to Splunk from any app
•  Build a UI on any web stack
•  Integrate into existing infrastructure

35

Splunk REST API
•  Exposes an API method for every feature in the product
•  Whatever you can do in the UI – you can do through the API.
•  Run searches
•  Manage Splunk configurations

•  API is RESTful
•  Endpoints are served by splunkd
•  Requests are GET, POST, and DELETE HTTP methods
•  Responses are Atom XML Feeds
•  JSON coming in 5.0
•  Search results can be output in CSV/JSON/XML/Raw
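
As a rough illustration of "use any language or tool that supports HTTP", the sketch below builds (but does not send) a POST request that would create a search job. It assumes Java 11+ for `java.net.http`. The host `somehost`, the `SESSION_KEY` placeholder and the class and method names are hypothetical; the `/services/search/jobs` path and the `Splunk <session key>` Authorization format are stated from my understanding of splunkd rather than from the slides, so check them against the REST API reference for your version.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class RestSearchRequestSketch {

    // Encodes parameters as application/x-www-form-urlencoded, keeping the map's iteration order.
    static String formEncode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Builds, without sending, a POST that would ask splunkd to create a search job.
    static HttpRequest buildCreateJobRequest(String host, int port, String sessionKey, String query) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("search", query);
        params.put("earliest_time", "-1d@d");
        params.put("latest_time", "now");
        return HttpRequest.newBuilder()
                .uri(URI.create("https://" + host + ":" + port + "/services/search/jobs"))
                .header("Authorization", "Splunk " + sessionKey) // assumed header format
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(params)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildCreateJobRequest("somehost", 8089, "SESSION_KEY",
                "search index=spring error OR exception | head 10");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request would additionally need an `HttpClient` that trusts the server's certificate, which is left out here to keep the sketch free of network dependencies.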

36

Developer Platform SDKs
•  We want to make it as easy as possible for developers to build Big Data apps on
top of the Splunk platform
•  Several different language offerings, Software Development Kits (SDKs)
•  Javascript, Java, Python, PHP, C# (private), Ruby (private)
•  All Splunk functionality is accessible via our SDKs
•  Get Data into Splunk
•  Execute Splunk Searches, get data out of Splunk
•  Manage Splunk
•  Customized User Interfaces

37

Comcast & Splunk

Content browsed, purchased and watched
All tracked by time and MAC address
+
Customer profile and MAC address / device assignments

Correlate usage and profile data to
analyze customer behavior:
•  Revenues driven by content browsed
•  Improving local content mix
•  Better search results
•  Tailor content promotion
38

Bosch & Splunk
Splunking data sent from ARM-based devices
•  Uses the Java SDK to send data to Splunk

[Diagram: Healthcare Management; Evidence-based Telehealth; Cardiac Rhythm Monitoring]

39

Splunk as an integrated, enterprise-ready Big Data platform

Splunk = Integrated, Enterprise-ready Big Data Platform

•  No need to write MapReduce jobs, just
get data into Splunk and analyze
•  Splunk delivers real-time insight – like
clickstream analysis, IT early-warning
systems, security and fraud protection
•  Late-binding schema allows for faster,
more flexible data insight gathering
•  Data collection is integrated
•  Distributed architecture offers scale-out
capabilities with access control
•  Out-of-the-box reporting and analytics
capabilities
•  SDKs cover over 170 REST API
endpoints

41

Socialize & Splunk
“Splunk eliminates the need to
write large MapReduce jobs
to get meaningful information
out of our data. This means
we can get powerful stats and
information to our key
stakeholders in a fraction of
the time.”
- Isaac Mosquera, CTO,
Socialize

42

Visualizing Splunk with the SDKs
•  Splunkweb has rich, but sometimes limited, visualization
options
•  You can use the SDKs to extract data from Splunk using a
search, and visualize it
•  Real-time searches can be especially powerful
•  Using the Javascript SDK you can integrate with third
party charting libraries like Google Charts & D3.

43

Realtime Twitter Visualization Demo
•  Twitter feeds being “firehosed” into Splunk and searched over in realtime
•  Uses the Splunk Javascript SDK to stream the realtime search results from Splunk into
a totally customized web based user interface
•  Visualization of most popular hashtags with interactive pie chart, word cloud and geo
heatmap using D3

[Diagram: Javascript SDK; Browser]

45

Splunk Java SDK (Software Development Kit)

47

Get the Java SDK
•  Open sourced under the Apache v2.0 license
•  Clone from Github : git clone https://github.com/splunk/splunk-sdk-java.git
•  Project-level support for Eclipse and IntelliJ IDEs
•  Pre-requisites
•  JRE 6+
•  Ant ( Maven support is in the works )
•  Splunk installed
•  Loads of code examples
•  Project examples folder
•  Unit Tests
•  http://dev.splunk.com
•  http://gist.github.com/damiendallimore
•  Comprehensive coverage of the REST API

48

Java SDK Class Model
[Class diagram, parent to children]
HTTPService: Service
Resource: ResourceCollection, Entity
ResourceCollection: EntityCollection
EntityCollection: InputCollection, SavedSearchCollection
Entity: Application, Index, Input
•  Collections use a common mechanism to create and remove entities
•  Entities use a common mechanism to retrieve and update property values, and access entity metadata
•  Service is a wrapper that facilitates access to all Splunk REST endpoints

49

Key Java SDK Use cases
•  Connect and Authenticate
•  Manage
•  Input Events
•  Search

50

Connect and Authenticate

import java.util.HashMap;
import java.util.Map;

import com.splunk.Service;

public static Service connectAndLoginToSplunkExample() {

    Map<String, Object> connectionArgs = new HashMap<String, Object>();
    connectionArgs.put("host", "somehost");
    connectionArgs.put("username", "spring");
    connectionArgs.put("password", "integration");
    connectionArgs.put("port", 8089);
    connectionArgs.put("scheme", "https");

    // logs in and saves the session key, which is then sent in the HTTP Authorization header
    Service splunkService = Service.connect(connectionArgs);
    return splunkService;

}

51

Manage

public static void getServerInfoExample() {

    Service splunkService = connectAndLoginToSplunkExample();

    // server info is exposed as key/value pairs
    ServiceInfo info = splunkService.getInfo();
    System.out.println("Info:");
    for (String key : info.keySet())
        System.out.println(" " + key + ": " + info.get(key));

    // server settings are read the same way
    Entity settings = splunkService.getSettings();
    System.out.println("\nSettings:");
    for (String key : settings.keySet())
        System.out.println(" " + key + ": " + settings.get(key));

}

52

Input Events
public static void logEventToSplunkExample() {

    Service splunkService = connectAndLoginToSplunkExample();

    // Get a Receiver object
    Receiver receiver = splunkService.getReceiver();

    // Set the source and sourcetype
    Args logArgs = new Args();
    logArgs.put("source", "http-rest");
    logArgs.put("sourcetype", "spring-example");

    // Log an event into the spring index
    receiver.log("spring", logArgs, "SpringOne 2GX rocks");

}

•  Other Input transports
•  HTTP REST Streaming
•  Raw TCP Oneshot & Streaming
•  Raw UDP & Syslog

53

Search
•  Search query
•  a set of commands and functions you use to retrieve events from an index or a real-time stream, e.g.
"search index=spring error OR exception | head 10"
•  Saved search
•  a search query that has been saved to be used again and can be set up to run on a regular schedule
•  Search job
•  an instance of a completed or still-running search operation. Using a search ID you can access the results of the search when they become available. Job results are saved for a period of time on the server and can be retrieved
•  Search Modes
•  Normal: asynchronous, poll job for status and results
•  Realtime: same as normal, but the stream is kept open and results are streamed in real time
•  Blocking: synchronous, a job handle is returned when the search is completed
•  Oneshot: synchronous, no job handle is returned, results are streamed
•  Export: synchronous, not a search per se, doesn’t create a job, results are streamed oldest to newest
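
The five search modes differ along two axes named in the list: whether the call is synchronous and whether it yields a job handle. A small sketch encoding just that table is shown here; the class and enum names are hypothetical and are not types from the Splunk Java SDK.

```java
final class SearchModeSketch {

    // One entry per search mode, with the two properties described on the slide.
    enum Mode {
        NORMAL(false, true),
        REALTIME(false, true),
        BLOCKING(true, true),
        ONESHOT(true, false),
        EXPORT(true, false);

        final boolean synchronous; // does the call block until results are available?
        final boolean createsJob;  // is a job handle returned to the caller?

        Mode(boolean synchronous, boolean createsJob) {
            this.synchronous = synchronous;
            this.createsJob = createsJob;
        }
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": "
                    + (mode.synchronous ? "synchronous" : "asynchronous") + ", "
                    + (mode.createsJob ? "job handle" : "no job handle"));
        }
    }
}
```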

54

Blocking Searches
public static void exportSearchExample() {

    Service splunkService = connectAndLoginToSplunkExample();

    String searchQuery = "search error OR exception | head 10";
    Args queryArgs = new Args();
    queryArgs.put("earliest_time", "-1d@d");
    queryArgs.put("latest_time", "now");
    // perform the export, blocks here
    InputStream stream = splunkService.export(searchQuery, queryArgs);
    processInputStream(stream);

}

public static void simpleSearchExample() {

    Service splunkService = connectAndLoginToSplunkExample();

    String searchQuery = "search error OR exception | head 10";
    Args queryArgs = new Args();
    queryArgs.put("earliest_time", "-3d@d");
    queryArgs.put("latest_time", "-1d@d");
    // perform the search, blocks here
    InputStream stream = splunkService.search(searchQuery, queryArgs);
    processInputStream(stream);

}
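
The examples above hand the returned stream to `processInputStream`, which the slides never show. One possible shape for such a helper, sketched with the JDK only, is to read the stream as UTF-8 text and collect its lines. The class name, method name and sample payload are hypothetical, and real results would normally go through an XML, CSV or JSON parser instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ResultStreamSketch {

    // Hypothetical stand-in for processInputStream(): reads the stream as
    // UTF-8 text, skips empty lines and closes the stream when finished.
    static List<String> readLines(InputStream stream) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    lines.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up CSV payload standing in for search results.
        byte[] fake = "_time,_raw\n2012-10-16T10:00:00,error in foo\n"
                .getBytes(StandardCharsets.UTF_8);
        for (String line : readLines(new ByteArrayInputStream(fake))) {
            System.out.println(line);
        }
    }
}
```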
55

Non Blocking Search
public static void searchJobExample() {

    Service splunkService = connectAndLoginToSplunkExample();

    String outputMode = "csv"; // xml, json or csv

    // submit the job
    Job job = splunkService.getJobs().create("search index=spring error OR fatal | head 10");

    // poll until the job has finished
    while (!job.isDone()) {
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            // restore the interrupt flag and stop waiting
            Thread.currentThread().interrupt();
            return;
        }
    }

    Args outputArgs = new Args();
    outputArgs.put("output_mode", outputMode);

    InputStream stream = job.getResults(outputArgs);
    processInputStream(stream, outputMode); // uses xml stream, opencsv and gson
}
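
The polling loop above waits for as long as the job takes. A variation worth considering is to bound the wait with a timeout. The sketch below shows that pattern on its own, with a `BooleanSupplier` standing in for `job.isDone()` so that it runs without a Splunk server; the class and method names are hypothetical.

```java
import java.util.function.BooleanSupplier;

final class JobPollingSketch {

    // Polls isDone until it returns true or timeoutMillis has elapsed.
    // Returns true if the condition was met, false on timeout or interruption.
    static boolean waitUntilDone(BooleanSupplier isDone, long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!isDone.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up waiting
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulated job that reports completion on the third status check.
        int[] polls = {0};
        boolean done = waitUntilDone(() -> ++polls[0] >= 3, 10, 1_000);
        System.out.println("done=" + done + " after " + polls[0] + " polls");
    }
}
```

With the SDK, the call site would presumably look like `waitUntilDone(job::isDone, 500, 30_000)`, followed by fetching results only when it returns true.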

56

Realtime Search
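public static void realTimeSearchExample() {

    Service splunkService = connectAndLoginToSplunkExample();

    // a real-time window covering the last 5 minutes
    Args queryArgs = new Args();
    queryArgs.put("earliest_time", "rt-5m");
    queryArgs.put("latest_time", "rt");

    // submit the job
    Job job = splunkService.getJobs().create("search index=spring exception OR error", queryArgs);

    …

}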

57

Alternate JVM Languages

Scala Groovy Clojure
Javascript(Rhino) JRuby PHP(Quercus)
Ceylon Kotlin Jython

We don’t need SDKs for these languages, we can just use the Java SDK!

58

Groovy
import com.splunk.Service
import com.splunk.ServiceInfo

class SplunkJavaSDKWrapper {

    static main(args) {
        // connect and login
        def connectionParameters = [host: "somehost", username: "spring", password: "integration"]
        Service service = Service.connect(connectionParameters)
        // get Splunk Server info
        ServiceInfo info = service.getInfo()

        def splunkInfo = [:]

        for (key in info.keySet())
            splunkInfo.put(key, info.get(key))

        printSplunkInfo(splunkInfo)

    }

    static printSplunkInfo(splunkInfo) {
        println "Info"
        splunkInfo.each { key, value -> println key + " : " + value }
    }
}

59

Scala
import com.splunk.Service._
import scala.collection.mutable.HashMap
import scala.collection.JavaConversions._

object SplunkJavaSDKWrapper {

    def main(args: Array[String]) = {
        // connect and login
        val connectionArgs = HashMap[String, Object]("host" -> "somehost", "username" -> "me", "password" -> "foo")
        val service = connect(connectionArgs)
        // get Splunk Server info
        val info = service.getInfo
        // Scala/Java conversion
        val javaSet = info.keySet
        val scalaSet = javaSet.toSet
        // print out Splunk Server info
        for (key <- scalaSet)
            println(key + ":" + info.get(key))
    }
}

60

Spring Integration Splunk Extensions

Special thanks to Jianwei Li (Jarred) & Mark Pollack for creating this!

61

Spring Integration

•  Spring Integration is an extension to core Spring
•  Based on “Enterprise Integration Patterns” model
•  Messaging model and Declarative Adaptors
•  Makes it easier to build integration solutions

62

Spring Integration Splunk Adaptors
•  Splunk Java SDK makes it easier to use the REST API
•  Building on this, the Spring Integration Adaptors make it easier for Spring/Java
developers to declaratively build data integration solutions and utilize the power of the
Splunk platform

•  https://github.com/SpringSource/spring-integration-extensions

•  Inbound Adaptor
–  Search and export the data from Splunk and push into message channels
–  Filter, transform, export to other destinations
•  Outbound Adaptor
–  Can consume data acquired by other Integration adaptors(Twitter, JDBC…) and
push it into Splunk for indexing, searching and visualization

63

Spring Integration Splunk Inbound Adaptor

•  Blocking, Non Blocking, Saved & Realtime Searches
•  Exporting

64

Spring Integration Splunk Outbound Adaptor

•  HTTP REST Input
•  TCP Input

65

XML Configuration
Common Splunk settings
<int-splunk:server id="splunkServer" host="somehost" port="8089" userName="damien"
password="foobar"/>

Searching/exporting from Splunk
<int-splunk:inbound-channel-adapter id="splunkInboundChannelAdapter" auto-startup="true"
search="search index=spring error OR exception" splunk-server-ref="splunkServer"
channel="inputFromSplunk" mode="blocking" initEarliestTime="-1d">

<int:poller fixed-rate="5" time-unit="SECONDS"/>

</int-splunk:inbound-channel-adapter>

Inputting events to Splunk
<int-splunk:outbound-channel-adapter id="splunkOutboundChannelAdapter" auto-startup="true"
order="1" channel="outputToSplunkWithMessageStore" splunk-server-ref="splunkServer"
pool-server-connection="true" index="spring" sourceType="twitter-feed"
source="spring-integration-httprest" ingest="submit">
</int-splunk:outbound-channel-adapter>

66

Spring Integration Splunk Twitter Demo

67

SplunkJavaLogging
•  A logging framework that lets developers integrate Splunk best-practice logging
semantics into their code as seamlessly as possible and transport events directly to Splunk.
•  Custom handler/appender implementations (REST and raw TCP) for the 3
most prevalent Java logging frameworks in play. Splunk events directly from
your code.
•  LogBack
•  Log4j
•  java.util.logging
•  Better handling of stack traces
•  All code and examples are on Github
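
The slide does not spell out what the best-practice logging semantics are. A common reading, and the assumption made here, is a timestamp followed by key="value" pairs, which are easy to extract at search time. The sketch below formats an event that way using only the JDK. It is not the SplunkJavaLogging API, and the class name, method name and field names are hypothetical.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class KeyValueEventSketch {

    // Formats a timestamp followed by key="value" pairs in the map's iteration order.
    // Double quotes inside values are escaped so the pairs stay parseable.
    static String format(Instant timestamp, Map<String, String> fields) {
        String pairs = fields.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue().replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(" "));
        return timestamp + " " + pairs;
    }

    public static void main(String[] args) {
        // Made-up event fields used only for illustration.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("event", "login");
        fields.put("user", "spring");
        fields.put("status", "failed");
        System.out.println(format(Instant.now(), fields));
    }
}
```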

69

Splunk for JMX
•  SplunkBase App for monitoring JVM Applications
•  Out of the box dashboards for JVM level monitoring (java.lang domain)
•  Memory, Threading, GC, CPU etc…
•  Very simple configuration to wire up monitoring of any MBeans from applications
(Tomcat, JBoss, Cassandra, Coherence etc…)
•  Hotspot, JRockit, IBM J9, OpenJDK
•  Poll JMX attributes and operations, index data over time, correlate with other data
•  Supports large scale deployments of JVMs
•  Extensible and Customizable
•  Many connectivity options
•  RMI, IIOP
•  Direct Process Attachment
•  MX4J Hessian, Burlap and SOAP
•  Freely available download from SplunkBase & all code is on Github
71

Learn More. Stay Connected.
At SpringOne 2GX :
•  Come by our booth
•  Splunk demos, Q & A
•  SDK code
•  Tee Shirts !!

Web :
•  Developer Platform : http://dev.splunk.com
•  SplunkBase : http://splunk-base.splunk.com
•  Twitter : @splunkdev , @damiendallimore
•  Email : devinfo@splunk.com , ddallimore@splunk.com
•  Blog : http://blogs.splunk.com/dev
•  Github : http://github.com/splunk
•  Splunk Live! Events and Online Videos at http://www.splunk.com

72
