How do you keep up with the velocity and variety of data streaming in and get analytics on it even before persistence and replication in Hadoop? In this talk, we'll look at common architectural patterns being used today at companies such as Expedia, Groupon and Zynga that take advantage of Splunk to provide real-time collection, indexing and analysis of machine-generated big data with reliable event delivery to Hadoop. We'll also describe how to use Splunk's advanced search language to access data stored in Hadoop and rapidly analyze, report on and visualize results.
2. Big Data Comes from Machines
Volume | Velocity | Variety | Variability
Machine-generated data is one of the fastest growing, most complex and most valuable segments of big data
GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops
3. What Does Machine Data Look Like?
Sources: Order Processing, Middleware Error, Care IVR, Twitter
4. Machine Data Contains Critical Insights
Sources:
• Order Processing: Customer ID, Order ID, Product ID
• Middleware Error: Order ID, Customer ID
• Care IVR: Time Waiting On Hold, Customer ID
• Twitter: Twitter ID, Customer's Tweet, Company's Twitter ID
5. Big Data Technologies
[Diagram: big data technologies over time, from relational databases (highly structured), through key/value, tables or other stores (semi-structured), to temporal, unstructured, heterogeneous data. Labels include: Single RDBMS, Bigger RDBMS, Sharding, SQL & Map/Reduce, NoSQL, Map/Reduce. Products shown: Aster Data, Greenplum, Cassandra, Voldemort, Big Table, CouchDB, Hadoop.]
6. Splunk Turns Machine Data into Real-time Insights
Optimized for real-time, low latency and interactivity
• Ad hoc search
• Monitor and alert
• Report and analyze
• Custom dashboards
• Developer Platform
Real-time Collection and Indexing
Splunk storage | Other Stores
7. Splunk Collects and Indexes Any Machine Data
No upfront schema. No RDBMS. No custom connectors.
Customer Facing Data
• Click-stream data
• Shopping cart data
• Online transaction data
Outside the Datacenter
• Manufacturing, logistics…
• CDRs & IPDRs
• Power consumption
• RFID data
• GPS data
Logfiles | Configs | Messages | Traps | Alerts | Metrics | Scripts | Changes | Tickets
Windows
• Registry
• Event logs
• File system
• sysinternals
Linux/Unix
• Configurations
• syslog
• File system
• ps, iostat, top
Virtualization & Cloud
• Hypervisor
• Guest OS, Apps
• Cloud
Applications
• Web logs
• Log4J, JMS, JMX
• .NET events
• Code and scripts
Databases
• Configurations
• Audit/query logs
• Tables
• Schemas
Networking
• Configurations
• syslog
• SNMP
• netflow
8. New Approach to Analyzing Heterogeneous Data
Universal Indexing
• No data normalization
• Automatically handles timestamps
• Parsers not required
• Index every term & pattern "blindly"
• No attempt to "understand" up front
Late Structure Binding
• Knowledge applied at search-time
• No brittle schema to work around
• Multiple views into the same data
Analysis and Visualization
• Normalization as it's needed
• Faster implementation
• Easy search language
• Multiple views into the same data
• Find transactions, patterns and trends
Rapid time-to-deploy: hours or days
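To make the late structure binding idea concrete, here is a minimal Python sketch. It is not Splunk code and does not use Splunk's search language; the sample events, the key=value convention and the helper names `extract_fields` and `search` are all assumptions made for illustration. Raw events are kept as plain text, and fields are only extracted when a search runs.

```python
import re

# Raw events are stored untouched; no schema is declared up front.
# These sample lines are invented for illustration.
RAW_EVENTS = [
    "2012-06-01T10:00:01 order_id=1001 customer_id=42 status=ok",
    "2012-06-01T10:00:05 ERROR middleware order_id=1001 customer_id=42",
    "2012-06-01T10:01:12 ivr customer_id=42 hold_seconds=310",
]

# Assumed convention: fields appear as key=value tokens in the raw text.
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(raw_event):
    """Apply structure at search time: pull key=value pairs out of raw text."""
    return dict(KV_PATTERN.findall(raw_event))


def search(events, **criteria):
    """Return the raw events whose extracted fields match every criterion."""
    matches = []
    for raw in events:
        fields = extract_fields(raw)
        if all(fields.get(key) == value for key, value in criteria.items()):
            matches.append(raw)
    return matches
```

Calling `search(RAW_EVENTS, customer_id="42")` returns all three events even though they come from different sources with different layouts, which is the "multiple views into the same data" point: a new field can be queried without re-ingesting anything.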
10. Operational Intelligence for IT and Business Users
Solution areas: IT Operations Management, Web Intelligence, Application Management, Business Analytics, Security & Compliance
Users: Customer Support, LOB Owners/Executives, Operations Teams, Website/Business Analysts, System Administrator, IT Executives, Development Teams, Security Analysts, Auditors
11. Scalability to Tens of TBs/Day on Commodity Servers
• Offload search load to Splunk Search Heads
• Auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day
• Send data from 1000s of servers using a combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols
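The forwarding tier described above can be pictured with a small Python sketch. This is a simplified round-robin illustration, not Splunk's actual load-balancing algorithm or configuration; the `Forwarder` class, the `forward_all` helper and the indexer names are hypothetical.

```python
from collections import defaultdict
from itertools import cycle


class Forwarder:
    """Hypothetical forwarder that spreads events over indexers round-robin."""

    def __init__(self, indexer_names):
        if not indexer_names:
            raise ValueError("at least one indexer is required")
        self._targets = cycle(indexer_names)
        # indexer name -> list of events delivered to it
        self.delivered = defaultdict(list)

    def send(self, event):
        """Deliver one event to the next indexer in the rotation."""
        target = next(self._targets)
        self.delivered[target].append(event)
        return target


def forward_all(events, indexer_names):
    """Send every event through one forwarder and report where each landed."""
    forwarder = Forwarder(indexer_names)
    for event in events:
        forwarder.send(event)
    return dict(forwarder.delivered)
```

With six events and three indexers, each indexer receives two events. Adding a fourth name to `indexer_names` spreads the same stream across more indexers without touching the sending side, which is the scaling property the slide is pointing at.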
12. Splunk Big Data Solution
Product-based Solution
• Easy to download and deploy
• Pre-integrated, end-to-end functionality
• Enterprise-grade features
Integrated and End-to-end
• Collects data from tens of thousands of sources
• Advanced real-time and historical analysis of data
• Fast, custom visualizations for IT and business users
• Developer APIs, SDKs
Performance at scale
• Proven at multi-terabyte scale per day
• Upwards of PB under management
• 4,000+ customers
13. Accelerate Games Releases with Big Data Insight
• Leading social gaming company globally
• 232 million monthly active users
• 60 million daily active users
Splunk Use:
– Over 10 TB/day from scaled-out cloud and physical infrastructure
– Data indexed includes web server and application logs for games
– Splunk for operational visibility, troubleshooting and monitoring
– Users include: game operations, developers, and corporate IT
Value Delivered:
– Faster game releases with real-time visibility into production issues
– Reduced fault resolution time from hours to minutes
– Scale ops team to manage and monitor growing infrastructure
14.
• Launched in November 2008
• Over 33 million active customers (as of December 2011)
• More than 11,000 employees worldwide
• Active in 48 countries
• Running over 1,000 deals/day worldwide
15. Daily Uses of Splunk
Key Activities
• Guarantee API performance
• Monitor API data usage
• Early access to key business metrics (conversions, funnel, etc.)
• End-to-end testing
Splunk Use Cases
• All log data is available through Splunk
• Dashboards
• Notifications
• Near real-time
• Ad hoc troubleshooting
"Cannot have a server that is not sending data into Splunk"
17. Complementing BI and Hadoop
Collection & Operational Intelligence
• Daily, weekly, monthly metrics across promotions offers and acceptance rates
• Application Performance Management (APM) and system availability
Hadoop Integration
• Machine Data ETL: highly reliable data delivery to HDFS
Data Archival & Batch Data Science
• Long-term data warehousing and specialized, batch analytics
19. Who Am I?
Eddie Satterly
Formerly Sr. Director, Architecture & Engineering, Expedia
Who Is Expedia?
• The World's Largest Travel Site
• Discount travel site Hotwire®
• First $1B Quarter in 2011
• 4,000+ Technology Workers
• 90 localized Expedia.com® and Hotels.com® sites
• Development Team of 1,800
• NASDAQ: (EXPE)
20. Where Splunk Comes In
• 12,000+ Servers
• 27,000+ Hosts
• 1,000+ Source Types
• 227,000 Sources
• 38 Indexers, 16 Search heads
• > 6.5TB per day indexed
• 20+ Different Solutions for RCA, All Migrated to Splunk in 3 Months
21. Why Splunk?
• SDK Integrations built for Cassandra data stores
• Archiving Data to Hadoop for batch analysis
• Speed of Deployment
• Splunkbase Apps Available for Download
• Scales via Commodity Hardware
• Developers Build Custom Apps and Dashboards
• Aggregation of Log Data from Any Device
• Simple UI for IT and Business Users
22. Splunk Adoption Over Ten Months
Initial Pilot
• Use case: Business Unit
• Data: 125GB/day
• Systems: 1100
• Deployment: Jan. 2011
Viral Growth from Demonstrated Value
• Use case: Ecommerce Systems
• Data: 1.8TB/day
• Systems: 8700
• Deployment: March 2011
All Devices, All Data Centers
• Use case: All Devices
• Data: ~4TB/day
• Systems: ~21000
• Deployment: Aug. 2011
Big Data Integration
• Use case: App Transactions
• Data: 3TB/day
• Systems: 90TB Data Per Mo.
• Deployment: 1Q12-2Q12
23. Integrate External Data
Extend search with lookups to external data sources.
• LDAP, AD
• Watch Lists
• CMDB
• Message Stores
• Reference Lookups
Correlate across multiple data sources and data sets using indexes and keys
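A lookup of this kind amounts to joining events to a reference table on a shared key. The Python sketch below shows that idea only; it does not use Splunk's lookup mechanism, and the `USER_LOOKUP` table, the sample events and the `enrich` helper are hypothetical stand-ins for something like an LDAP/AD or watch-list export.

```python
# Hypothetical reference table, standing in for an LDAP/AD or watch-list export.
USER_LOOKUP = {
    "jdoe": {"department": "Finance", "on_watch_list": False},
    "asmith": {"department": "Engineering", "on_watch_list": True},
}

# Invented events that carry only a user name, not the reference attributes.
EVENTS = [
    {"user": "jdoe", "action": "login", "host": "web01"},
    {"user": "asmith", "action": "sudo", "host": "db02"},
    {"user": "unknown", "action": "login", "host": "web01"},
]


def enrich(events, lookup, key):
    """Join each event to the lookup table on `key`.

    Events with no matching lookup row pass through unchanged, and the
    input events are not modified.
    """
    enriched = []
    for event in events:
        extra = lookup.get(event.get(key), {})
        enriched.append({**event, **extra})
    return enriched
```

After `enrich(EVENTS, USER_LOOKUP, "user")`, the second event carries `on_watch_list: True`, so a search can filter on an attribute that never appeared in the original log line.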
24. Unique Characteristics of Splunk MapReduce
• Real-time temporal MapReduce
• Preview in-progress searches
• Searching works on any devices
• Simplified Search Language
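As a rough illustration of temporal MapReduce with previews, the Python sketch below maps each event to a per-minute bucket and reduces by counting, yielding partial totals while the scan is still in progress. It is an assumption-laden toy, not Splunk's implementation: the sample events, the one-minute bucket size, and the names `map_event` and `reduce_with_preview` are all invented here.

```python
from collections import Counter
from datetime import datetime

# Invented (timestamp, status) events for illustration.
EVENTS = [
    ("2012-06-01T10:00:01", "ok"),
    ("2012-06-01T10:00:40", "error"),
    ("2012-06-01T10:01:12", "error"),
    ("2012-06-01T10:01:30", "ok"),
    ("2012-06-01T10:01:59", "error"),
]


def map_event(timestamp, status):
    """Map step: emit a (minute bucket, status) key with a count of 1."""
    minute = datetime.fromisoformat(timestamp).strftime("%H:%M")
    return (minute, status), 1


def reduce_with_preview(events, preview_every=2):
    """Reduce step: sum counts per key, yielding a snapshot every few events."""
    totals = Counter()
    for n, (timestamp, status) in enumerate(events, start=1):
        key, count = map_event(timestamp, status)
        totals[key] += count
        if n % preview_every == 0:
            yield dict(totals)  # partial result while the scan is running
    yield dict(totals)  # final result
```

Iterating over `reduce_with_preview(EVENTS)` gives a caller usable counts before the whole data set has been read, which is the behavior the "preview in-progress searches" bullet describes.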
25. Splunk Impact / Top Takeaways
Splunk helped deliver Expedia an annual ROI of over $11 Million
ROI = 5x original Business Case
• Tools Consolidation and Retirement
• 83% MTTR Reduction
• Outage Avoidance
Splunk usage is viral
• 50+ Apps Developed by Our Team
• Over 1,400 Users on a Regular Basis
More data = more benefits
• Adding more data to Splunk via weekly deployments
• Analyzing more data sets in Splunk UI from Hadoop & Cassandra