This document discusses data democratization using Splunk. It describes how Splunk can be used to provide universal access to data through delegated access models, standardized data models, and automation. Key points include:
1. Splunk can implement a delegated access model using apps, indexes, and user roles to securely share sensitive data.
2. Standardized data models and semantic logging help combat knowledge fragmentation and enable consistent analysis.
3. Automating data onboarding and validation helps improve adoption by reducing backlogs and ensuring data quality.
2. About Me
• Splunking Since 2008
• Largest Splunk Implementation:
• 3 TB/day
• 1.2 PB Searchable
• 900 Users
• Interests:
• Guitars
• And the occasional Uke
3. What is Splunk?
• Google Search for IT Data?
• Log aggregation Tool?
• Data Visualisation Tool?
• Data Platform with App Creation Capabilities
• Proprietary Search Language - SPL
• Correlation of Structured and Unstructured Data Sources
• Visualisation capabilities
• Out of the Box
• Modular
4. Getting Data In
[Diagram] Data flow: unstructured and structured (JSON, CSV, XML) data sources reach the indexer via forwarders or the HTTP Event Collector (HEC). The indexer pipeline performs line breaking, timestamp recognition and data segmentation, then persists events to disk. An index is made up of buckets, each holding keywords and raw data.
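The line breaking and timestamp recognition steps are driven by per-sourcetype parsing settings. A minimal sketch, assuming a hypothetical sourcetype named web:access (the name and timestamp format are illustrative, not from the deck):

```ini
# props.conf - illustrative parsing settings for the pipeline steps above
[web:access]
# Line breaking: each newline-delimited line is one event
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
# Timestamp recognition: e.g. [10/Oct/2017:13:55:36 +0000]
TIME_PREFIX = ^\[
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 30
```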
5. Data Collection using Splunk
Forwarder
• Splunk forwarder capabilities
• File based Inputs
• Database Inputs
• Scripted Inputs
• Forwarder Configurations deployed as modular add-ons
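A file-based and a scripted input as they might appear in a forwarder add-on (paths, index and sourcetype names are illustrative assumptions):

```ini
# inputs.conf on a Universal Forwarder
# File-based input: tail the application's access log
[monitor:///var/log/myapp/access.log]
index = my_product
sourcetype = web:access

# Scripted input: run a collection script every 5 minutes
[script://./bin/db_row_count.sh]
interval = 300
index = my_product
sourcetype = db:metrics
```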
6. Typical Splunk Search
index=<my_product> sourcetype=web.access checkout | stats avg(response_time) as "Average Response Time" by request
7. Searching Data
[Diagram] Search execution: indexers handle the "map" phase - querying the index by keyword, loading raw results into memory, applying data extractions, transformations and lookups (knowledge objects), and running streaming commands. Search heads handle the "reduce" phase - receiving and reducing results, running any additional commands, then visualising, reporting or alerting.
8. So what about Knowledge Objects?
• Most Knowledge Objects are configurable from UI
• Common Types:
• Field Extractions - regex to extract fields
• Field Aliases - alternate names for fields
• Lookups - backed by flat files or the KV store
• Tags - provide an event-grouping abstraction
• Eventtypes - provide event categorisation
• Calculated Fields - derive fields via eval expressions
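Several of these knowledge objects live in props.conf and transforms.conf. A sketch, assuming the same hypothetical web:access sourcetype and a db_locations lookup file (all names here are illustrative):

```ini
# props.conf - field extraction, field alias and automatic lookup
[web:access]
EXTRACT-customer = customer_id=(?<customer_id>\d+)
FIELDALIAS-resp = resp_time_milliseconds AS response_time_ms
LOOKUP-location = db_locations customer_id OUTPUT location

# transforms.conf - the file-based lookup definition
[db_locations]
filename = db_locations.csv
```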
9. Goal?
• Queries like:
index=<my_website> "/checkout/auth/confirmation" | rex "<some humungous regex that extracts customer id in addition to other things>" | eval response_time_seconds = resp_time_milliseconds/1000 | where http_code == 200 | lookup db_locations customer_id OUTPUT location | stats avg(response_time_seconds) as avg_response_time by location
• Become:
eventtype=auth_successful tag=web | stats avg(response_time_seconds) as average_response_time by location
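The shortened query is backed by an eventtype and a tag defined once and shared. A sketch of how these could be declared (the base search is abbreviated from the long query above):

```ini
# eventtypes.conf - encapsulate the long base search under one name
[auth_successful]
search = index=<my_website> "/checkout/auth/confirmation" http_code=200

# tags.conf - tag the eventtype so related events group under "web"
[eventtype=auth_successful]
web = enabled
```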
12. Scenario
• Microservices Architecture
• Numerous Development Teams working under different service
umbrellas
• Mix of legacy systems with modern services
• Dependence on vendor integrations
• Data can be sensitive
13. Typical Data Democratisation Issues
• Security - some data is sensitive yet valuable, but we'd like an open access model
• Knowledge Fragmentation - it's our data, let's make sure everyone knows what it means
• Adoption - people need to like it; it shouldn't get in the way
• Scalability
• Chargeback - it's not my data, why should I pay for it?
14. Security - Delegated Access Model
• Splunk Search Apps can serve knowledge containers
• Knowledge Object ownership can be scoped local to the app or global to the entire system.
• Splunk Indexes are data containers.
• Data Access granted by index
• Assign an app per product or service umbrella
• Assign Data Owner
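Index-level access is granted through roles. A minimal sketch of a role confined to one product's index (role and index names are illustrative assumptions):

```ini
# authorize.conf - a role granting search access only to one index
[role_my_product_user]
importRoles = user
srchIndexesAllowed = my_product
srchIndexesDefault = my_product
```

Mapping single sign-on groups to roles like this one is what makes the delegated access model manageable at scale.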
16. Splunk Security Must Have!
• Splunk Authentication is Poor
• No Password Policy
• No Centralised management for multiple search nodes
• Single Sign On - Splunk supports:
• Ping Identity
• Okta
• ADFS
• Azure AD
• LDAP
• Custom Auth
• Use an entitlement framework on top of single sign-on groups
17. Combating Knowledge Fragmentation
• Semantic Logging:
• Logging for the sole purpose of analytics
• Rich datasets can be viewed in multiple dimensions
• Define Developer Guidelines:
• Ensure Correlation Identifiers are present in all events
• Precision Timestamps
• Incorporate Logging into SDLC
• Standardise Logging Formats
• Standardise Log content per service - e.g. BAM metrics
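The guidelines above can be sketched in application code. A minimal example (function and field names are illustrative, not from the deck) that emits one JSON event per line with a correlation identifier and a precision timestamp:

```python
import json
import logging
import time
import uuid

def semantic_event(service, action, **fields):
    """Build a semantic log event per the guidelines above:
    a correlation identifier, a precision timestamp, and a
    standardised JSON format that Splunk parses natively."""
    event = {
        "timestamp": round(time.time(), 6),  # epoch seconds, microsecond precision
        "correlation_id": str(uuid.uuid4()),  # propagate across services in practice
        "service": service,
        "action": action,
    }
    event.update(fields)
    return json.dumps(event)

# One event per line; a forwarder or HEC picks these up.
logging.basicConfig(format="%(message)s", level=logging.INFO)
logging.info(semantic_event("checkout", "auth_confirmed",
                            customer_id=42, response_time_ms=118))
```

In a real system the correlation_id would be received from the upstream caller rather than generated per event, so one transaction can be traced across services.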
18. Combating Knowledge Fragmentation
Reality - not all logs can be logged semantically, at least not without significant refactoring.
Splunk Solution - Data Models
19. Data Models
• Enable go go gadget "schema on the fly"
• Hierarchically structured search-time mapping of semantic
knowledge.
• Accessed via Datasets tab in Splunk 6.5
20. Example: Splunk CIM
• Splunk Common Information Model (CIM)
• Collection of Data Models based on subject area
• Shared Semantic model
• Support consistent and normalised treatment of data
• Enables third-party apps to be integrated with your data.
• Reference Tables:
http://docs.splunk.com/Documentation/CIM/4.6.0/User/Howtousethesereferencetables
22. Pivot
• UI Developed to enable the creation of analytics off structured data
models
• Supports:
• Tables
• Charts - Line, Scatter, Column, Bar, Bubble, Pie
• Single Value Visualisations
24. Performance
• Data Models can be accelerated, which can lead to:
• Decreased search optimisation effort
• Decreased dashboard optimisation effort
• Increased storage requirements
• Speed-ups of up to 1000x
• Speed is dependent on the cardinality of the data
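Accelerated data models are queried with tstats against the acceleration summaries rather than the raw events. A sketch, assuming an accelerated CIM Web data model (response_time, status and site are CIM Web fields):

```
| tstats summariesonly=true avg(Web.response_time) as avg_response_time
    from datamodel=Web where Web.status=200 by Web.site
```

summariesonly=true restricts the search to pre-summarised data, which is where the large speed-ups come from.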
25. Notable Splunk Apps on CIM
• Splunk Enterprise Security
• Splunk PCI Compliance
• Insight Engines - Search Splunk using Natural Language
26. Adoption
• Most users complain about backlogs when onboarding data
• Automating the onboarding process isn’t as easy as it sounds. Data Validation is key to deriving value.
• Universal Forwarder:
• Standardise Log Locations
• Standardise Time Stamps
• HTTP Event Collector:
• Send data directly from your application to Splunk
• Utilise Indexer Acknowledgement
• Notable implementations:
• Docker - Splunk Logging Driver
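Sending directly via HEC needs only an HTTP POST of a JSON envelope. A minimal stdlib sketch (the endpoint URL, token, index and sourcetype are hypothetical placeholders):

```python
import json
import urllib.request

HEC_URL = "https://splunk.example.com:8088/services/collector/event"  # hypothetical endpoint
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # hypothetical token

def build_hec_payload(event, index="my_product", sourcetype="app:json"):
    """Wrap an application event in the envelope the HEC event endpoint expects."""
    return json.dumps({
        "event": event,
        "index": index,
        "sourcetype": sourcetype,
    })

def send_event(event):
    """POST one event to HEC, authenticating with the token header."""
    req = urllib.request.Request(
        HEC_URL,
        data=build_hec_payload(event).encode("utf-8"),
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
    )
    # With indexer acknowledgement enabled on the token, the response
    # carries an ackId that can be polled to confirm the event was indexed.
    return urllib.request.urlopen(req)
```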
27. Newish Splunk Features
• Machine Learning Toolkit
• Comes with built-in assistants for supported algorithms
• Extend algorithms available - python sci-kit learn
• ITSI
• Modular Visualisations
• New Custom Search Command Creation Capability
• TSIDX Reduction - Decrease Storage Costs