Copyright © 2013 Splunk Inc.

Hunk: Technical Deep Dive
Legal Notices
During the course of this presentation, we may make forward-looking statements regarding future events or the
expected performance of the company. We caution you that such statements reflect our current
expectations and estimates based on factors currently known to us and that actual events or results could differ
materially. For important factors that may cause actual results to differ from those contained in our forward-looking
statements, please review our filings with the SEC. The forward-looking statements made in this presentation are
being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation
may not contain current or accurate information. We do not assume any obligation to update any forward-looking
statements we may make. In addition, any information about our roadmap outlines our general product direction
and is subject to change at any time without notice. It is for informational purposes only and shall not be
incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the
features or functionality described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of
Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective
owners.

©2013 Splunk Inc. All rights reserved.

The Problem
• Large amounts of data in Hadoop
– Relatively easy to get the data in
– Hard & time-consuming to get value out
• Splunk has solved this problem before
– Primarily for event time series
• Wouldn't it be great if Splunk could be used to analyze Hadoop data?

Hadoop + Splunk = Hunk
The Goals
• A viable solution must:
– Process the data in place
– Maintain support for the Splunk Processing Language (SPL)
– Support true schema on read
– Provide report previews
– Be easy to set up and use
Goals: Support SPL
• Naturally suitable for MapReduce
• Reduces adoption time
• Challenge: Hadoop “apps” are written in Java, but all SPL code is in C++
• Porting SPL to Java would be a daunting task
• Reuse the C++ code somehow
– Use “splunkd” (the binary) to process the data
– JNI is neither easy nor stable
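The “reuse splunkd” option amounts to a pipe-based integration: the Java task writes records to the native binary's stdin and reads results from its stdout, so no JNI bindings are involved. Below is a minimal Python sketch of that pattern. It is illustrative only: the Python interpreter stands in for splunkd, and `run_native_filter` is a hypothetical helper, not part of Hunk.

```python
import subprocess
import sys
from typing import List, Sequence

# Stand-in for a native binary such as splunkd (assumption for this sketch):
# a tiny child program that upper-cases every line read from stdin.
CHILD_PROGRAM = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
)


def run_native_filter(records: Sequence[str], command: Sequence[str]) -> List[str]:
    """Send records to an external process over stdin and return its stdout lines.

    Hypothetical helper illustrating the pipe pattern, not Hunk's implementation.
    """
    payload = "".join(record + "\n" for record in records)
    completed = subprocess.run(
        list(command),
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )
    return completed.stdout.splitlines()


if __name__ == "__main__":
    child = [sys.executable, "-c", CHILD_PROGRAM]
    print(run_native_filter(["error disk full", "ok"], child))
```

`subprocess.run` buffers the whole input and output in memory; an integration handling large input splits would stream through `subprocess.Popen` instead.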
Goals: Schema on Read
• Apply Splunk's index-time schema at search time
– Event breaking, timestamping, etc.
• Anything else would be brittle and a maintenance nightmare
• Extremely flexible
• Runtime overhead (manpower costs >> computation costs)
• Challenge: Hadoop “apps” are written in Java, but all index-time schema logic is implemented in C++
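To make “schema on read” concrete: the raw bytes stay untouched in HDFS, and event-breaking and timestamping rules are applied only when data is read for a search, so changing a rule changes the next search's results without rewriting any data. The Python sketch below applies both steps at read time. The timestamp format and the rule that untimestamped lines continue the previous event are assumptions for illustration; Splunk's real rules are configurable per source type and run in C++.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Assumption for this sketch: every event starts with "YYYY-MM-DD HH:MM:SS";
# lines without a leading timestamp (e.g. stack traces) continue the previous event.
EVENT_START = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def break_events(raw_text: str) -> List[Tuple[datetime, str]]:
    """Apply event breaking and timestamping to raw text at read time."""
    events: List[List] = []  # each item is [timestamp, event_text]
    for line in raw_text.splitlines():
        match = EVENT_START.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append([ts, line])
        elif events:
            events[-1][1] += "\n" + line  # multi-line event
        # Lines before the first timestamp are dropped in this sketch.
    return [(ts, text) for ts, text in events]


if __name__ == "__main__":
    raw = (
        "2013-06-10 01:15:00 ERROR request failed\n"
        "    at com.example.Handler.run\n"
        "2013-06-10 01:16:02 INFO request ok\n"
    )
    for ts, text in break_events(raw):
        print(ts.isoformat(), "|", text.replace("\n", " / "))
```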
Goals: Previews
• No one likes to stare at a blank screen!
• Challenge: Hadoop is designed for batch-like jobs
Goals: Ease of Setup and Use
• Users should just specify:
– The Hadoop cluster they want to use
– The data within the cluster they want to process
• Users should then immediately be able to explore and analyze their data
Deployment Overview
Move Data to Computation (Stream)
• Move data from HDFS to the search head (SH)
• Process it in a streaming fashion
• Visualize the results
• Problem?
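In the streaming model, the search head pulls raw data and updates results record by record, so partial results are available almost immediately. A minimal Python sketch of such a streaming aggregation with periodic previews follows; the record shape and the `preview_every` parameter are assumptions. What it does not remove is the cost hinted at by the slide's “Problem?”: presumably, every record still has to travel from HDFS to a single search head, whose network and CPU then cap throughput.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def stream_count_by(records: Iterable[Dict[str, str]], field: str,
                    preview_every: int = 1000) -> Iterator[Dict[str, int]]:
    """Count records by `field` in one pass over a stream.

    Yields a snapshot of the running counts every `preview_every` records
    (a preview the UI could render immediately) and a final snapshot at the end.
    """
    counts: Counter = Counter()
    for seen, record in enumerate(records, start=1):
        counts[record.get(field, "unknown")] += 1
        if seen % preview_every == 0:
            yield dict(counts)  # preview of the partial result
    yield dict(counts)  # final result


if __name__ == "__main__":
    rows = [{"host": "host1"}, {"host": "host2"}, {"host": "host1"}]
    for snapshot in stream_count_by(rows, "host", preview_every=2):
        print(snapshot)
```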
Move Computation to Data (MR)
• Create and start a MapReduce job to do the processing
• Monitor the MR job and collect its results
• Merge the results and visualize them
• Problem?
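Moving computation to the data means each task reduces its own split to a small partial result, and only those partial results travel to the search head to be merged. The sketch below models this in plain Python, with a thread pool standing in for the cluster; it is not Hadoop code, and the log format is an assumption. The likely catch behind the slide's “Problem?” is latency: nothing is visible until the job has been scheduled and its tasks have finished.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def map_split(lines: List[str]) -> Counter:
    """Map side: runs next to one input split and returns a small partial result.

    Assumes (for this sketch) that the first whitespace-separated token is the host.
    """
    counts: Counter = Counter()
    for line in lines:
        tokens = line.split()
        if tokens:
            counts[tokens[0]] += 1
    return counts


def run_job(splits: List[List[str]]) -> Dict[str, int]:
    """Run one task per split in parallel, then merge the partial results,
    as the search head does after collecting an MR job's output."""
    with ThreadPoolExecutor() as pool:  # threads stand in for TaskTrackers
        partials = list(pool.map(map_split, splits))
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts
    return dict(merged)


if __name__ == "__main__":
    splits = [["host1 GET /", "host2 GET /a"], ["host1 POST /b", ""]]
    print(run_job(splits))
```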
Mixed Mode
• Use both computation models concurrently
[Figure: Stream and MR timelines running concurrently, both producing previews, with a switch-over time from Stream to MR]
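Mixed mode can be read as a race between the two models: streaming supplies early previews while the MapReduce job warms up, and once the job's results are available the search switches over to them. The Python sketch below is a simplified model of that idea, not Hunk's scheduler; the sum-based “search” and the `slow_total` stand-in job are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Tuple


def mixed_mode(stream_chunks: Iterable[List[int]],
               batch_job: Callable[[], int]) -> Iterator[Tuple[str, int]]:
    """Run a streaming pass and a batch job concurrently.

    Yields ("preview", partial_sum) from the streaming pass until the batch job
    has finished, then switches over and yields ("final", batch_result).
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(batch_job)  # start the "MR job" in the background
        partial = 0
        for chunk in stream_chunks:
            if future.done():
                break  # switch over: the batch result is ready
            partial += sum(chunk)
            yield ("preview", partial)
        yield ("final", future.result())  # waits if the batch job is still running


def slow_total() -> int:
    """Stand-in for an MR job computing the sum over all the data."""
    time.sleep(0.05)
    return 15


if __name__ == "__main__":
    for kind, value in mixed_mode([[1, 2], [3], [4, 5]], slow_total):
        print(kind, value)
```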
First Search Setup
1. Copy the splunkd package (.tgz) from the Hunk search head to HDFS (hdfs://<working dir>/packages)
2. Copy the .tgz to the TaskTrackers (TaskTracker 1, TaskTracker 2)
3. Expand it in the specified location on each TaskTracker
4. TaskTracker 3 receives the package in subsequent searches
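The point of the first-search setup is that the expensive steps happen once: the package is copied to HDFS once, and each TaskTracker expands it once and reuses it afterwards. A minimal Python sketch of the per-node “expand once” step follows, using local directories in place of HDFS and TaskTracker disks; the helper name, the directory naming and the `.expanded` marker file are assumptions, not Hunk's actual layout.

```python
import shutil
import tarfile
from pathlib import Path


def ensure_package(package: Path, install_root: Path) -> Path:
    """Expand a .tgz package under install_root unless an earlier search already did.

    Hypothetical helper: the directory naming (package name minus ".tgz") and the
    ".expanded" marker file are assumptions for this sketch, not Hunk's layout.
    """
    name = package.name[:-len(".tgz")] if package.name.endswith(".tgz") else package.stem
    target = install_root / name
    if (target / ".expanded").exists():
        return target  # subsequent searches reuse the expanded copy

    staging = install_root / (name + ".tmp")
    shutil.rmtree(staging, ignore_errors=True)  # clear leftovers of a failed attempt
    staging.mkdir(parents=True)
    with tarfile.open(package, "r:gz") as tgz:
        if hasattr(tarfile, "data_filter"):
            tgz.extractall(staging, filter="data")  # reject unsafe member paths
        else:
            tgz.extractall(staging)
    (staging / ".expanded").touch()
    shutil.rmtree(target, ignore_errors=True)
    staging.rename(target)  # publish only a fully expanded directory
    return target
```

Expanding into a staging directory and renaming it at the end means a search that crashes mid-extraction never leaves a half-expanded package that later searches would mistake for a complete one.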
Data Flow
Streaming Data Flow
[Figure: on the search head, the ERP (External Results Provider) reads raw data from HDFS (data transfer) and passes preprocessed data via stdin to the search process, which produces the final search results]
Reporting Data Flow
[Figure: on each TaskTracker, a MapReduce task reads raw data from HDFS and passes preprocessed data to a search process, which produces remote results; on the search head, the ERP collects the remote results and feeds them to its search process, which produces the final search results]
Data Processing
[Figure: on each DataNode/TaskTracker, raw data (HDFS) goes through custom processing (MapReduce/Java) and is passed via stdin to splunkd (C++), which runs the indexing pipeline (event breaking, timestamping) and the search pipeline (event typing, lookups, tagging, search processors); results return via stdout and are compressed and written to HDFS as intermediate results]
• You can plug in data preprocessors, e.g. Apache Avro or other format readers
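The “plug in data preprocessors” point can be pictured as a registry that picks a reader by file type and emits plain text lines for splunkd's stdin. The Python sketch below is hypothetical throughout: the registry, the decorator and the key=value output format are illustrative choices, and reading Avro would require a third-party library that is not shown.

```python
import json
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical plug-in registry: file suffix -> preprocessor that turns records
# into plain text lines suitable for a splunkd process's stdin.
Preprocessor = Callable[[Iterable[str]], Iterator[str]]
PREPROCESSORS: Dict[str, Preprocessor] = {}


def register(suffix: str) -> Callable[[Preprocessor], Preprocessor]:
    """Decorator that registers a preprocessor for files ending in `suffix`."""
    def decorator(fn: Preprocessor) -> Preprocessor:
        PREPROCESSORS[suffix] = fn
        return fn
    return decorator


@register(".json")
def json_lines_to_kv(lines: Iterable[str]) -> Iterator[str]:
    """Flatten one JSON object per line into sorted key=value text."""
    for line in lines:
        if line.strip():
            record = json.loads(line)
            yield " ".join(f"{key}={value}" for key, value in sorted(record.items()))


@register(".log")
def passthrough(lines: Iterable[str]) -> Iterator[str]:
    """Plain-text logs need no preprocessing."""
    for line in lines:
        yield line.rstrip("\n")


def preprocess(path: str, lines: Iterable[str]) -> Iterator[str]:
    """Pick the registered preprocessor for `path` by its suffix."""
    for suffix, fn in PREPROCESSORS.items():
        if path.endswith(suffix):
            return fn(lines)
    raise ValueError(f"no preprocessor registered for {path}")
```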
Schematization
[Figure: inside splunkd (C++), raw data passes through parsing, merging, typing and the IndexerPipe, then through the search pipeline to produce results]
Search – Search Head
• Responsible for:
– Orchestrating everything
– Submitting MR jobs (optionally splitting bigger jobs into smaller ones)
– Merging the results of MR jobs, potentially with results from other virtual indexes (VIXes) or native indexes
– Handling high-level optimizations
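Two of the search head's duties, splitting a large job and merging partial results from several sources, can be sketched in a few lines of Python. This assumes count-style results (such as the output of `stats count by host`) that merge by addition; other aggregations, such as averages or distinct counts, need different merge logic. The function names and the fixed paths-per-job policy are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def split_into_jobs(paths: Sequence[str], max_paths_per_job: int) -> List[List[str]]:
    """Split one large search's input paths into several smaller MR jobs."""
    if max_paths_per_job < 1:
        raise ValueError("max_paths_per_job must be at least 1")
    return [list(paths[i:i + max_paths_per_job])
            for i in range(0, len(paths), max_paths_per_job)]


def merge_results(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Merge count-style results from MR jobs, other virtual indexes or native indexes."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # adds counts key by key
    return dict(total)
```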
Optimization: Partition Pruning
• Data is usually organized into hierarchical directories, e.g.
/<base_path>/<date>/<hour>/<hostname>/somefile.log
• Hunk can be instructed to extract fields and time ranges from a path
• It then ignores directories that cannot possibly contain search results
Optimization: Partition Pruning Example
Paths in a VIX:
/home/hunk/20130610/01/host1/access_combined.log
/home/hunk/20130610/02/host1/access_combined.log
/home/hunk/20130610/01/host2/access_combined.log
/home/hunk/20130610/02/host2/access_combined.log
Search: index=hunk server=host1
Paths searched:
/home/hunk/20130610/01/host1/access_combined.log
/home/hunk/20130610/02/host1/access_combined.log
Optimization: Partition Pruning Example
Paths in a VIX:
/home/hunk/20130610/01/host1/access_combined.log
/home/hunk/20130610/02/host1/access_combined.log
/home/hunk/20130610/01/host2/access_combined.log
/home/hunk/20130610/02/host2/access_combined.log
Search: index=hunk earliest_time="2013-06-10T01:00:00" latest_time="2013-06-10T02:00:00"
Paths searched:
/home/hunk/20130610/01/host1/access_combined.log
/home/hunk/20130610/01/host2/access_combined.log
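The two pruning examples can be reproduced with a short Python sketch: parse each path into field values and the hour it covers, then drop every directory that contradicts the search's field constraints or lies entirely outside its time range. The hard-coded `/home/hunk` layout and the half-open treatment of the time range are assumptions chosen to match the slides; Hunk itself is instructed which fields and time ranges to extract rather than running code like this.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Sequence

# Layout taken from the slides: /home/hunk/<YYYYMMDD>/<HH>/<server>/<file>.
# Hard-coding it in a regex is an assumption of this sketch.
PATH_RE = re.compile(
    r"^/home/hunk/(?P<date>\d{8})/(?P<hour>\d{2})/(?P<server>[^/]+)/[^/]+$"
)


def partition_info(path: str) -> Optional[dict]:
    """Extract fields and the covered time range [start, end) from a path."""
    m = PATH_RE.match(path)
    if m is None:
        return None
    start = datetime.strptime(m["date"] + m["hour"], "%Y%m%d%H")
    return {"fields": {"server": m["server"]}, "start": start,
            "end": start + timedelta(hours=1)}


def prune(paths: Sequence[str], fields: Optional[Dict[str, str]] = None,
          earliest: Optional[datetime] = None,
          latest: Optional[datetime] = None) -> List[str]:
    """Keep only paths that could contain events matching the search.

    The search time range is treated as half-open: [earliest, latest).
    """
    fields = fields or {}
    kept = []
    for path in paths:
        info = partition_info(path)
        if info is None:
            kept.append(path)  # unknown layout: cannot prune safely
            continue
        path_fields = info["fields"]
        if any(k in path_fields and path_fields[k] != v for k, v in fields.items()):
            continue  # a path-derived field contradicts the search
        if earliest is not None and info["end"] <= earliest:
            continue  # directory ends before the search window starts
        if latest is not None and info["start"] >= latest:
            continue  # directory starts after the search window ends
        kept.append(path)
    return kept
```

Fields that cannot be derived from the path (for example a `status` field inside the log) never cause pruning, because a directory can only be skipped when its path proves it holds no matching events.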
Best Practices
• Partition data in the file system using fields that:
– are commonly used
– have relatively low cardinality
• For new data, use well-defined formats, e.g.
– Avro, JSON, etc.
– avoid delimited formats such as CSV/TSV (hard to split)
• Use compression (gzip, Snappy, etc.)
– I/O becomes a bottleneck at scale
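The three practices combine naturally when writing new data: choose the directory from commonly searched, low-cardinality fields, store records in a well-defined format, and compress each file. The Python sketch below writes gzip-compressed JSON lines into date/hour/host directories. The field names `time` and `host` and the file name `events.json.gz` are assumptions. One caveat: a gzip file cannot be split across Hadoop tasks, so large data may be better served by many moderately sized files or by a splittable container such as Avro.

```python
import gzip
import json
from collections import defaultdict
from datetime import datetime
from pathlib import Path
from typing import Dict, Iterable, List


def partition_path(base: Path, event: Dict[str, str]) -> Path:
    """Map an event to <base>/<YYYYMMDD>/<HH>/<host>/events.json.gz.

    Date, hour and host are examples of commonly searched, low-cardinality fields.
    """
    ts = datetime.fromisoformat(event["time"])
    return base / ts.strftime("%Y%m%d") / ts.strftime("%H") / event["host"] / "events.json.gz"


def write_partitioned(base: Path, events: Iterable[Dict[str, str]]) -> List[Path]:
    """Write events as gzip-compressed JSON lines, one file per partition."""
    batches: Dict[Path, List[Dict[str, str]]] = defaultdict(list)
    for event in events:
        batches[partition_path(base, event)].append(event)
    for path, batch in batches.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        # "at" appends; each call adds a gzip member, which gzip readers handle.
        with gzip.open(path, "at", encoding="utf-8") as fh:
            for event in batch:
                fh.write(json.dumps(event) + "\n")
    return sorted(batches)
```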
Troubleshooting
• search.log is your friend!
• Log lines are annotated with ERP.<name> …
• search.log contains links to the spawned MR job(s); follow these links to troubleshoot MR issues
• hdfs://<base_path>/dispatch/<sid>/<num>/<dispatch_dirs> contains the dispatch directory contents of the searches run on the TaskTrackers
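When search.log is long, a small filter helps pull out the ERP lines and any job links. The Python sketch below assumes, without having verified the exact log format, that provider lines contain the literal token `ERP.<name>` and that job links appear as plain http(s) URLs; the sample lines are made up for illustration.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical patterns: the real search.log format may differ.
ERP_RE = re.compile(r"\bERP\.(?P<name>[\w-]+)")
URL_RE = re.compile(r"https?://\S+")


def summarize_search_log(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group ERP-annotated lines by provider name, collecting any URLs in them
    (for example, links to spawned MR jobs)."""
    summary: Dict[str, List[str]] = {}
    for line in lines:
        match = ERP_RE.search(line)
        if match is None:
            continue  # not an ERP line
        summary.setdefault(match["name"], []).extend(URL_RE.findall(line))
    return summary


if __name__ == "__main__":
    sample = [  # made-up lines, not real search.log output
        "INFO ERP.myhadoop - Submitted job, url=http://jt.example:50030/jobdetails.jsp?jobid=job_1",
        "INFO SearchParser - parsing search",
    ]
    print(summarize_search_log(sample))
```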
Troubleshooting: Job Inspector
Troubleshooting: Common Problems
• The user running Splunk does not have permission to write to HDFS or to run MapReduce jobs
• SPLUNK_HOME in HDFS is not writable
• SPLUNK_HOME on the DataNodes/TaskTrackers (DN/TT) is not writable, or the nodes are out of disk space
• Data reading permission issues
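The node-local problems in this list can be checked before a search is launched. The Python sketch below tests whether a directory exists, is writable by the current user and has a minimum amount of free space; presumably it would be run on each DataNode/TaskTracker as the user that runs the MapReduce tasks. The function name and the threshold are hypothetical, and HDFS permissions have to be checked with Hadoop's own tools, which are not shown.

```python
import os
import shutil
from pathlib import Path
from typing import List


def check_local_splunk_home(path: str, min_free_bytes: int = 1 << 30) -> List[str]:
    """Run basic checks on a node-local SPLUNK_HOME directory.

    Returns human-readable problems; an empty list means the checks passed.
    The 1 GiB default free-space threshold is an arbitrary assumption.
    """
    home = Path(path)
    if not home.is_dir():
        return [f"{path} does not exist or is not a directory"]
    problems = []
    if not os.access(home, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    free = shutil.disk_usage(home).free
    if free < min_free_bytes:
        problems.append(f"{path} has only {free} bytes free (< {min_free_bytes})")
    return problems


if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(check_local_splunk_home(target) or "OK")
```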
Helpful Resources
• Download
– http://www.splunk.com/bigdata
• Help & docs
– http://docs.splunk.com/Documentation/Hunk/6.0/Hunk/MeetHunk
• Community Q&A (Splunk Answers)
– http://answers.splunk.com

Thank You

CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Último (20)

Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

SplunkLive! Hunk Technical Deep Dive

  • 1. Copyright © 2013 Splunk Inc. Hunk: Technical Deep Dive
  • 2. Legal Notices During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release. Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. ©2013 Splunk Inc. All rights reserved.
  • 3. The Problem • Large amounts of data in Hadoop – Relatively easy to get the data in – Hard & time-consuming to get value out • Splunk has solved this problem before – Primarily for event time series • Wouldn’t it be great if Splunk could be used to analyze Hadoop data? Hadoop + Splunk = Hunk
  • 4. The Goals • A viable solution must: – Process the data in place – Maintain support for the Splunk Processing Language (SPL) – Offer true schema on read – Offer report previews – Be easy to set up and use
  • 5. Goals: Support SPL • Naturally suitable for MapReduce • Reduces adoption time • Challenge: Hadoop “apps” are written in Java, while all SPL code is in C++ • Porting SPL to Java would be a daunting task • Reuse the C++ code somehow – Use “splunkd” (the binary) to process the data – JNI is neither easy nor stable
  • 6. Goals: Schema on Read • Apply Splunk’s index-time schema at search time – Event breaking, timestamping, etc. • Anything else would be brittle and a maintenance nightmare • Extremely flexible • Runtime overhead (manpower $ >> computation $) • Challenge: Hadoop “apps” are written in Java, while all index-time schema logic is implemented in C++
  • 7. Goals: Previews • No one likes to stare at a blank screen! • Challenge: Hadoop is designed for batch-like jobs
  • 8. Goals: Ease of Setup and Use • Users should just specify: – the Hadoop cluster they want to use – the data within the cluster they want to process • Users should then immediately be able to explore and analyze their data
  • 10. Move Data to Computation (stream) • Move data from HDFS to the search head (SH) • Process it in a streaming fashion • Visualize the results • Problem?
  • 11. Move Computation to Data (MR) • Create and start a MapReduce job to do the processing • Monitor the MR job and collect its results • Merge the results and visualize them • Problem?
  • 12. Mixed Mode • Use both computation models concurrently [Diagram: Stream and MR plotted against Time; streaming supplies the preview, then the search switches over to MR]
  • 13. First Search Setup • 1. The Hunk search head copies the splunkd package (.tgz) to HDFS, at hdfs://<working dir>/packages • 2. The .tgz is copied to each participating TaskTracker (TaskTracker 1, TaskTracker 2) • 3. It is expanded in the specified location on each TaskTracker • 4. TaskTracker 3 receives the package in subsequent searches
  • 15. Search: Streaming Data Flow [Diagram: raw data is transferred from HDFS to the ERP on the search head -> the ERP passes preprocessed data via stdin to the search process -> final search results]
  • 16. Search: Reporting Data Flow [Diagram: on each TaskTracker, MapReduce reads raw data from HDFS, preprocesses it, and feeds it to a search process -> the remote results from all TaskTrackers flow to the ERP and search process on the search head -> final search results]
  • 17. Search: Data Processing on a DataNode/TaskTracker [Diagram: raw data (HDFS) -> custom processing in MapReduce/Java (you can plug in data preprocessors, e.g. Apache Avro or other format readers) -> stdin -> splunkd/C++ indexing pipeline (event breaking, timestamping) -> search pipeline (event typing, lookups, tagging, search processors) -> stdout -> results compressed and written to HDFS as intermediate results]
  • 19. Search: Search Head • Responsible for: – Orchestrating everything – Submitting MR jobs (optionally splitting bigger jobs into smaller ones) – Merging the results of MR jobs, potentially with results from other VIXes (virtual indexes) or native indexes – Handling high-level optimizations
  • 20. Optimization: Partition Pruning • Data is usually organized into hierarchical directories, e.g. /<base_path>/<date>/<hour>/<hostname>/somefile.log • Hunk can be instructed to extract fields and time ranges from a path • Hunk then ignores directories that cannot possibly contain search results
  • 21. Optimization: Partition Pruning, e.g. • Paths in a VIX: /home/hunk/20130610/01/host1/access_combined.log /home/hunk/20130610/02/host1/access_combined.log /home/hunk/20130610/01/host2/access_combined.log /home/hunk/20130610/02/host2/access_combined.log • Search: index=hunk server=host1 • Paths searched: /home/hunk/20130610/01/host1/access_combined.log /home/hunk/20130610/02/host1/access_combined.log
  • 22. Optimization: Partition Pruning, e.g. • Paths in a VIX: /home/hunk/20130610/01/host1/access_combined.log /home/hunk/20130610/02/host1/access_combined.log /home/hunk/20130610/01/host2/access_combined.log /home/hunk/20130610/02/host2/access_combined.log • Search: index=hunk earliest_time="2013-06-10T01:00:00" latest_time="2013-06-10T02:00:00" • Paths searched: /home/hunk/20130610/01/host1/access_combined.log /home/hunk/20130610/01/host2/access_combined.log
  • 23. Best Practices • Partition data in the file system (FS) using fields that: – are commonly used – have relatively low cardinality • For new data, use well-defined formats, e.g. – Avro, JSON, etc. – avoid delimited text formats like CSV/TSV (hard to split) • Use compression (gzip, Snappy, etc.) – I/O becomes a bottleneck at scale
  • 24. Troubleshooting • search.log is your friend! • Log lines are annotated with ERP.<name> … • search.log contains links to the spawned MR job(s); follow these links to troubleshoot MR issues • hdfs://<base_path>/dispatch/<sid>/<num>/<dispatch_dirs> contains the dispatch directory contents of searches run on the TaskTrackers
  • 26. Troubleshooting: Common Problems • The user running Splunk does not have permission to write to HDFS or to run MapReduce jobs • SPLUNK_HOME in HDFS is not writable • SPLUNK_HOME on the DataNodes/TaskTrackers (DN/TT) is not writable, or the disk is full • Data read permission issues
  • 27. Helpful Resources • Download – http://www.splunk.com/bigdata • Help & docs – http://docs.splunk.com/Documentation/Hunk/6.0/Hunk/MeetHunk • Resources – http://answers.splunk.com
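
The partition-pruning examples on slides 20 to 22 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Hunk's implementation. The regex, the `server` field name, and the helpers `parse_partition` and `prune` are hypothetical. The sketch also assumes each hour directory covers a half-open one-hour span and that `latest` is exclusive. A path is kept unless a value parsed from that path contradicts the search, and paths with an unrecognized layout are never pruned.

```python
import re
from datetime import datetime, timedelta

# Hypothetical layout from slide 20: /<base_path>/<date>/<hour>/<hostname>/<file>.
# The regex and the field name "server" are illustrative assumptions,
# not Hunk's actual configuration syntax.
PATH_RE = re.compile(
    r"^/home/hunk/(?P<date>\d{8})/(?P<hour>\d{2})/(?P<server>[^/]+)/[^/]+$"
)


def parse_partition(path):
    """Extract path-derived fields and the one-hour span the path covers."""
    m = PATH_RE.match(path)
    if m is None:
        return None
    start = datetime.strptime(m["date"] + m["hour"], "%Y%m%d%H")
    return {"fields": {"server": m["server"]},
            "start": start, "end": start + timedelta(hours=1)}


def prune(paths, field_filters=None, earliest=None, latest=None):
    """Return only the paths that could contain events matching the search."""
    kept = []
    for path in paths:
        part = parse_partition(path)
        if part is None:
            kept.append(path)  # unrecognized layout: cannot rule it out, so keep it
            continue
        fields = part["fields"]
        # Prune only when a field that the path actually encodes contradicts the filter.
        if field_filters and any(k in fields and fields[k] != v
                                 for k, v in field_filters.items()):
            continue
        # Search window and partition span are both treated as half-open [start, end).
        if earliest is not None and part["end"] <= earliest:
            continue
        if latest is not None and part["start"] >= latest:
            continue
        kept.append(path)
    return kept


PATHS = [
    "/home/hunk/20130610/01/host1/access_combined.log",
    "/home/hunk/20130610/02/host1/access_combined.log",
    "/home/hunk/20130610/01/host2/access_combined.log",
    "/home/hunk/20130610/02/host2/access_combined.log",
]

# Slide 21: server=host1 keeps only the two host1 paths.
print(prune(PATHS, field_filters={"server": "host1"}))
# Slide 22: a 01:00 to 02:00 window keeps only the hour-01 paths.
print(prune(PATHS, earliest=datetime(2013, 6, 10, 1), latest=datetime(2013, 6, 10, 2)))
```

Being conservative here is deliberate. A filter on a field that does not appear in the path, such as a status code, cannot justify skipping a directory. So the sketch prunes only on fields that the path layout actually encodes.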

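Slide 19 says the search head merges the results of MR jobs, and slide 7 asks for report previews. The sketch below shows how both can come from the same merge step. It is a minimal example that assumes each remote result is a partial `stats count by host` table, represented as a plain dict. That representation and the helper name `merge_with_previews` are hypothetical and do not reflect Hunk's actual result format. The per-task partials are summed, and a snapshot is emitted after each one arrives so it could be shown as a preview.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Mapping


def merge_with_previews(
    remote_results: Iterable[Mapping[str, int]],
) -> Iterator[Dict[str, int]]:
    """Sum partial per-task counts, yielding a snapshot after each one arrives.

    Each snapshot could be rendered as a report preview; the last one is the
    final merged result. Plain summing is only valid for decomposable
    aggregates such as count or sum.
    """
    total: Counter = Counter()
    for partial in remote_results:
        total.update(partial)  # Counter.update adds counts instead of replacing them
        yield dict(total)      # copy, so later updates do not mutate earlier previews


# Hypothetical partial results of `stats count by host` from three MR tasks.
task_results = [
    {"host1": 3, "host2": 1},
    {"host1": 2},
    {"host2": 4, "host3": 1},
]

for preview in merge_with_previews(task_results):
    print(preview)
```

Treating the merge as a running fold is what makes previews nearly free. Aggregates that cannot be summed, such as exact distinct counts, would require each task to ship richer intermediate state.
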
Editor's Notes

  1. On the first search, MapReduce auto-populates the Splunk binaries. The orchestration process begins by copying the Hunk binary .tgz file to HDFS. Hunk supports both the MapReduce JobTracker and the YARN ResourceManager. Each TaskTracker (under YARN, each NodeManager) fetches the binary. TaskTrackers not involved in the first search receive the Hunk binary in a later search that involves them. This process is one example of why Hunk needs some scratch space in HDFS and in the local file system of the TaskTrackers/DataNodes.
Hadoop notes: Typically a Hadoop cluster has a single master and multiple worker nodes. The master node (the NameNode) coordinates reads and writes to the worker nodes (the DataNodes). HDFS achieves reliability by replicating data across multiple machines. By default the replication factor is 3 and the block size is 64 MB (Hadoop 1.x; Hadoop 2.x raised the default block size to 128 MB). The JobTracker dispatches tasks to worker nodes (TaskTrackers) in the cluster. Priority is given to nodes that host the data the task will operate on. If the task cannot run on such a node, the next priority is given to neighboring nodes, to minimize network traffic. Upon job completion, each worker node writes its own results locally, and HDFS ensures replication across the cluster.
HDFS = NameNode + DataNodes
MapReduce Engine = JobTracker + TaskTrackers
  2. HDFS -> MapReduce -> (preprocess) -> Splunkd/TT -> MapReduce -> HDFS -> SH
  3. You can plug in your own data preprocessor to run before Hunk processes the data. Preprocessors must be written in Java and can transform the data before Hunk sees it. They range in complexity from simple translators (say, Avro to JSON) to complex image, video, or document processing. Hunk itself translates Avro to JSON. These translations happen on the fly and are not persisted.
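
The HDFS defaults in the first note (replication factor 3, 64 MB blocks) can be turned into storage estimates with a little arithmetic. Below is a minimal sketch. The function name `hdfs_footprint` is hypothetical. Block size is a parameter because it is configurable and its default differs across Hadoop versions. The sketch relies on the fact that HDFS does not pad a file's last block to full size.

```python
import math


def hdfs_footprint(file_size_mb: int, block_size_mb: int = 64, replication: int = 3):
    """Return (blocks, block replicas, raw MB stored across the cluster) for one file.

    HDFS does not pad the last block, so raw storage is size * replication.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


# A 1,000 MB file with the defaults from the note: 16 blocks, 48 replicas, 3,000 MB raw.
print(hdfs_footprint(1000))
# The same file with 128 MB blocks: 8 blocks, 24 replicas, still 3,000 MB raw.
print(hdfs_footprint(1000, block_size_mb=128))
```

Block size changes how many blocks, and therefore how many map tasks, a file yields. The replication factor alone determines the raw capacity the file consumes.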