Data Democratisation Using Splunk
Neil Roy Chowdhury
neil@strft.com
About Me
• Splunking Since 2008
• Largest Splunk Implementation:
• 3 TB/day
• 1.2 PB Searchable
• 900 Users
• Interests:
• Guitars
• And the occasional Uke
What is Splunk?
• Google Search for IT Data?
• Log aggregation Tool?
• Data Visualisation Tool?
• Data Platform with App Creation Capabilities
• Proprietary Search Language - SPL
• Correlation of Structured and Unstructured Data Sources
• Visualisation capabilities
• Out of the Box
• Modular
Getting Data In
• Data Sources:
• Unstructured Data Sources
• Structured Data Sources - JSON, CSV, XML
• Transport: Forwarders, HEC
• Indexer Pipeline: Line Breaking -> Timestamp Recognition -> Data Segmentation -> Persist to Disk
• Index: Buckets, each holding Keywords and Raw Data
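The indexer stages on this slide can be sketched in a few lines of Python. This is a toy model, not Splunk's implementation: the ISO-8601 timestamp layout, the tokeniser, and the function names are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout for this sketch: each event starts with an ISO-8601 timestamp.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def break_lines(stream: str) -> list[str]:
    """Line breaking: a new event starts at every line that begins with a timestamp."""
    events: list[str] = []
    for line in stream.splitlines():
        if TIMESTAMP.match(line):
            events.append(line)
        elif events:
            events[-1] += "\n" + line  # continuation line, e.g. a stack trace
        # Lines before the first timestamp are dropped in this sketch.
    return events


def recognise_timestamp(event: str) -> datetime:
    """Timestamp recognition: parse the leading timestamp of an event."""
    return datetime.strptime(TIMESTAMP.match(event).group(0), "%Y-%m-%dT%H:%M:%S")


def segment(event: str) -> set[str]:
    """Data segmentation: split the raw text into lower-cased keywords."""
    return {token.lower() for token in re.split(r"[^A-Za-z0-9_]+", event) if token}


def index_stream(stream: str):
    """Persist step: keep the raw events plus a keyword -> event id mapping."""
    raw = []                      # "Raw Data" part of a bucket
    keywords = defaultdict(set)   # "Keywords" part of a bucket
    for event in break_lines(stream):
        event_id = len(raw)
        raw.append((recognise_timestamp(event), event))
        for token in segment(event):
            keywords[token].add(event_id)
    return raw, keywords
```

Keeping keywords and raw data side by side is what lets a search first narrow down candidate events by keyword and only then read the raw text.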
Data Collection using Splunk
Forwarder
• Splunk forwarder capabilities
• File based Inputs
• Database Inputs
• Scripted Inputs
• Forwarder Configurations deployed as modular add-ons
Typical Splunk Search
index=<my_product> sourcetype=web.access checkout | stats avg(response_time) as "Average Response Time" by request
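For readers new to SPL, here is a rough Python equivalent of the search above: filter by sourcetype and keyword, then average `response_time` per `request`. The event dictionaries and the function name are assumptions for illustration, and the keyword match is a plain substring test rather than Splunk's token-based matching.

```python
from collections import defaultdict


def average_response_time(events, keyword="checkout", sourcetype="web.access"):
    """Mimic: sourcetype=web.access checkout | stats avg(response_time) by request."""
    totals = defaultdict(lambda: [0.0, 0])  # request -> [sum, count]
    for event in events:
        if event["sourcetype"] != sourcetype:
            continue
        if keyword not in event["_raw"].lower():
            continue
        entry = totals[event["request"]]
        entry[0] += event["response_time"]
        entry[1] += 1
    return {request: total / count for request, (total, count) in totals.items()}


# Hypothetical sample events.
events = [
    {"sourcetype": "web.access", "_raw": "GET /checkout/cart 200", "request": "/checkout/cart", "response_time": 100},
    {"sourcetype": "web.access", "_raw": "GET /checkout/cart 200", "request": "/checkout/cart", "response_time": 200},
    {"sourcetype": "web.access", "_raw": "GET /home 200", "request": "/home", "response_time": 50},
    {"sourcetype": "app.log", "_raw": "checkout started", "request": "/checkout/cart", "response_time": 999},
]
print(average_response_time(events))
```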
Searching Data
• Indexers - Map:
• Query Index By Keyword
• Load Raw Results Returned in Memory
• Apply Data Extractions, Transformations and Lookups (driven by Knowledge Objects)
• Run Streaming Commands
• Search Heads - Reduce:
• Receive Results and “Reduce”
• Run Additional Commands
• Visualise, Report, Alert
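The map/reduce split can be sketched as follows, assuming an average is being computed. Each indexer returns a partial (sum, count) per group and the search head merges them. Function names and event fields are hypothetical. The point of the sketch is that partial results must be mergeable, so each indexer ships sums and counts rather than finished averages.

```python
def map_on_indexer(events, keyword):
    """Indexer side: filter by keyword, emit partial (sum, count) per request."""
    partial = {}
    for event in events:
        if keyword not in event["_raw"]:
            continue
        total, count = partial.get(event["request"], (0.0, 0))
        partial[event["request"]] = (total + event["response_time"], count + 1)
    return partial


def reduce_on_search_head(partials):
    """Search head side: merge the partials, then finish the average."""
    merged = {}
    for partial in partials:
        for request, (total, count) in partial.items():
            merged_total, merged_count = merged.get(request, (0.0, 0))
            merged[request] = (merged_total + total, merged_count + count)
    return {request: total / count for request, (total, count) in merged.items()}


# Two hypothetical indexers holding different slices of the same data.
indexer_1 = [
    {"_raw": "GET /a", "request": "/a", "response_time": 100},
    {"_raw": "GET /a", "request": "/a", "response_time": 200},
]
indexer_2 = [{"_raw": "GET /a", "request": "/a", "response_time": 600}]

partials = [map_on_indexer(indexer_1, "GET"), map_on_indexer(indexer_2, "GET")]
print(reduce_on_search_head(partials))
```

Averaging the two per-indexer averages (150 and 600) would give 375, whereas merging sums and counts gives the correct 300.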
So what about Knowledge Objects?
• Most Knowledge Objects are configurable from UI
• Common Types:
• Field Extractions - regex to extract fields
• Field Aliases - Alias the name of a field
• Lookups - Against flat files and the KV store
• Tags - Provides event grouping abstraction
• Eventtypes - Provides event categorisation
• Calculated Fields - Data manipulations
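A minimal Python sketch of how these knowledge object types layer onto a raw event at search time. The log line format, regex, alias and lookup table are all invented for illustration. Only the field, lookup, eventtype and tag names are borrowed from the example queries in this deck.

```python
import re

# Field extraction: a regex with named groups (hypothetical log format).
FIELD_EXTRACTION = re.compile(
    r"customer=(?P<customer_id>\d+) status=(?P<http_code>\d{3}) took=(?P<resp_time_milliseconds>\d+)ms"
)
# Field alias: expose http_code under a second name.
FIELD_ALIASES = {"http_code": "status"}
# Lookup: a flat table keyed by customer_id (stand-in for db_locations).
DB_LOCATIONS = {"42": "London", "7": "Leeds"}
# Tags: a grouping abstraction on top of eventtypes.
TAGS = {"auth_successful": ["web"]}


def apply_knowledge(raw: str) -> dict:
    """Enrich one raw event with extracted, aliased, calculated and looked-up fields."""
    match = FIELD_EXTRACTION.search(raw)
    fields = {"_raw": raw, **(match.groupdict() if match else {})}

    for original, alias in FIELD_ALIASES.items():
        if original in fields:
            fields[alias] = fields[original]

    # Calculated field: milliseconds to seconds.
    if "resp_time_milliseconds" in fields:
        fields["response_time_seconds"] = int(fields["resp_time_milliseconds"]) / 1000

    # Lookup: only adds the output field when the key is found.
    if fields.get("customer_id") in DB_LOCATIONS:
        fields["location"] = DB_LOCATIONS[fields["customer_id"]]

    # Eventtype: a saved search condition that categorises the event.
    eventtypes = []
    if "/checkout/auth/confirmation" in raw and fields.get("http_code") == "200":
        eventtypes.append("auth_successful")
    fields["eventtype"] = eventtypes
    fields["tag"] = [tag for name in eventtypes for tag in TAGS.get(name, [])]
    return fields
```

Once these definitions are saved centrally, a search only needs to refer to `eventtype`, `tag`, `location` and `response_time_seconds`, not to the regex or the lookup behind them.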
Goal?
• Queries like:
index=<my_website> "/checkout/auth/confirmation" | rex "<some humungous regex that extracts customer id in addition to other things>" | eval response_time_seconds = resp_time_milliseconds/(1000) | where http_code == 200 | lookup db_locations customer_id OUTPUT location | stats avg(response_time_seconds) as avg_response_time by location
• Become:
eventtype=auth_successful tag=web | stats avg(response_time_seconds) as average_response_time by location
Goal?
Persisting Knowledge
Data Democratisation
• Sounds like the holy grail of data
• Idealistic?
Scenario
• Microservices Architecture
• Numerous Development Teams working under different service umbrellas
• Mix of legacy systems with modern services
• Dependence on vendor integrations
• Data can be sensitive
Typical Data Democratisation Issues
• Security - Some data is sensitive yet valuable, but we’d like an open access model
• Knowledge Fragmentation - It’s our data, let’s make sure everyone knows what it means.
• Adoption - People need to like it. It shouldn’t get in the way.
• Scalability
• Chargeback - It’s not my data, why should I pay for it?
Security - Delegated Access Model
• Splunk Search Apps can serve as knowledge containers
• Knowledge Object ownership can be scoped locally to the app or globally to the entire system.
• Splunk Indexes are data containers.
• Data Access granted by index
• Assign an app per product or service umbrella
• Assign Data Owner
Delegated Access Model
Federated Group -> Splunk Role -> App Level Permissions + Index Level Permissions
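The chain from federated group to permissions can be modelled in Python as below. Group names, role names, apps and indexes are all hypothetical. In a real deployment this mapping lives in Splunk's role and authentication configuration, not in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    apps: frozenset      # app level permissions (knowledge containers)
    indexes: frozenset   # index level permissions (data containers)


# Hypothetical roles: one per product or service umbrella.
ROLES = {
    "payments_user": Role("payments_user", frozenset({"payments_app"}), frozenset({"payments"})),
    "web_user": Role("web_user", frozenset({"web_app"}), frozenset({"web"})),
}

# Hypothetical mapping from federated (SSO) groups to Splunk roles.
GROUP_TO_ROLES = {
    "sso-payments-dev": ["payments_user"],
    "sso-web-dev": ["web_user"],
}


def roles_for(groups):
    """Resolve a user's federated groups to Splunk roles; unknown groups grant nothing."""
    return [ROLES[name] for group in groups for name in GROUP_TO_ROLES.get(group, [])]


def can_search(groups, app, index) -> bool:
    """A user needs both the app (knowledge) and the index (data) to run a search."""
    roles = roles_for(groups)
    can_use_app = any(app in role.apps for role in roles)
    can_read_index = any(index in role.indexes for role in roles)
    return can_use_app and can_read_index
```

Because access is derived from group membership, the data owner of each umbrella can grant or revoke it by managing the group, without touching Splunk directly.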
Splunk Security Must Have!
• Splunk Authentication is Poor
• No Password Policy
• No Centralised management for multiple search nodes
• Single Sign On - Splunk supports:
• Ping Identity
• Okta
• ADFS
• Azure AD
• LDAP
• Custom Auth
• Use an Entitlement Framework on top of Single Sign On groups
Combating Knowledge Fragmentation
• Semantic Logging:
• Logging for the sole purpose of analytics
• Rich datasets can be viewed in multiple dimensions
• Define Developer Guidelines:
• Ensure Correlation Identifiers are present in all events
• Precision Timestamps
• Incorporate Logging into SDLC
• Standardise Logging Formats
• Standardise Log content per service - e.g. BAM metrics
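A minimal sketch of what a semantic log event following these guidelines might look like in Python: one JSON object per event, a microsecond-precision UTC timestamp, and a correlation identifier carried on every event. The field names and the helper function are assumptions, not a prescribed standard.

```python
import json
import uuid
from datetime import datetime, timezone


def semantic_event(service: str, action: str, correlation_id: str, **fields) -> str:
    """Build one structured log line that is easy to search and correlate."""
    record = {
        # Precision timestamp: UTC, ISO-8601, microsecond resolution.
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="microseconds"),
        "service": service,
        "action": action,
        # The same correlation id is passed along to every service in the request.
        "correlation_id": correlation_id,
        **fields,
    }
    return json.dumps(record, sort_keys=True)


# Hypothetical usage: one request traced across two services.
request_id = str(uuid.uuid4())
print(semantic_event("basket", "checkout_started", request_id, customer_id="42"))
print(semantic_event("payments", "payment_authorised", request_id, duration_ms=135))
```

With a standard format like this, fields arrive already named, so less regex-based field extraction is needed at search time.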
Combating Knowledge Fragmentation
Reality - Not all logs can be made semantic, at least not without significant refactoring.
Splunk Solution - Data Models
Data Models
• Enable go go gadget - “Schema on the fly”
• Hierarchically structured search-time mapping of semantic knowledge.
• Accessed via the Datasets tab in Splunk 6.5
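The idea of a hierarchical, search-time mapping can be sketched in Python as a tree of datasets where each child inherits its parent's constraint and adds its own. The dataset names and constraints are hypothetical. Real data models are defined in Splunk, not in code like this.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Dataset:
    name: str
    constraint: Callable[[dict], bool]
    parent: Optional["Dataset"] = None

    def matches(self, event: dict) -> bool:
        """An event belongs to a dataset if it satisfies every constraint up the tree."""
        if self.parent is not None and not self.parent.matches(event):
            return False
        return self.constraint(event)


# Hypothetical hierarchy: Web > Checkout > Successful_Checkout.
web = Dataset("Web", lambda e: e.get("sourcetype") == "web.access")
checkout = Dataset("Checkout", lambda e: "/checkout" in e.get("request", ""), parent=web)
successful = Dataset("Successful_Checkout", lambda e: e.get("status") == 200, parent=checkout)


def select(dataset: Dataset, events: list) -> list:
    """Apply the schema when searching; the stored events are never rewritten."""
    return [event for event in events if dataset.matches(event)]
```

Because the mapping is applied at search time, legacy logs that cannot be refactored can still be presented under the same model as well-structured ones.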
Example: Splunk CIM
• Splunk Common Information Model (CIM)
• Collection of Data Models based on subject area
• Shared Semantic model
• Support consistent and normalised treatment of data
• Enables third party apps to be integrated to your data.
• Reference Tables:
http://docs.splunk.com/Documentation/CIM/4.6.0/User/Howtousethesereferencetables
Pivot
• UI developed to enable the creation of analytics on top of structured data models
• Supports:
• Tables
• Charts - Line, Scatter, Column, Bar, Bubble, Pie
• Single Value Visualisations
Performance
• Data Models can be accelerated, which:
• Decreases Search Optimisation Effort
• Decreases Dashboard Optimisation Effort
• Increases Storage Requirements
• Speeds up searches by up to 1000x
• Speed-up is dependent on the cardinality of the data
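The trade-off behind acceleration can be illustrated with a toy pre-aggregation in Python. This is an analogy, not how Splunk stores its summaries, and the hourly span, field names and functions are assumptions. A summary keyed by time bucket and field value is built once, and later queries read the summary instead of scanning raw events.

```python
from collections import defaultdict


def build_summary(events, span=3600):
    """Pre-compute (sum, count) per (time bucket, location). Costs extra storage."""
    summary = defaultdict(lambda: [0.0, 0])
    for event in events:
        bucket_start = event["_time"] // span * span
        entry = summary[(bucket_start, event["location"])]
        entry[0] += event["response_time_seconds"]
        entry[1] += 1
    return dict(summary)


def average_from_summary(summary, location, start, end):
    """Fast path: touch one row per bucket instead of one row per event."""
    total, count = 0.0, 0
    for (bucket_start, loc), (bucket_total, bucket_count) in summary.items():
        if loc == location and start <= bucket_start < end:
            total += bucket_total
            count += bucket_count
    return total / count if count else None


def average_from_raw(events, location, start, end):
    """Slow path: scan every raw event in the time range."""
    values = [
        e["response_time_seconds"]
        for e in events
        if e["location"] == location and start <= e["_time"] < end
    ]
    return sum(values) / len(values) if values else None
```

The summary holds one row per bucket and distinct value, so a high-cardinality field produces almost as many summary rows as raw events and the gain shrinks, which is one way to read the cardinality caveat above.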
Notable Splunk Apps on CIM
• Splunk Enterprise Security
• Splunk PCI Compliance
• Insight Engines - Search Splunk using Natural Language
Adoption
• Most users complain about backlogs on onboarding data
• Automating the onboarding process isn’t as easy as it sounds. Data Validation is key to deriving value.
• Universal Forwarder:
• Standardise Log Locations
• Standardise Time Stamps
• HTTP Event Collector:
• Send data directly from your application to Splunk
• Utilise Indexer Acknowledgement
• Notable implementations:
• Docker - Splunk Logging Driver
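As a sketch of sending data directly from an application, the Python helper below only builds an HTTP Event Collector request and does not send it. The host, port, token, index and sourcetype values are placeholders. The endpoint path, the `Authorization: Splunk <token>` header and the request channel header used with indexer acknowledgement follow the HEC conventions as I understand them, so check them against the documentation for your Splunk version.

```python
import json
import uuid


def build_hec_request(base_url, token, event, *, sourcetype, index, channel=None, time=None):
    """Return (url, headers, body) for one HEC event; sending is left to the caller."""
    url = base_url.rstrip("/") + "/services/collector/event"
    headers = {
        "Authorization": f"Splunk {token}",
        "Content-Type": "application/json",
    }
    if channel is not None:
        # With indexer acknowledgement enabled, each client identifies its channel.
        headers["X-Splunk-Request-Channel"] = channel

    payload = {"event": event, "sourcetype": sourcetype, "index": index}
    if time is not None:
        payload["time"] = time  # epoch seconds; omit to let Splunk assign the time
    return url, headers, json.dumps(payload).encode("utf-8")


# Hypothetical usage with placeholder connection details.
url, headers, body = build_hec_request(
    "https://splunk.example.com:8088",
    "00000000-0000-0000-0000-000000000000",
    {"action": "checkout_started", "customer_id": "42"},
    sourcetype="_json",
    index="web",
    channel=str(uuid.uuid4()),
)
```

The resulting tuple can be handed to any HTTP client. With acknowledgement enabled, the application would then confirm that the event was indexed before discarding its own copy.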
Newish Splunk Features
• Machine Learning Toolkit
• Comes with built-in assistants for supported algorithms
• Extend the algorithms available - Python scikit-learn
• ITSI
• Modular Visualisations
• New Custom Search Command Creation Capability
• TSIDX Reduction - Decrease Storage Costs
Crystal Ball
Further integration into the Hadoop ecosystem

DOXLON November 2016 - Data Democratization Using Splunk

  • 1. Data Democratisation Using Splunk Neil Roy Chowdhury neil@strft.com
  • 2. About Me • Splunking Since 2008 • Largest Splunk Implementation: • 3 TB/day • 1.2 PB Searchable • 900 Users • Interests: • Guitars • And the occasional Uke
  • 3. What is Splunk? • Google Search for IT Data? • Log aggregation Tool? • Data Visualisation Tool? • Data Platform with App Creation Capabilities • Proprietary Search Language - SPL • Correlation of Structured and Unstructured Data Sources • Visualisation capabilities • Out of the Box • Modular
  • 4. Getting Data In Unstructured Data Sources Structured Data Sources - JSON, CSV, XML Forwarders HEC Data Sources Indexer Line Breaking Timestamp Recognition Data Segmentation Pipeline Persist to Disk Index Bucket Bucket Bucket Bucket Bucket Keywords Raw Data
  • 5. Data Collection using Splunk Forwarder • Splunk forwarder capabilities • File based Inputs • Database Inputs • Scripted Inputs • Forwarder Configurations deployed as modular add-ons
  • 6. Typical Splunk Search index = <my_product> sourcetype=web.access checkout | stats avg(response_time) as “Average Response Time” by request
  • 7. Searching Data Query Index By Keyword Load Raw Results Returned in Memory Apply Data Extractions, Transformations and Lookups Run Streaming Commands Indexers - Map Search Heads - Reduce Knowledge Objects Receive Results and “Reduce” Run Additional Commands Visualise, Report, Alert
  • 8. So what about Knowledge Objects? • Most Knowledge Objects are configurable from UI • Common Types: • Field Extractions - regex to extract fields • Field Aliases - Alias a name of a field • Lookups - vs flat files and kv-store • Tags - Provides event grouping abstraction • Eventtypes - Provides event categorisation • Calculated Fields - Data manipulations
  • 9. Goal? • Queries like: • Become: index=<my_website> “/checkout/auth/confirmation” | rex “<some humungous regex that extracts customer id in addition to other things>” | eval response_time_seconds = resp_time_milliseconds/ (1000) | where http_code == 200 | lookup db_locations customer_id OUTPUT location | stats avg(response_time_seconds) as avg_response_time by location eventtype=auth_successful tag=web | stats avg(response_time_seconds) as average_response_time by location
  • 11. Data Democratisation • Sounds like the holy grail of data • Idealistic?
  • 12. Scenario • Microservices Architecture • Numerous Development Teams working under different service umbrellas • Mix of legacy systems with modern services • Dependence on vendor integrations • Data can be sensitive
  • 13. Typical Data Democratisation Issues • Security - Some data is sensitive yet valuable, but we'd like an open access model • Knowledge Fragmentation - It's our data, let's make sure everyone knows what it means. • Adoption - People need to like it. It shouldn't get in the way. • Scalability • Chargeback - It's not my data, why should I pay for it?
  • 14. Security - Delegated Access Model • Splunk Search Apps can serve as knowledge containers • Knowledge Object ownership can be scoped locally to the app or globally to the entire system. • Splunk Indexes are data containers. • Data Access is granted by index • Assign an app per product or service umbrella • Assign a Data Owner
  • 15. Delegated Access Model • Federated Group • Splunk Role • App Level Permissions • Index Level Permissions
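The chain on this slide (federated group, Splunk role, then app and index permissions) can be pictured as a simple lookup. The following is only an illustrative Python sketch, not Splunk's own configuration format: the group names, role names, apps and indexes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """A role grants access to some apps (knowledge) and some indexes (data)."""
    name: str
    apps: frozenset
    indexes: frozenset


# Hypothetical roles, one per product or service umbrella.
ROLES = {
    "checkout_user": Role("checkout_user", frozenset({"checkout_app"}), frozenset({"checkout"})),
    "payments_user": Role("payments_user", frozenset({"payments_app"}), frozenset({"payments"})),
}

# Hypothetical mapping from federated (SSO) groups to roles.
GROUP_TO_ROLES = {
    "sso-checkout-devs": ["checkout_user"],
    "sso-payments-devs": ["payments_user"],
}


def resolve_access(groups, group_to_roles, roles):
    """Return the (apps, indexes) a user can reach through their federated groups."""
    apps, indexes = set(), set()
    for group in groups:
        for role_name in group_to_roles.get(group, ()):
            role = roles[role_name]
            apps |= role.apps
            indexes |= role.indexes
    return apps, indexes


if __name__ == "__main__":
    print(resolve_access(["sso-checkout-devs"], GROUP_TO_ROLES, ROLES))
```

A user in an unknown group resolves to no apps and no indexes, which matches the idea that access is only ever granted through a role.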
  • 16. Splunk Security Must Have! • Splunk Authentication is Poor • No Password Policy • No Centralised management for multiple search nodes • Single Sign On - Splunk supports: • Ping Identity • Okta • ADFS • Azure AD • LDAP • Custom Auth • Use an Entitlement Framework on top of single sign on groups
  • 17. Combating Knowledge Fragmentation • Semantic Logging: • Logging for the sole purpose of analytics • Rich datasets can be viewed in multiple dimensions • Define Developer Guidelines: • Ensure Correlation Identifiers are present in all events • Precision Timestamps • Incorporate Logging into SDLC • Standardise Logging Formats • Standardise Log content per service - e.g. BAM metrics
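As a rough illustration of the developer guidelines above, a service could emit one JSON event per line, each carrying a correlation identifier and a microsecond-precision UTC timestamp. This is a minimal Python sketch under assumptions: the function name `make_event` and the field names (`service`, `action`, `response_time_ms`, `http_code`) are hypothetical and do not come from the slides.

```python
import json
import uuid
from datetime import datetime, timezone


def make_event(service, action, correlation_id, **fields):
    """Build one JSON log line with a correlation id and a precise UTC timestamp."""
    event = {
        # ISO 8601 timestamp in UTC with microsecond precision.
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="microseconds"),
        "service": service,
        "action": action,
        # The same id is passed along by every service handling the request.
        "correlation_id": correlation_id,
    }
    event.update(fields)
    return json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    cid = str(uuid.uuid4())
    # Two services log against the same correlation id.
    print(make_event("checkout", "auth_confirmation", cid, response_time_ms=182, http_code=200))
    print(make_event("payments", "charge", cid, response_time_ms=95, http_code=200))
```

Because every event shares the same key names and carries the correlation id, events from different services can be joined at search time without per-service regexes.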
  • 18. Combating Knowledge Fragmentation • Reality - Not all logs can be logged semantically, or logged semantically without significant refactoring. • Splunk Solution - Data Models
  • 19. Data Models • Enable go go gadget - “Schema on the fly” • Hierarchically structured search-time mapping of semantic knowledge. • Accessed via Datasets tab in Splunk 6.5
  • 20. Example: Splunk CIM • Splunk Common Information Model (CIM) • Collection of Data Models based on subject area • Shared Semantic model • Supports consistent and normalised treatment of data • Enables third party apps to be integrated with your data. • Reference Tables: http://docs.splunk.com/Documentation/CIM/4.6.0/User/Howtousethesereferencetables
  • 22. Pivot • UI developed to enable the creation of analytics off structured data models • Supports: • Tables • Charts - Line, Scatter, Column, Bar, Bubble, Pie • Single Value Visualisations
  • 24. Performance • Data Models can be accelerated, which can lead to: • Decreased Search Optimisation Effort • Decreased Dashboard Optimisation Effort • Increased Storage Requirements • Speed-up of up to 1000x • Speed-up depends on the cardinality of the data
  • 25. Notable Splunk Apps on CIM • Splunk Enterprise Security • Splunk PCI Compliance • Insight Engines - Search Splunk using Natural Language
  • 26. Adoption • Most users complain about backlogs in onboarding data • Automating the onboarding process isn't as easy as it sounds. Data Validation is key to deriving value. • Universal Forwarder: • Standardise Log Locations • Standardise Time Stamps • HTTP Event Collector: • Send data directly from your application to Splunk • Utilise Indexer Acknowledgement • Notable implementations: • Docker - Splunk Logging Driver
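To make the HTTP Event Collector point concrete, here is a hedged Python sketch that only builds the request an application might send. It assumes the usual HEC conventions (the `/services/collector/event` endpoint, an `Authorization: Splunk <token>` header, and an `X-Splunk-Request-Channel` header when indexer acknowledgement is enabled); the host, token, sourcetype and the helper name `build_hec_request` are placeholders, so check them against your own deployment.

```python
import json
import urllib.request
import uuid


def build_hec_request(base_url, token, event, sourcetype, channel=None):
    """Build (but do not send) a POST request for the HTTP Event Collector."""
    body = json.dumps({"event": event, "sourcetype": sourcetype}).encode("utf-8")
    headers = {
        "Authorization": f"Splunk {token}",
        "Content-Type": "application/json",
    }
    if channel is not None:
        # Assumption: a channel id is required when indexer acknowledgement is on.
        headers["X-Splunk-Request-Channel"] = channel
    return urllib.request.Request(
        f"{base_url}/services/collector/event",
        data=body,
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder host and token; nothing is sent over the network here.
    req = build_hec_request(
        "https://splunk.example.com:8088",
        "00000000-0000-0000-0000-000000000000",
        {"action": "auth_confirmation", "http_code": 200},
        sourcetype="app:json",
        channel=str(uuid.uuid4()),
    )
    print(req.get_method(), req.full_url)
```

Sending would be a call to `urllib.request.urlopen(req)`. With acknowledgement enabled, the application would then need to confirm delivery using the ack id returned in the response, which this sketch leaves out.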
  • 27. Newish Splunk Features • Machine Learning Toolkit • Comes with built-in assistants for supported algorithms • Extend the available algorithms - Python scikit-learn • ITSI • Modular Visualisations • New Custom Search Command Creation Capability • TSIDX Reduction - Decrease Storage Costs
  • 28. Crystal Ball • Further integration into the Hadoop ecosystem