Machine Data 101

Machine Data 101 Workshop - Long Beach 2/7

Published in: Technology

  1. 1. Copyright © 2014 Splunk Inc. Machine Data 101: Turning Data into Insight Audience Version
  2. 2. Agenda § Non-Traditional Data Sources § Data Enrichment § Level Up on Search and Reporting Commands § Data Models and Pivot § Advanced Visualizations and the Web Framework 2
  3. 3. Non-Traditional Data Sources
  4. 4. Non-Traditional Data Sources § Network Inputs § HTTP Event Collector § Log Event Alert Action § Splunk App for Stream § Scripted Inputs § Database Inputs § Splunk ODBC Driver § Modular Inputs § zLinux Forwarder § MINT § Non-Splunk Datastores 4
  5. 5. Traditional Data Sources
     § Captures events from log files in real time
     § Runs scripts to gather system metrics, connect to APIs and databases
     § Listens to syslog and gathers Windows events
     § Universally indexes any data format so it doesn’t need adapters
     Windows: Registry • Event logs • File system • sysinternals
     Linux/Unix: Configurations • Syslog • File system • ps, iostat, top
     Virtualization: Hypervisor • Guest OS • Guest Apps
     Applications: Web logs • Log4J, JMS, JMX • .NET events • Code and scripts
     Databases: Configurations • Audit/query logs • Tables • Schemas
     Network: Configurations • syslog • SNMP • netflow
  6. 6. Network Inputs § Collect data over any UDP or TCP port § Some devices only send data over a network port § Best Practice: use syslog-ng or rsyslog § Offers persistence § Categorizes data by host 6
  7. 7. HTTP Event Collector (HEC)
     § Collect data over HTTP or HTTPS directly to Splunk
     § Application developer focus: a few lines of code in the app to send data
     § HEC features include:
       § Token-based, not credential-based
       § Indexer acknowledgements: guarantees data indexing
       § Raw and JSON-formatted event payloads
       § SSL, CORS (cross-origin resource sharing), and network restrictions
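To make the "few lines of code" claim concrete, here is a minimal Python sketch of building an HEC request with only the standard library. The host name and token are placeholders (assumptions, not values from the deck); the `/services/collector/event` path and the `Authorization: Splunk <token>` header follow the documented HEC conventions. The request is built but not sent.

```python
import json
import urllib.request

# Placeholder values for illustration only; use your own host and HEC token.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"


def build_hec_request(event, sourcetype="my_app", url=HEC_URL, token=HEC_TOKEN):
    """Build (but do not send) an HTTP POST carrying one JSON-formatted event."""
    payload = json.dumps({"event": event, "sourcetype": sourcetype}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            # Token-based authentication: no user name or password involved.
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_hec_request({"action": "purchase", "status": 200})
# To actually send the event: urllib.request.urlopen(request)
```

Keeping the request construction separate from the send makes it easy to inspect the payload before pointing the code at a real collector.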
  8. 8. Log Event Alert Action § Use Splunk alerting to index a custom log event § Splunk searchable index of custom alert events § Configurable Features Include: § Host § Source § Sourcetype § Index § Event text – construct the exact syntax of the log event, including any text, tokens, or other information 8
  9. 9. The Splunk App for Stream Wire Data Enhances the Platform for Operational Intelligence Efficient, Cloud-ready Wire Data Collection Simple Deployment Supports Fast Time to Value 9
  10. 10. Stream = Better Insights for *
      Solution Area | Contextual Data | Wire Data | Enriched View
      Application Management | application logs, monitoring data, metrics, events | protocol conversations on database performance, DNS lookups, client data, business transaction paths… | Measure application response times, deeper insights for root-cause diagnostics, trace tx paths, establish baselines…
      IT Operations | application logs, monitoring data, metrics, events | payload data including process times, errors, transaction traces, ICA latency, SQL statements, DNS records… | Analyze traffic volume, speed and packets to identify infrastructure performance issues, capacity constraints, changes; establish baselines…
  11. 11. Stream = Better Insights for *
      Solution Area | Contextual Data | Wire Data | Enriched View
      Security | app + infra logs, monitoring data, events | protocol identification, protocol headers, content and payload information, flow records | Build analytics and context for incident response, threat detection, monitoring and compliance
      Digital Intelligence | website activity, clickstream data, metrics | browser-level customer interactions | Customer Experience: analyze website and application bottlenecks to improve customer experience and online revenues. Customer Support (online, call center): faster root cause analysis and resolution of customer issues with website or apps
  12. 12. Scripted Inputs
      § Send data to Splunk via a custom script
      § Splunk indexes anything written to stdout
      § Splunk handles scheduling
      § Supports shell, Python scripts, WIN batch, PowerShell
      § Any other utility that can format and stream data
      Streaming Mode
      § Splunk executes script and indexes stdout
      § Checks for any running instances
      Write to File Mode
      § Splunk launches script which produces output file, no need for external scheduler
      § Splunk monitors output file
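As an illustration of streaming mode, the following sketch shows the kind of script Splunk could run on a schedule: it writes one timestamped `key=value` event to stdout and exits. The field names and the choice of disk usage as the metric are assumptions made for the example, not something the deck prescribes.

```python
import shutil
import sys
import time


def format_event(fields, now=None):
    """Render one event as a timestamp followed by key=value pairs."""
    timestamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    pairs = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"{timestamp} {pairs}"


def main():
    # Disk usage of the current directory's volume (illustrative metric).
    usage = shutil.disk_usage(".")
    event = format_event({"disk_total": usage.total, "disk_used": usage.used})
    # Anything written to stdout is what a scripted input would index.
    sys.stdout.write(event + "\n")


if __name__ == "__main__":
    main()
```

The `key=value` layout is a convenient choice because it tends to be easy to turn into fields at search time.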
  13. 13. Use Cases for Scripted Inputs
      § Alternative to file-based or network-based inputs
      § Stream data from command-line tools, such as vmstat and iostat
      § Poll a web service, API or database and process the results
      § Reformat complex or binary data for easier parsing into events and fields
      § Maintain data sources with slow or resource-intensive startup procedures
      § Provide special or complex handling for transient or unstable inputs
      § Scripts that manage passwords and credentials
      § Wrapper scripts for command line inputs that contain special characters
  14. 14. Database Inputs § Create value with structured data § Enrich search results with additional business context § Easily import data for deeper analysis § Integrate multiple DBs concurrently § Simple set-up, non-invasive and secure DB Connect provides reliable, scalable, real-time integration between Splunk and traditional relational databases 14
  15. 15. Configure Database Inputs
      § DB Connect App
        § Real-time, scalable integration with relational DBs
        § Browse and navigate schemas and tables before data import
        § Reliable scheduled import
        § Seamless installation and UI configuration
        § Supports connection pooling and caching
      § “Tail” tables or import entire tables
        § Detect and import new/updated rows using timestamps or unique IDs
      § Supports many RDBMS flavors
        § AWS RDS Aurora, AWS RedShift, IBM DB2 for Linux, Informix, MemSQL, MS SQL, MySQL, Oracle, PostgreSQL, SAP SQL Anywhere (aka Sybase SA), Sybase ASE and IQ, Teradata
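The idea of "tailing" a table by a unique ID can be sketched in a few lines of Python. This is not DB Connect's implementation; it only illustrates the checkpoint technique, using an in-memory SQLite database and a hypothetical `orders` table.

```python
import sqlite3


def fetch_new_rows(conn, checkpoint):
    """Return rows with id greater than the checkpoint, plus the updated checkpoint."""
    rows = conn.execute(
        "SELECT id, status FROM orders WHERE id > ? ORDER BY id", (checkpoint,)
    ).fetchall()
    new_checkpoint = rows[-1][0] if rows else checkpoint
    return rows, new_checkpoint


# Hypothetical table and data, created only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (id, status) VALUES (?, ?)", [(1, "new"), (2, "paid")]
)

# First poll imports everything; later polls import only rows added since.
first_batch, checkpoint = fetch_new_rows(conn, 0)
conn.execute("INSERT INTO orders (id, status) VALUES (3, 'shipped')")
second_batch, checkpoint = fetch_new_rows(conn, checkpoint)
```

A timestamp column can play the same role as the ID, as long as it only ever increases for new or updated rows.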
  16. 16. Splunk ODBC Driver 16 § Interact with, manipulate and visualize machine data in Splunk Enterprise using business software tools § Leverage analytics from Splunk alongside Microsoft Excel, Tableau Desktop or Microstrategy Analytics Desktop § Industry-standard connectivity to Splunk Enterprise § Empowers business users with direct and secure access to machine data § Combine machine data with structured data for better operational context
  17. 17. ODBC: How it Works 17
  18. 18. Modular Inputs 18 § Create your own custom inputs § Scripted input with structure and intelligence § First class citizen in the Splunk management interface § Appears under Settings > Data Inputs § Benefits over simple scripted input § Instance control: launch a single or multiple instances § Input validation § Support multiple platforms § Stream data as text or XML § Secure access to mod input scripts via REST endpoints
  19. 19. Example Modular Inputs 19 Twitter § Stream JSON data from a Twitter source to Splunk using Tweepy Amazon S3 Online Storage § Index data from the Amazon S3 online storage web service Java Messaging Service (JMS) § Poll message queues and topics through JMS Messaging API § Talks to multiple providers: MQSeries (Websphere MQ), ActiveMQ, TibcoEMS, HornetQ, RabbitMQ, Native JMS, WebLogic JMS, Sonic MQ Splunk Windows Inputs § Retrieve WIN event logs, registry keys, perfmon counters
  20. 20. More Modular Inputs 20
  21. 21. zLinux Forwarder 21 § Easily collect and index data on IBM mainframes § Collect application and platform data § Download as new Forwarder distribution for s390x Linux
  22. 22. Extend Operational Intelligence to Mobile Apps 22 Deliver Better Performing, More Reliable Apps Deliver Real-Time Omni-Channel Analytics End-to-End Performance and Capacity Insights
  23. 23. Monitor App Usage and Performance • Improve user retention by quickly identifying crashes and performance issues • Establish whether issues are caused by an app or the network(s) • Correlate app, OS and device type to diagnose crash and network performance issues 23
  24. 24. Integrated Analytics Platform for Diverse Data Stores
      Full-featured, Integrated Product • Fast Insights for Everyone • Works with What You Have Today
      Explore • Analyze • Visualize • Dashboards • Share
      Hadoop Clusters • NoSQL and Other Data Stores
      Hadoop Client Libraries • Streaming Resource Libraries
      Bi-directional Integration with Hadoop
  25. 25. Connect to NoSQL and Other Data Stores • Build custom streaming resource libraries • Search and analyze data from other data stores in Hunk • In partnership with leading NoSQL vendors • Use in conjunction with DB Connect for relational database lookups
  26. 26. Virtual Indexes § Enables seamless use of almost the entire Splunk stack on data § Automatically handles MapReduce § Technology is patent pending
  27. 27. Data Enrichment
  28. 28. Agenda § Tags – categorize and add meaning to data § Field Aliases – simplify search and correlation § Calculated Fields – shortcut complex/repetitive computations § Event Types – group common events and share knowledge § Lookups – augment data with additional external fields 28
  29. 29. § Adds inline meaning/context/specificity to raw data § Used to normalize metadata or raw data § Simplifies correlation of multiple data sources § Created in Splunk § Transferred from external sources What is Data Enrichment? 29
  30. 30. § Add meaning/context/specificity to raw data § Labels describing team, category, platform, geography § Applied to field-value combination § Multiple tags can be applied for each field-value § Case sensitive Tags 30
  31. 31. Create Tags 31
  32. 32. Tags in Action: Find the Web Servers
      § Tag the host as webserver
      § Tag the sourcetype as web
      § Search events with tag in any field: tag=webserver
      § Search events with tag in a specific field: tag::host=webserver
      § Search events with tag using wildcards: tag=web*
  33. 33. Field Aliases
      § Normalize field labels to simplify search and correlation
      § Apply multiple aliases to a single field
        § Example: Username | cs_username | User → user
        § Example: c_ip | client | client_ip → clientip
      § Processed after field extractions + before lookups
      § Can apply to lookups
      § Aliases appear alongside original fields
  34. 34. Re-Label Field to Intuitive Name Create Field Alias 34 1 2 3
  35. 35. Field Alias in Action: Search using an Intuitive Field Name
      § Create field alias of clientip = customer
      § Search events in last 15 minutes, find customer field: sourcetype=access_combined
      § Field alias (customer) and original field (clientip) are both displayed
  36. 36. § Shortcut for performing repetitive/long/complex transformations using eval command § Based on extracted or discovered fields only § Do not apply to lookup or generated fields Calculated Fields 36
  37. 37. Compute Kilobytes from Bytes Create Calculated Field 37 1 2 1 2 3
  38. 38. Calculated Fields in Action: Search Using Kilobytes instead of Bytes
      § Create kilobytes = bytes/1024
      § Search events in last 15 minutes for kilobytes and bytes: sourcetype=access_combined
  39. 39. § Classify and group common events § Capture and share knowledge § Based on search § Use in combination with fields and tags to define event topography Event Types 39
  40. 40. Create Event Types
      § Best Practice: Use punct field
        § Default metadata field describing event structure
        § Built on interesting characters: ",;-#$%&+./:=?@'|*nr"(){}<>[]^!
        § Can use wildcards
      event: ####<Jun 3, 2014 5:38:22 PM MDT> <Notice> <WebLogicServer> <bea03> <asiAdminServer> <WrapperStartStopAppMain> <>WLS Kernel<> <> <BEA-000360> <Server started in RUNNING mode>
      punct: ####<_,__::__>_<>_<>_<>_<>_<>_
      event: 172.26.34.223 - - [01/Jul/2005:12:05:27 -0700] "GET /trade/app?action=logout HTTP/1.1" 200 2953
      punct: ..._-_-_[:::_-]_"_?=_/."__
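To show how a punct value relates to the raw event, here is a rough Python approximation. It is not Splunk's actual algorithm: it assumes that whitespace becomes an underscore, letters and digits are dropped, punctuation is kept, and the result is cut off at 30 characters (a limit inferred from the WebLogic example on this slide).

```python
import string

PUNCT_CHARS = set(string.punctuation)
MAX_LENGTH = 30  # assumption inferred from the slide's example, not a documented constant here


def approximate_punct(raw_event):
    """Approximate a punct-style signature: keep punctuation, map whitespace to '_'."""
    out = []
    for ch in raw_event:
        if ch.isspace():
            out.append("_")
        elif ch in PUNCT_CHARS:
            out.append(ch)
        # Letters and digits are skipped entirely.
        if len(out) == MAX_LENGTH:
            break
    return "".join(out)


weblogic_event = (
    "####<Jun 3, 2014 5:38:22 PM MDT> <Notice> <WebLogicServer> <bea03> "
    "<asiAdminServer> <WrapperStartStopAppMain> <>WLS Kernel<> <> "
    "<BEA-000360> <Server started in RUNNING mode>"
)
signature = approximate_punct(weblogic_event)
```

Because the signature ignores the words and numbers, events with the same layout but different values share one punct, which is what makes it useful for defining event types.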
  41. 41. Create Event Type: Classify Events as Known Bad
      § Show punct for sourcetype=access_combined
      § Pick a punct, then wildcard it after the timestamp
      § Add NOT status=200
      § Save as “bad” event type + Color:red + Priority:1 (shift reload in browser to show coloring)
      eventtype=bad
      sourcetype="access_combined" punct="..._-_-_[//_:::]*" NOT status=200
  42. 42. Lookups to Enrich Raw Data
      External Data Sources: LDAP • AD • Watch Lists • CRM/ERP • CMDB
      Data goes in, insight comes out
      Create additional fields from the raw data with a lookup to an external data source
  43. 43. § Augment raw events with additional fields § Provide context or supporting details § Translate field values to more descriptive data § Example: add text descriptions for error codes, IDs § Example: add contact details to user names or IDs § Example: add descriptions to HTTP status codes § File-based or scripted lookups Lookups 43
  44. 44. 44 1. Upload/create table 2. Assign table to lookup object 3. Map lookup to data set Convert a Code into a Description Configure a Static Lookup
  45. 45. 1. Create HTTP Status Table
      § Get the lookup from the Splunk Wiki (save to .csv file) http://wiki.splunk.com/Http_status.csv
      § Lookup table files > Add new
        § Name: http_status.csv (must have .csv file extension)
        § Upload: <path to .csv>
      § Verify lookup was created successfully: | inputlookup http_status.csv
  46. 46. 2. Add Lookup Definition
      § Lookup definitions > Add new
        § Name: http_status
        § Type: File-based
        § Lookup file: http_status.csv
      § Invoke the lookup manually: sourcetype=access_combined | lookup http_status status OUTPUT status_description
  47. 47. 3. Configure Automatic Lookup
      § Automatic lookups > Add new
        § Name: http_status (cannot have spaces)
        § Lookup table: http_status
        § Apply to: sourcetype = access_combined
        § Lookup input field: status
        § Lookup output field: status_description
      § Verify lookup is invoked automatically: sourcetype=access_combined
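What the static lookup in the three steps above does to each event can be sketched in Python. The CSV rows here are sample data written for the example (the real file comes from the wiki link on the earlier slide); only the column names `status` and `status_description` are taken from the deck.

```python
import csv
import io

# Sample rows for illustration; not the contents of the real http_status.csv.
HTTP_STATUS_CSV = """status,status_description
200,OK
404,Not Found
500,Internal Server Error
"""


def load_lookup(csv_text, key_field):
    """Index the lookup table by its key column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_field]: row for row in reader}


def apply_lookup(events, table, input_field, output_field):
    """Add output_field to every event whose input_field matches a table row."""
    enriched = []
    for event in events:
        new_event = dict(event)
        match = table.get(str(event.get(input_field)))
        if match is not None:
            new_event[output_field] = match[output_field]
        enriched.append(new_event)
    return enriched


table = load_lookup(HTTP_STATUS_CSV, "status")
events = [{"clientip": "10.0.0.1", "status": 404}, {"clientip": "10.0.0.2", "status": 302}]
enriched = apply_lookup(events, table, "status", "status_description")
```

Events without a matching row pass through unchanged, which mirrors how a lookup leaves the output field empty when there is no match.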
  48. 48. Fancy Lookups
      § Temporal lookups for time-based lookups
        § Example: Identify users on your network based on their IP address and the timestamp in DHCP logs
      § Use search results to populate a lookup table
        § … | outputlookup <tablename|filename>
      § Call an external command or script
        § Python scripts only
        § Example: DNS lookup for IP ↔ Host
      § Create a lookup table using a relational database
        § Review matches against a database column or SQL query
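The DHCP example of a temporal lookup can be illustrated with a small Python sketch: for a given IP and event time, pick the most recent lease that started at or before that time. The lease data and function names are hypothetical, and this is only one way to express the idea, not how Splunk implements it.

```python
import bisect

# Hypothetical DHCP lease log: (lease_start_epoch, ip, user).
LEASES = [
    (1000, "10.0.0.5", "alice"),
    (2000, "10.0.0.5", "bob"),
    (2500, "10.0.0.9", "carol"),
]


def build_index(leases):
    """Group lease start times and users per IP, ordered by start time."""
    index = {}
    for start, ip, user in sorted(leases):
        starts, users = index.setdefault(ip, ([], []))
        starts.append(start)
        users.append(user)
    return index


def user_at(index, ip, event_time):
    """Return the user holding the IP at event_time, or None if unknown."""
    if ip not in index:
        return None
    starts, users = index[ip]
    # Position of the last lease that started at or before event_time.
    pos = bisect.bisect_right(starts, event_time) - 1
    return users[pos] if pos >= 0 else None


index = build_index(LEASES)
```

The same IP resolves to different users depending on the event timestamp, which is the point of making the lookup time-aware.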
  49. 49. § Creating and Managing Alerts (Job Inspector) § Macros § Workflow Actions More Data Enrichment 49
  50. 50. Level Up on Search & Reporting Commands
  51. 51. Agenda § Doing more with basic search commands § Advanced search commands § Doing more with basic reporting commands 51
  52. 52. Search Syntax Components 52
  53. 53. Anatomy of a Search 53 Disk
  54. 54. § top – limit § rare – same options as top § timechart – parameters § stats – functions (sum, avg, list, values, sparkline) § sort – inline ascending or descending § addcoltotals § addtotals Doing More with Basic Search Commands 54
  55. 55. Find Most and Least Active Customers: Using the top + rare Commands
      § Commands have parameters or qualifiers
      § top and rare have similar syntax
      § Each search command has its own syntax (show inline help)
      IPs with the most visits: ... | top limit=20 clientip
      IPs with the least visits: ... | rare limit=20 clientip
  56. 56. Sort the Number of Customer Requests: Using the sort Command
      § Sort inline descending or ascending
      Number of requests by customer, descending: ... | stats count by clientip | sort - count
      Number of requests by customer, ascending: ... | stats count by clientip | sort + count
  57. 57. Determine Total Customer Payload: Using functions + rename command
      § Show Search Command Reference Docs
        § Functions for eval + where
        § Functions for stats + chart and timechart
      § Invoke a function: ... | stats sum(bytes) by clientip | sort - sum(bytes)
        Total payload by customer, descending
      § Rename inline: ... | stats sum(bytes) as totalbytes by clientip | sort - totalbytes
        Total payload by customer, descending, with the sum renamed to totalbytes (use sort + totalbytes for ascending)
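For readers who think in code rather than SPL, the aggregation `stats sum(bytes) as totalbytes by clientip | sort - totalbytes` corresponds roughly to the following Python sketch. The sample events are invented for the example.

```python
from collections import defaultdict


def total_bytes_by_client(events):
    """Sum bytes per clientip and return (clientip, totalbytes) pairs, largest first."""
    totals = defaultdict(int)
    for event in events:
        totals[event["clientip"]] += event["bytes"]
    # reverse=True corresponds to "sort - totalbytes" (descending order).
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented sample events.
sample_events = [
    {"clientip": "10.0.0.1", "bytes": 500},
    {"clientip": "10.0.0.2", "bytes": 2000},
    {"clientip": "10.0.0.1", "bytes": 700},
]
ranking = total_bytes_by_client(sample_events)
```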
  58. 58. Observe Customer Activity: Using the list + values Functions
      § List all values of a field (activity by customer): ... | stats list(action) by clientip
      § List only distinct values of a field (distinct actions by customer): ... | stats values(action) by clientip
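The difference between `list` and `values` is easy to see in a small Python analogue: `list` keeps every occurrence in arrival order, while `values` keeps each distinct value once, sorted. The events below are invented, and the sketch ignores any result-size limits the real commands may apply.

```python
def list_and_values(events, by, field):
    """Return, per group, all values in order ('list') and the distinct sorted values ('values')."""
    grouped = {}
    for event in events:
        grouped.setdefault(event[by], []).append(event[field])
    return {
        key: {"list": vals, "values": sorted(set(vals))}
        for key, vals in grouped.items()
    }


# Invented sample events.
sample_events = [
    {"clientip": "10.0.0.1", "action": "view"},
    {"clientip": "10.0.0.1", "action": "purchase"},
    {"clientip": "10.0.0.1", "action": "view"},
    {"clientip": "10.0.0.2", "action": "view"},
]
activity = list_and_values(sample_events, "clientip", "action")
```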
  59. 59. Analyze Customer Activity: Combine list + values Functions
      § Show distinct actions and cardinality of each action
      sourcetype=access_combined | stats count(action) as value by clientip, action | eval pair=action + " (" + value + ")" | stats list(pair) as values by clientip
  60. 60. Building a Table of Customer Activity: Add Columns and Sum Columns
      § Add columns
        2 cols, clientip + action: ... | stats count by clientip, action
      § Sum specific columns
        Sum totalbytes and totalevents columns: ... | stats sum(bytes) as totalbytes, avg(bytes) as avgbytes, count as totalevents by clientip | addcoltotals totalbytes, totalevents
  61. 61. Building a Table of Customer Activity: Sum Across Rows
      ... | stats sum(bytes) as totalbytes, sum(other) as totalother by clientip | addtotals fieldname=totalstuff
      For each row, add totalbytes + totalother
      A better example: physical memory + virtual memory = total memory
  62. 62. Sparklines in Action: Trend Individual Customer Activity
      Inline in tables: ... | stats sparkline(count) as trendline by clientip
      In context of larger event set: ... | stats sparkline(count) as trendline sum(bytes) by clientip
  63. 63. Advanced Search Commands
      Command | Short Description | Hints
      transaction | Group events by a common field value. | Convenient, but resource intensive.
      cluster | Cluster similar events together. | Can be used on _raw or field.
      associate | Identifies correlations between fields. | Calculates entropy between field values.
      correlate | Calculates the correlation between different fields. | Evaluates relationship of all fields in a result set.
      contingency | Builds a contingency table for two fields. | Computes co-occurrence, or % two fields exist in same events.
      anomalies | Computes an unexpectedness score for an event. | Computes similarity of event (X) to a set of previous events (P).
      anomalousvalue | Finds and summarizes irregular, or uncommon, search results. | Considers frequency of occurrence or number of stdev from the mean.
  64. 64. View Customer Activity by Session: Using the transaction Command
      § Sew events together + creates duration + eventcount
      § Sparklines inline in tables
      Group by JSESSIONID: ... | transaction JSESSIONID | table JSESSIONID, action, product_id
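The effect of grouping by a session field can be approximated in Python: events sharing a `JSESSIONID` are sewn into one record with a `duration` (last timestamp minus first) and an `eventcount`. This is a simplified sketch with invented events; it leaves out the time-span and pause options that the real command offers.

```python
def group_transactions(events, field):
    """Group events by a shared field value and compute duration and eventcount per group."""
    sessions = {}
    for event in events:
        sessions.setdefault(event[field], []).append(event)

    transactions = []
    for key, members in sessions.items():
        times = [member["_time"] for member in members]
        transactions.append(
            {
                field: key,
                "duration": max(times) - min(times),
                "eventcount": len(members),
                "action": [member["action"] for member in members],
            }
        )
    return transactions


# Invented sample events; _time is in epoch seconds.
sample_events = [
    {"_time": 100, "JSESSIONID": "A", "action": "view"},
    {"_time": 110, "JSESSIONID": "B", "action": "view"},
    {"_time": 130, "JSESSIONID": "A", "action": "purchase"},
]
transactions = group_transactions(sample_events, "JSESSIONID")
```

Holding every event of a session in memory until the group is complete hints at why the deck calls the command convenient but resource intensive.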
  65. 65. Automatically Group Customer Activity: Using the cluster Command
      § Intelligent group (creates cluster_count and cluster_label)
      ... | cluster showcount=1 | table _raw, cluster_count, cluster_label
  66. 66. § Predict over time § Chart Overlay with and without streamstats § Maps with iplocation + geostats § Single value § Metered visuals with gauge Do More with Basic Reporting Commands 66
  67. 67. Predict Website Traffic: Using the predict Command
      § Predict future values using lower/upper bounds, single and multiple series
      ... | timechart count as traffic | predict traffic
  68. 68. Simple Chart Overlay: Compare Browsing vs. Buying Activity
      sourcetype=access_combined (action=view OR action=purchase) | timechart span=10m count(eval(action="view")) as Viewed, count(eval(action="purchase")) as Purchased
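The overlay search above buckets events into 10-minute spans and counts views and purchases separately in each bucket. A rough Python equivalent, using invented events with epoch-second timestamps, looks like this:

```python
from collections import Counter

SPAN_SECONDS = 600  # corresponds to span=10m


def timechart_counts(events, span=SPAN_SECONDS):
    """Return (bucket_start, viewed, purchased) tuples, one per time bucket."""
    buckets = {}
    for event in events:
        # Mirror the base search filter: only view and purchase events count.
        if event["action"] not in ("view", "purchase"):
            continue
        bucket_start = event["_time"] - event["_time"] % span
        buckets.setdefault(bucket_start, Counter())[event["action"]] += 1
    return [
        (start, buckets[start]["view"], buckets[start]["purchase"])
        for start in sorted(buckets)
    ]


# Invented sample events.
sample_events = [
    {"_time": 0, "action": "view"},
    {"_time": 100, "action": "view"},
    {"_time": 599, "action": "purchase"},
    {"_time": 600, "action": "view"},
    {"_time": 650, "action": "addtocart"},
]
chart_rows = timechart_counts(sample_events)
```

Each tuple corresponds to one point on the chart, with the two counts forming the two overlaid series.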
  69. 69. Geolocation in Action: Map Customer Activity Geographically
      Combine IP lookup with geo mapping: ... | iplocation clientip | geostats count by clientip
  70. 70. 70 ... | stats count Display a Simple Count of Events Single Value in Action
  71. 71. Single Value, Radial and Filler Gauges in Action: Display Counts Using Gauges
      ... | stats count | gauge count 10000 20000 30000 40000 50000
  72. 72. Data Model and Pivot
  73. 73. Agenda § What is a data model? § Build a data model § Pivot Interface § Accelerate a data model 73
  74. 74. Powerful Analytics Anyone Can Use Enables non-technical users to build complex reports without the search language Provides more meaningful representation of underlying raw machine data Acceleration technology delivers up to 1000x faster analytics over Splunk 5 74 Pivot Data Model Analytics Store
  75. 75. Define Relationships in Machine Data Data Model • Describes how underlying machine data is represented and accessed • Defines meaningful relationships in the data • Enables single authoritative view of underlying raw data Hierarchical object view of underlying data Add constraints to filter out events
  76. 76. Transparent Acceleration • Automatically collected – Handles timing issues, backfill… • Automatically maintained – Uses acceleration window • Stored on the indexers – Peer to the buckets • Fault tolerant collection Time window of data that is accelerated Check to enable acceleration of data model High Performance Analytics Store
  77. 77. Easy-to-Use Analytics • Drag-and-drop interface enables any user to analyze data • Create complex queries and reports without learning search language • Click to visualize any chart type; reports dynamically update when fields change Select fields from data model Time window All chart types available in the chart toolbox Save report to share Pivot
  78. 78. Common Information Model (CIM) App
      § Defines least common denominator for a data domain
      § Standard method to parse, categorize, normalize data
      § Set of field names and tags by domain
      § Packaged as Data Models in a Splunk App
      § Domains: security, web, inventory, JVM, performance, network sessions, and more
      § Minimal setup to use Pivot interface
  79. 79. § Apps > Find More Apps > § Search: “Common Information Model” § Install free § Show fields for web + Web Data Model Download CIM App 79 1 2 3 4
  80. 80. Data Model & Pivot Tutorial http://docs.splunk.com/Documentation/Splunk/latest/PivotTutorial/WelcometothePivotTutorial
  81. 81. Custom Visualizations and the Web Framework Toolkit
  82. 82. Agenda § Developer Platform § Web Framework Toolkit (WFT) § REST API and SDKs § Get a Flying Start 82
  83. 83. Optimizing the Analytics Process 83 Focus on the data – intuitive tools to enable the analyst No single visualization exists to handle all data sets. Never lose sight of the raw data Splunk Analytics Explore Context Visualize Algorithms
  84. 84. 6.0 + 6.1: Simple, Interactive, and Extensible 84 VISUALIZATION EXPLORATION CUSTOMIZABLE FRAMEWORK POWERFUL ANALYTICS Pivot Data Models Interactive Forms Contextual Drilldown Dashboard Editor Web Framework
  85. 85. The Splunk Enterprise Platform Collection Indexing Search Processing Language Core Functions Inputs, Apps, Other Content SDKContent Core Engine User and Developer Interfaces Web Framework REST API
  86. 86. What’s Possible with the Splunk Enterprise Platform? Power Mobile Apps Log Directly Extract Data Customer Dashboards Integrate BI Tools Integrate Platform Services Developer Platform
  87. 87. Powerful Platform for Enterprise Developers Developers Can Customize and Extend REST API Build Splunk Apps Extend and Integrate Splunk Simple XML JavaScript HTML5 Web Framework Java JavaScript Python Ruby C# PHP Data Models Search Extensibility Modular Inputs SDKs
  88. 88. Splunk Software for Developers Gain Application Intelligence Build Splunk Apps Integrate and Extend Splunk
  89. 89. A Wealth of Splunk Apps
      Over 1,100 apps available on the Splunk apps site
      API • SDKs • UI
      Server, Storage, Network • Server Virtualization • Operating Systems • Custom Applications • Business Applications • Cloud Services • App Performance Monitoring • Ticketing and Other • Web Intelligence • Mobile Applications • Stream
  90. 90. § Interactive, cut/paste examples from popular source repositories: D3, GitHub, jQuery § Splunk 6.x Dashboard Examples App https://apps.splunk.com/app/1603 § Custom SimpleXML Extensions App https://apps.splunk.com/app/1772 § Splunk Web Framework Toolkit App https://apps.splunk.com/app/1613 Example Advanced Visualizations 90
  91. 91. 91 http://www.d3js.org
  92. 92. Add a D3 Bubble Chart
      1. Go to Find More Apps and install the Splunk 6.x Dashboard Examples App
      2. Enter the App
      3. Go to Examples > Custom Visualizations > D3 Bubble Chart
      4. Copy autodiscover.js (file) + components/bubblechart (dir)
         from: $SH/etc/apps/simple_xml_examples/appserver/static
         to: $SH/etc/apps/search/appserver/static
      5. Copy and paste simple XML to new dashboard
  93. 93. Resources
  94. 94. Splunk Documentation 94 • http://docs.splunk.com • Official Product Docs • Wiki and community topics • Updated daily • Can be printed to .PDF
  95. 95. Splunk Answers 95 • http://answers.splunk.com • Community driven • Splunk supported • Knowledge exchange • Q & A
  96. 96. Splunk Education 96 • Recommended for Users – Using Splunk – Searching & Reporting • Recommended for UI/Dashboard Developers – Developing Apps • Instructor-Led Courses – Web – Onsite