SlideShare uma empresa Scribd logo
1 de 17
Baixar para ler offline
Hadoop For OpenStack Log Analysis
Havana Summit, April 16th, 2013,

Mike Pittaro
Principal Architect, Big Data Solutions


@pmikeyp            Freenode: mikeyp
michael_pittaro@dell.com
The Problem : Operating OpenStack at Scale




2                                     Revolutionary Cloud Team
The Search for the Holy Grail of OpenStack Operations

    • Imagine if we could follow a request …
       –   Through the entire system …
       –   Across compute, storage, network …
       –   Independent of physical nodes …
       –   With timestamps …
       –   Correlated with events outside OpenStack …




3                                                       Revolutionary Cloud Team
It’s Easy !




    Collect
              Analyze ?      Sleep !
     Logs




4                                      Revolutionary Cloud Team
OpenStack Log Analysis is a Big Data Problem

      Big Data is when the data itself is part of the
                       problem.


                         Volume
    • A large amount of data, growing at large rates
                        Velocity
    • The speed at which the data must be processed
                         Variety
      • The range of data types and data structure


5                                                       Revolutionary Cloud Team
Initial Focus and Scope of Our Efforts

                        • The Operators
                           – Assist in running OpenStack
                           – Not for tenants
                        • The Data
                           – Load all detail into Hadoop
                           – Extract and index significant fields
                           – Enable future analysis
                        • The Patterns
                           – What works
                           – What is repeatable across installs
                           – How can we collaborate



6                                          Revolutionary Cloud Team
Hadoop 101 – Simplified Block Diagram

                                             HUE               Mahout
                             Oozie                            (Machine
                           (Workflow)      (Web UI)           Learning)
    Sqoop

                                      Pig           Hive
                API’s
    Flume
                                    (Batch        (SQL-ish
                                  Language)        query)
    JDBC

                                               MapReduce Distributed
                        HBase Database
                                                   Processing

                             HDFS Distributed Storage


7                                                       Revolutionary Cloud Team
The Big Pieces

    • Log Collection
       – Continuous streaming of log data into Hadoop from OpenStack
    • Intelligent Log parsing, Indexing and Search
          – Should ‘know’ about OpenStack

    • Well Defined Storage Organization
          – Defined schema for the data
          – Predefined queries for high level status - dashboard

    • Straightforward implementation pattern
          – Add as little complexity as possible

    • Ability to perform deeper analysis
          – Hadoop enables this



8                                                                  Revolutionary Cloud Team
OpenStack Log Analysis Block Diagram
           OpenStack Nodes                                                       Search

       Python
                        Syslog
       Logging
                                                                             Solr Cloud
                                        Query
               FlumeNG
                 Agent
                                               Pig          Hive
                                                                                Lucene
                                                                                Indexes

    Nagios +
    Ganglia          Avro
                                        Avro                                MapReduce
                  Sqoop                              HDFS                     Jobs
                  /Flume         Avro



9                                                                  Revolutionary Cloud Team
Current Development Status

     • Batch Only, no Flume Collection
     • Converting logs to AVRO format
     • First cut of schema in place
     • Loading into Hadoop
     • Processing into SOLR indexes
     • Starting to look at data
        – Solr Searches
        – Pig scripts




10                                                     Revolutionary Cloud Team
Schema Thoughts
     2013-03-26 11:57:41 WARNING nova.db.sqlalchemy.session

     [req-ace2ccc0-919e-4fd1-9f3a-671c0c87d28f None None] Got mysql server has gone
     away: (2006, 'MySQL server has gone away')
                                         {"namespace": "logfile.openstack",
                                          "type": "record",
                                          "name": "logentry",
                                          "fields": [
                                              {"name": "hostname", "type": "string"},
                                              {"name": "date", "type": "string"},
                                              {"name": "time",    "type": "string"},
                                              {"name": "level",    "type": "string"},
                                              {"name": "module",   "type": "string"},
                                              {"name": "request_id1",   "type": ["string", "null"]},
                                              {"name": "request_id2",   "type": ["string", "null"]},
                                              {"name": "request_id3",   "type": ["string", "null"]},
                                              {"name": "data",    "type": "string"}
                                          ]
                                         }


11                                                                        Revolutionary Cloud Team
Demo: Where we are today

     • Solr Indexing and Search
        – Example of indexed fields and searching.
     • Pig for batch analysis
        – Reconstruct a sequence of messages related to an API request




12                                                                 Revolutionary Cloud Team
Data Collection Thoughts

     • Sources
        –   OpenStack subsystems
        –   Syslog files
        –   Nagios and Ganglia
        –   General Infrastructure Data
             – Network Switches / Routers

     • Input Formats
        – Mostly Semi-structured text
        – Subsystem, timestamp, hostname, severity and error level are important
     • Output Formats
        – Avro
        – Thrift
        – Protocol Buffers

13                                                                  Revolutionary Cloud Team
Log Collection Thoughts

     • Well understood patterns
        – Evolving best practices
     • Commonly Used Tools
        – Kafka
        – Scribe
        – Flume and FlumeNG
     • Key Requirements
        –   Distributed
        –   Reliable
        –   Aggregators – consolidate streams
        –   Store and Forward – when links are down



14                                                      Revolutionary Cloud Team
Storage Organization Thoughts

     • File Organization within Hadoop
        – Naming Convention
        – Directory Organization


     • Data Lifecycle
        –   Input and Staging
        –   ‘Hot’ and ‘Cold’ Data
        –   Tiered Indexes
        –   Compression
        –   Archival and Deletion




15                                                   Revolutionary Cloud Team
What should we do next ?

     • Document the basic patterns so far ?
     • Are there any related efforts ?
     • Begin deeper discussions
        – Lots of decisions to make
        – Need community input and suggestions
     • Collaborate on Schema Design
     • What upstream OpenStack changes are needed ?
     • Get sample logs in hands of Hadoopians
        – Cleansed reference log sets would be useful



16                                                      Revolutionary Cloud Team
References

     • The unified logging infrastructure for data analytics at Twitter
     • Building LinkedIn’s Real-time Activity Data Pipeline
     • Advances and challenges in log analysis
     • BP: Ceilometer HBase Storage Backend
     • BP: Cross Service Request ID
     • Log Everything All the Time


     • Holy Grail, Gangam Style



17                                                                Revolutionary Cloud Team

Mais conteúdo relacionado

Mais procurados

Spark Internals - Hadoop Source Code Reading #16 in Japan
Spark Internals - Hadoop Source Code Reading #16 in JapanSpark Internals - Hadoop Source Code Reading #16 in Japan
Spark Internals - Hadoop Source Code Reading #16 in Japan
Taro L. Saito
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
thelabdude
 
Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014
Ryu Kobayashi
 

Mais procurados (20)

Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter Experience
 
Spark Internals - Hadoop Source Code Reading #16 in Japan
Spark Internals - Hadoop Source Code Reading #16 in JapanSpark Internals - Hadoop Source Code Reading #16 in Japan
Spark Internals - Hadoop Source Code Reading #16 in Japan
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
 
Drill dchug-29 nov2012
Drill dchug-29 nov2012Drill dchug-29 nov2012
Drill dchug-29 nov2012
 
xPatterns on Spark, Tachyon and Mesos - Bucharest meetup
xPatterns on Spark, Tachyon and Mesos - Bucharest meetupxPatterns on Spark, Tachyon and Mesos - Bucharest meetup
xPatterns on Spark, Tachyon and Mesos - Bucharest meetup
 
Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014
 
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
Rebalance API for SolrCloud: Presented by Nitin Sharma, Netflix & Suruchi Sha...
 
A deeper-understanding-of-spark-internals
A deeper-understanding-of-spark-internalsA deeper-understanding-of-spark-internals
A deeper-understanding-of-spark-internals
 
[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화
 
Apache cassandra in 2016
Apache cassandra in 2016Apache cassandra in 2016
Apache cassandra in 2016
 
Heuritech: Apache Spark REX
Heuritech: Apache Spark REXHeuritech: Apache Spark REX
Heuritech: Apache Spark REX
 
Zaharia spark-scala-days-2012
Zaharia spark-scala-days-2012Zaharia spark-scala-days-2012
Zaharia spark-scala-days-2012
 
Large Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduceLarge Scale Math with Hadoop MapReduce
Large Scale Math with Hadoop MapReduce
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
 
HPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaHPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with Katta
 
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
 
Intro to py spark (and cassandra)
Intro to py spark (and cassandra)Intro to py spark (and cassandra)
Intro to py spark (and cassandra)
 
Spark Programming
Spark ProgrammingSpark Programming
Spark Programming
 
Online Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraOnline Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and Cassandra
 

Destaque

Microsoft xamarin-experience
Microsoft xamarin-experienceMicrosoft xamarin-experience
Microsoft xamarin-experience
Xpand IT
 
Secret Life of a Weather Datum end of project event
Secret Life of a Weather Datum end of project eventSecret Life of a Weather Datum end of project event
Secret Life of a Weather Datum end of project event
lifeofdata
 

Destaque (20)

Large Scale Log Analysis with HBase and Solr at Amadeus (Martin Alig, ETH Zur...
Large Scale Log Analysis with HBase and Solr at Amadeus (Martin Alig, ETH Zur...Large Scale Log Analysis with HBase and Solr at Amadeus (Martin Alig, ETH Zur...
Large Scale Log Analysis with HBase and Solr at Amadeus (Martin Alig, ETH Zur...
 
Xpand IT - Overview and Intro - Xamarin Experience - Live Seminar
Xpand IT - Overview and Intro - Xamarin Experience - Live SeminarXpand IT - Overview and Intro - Xamarin Experience - Live Seminar
Xpand IT - Overview and Intro - Xamarin Experience - Live Seminar
 
Design Thinking for Big Data Applications
Design Thinking for Big Data Applications  Design Thinking for Big Data Applications
Design Thinking for Big Data Applications
 
Customer Sucess Story: Big Data in EDP
Customer Sucess Story: Big Data in EDP Customer Sucess Story: Big Data in EDP
Customer Sucess Story: Big Data in EDP
 
Customer Success Story: Brisa
Customer Success Story: Brisa Customer Success Story: Brisa
Customer Success Story: Brisa
 
Racing to Big Data in the Cloud with Microsoft Azure
Racing to Big Data in the Cloud with Microsoft AzureRacing to Big Data in the Cloud with Microsoft Azure
Racing to Big Data in the Cloud with Microsoft Azure
 
What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?
 
Design Thinking for Enterprise Mobile Apps
Design Thinking for Enterprise Mobile AppsDesign Thinking for Enterprise Mobile Apps
Design Thinking for Enterprise Mobile Apps
 
Anti-social Databases
Anti-social DatabasesAnti-social Databases
Anti-social Databases
 
Introduction Pentaho 5.0
Introduction Pentaho 5.0 Introduction Pentaho 5.0
Introduction Pentaho 5.0
 
Microsoft xamarin-experience
Microsoft xamarin-experienceMicrosoft xamarin-experience
Microsoft xamarin-experience
 
Special project
Special projectSpecial project
Special project
 
Secret Life of a Weather Datum end of project event
Secret Life of a Weather Datum end of project eventSecret Life of a Weather Datum end of project event
Secret Life of a Weather Datum end of project event
 
Cartagena Data Festival | Telling Stories with Data 2015 04-21
Cartagena Data Festival | Telling Stories with Data 2015 04-21Cartagena Data Festival | Telling Stories with Data 2015 04-21
Cartagena Data Festival | Telling Stories with Data 2015 04-21
 
Challenges in opening up qualitative research data
Challenges in opening up qualitative research dataChallenges in opening up qualitative research data
Challenges in opening up qualitative research data
 
MongoDB at Flight Centre Ltd
MongoDB at Flight Centre LtdMongoDB at Flight Centre Ltd
MongoDB at Flight Centre Ltd
 
Review: Leadership Frameworks
Review: Leadership FrameworksReview: Leadership Frameworks
Review: Leadership Frameworks
 
Grow Customer Retention with Predictive Marketing and User-Generated Content
Grow Customer Retention with Predictive Marketing and User-Generated ContentGrow Customer Retention with Predictive Marketing and User-Generated Content
Grow Customer Retention with Predictive Marketing and User-Generated Content
 
онлайн бронирование модуль для турагенств
онлайн бронирование модуль для турагенствонлайн бронирование модуль для турагенств
онлайн бронирование модуль для турагенств
 
GIT Best Practices V 0.1
GIT Best Practices V 0.1GIT Best Practices V 0.1
GIT Best Practices V 0.1
 

Semelhante a Pittaro open stackloganalysis_20130416

Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
Heriyadi Janwar
 
Hadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchHadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 March
MapR Technologies
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache Drill
DataWorks Summit
 

Semelhante a Pittaro open stackloganalysis_20130416 (20)

Data Science
Data ScienceData Science
Data Science
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 
From oracle to hadoop with Sqoop and other tools
From oracle to hadoop with Sqoop and other toolsFrom oracle to hadoop with Sqoop and other tools
From oracle to hadoop with Sqoop and other tools
 
Hadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchHadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 March
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache Drill
 
Big data hadoop training in pune course content advanto software
Big data hadoop training in pune course content advanto softwareBig data hadoop training in pune course content advanto software
Big data hadoop training in pune course content advanto software
 
Intro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of TwingoIntro to Apache Spark by CTO of Twingo
Intro to Apache Spark by CTO of Twingo
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Data
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
 
Swiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache DrillSwiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache Drill
 
Upcoming services in OpenStack
Upcoming services in OpenStackUpcoming services in OpenStack
Upcoming services in OpenStack
 
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyScaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
 
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
 
Spark meetup TCHUG
Spark meetup TCHUGSpark meetup TCHUG
Spark meetup TCHUG
 
Introduction to Apache Drill
Introduction to Apache DrillIntroduction to Apache Drill
Introduction to Apache Drill
 
Big data week presentation
Big data week presentationBig data week presentation
Big data week presentation
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop Ecosystem
 

Mais de OpenStack Foundation

Mais de OpenStack Foundation (20)

Sponsor Webinar - OpenStack Summit Vancouver 2018
Sponsor Webinar  - OpenStack Summit Vancouver 2018Sponsor Webinar  - OpenStack Summit Vancouver 2018
Sponsor Webinar - OpenStack Summit Vancouver 2018
 
OpenStack Summits 101: A Guide For Attendees
OpenStack Summits 101: A Guide For AttendeesOpenStack Summits 101: A Guide For Attendees
OpenStack Summits 101: A Guide For Attendees
 
OpenStack Marketing Plan - Community Presentation
OpenStack Marketing Plan - Community PresentationOpenStack Marketing Plan - Community Presentation
OpenStack Marketing Plan - Community Presentation
 
OpenStack 5th Birthday - User Group Parties
OpenStack 5th Birthday - User Group PartiesOpenStack 5th Birthday - User Group Parties
OpenStack 5th Birthday - User Group Parties
 
Liberty release: Preliminary marketing materials & messages
Liberty release: Preliminary marketing materials & messagesLiberty release: Preliminary marketing materials & messages
Liberty release: Preliminary marketing materials & messages
 
OpenStack Foundation 2H 2015 Marketing Plan
OpenStack Foundation 2H 2015 Marketing PlanOpenStack Foundation 2H 2015 Marketing Plan
OpenStack Foundation 2H 2015 Marketing Plan
 
OpenStack Summit Tokyo Sponsor Webinar
OpenStack Summit Tokyo Sponsor Webinar OpenStack Summit Tokyo Sponsor Webinar
OpenStack Summit Tokyo Sponsor Webinar
 
Cinder Updates - Liberty Edition
Cinder Updates - Liberty Edition Cinder Updates - Liberty Edition
Cinder Updates - Liberty Edition
 
Glance Updates - Liberty Edition
Glance Updates - Liberty EditionGlance Updates - Liberty Edition
Glance Updates - Liberty Edition
 
Heat Updates - Liberty Edition
Heat Updates - Liberty EditionHeat Updates - Liberty Edition
Heat Updates - Liberty Edition
 
Neutron Updates - Liberty Edition
Neutron Updates - Liberty Edition Neutron Updates - Liberty Edition
Neutron Updates - Liberty Edition
 
Nova Updates - Liberty Edition
Nova Updates - Liberty EditionNova Updates - Liberty Edition
Nova Updates - Liberty Edition
 
Sahara Updates - Liberty Edition
Sahara Updates - Liberty EditionSahara Updates - Liberty Edition
Sahara Updates - Liberty Edition
 
Searchlight Updates - Liberty Edition
Searchlight Updates - Liberty EditionSearchlight Updates - Liberty Edition
Searchlight Updates - Liberty Edition
 
Trove Updates - Liberty Edition
Trove Updates - Liberty EditionTrove Updates - Liberty Edition
Trove Updates - Liberty Edition
 
OpenStack: five years in
OpenStack: five years inOpenStack: five years in
OpenStack: five years in
 
Swift Updates - Liberty Edition
Swift Updates - Liberty EditionSwift Updates - Liberty Edition
Swift Updates - Liberty Edition
 
Congress Updates - Liberty Edition
Congress Updates - Liberty EditionCongress Updates - Liberty Edition
Congress Updates - Liberty Edition
 
Release Cycle Management Updates - Liberty Edition
Release Cycle Management Updates - Liberty EditionRelease Cycle Management Updates - Liberty Edition
Release Cycle Management Updates - Liberty Edition
 
OpenStack Day CEE 2015: Real-World Use Cases
OpenStack Day CEE 2015: Real-World Use CasesOpenStack Day CEE 2015: Real-World Use Cases
OpenStack Day CEE 2015: Real-World Use Cases
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Último (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Pittaro open stackloganalysis_20130416

  • 1. Hadoop For OpenStack Log Analysis Havana Summit, April 16th, 2013, Mike Pittaro Principal Architect, Big Data Solutions @pmikeyp Freenode: mikeyp michael_pittaro@dell.com
  • 2. The Problem : Operating OpenStack at Scale 2 Revolutionary Cloud Team
  • 3. The Search for the Holy Grail of OpenStack Operations • Imagine if we could follow a request … – Through the entire system … – Across compute, storage, network … – Independent of physical nodes … – With timestamps … – Correlated with events outside OpenStack … 3 Revolutionary Cloud Team
  • 4. It’s Easy ! Collect Analyze ? Sleep ! Logs 4 Revolutionary Cloud Team
  • 5. OpenStack Log Analysis is a Big Data Problem Big Data is when the data itself is part of the problem. Volume • A large amount of data, growing at large rates Velocity • The speed at which the data must be processed Variety • The range of data types and data structure 5 Revolutionary Cloud Team
  • 6. Initial Focus and Scope of Our Efforts • The Operators – Assist in running OpenStack – Not for tenants • The Data – Load all detail into Hadoop – Extract and index significant fields – Enable future analysis • The Patterns – What works – What is repeatable across installs – How can we collaborate 6 Revolutionary Cloud Team
  • 7. Hadoop 101 – Simplified Block Diagram HUE Mahout Oozie (Machine (Workflow) (Web UI) Learning) Sqoop Pig Hive API’s Flume (Batch (SQL-ish Language) query) JDBC MapReduce Distributed HBase Database Processing HDFS Distributed Storage 7 Revolutionary Cloud Team
  • 8. The Big Pieces • Log Collection – Continuous streaming of log data into Hadoop from OpenStack • Intelligent Log parsing, Indexing and Search – Should ‘know’ about OpenStack • Well Defined Storage Organization – Defined schema for the data – Predefined queries for high level status - dashboard • Straightforward implementation pattern – Add as little complexity as possible • Ability to perform deeper analysis – Hadoop enables this 8 Revolutionary Cloud Team
  • 9. OpenStack Log Analysis Block Diagram OpenStack Nodes Search Python Syslog Logging Solr Cloud Query FlumeNG Agent Pig Hive Lucene Indexes Nagios + Ganglia Avro Avro MapReduce Sqoop HDFS Jobs /Flume Avro 9 Revolutionary Cloud Team
  • 10. Current Development Status • Batch Only, no Flume Collection • Converting logs to AVRO format • First cut of schema in place • Loading into Hadoop • Processing into SOLR indexes • Starting to look at data – Solr Searches – Pig scripts 10 Revolutionary Cloud Team
  • 11. Schema Thoughts 2013-03-26 11:57:41 WARNING nova.db.sqlalchemy.session [req-ace2ccc0-919e-4fd1-9f3a-671c0c87d28f None None] Got mysql server has gone away: (2006, 'MySQL server has gone away') {"namespace": "logfile.openstack", "type": "record", "name": "logentry", "fields": [ {"name": "hostname", "type": "string"}, {"name": "date", "type": "string"}, {"name": "time", "type": "string"}, {"name": "level", "type": "string"}, {"name": "module", "type": "string"}, {"name": "request_id1", "type": ["string", "null"]}, {"name": "request_id2", "type": ["string", "null"]}, {"name": "request_id3", "type": ["string", "null"]}, {"name": "data", "type": "string"} ] } 11 Revolutionary Cloud Team
  • 12. Demo: Where we are today • Solr Indexing and Search – Example of indexed fields and searching. • Pig for batch analysis – Reconstruct a sequence of messages related to an API request 12 Revolutionary Cloud Team
  • 13. Data Collection Thoughts • Sources – OpenStack subsystems – Syslog files – Nagios and Ganglia – General Infrastructure Data – Network Switches / Routers • Input Formats – Mostly Semi-structured text – Subsystem, timestamp, hostname, severity and error level are important • Output Formats – Avro – Thrift – Protocol Buffers 13 Revolutionary Cloud Team
  • 14. Log Collection Thoughts • Well understood patterns – Evolving best practices • Commonly Used Tools – Kafka – Scribe – Flume and FlumeNG • Key Requirements – Distributed – Reliable – Aggregators – consolidate streams – Store and Forward – when links are down 14 Revolutionary Cloud Team
  • 15. Storage Organization Thoughts • File Organization within Hadoop – Naming Convention – Directory Organization • Data Lifecycle – Input and Staging – ‘Hot’ and ‘Cold’ Data – Tiered Indexes – Compression – Archival and Deletion 15 Revolutionary Cloud Team
  • 16. What should we do next ? • Document the basic patterns so far ? • Are there any related efforts ? • Begin deeper discussions – Lots of decisions to make – Need community input and suggestions • Collaborate on Schema Design • What upstream OpenStack changes are needed ? • Get sample logs in hands of Hadoopians – Cleansed reference log sets would be useful 16 Revolutionary Cloud Team
  • 17. References • The unified logging infrastructure for data analytics at Twitter • Building LinkedIn’s Real-time Activity Data Pipeline • Advances and challenges in log analysis • BP: Ceilometer HBase Storage Backend • BP: Cross Service Request ID • Log Everything All the Time • Holy Grail, Gangam Style 17 Revolutionary Cloud Team