Jon Tedesco




IC2E 2013, San Francisco, CA, USA
Jon Tedesco, Roman Dudko, Abhishek Sharma, Reza Farivar, Roy Campbell
• Problem
    ◦ System administrators
        ▪ Bottleneck for detecting & responding to failures
        ▪ Communicate state of system quickly

• Monitoring
    ◦ Streaming, real-time data
    ◦ Ganglia
        ▪ Widely used, scalable, and flexible
• Prediction
    ◦ Online prediction algorithms (real-time)

• Visualization Problem
    ◦ Ganglia
        ▪ Static, time-based graphs
• Interactive
    ◦ Responsive and controllable
• Real-time
    ◦ Streaming, real-time, automatic
• Informative
    ◦ Direct attention to potential problems and artifacts
• Intuitive
    ◦ Demand skill, not experience
• Scalable
    ◦ Visualize large clusters without sacrificing usability



• Objectives
    ◦ Streaming data
    ◦ Configurable and interactive
    ◦ Informative
• Use cases
    ◦ Heterogeneous cluster
    ◦ Rack failure
    ◦ Node failure
    ◦ Uneven load distribution


• Architecture
    ◦ Simulator
        ▪ Generates simulated cluster data
        ▪ Streams data to clients
    ◦ Webpage
        ▪ Asynchronous & interactive

• Implementation
    ◦ JavaScript
        ▪ d3.js
        ▪ jQuery
    ◦ Python
    ◦ AJAX
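
The simulator described above can be sketched roughly as follows. This is a minimal illustration, not the Theius implementation: the function names (`make_cluster`, `simulate`), the JSON message layout, and the value ranges are all assumptions made for the example. It generates random CPU, memory, and context switch readings for every node and yields one JSON message per tick, which a web client could then fetch asynchronously.

```python
import json
import random
from typing import Dict, Iterator, List


def make_cluster(racks: int, nodes_per_rack: int) -> List[Dict[str, str]]:
    """Build a hypothetical cluster layout: one dict per node, tagged with its rack."""
    return [
        {"node": f"rack{r}-node{n}", "rack": f"rack{r}"}
        for r in range(racks)
        for n in range(nodes_per_rack)
    ]


def simulate(cluster: List[Dict[str, str]], ticks: int, seed: int = 0) -> Iterator[str]:
    """Yield one JSON message per tick with simulated metrics for every node."""
    rng = random.Random(seed)  # seeded so that a simulated run is repeatable
    for tick in range(ticks):
        readings = [
            {
                **node,
                "cpu": round(rng.uniform(0.0, 100.0), 1),     # percent (assumed unit)
                "memory": round(rng.uniform(0.0, 100.0), 1),  # percent (assumed unit)
                "context_switches": rng.randint(0, 5000),     # per second (assumed unit)
            }
            for node in cluster
        ]
        yield json.dumps({"tick": tick, "nodes": readings})


messages = list(simulate(make_cluster(racks=2, nodes_per_rack=3), ticks=4))
```

Fixing the seed keeps a simulated run repeatable, which is convenient when comparing visualizations against the same data.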



• Data
    ◦ Methodology
        ▪ Data types from previous work
        ▪ Heuristic values
    ◦ Examples
        ▪ CPU, memory, context switch rate
        ▪ Log events
        ▪ MapReduce tasks and jobs
        ▪ Failure or event prediction



Main Visualization

• Customizable using control panel
• Aggregate view
    ◦ Summarize and drill down
• Draws attention to anomalies
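
One way a tool could decide which nodes deserve attention is a robust outlier test. The slides do not say how Theius highlights anomalies, so the sketch below swaps in a simple median absolute deviation (MAD) rule purely as an illustration; `flag_anomalies`, the threshold `k`, and the sample readings are hypothetical.

```python
from statistics import median
from typing import Dict, List


def flag_anomalies(values: Dict[str, float], k: float = 5.0) -> List[str]:
    """Return names whose value lies more than k MADs from the median.

    If the MAD is zero (more than half the values are identical),
    any value that differs from the median is flagged.
    """
    center = median(values.values())
    deviations = {name: abs(v - center) for name, v in values.items()}
    mad = median(deviations.values())
    limit = k * mad  # equals 0 when mad is 0, so any nonzero deviation is flagged
    return sorted(name for name, d in deviations.items() if d > limit)


# Hypothetical CPU readings (percent) for six nodes; n5 is the odd one out.
cpu = {"n1": 20.0, "n2": 22.0, "n3": 19.0, "n4": 21.0, "n5": 95.0, "n6": 20.0}
flagged = flag_anomalies(cpu)
```

A median-based rule is used here because a single extreme node inflates a mean and standard deviation enough to hide itself in a small cluster.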
• Switch between main visualizations
• Seamless transitions
    ◦ Uninterrupted data stream
• Hierarchy of nodes, organized by rack
• Color and size configurable
• Scalable using summarization and drill-down
• Identify abnormal rack or nodes
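
The summarize-then-drill-down idea can be sketched as a two-step query over node readings. This is only an illustration under assumed names: `summarize_by_rack`, `drill_down`, and the record fields (`node`, `rack`, `cpu`) are hypothetical and not taken from the Theius source.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def summarize_by_rack(nodes: List[dict]) -> Dict[str, dict]:
    """Collapse per-node CPU readings into one summary record per rack."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node["rack"]].append(node)
    return {
        rack: {
            "node_count": len(members),
            "mean_cpu": mean(m["cpu"] for m in members),
            "max_cpu": max(m["cpu"] for m in members),
        }
        for rack, members in groups.items()
    }


def drill_down(nodes: List[dict], rack: str) -> List[dict]:
    """Return the individual nodes behind one rack summary."""
    return [node for node in nodes if node["rack"] == rack]


# Hypothetical readings: two racks with two nodes each.
nodes = [
    {"node": "rack0-n0", "rack": "rack0", "cpu": 10.0},
    {"node": "rack0-n1", "rack": "rack0", "cpu": 20.0},
    {"node": "rack1-n0", "rack": "rack1", "cpu": 90.0},
    {"node": "rack1-n1", "rack": "rack1", "cpu": 70.0},
]
summary = summarize_by_rack(nodes)
busy_rack_nodes = drill_down(nodes, "rack1")
```

Showing only the per-rack summaries keeps the number of drawn elements proportional to the number of racks, and the per-node detail is fetched only for the rack a user selects.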
• Grouped by job
• Color and size configurable
    ◦ Example uses role for color, time remaining for size

• Identify abnormal jobs or tasks

• Grouped by rack
• Color and size configurable
    ◦ Example uses CPU usage and rack color coding

• Identify abnormal nodes or racks
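
The example encoding mentioned above (size from CPU usage, color from rack) might look like the following sketch. The helper names, the radius range, and the palette are assumptions for illustration; in a d3.js front end the same mapping would normally be done with d3 scales rather than hand-written code.

```python
from typing import Dict, List, Tuple

# Hypothetical palette; racks beyond its length reuse colors.
RACK_PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def linear_scale(value: float, domain: Tuple[float, float],
                 output: Tuple[float, float]) -> float:
    """Map value from domain to output linearly, clamping to the domain."""
    lo, hi = domain
    out_lo, out_hi = output
    if hi == lo:
        return out_lo
    t = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    return out_lo + t * (out_hi - out_lo)


def encode_nodes(nodes: List[dict], size_metric: str = "cpu") -> List[Dict[str, object]]:
    """Assign each node a circle radius (from a metric) and a color (from its rack)."""
    racks = sorted({n["rack"] for n in nodes})
    color_of = {rack: RACK_PALETTE[i % len(RACK_PALETTE)] for i, rack in enumerate(racks)}
    return [
        {
            "node": n["node"],
            "radius": linear_scale(n[size_metric], (0.0, 100.0), (4.0, 20.0)),
            "color": color_of[n["rack"]],
        }
        for n in nodes
    ]


encoded = encode_nodes([
    {"node": "a", "rack": "rack0", "cpu": 0.0},
    {"node": "b", "rack": "rack1", "cpu": 50.0},
    {"node": "c", "rack": "rack0", "cpu": 100.0},
])
```

Passing a different `size_metric` changes which reading drives the radius, which is the kind of reconfiguration the control panel is described as offering.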
• Identify trends with nodes and racks
• Color, size, and plots configurable
• Identify correlations between metrics
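
A correlation between two metrics that a viewer spots by eye can also be quantified. The sketch below computes the Pearson correlation coefficient by hand; the function name and the sample readings are hypothetical, and nothing in the slides indicates that Theius computes this value itself.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-node readings: CPU usage (percent) and context switches per second.
cpu_usage = [10.0, 20.0, 30.0, 40.0]
context_switches = [100.0, 200.0, 300.0, 400.0]
r = pearson(cpu_usage, context_switches)
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates none.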
• Detailed data for individual node
• Traditional visualizations for single node
Controls

• Configure metrics for visualizations
• Pause and resume data stream
• Legend for main visualization
Aggregate Data

• Aggregate data for the cluster
    ◦ Log events stream
    ◦ Global node data
    ◦ Summarization data
History Controls

• Snapshots of historical data
    ◦ See main visualization and sidebar data at certain time

• Visualize metric across time
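
Snapshot-based history of this kind could be kept in a bounded buffer, as in the sketch below. The `History` class, its method names, and the snapshot layout are assumptions for illustration, and timestamps are assumed to arrive in increasing order.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Snapshot = Dict[str, Dict[str, float]]  # node name -> metric name -> value


class History:
    """Keep the most recent cluster snapshots so a view can be rewound."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen silently drops the oldest snapshot when full.
        self._snapshots: Deque[Tuple[float, Snapshot]] = deque(maxlen=capacity)

    def record(self, timestamp: float, state: Snapshot) -> None:
        """Store one snapshot; timestamps must be non-decreasing."""
        self._snapshots.append((timestamp, state))

    def at(self, timestamp: float) -> Optional[Snapshot]:
        """Return the latest snapshot taken at or before timestamp, if any."""
        result = None
        for ts, state in self._snapshots:
            if ts > timestamp:
                break
            result = state
        return result

    def metric_over_time(self, node: str, metric: str) -> List[Tuple[float, float]]:
        """Return (timestamp, value) pairs for one metric of one node."""
        return [
            (ts, state[node][metric])
            for ts, state in self._snapshots
            if node in state and metric in state[node]
        ]


history = History(capacity=3)
for ts in (1.0, 2.0, 3.0, 4.0):
    history.record(ts, {"n1": {"cpu": ts * 10}})
series = history.metric_over_time("n1", "cpu")
```

With a capacity of three, the snapshot from time 1.0 has already been evicted by the time the fourth snapshot is recorded, so rewinding is limited to the retained window.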
• Scalable
    ◦ Drill-down and summarization
    ◦ Efficient web-based framework
• Intuitive, informative
    ◦ Topological visualization
    ◦ Draw attention to abnormalities
• Interactive, real-time
    ◦ Designed for streaming data
    ◦ Configurable visualization
    ◦ Pause, rewind, resume



• Experimental Setup
    ◦ Compare Theius to Ganglia
    ◦ 5 graduate students at UIUC
        ▪ No prior experience with Ganglia or Theius
    ◦ 4 comparative tasks
        ▪ Both Ganglia & Theius
    ◦ 6 scenarios for trends and correlations
        ▪ Theius only
    ◦ Timings & subjective feedback


• Tasks
    ◦ Scenario 1
        ▪ CPU usage in single node
    ◦ Scenario 2
        ▪ Node with highest CPU
    ◦ Scenario 3
        ▪ High memory usage nodes
    ◦ Scenario 4
        ▪ Aggregate cluster use

[Bar chart: time in seconds (axis 0 to 60) for each scenario, Theius vs. Ganglia]

• Task 1
    ◦ Identify abnormal rack in heterogeneous cluster: 2.2 s
• Task 2
    ◦ Identify rack with abnormal CPU usage: 6.2 s
• Task 3
    ◦ Identify machine that logged the last fatal error: 10.0 s
• Task 4
    ◦ Identify machine with high CPU, memory usage, or context switch rate: 67.4 s
• Task 5
    ◦ Identify rack with high CPU, memory usage, or context switch rate: 1.2 s
• Task 6
    ◦ Identify correlation between context switch rate and CPU usage: 7.8 s




• Source Code
    ◦ https://github.com/jtedesco/Theius
• Future Work
    ◦ User study
        ▪ System administrators
        ▪ Larger group
        ▪ Timing as appropriate metric
    ◦ MapReduce-specific visualizations
    ◦ Scalability experiments



Theius: A Streaming Visualization Suite for Hadoop Clusters
