10. Main Visualization
Customizable using control panel
Aggregate view
◦ Summarize and drill down
Draws attention to anomalies
11. Switch between main visualizations
Seamless transitions
◦ Uninterrupted data stream
12. Hierarchy of nodes, organized by rack
Color and size configurable
Scalable using summarization and drill-down
Identify abnormal rack or nodes
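A minimal sketch of summarization and drill-down by rack, written for illustration only. The record fields (`rack`, `node`, `cpu`), the sample values, and the median-based tolerance rule are all assumptions; the slides do not say how Theius computes its summaries or flags anomalies.

```python
from statistics import mean, median

# Hypothetical node records; the field names are assumptions, not Theius's schema.
nodes = [
    {"rack": "r1", "node": "n1", "cpu": 0.20},
    {"rack": "r1", "node": "n2", "cpu": 0.25},
    {"rack": "r2", "node": "n3", "cpu": 0.22},
    {"rack": "r2", "node": "n4", "cpu": 0.18},
    {"rack": "r3", "node": "n5", "cpu": 0.95},
    {"rack": "r3", "node": "n6", "cpu": 0.90},
]

def summarize_by_rack(records, metric):
    """Collapse node records into one mean value per rack (the summary level)."""
    by_rack = {}
    for rec in records:
        by_rack.setdefault(rec["rack"], []).append(rec[metric])
    return {rack: mean(values) for rack, values in by_rack.items()}

def abnormal_racks(summary, tolerance=0.3):
    """Flag racks whose mean is further than `tolerance` from the median rack."""
    typical = median(summary.values())
    return sorted(r for r, v in summary.items() if abs(v - typical) > tolerance)

def drill_down(records, rack):
    """Expand one rack back into its individual node records."""
    return [rec for rec in records if rec["rack"] == rack]

summary = summarize_by_rack(nodes, "cpu")   # one value per rack
flagged = abnormal_racks(summary)           # racks that stand out
details = drill_down(nodes, flagged[0])     # nodes inside the first flagged rack
```

The summary keeps the top-level view small regardless of node count, and the per-node detail is fetched only for the rack a user selects.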
13. Hierarchy of nodes, organized by rack
Color and size configurable
Scalable using summarization and drill-down
Identify abnormal rack or nodes
14. Grouped by job
Color and size configurable
◦ Example uses role for color, time remaining for size
Identify abnormal jobs or tasks
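One way the configurable color and size mapping could look, as a hedged sketch. The role names, the palette, the pixel range, and the function `visual_attributes` are hypothetical; the slides only state that role drives color and time remaining drives size in the example.

```python
# Hypothetical palette; the role names and colors are assumptions.
ROLE_COLORS = {"map": "steelblue", "reduce": "darkorange"}

def visual_attributes(task, max_remaining, min_size=4.0, max_size=20.0):
    """Map a task record to a color (from its role) and a size (from time remaining)."""
    color = ROLE_COLORS.get(task["role"], "gray")  # unknown roles fall back to gray
    if max_remaining > 0:
        fraction = task["time_remaining"] / max_remaining
        fraction = min(max(fraction, 0.0), 1.0)    # clamp to [0, 1]
    else:
        fraction = 0.0
    size = min_size + fraction * (max_size - min_size)  # linear scale
    return {"color": color, "size": size}

example = visual_attributes({"role": "map", "time_remaining": 30}, max_remaining=60)
```

Swapping the metric behind either channel only requires changing which field the function reads, which is the kind of configurability the slide describes.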
15. Grouped by rack
Color and size configurable
◦ Example uses CPU usage and rack color coding
Identify abnormal nodes or racks
16. Identify trends with nodes and racks
Color, size, and plots configurable
Identify correlations between metrics
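Correlation between two metrics can also be checked numerically. The sketch below computes the Pearson coefficient for two series sampled at the same times; the sample values are invented, and nothing here implies that Theius computes this coefficient itself (the slides describe identifying correlations visually).

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)

# Invented samples: context switch rate rising in step with CPU usage.
cpu_usage = [0.10, 0.20, 0.40, 0.80]
context_switches = [100, 200, 400, 800]
r = pearson(cpu_usage, context_switches)
```

A value near 1 indicates the two metrics rise together, a value near -1 that one falls as the other rises, and a value near 0 that there is no linear relationship.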
17. Detailed data for individual node
Traditional visualizations for single node
18. Controls
Configure metrics for visualizations
Pause and resume data stream
Legend for main visualization
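Pausing and resuming a data stream can be sketched as a small buffer in front of the display. The class `PausableStream` is hypothetical, and buffering updates during a pause (rather than dropping them) is an assumption; the slides do not describe how Theius handles updates that arrive while paused.

```python
from collections import deque

class PausableStream:
    """Forward updates to the display, holding them back while paused."""

    def __init__(self):
        self.paused = False
        self.pending = deque()   # updates received while paused
        self.displayed = []      # updates the visualization has shown

    def push(self, update):
        """Accept one update from the data stream."""
        if self.paused:
            self.pending.append(update)
        else:
            self.displayed.append(update)

    def pause(self):
        self.paused = True

    def resume(self):
        """Unpause and replay buffered updates in arrival order."""
        self.paused = False
        while self.pending:
            self.displayed.append(self.pending.popleft())

stream = PausableStream()
stream.push("t1")
stream.pause()
stream.push("t2")
stream.push("t3")
shown_while_paused = list(stream.displayed)
stream.resume()
shown_after_resume = list(stream.displayed)
```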
19. Aggregate Data
Aggregate data for the cluster
◦ Log events stream
◦ Global node data
◦ Summarization data
20. History Controls
Snapshots of historical data
◦ See main visualization and sidebar data at a certain time
Visualize metric across time
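The history controls above can be sketched as a time-ordered snapshot store. The class `SnapshotHistory`, its method names, and the snapshot contents are assumptions made for illustration; the slides do not describe how Theius stores or retrieves historical data.

```python
from bisect import bisect_right

class SnapshotHistory:
    """Time-ordered snapshots with lookup of the state at a given time."""

    def __init__(self):
        self.times = []    # timestamps, kept in ascending order
        self.states = []   # snapshot recorded at the matching timestamp

    def record(self, timestamp, state):
        """Append a snapshot; timestamps must not go backwards."""
        if self.times and timestamp < self.times[-1]:
            raise ValueError("snapshots must be recorded in time order")
        self.times.append(timestamp)
        self.states.append(state)

    def at(self, timestamp):
        """Return the latest snapshot taken at or before `timestamp`, or None."""
        index = bisect_right(self.times, timestamp)
        return self.states[index - 1] if index else None

    def series(self, metric):
        """Return (time, value) pairs for one metric across all snapshots."""
        return [(t, s[metric]) for t, s in zip(self.times, self.states) if metric in s]

history = SnapshotHistory()
history.record(10, {"cpu": 0.2})
history.record(20, {"cpu": 0.5})
state_at_15 = history.at(15)          # falls back to the snapshot taken at t=10
cpu_over_time = history.series("cpu")
```

Binary search keeps lookups fast as the history grows, and the same store serves both the point-in-time view and the metric-across-time plot.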
21. Scalable
◦ Drill-down and summarization
◦ Efficient web-based framework
Intuitive, informative
◦ Topological visualization
◦ Draw attention to abnormalities
Interactive, real-time
◦ Designed for streaming data
◦ Configurable visualization
◦ Pause, rewind, resume
22. Experimental Setup
◦ Compare Theius to Ganglia
◦ 5 graduate students at UIUC
No prior experience with Ganglia or Theius
◦ 4 comparative tasks
Both Ganglia & Theius
◦ 6 scenarios for trends and correlations
Theius only
◦ Timings & subjective feedback
23. Tasks
◦ Scenario 1
CPU usage in single node
◦ Scenario 2
Node with highest CPU
◦ Scenario 3
High memory usage nodes
◦ Scenario 4
Aggregate cluster use
[Bar chart: time in seconds (axis 0 to 60) for each scenario, Theius vs. Ganglia]
24. Task 1
◦ Identify abnormal rack in heterogeneous cluster: 2.2 s
Task 2
◦ Identify rack with abnormal CPU usage: 6.2 s
Task 3
◦ Identify machine that logged the last fatal error: 10.0 s
Task 4
◦ Identify machine with high CPU, memory usage, or context switch rate: 67.4 s
Task 5
◦ Identify rack with high CPU, memory usage, or context switch rate: 1.2 s
Task 6
◦ Identify correlation between context switch rate and CPU usage: 7.8 s
25. Source Code
◦ https://github.com/jtedesco/Theius
Future Work
◦ User study
System administrators
Larger group
Timing as appropriate metric
◦ MapReduce-specific visualizations
◦ Scalability experiments
26. Jon Tedesco
IC2E 2013, San Francisco, CA, USA
Jon Tedesco, Roman Dudko, Abhishek Sharma, Reza Farivar, Roy Campbell