The document summarizes a PhD thesis defense presented by Kiev Santos da Gama at the Université de Grenoble on October 6, 2011. The thesis addresses providing flexible mechanisms for executing untrustworthy components in a dynamic environment while minimizing risks to applications. It proposes dynamic isolation of components using isolation containers and runtime reconfigurable policies, as well as self-healing containers that provide continuous monitoring and automatic recovery from faults. The goal is to develop more dependable dynamic component-based applications that can still function despite failures by minimizing the impact of untrustworthy components.
1. Towards Dependable Dynamic Component-Based Applications
Kiev SANTOS DA GAMA
Laboratoire d’Informatique de Grenoble
Université de Grenoble
Thesis publicly defended on 6 October 2011, before the jury:
Ms. Claudia RONCANCIO
Professor, Ensimag - Grenoble INP,
President
Mr. Gilles MULLER
Research Director, INRIA,
Reviewer (Rapporteur)
Mr. Lionel SEINTURIER
Professor, Institut Univ. de France & Univ. de Lille,
Reviewer (Rapporteur)
Mr. Ivica CRNKOVIC
Professor, Mälardalen University,
Examiner
Mr. Gaël THOMAS
Associate Professor (Maître de Conférences), Univ. Pierre et Marie Curie,
Examiner
Mr. Didier DONSEZ
Professor, Université Joseph Fourier,
Thesis Director
Mr. Peter KRIENS
Technical Director, OSGi Alliance,
Invited member
5. Whose fault is it?
Who is liable?
User/Administrator?
Plugin Provider?
Platform (i.e. the browser)?
What can be done about it?
Should the whole application pay the price for
someone else’s fault?
06 October 2011
PhD Defense Kiev Gama
5
6. “A chain is as strong as its weakest link”
“A component system is only as strong as its weakest component” [Szyperski 2002]
7. Main Question
How to provide a flexible mechanism for executing untrustworthy components while minimizing risks to the application?
8. Back to the browsers:
Isolation Trend
Fault is contained.
Browser remains intact
9. Limitations
No automatic recovery of faulty plugin
No monitoring for fault diagnosis and avoidance
OK for browsers.
What about other contexts?
10. Critical Applications
Availability 99%
Unavailability = losses (money, data, lives)
Business-Critical: Banking
eCommerce
Non-stop systems
Dynamic reconfigurations needed at runtime
with minimal system disruption
11. Dynamic Reconfiguration
Potential source of faults
Parts Repository (plugins, components, elements, etc.)
System
12. Main Question
How to provide a flexible mechanism for executing untrustworthy components while minimizing risks to the application in a dynamic environment?
13. STATE OF THE ART
OBJECTIVES AND PROPOSITIONS
IMPLEMENTATION
VALIDATION
CONCLUSIONS AND PERSPECTIVES
14. STATE OF THE ART
I. COMPONENTS
II. DEPENDABILITY
III. ISOLATION
16. Software Component
“A component is a static abstraction with plugs”
[Nierstrasz 1995]
“A software component is a unit of composition with
contractually specified interfaces and explicit context dependencies
only. A software component can be deployed independently
and is subject to composition by third parties.”
[Szyperski 2002]
17. Component Platform
“A platform is the substrate that allows for installation of components
… such that these can be instantiated and activated.”
[Szyperski 2002]
18. Component Quality
“ilities” (reliability, maintainability, usability, etc)
Quality attributes difficult to evaluate
Sometimes Subjective
May involve many subcharacteristics
Combined components ≠ Combined attributes
Hard to predict or test all possible compositions
Worse in dynamic platforms
Need to execute untrustworthy components while still ensuring system dependability
19. STATE OF THE ART
I. COMPONENTS
II. DEPENDABILITY
III. ISOLATION
20. Dependability
“the ability to avoid service failures that are more frequent and
more severe than is acceptable”
[Avizienis 2004]
Dependability involves other attributes
(e.g., availability, reliability, maintainability)
Dependability in a changing environment: Resilience
Ability to recover from or adjust to changes
21. Fault Tolerance
Typically implemented through redundancy techniques
Fault containment as a means to reduce fault impact
22. Types of Fault
• Deterministic
  – Programming errors
• Abnormal behavior (intentional or not)
  – Reproducible bugs
• Non-deterministic (it may happen with trustworthy code)
  – Race conditions
  – Hardware origin
    • Electric noise
    • Bit flips
    • Cosmic rays
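The point that non-deterministic faults may happen even with trustworthy code can be illustrated with a classic lost-update race. The Java sketch below is illustrative only (class names are hypothetical, not from the thesis): several threads increment a shared counter. The unsynchronized `int` version may end below the expected total on some runs and not on others, whereas the `AtomicInteger` version always yields the exact count.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Looks harmless, but value++ is a non-atomic read-modify-write: updates can be lost. */
class RacyCounter {
    private int value;
    void increment() { value++; }
    int get() { return value; }
}

/** Same behavior made thread-safe with an atomic increment. */
class SafeCounter {
    private final AtomicInteger value = new AtomicInteger();
    void increment() { value.incrementAndGet(); }
    int get() { return value.get(); }
}

class RaceDemo {
    /** Runs `threads` workers, each calling `task` `perThread` times, and waits for all of them. */
    static void hammer(Runnable task, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    task.run();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With four threads doing 10,000 increments each, `SafeCounter` always reports 40,000, while `RacyCounter` may or may not lose updates depending on thread scheduling, which is exactly what makes this kind of fault hard to reproduce.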
23. Recovery Mechanisms
Recovery
– Recovery-oriented Computing
– Self-healing
– Autonomic Computing
– Resilient Systems
24. STATE OF THE ART
I. COMPONENTS
II. DEPENDABILITY
III. ISOLATION
25. Isolation
Means of protection from other users
(Humans, Systems, Components)
Avoiding Harms
Destroyed/Modified data
Privacy
Data read without permission
Degraded service
Fault containment
26. Isolation Techniques
Hardware-enforced
– Process-based
– Virtualization
Software-based
– Application-level domains (multiple domains within a process)
– Security Managers (policy-based, within a process)
30. Limitations of Studied Approaches as
Dependable Component Platforms
Decision about isolation is made at design time
Lack of fault monitoring mechanisms
No automatic recovery from faults
31. STATE OF THE ART
OBJECTIVES AND PROPOSITIONS
IMPLEMENTATION
VALIDATION
CONCLUSIONS AND PERSPECTIVES
32. Vision
Still live with failure
Minimize the impact of untrustworthy
components
More dependable dynamic component-based applications
33. Objectives
Flexible Isolation of Components
Automatic Recovery from Faults
34. Propositions
Dynamic isolation of components
I. Component Isolation Containers
II. Runtime Reconfigurable Policy
Self-healing Container
I. Continuous Monitoring
II. Automatic recovery
35. Example Scenario
Sensor
Data Gathering
Report Generator
RFID Reader
RFID Application
36. PROPOSITIONS
DYNAMIC ISOLATION OF COMPONENTS
I. COMPONENT ISOLATION CONTAINERS
II. RUNTIME RECONFIGURABLE POLICY
SELF-HEALING CONTAINER
I. CONTINUOUS MONITORING
II. AUTOMATIC RECOVERY
37. Dynamic Isolation of Components
I. Component Isolation Containers
Component quarantine
A “sandbox” approach
Fault confinement
II. Runtime Reconfigurable Policy
Isolation at runtime (i.e. dynamic)
Promotion of components
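A runtime reconfigurable policy can be pictured as a mutable table from components to the platform they should run on. The following is a minimal sketch under assumptions of our own: component identifiers are plain strings, components without a rule default to the sandbox (quarantine), and `IsolationPolicy`, `promote` and `demote` are hypothetical names. The thesis defines its own isolation policy model and DSL.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Target platform of a component (hypothetical names). */
enum Placement { TRUSTED, SANDBOX }

/**
 * Minimal runtime-reconfigurable isolation policy. Rules can change while the
 * application runs; actually moving the component is left to the platform.
 */
class IsolationPolicy {
    private final Map<String, Placement> rules = new ConcurrentHashMap<>();

    /** Assumption: components without a rule are untrustworthy and quarantined. */
    Placement placementOf(String componentId) {
        return rules.getOrDefault(componentId, Placement.SANDBOX);
    }

    /** Promotion: a well-behaved component leaves quarantine. */
    void promote(String componentId) {
        rules.put(componentId, Placement.TRUSTED);
    }

    /** Demotion: a misbehaving component is sent (back) to the sandbox. */
    void demote(String componentId) {
        rules.put(componentId, Placement.SANDBOX);
    }
}
```

A `ConcurrentHashMap` is used because the policy may be read by the platform while an administrator changes it at runtime.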
38. Dynamic Isolation of Components
I. Component Isolation Containers
Communication
Reader A
Reader B
Sensor X
Sensor Y
Data Gathering
Report Generator
39. Dynamic Isolation of Components
The fault is contained
Reader A
Reader B
Crash
Sensor X
Crash
Sensor Y
Data Gathering
Report Generator
40. Dynamic Isolation of Components
II. Runtime Reconfigurable Policy
New Reader
Persistence
Check
Sensor X
Sensor Y
Reader A
Reader B
Data Gathering
Report Generator
41. Dynamic Isolation of Components
II. Runtime Reconfigurable Policy
Change
Sensor X
Sensor Y
Reader A
Reader B
Apply changed policy
Promoted component
Data Gathering
Report Generator
42. How Many Sandboxes?
N sandboxes vs. one sandbox
How to group components?
Trustworthiness
Different Levels
Criteria
Cohesion
Same provider
Similar functionality
Coupling
Dependencies
Intensive communication
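As one way to apply the cohesion criterion, the sketch below groups untrustworthy components into one sandbox per provider. `ComponentInfo` and `SandboxGrouping` are hypothetical names; the other criteria on the slide (trust levels, coupling through dependencies or intensive communication) would need different grouping keys.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical descriptor of a component awaiting a sandbox. */
final class ComponentInfo {
    private final String id;
    private final String provider;

    ComponentInfo(String id, String provider) {
        this.id = id;
        this.provider = provider;
    }

    String id() { return id; }
    String provider() { return provider; }
}

class SandboxGrouping {
    /**
     * Cohesion by provider: one sandbox per provider, so a faulty provider can
     * only disturb its own sandbox. TreeMap keeps the sandboxes sorted by provider.
     */
    static Map<String, List<String>> byProvider(List<ComponentInfo> components) {
        return components.stream().collect(Collectors.groupingBy(
                ComponentInfo::provider,
                TreeMap::new,
                Collectors.mapping(ComponentInfo::id, Collectors.toList())));
    }
}
```

The trade-off behind "N sandboxes vs. one sandbox" shows up directly: more groups confine a fault to fewer components, but each group is another container to run and another boundary to communicate across.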
43. PROPOSITIONS
DYNAMIC ISOLATION OF COMPONENTS
I. COMPONENT ISOLATION CONTAINERS
II. RUNTIME RECONFIGURABLE POLICY
SELF-HEALING CONTAINER
I. CONTINUOUS MONITORING
II. AUTOMATIC RECOVERY
44. Self-Healing Container
I. Continuous monitoring
Problem Diagnosis
Observation for future promotion (quarantine period)
II. Automatic Recovery
Re-established execution
45. Self-Healing Container
I. Continuous Monitoring
Reader A
Reader B
Sensor X
Sensor Y
Data Gathering
Report Generator
46. Self-Healing Container
I. Continuous Monitoring
Reader A
Reader B
Crash
Crash
Sensor X
Sensor Y
Data Gathering
Report Generator
47. Self-Healing Container
II. Automatic Recovery
Recovery
Reader A
Reader B
Sensor X
Sensor Y
Data Gathering
Report Generator
48. Summary
Propositions
Dynamic Isolation of components
I. Component isolation containers
II. Runtime reconfigurable policy
Self-healing container
I. Continuous monitoring
II. Automatic recovery
Differences against other approaches
Flexible isolation
Self-healing isolation container
49. STATE OF THE ART
OBJECTIVES AND PROPOSITIONS
IMPLEMENTATION
VALIDATION
CONCLUSIONS AND PERSPECTIVES
50. IMPLEMENTATION
COMPONENT ISOLATION
I. TARGET COMPONENT PLATFORM
II. ISOLATION APPROACH
III. ISOLATION TECHNIQUES USED
IV. RECONFIGURABLE POLICY
SELF-HEALING SANDBOX
I. AUTONOMIC MANAGER
II. FAULT MODEL
51. Target Component Platform
(un)Installation of components at runtime
Non-stop applications
OSGi
A module system for Java applications
Used in industry and academia
52. Isolation Approach
Approach used for isolating components
Two component platforms:
– Trusted Platform
– Sandbox Platform (quarantine)
Replicated components (for type dependency purposes)
Mutually exclusive states
53. Isolation Approach: Mutually Exclusive States
Trustworthy components are active on the trusted platform
Untrustworthy components are active on the sandbox platform
Actually two running platforms:
– Main OSGi (Trusted Platform): Bundle A STARTED, Bundle B (?) RESOLVED, Bundle C STARTED, Bundle D (?) RESOLVED
– Sandbox OSGi (Sandbox Platform, fault-contained environment): Bundle A RESOLVED, Bundle B (?) STARTED, Bundle C RESOLVED, Bundle D (?) STARTED
Virtual perspective: Bundles A, B, C and D all STARTED, giving the impression of a single application
Legend: unmarked = trustworthy; (?) = untrustworthy
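The mutually exclusive states can be stated as an invariant: each replicated bundle is installed on both platforms but STARTED on exactly one of them. The sketch below models only that bookkeeping. `ReplicatedBundles`, `Side` and the two-value `BundleState` enum are hypothetical simplifications, not the OSGi API, which has more bundle states (INSTALLED, STARTING, STOPPING, UNINSTALLED) and calls the running state ACTIVE rather than STARTED.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

/** The two states shown on the slide. */
enum BundleState { RESOLVED, STARTED }

/** The two platforms. */
enum Side { TRUSTED, SANDBOX }

/** Every bundle is installed on both platforms (for type dependencies) but STARTED on one. */
class ReplicatedBundles {
    private final Map<String, Map<Side, BundleState>> states = new HashMap<>();

    /** Installs a bundle on both platforms and starts it only on `activeSide`. */
    void install(String bundle, Side activeSide) {
        Map<Side, BundleState> perSide = new EnumMap<>(Side.class);
        for (Side s : Side.values()) {
            perSide.put(s, s == activeSide ? BundleState.STARTED : BundleState.RESOLVED);
        }
        states.put(bundle, perSide);
    }

    /** Moves execution (e.g. promotion): stop on the other side first, then start on the target. */
    void moveTo(String bundle, Side target) {
        Map<Side, BundleState> perSide = states.get(bundle);
        if (perSide == null) {
            throw new IllegalArgumentException("unknown bundle " + bundle);
        }
        for (Side s : Side.values()) {
            if (s != target) {
                perSide.put(s, BundleState.RESOLVED);
            }
        }
        perSide.put(target, BundleState.STARTED);
    }

    BundleState state(String bundle, Side side) {
        return states.get(bundle).get(side);
    }

    /** Invariant: exactly one platform has each bundle STARTED. */
    boolean invariantHolds() {
        return states.values().stream().allMatch(m ->
                m.values().stream().filter(st -> st == BundleState.STARTED).count() == 1);
    }
}
```

Installing A and C as trusted and B and D as sandboxed reproduces the slide's configuration; the virtual perspective is then the union of the STARTED entries across both sides.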
54. Isolation Approach: Virtual Perspective
Same figure as slide 53, focusing on the virtual perspective: Bundles A, B, C and D all appear STARTED, giving the impression of a single application, although they actually run on two platforms (Main OSGi and Sandbox OSGi).
55. Isolation Techniques Used
Domain-based (Java Isolates)
– Strong isolation containers with fault containment
– Two Isolates within one process (MVM)
Process-based (Java Virtual Machine)
– Two processes (one JVM each)
56. Communication between Containers
Domain-based: one JVM (MVM) hosting two Java Isolates, Main OSGi and Sandbox OSGi, each with Bundles A to D; communication via sockets or the Link API (JSR-121)
Process-based: two JVMs, Main OSGi and Sandbox OSGi, each with Bundles A to D; communication via sockets
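A minimal sketch of socket-based communication between the two containers, assuming a hypothetical line-based protocol (one request line in, one reply line out). To keep it self-contained, the "sandbox" endpoint runs in a background thread of the same JVM; in the process-based setup it would run in the sandbox JVM. `SocketBridge` and its methods are illustrative names. The JSR-121 Link API is not part of the standard JDK, so it is not shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

class SocketBridge {
    /** Sandbox side: serves a single request with `service` and returns the listening port. */
    static int startSandboxEndpoint(UnaryOperator<String> service) {
        final ServerSocket server;
        try {
            server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress()); // port 0 = any free port
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Thread worker = new Thread(() -> {
            try (server;
                 Socket s = server.accept();
                 BufferedReader in = reader(s);
                 PrintWriter out = writer(s)) {
                out.println(service.apply(in.readLine()));
            } catch (IOException e) {
                // A failure stays on the sandbox side; the caller only sees a closed connection.
            }
        }, "sandbox-endpoint");
        worker.setDaemon(true);
        worker.start();
        return server.getLocalPort();
    }

    /** Main-platform side: sends one request line and returns the reply (null if the peer closed). */
    static String call(int port, String request) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             BufferedReader in = reader(s);
             PrintWriter out = writer(s)) {
            s.setSoTimeout(5_000); // do not block forever on a dead sandbox
            out.println(request);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }
}
```

The read timeout on the main side matters for dependability: a crashed or hung sandbox turns into an exception on the trusted platform instead of a blocked caller.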
57. Reconfigurable Policy
Isolation Policy Model
58. IMPLEMENTATION
COMPONENT ISOLATION
I. TARGET COMPONENT PLATFORM
II. ISOLATION APPROACH
III. ISOLATION TECHNIQUES USED
IV. RECONFIGURABLE POLICY
SELF-HEALING SANDBOX
I. AUTONOMIC MANAGER
II. FAULT MODEL
59. Self-healing Sandbox
The sandbox with an automatic recovery mechanism
An autonomic manager for the sandbox
External application
Control loop using a sense, analyze and react principle
Fault detection and forecast
Pragmatic approach based on a fault model
60. Self-healing Sandbox
Architecture
[Figure: architecture of the self-healing sandbox. Elements: Sandbox Platform and Trusted Platform (Core and PlatformProxy elements on both); Service Registry; Monitoring; EffectorMBean; Isolation Service MBean; Policy Eval. Registry; HeartbeatProbe, SensorProbe and EffectorProbe; and the Autonomic Manager (Monitor, Watchdog, Policy Evaluator, Strategy Executor, Knowledge, Script Interpreter), connected by "use" and "delegate" relations.]
61. Self-healing Sandbox
Control Loop Details
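[Figure: MAPE-K control loop. The Autonomic Manager comprises Monitor (M: Watchdog Monitor), Analyze and Plan (AP: Policy Evaluator), Execute (E: Strategy Executor) and Knowledge (K); the system administrator supplies scripts through a Script Repository; sensors and effectors connect the loop to the Sandbox.]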
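The control loop can be sketched as one MAPE-K iteration driven by heartbeats: the watchdog measures how long the sandbox has been silent, the policy evaluation decides whether that silence indicates a crash or hang, and the strategy executor reboots the sandbox. Everything here is an assumption for illustration: the `SandboxManager` and `Sandbox` names, the single REBOOT strategy and the timeout value. The slide's design instead relies on administrator scripts from a repository and a script interpreter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical MAPE-K sketch; role names follow the slide, not the thesis code. */
class SandboxManager {
    /** Managed element: a sensor (last heartbeat) and an effector (reboot). */
    interface Sandbox {
        long lastHeartbeatMillis();
        void reboot();
    }

    enum Action { NONE, REBOOT }

    private final Sandbox sandbox;
    private final LongSupplier clock;        // injectable time source, eases testing
    private final long timeoutMillis;        // knowledge: tolerated heartbeat silence
    private final List<Action> history = new ArrayList<>(); // knowledge: past decisions

    SandboxManager(Sandbox sandbox, LongSupplier clock, long timeoutMillis) {
        this.sandbox = sandbox;
        this.clock = clock;
        this.timeoutMillis = timeoutMillis;
    }

    /** One pass of the control loop; a scheduler would call this periodically. */
    Action iterate() {
        // Monitor (watchdog): silence since the last heartbeat.
        long silence = clock.getAsLong() - sandbox.lastHeartbeatMillis();
        // Analyze and plan (policy evaluation): too much silence means a crash or a hang.
        Action plan = silence > timeoutMillis ? Action.REBOOT : Action.NONE;
        // Execute (strategy executor): act on the sandbox through its effector.
        if (plan == Action.REBOOT) {
            sandbox.reboot();
        }
        history.add(plan);
        return plan;
    }

    List<Action> history() {
        return List.copyOf(history);
    }
}
```

In a deployment, a `ScheduledExecutorService` could call `iterate()` at a fixed rate; the timeout then bounds the crash detection time, one of the quantities measured in the validation.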
62. Fault Model
Hypotheses of faults
General issues
Resource Consumption (e.g. CPU, memory)
Crashes (e.g., errors from wrapped native libraries)
Specific dynamism mishandling issues
Dangling objects (stale services)
[Figure: fault model; its elements are excessive thread allocation, CPU and memory resource usage, faulty behavior, crash, stale service, denial of service, unresponsiveness and application hang.]
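The fault hypotheses can be turned into simple detection rules over monitored values. The sketch below is hypothetical throughout: the snapshot fields, the enum names and especially the thresholds are illustrative choices, not values from the thesis. In a JVM, figures such as thread counts and heap usage could be read from `java.lang.management` beans like `ThreadMXBean` and `MemoryMXBean`.

```java
import java.util.EnumSet;
import java.util.Set;

/** Fault hypotheses loosely named after the slide's fault model. */
enum SuspectedFault { EXCESSIVE_CPU, EXCESSIVE_MEMORY, EXCESSIVE_THREADS, STALE_SERVICE, CRASH_OR_HANG }

/** Hypothetical snapshot of what the sensors report about the sandbox. */
final class SandboxSnapshot {
    final double cpuLoad;        // fraction of CPU used by the sandbox, 0.0 to 1.0
    final double heapUsedRatio;  // used heap divided by maximum heap
    final int threadCount;
    final int staleServiceRefs;  // references still held to unregistered services
    final boolean heartbeatOk;   // false if the watchdog missed heartbeats

    SandboxSnapshot(double cpuLoad, double heapUsedRatio, int threadCount,
                    int staleServiceRefs, boolean heartbeatOk) {
        this.cpuLoad = cpuLoad;
        this.heapUsedRatio = heapUsedRatio;
        this.threadCount = threadCount;
        this.staleServiceRefs = staleServiceRefs;
        this.heartbeatOk = heartbeatOk;
    }
}

class FaultClassifier {
    // Illustrative thresholds only; a real deployment would take them from the policy.
    static final double CPU_LIMIT = 0.90;
    static final double HEAP_LIMIT = 0.85;
    static final int THREAD_LIMIT = 500;

    /** Maps one snapshot to the set of fault hypotheses it supports. */
    static Set<SuspectedFault> classify(SandboxSnapshot s) {
        Set<SuspectedFault> faults = EnumSet.noneOf(SuspectedFault.class);
        if (!s.heartbeatOk) faults.add(SuspectedFault.CRASH_OR_HANG);
        if (s.cpuLoad > CPU_LIMIT) faults.add(SuspectedFault.EXCESSIVE_CPU);
        if (s.heapUsedRatio > HEAP_LIMIT) faults.add(SuspectedFault.EXCESSIVE_MEMORY);
        if (s.threadCount > THREAD_LIMIT) faults.add(SuspectedFault.EXCESSIVE_THREADS);
        if (s.staleServiceRefs > 0) faults.add(SuspectedFault.STALE_SERVICE);
        return faults;
    }
}
```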
63. Separation of Concerns
Dependability as crosscutting concerns
Aspect-oriented Programming approach
All dependability code in aspects
Application code + Aspects → Aspect Weaver → Woven code
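Full aspect weaving needs a tool such as AspectJ. As a lighter stand-in, the sketch below uses `java.lang.reflect.Proxy` to keep dependability code out of the application code: the proxy forwards each call and reports any fault, much like "after throwing" advice. This deliberately swaps a runtime proxy for compile-time or load-time weaving, and `DeviceReader` and `DependabilityAspect` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.function.Consumer;

/** Hypothetical application service, written with no dependability code at all. */
interface DeviceReader {
    String read();
}

class DependabilityAspect {
    /**
     * Returns a proxy that behaves like `target` but reports every fault thrown
     * by the target to `onFault` before rethrowing it to the caller.
     */
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target, Consumer<Throwable> onFault) {
        InvocationHandler advice = (proxy, method, args) -> {
            try {
                return method.invoke(target, args); // "proceed" to the real method
            } catch (InvocationTargetException e) {
                onFault.accept(e.getCause());       // crosscutting concern: record the fault
                throw e.getCause();                 // the caller still sees the original fault
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, advice);
    }
}
```

The application code (`DeviceReader` implementations) stays unaware of monitoring; only the wiring that creates the proxy knows about it, which is the separation the slide attributes to aspects.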
64. Implementation Summary
Propositions → Implementation
Dynamic Isolation of components
– I. Component isolation containers → Domain-based (Isolates) and process-based (multiple JVMs)
– II. Runtime reconfigurable policy → DSL
Self-healing container
– I. Continuous monitoring and II. Automatic recovery → Autonomic Manager
65. STATE OF THE ART
OBJECTIVES AND PROPOSITIONS
IMPLEMENTATION
VALIDATION
CONCLUSIONS AND PERSPECTIVES
66. VALIDATION
EXPERIMENTS USE CASE
DOMAIN-BASED VS. PROCESS-BASED
TEST PLATFORM
SELF-HEALING CONTAINER VALIDATION
67. Experiments Use Case
Aspire RFID FP7 project
RFID Network
Non-stop servers collecting data
Plug-and-play devices
Native code for drivers puts stability at risk
[Figure: RFID network comprising RFID readers and sensors, edge servers, a premise server, EPC IS repositories and an ONS]
68. Experiments Use Case
Sensor
RFID Reader
RFID Application
[Figure: the RFID network of the previous slide (RFID readers and sensors, edge servers, premise server, EPC IS, ONS)]
70. Results
[Figure: memory footprint (MB) of the sandbox and the trusted platform for MVM (2 Isolates, single JVM, domain-based), 2 x JVM 1.5 and 2 x JVM 1.6]
Footprint of our solution using process-based isolation is equivalent to domain-based isolation
Isolation containers | Application startup time (ms) | Sandbox crash detection time (ms) | Sandbox reboot time (ms)
MVM (Multi-Isolate) | 3186 | 32 | 303
MVM 1.5 (Multi-JVM) | 3449 | 697 | 3064
JVM 1.5 | 3945 | 660 | 3047
JVM 1.6 | 3859 | 658 | 2537
Mean time to repair on sandbox is faster when using Isolates
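Treating the time to repair a crashed sandbox as crash detection time plus reboot time (an assumption: it ignores restoring component state), the measured values give about 335 ms with Isolates versus roughly 3.2 to 3.8 s with separate JVMs, close to an order of magnitude. The arithmetic, with `RepairTime` as a hypothetical helper:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Approximate repair time = crash detection + sandbox reboot, from the measurements above. */
class RepairTime {
    static Map<String, Integer> approximateRepairMillis() {
        String[] configs = {"MVM (Multi-Isolate)", "MVM 1.5 (Multi-JVM)", "JVM 1.5", "JVM 1.6"};
        int[][] detectionAndReboot = {{32, 303}, {697, 3064}, {660, 3047}, {658, 2537}};
        Map<String, Integer> result = new LinkedHashMap<>(); // keeps table order
        for (int i = 0; i < configs.length; i++) {
            result.put(configs[i], detectionAndReboot[i][0] + detectionAndReboot[i][1]);
        }
        return result;
    }
}
```

This yields 335 ms (MVM, Multi-Isolate), 3761 ms (MVM 1.5, Multi-JVM), 3707 ms (JVM 1.5) and 3195 ms (JVM 1.6).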
71. Generic Test Platform
Fault deployment instead of fault injection
– Emulation of erroneous behavior based on our fault model
– Fault injection at the interface level does not represent actual application usage
Management probes for triggering the faults
[Figure: generic test platform. A management and monitoring console (JConsole, VisualVM) in one JVM connects through an RMI connector to the MBeanServer of the JVM running the Sandbox OSGi platform, which hosts test probes alongside the Report Generator, Core Interfaces, Sensor Aggregator, Reader Simulator A, Reader Simulator B, Sensor X and Sensor Y bundles.]
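Test probes exposed as JMX MBeans can be triggered from a console such as JConsole or VisualVM, or programmatically through the `MBeanServer`. The sketch below registers a hypothetical probe that emulates the excessive memory consumption of the fault model. The probe, its object name (`sandbox.test:type=MemoryLeakProbe`) and the helper methods are assumptions; `ManagementFactory.getPlatformMBeanServer`, `registerMBean`, `invoke` and `getAttribute` are standard JMX calls. The public nested interface follows the standard MBean naming convention (`<Class>MBean`).

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class Probes {
    /** Management interface; standard MBean naming requires "<ImplementationName>MBean". */
    public interface MemoryLeakProbeMBean {
        void leak(int megabytes);     // emulate the excessive-memory fault
        void release();               // undo it
        int getLeakedMegabytes();     // exposed as attribute "LeakedMegabytes"
    }

    /** Hypothetical test probe deployed next to the components under test. */
    public static class MemoryLeakProbe implements MemoryLeakProbeMBean {
        private final List<byte[]> retained = new ArrayList<>();

        @Override public synchronized void leak(int megabytes) {
            for (int i = 0; i < megabytes; i++) {
                retained.add(new byte[1024 * 1024]);
            }
        }
        @Override public synchronized void release() { retained.clear(); }
        @Override public synchronized int getLeakedMegabytes() { return retained.size(); }
    }

    private static final MBeanServer SERVER = ManagementFactory.getPlatformMBeanServer();

    /** Registers a probe so that a console (or code) can trigger it. */
    static ObjectName register(String type, Object probe) {
        try {
            ObjectName name = new ObjectName("sandbox.test", "type", type);
            SERVER.registerMBean(probe, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** What a console does: invoke an operation by name through the MBeanServer. */
    static void invoke(ObjectName name, String operation, Object[] args, String[] signature) {
        try {
            SERVER.invoke(name, operation, args, signature);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object attribute(ObjectName name, String attribute) {
        try {
            return SERVER.getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the fault is deployed as a component and triggered through management calls, the application keeps its normal usage path, which is the motivation the slide gives for fault deployment over interface-level injection.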
72. Self-healing Container Validation
Fault detection
– Fault model
Event causality
– Heuristic for event correlation
– Updates that trigger abnormal behavior
– Useful for finding faulty components
Prediction of faults
(e.g., stale service retainers, out-of-memory errors)
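One simple heuristic for event correlation, sketched below under our own assumptions, blames the component most recently updated or installed within a time window before the abnormal behavior. `PlatformEvent`, `CausalityHeuristic`, the event kinds and the window size are hypothetical; the heuristic actually used in the thesis may weigh events differently.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical record of a platform event such as a bundle update. */
final class PlatformEvent {
    final long timeMillis;
    final String componentId;
    final String kind; // e.g. "UPDATED", "INSTALLED", "STARTED"

    PlatformEvent(long timeMillis, String componentId, String kind) {
        this.timeMillis = timeMillis;
        this.componentId = componentId;
        this.kind = kind;
    }
}

class CausalityHeuristic {
    /**
     * Suspects the component most recently updated or installed at most
     * `windowMillis` before the fault; empty if nothing changed in that window.
     */
    static Optional<String> suspect(List<PlatformEvent> history, long faultTimeMillis, long windowMillis) {
        return history.stream()
                .filter(e -> e.kind.equals("UPDATED") || e.kind.equals("INSTALLED"))
                .filter(e -> e.timeMillis <= faultTimeMillis
                        && faultTimeMillis - e.timeMillis <= windowMillis)
                .max(Comparator.comparingLong((PlatformEvent e) -> e.timeMillis))
                .map(e -> e.componentId);
    }
}
```

An empty result is informative too: abnormal behavior with no recent change points to a latent fault rather than a faulty update.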
73. Results
Correlation of events was possible
Proper actions taken upon abnormal behavior
74. STATE OF THE ART
OBJECTIVES AND PROPOSITIONS
IMPLEMENTATION
VALIDATION
CONCLUSIONS AND PERSPECTIVES
75. Conclusions and Perspectives
Dynamic Isolation of Components
Component Isolation Containers (“sandboxes”)
Runtime Reconfigurable Policy
Self-healing Container
Continuous Monitoring
Automatic recovery
76. Missing Characteristics
Fine grained monitoring
Automatic promotion of well-behaving components
Automatic replacement of faulty components (e.g. taken from a
repository)
Open issue:
How to automatically evaluate component trust?
77. Perspectives
Resource monitoring at component level
Automated Component Promotion
Correlation of Historical Events
Rating Component Trustworthiness
Diversity of Isolation Environments
Embedded Systems
Cloud Computing
We enumerate here the topics to be explored in this part of the talk.
System quality attributes concern non-functional properties, often called “ilities” (reliability, maintainability, usability, etc.). Some of these attributes are hard to evaluate, especially those that can only be observed at runtime, such as reliability and performance. In component-based development, the quality of a composition should not be considered simply the sum of the attribute values of the components involved: composing a reliable component A with another reliable component B does not necessarily yield a reliable composition. Functional testing checks a component against the system requirements to verify that it produces the expected outputs, but it is difficult to test it against all possible compositions with other components. It may also be necessary to execute untrustworthy components, which are not necessarily malicious; a device driver containing unstable code is one example. Even so, system dependability must still be ensured: the untrustworthy component should not compromise the rest of the application.
The general goal of this thesis is to provide mechanisms that make dynamic component-based applications more dependable. We want to minimize some of the impacts that runtime updates may introduce, especially those related to executing untrustworthy components. We propose distinct approaches that, combined, can lead us towards the construction of more dependable applications on dynamic component-based platforms: