PhD Defense slides

Towards Dependable Dynamic
Component-Based Applications

Kiev SANTOS DA GAMA

Laboratoire d’Informatique de Grenoble

Université de Grenoble

Thesis publicly defended on 6 October 2011, before the jury:

Ms. Claudia RONCANCIO, Professor, Ensimag - Grenoble INP, Chair

Mr. Gilles MULLER, Research Director, INRIA, Reviewer

Mr. Lionel SEINTURIER, Professor, Institut Univ. de France & Univ. de Lille, Reviewer

Mr. Ivica CRNKOVIC, Professor, Mälardalen University, Examiner

Mr. Gaël THOMAS, Associate Professor, Univ. Pierre et Marie Curie, Examiner

Mr. Didier DONSEZ, Professor, Université Joseph Fourier, Advisor

Mr. Peter KRIENS, Technical Director, OSGi Alliance, Invited member

Extensible Applications
Different elements (components) easily pluggable into the application

6 000+ extensions at firefox.org

06 October 2011

PhD Defense Kiev Gama

2

Components from Many Sources


Crash


Whose fault is it?
Who is liable?

User/Administrator?

Plugin Provider?

Platform (i.e. the browser)?

What can be done about it?

Should the whole application pay the price for
someone else’s fault?


“A chain is as strong as its weakest link”

“A component system is only as strong as its weakest component” [Szyperski 2002]


Main Question

How can we provide a flexible mechanism
for executing untrustworthy components
while minimizing risks to the
application?


Back to the browsers:
Isolation Trend

Fault is contained.

Browser remains intact


Limitations
No automatic recovery of a faulty plugin

No monitoring for fault diagnosis and avoidance

OK for browsers.

What about other contexts?


Critical Applications
Availability 99%

Unavailability = losses (money, data, lives)

Business-Critical: Banking

eCommerce

Non-stop systems

Dynamic reconfigurations needed at runtime
with minimal system disruption


Dynamic Reconfiguration
Potential source of faults

Parts Repository (plugins, components, elements, etc.)

System


Main Question

How can we provide a flexible mechanism for
executing untrustworthy components while
minimizing risks to the application in a
dynamic environment?


STATE OF THE ART
OBJECTIVES AND PROPOSITIONS
IMPLEMENTATION
VALIDATION
CONCLUSIONS AND PERSPECTIVES


STATE OF THE ART

I. COMPONENTS

II. DEPENDABILITY

III. ISOLATION


Components
Software Component

Component Platform

Component Quality


Software Component

“A component is a static abstraction with plugs”

[Nierstrasz 1995]

“A software component is a unit of composition with
contractually specified interfaces and explicit context dependencies
only. A software component can be deployed independently
and is subject to composition by third parties.”

[Szyperski 2002]


Component Platform

“A platform is the substrate that allows for installation of components
… such that these can be instantiated and activated.”

[Szyperski 2002]


Component Quality
“ilities” (reliability, maintainability, usability, etc.)

Quality attributes are difficult to evaluate

Sometimes subjective

May involve many subcharacteristics

Combined components ≠ Combined attributes

Hard to predict or test all possible compositions

Worse in dynamic platforms

Need to execute untrustworthy components
while still ensuring system dependability


Dependability
“the ability to avoid service failures that are more frequent and
more severe than is acceptable”

[Avizienis 2004]

Dependability involves other attributes
(e.g., availability, reliability, maintainability)

Dependability in a changing environment: Resilience

Ability to recover/adjust from changes


Fault Tolerance
Typically implemented through redundancy techniques

Fault containment as a means to reduce fault impact


Types of Fault
•  Deterministic
   –  Programming errors
      •  Abnormal behavior (intentional or not)
   –  Reproducible bugs

•  Non-deterministic (it may happen with trustworthy code)
   –  Race conditions
   –  Hardware origin
      •  Electric noise
      •  Bit flips
      •  Cosmic rays


Recovery Mechanisms

Recovery

Recovery-oriented Computing

Self-healing

Autonomic Computing

Resilient Systems


Isolation
Means of protection from other users

(Humans, Systems, Components)

Avoiding Harms

Destroyed/Modified data

Privacy

Data read without permission

Degraded service

Fault containment


Isolation Techniques
Hardware-enforced

   Process-based

   Virtualization

Software-based

   Application-level domains

   Security Managers


Techniques Summary

Technique                   Privacy   Fault Containment
Process-based               ✓         ✓
Virtualization              ✓         ✓
Security Managers           ✓         ✗
Application-level Domains   ✓         ✓


Component Isolation
•  Component Object Model

–  In-process

–  Out-of-process server

•  .NET Platform

–  Application Domains

–  Security managers

•  Java

–  Security managers

–  Class loaders

–  Isolates


Component Isolation Summary

Technique                  Privacy   Fault Containment
COM (in-process)           ✓         ✗
COM (out-of-process)       ✓         ✓
.NET Application Domains   ✓         ✓
.NET Security Managers     ✓         ✗
Java Security Managers     ✓         ✗
Java Class loaders         ✓         ✗
Java Isolates              ✓         ✓


Limitations of Studied Approaches as
Dependable Component Platforms

Decision about isolation is made at design time

Lack of fault monitoring mechanisms

No automatic recovery from faults


Vision
Still live with failure

Minimize the impact of untrustworthy
components

More dependable dynamic component-based applications


Objectives

Flexible Isolation of Components

Automatic Recovery from Faults


Propositions
Dynamic isolation of components

I.  Component Isolation Containers

II.  Runtime Reconfigurable Policy

Self-healing Container

I.  Continuous Monitoring

II.  Automatic recovery


Example Scenario

Sensor

Data Gathering

Report Generator

RFID Reader

RFID Application


PROPOSITIONS

DYNAMIC ISOLATION OF COMPONENTS

I. COMPONENT ISOLATION CONTAINERS

II. RUNTIME RECONFIGURABLE POLICY

SELF-HEALING CONTAINER

I. CONTINUOUS MONITORING

II. AUTOMATIC RECOVERY

Dynamic Isolation of Components
I. Component Isolation Containers

Component quarantine

A “sandbox” approach

Fault confinement

II. Runtime Reconfigurable Policy

Isolation at runtime (i.e. dynamic)

Promotion of components


I. Component Isolation Containers

Communication

Reader A

Reader B

Sensor X

Sensor Y

Data Gathering

Report Generator


The fault is contained

Reader A

Reader B

Crash
Sensor X

Crash

Sensor Y

Data Gathering

Report Generator



New Reader

Persistence

Check

Sensor X

Sensor Y

Reader A

Reader B

Data Gathering

Report Generator



Change

Sensor X

Sensor Y

Reader A

Reader B

Apply changed
policy

Promoted component

Data Gathering

Report Generator


How Many Sandboxes?
N sandboxes vs. one sandbox

How to group components?

Trustworthiness

Different Levels

Criteria

Cohesion

Same provider

Similar functionality

Coupling

Dependencies

Intensive communication


Self-Healing Container
I. Continuous monitoring

Problem Diagnosis

Observation for future promotion (quarantine period)

II. Automatic Recovery

Reestablished execution


I. Continuous Monitoring

Reader A

Reader B

Sensor X

Sensor Y

Data Gathering

Report Generator


I. Continuous Monitoring

Reader A

Reader B

Crash
Crash
Sensor X

Sensor Y

Data Gathering

Report Generator


II. Automatic Recovery

Recovery

Reader A

Reader B

Sensor X

Sensor Y

Data Gathering

Report Generator


Summary
Propositions

Dynamic Isolation of components

I. Component isolation containers

II. Runtime reconfigurable policy

Self-healing container

I. Continuous monitoring

II. Automatic recovery

Differences against other approaches

Flexible isolation

Self-healing isolation container


IMPLEMENTATION
COMPONENT ISOLATION

I. TARGET COMPONENT PLATFORM

II. ISOLATION APPROACH

III. ISOLATION TECHNIQUES USED

IV. RECONFIGURABLE POLICY
SELF-HEALING SANDBOX

I. AUTONOMIC MANAGER

II. FAULT MODEL


Target Component Platform
(un)Installation of components at runtime

Non-stop applications

OSGi

A module system for Java applications

Used in industry and academia


Isolation Approach
Approach used for isolating components

Two component platforms:

   Trusted Platform

   Sandbox Platform (quarantine)

Replicated components (for type dependency purposes)

Mutually exclusive states


Isolation Approach: Mutually Exclusive States

Trustworthy components are active (execute) on the trusted platform

Untrustworthy components are active on the sandbox platform

Fault Contained Environment (actually two running platforms)

Bundle     Trusted Platform (Main OSGi)   Sandbox Platform (Sandbox OSGi)
Bundle A   STARTED                        RESOLVED
Bundle B   RESOLVED                       STARTED
Bundle C   STARTED                        RESOLVED
Bundle D   RESOLVED                       STARTED

Virtual Perspective (impression of having a single application)

Bundle A, Bundle B, Bundle C, Bundle D: STARTED

Legend: ? marks untrustworthy bundles
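The mutually exclusive states described on this slide can be illustrated with a minimal Java sketch. It is only an assumption-laden illustration, not the thesis implementation: the class `IsolationPolicySketch`, its methods (`install`, `promote`, `stateOn`, `virtualState`) and the choice of which bundles are trusted are all hypothetical, and no real OSGi API is used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: names below are illustrative and do not come from the thesis or the OSGi API.
class IsolationPolicySketch {

    enum Platform { TRUSTED, SANDBOX }

    enum State { STARTED, RESOLVED }

    // Bundle name -> the single platform on which the bundle is allowed to be active.
    private final Map<String, Platform> placement = new LinkedHashMap<>();

    // Register a bundle; untrusted bundles are placed in the sandbox (quarantine).
    void install(String bundle, boolean trusted) {
        placement.put(bundle, trusted ? Platform.TRUSTED : Platform.SANDBOX);
    }

    // Runtime policy change: promote a bundle from the sandbox to the trusted platform.
    void promote(String bundle) {
        requireInstalled(bundle);
        placement.put(bundle, Platform.TRUSTED);
    }

    // Each bundle is replicated on both platforms but STARTED on exactly one of them.
    State stateOn(String bundle, Platform platform) {
        requireInstalled(bundle);
        return placement.get(bundle) == platform ? State.STARTED : State.RESOLVED;
    }

    // Virtual perspective: every installed bundle appears STARTED to the application.
    State virtualState(String bundle) {
        requireInstalled(bundle);
        return State.STARTED;
    }

    private void requireInstalled(String bundle) {
        if (!placement.containsKey(bundle)) {
            throw new IllegalArgumentException("Unknown bundle: " + bundle);
        }
    }

    public static void main(String[] args) {
        IsolationPolicySketch policy = new IsolationPolicySketch();
        policy.install("Bundle A", true);   // assumed trustworthy
        policy.install("Bundle B", false);  // assumed untrustworthy

        System.out.println("B on sandbox: " + policy.stateOn("Bundle B", Platform.SANDBOX));
        policy.promote("Bundle B");
        System.out.println("B on sandbox after promotion: "
                + policy.stateOn("Bundle B", Platform.SANDBOX));
        System.out.println("B in virtual perspective: " + policy.virtualState("Bundle B"));
    }
}
```

Keeping a single placement entry per bundle makes the two states mutually exclusive by construction, so a promotion is one map update rather than two coordinated state changes.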


Isolation Approach: Virtual Perspective

[Same diagram as on the previous slide: two running platforms (Main OSGi and Sandbox OSGi) with mutually exclusive bundle states, and a virtual perspective in which Bundles A to D all appear STARTED, giving the impression of having a single application.]


Isolation Techniques Used

Strong isolation containers with fault containment

Domain-based (Java Isolates): two Isolates inside one process (MVM)

Process-based (Java Virtual Machine): two processes, one JVM each


Communication between Containers

Domain-based: Main OSGi and Sandbox OSGi run as two Java Isolates in one JVM (MVM); communication via sockets or the Link API (JSR-121)

Process-based: Main OSGi and Sandbox OSGi run in two separate JVMs; communication via sockets


Reconfigurable Policy

Isolation Policy Model


Self-healing Sandbox

The sandbox with an automatic recovery mechanism

An autonomic manager for the sandbox

External application

Control loop using a sense, analyze and react principle

Fault detection and forecast

Pragmatic approach based on a fault model
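
As an illustration of the sense, analyze and react principle, here is a minimal Java sketch of one possible control loop. Everything in it is an assumption made for the example: the `Reading` fields, the thresholds, and the single `REBOOT_SANDBOX` reaction are hypothetical and are not taken from the thesis code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a sense / analyze / react loop; no name here comes from the thesis code.
class ControlLoopSketch {

    // One observation sensed from the sandbox (fields are illustrative).
    static final class Reading {
        final boolean heartbeat;   // false when the sandbox stopped answering
        final double cpuLoad;      // fraction of CPU in use, from 0.0 to 1.0
        final int staleServices;   // services still referenced after being unregistered

        Reading(boolean heartbeat, double cpuLoad, int staleServices) {
            this.heartbeat = heartbeat;
            this.cpuLoad = cpuLoad;
            this.staleServices = staleServices;
        }
    }

    enum Action { NONE, REBOOT_SANDBOX }

    // Knowledge: thresholds that a fault model could provide (values are made up).
    static final double MAX_CPU_LOAD = 0.90;
    static final int MAX_STALE_SERVICES = 0;

    // Analyze: map one observation to a reaction.
    static Action analyze(Reading r) {
        if (!r.heartbeat) {
            return Action.REBOOT_SANDBOX;   // crash or hang
        }
        if (r.cpuLoad > MAX_CPU_LOAD) {
            return Action.REBOOT_SANDBOX;   // excessive resource consumption
        }
        if (r.staleServices > MAX_STALE_SERVICES) {
            return Action.REBOOT_SANDBOX;   // dangling objects
        }
        return Action.NONE;
    }

    // React: a real effector would act on the sandbox; this sketch only records decisions.
    static List<Action> run(List<Reading> sensed) {
        List<Action> decisions = new ArrayList<>();
        for (Reading r : sensed) {
            decisions.add(analyze(r));
        }
        return decisions;
    }

    public static void main(String[] args) {
        List<Reading> sensed = new ArrayList<>();
        sensed.add(new Reading(true, 0.20, 0));   // healthy
        sensed.add(new Reading(true, 0.97, 0));   // CPU above threshold
        sensed.add(new Reading(false, 0.00, 0));  // missed heartbeat
        System.out.println(run(sensed));
    }
}
```

Keeping the thresholds separate from the analysis step mirrors the idea of a knowledge base that can be tuned without touching the loop itself.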


Architecture

[Architecture diagram: the Sandbox Platform and the Trusted Platform (Core, PlatformProxy, Service Registry), Monitoring, EffectorMBean, Isolation Service MBean, Policy Eval., the probes (HeartbeatProbe, SensorProbe, EffectorProbe) and the Autonomic Manager (Monitor, Watchdog, Policy Evaluator, Strategy Executor, Script Interpreter, Knowledge), connected by "use" and "delegate" relations.]


Control Loop Details

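[Diagram: control loop of the Autonomic Manager around the Sandbox]

M (Monitor): Monitor, Watchdog

AP (Analyze and Plan): Policy Evaluator

E (Execute): Strategy Executor

K (Knowledge): Knowledge

Sensors and Effectors link the Autonomic Manager to the Sandbox

Also shown: Sys. Admin., Script Repository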

Fault Model
Hypotheses of faults

General issues

Resource Consumption (e.g. CPU, memory)

Crashes (e.g., errors from wrapped native libraries)

Specific dynamism mishandling issues

Dangling objects (stale services)

[Fault model diagram relating: Faulty Behavior, Excessive Thread Allocation, CPU, Memory, Resource Usage, Stale Service, Denial of Service, Crash, Unresponsiveness, Application Hang]


Separation of Concerns
Dependability as a crosscutting concern

Aspect-oriented Programming approach

All dependability code in aspects

[Diagram: application code and aspects go through the Aspect Weaver, producing woven code]


Implementation Summary
Propositions and their implementation

Dynamic Isolation of components

I. Component isolation containers: Domain-based (Isolates), Process-based (Multiple JVMs)

II. Runtime reconfigurable policy: DSL

Self-healing container

I. Continuous monitoring

II. Automatic recovery

Implemented by the Autonomic Manager


VALIDATION

EXPERIMENTS USE CASE

DOMAIN-BASED X PROCESS-BASED

TEST PLATFORM

SELF-HEALING CONTAINER VALIDATION


Experiments Use Case
Aspire RFID FP7 project

RFID Network

Non-stop servers collecting data

Plug-and-play devices

Native code for drivers puts stability at risk

[Diagram: RFID network with RFID Readers + Sensors, Edge servers, Premise server, EPC IS and ONS]


Experiments Use Case
Sensor

RFID Reader

RFID Application


Process-based vs. Domain-based

Criteria

   Memory footprint

   Application startup

   Sandbox reboot time

Virtual Machines used

   MVM (Java 1.5)

   Sun Oracle Hotspot JVM 1.5

   Sun Oracle Hotspot JVM 1.6

Configurations (Trusted Platform + Sandbox Platform)

   Two Isolates in one MVM (Java 1.5)

   Two JVM 1.5 processes

   Two JVM 1.6 processes


Results
[Chart: memory footprint in MB (0 to 90) of the Sandbox and the Trusted platform for MVM (2 Isolates, single JVM, domain-based), 2 x JVM 1.5 and 2 x JVM 1.6]

Footprint of our solution using process-based isolation is equivalent to domain-based isolation

Isolation Containers   Application Startup   Sandbox Crash         Sandbox Reboot
                       time (ms)             detection time (ms)   time (ms)
MVM (Multi-Isolate)    3186                  32                    303
MVM 1.5 (Multi-JVM)    3449                  697                   3064
JVM 1.5                3945                  660                   3047
JVM 1.6                3859                  658                   2537

Mean time to repair on sandbox is faster when using Isolates
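
To make the repair-time claim concrete, the short Java sketch below adds crash detection time and reboot time for two of the measured configurations, MVM (Multi-Isolate) with 32 ms and 303 ms, and JVM 1.6 with 658 ms and 2537 ms. Treating repair time as the sum of these two values is an assumption made for this illustration, and the class and method names are hypothetical.

```java
// Hypothetical sketch: repair time is ASSUMED to be detection time plus reboot time.
class RepairTimeSketch {

    // Sum of crash detection time and sandbox reboot time, in milliseconds.
    static int repairTimeMs(int detectionMs, int rebootMs) {
        return detectionMs + rebootMs;
    }

    public static void main(String[] args) {
        int isolates = repairTimeMs(32, 303);    // MVM (Multi-Isolate) row of the results
        int jvm16 = repairTimeMs(658, 2537);     // JVM 1.6 row of the results

        System.out.println("MVM (Multi-Isolate): " + isolates + " ms");
        System.out.println("JVM 1.6: " + jvm16 + " ms");
        System.out.println("JVM 1.6 / Isolates: " + ((double) jvm16 / isolates));
    }
}
```

Under that assumption the Isolate-based sandbox is back in service in 335 ms versus 3195 ms for two JVM 1.6 processes, roughly an order of magnitude apart.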


Generic Test Platform
Fault deployment instead of fault injection

–  Emulation of erroneous behavior based on our fault model

–  Fault injection at the interface level does not represent actual
application usage

Management probes for triggering the faults

[Diagram: a Management and Monitoring Console (JConsole, VisualVM) in one JVM connects through an RMI Connector to the MBeanServer of a second JVM running the Sandbox OSGi. Test Probes are deployed there alongside the bundles: Report Generator, Core Interfaces, Sensor Aggregator, Reader Simulator A, Reader Simulator B, Sensor X, Sensor Y.]


Self-healing Container Validation
Fault detection

–  Fault model

Event causality

–  Heuristic for event correlation

–  Updates that trigger abnormal behavior

–  Useful for finding faulty components

Prediction of faults
(e.g., Stale service retainers, Out of memory error)


Results

Correlation of events was possible

Proper actions taken upon abnormal behavior


Conclusions and Perspectives


Component Isolation Containers (“sandboxes”)

Runtime Reconfigurable Policy

Self-healing Container

Continuous Monitoring

Automatic recovery


Missing Characteristics
Fine grained monitoring

Automatic promotion of well-behaving components

Automatic replacement of faulty components (e.g. taken from a
repository)

Open issue:

How to automatically evaluate component trust?


Perspectives
Resource monitoring at component level

Automated Component Promotion

Correlation of Historical Events

Rating Component Trustworthiness

Diversity of Isolation Environments

Embedded Systems

Cloud Computing


[Thanks|Merci|Obrigado|Gracias]

?
