The document discusses resilient computing and control systems for critical social infrastructures. It covers:
- ICT systems implement centralized control of infrastructure functions, but their correctness is threatened by crime, terrorism, and natural disasters.
- Systems aim to provide acceptable service despite faults through resilience: the ability to adapt to changes while maintaining dependability.
- Interdependencies between system components and data flows must be considered for adaptation. Consensus protocols allow tolerance of some faulty processes.
- Achieving resilience involves balancing safety properties for correct data processing with liveness properties for adaptation. Security mechanisms like isolation, fault avoidance/tolerance, and evidence generation can improve this balance.
On Resilient Computing
1. On Resilient Computing
ISSI 2011, Tokyo, Japan
February 16, 2012
Sven Wohlgemuth
Transdisciplinary Research Integration Center
National Institute of Informatics, Japan
Research Organization for Information and Systems, Japan
2. Agenda
I. Social Infrastructures and ICT
II. Adaptation and Interdependencies
III. Isolation Mechanisms
IV. Resilient Computing
Sven Wohlgemuth <wohlgemuth@nii.ac.jp> On Resilient Computing
3. I. Social Infrastructures and ICT
[Figure: social infrastructure spanning a physical and a cyber layer. Labels: energy supply, communication network, sensors and controllers, ICT services (S1, S2, S3, S4, S5, S6, S8, ...), workflows, function, event-driven.]
• ICT control systems implement functions of social infrastructures
• Real-time processing of context data and controlling location
• Centralized control
• Operated by public or private organizations
4. I. Social Infrastructures and ICT
• Correctness is threatened by crime, terrorism, and natural disasters
5. Resilience and ICT
• An affected resilient ICT system delivers at least correct critical services in a hostile environment (brittle) (Hollnagel et al., 2006)
• Ability of an ICT system to provide and maintain an acceptable level of service in the face of various faults and challenges to normal operation (Sterbenz et al., 2010)
• Persistence of dependability when facing changes (Laprie, 2008)
Own illustration following (Sheffi, 2005; Günther et al., 2007; McNanus, 2009)
6. II. Adaptation and Interdependencies
[Figure with two panels, captioned "Adaptation of an ICT system" and "Data flows describe interdependencies". Labels: function, specification, service; sensors and controllers; ICT services S1, S2, S3, S4, Si, Sj, Sk, ..., Sn; OS; data d1, d1*, d2, ...; c1, c2.]
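The idea that data flows describe interdependencies can be illustrated with a small graph traversal. The following is a minimal sketch, not the author's model: the edges in `DATA_FLOWS` and the function name `affected_services` are hypothetical and only reuse the service labels from the figure.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical data flows (assumption, not taken from the slides):
# each service maps to the services that consume its output.
DATA_FLOWS: Dict[str, List[str]] = {
    "S1": ["S4"],
    "S2": ["S4"],
    "S3": ["S4"],
    "S4": ["Sn"],
    "Sn": [],
}


def affected_services(flows: Dict[str, List[str]], faulty: str) -> Set[str]:
    """Return all services that directly or transitively consume data of `faulty`."""
    affected: Set[str] = set()
    queue = deque([faulty])
    while queue:
        current = queue.popleft()
        for consumer in flows.get(current, []):
            # Skip services already visited and the faulty service itself (cycles).
            if consumer != faulty and consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


if __name__ == "__main__":
    print(sorted(affected_services(DATA_FLOWS, "S1")))
```

Under these assumed flows, a fault in S1 can propagate to every service downstream of it, which is why an adaptation that replaces one service has to consider its data flows.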
7. Covert Channels
[Figure: three cases of interference between service A (sensor, service A, actuator) and service B through a shared service C.
Malicious interference: case (a), passive attack by an attacking service B (data d, result r); case (b), active attack by an attacking service B (data d*, result r*).
Non-malicious interference: case (c), non-availability (service A, service B, shared service C).
Legend: d, d*: input data for a data processing; r, r*: result of a data processing.]
Automatic detection of all covert channels is impossible (Wang and Ju, 2006)
Covert channels may be unknown and lead to a failure → fault isolation
8. III. Isolation Mechanisms
Mechanisms & Methods
Policies
• Bell-LaPadula, Chinese Wall
• Biba, Clark-Wilson
• Role-based access control
• Optimistic Security
• APPLE
• Obligation Specification Language (OSL)
• Extended Privacy Definition Tools (ExPDT)
• Testing
• Simulation
• Model checking
• Security engineering
• Non-linkable Delegation of Rights
• Monitors
• Virtualization
• Privacy-enhancing technologies
• Verifiable homomorphic encryption
• Secure data aggregation
• Certified security patterns
• Vulnerability analysis
• Model checking
• Penetration testing
• Process Rewriting
• Software patches
Fault avoidance: fault prevention, fault removal
Fault acceptance: fault tolerance, fault forecasting
• Forensics
• Process mining
• Data provenance
• Redundancy
• Consensus protocols
• Recovery-oriented computing
9. Consensus and Adaptation
Objective: Majority on correct data (sensor data, computation result)
[Figure: services S4, S5, S6, Sj, Sk, Sl and a monitor exchanging data d1, d2, d3]
Consensus protocols and malicious faults:
• Synchronous communication: tolerates t < n/3 faulty processes; with authenticated messages: t < n
• Asynchronous communication: consensus not possible if one process fails
• But: bears risk of failure due to non-availability of data
d_correct: (d1 = d2 = d3) OR (d1 = d2) OR (d1 = d3) OR (d2 = d3)
(Cachin et al., 2011)
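The majority condition on d1, d2, d3 and the fault bounds stated on this slide can be sketched in Python. This is a minimal illustration under stated assumptions: it is a plain majority vote over values already collected, not a consensus protocol, and the function names `majority_value` and `max_tolerated_faults` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_value(readings: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of readings, else None.

    For three readings this is the slide's condition: all three agree,
    or at least one pair agrees.
    """
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    return value if count > len(readings) / 2 else None


def max_tolerated_faults(n: int, authenticated: bool = False) -> int:
    """Largest t satisfying the slide's bounds for n processes.

    Without authenticated messages: t < n/3. With authenticated messages: t < n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return n - 1 if authenticated else (n - 1) // 3


if __name__ == "__main__":
    # Two of three sensors agree, so their value is taken as correct.
    print(majority_value([21.5, 21.5, 99.0]))
    # With four processes and no authentication, one faulty process is tolerated.
    print(max_tolerated_faults(4))
```

If no majority exists, the sketch returns `None` instead of guessing a value; this corresponds to the risk named above that data may not be available.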
10. IV. Resilient Computing
Challenge: Correct data processing in spite of covert channels
[Figure: error rate axis from 0% to 100% with the labels "Fulfilled safety (correct) properties", "Fulfilled liveness (adaptation) properties", "Expected risk of failure", "Safety", and "Liveness"]
The error rate represents the probability of faulty services of a system according to its specification.
11. IV. Resilient Computing
Challenge: Correct data processing in spite of covert channels
[Figure: error rate axis from 0% to 100%, divided into the regions Critical, Brittle, Brittle, Critical, with the labels "Fulfilled safety (correct) properties", "Fulfilled liveness (adaptation) properties", "Expected risk of failure", "Safety", and "Liveness"]
Failure due to safety:
• High capability of correct data processing
• Little on-demand data processing
12. IV. Resilient Computing
Challenge: Correct data processing in spite of covert channels
[Figure: error rate axis from 0% to 100%, divided into the regions Critical, Brittle, Brittle, Critical, with the labels "Fulfilled safety (correct) properties", "Fulfilled liveness (adaptation) properties", "Expected risk of failure", "Safety", and "Liveness"]
Failure due to liveness:
• Low capability of correct data processing
• High on-demand data processing
13. IV. Resilient Computing
Challenge: Correct data processing in spite of covert channels
[Figure: error rate axis from 0% to 100%, divided into the regions Critical, Brittle, Brittle, Critical, with the labels "Fulfilled safety (correct) properties", "Fulfilled liveness (adaptation) properties", "Expected risk of failure", "Safety", and "Liveness"]
Acceptable states:
• Acceptable correctness of data processing
• Acceptable on-demand data processing
14. Security Architecture for Resilient Computing
[Figure: services S4, S5, S6, Sj, Sk, Sl shown with the elements "Generate Evidences", "Risk Assessment with Uncertainty", "Usage Control Policy", "Select Services", and "De-Select Services"]
15. Security Architecture for Resilient Computing
[Figure: services S4, S5, S6, Sj, Sk, Sl shown with the elements "Generate Evidences", "Risk Assessment with Uncertainty", "Usage Control Policy", "Select Services", and "De-Select Services"]
Preliminary work: DREISAM (Delegation of Rights) & DETECTIVE (Data Provenance)
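The slides name the elements of the architecture (generate evidence, assess risk under uncertainty, apply a usage control policy, select and de-select services) but do not specify how they interact. The following Python sketch shows one possible reading under loudly stated assumptions: evidence is a count of observed correct and faulty outputs per service, risk is a Laplace-smoothed error rate, and the policy is a single threshold. The names `Evidence`, `estimated_error_rate`, and `select_services` are hypothetical and are not part of DREISAM or DETECTIVE.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Evidence:
    """Hypothetical evidence record: observed outcomes of one service."""
    correct: int = 0
    faulty: int = 0


def estimated_error_rate(evidence: Evidence) -> float:
    """Laplace-smoothed error rate (assumption, not from the slides).

    With no observations the estimate is 0.5, reflecting maximal uncertainty.
    """
    total = evidence.correct + evidence.faulty
    return (evidence.faulty + 1) / (total + 2)


def select_services(
    evidence: Dict[str, Evidence], max_error_rate: float
) -> Tuple[List[str], List[str]]:
    """Split services into selected and de-selected ones by a policy threshold."""
    selected: List[str] = []
    deselected: List[str] = []
    for name in sorted(evidence):
        if estimated_error_rate(evidence[name]) <= max_error_rate:
            selected.append(name)
        else:
            deselected.append(name)
    return selected, deselected


if __name__ == "__main__":
    observed = {
        "S4": Evidence(correct=98, faulty=0),
        "S5": Evidence(correct=10, faulty=10),
        "S6": Evidence(),  # no evidence yet
    }
    print(select_services(observed, max_error_rate=0.1))
```

In this reading, a service without evidence is de-selected until enough observations lower its estimated error rate below the threshold; a stricter threshold favors safety, a looser one favors liveness.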