Why is this done? Benefits:
- Determine the components that contribute most to the software architecture
- Allocate testing efforts, set goals for testing units
- Evaluate design alternatives, improve the architecture
- More reliable system, quantitative numbers

- Report on experiences and methods used
- Lessons learned
- What needs to be improved (from our perspective)

- 3 MLOC C++, COM, ATL
- 9 subsystems, >100 components
- Manages industrial processes (e.g., power generation, paper production, oil and gas refining)
- Distributed system: controllers, servers, networks, field devices
- Operator workplace for controlling the process: monitoring sensor readings, manipulating actuators
- also agenda of the rest of the talk
Larger font, less text
- Selected the Littlewood/Verrall model from IEEE Std. 1633
- Industry affinity (SCADA), good fit in initial tests
- Time between failures is exponentially distributed
- Repair may introduce new faults; repair time = 0
- The failure rate is a random variable with Gamma distribution
- We were able to fit the whole dataset, without filtering data, at the 5% significance level with the quadratic Littlewood/Verrall model (LV-Q)
- Failure reports are often not mapped to components in bug tracking systems
- Difficult to select a model
- Too many models available
- Statistical validity is hard to establish
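For reference, the quadratic Littlewood/Verrall model named above is usually written as follows. This is a sketch of the common textbook formulation, not taken from the slides; the symbols $t_i$, $\lambda_i$, $\alpha$, $\psi(i)$, $\beta_0$ and $\beta_1$ are assumptions introduced here for illustration.

```latex
\begin{align}
  % Time between failure i-1 and failure i, given the failure rate
  f(t_i \mid \lambda_i) &= \lambda_i \, e^{-\lambda_i t_i} \\
  % The failure rate itself is Gamma distributed
  g(\lambda_i) &= \frac{\psi(i)^{\alpha} \, \lambda_i^{\alpha - 1} \, e^{-\psi(i)\,\lambda_i}}{\Gamma(\alpha)} \\
  % Quadratic variant (LV-Q)
  \psi(i) &= \beta_0 + \beta_1 i^2 \\
  % Expected time between failures after i-1 repairs
  E[T_i] &= \frac{\psi(i)}{\alpha - 1}, \qquad \alpha > 1
\end{align}
```

With $\beta_1 > 0$ the expected time between failures grows with each repair on average, while an individual repair can still make things worse because $\lambda_i$ is random.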
- Failure data from the bug tracker, filtered for critical/high severity bugs
- Quadratic model: programmers have good intentions in fixing the code
- Done for each subsystem; result: 9 failure probabilities
- Installed and configured the system
- Defined 2 load profiles, configured load drivers
- Configured ABB tool to log subsystem transitions
- Executed load drivers for each profile (2 days)
- Processed logs (2 GB) with script
  - Added initial and final state
  - Calculated transition probabilities
- Validated the model
  - Compared with architectural documentation
  - Interviewed PCS experts
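The log-processing step can be sketched roughly as follows. This is a minimal illustration, not the script used in the study: it assumes each logged run has already been parsed into an ordered list of subsystem names, and the names `START`, `END`, `transition_probabilities` and the sample traces are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical labels for the added initial and final states.
START, END = "START", "END"


def transition_probabilities(traces):
    """Estimate transition probabilities from observed subsystem sequences.

    `traces` is a list of runs; each run is an ordered list of subsystem
    names (assumed input format, the real log format is not shown here).
    """
    counts = defaultdict(Counter)
    for trace in traces:
        # Add the artificial initial and final state around every run.
        path = [START, *trace, END]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1

    probs = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        # Relative frequency of each observed transition out of `src`.
        probs[src] = {dst: n / total for dst, n in outgoing.items()}
    return probs


# Made-up example data: two runs over three subsystems.
traces = [["S1", "S2", "S3"], ["S1", "S3"]]
probs = transition_probabilities(traces)
```

Each row of the result sums to 1, so it can be used directly as a row of the transition matrix.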
- Q: transition probability matrix (obtained by eliminating the failure state)
- S: steady state probabilities
- R: system reliability (probability of reaching the success state)
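How R follows from Q can be sketched as below. The sketch assumes a Cheung-style formulation, which the notes do not name explicitly: Q[i][j] = r[i] * P[i][j], where r[i] is the reliability of component i and P the transition probabilities, and the last component is the one that ends a successful run. All numbers and function names are made up for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs stay untouched.
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        if abs(M[pivot][col]) < 1e-15:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def system_reliability(P, r):
    """Probability of reaching the success state from component 0.

    Solves (I - Q) x = b, where b is zero except for r[-1] in the last
    position; x[0] then equals S[0][n-1] * r[n-1] with S = (I - Q)^-1.
    """
    n = len(r)
    Q = [[r[i] * P[i][j] for j in range(n)] for i in range(n)]
    A = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)] for i in range(n)]
    b = [0.0] * (n - 1) + [r[n - 1]]
    return solve_linear(A, b)[0]


# Hypothetical 3-component example; the last row is empty because the
# final component hands over to the success state.
P = [[0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
r = [0.99, 0.98, 0.97]
R = system_reliability(P, r)
```

Solving the linear system directly avoids forming the full inverse, since only one entry of S is needed for R.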
- Units obfuscated for confidentiality reasons
- Subsystem 8 has the highest failure probability
- System reliability is most sensitive to subsystem 1
- Subsystem 6 is used by many subsystems, but contributes only little to system reliability
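One way to obtain such sensitivities is to perturb each subsystem reliability slightly and observe the change in system reliability. The sketch below uses central finite differences on a Cheung-style model; this is an assumption, since the notes do not say how the sensitivities were computed, and the 3-component data is made up rather than taken from the study.

```python
def system_reliability(P, r, tol=1e-14, max_iter=100_000):
    """Success probability from component 0, by fixed-point iteration.

    x[i] is the probability of finishing successfully when starting in
    component i; the last component is assumed to end the run.
    """
    n = len(r)
    x = [0.0] * n
    for _ in range(max_iter):
        new = [r[i] * sum(P[i][j] * x[j] for j in range(n)) for i in range(n - 1)]
        new.append(r[n - 1])
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new[0]
        x = new
    return x[0]


def sensitivities(P, r, h=1e-6):
    """Central-difference estimate of dR/dr[i] for every component."""
    result = []
    for i in range(len(r)):
        up = list(r)
        up[i] += h
        down = list(r)
        down[i] -= h
        result.append((system_reliability(P, up) - system_reliability(P, down)) / (2 * h))
    return result


# Hypothetical example data.
P = [[0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
r = [0.99, 0.98, 0.97]
sens = sensitivities(P, r)
most_sensitive = max(range(len(r)), key=lambda i: sens[i])
```

In this toy example the component with the lowest reliability is not the one the system is most sensitive to, which mirrors the distinction between subsystem 8 and subsystem 1 above.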
Explain the distribution
- Many variation points, limited step-by-step guidance
- Time-consuming data collection for non-experts
- Best for small changes to existing systems
- Needs to be tailored to available data