author: Marijn Peppelman
title: Encoding failure probability dependencies in Bayesian networks
company: NS
keywords: Fault Tree Analysis, Conditional probabilities, Railway industry
topics: Case studies and Applications , Dependability, security and performance
committee: Mariëlle Stoelinga ,
Doina Bucur ,
Carlos Esteban Budde
started: October 2019
end: August 2020


Fault Tree Analysis models the failure of a system by way of Basic Events and gates. The failure probabilities of these Basic Events are assumed to be independent. Not all systems conform to this assumption, so a potential method of circumventing it was investigated. Bayesian Networks are commonly used to describe dependent probabilities. If Bayesian Networks can be used to provide the effective failure probabilities to the Fault Tree, then the failure dependencies of the Basic Events can be encoded into the Bayesian Network and taken into account when the effective failure probabilities are generated.
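The effect of the independence assumption can be illustrated with a minimal sketch. It is not taken from the thesis: the shared condition `C` (for example, a high ambient temperature), the two Basic Events feeding an AND gate, and all probabilities are hypothetical values chosen for illustration. The sketch compares the top-event probability computed from independent marginal probabilities with the value obtained when the common cause is accounted for, and shows that feeding the gate the effective probabilities for an observed condition recovers the dependency.

```python
# Hypothetical two-component system: both Basic Events depend on a shared
# condition C. All numbers are illustrative assumptions, not thesis data.
P_C = 0.2                                  # P(C): shared condition is present
P_FAIL_GIVEN = {True: 0.3, False: 0.05}    # P(component fails | C), P(component fails | not C)


def marginal_failure(p_c, p_fail_given):
    """Marginal failure probability of one component, summing out C."""
    return p_c * p_fail_given[True] + (1 - p_c) * p_fail_given[False]


def and_gate(p_a, p_b):
    """AND gate under the Fault Tree independence assumption."""
    return p_a * p_b


def and_gate_common_cause(p_c, p_fail_given):
    """Exact AND-gate probability when both components share the parent C.

    Given C, the components are conditionally independent, so the joint
    failure probability is computed per state of C and then averaged.
    """
    return (p_c * and_gate(p_fail_given[True], p_fail_given[True])
            + (1 - p_c) * and_gate(p_fail_given[False], p_fail_given[False]))


p_marginal = marginal_failure(P_C, P_FAIL_GIVEN)
p_top_independent = and_gate(p_marginal, p_marginal)
p_top_dependent = and_gate_common_cause(P_C, P_FAIL_GIVEN)

# Effective probabilities for an observed condition (C is known to be present):
# the unchanged AND gate can be reused with the conditional probabilities.
p_top_given_c = and_gate(P_FAIL_GIVEN[True], P_FAIL_GIVEN[True])

print(f"marginal failure probability:    {p_marginal:.4f}")
print(f"top event, independence assumed: {p_top_independent:.4f}")
print(f"top event, common cause modeled: {p_top_dependent:.4f}")
print(f"top event, C observed:           {p_top_given_c:.4f}")
```

With these assumed numbers, the independent calculation underestimates the top-event probability by a factor of two, which is the kind of error that supplying effective failure probabilities from a Bayesian Network is meant to avoid.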

This thesis documents the development of a tool chain that encodes the failure probabilities of the Fault Tree in a Bayesian Network. The tool chain consists of a data pre-processing script to aggregate failure and parameter data, a commercial Bayesian Network toolkit, a custom Genetic Algorithm that can learn the Bayesian Network structure and parameters from data, and a script to calculate effective failure probabilities from the Bayesian Network and apply them to the Fault Tree.

A literature study on the current state of Bayesian Networks and Fault Tree Analysis was performed to evaluate the current capabilities of both, and to determine whether any other method of defeating the independence assumption already exists. While methods of transforming Fault Trees into Bayesian Networks do exist, and the resulting networks could in theory encode such dependencies, no existing method lets Fault Trees themselves operate while accounting for these dependencies. The operating principles of the encoding were explored, and the functionality implemented in the tool chain was documented. The literature research indicated that Genetic Algorithms are a highly flexible method of optimization. Because it was uncertain what learning the encoding from data would require, they were selected to provide the learning-from-data functionality. A method of encoding the Bayesian Network structure into a chromosome is detailed, documenting the chromosome structure and the genetic operators employed by the Genetic Algorithm. The performance of learning the networks from data is explored through the use of artificial test data and a case study on obfuscated failure logs from HVAC units in trains of Dutch Railways (Nederlandse Spoorwegen, NS).
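To give a concrete picture of what encoding a Bayesian Network structure into a chromosome can look like, the following is a minimal sketch of one common encoding. It is an assumption for illustration only and not necessarily the chromosome structure or the operators used in the thesis: each gene is one bit for a possible edge `i -> j` with `i < j` under a fixed node ordering, which keeps every decoded network acyclic. The function names `edge_slots`, `decode`, `mutate` and `crossover` are hypothetical.

```python
import random
from itertools import combinations


def edge_slots(n_nodes):
    """All candidate edges (i, j) with i < j; one gene per candidate edge."""
    return list(combinations(range(n_nodes), 2))


def decode(chromosome, n_nodes):
    """Turn a bit-list chromosome into a list of directed edges i -> j."""
    slots = edge_slots(n_nodes)
    if len(chromosome) != len(slots):
        raise ValueError("chromosome length does not match the number of edge slots")
    return [slot for bit, slot in zip(chromosome, slots) if bit]


def mutate(chromosome, rate, rng):
    """Bit-flip mutation: each gene is flipped independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of parent_a joined to the tail of parent_b."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


rng = random.Random(0)               # fixed seed so the sketch is repeatable
n_nodes = 4                          # hypothetical network with four nodes
parent_a = [1, 0, 0, 1, 0, 1]        # six genes: one per pair (i, j) with i < j
parent_b = [0, 1, 1, 0, 1, 0]

child = mutate(crossover(parent_a, parent_b, rng), rate=0.1, rng=rng)
print("edges of parent_a:", decode(parent_a, n_nodes))
print("edges of child:   ", decode(child, n_nodes))
```

Because edges only run from a lower-numbered node to a higher-numbered one, any bit string decodes to a valid directed acyclic graph, so the genetic operators need no repair step. The price is that the node ordering is fixed in advance; a Genetic Algorithm that must also search over orderings would need a richer chromosome.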

The artificial training data indicate that the processing time required to learn the Bayesian Networks from data scales at least linearly with the amount of data, and with the number of components in the system (nodes in the Bayesian Network). The processing time also grows with the number of time steps for which data is present, in a way that cannot be explained by the linear growth with the amount of data alone, but the exact relation is not known. More complex networks need increasingly more data to learn. While it should be possible to learn networks of any complexity, the amount of data required to do so is likely to be prohibitive. No networks that could be deemed accurate could be learned from the NS case study data, likely because the amount of data used was insufficient for the complexity of the HVAC system being modeled. It is possible to use more data by employing smaller time steps, but the projected calculation time would make that infeasible until more optimizations are implemented in the tool chain.