author: H.A. (Marcel) Oldenkamp
title: Probabilistic model checking: a comparison of tools
keywords: probabilistic model checking, tools, comparison
committee: Ivan S. Zapreev MSc ,
Joost-Pieter Katoen ,
dr. David N. Jansen ,
Mariƫlle Stoelinga
end: May 2007


Model checking is a technique to establish the correctness of hardware or software systems in an automated fashion. The goal of this technique is to try to predict system behaviour, or more specifically, to formally prove that all possible executions of the system conform to the requirements. Probabilistic model checking focusses on proving correctness of stochastic systems (i.e. systems where probabilities play a role). A probabilistic model checker tool automates the correctness proving process. These tools can verify if a system - which is described by a model, written in a formal language - satisfies a formal specification, which is expressed using logics, such as Probabilistic Computation Tree Logic (PCTL). We have studied the efficiency of five probabilistic model checker tools, namely: PRISM (Sparse and Hybrid mode), MRMC, ETMCC, YMER and VESTA. We made a tool by tool comparison, analysing model check times and peak memory usage. This was achieved by using five representative case studies of fully probabilistic systems, namely; Synchronous Leader Election (SLE), Randomized Dining Philosophers (RDP), Birth-death process (BDP), Tandem Queuing Network (TQN) and Cyclic Server Polling System (CSP). Besides their performance, we also investigated the characteristics of each tool, comparing their implementation details, range of supported probabilistic models, model specification language, property specification language and supported algorithms and data structures. During our research, we have performed nearly 15,000 individual runs. By ensuring that our experiments are automated, repeatable, verifiable, statistically significant and free from external influences, our findings are based on a solid methodology.

We have witnessed a significant difference in model check time as well as memory usage between the tools. From our experiments we learned that YMER (which is a statistical tool) is by far the best tool for verifying medium to large size models. It outperforms the other statistical model checker VESTA and all numerical tools. YMER has a remarkably consistent (low) memory usage across various model sizes. Although its performance is excellent, we found that YMER does have limitations: the range of supported models and probabilistic operators is limited and it can not provide the same level of accuracy as numerical tools. YMER may occasionally report wrong answers, but this can be expected of tools using a statistical approach, where there exists a trade-off between speed and accuracy. The benefit of statistical tools is that they scale much better (performance wise) in relation to the state-space size than the numerical tools.

Comparing the numerical tools we conclude that MRMC has the best performance (time and memory wise) for models up to a few million states. This is especially true for steady-state and nested properties1, for other properties (i.e. bounded Until, interval Until and unbounded Until) MRMC and PRISMSparse are rather close. On larger models, PRISMSparse (and sometimes also PRISMHybrid) performs better. The sparse engine is usually faster than the hybrid engine at the cost of substantially greater memory usage. As for ETMCC, it has the worst time and memory performance, it frequently runs out of memory in situations where the models could easily be checked by the other tools. The performance differences between the two leading numerical tools MRMC and PRISM have several causes. First of all, PRISM always construct an MTBDD (Multiterminal Binary Decision Diagram), in sparse mode the MTBDD is converted to a sparse matrix after performing some pre-computations. This may take a significant time and influence the model check time. There is no such influence for MRMC as it starts model checking on the pre-generated sparse matrix. Secondly, the MTBDD size plays a crucial role in PRISMs performance. Large MTBDDs lead to poor performance of the hybrid engine. The difference between MRMC and PRISMS is caused by the pre-computation step that is performed by PRISM on the MTBDD. The pre-computation time is included in the model check time. Finally, MRMCs performance on large models might be influenced by its high memory usage, in situations where the memory usage exceeds the systems RAM space, swapping will cause a slow down.

On the aspect of user friendliness we find PRISM the most user friendly tool. MRMC is more appropriate as a fast back-end verification engine as it has a simple input format.