|Formal Methods for High-Level Software|
High-level programming languages such as C/C++ and Java are widely used in developing today's software. Ideally, functional defects are detected at an early stage of development. Testing has been the most common method for ensuring software quality; however, the quality of test cases often depends heavily on the experience of the programmers, so the approach can still leave bugs undetected.
We study formal methods, in particular model-checking techniques, to tackle this problem. It is well known that sequential programs can be naturally modelled as pushdown systems (PDSs), where global and local variables are encoded into control states and stack symbols, respectively. However, this encoding causes a blow-up in the number of control states and stack symbols. We alleviate the problem by employing binary decision diagrams (BDDs) to represent PDSs symbolically. We have developed efficient algorithms for reachability analysis of symbolic PDSs based on automata-theoretic techniques, and implemented them in the tool Moped.
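To illustrate the automata-theoretic approach, here is a minimal sketch (explicit-state, not the BDD-based symbolic representation used in Moped) of the classical pre* saturation procedure: given a P-automaton accepting a set of target configurations, new transitions are added until the automaton accepts exactly the configurations from which a target is reachable. The rule encoding and names below are invented for the example.

```python
def targets(trans, state, word):
    """States reachable from `state` by reading `word` in the automaton."""
    states = {state}
    for g in word:
        states = {q for s in states for (s2, g2, q) in trans
                  if s2 == s and g2 == g}
    return states

def pre_star(rules, trans):
    """Saturation-based backward reachability for a pushdown system.
    rules: iterable of ((p, g), (p2, w)) encoding <p, g> -> <p2, w>, len(w) <= 2.
    trans: transitions (state, symbol, state) of a P-automaton accepting the
    target configurations.  Returns the saturated transition set, which
    accepts exactly pre*(targets)."""
    trans = set(trans)
    changed = True
    while changed:
        changed = False
        for (p, g), (p2, w) in rules:
            # If <p, g ...> rewrites to <p2, w ...> and the automaton
            # accepts w from p2 ending in q, it must also accept g from p.
            for q in targets(trans, p2, w):
                if (p, g, q) not in trans:
                    trans.add((p, g, q))
                    changed = True
    return trans

# Toy PDS: <p0, g1> -> <p0, eps> and <p2, g2> -> <p0, g1>;
# target set {<p0, eps>}, whose automaton needs no transitions.
rules = [(("p0", "g1"), ("p0", ())),
         (("p2", "g2"), ("p0", ("g1",)))]
saturated = pre_star(rules, set())
```

Configurations such as <p2, g2 g1> are then accepted, since they can pop their way down to <p0, eps>.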
Furthermore, we have implemented a translator from Java bytecode, the object code produced by the Java compiler, into PDSs. The translator and Moped are combined into jMoped, an Eclipse plug-in that lets users interact with the tool graphically. Roughly speaking, given a scope to be tested, jMoped performs coverage analyses on Java programs and displays the results graphically. jMoped also finds common Java errors such as assertion failures, null-pointer exceptions, and array bound violations.
Nevertheless, jMoped has some limitations and is currently not suitable for testing large programs. The aim of this project is to improve both the translator and the model-checking techniques, thereby extending the horizon of high-level software testing.
|Verification of Probabilistic Systems|
Probabilistic methods are widely used in the design, analysis, and verification of computer systems that exhibit some kind of "quantified uncertainty", such as coin-tossing in randomized algorithms, subsystem failures (caused, e.g., by communication errors or bit flips with an empirically evaluated probability), or underspecification in some components of the system. The underlying semantic models are Markov chains or Markov decision processes, depending mainly on whether the systems under consideration are sequential or parallel. Properties of such systems can be formally specified as formulae of suitable temporal logics such as LTL, PCTL, or PCTL*. In these logics, one can express properties like "the probability of termination is at least 98%" or "the probability that each request will eventually be granted is 1". Model-checking algorithms for these logics have been developed mainly for finite-state Markov chains and finite-state Markov decision processes. This is a serious limitation, because many implementations use unbounded data structures (counters, queues, stacks, etc.) that cannot always be faithfully abstracted into finite-state models. The question of whether one can go beyond this limit has been rapidly gaining importance and attention in recent years.
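For finite-state Markov chains, properties like "the probability of termination is at least 98%" reduce to reachability probabilities, which can be approximated by value iteration. A minimal sketch, with an invented coin-tossing chain (the state names and probabilities are made up for the example):

```python
def reach_prob(chain, target, iters=2000):
    """Probability of eventually reaching `target` in a finite Markov chain,
    approximated by value iteration on the fixed-point equations.
    chain: dict mapping each state to a list of (successor, probability)."""
    x = {s: 0.0 for s in chain}
    x[target] = 1.0
    for _ in range(iters):
        x = {s: 1.0 if s == target
             else sum(p * x[t] for t, p in chain[s])
             for s in chain}
    return x

# A protocol that succeeds with 1/2, retries with 1/4, and fails with 1/4.
chain = {"try":  [("done", 0.5), ("try", 0.25), ("fail", 0.25)],
         "done": [("done", 1.0)],
         "fail": [("fail", 1.0)]}
probs = reach_prob(chain, "done")
```

The exact answer here solves x = 0.5 + 0.25x, i.e. x = 2/3, so this protocol would violate the 98%-termination property above.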
In this project we study probabilistic pushdown automata and related models, probabilistic extensions of the well-known pushdown automata. The model can be used to analyze programs with procedures, but it turns out to have many other applications: since it encompasses stochastic context-free grammars, it can be applied to problems in natural language processing and even bioinformatics, and it has also proved useful in the design of reputation systems.
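Because the model encompasses stochastic context-free grammars, quantities such as termination probabilities become least nonnegative solutions of polynomial fixed-point equations, which can be approximated by iterating the equations from 0. A toy illustration with an invented one-variable grammar:

```python
# Stochastic grammar: X -> X X with probability 0.6, X -> eps with 0.4.
# Its termination probability is the least nonnegative solution of
#   x = 0.6 * x^2 + 0.4,
# whose roots are 2/3 and 1; iteration from 0 converges to the least one.
x = 0.0
for _ in range(500):
    x = 0.6 * x * x + 0.4
```

After iterating, x is (up to rounding) 2/3: with probability 1/3 the derivation tree grows forever.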
|Foundations of Program Analysis|
Static analysis has traditionally been employed to compute functional information about a program in the form of local invariants, i.e., invariants that hold at a given program point. To compute these invariants, one defines a suitable lattice of abstract values, say A, and a mapping that attaches an element of A to each program execution. The invariant associated with a program point p is the "meet over all paths" (MOP), defined as the meet of the values attached to all program executions leading to p. Under certain conditions, the MOP coincides with the least fixed point of a system of equations, which can be computed (or at least approximated) using worklist algorithms. The equation system is linear in the case of procedure-free programs (modeled as finite automata), whereas in the procedural case (modeled as pushdown systems), it is polynomial.
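A minimal sketch of such a worklist algorithm for the procedure-free (finite-automaton) case: edges carry transfer functions, join is set union, and iteration stops at the least fixed point. The parity analysis instantiating it below is invented for the example.

```python
from collections import deque

def worklist(edges, entry, init):
    """Least-fixed-point worklist solver for a forward dataflow problem.
    edges: list of (u, v, f) where f is the transfer function of edge u->v;
    lattice elements are frozensets, join is union.  For distributive
    transfer functions the result equals the MOP solution."""
    inv = {entry: init}
    wl = deque([entry])
    while wl:
        u = wl.popleft()
        for (a, b, f) in edges:
            if a != u:
                continue
            new = f(inv[u])
            old = inv.get(b, frozenset())
            if not new <= old:          # fact at b grew: re-process b
                inv[b] = old | new
                wl.append(b)
    return inv

# Invented parity analysis of:  n0: x := 0;  n1: loop { x := x + 2 };  n2: exit
edges = [("n0", "n1", lambda s: frozenset({"even"})),  # x := 0 makes x even
         ("n1", "n1", lambda s: s),                    # x += 2 keeps parity
         ("n1", "n2", lambda s: s)]                    # loop exit
invariants = worklist(edges, "n0", frozenset())
```

At n2 the computed invariant is {"even"}: every path there leaves x even.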
Static analysis can also be applied to performance questions, e.g., execution time analysis. However, performance questions concerning the average behaviour of the program, like average execution time or memory consumption, cannot be cast into the MOP framework. The technical reason is that, by its very definition, the operator that combines the values attached to all program executions into the MOP must be idempotent. This is the case in worst-case analysis, where the MOP operator is a maximum (max(t, t) = t), but not in average-case analysis, where the contributions of the paths are combined by a weighted sum, which is not idempotent (t + t differs from t).
While the MOP framework can be extended to accommodate non-idempotent meet operators, the corresponding worklist algorithms are inefficient. For instance, it is easy to find examples where O(2^i) iterations of the worklist algorithm are required to compute the first i bits of the average runtime.
We have developed an extended mathematical framework that solves this problem: the iteration scheme underlying worklist algorithms, which is based on Kleene's theorem, can be replaced by a generic extension of Newton's method, a well-known technique from numerical mathematics. We have tentatively christened this technique Newtonian program analysis.
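The difference between the two iteration schemes can be seen on the one-dimensional equation x = 0.5*x^2 + 0.5, whose least solution is 1; the concrete step counts below are just an illustration of the contrast between the two schemes. Kleene iteration creeps toward the solution, while Newton's method (applied to g(x) = f(x) - x, with g'(x) = x - 1) gains one bit of precision per step:

```python
def f(x):
    """Right-hand side of the fixed-point equation x = 0.5*x^2 + 0.5."""
    return 0.5 * x * x + 0.5

# Kleene iteration: x_{k+1} = f(x_k), starting from 0.
xk = 0.0
for _ in range(10):
    xk = f(xk)

# Newton iteration for g(x) = f(x) - x = 0, g'(x) = x - 1, starting from 0.
xn = 0.0
for _ in range(10):
    xn = xn - (f(xn) - xn) / (xn - 1.0)
```

After 10 steps, the Newton iterate is 1 - 2^-10, i.e. 10 correct bits, whereas the Kleene iterate is still around 0.86.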
We work on further extensions of this framework and develop efficient algorithms for the computation of Newton approximants.
The goal is to devise a powerful generic engine that can efficiently perform various combinations of performance and functional analyses.