Quantcast
Channel: Elliott Baron's Blog » Eclipse
Viewing all articles
Browse latest Browse all 3

Property Simulation – a new model for CDT’s Static Analysis

$
0
0

Several months ago, I discovered the need for improvements with Eclipse CDT’s static analysis. Namely, we need an analysis model that correctly accounts for control flow paths of a program. To test the abilities of the framework, I created a checker to detect instances where close was being called on a file descriptor that did not have a corresponding open call earlier in the control flow. Consider the following snippet of code:

int fd;
if (x) {
        fd = open("file", O_RDONLY);
}
close(fd);

If we simply traversed a function containing this code with an ASTVisitor, all we could do is detect that the open call is visited before the close call and assume there is no error condition. In reality, we know that close is called improperly if the branch is not executed. Thus we have a conditional error that will occur if x evaluates to false.

In my search for a solution, I came across a paper entitled “ESP: path-sensitive program verification in polynomial time” from Microsoft Research and the University of Washington. The paper explains and compares different static analysis techniques. A temporal property to check is expressed as a finite state machine. A finite state machine for our file open/close example could look like1:

The finite state machine’s states are known to the algorithm as Property States. We also keep track of the Execution State of the program at a given instant. This includes necessary conditions to execute a given block in the control flow graph. In our code snippet above, the open call would have an Execution State of x, meaning x must be true in order to execute that block. Each Execution State takes the form of a Boolean formula and subsequent if statements append another clause to it. The basic structure that contains all of this information in the algorithm is called a Symbolic State. Each Symbolic State contains one or more Property States and an Execution State. Each edge in the control flow graph references a set of Symbolic States that describe the state of the program at that point. In our snippet, the out-edge of the open statement block would have the following Symbolic State associated with it: [opened, x].

The paper covers three static analysis methods, which differ in the way Symbolic States are grouped and simplified. Here is how they differ at merge points:

  1. Fully Path-sensitive Analysis (PSA) – No grouping is performed. All Symbolic States from either side of the branch are kept.
  2. Dataflow Analysis – All Symbolic States from both sides of the branch are consolidated into a union of the Property States and join of Execution States.
  3. Property Simulation – Symbolic States are grouped by their Property State, corresponding Execution States are joined.

This diagram shows how each method analyzes a control flow graph1.

Click for the complete diagram

You can see that the Fully Path-sensitive Analysis (PSA) retains all information and performs no state grouping at merge points. Note that this means that each branch point doubles the number of states. Fully Path-sensitive Analysis is seen as intractable due to this exponential blowup in the number of states. The Dataflow Analysis does the very opposite. Whenever the Execution State is made more specific by a branch point, it is discarded at the merge point. We then mistakenly transition to an error state because we lost the Execution State context that caused the file to be opened.

Property Simulation meets the two approaches in the middle. It only discards Execution State information when the branch did not cause a Property State transition, thus it is not meaningful to our analysis. Note the Fully Path-sensitive Analysis, but not Property Simulation, stored Execution State information for p even though the branch that depends on p does not have any file operations. In doing this, we avoid the inefficiencies of the Fully Path-sensitive Analysis, while avoiding the false error reporting of the Dataflow Analysis.

Details on my work implementing this algorithm for the CDT will follow shortly.


1. Image Credit: Das, M., Lerner, S., and Seigle, M. 2002. ESP: path-sensitive program verification in polynomial time. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation (Berlin, Germany, June 17 – 19, 2002). PLDI ’02. ACM, New York, NY, 57-68.

Viewing all articles
Browse latest Browse all 3

Trending Articles