Mining Function Specifications¶
In the chapter on assertions, we have seen how important it is to check whether the result is as expected. In this chapter, we introduce a technique that allows us to mine function specifications from a set of given executions, resulting in abstract and formal descriptions of what the function expects and what it delivers.
These so-called dynamic invariants produce pre- and post-conditions over function arguments and variables from a set of executions. Within debugging, the resulting assertions can immediately check whether function behavior has changed, but can also be useful to determine the characteristics of failing runs (as opposed to passing runs). Furthermore, the resulting specifications provide pre- and postconditions for formal program proofs, testing, and verification.
This chapter is based on a chapter with the same name in The Fuzzing Book, which focuses on test generation.
Prerequisites
- You should be familiar with tracing program executions, as in the chapter on tracing.
- Later in this section, we access the internal abstract syntax tree representations of Python programs and transform them, as in the chapter on tracking failure origins.
# ignore
from typing import Sequence, Any, Callable, Tuple
from typing import Dict, Union, Set, List, cast, Optional
Specifications and Assertions¶
When implementing a function or program, one usually works against a specification – a set of documented requirements to be satisfied by the code. Such specifications can come in natural language. A formal specification, however, allows the computer to check whether the specification is satisfied.
In the chapter on assertions, we have seen how preconditions and postconditions can describe what a function does. Consider the following (simple) square root function:
def square_root(x): # type: ignore
assert x >= 0 # Precondition
...
assert result * result == x # Postcondition
return result
The assertion assert p
checks the condition p
; if it does not hold, execution is aborted. Here, the actual body is not yet written; we use the assertions as a specification of what square_root()
expects, and what it delivers.
The topmost assertion is the precondition, stating the requirements on the function arguments. The assertion at the end is the postcondition, stating the properties of the function result (including its relationship with the original arguments). Using these pre- and postconditions as a specification, we can now go and implement a square root function that satisfies them. Once implemented, we can have the assertions check at runtime whether square_root()
works as expected.
However, not every piece of code is developed with explicit specifications in the first place; let alone does most code comes with formal pre- and post-conditions. (Just take a look at the chapters in this book.) This is a pity: As Ken Thompson famously said, "Without specifications, there are no bugs – only surprises". It is also a problem for debugging, since, of course, debugging needs some specification such that we know what is wrong, and how to fix it. This raises the interesting question: Can we somehow retrofit existing code with "specifications" that properly describe their behavior, allowing developers to simply check them rather than having to write them from scratch? This is what we do in this chapter.
Beyond Generic Failures¶
Before we go into mining specifications, let us first discuss why it could be useful to have them. As a motivating example, consider the full implementation of square_root()
from the chapter on assertions:
def square_root(x): # type: ignore
"""Computes the square root of x, using the Newton-Raphson method"""
approx = None
guess = x / 2
while approx != guess:
approx = guess
guess = (approx + x / approx) / 2
return approx
square_root()
does not come with any functionality that would check types or values. Hence, it is easy for callers to make mistakes when calling square_root()
:
with ExpectError():
square_root("foo")
Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/1219646233.py", line 2, in <module> square_root("foo") File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/81976482.py", line 4, in square_root guess = x / 2 ~~^~~ TypeError: unsupported operand type(s) for /: 'str' and 'int' (expected)
with ExpectError():
x = square_root(0.0)
Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/3113108935.py", line 2, in <module> x = square_root(0.0) ^^^^^^^^^^^^^^^^ File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/81976482.py", line 7, in square_root guess = (approx + x / approx) / 2 ~~^~~~~~~~ ZeroDivisionError: float division by zero (expected)
At least, the Python system catches these errors at runtime. The following call, however, simply lets the function enter an infinite loop:
with ExpectTimeout(1):
x = square_root(-1.0)
Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/3421421264.py", line 2, in <module> x = square_root(-1.0) ^^^^^^^^^^^^^^^^^ File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/81976482.py", line 5, in square_root while approx != guess: ^^^^^^^^^^^^^^^ File "Timeout.ipynb", line 43, in timeout_handler raise TimeoutError() TimeoutError (expected)
Our goal is to avoid such errors by annotating functions with information that prevents errors like the above ones. The idea is to provide a specification of expected properties – a specification that can then be checked at runtime or statically.
\todo{Introduce the concept of contract.}
Mining Data Types¶
For our Python code, one of the most important "specifications" we need is types. Python being a "dynamically" typed language means that all data types are determined at run time; the code itself does not explicitly state whether a variable is an integer, a string, an array, a dictionary – or whatever.
As writer of Python code, omitting explicit type declarations may save time (and allows for some fun hacks). It is not sure whether a lack of types helps in reading and understanding code for humans. For a computer trying to analyze code, the lack of explicit types is detrimental. If, say, a constraint solver, sees if x:
and cannot know whether x
is supposed to be a number or a string, this introduces an ambiguity. Such ambiguities may multiply over the entire analysis in a combinatorial explosion – or in the analysis yielding an overly inaccurate result.
Python 3.6 and later allow data types as annotations to function arguments (actually, to all variables) and return values. We can, for instance, state that square_root()
is a function that accepts a floating-point value and returns one:
def square_root_with_type_annotations(x: float) -> float:
"""Computes the square root of x, using the Newton-Raphson method"""
return square_root(x)
By default, such annotations are ignored by the Python interpreter. Therefore, one can still call square_root_typed()
with a string as an argument and get the exact same result as above. However, one can make use of special type checking modules that would check types – dynamically at runtime or statically by analyzing the code without having to execute it.
Static Type Checking¶
Type annotations can also be checked statically – that is, without even running the code. Let us create a simple Python file consisting of the above square_root_typed()
definition and a bad invocation.
f = tempfile.NamedTemporaryFile(mode='w', suffix='.py')
f.name
'/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/tmp93t8wa0_.py'
f.write(inspect.getsource(square_root))
f.write('\n')
f.write(inspect.getsource(square_root_with_type_annotations))
f.write('\n')
f.write("print(square_root_with_type_annotations('123'))\n")
f.flush()
These are the contents of our newly created Python file:
print_file(f.name, start_line_number=1)
1 def square_root(x): # type: ignore 2 """Computes the square root of x, using the Newton-Raphson method""" 3 approx = None 4 guess = x / 2 5 while approx != guess: 6 approx = guess 7 guess = (approx + x / approx) / 2 8 9 return approx 10 11 def square_root_with_type_annotations(x: float) -> float: 12 """Computes the square root of x, using the Newton-Raphson method""" 13 return square_root(x) 14 15 print(square_root_with_type_annotations('123'))
Mypy is a type checker for Python programs. As it checks types statically, types induce no overhead at runtime; plus, a static check can be faster than a lengthy series of tests with runtime type checking enabled. Let us see what mypy
produces on the above file:
result = subprocess.run(["mypy", "--strict", f.name],
universal_newlines=True, stdout=subprocess.PIPE)
print(result.stdout.replace(f.name + ':', ''))
del f # Delete temporary file
13: error: Returning Any from function declared to return "float" [no-any-return] 13: error: Call to untyped function "square_root" in typed context [no-untyped-call] 15: error: Argument 1 to "square_root_with_type_annotations" has incompatible type "str"; expected "float" [arg-type] Found 3 errors in 1 file (checked 1 source file)
We see that mypy
complains about untyped function definitions such as square_root()
; most important, however, it finds that the call to square_root_with_type_annotations()
in the last line has the wrong type.
With mypy
, we can achieve the same type safety with Python as in statically typed languages – provided that we as programmers also produce the necessary type annotations. Is there a simple way to obtain these?
Mining Type Specifications¶
Our first task will be to mine type annotations (as part of the code) from values we observe at run time. These type annotations would be mined from actual function executions, learning from (normal) runs what the expected argument and return types should be. By observing a series of calls such as these, we could infer that both x
and the return value are of type float
:
y = square_root(25.0)
y
5.0
y = square_root(2.0)
y
1.414213562373095
How can we mine types from executions? The answer is simple:
- We observe a function during execution
- We track the types of its arguments
- We include these types as annotations into the code.
To do so, we can make use of Python's tracing facility we already observed in the chapter on tracing executions. With every call to a function, we retrieve the arguments, their values, and their types.
Tracing Calls¶
To observe argument types at runtime, we define a tracer function that tracks the execution of square_root()
, checking its arguments and return values. The CallTracer
class is set to trace functions in a with
block as follows:
with CallTracer() as tracer:
function_to_be_tracked(...)
info = tracer.collected_information()
To create the tracer, we build on the Tracer
superclass as in the chapter on tracing executions.
We start with two helper functions. get_arguments()
returns a list of call arguments in the given call frame.
Arguments = List[Tuple[str, Any]]
def get_arguments(frame: FrameType) -> Arguments:
"""Return call arguments in the given frame"""
# When called, all arguments are local variables
local_variables = dict(frame.f_locals) # explicit copy
arguments = [(var, frame.f_locals[var])
for var in local_variables]
# FIXME: This may be needed for Python < 3.10
# arguments.reverse() # Want same order as call
return arguments
simple_call_string()
is a helper for logging that prints out calls in a user-friendly manner.
def simple_call_string(function_name: str, argument_list: Arguments,
return_value : Any = None) -> str:
"""Return function_name(arg[0], arg[1], ...) as a string"""
call = function_name + "(" + \
", ".join([var + "=" + repr(value)
for (var, value) in argument_list]) + ")"
if return_value is not None:
call += " = " + repr(return_value)
return call
Now for CallTracer
. The constructor simply invokes the Tracer
constructor:
class CallTracer(Tracer):
def __init__(self, log: bool = False, **kwargs: Any)-> None:
super().__init__(**kwargs)
self._log = log
self.reset()
def reset(self) -> None:
self._calls: Dict[str, List[Tuple[Arguments, Any]]] = {}
self._stack: List[Tuple[str, Arguments]] = []
The traceit()
method does nothing yet; this is done in specialized subclasses. The CallTracer
class implements a traceit()
function that checks for function calls and returns:
class CallTracer(CallTracer):
def traceit(self, frame: FrameType, event: str, arg: Any) -> None:
"""Tracking function: Record all calls and all args"""
if event == "call":
self.trace_call(frame, event, arg)
elif event == "return":
self.trace_return(frame, event, arg)
trace_call()
is called when a function is called; it retrieves the function name and current arguments, and saves them on a stack.
class CallTracer(CallTracer):
def trace_call(self, frame: FrameType, event: str, arg: Any) -> None:
"""Save current function name and args on the stack"""
code = frame.f_code
function_name = code.co_name
arguments = get_arguments(frame)
self._stack.append((function_name, arguments))
if self._log:
print(simple_call_string(function_name, arguments))
When the function returns, trace_return()
is called. We now also have the return value. We log the whole call with arguments and return value (if desired) and save it in our list of calls.
class CallTracer(CallTracer):
def trace_return(self, frame: FrameType, event: str, arg: Any) -> None:
"""Get return value and store complete call with arguments and return value"""
code = frame.f_code
function_name = code.co_name
return_value = arg
# TODO: Could call get_arguments() here
# to also retrieve _final_ values of argument variables
called_function_name, called_arguments = self._stack.pop()
assert function_name == called_function_name
if self._log:
print(simple_call_string(function_name, called_arguments), "returns", return_value)
self.add_call(function_name, called_arguments, return_value)
add_call()
saves the calls in a list; each function name has its own list.
class CallTracer(CallTracer):
def add_call(self, function_name: str, arguments: Arguments,
return_value: Any = None) -> None:
"""Add given call to list of calls"""
if function_name not in self._calls:
self._calls[function_name] = []
self._calls[function_name].append((arguments, return_value))
we can retrieve the list of calls, either for a given function name (calls()
),
or for all functions (all_calls()
).
class CallTracer(CallTracer):
def calls(self, function_name: str) -> List[Tuple[Arguments, Any]]:
"""Return list of calls for `function_name`."""
return self._calls[function_name]
class CallTracer(CallTracer):
def all_calls(self) -> Dict[str, List[Tuple[Arguments, Any]]]:
"""
Return list of calls for function_name,
or a mapping function_name -> calls for all functions tracked
"""
return self._calls
Let us now put this to use. We turn on logging to track the individual calls and their return values:
with CallTracer(log=True) as tracer:
y = square_root(25)
y = square_root(2.0)
square_root(x=25) square_root(x=25) returns 5.0 square_root(x=2.0) square_root(x=2.0) returns 1.414213562373095
After execution, we can retrieve the individual calls:
calls = tracer.calls('square_root')
calls
[([('x', 25)], 5.0), ([('x', 2.0)], 1.414213562373095)]
Each call is a pair (argument_list
, return_value
), where argument_list
is a list of pairs (parameter_name
, value
).
square_root_argument_list, square_root_return_value = calls[0]
simple_call_string('square_root', square_root_argument_list, square_root_return_value)
'square_root(x=25) = 5.0'
If the function does not return a value, return_value
is None
.
def hello(name: str) -> None:
print("Hello,", name)
with CallTracer() as tracer:
hello("world")
Hello, world
hello_calls = tracer.calls('hello')
hello_calls
[([('name', 'world')], None)]
hello_argument_list, hello_return_value = hello_calls[0]
simple_call_string('hello', hello_argument_list, hello_return_value)
"hello(name='world')"
Getting Types¶
Despite what you may have read or heard, Python actually is a typed language. It is just that it is dynamically typed – types are used and checked only at runtime (rather than declared in the code, where they can be statically checked at compile time). We can thus retrieve types of all values within Python:
type(4)
int
type(2.0)
float
type([4])
list
We can retrieve the type of the first argument to square_root()
:
parameter, value = square_root_argument_list[0]
parameter, type(value)
('x', int)
as well as the type of the return value:
type(square_root_return_value)
float
Hence, we see that (so far), square_root()
is a function taking (among others) integers and returning floats. We could declare square_root()
as:
def square_root_annotated(x: int) -> float:
return square_root(x)
This is a representation we could place in a static type checker, allowing to check whether calls to square_root()
actually pass a number. A dynamic type checker could run such checks at runtime.
By default, Python does not do anything with such annotations. However, tools can access annotations from functions and other objects:
square_root_annotated.__annotations__
{'x': int, 'return': float}
This is how run-time checkers access the annotations to check against.
Annotating Functions with Types¶
Our plan is to annotate functions automatically, based on the types we have seen. Our aim is to build a class TypeAnnotator
that can be used as follows. First, it would track some execution:
with TypeAnnotator() as annotator:
some_function_call()
After tracking, TypeAnnotator
would provide appropriate methods to access (type-)annotated versions of the function seen:
print(annotator.typed_functions())
Let us put the pieces together to build TypeAnnotator
.
Here is how to use TypeAnnotator
. We first track a series of calls:
with TypeAnnotator() as annotator:
y = square_root(25.0)
y = square_root(2.0)
After tracking, we can immediately retrieve an annotated version of the functions tracked:
print_content(annotator.typed_functions(), '.py')
def square_root(x: float) -> float: """Computes the square root of x, using the Newton-Raphson method""" approx = None guess = x / 2 while approx != guess: approx = guess guess = (approx + x / approx) / 2 return approx
This also works for multiple and diverse functions. One could go and implement an automatic type annotator for Python files based on the types seen during execution.
with TypeAnnotator() as annotator:
hello('type annotations')
y = square_root(1.0)
Hello, type annotations
print_content(annotator.typed_functions(), '.py')
# Could not find function 'parent_header'# Could not find function '_is_master_process'# Could not find function 'utcoffset'# Could not find function 'is_set'# Could not find function '_wait_for_tstate_lock'# Could not find function 'is_alive'# Could not find function '_event_pipe'# Could not find function 'send'# Could not find function 'schedule'# Could not find function '_schedule_flush'# Could not find function 'write'def hello(name: str) -> None: print('Hello,', name)def square_root(x: float) -> float: """Computes the square root of x, using the Newton-Raphson method""" approx = None guess = x / 2 while approx != guess: approx = guess guess = (approx + x / approx) / 2 return approx
A content as above could now be sent to a type checker, which would detect any type inconsistency between callers and callees.
Mining Invariants¶
Besides basic data types. we can check several further properties from arguments. We can, for instance, whether an argument can be negative, zero, or positive; or that one argument should be smaller than the second; or that the result should be the sum of two arguments – properties that cannot be expressed in a (Python) type.
Such properties are called invariants, as they hold across all invocations of a function. Specifically, invariants come as pre- and postconditions – conditions that always hold at the beginning and at the end of a function. (There are also data and object invariants that express always-holding properties over the state of data or objects, but we do not consider these in this book.)
Annotating Functions with Pre- and Postconditions¶
The classical means to specify pre- and postconditions is via assertions, which we have introduced in the chapter on assertions. A precondition checks whether the arguments to a function satisfy the expected properties; a postcondition does the same for the result. We can express and check both using assertions as follows:
def square_root_with_invariants(x): # type: ignore
assert x >= 0 # Precondition
...
assert result * result == x # Postcondition
return result
A nicer way, however, is to syntactically separate invariants from the function at hand. Using appropriate decorators, we could specify pre- and postconditions as follows:
@precondition lambda x: x >= 0
@postcondition lambda return_value, x: return_value * return_value == x
def square_root_with_invariants(x):
# normal code without assertions
...
The decorators @precondition
and @postcondition
would run the given functions (specified as anonymous lambda
functions) before and after the decorated function, respectively. If the functions return False
, the condition is violated. @precondition
gets the function arguments as arguments; @postcondition
additionally gets the return value as first argument.
It turns out that implementing such decorators is not hard at all. Our implementation builds on a code snippet from StackOverflow:
def condition(precondition: Optional[Callable] = None,
postcondition: Optional[Callable] = None) -> Callable:
def decorator(func: Callable) -> Callable:
@functools.wraps(func) # preserves name, docstring, etc
def wrapper(*args: Any, **kwargs: Any) -> Any:
if precondition is not None:
assert precondition(*args, **kwargs), \
"Precondition violated"
# Call original function or method
retval = func(*args, **kwargs)
if postcondition is not None:
assert postcondition(retval, *args, **kwargs), \
"Postcondition violated"
return retval
return wrapper
return decorator
def precondition(check: Callable) -> Callable:
return condition(precondition=check)
def postcondition(check: Callable) -> Callable:
return condition(postcondition=check)
With these, we can now start decorating square_root()
:
@precondition(lambda x: x > 0)
def square_root_with_precondition(x): # type: ignore
return square_root(x)
This catches arguments violating the precondition:
with ExpectError():
square_root_with_precondition(-1.0)
Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/2859393274.py", line 2, in <module> square_root_with_precondition(-1.0) File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/3759592669.py", line 7, in wrapper assert precondition(*args, **kwargs), \ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AssertionError: Precondition violated (expected)
Likewise, we can provide a postcondition:
@postcondition(lambda ret, x: math.isclose(ret * ret, x))
def square_root_with_postcondition(x): # type: ignore
return square_root(x)
y = square_root_with_postcondition(2.0)
y
1.414213562373095
If we have a buggy implementation of $\sqrt{x}$, this gets caught quickly:
@postcondition(lambda ret, x: math.isclose(ret * ret, x))
def buggy_square_root_with_postcondition(x): # type: ignore
return square_root(x) + 0.1
with ExpectError():
y = buggy_square_root_with_postcondition(2.0)
Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/1806983978.py", line 2, in <module> y = buggy_square_root_with_postcondition(2.0) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/3759592669.py", line 13, in wrapper assert postcondition(retval, *args, **kwargs), \ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AssertionError: Postcondition violated (expected)
While checking pre- and postconditions is a great way to catch errors, specifying them can be cumbersome. Let us try to see whether we can (again) mine some of them.
Mining Invariants¶
To mine invariants, we can use the same tracking functionality as before; instead of saving values for individual variables, though, we now check whether the values satisfy specific properties or not. For instance, if all values of x
seen satisfy the condition x > 0
, then we make x > 0
an invariant of the function. If we see positive, zero, and negative values of x
, though, then there is no property of x
left to talk about.
The general idea is thus:
- Check all variable values observed against a set of predefined properties; and
- Keep only those properties that hold for all runs observed.
Defining Properties¶
What precisely do we mean by properties? Here is a small collection of value properties that would frequently be used in invariants. All these properties would be evaluated with the metavariables X
, Y
, and Z
(actually, any upper-case identifier) being replaced with the names of function parameters:
INVARIANT_PROPERTIES = [
"X < 0",
"X <= 0",
"X > 0",
"X >= 0",
# "X == 0", # implied by "X", below
# "X != 0", # implied by "not X", below
]
When square_root(x)
is called as, say square_root(5.0)
, we see that x = 5.0
holds. The above properties would then all be checked for x
. Only the properties X > 0
, X >= 0
, and not X
hold for the call seen; and hence x > 0
, x >= 0
, and not x
(or better: x != 0
) would make potential preconditions for square_root(x)
.
We can check for many more properties such as relations between two arguments:
INVARIANT_PROPERTIES += [
"X == Y",
"X > Y",
"X < Y",
"X >= Y",
"X <= Y",
]
Types also can be checked using properties. For any function parameter X
, only one of these will hold:
INVARIANT_PROPERTIES += [
"isinstance(X, bool)",
"isinstance(X, int)",
"isinstance(X, float)",
"isinstance(X, list)",
"isinstance(X, dict)",
]
We can check for arithmetic properties:
INVARIANT_PROPERTIES += [
"X == Y + Z",
"X == Y * Z",
"X == Y - Z",
"X == Y / Z",
]
Here's relations over three values, a Python special:
INVARIANT_PROPERTIES += [
"X < Y < Z",
"X <= Y <= Z",
"X > Y > Z",
"X >= Y >= Z",
]
These Boolean properties also check for other types, as in Python, None
, an empty list, an empty set, an empty string, and the value zero all evaluate to False
.
INVARIANT_PROPERTIES += [
"X",
"not X"
]
Finally, we can also check for list or string properties. Again, this is just a tiny selection.
INVARIANT_PROPERTIES += [
"X == len(Y)",
"X == sum(Y)",
"X in Y",
"X.startswith(Y)",
"X.endswith(Y)",
]
Extracting Meta-Variables¶
Let us first introduce a few helper functions before we can get to the actual mining. metavars()
extracts the set of meta-variables (X
, Y
, Z
, etc.) from a property. To this end, we parse the property as a Python expression and then visit the identifiers.
def metavars(prop: str) -> List[str]:
metavar_list = []
class ArgVisitor(ast.NodeVisitor):
def visit_Name(self, node: ast.Name) -> None:
if node.id.isupper():
metavar_list.append(node.id)
ArgVisitor().visit(ast.parse(prop))
return metavar_list
assert metavars("X < 0") == ['X']
assert metavars("X.startswith(Y)") == ['X', 'Y']
assert metavars("isinstance(X, str)") == ['X']
Instantiating Properties¶
To produce a property as invariant, we need to be able to instantiate it with variable names. The instantiation of X > 0
with X
being instantiated to a
, for instance, gets us a > 0
. To this end, the function instantiate_prop()
takes a property and a collection of variable names and instantiates the meta-variables left-to-right with the corresponding variables names in the collection.
def instantiate_prop_ast(prop: str, var_names: Sequence[str]) -> ast.AST:
class NameTransformer(ast.NodeTransformer):
def visit_Name(self, node: ast.Name) -> ast.Name:
if node.id not in mapping:
return node
return ast.Name(id=mapping[node.id], ctx=ast.Load())
meta_variables = metavars(prop)
assert len(meta_variables) == len(var_names)
mapping = {}
for i in range(0, len(meta_variables)):
mapping[meta_variables[i]] = var_names[i]
prop_ast = ast.parse(prop, mode='eval')
new_ast = NameTransformer().visit(prop_ast)
return new_ast
def instantiate_prop(prop: str, var_names: Sequence[str]) -> str:
prop_ast = instantiate_prop_ast(prop, var_names)
prop_text = ast.unparse(prop_ast).strip()
while prop_text.startswith('(') and prop_text.endswith(')'):
prop_text = prop_text[1:-1]
return prop_text
assert instantiate_prop("X > Y", ['a', 'b']) == 'a > b'
assert instantiate_prop("X.startswith(Y)", ['x', 'y']) == 'x.startswith(y)'
Evaluating Properties¶
To actually evaluate properties, we do not need to instantiate them. Instead, we simply convert them into a boolean function, using lambda
:
def prop_function_text(prop: str) -> str:
return "lambda " + ", ".join(metavars(prop)) + ": " + prop
Here is a simple example:
prop_function_text("X > Y")
'lambda X, Y: X > Y'
We can easily evaluate the function:
def prop_function(prop: str) -> Callable:
return eval(prop_function_text(prop))
Here is an example:
p = prop_function("X > Y")
quiz("What is p(100, 1)?",
[
"False",
"True"
], 'p(100, 1) + 1', globals())
Quiz
p(100, 1)
True
p(1, 100)
False
Checking Invariants¶
To extract invariants from an execution, we need to check them on all possible instantiations of arguments. If the function to be checked has two arguments a
and b
, we instantiate the property X < Y
both as a < b
and b < a
and check each of them.
To get all combinations, we use the Python permutations()
function:
for combination in itertools.permutations([1.0, 2.0, 3.0], 2):
print(combination)
(1.0, 2.0) (1.0, 3.0) (2.0, 1.0) (2.0, 3.0) (3.0, 1.0) (3.0, 2.0)
The function true_property_instantiations()
takes a property and a list of tuples (var_name
, value
). It then produces all instantiations of the property with the given values and returns those that evaluate to True.
Invariants = Set[Tuple[str, Tuple[str, ...]]]
def true_property_instantiations(prop: str, vars_and_values: Arguments,
log: bool = False) -> Invariants:
instantiations = set()
p = prop_function(prop)
len_metavars = len(metavars(prop))
for combination in itertools.permutations(vars_and_values, len_metavars):
args = [value for var_name, value in combination]
var_names = [var_name for var_name, value in combination]
try:
result = p(*args)
except:
result = None
if log:
print(prop, combination, result)
if result:
instantiations.add((prop, tuple(var_names)))
return instantiations
Here is an example. If x == -1
and y == 1
, the property X < Y
holds for x < y
, but not for y < x
:
invs = true_property_instantiations("X < Y", [('x', -1), ('y', 1)], log=True)
invs
X < Y (('x', -1), ('y', 1)) True X < Y (('y', 1), ('x', -1)) False
{('X < Y', ('x', 'y'))}
The instantiation retrieves the short form:
for prop, var_names in invs:
print(instantiate_prop(prop, var_names))
x < y
Likewise, with values for x
and y
as above, the property X < 0
only holds for x
, but not for y
:
invs = true_property_instantiations("X < 0", [('x', -1), ('y', 1)], log=True)
X < 0 (('x', -1),) True X < 0 (('y', 1),) False
for prop, var_names in invs:
print(instantiate_prop(prop, var_names))
x < 0
Extracting Invariants¶
Let us now run the above invariant extraction on function arguments and return values as observed during a function execution. To this end, we extend the CallTracer
class into an InvariantTracer
class, which automatically computes invariants for all functions and all calls observed during tracking.
By default, an InvariantTracer
uses the INVARIANT_PROPERTIES
properties as defined above; however, one can specify alternate sets of properties.
class InvariantTracer(CallTracer):
def __init__(self, props: Optional[List[str]] = None, **kwargs: Any) -> None:
if props is None:
props = INVARIANT_PROPERTIES
self.props = props
super().__init__(**kwargs)
The key method of the InvariantTracer
is the invariants()
method. This iterates over the calls observed and checks which properties hold. Only the intersection of properties – that is, the set of properties that hold for all calls – is preserved, and eventually returned. The special variable return_value
is set to hold the return value.
RETURN_VALUE = 'return_value'
class InvariantTracer(InvariantTracer):
def all_invariants(self) -> Dict[str, Invariants]:
return {function_name: self.invariants(function_name)
for function_name in self.all_calls()}
def invariants(self, function_name: str) -> Invariants:
invariants = None
for variables, return_value in self.calls(function_name):
vars_and_values = variables + [(RETURN_VALUE, return_value)]
s = set()
for prop in self.props:
s |= true_property_instantiations(prop, vars_and_values,
self._log)
if invariants is None:
invariants = s
else:
invariants &= s
assert invariants is not None
return invariants
Here's an example of how to use invariants()
. We run the tracer on a small set of calls.
with InvariantTracer() as tracer:
y = square_root(25.0)
y = square_root(10.0)
tracer.all_calls()
{'square_root': [([('x', 25.0)], 5.0), ([('x', 10.0)], 3.162277660168379)]}
The invariants()
method produces a set of properties that hold for the observed runs, together with their instantiations over function arguments.
invs = tracer.invariants('square_root')
invs
{('X', ('return_value',)), ('X', ('x',)), ('X < Y', ('return_value', 'x')), ('X <= Y', ('return_value', 'x')), ('X > 0', ('return_value',)), ('X > 0', ('x',)), ('X > Y', ('x', 'return_value')), ('X >= 0', ('return_value',)), ('X >= 0', ('x',)), ('X >= Y', ('x', 'return_value')), ('isinstance(X, float)', ('return_value',)), ('isinstance(X, float)', ('x',))}
As before, the actual instantiations are easier to read:
def pretty_invariants(invariants: Invariants) -> List[str]:
props = []
for (prop, var_names) in invariants:
props.append(instantiate_prop(prop, var_names))
return sorted(props)
pretty_invariants(invs)
['isinstance(return_value, float)', 'isinstance(x, float)', 'return_value', 'return_value < x', 'return_value <= x', 'return_value > 0', 'return_value >= 0', 'x', 'x > 0', 'x > return_value', 'x >= 0', 'x >= return_value']
We see that the both x
and the return value have a float
type. We also see that both are always greater than zero. These are properties that may make useful pre- and postconditions, notably for symbolic analysis.
However, there's also an invariant which does not universally hold, namely return_value <= x
, as the following example shows:
square_root(0.01)
0.1
Clearly, 0.1 > 0.01 holds. This is a case of us not learning from sufficiently diverse inputs. As soon as we have a call including x = 0.1
, though, the invariant return_value <= x
is eliminated:
with InvariantTracer() as tracer:
y = square_root(25.0)
y = square_root(10.0)
y = square_root(0.01)
pretty_invariants(tracer.invariants('square_root'))
['isinstance(return_value, float)', 'isinstance(x, float)', 'return_value', 'return_value > 0', 'return_value >= 0', 'x', 'x > 0', 'x >= 0']
We will discuss later how to ensure sufficient diversity in inputs. (Hint: This involves test generation.)
Let us try out our invariant tracer on sum3()
. We see that all types are well-defined; the properties that all arguments are non-zero, however, is specific to the calls observed.
with InvariantTracer() as tracer:
y = sum3(1, 2, 3)
y = sum3(-4, -5, -6)
pretty_invariants(tracer.invariants('sum3'))
['a', 'b', 'c', 'isinstance(a, int)', 'isinstance(b, int)', 'isinstance(c, int)', 'isinstance(return_value, int)', 'return_value']
If we invoke sum3()
with strings instead, we get different invariants. Notably, we obtain the postcondition that the returned string always starts with the string in the first argument a
– a universal postcondition if strings are used.
with InvariantTracer() as tracer:
y = sum3('a', 'b', 'c')
y = sum3('f', 'e', 'd')
pretty_invariants(tracer.invariants('sum3'))
['a', 'a < return_value', 'a <= return_value', 'a in return_value', 'b', 'b in return_value', 'c', 'c in return_value', 'return_value', 'return_value > a', 'return_value >= a', 'return_value.endswith(c)', 'return_value.startswith(a)']
If we invoke sum3()
with both strings and numbers (and zeros, too), there are no properties left that would hold across all calls. That's the price of flexibility.
with InvariantTracer() as tracer:
y = sum3('a', 'b', 'c')
y = sum3('c', 'b', 'a')
y = sum3(-4, -5, -6)
y = sum3(0, 0, 0)
pretty_invariants(tracer.invariants('sum3'))
[]
Converting Mined Invariants to Annotations¶
As with types, above, we would like to have some functionality where we can add the mined invariants as annotations to existing functions. To this end, we introduce the InvariantAnnotator
class, extending InvariantTracer
.
We start with a helper method. params()
returns a comma-separated list of parameter names as observed during calls.
class InvariantAnnotator(InvariantTracer):
def params(self, function_name: str) -> str:
arguments, return_value = self.calls(function_name)[0]
return ", ".join(arg_name for (arg_name, arg_value) in arguments)
with InvariantAnnotator() as annotator:
y = square_root(25.0)
y = sum3(1, 2, 3)
annotator.params('square_root')
'x'
annotator.params('sum3')
'a, b, c'
Now for the actual annotation. preconditions()
returns the preconditions from the mined invariants (i.e., those properties that do not depend on the return value) as a string with annotations:
class InvariantAnnotator(InvariantAnnotator):
def preconditions(self, function_name: str) -> List[str]:
"""Return a list of mined preconditions for `function_name`"""
conditions = []
for inv in pretty_invariants(self.invariants(function_name)):
if inv.find(RETURN_VALUE) >= 0:
continue # Postcondition
cond = ("@precondition(lambda " + self.params(function_name) +
": " + inv + ")")
conditions.append(cond)
return conditions
with InvariantAnnotator() as annotator:
y = square_root(25.0)
y = square_root(0.01)
y = sum3(1, 2, 3)
annotator.preconditions('square_root')
['@precondition(lambda x: isinstance(x, float))', '@precondition(lambda x: x)', '@precondition(lambda x: x > 0)', '@precondition(lambda x: x >= 0)']
postconditions()
does the same for postconditions:
class InvariantAnnotator(InvariantAnnotator):
def postconditions(self, function_name: str) -> List[str]:
"""Return a list of mined postconditions for `function_name`"""
conditions = []
for inv in pretty_invariants(self.invariants(function_name)):
if inv.find(RETURN_VALUE) < 0:
continue # Precondition
cond = (f"@postcondition(lambda {RETURN_VALUE},"
f" {self.params(function_name)}: {inv})")
conditions.append(cond)
return conditions
with InvariantAnnotator() as annotator:
y = square_root(25.0)
y = square_root(0.01)
y = sum3(1, 2, 3)
annotator.postconditions('square_root')
['@postcondition(lambda return_value, x: isinstance(return_value, float))', '@postcondition(lambda return_value, x: return_value)', '@postcondition(lambda return_value, x: return_value > 0)', '@postcondition(lambda return_value, x: return_value >= 0)']
With these, we can take a function and add both pre- and postconditions as annotations:
class InvariantAnnotator(InvariantAnnotator):
def functions_with_invariants(self) -> str:
"""Return the code of all observed functions, annotated with invariants"""
functions = ""
for function_name in self.all_invariants():
try:
function = self.function_with_invariants(function_name)
except KeyError:
function = '# Could not find function ' + repr(function_name)
functions += function
return functions
def function_with_invariants(self, function_name: str) -> str:
"""Return the code of `function_name`, annotated with invariants"""
function = self.search_func(function_name)
if not function:
raise KeyError
source = inspect.getsource(function)
return '\n'.join(self.preconditions(function_name) +
self.postconditions(function_name)) + \
'\n' + source
def __repr__(self) -> str:
"""String representation, like `functions_with_invariants()`"""
return self.functions_with_invariants()
Here comes function_with_invariants()
in all its glory:
with InvariantAnnotator() as annotator:
y = square_root(25.0)
y = square_root(0.01)
y = sum3(1, 2, 3)
print_content(annotator.function_with_invariants('square_root'), '.py')
@precondition(lambda x: isinstance(x, float)) @precondition(lambda x: x) @precondition(lambda x: x > 0) @precondition(lambda x: x >= 0) @postcondition(lambda return_value, x: isinstance(return_value, float)) @postcondition(lambda return_value, x: return_value) @postcondition(lambda return_value, x: return_value > 0) @postcondition(lambda return_value, x: return_value >= 0) def square_root(x): # type: ignore """Computes the square root of x, using the Newton-Raphson method""" approx = None guess = x / 2 while approx != guess: approx = guess guess = (approx + x / approx) / 2 return approx
Quite a number of invariants, isn't it? Further below (and in the exercises), we will discuss on how to focus on the most relevant properties.
Avoiding Overspecialization¶
Mined specifications can only be as good as the executions they were mined from. If we only see a single call, for instance, we will be faced with several mined pre- and postconditions that overspecialize towards the values seen.
Let us illustrate this effect on a simple sum2()
function which adds two numbers.
def sum2(a, b): # type: ignore
return a + b
If we invoke sum2()
with a variety of arguments, the invariants all capture the relationship between a
, b
, and the return value as return_value == a + b
in all its variations.
with InvariantAnnotator() as annotator:
sum2(31, 45)
sum2(0, 0)
sum2(-1, -5)
print_content(annotator.functions_with_invariants(), '.py')
@precondition(lambda a, b: isinstance(a, int)) @precondition(lambda a, b: isinstance(b, int)) @postcondition(lambda return_value, a, b: a == return_value - b) @postcondition(lambda return_value, a, b: b == return_value - a) @postcondition(lambda return_value, a, b: isinstance(return_value, int)) @postcondition(lambda return_value, a, b: return_value == a + b) @postcondition(lambda return_value, a, b: return_value == b + a) def sum2(a, b): # type: ignore return a + b
If, however, we see only a single call, the invariants will overspecialize to the single call seen:
with InvariantAnnotator() as annotator:
y = sum2(2, 2)
print_content(annotator.functions_with_invariants(), '.py')
@precondition(lambda a, b: a) @precondition(lambda a, b: a <= b) @precondition(lambda a, b: a == b) @precondition(lambda a, b: a > 0) @precondition(lambda a, b: a >= 0) @precondition(lambda a, b: a >= b) @precondition(lambda a, b: b) @precondition(lambda a, b: b <= a) @precondition(lambda a, b: b == a) @precondition(lambda a, b: b > 0) @precondition(lambda a, b: b >= 0) @precondition(lambda a, b: b >= a) @precondition(lambda a, b: isinstance(a, int)) @precondition(lambda a, b: isinstance(b, int)) @postcondition(lambda return_value, a, b: a < return_value) @postcondition(lambda return_value, a, b: a <= b <= return_value) @postcondition(lambda return_value, a, b: a <= return_value) @postcondition(lambda return_value, a, b: a == return_value - b) @postcondition(lambda return_value, a, b: a == return_value / b) @postcondition(lambda return_value, a, b: b < return_value) @postcondition(lambda return_value, a, b: b <= a <= return_value) @postcondition(lambda return_value, a, b: b <= return_value) @postcondition(lambda return_value, a, b: b == return_value - a) @postcondition(lambda return_value, a, b: b == return_value / a) @postcondition(lambda return_value, a, b: isinstance(return_value, int)) @postcondition(lambda return_value, a, b: return_value) @postcondition(lambda return_value, a, b: return_value == a * b) @postcondition(lambda return_value, a, b: return_value == a + b) @postcondition(lambda return_value, a, b: return_value == b * a) @postcondition(lambda return_value, a, b: return_value == b + a) @postcondition(lambda return_value, a, b: return_value > 0) @postcondition(lambda return_value, a, b: return_value > a) @postcondition(lambda return_value, a, b: return_value > b) @postcondition(lambda return_value, a, b: return_value >= 0) @postcondition(lambda return_value, a, b: return_value >= a) @postcondition(lambda return_value, a, b: return_value >= a >= b) @postcondition(lambda return_value, a, b: return_value >= b) @postcondition(lambda return_value, a, b: return_value >= b >= a) def sum2(a, b): # type: ignore return a + b
The mined precondition a == b
, for instance, only holds for the single call observed; the same holds for the mined postcondition return_value == a * b
. Yet, sum2()
can obviously be successfully called with other values that do not satisfy these conditions.
To get out of this trap, we have to learn from more and more diverse runs.
One way to obtain such runs is by generating inputs. Indeed, a simple test generator for calls of sum2()
will easily resolve the problem.
with InvariantAnnotator() as annotator:
for i in range(100):
a = random.randrange(-10, +10)
b = random.randrange(-10, +10)
length = sum2(a, b)
print_content(annotator.function_with_invariants('sum2'), '.py')
@precondition(lambda a, b: isinstance(a, int)) @precondition(lambda a, b: isinstance(b, int)) @postcondition(lambda return_value, a, b: a == return_value - b) @postcondition(lambda return_value, a, b: b == return_value - a) @postcondition(lambda return_value, a, b: isinstance(return_value, int)) @postcondition(lambda return_value, a, b: return_value == a + b) @postcondition(lambda return_value, a, b: return_value == b + a) def sum2(a, b): # type: ignore return a + b
Note, though, that an API test generator, such as above, will have to be set up such that it actually respects preconditions – in our case, we invoke sum2()
with integers only, already assuming its precondition. In some way, one thus needs a specification (a model, a grammar) to mine another specification – a chicken-and-egg problem.
However, there is one way out of this problem: If one can automatically generate tests at the system level, then one has an infinite source of executions to learn invariants from. In each of these executions, all functions would be called with values that satisfy the (implicit) precondition, allowing us to mine invariants for these functions. This holds, because at the system level, invalid inputs must be rejected by the system in the first place. The meaningful precondition at the system level, ensuring that only valid inputs get through, thus gets broken down into a multitude of meaningful preconditions (and subsequent postconditions) at the function level.
The big requirement for all this, though, is that one needs good test generators. This will be the subject of another book, namely The Fuzzing Book.
Partial Invariants¶
For debugging, it can be helpful to focus on invariants produced only by failing runs, thus characterizing the circumstances under which a function fails. Let us illustrate this on an example.
The middle()
function from the chapter on statistical debugging is supposed to return the middle of three integers x
, y
, and z
.
with InvariantAnnotator() as annotator:
for i in range(100):
x = random.randrange(-10, +10)
y = random.randrange(-10, +10)
z = random.randrange(-10, +10)
mid = middle(x, y, z)
By default, our InvariantAnnotator()
does not return any particular pre- or postcondition (other than the types observed). That is just fine, as the function indeed imposes no particular precondition; and the postcondition from middle()
is not covered by the InvariantAnnotator
patterns.
print_content(annotator.functions_with_invariants(), '.py')
# Could not find function '_randbelow_with_getrandbits'# Could not find function 'randrange'@precondition(lambda x, y, z: isinstance(x, int)) @precondition(lambda x, y, z: isinstance(y, int)) @precondition(lambda x, y, z: isinstance(z, int)) @postcondition(lambda return_value, x, y, z: isinstance(return_value, int)) def middle(x, y, z): # type: ignore if y < z: if x < y: return y elif x < z: return y else: if x > y: return y elif x > z: return x return z # Could not find function 'enumerate'# Could not find function 'ident'# Could not find function '_clean_thread_parent_frames'
Things get more interesting if we focus on a particular subset of runs only, though - say, a set of inputs where middle()
fails.
with InvariantAnnotator() as annotator:
for x, y, z in MIDDLE_FAILING_TESTCASES:
mid = middle(x, y, z)
print_content(annotator.functions_with_invariants(), '.py')
@precondition(lambda x, y, z: isinstance(x, int)) @precondition(lambda x, y, z: isinstance(y, int)) @precondition(lambda x, y, z: isinstance(z, int)) @precondition(lambda x, y, z: x) @precondition(lambda x, y, z: x < z) @precondition(lambda x, y, z: x <= z) @precondition(lambda x, y, z: x > 0) @precondition(lambda x, y, z: x > y) @precondition(lambda x, y, z: x >= 0) @precondition(lambda x, y, z: x >= y) @precondition(lambda x, y, z: y < x) @precondition(lambda x, y, z: y < x < z) @precondition(lambda x, y, z: y < z) @precondition(lambda x, y, z: y <= x) @precondition(lambda x, y, z: y <= x <= z) @precondition(lambda x, y, z: y <= z) @precondition(lambda x, y, z: y >= 0) @precondition(lambda x, y, z: z) @precondition(lambda x, y, z: z > 0) @precondition(lambda x, y, z: z > x) @precondition(lambda x, y, z: z > x > y) @precondition(lambda x, y, z: z > y) @precondition(lambda x, y, z: z >= 0) @precondition(lambda x, y, z: z >= x) @precondition(lambda x, y, z: z >= x >= y) @precondition(lambda x, y, z: z >= y) @postcondition(lambda return_value, x, y, z: isinstance(return_value, int)) @postcondition(lambda return_value, x, y, z: return_value < x) @postcondition(lambda return_value, x, y, z: return_value < x < z) @postcondition(lambda return_value, x, y, z: return_value < z) @postcondition(lambda return_value, x, y, z: return_value <= x) @postcondition(lambda return_value, x, y, z: return_value <= x <= z) @postcondition(lambda return_value, x, y, z: return_value <= y) @postcondition(lambda return_value, x, y, z: return_value <= y <= x) @postcondition(lambda return_value, x, y, z: return_value <= y <= z) @postcondition(lambda return_value, x, y, z: return_value <= z) @postcondition(lambda return_value, x, y, z: return_value == y) @postcondition(lambda return_value, x, y, z: return_value >= 0) @postcondition(lambda return_value, x, y, z: return_value >= y) @postcondition(lambda return_value, x, y, z: x > return_value) @postcondition(lambda return_value, x, y, z: x >= return_value) @postcondition(lambda return_value, x, y, z: x >= return_value >= y) @postcondition(lambda return_value, x, y, z: x >= y >= return_value) @postcondition(lambda return_value, x, y, z: y <= return_value) @postcondition(lambda return_value, x, y, z: y <= return_value <= x) @postcondition(lambda return_value, x, y, z: y <= return_value <= z) @postcondition(lambda return_value, x, y, z: y == return_value) @postcondition(lambda return_value, x, y, z: y >= return_value) @postcondition(lambda return_value, x, y, z: z > return_value) @postcondition(lambda return_value, x, y, z: z > x > return_value) @postcondition(lambda return_value, x, y, z: z >= return_value) @postcondition(lambda return_value, x, y, z: z >= return_value >= y) @postcondition(lambda return_value, x, y, z: z >= x >= return_value) @postcondition(lambda return_value, x, y, z: z >= y >= return_value) def middle(x, y, z): # type: ignore if y < z: if x < y: return y elif x < z: return y else: if x > y: return y elif x > z: return x return z
Now that's an intimidating set of pre- and postconditions. However, almost all of the preconditions are implied by the one precondition
@precondition(lambda x, y, z: y < x < z, doc='y < x < z')
which characterizes the exact condition under which middle()
fails (which also happens to be the condition under which the erroneous second return y
is executed). By checking how invariants for failing runs differ from invariants for passing runs, we can identify circumstances for function failures.
quiz("Could `InvariantAnnotator` also determine a precondition "
"that characterizes _passing_ runs?",
[
"Yes",
"No"
], 'int(math.exp(1))', globals())
Quiz
InvariantAnnotator
also determine a precondition that characterizes passing runs?
Indeed, it cannot – the correct invariant for passing runs would be the inverse of the invariant for failing runs, and not A < B < C
is not part of our invariant library. We can easily test this:
with InvariantAnnotator() as annotator:
for x, y, z in MIDDLE_PASSING_TESTCASES:
mid = middle(x, y, z)
print_content(annotator.functions_with_invariants(), '.py')
@precondition(lambda x, y, z: isinstance(x, int)) @precondition(lambda x, y, z: isinstance(y, int)) @precondition(lambda x, y, z: isinstance(z, int)) @precondition(lambda x, y, z: x >= 0) @precondition(lambda x, y, z: y >= 0) @precondition(lambda x, y, z: z >= 0) @postcondition(lambda return_value, x, y, z: isinstance(return_value, int)) @postcondition(lambda return_value, x, y, z: return_value >= 0) def middle(x, y, z): # type: ignore if y < z: if x < y: return y elif x < z: return y else: if x > y: return y elif x > z: return x return z
Some Examples¶
Let us try out the InvariantAnnotator
on a number of examples.
Removing HTML Markup¶
Running InvariantAnnotator
on our ongoing example remove_html_markup()
does not provide much, as our invariant properties are tailored towards numerical functions.
with InvariantAnnotator() as annotator:
remove_html_markup("<foo>bar</foo>")
remove_html_markup("bar")
remove_html_markup('"bar"')
print_content(annotator.functions_with_invariants(), '.py')
@precondition(lambda s: s) @postcondition(lambda return_value, s: return_value) @postcondition(lambda return_value, s: return_value >= s) @postcondition(lambda return_value, s: return_value in s) @postcondition(lambda return_value, s: s <= return_value) def remove_html_markup(s): # type: ignore tag = False quote = False out = "" for c in s: assert tag or not quote if c == '<' and not quote: tag = True elif c == '>' and not quote: tag = False elif (c == '"' or c == "'") and tag: quote = not quote elif not tag: out = out + c return out
In the chapter on DDSet, we will see how to express more complex properties for structured inputs.
A Recursive Function¶
Here's another example. list_length()
recursively computes the length of a Python function. Let us see whether we can mine its invariants:
def list_length(elems: List[Any]) -> int:
if elems == []:
length = 0
else:
length = 1 + list_length(elems[1:])
return length
with InvariantAnnotator() as annotator:
length = list_length([1, 2, 3])
print_content(annotator.functions_with_invariants(), '.py')
@precondition(lambda elems: isinstance(elems, list)) @postcondition(lambda return_value, elems: isinstance(return_value, int)) @postcondition(lambda return_value, elems: return_value == len(elems)) @postcondition(lambda return_value, elems: return_value >= 0) def list_length(elems: List[Any]) -> int: if elems == []: length = 0 else: length = 1 + list_length(elems[1:]) return length
Almost all these properties are relevant. Of course, the reason the invariants are so neat is that the return value is equal to len(elems)
is that X == len(Y)
is part of the list of properties to be checked.
Sum of two Numbers¶
The next example is a very simple function: If we have a function without return value, the return value is None
, and we can only mine preconditions. (Well, we get a "postcondition" not return_value
that the return value evaluates to False, which holds for None
).
def print_sum(a, b): # type: ignore
print(a + b)
with InvariantAnnotator() as annotator:
print_sum(31, 45)
print_sum(0, 0)
print_sum(-1, -5)
76 0 -6
print_content(annotator.functions_with_invariants(), '.py')
# Could not find function 'parent_header'# Could not find function '_is_master_process'# Could not find function 'utcoffset'# Could not find function 'is_set'# Could not find function '_wait_for_tstate_lock'# Could not find function 'is_alive'# Could not find function '_event_pipe'# Could not find function 'send'# Could not find function 'schedule'# Could not find function '_schedule_flush'# Could not find function 'write'@precondition(lambda a, b: isinstance(a, int)) @precondition(lambda a, b: isinstance(b, int)) @postcondition(lambda return_value, a, b: not return_value) def print_sum(a, b): # type: ignore print(a + b)
Checking Specifications¶
A function with invariants, as above, can be fed into the Python interpreter, such that all pre- and postconditions are checked. We create a function square_root_annotated()
which includes all the invariants mined above.
with InvariantAnnotator() as annotator:
y = square_root(25.0)
y = square_root(0.01)
square_root_def = annotator.functions_with_invariants()
square_root_def = square_root_def.replace('square_root',
'square_root_annotated')
print_content(square_root_def, '.py')
@precondition(lambda x: isinstance(x, float)) @precondition(lambda x: x) @precondition(lambda x: x > 0) @precondition(lambda x: x >= 0) @postcondition(lambda return_value, x: isinstance(return_value, float)) @postcondition(lambda return_value, x: return_value) @postcondition(lambda return_value, x: return_value > 0) @postcondition(lambda return_value, x: return_value >= 0) def square_root_annotated(x): # type: ignore """Computes the square root of x, using the Newton-Raphson method""" approx = None guess = x / 2 while approx != guess: approx = guess guess = (approx + x / approx) / 2 return approx
exec(square_root_def)
The "annotated" version checks against invalid arguments – or more precisely, against arguments with properties that have not been observed yet:
with ExpectError():
square_root_annotated(-1.0) # type: ignore
Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/2533314883.py", line 2, in <module> square_root_annotated(-1.0) # type: ignore ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/3759592669.py", line 11, in wrapper retval = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/3759592669.py", line 11, in wrapper retval = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/3759592669.py", line 7, in wrapper assert precondition(*args, **kwargs), \ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AssertionError: Precondition violated (expected)
This is in contrast to the original version, which just hangs on negative values:
with ExpectTimeout(1):
square_root(-1.0)
Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/3436772654.py", line 2, in <module> square_root(-1.0) File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/81976482.py", line 5, in square_root while approx != guess: ^^^^^^^^^^^^^^^ File "Timeout.ipynb", line 43, in timeout_handler raise TimeoutError() TimeoutError (expected)
If we make changes to the function definition such that the properties of the return value change, such regressions are caught as violations of the postconditions. Let us illustrate this by simply inverting the result, and return $-2$ as square root of 4.
square_root_def = square_root_def.replace('square_root_annotated',
'square_root_negative')
square_root_def = square_root_def.replace('return approx',
'return -approx')
print_content(square_root_def, '.py')
@precondition(lambda x: isinstance(x, float)) @precondition(lambda x: x) @precondition(lambda x: x > 0) @precondition(lambda x: x >= 0) @postcondition(lambda return_value, x: isinstance(return_value, float)) @postcondition(lambda return_value, x: return_value) @postcondition(lambda return_value, x: return_value > 0) @postcondition(lambda return_value, x: return_value >= 0) def square_root_negative(x): # type: ignore """Computes the square root of x, using the Newton-Raphson method""" approx = None guess = x / 2 while approx != guess: approx = guess guess = (approx + x / approx) / 2 return -approx
exec(square_root_def)
Technically speaking, $-2$ is a square root of 4, since $(-2)^2 = 4$ holds. Yet, such a change may be unexpected by callers of square_root()
, and hence, this would be caught with the first call:
with ExpectError():
square_root_negative(2.0) # type: ignore
Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/2047157021.py", line 2, in <module> square_root_negative(2.0) # type: ignore ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/3759592669.py", line 11, in wrapper retval = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/3759592669.py", line 11, in wrapper retval = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/3759592669.py", line 11, in wrapper retval = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ [Previous line repeated 4 more times] File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/3759592669.py", line 13, in wrapper assert postcondition(retval, *args, **kwargs), \ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AssertionError: Postcondition violated (expected)
We see how pre- and postconditions, as well as types, can serve as oracles during testing. In particular, once we have mined them for a set of functions, we can check them again and again with test generators – especially after code changes. The more checks we have, and the more specific they are, the more likely it is we can detect unwanted effects of changes.
Synopsis¶
This chapter provides two classes that automatically extract specifications from a function and a set of inputs:
TypeAnnotator
for types, andInvariantAnnotator
for pre- and postconditions.
Both work by observing a function and its invocations within a with
clause. Here is an example for the type annotator:
def sum2(a, b): # type: ignore
return a + b
with TypeAnnotator() as type_annotator:
sum2(1, 2)
sum2(-4, -5)
sum2(0, 0)
The typed_functions()
method will return a representation of sum2()
annotated with types observed during execution.
print(type_annotator.typed_functions())
def sum2(a: int, b: int) -> int: return a + b
As a shortcut, one can also just evaluate the annotator:
type_annotator
def sum2(a: int, b: int) -> int: return a + b
The invariant annotator works similarly:
with InvariantAnnotator() as inv_annotator:
sum2(1, 2)
sum2(-4, -5)
sum2(0, 0)
The functions_with_invariants()
method will return a representation of sum2()
annotated with inferred pre- and postconditions that all hold for the observed values.
print(inv_annotator.functions_with_invariants())
@precondition(lambda a, b: isinstance(a, int)) @precondition(lambda a, b: isinstance(b, int)) @postcondition(lambda return_value, a, b: a == return_value - b) @postcondition(lambda return_value, a, b: b == return_value - a) @postcondition(lambda return_value, a, b: isinstance(return_value, int)) @postcondition(lambda return_value, a, b: return_value == a + b) @postcondition(lambda return_value, a, b: return_value == b + a) def sum2(a, b): # type: ignore return a + b
Again, a shortcut is available:
inv_annotator
@precondition(lambda a, b: isinstance(a, int)) @precondition(lambda a, b: isinstance(b, int)) @postcondition(lambda return_value, a, b: a == return_value - b) @postcondition(lambda return_value, a, b: b == return_value - a) @postcondition(lambda return_value, a, b: isinstance(return_value, int)) @postcondition(lambda return_value, a, b: return_value == a + b) @postcondition(lambda return_value, a, b: return_value == b + a) def sum2(a, b): # type: ignore return a + b
Such type specifications and invariants can be helpful as oracles (to detect deviations from a given set of runs). The chapter gives details on how to customize the properties checked for.
# ignore
from ClassDiagram import display_class_hierarchy
# ignore
display_class_hierarchy([TypeAnnotator, InvariantAnnotator],
public_methods=[
TypeAnnotator.typed_function,
TypeAnnotator.typed_functions,
TypeAnnotator.typed_function_ast,
TypeAnnotator.typed_functions_ast,
TypeAnnotator.__repr__,
InvariantAnnotator.function_with_invariants,
InvariantAnnotator.functions_with_invariants,
InvariantAnnotator.preconditions,
InvariantAnnotator.postconditions,
InvariantAnnotator.__repr__,
InvariantTracer.__init__,
CallTracer.__init__
],
project='debuggingbook'
)
Lessons Learned¶
- Type annotations and explicit invariants allow for checking arguments and results for expected data types and other properties.
- One can automatically mine data types and invariants by observing arguments and results at runtime.
- The quality of mined invariants depends on the diversity of values observed during executions; this variety can be increased by generating tests.
Next Steps¶
In the next chapter, we will explore abstracting failure conditions.
Background¶
The DAIKON dynamic invariant detector can be considered the mother of function specification miners. Continuously maintained and extended for more than 20 years, it mines likely invariants in the style of this chapter for a variety of languages, including C, C++, C#, Eiffel, F#, Java, Perl, and Visual Basic. On top of the functionality discussed above, it holds a rich catalog of patterns for likely invariants, supports data invariants, can eliminate invariants that are implied by others, and determines statistical confidence to disregard unlikely invariants. The corresponding paper \cite{Ernst2001} is one of the seminal and most-cited papers of Software Engineering. A multitude of works have been published based on DAIKON and detecting invariants; see this curated list for details.
The interaction between test generators and invariant detection is already discussed in \cite{Ernst2001} (incidentally also using grammars). The Eclat tool \cite{Pacheco2005} is a model example of tight interaction between a unit-level test generator and DAIKON-style invariant mining, where the mined invariants are used to produce oracles and to systematically guide the test generator towards fault-revealing inputs.
Mining specifications is not restricted to pre- and postconditions. The paper "Mining Specifications" \cite{Ammons2002} is another classic in the field, learning state protocols from executions. Grammar mining \cite{Gopinath2020} can also be seen as a specification mining approach, this time learning specifications for input formats.
As it comes to adding type annotations to existing code, the blog post "The state of type hints in Python" gives a great overview on how Python type hints can be used and checked. To add type annotations, there are two important tools available that also implement our above approach:
- MonkeyType implements the above approach of tracing executions and annotating Python 3 arguments, returns, and variables with type hints.
- PyAnnotate does a similar job, focusing on code in Python 2. It does not produce Python 3-style annotations, but instead produces annotations as comments that can be processed by static type checkers.
These tools have been created by engineers at Facebook and Dropbox, respectively, assisting them in checking millions of lines of code for type issues.
Exercises¶
Our code for mining types and invariants is in no way complete. There are dozens of ways to extend our implementations, some of which we discuss in exercises.
Exercise 1: Union Types¶
The Python typing
module allows expressing that an argument can have multiple types. For square_root(x)
, this allows expressing that x
can be an int
or a float
:
def square_root_with_union_type(x: Union[int, float]) -> float: # type: ignore
...
Extend the TypeAnnotator
such that it supports union types for arguments and return values. Use Optional[X]
as a shorthand for Union[X, None]
.
Exercise 2: Types for Local Variables¶
In Python, one cannot only annotate arguments with types, but actually also local and global variables – for instance, approx
and guess
in our square_root()
implementation:
def square_root_with_local_types(x: Union[int, float]) -> float:
"""Computes the square root of x, using the Newton-Raphson method"""
approx: Optional[float] = None
guess: float = x / 2
while approx != guess:
approx: float = guess # type: ignore
guess: float = (approx + x / approx) / 2 # type: ignore
return approx
Extend the TypeAnnotator
such that it also annotates local variables with types. Search the function AST for assignments, determine the type of the assigned value, and make it an annotation on the left-hand side.
Exercise 3: Verbose Invariant Checkers¶
Our implementation of invariant checkers does not make it clear for the user which pre-/postcondition failed.
@precondition(lambda s: len(s) > 0)
def remove_first_char(s: str) -> str:
return s[1:]
with ExpectError():
remove_first_char('')
Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/2212034949.py", line 2, in <module> remove_first_char('') File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/3759592669.py", line 7, in wrapper assert precondition(*args, **kwargs), \ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AssertionError: Precondition violated (expected)
The following implementation adds an optional doc
keyword argument which is printed if the invariant is violated:
def verbose_condition(precondition: Optional[Callable] = None,
postcondition: Optional[Callable] = None,
doc: str = 'Unknown') -> Callable:
def decorator(func: Callable) -> Callable:
# Use `functools` to preserve name, docstring, etc
@functools.wraps(func)
def wrapper(*args: Any, **kwargs: Any) -> Any:
if precondition is not None:
assert precondition(*args, **kwargs), \
"Precondition violated: " + doc
# call original function or method
retval = func(*args, **kwargs)
if postcondition is not None:
assert postcondition(retval, *args, **kwargs), \
"Postcondition violated: " + doc
return retval
return wrapper
return decorator
def verbose_precondition(check: Callable, **kwargs: Any) -> Callable:
return verbose_condition(precondition=check,
doc=kwargs.get('doc', 'Unknown'))
def verbose_postcondition(check: Callable, **kwargs: Any) -> Callable:
return verbose_condition(postcondition=check,
doc=kwargs.get('doc', 'Unknown'))
@verbose_precondition(lambda s: len(s) > 0, doc="len(s) > 0") # type: ignore
def remove_first_char(s: str) -> str:
return s[1:]
remove_first_char('abc')
'bc'
with ExpectError():
remove_first_char('')
Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/2212034949.py", line 2, in <module> remove_first_char('') File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_99319/706783034.py", line 9, in wrapper assert precondition(*args, **kwargs), \ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AssertionError: Precondition violated: len(s) > 0 (expected)
Extend InvariantAnnotator
into a VerboseInvariantAnnotator
class that includes the conditions in the generated pre- and postconditions.
Exercise 4: Save Initial Values¶
If the value of an argument changes during function execution, this can easily confuse our implementation: The values are tracked at the beginning of the function, but checked only when it returns. Extend the InvariantAnnotator
and the infrastructure it uses such that
- it saves argument values both at the beginning and at the end of a function invocation;
- postconditions can be expressed over both initial values of arguments and the final values of arguments;
- the mined postconditions refer to both these values as well.
Exercise 5: Implications¶
Several mined invariant are actually implied by others: If x > 0
holds, then this implies x >= 0
and x != 0
. Extend the InvariantAnnotator
such that implications between properties are explicitly encoded, and such that implied properties are no longer listed as invariants. See \cite{Ernst2001} for ideas.
Exercise 6: Local Variables¶
Postconditions may also refer to the values of local variables. Consider extending InvariantAnnotator
and its infrastructure such that the values of local variables at the end of the execution are also recorded and made part of the invariant inference mechanism.
Exercise 7: Embedding Invariants as Assertions¶
Rather than producing invariants as annotations for pre- and postconditions, insert them as assert
statements into the function code, as in:
def square_root(x):
'Computes the square root of x, using the Newton-Raphson method'
assert isinstance(x, int), 'violated precondition'
assert x > 0, 'violated precondition'
approx = None
guess = (x / 2)
while (approx != guess):
approx = guess
guess = ((approx + (x / approx)) / 2)
return_value = approx
assert return_value < x, 'violated postcondition'
assert isinstance(return_value, float), 'violated postcondition'
return approx
Such a formulation may make it easier for test generators and symbolic analysis to access and interpret pre- and postconditions.
Exercise 8: Grammar-Generated Properties¶
The larger the set of properties to be checked, the more potential invariants can be discovered. Create a grammar that systematically produces a large set of properties. See \cite{Ernst2001} for possible patterns.
Exercise 9: Loop Invariants¶
This is not so much a problem in debugging, but rather for symbolic verification. A loop invariant is a property that holds for every iteration of a loop, such as
assert tag or not quote
for remove_html_markup()
. Create an annotator that determines and adds loop invariants for for
and while
loops.
Exercise 10: Path Invariants¶
A path invariant is a property that holds for taking a particular path, such as
assert y < x < z
for the second (erroneous) return y
statement of the middle()
function. Create an annotator that adds path invariants (expressed over the preconditions that hold when taking this particular branch) for every if
statement in the code.