Learning from Failures¶
Given the many executions we can generate, it is only natural to apply machine learning to them in order to learn which features of the input (or the execution) are associated with failures.
In this chapter, we study the Alhazen approach, one of the first of this kind.
Alhazen by Kampmann et al. \cite{Kampmann2020} automatically learns the associations between the failure of a program and features of the input data, say "The error occurs whenever the <expr> element is negative".
This chapter is based on an Alhazen implementation contributed by Martin Eberlein of TU Berlin. Thanks a lot, Martin!
Prerequisites
- This chapter extends the ideas from the chapter on Generalizing Failure Circumstances.
Synopsis¶
To use the code provided in this chapter, write
>>> from debuggingbook.Alhazen import <identifier>
and then make use of the following features.
This chapter provides an implementation of the Alhazen approach \cite{Kampmann2020}, which trains machine learning classifiers from input features.
Given a test function, a grammar, and a set of inputs, the Alhazen class produces a decision tree that characterizes failure circumstances:
>>> alhazen = Alhazen(sample_runner, CALC_GRAMMAR, initial_sample_list,
...                   max_iterations=20)
>>> alhazen.run()
The final decision tree can be accessed using last_tree():
>>> # alhazen.last_tree()
We can visualize the resulting decision tree using Alhazen.show_decision_tree():
>>> alhazen.show_decision_tree()
A decision tree is read from top to bottom. Decision nodes (with two children) come with a predicate on top. This predicate is either
- numeric, such as <value> > 20, indicating the numeric value of the given symbol, or
- existential, such as <digit> == '1', which has a negative value when False, and a positive value when True.
If the predicate evaluates to True, follow the left path; if it evaluates to False, follow the right path.
A leaf node (no children) will give you the final decision class = BUG or class = NO_BUG.
So if the predicate states <function> == 'sqrt' <= 0.5, this means that
- If the function is not sqrt (the predicate <function> == 'sqrt' is negative, see above, and hence less than 0.5), follow the left (True) path.
- If the function is sqrt (the predicate <function> == 'sqrt' is positive), follow the right (False) path.
The samples field shows the number of sample inputs that contributed to this decision.
The gini field (aka Gini impurity) indicates how mixed the samples at this node are, that is, how many of them fall into the displayed class (BUG or NO_BUG).
A gini value of 0.0 means purity - all samples fall into the displayed class.
The saturation of nodes also indicates purity – the higher the saturation, the higher the purity.
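To make the gini field concrete, here is a minimal sketch of how Gini impurity can be computed from the labels of the samples that reach a node. The helper name gini_impurity is an assumption made for illustration; it is not part of the Alhazen class.

```python
from collections import Counter
from typing import Sequence


def gini_impurity(labels: Sequence[str]) -> float:
    """Return 1 - sum(p_c ** 2) over all classes c in `labels` (hypothetical helper)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# All samples in one class: the node is pure
pure = gini_impurity(["NO_BUG", "NO_BUG", "NO_BUG"])
# Samples evenly split between two classes: maximally mixed for two classes
mixed = gini_impurity(["BUG", "NO_BUG"])
```

A pure node yields 0.0, while an even two-class split yields 0.5, the largest value possible with two classes.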
There is also a text version available, with far fewer (but hopefully still essential) details:
>>> print(alhazen.friendly_decision_tree())
if <term> <= -11.5000:
  if <term> <= -42.2970:
    NO_BUG
  else:
    if <function> == 'sqrt':
      BUG
    else:
      NO_BUG
else:
  NO_BUG
In both representations, we see that the present failure is associated with a negative value for the sqrt function and precise boundaries for its value.
In fact, the error conditions are given in the source code:
>>> import inspect
>>> print(inspect.getsource(task_sqrt))
def task_sqrt(x):
    """Computes the square root of x, using the Newton-Raphson method"""
    if x <= -12 and x >= -42:
        x = 0  # Guess where the bug is :-)
    else:
        x = 1
    x = max(x, 0)
    approx = None
    guess = x / 2
    while approx != guess:
        approx = guess
        guess = (approx + x / approx) / 2
    return approx
Try out Alhazen on your own code and your own examples!
Machine Learning for Automated Debugging¶
When diagnosing why a program fails, the first step is to determine the circumstances under which the program fails. In past chapters, we have examined approaches that correlate execution features with failures as well as tools that systematically generate inputs to reduce failure-inducing inputs or generalize failure circumstances. In this chapter, we will go one step further and make use of full-fledged machine learning to identify failure circumstances (and causes).
The Alhazen Approach¶
In 2020, Kampmann et al. \cite{Kampmann2020} presented one of the first approaches to automatically learn circumstances of (failing) program behavior. Their approach associates the program’s failure with the syntactical features of the input data, allowing them to learn and extract the properties that result in the specific behavior.
Their reference implementation Alhazen can generate a diagnosis and explain why, for instance, a particular bug occurs. Alhazen forms a hypothetical model based on the observed inputs. Additional test inputs are generated and executed to refine or refute the hypothesis, eventually obtaining a prediction model of the circumstances of why the behavior in question takes place.
The tool is named after Ḥasan Ibn al-Haytham (latinized name: Alhazen). Often referred to as the "Father of modern optics", Ibn al-Haytham made significant contributions to the principles of optics and visual perception. Most notably, he was an early proponent of the concept that a hypothesis must be supported by experiments, and thus one of the inventors of the scientific method, the key process in the Alhazen tool.
Let us give a high-level description of how Alhazen works, illustrated above.
Alhazen is given an input grammar and a number of input files (whose format is given by the grammar), and produces a decision tree – a machine learning model that explains under which circumstances the program fails.
Alhazen determines and refines these decision trees in five steps:
- For each input file, Alhazen extracts a number of input features that apply. These input features are predicates over the individual elements of the input grammar, such as <expr> > 0 (an <expr> element is larger than zero) or exists(<minus-sign>) (the input contains a minus sign).
- The test outcomes of the input files label these input files as buggy or non-buggy. From the respective input features and the labels, Alhazen trains a decision tree that associates these features with the labels - that is, the decision tree explains which features lead to buggy or non-buggy.
- As it is typically trained on few samples only, the initial classification model may be imprecise. Hence, Alhazen extracts further requirements for additional test cases that may help in increasing precision, such as <digit> == '6' (we need more inputs in which the <digit> field has a value of 6).
- Satisfying these requirements, Alhazen then generates additional inputs...
- ...which it executes, thus again labeling them as buggy or non-buggy. From the new inputs, we can again extract the features, and repeat the cycle.
The whole process keeps on refining decision trees with more and more inputs. Eventually, the decision trees are supposed to be precise enough that they can become a theory - that is, an explanation of why the program fails with high predictive power for future inputs.
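The feedback loop described here can be sketched in a few lines. This is a deliberately simplified illustration, not the Alhazen implementation: the "model" is just the smallest interval covering all failing inputs rather than a decision tree, and the names program_fails, learn_hypothesis, and refine, as well as the failure region, are assumptions made for this example.

```python
import random
from typing import List, Optional, Tuple


def program_fails(x: float) -> bool:
    """Stand-in for running the program under test (assumed failure region)."""
    return -42 <= x <= -12


def learn_hypothesis(samples: List[Tuple[float, bool]]) -> Optional[Tuple[float, float]]:
    """'Train' a model: the smallest interval covering all failing inputs seen so far."""
    failing = [x for x, failed in samples if failed]
    return (min(failing), max(failing)) if failing else None


def refine(initial_inputs: List[float], iterations: int = 20,
           seed: int = 0) -> Optional[Tuple[float, float]]:
    rng = random.Random(seed)
    # Execute the initial inputs and label them
    samples = [(x, program_fails(x)) for x in initial_inputs]
    for _ in range(iterations):
        hypothesis = learn_hypothesis(samples)  # learn from the labeled samples
        if hypothesis is None:
            break
        low, high = hypothesis
        # Requirements for new inputs: probe just outside both boundaries
        new_inputs = [low - rng.uniform(0, 10), high + rng.uniform(0, 10)]
        # Execute and label the new inputs, then repeat the cycle
        samples += [(x, program_fails(x)) for x in new_inputs]
    return learn_hypothesis(samples)


low, high = refine([-16, 4])
```

Each iteration can only widen the learned interval, and only with inputs that actually fail, so the hypothesis approaches the true failure region from the inside as more inputs are tested.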
The Alhazen process thus automates the scientific method of debugging:
- making initial observations (Steps 1 and 2),
- coming up with hypotheses that explain the observations (Step 3),
- designing experiments to further support or refute the hypotheses (Steps 4 and 5),
- and repeating the entire process until we have a predictive theory on why the program fails.
Structure of this Chapter¶
In the remainder of this chapter, we will first introduce grammars.
We then explore and implement the individual steps of Alhazen:
- Step 1: Extracting Features
- Step 2: Train Classification Model
- Step 3: Extract Feature Requirements
- Step 4: Generating New Samples
- Step 5: Executing New Inputs
After this is done, we can compose all these into a single Alhazen class and run it on a sample input.
If you want to see Alhazen in action first (before going into all the details), check out the sample run.
Inputs and Grammars¶
Alhazen heavily builds on grammars as a means to decompose inputs into individual elements, such that it can reason about these elements, and also generate new ones automatically.
To work with grammars, we use the framework provided by The Fuzzing Book. For a more detailed description of grammars and how to use them for production, have a look at the chapter "Fuzzing with Grammars".
Let us build a simple grammar for a calculator. The calculator code is listed below.
"""
This file contains the code under test for the example bug.
The sqrt() method fails on x <= 0.
"""
# The trigonometric functions are delegated to the math module
from math import tan as rtan
from math import cos as rcos
from math import sin as rsin

def task_sqrt(x):
    """Computes the square root of x, using the Newton-Raphson method"""
    if x <= -12 and x >= -42:
        x = 0  # Guess where the bug is :-)
    else:
        x = 1
    x = max(x, 0)
    approx = None
    guess = x / 2
    while approx != guess:
        approx = guess
        guess = (approx + x / approx) / 2
    return approx

def task_tan(x):
    return rtan(x)

def task_cos(x):
    return rcos(x)

def task_sin(x):
    return rsin(x)
The language consists of functions (<function>) that are being invoked on a numerical value (<term>).
from fuzzingbook.Grammars import Grammar

CALC_GRAMMAR: Grammar = {
    "<start>":
        ["<function>(<term>)"],
    "<function>":
        ["sqrt", "tan", "cos", "sin"],
    "<term>": ["-<value>", "<value>"],
    "<value>":
        ["<integer>.<digits>",
         "<integer>"],
    "<integer>":
        ["<lead-digit><digits>", "<digit>"],
    "<digits>":
        ["<digit><digits>", "<digit>"],
    "<lead-digit>":  # First digit cannot be zero
        ["1", "2", "3", "4", "5", "6", "7", "8", "9"],
    "<digit>":
        ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
}
We see that the CALC_GRAMMAR consists of several production rules. The calculator subject will only accept inputs that conform to this grammar definition.
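To illustrate how such production rules can also be used to generate inputs, here is a minimal random expander written with the standard library only. It is a sketch under stated assumptions, not the producer from The Fuzzing Book: the function produce, its max_depth cutoff, and the local GRAMMAR copy are introduced here purely for illustration.

```python
import random
import re
from typing import Dict, List, Optional

# Local copy of the calculator grammar, so that this sketch is self-contained
GRAMMAR: Dict[str, List[str]] = {
    "<start>": ["<function>(<term>)"],
    "<function>": ["sqrt", "tan", "cos", "sin"],
    "<term>": ["-<value>", "<value>"],
    "<value>": ["<integer>.<digits>", "<integer>"],
    "<integer>": ["<lead-digit><digits>", "<digit>"],
    "<digits>": ["<digit><digits>", "<digit>"],
    "<lead-digit>": list("123456789"),
    "<digit>": list("0123456789"),
}

NONTERMINAL = re.compile(r"<[^<> ]*>")


def produce(grammar: Dict[str, List[str]], symbol: str = "<start>",
            rng: Optional[random.Random] = None,
            depth: int = 0, max_depth: int = 8) -> str:
    """Expand `symbol` into a string by choosing expansions at random (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    expansions = grammar[symbol]
    if depth >= max_depth:
        # Too deep: pick the expansion with the fewest nonterminals to force termination
        expansion = min(expansions, key=lambda e: len(NONTERMINAL.findall(e)))
    else:
        expansion = rng.choice(expansions)
    # Replace each nonterminal in the chosen expansion by a recursive expansion
    return NONTERMINAL.sub(
        lambda match: produce(grammar, match.group(0), rng, depth + 1, max_depth),
        expansion)


sample = produce(GRAMMAR, rng=random.Random(42))
```

Every string produced this way conforms to the grammar by construction, which is what makes grammars suitable for generating additional test inputs.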
Let us load two initial input samples:
sqrt(-16)
sqrt(4)
# Load initial input files
initial_sample_list = ['sqrt(-16)', 'sqrt(4)']
Let's execute our two input samples and observe the calculator's behavior.
We implement the function sample_runner(sample) that lets us execute the calculator for a single sample. sample_runner(sample) returns an OracleResult for the sample.
import sys
from enum import Enum

class OracleResult(Enum):
    BUG = "BUG"
    NO_BUG = "NO_BUG"
    UNDEF = "UNDEF"

    def __str__(self):
        return self.value

SUBJECT = "calculator"

def sample_runner(sample):
    testcode = sample
    try:
        # Simply execute the calculator code, with the functions replaced
        exec(testcode, {"sqrt": task_sqrt, "tan": task_tan, "sin": task_sin, "cos": task_cos}, {})
        return OracleResult.NO_BUG
    except ZeroDivisionError:
        # The calculator bug manifests as a division by zero
        return OracleResult.BUG
    except Exception as e:
        # Anything else means the sample could not be executed
        print(e, file=sys.stderr)
        return OracleResult.UNDEF
Let's test the function:
sample = "sqrt(-16)"
sample_runner(sample)
<OracleResult.BUG: 'BUG'>
As expected, the sample sqrt(-16) triggers the calculator bug. Let's try some more samples:
assert sample_runner("sqrt(-23)") == OracleResult.BUG
assert sample_runner("sqrt(44)") == OracleResult.NO_BUG
assert sample_runner("cos(-9)") == OracleResult.NO_BUG
What happens if we pass inputs to the calculator that do not conform to its input format?
sample_runner("undef_function(QUERY)")
name 'undef_function' is not defined
<OracleResult.UNDEF: 'UNDEF'>
The function sample_runner(sample) returns an OracleResult.UNDEF whenever the runner is not able to execute the sample.
Finally, we provide the function execute_samples(sample_list) that obtains the oracle/label for a list of samples.
We use the pandas module to place these in a data frame.
import pandas

# Executes a list of samples and returns the execution outcome (label)
# The function returns a pandas dataframe
def execute_samples(sample_list):
    data = []
    for sample in sample_list:
        result = sample_runner(sample)
        data.append({"oracle": result})
    return pandas.DataFrame.from_records(data)
Let us define a bigger list of samples to execute...
sample_list = ["sqrt(-20)", "cos(2)", "sqrt(-100)", "undef_function(foo)"]
... and obtain the execution outcome
labels = execute_samples(sample_list)
labels
name 'undef_function' is not defined
| | oracle |
|---|---|
| 0 | BUG |
| 1 | NO_BUG |
| 2 | NO_BUG |
| 3 | UNDEF |
We can combine these with the sample_list:
for i, row in enumerate(labels['oracle']):
    print(sample_list[i].ljust(30) + str(row))
sqrt(-20)                     BUG
cos(2)                        NO_BUG
sqrt(-100)                    NO_BUG
undef_function(foo)           UNDEF
We can remove the undefined input samples like this:
clean_data = labels.drop(labels[labels.oracle.astype(str) == "UNDEF"].index)
clean_data
| | oracle |
|---|---|
| 0 | BUG |
| 1 | NO_BUG |
| 2 | NO_BUG |
We can combine sample and labels by iterating over the obtained oracle:
oracle = execute_samples(sample_list)
for i, row in enumerate(oracle['oracle']):
    print(sample_list[i].ljust(30) + str(row))
sqrt(-20)                     BUG
cos(2)                        NO_BUG
sqrt(-100)                    NO_BUG
undef_function(foo)           UNDEF
name 'undef_function' is not defined
We observe that the sample sqrt(-16) triggers a bug in the calculator, whereas the sample sqrt(4) does not show unusual behavior. Of course, we want to know why the program fails on this sample. In a typical use case, the developers of the calculator program would now try other input samples and evaluate if similar inputs also trigger the program's failure. Let's try some more input samples; maybe we can refine our understanding of why the calculator crashes:
Our guesses - maybe the failure is also in the cos() or tan() function?
guess_samples = ['cos(-16)', 'tan(-16)', 'sqrt(-100)', 'sqrt(-20.23412431234123)']
Let's obtain the execution outcome for each of our guesses:
guess_oracle = execute_samples(guess_samples)
Here come the results:
for i, row in enumerate(guess_oracle['oracle']):
    print(guess_samples[i].ljust(30) + str(row))
cos(-16)                      NO_BUG
tan(-16)                      NO_BUG
sqrt(-100)                    NO_BUG
sqrt(-20.23412431234123)      BUG
It looks like the failure only occurs in the sqrt() function, and even there only for specific x values.
We could now try other values for x and repeat the process.
However, this would be highly time-consuming and not an efficient debugging technique for a larger and more complex test subject.
Wouldn't it be great if there was a tool that automatically does this for us? And this is exactly what Alhazen is there for. It helps us explain why specific input features cause a program to fail.
Step 1: Extracting Features¶
In this section, we are concerned with the problem of extracting semantic features from inputs. In particular, Alhazen defines various features based on the input grammar, such as existence and numeric interpretation. These features are then extracted from the parse trees of the inputs (see Section 3 of \cite{Kampmann2020} for more details).
The implementation of the feature extraction module consists of the following three tasks:
- Implementation of individual feature classes, whose instances allow deriving specific feature values from inputs
- Extraction of features from the grammar through instantiation of the aforementioned feature classes
- Computation of feature vectors from a set of inputs, which will then be used as input for the decision tree
Internal and "Friendly" Feature Names¶
We use two kinds of names for features:
- internal names have the form <SYMBOL>@N and refer to the N-th expansion of symbol (starting with 0). In CALC_GRAMMAR, for instance, <function>@0 refers to the expansion of <function> to "sqrt".
- friendly names are more user-friendly (hence the name). The above feature <function>@0 has the "friendly" name <function> == "sqrt".
We use internal names in all our interaction with the machine learner, as they are unambiguous and do not contain whitespace. When showing the final results, we switch to "friendly" names.
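The relation between the two naming schemes can be sketched as a simple mapping from internal to friendly names. The function feature_names and the one-rule grammar below are assumptions made for illustration; the chapter's own feature classes attach further information to each name.

```python
from typing import Dict, List

# A one-rule excerpt of the calculator grammar, for illustration only
grammar: Dict[str, List[str]] = {
    "<function>": ["sqrt", "tan", "cos", "sin"],
}


def feature_names(grammar: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each internal name <SYMBOL>@N to a friendly name (hypothetical helper)."""
    names = {}
    for symbol, expansions in grammar.items():
        for n, expansion in enumerate(expansions):
            internal = f"{symbol}@{n}"                 # unambiguous, no whitespace
            friendly = f"{symbol} == {expansion!r}"    # readable, shown to the user
            names[internal] = friendly
    return names


names = feature_names(grammar)
```

Keeping the internal name as the dictionary key means the learner never sees whitespace or quotes, while the friendly name is looked up only for display.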
Implementing Feature Classes¶
from abc import ABC, abstractmethod
from typing import Optional

class Feature(ABC):
    '''
    The abstract base class for grammar features.
    Args:
        name : A unique identifier name for this feature. Should not contain whitespace.
               e.g., 'type(<feature>@1)'
        rule : The production rule (e.g., '<function>' or '<value>').
        key  : The feature key (e.g., the chosen alternative or rule itself).
    '''

    def __init__(self, name: str, rule: str, key: str, /,
                 friendly_name: Optional[str] = None) -> None:
        self.name = name
        self.rule = rule
        self.key = key
        self._friendly_name = friendly_name or name
        super().__init__()

    def __repr__(self) -> str:
        '''Returns a printable string representation of the feature.'''
        return self.name_rep()

    @abstractmethod
    def name_rep(self) -> str:
        pass

    def friendly_name(self) -> str:
        return self._friendly_name

    @abstractmethod
    def get_feature_value(self, derivation_tree) -> float:
        '''Returns the feature value for a given derivation tree of an input.'''
        pass

    def replace(self, new_key: str) -> 'Feature':
        '''Returns a new feature with the same name but a different key.'''
        return self.__class__(self.name, self.rule, new_key)
from typing import Optional
from fuzzingbook.GrammarFuzzer import DerivationTree, expansion_to_children

class ExistenceFeature(Feature):
    '''
    This class represents existence features of a grammar. Existence features indicate
    whether a particular production rule was used in the derivation sequence of an input.
    For a given production rule P -> A | B, a production existence feature for P and
    alternative existence features for each alternative (i.e., A and B) are defined.
    name : A unique identifier name for this feature. Should not contain whitespace.
           e.g., 'exist(<digit>@1)'
    rule : The production rule.
    key  : The feature key, equal to the rule attribute for production features,
           or equal to the corresponding alternative for alternative features.
    '''

    def __init__(self, name: str, rule: str, key: str,
                 friendly_name: Optional[str] = None) -> None:
        super().__init__(name, rule, key, friendly_name=friendly_name)

    def name_rep(self) -> str:
        if self.rule == self.key:
            return f"exists({self.rule})"
        else:
            return f"exists({self.rule} == {self.key})"

    def get_feature_value(self, derivation_tree: DerivationTree) -> float:
        '''Returns 1 if this feature is matched anywhere in the derivation tree, else 0.'''
        (node, children) = derivation_tree
        # The local match (1 if the feature is matched for the current node, 0 if not)
        count = 0
        # First check if the current node can be matched with the rule
        if node == self.rule:
            # Production existence feature
            if self.rule == self.key:
                count = 1
            # Production alternative existence feature
            # We compare the children of the expansion with the actual children
            else:
                expansion_children = list(map(lambda x: x[0], expansion_to_children(self.key)))
                node_children = list(map(lambda x: x[0], children))
                if expansion_children == node_children:
                    count = 1
        # Recursively check all children; the feature exists if it matches anywhere in the tree
        for child in children:
            count = max(count, self.get_feature_value(child))
        return count
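To see what an existence check does on a concrete input, the following standalone sketch applies the same idea to a hand-built derivation tree for sqrt(-9). The tree shape (pairs of symbol and children) follows the convention used above, but the tree itself and the helper uses_expansion are assumptions made for illustration; a real parser may split terminals differently.

```python
from typing import List, Tuple

Tree = Tuple[str, list]  # (symbol, children); terminals have no children

# Hand-built derivation tree for "sqrt(-9)" (illustrative)
tree: Tree = ("<start>", [
    ("<function>", [("sqrt", [])]),
    ("(", []),
    ("<term>", [
        ("-", []),
        ("<value>", [("<integer>", [("<digit>", [("9", [])])])]),
    ]),
    (")", []),
])


def uses_expansion(tree: Tree, rule: str, expansion: List[str]) -> int:
    """Return 1 if `rule` is expanded into `expansion` anywhere in `tree`, else 0."""
    symbol, children = tree
    if symbol == rule and [child[0] for child in children] == expansion:
        return 1
    # Otherwise, the feature exists if it exists in any subtree
    return max((uses_expansion(child, rule, expansion) for child in children),
               default=0)


is_sqrt = uses_expansion(tree, "<function>", ["sqrt"])
is_cos = uses_expansion(tree, "<function>", ["cos"])
is_negative = uses_expansion(tree, "<term>", ["-", "<value>"])
```

The result is a 0/1 value per feature, which is the form a decision tree learner can split on.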
from typing import Optional
from numpy import nanmax, isnan
from fuzzingbook.GrammarFuzzer import DerivationTree, tree_to_string

class NumericInterpretation(Feature):
    '''
    This class represents numeric interpretation features of a grammar. These features
    are defined for productions that only derive words composed of the characters
    [0-9], '.', and '-'. The returned feature value corresponds to the maximum
    floating-point number interpretation of the derived words of a production.
    name : A unique identifier name for this feature. Should not contain whitespace.
           e.g., 'num(<integer>)'
    rule : The production rule.
    '''

    def __init__(self, name: str, rule: str, /,
                 friendly_name: Optional[str] = None) -> None:
        super().__init__(name, rule, rule, friendly_name=friendly_name)

    def name_rep(self) -> str:
        return f"num({self.key})"

    def get_feature_value(self, derivation_tree: DerivationTree) -> float:
        '''Determines the maximum float of this feature in the derivation tree.'''
        (node, children) = derivation_tree
        value = float('nan')
        if node == self.rule:
            try:
                value = float(tree_to_string(derivation_tree))
            except ValueError:
                # The derived string is not a valid number; keep NaN
                pass
        # Return maximum value encountered in tree, ignoring all NaNs
        tree_values = [value] + [self.get_feature_value(c) for c in children]
        if all(isnan(tree_values)):
            return value
        else:
            return nanmax(tree_values)
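The numeric interpretation can likewise be traced by hand. The sketch below reimplements the idea with the standard library only, on a hand-built derivation tree for the term -3.14; the helpers to_string and max_numeric and the tree itself are assumptions made for illustration, not the classes defined above.

```python
import math
from typing import Tuple

Tree = Tuple[str, list]  # (symbol, children); terminals have no children


def digit(d: str) -> Tree:
    return ("<digit>", [(d, [])])


# Hand-built derivation tree for the term "-3.14" (illustrative)
tree: Tree = ("<term>", [
    ("-", []),
    ("<value>", [
        ("<integer>", [digit("3")]),
        (".", []),
        ("<digits>", [digit("1"), ("<digits>", [digit("4")])]),
    ]),
])


def to_string(tree: Tree) -> str:
    """Concatenate all terminal symbols of the tree."""
    symbol, children = tree
    if children:
        return "".join(to_string(child) for child in children)
    is_nonterminal = symbol.startswith("<") and symbol.endswith(">")
    return "" if is_nonterminal else symbol


def max_numeric(tree: Tree, rule: str) -> float:
    """Largest float derived from any `rule` node in the tree, or NaN if there is none."""
    symbol, children = tree
    values = []
    if symbol == rule:
        try:
            values.append(float(to_string(tree)))
        except ValueError:
            pass  # not a number; ignore this node
    for child in children:
        value = max_numeric(child, rule)
        if not math.isnan(value):
            values.append(value)
    return max(values) if values else float("nan")
```

For this tree, the <term> node is interpreted as -3.14 and the <value> node as 3.14; <digits> occurs twice (as 14 and as 4), so the maximum 14.0 is reported, and <lead-digit> does not occur at all, which yields NaN.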
Extracting Feature Sets from Grammars¶
from typing import List

def extract_existence_features(grammar: Grammar) -> List[ExistenceFeature]:
    '''
    Extracts all existence features from the grammar and returns them as a list.
    grammar : The input grammar.
    '''
    features = []
    for rule in grammar:
        # add the rule
        features.append(ExistenceFeature(f"exists({rule})", rule, rule))
        # add all alternatives
        for count, expansion in enumerate(grammar[rule]):
            name = f"exists({rule}@{count})"
            friendly_name = f"{rule} == {repr(expansion)}"
            feature = ExistenceFeature(name, rule, expansion,
                                       friendly_name=friendly_name)
            features.append(feature)
    return features
import re
from collections import defaultdict
from typing import List
from fuzzingbook.Grammars import reachable_nonterminals

# Regex for non-terminal symbols in expansions
RE_NONTERMINAL = re.compile(r'(<[^<> ]*>)')

def extract_numeric_features(grammar: Grammar) -> List[NumericInterpretation]:
    '''
    Extracts all numeric interpretation features from the grammar and returns them as a list.
    grammar : The input grammar.
    '''
    features = []
    # Mapping from non-terminals to derivable terminal chars
    derivable_chars = defaultdict(set)
    for rule in grammar:
        for expansion in grammar[rule]:
            # Remove non-terminal symbols and whitespace from expansion
            terminals = re.sub(RE_NONTERMINAL, '', expansion).replace(' ', '')
            # Add each terminal char to the set of derivable chars
            for c in terminals:
                derivable_chars[rule].add(c)
    # Repeatedly update the mapping until convergence
    while True:
        updated = False
        for rule in grammar:
            for r in reachable_nonterminals(grammar, rule):
                before = len(derivable_chars[rule])
                derivable_chars[rule].update(derivable_chars[r])
                after = len(derivable_chars[rule])
                # Set of derivable chars was updated
                if after > before:
                    updated = True
        if not updated:
            break
    numeric_chars = set(['0','1','2','3','4','5','6','7','8','9','.','-'])
    for key in derivable_chars:
        # Check if derivable chars contain only numeric chars
        if len(derivable_chars[key] - numeric_chars) == 0:
            name = f"num({key})"
            friendly_name = f"{key}"
            features.append(NumericInterpretation(name, key,
                                                  friendly_name=friendly_name))
    return features
def extract_all_features(grammar: Grammar) -> List[Feature]:
    return (extract_existence_features(grammar)
            + extract_numeric_features(grammar))
Here are all the features from our calculator grammar:
extract_all_features(CALC_GRAMMAR)
[exists(<start>), exists(<start> == <function>(<term>)), exists(<function>), exists(<function> == sqrt), exists(<function> == tan), exists(<function> == cos), exists(<function> == sin), exists(<term>), exists(<term> == -<value>), exists(<term> == <value>), exists(<value>), exists(<value> == <integer>.<digits>), exists(<value> == <integer>), exists(<integer>), exists(<integer> == <lead-digit><digits>), exists(<integer> == <digit>), exists(<digits>), exists(<digits> == <digit><digits>), exists(<digits> == <digit>), exists(<lead-digit>), exists(<lead-digit> == 1), exists(<lead-digit> == 2), exists(<lead-digit> == 3), exists(<lead-digit> == 4), exists(<lead-digit> == 5), exists(<lead-digit> == 6), exists(<lead-digit> == 7), exists(<lead-digit> == 8), exists(<lead-digit> == 9), exists(<digit>), exists(<digit> == 0), exists(<digit> == 1), exists(<digit> == 2), exists(<digit> == 3), exists(<digit> == 4), exists(<digit> == 5), exists(<digit> == 6), exists(<digit> == 7), exists(<digit> == 8), exists(<digit> == 9), num(<term>), num(<value>), num(<lead-digit>), num(<digit>), num(<integer>), num(<digits>)]
The friendly representation is a bit more concise and more readable:
[f.friendly_name() for f in extract_all_features(CALC_GRAMMAR)]
['exists(<start>)', "<start> == '<function>(<term>)'", 'exists(<function>)', "<function> == 'sqrt'", "<function> == 'tan'", "<function> == 'cos'", "<function> == 'sin'", 'exists(<term>)', "<term> == '-<value>'", "<term> == '<value>'", 'exists(<value>)', "<value> == '<integer>.<digits>'", "<value> == '<integer>'", 'exists(<integer>)', "<integer> == '<lead-digit><digits>'", "<integer> == '<digit>'", 'exists(<digits>)', "<digits> == '<digit><digits>'", "<digits> == '<digit>'", 'exists(<lead-digit>)', "<lead-digit> == '1'", "<lead-digit> == '2'", "<lead-digit> == '3'", "<lead-digit> == '4'", "<lead-digit> == '5'", "<lead-digit> == '6'", "<lead-digit> == '7'", "<lead-digit> == '8'", "<lead-digit> == '9'", 'exists(<digit>)', "<digit> == '0'", "<digit> == '1'", "<digit> == '2'", "<digit> == '3'", "<digit> == '4'", "<digit> == '5'", "<digit> == '6'", "<digit> == '7'", "<digit> == '8'", "<digit> == '9'", '<term>', '<value>', '<lead-digit>', '<digit>', '<integer>', '<digits>']
Extracting Feature Values from Inputs¶
This is a rather slow implementation. For grammars with many syntactic features, the feature collection can be optimized.
import pandas
from typing import List
from fuzzingbook.Parser import EarleyParser

def collect_features(sample_list: List[str],
                     grammar: Grammar) -> pandas.DataFrame:
    data = []
    # parse grammar and extract features
    all_features = extract_all_features(grammar)
    # iterate over all samples
    for sample in sample_list:
        parsed_features = {}
        parsed_features["sample"] = sample
        # initialize dictionary
        for feature in all_features:
            parsed_features[feature.name] = 0
        # Obtain the parse tree for each input file
        earley = EarleyParser(grammar)
        for tree in earley.parse(sample):
            for feature in all_features:
                parsed_features[feature.name] = feature.get_feature_value(tree)
        data.append(parsed_features)
    return pandas.DataFrame.from_records(data)
sample_list = ["sqrt(-900)", "sin(24)", "cos(-3.14)"]
collect_features(sample_list, CALC_GRAMMAR)
| | sample | exists(<start>) | exists(<start>@0) | exists(<function>) | exists(<function>@0) | exists(<function>@1) | exists(<function>@2) | exists(<function>@3) | exists(<term>) | exists(<term>@0) | ... | exists(<digit>@6) | exists(<digit>@7) | exists(<digit>@8) | exists(<digit>@9) | num(<term>) | num(<value>) | num(<lead-digit>) | num(<digit>) | num(<integer>) | num(<digits>) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sqrt(-900) | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | ... | 0 | 0 | 0 | 0 | -900.00 | 900.00 | 9.0 | 0.0 | 900.0 | 0.0 |
| 1 | sin(24) | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 24.00 | 24.00 | 2.0 | 4.0 | 24.0 | 4.0 |
| 2 | cos(-3.14) | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | ... | 0 | 0 | 0 | 0 | -3.14 | 3.14 | NaN | 4.0 | 3.0 | 14.0 |

3 rows × 47 columns
from typing import Dict, List
from fuzzingbook.Parser import EarleyParser

# TODO: handle multiple trees
def compute_feature_values(sample: str, grammar: Grammar, features: List[Feature]) -> Dict[str, float]:
    '''
    Extracts all feature values from an input.
    sample : The input.
    grammar : The input grammar.
    features : The list of input features extracted from the grammar.
    '''
    # Use the grammar and features passed in, rather than the global CALC_GRAMMAR
    earley = EarleyParser(grammar)
    feature_values: Dict[str, float] = {}
    for tree in earley.parse(sample):
        for feature in features:
            feature_values[feature.name_rep()] = feature.get_feature_value(tree)
    return feature_values
all_features = extract_all_features(CALC_GRAMMAR)
for sample in sample_list:
    print(f"Features of {sample}:")
    features = compute_feature_values(sample, CALC_GRAMMAR, all_features)
    for feature, value in features.items():
        print(f" {feature}: {value}")
Features of sqrt(-900):
 exists(<start>): 1
 exists(<start> == <function>(<term>)): 1
 exists(<function>): 1
 exists(<function> == sqrt): 1
 exists(<function> == tan): 0
 exists(<function> == cos): 0
 exists(<function> == sin): 0
 exists(<term>): 1
 exists(<term> == -<value>): 1
 exists(<term> == <value>): 0
 exists(<value>): 1
 exists(<value> == <integer>.<digits>): 0
 exists(<value> == <integer>): 1
 exists(<integer>): 1
 exists(<integer> == <lead-digit><digits>): 1
 exists(<integer> == <digit>): 0
 exists(<digits>): 1
 exists(<digits> == <digit><digits>): 1
 exists(<digits> == <digit>): 1
 exists(<lead-digit>): 1
 exists(<lead-digit> == 1): 0
 exists(<lead-digit> == 2): 0
 exists(<lead-digit> == 3): 0
 exists(<lead-digit> == 4): 0
 exists(<lead-digit> == 5): 0
 exists(<lead-digit> == 6): 0
 exists(<lead-digit> == 7): 0
 exists(<lead-digit> == 8): 0
 exists(<lead-digit> == 9): 1
 exists(<digit>): 1
 exists(<digit> == 0): 1
 exists(<digit> == 1): 0
 exists(<digit> == 2): 0
 exists(<digit> == 3): 0
 exists(<digit> == 4): 0
 exists(<digit> == 5): 0
 exists(<digit> == 6): 0
 exists(<digit> == 7): 0
 exists(<digit> == 8): 0
 exists(<digit> == 9): 0
 num(<term>): -900.0
 num(<value>): 900.0
 num(<lead-digit>): 9.0
 num(<digit>): 0.0
 num(<integer>): 900.0
 num(<digits>): 0.0
Features of sin(24):
 exists(<start>): 1
 exists(<start> == <function>(<term>)): 1
 exists(<function>): 1
 exists(<function> == sqrt): 0
 exists(<function> == tan): 0
 exists(<function> == cos): 0
 exists(<function> == sin): 1
 exists(<term>): 1
 exists(<term> == -<value>): 0
 exists(<term> == <value>): 1
 exists(<value>): 1
 exists(<value> == <integer>.<digits>): 0
 exists(<value> == <integer>): 1
 exists(<integer>): 1
 exists(<integer> == <lead-digit><digits>): 1
 exists(<integer> == <digit>): 0
 exists(<digits>): 1
 exists(<digits> == <digit><digits>): 0
 exists(<digits> == <digit>): 1
 exists(<lead-digit>): 1
 exists(<lead-digit> == 1): 0
 exists(<lead-digit> == 2): 1
 exists(<lead-digit> == 3): 0
 exists(<lead-digit> == 4): 0
 exists(<lead-digit> == 5): 0
 exists(<lead-digit> == 6): 0
 exists(<lead-digit> == 7): 0
 exists(<lead-digit> == 8): 0
 exists(<lead-digit> == 9): 0
 exists(<digit>): 1
 exists(<digit> == 0): 0
 exists(<digit> == 1): 0
 exists(<digit> == 2): 0
 exists(<digit> == 3): 0
 exists(<digit> == 4): 1
 exists(<digit> == 5): 0
 exists(<digit> == 6): 0
 exists(<digit> == 7): 0
 exists(<digit> == 8): 0
 exists(<digit> == 9): 0
 num(<term>): 24.0
 num(<value>): 24.0
 num(<lead-digit>): 2.0
 num(<digit>): 4.0
 num(<integer>): 24.0
 num(<digits>): 4.0
Features of cos(-3.14):
 exists(<start>): 1
 exists(<start> == <function>(<term>)): 1
 exists(<function>): 1
 exists(<function> == sqrt): 0
 exists(<function> == tan): 0
 exists(<function> == cos): 1
 exists(<function> == sin): 0
 exists(<term>): 1
 exists(<term> == -<value>): 1
 exists(<term> == <value>): 0
 exists(<value>): 1
 exists(<value> == <integer>.<digits>): 1
 exists(<value> == <integer>): 0
 exists(<integer>): 1
 exists(<integer> == <lead-digit><digits>): 0
 exists(<integer> == <digit>): 1
 exists(<digits>): 1
 exists(<digits> == <digit><digits>): 1
 exists(<digits> == <digit>): 1
 exists(<lead-digit>): 0
 exists(<lead-digit> == 1): 0
 exists(<lead-digit> == 2): 0
 exists(<lead-digit> == 3): 0
 exists(<lead-digit> == 4): 0
 exists(<lead-digit> == 5): 0
 exists(<lead-digit> == 6): 0
 exists(<lead-digit> == 7): 0
 exists(<lead-digit> == 8): 0
 exists(<lead-digit> == 9): 0
 exists(<digit>): 1
 exists(<digit> == 0): 0
 exists(<digit> == 1): 1
 exists(<digit> == 2): 0
 exists(<digit> == 3): 1
 exists(<digit> == 4): 1
 exists(<digit> == 5): 0
 exists(<digit> == 6): 0
 exists(<digit> == 7): 0
 exists(<digit> == 8): 0
 exists(<digit> == 9): 0
 num(<term>): -3.14
 num(<value>): 3.14
 num(<lead-digit>): nan
 num(<digit>): 4.0
 num(<integer>): 3.0
 num(<digits>): 14.0
Step 2: Train Classification Model¶
Now that we have all the input features and the test outcomes, we can start training a machine learner from these. Although other machine learning models may achieve higher accuracy, we use decision trees as machine learning models because they are easy for humans to interpret. This is crucial as it will be these very same humans that have to fix the code.
Before we start with our actual implementation, let us first illustrate how training such a classifier works, again using our calculator as an example.
Decision Trees¶
We will use scikit-learn as the machine learning library.
The DecisionTreeClassifier can then learn the syntactical input features that are responsible for the bug-triggering behavior of our calculator.
First, we transform the individual input features (represented as Python dictionaries) into a NumPy array.
For this example, we use the following four features (function-sqrt, function-cos, function-sin, number) to describe an input.
(Please note that this is an extremely reduced example; this is not the complete list of features that should be extracted from the CALC_GRAMMAR grammar.)
The features function-sqrt, function-cos, function-sin state whether the function sqrt, cos, or sin was used.
A 1 is given if the sample contains the respective function, otherwise the feature contains a 0.
For each <function>(x), the number feature describes which value was used for x. For instance, the first input sqrt(-900) corresponds to 'function-sqrt': 1 and 'number': -900.
# Features for each input, one dict per input
features = [
{'function-sqrt': 1, 'function-cos': 0, 'function-sin': 0, 'number': -900}, # sqrt(-900)
{'function-sqrt': 0, 'function-cos': 1, 'function-sin': 0, 'number': 300}, # cos(300)
{'function-sqrt': 1, 'function-cos': 0, 'function-sin': 0, 'number': -1}, # sqrt(-1)
{'function-sqrt': 0, 'function-cos': 1, 'function-sin': 0, 'number': -10}, # cos(-10)
{'function-sqrt': 0, 'function-cos': 0, 'function-sin': 1, 'number': 36}, # sin(36)
{'function-sqrt': 0, 'function-cos': 0, 'function-sin': 1, 'number': -58}, # sin(-58)
{'function-sqrt': 1, 'function-cos': 0, 'function-sin': 0, 'number': 27}, # sqrt(27)
]
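Dictionaries like these need not be written by hand. As a minimal sketch, the hypothetical helper toy_features() below (not part of the Alhazen implementation) derives the four toy features from an input string with a regular expression. It assumes that every input has the form <function>(<number>).

```python
import re

# The three functions used in this reduced example
FUNCTIONS = ["sqrt", "cos", "sin"]

def toy_features(inp: str) -> dict:
    """Hypothetical helper: map an input like 'sqrt(-900)' to the four toy features."""
    match = re.fullmatch(r"(\w+)\((-?\d+(?:\.\d+)?)\)", inp)
    if match is None:
        raise ValueError(f"Cannot parse {inp!r}")
    name, number = match.group(1), float(match.group(2))
    # One 0/1 feature per function, plus the numeric argument
    feats = {f"function-{f}": int(f == name) for f in FUNCTIONS}
    feats["number"] = number
    return feats

print(toy_features("sqrt(-900)"))
```

The real feature extraction works on derivation trees of the grammar rather than on regular expressions; the sketch only illustrates the shape of the resulting data.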
We define a list of labels (or oracles) that state whether the respective input resulted in a bug or not. We use the OracleResult class to keep the labels tidy.
# Labels for each input
oracle = [
OracleResult.BUG,
OracleResult.NO_BUG,
OracleResult.BUG,
OracleResult.NO_BUG,
OracleResult.NO_BUG,
OracleResult.NO_BUG,
OracleResult.NO_BUG
]
# Transform to numpy array
vec = DictVectorizer()
X = vec.fit_transform(features).toarray()
Using the feature array and labels, we can now train a decision tree classifier as follows:
# Fix the random state to produce a deterministic result (for illustration purposes only)
clf = DecisionTreeClassifier(random_state=10)
# scikit-learn requires an array of strings
oracle_clean = [str(c) for c in oracle]
clf = clf.fit(X, oracle_clean)
Let's have a look at the learned decision tree:
def show_decision_tree(clf, feature_names):
dot_data = sklearn.tree.export_graphviz(clf, out_file=None,
feature_names=feature_names,
class_names=["BUG", "NO_BUG"],
filled=True, rounded=True)
return graphviz.Source(dot_data)
show_decision_tree(clf, vec.get_feature_names_out())
Here is a much reduced textual variant, still retaining the essential features:
def friendly_decision_tree(clf, feature_names,
class_names = ['NO_BUG', 'BUG'],
indent=0):
def _tree(index, indent):
s = ""
feature = clf.tree_.feature[index]
feature_name = feature_names[feature]
threshold = clf.tree_.threshold[index]
value = clf.tree_.value[index]
class_ = int(value[0][0])
class_name = class_names[class_]
left = clf.tree_.children_left[index]
right = clf.tree_.children_right[index]
if left == right:
# Leaf node
s += " " * indent + class_name + "\n"
else:
if math.isclose(threshold, 0.5):
s += " " * indent + f"if {feature_name}:\n"
s += _tree(right, indent + 2)
s += " " * indent + f"else:\n"
s += _tree(left, indent + 2)
else:
s += " " * indent + f"if {feature_name} <= {threshold:.4f}:\n"
s += _tree(left, indent + 2)
s += " " * indent + f"else:\n"
s += _tree(right, indent + 2)
return s
ROOT_INDEX = 0
return _tree(ROOT_INDEX, indent)
print(friendly_decision_tree(clf, vec.get_feature_names_out()))
if function-sqrt:
  if number <= 13.0000:
    BUG
  else:
    NO_BUG
else:
  NO_BUG
We can see that our initial hypothesis is that the feature function-sqrt must be greater than 0.5 (i.e., present) and the feature number must be less than or equal to 13 in order to produce a bug. This decision rule is not yet perfect, so we need to refine our decision tree!
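To see why this rule needs refinement, we can write it down as a plain Python predicate and test it. The sketch below checks the rule against the seven training inputs and against one additional input, sqrt(4). It assumes that the calculator fails only for sqrt() of negative numbers, so sqrt(4) should not count as a bug.

```python
# The seven training inputs as (function, number, label) triples
samples = [
    ("sqrt", -900, "BUG"), ("cos", 300, "NO_BUG"), ("sqrt", -1, "BUG"),
    ("cos", -10, "NO_BUG"), ("sin", 36, "NO_BUG"), ("sin", -58, "NO_BUG"),
    ("sqrt", 27, "NO_BUG"),
]

def predict(function: str, number: float) -> str:
    """The learned rule: sqrt with an argument of at most 13 is a bug."""
    if function == "sqrt":
        return "BUG" if number <= 13.0 else "NO_BUG"
    return "NO_BUG"

# The rule is consistent with all training data ...
training_errors = [(f, n) for f, n, label in samples if predict(f, n) != label]
print(training_errors)

# ... but it overgeneralizes: sqrt(4) is predicted to fail
# (assumption: the actual calculator handles sqrt(4) without error)
print(predict("sqrt", 4))
```

The threshold of 13 is merely the midpoint between the training values -1 and 27; inputs between these two values are what Alhazen has to generate next.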
Learning a Decision Tree¶
For Alhazen's second step (Train Classification Model), we write a function train_tree(data)
that trains a decision tree on a given data frame:
def train_tree(data: pandas.core.frame.DataFrame) -> sklearn.tree._classes.DecisionTreeClassifier
The function requires the following parameter:
- data: a pandas data frame containing the parsed and extracted features and the outcome of the executed input sample (oracle).
For instance, the data frame may look similar to this:
feature_1 | feature_2 | ... | oracle |
---|---|---|---|
1 | 0 | ... | 'BUG' |
0 | 1 | ... | 'NO_BUG' |
Note: Each row of data['oracle'] is of type OracleResult.
However, scikit-learn requires an array of strings, so we have to convert the labels before learning the decision tree.
OUTPUT: the function returns a learned decision tree of type sklearn.tree._classes.DecisionTreeClassifier.
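The label conversion can be sketched in isolation. ToyOracleResult below is a hypothetical stand-in for OracleResult; we assume that str() of a label yields the plain name, such as "BUG". The sketch also drops undefined outcomes, since only valid outcomes are used for learning.

```python
from enum import Enum

class ToyOracleResult(Enum):
    """Hypothetical stand-in for OracleResult (assumption: str() yields the plain name)."""
    BUG = "BUG"
    NO_BUG = "NO_BUG"
    UNDEF = "UNDEF"

    def __str__(self) -> str:
        return self.value

outcomes = [ToyOracleResult.BUG, ToyOracleResult.NO_BUG, ToyOracleResult.UNDEF]

# Drop undefined outcomes, then convert the remaining labels to strings
labels = [str(o) for o in outcomes if o is not ToyOracleResult.UNDEF]
print(labels)
```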
def train_tree(data):
sample_bug_count = len(data[(data["oracle"].astype(str) == "BUG")])
assert sample_bug_count > 0, "No bug samples found"
sample_count = len(data)
clf = DecisionTreeClassifier(min_samples_leaf=1,
min_samples_split=2, # minimal value
max_features=None,
max_depth=5, # max depth of the decision tree
class_weight={str("BUG"): (1.0/sample_bug_count),
str("NO_BUG"):
(1.0/(sample_count - sample_bug_count))})
clf = clf.fit(data.drop('oracle', axis=1), data['oracle'].astype(str))
# MARTIN: This is optional, but is a nice extension that results in nicer decision trees
# clf = treetools.remove_infeasible(clf, features)
return clf
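The class_weight argument compensates for the imbalance between failing and passing samples: each class is weighted by the inverse of its sample count, so that both classes carry the same total weight. The following sketch reproduces this arithmetic in plain Python; balanced_class_weights() is a hypothetical helper, not part of the implementation.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Hypothetical helper: weight each class by 1 / (number of its samples)."""
    counts = Counter(labels)
    return {label: 1.0 / count for label, count in counts.items()}

labels = ["BUG", "NO_BUG", "BUG", "NO_BUG", "NO_BUG", "NO_BUG", "NO_BUG"]
weights = balanced_class_weights(labels)

# Sum up the weight each class contributes over all samples
total_weight = Counter()
for label in labels:
    total_weight[label] += weights[label]

print(weights)
print(math.isclose(total_weight["BUG"], total_weight["NO_BUG"]))
```

With two failing and five passing samples, a failing sample weighs 1/2 and a passing sample 1/5, so the few failures are not outvoted by the many passing runs.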
Step 3: Extract Feature Requirements¶
In this section, we will extract the learned features from the decision tree. Again, let us first test this manually on our calculator example.
# Features for each input, one dict per input
features = [
{'function-sqrt': 1, 'function-cos': 0, 'function-sin': 0, 'number': -900},
{'function-sqrt': 0, 'function-cos': 1, 'function-sin': 0, 'number': 300},
{'function-sqrt': 1, 'function-cos': 0, 'function-sin': 0, 'number': -1},
{'function-sqrt': 0, 'function-cos': 1, 'function-sin': 0, 'number': -10},
{'function-sqrt': 0, 'function-cos': 0, 'function-sin': 1, 'number': 36},
{'function-sqrt': 0, 'function-cos': 0, 'function-sin': 1, 'number': -58},
{'function-sqrt': 1, 'function-cos': 0, 'function-sin': 0, 'number': 27},
]
# Labels for each input
oracle = [
"BUG",
"NO_BUG",
"BUG",
"NO_BUG",
"NO_BUG",
"NO_BUG",
"NO_BUG"
]
# We can use the sklearn DictVectorizer to transform the features to numpy array:
# Notice: Use the correct labeling of the feature_names
# vec = DictVectorizer()
# X_vec = vec.fit_transform(features).toarray()
# feature_names = vec.get_feature_names_out()
# We can also use a pandas DataFrame and pass it directly to the decision tree learner
feature_names = ['function-sqrt', 'function-cos', 'function-sin', 'number']
X_data = pandas.DataFrame.from_records(features)
# Fix the random state to produce a deterministic result (for illustration purposes only)
clf = DecisionTreeClassifier(random_state=10)
# Train with DictVectorizer
# **Note:** By default, the sklearn `DictVectorizer` sorts the feature names internally, which changes the feature indices. If you want to use the `DictVectorizer`, make sure to obtain the feature names only via `vec.get_feature_names_out()`.
# We recommend that you use the `pandas` data frame, since this is also the format used in the feedback loop.
# clf = clf.fit(X_vec, oracle)
# Train with Pandas Dataframe
clf = clf.fit(X_data, oracle)
dot_data = sklearn.tree.export_graphviz(clf, out_file=None,
feature_names=feature_names,
class_names=["BUG", "NO_BUG"],
filled=True, rounded=True)
graph = graphviz.Source(dot_data)
graph
print(friendly_decision_tree(clf, feature_names, class_names = ['NO_BUG', 'BUG']))
if function-sqrt:
  if number <= 13.0000:
    BUG
  else:
    NO_BUG
else:
  NO_BUG
Step 4: Generating New Samples¶
The next step is to generate new samples. For this purpose, we negate the requirements on a path, which allows us to refine or refute the hypothesis embodied in the decision tree.
Negating Requirements¶
First we will determine some boundaries to obtain better path negations.
x = pandas.DataFrame.from_records(features)
bounds = pandas.DataFrame([{'feature': c, 'min': x[c].min(), 'max': x[c].max()}
for c in feature_names],
columns=['feature', 'min', 'max']).set_index(['feature']).transpose()
We can use the function path.get(i).get_neg_ext(bounds) to obtain a negation for a single requirement on a path (indexed with i).
Let's verify if we can negate a whole path.
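# Extract all root-to-leaf paths from the learned decision tree
all_paths = tree_to_paths(clf, feature_names)
for count, path in enumerate(all_paths):
    # Negate the first requirement, then append the negations of all others
    negated_string_path = path.get(0).get_neg_ext(bounds)[0]
    for box_ in range(1, len(path)):
        negated_string_path += " " + str(path.get(box_).get_neg_ext(bounds)[0])
    print(f"Path {count}: {negated_string_path}, is_bug: {path.is_bug()}")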
Path 0: function-sqrt > 0.5, is_bug: False
Path 1: function-sqrt <= 0.5 number > 13.0, is_bug: True
Path 2: function-sqrt <= 0.5 number <= 13.0, is_bug: False
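The negation of a single threshold requirement simply flips its comparison operator. Here is a minimal sketch; ToyRequirement is a hypothetical class that only models the operators <= and >, and it ignores the bounds that get_neg_ext() takes into account.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyRequirement:
    """Hypothetical, simplified requirement: feature <= threshold or feature > threshold."""
    feature: str
    op: str          # either "<=" or ">"
    threshold: float

    def negate(self) -> "ToyRequirement":
        # Flip the comparison; the threshold stays the same
        flipped = ">" if self.op == "<=" else "<="
        return ToyRequirement(self.feature, flipped, self.threshold)

    def __str__(self) -> str:
        return f"{self.feature} {self.op} {self.threshold}"

# The path leading to the BUG leaf of the tree above
bug_path = [ToyRequirement("function-sqrt", ">", 0.5),
            ToyRequirement("number", "<=", 13.0)]

negated_path = " ".join(str(r.negate()) for r in bug_path)
print(negated_path)
```

Negating every requirement of the bug path yields the same requirements as shown for Path 1 above.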
Systematically Negating Paths¶
We will use the decision tree to extract new input specifications that refine or refute our hypothesis (see Section 4.1 "Extracting Prediction Paths" in \cite{Kampmann2020}). These input specifications are passed to the input generator, which tries to generate new inputs that fulfill them.
def extracting_prediction_paths(clf, feature_names, data):
# determine the bounds
bounds = pandas.DataFrame([{'feature': c, 'min': data[c].min(), 'max': data[c].max()}
for c in feature_names],
columns=['feature', 'min', 'max']).set_index(['feature']).transpose()
# go through tree leaf by leaf
all_reqs = set()
for path in tree_to_paths(clf, feature_names):
# generate conditions
for i in range(0, len(path)+1):
reqs_list = []
bins = format(i, "#0{}b".format(len(path)+2))[2:]
for p, b in zip(range(0, len(bins)), bins):
r = path.get(p)
if '1' == b:
reqs_list.append(r.get_neg_ext(bounds))
else:
reqs_list.append([r.get_str_ext()])
for reqs in all_combinations(reqs_list):
all_reqs.add(", ".join(sorted(reqs)))
return all_reqs
def all_combinations(reqs_lists):
result = [[]]
for reqs in reqs_lists:
t = []
for r in reqs:
for i in result:
t.append(i+[r])
result = t
return result
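Two details of the code above are easy to miss. First, all_combinations() computes the Cartesian product of the given requirement lists. Second, the bit strings decide which requirements of a path get negated ('1') and which are kept ('0'); as written, the loop only enumerates the masks for i = 0 ... len(path), not all 2^len(path) subsets. The sketch below illustrates both points; bit_masks() is a hypothetical helper that isolates the format() expression.

```python
from itertools import product

def all_combinations(reqs_lists):
    """Same logic as above: pick one requirement from each list."""
    result = [[]]
    for reqs in reqs_lists:
        t = []
        for r in reqs:
            for i in result:
                t.append(i + [r])
        result = t
    return result

def bit_masks(path_length):
    """Hypothetical helper: the masks enumerated for a path of the given length."""
    return [format(i, "#0{}b".format(path_length + 2))[2:]
            for i in range(path_length + 1)]

choices = [["a <= 1"], ["b > 2", "b < 0"]]
combos = all_combinations(choices)
print(combos)

# Same elements as itertools.product(), possibly in a different order
print(sorted(map(tuple, combos)) == sorted(product(*choices)))

print(bit_masks(2))
```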
new_prediction_paths = extracting_prediction_paths(clf, feature_names, data=x)
for path in new_prediction_paths:
print(path)
function-sqrt > 0.5, number > 13.0
function-sqrt > 0.5, number <= 13.0
function-sqrt <= 0.5
function-sqrt > 0.5
function-sqrt <= 0.5, number <= 13.0
function-sqrt <= 0.5, number > 13.0
Input Specification Parser¶
The input specifications extracted so far are plain strings. To interpret them, we parse them using the following grammar.
SPEC_GRAMMAR: Grammar = {
"<start>":
["<req_list>"],
"<req_list>":
["<req>", "<req>, <req_list>"],
"<req>":
["<feature> <quant> <num>"],
"<feature>": ["exists(<string>)",
"num(<string>)",
# currently not used
"char(<string>)",
"length(<string>)"],
"<quant>":
["<", ">", "<=", ">="],
"<num>": ["-<value>", "<value>"],
"<value>":
["<integer>.<integer>",
"<integer>"],
"<integer>":
["<digit><integer>", "<digit>"],
"<digit>":
["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
'<string>': ['<letters>'],
'<letters>': ['<letter><letters>', '<letter>'],
'<letter>': list(string.ascii_letters + string.digits + string.punctuation)
}
assert is_valid_grammar(SPEC_GRAMMAR)
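To illustrate the structure this grammar describes, here is a simplified sketch that splits a specification string into (feature, quantifier, number) triples using plain string operations. The function parse_spec() is hypothetical and is no substitute for parsing with the grammar; it assumes that requirements are separated by ", " and that quantifier and number are the last two space-separated tokens of each requirement.

```python
QUANTIFIERS = ("<", ">", "<=", ">=")

def parse_spec(spec: str):
    """Hypothetical helper: split 'f1 > 0.5, f2 <= 13.0' into triples."""
    triples = []
    for req in spec.split(", "):
        # Split from the right, as feature names may themselves contain spaces
        feature, quant, num = req.rsplit(" ", 2)
        if quant not in QUANTIFIERS:
            raise ValueError(f"Unknown quantifier {quant!r} in {req!r}")
        triples.append((feature, quant, float(num)))
    return triples

print(parse_spec("function-sqrt > 0.5, number <= 13.0"))
```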
Retrieving New Input Specifications¶
The following classes represent requirements for the test cases to be generated.
class SpecRequirement:
'''
This class represents a requirement for a new input sample that should be generated.
It contains the feature that should be fulfilled (feature), a quantifier
("<", ">", "<=", ">="), and a value. For instance, exists(feature) >= 0.5 states that
the syntactical existence feature should be used to produce a new input.
feature : The associated feature class
quant : The quantifier
value : The value of the requirement. Note that for existence features this value
is always between 0 and 1.
'''
def __init__(self, feature: Feature, quantificator, value):
self.feature: Feature = feature
self.quant = quantificator
self.value = value
def __str__(self):
return f"Requirement({self.feature.name} {self.quant} {self.value})"
def __repr__(self):
return f"Requirement({self.feature.name}, {self.quant}, {self.value})"
def friendly(self):
def value(x):
try:
return float(x)
except Exception:
return None
if isinstance(self.feature, ExistenceFeature):
if value(self.value) > 0:
return f"{self.feature.friendly_name()}"
elif value(self.value) < 0:
return f"not {self.feature.friendly_name()}"
return f"{self.feature.friendly_name()} {self.quant} {self.value}"
class InputSpecification:
'''
This class represents a complete input specification of a new input. An input specification
consists of one or more requirements.
requirements : A list of all requirements that must be fulfilled.
'''
def __init__(self, requirements: List[SpecRequirement]):
self.requirements: List[SpecRequirement] = requirements
def __str__(self):
s = ", ".join(str(r) for r in self.requirements)
return f"InputSpecification({s})"
def friendly(self):
return " and ".join(r.friendly() for r in self.requirements)
def __repr__(self):
return self.__str__()
def get_all_subtrees(derivation_tree, non_terminal):
'''
Recursively collects and returns a list of all subtrees that start with the given non_terminal.
'''
subtrees = []
(node, children) = derivation_tree
if node == non_terminal:
subtrees.append(derivation_tree)
for child in children:
subtrees = subtrees + get_all_subtrees(child, non_terminal)
return subtrees
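As a usage sketch, we can apply get_all_subtrees() to a small hand-written derivation tree. Derivation trees are pairs (symbol, children), where terminal symbols have an empty list of children. The helper leaves() is a hypothetical, simplified stand-in for tree_to_string(), and the tree itself is made up for illustration.

```python
def get_all_subtrees(derivation_tree, non_terminal):
    """Same logic as above: collect all subtrees rooted at non_terminal."""
    subtrees = []
    node, children = derivation_tree
    if node == non_terminal:
        subtrees.append(derivation_tree)
    for child in children:
        subtrees += get_all_subtrees(child, non_terminal)
    return subtrees

def leaves(tree):
    """Hypothetical stand-in for tree_to_string(): concatenate all leaf symbols."""
    node, children = tree
    if not children:
        return node
    return "".join(leaves(child) for child in children)

# A made-up derivation tree for the requirement "number <= 13.0"
req_tree = ("<req>", [
    ("<feature>", [("number", [])]),
    (" ", []),
    ("<quant>", [("<=", [])]),
    (" ", []),
    ("<num>", [("13.0", [])]),
])
spec_tree = ("<req_list>", [req_tree])

parts = [leaves(get_all_subtrees(spec_tree, symbol)[0])
         for symbol in ("<feature>", "<quant>", "<num>")]
print(parts)
```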
def create_new_input_specification(derivation_tree, all_features) -> InputSpecification:
'''
This function creates a new input specification for a parsed decision tree path.
The input derivation_tree corresponds to an already negated path in the decision tree.
'''
requirement_list = []
for req in get_all_subtrees(derivation_tree, '<req>'):
feature_name = tree_to_string(get_all_subtrees(req, '<feature>')[0])
quant = tree_to_string(get_all_subtrees(req, '<quant>')[0])
value = tree_to_string(get_all_subtrees(req, '<num>')[0])
feature_class = None
for f in all_features:
if f.name == feature_name:
feature_class = f
requirement_list.append(SpecRequirement(feature_class, quant, value))
return InputSpecification(requirement_list)
def get_all_input_specifications(dec_tree,
all_features: List[Feature],
feature_names: List[str],
data) -> List[InputSpecification]:
'''
Returns the complete list of new input specifications that were extracted from a learned decision tree.
INPUT:
- dec_tree : The learned decision tree.
- all_features : A list of all features
- feature_names : The list of the feature names (feature.name)
- data : The data that was used to learn the decision tree
OUTPUT:
- Returns a list of InputSpecifications
'''
prediction_paths = extracting_prediction_paths(dec_tree, feature_names, data)
input_specifications = []
# parse all extracted paths
for r in prediction_paths:
earley = EarleyParser(SPEC_GRAMMAR)
try:
for tree in earley.parse(r):
input_specifications.append(create_new_input_specification(tree, all_features))
except SyntaxError:
# Catch Parsing Syntax Errors: num(<term>) in [-900, 0] will fail; Might fix later
# For now, inputs following that form will be ignored
pass
return input_specifications
We implement a grammar-based input generator that generates new input samples from a list of InputSpecification objects.
The input specifications are extracted from the decision tree boundaries in the previous Step 3: Extract Feature Requirements.
An InputSpecification consists of one or more predicates or requirements (e.g., <feature> >= value, or num(<term>) <= 13).
We generate a new input for each InputSpecification.
The new input is meant to fulfill all the given requirements of its InputSpecification.
For further details, please refer to Sections 4.4 and 4.5 of \cite{Kampmann2020} and the chapter Efficient Grammar Fuzzing in the Fuzzing Book.
We define a function generate_samples() with the following input parameters:
- grammar: the grammar used to produce new inputs (e.g., the calculator grammar CALC_GRAMMAR)
- new_input_specifications: a list of new input specifications (type List[InputSpecification])
- timeout: a maximum time budget. The function returns the generated inputs when the time budget is exceeded.
The function returns a list of new inputs that are specified by the given input specifications.
def best_trees(forest, spec):
samples = [tree_to_string(tree) for tree in forest]
fulfilled_fractions= []
for sample in samples:
gen_features = collect_features([sample], CALC_GRAMMAR)
# calculate percentage of fulfilled requirements (used to rank the sample)
fulfilled_count = 0
total_count = len(spec.requirements)
for req in spec.requirements:
# for now, interpret requirement(exists(<...>) <= number) as false and requirement(exists(<...>) > number) as true
if isinstance(req.feature, ExistenceFeature):
expected = 1.0 if req.quant == '>' or req.quant == '>=' else 0.0
actual = gen_features[req.feature.name][0]
if actual == expected:
fulfilled_count += 1
else:
pass
# print(f'{req.feature} expected: {expected}, actual:{actual}')
elif isinstance(req.feature, NumericInterpretation):
expected_value = float(req.value)
actual_value = gen_features[req.feature.name][0]
fulfilled = False
if req.quant == '<':
fulfilled = actual_value < expected_value
elif req.quant == '<=':
fulfilled = actual_value <= expected_value
elif req.quant == '>':
fulfilled = actual_value > expected_value
elif req.quant == '>=':
fulfilled = actual_value >= expected_value
if fulfilled:
fulfilled_count += 1
else:
pass
# print(f'{req.feature} expected: {expected_value}, actual:{actual_value}')
fulfilled_fractions.append(fulfilled_count / total_count)
# print(f'Fraction of fulfilled requirements: {fulfilled_count / total_count}')
max_frac = max(fulfilled_fractions)
best_chosen = []
if max_frac == 1.0:
return True, forest[fulfilled_fractions.index(1.0)]
for i, t in enumerate(forest):
if fulfilled_fractions[i] == max_frac:
best_chosen.append(t)
return False, best_chosen
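The ranking criterion used by best_trees() is the fraction of requirements that a sample fulfills. Stripped of grammar and feature classes, it can be sketched as follows. The function fulfilled_fraction() is a hypothetical simplification that compares all features numerically; for 0/1 existence features with a threshold of 0.5, this coincides with the special handling above.

```python
import operator

# Map each quantifier to its comparison function
OPS = {"<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}

def fulfilled_fraction(features, requirements):
    """Hypothetical helper: fraction of (name, quantifier, value) requirements met."""
    hits = sum(1 for name, quant, value in requirements
               if OPS[quant](features[name], value))
    return hits / len(requirements)

requirements = [("function-sqrt", ">", 0.5), ("number", "<=", 13.0)]

print(fulfilled_fraction({"function-sqrt": 1, "number": 27}, requirements))
print(fulfilled_fraction({"function-sqrt": 1, "number": -1}, requirements))
```

A sample with a fraction of 1.0 satisfies the specification completely; otherwise, the samples with the highest fraction are kept as starting points for further mutation.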
# well, not perfect and probably not very robust. but it works :)
def generate_samples_advanced(grammar: Grammar,
new_input_specifications: List[InputSpecification],
timeout: int) -> List[str]:
# if there are no input specifications: generate some random samples
if len(new_input_specifications) == 0:
fuzzer = GrammarFuzzer(grammar)
samples = [fuzzer.fuzz() for _ in range(100)]
return samples
final_samples = []
each_spec_timeout = timeout / len(new_input_specifications)
rhs_nonterminals = grammar.keys()# list(chain(*[nonterminals(expansion) for expansion in grammar[rule]]))
fuzzer = GrammarFuzzer(grammar)
for spec in new_input_specifications:
done = False
starttime = time.time()
best_chosen = [fuzzer.fuzz_tree() for _ in range(100)]
done, best_chosen = best_trees(best_chosen, spec)
if done:
final_samples.append(tree_to_string(best_chosen))
while not done and time.time() - starttime < each_spec_timeout:
# split in prefix, postfix and try to reach targets
for tree in best_chosen:
prefix_len = random.randint(1, 3)
curr = tree
valid = True
for i in range(prefix_len):
nt, children = curr
poss_desc_idxs = []
for c_idx, c in enumerate(children):
s, _ = c
possible_descend = s in rhs_nonterminals
if possible_descend:
poss_desc_idxs.append(c_idx)
if len(poss_desc_idxs) < 1:
valid = False
break
desc = random.randint(0, len(poss_desc_idxs) - 1)
curr = children[poss_desc_idxs[desc]]
if valid:
nt, _ = curr
for req in spec.requirements:
if isinstance(req.feature, NumericInterpretation) and nt == req.feature.key:
# hacky: generate a derivation tree for this numeric interpretation
hacky_grammar = copy.deepcopy(grammar)
hacky_grammar["<start>"] = [nt]
parser = EarleyParser(hacky_grammar)
try:
test = parser.parse(req.value)
x = list(test)[0]
_, s = x
# print(str(s[0]))
# replace curr in tree with this new tree
curr = s[0]
except SyntaxError:
pass
done, best_chosen = best_trees(best_chosen, spec)
if done:
final_samples.append(tree_to_string(best_chosen))
if not done:
final_samples.extend([tree_to_string(t) for t in best_chosen])
return final_samples
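The overall control flow of the generator is a search under a time budget: try candidates until one satisfies the specification or the budget is used up. The following sketch shows this pattern in isolation. It uses plain rejection sampling over random integers, which is a stand-in technique and not the tree mutation performed above; sample_until() and its parameters are hypothetical.

```python
import random
import time

def sample_until(predicate, generate, budget_seconds, rng):
    """Hypothetical helper: draw candidates until one satisfies predicate
    or the time budget is exceeded. Returns (done, candidate)."""
    deadline = time.monotonic() + budget_seconds
    last = None
    while time.monotonic() < deadline:
        candidate = generate(rng)
        if predicate(candidate):
            return True, candidate
        last = candidate
    # Budget exceeded: report failure along with the last candidate tried
    return False, last

rng = random.Random(0)
done, value = sample_until(lambda x: x <= 13,
                           lambda r: r.randint(-100, 100),
                           budget_seconds=1.0, rng=rng)
print(done, value <= 13)
```

Dividing the total timeout by the number of specifications, as done above, gives each specification the same share of the budget.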
As a simpler alternative, the following generator ignores the input specifications and produces random inputs from the grammar:
def generate_samples_random(grammar, new_input_specifications, num):
f = GrammarFuzzer(grammar ,max_nonterminals=50, log=False)
data = []
for _ in range(num):
new_input = f.fuzz()
data.append(new_input)
return data
generate_samples = generate_samples_advanced
Step 5: Executing New Inputs¶
We are almost done! All that is left is putting the above pieces together and creating a loop around them.
The Alhazen Class¶
We implement a class Alhazen that serves as the main entry point for our approach.
class Alhazen:
def __init__(self,
runner: Any,
grammar: Grammar,
initial_inputs: List[str], /,
verbose: bool = False,
max_iterations: int = 10,
generator_timeout: int = 10):
self._initial_inputs = initial_inputs
self._runner = runner
self._grammar = grammar
self._verbose = verbose
self._max_iter = max_iterations
self._previous_samples = None
self._data = None
self._trees = []
self._generator_timeout = generator_timeout
self._setup()
class Alhazen(Alhazen):
def _setup(self):
self._previous_samples = self._initial_inputs
self._all_features = extract_all_features(self._grammar)
self._feature_names = [f.name for f in self._all_features]
if self._verbose:
print("Features:", ", ".join(f.friendly_name()
for f in self._all_features))
class Alhazen(Alhazen):
def _add_new_data(self, exec_data, feature_data):
joined_data = exec_data.join(feature_data.drop(['sample'], axis=1))
# Only add valid data
new_data = joined_data.drop(joined_data[joined_data.oracle.astype(str) == "UNDEF"].index)
if 0 != len(new_data):
if self._data is None:
self._data = new_data
else:
self._data = pandas.concat([self._data, new_data], sort=False)
class Alhazen(Alhazen):
def execute_samples(self, sample_list = None):
if sample_list is None:
sample_list = self._initial_inputs
data = []
for sample in sample_list:
result = self._runner(sample)
data.append({"oracle": result })
return pandas.DataFrame.from_records(data)
class Alhazen(Alhazen):
def run(self):
for iteration in range(1, self._max_iter + 1):
if self._verbose:
print(f"\nIteration #{iteration}")
self._iterate(self._previous_samples)
class Alhazen(Alhazen):
def all_trees(self, /, prune: bool = True):
trees = self._trees
if prune:
trees = [remove_unequal_decisions(tree) for tree in self._trees]
return trees
def last_tree(self, /, prune: bool = True):
return self.all_trees(prune=prune)[-1]
class Alhazen(Alhazen):
def _iterate(self, sample_list):
# Run samples, obtain test outcomes
exec_data = self.execute_samples(sample_list)
# Step 1: Extract features from the new samples
feature_data = collect_features(sample_list, self._grammar)
# Combine the new data with the already existing data
self._add_new_data(exec_data, feature_data)
# display(self._data)
# Step 2: Train the Decision Tree Classifier
dec_tree = train_tree(self._data)
self._trees.append(dec_tree)
if self._verbose:
print(" Decision Tree:")
all_features = extract_all_features(self._grammar)
all_feature_names = [f.friendly_name() for f in all_features]
print(friendly_decision_tree(dec_tree, all_feature_names, indent=4))
# Step 3: Extract new requirements from the tree
new_input_specifications = get_all_input_specifications(dec_tree,
self._all_features,
self._feature_names,
self._data.drop(['oracle'], axis=1))
if self._verbose:
print(f" New input specifications:")
for spec in new_input_specifications:
print(f" {spec.friendly()}")
# Step 4: Generate new inputs according to the new input specifications
new_samples = generate_samples(self._grammar,
new_input_specifications,
self._generator_timeout)
if self._verbose:
print(f" New samples:")
print(f" {', '.join(new_samples)}")
self._previous_samples = new_samples
class Alhazen(Alhazen):
def all_feature_names(self, friendly: bool = True) -> List[str]:
if friendly:
all_feature_names = [f.friendly_name() for f in self._all_features]
else:
all_feature_names = [f.name for f in self._all_features]
return all_feature_names
class Alhazen(Alhazen):
    def show_decision_tree(self, tree = None, friendly: bool = True):
        # Pass on `friendly` so that it selects the kind of feature names shown
        return show_decision_tree(tree or self.last_tree(),
                                  self.all_feature_names(friendly=friendly))
class Alhazen(Alhazen):
def friendly_decision_tree(self, tree = None):
return friendly_decision_tree(tree or self.last_tree(),
self.all_feature_names())
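The feedback loop implemented by _iterate() can be illustrated with a deliberately tiny model. In the sketch below, the decision tree learner is replaced by a bisection-style stand-in that learns a single numeric threshold, and the input generator simply produces the threshold value itself as the next test input. All names are hypothetical, and the toy runner assumes that inputs fail exactly when they are negative.

```python
def toy_runner(x: float) -> str:
    """Assumed toy oracle: negative inputs trigger the bug."""
    return "BUG" if x < 0 else "NO_BUG"

def learn_threshold(data: dict) -> float:
    """Stand-in for the learner: midpoint between the largest failing
    and the smallest passing input seen so far."""
    largest_bug = max(x for x, label in data.items() if label == "BUG")
    smallest_ok = min(x for x, label in data.items() if label == "NO_BUG")
    return (largest_bug + smallest_ok) / 2

def refine(initial_inputs, iterations: int):
    # Execute the initial inputs to obtain the first observations
    data = {x: toy_runner(x) for x in initial_inputs}
    thresholds = []
    for _ in range(iterations):
        threshold = learn_threshold(data)      # Step 2: learn a hypothesis
        thresholds.append(threshold)
        new_input = threshold                  # Steps 3 and 4: probe the boundary
        data[new_input] = toy_runner(new_input)  # Step 5: execute the new input
    return thresholds

thresholds = refine([-16, 4], iterations=5)
print(thresholds)
```

Each iteration tests the current hypothesis right at its decision boundary, so the learned threshold moves toward the actual failure condition, much as the decision trees in Alhazen get refined from one iteration to the next.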
A Sample Run¶
We can finally run Alhazen!
Set the number of refinement iterations and the timeout for the input generator. The execution time of Alhazen mainly depends on the number of iterations.
MAX_ITERATIONS = 20
GENERATOR_TIMEOUT = 10 # timeout in seconds
We initialize Alhazen with the previously used initial_sample_list
:
initial_sample_list
['sqrt(-16)', 'sqrt(4)']
And here we go! When initialized with verbose=True
, Alhazen prints its progress during execution, issuing for each iteration
- the last decision tree
- the new input specification resulting from the tree
- the new samples satisfying the input specification.
alhazen = Alhazen(sample_runner, CALC_GRAMMAR, initial_sample_list,
verbose=True,
max_iterations=MAX_ITERATIONS,
generator_timeout=GENERATOR_TIMEOUT)
alhazen.run()
Features: exists(<start>), <start> == '<function>(<term>)', exists(<function>), <function> == 'sqrt', <function> == 'tan', <function> == 'cos', <function> == 'sin', exists(<term>), <term> == '-<value>', <term> == '<value>', exists(<value>), <value> == '<integer>.<digits>', <value> == '<integer>', exists(<integer>), <integer> == '<lead-digit><digits>', <integer> == '<digit>', exists(<digits>), <digits> == '<digit><digits>', <digits> == '<digit>', exists(<lead-digit>), <lead-digit> == '1', <lead-digit> == '2', <lead-digit> == '3', <lead-digit> == '4', <lead-digit> == '5', <lead-digit> == '6', <lead-digit> == '7', <lead-digit> == '8', <lead-digit> == '9', exists(<digit>), <digit> == '0', <digit> == '1', <digit> == '2', <digit> == '3', <digit> == '4', <digit> == '5', <digit> == '6', <digit> == '7', <digit> == '8', <digit> == '9', <term>, <value>, <lead-digit>, <digit>, <integer>, <digits> Iteration #1 Decision Tree: if <digits> <= nan: NO_BUG else: BUG New input specifications: New samples: tan(85), cos(-163), sin(-27.60), sqrt(-5), tan(2046.977), sin(-3.11), sqrt(-19.982), sqrt(-8.67), cos(-275), cos(3863), sqrt(-0.01), cos(-8.62380), cos(8), tan(2), cos(31.9), sin(12.9), tan(6.91), cos(-43.9), cos(-9), sin(-2.0), cos(-9.793), sin(761.0), cos(-312), sin(-216.2), sqrt(-1.6), cos(3), sqrt(-488), cos(6.65), sin(46.8), tan(-798), cos(-6.82), sin(19.50), cos(-8), sqrt(-7.3), sin(3.84), tan(-57.276), tan(99), tan(611), sin(9), cos(-1), sin(-5.0), tan(-48.5), tan(482235.4), sin(3368), sin(19.3), sqrt(5), tan(2), tan(-7), sqrt(-6.649), cos(9.9730), tan(-2), sqrt(3966), cos(55), sin(-3), sqrt(752.7), cos(8.0013), cos(-36.29), cos(-611), cos(5), sqrt(-1.014), tan(-5), tan(-38.6), tan(-70), sin(-9), cos(-48), cos(1893), cos(99.83), tan(-7.80), tan(327.39), cos(-35708), cos(-987.85), tan(-15.3), tan(-8785596.0), tan(-4), sqrt(9.283), tan(3.95), tan(-6), sqrt(-20.0), cos(-66.319), tan(0), sin(-4), sin(7.1), cos(-5), sin(-33), tan(-6.46058), sin(-9.6), sin(-5831.1), sqrt(49), 
tan(5.7), sqrt(-16.43), sin(1), tan(6.556), cos(-4), cos(-7962), tan(102), tan(291.7), sqrt(-64.4), tan(23942.0182), tan(-8.4348), cos(-9) Iteration #2 Decision Tree: if <lead-digit> <= 2.5000: if <function> == 'sqrt': BUG else: NO_BUG else: NO_BUG New input specifications: <function> == 'sqrt' and <lead-digit> <= 2.5 <function> == 'sqrt' and <lead-digit> > 2.5 <function> == 'sqrt' and <lead-digit> <= 2.5 <function> == 'sqrt' and <lead-digit> > 2.5 <lead-digit> > 2.5 <lead-digit> <= 2.5
New samples: cos(24), sqrt(85.93), sqrt(-24), cos(6726.274), cos(-3424.3), sin(243) Iteration #3 Decision Tree: if <lead-digit> <= 2.5000: if <function> == 'sqrt': BUG else: if <function> == 'tan': NO_BUG else: NO_BUG else: NO_BUG New input specifications: <function> == 'sqrt' and <function> == 'tan' and <lead-digit> > 2.5 <function> == 'sqrt' and <function> == 'tan' and <lead-digit> <= 2.5 <function> == 'sqrt' and <lead-digit> <= 2.5 <function> == 'sqrt' and <function> == 'tan' and <lead-digit> <= 2.5 <function> == 'sqrt' and <lead-digit> <= 2.5 <function> == 'sqrt' and <lead-digit> > 2.5 <lead-digit> > 2.5 <function> == 'sqrt' and <function> == 'tan' and <lead-digit> > 2.5 <lead-digit> <= 2.5
New samples: tan(-97.8), tan(-23), sin(15.1), cos(13), sqrt(281), sqrt(-422.2), cos(-637), sin(-81), tan(17.3367) Iteration #4 Decision Tree: if <function> == 'sqrt': if <lead-digit> <= 2.5000: if <digit> == '1': NO_BUG else: BUG else: NO_BUG else: NO_BUG New input specifications: <function> == 'sqrt' and <lead-digit> > 2.5 <function> == 'sqrt' and <lead-digit> <= 2.5 <function> == 'sqrt' and <lead-digit> > 2.5 <digit> == '1' and <function> == 'sqrt' and <lead-digit> > 2.5 <function> == 'sqrt' <function> == 'sqrt' <digit> == '1' and <function> == 'sqrt' and <lead-digit> <= 2.5 <digit> == '1' and <function> == 'sqrt' and <lead-digit> > 2.5 <digit> == '1' and <function> == 'sqrt' and <lead-digit> <= 2.5
New samples: sqrt(4307), sqrt(208), cos(-989), sqrt(-8047662286.62), sqrt(7.9), sin(0), sqrt(294.7), sqrt(44.1), sqrt(11.7) Iteration #5 Decision Tree: if <lead-digit> <= 2.5000: if <function> == 'sqrt': if <term> == '-<value>': BUG else: NO_BUG else: NO_BUG else: NO_BUG New input specifications: <function> == 'sqrt' and <lead-digit> <= 2.5 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> <= 2.5 <function> == 'sqrt' and <lead-digit> <= 2.5 <function> == 'sqrt' and <lead-digit> > 2.5 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> <= 2.5 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> > 2.5 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> > 2.5 <lead-digit> > 2.5 <lead-digit> <= 2.5
New samples: cos(-142.8), sqrt(37.81), sin(20.9), tan(1818), sqrt(-214.433), sqrt(52.097), sqrt(8.1), sqrt(2.02), sqrt(68), tan(20), sqrt(8), sqrt(66.25), sqrt(4.87), sqrt(9), sqrt(1), sqrt(97.26), sin(12), sqrt(91), cos(224.07), sqrt(47.2843), sqrt(85.51), sqrt(63.568), sqrt(78.79), sin(21), sqrt(84), sqrt(966.64), cos(2110486864), cos(24), tan(22692.3), sqrt(0), sqrt(-2), tan(-268.8), sqrt(941.4), sin(1325), sqrt(-37), cos(136), tan(-136.2), cos(-145396), sqrt(-920), sqrt(-8.197), sqrt(3), sin(1959.68), sqrt(-6.35), sqrt(5.9), cos(-29), sqrt(-2.0), sqrt(6), sqrt(-61.61), sqrt(72.097906), sqrt(-6.98876), sqrt(-64.82749), tan(-21.3), tan(22), sqrt(-64009.27), tan(-192), sqrt(95.80), sqrt(8.48), sqrt(85.3514), sqrt(7), sqrt(-596), tan(-25), sin(98), sqrt(-187), sqrt(62.4), sqrt(-830), cos(-878.2), sin(-19.70) Iteration #6 Decision Tree: if <term> <= -15.6500: if <integer> <= 37.5000: if <function> == 'sqrt': BUG else: if <value> == '<integer>.<digits>': NO_BUG else: NO_BUG else: if <value> == '<integer>': NO_BUG else: NO_BUG else: if <lead-digit> == '2': NO_BUG else: NO_BUG
New input specifications: <value> == '<integer>' and <integer> <= 37.5 and <term> > -15.650000095367432 <function> == 'sqrt' and <value> == '<integer>.<digits>' and <integer> > 37.5 and <term> > -15.650000095367432 <value> == '<integer>' and <integer> <= 37.5 and <term> > -15.650000095367432 <value> == '<integer>' and <integer> > 37.5 and <term> > -15.650000095367432 <value> == '<integer>' and <integer> > 37.5 and <term> <= -15.650000095367432 <lead-digit> == '2' and <term> <= -15.650000095367432 <lead-digit> == '2' and <term> <= -15.650000095367432 <function> == 'sqrt' and <integer> > 37.5 and <term> > -15.650000095367432 <value> == '<integer>' and <integer> <= 37.5 and <term> <= -15.650000095367432 <function> == 'sqrt' and <value> == '<integer>.<digits>' and <integer> <= 37.5 and <term> <= -15.650000095367432 <function> == 'sqrt' and <value> == '<integer>.<digits>' and <integer> > 37.5 and <term> <= -15.650000095367432 <function> == 'sqrt' and <value> == '<integer>.<digits>' and <integer> > 37.5 and <term> <= -15.650000095367432 <function> == 'sqrt' and <value> == '<integer>.<digits>' and <integer> <= 37.5 and <term> > -15.650000095367432 <lead-digit> == '2' and <term> > -15.650000095367432 <function> == 'sqrt' and <integer> <= 37.5 and <term> > -15.650000095367432 <function> == 'sqrt' and <value> == '<integer>.<digits>' and <integer> <= 37.5 and <term> <= -15.650000095367432 <function> == 'sqrt' and <integer> <= 37.5 and <term> <= -15.650000095367432 <lead-digit> == '2' and <term> > -15.650000095367432 <function> == 'sqrt' and <integer> > 37.5 and <term> <= -15.650000095367432 <function> == 'sqrt' and <value> == '<integer>.<digits>' and <integer> > 37.5 and <term> > -15.650000095367432 <value> == '<integer>' and <integer> <= 37.5 and <term> <= -15.650000095367432 <value> == '<integer>' and <integer> > 37.5 and <term> > -15.650000095367432 <value> == '<integer>' and <integer> > 37.5 and <term> <= -15.650000095367432 <function> == 'sqrt' and <value> == 
'<integer>.<digits>' and <integer> <= 37.5 and <term> > -15.650000095367432
New samples: tan(4), sin(201464.99), cos(31.11), tan(96235169), cos(-656), sqrt(-93.8), sin(-292311), sqrt(94209), tan(-27.17), tan(-181.653), tan(-31), cos(-48.77), cos(2.8), sin(3.2), cos(-2680432.44), sin(2.708), sin(-34), cos(1.257), cos(8.1), tan(-2639.9), cos(4.475), cos(-49117.007), tan(-0.68), sin(7.20), sin(4.80910), sin(-7.313), cos(-8.0), tan(-2.3), cos(2.6), cos(30.30), cos(-69.4), cos(8.4), sin(9.9), sin(36.8), sin(-17), tan(-404.0), cos(-92.6409), sin(-8763), cos(-90.64), tan(-8), cos(10.3), sqrt(8.309), tan(-35), sqrt(-21), sin(29.35), sqrt(-41.033), cos(64), sqrt(-6), cos(-79), cos(-1), tan(1), cos(2), sqrt(-837), tan(-3), sin(-894), tan(-4), cos(-43), cos(7), sin(3), cos(6), sqrt(-95), tan(9), cos(-23.0), cos(2), cos(-10), sin(6), sin(-627344826), cos(-55), sqrt(-9), sqrt(-3), sqrt(-7), tan(-6175), cos(-9), tan(-471458), sqrt(-52), cos(-6), sin(-7), sqrt(-90), cos(0), sin(35), cos(8), tan(-1), sqrt(-7), tan(-46059), sin(90.04), sin(-78186.9), cos(8.4) Iteration #7 Decision Tree: if <function> == 'sqrt': if <lead-digit> <= 4.5000: if <value> <= 42.5665: if <term> <= -11.5000: BUG else: NO_BUG else: NO_BUG else: NO_BUG else: NO_BUG
New input specifications: <function> == 'sqrt' and <lead-digit> > 4.5 and <value> > 42.56649971008301 <function> == 'sqrt' and <lead-digit> > 4.5 and <term> > -11.5 and <value> > 42.56649971008301 <function> == 'sqrt' and <lead-digit> <= 4.5 and <term> <= -11.5 and <value> > 42.56649971008301 <function> == 'sqrt' and <lead-digit> > 4.5 and <term> <= -11.5 and <value> > 42.56649971008301 <function> == 'sqrt' and <lead-digit> > 4.5 and <term> > -11.5 and <value> <= 42.56649971008301 <function> == 'sqrt' and <lead-digit> <= 4.5 and <value> > 42.56649971008301 <function> == 'sqrt' <function> == 'sqrt' and <lead-digit> <= 4.5 and <term> > -11.5 and <value> > 42.56649971008301 <function> == 'sqrt' and <lead-digit> > 4.5 and <value> <= 42.56649971008301 <function> == 'sqrt' <function> == 'sqrt' and <lead-digit> > 4.5 <function> == 'sqrt' and <lead-digit> > 4.5 and <term> <= -11.5 and <value> <= 42.56649971008301 <function> == 'sqrt' and <lead-digit> <= 4.5 and <term> <= -11.5 and <value> <= 42.56649971008301 <function> == 'sqrt' and <lead-digit> <= 4.5 <function> == 'sqrt' and <lead-digit> <= 4.5 and <value> <= 42.56649971008301 <function> == 'sqrt' and <lead-digit> <= 4.5 and <term> > -11.5 and <value> <= 42.56649971008301 <function> == 'sqrt' and <lead-digit> > 4.5
New samples: sqrt(54), sqrt(85524), sqrt(-350162), sqrt(-95), sqrt(9), sqrt(-5), sqrt(689.2), sqrt(586), sqrt(22.95), sqrt(6.7), sqrt(8.349248), sqrt(6.89), sqrt(6), sqrt(3.03), sqrt(-1.333), sqrt(621676), sqrt(5.198), sqrt(39.80), sqrt(-9.7), sqrt(61), sqrt(1.44), sqrt(4924), sqrt(4.4), sqrt(3521), sqrt(51), sqrt(-8.82), sqrt(6.2), sqrt(-9.4), sqrt(8.02), sqrt(-20), sqrt(908), sqrt(17.9), sqrt(-88.8), sqrt(-5), sqrt(-84.806), sqrt(-9.1), sqrt(-25.63), sqrt(15.1), sqrt(-8), sqrt(-6), sqrt(-37.2), sqrt(8), sqrt(3), sqrt(-8.8), sqrt(-3), sqrt(6), sqrt(-31), tan(1.9), sin(-77.78), sqrt(-30.40), sqrt(-62.0), tan(-22), sqrt(-2982), sqrt(-2399), tan(-25), sqrt(38.986), sqrt(20), sqrt(28), sqrt(40), sqrt(7.34), sqrt(-9.1), sin(-11), sqrt(2), cos(26.44), sqrt(3), sin(32.8), sqrt(4.31), sqrt(-2), sqrt(190.15), sqrt(0), sqrt(-1.683), sqrt(4.8), sqrt(-4.4), sqrt(4), sqrt(-8), sin(13.857), sqrt(4.7), sqrt(1), sqrt(7.9), sqrt(332.3), cos(14), sqrt(-789) Iteration #8 Decision Tree: if <term> <= -15.6500: if <term> <= -42.0165: if <digit> == '8': NO_BUG else: NO_BUG else: if <function> == 'sqrt': BUG else: NO_BUG else: NO_BUG New input specifications: <digit> == '8' and <term> <= -42.01650047302246 <term> > -15.650000095367432 <digit> == '8' and <term> <= -42.01650047302246 <term> <= -15.650000095367432 <digit> == '8' and <term> > -42.01650047302246 <digit> == '8' and <term> > -42.01650047302246
New samples: sqrt(-74), cos(-4.0), tan(-12813.7), cos(-36614), cos(6.7832), tan(30.0) Iteration #9 Decision Tree: if <term> <= -15.6500: if <term> <= -42.0165: if <function> == 'sqrt': NO_BUG else: NO_BUG else: if <function> == 'sqrt': BUG else: if <digit> == '5': NO_BUG else: NO_BUG else: NO_BUG New input specifications: <term> > -15.650000095367432 <term> <= -15.650000095367432 <function> == 'sqrt' and <term> <= -42.01650047302246 <function> == 'sqrt' and <term> > -42.01650047302246 <function> == 'sqrt' and <term> > -42.01650047302246 <function> == 'sqrt' and <term> <= -42.01650047302246
New samples: sqrt(929.73), sin(-629.102), sqrt(-77), sin(0.16), sqrt(599), cos(-60166146.5) Iteration #10 Decision Tree: if <term> <= -15.6500: if <term> <= -42.0165: NO_BUG else: if <function> == 'sqrt': BUG else: NO_BUG else: if <digit> == '8': NO_BUG else: NO_BUG New input specifications: <digit> == '8' and <term> <= -15.650000095367432 <digit> == '8' and <term> > -15.650000095367432 <term> > -42.01650047302246 <term> <= -42.01650047302246 <digit> == '8' and <term> <= -15.650000095367432 <digit> == '8' and <term> > -15.650000095367432
New samples: sqrt(-6853.5418911), sin(-8), cos(-5), sin(-91.10), tan(-537549), tan(3) Iteration #11 Decision Tree: if <term> <= -15.6500: if <term> <= -42.0165: if <function> == 'sqrt': NO_BUG else: NO_BUG else: if <function> == 'sqrt': BUG else: NO_BUG else: NO_BUG New input specifications: <term> > -15.650000095367432 <term> <= -15.650000095367432 <function> == 'sqrt' and <term> <= -42.01650047302246 <function> == 'sqrt' and <term> > -42.01650047302246 <function> == 'sqrt' and <term> > -42.01650047302246 <function> == 'sqrt' and <term> <= -42.01650047302246
New samples: tan(5.234), tan(-98593), sqrt(-5640572), cos(95), sqrt(-8), tan(-994.79) Iteration #12 Decision Tree: if <term> <= -15.6500: if <term> <= -42.0165: if <term> <= -4337503456.0000: NO_BUG else: NO_BUG else: if <function> == 'sqrt': BUG else: if <digit> == '5': NO_BUG else: NO_BUG else: NO_BUG New input specifications: <term> <= -4337503456.0 <term> > -15.650000095367432 <term> <= -15.650000095367432 <term> > -4337503456.0
New samples: sin(8.14), tan(-39.1), sqrt(-57.35), tan(-2), tan(-43.8), tan(-1), sqrt(-910), cos(5), sqrt(1), sqrt(-69), cos(-0), sqrt(-1.7), cos(37), sqrt(-17), cos(89253237), sqrt(6.346), sin(-10), tan(-97.1), sin(-2), sin(-4.1), cos(-73535.8), sqrt(-5), tan(-4299), sin(-0.3), tan(-4), cos(-578), tan(-34.8), tan(2584.07), cos(-4.1), sqrt(380), tan(3.3), sin(5), sin(-4.43), sqrt(94.76), sin(86912), sqrt(1), cos(0.6), sqrt(-4.7), cos(-1), cos(18.15), cos(-84.59), cos(83.6778), sqrt(839), sqrt(152.4), tan(-8), sqrt(2.7), cos(7931.22), sqrt(653.7943), tan(-3.5), sqrt(367574), tan(-42), sqrt(-21.881), sin(-65704), sqrt(-0), cos(854.6), sin(43), tan(6145.0), tan(-6201), cos(12756), sqrt(137), tan(2940.0), cos(-8), cos(73.946976), sqrt(3), sin(-3), cos(972247.7), cos(7), tan(73.96), tan(-0), sin(-1.403), sin(-21.6), sin(27.2), sqrt(49.4), tan(-171), sqrt(-31.146), tan(-4), sin(-227), sin(98.3), cos(-9), sin(-3), cos(-74), cos(-38.88), sqrt(-85), sqrt(-8), tan(-30), sin(-6171), sin(-6), tan(-0.44060), tan(-766.7), tan(42.3), tan(-5460.24), sin(4.6), tan(0.5), sqrt(6.3088371), cos(250271), tan(-30.3), sqrt(-0.6316622), sin(5.7899), sqrt(0), cos(-5), tan(-6), sin(-24.2), sqrt(-36164) Iteration #13 Decision Tree: if <term> <= -15.6500: if <value> <= 41.5165: if <function> == 'sqrt': BUG else: if <lead-digit> == '3': NO_BUG else: NO_BUG else: if <lead-digit> == '5': NO_BUG else: NO_BUG else: if <digit>: NO_BUG else: NO_BUG
New input specifications: <function> == 'sqrt' and <lead-digit> == '3' and <term> > -15.650000095367432 and <value> <= 41.51650047302246 <function> == 'sqrt' and <term> <= -15.650000095367432 and <value> <= 41.51650047302246 <function> == 'sqrt' and <lead-digit> == '3' and <term> <= -15.650000095367432 and <value> > 41.51650047302246 <function> == 'sqrt' and <lead-digit> == '3' and <term> > -15.650000095367432 and <value> > 41.51650047302246 <lead-digit> == '5' and <term> > -15.650000095367432 and <value> <= 41.51650047302246 <function> == 'sqrt' and <lead-digit> == '3' and <term> > -15.650000095367432 and <value> <= 41.51650047302246 <digit> > 0.5 and <term> <= -15.650000095367432 <digit> <= 0.5 and <term> > -15.650000095367432 <lead-digit> == '5' and <term> <= -15.650000095367432 and <value> > 41.51650047302246 <function> == 'sqrt' and <lead-digit> == '3' and <term> <= -15.650000095367432 and <value> > 41.51650047302246 <lead-digit> == '5' and <term> > -15.650000095367432 and <value> > 41.51650047302246 <lead-digit> == '5' and <term> > -15.650000095367432 and <value> <= 41.51650047302246 <lead-digit> == '5' and <term> <= -15.650000095367432 and <value> > 41.51650047302246 <lead-digit> == '5' and <term> <= -15.650000095367432 and <value> <= 41.51650047302246 <digit> > 0.5 and <term> > -15.650000095367432 <function> == 'sqrt' and <term> > -15.650000095367432 and <value> <= 41.51650047302246 <digit> <= 0.5 and <term> <= -15.650000095367432 <function> == 'sqrt' and <lead-digit> == '3' and <term> > -15.650000095367432 and <value> > 41.51650047302246 <function> == 'sqrt' and <term> > -15.650000095367432 and <value> > 41.51650047302246 <function> == 'sqrt' and <term> <= -15.650000095367432 and <value> > 41.51650047302246 <lead-digit> == '5' and <term> <= -15.650000095367432 and <value> <= 41.51650047302246 <function> == 'sqrt' and <lead-digit> == '3' and <term> <= -15.650000095367432 and <value> <= 41.51650047302246 <function> == 'sqrt' and <lead-digit> == '3' and 
<term> <= -15.650000095367432 and <value> <= 41.51650047302246 <lead-digit> == '5' and <term> > -15.650000095367432 and <value> > 41.51650047302246
New samples: tan(4.1), sqrt(-23.556), cos(-381226), cos(56412), sqrt(-1.061), sin(12), sin(54), cos(1), tan(-7.5), sin(-1.83), tan(-5), sin(-8), sqrt(-9), cos(6.94524), sqrt(-4.9), sqrt(-2.3), tan(-3.593250), cos(-5), tan(7), tan(37), sqrt(11), sqrt(1.0), sqrt(0.3), tan(0), sin(-0.870), sqrt(9.9), sqrt(6.4), tan(-6.5421), sqrt(-5), sin(8), sin(-8), sqrt(-2), sin(4.72), cos(-4), tan(-5.8), sqrt(8), sqrt(7.9), sqrt(-2), tan(-6), sin(-5), tan(1), cos(-3), sin(-4.76), sqrt(-3), sqrt(32.69), sin(-6), tan(7), tan(7.4), cos(5), sqrt(5), cos(5), sin(-0.2), sqrt(-8.71), tan(5.663), sqrt(4.61879), sin(-5.8), sqrt(4.7), tan(-9.3), tan(-6.0647), sqrt(-8.4), cos(-0.5), sin(522), sqrt(-3), sin(1.908), tan(1.6), sqrt(54), sin(30.6574), cos(-72239.8), tan(0), tan(-44), cos(-125743), sqrt(64.578), cos(2), sin(-57.2), cos(-19), sqrt(67), sqrt(-5.47), cos(-57), cos(-76.94), cos(-56.7428), sqrt(-75), sqrt(-73), cos(-3898.8), tan(-69606), sqrt(-222), tan(-70.676396), sin(-77), cos(-36), cos(-43.62), sin(-526.6), tan(-840.0), sin(-57), sin(-91.7), tan(-660), tan(-412), sqrt(-58), tan(369), sqrt(783.765), sqrt(-258450.2), cos(-26), sin(-5425.227947), sin(-35.3), tan(-500.1), sin(-17.71), sin(-38.1), sqrt(51.9) Iteration #14 Decision Tree: if <term> <= -15.6500: if <integer> <= 41.5000: if <function> == 'sqrt': BUG else: if <lead-digit> == '2': NO_BUG else: NO_BUG else: NO_BUG else: NO_BUG
New input specifications: <function> == 'sqrt' and <integer> <= 41.5 and <term> > -15.650000095367432 <function> == 'sqrt' and <lead-digit> == '2' and <integer> > 41.5 and <term> > -15.650000095367432 <function> == 'sqrt' and <lead-digit> == '2' and <integer> > 41.5 and <term> > -15.650000095367432 <term> > -15.650000095367432 <integer> > 41.5 and <term> <= -15.650000095367432 <function> == 'sqrt' and <lead-digit> == '2' and <integer> <= 41.5 and <term> <= -15.650000095367432 <term> <= -15.650000095367432 <function> == 'sqrt' and <lead-digit> == '2' and <integer> <= 41.5 and <term> > -15.650000095367432 <function> == 'sqrt' and <lead-digit> == '2' and <integer> > 41.5 and <term> <= -15.650000095367432 <integer> <= 41.5 and <term> <= -15.650000095367432 <function> == 'sqrt' and <lead-digit> == '2' and <integer> <= 41.5 and <term> > -15.650000095367432 <function> == 'sqrt' and <integer> <= 41.5 and <term> <= -15.650000095367432 <function> == 'sqrt' and <integer> > 41.5 and <term> > -15.650000095367432 <function> == 'sqrt' and <lead-digit> == '2' and <integer> <= 41.5 and <term> <= -15.650000095367432 <function> == 'sqrt' and <lead-digit> == '2' and <integer> > 41.5 and <term> <= -15.650000095367432 <integer> > 41.5 and <term> > -15.650000095367432 <function> == 'sqrt' and <integer> > 41.5 and <term> <= -15.650000095367432
New samples: sqrt(5), tan(150.877), sin(77), sin(916), sin(44.5), tan(92.0), tan(138.8), tan(-269.167), sin(82), tan(126), cos(43.55), tan(779), cos(5928.6), sin(88), cos(85.0), sin(-3.7261), cos(-504679), tan(-18.8), sin(-8203), cos(0), cos(-86.3), cos(-41), cos(22), sqrt(-26), sqrt(321.5290), tan(-26.9), tan(-285.9), cos(115), sqrt(-895) Iteration #15 Decision Tree: if <term> <= -15.6500: if <integer> <= 41.5000: if <function> == 'sqrt': if <digit> == '1': BUG else: BUG else: NO_BUG else: if <digit> == '1': NO_BUG else: NO_BUG else: if <lead-digit> == '3': NO_BUG else: NO_BUG
New input specifications: <digit> == '1' and <integer> > 41.5 and <term> <= -15.650000095367432 <digit> == '1' and <integer> <= 41.5 and <term> > -15.650000095367432 <function> == 'sqrt' and <integer> > 41.5 and <term> <= -15.650000095367432 <digit> == '1' and <function> == 'sqrt' and <integer> > 41.5 and <term> <= -15.650000095367432 <digit> == '1' and <integer> <= 41.5 and <term> <= -15.650000095367432 <lead-digit> == '3' and <term> > -15.650000095367432 <digit> == '1' and <function> == 'sqrt' and <integer> <= 41.5 and <term> <= -15.650000095367432 <digit> == '1' and <integer> > 41.5 and <term> > -15.650000095367432 <digit> == '1' and <integer> <= 41.5 and <term> <= -15.650000095367432 <lead-digit> == '3' and <term> <= -15.650000095367432 <lead-digit> == '3' and <term> > -15.650000095367432 <digit> == '1' and <integer> > 41.5 and <term> <= -15.650000095367432 <digit> == '1' and <function> == 'sqrt' and <integer> > 41.5 and <term> > -15.650000095367432 <digit> == '1' and <integer> <= 41.5 and <term> > -15.650000095367432 <digit> == '1' and <function> == 'sqrt' and <integer> > 41.5 and <term> <= -15.650000095367432 <function> == 'sqrt' and <integer> > 41.5 and <term> > -15.650000095367432 <digit> == '1' and <function> == 'sqrt' and <integer> <= 41.5 and <term> > -15.650000095367432 <digit> == '1' and <function> == 'sqrt' and <integer> <= 41.5 and <term> > -15.650000095367432 <lead-digit> == '3' and <term> <= -15.650000095367432 <digit> == '1' and <integer> > 41.5 and <term> > -15.650000095367432 <digit> == '1' and <function> == 'sqrt' and <integer> <= 41.5 and <term> <= -15.650000095367432 <function> == 'sqrt' and <integer> <= 41.5 and <term> > -15.650000095367432 <digit> == '1' and <function> == 'sqrt' and <integer> > 41.5 and <term> > -15.650000095367432 <function> == 'sqrt' and <integer> <= 41.5 and <term> <= -15.650000095367432
New samples: cos(-372), sqrt(9.707), sin(-8418.9), sqrt(-42.5), sin(1.7), tan(-7.41), sqrt(1.10), sin(-6.2193), sqrt(5.1), sqrt(-1.073), sqrt(-23.68), cos(-34.284), tan(1.66), cos(-991.2173), cos(2.164), sin(-17.73), sin(-4570425.11), sin(-80160.46), tan(-14.12), sqrt(-8.61), cos(1), sqrt(-1), cos(-94.814), sin(-6.3), sqrt(-1), sqrt(5.801), sqrt(-841907.14), sqrt(-1.80), sqrt(-52.1), cos(5629.7), cos(-40.3), cos(-75), sin(30.63), cos(-61), sqrt(995.6), tan(-5.127), sqrt(9630.918), sqrt(-534), tan(-59.18187), sqrt(-99), sin(-97.01), sin(-34132), tan(-5714.9), sin(6809), sqrt(-15), sqrt(1.86), sqrt(-38.79), cos(35891), sqrt(-22), tan(7.675), sqrt(74715), cos(-26.77) Iteration #16 Decision Tree: if <lead-digit> <= 4.5000: if <function> == 'sqrt': if <term> == '<value>': NO_BUG else: if <term> <= -41.7665: NO_BUG else: BUG else: NO_BUG else: NO_BUG
New input specifications: <function> == 'sqrt' and <term> == '<value>' and <lead-digit> <= 4.5 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> > 4.5 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> <= 4.5 and <term> > -41.76650047302246 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> > 4.5 and <term> > -41.76650047302246 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> <= 4.5 and <term> > -41.76650047302246 <function> == 'sqrt' and <lead-digit> <= 4.5 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> > 4.5 and <term> <= -41.76650047302246 <function> == 'sqrt' and <lead-digit> > 4.5 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> <= 4.5 and <term> <= -41.76650047302246 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> <= 4.5 and <term> <= -41.76650047302246 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> <= 4.5 <lead-digit> > 4.5 <function> == 'sqrt' and <lead-digit> <= 4.5 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> > 4.5 <lead-digit> <= 4.5
New samples: sqrt(35.989), sqrt(-950.1), sqrt(-27.8), sqrt(694.91), sqrt(-5.2), sqrt(-0), sqrt(-3.08), sqrt(785), sqrt(-8), sqrt(925.21469), sqrt(-4), sqrt(-23.3), sqrt(-0), sqrt(-4), sqrt(67), sqrt(411.4), tan(-40), sqrt(-907.5), cos(846), sqrt(-45), sqrt(1062), sqrt(33766.9), sqrt(-386.88), sqrt(-272.1), cos(6327.5), sqrt(126.88), sqrt(8698), cos(-15.4681) Iteration #17 Decision Tree: if <lead-digit> <= 4.5000: if <function> == 'sqrt': if <term> <= -11.1500: if <term> <= -41.7665: NO_BUG else: BUG else: NO_BUG else: NO_BUG else: NO_BUG
New input specifications: <function> == 'sqrt' and <lead-digit> > 4.5 and <term> > -41.76650047302246 <function> == 'sqrt' and <lead-digit> <= 4.5 <function> == 'sqrt' and <lead-digit> <= 4.5 and <term> <= -41.76650047302246 <function> == 'sqrt' and <lead-digit> > 4.5 and <term> <= -11.150000095367432 <function> == 'sqrt' and <lead-digit> <= 4.5 and <term> <= -11.150000095367432 <function> == 'sqrt' and <lead-digit> <= 4.5 and <term> > -41.76650047302246 <function> == 'sqrt' and <lead-digit> > 4.5 <function> == 'sqrt' and <lead-digit> <= 4.5 and <term> > -11.150000095367432 <lead-digit> > 4.5 <function> == 'sqrt' and <lead-digit> > 4.5 and <term> <= -41.76650047302246 <function> == 'sqrt' and <lead-digit> > 4.5 and <term> > -11.150000095367432 <function> == 'sqrt' and <lead-digit> <= 4.5 <lead-digit> <= 4.5
New samples: sqrt(70812), sin(115.655615855), sqrt(-86), sqrt(-25), tan(-45), tan(-167), tan(-100), sqrt(-60), sqrt(32227.792), tan(-43), sqrt(-68), sqrt(-7405), sqrt(-98508), sqrt(-26), sqrt(31), sqrt(-95), sqrt(-44.3), sqrt(300306.2), tan(87), sqrt(105.1), tan(63), sqrt(-86.80), sqrt(879.8), sqrt(33.042), cos(-37) Iteration #18 Decision Tree: if <lead-digit> <= 4.5000: if <function> == 'sqrt': if <term> == '<value>': NO_BUG else: if <term> <= -41.7665: NO_BUG else: BUG else: NO_BUG else: NO_BUG
New input specifications: <function> == 'sqrt' and <term> == '<value>' and <lead-digit> <= 4.5 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> > 4.5 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> <= 4.5 and <term> > -41.76650047302246 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> > 4.5 and <term> > -41.76650047302246 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> <= 4.5 and <term> > -41.76650047302246 <function> == 'sqrt' and <lead-digit> <= 4.5 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> > 4.5 and <term> <= -41.76650047302246 <function> == 'sqrt' and <lead-digit> > 4.5 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> <= 4.5 and <term> <= -41.76650047302246 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> <= 4.5 and <term> <= -41.76650047302246 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> <= 4.5 <lead-digit> > 4.5 <function> == 'sqrt' and <lead-digit> <= 4.5 <function> == 'sqrt' and <term> == '<value>' and <lead-digit> > 4.5 <lead-digit> <= 4.5
New samples: sqrt(38), sqrt(-80.2), sqrt(-12), sqrt(9534), sqrt(-54.776), sqrt(690), sqrt(-74.4), sqrt(-75), sqrt(-1), sqrt(7518401677), sqrt(-0), sqrt(-4), sqrt(87.02), sqrt(-2), sqrt(-82284597690.6), sqrt(-5), sqrt(21.95), tan(-463), sqrt(-783), cos(79), sqrt(-4677), sqrt(284), sqrt(-151), sqrt(11), sqrt(-27366), sqrt(-407), sqrt(-338.2028), sqrt(98.1), sqrt(15.40), sqrt(63), tan(16) Iteration #19 Decision Tree: if <lead-digit> <= 3.5000: if <function> == 'sqrt': if <term> == '-<value>': if <term> <= -94.8950: NO_BUG else: if <integer> <= 13.5000: BUG else: BUG else: NO_BUG else: if <term> <= -1530829.2500: NO_BUG else: NO_BUG else: if <lead-digit> <= 4.5000: if <integer> <= 41.5000: if <value> <= 41.0165: if <digits>: NO_BUG else: NO_BUG else: BUG else: NO_BUG else: if <digit> == '7': NO_BUG else: NO_BUG
New input specifications: <function> == 'sqrt' and <lead-digit> <= 3.5 and <term> <= -1530829.25 <digit> == '7' and <lead-digit> > 4.5 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> > 3.5 <digit> == '7' and <lead-digit> <= 4.5 <function> == 'sqrt' and <lead-digit> > 3.5 and <term> <= -1530829.25 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> <= 3.5 <function> == 'sqrt' and <term> == '-<value>' and <integer> > 13.5 and <lead-digit> <= 3.5 and <term> > -94.89500045776367 <function> == 'sqrt' and <term> == '-<value>' and <integer> <= 13.5 and <lead-digit> <= 3.5 and <term> > -94.89500045776367 <function> == 'sqrt' and <term> == '-<value>' and <integer> <= 13.5 and <lead-digit> > 3.5 and <term> > -94.89500045776367 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> <= 3.5 <function> == 'sqrt' and <term> == '-<value>' and <integer> <= 13.5 and <lead-digit> <= 3.5 and <term> <= -94.89500045776367 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> <= 3.5 and <term> <= -94.89500045776367 <function> == 'sqrt' and <term> == '-<value>' and <integer> > 13.5 and <lead-digit> > 3.5 and <term> > -94.89500045776367 <function> == 'sqrt' and <lead-digit> > 3.5 and <term> > -1530829.25 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> > 3.5 and <term> > -94.89500045776367 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> <= 3.5 and <term> <= -94.89500045776367 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> > 3.5 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> <= 3.5 and <term> > -94.89500045776367 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> > 3.5 and <term> <= -94.89500045776367 <function> == 'sqrt' and <term> == '-<value>' and <integer> > 13.5 and <lead-digit> <= 3.5 and <term> <= -94.89500045776367 <digit> == '7' and <lead-digit> <= 4.5 <digit> == '7' and <lead-digit> > 4.5 <function> == 'sqrt' and <lead-digit> <= 3.5 and <term> > -1530829.25
New samples: tan(354.30), tan(-23), sin(-14), sin(-2506.60683), tan(-28273.15), cos(21), cos(-355.7), sin(-285.3), cos(72.73), sqrt(-820), sqrt(-2803), cos(-745), tan(-86), sin(773), sin(-472960), sin(93.51), tan(435.2), tan(9221.5940), tan(65708), cos(-972.1), tan(-785), tan(47), sin(732), sin(42580.2), sin(-9700), sin(-7046), tan(-846), cos(8036128.2), tan(-90.9), sin(75), tan(50), cos(-556), sin(-71), tan(-77), sqrt(16553.8), sqrt(-2368.8), sqrt(-45.3), sin(-24.0), sqrt(163), sin(-36), sqrt(-77.258), tan(-36), tan(-19.09), sqrt(-40), sqrt(-4.478), sqrt(-30.815), sqrt(-16.25), tan(-13.8), sqrt(-54), sqrt(-42.8), sqrt(-6.97), sqrt(-3.7), sqrt(-76), sqrt(-5), sqrt(-6), sqrt(-1), sqrt(-0), sqrt(-24541.55), sqrt(-6), sqrt(-784), sqrt(-14.07), sqrt(-478), sqrt(-22), sqrt(-0), sqrt(-7.8), tan(-295), cos(-302.20019), sqrt(-9), sqrt(-3.8), sqrt(-18.080), sqrt(-2), sqrt(-80585.7), tan(-11.91), sqrt(-764), sqrt(152), sqrt(-299.9), sqrt(37), sqrt(-68), sin(9218.7), sqrt(-2), sqrt(-4), sqrt(-8.2), sqrt(7560.6), sin(-94.65), tan(-55), sqrt(-6.2), tan(-59), sqrt(479.30171938), sqrt(681), sqrt(-4), sqrt(-9.7), sqrt(904), sqrt(-0.43), sqrt(-2.1), sin(-84), sqrt(-0), sqrt(63.8), sqrt(-9.1), sqrt(960.47281930), sqrt(-8), sqrt(-7), sqrt(-1.8), cos(-80.57), sin(-68), sqrt(-3789), sqrt(577.82), sqrt(-12), sqrt(-4043.9), sqrt(-94.9), cos(-3332.2), sin(-289167.2871), sqrt(-19.32733), sqrt(-60395), tan(-3543), sqrt(-580.03), tan(-2370.280), sin(-208), tan(-2044586.876), sqrt(-23.0), cos(-276.26), cos(-394.802402), sqrt(-38.2), sqrt(237.5), cos(69.1), sin(-253) Iteration #20 Decision Tree: if <lead-digit> <= 4.5000: if <function> == 'sqrt': if <value> <= 41.7665: if <term> == '-<value>': BUG else: NO_BUG else: NO_BUG else: NO_BUG else: NO_BUG
New input specifications: <function> == 'sqrt' and <lead-digit> > 4.5 and <value> <= 41.76650047302246 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> > 4.5 and <value> <= 41.76650047302246 <function> == 'sqrt' and <lead-digit> <= 4.5 and <value> > 41.76650047302246 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> <= 4.5 and <value> > 41.76650047302246 <lead-digit> > 4.5 <lead-digit> <= 4.5 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> > 4.5 and <value> > 41.76650047302246 <function> == 'sqrt' and <lead-digit> <= 4.5 <function> == 'sqrt' and <lead-digit> > 4.5 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> <= 4.5 and <value> <= 41.76650047302246 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> > 4.5 and <value> <= 41.76650047302246 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> > 4.5 and <value> > 41.76650047302246 <function> == 'sqrt' and <lead-digit> > 4.5 and <value> > 41.76650047302246 <function> == 'sqrt' and <lead-digit> <= 4.5 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> <= 4.5 and <value> > 41.76650047302246 <function> == 'sqrt' and <term> == '-<value>' and <lead-digit> <= 4.5 and <value> <= 41.76650047302246 <function> == 'sqrt' and <lead-digit> <= 4.5 and <value> <= 41.76650047302246
New samples: sqrt(-8.93), sqrt(-30.17), sqrt(6.536), sqrt(12.22), sqrt(-80), sqrt(9), sqrt(-5), sqrt(5.3), sqrt(-3.4), sqrt(-81.62), sqrt(-1), sqrt(50822), sqrt(9.7), sqrt(-0), sqrt(57308.728), sqrt(-26), sqrt(-2.8), sqrt(-7), sqrt(8), sqrt(-59), sqrt(-4), sqrt(1.1), sqrt(36), sqrt(-551220.4), sqrt(2), sqrt(6.388), sqrt(0.5), sqrt(-6), sqrt(-12.2), sqrt(-965), sqrt(-37), sqrt(-0), sqrt(-33), sqrt(-3.7), sqrt(-2.92), sqrt(-957.82897), sqrt(-13580), sqrt(-49.6), sqrt(972), sin(176), sqrt(65796), cos(138988162.948), cos(99), sin(30.0), sqrt(-31), tan(38), sqrt(6), sqrt(436.3512), tan(11.1329), cos(12.1), sqrt(6), sqrt(5), sqrt(2), sqrt(7.207), sqrt(6), sqrt(1), sqrt(61.84263), sqrt(22), sqrt(9), sqrt(3), sqrt(97.05), sqrt(5.6), sqrt(2.209365), sqrt(3), sqrt(6.6), sqrt(-900.97008), sqrt(-97.650), sqrt(39), sqrt(48503.68), sin(-22.2), sqrt(-6), sqrt(-192), sin(-12.7), tan(-22), sqrt(-3085.50), sqrt(-5.0464), sqrt(-4), sqrt(-6), sqrt(-278750), sin(-17.8), sqrt(-13)
To access the final decision tree learned by Alhazen, use:
alhazen.last_tree()
DecisionTreeClassifier(class_weight={'BUG': 0.02702702702702703, 'NO_BUG': 0.0011792452830188679}, max_depth=5)
Let's display it:
alhazen.show_decision_tree()
We can also view the tree as text:
print(alhazen.friendly_decision_tree())
if <lead-digit> <= 4.5000:
  if <function> == 'sqrt':
    if <value> <= 41.7665:
      if <term> == '-<value>':
        BUG
      else:
        NO_BUG
    else:
      NO_BUG
  else:
    NO_BUG
else:
  NO_BUG
In both views, we see that the failure is related to the sqrt() function being called with a negative value.
But what's the deal with the <lead-digit> and <value> fields?
For this, let's have a look at our sqrt function code:
print(inspect.getsource(task_sqrt))
def task_sqrt(x):
    """Computes the square root of x, using the Newton-Raphson method"""
    if x <= -12 and x >= -42:
        x = 0  # Guess where the bug is :-)
    else:
        x = 1
    x = max(x, 0)
    approx = None
    guess = x / 2
    while approx != guess:
        approx = guess
        guess = (approx + x / approx) / 2
    return approx
We see that Alhazen has correctly determined the boundaries of x for the bug: the <lead-digit> value must be 4 or less (otherwise, the value of x will not trigger the bug); and <value> and <term> correctly reflect the boundaries.
(Note that <term> comes with a sign, whereas <value> has no sign.)
Not too bad for a machine learning approach :-)
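To make the difference between the signed <term> and the unsigned <value> concrete, here is a minimal sketch that derives these numeric features from an input string. It is not the chapter's feature extraction; the helper name numeric_features and the regular expression are assumptions made for illustration only, and <lead-digit> is assumed to be the first digit of a multi-digit integer part.

```python
import re
from typing import Dict

# Hypothetical pattern for inputs such as "sqrt(-23.5)"; not part of the chapter's code.
CALL = re.compile(r"(sqrt|sin|cos|tan)\((-?)(\d+)(?:\.(\d+))?\)")


def numeric_features(inp: str) -> Dict[str, object]:
    """Return illustrative feature values for a single calculator input."""
    match = CALL.fullmatch(inp)
    if match is None:
        raise ValueError(f"not a calculator input: {inp!r}")
    function, sign, integer, digits = match.groups()

    # <value> is the unsigned number; <term> carries the optional minus sign.
    value = float(integer + ("." + digits if digits else ""))
    term = -value if sign else value

    # Assumption: <lead-digit> only occurs when the integer part has several digits.
    lead_digit = float(integer[0]) if len(integer) > 1 else None

    return {
        "<function>": function,
        "<term>": term,
        "<value>": value,
        "<lead-digit>": lead_digit,
    }


print(numeric_features("sqrt(-23.5)"))
```

For sqrt(-23.5), the sketch reports a <term> of -23.5, a <value> of 23.5, and a <lead-digit> of 2.0, which is why a tree can bound the same input either through <term> (negative thresholds) or through <value> (positive thresholds).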
Synopsis¶
This chapter provides an implementation of the Alhazen approach \cite{Kampmann2020}, which trains machine learning classifiers from input features.
Given a test function, a grammar, and a set of inputs, the Alhazen class produces a decision tree that characterizes failure circumstances:
alhazen = Alhazen(sample_runner, CALC_GRAMMAR, initial_sample_list,
                  max_iterations=20)
alhazen.run()
The final decision tree can be accessed using last_tree():
# alhazen.last_tree()
We can visualize the resulting decision tree using Alhazen.show_decision_tree():
alhazen.show_decision_tree()
A decision tree is read from top to bottom. Decision nodes (with two children) come with a predicate on top. This predicate is either
- numeric, such as <value> > 20, indicating the numeric value of the given symbol, or
- existential, such as <digit> == '1', which has a negative value when False, and a positive value when True.
If the predicate evaluates to True, follow the left path; if it evaluates to False, follow the right path.
A leaf node (no children) will give you the final decision class = BUG or class = NO_BUG.
So if the predicate states <function> == 'sqrt' <= 0.5, this means that
- If the function is not sqrt (the predicate <function> == 'sqrt' is negative, see above, and hence less than 0.5), follow the left (True) path.
- If the function is sqrt (the predicate <function> == 'sqrt' is positive), follow the right (False) path.
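The reading rule above can be sketched in a few lines of Python. This is only an illustration: the helper names encode and path are hypothetical, and the sketch assumes that an existential predicate is encoded as 0.0 when False and 1.0 when True (any pair of values below and above 0.5 would behave the same way).

```python
def encode(predicate_holds: bool) -> float:
    """Encode an existential predicate as a number (assumed: 0.0 / 1.0)."""
    return 1.0 if predicate_holds else 0.0


def path(feature_value: float, threshold: float = 0.5) -> str:
    """A node labeled `feature <= threshold` sends matching samples to the left."""
    return "left" if feature_value <= threshold else "right"


for function in ["sqrt", "cos"]:
    is_sqrt = function == "sqrt"
    print(function, "->", path(encode(is_sqrt)))
# prints:
# sqrt -> right
# cos -> left
```

The node label `<function> == 'sqrt' <= 0.5` thus asks whether the encoded predicate is at most 0.5, which holds exactly when the function is not sqrt.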
The samples field shows the number of sample inputs that contributed to this decision.
The gini field (aka Gini impurity) indicates how mixed the samples at this node are with respect to the two classes (BUG or NO_BUG).
A gini value of 0.0 means purity: all samples fall into the displayed class.
The saturation of nodes also indicates purity: the higher the saturation, the higher the purity.
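For reference, the Gini impurity of a node is 1 minus the sum of the squared class proportions. The following sketch computes it from plain per-class sample counts; the function name gini is chosen here for illustration, and the sketch ignores any class weights that the classifier may apply.

```python
from typing import Sequence


def gini(counts: Sequence[int]) -> float:
    """Gini impurity of a node, given the number of samples per class."""
    total = sum(counts)
    if total == 0:
        raise ValueError("node has no samples")
    # 1 - sum of squared class proportions
    return 1.0 - sum((count / total) ** 2 for count in counts)


print(gini([40, 0]))   # all samples in one class: 0.0
print(gini([20, 20]))  # evenly mixed between two classes: 0.5
print(gini([30, 10]))  # mostly one class: 0.375
```

With two classes, the value ranges from 0.0 (pure node) to 0.5 (evenly mixed node).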
There is also a text version available, with far fewer (but hopefully still essential) details:
print(alhazen.friendly_decision_tree())
if <term> <= -11.5000:
  if <term> <= -42.2970:
    NO_BUG
  else:
    if <function> == 'sqrt':
      BUG
    else:
      NO_BUG
else:
  NO_BUG
In both representations, we see that the present failure is associated with a negative value for the sqrt function and precise boundaries for its value.
In fact, the error conditions are given in the source code:
print(inspect.getsource(task_sqrt))
def task_sqrt(x):
    """Computes the square root of x, using the Newton-Raphson method"""
    if x <= -12 and x >= -42:
        x = 0  # Guess where the bug is :-)
    else:
        x = 1
    x = max(x, 0)
    approx = None
    guess = x / 2
    while approx != guess:
        approx = guess
        guess = (approx + x / approx) / 2
    return approx
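To see how closely the text tree above matches the source, one can transcribe the tree by hand and compare it against the actual behavior of task_sqrt(). The sketch below does this for integer arguments; the helper names fails and tree_predicts_bug are hypothetical, and the sketch assumes that the failure of interest is the ZeroDivisionError raised when the faulty branch sets x to 0.

```python
def task_sqrt(x):
    """Copy of the function printed above, including its injected bug."""
    if x <= -12 and x >= -42:
        x = 0
    else:
        x = 1
    x = max(x, 0)
    approx = None
    guess = x / 2
    while approx != guess:
        approx = guess
        guess = (approx + x / approx) / 2
    return approx


def fails(x) -> bool:
    """Ground truth: does task_sqrt(x) raise a ZeroDivisionError?"""
    try:
        task_sqrt(x)
    except ZeroDivisionError:
        return True
    return False


def tree_predicts_bug(function: str, term: float) -> bool:
    """Hand transcription of the text tree shown above."""
    if term <= -11.5:
        if term <= -42.2970:
            return False
        return function == "sqrt"
    return False


# Compare prediction and ground truth for all integers in [-100, 100].
mismatches = [x for x in range(-100, 101)
              if tree_predicts_bug("sqrt", x) != fails(x)]
print(mismatches)  # prints []
```

On integers, the tree and the code agree. Since the thresholds are learned from samples, they need not coincide with the constants in the code: for instance, tree_predicts_bug("sqrt", -11.7) is True, whereas fails(-11.7) is False.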
Try out Alhazen on your own code and your own examples!
Lessons Learned¶
- Training machine learners from input features can give important insights into failure circumstances.
- Generating additional inputs based on feedback from the machine learner can greatly enhance precision.
- Applying machine learners on input and execution features is still in its infancy.
Next Steps¶
Our next chapter introduces automated repair of programs, building on the fault localization and generalization mechanisms introduced so far.
Background¶
This chapter is built on the Alhazen paper by Kampmann et al. \cite{Kampmann2020}.
In \cite{Eberlein2023}, Eberlein et al. introduced Avicenna, a new interpretation of Alhazen that makes use of the ISLa framework \cite{Steinhoefel2022} to learn and produce input features. Avicenna improves over Alhazen in terms of performance, expressiveness, and precision.