In this book, we want to explore debugging - the art and science of fixing bugs in computer software. In particular, we want to explore techniques that automatically answer questions like: Where is the bug? When does it occur? And how can we repair it? But before we start automating the debugging process, we first need to understand what this process is.
In this chapter, we introduce basic concepts of systematic software debugging and the debugging process, and at the same time get acquainted with Python and interactive notebooks.
Prerequisites
# ignore
# If this is the first time you are opening a notebook, please read this.
#
# Interaction basics
# ------------------
#
# * The notebook is composed of _code cells_ (like this one) and
# _text cells_ (like the one above).
#
# * To interact with the notebook, click into a cell (or type `Return`)
# and start editing. Type `Shift+Return` to execute the code, or
# `Esc` to leave without executing it.
#
# * Since code cells depend on earlier cells, you will have to execute
# those first - beginning from the top of the notebook.
#
# * You can change code (and text) at your leisure to try out alternatives.
# In Jupyter, use `b` to add a new cell below, and `Return` to enter it.
# Type `x` to delete (cut) a cell, and `z` to undo this.
#
# * There's a `Help` menu at the top of Jupyter Notebook. Enjoy!
#
#
# `# ignore` markers
# ------------------
#
# * In the notebook, there are some extra code cells starting with `# ignore`
# (like this one, actually). These are code blocks used to create diagrams,
# run tests, create or tear down special environments or more - code blocks
# that are not necessary for reading (but for creation).
#
#
# `# type: ignore` markers
# ------------------------
#
# * In the notebook, some lines come with a comment `# type: ignore`. This tells
# static code checkers to ignore any typing errors in that line. Most frequently,
# this is used to mark untyped Python code (as in our examples) such that the
# static code checker does not mark it as erroneous.
Let us start with a simple example. You may have heard of how documents on the Web are made out of text and HTML markup. HTML markup consists of tags in angle brackets that surround the text, providing additional information on how the text should be interpreted. For instance, in the HTML text
This is <em>emphasized</em>.
the word "emphasized" is enclosed in the HTML tags <em>
(start) and </em>
(end), meaning that it should be interpreted (and rendered) in an emphasized way – typically in italics. In your environment, the HTML text gets rendered as
This is emphasized.
There's HTML tags for pretty much everything – text markup (bold text, strikethrough text), text structure (titles, lists), references (links) to other documents, and many more. These HTML tags shape the Web as we know it.
However, with all the HTML markup, it may become difficult to actually access the text that lies within. We'd like to implement a simple function that removes HTML markup and converts it into text. If our input is
Here's some <strong>strong argument</strong>.
the output should be
Here's some strong argument.
Here's a Python function which does exactly this. It takes a (HTML) string and returns the text without markup.
def remove_html_markup(s): # type: ignore
tag = False
out = ""
for c in s:
if c == '<': # start of markup
tag = True
elif c == '>': # end of markup
tag = False
elif not tag:
out = out + c
return out
This function works, but not always. Before we start debugging things, let us first explore its code and how it works.
If you're new to Python, you might first have to understand what the above code does. We very much recommend the Python tutorial to get an idea on how Python works. The most important things for you to understand the above code are these three:
for
bodies are defined by being indented.c
, tag
, or out
is determined at run-time.while
, if
, for
), assignments (=
), or comparisons (==
, !=
, <
).With that, you can already understand what the above code does: remove_html_markup()
takes a (HTML) string s
and then iterates over the individual characters (for c in s
). By default, these characters are added to the return string out
. However, if remove_html_markup()
finds a <
character, it sets the tag
flag, meaning that all further characters are ignored until a >
character is found.
In contrast to other languages, Python makes no difference between strings and characters – there are only strings. As in HTML, strings can be enclosed in single quotes ('a'
) and in double quotes ("a"
). This is useful if you want to specify a string that contains quotes, as in 'She said "hello", and then left'
or "The first character is a 'c'"
To find out whether remove_html_markup()
works correctly, we can test it with a few values. For the string
Here's some <strong>strong argument</strong>.
for instance, it produces the correct value:
remove_html_markup("Here's some <strong>strong argument</strong>.")
"Here's some strong argument."
If you are reading this in the interactive notebook, you can try out remove_html_markup()
with other values as well. Click on the above cells with the invocation of remove_html_markup()
and change the value – say, to remove_html_markup("<em>foo</em>")
. Press Shift+Enter (or click on the play symbol) to execute it and see the result. If you get an error message, go to the above cell with the definition of remove_html_markup()
and execute this first. You can also run all cells at once; see the Notebook menu for details. (You can actually also change the text by clicking on it, and corect mistaks such as in this sentence.)
Executing a single cell does not execute other cells, so if your cell builds on a definition in another cell that you have not executed yet, you will get an error. You can select Run all cells above
from the menu to ensure all definitions are set.
Also keep in mind that, unless overwritten, all definitions are kept across executions. Occasionally, it thus helps to restart the kernel (i.e. start the Python interpreter from scratch) to get rid of older, superfluous definitions.
Since one can change not only invocations, but also definitions, we want to ensure that our function works properly now and in the future. To this end, we introduce tests through assertions – a statement that fails if the given check is false. The following assertion, for instance, checks that the above call to remove_html_markup()
returns the correct value:
assert remove_html_markup("Here's some <strong>strong argument</strong>.") == \
"Here's some strong argument."
If you change the code of remove_html_markup()
such that the above assertion fails, you will have introduced a bug.
As nice and simple as remove_html_markup()
is, it is buggy. Some HTML markup is not properly stripped away. Consider this HTML tag:
<input type="text" value="<your name>">
This would render as an input field in a form:
If we feed this string into remove_html_markup()
, we would expect an empty string as the result. Instead, this is what we get:
remove_html_markup('<input type="text" value="<your name>">')
'"'
Every time we encounter a bug, this means that our earlier tests have failed. We thus need to introduce another test that documents not only how the bug came to be, but also the result we actually expected.
The assertion we write now fails with an error message. (The ExpectError
magic ensures we see the error message, but the rest of the notebook is still executed.)
with ExpectError():
assert remove_html_markup('<input type="text" value="<your name>">') == ""
Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_72676/145241764.py", line 2, in <cell line: 1> assert remove_html_markup('<input type="text" value="<your name>">') == "" AssertionError (expected)
With this, we now have our task: Fix the failure as above.
To properly understand what is going on here, it helps drawing a diagram on how remove_html_markup()
works. Technically, remove_html_markup()
implements a state machine with two states tag
and ¬ tag
. We change between these states depending on the characters we process. This is visualized in the following diagram:
# ignore
PASS = "✔"
FAIL = "✘"
PASS_COLOR = 'darkgreen' # '#006400' # darkgreen
FAIL_COLOR = 'red4' # '#8B0000' # darkred
STEP_COLOR = 'peachpuff'
FONT_NAME = 'Raleway'
# ignore
def graph(comment: str ="default") -> Digraph:
return Digraph(name='', comment=comment, graph_attr={'rankdir': 'LR'},
node_attr={'style': 'filled',
'fillcolor': STEP_COLOR,
'fontname': FONT_NAME},
edge_attr={'fontname': FONT_NAME})
# ignore
state_machine = graph()
state_machine.node('Start', )
state_machine.edge('Start', '¬ tag')
state_machine.edge('¬ tag', '¬ tag', label=" ¬ '<'\\nadd character")
state_machine.edge('¬ tag:s', '¬ tag', label="'>'")
state_machine.edge('¬ tag', 'tag', label="'<'")
state_machine.edge('tag', '¬ tag', label="'>'")
state_machine.edge('tag', 'tag', label="¬ '>'")
# ignore
display(state_machine)
You see that we start in the non-tag state (¬ tag
). Here, for every character that is not '<'
, we add the character and stay in the non-tag state. '>'
characters are skipped, though.
When we read a '<'
, though, we transition to the tag state (tag
) and stay in the tag state, skipping characters up to a closing '>'
character, which makes us transition to the non-tag state again.
We will now iteratively try to find a correct solution to our bug. On the way we will come up with different interim solutions that bring us closer and closer to the perfect code. Of course, we could come up with a near-perfect solution right away, but each of these interim solutions illustrate some techniques we can use during debugging.
Let us look at the above state machine, and process through our input:
<input type="text" value="<your name>">
So what you can see is: We are interpreting the '>'
of "<your name>"
as the closing of the tag. However, this is a quoted string, so the '>'
should be interpreted as a regular character, not as markup. This is an example of missing functionality: We do not handle quoted characters correctly. We haven't claimed yet to take care of all functionality, so we still need to extend our code.
So we extend the whole thing. We set up a special "quote" state which processes quoted inputs in tags until the end of the quoted string is reached. This is how the state machine looks like:
# ignore
state_machine = graph()
state_machine.node('Start')
state_machine.edge('Start', '¬ quote\\n¬ tag')
state_machine.edge('¬ quote\\n¬ tag', '¬ quote\\n¬ tag',
label="¬ '<'\\nadd character")
state_machine.edge('¬ quote\\n¬ tag', '¬ quote\\ntag', label="'<'")
state_machine.edge('¬ quote\\ntag', 'quote\\ntag', label="'\"'")
state_machine.edge('¬ quote\\ntag', '¬ quote\\ntag', label="¬ '\"' ∧ ¬ '>'")
state_machine.edge('quote\\ntag', 'quote\\ntag', label="¬ '\"'")
state_machine.edge('quote\\ntag', '¬ quote\\ntag', label="'\"'")
state_machine.edge('¬ quote\\ntag', '¬ quote\\n¬ tag', label="'>'")
# ignore
display(state_machine)
This is a bit more complex already. Proceeding from left to right, we first have the state ¬ quote ∧ ¬ tag
, which is our "standard" state for text. If we encounter a '<'
, we again switch to the "tagged" state ¬ quote ∧ tag
. In this state, however (and only in this state), if we encounter a quotation mark, we switch to the "quotation" state quote ∧ tag
, in which we remain until we see another quotation mark indicating the end of the string – and then continue in the "tagged" state ¬ quote ∧ tag
until we see the end of the string.
Things get even more complicated as HTML allows both single and double quotation characters. Here's a revised implementation of remove_html_markup()
that takes the above states into account:
def remove_html_markup(s): # type: ignore
tag = False
quote = False
out = ""
for c in s:
if c == '<' and not quote:
tag = True
elif c == '>' and not quote:
tag = False
elif c == '"' or c == "'" and tag:
quote = not quote
elif not tag:
out = out + c
return out
Now, our previous input works well:
remove_html_markup('<input type="text" value="<your name>">')
''
and our earlier tests also pass:
assert remove_html_markup("Here's some <strong>strong argument</strong>.") == \
"Here's some strong argument."
assert remove_html_markup('<input type="text" value="<your name>">') == ""
However, the above code still has a bug. In two of these inputs, HTML markup is still not properly stripped:
<b>foo</b>
<b>"foo"</b>
"<b>foo</b>"
<"b">foo</"b">
Can you guess which ones these are?
Again, a simple assertion will reveal the culprits:
with ExpectError():
assert remove_html_markup('<b>foo</b>') == 'foo'
with ExpectError():
assert remove_html_markup('<b>"foo"</b>') == '"foo"'
Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_72676/4031927246.py", line 2, in <cell line: 1> assert remove_html_markup('<b>"foo"</b>') == '"foo"' AssertionError (expected)
with ExpectError():
assert remove_html_markup('"<b>foo</b>"') == '"foo"'
Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_72676/242211734.py", line 2, in <cell line: 1> assert remove_html_markup('"<b>foo</b>"') == '"foo"' AssertionError (expected)
with ExpectError():
assert remove_html_markup('<"b">foo</"b">') == 'foo'
So, unfortunately, we're not done yet – our function still has errors.
Let us now discuss a couple of methods that do not work well for debugging. (These "devil's suggestions" are adapted from the 1993 book "Code Complete" from Steve McConnell.)
When I was a student, I never got any formal training in debugging, so I had to figure this out for myself. What I learned was how to use debugging output; in Python, this would be the print()
function. For instance, I would go and scatter print()
calls everywhere:
def remove_html_markup_with_print(s): # type: ignore
tag = False
quote = False
out = ""
for c in s:
print("c =", repr(c), "tag =", tag, "quote =", quote)
if c == '<' and not quote:
tag = True
elif c == '>' and not quote:
tag = False
elif c == '"' or c == "'" and tag:
quote = not quote
elif not tag:
out = out + c
return out
This way of inspecting executions is commonly called "Printf debugging", after the C printf()
function. Then, running this would allow me to see what's going on in my code:
remove_html_markup_with_print('<b>"foo"</b>')
c = '<' tag = False quote = False c = 'b' tag = True quote = False c = '>' tag = True quote = False c = '"' tag = False quote = False c = 'f' tag = False quote = True c = 'o' tag = False quote = True c = 'o' tag = False quote = True c = '"' tag = False quote = True c = '<' tag = False quote = False c = '/' tag = True quote = False c = 'b' tag = True quote = False c = '>' tag = True quote = False
'foo'
Yes, one sees what is going on – but this is horribly inefficient! Think of a 1,000-character input – you'd have to go through 1,000 lines of logs. It may help you, but it's a total time waster. Plus, you have to enter these statements, remove them again... it's a maintenance nightmare.
(You may even forget printf's in your code, creating a security problem: Mac OS X versions 10.7 to 10.7.3 would log the password in clear because someone had forgotten to turn off debugging output.)
I would also try to debug the program into existence. Just change things until they work. Let me see: If I remove the conditions "and not quote" from the program, it would actually work again:
def remove_html_markup_without_quotes(s): # type: ignore
tag = False
quote = False
out = ""
for c in s:
if c == '<': # and not quote:
tag = True
elif c == '>': # and not quote:
tag = False
elif c == '"' or c == "'" and tag:
quote = not quote
elif not tag:
out = out + c
return out
assert remove_html_markup_without_quotes('<b id="bar">foo</b>') == 'foo'
Cool! Unfortunately, the function still fails on the other input:
with ExpectError():
assert remove_html_markup_without_quotes('<b>"foo"</b>') == '"foo"'
Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_72676/3864559921.py", line 2, in <cell line: 1> assert remove_html_markup_without_quotes('<b>"foo"</b>') == '"foo"' AssertionError (expected)
So, maybe we can change things again, such that both work? And maybe the other tests we had earlier won't fail? Let's just continue to change things randomly again and again and again.
Oh, and of course, I would never back up earlier versions such that I would be able to keep track of what has changed and when. Especially keep in mind that our tests will never cover all corner cases. Randomly patching code will just lead to harder to find bugs.
My favorite: Use the most obvious fix. This means that you're fixing the symptom, not the problem. In our case, this would be something like:
def remove_html_markup_fixed(s): # type: ignore
if s == '<b>"foo"</b>':
return '"foo"'
...
Miracle! Our earlier failing assertion now works! Now we can do the same for the other failing test, too, and we're done. (Rumor has it that some programmers use this technique to get their tests to pass...)
As with any devil's guide, you get an idea of how to do things by doing the opposite. What this means is:
which is what we will apply for the rest of this chapter.
To understand how to systematically debug a program, we first have to understand how failures come to be. The typical debugging situation looks like this:
If the input is incorrect, we wouldn't normally be searching for the bug in our program, but in whatever produced its input. If the input is under control of a third party, though (which typically is the case for system inputs), your program must check its correctness and reject it if it is incorrect. Once your program accepts a system input as valid, it must be properly processed by the program in question.
Hence, the (very simple) situation we're having can be shown as:
# ignore
from typing import List, Optional
# ignore
def execution_diagram(show_steps: bool = True, variables: List[str] = [],
steps: int = 3, error_step: int = 666,
until: int = 666, fault_path: List[str] = []) -> Digraph:
dot = graph()
dot.node('input', shape='none', fillcolor='white', label=f"Input {PASS}",
fontcolor=PASS_COLOR)
last_outgoing_states = ['input']
for step in range(1, min(steps + 1, until)):
step_color: Optional[str]
if step == error_step:
step_label = f'Step {step} {FAIL}'
step_color = FAIL_COLOR
else:
step_label = f'Step {step}'
step_color = None
if step >= error_step:
state_label = f'State {step} {FAIL}'
state_color = FAIL_COLOR
else:
state_label = f'State {step} {PASS}'
state_color = PASS_COLOR
state_name = f's{step}'
outgoing_states = []
incoming_states = []
if not variables:
dot.node(name=state_name, shape='box',
label=state_label, color=state_color,
fontcolor=state_color)
else:
var_labels = []
for v in variables:
vpath = f's{step}:{v}'
if vpath in fault_path:
var_label = f'<{v}>{v} ✘'
outgoing_states.append(vpath)
incoming_states.append(vpath)
else:
var_label = f'<{v}>{v}'
var_labels.append(var_label)
record_string = " | ".join(var_labels)
dot.node(name=state_name, shape='record',
label=nohtml(record_string), color=state_color,
fontcolor=state_color)
if not outgoing_states:
outgoing_states = [state_name]
if not incoming_states:
incoming_states = [state_name]
for outgoing_state in last_outgoing_states:
for incoming_state in incoming_states:
if show_steps:
dot.edge(outgoing_state, incoming_state,
label=step_label, fontcolor=step_color)
else:
dot.edge(outgoing_state, incoming_state)
last_outgoing_states = outgoing_states
if until > steps + 1:
# Show output
if error_step > steps:
dot.node('output', shape='none', fillcolor='white',
label=f"Output {PASS}", fontcolor=PASS_COLOR)
else:
dot.node('output', shape='none', fillcolor='white',
label=f"Output {FAIL}", fontcolor=FAIL_COLOR)
for outgoing_state in last_outgoing_states:
label = "Execution" if steps == 0 else None
dot.edge(outgoing_state, 'output', label=label)
display(dot)
# ignore
execution_diagram(show_steps=False, steps=0, error_step=0)
This situation we see above is what we call a failure: An externally visible error in the program behavior, with the error again being an unwanted and unintended deviation from what is correct, right, or true.
How does this failure come to be? The execution we see above breaks down into several program states, one after the other.
# ignore
for until in range(1, 6):
execution_diagram(show_steps=False, until=until, error_step=2)
Initially, the program state is still correct (✔). However, at some point in the execution, the state gets an error, also known as a fault. This fault – again an unwanted and unintended deviation from what is correct, right, or true – then propagates along the execution, until it becomes externally visible as a failure. (In reality, there are many, many more states than just this, but these would not fit in a diagram.)
How does a fault come to be? Each of these program states is produced by a step in the program code. These steps take a state as input and produce another state as output. Technically speaking, the program inputs and outputs are also parts of the program state, so the input flows into the first step, and the output is the state produced by the last step.
# ignore
for until in range(1, 6):
execution_diagram(show_steps=True, until=until, error_step=2)
Now, in the diagram above, Step 2 gets a correct state as input and produces a faulty state as output. The produced fault then propagates across more steps to finally become visible as a failure.
The goal of debugging thus is to search for the step in which the state first becomes faulty. The code associated with this step is again an error – an unwanted and unintended deviation from what is correct, right, or true – and is called a defect. This is what we have to find – and to fix.
Sounds easy, right? Unfortunately, things are not that easy, and that has something to do with the program state. Let us assume our state consists of three variables, var1
to var3
, and that Step 2 produces a fault in var2
. This fault then propagates to the output:
# ignore
for until in range(1, 6):
execution_diagram(show_steps=True, variables=['var1', 'var2', 'var3'],
error_step=2,
until=until, fault_path=['s2:var2', 's3:var2'])
The way these faults propagate is called a cause-effect chain:
Even worse: in most cases the fault often expresses itself first in the control flow rather than the variable contents. Such a deviation from a normal execution is in general even harder to find.
Since the code was originally written by a human, any defect can be related to some original mistake the programmer made. This gives us a number of terms that all are more precise than the general "error" or the colloquial "bug":
The cause-effect chain of events is thus
Note that not every defect also causes a failure, which is despite all testing, there can still be defects in the code looming around until the right conditions are met to trigger them. On the other hand, though, every failure can be traced back to the defect that causes it. Our job is to break the cause-effect chain.
To find a defect from a failure, we trace back the faults along their propagation – that is, we find out which faults in the earlier state have caused the later faults. We start from the very end of the execution and then gradually progress backwards in time, examining fault after fault until we find a transition from a correct state to a faulty state – that is, a step in which a correct state comes in and a faulty state comes out. At this point, we have found the origin of the failure – and the defect that causes it.
What sounds like a straight-forward strategy, unfortunately, doesn't always work this way in practice. That is because of the following problems of debugging:
First, program states are actually large, encompassing dozens to thousands of variables, possibly even more. If you have to search all of these manually and check them for faults, you will spend a lot of time for a single state.
Second, you do not always know whether a state is correct or not. While most programs have some form of specification for their inputs and outputs, these do not necessarily exist for intermediate results. If one had a specification that could check each state for correctness (possibly even automatically), debugging would be trivial. Unfortunately, it is not, and that's partly due to the lack of specifications.
Third, executions typically do not come in a handful of steps, as in the diagrams above; instead, they can easily encompass thousands to millions of steps. This means that you will have to examine not just one state, but several, making the problem much worse.
To make your search efficient, you thus have to focus your search – starting with most likely causes and gradually progressing to the less probable causes. This is what we call a debugging strategy. Intermediate assertions of the program state also come in handy, as they make the cause-effect chain smaller.
Now that we know how failures come to be, let's look into how to systematically find their causes. What we need is a strategy that helps us search for how and when the failure comes to be. For this, we use a process called the scientific method.
When we are debugging a program, we are trying to find the causes of a given effect – very much like natural scientists try to understand why things in nature are as they are and how they come to be. Over thousands of years, scientists have conducted observations and experiments to come to an understanding of how our world works. The process by which experimental scientists operate has been coined "The scientific method". This is how it works:
At this point, your hypothesis may be named a theory – that is, a predictive and comprehensive description of some aspect of the natural world. The gravitational theory, for instance, predicts very well how the moon revolves around the earth, and how the earth revolves around the sun. Our debugging problems are of a slightly lesser scale – we'd like a theory of how our failure came to be – but the process is pretty much the same.
# ignore
dot = graph()
dot.node('Hypothesis')
dot.node('Observation')
dot.node('Prediction')
dot.node('Experiment')
dot.edge('Hypothesis', 'Observation',
label="<Hypothesis<BR/>is <I>supported:</I><BR/>Refine it>",
dir='back')
dot.edge('Hypothesis', 'Prediction')
dot.node('Problem Report', shape='none', fillcolor='white')
dot.edge('Problem Report', 'Hypothesis')
dot.node('Code', shape='none', fillcolor='white')
dot.edge('Code', 'Hypothesis')
dot.node('Runs', shape='none', fillcolor='white')
dot.edge('Runs', 'Hypothesis')
dot.node('More Runs', shape='none', fillcolor='white')
dot.edge('More Runs', 'Hypothesis')
dot.edge('Prediction', 'Experiment')
dot.edge('Experiment', 'Observation')
dot.edge('Observation', 'Hypothesis',
label="<Hypothesis<BR/>is <I>rejected:</I><BR/>Seek alternative>")
# ignore
display(dot)
In debugging, we proceed the very same way – indeed, we are treating bugs as if they were natural phenomena. This analogy may sound far-fetched, as programs are anything but natural. Nature, by definition, is not under our control. But bugs are out of our control just as well. Hence, the analogy is not that far-fetched – and we can apply the same techniques for debugging.
Let us apply the scientific method to our Python program which removes HTML tags. First of all, let us recall the problem – remove_html_markup()
works for some inputs, but fails on others.
for i, html in enumerate(['<b>foo</b>',
'<b>"foo"</b>',
'"<b>foo</b>"',
'<b id="bar">foo</b>']):
result = remove_html_markup(html)
print("%-2d %-15s %s" % (i + 1, html, result))
1 <b>foo</b> foo 2 <b>"foo"</b> foo 3 "<b>foo</b>" <b>foo</b> 4 <b id="bar">foo</b> foo
Input #1 and #4 work as expected, the others do not. We can write these down in a table, such that we can always look back at our previous results:
Input | Expectation | Output | Outcome |
---|---|---|---|
<b>foo</b> |
foo |
foo |
✔ |
<b>"foo"</b> |
"foo" |
foo |
✘ |
"<b>foo</b>" |
"foo" |
<b>foo</b> |
✘ |
<b id="bar">foo</b> |
foo |
foo |
✔ |
quiz("From the difference between success and failure,"
" we can already devise some observations about "
" what is wrong with the output."
" Which of these can we turn into general hypotheses?",
[
"Double quotes (`\"`) are stripped from the tagged input.",
"Tags in double quotes are not stripped.",
"The tag `<>` is always stripped from the input.",
"Four-letter words are stripped."
], '[298 % 33, 1234 % 616]')
The hypotheses that remain are:
These may be two separate issues, but chances are they are tied to each other. Let's focus on 1., because it is simpler. Does it hold for all inputs, even untagged ones? Our hypothesis becomes
Let's devise an experiment to validate this. If we feed the string
"foo"
(including the double quotes) into remove_html_markup()
, we should obtain
"foo"
as result – that is, the output should be the unchanged input. However, if our hypothesis 1. is correct, we should obtain
foo
as result – that is, "Double quotes are stripped from the input" as predicted by the hypothesis.
We can very easily test this hypothesis:
remove_html_markup('"foo"')
'foo'
Our hypothesis is confirmed! We can add this to our list of observations.
Input | Expectation | Output | Outcome |
---|---|---|---|
<b>foo</b> |
foo |
foo |
✔ |
<b>"foo"</b> |
"foo" |
foo |
✘ |
"<b>foo</b>" |
"foo" |
<b>foo</b> |
✘ |
<b id="bar">foo</b> |
foo |
foo |
✔ |
"foo" |
"foo" |
foo |
✘ |
You can try out the hypothesis with more inputs – and it remains valid. Any non-markup input that contains double quotes will have these stripped.
Where does that quote-stripping come from? This is where we need to explore the cause-effect chain. The only place in remove_html_markup()
where quotes are handled is this line:
elif c == '"' or c == "'" and tag:
quote = not quote
So, quotes should be removed only if tag
is set. However, tag
can be set only if the input contains a markup tag, which is not the case for a simple input like "foo"
. Hence, what we observe is actually impossible. Yet, it happens.
Debugging is a game of falsifying assumptions. You assume the code works – it doesn't. You assume the tag
flag cannot be set – yet it may be. What do we do? Again, we create a hypothesis:
tag
being set.How do we know whether tag is being set? Let me introduce one of the most powerful debugging tools ever invented, the assert
statement. The statement
assert cond
evaluates the given condition cond
and
cond
does not hold: throw an exceptionAn assert
statement encodes our assumptions and as such, should never fail. If it does, well, then something is wrong.
Furthermore, assert
statements break the cause-effect chain into smaller pieces. The observable failure occurs earlier in
the program execution and thus closer to the initial fault.
Using assert
, we can check the value of tag
all through the loop:
def remove_html_markup_with_tag_assert(s): # type: ignore
tag = False
quote = False
out = ""
for c in s:
assert not tag # <=== Just added
if c == '<' and not quote:
tag = True
elif c == '>' and not quote:
tag = False
elif c == '"' or c == "'" and tag:
quote = not quote
elif not tag:
out = out + c
return out
Our expectation is that this assertion would fail. So, do we actually get an exception? Try it out for yourself by uncommenting the following line:
# remove_html_markup_with_tag_assert('"foo"')
quiz("What happens after inserting the above assertion?",
[
"The program raises an exception. (i.e., `tag` is set)",
"The output is as before, i.e., `foo` without quotes."
" (which means that `tag` is not set)"
], 2)
Here's the solution:
with ExpectError():
result = remove_html_markup_with_tag_assert('"foo"')
result
'foo'
We did not get an exception, hence we reject our hypothesis:
tag
being set.Again, let's go back to the only place in our code where quotes are handled:
elif c == '"' or c == "'" and tag:
quote = not quote
Because of the assertion, we already know that tag
is always False. Hence, this condition should never hold either.
But maybe there's something wrong with the condition such that it holds? Here's our hypothesis:
If the condition evaluates to true, then quote
should be set. We could now go and assert that quote
is false; but we only care about the condition. So we insert an assertion that assumes that setting the code setting the quote
flag is never reached:
def remove_html_markup_with_quote_assert(s): # type: ignore
tag = False
quote = False
out = ""
for c in s:
if c == '<' and not quote:
tag = True
elif c == '>' and not quote:
tag = False
elif c == '"' or c == "'" and tag:
assert False # <=== Just added
quote = not quote
elif not tag:
out = out + c
return out
Our expectation this time again is that the assertion fails. So, do we get an exception this time? Try it out for yourself by uncommenting the following line:
# remove_html_markup_with_quote_assert('"foo"')
quiz("What happens after inserting the 'assert' tag?",
[
"The program raises an exception (i.e., the quote condition holds)",
"The output is still foo (i.e., the quote condition does not hold)"
], 29 % 7)
Here's what happens now that we have the assert
tag:
with ExpectError():
result = remove_html_markup_with_quote_assert('"foo"')
Traceback (most recent call last): File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_72676/904218512.py", line 2, in <cell line: 1> result = remove_html_markup_with_quote_assert('"foo"') File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_72676/2877654332.py", line 12, in remove_html_markup_with_quote_assert assert False # <=== Just added AssertionError (expected)
From this observation, we can deduce that our hypothesis is confirmed:
and the condition is actually faulty. It evaluates to True although tag
is always False:
elif c == '"' or c == "'" and tag:
quote = not quote
But this condition holds for single and double quotes. Is there a difference?
Let us see whether our observations generalize towards general quotes:
We can verify these hypotheses with an additional experiment. We go back to our original implementation (without any asserts), and then check it:
remove_html_markup("'foo'")
"'foo'"
Surprise: Our hypothesis is rejected and we can add another observation to our table:
Input | Expectation | Output | Outcome |
---|---|---|---|
'foo' |
'foo' |
'foo' |
✔ |
So, the condition
At this point, you should have enough material to solve the problem. How do we have to fix the condition? Here are four alternatives:
c == "" or c == '' and tag # Choice 1
c == '"' or c == "'" and not tag # Choice 2
(c == '"' or c == "'") and tag # Choice 3
... # Something else
quiz("How should the condition read?",
[
'''`c == "" or c == '' and tag` (Choice 1)''',
'''`c == '"' or c == "'" and not tag` (Choice 2)''',
'''`(c == '"' or c == "'") and tag` (Choice 3)''',
"Something else"
],
'399 % 4')
So, you have spotted the defect: In Python (and most other languages), and
takes precedence over or
, which is why the condition is wrong. It should read:
(c == '"' or c == "'") and tag
(Actually, good programmers rarely depend on precedence; it is considered good style to use parentheses lavishly.)
So, our hypothesis now has become
quote
condition evaluating to TrueIs this our final hypothesis? We can check our earlier examples whether they should now work well:
Input | Expectation | Output | Outcome |
---|---|---|---|
<b>foo</b> |
foo |
foo |
✔ |
<b>"foo"</b> |
"foo" |
foo |
✘ |
"<b>foo</b>" |
"foo" |
<b>foo</b> |
✘ |
<b id="bar">foo</b> |
foo |
foo |
✔ |
"foo" |
'foo' |
foo |
✘ |
'foo' |
'foo' |
'foo' |
✔ |
In all of these examples, the quote
flag should now be set outside of tags; hence, everything should work as expected.
In terms of the scientific process, we now have a theory – a hypothesis that
For debugging, our problems are usually too small for a big word like theory, so we use the word diagnosis instead. So is our diagnosis sufficient to fix the bug? Let us check.
In debugging, you should start to fix your code if and only if you have a diagnosis that shows two things:
Showing both these aspects requirements – causality and incorrectness – are crucial for a debugging diagnosis:
When you do have a diagnosis that explains both causality (how the failure came to be), and incorrectness (how to correct the code accordingly), then (and only then!) is it time to actually fix the code accordingly. After applying the fix, the failure should be gone, and no other failure should occur. If the failure persists, this should come as a surprise. Obviously, there is some other aspect that you haven't considered yet, so you have to go back to the drawing board and add another failing test case to the set of observations.
All these things considered, let us go and fix remove_html_markup()
. We know how the defect causes the failure (by erroneously setting quote
outside of tags). We know that the line in question is incorrect (as single and double of quotes should be treated similarly). So, our diagnosis shows both causality and incorrectness, and we can go and fix the code accordingly:
def remove_html_markup(s): # type: ignore
tag = False
quote = False
out = ""
for c in s:
if c == '<' and not quote:
tag = True
elif c == '>' and not quote:
tag = False
elif (c == '"' or c == "'") and tag: # <-- FIX
quote = not quote
elif not tag:
out = out + c
return out
We verify that the fix was successful by running our earlier tests. Not only should the previously failing tests now pass, the previously passing tests also should not be affected. Fortunately, all tests now pass:
assert remove_html_markup("Here's some <strong>strong argument</strong>.") == \
"Here's some strong argument."
assert remove_html_markup(
'<input type="text" value="<your name>">') == ""
assert remove_html_markup('<b>foo</b>') == 'foo'
assert remove_html_markup('<b>"foo"</b>') == '"foo"'
assert remove_html_markup('"<b>foo</b>"') == '"foo"'
assert remove_html_markup('<b id="bar">foo</b>') == 'foo'
So, our hypothesis was a theory, and our diagnosis was correct. Success!
A defect may have more than one hypothesis, and each diagnosis can be obtained by many ways. We could also have started with our other hypothesis
and by reasoning and experiments, we would have reached the same conclusion that the condition is faulty:
tag
flag must be set (but it is not).tag
flag, the quote
variable must not be set (but it is).quote
flag is set under the given condition (which thus must be faulty).This gets us to the same diagnosis as above – and, of course, the same fix.
After having successfully validated the fix, we still have some homework to make.
First, we may want to check that the underlying mistake was not made elsewhere, too.
For an error as with remove_html_markup()
, it may be wise to check other parts of the code (possibly written by the same programmer) whether Boolean formulas show proper precedence. Consider setting up a static program checker or style checker to catch similar mistakes.
Usually if you code in an IDE specifically designed for language you are coding in, you will get hints on so called code smells,
parts of your code that are typically erroneous because of bad style or other typical mistakes.
If the defect was not found through testing, now is a good time to make sure it will be found the next time. If you use automated tests, add a test that catches the bug (as well as similar ones), such that you can prevent regressions.
To be 100% sure, we could add an assertion to remove_html_markup()
that checks the final result for correctness. Unfortunately, writing such an assertion is just as complex as writing the function itself.
There is one assertion, though, which could be placed in the loop body to catch this kind of errors, and which could remain in the code. Which is it?
quiz("Which assertion would have caught the problem?",
[
"`assert quote and not tag`",
"`assert quote or not tag`",
"`assert tag or not quote`",
"`assert tag and not quote`"
], '3270 - 3267')
Indeed, the statement
assert tag or not quote
is correct. This excludes the situation of ¬tag
∧ quote
– that is, the tag
flag is not set, but the quote
flag is. If you remember our state machine from above, this is actually a state that should never exist:
# ignore
display(state_machine)
Here's our function in its "final" state. As software goes, software is never final – and this may also hold for our function, as there is still room for improvement. For this chapter though, we leave it be.
def remove_html_markup(s): # type: ignore
tag = False
quote = False
out = ""
for c in s:
assert tag or not quote
if c == '<' and not quote:
tag = True
elif c == '>' and not quote:
tag = False
elif (c == '"' or c == "'") and tag:
quote = not quote
elif not tag:
out = out + c
return out
It may sound obvious, but your fix is worth nothing if it doesn't go into production. Be sure to commit your change to the code repository, together with your diagnosis. If your fix has to be approved by a third party, a good diagnosis on why and what happened is immensely helpful.
If you systematically track bugs, and your bug is properly tracked, now is the time to mark the issue as "resolved". Check for duplicates of the issue and check whether they are resolved, too. And now, you are finally done:
Time to relax – and look for the next bug!
We have now systematically fixed a bug. In this book, we will explore a number of techniques to make debugging easier – coming up with automated diagnoses, explanations, even automatic repairs, including for our example above. But there are also a number of things you can do to become a better debugger.
If you're an experienced programmer, you may have spotted the problem in remove_html_markup()
immediately, and start fixing the code right away. But this is dangerous and risky.
Why is this so? Well, because you should first
You can skip these steps, and jump right to your interactive debugger the very moment you see a failure, happily stepping through their program. This may even work well for simple problems, including this one. The risk, however, is that this narrows your view to just this one execution, which limits your ability to understand all the circumstances of the problem. Even worse: If you start "fixing" the bug without exactly understanding the problem, you may end up with an incomplete solution – as illustrated in "The Devil's Guide to Debugging", above.
A second risk of starting debugging too soon is that it lets you easily deviate from a systematic process. Remember how we wrote down every experiment in a table? How we numbered every hypothesis? This is not just for teaching. Writing these things down explicitly allow you to keep track of all your observations and hypotheses over time.
Input | Expectation | Output | Outcome |
---|---|---|---|
<b>foo</b> |
foo |
foo |
✔ |
Every time you come up with a new hypothesis, you can immediately check it against your earlier observations, which will help you to eliminate unlikely ones from the start. This is a bit like in the classic "Mastermind" board game, in which you have to guess some secret combination of pins, and in which your opponent gives you hints on whether and how your guesses are correct. At any time, you can see your previous guesses (experiments) and the results (observations) you got; any new guess (hypothesis) has to be consistent with the previous observations and experiments.
Keeping such a log also allows you to interrupt your debugging session at any time. You can be home in time, sleep over the problem, and resume the next morning with a refreshed mind. You can even hand over the log to someone else, stating your findings so far.
The alternative to having a log is to keep all in memory. This only works for short amounts of time, as it puts a higher and higher cognitive load on your memory as you debug along. After some time, you will forget earlier observations, which leads to mistakes. Worst of all, any interruption will break your concentration and make you forget things, so you can't stop debugging until you're done.
Sure, if you are a real master, you can stay glued to the screen all night. But I'd rather be home in time, thank you.
A great technique to revisit your observations and to come up with new hypotheses is to explain the problem to someone else. In this process, the "someone else" is important, but even more important is that you are explaining the problem to yourself! As Kernighan and Pike \cite{Kernighan1999} put it:
Sometimes it takes no more than a few sentences, followed by an embarrassed "Never mind. I see what's wrong. Sorry to bother you."
The reason why this works is that teaching someone else forces you to take different perspectives, and these help you to resolve the inconsistency between what you assume and what you actually observe.
Since that "someone else" can be totally passive, you can even replace her with an inanimate object to talk to – even a rubber duck. This technique is called rubber duck debugging or rubberducking – the idea is that you explain your problem to a rubber duck first before interrupting one of your co-workers with the problem. Some programmers, when asked for advice, explicitly request that you "explain your problem to the duck first", knowing that this resolves a good fraction of problems.
And it's not only that debugging takes time – the worst thing is that it is a search process, which can take anything between a few minutes and several hours, sometimes even days and weeks. But even if you never know how much time a bug will take, it's a bit of blessing to use a process which gradually gets you towards its cause.
The fix of a bug may lead to a code construct, a more complex condition or another solution which may look strange on first sight. Often the initial code was easier to understand but did not catch all cases. After fixing a bug it may be beneficial to write down why the code now looks like this and what caused the more complex construct. A comment right at the fixed location will help future programmers to not be confused or even re-introduce the bug while refactoring for better readability. And of course, helpful comments at complex code locations should even exist before something went wrong with the code -- but thats a chapter in another book.
Engineers and programmers have long used the term "bug" for faults in their systems – as if it were something that crept into an otherwise flawless program to cause the effects that none could explain. And from a psychological standpoint, it is far easier to blame some "bug" rather than taking responsibility ourselves. In the end, though, we have to face the fact: We made the bugs, and they are ours to fix.
Having said that, there has been one recorded instance where a real bug has crept into a system. That was on September 9, 1947, when a moth got stuck in the relay of a Harvard Mark II machine. This event was logged, and the log book is now on display at the Smithsonian Natural Museum of American History, as "First actual case of bug being found."
The actual term "bug", however, is much older. What do you think is its origin?
# ignore
import hashlib
# ignore
bughash = hashlib.md5(b"debug").hexdigest()
quiz('Where has the name "bug" been used to denote disruptive events?',
[
'In the early days of Morse telegraphy, referring to a special key '
'that would send a string of dots',
'Among radio technicians to describe a device that '
'converts electromagnetic field variations into acoustic signals',
"In Shakespeare's " '"Henry VI", referring to a walking spectre',
'In Middle English, where the word "bugge" is the basis for terms '
'like "bugbear" and "bugaboo"'
], [bughash.index(i) for i in "d42f"])
Need help on this quiz? To learn more about the term "bug" and its origin, have a look at the etymology of the word "bug" in The Jargon File. Also check out the Wikipedia entry on debugging!
In this chapter, we introduce some basics of how failures come to be as well as a general process for debugging.
In the next chapters, we will learn how to
Enjoy!
There are several good books on debugging, but these three are especially recommended:
All these books focus on manual debugging and the debugging process, just like this chapter; for automated debugging, simply read on :-)
Your first exercise in this book is to get acquainted with notebooks and Python, such that you can run the code examples in the book – and try out your own. Here are a few tasks to get you started.
The easiest way to get access to the code is to run them in your browser.
Resources
$\rightarrow$ Edit as Notebook
.For help on Jupyter Notebooks, from the Web Page, check out the Help
menu.
This is useful if you want to make greater changes, but do not want to work with Jupyter.
Resources
$\rightarrow$ Download Code
.For help on Python, from the Web Page, check out the Help
menu.
This is useful if you want to work with Jupyter on your machine. This will allow you to also run more complex examples, such as those with graphical output.
Resources
$\rightarrow$ All Notebooks
.00_Table_of_Contents.ipynb
.Resources
$\rightarrow$ Download Notebook
. Running these, however, will require that you have the other notebooks downloaded already.For help on Jupyter Notebooks, from the Web Page, check out the Help
menu.
This is useful if you want to contribute to the book with patches or other material. It also gives you access to the very latest version of the book.
Resources
$\rightarrow$ GitHub Repo
.git pull
will get you updated.If you want to contribute code or text, check out the Guide for Authors.
You may have noticed that our remove_html_markup()
function is still not working perfectly under all circumstances. The error has something to do with different quotes occurring in the input.
What does the problem look like? Set up a test case that demonstrates the problem.
assert(...)
Set up additional test cases as useful.
Using the scientific method, identify the extent and cause of the problem. Write down your hypotheses and log your observations, as in
Input | Expectation | Output | Outcome |
---|---|---|---|
(input) | (expectation) | (output) | (outcome) |
Design a fix for the problem. Show that it satisfies the earlier tests and does not violate any existing test.