Illustrated Code

Building Software in a Literate Way

This notebook contains the code examples for Andreas Zeller's keynotes:

  • "Illustrated Code: Building Software in a Literate Way" at ASE 2021; and
  • "Illustrated Code; What Software Engineering can Learn from Research Software" at SE 2022.

Go and

Talk Abstract

Notebooks – rich, interactive documents that join together code, documentation, and outputs – are all the rage with data scientists. But can they be used for actual software development? In this talk, I share experiences from authoring two interactive textbooks – fuzzingbook.org and debuggingbook.org – and show how notebooks not only serve for exploring and explaining code and data, but also how they can be used as software modules, integrating self-checking documentation, tests, and tutorials all in one place. The resulting software focuses on the essential, is well-documented, highly maintainable, easily extensible, and has a much higher shelf life than the "duct tape and wire” prototypes frequently found in research and beyond.

Andreas Zeller is faculty at the CISPA Helmholtz Center for Information Security and professor for Software Engineering at Saarland University, both in Saarbrücken, Germany. His research on automated debugging, mining software archives, specification mining, and security testing has proven highly influential. Zeller is an ACM Fellow and holds an ACM SIGSOFT Outstanding Research Award.

Some Support Code

from Tracer import Tracer
from typing import Any, Callable
from types import FrameType
from inspect import signature, getmembers

Sequence Diagrams with Mermaid

class SequenceDiagramTracer(Tracer):
    def __init__(self, client='Client', server='Server'):
        super().__init__()
        self.lines = []
        self.client = client
        self.server = server

    def traceit(self, frame: FrameType, event: str, arg: Any):
        if event == 'call':
            func = frame.f_code.co_name
            args = frame.f_locals
            args_line = ", ".join(reversed([var + "=" + repr(args[var]) 
                                            for var in args]))
            line = f'{self.client}->>+{self.server}: {func}({args_line})'
            self.lines.append(line)

        if event == 'return':
            line = f'{self.server}-->>-{self.client}'
            if arg is not None:
                line += f': {repr(arg)}'
            self.lines.append(line)

    def _repr_markdown_(self) -> str:
        return '\n'.join(['```mermaid'] +
                         ['sequenceDiagram'] + 
                         self.lines + 
                         ['```'])

Class Diagrams with Mermaid

class ClassDiagram():
    def __init__(self, cls):
        self.cls = cls

    def methods_str(self):
        members = [(name, fun) for (name, fun) in getmembers(self.cls)
                   if not name.startswith('_')]
        attributes = '\n'.join([f'  -{name} = {repr(member)}'
                                for (name, member) in members if not callable(member)
                                ])

        methods = '\n'.join([f'  +{name}{str(signature(member)).replace(" -> ", " ")}'
                             for (name, member) in members if callable(member)
                             ])
        return attributes + '\n' + methods

    def _repr_markdown_(self) -> str:
        return f"""
```mermaid
classDiagram
direction TD
class Server {{
  {self.methods_str()}
}}
```"""

Illustrated Code: Building Software in a Literate Way

Jupyter Demo: Factorials

We do a bit of Jupyter demo. Double-click on a cell to edit it. Press Shift+Return to execute/render it.

factorial(n) computes the factorial of n, that is $n! = \prod_{i=1}^n i = 1 \times 2 \times \dots \times n$.

def factorial(n: int) -> int:
    return 1 if n <= 1 else n * factorial(n - 1)
factorial(3)
6
assert factorial(3) == 6

Can Programming be Liberated from the Typewriter Style?

We define a function middle(x, y, z) that returns the "middle" of three integers $x$, $y$, and $z$ – i.e. the one that is neither the maximum nor the minimum of the three.

We show how to use notebooks to

  • document its interface
  • provide a specification
  • provide rationales and experiments
  • include tests
  • include architecture

Interface

Let us define an interface for middle():

def middle_i(x: int, y: int, z: int) -> int:
    """Return the middle of three numbers x, y, z"""
    ...
  • Standard way of documenting things
  • No formal spec (what is "the middle" here?); no context; no rationale
  • No usage example
  • No implementation (yet)

Specification

Here's an (executable) specification of middle():

def middle_spec(x: int, y: int, z: int) -> int:
    return sorted([x, y, z])[1]

This specification is executable, so we can easily include examples:

middle_spec(5, 3, 7)
5

Or just write the examples as assertions, so we can use them as tests later:

assert middle_spec(5, 4, 7) == 5

Of course, your specification can also include all sorts of diagrams. (Install jupyterlab-markup for this.)

mermaid
sequenceDiagram
    Client->>+Server: middle(5, 4, 7)
    Server-->>-Client: 5

For the record, this diagram is created with five lines of Markdown:

```mermaid
sequenceDiagram
Client->>+Server: middle(5, 4, 7)
Server-->>-Client: 5
```

Implementation

Let us now provide an efficient implementation for middle():

def middle(x: int, y: int, z: int) -> int:
    """Return the middle of three numbers x, y, z"""
    if y < z:
        if x < y:
            return y
        elif x < z:
            return x
    else:
        if x > y:
            return y
        elif x > z:
            return x
    return z

Once written, this is executable:

middle(5, 4, 1)
4

Tests and results become part of the doc!

Rationale

Why do we implement middle() as above, rather than using the much shorter middle_spec() code? Because it is about twice as fast:

import time
import random
def middle_benchmark(middle_fun: Callable[[int, int, int], int],
                     n: int = 1000000) -> float:
    """Return elapsed time for calling `middle_fun` `n` times"""
    elapsed = 0.0

    for i in range(n):
        x = random.randint(0, 1000)
        y = random.randint(0, 1000)
        z = random.randint(0, 1000)
        start = time.perf_counter()
        _ = middle_fun(x, y, z)
        end = time.perf_counter()
        elapsed += (end - start)

    return elapsed
middle_elapsed = middle_benchmark(middle)
middle_elapsed
0.11059187731007114
middle_spec_elapsed = middle_benchmark(middle_spec)
middle_spec_elapsed
0.2636633733054623

The document can include all these experiments and their results as a rationale, as above

The document can also discuss and evaluate more alternatives,
 reproducing the thoughts and experiments of the original programmer

We can have the document check automatically whether the rationale holds:

assert middle_elapsed < middle_spec_elapsed, "Inconsistent doc"
assert middle_elapsed * 1.2 < middle_spec_elapsed, "Inconsistent doc"

This ensures consistency between text and code.

Tests

Tests can be written as additional examples on how the code should work:

Regular Tests

assert middle(1, 2, 3) == 2
assert middle(3, 1, 2) == 2
assert middle(2, 3, 1) == 2

If a test fails, that's the same as an example failing. (And examples act as tests.)

One can analyze (and report) test performance, again in the document – for instance, measure the coverage of our code (* = line is covered during testing)

from StatisticalDebugger import CoverageCollector, code_with_coverage
with CoverageCollector() as c:
    assert middle(1, 2, 3) == 2
    assert middle(3, 1, 2) == 2
    assert middle(2, 3, 1) == 2
code_with_coverage(middle, c.coverage())
   1 * def middle(x: int, y: int, z: int) -> int:
   2       """Return the middle of three numbers x, y, z"""
   3 *     if y < z:
   4 *         if x < y:
   5 *             return y
   6 *         elif x < z:
   7               return x
   8       else:
   9 *         if x > y:
  10               return y
  11 *         elif x > z:
  12 *             return x
  13 *     return z

It seems we need to add more tests to cover all lines:

with c:
    assert middle(2, 1, 3) == 2
    assert middle(3, 2, 1) == 2

And we achieve 100% coverage:

code_with_coverage(middle, c.coverage())
   1 * def middle(x: int, y: int, z: int) -> int:
   2       """Return the middle of three numbers x, y, z"""
   3 *     if y < z:
   4 *         if x < y:
   5 *             return y
   6 *         elif x < z:
   7 *             return x
   8       else:
   9 *         if x > y:
  10 *             return y
  11 *         elif x > z:
  12 *             return x
  13 *     return z

Assumptions about coverage can be made in the document, too:

assert len(c.coverage()) >= 11

Check against Spec

One can check against the spec, again in the document. Here we compare middle() against middle_spec() with 100,000 random numbers.

for i in range(100000):
    x = random.randint(0, 1000)
    y = random.randint(0, 1000)
    z = random.randint(0, 1000)
    assert middle(x, y, z) == middle_spec(x, y, z)

All these tests can be run (and debugged) right from the document.

Symbolic Verification

One can also include static checks or symbolic verification. Here, we encode the path constraints from the middle() code for the Z3 constraint solver:

from z3 import *
s = Solver()  # Create a Z3 solver with four variables
x, y, z = Int('x'), Int('y'), Int('z')
m = Int('middle')
s.add(Implies(And(y < z, x < y), m == y))  # Encode the middle() constraints
s.add(Implies(And(y < z, x >= y), m == x))
s.add(Implies(And(y >= z, x > y), m == y))
s.add(Implies(And(y >= z, x <= y), m == x))
s.add(Implies(And(Not(x < y), Not(x < z), Not(x > y), Not(x > z)), m == z))

We can actually prove correctness this way: Is it possible that middle() returns a wrong result? (no.)

s.add(And(x < y, y < z, m != y))  # This shouldn't be possible
s.check()
unsat

Architecture

We can extract architecture diagrams such as a class diagram from code, always kept up to date:

class Server():
    state = 42

    def middle(x: int, y: int, z: int) -> int:
        return middle(x, y, z)

    def min(x: int, y: int, z: int) -> int:
        return min(x, y, z)

    def max(x: int, y: int, z: int) -> int:
        return max(x, y, z)
ClassDiagram(Server)
mermaid
classDiagram
direction TD
class Server {
    -state = 42
  +max(x: int, y: int, z: int) int
  +middle(x: int, y: int, z: int) int
  +min(x: int, y: int, z: int) int
}

We can extract dynamic diagrams from executions:

with SequenceDiagramTracer() as tracer:
    m = middle(30, 50, 20)
tracer
mermaid
sequenceDiagram
Client->>+Server: middle(z=20, y=50, x=30)
Server-->>-Client: 30
Client->>+Server: __del__(self=middle)
Client->>+Server: ref(self=<z3.z3.Context object at 0x1198ac4d0>)
Server-->>-Client: <ContextObj object at 0x1198c7a50>
Client->>+Server: ref(self=<z3.z3.Context object at 0x1198ac4d0>)
Server-->>-Client: <ContextObj object at 0x1198c7a50>
Client->>+Server: as_ast(self=middle)
Server-->>-Client: <Ast object at 0x1198c7d50>
Client->>+Server: Z3_dec_ref(_elems=<z3.z3core.Elementaries object at 0x1198acfe0>, a1=<Ast object at 0x1198c7d50>, a0=<ContextObj object at 0x1198c7a50>)
Client->>+Server: from_param(obj=<ContextObj object at 0x1198c7a50>)
Server-->>-Client: <ContextObj object at 0x1198c7a50>
Client->>+Server: from_param(obj=<Ast object at 0x1198c7d50>)
Server-->>-Client: <Ast object at 0x1198c7d50>
Server-->>-Client
Server-->>-Client

One may even compare this diagram with the one in the specification and flag mismatches.

Tutorial

The document can contain instructions on how to run things. (Of course, these would be executable too, testing the tutorial.)

To use middle, you need Python 3.9 or later. First install the debuggingbook module, available via the Python pip program:

!pip install --quiet debuggingbook

Within debuggingbook, the StatisticalDebugger provides a middle() function, but it is buggy (as it serves to demonstrate statistical debugging). A correct version is available as middle_fixed(), which you can import as middle() as follows:

from debuggingbook.StatisticalDebugger import middle_fixed as middle
middle(5, 4, 1)
4
from z3 import *
s = Solver()  # Create a Z3 solver with four variables
x, y, z = Int('x'), Int('y'), Int('z')
m = Int('middle')
s.add(Implies(And(y < z, x < y), m == y))  # Encode the middle() constraints
s.add(Implies(And(y < z, x >= y), m == x))
s.add(Implies(And(y >= z, x > y), m == y))
s.add(Implies(And(y >= z, x <= y), m == x))
s.add(Implies(And(Not(x < y), Not(x < z), Not(x > y), Not(x > z)), m == z))

We can actually prove correctness this way: Is it possible that middle() returns a wrong result? (no.)

s.add(And(x < y, y < z, m != y))  # This shouldn't be possible
s.check()
unsat

Q&A

The document can contain sections with questions and answers. These would be managed by the public, and continuously ensure consistency.

What's wrong with middle()?

Question. I use middle from the StatisticalDebugger module. However, it doesn't seem to return the correct value. What am I doing wrong? -- novice@python.net

from StatisticalDebugger import middle
middle(2, 1, 3)  # should be 2
1

Best Answer (+10). You need to import middle_fixed instead. -- expert@debugging.com

from StatisticalDebugger import middle_fixed as middle
middle(2, 1, 3)  # should be 2
2

Answer (+5) Don't use middle(x, y, z) -- just write sorted([x, y, z])[1] -- say.no.to@libraries.com

sorted([2, 1, 3])[1]
2

Comment (-2) Actually, sort() is more efficient. -- cpluspluslover@programming.com

xyz = list([2, 1, 3]); xyz.sort(); xyz[1]
2

Comment (+2) sort() takes three lines of code, whereas sorted() takes one. Also, avoid multiple statements on a line -- haskellfan@beautifulcode.org

I get a syntax error

Question. When I run the above middle() code, I get a message SyntaxError: invalid syntax. What am I doing wrong? -- appleuser@mac.com

Best Answer (+10). Are you using Python 2.x? Type annotations work with Python 3.6 and later -- updates@python.org

Can we have a largest() and smallest() function too?

Question. How do I get the greatest (or smallest) of $x$, $y$, and $z$? -- novice@python.net

Best Answer (+10). Try Python max() and min() -- guido@python.org

max(2, 1, 3)  # should be 3
3
min(2, 1, 3)  # should be 1
1

More Drawing UML

mermaid
graph TD;
    A-->B;
    A-->C;
    B-->D;
    C-->D;
from IPython.display import display, Markdown
Markdown("""
```mermaid
graph TD;
    A-->B;
    B-->A;
    A-->C;
    B-->D;
    C-->D;
```""")
mermaid
graph TD;
    A-->B;
    B-->A;
    A-->C;
    B-->D;
    C-->D;

Influences:

  • \TeX: The Program by Donald J. Knuth
  • Operating Systems: Design and Implementation by Andrew Tanenbaum

Our goal: Have a single document that encompasses

  • Code
  • Tests
  • Tutorial
  • How-To-Guides
  • Explanations
  • Reference

in an executable and self-updating way.

What "Illustrate" stands for

I is for Illustrate

Every piece of code should come with an example that illustrates is usage.

L is for Lavish

Go beyond typewriter text. Make use of all media modern tools have to offer – diagrams! charts! videos! Tie these to your code and examples, such that they stay in sync.

L is for Log

Do not just describe the final version. Discuss alternatives you have tried (and revoked, and why). This will be helpful for understanding later.

Note down your ideas and plans during development + testing

Also log during debugging. Keep track of your experiments and their results. Turn these into test cases later.

U is for Update

Make your document the single place to be updated.

S is for Spiralize

Focus on the essentials first, and add details later. Give the reader moments where they can stay and recapitulate what they learned. Make these abstraction layers in your code, such that readers can choose what to use (and what to read, too!) Proceed step by step, illustrating one piece at a time.

T is for Tutorial

Allow your readers to learn what your code is about – from a usage perspective, but also from an implementation perspective. Use quizzes, tests, exercises.

R is for Reference

Allow your readers to extract and study those parts they need – interfaces, specs. And, of course, allow them to use your code.

A is for Assist

Have others contribute to your tutorial – e.g. by providing recipes for specific how-to questions Think of StackOverflow, but constantly tested and updated

StackOverflow is filled with how-to questions. Yet, many of these are outdated over time. Allow for your readers to ask questions (with code that will be executed and tested as part of your work). Make your work a community effort!

T is for Test

Having detailed examples should automatically lead to full coverage of your code. Have assertions on top – to explain what is going on, but also to ensure consistency between code and text.

Save previous outputs of your examples; get notified when things change. Great way to do regression testing!

E is for Ease

Your job is coding, teaching, and testing – all in one place

Ease the life of future readers (which may include you)

Ease the life of people who want to use your code

Creative Commons License The content of this project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The source code that is part of the content, as well as the source code used to format and display that content is licensed under the MIT License. Last change: 2024-06-29 18:27:38+02:00CiteImprint