Debugging Performance Issues

Most chapters of this book deal with functional issues – that is, issues related to the functionality (or its absence) of the code in question. However, debugging can also involve nonfunctional issues, however – performance, usability, reliability, and more. In this chapter, we give a short introduction on how to debug such nonfunctional issues, notably performance issues.

from bookutils import YouTubeVideo
YouTubeVideo("0tMeB9G0uUI")

Prerequisites

Synopsis

To use the code provided in this chapter, write

>>> from debuggingbook.PerformanceDebugger import <identifier>

and then make use of the following features.

This chapter provides a class PerformanceDebugger that allows measuring and visualizing the time taken per line in a function.

>>> with PerformanceDebugger(TimeCollector) as debugger:
>>>     for i in range(100):
>>>         s = remove_html_markup('<b>foo</b>')

The distribution of executed time within each function can be obtained by printing out the debugger:

>>> print(debugger)
 238   5% def remove_html_markup(s):
 239   2%     tag = False
 240   2%     quote = False
 241   2%     out = ""
 242   0%
 243  16%     for c in s:
 244  14%         assert tag or not quote
 245   0%
 246  14%         if c == '<' and not quote:
 247   2%             tag = True
 248  11%         elif c == '>' and not quote:
 249   2%             tag = False
 250   8%         elif (c == '"' or c == "'") and tag:
 251   0%             quote = not quote
 252   8%         elif not tag:
 253   4%             out = out + c
 254   0%
 255   2%     return out

The sum of all percentages in a function should always be 100%.

These percentages can also be visualized, where darker shades represent higher percentage values:

>>> debugger
 238 def remove_html_markup(s):  # type: ignore
 239     tag = False
 240     quote = False
 241     out = ""
 242  
 243     for c in s:
 244         assert tag or not quote
 245  
 246         if c == '<' and not quote:
 247             tag = True
 248         elif c == '>' and not quote:
 249             tag = False
 250         elif (c == '"' or c == "'") and tag:
 251             quote = not quote
 252         elif not tag:
 253             out = out + c
 254  
 255     return out

The abstract MetricCollector class allows subclassing to build more collectors, such as HitCollector.

PerformanceDebugger PerformanceDebugger __init__() MetricDebugger MetricDebugger color() maximum() metric() suspiciousness() tooltip() total() PerformanceDebugger->MetricDebugger SpectrumDebugger SpectrumDebugger MetricDebugger->SpectrumDebugger DifferenceDebugger DifferenceDebugger FAIL PASS SpectrumDebugger->DifferenceDebugger StatisticalDebugger StatisticalDebugger DifferenceDebugger->StatisticalDebugger TimeCollector TimeCollector __enter__() __init__() all_metrics() collect() metric() reset_timer() MetricCollector MetricCollector all_metrics() maximum() metric() total() TimeCollector->MetricCollector CoverageCollector CoverageCollector MetricCollector->CoverageCollector Collector Collector CoverageCollector->Collector StackInspector StackInspector _generated_function_cache CoverageCollector->StackInspector Tracer Tracer Collector->Tracer Tracer->StackInspector HitCollector HitCollector __init__() all_metrics() collect() metric() HitCollector->MetricCollector Legend Legend •  public_method() •  private_method() •  overloaded_method() Hover over names to see doc

Measuring Performance

The solution to debugging performance issues fits in two simple rules:

  1. Measure performance
  2. Break down how individual parts of your code contribute to performance.

The first part, actually measuring performance, is key here. Developers often take elaborated guesses on which aspects of their code impact performance, and think about all possible ways to optimize their code – and at the same time, making it harder to understand, harder to evolve, and harder to maintain. In most cases, such guesses are wrong. Instead, measure performance of your program, identify the very few parts that may need to get improved, and again measure the impact of your changes.

Almost all programming languages offer a way to measure performance and breaking it down to individual parts of the code – a means also known as profiling. Profiling works by measuring the execution time for each function (or even more fine-grained location) in your program. This can be achieved by

  1. Instrumenting or tracing code such that the current time at entry and exit of each function (or line), thus determining the time spent. In Python, this is achieved by profilers like profile or cProfile

  2. Sampling the current function call stack at regular intervals, and thus assessing which functions are most active (= take the most time) during execution. For Python, the scalene profiler works this way.

Pretty much all programming languages support profiling, either through measuring, sampling, or both. As a rule of thumb, interpreted languages more frequently support measuring (as it is easy to implement in an interpreter), while compiled languages more frequently support sampling (because instrumentation requires recompilation). Python is lucky to support both methods.

Tracing Execution Profiles

Let us illustrate profiling in a simple example. The ChangeCounter class (which we will encounter in the chapter on mining version histories) reads in a version history from a git repository. Yet, it takes more than a minute to read in the debugging book change history:

from ChangeCounter import ChangeCounter, debuggingbook_change_counter  # minor dependency
import Timer
with Timer.Timer() as t:
    change_counter = debuggingbook_change_counter(ChangeCounter)
t.elapsed_time()
114.99636066693347

The Python profile and cProfile modules offer a simple way to identify the most time-consuming functions. They are invoked using the run() function, whose argument is the command to be profiled. The output reports, for each function encountered:

  • How often it was called (ncalls column)
  • How much time was spent in the given function, excluding time spent in calls to sub-functions (tottime column)
  • The fraction of tottime / ncalls (first percall column)
  • How much time was spent in the given function, including time spent in calls to sub-functions (cumtime column)
  • The fraction of cumtime / percall (second percall column)

Let us have a look at the profile we obtain:

import cProfile
cProfile.run('debuggingbook_change_counter(ChangeCounter)', sort='cumulative')
         41673535 function calls (41472571 primitive calls) in 126.578 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2618    0.010    0.000  346.536    0.132 threading.py:1117(join)
2631/2623    0.013    0.000  346.465    0.132 threading.py:1155(_wait_for_tstate_lock)
23850/18599    0.066    0.000  342.649    0.018 {method 'acquire' of '_thread.lock' objects}
     1361    0.008    0.000  126.887    0.093 repository.py:213(traverse_commits)
      2/1    0.000    0.000  126.577  126.577 {built-in method builtins.exec}
      2/1    0.000    0.000  126.577  126.577 <string>:1(<module>)
      2/1    0.000    0.000  126.577  126.577 ChangeCounter.ipynb:168(debuggingbook_change_counter)
        1    0.000    0.000  126.577  126.577 ChangeCounter.ipynb:51(__init__)
        1    0.000    0.000  126.577  126.577 ChangeCounter.ipynb:88(mine)
        1    0.000    0.000  126.536  126.536 _base.py:646(__exit__)
        1    0.000    0.000  126.536  126.536 thread.py:220(shutdown)
   2618/1    0.034    0.000  126.536  126.536 threading.py:1018(_bootstrap)
   2618/1    0.017    0.000  126.536  126.536 threading.py:1058(_bootstrap_inner)
   2618/1    0.008    0.000  126.536  126.536 ipkernel.py:744(run_closure)
   2618/1   14.680    0.006  126.536  126.536 threading.py:1001(run)
      4/1    0.000    0.000  126.536  126.536 thread.py:70(_worker)
     13/2    0.018    0.001  126.536   63.268 {method 'get' of '_queue.SimpleQueue' objects}
     1360    0.158    0.000  126.203    0.093 ChangeCounter.ipynb:102(mine_commit)
     1360    0.036    0.000  122.340    0.090 commit.py:753(modified_files)
     1307    0.371    0.000  122.268    0.094 diff.py:184(diff)
     1307   14.272    0.011   99.437    0.076 diff.py:583(_index_from_patch_format)
     1307    0.048    0.000   79.488    0.061 cmd.py:97(handle_process_output)
2614/1307   34.571    0.013   64.790    0.050 cmd.py:144(pump_stream)
      125   29.203    0.234   42.094    0.337 {built-in method time.sleep}
5336/5034    0.041    0.000   20.039    0.004 {method 'close' of '_io.BufferedReader' objects}
    15102    0.081    0.000   18.502    0.001 diff.py:412(__init__)
    14273    0.017    0.000   18.112    0.001 base.py:481(submodules)
    14273    0.043    0.000   18.095    0.001 util.py:1270(list_items)
33008/33002    0.053    0.000   18.037    0.001 {method 'extend' of 'list' objects}
    42819    0.184    0.000   17.985    0.000 base.py:1571(iter_items)
     1362    0.083    0.000    8.400    0.006 cmd.py:1523(_call_process)
     1362    0.149    0.000    8.282    0.006 cmd.py:1079(execute)
     1308    0.048    0.000    8.132    0.006 cmd.py:986(<lambda>)
     1362    0.169    0.000    7.767    0.006 subprocess.py:807(__init__)
     1362    0.210    0.000    7.494    0.006 subprocess.py:1791(_execute_child)
    72725    0.091    0.000    6.520    0.000 util.py:248(__getattr__)
   131552    0.380    0.000    5.990    0.000 cmd.py:1659(__get_object_header)
    42819    0.036    0.000    5.883    0.000 base.py:715(commit)
99911/28546    0.066    0.000    5.426    0.000 tree.py:361(__getitem__)
99911/28546    0.246    0.000    5.389    0.000 tree.py:231(join)
    42819    0.028    0.000    5.352    0.000 symbolic.py:290(_get_commit)
    42819    0.049    0.000    5.324    0.000 symbolic.py:280(_get_object)
     1362    5.100    0.004    5.100    0.004 {built-in method _posixsubprocess.fork_exec}
    57092    0.063    0.000    4.758    0.000 tree.py:210(_set_cache_)
    74460    0.151    0.000    4.722    0.000 db.py:44(stream)
   234578    4.671    0.000    4.671    0.000 {method 'readline' of '_io.BufferedReader' objects}
       29    0.001    0.000    4.562    0.157 base_events.py:1915(_run_once)
    29/28    0.000    0.000    4.558    0.163 selectors.py:558(select)
    74460    0.185    0.000    4.441    0.000 cmd.py:1695(stream_object_data)
    57092    0.046    0.000    4.156    0.000 symbolic.py:156(dereference_recursive)
   114185    0.109    0.000    4.109    0.000 symbolic.py:269(_get_ref_info)
   114185    0.342    0.000    4.000    0.000 symbolic.py:221(_get_ref_info_helper)
    14273    0.058    0.000    3.264    0.000 base.py:229(_config_parser)
    57092    0.120    0.000    3.111    0.000 base.py:136(new_from_sha)
    57092    0.126    0.000    2.642    0.000 db.py:39(info)
    57092    0.066    0.000    2.408    0.000 cmd.py:1667(get_object_header)
    14273    0.061    0.000    2.131    0.000 fun.py:230(rev_parse)
    14273    0.046    0.000    2.059    0.000 fun.py:150(name_to_object)
   131659    1.881    0.000    1.946    0.000 {built-in method _io.open}
     1362    1.676    0.001    1.676    0.001 {built-in method posix.read}
    15633    0.034    0.000    1.660    0.000 commit.py:241(_set_cache_)
114505/100018    0.110    0.000    1.542    0.000 config.py:111(assure_data_present)
      869    0.019    0.000    1.512    0.002 ChangeCounter.ipynb:121(update_stats)
    14273    0.023    0.000    1.471    0.000 util.py:82(__init__)
     1736    0.002    0.000    1.468    0.001 commit.py:214(content)
     1736    0.004    0.000    1.466    0.001 commit.py:222(_get_undecoded_content)
    14379    0.051    0.000    1.457    0.000 config.py:315(__init__)
    14379    0.077    0.000    1.402    0.000 configparser.py:582(__init__)
    74460    0.085    0.000    1.343    0.000 cmd.py:863(read)
    58827    0.024    0.000    1.271    0.000 base.py:137(read)
   148919    1.250    0.000    1.250    0.000 {method 'read' of '_io.BufferedReader' objects}
    57092    0.724    0.000    1.161    0.000 fun.py:77(tree_entries_from_data)
114505/100127    0.087    0.000    1.114    0.000 config.py:589(read)
    14379    0.249    0.000    1.042    0.000 configparser.py:1287(__init__)
     1307    0.953    0.001    0.953    0.001 {method 'join' of 'bytes' objects}
 14272983    0.812    0.000    0.812    0.000 {method 'append' of 'list' objects}
   114185    0.398    0.000    0.766    0.000 symbolic.py:173(_check_ref_name_valid)
    14379    0.331    0.000    0.726    0.000 config.py:439(_read)
     1735    0.001    0.000    0.665    0.000 base.py:192(data_stream)
   114393    0.448    0.000    0.544    0.000 {method 'read' of '_io.TextIOWrapper' objects}
    15633    0.126    0.000    0.534    0.000 commit.py:782(_deserialize)
    14379    0.507    0.000    0.507    0.000 {built-in method builtins.dir}
    15102    0.004    0.000    0.445    0.000 ChangeCounter.ipynb:112(include)
    15102    0.014    0.000    0.441    0.000 ChangeCounter.ipynb:176(filter)
    36455    0.053    0.000    0.440    0.000 commit.py:276(new_path)
   131552    0.035    0.000    0.431    0.000 cmd.py:1646(_get_persistent_cmd)
  1971717    0.393    0.000    0.393    0.000 {method 'match' of 're.Pattern' objects}
   881745    0.352    0.000    0.352    0.000 cmd.py:972(__getattribute__)
   499555    0.179    0.000    0.352    0.000 compat.py:117(safe_decode)
    16462    0.006    0.000    0.330    0.000 commit.py:633(committer_date)
    16462    0.007    0.000    0.324    0.000 commit.py:254(committed_datetime)
   161261    0.193    0.000    0.321    0.000 posixpath.py:71(join)
   131552    0.294    0.000    0.294    0.000 {method 'flush' of '_io.BufferedWriter' objects}
    36076    0.051    0.000    0.289    0.000 pathlib.py:437(__str__)
    57197    0.063    0.000    0.285    0.000 configparser.py:743(get)
    14379    0.178    0.000    0.267    0.000 configparser.py:1210(__init__)
     1362    0.041    0.000    0.243    0.000 os.py:748(copy)
    31266    0.081    0.000    0.239    0.000 util.py:326(parse_actor_and_date)
     2720    0.018    0.000    0.235    0.000 repository.py:245(_iter_commits)
   128776    0.234    0.000    0.234    0.000 {method '__exit__' of '_io._IOBase' objects}
  1086219    0.180    0.000    0.226    0.000 {built-in method builtins.getattr}
   131552    0.152    0.000    0.226    0.000 cmd.py:1601(_parse_object_header)
   806682    0.130    0.000    0.219    0.000 cmd.py:792(__getattr__)
     1362    0.066    0.000    0.215    0.000 subprocess.py:1282(_close_pipe_fds)
  1182431    0.212    0.000    0.212    0.000 {method 'decode' of 'bytes' objects}
   131552    0.112    0.000    0.208    0.000 cmd.py:1632(_prepare_ref)
   189736    0.164    0.000    0.206    0.000 base.py:100(__init__)
2110652/2110648    0.201    0.000    0.205    0.000 {built-in method builtins.isinstance}
   115598    0.094    0.000    0.204    0.000 base.py:231(__init__)
    36075    0.025    0.000    0.199    0.000 pathlib.py:551(drive)
    57092    0.104    0.000    0.198    0.000 util.py:111(get_object_type_by_name)
    28884    0.129    0.000    0.191    0.000 util.py:91(mode_str_to_int)
   115492    0.067    0.000    0.185    0.000 {built-in method builtins.any}
    36073    0.022    0.000    0.174    0.000 pathlib.py:407(_load_parts)
  2283688    0.164    0.000    0.164    0.000 {built-in method builtins.ord}
    74138    0.064    0.000    0.163    0.000 commit.py:109(__init__)
    58452    0.039    0.000    0.156    0.000 tree.py:192(__init__)
    36073    0.060    0.000    0.146    0.000 pathlib.py:387(_parse_path)
     2618    0.009    0.000    0.145    0.000 ipkernel.py:768(init_closure)
   231540    0.074    0.000    0.137    0.000 os.py:841(fsencode)
     2618    0.061    0.000    0.136    0.000 threading.py:884(__init__)
   100015    0.093    0.000    0.134    0.000 util.py:272(join_path)
    16462    0.016    0.000    0.132    0.000 util.py:211(from_timestamp)
    57197    0.061    0.000    0.123    0.000 configparser.py:1122(_unify_values)
    96807    0.046    0.000    0.121    0.000 os.py:709(__getitem__)
    28546    0.020    0.000    0.121    0.000 base.py:437(index)
   342554    0.070    0.000    0.118    0.000 symbolic.py:215(<genexpr>)
     2618    0.019    0.000    0.116    0.000 threading.py:975(start)
   382507    0.115    0.000    0.115    0.000 {method 'split' of 'str' objects}
   561325    0.110    0.000    0.110    0.000 {method 'endswith' of 'str' objects}
     4086    0.061    0.000    0.103    0.000 contextlib.py:530(callback)
    28546    0.017    0.000    0.101    0.000 base.py:140(__init__)
   114393    0.069    0.000    0.097    0.000 codecs.py:319(decode)
   896928    0.091    0.000    0.091    0.000 {built-in method builtins.len}
   142986    0.072    0.000    0.091    0.000 config.py:218(__getitem__)
    57197    0.056    0.000    0.089    0.000 __init__.py:1009(__getitem__)
     1362    0.037    0.000    0.084    0.000 util.py:529(remove_password_if_present)
    28546    0.013    0.000    0.083    0.000 base.py:172(_index_path)
    14382    0.011    0.000    0.082    0.000 config.py:536(_has_includes)
     2618    0.080    0.000    0.080    0.000 {built-in method _thread.start_new_thread}
    32688    0.013    0.000    0.080    0.000 subprocess.py:1880(<genexpr>)
    98064    0.015    0.000    0.079    0.000 _collections_abc.py:868(__iter__)
   101200    0.073    0.000    0.078    0.000 config.py:205(__setitem__)
   330089    0.078    0.000    0.078    0.000 {method 'startswith' of 'str' objects}
     1362    0.010    0.000    0.077    0.000 subprocess.py:1688(_get_handles)
   460031    0.076    0.000    0.076    0.000 {method 'encode' of 'str' objects}
    91562    0.076    0.000    0.076    0.000 {method 'search' of 're.Pattern' objects}
    30204    0.010    0.000    0.075    0.000 diff.py:570(_pick_best_path)
    28546    0.018    0.000    0.073    0.000 base.py:122(__init__)
    28650    0.023    0.000    0.071    0.000 util.py:309(join_path_native)
    14378    0.017    0.000    0.070    0.000 config.py:539(_included_paths)
    31266    0.032    0.000    0.069    0.000 util.py:808(_from_string)
    30204    0.034    0.000    0.065    0.000 diff.py:105(decode_path)
    98064    0.036    0.000    0.064    0.000 os.py:732(__iter__)
     2618    0.026    0.000    0.063    0.000 threading.py:637(wait)
   114393    0.051    0.000    0.062    0.000 codecs.py:309(__init__)
    36178    0.020    0.000    0.061    0.000 pathlib.py:1157(__init__)
   193404    0.029    0.000    0.059    0.000 os.py:795(decode)
      104    0.001    0.000    0.057    0.001 util.py:171(wrapper)
      104    0.000    0.000    0.056    0.001 base.py:1404(module)
      105    0.002    0.000    0.056    0.001 base.py:172(__init__)
    16462    0.051    0.000    0.056    0.000 {built-in method fromtimestamp}
    42820    0.023    0.000    0.055    0.000 base.py:450(head)
     1362    0.019    0.000    0.053    0.000 os.py:654(get_exec_path)
   162871    0.053    0.000    0.053    0.000 {method 'split' of 'bytes' objects}
   173310    0.051    0.000    0.051    0.000 typing.py:392(inner)
   164720    0.030    0.000    0.051    0.000 posixpath.py:41(_get_sep)
   130562    0.050    0.000    0.050    0.000 {built-in method sys.intern}
   306845    0.049    0.000    0.049    0.000 {method 'strip' of 'str' objects}
    57098    0.033    0.000    0.049    0.000 <frozen importlib._bootstrap>:1390(_handle_fromlist)
   114185    0.038    0.000    0.048    0.000 symbolic.py:52(_git_dir)
    74460    0.028    0.000    0.047    0.000 base.py:128(__new__)
   249923    0.046    0.000    0.046    0.000 {built-in method binascii.a2b_hex}
    28546    0.033    0.000    0.045    0.000 symbolic.py:596(to_full_path)
    96807    0.024    0.000    0.045    0.000 os.py:791(encode)
    57093    0.024    0.000    0.044    0.000 <frozen importlib._bootstrap>:645(parent)
     1361    0.008    0.000    0.044    0.000 _base.py:612(result_iterator)
   159181    0.044    0.000    0.044    0.000 {method 'group' of 're.Match' objects}
   183342    0.044    0.000    0.044    0.000 {built-in method __new__ of type object at 0x10119c990}
    16462    0.038    0.000    0.043    0.000 {method 'astimezone' of 'datetime.datetime' objects}
     1308    0.003    0.000    0.042    0.000 subprocess.py:1135(_get_devnull)
   505424    0.042    0.000    0.042    0.000 {built-in method posix.fspath}
     1361    0.000    0.000    0.042    0.000 git.py:110(get_list_commits)
    36178    0.025    0.000    0.041    0.000 pathlib.py:358(__init__)
    57092    0.025    0.000    0.040    0.000 base.py:35(__new__)
     1308    0.006    0.000    0.040    0.000 util.py:500(finalize_process)
   131684    0.040    0.000    0.040    0.000 {method 'write' of '_io.BufferedWriter' objects}
     1308    0.039    0.000    0.039    0.000 {built-in method posix.open}
     3966    0.007    0.000    0.039    0.000 threading.py:323(wait)
    28546    0.023    0.000    0.038    0.000 configparser.py:855(has_option)
     2668    0.004    0.000    0.038    0.000 subprocess.py:1259(wait)
     2615    0.018    0.000    0.038    0.000 cmd.py:796(wait)
   150627    0.037    0.000    0.037    0.000 {method 'startswith' of 'bytes' objects}
     1414    0.008    0.000    0.037    0.000 cmd.py:789(__del__)
     1360    0.020    0.000    0.036    0.000 _base.py:314(_result_or_cancel)
       52    0.000    0.000    0.035    0.001 base.py:1427(module_exists)
     2668    0.004    0.000    0.034    0.000 subprocess.py:2021(_wait)
     1361    0.002    0.000    0.034    0.000 commit.py:512(_iter_from_process_or_stream)
    36073    0.015    0.000    0.033    0.000 pathlib.py:429(_format_parsed_parts)
    86935    0.019    0.000    0.033    0.000 diff.py:535(b_path)
    42821    0.028    0.000    0.033    0.000 head.py:50(__init__)
   172947    0.032    0.000    0.032    0.000 {method 'rstrip' of 'str' objects}
    31266    0.024    0.000    0.031    0.000 util.py:146(utctz_to_altz)
     2618    0.007    0.000    0.030    0.000 threading.py:588(__init__)
    58346    0.029    0.000    0.029    0.000 config.py:208(add)
    28651    0.022    0.000    0.029    0.000 configparser.py:630(sections)
     1361    0.002    0.000    0.029    0.000 subprocess.py:2008(_try_wait)
     4086    0.028    0.000    0.028    0.000 contextlib.py:475(_create_cb_wrapper)
     1414    0.003    0.000    0.028    0.000 cmd.py:754(_terminate)
   137045    0.027    0.000    0.027    0.000 {built-in method binascii.b2a_hex}
   114393    0.027    0.000    0.027    0.000 {built-in method _codecs.utf_8_decode}
    71949    0.027    0.000    0.027    0.000 {built-in method builtins.setattr}
        1    0.001    0.001    0.027    0.027 _base.py:583(map)
13128/1362    0.015    0.000    0.027    0.000 cmd.py:1477(_unpack_args)
    15102    0.008    0.000    0.027    0.000 commit.py:595(committer)
   102684    0.027    0.000    0.027    0.000 typing.py:1766(_no_init_or_replace_init)
     1467    0.027    0.000    0.027    0.000 {built-in method posix.waitpid}
     1362    0.018    0.000    0.026    0.000 contextlib.py:567(__exit__)
     1339    0.017    0.000    0.025    0.000 parse.py:452(urlsplit)
   143236    0.024    0.000    0.024    0.000 {built-in method builtins.hasattr}
    77634    0.024    0.000    0.024    0.000 {method 'groups' of 're.Match' objects}
    86183    0.024    0.000    0.024    0.000 {method 'rpartition' of 'str' objects}
      106    0.000    0.000    0.023    0.000 cmd.py:1710(clear_cache)
     2618    0.003    0.000    0.023    0.000 threading.py:616(set)
    74460    0.021    0.000    0.021    0.000 base.py:132(__init__)
     1362    0.021    0.000    0.021    0.000 warnings.py:484(__enter__)
     1568    0.013    0.000    0.021    0.000 ipkernel.py:775(_clean_thread_parent_frames)
    36073    0.013    0.000    0.020    0.000 posixpath.py:138(splitroot)
   143118    0.020    0.000    0.020    0.000 {function _OMD.__getitem__ at 0x10fbd0ae0}
    12050    0.014    0.000    0.020    0.000 threading.py:299(__enter__)
    28546    0.014    0.000    0.019    0.000 util.py:42(sm_name)
     3979    0.015    0.000    0.019    0.000 threading.py:277(__init__)
     1362    0.019    0.000    0.019    0.000 contextlib.py:481(__init__)
      104    0.000    0.000    0.018    0.000 base.py:327(__del__)
      104    0.000    0.000    0.018    0.000 base.py:333(close)
     1360    0.013    0.000    0.018    0.000 commit.py:796(_parse_diff)
    28546    0.013    0.000    0.017    0.000 base.py:168(__ne__)
    16462    0.017    0.000    0.017    0.000 util.py:191(__init__)
    57092    0.017    0.000    0.017    0.000 base.py:38(__init__)
     2618    0.016    0.000    0.016    0.000 threading.py:1356(_make_invoke_excepthook)
     3979    0.004    0.000    0.016    0.000 threading.py:424(notify_all)
    14273    0.013    0.000    0.016    0.000 util.py:1169(__new__)
     1360    0.002    0.000    0.015    0.000 thread.py:165(submit)
    57197    0.015    0.000    0.015    0.000 __init__.py:999(__init__)
    87496    0.015    0.000    0.015    0.000 {method 'lower' of 'str' objects}
    36178    0.009    0.000    0.015    0.000 pathlib.py:1164(__new__)
     6810    0.014    0.000    0.014    0.000 {built-in method posix.close}
     4140    0.014    0.000    0.014    0.000 {built-in method posix.pipe}
  418/210    0.001    0.000    0.014    0.000 fun.py:99(find_submodule_git_dir)
    24658    0.009    0.000    0.014    0.000 conf.py:52(get)
    28884    0.013    0.000    0.013    0.000 {method 'sub' of 're.Pattern' objects}
     4086    0.011    0.000    0.013    0.000 contextlib.py:548(_push_exit_callback)
   206052    0.013    0.000    0.013    0.000 typing.py:2183(cast)
   144090    0.013    0.000    0.013    0.000 config.py:435(optionxform)
     1360    0.006    0.000    0.013    0.000 _base.py:428(result)
    74460    0.012    0.000    0.012    0.000 cmd.py:852(__init__)
    78889    0.012    0.000    0.012    0.000 {method 'readline' of '_io.BytesIO' objects}
    29090    0.008    0.000    0.012    0.000 parse.py:193(_userinfo)
    14545    0.005    0.000    0.012    0.000 parse.py:160(password)
     2726    0.002    0.000    0.012    0.000 {built-in method builtins.next}
     1307    0.012    0.000    0.012    0.000 {method 'finditer' of 're.Pattern' objects}
     3992    0.004    0.000    0.011    0.000 threading.py:394(notify)
   114393    0.011    0.000    0.011    0.000 codecs.py:260(__init__)
    74460    0.011    0.000    0.011    0.000 cmd.py:939(__del__)
    31266    0.009    0.000    0.011    0.000 util.py:789(__init__)
     7948    0.011    0.000    0.011    0.000 {built-in method _thread.allocate_lock}
    35964    0.008    0.000    0.011    0.000 posixpath.py:131(splitdrive)
    57197    0.010    0.000    0.010    0.000 base.py:381(common_dir)
    16462    0.009    0.000    0.010    0.000 mailmap.py:16(get_developer)
      105    0.000    0.000    0.010    0.000 base.py:660(config_reader)
     5236    0.009    0.000    0.010    0.000 threading.py:1485(current_thread)
     2618    0.005    0.000    0.010    0.000 _weakrefset.py:85(add)
      105    0.000    0.000    0.010    0.000 base.py:683(_config_reader)
        2    0.000    0.000    0.010    0.005 repository.py:179(_prep_repo)
     1363    0.001    0.000    0.009    0.000 contextlib.py:141(__exit__)
     2614    0.008    0.000    0.008    0.000 threading.py:839(_newname)
     1363    0.004    0.000    0.008    0.000 contextlib.py:299(helper)
    14545    0.003    0.000    0.008    0.000 parse.py:156(username)
    15102    0.005    0.000    0.008    0.000 commit.py:660(msg)
    57092    0.008    0.000    0.008    0.000 base.py:52(type)
     1360    0.001    0.000    0.008    0.000 thread.py:184(_adjust_thread_count)
     1362    0.003    0.000    0.008    0.000 warnings.py:168(simplefilter)
     2618    0.006    0.000    0.007    0.000 threading.py:1108(_delete)
    14382    0.004    0.000    0.007    0.000 config.py:398(__del__)
    14084    0.005    0.000    0.007    0.000 diff.py:531(a_path)
    62532    0.007    0.000    0.007    0.000 {built-in method builtins.abs}
      523    0.001    0.000    0.007    0.000 fun.py:57(is_git_dir)
    57092    0.007    0.000    0.007    0.000 base.py:60(size)
    36074    0.007    0.000    0.007    0.000 {method 'join' of 'str' objects}
        1    0.000    0.000    0.007    0.007 base.py:758(iter_commits)
        1    0.000    0.000    0.007    0.007 commit.py:299(iter_items)
     2618    0.002    0.000    0.007    0.000 threading.py:1045(_set_tstate_lock)
     1360    0.001    0.000    0.007    0.000 threading.py:468(acquire)
    71895    0.007    0.000    0.007    0.000 {built-in method builtins.callable}
    57092    0.007    0.000    0.007    0.000 base.py:42(binsha)
     2932    0.004    0.000    0.007    0.000 posixpath.py:179(dirname)
     1362    0.007    0.000    0.007    0.000 {built-in method posix.access}
    14273    0.004    0.000    0.006    0.000 base.py:162(__eq__)
     9308    0.006    0.000    0.006    0.000 {method 'release' of '_thread.lock' objects}
        2    0.000    0.000    0.006    0.003 git.py:77(clear)
    12050    0.004    0.000    0.006    0.000 threading.py:302(__exit__)
     1363    0.002    0.000    0.006    0.000 contextlib.py:132(__enter__)
    50746    0.006    0.000    0.006    0.000 util.py:198(utcoffset)
    28651    0.006    0.000    0.006    0.000 {method 'keys' of 'collections.OrderedDict' objects}
    14379    0.004    0.000    0.006    0.000 configparser.py:1329(__iter__)
    28650    0.005    0.000    0.005    0.000 util.py:303(to_native_path_linux)
    31326    0.005    0.000    0.005    0.000 {method 'endswith' of 'bytes' objects}
     1360    0.001    0.000    0.005    0.000 thread.py:54(run)
    57197    0.005    0.000    0.005    0.000 configparser.py:335(before_get)
    26334    0.005    0.000    0.005    0.000 {method 'get' of 'dict' objects}
     1360    0.001    0.000    0.005    0.000 _base.py:328(__init__)
    65848    0.005    0.000    0.005    0.000 util.py:204(dst)
     5493    0.002    0.000    0.005    0.000 base.py:186(hexsha)
    12392    0.004    0.000    0.004    0.000 cmd.py:338(__getattribute)
    42821    0.004    0.000    0.004    0.000 symbolic.py:77(__init__)
      869    0.004    0.000    0.004    0.000 ChangeCounter.ipynb:145(update_changes)
     1467    0.001    0.000    0.004    0.000 abc.py:117(__instancecheck__)
    15739    0.004    0.000    0.004    0.000 {method 'pop' of 'list' objects}
     1362    0.002    0.000    0.004    0.000 warnings.py:184(_add_filter)
      784    0.004    0.000    0.004    0.000 threading.py:1535(enumerate)
     1472    0.004    0.000    0.004    0.000 {built-in method posix.stat}
    28754    0.004    0.000    0.004    0.000 base.py:372(working_tree_dir)
    13877    0.004    0.000    0.004    0.000 threading.py:1198(ident)
    14383    0.003    0.000    0.004    0.000 config.py:410(release)
     3988    0.004    0.000    0.004    0.000 {method 'add' of 'set' objects}
     4086    0.001    0.000    0.004    0.000 contextlib.py:477(_exit_wrapper)
     6597    0.002    0.000    0.004    0.000 threading.py:314(_is_owned)
    14273    0.004    0.000    0.004    0.000 fun.py:219(to_commit)
    17052    0.004    0.000    0.004    0.000 {method 'strip' of 'bytes' objects}
     1363    0.002    0.000    0.004    0.000 contextlib.py:104(__init__)
     1467    0.004    0.000    0.004    0.000 {built-in method _abc._abc_instancecheck}
     1360    0.001    0.000    0.003    0.000 conf.py:272(is_commit_filtered)
     7854    0.003    0.000    0.003    0.000 threading.py:1236(daemon)
      869    0.003    0.000    0.003    0.000 ChangeCounter.ipynb:137(update_size)
     5441    0.003    0.000    0.003    0.000 {method '__enter__' of '_thread.RLock' objects}
     1048    0.000    0.000    0.003    0.000 genericpath.py:39(isdir)
     1339    0.003    0.000    0.003    0.000 parse.py:119(_coerce_args)
    15633    0.003    0.000    0.003    0.000 {method 'read' of '_io.BytesIO' objects}
      105    0.000    0.000    0.003    0.000 pathlib.py:1022(read_text)
     2618    0.002    0.000    0.003    0.000 threading.py:1081(_stop)
     1361    0.003    0.000    0.003    0.000 __init__.py:1529(info)
      105    0.000    0.000    0.003    0.000 pathlib.py:1005(open)
    15102    0.003    0.000    0.003    0.000 {method 'end' of 're.Match' objects}
     1360    0.001    0.000    0.003    0.000 commit.py:568(author)
     1361    0.001    0.000    0.003    0.000 _base.py:537(set_result)
     1362    0.003    0.000    0.003    0.000 {built-in method sys.exc_info}
     2618    0.002    0.000    0.003    0.000 _weakrefset.py:39(_remove)
    36074    0.003    0.000    0.003    0.000 pathlib.py:569(_tail)
    36075    0.003    0.000    0.003    0.000 pathlib.py:560(root)
        1    0.000    0.000    0.003    0.003 git.py:39(__init__)
     8085    0.003    0.000    0.003    0.000 {method 'append' of 'collections.deque' objects}
        1    0.000    0.000    0.003    0.003 git.py:86(_open_repository)
     1413    0.001    0.000    0.003    0.000 commit.py:669(parents)
    14379    0.002    0.000    0.003    0.000 config.py:378(_acquire_lock)
    14079    0.003    0.000    0.003    0.000 {method '__exit__' of '_thread.RLock' objects}
    31/30    0.000    0.000    0.003    0.000 events.py:86(_run)
      109    0.000    0.000    0.003    0.000 pathlib.py:447(__fspath__)
    13090    0.003    0.000    0.003    0.000 {built-in method _thread.get_ident}
     2614    0.001    0.000    0.003    0.000 base.py:178(__str__)
    13795    0.002    0.000    0.002    0.000 {method 'start' of 're.Match' objects}
    31/30    0.000    0.000    0.002    0.000 {method 'run' of '_contextvars.Context' objects}
     1360    0.001    0.000    0.002    0.000 commit.py:559(hash)
     2614    0.002    0.000    0.002    0.000 threading.py:1251(daemon)
    15102    0.002    0.000    0.002    0.000 commit.py:155(__init__)
     2618    0.002    0.000    0.002    0.000 {built-in method _thread._set_sentinel}
     9539    0.002    0.000    0.002    0.000 {method 'replace' of 'str' objects}
     1360    0.001    0.000    0.002    0.000 _base.py:364(cancel)
     6609    0.002    0.000    0.002    0.000 {method '__enter__' of '_thread.lock' objects}
    14273    0.002    0.000    0.002    0.000 util.py:1172(__init__)
     1362    0.002    0.000    0.002    0.000 {method 'remove' of 'list' objects}
     2722    0.001    0.000    0.002    0.000 subprocess.py:1233(poll)
    14379    0.002    0.000    0.002    0.000 {built-in method builtins.iter}
     3966    0.001    0.000    0.002    0.000 threading.py:311(_acquire_restore)
      315    0.001    0.000    0.002    0.000 util.py:517(expand_path)
     2615    0.001    0.000    0.002    0.000 cmd.py:977(__getattr__)
     3966    0.001    0.000    0.002    0.000 threading.py:308(_release_save)
     2615    0.001    0.000    0.002    0.000 encoding.py:1(force_bytes)
     1362    0.002    0.000    0.002    0.000 {built-in method builtins.sorted}
    14379    0.002    0.000    0.002    0.000 configparser.py:1170(converters)
    16462    0.002    0.000    0.002    0.000 developer.py:27(__init__)
      105    0.000    0.000    0.002    0.000 configparser.py:804(getboolean)
     2618    0.001    0.000    0.002    0.000 threading.py:1038(_set_ident)
    10716    0.002    0.000    0.002    0.000 {method '__exit__' of '_thread.lock' objects}
     1418    0.001    0.000    0.001    0.000 cmd.py:294(dashify)
     1360    0.001    0.000    0.001    0.000 _base.py:497(set_running_or_notify_cancel)
     1362    0.001    0.000    0.001    0.000 _collections_abc.py:819(keys)
    14377    0.001    0.000    0.001    0.000 base.py:390(bare)
      105    0.000    0.000    0.001    0.000 configparser.py:783(_get_conv)
     1361    0.001    0.000    0.001    0.000 subprocess.py:1961(_handle_exitstatus)
     1362    0.001    0.000    0.001    0.000 warnings.py:505(__exit__)
     1362    0.001    0.000    0.001    0.000 __init__.py:1517(debug)
      105    0.000    0.000    0.001    0.000 configparser.py:780(_get)
     1362    0.001    0.000    0.001    0.000 contextlib.py:564(__enter__)
     3966    0.001    0.000    0.001    0.000 {method 'remove' of 'collections.deque' objects}
     5283    0.001    0.000    0.001    0.000 {method 'insert' of 'list' objects}
       13    0.000    0.000    0.001    0.000 iostream.py:118(_run_event_pipe_gc)
    14386    0.001    0.000    0.001    0.000 config.py:764(read_only)
     1361    0.001    0.000    0.001    0.000 subprocess.py:1120(__del__)
     4086    0.001    0.000    0.001    0.000 {method 'pop' of 'collections.deque' objects}
     1339    0.001    0.000    0.001    0.000 <string>:1(<lambda>)
     5462    0.001    0.000    0.001    0.000 {method 'items' of 'dict' objects}
     2618    0.001    0.000    0.001    0.000 threading.py:1042(_set_native_id)
      105    0.000    0.000    0.001    0.000 __init__.py:174(search)
       53    0.000    0.000    0.001    0.000 subprocess.py:2208(terminate)
     1362    0.001    0.000    0.001    0.000 cmd.py:1466(transform_kwargs)
     2723    0.001    0.000    0.001    0.000 __init__.py:1790(isEnabledFor)
        3    0.000    0.000    0.001    0.000 config.py:710(write)
     1362    0.001    0.000    0.001    0.000 {method 'rfind' of 'bytes' objects}
     2627    0.001    0.000    0.001    0.000 threading.py:1222(is_alive)
      313    0.000    0.000    0.001    0.000 cmd.py:662(is_cygwin)
     2618    0.001    0.000    0.001    0.000 {method 'discard' of 'set' objects}
     5228    0.001    0.000    0.001    0.000 diff.py:235(<genexpr>)
      317    0.000    0.000    0.001    0.000 genericpath.py:27(isfile)
      105    0.000    0.000    0.001    0.000 genericpath.py:16(exists)
       53    0.000    0.000    0.001    0.000 subprocess.py:2176(send_signal)
     4083    0.001    0.000    0.001    0.000 subprocess.py:1973(_internal_poll)
     1339    0.001    0.000    0.001    0.000 {method 'lstrip' of 'str' objects}
      2/1    0.000    0.000    0.001    0.001 config.py:127(flush_changes)
      317    0.000    0.000    0.001    0.000 posixpath.py:408(abspath)
      105    0.001    0.000    0.001    0.000 __init__.py:280(_compile)
     7863    0.001    0.000    0.001    0.000 threading.py:601(is_set)
        1    0.000    0.000    0.001    0.001 base.py:698(config_writer)
     1339    0.001    0.000    0.001    0.000 {method 'find' of 'str' objects}
     1362    0.001    0.000    0.001    0.000 warnings.py:458(__init__)
     1570    0.001    0.000    0.001    0.000 {method 'rfind' of 'str' objects}
      105    0.000    0.000    0.001    0.000 pathlib.py:719(__truediv__)
     1362    0.001    0.000    0.001    0.000 cmd.py:749(__init__)
        4    0.000    0.000    0.001    0.000 util.py:1050(_obtain_lock)
      105    0.000    0.000    0.001    0.000 cmd.py:947(__init__)
        4    0.000    0.000    0.001    0.000 util.py:1026(_obtain_lock_or_raise)
      105    0.000    0.000    0.001    0.000 db.py:34(__init__)
       26    0.000    0.000    0.001    0.000 tasks.py:653(sleep)
      313    0.000    0.000    0.001    0.000 util.py:486(is_cygwin_git)
     1362    0.001    0.000    0.001    0.000 subprocess.py:268(_cleanup)
        1    0.000    0.000    0.001    0.001 config.py:866(set_value)
       13    0.000    0.000    0.001    0.000 futures.py:313(_set_result_unless_cancelled)
      105    0.000    0.000    0.001    0.000 pathlib.py:711(joinpath)
     1360    0.000    0.000    0.001    0.000 git.py:140(get_commit_from_gitpython)
     4086    0.001    0.000    0.001    0.000 {built-in method _warnings._filters_mutated}
      527    0.000    0.000    0.001    0.000 posixpath.py:60(isabs)
     1051    0.001    0.000    0.001    0.000 {built-in method posix._path_normpath}
       53    0.001    0.000    0.001    0.000 {built-in method posix.kill}
     3136    0.001    0.000    0.001    0.000 {method 'keys' of 'dict' objects}
     1416    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
     1365    0.000    0.000    0.000    0.000 {method 'put' of '_queue.SimpleQueue' objects}
      107    0.000    0.000    0.000    0.000 pathlib.py:380(with_segments)
     1360    0.000    0.000    0.000    0.000 threading.py:124(RLock)
      105    0.000    0.000    0.000    0.000 _collections_abc.py:811(__contains__)
       15    0.000    0.000    0.000    0.000 {method 'set_result' of '_asyncio.Future' objects}
     1672    0.000    0.000    0.000    0.000 {method 'values' of 'dict' objects}
      105    0.000    0.000    0.000    0.000 loose.py:77(__init__)
     2618    0.000    0.000    0.000    0.000 {built-in method _thread.get_native_id}
       53    0.000    0.000    0.000    0.000 {method 'close' of '_io.BufferedWriter' objects}
     2614    0.000    0.000    0.000    0.000 {built-in method _thread.daemon_threads_allowed}
     2638    0.000    0.000    0.000    0.000 {method 'locked' of '_thread.lock' objects}
      126    0.000    0.000    0.000    0.000 {built-in method posix.getppid}
      104    0.000    0.000    0.000    0.000 base.py:288(abspath)
     1467    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
      106    0.000    0.000    0.000    0.000 base.py:633(_get_config_path)
        3    0.000    0.000    0.000    0.000 config.py:664(_write)
      104    0.000    0.000    0.000    0.000 mman.py:408(collect)
        2    0.000    0.000    0.000    0.000 zmqstream.py:583(_handle_events)
     2724    0.000    0.000    0.000    0.000 subprocess.py:1311(_on_error_fd_closer)
      315    0.000    0.000    0.000    0.000 posixpath.py:256(expanduser)
     1361    0.000    0.000    0.000    0.000 {built-in method posix.waitstatus_to_exitcode}
       13    0.000    0.000    0.000    0.000 iostream.py:127(_event_pipe_gc)
     1361    0.000    0.000    0.000    0.000 {built-in method posix.WIFSTOPPED}
       45    0.000    0.000    0.000    0.000 config.py:668(write_section)
     2753    0.000    0.000    0.000    0.000 {built-in method time.monotonic}
     1360    0.000    0.000    0.000    0.000 _base.py:398(__get_result)
       15    0.000    0.000    0.000    0.000 base_events.py:784(call_soon)
       13    0.000    0.000    0.000    0.000 base_events.py:742(call_later)
        1    0.000    0.000    0.000    0.000 asyncio.py:200(_handle_events)
      313    0.000    0.000    0.000    0.000 util.py:455(_is_cygwin_git)
      105    0.000    0.000    0.000    0.000 base.py:113(__init__)
     1362    0.000    0.000    0.000    0.000 {built-in method sys.audit}
     1364    0.000    0.000    0.000    0.000 {method 'get_nowait' of '_queue.SimpleQueue' objects}
       13    0.000    0.000    0.000    0.000 base_events.py:766(call_at)
      104    0.000    0.000    0.000    0.000 mman.py:303(_collect_lru_region)
      2/1    0.000    0.000    0.000    0.000 ioloop.py:742(_run_callback)
      315    0.000    0.000    0.000    0.000 posixpath.py:320(expandvars)
        2    0.000    0.000    0.000    0.000 zmqstream.py:624(_handle_recv)
     1307    0.000    0.000    0.000    0.000 diff.py:172(_process_diff_args)
     1339    0.000    0.000    0.000    0.000 parse.py:421(_checknetloc)
     1362    0.000    0.000    0.000    0.000 _collections_abc.py:845(__init__)
       58    0.000    0.000    0.000    0.000 cmd.py:1450(transform_kwarg)
     1361    0.000    0.000    0.000    0.000 _base.py:337(_invoke_callbacks)
      105    0.000    0.000    0.000    0.000 configparser.py:1142(_convert_to_boolean)
       16    0.000    0.000    0.000    0.000 base_events.py:813(_call_soon)
        2    0.000    0.000    0.000    0.000 zmqstream.py:556(_run_callback)
        2    0.000    0.000    0.000    0.000 iostream.py:157(_handle_event)
        2    0.000    0.000    0.000    0.000 iostream.py:276(<lambda>)
        2    0.000    0.000    0.000    0.000 iostream.py:278(_really_send)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1360    0.000    0.000    0.000    0.000 thread.py:48(__init__)
      104    0.000    0.000    0.000    0.000 base.py:351(__ne__)
        2    0.000    0.000    0.000    0.000 socket.py:703(send_multipart)
      943    0.000    0.000    0.000    0.000 {built-in method _stat.S_ISDIR}
     1360    0.000    0.000    0.000    0.000 commit.py:536(__init__)
     1339    0.000    0.000    0.000    0.000 parse.py:108(_noop)
        5    0.000    0.000    0.000    0.000 attrsettr.py:43(__getattr__)
       13    0.000    0.000    0.000    0.000 events.py:155(cancel)
     1361    0.000    0.000    0.000    0.000 {method '_is_owned' of '_thread.RLock' objects}
        3    0.000    0.000    0.000    0.000 zmqstream.py:663(_rebuild_io_state)
       29    0.000    0.000    0.000    0.000 events.py:36(__init__)
        1    0.000    0.000    0.000    0.000 git.py:92(_discover_main_branch)
       57    0.000    0.000    0.000    0.000 base_events.py:733(time)
        1    0.000    0.000    0.000    0.000 base.py:1041(active_branch)
        5    0.000    0.000    0.000    0.000 attrsettr.py:66(_get_attr_opt)
      869    0.000    0.000    0.000    0.000 ChangeCounter.ipynb:155(update_elems)
        1    0.000    0.000    0.000    0.000 zmqstream.py:694(<lambda>)
        1    0.000    0.000    0.000    0.000 symbolic.py:402(_get_reference)
        3    0.000    0.000    0.000    0.000 zmqstream.py:686(_update_handler)
       13    0.000    0.000    0.000    0.000 events.py:111(__init__)
       45    0.000    0.000    0.000    0.000 config.py:242(items_all)
       13    0.000    0.000    0.000    0.000 base_events.py:446(create_future)
       88    0.000    0.000    0.000    0.000 config.py:861(_value_to_string)
        2    0.000    0.000    0.000    0.000 pathlib.py:1228(resolve)
        2    0.000    0.000    0.000    0.000 util.py:1057(_release_lock)
        1    0.000    0.000    0.000    0.000 kernelbase.py:324(_flush)
        1    0.000    0.000    0.000    0.000 util.py:245(rmfile)
        2    0.000    0.000    0.000    0.000 posixpath.py:423(realpath)
        8    0.000    0.000    0.000    0.000 {built-in method posix.lstat}
      132    0.000    0.000    0.000    0.000 config.py:235(getall)
       16    0.000    0.000    0.000    0.000 enum.py:1545(__or__)
      104    0.000    0.000    0.000    0.000 base.py:346(__eq__)
        1    0.000    0.000    0.000    0.000 {built-in method posix.remove}
       43    0.000    0.000    0.000    0.000 {built-in method builtins.max}
        2    0.000    0.000    0.000    0.000 posixpath.py:432(_joinrealpath)
        1    0.000    0.000    0.000    0.000 conf.py:85(sanity_check_filters)
       72    0.000    0.000    0.000    0.000 enum.py:1538(_get_value)
       14    0.000    0.000    0.000    0.000 socket.py:626(send)
        1    0.000    0.000    0.000    0.000 zmqstream.py:427(flush)
       13    0.000    0.000    0.000    0.000 threading.py:515(release)
        1    0.000    0.000    0.000    0.000 kernelbase.py:302(poll_control_queue)
        8    0.000    0.000    0.000    0.000 enum.py:1556(__and__)
      210    0.000    0.000    0.000    0.000 {built-in method _io.text_encoding}
        8    0.000    0.000    0.000    0.000 threading.py:857(_maintain_shutdown_locks)
      211    0.000    0.000    0.000    0.000 {built-in method _stat.S_ISREG}
       59    0.000    0.000    0.000    0.000 {method 'find' of 'bytes' objects}
        1    0.000    0.000    0.000    0.000 selector_events.py:129(_read_from_self)
        3    0.000    0.000    0.000    0.000 config.py:238(items)
        2    0.000    0.000    0.000    0.000 pathlib.py:1398(expanduser)
       13    0.000    0.000    0.000    0.000 events.py:72(cancel)
       30    0.000    0.000    0.000    0.000 selector_events.py:750(_process_events)
        1    0.000    0.000    0.000    0.000 conf.py:302(_check_timezones)
       88    0.000    0.000    0.000    0.000 encoding.py:11(force_text)
       13    0.000    0.000    0.000    0.000 {built-in method _heapq.heappop}
        1    0.000    0.000    0.000    0.000 symbolic.py:879(from_path)
        1    0.000    0.000    0.000    0.000 conf.py:310(_replace_timezone)
        1    0.000    0.000    0.000    0.000 repository.py:44(__init__)
       31    0.000    0.000    0.000    0.000 enum.py:720(__call__)
       59    0.000    0.000    0.000    0.000 {method 'rstrip' of 'bytes' objects}
        2    0.000    0.000    0.000    0.000 socket.py:774(recv_multipart)
        2    0.000    0.000    0.000    0.000 pathlib.py:835(stat)
      107    0.000    0.000    0.000    0.000 {built-in method builtins.issubclass}
        1    0.000    0.000    0.000    0.000 thread.py:127(__init__)
        2    0.000    0.000    0.000    0.000 {method 'recv' of '_socket.socket' objects}
        4    0.000    0.000    0.000    0.000 weakref.py:369(remove)
        1    0.000    0.000    0.000    0.000 queues.py:186(put)
      105    0.000    0.000    0.000    0.000 base.py:70(__init__)
        1    0.000    0.000    0.000    0.000 queues.py:209(put_nowait)
        1    0.000    0.000    0.000    0.000 conf.py:26(__init__)
        1    0.000    0.000    0.000    0.000 asyncio.py:225(add_callback)
        1    0.000    0.000    0.000    0.000 futures.py:396(_call_set_state)
        1    0.000    0.000    0.000    0.000 conf.py:202(build_args)
        2    0.000    0.000    0.000    0.000 repository.py:154(_is_remote)
       13    0.000    0.000    0.000    0.000 {built-in method _heapq.heappush}
        1    0.000    0.000    0.000    0.000 zmqstream.py:468(update_flag)
       15    0.000    0.000    0.000    0.000 {built-in method _contextvars.copy_context}
        4    0.000    0.000    0.000    0.000 zmqstream.py:542(sending)
        2    0.000    0.000    0.000    0.000 typing.py:1221(__instancecheck__)
        1    0.000    0.000    0.000    0.000 base_events.py:837(call_soon_threadsafe)
       15    0.000    0.000    0.000    0.000 {method 'cancelled' of '_asyncio.Future' objects}
       35    0.000    0.000    0.000    0.000 {method 'popleft' of 'collections.deque' objects}
       31    0.000    0.000    0.000    0.000 enum.py:1123(__new__)
        4    0.000    0.000    0.000    0.000 queue.py:97(empty)
        2    0.000    0.000    0.000    0.000 traitlets.py:676(__get__)
        1    0.000    0.000    0.000    0.000 {method 'replace' of 'datetime.datetime' objects}
        1    0.000    0.000    0.000    0.000 queues.py:225(get)
        2    0.000    0.000    0.000    0.000 typing.py:1492(__subclasscheck__)
       14    0.000    0.000    0.000    0.000 {built-in method builtins.min}
       14    0.000    0.000    0.000    0.000 {built-in method _asyncio.get_running_loop}
        1    0.000    0.000    0.000    0.000 concurrent.py:182(future_set_result_unless_cancelled)
       57    0.000    0.000    0.000    0.000 base_events.py:2010(get_debug)
        1    0.000    0.000    0.000    0.000 threading.py:457(__init__)
        8    0.000    0.000    0.000    0.000 {method 'difference_update' of 'set' objects}
        1    0.000    0.000    0.000    0.000 git.py:343(__del__)
        1    0.000    0.000    0.000    0.000 selector_events.py:141(_write_to_self)
        1    0.000    0.000    0.000    0.000 __init__.py:230(utcoffset)
        1    0.000    0.000    0.000    0.000 poll.py:80(poll)
        4    0.000    0.000    0.000    0.000 weakref.py:427(__setitem__)
     28/1    1.717    0.061    0.000    0.000 {method 'control' of 'select.kqueue' objects}
        1    0.000    0.000    0.000    0.000 conf.py:72(_check_only_one_from_commit)
        1    0.000    0.000    0.000    0.000 configparser.py:869(set)
        1    0.000    0.000    0.000    0.000 {method 'send' of '_socket.socket' objects}
       29    0.000    0.000    0.000    0.000 base_events.py:538(_check_closed)
        1    0.000    0.000    0.000    0.000 reference.py:54(__init__)
        4    0.000    0.000    0.000    0.000 typing.py:1285(__hash__)
        8    0.000    0.000    0.000    0.000 {method 'partition' of 'str' objects}
        1    0.000    0.000    0.000    0.000 conf.py:79(_check_only_one_to_commit)
        2    0.000    0.000    0.000    0.000 queues.py:322(_consume_expired)
        2    0.000    0.000    0.000    0.000 base_events.py:1900(_add_callback)
        8    0.000    0.000    0.000    0.000 conf.py:43(set_value)
        1    0.000    0.000    0.000    0.000 util.py:1009(__del__)
        2    0.000    0.000    0.000    0.000 abc.py:121(__subclasscheck__)
        2    0.000    0.000    0.000    0.000 iostream.py:216(_check_mp_mode)
        1    0.000    0.000    0.000    0.000 queues.py:256(get_nowait)
        1    0.000    0.000    0.000    0.000 reference.py:120(name)
       13    0.000    0.000    0.000    0.000 base_events.py:1910(_timer_handle_cancelled)
        1    0.000    0.000    0.000    0.000 queues.py:317(__put_internal)
        2    0.000    0.000    0.000    0.000 traitlets.py:629(get)
        3    0.000    0.000    0.000    0.000 config.py:756(_assure_writable)
        2    0.000    0.000    0.000    0.000 iostream.py:213(_is_master_process)
        1    0.000    0.000    0.000    0.000 {method 'reverse' of 'list' objects}
        8    0.000    0.000    0.000    0.000 {built-in method _stat.S_ISLNK}
        4    0.000    0.000    0.000    0.000 queue.py:209(_qsize)
        1    0.000    0.000    0.000    0.000 conf.py:61(_sanity_check_repos)
        2    0.000    0.000    0.000    0.000 conf.py:192(only_one_filter)
        5    0.000    0.000    0.000    0.000 {method 'upper' of 'str' objects}
        2    0.000    0.000    0.000    0.000 util.py:1012(_lock_file_path)
        2    0.000    0.000    0.000    0.000 {built-in method _abc._abc_subclasscheck}
        1    0.000    0.000    0.000    0.000 conf.py:151(get_starting_commit)
        1    0.000    0.000    0.000    0.000 conf.py:175(get_ending_commit)
        1    0.000    0.000    0.000    0.000 conf.py:123(_check_correct_filters_order)
        8    0.000    0.000    0.000    0.000 util.py:1016(_has_lock)
        4    0.000    0.000    0.000    0.000 {built-in method builtins.hash}
        5    0.000    0.000    0.000    0.000 zmqstream.py:538(receiving)
        1    0.000    0.000    0.000    0.000 queues.py:312(_put)
        3    0.000    0.000    0.000    0.000 git.py:63(repo)
        1    0.000    0.000    0.000    0.000 queues.py:173(qsize)
        2    0.000    0.000    0.000    0.000 selectors.py:275(_key_from_fd)
        1    0.000    0.000    0.000    0.000 poll.py:31(register)
        1    0.000    0.000    0.000    0.000 pathlib.py:583(name)
        1    0.000    0.000    0.000    0.000 queues.py:309(_get)
        1    0.000    0.000    0.000    0.000 unix_events.py:81(_process_self_data)
        1    0.000    0.000    0.000    0.000 util.py:1005(__init__)
        2    0.000    0.000    0.000    0.000 {built-in method posix.getpid}
        1    0.000    0.000    0.000    0.000 configparser.py:649(has_section)
        2    0.000    0.000    0.000    0.000 iostream.py:255(closed)
        1    0.000    0.000    0.000    0.000 {method 'isalpha' of 'str' objects}
        1    0.000    0.000    0.000    0.000 queues.py:177(empty)
        1    0.000    0.000    0.000    0.000 codecs.py:186(__init__)
        1    0.000    0.000    0.000    0.000 {method 'isascii' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'done' of '_asyncio.Future' objects}
        1    0.000    0.000    0.000    0.000 base_events.py:719(is_closed)
        1    0.000    0.000    0.000    0.000 _base.py:643(__enter__)
        1    0.000    0.000    0.000    0.000 zmqstream.py:659(_check_closed)
        1    0.000    0.000    0.000    0.000 queues.py:59(_set_timeout)
        1    0.000    0.000    0.000    0.000 configparser.py:338(before_set)
        1    0.000    0.000    0.000    0.000 locks.py:224(clear)

Yes, that's an awful lot of functions, but we can quickly narrow things down. The cumtime column is sorted by largest values first. We see that the debuggingbook_change_counter() method at the top takes up all the time – but this is not surprising, since it is the method we called in the first place. This calls a method mine() in the ChangeCounter class, which does all the work.

The next places are more interesting: almost all time is spent in a single method, named modifications(). This method determines the difference between two versions, which is an expensive operation; this is also supported by the observation that half of the time is spent in a diff() method.

This profile thus already gets us a hint on how to improve performance: Rather than computing the diff between versions for every version, we could do so on demand (and possibly cache results so we don't have to compute them twice). Alas, this (slow) functionality is part of the underlying PyDriller Python package, so we cannot fix this within the ChangeCounter class. But we could file a bug with the developers, suggesting a patch to improve performance.

Sampling Execution Profiles

Instrumenting code is precise, but it is also slow. An alternate way to measure performance is to sample in regular intervals which functions are currently active – for instance, by examining the current function call stack. The more frequently a function is sampled as active, the more time is spent in that function.

One profiler for Python that implements such sampling is Scalene – a high-performance, high-precision CPU, GPU, and memory profiler for Python. We can invoke it on our example as follows:

$ scalene --html test.py > scalene-out.html

where test.py is a script that again invokes

debuggingbook_change_counter(ChangeCounter)

The output of scalene is sent to a HTML file (here, scalene-out.html) which is organized by lines – that is, for each line, we see how much it contributed to overall execution time. Opening the output scalene-out.html in a HTML browser, we see these lines:

As with cProfile, above, we identify the mine() method in the ChangeCounter class as the main performance hog – and in the mine() method, it is the iteration over all modifications that takes all the time. Adding the option --profile-all to scalene would extend the profile to all executed code, including the pydriller third-party library.

Besides relying on sampling rather that tracing (which is more efficient) and breaking down execution time by line, scalene also provides additional information on memory usage and more. If cProfile is not sufficient, then scalene will bring profiling to the next level.

Improving Performance

Identifying a culprit is not always that easy. Notably, when the first set of obvious performance hogs is fixed, it becomes more and more difficult to squeeze out additional performance – and, as stated above, such optimization may be in conflict with readability and maintainability of your code. Here are some simple ways to improve performance:

  • Efficient algorithms. For many tasks, the simplest algorithm is not always the best performing one. Consider alternatives that may be more efficient, and measure whether they pay off.

  • Efficient data types. Remember that certain operations, such as looking up whether an element is contained, may take different amounts of time depending on the data structure. In Python, a query like x in xs takes (mostly) constant time if xs is a set, but linear time if xs is a list; these differences become significant as the size of xs grows.

  • Efficient modules. In Python, most frequently used modules (or at least parts of) are implemented in C, which is way more efficient than plain Python. Rely on existing modules whenever possible. Or implement your own, after having measured that this may pay off.

These are all things you can already use during programming – and also set up your code such that exchanging, say, one data type by another will still be possible later. This is best achieved by hiding implementation details (such as the used data types) behind an abstract interface used by your clients.

But beyond these points, remember the famous words by Donald E. Knuth:

from bookutils import quiz

Quiz

Donald E. Knuth said: "Premature optimization..."





This quote should always remind us that after a good design, you should always first measure and then optimize.

Building a Profiler

Having discussed profilers from a user perspective, let us now dive into how they are actually implemented. It turns out we can use most of our existing infrastructure to implement a simple tracing profiler with only a few lines of code.

The program we will apply our profiler on is – surprise! – our ongoing example, remove_html_markup(). Our aim is to understand how much time is spent in each line of the code (such that we have a new feature on top of Python cProfile).

from Intro_Debugging import remove_html_markup
print_content(inspect.getsource(remove_html_markup), '.py',
              start_line_number=238)
238  def remove_html_markup(s):  # type: ignore
239      tag = False
240      quote = False
241      out = ""
242  
243      for c in s:
244          assert tag or not quote
245  
246          if c == '<' and not quote:
247              tag = True
248          elif c == '>' and not quote:
249              tag = False
250          elif (c == '"' or c == "'") and tag:
251              quote = not quote
252          elif not tag:
253              out = out + c
254  
255      return out

We introduce a class PerformanceTracer that tracks, for each line in the code:

  • how often it was executed (hits), and
  • how much time was spent during its execution (time).

To this end, we make use of our Timer class, which measures time, and the Tracer class from the chapter on tracing, which allows us to track every line of the program as it is being executed.

from Tracer import Tracer

In PerformanceTracker, the attributes hits and time are mappings indexed by unique locations – that is, pairs of function name and line number.

Location = Tuple[str, int]
class PerformanceTracer(Tracer):
    """Trace time and #hits for individual program lines"""

    def __init__(self) -> None:
        """Constructor."""
        super().__init__()
        self.reset_timer()
        self.hits: Dict[Location, int] = {}
        self.time: Dict[Location, float] = {}

    def reset_timer(self) -> None:
        self.timer = Timer.Timer()

As common in this book, we want to use PerformanceTracer in a with-block around the function call(s) to be tracked:

with PerformanceTracer() as perf_tracer:
    function(...)

When entering the with block (__enter__()), we reset all timers. Also, coming from the __enter__() method of the superclass Tracer, we enable tracing through the traceit() method.

from types import FrameType
class PerformanceTracer(PerformanceTracer):
    def __enter__(self) -> Any:
        """Enter a `with` block."""
        super().__enter__()
        self.reset_timer()
        return self

The traceit() method extracts the current location. It increases the corresponding hits value by 1, and adds the elapsed time to the corresponding time.

class PerformanceTracer(PerformanceTracer):
    def traceit(self, frame: FrameType, event: str, arg: Any) -> None:
        """Tracing function; called for every line."""
        t = self.timer.elapsed_time()
        location = (frame.f_code.co_name, frame.f_lineno)

        self.hits.setdefault(location, 0)
        self.time.setdefault(location, 0.0)
        self.hits[location] += 1
        self.time[location] += t

        self.reset_timer()

This is it already. We can now determine where most time is spent in remove_html_markup(). We invoke it 10,000 times such that we can average over runs:

with PerformanceTracer() as perf_tracer:
    for i in range(10000):
        s = remove_html_markup('<b>foo</b>')

Here are the hits. For every line executed, we see how often it was executed. The most executed line is the for loop with 110,000 hits – once for each of the 10 characters in <b>foo</b>, once for the final check, and all of this 10,000 times.

perf_tracer.hits
{('__init__', 17): 1,
 ('__init__', 19): 1,
 ('clock', 8): 1,
 ('clock', 12): 2,
 ('__init__', 20): 2,
 ('remove_html_markup', 238): 10000,
 ('remove_html_markup', 239): 10000,
 ('remove_html_markup', 240): 10000,
 ('remove_html_markup', 241): 10000,
 ('remove_html_markup', 243): 110000,
 ('remove_html_markup', 244): 100000,
 ('remove_html_markup', 246): 100000,
 ('remove_html_markup', 247): 20000,
 ('remove_html_markup', 248): 80000,
 ('remove_html_markup', 250): 60000,
 ('remove_html_markup', 252): 60000,
 ('remove_html_markup', 249): 20000,
 ('remove_html_markup', 253): 30000,
 ('remove_html_markup', 255): 20000}

The time attribute collects how much time was spent in each line. Within the loop, again, the for statement takes the most time. The other lines show some variability, though.

perf_tracer.time
{('__init__', 17): 0.00020849995780736208,
 ('__init__', 19): 1.374981366097927e-06,
 ('clock', 8): 1.2080417945981026e-06,
 ('clock', 12): 1.5009427443146706e-06,
 ('__init__', 20): 1.5830155462026596e-06,
 ('remove_html_markup', 238): 0.012197923264466226,
 ('remove_html_markup', 239): 0.013060983153991401,
 ('remove_html_markup', 240): 0.012579386122524738,
 ('remove_html_markup', 241): 0.011003078776411712,
 ('remove_html_markup', 243): 0.07900447177235037,
 ('remove_html_markup', 244): 0.07023449544794858,
 ('remove_html_markup', 246): 0.06870396924205124,
 ('remove_html_markup', 247): 0.014365122187882662,
 ('remove_html_markup', 248): 0.05404824938159436,
 ('remove_html_markup', 250): 0.04095112928189337,
 ('remove_html_markup', 252): 0.04195001139305532,
 ('remove_html_markup', 249): 0.013921678997576237,
 ('remove_html_markup', 253): 0.020369785837829113,
 ('remove_html_markup', 255): 0.013756285421550274}

For a full profiler, these numbers would now be sorted and printed in a table, much like cProfile does. However, we will borrow some material from previous chapters and annotate our code accordingly.

Visualizing Performance Metrics

In the chapter on statistical debugging, we have encountered the CoverageCollector class, which collects line and function coverage during execution, using a collect() method that is invoked for every line. We will repurpose this class to collect arbitrary metrics on the lines executed, notably time taken.

Collecting Time Spent

from StatisticalDebugger import CoverageCollector, SpectrumDebugger

The MetricCollector class is an abstract superclass that provides an interface to access a particular metric.

class MetricCollector(CoverageCollector):
    """Abstract superclass for collecting line-specific metrics"""

    def metric(self, event: Any) -> Optional[float]:
        """Return a metric for an event, or none."""
        return None

    def all_metrics(self, func: str) -> List[float]:
        """Return all metric for a function `func`."""
        return []

Given these metrics, we can also compute sums and maxima for a single function.

class MetricCollector(MetricCollector):
    def total(self, func: str) -> float:
        return sum(self.all_metrics(func))

    def maximum(self, func: str) -> float:
        return max(self.all_metrics(func))

Let us instantiate this superclass into TimeCollector – a subclass that measures time. This is modeled after our PerformanceTracer class, above; notably, the time attribute serves the same role.

class TimeCollector(MetricCollector):
    """Collect time executed for each line"""

    def __init__(self) -> None:
        """Constructor"""
        super().__init__()
        self.reset_timer()
        self.time: Dict[Location, float] = {}
        self.add_items_to_ignore([Timer.Timer, Timer.clock])

    def collect(self, frame: FrameType, event: str, arg: Any) -> None:
        """Invoked for every line executed. Accumulate time spent."""
        t = self.timer.elapsed_time()
        super().collect(frame, event, arg)
        location = (frame.f_code.co_name, frame.f_lineno)

        self.time.setdefault(location, 0.0)
        self.time[location] += t

        self.reset_timer()

    def reset_timer(self) -> None:
        self.timer = Timer.Timer()

    def __enter__(self) -> Any:
        super().__enter__()
        self.reset_timer()
        return self

The metric() and all_metrics() methods accumulate the metric (time taken) for an individual function:

class TimeCollector(TimeCollector):
    def metric(self, location: Any) -> Optional[float]:
        if location in self.time:
            return self.time[location]
        else:
            return None

    def all_metrics(self, func: str) -> List[float]:
        return [time
                for (func_name, lineno), time in self.time.items()
                if func_name == func]

Here's how to use TimeCollector() – again, in a with block:

with TimeCollector() as collector:
    for i in range(100):
        s = remove_html_markup('<b>foo</b>')

The time attribute holds the time spent in each line:

for location, time_spent in collector.time.items():
    print(location, time_spent)
('remove_html_markup', 238) 0.0003748020390048623
('remove_html_markup', 239) 0.0002687927335500717
('remove_html_markup', 240) 0.00025362137239426374
('remove_html_markup', 241) 0.00021975429262965918
('remove_html_markup', 243) 0.0016262800199910998
('remove_html_markup', 244) 0.0014232781250029802
('remove_html_markup', 246) 0.0014147617621347308
('remove_html_markup', 247) 0.0002851528115570545
('remove_html_markup', 248) 0.0011260181199759245
('remove_html_markup', 250) 0.0008431547321379185
('remove_html_markup', 252) 0.0008588926866650581
('remove_html_markup', 249) 0.000286611495539546
('remove_html_markup', 253) 0.0004212367348372936
('remove_html_markup', 255) 0.00027999025769531727

And we can also create a total for an entire function:

collector.total('remove_html_markup')
0.00968234718311578

Visualizing Time Spent

Let us now go and visualize these numbers in a simple form. The idea is to assign each line a color whose saturation indicates the time spent in that line relative to the time spent in the function overall – the higher the fraction, the darker the line. We create a MetricDebugger class built as a specialization of SpectrumDebugger, in which suspiciousness() and color() are repurposed to show these metrics.

class MetricDebugger(SpectrumDebugger):
    """Visualize a metric"""

    def metric(self, location: Location) -> float:
        sum = 0.0
        for outcome in self.collectors:
            for collector in self.collectors[outcome]:
                assert isinstance(collector, MetricCollector)
                m = collector.metric(location)
                if m is not None:
                    sum += m

        return sum

    def total(self, func_name: str) -> float:
        total = 0.0
        for outcome in self.collectors:
            for collector in self.collectors[outcome]:
                assert isinstance(collector, MetricCollector)
                total += sum(collector.all_metrics(func_name))

        return total

    def maximum(self, func_name: str) -> float:
        maximum = 0.0
        for outcome in self.collectors:
            for collector in self.collectors[outcome]:
                assert isinstance(collector, MetricCollector)
                maximum = max(maximum, 
                              max(collector.all_metrics(func_name)))

        return maximum

    def suspiciousness(self, location: Location) -> float:
        func_name, _ = location
        return self.metric(location) / self.total(func_name)

    def color(self, location: Location) -> str:
        func_name, _ = location
        hue = 240  # blue
        saturation = 100  # fully saturated
        darkness = self.metric(location) / self.maximum(func_name)
        lightness = 100 - darkness * 25
        return f"hsl({hue}, {saturation}%, {lightness}%)"

    def tooltip(self, location: Location) -> str:
        return f"{super().tooltip(location)} {self.metric(location)}"

We can now introduce PerformanceDebugger as a subclass of MetricDebugger, using an arbitrary MetricCollector (such as TimeCollector) to obtain the metric we want to visualize.

class PerformanceDebugger(MetricDebugger):
    """Collect and visualize a metric"""

    def __init__(self, collector_class: Type, log: bool = False):
        assert issubclass(collector_class, MetricCollector)
        super().__init__(collector_class, log=log)

With PerformanceDebugger, we inherit all the capabilities of SpectrumDebugger, such as showing the (relative) percentage of time spent in a table. We see that the for condition and the following assert take most of the time, followed by the first condition.

with PerformanceDebugger(TimeCollector) as debugger:
    for i in range(100):
        s = remove_html_markup('<b>foo</b>')
print(debugger)
 238   3% def remove_html_markup(s):  # type: ignore
 239   2%     tag = False
 240   2%     quote = False
 241   2%     out = ""
 242   0%
 243  16%     for c in s:
 244  14%         assert tag or not quote
 245   0%
 246  15%         if c == '<' and not quote:
 247   3%             tag = True
 248  11%         elif c == '>' and not quote:
 249   2%             tag = False
 250  10%         elif (c == '"' or c == "'") and tag:
 251   0%             quote = not quote
 252   8%         elif not tag:
 253   4%             out = out + c
 254   0%
 255   2%     return out

However, we can also visualize these percentages, using shades of blue to indicate those lines most time spent in:

debugger
 238 def remove_html_markup(s):  # type: ignore
 239     tag = False
 240     quote = False
 241     out = ""
 242  
 243     for c in s:
 244         assert tag or not quote
 245  
 246         if c == '<' and not quote:
 247             tag = True
 248         elif c == '>' and not quote:
 249             tag = False
 250         elif (c == '"' or c == "'") and tag:
 251             quote = not quote
 252         elif not tag:
 253             out = out + c
 254  
 255     return out

Other Metrics

Our framework is flexible enough to collect (and visualize) arbitrary metrics. This HitCollector class, for instance, collects how often a line is being executed.

class HitCollector(MetricCollector):
    """Collect how often a line is executed"""

    def __init__(self) -> None:
        super().__init__()
        self.hits: Dict[Location, int] = {}

    def collect(self, frame: FrameType, event: str, arg: Any) -> None:
        super().collect(frame, event, arg)
        location = (frame.f_code.co_name, frame.f_lineno)

        self.hits.setdefault(location, 0)
        self.hits[location] += 1

    def metric(self, location: Location) -> Optional[int]:
        if location in self.hits:
            return self.hits[location]
        else:
            return None

    def all_metrics(self, func: str) -> List[float]:
        return [hits
                for (func_name, lineno), hits in self.hits.items()
                if func_name == func]

We can plug in this class into PerformanceDebugger to obtain a distribution of lines executed:

with PerformanceDebugger(HitCollector) as debugger:
    for i in range(100):
        s = remove_html_markup('<b>foo</b>')

In total, during this call to remove_html_markup(), there are 6,400 lines executed:

debugger.total('remove_html_markup')
6400.0

Again, we can visualize the distribution as a table and using colors. We can see how the shade gets lighter in the lower part of the loop as individual conditions have been met.

print(debugger)
 238   1% def remove_html_markup(s):  # type: ignore
 239   1%     tag = False
 240   1%     quote = False
 241   1%     out = ""
 242   0%
 243  17%     for c in s:
 244  15%         assert tag or not quote
 245   0%
 246  15%         if c == '<' and not quote:
 247   3%             tag = True
 248  12%         elif c == '>' and not quote:
 249   3%             tag = False
 250   9%         elif (c == '"' or c == "'") and tag:
 251   0%             quote = not quote
 252   9%         elif not tag:
 253   4%             out = out + c
 254   0%
 255   3%     return out
debugger
 238 def remove_html_markup(s):  # type: ignore
 239     tag = False
 240     quote = False
 241     out = ""
 242  
 243     for c in s:
 244         assert tag or not quote
 245  
 246         if c == '<' and not quote:
 247             tag = True
 248         elif c == '>' and not quote:
 249             tag = False
 250         elif (c == '"' or c == "'") and tag:
 251             quote = not quote
 252         elif not tag:
 253             out = out + c
 254  
 255     return out

Integrating with Delta Debugging

Besides identifying causes for performance issues in the code, one may also search for causes in the input, using Delta Debugging. This can be useful if one does not immediately want to embark into investigating the code, but maybe first determine external influences that are related to performance issues.

Here is a variant of remove_html_markup() that introduces a (rather obvious) performance issue.

import time
def remove_html_markup_ampersand(s: str) -> str:
    tag = False
    quote = False
    out = ""

    for c in s:
        assert tag or not quote

        if c == '&':
            time.sleep(0.1)  # <-- the obvious performance issue

        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:
            quote = not quote
        elif not tag:
            out = out + c

    return out

We can easily trigger this issue by measuring time taken:

with Timer.Timer() as t:
    remove_html_markup_ampersand('&&&')
t.elapsed_time()
0.31450774997938424

Let us set up a test that checks whether the performance issue is present.

def remove_html_test(s: str) -> None:
    with Timer.Timer() as t:
        remove_html_markup_ampersand(s)
    assert t.elapsed_time() < 0.1

We can now apply delta debugging to determine a minimum input that causes the failure:

s_fail = '<b>foo&amp;</b>'
with DeltaDebugger.DeltaDebugger() as dd:
    remove_html_test(s_fail)
dd.min_args()
{'s': '&'}

For performance issues, however, a minimal input is often not enough to highlight the failure cause. This is because short inputs tend to take less processing time than longer inputs, which increases the risks of a spurious diagnosis. A better alternative is to compute a maximum input where the issue does not occur:

s_pass = dd.max_args()
s_pass
{'s': '<b>fooamp;</b>'}

We see that the culprit character (the &) is removed. This tells us the failure-inducing difference – or, more precisely, the cause for the performance issue.

Lessons Learned

  • To measure performance,
    • instrument the code such that the time taken per function (or line) is collected; or
    • sample the execution that at regular intervals, the active call stack is collected.
  • To make code performant, focus on efficient algorithms, efficient data types, and sufficient abstraction such that you can replace them by alternatives.
  • Beyond efficient algorithms and data types, do not optimize before measuring.

Next Steps

This chapter concludes the part on abstracting failures. The next part will focus on

Background

Scalene is a high-performance, high-precision CPU, GPU, and memory profiler for Python. In contrast to the standard Python cProfile profiler, it uses sampling instead of instrumentation or relying on Python's tracing facilities; and it also supports line-by-line profiling. Scalene might be the tool of choice if you want to go beyond basic profiling.

The Wikipedia articles on profiling) and performance analysis tools provide several additional resources on profiling tools and how to apply them in practice.

Exercises

Exercise 1: Profiling Memory Usage

The Python tracemalloc module allows tracking memory usage during execution. Between tracemalloc.start() and tracemalloc.end(), use tracemalloc.get_traced_memory() to obtain how much memory is currently being consumed:

import tracemalloc
tracemalloc.start()
current_size, peak_size = tracemalloc.get_traced_memory()
current_size
23083
tracemalloc.stop()

Create a subclass of MetricCollector named MemoryCollector. Make it measure the memory consumption before and after each line executed (0 if negative), and visualize the impact of individual lines on memory. Create an appropriate test program that (temporarily) consumes larger amounts of memory.

Exercise 2: Statistical Performance Debugging

In a similar way as we integrated a binary "performance test" with delta debugging, we can also integrate such a test with other techniques. Combining a performance test with Statistical Debugging, for instance, will highlight those lines whose execution correlates with low performance. But then, the performance test need not be binary, as with functional pass/fail tests – you can also weight individual lines by how much they impact performance. Create a variant of StatisticalDebugger that reflects the impact of individual lines on an arbitrary (summarized) performance metric.

Creative Commons License The content of this project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The source code that is part of the content, as well as the source code used to format and display that content is licensed under the MIT License. Last change: 2023-11-16 15:49:39+01:00CiteImprint