Useful Sticky Notes

Sunday, July 12, 2015

Automatic Performance Profiling for Python with cProfile

This is extensively documented online in a lot of places, but I figured it is worth spreading a little further. Plus, I think I have some additional perspectives to add from my own experience.

Some Motivating Background
This post is motivated by some work I had been doing for the OpenWorm project, and one of the things that had been desired was a quick way to generate and manage performance data for the purposes of performance regression on our test codes. Of course this is not restricted to Python codes, but as it turns out some of our earlier test codes are in Python, and Python's cProfile is one of the low-hanging fruits available for the automatic gathering of basic (and maybe more sophisticated - I'll elaborate later) performance profiles for Python.

Available Online Documentation
Profiling and cProfile documentation can be found in the official Python documentation.

Googling for "cProfile" should also result in a decent number of technical blog posts discussing this tool.

Usage is pretty straightforward and, in its most basic form, requires no changes to our Python code base. One simply includes the cProfile module on the command line, along with an output file parameter, as in the example below (where myScript.py stands in for your own script):

> python -m cProfile -o myPerfProfile.out myScript.py

The output file myPerfProfile.out is a binary file whose format is apparently not 100% documented. As I'll elaborate later, it may contain more information than a simple flat function performance profile of your Python code. The output file parameter is itself optional: if it is omitted, cProfile prints a flat performance profile as plain text to stdout. While this is kind of convenient, it probably isn't what you'd want if you need to perform even basic performance analysis or regression analysis of your code (the latter is what our group needs).
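As a quick illustration of the stdout path, cProfile also accepts a -s flag to choose the sort order of the text report. A minimal sketch, where demo.py is a hypothetical stand-in script created just for this example:

```shell
# Create a tiny stand-in script to profile (hypothetical example)
cat > demo.py <<'EOF'
def work():
    return sum(i * i for i in range(100000))

work()
EOF

# No -o option: the flat profile is printed as text to stdout;
# -s cumulative sorts the report by cumulative time.
python -m cProfile -s cumulative demo.py
```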

Getting What We Want
When the default text performance profile is printed to stdout, it is displayed as a list of entries sorted by the names of the functions invoked. There are several reasons why this is not satisfactory for our purposes:

  1. We may only want to see performance information associated with user code (i.e., functions that our Python code invokes directly.) The default output includes performance information associated with all library calls invoked on behalf of user code. In this scenario, we probably want the data sorted by cumulative execution time (i.e., the sum of time taken exclusively by our functions and time taken by all function calls made by our functions) and filtered appropriately, in order to figure out where our own code spends most of its time. From a performance analysis perspective, this grants us some basic information from which code-optimization decisions may be reached.
  2. We may want to see performance information in the form of absolute time spent - exactly which leaf functions take the most time. In this scenario, we want the data sorted by total execution time (also known as exclusive time) with no filtering required.
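The two perspectives above amount to two different sort orders over the same profile data. Here is a minimal sketch using pstats programmatically; leaf() and caller() are made-up functions for illustration only:

```python
import cProfile
import io
import pstats

# Made-up workload: caller() spends nearly all of its time inside leaf().
def leaf():
    return sum(i * i for i in range(50000))

def caller():
    return [leaf() for _ in range(5)]

profiler = cProfile.Profile()
profiler.enable()
caller()
profiler.disable()

def report(sort_key):
    # Render the profile as text, sorted by the given key.
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats(sort_key).print_stats()
    return stream.getvalue()

# Perspective 1: cumulative time -- caller() ranks high because it
# includes the time of all calls made on its behalf.
cumulative_report = report('cumulative')

# Perspective 2: total (exclusive) time -- leaf() ranks high because
# that is where the work actually happens.
exclusive_report = report('tottime')

print(cumulative_report)
print(exclusive_report)
```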
For our purposes, we would like to start addressing our performance regression needs from the perspective spelled out in point 1. The following Python script can be used to extract that information from the binary file generated in the previous section:

import sys
import pstats

if len(sys.argv) != 2:
    print('Performance data file required. You have ' +
          str(len(sys.argv) - 1) + ' arguments')
    sys.exit(1)

stats = pstats.Stats(sys.argv[1])

# Sort by cumulative time, which includes the time of calls further
# down the call stack; 'calls' breaks ties by call count.
stats.sort_stats('cumulative', 'calls')

# Print the profile. Passing a string pattern to print_stats() would
# filter the output so that only user functions show up.
stats.print_stats()
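To make the filtering in point 1 concrete: any string argument passed to print_stats() is treated as a regular expression matched against each file:line(function) entry, so a pattern matching your own source files keeps user code and drops library and built-in calls. A minimal sketch, with my_user_function as a made-up name:

```python
import cProfile
import io
import pstats

# Made-up user function; the raw profile will also contain entries for
# built-ins called on its behalf (range, sorted, ...).
def my_user_function():
    return sorted(range(100000), reverse=True)

profiler = cProfile.Profile()
profiler.enable()
my_user_function()
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative')

# The string restriction keeps only entries whose description matches
# the regex, filtering built-in calls out of the report.
stats.print_stats('my_user_function')
report = stream.getvalue()
print(report)
```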
