Useful Sticky Notes

Showing posts with label Tools Development. Show all posts

Sunday, July 12, 2015

Automatic Performance Profiling for Python with cProfile

This is already extensively documented online in a lot of places, but I figured it is worth spreading a little further. Plus I think I have some additional perspectives to add from my own point of view.

Some Motivating Background
This post is motivated by some work I had been doing for the OpenWorm project, where one of the things we wanted was a quick way to generate and manage performance data for the purposes of performance regression on our test codes. This need is of course not restricted to Python codes, but as it turns out some of our earlier test codes are in Python, and Python's cProfile is one of the low-hanging fruits available for automatically gathering basic (and maybe more sophisticated - I'll elaborate later) performance profiles for Python.

Available Online Documentation
Profiling and cProfile documentation can be found from the Python documentation here:

https://docs.python.org/2/library/profile.html

Googling for "cProfile" should also result in a decent number of technical blog posts discussing this tool.

Usage
Usage is pretty straightforward and, in its most basic form, requires no changes to our Python code base. One simply includes the cProfile module on the command line, along with an output file parameter, as in the example below:

> python -m cProfile -o myPerfProfile.out myPythonCode.py

The output file myPerfProfile.out is a binary file; apparently the format is not 100% documented. As I'll elaborate later, it may contain more information than simply a flat function performance profile of your Python code. The -o output file parameter is optional: omitting it makes cProfile print a flat performance profile as plain text to stdout. While that is kind of convenient, it probably isn't what you'd want if you need to perform (even basic) useful performance analysis or regression analysis of your code (the latter is what our group needs).
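For completeness, the same binary file can also be produced programmatically from within a script, which is handy when only part of a run needs measuring. Here is a minimal sketch; busy_work is just a hypothetical stand-in for real code, and the temporary output path is mine, not anything cProfile mandates:

```python
import cProfile
import io
import os
import pstats
import tempfile

def busy_work(n):
    # Hypothetical stand-in for the code we actually want to profile.
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = busy_work(100000)
profiler.disable()

# Dump the same binary format that "python -m cProfile -o ..." produces.
out_path = os.path.join(tempfile.mkdtemp(), 'myPerfProfile.out')
profiler.dump_stats(out_path)

# The binary file can then be read back through pstats.
stream = io.StringIO()
stats = pstats.Stats(out_path, stream=stream)
stats.sort_stats('cumulative').print_stats('busy_work')
print(stream.getvalue())
```

The enable/disable pair is what lets you fence off just the region of interest instead of profiling the whole interpreter run.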

Getting What We Want
If we print the default text performance profile to stdout, it is displayed as a list of entries sorted by the names of the functions invoked. There are several reasons why this is not satisfactory for our purposes:

  1. We may only want to see performance information associated with user code (i.e., functions that our Python code invokes directly). The default output includes performance information associated with all library calls invoked on behalf of user code. In this scenario, we probably want the data sorted by cumulative execution time (i.e., the sum of time taken exclusively by our functions and time taken by all function calls made by our functions) and filtered appropriately, in order to figure out where most of the time is spent in our own code. From a performance analysis perspective, this grants us some basic information from which code-optimization decisions may be reached.
  2. We may want to see performance information in the form of absolute time spent - exactly which leaf functions take the most time. In this scenario, we want the data sorted by total execution time (also known as exclusive time), with no filtering required.
For our purposes, we would like to start the consideration of our performance regression needs from the perspective spelled out in point 1. The following Python script can be used to extract that information from the binary file generated in the prior section:

import sys
import pstats

if len(sys.argv) != 2:
    print('Performance data file required. You have %d arguments' % (len(sys.argv) - 1))
    sys.exit(1)

stats = pstats.Stats(sys.argv[1])

# This configuration generates somewhat more appropriate user-function focused data,
#   sorted by cumulative time, which includes the time of calls further down the callstack.
stats.sort_stats('cumulative', 'time', 'calls')
# stats.sort_stats('time', 'calls')

# This configuration filters the output so only user-functions will show up.
stats.print_stats('PyOpenWorm')
# stats.print_stats()
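To make the difference between the sort orders in points 1 and 2 concrete, the same Stats object can simply be re-sorted. Here is a small self-contained sketch along those lines; the leaf/caller functions are throwaway examples of mine, standing in for a real code base like PyOpenWorm:

```python
import cProfile
import io
import os
import pstats
import tempfile

def leaf(n):
    # A leaf function that accumulates most of the exclusive ('time') cost.
    return sum(x * x for x in range(n))

def caller(n):
    # Charged for leaf()'s time under the 'cumulative' sort.
    return leaf(n) + leaf(n)

profile_path = os.path.join(tempfile.mkdtemp(), 'demo.out')
cProfile.runctx('caller(200000)', globals(), globals(), profile_path)

stream = io.StringIO()
stats = pstats.Stats(profile_path, stream=stream)

# Point 1: cumulative time, filtered (by regex) down to our own functions.
stats.sort_stats('cumulative', 'time', 'calls').print_stats('caller|leaf')

# Point 2: total (exclusive) time, no filtering.
stats.sort_stats('time', 'calls').print_stats()
print(stream.getvalue())
```

Under the first sort, caller floats to the top because it is charged for everything beneath it; under the second, the leaf work dominates.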

Sunday, May 31, 2015

Some Brief Notes On -finstrument-functions

Been trying to get back to profiling some of my Mac OS X stuff, and I struggled a little bit to get this working. The main snafu on my part was testing a "simple" code naively. Not really thinking, I did:

gcc -finstrument-functions test.c instrument.c -o test

This results in a segfault, which on closer examination via lldb (the gdb equivalent on the Mac) shows a very, very deep stack trace. The reason this happens is that the instrumentation functions __cyg_profile_func_enter and __cyg_profile_func_exit themselves get instrumented, so each hook recursively triggers itself until the stack overflows.

There is a non-standard function attribute, GCC's no_instrument_function, that can be applied to the two instrumentation functions, but others appear to have run into scenarios where compilers would not recognize the attribute, resulting in the same segfault.

The solution is simple:

gcc -c instrument.c
gcc -finstrument-functions test.c instrument.o -o test

but this just tells me the support for profiling in this setup is pretty much an afterthought. Anyway, this really is a note-to-self in case I forget in the future, and need to figure this thing out again.

Friday, October 31, 2014

Quick Updates - Development Focus

I apologize for the lack of any new posts over the past few months. I'd been suffering from some health issues that limited my ability to focus on technical matters. It has gotten better now, and I am trying to do some catching up.

It has recently occurred to me that my background of working with HPC tools and codes, across a wide range of HPC software stacks and platforms, had left me with a development experience that is ad hoc and poorly structured. This realization was coupled with the understanding that Mac OS X is now the only really convenient development platform I possess. I can also do some work on my Windows machine via MinGW, but it is really set up as my gaming platform. I do have Ubuntu on VirtualBox on my MacBook Pro on which to do Linux development, but that is fairly inconvenient, and my living environment rules out picking up a dedicated Linux box as an alternative. Finally, I realized that Mac OS X's Darwin is not merely another Unix variant, but has its own set of internal design choices and implementation details that make any naive approach to tool development from a Linux-centered perspective unproductive.

As such, I plan on taking a step back after the long hiatus to properly acquaint myself with Mac OS X internals from the perspective of a tool developer. Over the next few months, I will attempt to document my experiences and what I have learned as I try to implement common HPC tools design idioms. These idioms will include basics like attaching/coupling tool runtimes with user code/binaries at varying degrees of transparency. They will also include library wrapping and override idioms. These idioms will perhaps form the basis for my next article, to give readers a sense of where I am coming from with regard to my own software development background involving tools.