Useful Sticky Notes

Wednesday, May 10, 2017

Work-around for exec() differences between Python 2 and Python 3

Recently I've had to navigate a Python 2 vs 3 compatibility issue involving the change in semantics of exec, which is a statement in Python 2 but a function in Python 3. I'll note here that I do not yet fully comprehend all of the deep issues and subtleties involved, so if anyone can enlighten me, please do so in the comments. I am also not certain that what I am implementing to achieve my goals is in fact a well-established (or even reasonable) coding idiom.

Anyway, in a project I am working on with the OpenWorm Foundation, I encountered a scenario where it was beneficial to parameterize an oft-repeated section of code. While handling a GET request in Django, I'd loop through all of our supported field types and invoke a function to process each field type if that field's signature shows up in the GET request.

The call to filter on a field in Django requires that the field be part of the code text, something like filter(myField__icontains=searchString). The searchString term is not a problem since it refers to a variable name. myField, however, is part of the code text and cannot be a variable string. So to achieve the parameterization I desired, I had to encapsulate the code in an exec() call like so: exec('filter(' + myField + '__icontains=searchString)')
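
For what it's worth, a commonly used Python idiom sidesteps exec() for exactly this situation: build the keyword argument name as a string and unpack a dictionary into the call. In Django this would look like filter(**{myField + '__icontains': searchString}). The sketch below shows only the Python mechanism; fake_filter() is a hypothetical stand-in for a Django filter call (so Django is not needed to run it), and the field name is made up.

```python
def fake_filter(**lookups):
    # Hypothetical stand-in for a Django QuerySet.filter() call;
    # it simply returns the keyword arguments it received.
    return lookups


def build_filter(my_field, search_string):
    # The keyword name is computed at runtime and passed via ** unpacking,
    # so no exec() is needed to splice the field name into the code text.
    return fake_filter(**{my_field + "__icontains": search_string})


print(build_filter("name", "worm"))  # prints {'name__icontains': 'worm'}
```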

A quick caveat: the above pseudo-code fragment works just fine in both Python 2 and 3; I used it only to give the context for why I am writing code of this nature. The real problem arises when the exec'd code assigns to a local variable of the enclosing function inside a loop, something akin to an accumulation operation. In Python 2's case, the code works as intended, since exec is a statement and its code effectively runs in place. In the case of Python 3, exec() is a function, and there are rules governing the way scoping works that I do not yet fully understand. Because of those rules, direct assignment to local function variables will not work.

To illustrate, here is a code fragment I wrote that more or less captures the nature of what I was trying to achieve in the production code:

In this case, the output looks like this:


There is a weird bug where the number doesn't come out right in the "correct" Python 2 case when "*" was supplied to the code, but I don't think that should distract us from the main problem here. Python 2 will report the expected result, but Python 3 won't.
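
A minimal Python 3 sketch of the failing pattern (the function and variable names are my own illustration, not the production code). My tentative understanding is that exec() called inside a function without explicit namespaces receives a dictionary snapshot of the function's local variables, so rebinding total inside the exec'd string only updates that dictionary, never the function's real local variable:

```python
def accumulate_with_exec(values):
    total = 0
    for v in values:
        # The exec'd code works on a dict snapshot of this function's locals.
        # Rebinding 'total' there does not touch the real local 'total'.
        exec("total = total + v")
    return total


print(accumulate_with_exec([1, 2, 3]))  # prints 0 under Python 3
```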

The workaround appears to be to assign into a mutable container such as a list. Somehow exec() allows a mutable object to be modified in place while remaining properly scoped and referenced, as in the following code:

Now Python 2 and Python 3 agree on the output and behavior:
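
A minimal Python 3 sketch of the list workaround, alongside one alternative I believe is more conventional: give exec() an explicit dictionary to use as its local namespace and read the result back from that dictionary. The Python 3 documentation for exec() recommends passing an explicit locals dictionary when you need to see the code's effects on locals after exec() returns. The names here are illustrative only. The list version works because the snapshot dictionary and the function both refer to the same list object, so an item assignment is visible to both:

```python
def accumulate_with_list(values):
    # Workaround from the post: keep the running value inside a mutable list.
    # exec() sees the same list object, so the item assignment sticks.
    total = [0]
    for v in values:
        exec("total[0] = total[0] + v")
    return total[0]


def accumulate_with_namespace(values):
    # Alternative (not from the original code): pass exec() an explicit
    # dict as its local namespace and read the result back from that dict.
    namespace = {"total": 0}
    for v in values:
        namespace["v"] = v
        exec("total = total + v", {}, namespace)
    return namespace["total"]


print(accumulate_with_list([1, 2, 3]))       # prints 6
print(accumulate_with_namespace([1, 2, 3]))  # prints 6
```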

Friday, August 19, 2016

Public Digital Credentials via Badge List

In the course of my recent volunteer work with the OpenWorm community, I've had the opportunity to explore the use of Badges as a community tool to establish digital credentials for members of the community.

The basic concept of Badges comes from an education perspective. The idea is to permit an open and transparent method of acknowledging one's online course credentials after completing courses, tutorials, and the like. For organizations like OpenWorm, it has the secondary benefit of being a way to identify expert mentors whom new volunteers could approach for help in appropriate technical domains.

As an open science organization, OpenWorm (http://openworm.org) has started to make use of Badge List (https://www.badgelist.com/OpenWorm) for this purpose. And the purpose of this blog post was simply to test the embedding of earned badges from Badge List into a blog. I figured the prior long exposition on badges and digital credentials might make this a more meaningful post :).



Sunday, July 12, 2015

Automatic Performance Profiling for Python with cProfile

This is extensively documented online in a lot of places, but I figured it is worth spreading some more. Plus I think I may have some additional perspectives to add from my own point of view.

Some Motivating Background
This post is motivated by some work I had been doing for the OpenWorm project. One of the things desired was a quick way to generate and manage performance data for performance regression testing of our test codes. Of course this is not restricted to Python codes, but as it turns out some of our earlier test codes are in Python, and Python's cProfile is one of the low-hanging fruits available for automatically gathering basic (and maybe more sophisticated - I'll elaborate later) performance profiles for Python.

Available Online Documentation
Profiling and cProfile documentation can be found in the Python documentation here:

https://docs.python.org/2/library/profile.html

Googling for "cProfile" should also result in a decent number of technical blog posts discussing this tool.

Usage
Usage is pretty straightforward and in its most basic form requires no changes to our Python code base. One simply loads the cProfile module from the command line with -m cProfile and supplies an output file with -o, as in the example below:

> python -m cProfile -o myPerfProfile.out myPythonCode.py

The output file myPerfProfile.out is a binary file. Apparently the format is not 100% documented. As I'll elaborate later, it may contain more information than simply a flat function performance profile of your Python code. Specifying an output file is optional; without the -o option, cProfile simply prints a flat text performance profile to stdout. While this is kind of convenient, it probably isn't what you'd want if you need to perform (even basic) useful performance analysis or regression analysis of your code (the latter is what our group needs).
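
If modifying the entry point is acceptable, the same kind of binary file can be produced from inside a script with cProfile.Profile and its dump_stats() method, and loaded back with pstats. A minimal sketch, where work() is a hypothetical placeholder for the code being measured and the output file goes to the system temporary directory:

```python
import cProfile
import os
import pstats
import tempfile


def work():
    # Hypothetical workload standing in for myPythonCode.py.
    return sum(i * i for i in range(100000))


out_path = os.path.join(tempfile.gettempdir(), "myPerfProfile.out")

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()
# Write the binary stats file, the same kind "python -m cProfile -o" produces.
profiler.dump_stats(out_path)

# Load the binary file back; stats.stats maps each function to its timings.
stats = pstats.Stats(out_path)
print(len(stats.stats), "functions recorded")
```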

Getting What We Want
If we print the default text performance profile to stdout, it is displayed as a list of entries sorted by standard name (file name, line number, and function name). There are several reasons why this is not satisfactory for our purposes:

  1. We may only want to see performance information associated with user code (i.e., functions that our Python code invokes directly.) The default output includes performance information associated with all library calls invoked on behalf of user code. In this scenario, we probably want the data sorted by cumulative execution time (i.e., the sum of time taken exclusively by our functions, and taken by all function calls made by our functions) and filtered appropriately in order to figure out where most time is spent where our own code is concerned. From a performance analysis perspective, this grants us some basic information from which code-optimization decisions may be reached.
  2. We may want to see performance information in the form of absolute time spent - exactly which leaf functions take the most time. In this scenario, we want the data sorted by total execution time (also known as exclusive time) with no filtering required.
For our purposes, we would like to start the consideration of our performance regression needs from the perspective spelled out in point 1. The following python script can be used to extract that information from the binary file generated in the prior section:

import sys
import pstats

# Usage: python <this script> myPerfProfile.out
if len(sys.argv) != 2:
    print('Performance data file required. You have ' + str(len(sys.argv) - 1) + ' arguments')
    sys.exit(1)
else:
    stats = pstats.Stats(sys.argv[1])
    # This configuration generates somewhat more appropriate user-function focused data
    #   sorted by cumulative time, which includes the time of calls further down the callstack.
    stats.sort_stats('cumulative', 'time', 'calls')
    # stats.sort_stats('time', 'calls')

    # This configuration filters the output so only user functions show up:
    #   the string is a regular expression matched against each "file:line(function)" entry.
    stats.print_stats('PyOpenWorm')
    # stats.print_stats()
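
A self-contained sketch (Python 3) producing both views from the numbered list above without going through a file: it profiles in-process and hands the Profile object directly to pstats.Stats. inner() and outer() are hypothetical placeholders for user code, and the regular expression restriction plays the role of 'PyOpenWorm' in the script. Capturing each report in a StringIO is just one convenient option for feeding the text into a regression check.

```python
import cProfile
import io
import pstats


def inner(n):
    # Hypothetical leaf function that does the arithmetic.
    return sum(i * i for i in range(n))


def outer():
    # Hypothetical "user" function whose time is mostly spent inside inner().
    return [inner(20000) for _ in range(5)]


profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()


def report(sort_keys, restriction=None):
    """Return a pstats text report as a string instead of printing it."""
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats(*sort_keys)
    if restriction is None:
        stats.print_stats()
    else:
        # String restrictions are regular expressions matched against
        # each entry's "file:line(function)" name.
        stats.print_stats(restriction)
    return buf.getvalue()


# View 1: cumulative time, filtered down to our own functions.
user_view = report(("cumulative", "calls"), r"\((outer|inner)\)")
# View 2: exclusive (internal) time, unfiltered.
leaf_view = report(("tottime",))
print(user_view)
print(leaf_view)
```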

Sunday, May 31, 2015

Some Brief Notes On -finstrument-functions

Been trying to get back to profiling some of my Mac OS X stuff, and I struggled a little bit to get this working. The main snafu on my part was testing a "simple" code naively. Not really thinking, I did:

gcc -finstrument-functions test.c instrument.c -o test

This results in a segfault, which closer examination via lldb (the gdb equivalent on the Mac) showed to be a very deep stack trace. This happens because the instrumentation functions __cyg_profile_func_enter and __cyg_profile_func_exit are themselves compiled with instrumentation, so each call to one of them triggers another call to the hooks, recursing until the stack overflows.

There is a non-standard (GCC extension) function attribute, __attribute__((no_instrument_function)), that can be attached to the two instrumentation functions to exclude them from instrumentation, but others appear to have run into scenarios where compilers would not recognize the attribute, resulting in the same segfault.

The solution is simple:

gcc -c instrument.c
gcc -finstrument-functions test.c instrument.o -o test

but this just tells me the support for profiling in this setup is pretty much an afterthought. Anyway, this really is a note-to-self in case I forget in the future, and need to figure this thing out again.

Monday, January 26, 2015

Mac OS X Latex Support on Yosemite

I had not intended to jump ahead and document Latex before the other stages of my attempt to set up a coherent development environment on a fresh Yosemite machine ... but my preference is to get the documentation down and properly linked against my master software list first. So here goes.

I remember a great deal of pain when I used Homebrew to install TeX Live on my MacBook Pro. Sadly, I do not recall exactly what caused it … it was one of those pain-once, forget-later scenarios I am determined to eliminate when setting up my development environment on my Yosemite machine.

This online article sums it up (amongst others which basically said the same thing about latex and homebrew): http://tex.stackexchange.com/questions/97183/what-are-the-practical-differences-between-installing-latex-from-mactex-or-macpo

The key takeaway was this article linked in the above:

and the comment from the homebrew people themselves:

“Installing TeX from source is weird and gross, requires a lot of patches,
and only builds 32-bit (and thus can't use Homebrew deps on Snow Leopard.)

We recommend using a MacTeX distribution: http://www.tug.org/mactex/”

However, there seems to be hope in the form of MacTeX being available as a Homebrew cask: brew cask install mactex

This process has a caveat however:

Arya:~ cheelee$ brew cask install mactex
==> Caveats
To use mactex, zsh users may need to add the following line to their
~/.zprofile.  (Among other effects, /usr/texbin will be added to the
PATH environment variable):

 eval `/usr/libexec/path_helper -s`

Finally, mactex installs its binaries in /usr/texbin (pdflatex is now a symlink to pdftex by default, as a convenience). To re-create the command-line interface I am familiar with, including access to tools such as pdflatex and bibtex, I needed to create an environment module for mactex/2014 that my bash shell loads by default (see http://cwleehpc.blogspot.com/2013/09/the-modules-environment-on-mac-os-x.html).

Friday, January 16, 2015

Update: New Mac Mini - Baseline Software Stack Setup

I've successfully set up a baseline software environment upon which I can proceed to build/install additional tools, and eventually my own software development projects on my Yosemite system.

1. Xcode - this is downloaded/installed for free using the App Store application that ships with Yosemite. Straightforward process - one point to note is that we no longer seem to be required to activate the Command Line Tools as a separate optional feature of Xcode.

2. Homebrew - I think this should come second because I speculate that Homebrew's requirement for Ruby necessitates getting Xcode first. Again, the process is fairly straightforward (information about Homebrew can be found here - http://computers.tutsplus.com/tutorials/homebrew-demystified-os-xs-ultimate-package-manager--mac-44884):
  • Download via http://brew.sh
  • ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
  • brew doctor
3. Modules Environment - while technically not a part of a baseline software stack, I am including this here as its installation subsequently allows all other software to be conscientiously built either as the default software layer, or as part of a hot-swappable framework (which is what I am after.)

Modules has as its prerequisite the Tcl/Tk libraries and headers. I've documented the details of a successful build/install here - http://cwleehpc.blogspot.sg/2013/09/the-modules-environment-on-mac-os-x.html
At this point, I'll just highlight how I intend to (roughly) structure my development environment to use this baseline stack framework.
Folder structure:
software 
   - software/non-repository (where source packages are kept for building purposes)
   - software/installed/packages (where built packages are installed for hot-swap)
   - software/repository/[git|svn|cvs|hg] (where source packages are kept)
   - software/modules (where module files control how each installed package is loaded/unloaded)
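
A minimal Python 3 sketch that creates this layout under a given root, assuming [git|svn|cvs|hg] means one subfolder per version control system. create_layout() is a hypothetical helper; the example uses a throwaway temporary directory as the root, whereas in practice it would presumably be somewhere like the home directory.

```python
import tempfile
from pathlib import Path


def create_layout(root):
    """Hypothetical helper: create the software/ folder layout under root."""
    base = Path(root) / "software"
    subdirs = [
        "non-repository",      # source packages kept for building purposes
        "installed/packages",  # built packages installed for hot-swap
        "repository/git",      # source packages, one folder per VCS (assumed)
        "repository/svn",
        "repository/cvs",
        "repository/hg",
        "modules",             # module files controlling load/unload
    ]
    for sub in subdirs:
        (base / sub).mkdir(parents=True, exist_ok=True)
    return base


base = create_layout(tempfile.mkdtemp())
print(sorted(p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_dir()))
```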

The result is fairly satisfying - Homebrew places default software in a space it manages, and is responsible for appropriately overriding the original default software shipped with Yosemite. Modules then allows me to further compartmentalize alternative software installations in software/installed/packages, and allows me to hot-swap these to override Homebrew's default as appropriate.

From this baseline, I will continue to document how I layer my working environment according to the classes of software I think are important to me, for example documentation/publication tools like latex and gnuplot; the host of language compilers; support tools for my intended web development environment etc ...