Conversations about life & privacy in the digital age

Python is Python is Python…. except when it isn’t Python.

One of the largest factors to recommend dynamic interpreted
languages and runtimes is, of course, memory and object management.
However, when interfacing these to external libraries, the boundary
is crossed from a managed environment to a binary ABI environment,
with all the ‘fun’ that entails. This becomes especially interesting
when your interface is a ‘light’ wrapper that does not protect
against shooting yourself in the foot or insulating you away
from the bugs of that binary ABI.

Awhile back, the excellent
valgrind tool was developed, which
is a dynamic memory and threading debugging tool for Linux
applications. Valgrind becomes an excellent tool for complicated C
and C++ programs. Because valgrind works at the OS/ABI level, it can
be adapted to any environment, however.

Here at SpiderOak we use valgrind when a debugging issue appears
to be involved with any C or C++ library we interface with; the
most frequent case is Qt. When writing an application handling
I/O in real-time from multiple sources, you end up with a sophisticated
flow of code, which makes the output of tools like valgrind difficult
to use. In Python, valgrind has been used as a tool to debug Python
itself, but not necessarily to debug Python applications, as valgrind
won’t tell you what Python code called the C or C++ or other library
code where the bug you’re hunting has appeared.

We have a patch to valgrind and a small wrapper library that lets
you recover this information. You can download this (GPLv3)
at our code page.

To use, you will need to be able to recompile your own valgrind
executable. For us, the Ubuntu gutsy or hardy valgrind source
packages are excellent for this. In our python support for valgrind,
we implement a ‘supplemental stack’ that a running program can use to
notify valgrind of where in your application it’s at, so you can
track what python functions are involved with an issue as well as the
C/C++ library functions. In our example environment, this
information is helpful when your application involves twisted or
pydispatch/louie-powered indirect calls (i.e. via Twisted deferred or
pydispatch/louie signals). We distribute this supplemental stack
patch along with a Cpython wrapper library which valgrind will use to
wray Python stack frames to retrieve the information needed.

After downloading our valgrind-python support patches, and
building libpywrap.so, you can run your Python application with
LD_PRELOAD=libpywrap.so valgrind /path/to/python/interp/using/app.
Valgrind will then give you output corresponding to the python stack
frames and source locations alongside the usual Valgrind stack
output.

This is not a turnkey or very stable solution. We absolutely do
not suggest running it in an untrusted environment. Make sure you’re not
running this with anything involving the opportunity to leak data,
or, a particularly nasty user might crack your box.
That said, valgrind often allows you to shave hours off your
debugging time for tracking down some problems. Now you can shave
hours off your debugging for those problems when they’re in Python,
too.

For those new to valgrind, here’s a short example of how to use this in
Ubuntu, having a download of our valgrind-python-1.0.1.tar.bz2. You should
also have HREF="http://svn.python.org/view/python/branches/release25-maint/Misc/valgrind-python.supp?rev=51333&view=markup">Misc/valgrind-python.supp
from your python source distribution. (Or use our provided link from the python
SVN).

% sudo apt-get build-dep valgrind
% sudo aptitude install fakeroot python2.5-dev
% apt-get source valgrind
% tar xjf valgrind-python-1.0.1.tar.bz2
% # this is where we add our supplemental stack patch for valgrind
% cd valgrind-3.3.0/debian/patches
% cp ../../../valgrind-python-1.0.1/50_sup-stack.dpatch .
% # go ahead and edit this line in the middle of patches if you care
% echo 50_sup-stack >> patches
% cd ../..
% fakeroot ./debian/rules binary
% sudo dpkg -i ../the_valgrind_deb_you_made.deb
% cd ../valgrind-python-1.0.1
% make
% # after make finishes, you should have libpywrap.so in the
valgrind-python dir. This is what you run with
LD_PRELOAD=libpywrap.so valgrind python2.5

And so…

% LD_PRELOAD=$(pwd)/libpywrap.so valgrind
–suppressions=valgrind-python.supp ipython
[various valgrind boilerplate here]
>>> from ctypes import *
>>> class crasher(Union):
… _fields_=[(“x”,c_int),(“y”,c_char_p)]

>>> badptr=crasher()
>>> badptr.x=2
>>> badptr.y[0] # BOOM!

==29497== Python Stack:
==29497== <stdin>:1 <module>
==29497== Invalid read of size 1
==29497== at 0x40239D8: strlen (mc_replace_strmem.c:242)
==29497== by 0x80945A9: PyString_FromString (stringobject.c:112)
==29497== by 0x47F1474: z_get (cfield.c:1341)
==29497== by 0x47ECD0D: CData_get (_ctypes.c:2315)
==29497== by 0x47F0BE9: CField_get (cfield.c:221)
==29497== by 0x808968C: PyObject_GenericGetAttr (object.c:1351)
==29497== by 0x80C7608: PyEval_EvalFrameEx (ceval.c:1990)
==29497== by 0x402773B: PyEval_EvalFrameEx (pywrap.c:62)
==29497== by 0x80CB0D6: PyEval_EvalCodeEx (ceval.c:2836)
==29497== by 0x80CB226: PyEval_EvalCode (ceval.c:494)
==29497== by 0x80EADAF: PyRun_InteractiveOneFlags (pythonrun.c:1273)
==29497== by 0x80EAFD5: PyRun_InteractiveLoopFlags (pythonrun.c:723)
==29497== Address 0×2 is not stack’d, malloc’d or (recently) free’d

With some more configuration work, you will get valgrind
output with useful data for whichever libraries you use, and can tell
what python usage may be tweaking bugs in your non-python libraries.
Good luck!