Tuesday, March 25, 2008

Debugging D Unit Tests

I had one slide in my presentation at the D Programming Language Conference last year that demonstrated how to automatically insert breakpoints at all unit test functions in a D program (using, of course, the ZeroBUGS debugger for Linux).

The trick mainly consists in invoking a small Python script at startup:
zero --py-run=dunit.py
where the contents of dunit.py may look something like this:

# dunit.py
import os
import re
import zero

def on_table_done(symTable):
    # Matches the trailing ".d" extension of a D source file name.
    extension = re.compile(r"\.d$")
    process = symTable.process()
    module = symTable.module()

    for unit in module.translation_units():
        if unit.language() == zero.TranslationUnit.Language.D:
            name = os.path.basename(unit.filename())

            i = 0
            while True:
                # Construct the symbol name of the i-th unit test function.
                symName = "void " + extension.sub('.__unittest%d()' % i, name)
                print(symName)
                matches = symTable.lookup(symName)
                if len(matches) == 0:
                    break
                for sym in matches:
                    process.set_breakpoint(sym.addr())
                i += 1

(see http://zero-bugs.com/python.html for more information on the scripting support in the debugger).

This contortion is painful but necessary with D, because unit tests have the interesting property of running before anything else in the program.

Without this bit of gymnastics, by the time the debugger (trusted hunting companion) is wagging its tail at the main function, the unit test functions have long completed (or failed).

The line in the script above that constructs the symbol name of the unit tests (the assignment to symName) is now broken because of a change in the D 2.0 language. The name is prefixed by the module, as explained here: http://www.digitalmars.com/d/2.0/hijack.html

What happens is that the script is looking for functions named void Blah.__unittest0(), void Blah.__unittest1(), etc., whereas they are now called void mymodule.Blah.__unittest0(), void mymodule.Blah.__unittest1(), and so on.
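
To make the mismatch concrete, here is a small standalone sketch of the name construction (it does not use the zero module, so it runs outside the debugger). The helper unittest_symbol and its module parameter are hypothetical, introduced only for illustration; the real script has no source for the module name.

```python
import os
import re

# Matches the trailing ".d" extension of a D source file name.
EXTENSION = re.compile(r"\.d$")

def unittest_symbol(filename, index, module=None):
    """Build the demangled name of the index-th unit test function.

    `module` is a hypothetical parameter: without DW_TAG_module the
    debugger has nowhere to get this value from.
    """
    name = os.path.basename(filename)
    symbol = EXTENSION.sub(".__unittest%d()" % index, name)
    if module:
        symbol = module + "." + symbol
    return "void " + symbol

# The name the script looks up, versus the name D 2.0 actually produces.
old_name = unittest_symbol("/src/Blah.d", 0)
new_name = unittest_symbol("/src/Blah.d", 0, module="mymodule")
```

The first call yields the name the script still looks up; the second yields the name that is actually in the symbol table, which is why the lookup comes back empty.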

Before, I could synthesize the symbol names to look for, but now this is hardly possible since there's no place to infer mymodule from, unless the compiler produces the (standard) DW_TAG_module DWARF debug info entry.

Unfortunately the DMD compiler does not generate a DW_TAG_module, and so it is almost impossible for the debugger to determine the module name. One possible hack would be to sniff other symbols in the module and look at their prefixes.
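
As a rough sketch of what the prefix-sniffing hack might look like, the standalone function below guesses the module prefix from a list of demangled symbol names. It assumes the names follow the "void mymodule.Blah.foo()" shape shown above; the function name infer_module_prefix and the sample symbols are made up for illustration and are not part of the debugger's API.

```python
import re
from collections import Counter

def infer_module_prefix(symbol_names, unit_name):
    """Guess the module prefix that precedes `unit_name` in demangled names.

    Returns the most common prefix (possibly "" when names carry no
    prefix), or None when no symbol mentions `unit_name` at all.
    """
    # Capture any dotted qualifiers directly in front of "<unit_name>.<member>(".
    pattern = re.compile(r"\b((?:\w+\.)*)" + re.escape(unit_name) + r"\.\w+\(")
    prefixes = Counter()
    for name in symbol_names:
        found = pattern.search(name)
        if found:
            prefixes[found.group(1).rstrip(".")] += 1
    if not prefixes:
        return None
    return prefixes.most_common(1)[0][0]

# Hypothetical symbol names, as they might appear in a symbol table.
sample = [
    "void mymodule.Blah.foo(int)",
    "int mymodule.Blah.bar()",
    "void other.Thing.baz()",
]
guess = infer_module_prefix(sample, "Blah")
```

The guess could then be spliced into the synthesized unit test names, though this remains a heuristic: it only works if the module contains other symbols to sniff.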

Or we could change the script like so:

import zero

def on_table_done(symTable):
    process = symTable.process()
    module = symTable.module()

    for unit in module.translation_units():
        if unit.language() == zero.TranslationUnit.Language.D:
            # Break on _moduleUnitTests, which is called shortly before
            # the unit test functions themselves.
            matches = symTable.lookup("_moduleUnitTests")
            if len(matches) == 0:
                break
            for sym in matches:
                process.set_breakpoint(sym.addr())


This stops the program in the debugger a couple of function calls before the unit test functions are executed. However, it is not ideal, since we now have to step over a bunch of assembly code.

Ideally, the D compiler should produce DW_TAG_module info.

Or the debugger could look up symbol names by regular expressions, but I have to punt on the performance implications of such an approach.
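
For illustration only, a regular expression lookup could be approximated like this over a plain list of demangled names. This is a standalone sketch, not a ZeroBUGS feature: find_unittest_symbols and the sample names are hypothetical, and the linear scan over every symbol is exactly where the performance worry comes from.

```python
import re

def find_unittest_symbols(symbol_names, unit_name):
    """Return the names that look like unit test functions of `unit_name`,
    with or without a dotted module prefix."""
    pattern = re.compile(
        r"^void (?:\w+\.)*" + re.escape(unit_name) + r"\.__unittest\d+\(\)$"
    )
    # Every symbol has to be tested against the pattern.
    return [name for name in symbol_names if pattern.match(name)]

# Hypothetical symbol names, as they might appear in a symbol table.
sample = [
    "void mymodule.Blah.__unittest0()",
    "void mymodule.Blah.__unittest1()",
    "void mymodule.Blah.helper()",
    "int main()",
]
hits = find_unittest_symbols(sample, "Blah")
```

Because the module prefix is optional in the pattern, the same expression would match both the old and the new naming schemes.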
