Programming and Debugging (in my Underhøøsen)

I'm Coming in With the Flag

2013-01-13T21:12:00.000-08:00

I got so excited about my success with modding Sauerbraten and adding XInput support that I decided to add the gamepad code to AssaultCube as well. Because the Cube1 and Cube2 engines are similar in many ways, doing so was a breeze (it took me a bit to figure out that the Y axis is reversed in AssaultCube, but other than that it worked great). Since my last post I also made some enhancements to the controller code, such as the ability to bind buttons and triggers to arbitrary script code. Actions can now be bound to the analog trigger half-presses, which is extremely useful for engaging the aim-assist code. Yeah, I wrote my own aim-assist to compensate for the game controller's lack of precision. Maybe it is best I do not elaborate any further on that, the fine line between aim assist and aim bots being a controversial one.

Controller in hand and cold beer in reach, I sat down for a single-player session of AC only to realize that the AI is not as capable as Sauerbraten's and the bots have many limitations. For example they cannot play Capture the Flag (CTF) and cannot navigate the underground sewage system in the ac_aqueous map. It also seemed a little strange that the AC bots use a configuration file with a syntax of its own instead of using the powerful Cubescript language.

I could not pass up the challenge and decided to look into writing my own AI code. I have never tackled this kind of stuff before. As a professional programmer I worked with compilers, debuggers and touched databases; I worked for Amazon.com and for Microsoft (yeah, I did write a few thousand lines for Windows 8, guilty as charged). But I have no game development experience, aside from a lame Python implementation of Reversi I wrote a few years back.

Naturally I got very excited about experimenting with an AI system. My goal was to come up with playable, good-enough code that could coexist with the current AI. I did not want to overhaul the entire bot code and disturb other parts of the game that depend on it. I just wanted to add a side-by-side AI, and I must admit the thought of having the two AIs fight each other while I spectate crossed my evil mind. I aimed for keeping the code small and compact, in the minimalist tradition of the Cube engines.

I decided to use a waypoint system for navigating the maps, similar to what the existing bots use. I am aware that better systems can be devised (http://www.ai-blog.net/archives/000152.html) but I wanted to keep the code very simple. The waypoints are created by a greedy algorithm that tries to span trees to cover the entire map using a collision detection scheme. A second pass adds a few more edges and attempts to connect unconnected sub-graphs. This work is not spread across several frames, so depending on how fast (or slow) the computer is, the game may seem to "freeze" for a short moment when it loads the map. I added a variable named wpxautosave which, when set, causes the waypoints to be written to a file named the same as the map (plus the .wpx extension). This file is automatically loaded if present, so the short "freeze" inconvenience can be alleviated this way. I preferred this over more complicated code that spreads the work over several frames.

The problem I grappled with for many weekends is that using the game physics for collision detection is not a precise business. Trying to move the player's character along all possible edges in order to validate them is expensive. Cutting some corners (pun well intended) may yield graphs that are incorrect and cause the AI to bump into walls. On the other hand, being too conservative may end up in disconnected graphs. I decided to go with a hybrid approach where waypoints can be generated automatically (with a somewhat conservative take on collisions) but also manually, by running over the map and dropping waypoints in Sauerbraten's style. This way, if I have time to kill I can make my own waypoints (manually and accurately) for a given map, or if I just want instant gratification I let the machine generate them, and more are added as I play. The latter approach has an interesting side effect: as new waypoints are added, the bots appear as if they "learn" new ways around the map. I added a variable (exposed to the user via Cubescript) to limit the growth of the graph, and thus the computational effort required for path finding during game play.

The bots use the Dijkstra algorithm for their path finding instead of A*. I thought this was more convenient in situations where they are not interested in finding their way from point A to a specific point B, but rather to the nearest of any points that satisfy certain criteria (any health pickup, or any Kevlar or helmet, or any ammo pickup, for example).
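
To illustrate why Dijkstra fits the "nearest of any matching point" case, here is a minimal standalone sketch. It is not the code from ai.cpp: the Graph and Edge types and the nearestgoal function are hypothetical names made up for this example, and it needs a C++11 compiler. Because Dijkstra settles nodes in order of increasing distance, the search can stop at the first settled node that satisfies the predicate.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical waypoint graph: g[i] lists the edges leaving waypoint i.
struct Edge { std::size_t to; float cost; };
typedef std::vector<std::vector<Edge>> Graph;

// Runs Dijkstra from `start` and stops at the first settled node for which
// `isgoal` returns true; that node is the nearest goal. Returns the path from
// start to goal (both inclusive), or an empty vector if no goal is reachable.
std::vector<std::size_t> nearestgoal(const Graph& g, std::size_t start,
                                     const std::function<bool(std::size_t)>& isgoal)
{
    assert(start < g.size());
    const float inf = std::numeric_limits<float>::infinity();
    const std::size_t none = g.size();           // marks "no predecessor"
    std::vector<float> dist(g.size(), inf);
    std::vector<std::size_t> prev(g.size(), none);
    typedef std::pair<float, std::size_t> Item;  // (distance, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> open;

    dist[start] = 0.0f;
    open.push(Item(0.0f, start));
    while (!open.empty())
    {
        const Item top = open.top();
        open.pop();
        const std::size_t u = top.second;
        if (top.first > dist[u]) continue;       // stale queue entry, skip it
        if (isgoal(u))
        {
            std::vector<std::size_t> path;       // walk back from goal to start
            for (std::size_t n = u; n != none; n = prev[n]) path.push_back(n);
            return std::vector<std::size_t>(path.rbegin(), path.rend());
        }
        for (const Edge& e : g[u])
        {
            const float d = dist[u] + e.cost;
            if (d < dist[e.to]) { dist[e.to] = d; prev[e.to] = u; open.push(Item(d, e.to)); }
        }
    }
    return std::vector<std::size_t>();
}
```

A caller would pass a predicate such as "this waypoint has a health pickup"; no single target has to be chosen up front, which is what A* with its target-directed heuristic would require.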

I wrote the code using Visual Studio 2010 to take advantage of the C++11 features that it supports, such as lambda functions. I guess GCC would've worked just fine but these days I have no compelling reason to boot up Ubuntu, thanks to how badly Gnome 3 managed to disgust me; but that's a story for another day.

Each bot implements a simple state machine in the form of a stack of std::function<void()>. The function at the top of the stack represents the current state; when it is popped off the stack the bot returns to executing its previous state. If the stack is empty, it executes an idle() routine. The stack is populated by the think() routine, which sets a long term goal and optionally some short-term goals, following the wisdom of the Quake Arena AI design. In a Team Death Match game, the long term goal is to seek and engage enemy units. In a CTF game the goal is to locate and retrieve a flag.
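
The stack-of-functions idea can be sketched in a few lines. This is a hypothetical skeleton rather than the actual bot code: the Bot struct, its member names and the idleticks counter are assumptions made for the example, and only the push/pop/run mechanics are shown.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical bot skeleton; only the state-stack mechanics are shown.
struct Bot
{
    typedef std::function<void()> State;
    std::vector<State> states;   // back() is the current state
    int idleticks;               // counts how often idle() ran (for the demo)

    Bot() : idleticks(0) {}

    void push(State s) { assert(s); states.push_back(std::move(s)); }
    void pop() { if (!states.empty()) states.pop_back(); }

    void idle() { ++idleticks; }

    // Called once per frame: run the state on top of the stack,
    // or idle() when the stack is empty.
    void update()
    {
        if (states.empty()) { idle(); return; }
        State current = states.back();  // copy, so a state may pop itself safely
        current();
    }
};
```

A think() routine would push the long-term goal first and any short-term goals on top of it; when a short-term state pops itself, the next update() resumes the long-term one.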

Because the collision detection is not perfect and because of using a waypoint system for navigation, the bots may get stuck every once in a while. I added some heuristics to resolve the collisions by jumping, crouching, or rotating around the obstacle. So far this seems to work well with solid obstacles but needs more tweaking for things such as link fences or light posts. Overall I am pretty pleased with the result: last night I was able to play CTF on ac_aqueous with the bots chasing me through water underground.

And I could not resist wiring in some sound bites since they are already present in the game. Your bot teammates will complain when shot at accidentally, or may brag about their kills.

For the curious and courageous who want to play with the code: first, get the latest AssaultCube 1.2 source from http://sourceforge.net/projects/actiongame/. Then get my hacks from my SkyDrive: http://sdrv.ms/VEU0f0 and http://sdrv.ms/VEU12G and follow the instructions in the comments at the top of the ai.cpp file. Adapting the code so that it builds under Linux is possible but I have not tried it; Visual Studio 2010 is the only environment under which I built the code. You can also grab the Windows binary file from here and overwrite your AssaultCube 1.2 (make sure you get the latest bits from sourceforge; the mod may not work with older versions).

Currently there is not a lot of magic in the bots' decision-making process. It is by and large driven by their health, armor, ammunition, who they may cross paths with, and plain randomness. I plan to experiment with genetic algorithms once I get a good feel for the quality of the existing code (got a couple of bugs to address before I move on).

There are other interesting features in the relatively small code (around 2600 lines of C++), such as team collaboration. The bots ask for help when attacked or when going for the enemy team's flag. I could go on and on about how much fun I had with this experiment. But I've a game to play!

Hackenbraten: Native XInput Support for Sauerbraten

2012-10-17T23:39:00.000-07:00

Although I've been programming professionally for eighteen years, I've very little to no experience with video game development, and I always wanted to get my hands dirty messing with an FPS engine. An open source game is a great place to start, so I checked out Sauerbraten from http://sauerbraten.sourceforge.net/.

The rain was back in Seattle this past weekend. An excellent excuse to cozy up indoors and hack Sauerbraten.

I must say that I am quite impressed with the no-fuss, minimalistic coding style of the authors and by how well the code reads. I was able to find my way around it fairly quickly. As a professional developer I have seen way too many over-complicated code bases, polluted with "clever" design patterns and unnecessary C++-isms that nobody under an IQ of 300 understands. Heck, I wrote my own fair share of inscrutable C++ templates. Sauerbraten's style is all about self-explanatory short functions that focus on the algorithms. The scripting engine is very efficient and elegant.

Because I type all day for a living, when I play video games I prefer to give the keyboard a rest and use a controller.

Although Sauerbraten has no native support for gamepads it can be played with controllers that know how to emulate keyboard and mouse events (such as the Logitech F510, which comes with software for customizing it). This is okay, albeit not the best experience you can get.

In particular you miss out on the rumble / vibration feedback, and shooting things is what Sauer is all about. There are other potentially cool things, such as analog triggers with a configurable threshold. And how about some finesse: controlling the speed of your movement with the thumbstick (move faster when it is pushed farther out)?

I was curious if I could improve my gaming experience (and maybe learn a thing or two about the Sauerbraten engine in the process). Since I was doing this on Windows 7 and the game engine works on Linux and Mac as well, one goal was to make a minimal impact on the overall portability.

I ended up with a "hybrid" implementation, in the sense that for some gamepad actions I directly interact with the engine (for moving the player I mess with the move, strafe, and velocity data) and for others (such as mouse motion) I just push SDL_Events into the event queue.
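
As an illustration of the "move faster when the stick is pushed farther out" idea, here is a minimal standalone sketch. It is not the code from the mod: the function name stickspeed and the linear ramp outside the dead zone are assumptions made for this example. The raw axis range of -32768 to 32767 is what XInput reports for thumbsticks.

```cpp
#include <cassert>
#include <cmath>

// Maps a raw thumbstick deflection (each axis in [-32768, 32767]) to a speed
// factor in [0, 1]: 0 inside the dead zone, growing linearly to 1 at full tilt.
// The linear ramp is an assumption made for this sketch.
float stickspeed(int x, int y, int deadzone)
{
    assert(deadzone >= 0 && deadzone < 32767);
    const float maxmag = 32767.0f;
    // Convert to float before squaring so that x*x cannot overflow an int.
    float mag = std::sqrt(float(x) * float(x) + float(y) * float(y));
    if (mag <= deadzone) return 0.0f;
    if (mag > maxmag) mag = maxmag;  // diagonals can exceed the single-axis range
    return (mag - deadzone) / (maxmag - deadzone);
}
```

The returned factor could then scale the player's velocity, so that a gentle push walks and a full push runs.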

Overall I am pretty pleased with the result. At the high level, the "mod" I came up with consists of:

a change to weapons.cpp so that custom actions can be performed upon firing a gun;
one C++ file for the gamepad module proper;
two function declarations (the "entry points" into the module), one for polling the controller state and the other for writing back the bindings to the config file when the game exits, and
changes to main.cpp and console.cpp to call the functions above.

The first item on the list is orthogonal to the rest of the gamepad code and could be useful in its own right, for example to play the sound of an empty cartridge hitting the floor.

The gamepad can be turned on or off with the gamepadon variable, and gamepadbind allows for extra tweaking. One kludge that I am not particularly proud of is that gamepad buttons cannot be bound directly to cubescript; instead they have to be bound to a key, like in this example:

sound_drop = ( registersound "cristiv/dropcartridge" )
bind F8 [ sound $sound_drop ]
gamepadbind right_trigger_release F8

It would be nicer to simply say:

gamepadbind right_trigger_release [ sound $sound_drop ]

But this is an exercise for another rainy day. By the way, it is so cool being able to map separate actions to analog trigger presses and releases. For example, in my implementation the default mapping of the left trigger is to zoom in when pressed, and zoom out when released. One idea I'd like to try out some day is to make my player character start jumping when the left trigger is pressed, and stop when it is released.

I am right handed. My default bindings are to control movement with the left thumbstick or D-pad, look around with the right stick, and shoot with the right analog and digital triggers. This scheme can be easily reversed by using the gamepadbind command to remap the triggers and thumbsticks.

One thing I did before implementing the gamepad input code was to add some triggers for the guns, so that I can call custom cubescript when the weapons are fired. In weapons.cpp I added this block (right before the shoteffects function):


    VAR(gun_trigger_debug, 0, 0, 1);

    static void doguntrigger(const fpsent* d, int gun)
    {
        defformatstring(aliasname)("gun_fired_%d", gun);
        if(gun_trigger_debug) conoutf(CON_DEBUG, "%s:%s", d->name, aliasname);
        if(identexists(aliasname)) execute(aliasname);
    }

And then I added a line at the end of the shoteffects function:


    if (d==player1) doguntrigger(d, gun);

The gamepad code exposes a function called vibrate to cubescript which takes three parameters: the speed of the left vibration motor, the speed of the right motor, and the duration in milliseconds. The above plumbing allows it to be called from my configuration script:

// guns feedback
gun_fired_0 = [ vibrate 20000 20000 500 ]
gun_fired_1 = [ vibrate 5000 35000 200 ]
gun_fired_2 = [ vibrate 10000 25000 300 ]
gun_fired_3 = [ vibrate 32000 64000 360 ]
gun_fired_4 = [ vibrate 0 30000 200 ]
gun_fired_5 = [ vibrate 0 25000 200 ]
gun_fired_6 = [ vibrate 0 25000 200 ]
gun_fired_7 = [ vibrate 2000 10000 200 ]

The next thing to do was to add two function declarations to engine.h (towards the end of the file, right before the closing #endif):

namespace gamepad
{
#if _WIN32
    extern void checkinput(dynent*);
    extern void writebinds(stream*);
#else
    inline void checkinput(dynent*) {}
    inline void writebinds(stream*) {}
#endif
}

These two entry points are called right at the beginning of checkinput (in main.cpp), and at the end of writebinds (in console.cpp), respectively:


void checkinput()
{
    gamepad::checkinput(player);

    SDL_Event event;
    int lasttype = 0, lastbut = 0;
// etc...


void writebinds(stream *f)
{
 // ...snip...
    gamepad::writebinds(f);
}

Finally, I "just" added a file called xinputpad.cpp to the Visual Studio project (the most recent development version of Sauerbraten comes with a solution file in src/vcpp which plays nice with Visual Studio 2010), and then edited its properties so that it uses the engine.h / engine.pch files for precompiled headers (rather than cube.h / cube.pch).

I think this approach is minimally invasive as the code sits nicely in its own file, and the changes to engine.h, main.cpp and console.cpp are tiny. I also tried imitating the terse coding style of Sauerbraten rather than using my own (C++ politically correct and at times bombastic) pen.

I am having a blast (rumble rumble) with this game. Boy I love that rocket launcher!

This is the full xinputpad.cpp code. Be careful when copying and pasting, some characters that are "unsafe" for HTML might have been encoded.

// Experimental support for Microsoft XInput-compatible gamepads in Sauerbraten.
// Zlib license. Copyright (c) 2012  cristi.vlasceanu@gmail.com
#include "engine.h"
#include <XInput.h>
#if defined(_MSC_VER)
 #pragma comment(lib, "XInput.lib")
#endif
#ifndef _countof
 #define _countof(a) (sizeof(a)/sizeof((a)[0]))
#endif

#define DECLARE_XINPUTS \
    XINPUT(none, action_none), \
    XINPUT(left_stick, action_move), \
    XINPUT(right_stick, action_mouse), \
    XINPUT(left_trigger, action_key, "z"), \
    XINPUT(right_trigger, action_key, "MOUSE1"), \
    XINPUT(left_trigger_release, action_key, "z"), \
    XINPUT(right_trigger_release, action_none ), \
    XINPUT(dpad_up, action_key, "w"), \
    XINPUT(dpad_down, action_key, "s"), \
    XINPUT(dpad_left, action_key, "a"), \
    XINPUT(dpad_right, action_key, "d"), \
    XINPUT(start, action_none), \
    XINPUT(back, action_none), \
    XINPUT(left_thumb, action_none), \
    XINPUT(right_thumb, action_none), \
    XINPUT(left_shoulder,  action_key, "c"),\
    XINPUT(right_shoulder, action_key, "MOUSE1"), \
    XINPUT(button_a, action_key, "0"),\
    XINPUT(button_b, action_key, "F10"), \
    XINPUT(button_x, action_key, "F9"), \
    XINPUT(button_y, action_key, "SPACE")

#define XINPUT(i,...) x_##i
enum { DECLARE_XINPUTS };

#undef XINPUT
#define STRINGIZE(i) #i
#define XINPUT(i,...) STRINGIZE(i)
static const char* inputs[] = { DECLARE_XINPUTS };

// bind to keyboard, mouse motion or player movement
enum actiontype
{ 
    action_none,
    action_mouse,
    action_move,
    action_key,
};
static const char* actions[] = { "none", "mouse", "move" };

struct keym;  // defined in console.cpp
extern keym* findbind(char* key);
extern void execbind(keym &k, bool isdown);

#undef XINPUT
#define XINPUT(i,...) { __VA_ARGS__ }

static struct action
{
    actiontype type;
    const char* def;
    keym* km;
    Uint8 prevstate;
    string name;
}
const defaultbinds [] = { DECLARE_XINPUTS };

static void bindaction(action& a, actiontype type, const char* key)
{
    a.type = type;
    a.def = NULL;
    a.km = NULL;
    a.prevstate = 0;

    if (type==action_key && key)
    {
        copystring(a.name, key);
        a.km = findbind(const_cast<char*>(key));
        if(!a.km) conoutf(CON_ERROR, "unknown key \"%s\"", key);
    }
}

// used when synthesizing mouse events, affects how fast the player turns
VARP(gamepadspeed, 0, 32, 512);
// invert y axis when emulating mouse motion events
VARP(gamepadinverty, 0, 0, 1);

VARP(triggerthreshold, 0, XINPUT_GAMEPAD_TRIGGER_THRESHOLD, 255);


struct controller
{
    enum thumb { left, right };
    int id, vibratemillis;
    bool bounded;
    action binds[_countof(defaultbinds)];
    static int count;

    controller() : id(count++), vibratemillis(0), bounded(false) { }
    ~controller() { XINPUT_STATE state; if (getstate(state)) vibrate(0, 0); }

    void resetbinds()
    {
        loopi(_countof(binds))
            bindaction(binds[i], defaultbinds[i].type, defaultbinds[i].def);
        bounded = true;
    }

    void vibrate(int left, int right, int duration = 0)
    {
        XINPUT_VIBRATION vibration = { min(left, 65535), min(right, 65535) };
        XInputSetState(id, &vibration);
        vibratemillis = duration + lastmillis;
    }

    bool getstate(XINPUT_STATE& state)
    {
        ZeroMemory(&state, sizeof state);
        bool result = (XInputGetState(id, &state) == ERROR_SUCCESS);
        if (lastmillis > vibratemillis) vibrate(0, 0);
        return result;
    }

    void motionevent(int x, int y, int dx, int dy)
    {
        SDL_Event e = { };
        e.type = SDL_MOUSEMOTION;
        e.motion.x = x;
        e.motion.y = y;
        e.motion.xrel = dx;
        e.motion.yrel = gamepadinverty ? dy : -dy;
        SDL_PushEvent(&e);
    }

    int inline mousedelta(int x, int xmax, int speed)
    {
        if (x >= xmax) return speed;
        if (x < -xmax) return -speed;
        return 0;
    }

    // compute the delta that we're going to use in constructing a fake mouse motion event
    int mousedelta(thumb t, int x)
    {
        static const int xmax = 32767;
        const int speed = gamepadspeed;
        int dx = mousedelta(x, xmax, speed);
        for (int i = 2; i != 32; i *= 2)
            if (dx == 0) dx = mousedelta(x, xmax / i, speed / i);
        return dx;
    }

    void checkthumbstick(thumb t, int x, int y, dynent* player)
    {
        static const int deadzone[] = { XINPUT_GAMEPAD_LEFT_THUMB_DEADZONE, XINPUT_GAMEPAD_RIGHT_THUMB_DEADZONE };

        int dx, dy;

        switch (binds[t + x_left_stick].type)
        {
        case action_mouse:
            dx = mousedelta(t, x);
            dy = mousedelta(t, y);
            if (dx || dy) motionevent(0, 0, dx, dy);
            break;

        case action_move:
            if (player->k_up || player->k_down || player->k_left || player->k_right) break;
            player->move = y > deadzone[t] ? 1 : (y < -deadzone[t] ? -1 : 0);
            player->strafe = x > deadzone[t] ? -1 : (x < -deadzone[t] ? 1 : 0);
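            // convert to float before squaring: x*x + y*y can overflow an int at full diagonal deflection
            if (player->move || player->strafe) player->vel.mul(sqrtf(float(x)*x + float(y)*y) / 32767);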
            break;
        }
    }

    void checkbindkey(action& a, Uint8 on)
    {
        if (a.type != action_key) return;
        if (on != a.prevstate)
        {
            if (a.km) execbind(*a.km, on != 0);
            a.prevstate = on;
        }
    }

    void checktrigger(thumb t, BYTE level)
    {
        const bool pressed = level > triggerthreshold;
        // check for trigger release
        if (!pressed && binds[t + x_left_trigger].prevstate)
        {
            auto& release = binds[t + x_left_trigger + 2];
            if (release.type==action_key && release.km) execbind(*release.km, true);
        }
        checkbindkey(binds[t + x_left_trigger], pressed);
    }

    void checkbuttons(WORD b)
    {
        checkbindkey(binds[x_dpad_up], (b & XINPUT_GAMEPAD_DPAD_UP) != 0);
        checkbindkey(binds[x_dpad_down], (b & XINPUT_GAMEPAD_DPAD_DOWN) != 0);
        checkbindkey(binds[x_dpad_left], (b & XINPUT_GAMEPAD_DPAD_LEFT) != 0);
        checkbindkey(binds[x_dpad_right], (b & XINPUT_GAMEPAD_DPAD_RIGHT) != 0);

        checkbindkey(binds[x_start], (b & XINPUT_GAMEPAD_START) != 0);
        checkbindkey(binds[x_back], (b & XINPUT_GAMEPAD_BACK) != 0);
        checkbindkey(binds[x_left_thumb], (b & XINPUT_GAMEPAD_LEFT_THUMB) != 0);
        checkbindkey(binds[x_right_thumb], (b & XINPUT_GAMEPAD_RIGHT_THUMB) != 0);
        
        // these are the digital triggers on some Logitech controllers
        checkbindkey(binds[x_left_shoulder], (b & XINPUT_GAMEPAD_LEFT_SHOULDER) != 0);
        checkbindkey(binds[x_right_shoulder], (b & XINPUT_GAMEPAD_RIGHT_SHOULDER) != 0);

        checkbindkey(binds[x_button_a], (b & XINPUT_GAMEPAD_A) != 0);
        checkbindkey(binds[x_button_b], (b & XINPUT_GAMEPAD_B) != 0);
        checkbindkey(binds[x_button_x], (b & XINPUT_GAMEPAD_X) != 0);
        checkbindkey(binds[x_button_y], (b & XINPUT_GAMEPAD_Y) != 0);
    }    
};
int controller::count = 0;

static controller c;

ICOMMAND(vibrate, "iii", (int* left, int* right, int* millisec), {
    c.vibrate(*left, *right, *millisec);
});

void gamepadbind(const char* name, const char* act)
{
    if (!c.bounded) c.resetbinds();
    if (strcasecmp(name, "reset") == 0) { c.resetbinds(); return; }

    int a = 0, n = 1;
    for (; a != _countof(actions) && strcasecmp(actions[a], act); ++a);
    for (; n != _countof(inputs) && strcasecmp(inputs[n], name); ++n);
    if (a == _countof(actions)) a = action_key;
    if (n == _countof(inputs)) conoutf(CON_ERROR, "gamepad input %s is not defined", name); 
    else if (a == action_key && (n == x_left_stick || n == x_right_stick))
        conoutf(CON_ERROR, "Cannot bind thumbsticks to key presses");
    else if ((a == action_move || a == action_mouse) && n != x_left_stick && n != x_right_stick)
        conoutf(CON_ERROR, "Only thumbsticks can be bound to mouse or movement");
    else bindaction(c.binds[n], actiontype(a), act);
}
COMMAND(gamepadbind, "ss");

VARP(gamepadon, 0, 0, 1);

namespace gamepad
{
    void checkinput(dynent* player)
    {
        if (!gamepadon) return;
        if (!c.bounded) c.resetbinds();
    
        XINPUT_STATE state;
        if (!c.getstate(state)) return;

        c.checkthumbstick(controller::left, state.Gamepad.sThumbLX, state.Gamepad.sThumbLY, player);
        c.checkthumbstick(controller::right, state.Gamepad.sThumbRX, state.Gamepad.sThumbRY, player);
        c.checktrigger(controller::left, state.Gamepad.bLeftTrigger);
        c.checktrigger(controller::right, state.Gamepad.bRightTrigger);
        c.checkbuttons(state.Gamepad.wButtons);
    }

    void writebinds(stream* f)
    { 
        for (int i = 1; i != _countof(c.binds); ++i)
        {
            action& a = c.binds[i];
            f->printf("gamepadbind %s %s\n", inputs[i], a.type==action_key ? a.name : actions[a.type]);
        }
    }
}
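
A note on the DECLARE_XINPUTS trick used in the listing: it is a variant of what is commonly called an X-macro, where a single list is expanded several times under different definitions of one helper macro, so the enum, the name table and the defaults cannot drift out of sync. Below is a reduced standalone sketch of the technique. The macro names DECLARE_INPUTS and INPUT, the shortened entry list and the findinput helper are made up for the example, and it uses a fixed two-argument form instead of the variadic one above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// One list, expanded several times with different definitions of INPUT.
// The entries are a shortened, hypothetical subset.
#define DECLARE_INPUTS \
    INPUT(none,          "") \
    INPUT(left_trigger,  "z") \
    INPUT(right_trigger, "MOUSE1") \
    INPUT(button_y,      "SPACE")

// Expansion 1: enum constants x_none, x_left_trigger, ...
#define INPUT(id, key) x_##id,
enum inputid { DECLARE_INPUTS x_count };
#undef INPUT

// Expansion 2: printable names, in the same order as the enum.
#define INPUT(id, key) #id,
static const char* const inputnames[] = { DECLARE_INPUTS };
#undef INPUT

// Expansion 3: default key bindings, again in the same order.
#define INPUT(id, key) key,
static const char* const defaultkeys[] = { DECLARE_INPUTS };
#undef INPUT

// Looks up an input by name; returns x_count when the name is unknown.
inline inputid findinput(const char* name)
{
    assert(name);
    for (std::size_t i = 0; i != x_count; ++i)
        if (std::strcmp(inputnames[i], name) == 0) return inputid(i);
    return x_count;
}
```

Adding a new input then means adding one line to the list; every table picks it up on the next build.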

ZeroBUGS Lambda Fun

2011-07-17T16:45:00.002-07:00

A friend of mine tried to compile the code from my previous post using GCC, only to get a bunch of error messages. As I suspected, the fix was to specify -std=c++0x on the command line. But before answering my friend's G+ message, I had to verify that the code worked with GCC. And one thing led to another. After compiling, I was curious to see how my ZeroBUGS debugger copes with lambdas. What else to do on a rainy Sunday afternoon in Seattle, other than playing with some old C++ code while listening to Judas Priest?

ZeroBUGS is a visual debugger for Linux, a project that ate up all of my spare time between 2004 and 2008. I have kept making small changes to it since, but very limited in scope. For some reason, since the Big Recession I have found myself having less and less time for on-the-side projects. I tried for a while to make ZeroBUGS a commercial enterprise, hoping to leave my daytime job(s) and become self-employed. I learned the hard way that selling proprietary software to Linux geeks is not very lucrative. Or maybe I should have partnered with a savvy sales guy, the kind that can sell refrigerators to penguins.

In late 2009 I put ZeroBUGS on ice and went to work on Microsoft Windows for a short bit (just long enough to write a few lines of code that will hopefully make it into the upcoming Windows 8).

After leaving Microsoft and joining TableauSoftware, I yanked the closed source product off my web site, and re-released ZeroBUGS as open source (and free as in free beer under the Boost License.)

I have not come upon major bugs in the debugger since a few years ago, when I discovered that the "step-over" functionality was broken for recursive functions.

So I was pretty confident the debugger would handle the new C++0x just fine. Except it didn't!

After some debugging, I traced the problem to the unnamed classes that the compiler generates to capture the surrounding variables. My debugger caches data types by name for performance reasons. Unnamed classes normally occur in some scope, and thus there is no clash. Except that in the case of lambda functions, GCC generates unnamed classes at the outermost scope (i.e. the DWARF entries describing their types are at level 1, immediately nested in the compilation unit scope). The data structures visualization was completely off, because the debugger used the wrong datatype (the first "unnamed" always won).

A simple hack that appends the file index and the line number to the offending "unnamed" solves the problem for now, as the snapshot above can testify.
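
The shape of that hack can be sketched as follows. This is an illustration only, not the ZeroBUGS source: TypeInfo, TypeCache and cachekey are hypothetical names, and the real debugger reads these attributes from DWARF rather than from a plain struct.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a debugger's type description.
struct TypeInfo { std::string name; unsigned file; unsigned line; std::size_t size; };

// Builds the cache key. Named types are keyed by name alone; unnamed ones get
// the declaration's file index and line number appended so they stay distinct.
std::string cachekey(const TypeInfo& t)
{
    if (!t.name.empty()) return t.name;
    std::ostringstream key;
    key << "<unnamed>@" << t.file << ':' << t.line;
    return key.str();
}

class TypeCache
{
    std::map<std::string, TypeInfo> types;
public:
    // Returns the cached entry for t, inserting it on first sight.
    const TypeInfo& intern(const TypeInfo& t)
    {
        const TypeInfo& cached = types.insert(std::make_pair(cachekey(t), t)).first->second;
        // A hit with a different size means two distinct types share one key.
        assert(cached.size == t.size);
        return cached;
    }
    std::size_t size() const { return types.size(); }
};
```

Two lambdas declared on the same line of the same file would still collide under this scheme, which is one reason it is only a stopgap.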

While I think of a better solution this one will have to do. I am done with the computer for now, off to enjoy the weather and barbecue in the rain for the rest of the night!

Template Template Method

2011-07-16T11:00:00.000-07:00

I recently had to write a bunch of C++ functions that shared a common pattern:

1) acquire a resource lock
2) TRY
3) do some work
4) CATCH exceptions of interest, and log them
5) release resource lock

I was eager to implement the "do some work" bits but did not want to bore myself silly by repeating steps 1, 2, 4, and 5 in each function (about thirty or so of them). Now, you may recognize the problem: it is what the Template Method Design Pattern solves: "avoid duplication in the code: the general workflow structure is implemented once in the abstract class's algorithm, and necessary variations are implemented in each of the subclasses."

I could not, however, apply the Template Method pattern "as is" because my functions did not share a common signature. So wrapping them with a non-virtual function and then implementing the "do some work" bits as virtual methods would not work in my case.

One alternative was to code steps 1 and 2 above as a BEGIN_CALL macro, wrap up steps 4 and 5 into an END_CALL and decorate each of my methods with these archaic C-isms. The approach would indeed work for C, but it is utterly indecorous in C++.

The spirit of the Template Method pattern can be preserved very elegantly by making the template method a C++ template method, and wrapping the variant "do some work" bits into a lambda block.

The code sample below illustrates the idea (I added some mock objects to give more context).

#include <exception>
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

// Mock a resource that needs exclusive locking
//
// -- think CComCriticalSection for example:
// http://msdn.microsoft.com/en-us/library/04tsf4b5(v=vs.80).aspx
class Resource
{
public:
    void Lock() { cout << "Resource locked." << endl; }
    void Unlock() { cout << "Resource unlocked." << endl; }
};

// Generic stack-based lock. Works with any resource
// that implements Lock() and Unlock() methods.
// See:
// http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization
template<typename T>
class Lock
{
    Lock(const Lock&);              // non-copyable
    Lock& operator=(const Lock&);   // non-assignable

    T&  m_resource;                 // Lock MUST NOT outlive resource

public:
    explicit Lock( T& resource ) : m_resource( resource )
    {
        m_resource.Lock( );
    }

    ~Lock( )
    {
        m_resource.Unlock( );
    }
};

/////////////////////////////////////////////////////////////////////////////
// Template Method Wrapper for an arbitrary lambda block:
//  1) lock a resource
//  2) log exceptions
// This is a variant of the Template Method Design Pattern
// -- implemented as a C++ template method.
template<typename F>
auto Execute(Resource& r, F f) -> decltype(f())
{
    Lock<Resource> lock(r);
    try
    {
        return f();
    }
    catch (const exception& e)
    {
        // log error
        clog << e.what() << endl;
    }
    typedef decltype(f()) result_type;
    return result_type();
}
/////////////////////////////////////////////////////////////////////////////

// Usage example:

static Resource globalResource;


int f (int i)
{
    return Execute(globalResource, [&]()->int
    {
        return i + 1;
    });
}

string g (unsigned j)
{
    return Execute(globalResource, [&]()->string
    {
        ostringstream ss;
        ss << '[' << j << ']';
        return ss.str();
    });
}

int main()
{
    cout << f(41) << endl;
    cout << g(42) << endl;

    return 0;
}

Update: to compile the code above using GCC you may need to specify "-std=c++0x" on the command line.

One thing that I grappled with for a bit was how to make the Execute template function figure out the return type of the wrapped lambda. After pinging Andrei Alexandrescu at Facebook (or is it "on Facebook"? No matter -- my English as a second language works either way, because Andrei does work for Facebook) and some googling around, I found the magic incantation: decltype(f()).
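
For readers with a newer compiler: C++14 added return type deduction for ordinary functions, so the trailing return type can be dropped. The sketch below is a variation on the listing, not a replacement for it. Resource here is a silent stand-in with a depth counter, Guard is a hypothetical name for the lock class, and the sketch assumes that f returns by value (or void).

```cpp
#include <exception>
#include <iostream>

// Minimal stand-ins for the Resource and Lock classes of the listing above;
// the depth counter exists only so that balanced locking can be checked.
struct Resource
{
    int depth = 0;
    void Lock() { ++depth; }
    void Unlock() { --depth; }
};

class Guard
{
    Resource& m_resource;
public:
    explicit Guard(Resource& r) : m_resource(r) { m_resource.Lock(); }
    ~Guard() { m_resource.Unlock(); }
    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;
};

// C++14: the return type is deduced from the return statements.
template<typename F>
auto Execute(Resource& r, F f)
{
    Guard lock(r);
    try
    {
        return f();
    }
    catch (const std::exception& e)
    {
        std::clog << e.what() << std::endl;
    }
    // Every return statement must deduce the same type, so the fallback
    // value-initializes whatever f returns (f must return by value).
    return decltype(f())();
}
```

The decltype(f()) incantation does not disappear entirely: it is still the simplest way to produce the fallback value after an exception has been logged.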

A Voyage to the Center of the Borg

2010-12-12T21:59:00.000-08:00

A great while has passed since my last blog post. In my defense: I had been very busy playing Locutus of Borg. Microsoft bought out my employer back in 2007, and consequently I spent the past three years working in their Online Services Division, and Windows Server (yes, I did contribute a few lines of code to the upcoming version of Windows).

For months, if not years, of my being a Redmondite I envisioned how I was going to blog about the strange and at times surreal experience of working for Microsoft. After all, I am the kind of geek who did not use Windows on his home machines. I only installed it after the acquisition, so that I could work from home every once in a while. I had imagined what the day would look like when, free of the Borg, I was going to report on what life on the inside felt like. I yearned to write about Microsoft politics, coding style and culture, the famous never-ending meetings and (some really amazing) people that I met there. I thought that I should document how their caste system (oops, I mean, career path model) twists people.

But now that I am on the outside, I cannot think of anything interesting to write about my days at Microsoft. Life goes on.

I'd rather blog on what a great product Tableau Software (my new employer) has. Or I could write about what an exciting weekend this one was, living on the web with my new Chrome OS laptop from Google (thanks again Sorin for the great surprise that got delivered Friday night!) Or I could write about enjoying the great read that the D Programming Language is.

Later.

The Power of Foreach

2009-06-23T01:06:00.000-07:00

In D, arrays can be traversed using the foreach construct:


int [] a = [1, 5, 10, 42, 13];
foreach (i;a) {
    writefln("%d", i);
}

The array of integers in this example is traversed and each element printed, in the natural order in which elements appear in the sequence. To visit the elements in reverse order, simply replace foreach with foreach_reverse. It is as intuitive as it gets.
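To make the reverse traversal concrete, here is a minimal sketch. The helper name reversed is made up for illustration; it only relies on foreach_reverse and on the two-variable form of foreach, which also exposes the index.

```d
import std.stdio : writeln;

// Hypothetical helper: collect the elements of `a` in reverse order.
int[] reversed(int[] a) {
    int[] result;
    foreach_reverse (x; a) {
        result ~= x;
    }
    return result;
}

void main() {
    int[] a = [1, 5, 10, 42, 13];
    // the two-variable form also hands out the index of each element
    foreach (i, x; a) {
        writeln(i, ": ", x);
    }
    writeln(reversed(a)); // [13, 42, 10, 5, 1]
}
```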

Moreover, linear searches can be implemented with foreach: simply break out of the loop when the searched-for value is found:


foreach (i;a) {
    writefln("%d", i);
    if (i == 42) {
        break;
    }
}

Consider now a tree data structure where a tree node is defined as:


class TreeNode {
 TreeNode left;
 TreeNode right;
 int value;
}

What is the meaning of a statement such as foreach (node; tree) { … }? The simple answer is that with the above definition of TreeNode, the code does not compile.

But if it were to compile, what should it do? Visit the nodes in order, or in a breadth-first fashion? No answer is the right one, unless we get to know more about the problem at hand. If we’re using a binary search tree to sort some data, then foreach would most likely visit the nodes in-order; if we’re evaluating a Polish-notation expression tree, we might want to consider post-order traversal.

Foreach Over Structs and Classes

Tree data structures and tree traversal occur often in computer science problems (and nauseatingly often in job interviews). Balanced binary trees are routinely used to implement associative containers; C++ programmers will know them from the standard library's set and map.

One difference between sequence containers (such as lists, arrays, and queues) on one hand, and containers implemented with trees on the other, is that there are more ways to iterate over the elements of a tree than there are ways to enumerate the elements of a sequence. A list (for example) can be traversed from beginning to end and, in the case of doubly linked lists, in reverse, from the end to the beginning; that’s it. But a tree can be traversed in order, in pre-order, post-order, or breadth first.

No built-in traversal algorithm will fit all possible application requirements. D’s approach is to provide the opApply operator as "standard plumbing" where users can plug their own algorithms for iterating over the elements of a class or struct. The operator is supposed to implement the iteration logic, and delegate the actual processing of the objects to a ... delegate:


class TreeNode {
public:
    TreeNode left;
    TreeNode right;
    int value;
    int opApply(int delegate(ref TreeNode) processNode) {
        // ... tree traversal
        // ...
        return 0;
    }
}

When the programmer writes a foreach loop, the compiler synthesizes a delegate function from the body of the loop, and passes it to opApply. In this example, the body of the delegate will have exactly one line that contains the writefln statement:


TreeNode tree = constructTree();
foreach(node; tree) {
    writefln("%d", node.value);
}

For an in-order traversal, the implementation of opApply may look something like this:


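int opApply(int delegate(ref TreeNode) processNode) {
    // visit the left subtree, this node, then the right subtree; stop at
    // the first non-zero result and return it unchanged
    if (left) {
        if (auto r = left.opApply(processNode)) return r;
    }
    TreeNode self = this; // a ref parameter needs an lvalue
    if (auto r = processNode(self)) return r;
    return right ? right.opApply(processNode) : 0;
}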

The delegate that the compiler synthesizes out of the foreach body returns an integer (which is zero by default). A break statement in the foreach loop translates to the delegate function returning a non-zero value. As you can see in the code above, a correct implementation of opApply should make sure that the iteration is "cancelled" when the delegate returns a non-zero value.
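Here is a self-contained sketch of the whole mechanism: a tiny tree, an in-order opApply, and a foreach loop that breaks out early. The constructor and the visitUntil helper are my own additions for the sake of the example. The opApply body hands the delegate's non-zero result back to its caller unchanged, which is what the language rules ask for.

```d
import std.stdio : writeln;

class TreeNode {
    TreeNode left;
    TreeNode right;
    int value;

    // Convenience constructor (an assumption, not in the snippets above).
    this(int value, TreeNode left = null, TreeNode right = null) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // In-order traversal; a non-zero result from the delegate stops
    // the walk and is passed back to the caller as is.
    int opApply(scope int delegate(ref TreeNode) dg) {
        if (left !is null) {
            if (auto r = left.opApply(dg)) return r;
        }
        TreeNode self = this; // ref parameters need an lvalue
        if (auto r = dg(self)) return r;
        if (right !is null) {
            if (auto r = right.opApply(dg)) return r;
        }
        return 0;
    }
}

// Hypothetical helper: values seen in order, stopping once `stopAt` is met.
int[] visitUntil(TreeNode tree, int stopAt) {
    int[] seen;
    foreach (node; tree) {
        seen ~= node.value;
        if (node.value == stopAt) break;
    }
    return seen;
}

void main() {
    //      5
    //    /   \
    //   3     8
    //  / \
    // 1   4
    auto tree = new TreeNode(5,
        new TreeNode(3, new TreeNode(1), new TreeNode(4)),
        new TreeNode(8));
    writeln(visitUntil(tree, 4)); // [1, 3, 4]
}
```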

The traversal function’s argument must match the delegate argument type in the signature of opApply. In the example above the processNode function could modify the tree node that is passed in. If the TreeNode class writer wanted to outlaw such use, the opApply operator should have been declared to take a delegate that takes a const TreeNode parameter:


class TreeNode {
// …
    int opApply(int delegate(ref const TreeNode) processNode) {
        return processNode(this);      
    }
}

The new signature demands that the client code changes the parameter type from TreeNode to const TreeNode. Any attempt to modify the node object from within the user-supplied traversal function will fail to compile.

Another possible design is to encode all traversal algorithms as TreeNode methods. The following shows an example for the in-order algorithm (other traversal algorithms are left as an exercise for the reader):


class TreeNode {
// …
    int traverseInOrder(int delegate(ref int) dg) {
        if (left) {
            int r = left.traverseInOrder(dg);
            if (r) {
                return r;
            }
        }
        int r = dg(value);
        if (r) {
            return r;
        }
        if (right) {
            r = right.traverseInOrder(dg);
            if (r) {
                return r;
            }
        }
        return 0;
    }
}

foreach(val; &tree.traverseInOrder) {
    Console.WriteLine(val);
}

D Generators

A generator is a function or functor that returns a sequence, but instead of building an array or vector containing all the values and returning them all at once, a generator yields the values one at a time. Languages such as C# and Python have a yield keyword for this purpose. In D a generator can be implemented with foreach and a custom opApply operator. Assume one wants to print the prime numbers up to N, like this:


    foreach (i; PrimeNumbers()) {
        if (i > N) {
            break;
        }
        writeln(i);
    }

To make this work, the PrimeNumbers struct could be implemented like this:


struct PrimeNumbers {
    int n = 1;
    int[] primes;

    int opApply(int delegate(ref int) dg) {
loop:
        while (true) {
            ++n;
            foreach (p; primes) {
                if (n % p == 0) {
                    continue loop;
                }
            }
            primes ~= n;
            // stop when the loop body breaks, and pass its result on
            if (auto r = dg(n)) {
                return r;
            }
        }
    }
}
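Put together as a complete program, the generator can be exercised like this. The primesUpTo helper is a name I made up for the example; the struct is the one above, with the delegate's result returned from opApply.

```d
import std.stdio : writeln;

struct PrimeNumbers {
    int n = 1;
    int[] primes;

    // Yields primes one at a time by trial division against
    // the primes found so far.
    int opApply(scope int delegate(ref int) dg) {
        loop:
        while (true) {
            ++n;
            foreach (p; primes) {
                if (n % p == 0) continue loop;
            }
            primes ~= n;
            if (auto r = dg(n)) return r; // the consumer asked to stop
        }
    }
}

// Hypothetical helper: pull values from the generator until one exceeds `limit`.
int[] primesUpTo(int limit) {
    int[] result;
    auto gen = PrimeNumbers();
    foreach (p; gen) {
        if (p > limit) break;
        result ~= p;
    }
    return result;
}

void main() {
    writeln(primesUpTo(20)); // [2, 3, 5, 7, 11, 13, 17, 19]
}
```

The sequence is unbounded; it is the consumer's break that ends the iteration.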

Pragma Assembly

2009-06-18T00:11:00.000-07:00

Publishing the source code for D.NET on CodePlex in its current (rough) form turned out to be a great idea, as I received very good feedback. Tim Matthews of New Zealand has submitted several bug reports and patches and convinced me to change my stance on the "assembly class hack".

The hack was a very limited solution to the problem of qualifying imported declarations by the name of the assembly where they live. I described the problem and the attempt to hack around it at the end of a post back in December; a diligent reader commented that I should have used the pragma mechanism instead. I resisted the suggestion at the time, mainly because I am trying to avoid changing the compiler front-end if I can help it. (Front-end changes have to ultimately be merged back into Walter's source tree, and he is a very, very busy guy.)

A D implementation on .NET is not going to be of much use without the ability to generate code that imports and interfaces with existing assemblies. Guided by this idea, Tim Matthews did a lot of trail-blazing and prototyping and showed that because in .NET namespaces can span across assemblies, there has to be a way of specifying an arbitrary number of assemblies in one import file. My "assembly class" hack allowed for one and only one assembly name to be specified.

So I had to byte the bullet and do the right thing: assembly names are now specified with a pragma, like this:


pragma (assembly, "mscorlib")
{
// imported declarations here...
}

Any number of such blocks can appear within a D import file. And in the future, the code will be extended to allow a version and a public key to be specified after the assembly name.

Another thing that has to be fixed is to make the compiler adhere to the convention of using a slash between enclosing classes and nested types, as described in Serge Lidin's Expert .NET 2.0 IL Assembler, Chapter 7, page 139:

Since the nested classes are identified by their full name and their encloser (which is in turn identified by its scope and full name), the nested classes are referenced in ILAsm as a concatenation of the encloser reference, nesting symbol / (forward slash), and full name of the nested class

Tim also submitted a front-end patch that allows directories and import files of the same name to coexist, so that the code below compiles without errors:


import System;
import System.Windows.Forms;

An alternative solution that I proposed was to create import files named "this.d", so that the above would have read:


import System.this;
import System.Windows.Forms;

After some consideration, I bought Tim's point that this is not what .NET programmers would expect, and went on to apply his patch.

D .NET on Codeplex

2009-05-10T00:17:00.000-07:00

I will be taking a few days off next week and I decided to upload the code for the D compiler .net back-end on Codeplex before I close shop. I hope that it will provide context for the last few months' worth of blog posts.

Most core language features are usable, but there's no Phobos port and if you need to import functionality from external DLLs, you'll have to hand-write some import files (following the model in druntime/import/System.di); I hope to get around to writing a tool that automates the process one of these days -- and it will most likely be in D.

There is a Visual Studio 8 (works with Visual Studio 8 Express) solution file in the source tree, and there is also a Makefile for the adventurous. The project compiles on Linux but it does not work quite well with Mono -- I strongly suspect it is Mono's segfault.

Check it out at http://dnet.codeplex.com/!

Argument against _argptr

2009-04-28T22:47:00.000-07:00

Variadic functions work slightly differently in my D.NET implementation than under the native D compiler.

For functions with variable numbers of arguments, the native compiler synthesizes two parameters: _arguments and _argptr; _arguments is an array of TypeInfo objects, and _argptr is a pointer to the beginning of the variable arguments on the stack. The user is supposed to query the type information in _arguments, and do the proper pointer arithmetic to navigate the arguments. You can see some examples at http://www.digitalmars.com/d/2.0/function.html:


void printargs(int x, ...)
{
    writefln("%d arguments", _arguments.length);
    for (int i = 0; i < _arguments.length; i++)
    {   _arguments[i].print();

        if (_arguments[i] == typeid(int))
        {
            int j = *cast(int *)_argptr;
            _argptr += int.sizeof;
            writefln("\t%d", j);
        }
        else if (_arguments[i] == typeid(long))
        {
            long j = *cast(long *)_argptr;
            _argptr += long.sizeof;
            writefln("\t%d", j);
        }
        // ...

The pointer arithmetic is not verifiable in managed code. A separate array of type descriptors is not necessary in .net, because the type meta-data can be passed in with the arguments.

In D.NET, the variable arguments are passed as an array of objects. For example, for a D function with the prototype

void fun(...)

the compiler outputs:


.method public void '_D23funFYv' (object[] _arguments)

I handled variadic support slightly differently from the native compiler: I dropped _argptr and provided a new helper function, _argtype, that can be used as demonstrated in this example:


void fun(...)
{
    foreach(arg; _arguments)
    {
        if (_argtype(arg) == typeid(int))
        {
            int i = arg;
            Console.WriteLine("int={0}".sys, i);
        }
        else if (_argtype(arg) == typeid(string))
        {
            string s = arg;
            Console.WriteLine(s.sys);
        }
    }
}

If the type of the arguments is known, there is no need to check for the typeid:


void fun(...)
{
    foreach(arg; _arguments)
    {
        int i = arg;
        Console.WriteLine(i);
    }
}

If an incorrect type is passed in, it is still okay, because the error is detected at runtime.


fun("one", "two", "three"); // int i = arg will throw

The downside of this approach is that it is not compatible with the native code. This does not affect template variadic functions, which should be perfectly portable.
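For comparison, this is roughly what the portable template flavor looks like. It is only a sketch: the function name describe is made up, and the formatting through std.conv is my choice. Because the argument types are known at compile time, neither _arguments nor _argptr is involved.

```d
import std.conv : to;
import std.stdio : writeln;

// Compile-time variadic: one instance is generated per combination
// of argument types, so no runtime type information is needed.
string[] describe(Args...)(Args args) {
    string[] result;
    foreach (arg; args) { // unrolled at compile time, once per argument
        result ~= typeof(arg).stringof ~ "=" ~ to!string(arg);
    }
    return result;
}

void main() {
    writeln(describe(42, "two", 3.5));
}
```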

Slice and D-ice

2009-04-28T12:12:00.000-07:00

It is amazing how much insight one can get into a language by simply writing a compiler for it... Today I am going to spend half a lunch break copying and pasting into this post a stack of notes related to array slices (collected over the past few months of working on a .net back-end for the D compiler).

D sports a data type called array slice that is intended to boost the performance of array-intensive computations. A slice is a lightweight value type, conceptually equivalent to a range of array elements, or a "view" into an array. It can be thought of as consisting of a reference to an array, and a pair of begin-end indexes into the array:


struct ArraySlice(T) {
    T[] a; // reference to array
    int begin; // index where slice begins
    int end; // one past index where slice ends
}

The actual representation of a slice is currently internal to the compiler, and completely opaque to the programmer. The template struct above is not what the layout of a slice actually looks like; it is intended for illustrative purposes only.

Consider this code:


int a[] = [1, 3, 5, 7, 9, 11, 13, 17];
int[] s = a[2..5];

The second declaration introduces "s" as a slice of array "a", starting at index 2 and ending at (but not including) index 5, so it covers a[2], a[3] and a[4]. Using the template pseudo-definition of ArraySlice, the code is conceptually equivalent to:


int a[] = [1, 3, 5, 7, 9, 11, 13, 17];
ArraySlice!(int) s = { a, 2, 5 };

To understand how array slices may help performance, consider an application that reads and parses XML files. The input can be loaded as an array of characters (a huge string). The application builds a DOM representation of the input, and each node in the DOM contains a token string. This approach is wasteful, because the token strings hold copies of character sequences that are already present in the input; copying a token takes linear time (it is directly proportional to the number of characters in the token) and the same is true for the spatial complexity (how much memory is being used). But XML tokens could be modeled as slices of character arrays ("views" into the original XML input string), and the cost per token in both time and space would drop down to a constant value.
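A toy version of the idea, reduced to splitting on spaces rather than parsing real XML (the tokenize function is purely illustrative): each token is a slice of the input, so no characters are copied, and the pointer check at the end confirms it.

```d
import std.stdio : writeln;

// Split on single spaces; every token is a slice of `input`,
// so the characters themselves are never copied.
string[] tokenize(string input) {
    string[] tokens;
    size_t start = 0;
    foreach (i, c; input) {
        if (c == ' ') {
            if (i > start) tokens ~= input[start .. i];
            start = i + 1;
        }
    }
    if (start < input.length) tokens ~= input[start .. $];
    return tokens;
}

void main() {
    string xml = "<a> text </a>";
    auto tokens = tokenize(xml);
    writeln(tokens);
    // the first token begins at the very same address as the input
    assert(tokens[0].ptr == xml.ptr);
}
```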

This design can be implemented in languages other than D, but memory management issues may add unnecessary complexity. In C++, for example, we have to make sure that the token slices do not outlive the original input string. D is a garbage-collected language: holding a reference to an array slice keeps the original array "alive", because the slice points into the array's memory.

Now that the design rationale behind array slices is understood, let's take another look at the syntax. Consider again the statement:

int[] s = a[2..5];

The declaration part introduces "s" as an array of integers; it is not until you see the initializer that the lightweight, array-slice nature of "s" is revealed.

D has no special syntax for disambiguating between "true" arrays and array slices in declarations; they can be used interchangeably. As a matter of fact, a function signature with array parameters will happily accept a slice as argument. In the following code, both "a" and "s" are legal arguments for the count function:


size_t count(int[] arr) {
    return arr.length; // return the number of elements in arr
}
writefln("%d", count(a)); // ok
writefln("%d", count(s)); // also ok

Resizing Slices

Both arrays and slices support the built-in length property. As you would expect, an expression such as a.length tells you how many elements are present in the array; the property applies to array slices as well, and in that case it gives the number of elements within the slice range. For example, the output of the following code is "3":


int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
int[] s = a[2..5];
writefln("%d", s.length); // prints 3

So far so good, but I forgot to mention that the length property is read-write: not only can you query the size of an array, you can also change it, like this:


a.length = 100; // extend the array to 100 elements

Assignment to the length property resizes the array. This raises the question: what happens when an array slice is resized? The answer, of course, is "it depends".
With "a" and "s" defined as above, let's say we resize the "s" slice from 3 elements to 7:


s.length = 7;

This extends the view of "s" into "a" up to the ninth element of "a" ("s" starts at index 2). It is as if we had written:


s = a[2..9];

The slice is still within the bounds of the "a" array. Resizing it is a constant-time operation that changes one field inside the internal representation of "s". If instead of the built-in slice we had used the template ArraySlice struct introduced above, resizing the slice would have amounted to:


int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
ArraySlice!(int) s = { a, 2, 5 };
s.end = 9; // resize the slice

Because "s" is simply a "view" of the array, modifying an element in the array is immediately reflected into the slice, for example:


writefln("%d %d", a[2], s[0]); // prints "3 3"
a[2] = 23;
writefln("%d %d", a[2], s[0]); // prints "23 23"

What happens if we re-size "s" past the end of "a"?


s.length = 100; // what does this do?

The answer is that the behavior is up to the compiler. The current native compiler from Digital Mars changes the type of "s" from a lightweight view into an array to a full-fledged array, and re-sizes it to fit 100 elements.


int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
int[] s = a[2..5];
writefln("%d %d", a[2], s[0]); // prints "3 3"
s.length = 100;
a[2] = 23;
writefln("%d %d", a[2], s[0]); // prints "23 3"

In other words, resizing a slice past the end of the original array breaks up the relationship between the two, and from that point they go their separate merry ways.
This behavior also underlines the schizophrenic nature of array slices that are not full copies of arrays, unless they change their mind.
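The "divorce" can be observed directly by comparing pointers. The sketch below uses a made-up helper, sharesMemory, and only checks the case I am sure about: growing the slice past the end of the original array. As far as I know, current D runtimes may decide to reallocate even earlier than that, so the example makes no claim about smaller resizes.

```d
import std.stdio : writeln;

// Hypothetical helper: true when `s` still points into the memory of `a`.
bool sharesMemory(int[] a, int[] s) {
    return s.ptr >= a.ptr && s.ptr < a.ptr + a.length;
}

void main() {
    int[] a = [1, 2, 3, 5, 7, 9, 11, 13, 17];
    int[] s = a[2 .. 5];
    assert(sharesMemory(a, s));

    s.length = 100;           // grows past the end of a: the data is moved
    assert(!sharesMemory(a, s));

    a[2] = 23;                // no longer visible through s
    writeln(a[2], " ", s[0]); // 23 3
}
```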

Concatenating Slices

We saw how array slices may be re-sized via the “length” property. Slices may be re-sized implicitly via concatenation, as in the following example:


int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
int s[] = a[0..5];
s ~= [23, 29];

The tilde is the concatenation operator for arrays, strings and slices. In this example, the slice is resized to accommodate the elements 23 and 29. Note that in this situation the resizing is different from what would have happened had we written:
s.length += 2;

Extending the length by two elements simply bumps up the upper limit of the slice (because there is still room in the original array, "a"). As we saw in the previous section, if the new length exceeds the bounds of the original array, the slice will be "divorced" from the original array, and promoted from a light-weight view to a full, first-class array. If we just extend the length by two elements, the bounds of "a" are not exceeded.

However, in the case of appending (as in s ~= [23, 29]), in addition to resizing we are also setting the values of the two additional elements. The slice needs to be divorced from the array so that a[5] and a[6] are not overwritten with the values 23 and 29. The compiler turns "s" into a full array of length + 2 == 7 elements, copies the first five elements of “a” (indexes 0 through 4), then appends the values 23 and 29.

The problem, as with resizing past the bounds of the original array, is that after the array and the slice part their ways, it is no longer possible to modify value types in the original array via the slice (which has now been promoted to a standalone array). This is a run time behavior, hard to predict by statically analyzing (or visually inspecting) the code.
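A short sketch of the appending case, plus one way to take the guesswork out of it: copying up front with the built-in .dup property. The appendToPrefix helper is hypothetical and exists only to keep the example testable.

```d
import std.stdio : writeln;

// Hypothetical helper: append to a view of the first `n` elements of `a`
// and return the result. For n < a.length the elements after the view
// are live data, so the append works on a fresh copy.
int[] appendToPrefix(int[] a, size_t n, int[] extra) {
    int[] s = a[0 .. n];
    s ~= extra;
    return s;
}

void main() {
    int[] a = [1, 2, 3, 5, 7, 9, 11, 13, 17];
    int[] s = appendToPrefix(a, 5, [23, 29]);
    writeln(s); // [1, 2, 3, 5, 7, 23, 29]
    writeln(a); // [1, 2, 3, 5, 7, 9, 11, 13, 17]

    s[0] = 100; // s is a standalone array now
    assert(a[0] == 1);

    // an explicit copy states the intent instead of relying on the runtime
    int[] t = a[0 .. 5].dup;
    t[0] = -1;
    assert(a[0] == 1);
}
```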

Rolling Your Own Array Slices

It is impossible to determine statically whether a D function parameter is an array or a slice by examining the function's code alone. It is up to the function's caller to pass in an array or a slice.


void f (int[] a) {
// ...
}
int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
f(a); // call f with an array argument
f(a[2..5]); // call f with a slice argument;

In some cases you may want to better communicate that your function is intended to work with slices rather than arrays. You may also want to have better control over the slice's properties. Say for example you want to make sure that a slice is never re-sized.

You can accomplish these things by rolling your own ArraySlice. The template struct that was introduced earlier is a good starting point. The signature of "f" can be changed to:


void f (ArraySlice!(int) a) { // ...

That's a good start, but the struct is not compatible with a built-in array slice. The following code does not compile:


struct ArraySlice(T) {
    T[] a; // reference to array
    int begin; // index where slice begins
    int end; // one past index where slice ends
}

int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
ArraySlice!(int) s = { a, 2, 5 };
foreach(i; s) { // error, does not compile
    writefln("%d", i);
}

You could of course rewrite the foreach loop to use the begin..end range:


foreach(i; s.begin..s.end) {
    writefln("%d", s.a[i]);
}

In addition to being more verbose such code is not very well encapsulated, since it explicitly accesses the struct's public members. If we later decide to factor out ArraySlice into its own module, and make the “a”, “begin”, and “end” members private, the code above will not compile anymore.

All that's preventing the compact version of the foreach loop from compiling is that the opApply operator is missing. So let's add one:


struct ArraySlice(T) {
  // ...
  int opApply(int delegate(ref T) dg) {
      foreach (i;begin..end) {
          dg(a[i]);
      }
      return 0;
  }
}

Great! This gets us past the compilation error. The foreach loop now compiles and prints out all the elements in the slice. There is a small and subtle bug in this code though. Suppose that instead of printing all elements in the slice, you're doing a linear search, for example:


foreach(i; s) {
    if (i == 5) { // found it!
        break;
    }
}

Astonishingly enough, this code will not break out of the foreach loop; instead, it will continue through all the elements in the slice.

The http://www.digitalmars.com/d/2.0/statement.html website prescribes the behavior of the opApply operator:

"The body of the apply function iterates over the elements it aggregates, passing them each to the dg function. If the dg returns 0, then apply goes on to the next element. If the dg returns a nonzero value, apply must cease iterating and return that value".

The D compiler synthesizes the delegate function from the body of the foreach loop, and the code above is transformed internally to this equivalent form:


int dg(ref int i) {
    if (i == 5) {
        return 1;
    }
    return 0;
}
s.opApply(&dg);

The bug in the opApply operator is that the loop must stop when dg returns non-zero, and that value must be returned to the caller:


int opApply(int delegate(ref T) dg) {
 foreach (i; begin..end) {
     if (auto r = dg(a[i])) return r;
 }
 return 0;
}

Now foreach works correctly. To support foreach_reverse, just add an opApplyReverse member function to the ArraySlice template struct:


int opApplyReverse(int delegate(ref T) dg) {
 foreach_reverse (i; begin..end) {
     if (auto r = dg(a[i])) return r;
 }
 return 0;
}

What about manipulating the length of the slice? Neither of the lines below compiles:


writefln("%d", s.length);
s.length = 100;

To support the length property, we have to add these methods:


struct ArraySlice(T) {
    // ...
    // return the length of the slice
    int length() { return end - begin; }

    // resize the slice, preventing it from growing past
    // the original array's length
    void length(int newLength) {
        end = begin + newLength;
        if (end > a.length) { end = cast(int) a.length; }
    }
}

To prevent resizing the slice, all there is to do is to leave the second overload undefined, effectively turning length into a read-only property.

Clients of the struct can still set its individual members to inconsistent values. To disallow incorrect usage, the "a", "begin", and "end" members can be made private (and the struct will have to move to its own module, because in D private access is not enforced if the class or struct lives in the same module as the client code).

To make the ArraySlice struct even more source-compatible with built-in slices, you can give it an opIndex operator:


struct ArraySlice(T) {
    // ...
    // the index is relative to the beginning of the slice
    T opIndex(size_t i) { return a[begin + i]; }
}

The opIndex operator allows this to work:
writefln("%d", s[3]);

Assignment to array elements is not allowed:


s[3] = 42; // error, does not compile

If you want the array elements to be modified via the slice like that, just define opIndexAssign (note that D passes the assigned value first, followed by the index):


struct ArraySlice(T) {
    // ...
    void opIndexAssign(T val, size_t i) { a[begin + i] = val; }
}

When you put it all together, the ArraySlice struct will look something like this:


struct ArraySlice(T) {
private:
    T[] a; // reference to array
    int begin; // index where slice begins
    int end; // one past index where slice ends
public:
    // indexes are relative to the beginning of the slice
    T opIndex(size_t i) { return a[begin + i]; }
    void opIndexAssign(T val, size_t i) { a[begin + i] = val; }
    int length() { return end - begin; }
    // comment this function out to prevent resizing
    void length(int newLength) {
        end = begin + newLength;
        if (end > a.length) { end = cast(int) a.length; }
    }

    // support foreach
    int opApply(int delegate(ref T) dg) {
        foreach (i; begin..end) {
            // stop at the first non-zero result and pass it on
            if (auto r = dg(a[i])) return r;
        }
        return 0;
    }
}

In conclusion, by rolling your own array slice implementation the intent of your code becomes clearer, your level of control over it increases, and you can still retain the brevity of the built-in slices.
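For readers who want something to paste into a file and run, here is one possible rendering of the finished struct as a complete program. It is a sketch under a few assumptions of mine: it uses size_t for the bounds, adds a constructor with bounds checks, and leaves out the length setter so that the slice cannot be resized.

```d
import std.stdio : writeln;

struct ArraySlice(T) {
    private T[] a;        // reference to the underlying array
    private size_t begin; // index where the slice begins
    private size_t end;   // one past the index where the slice ends

    this(T[] a, size_t begin, size_t end) {
        assert(begin <= end && end <= a.length);
        this.a = a;
        this.begin = begin;
        this.end = end;
    }

    // read-only length: there is no setter, so the slice cannot be resized
    @property size_t length() const { return end - begin; }

    // indexes are relative to the beginning of the slice
    T opIndex(size_t i) {
        assert(i < length);
        return a[begin + i];
    }

    // D passes the assigned value first, then the index
    void opIndexAssign(T val, size_t i) {
        assert(i < length);
        a[begin + i] = val;
    }

    int opApply(scope int delegate(ref T) dg) {
        foreach (i; begin .. end) {
            if (auto r = dg(a[i])) return r;
        }
        return 0;
    }

    int opApplyReverse(scope int delegate(ref T) dg) {
        foreach_reverse (i; begin .. end) {
            if (auto r = dg(a[i])) return r;
        }
        return 0;
    }
}

void main() {
    int[] a = [1, 2, 3, 5, 7, 9, 11, 13, 17];
    auto s = ArraySlice!int(a, 2, 5);
    foreach (x; s) {
        if (x == 5) break; // the linear search now stops here
        writeln(x);
    }
    s[0] = 42;     // writes through to a[2]
    writeln(a[2]);
}
```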

Static ctors in D.NET (Part 2)

2009-04-24T00:03:00.000-07:00

D allows multiple static constructors per class (all sharing the same signature). For example, the following code is legal:


version(D_NET)
{
    import System;
    alias Console.WriteLine println;
}
else
{
    import std.stdio;
    alias writefln println;
}
class A
{
    static int i = 42;
    static this() 
    {
        println("static A.this 1");
    }
    static this() 
    {
        println("static A.this 2");
    }
}
void main()
{
    println(A.i);
}

The program prints:


static A.this 1
static A.this 2
42

Because IL does not allow duplicate methods with the same signature, instead of mapping static constructors directly to .cctor methods, my compiler generates one .cctor per class (where needed) that makes function calls to the static this() constructors. The .cctor is not called if the class is never referenced -- this behavior is different from the native Digital Mars D compiler. If we comment out the one line in main, it will still print the constructor messages in native mode, but not under the .net compiler.
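The native behavior is easy to check with a small program that records the constructor calls instead of printing them. This is a sketch for the native compiler only; the trace variable is a name I introduced for the example. Note that main never mentions the class.

```d
import std.stdio : writeln;

string[] trace; // hypothetical buffer, filled while the program starts up

class A {
    static int i = 42;
    static this() { trace ~= "static A.this 1"; }
    static this() { trace ~= "static A.this 2"; }
}

void main() {
    // A is never referenced here, yet natively both static
    // constructors have already run, in lexical order
    writeln(trace);
}
```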

D classes may also have one or more static destructors, as in this example:


class A
{
    static int i = 42;
    static this() 
    {
        println("static A.this 1");
    }
    static this() 
    {
        println("static A.this 2");
    }
    static ~this()
    {
        println("static A.~this 1");
    }
    static ~this()
    {
        println("static A.~this 2");
    }
}

Unlike with the class constructors, there is no special IL method to map static destructors to. My compiler supports them with AppDomain.ProcessExit event handlers, registered in reverse order of their lexical occurrences. IL allows non-member .cctor methods, and the compiler takes advantage of this feature to synthesize code that registers the static destructors as ProcessExit handlers.

It is interesting to observe that the global .cctor does reference the class when it constructs the event handler delegates:


.method static private void .cctor()
{
  // register static dtor as ProcessExit event handler
  call class [mscorlib]System.AppDomain [mscorlib]System.AppDomain::get_CurrentDomain()
  ldnull
  ldftn void 'example.A'::'_staticDtor4'(object, class [mscorlib]System.EventArgs)
  newobj instance void [mscorlib]System.EventHandler::.ctor(object, native int)
  callvirt instance void [mscorlib]System.AppDomain::add_ProcessExit(class [mscorlib]System.EventHandler)
  // register static dtor as ProcessExit event handler
  call class [mscorlib]System.AppDomain [mscorlib]System.AppDomain::get_CurrentDomain()
  ldnull
  ldftn void 'example.A'::'_staticDtor3'(object, class [mscorlib]System.EventArgs)
  newobj instance void [mscorlib]System.EventHandler::.ctor(object, native int)
  callvirt instance void [mscorlib]System.AppDomain::add_ProcessExit(class [mscorlib]System.EventHandler)
  ret
}

This means that the .cctor of the class will be called, even if no user code ever references it.

In addition to class static constructors and destructors, D also features module static constructors and destructors. These are expressed as non-member functions with the signature static this() and static ~this(), respectively.
For example:


//file b.d
import a;
version(D_NET)
{
    import System;
    alias Console.WriteLine println;
}
else
{
    import std.stdio;
    alias writefln println;
}

static this()
{
    println("module B");
    map["foo"] = "bar";
}
static this()
{
    println("boo");
}
static ~this()
{
    println("~boo");
}

//file a.d
version(D_NET)
{
    import System;
    alias Console.WriteLine println;
}
else
{
    import std.stdio;
    alias writefln println;
}

string map[string];

static this()
{
    println("module A");
}
static ~this()
{
    println("~module A");
}

void main()
{
    foreach (k, v; map)
    {
        version(D_NET)
        {
            Console.WriteLine("{0} -> {1}".sys, k, v.sys);
        }
        else
        {
            writefln("%s -> %s", k, v);
        }
    }
}

It is noteworthy that regardless of the order in which the two files above are compiled, the resulting program prints the same output:


module A
module B
boo
foo -> bar
~boo
~module A

The explanation lies in the D language rules: if a module B imports a module A, the imported module (A) must be statically initialized first (before B).

As in the case of static constructors and destructors for classes, the compiler uses the global, free-standing .cctor method to stick calls to module ctors and register ProcessExit events that call the module's static dtors.

Thanks to BCSd for prompting this post with his comment and code sample.

Static Constructors in D.NET

2009-04-14T01:22:00.000-07:00

The D programming language features static constructors that are similar to class constructors in C#: they are called automatically to initialize the static fields (shared data) of a class.

At the IL level, static constructors are implemented by the special .cctor methods. The experimental compiler for D that I am working on in my virtual garage groups together user-written static constructor code with static field initializers into .cctors (and I believe that the C# compiler does the same).


class Example {
  static int solution = 42; // assignment is moved inside .cctor
  static double pi;

  static this() { // explicit static ctor ==> .cctor
    pi = 3.14159;
  }
}

The code above produces the same IL as:


class Example {
  static int solution = 42;
  static double pi = 3.14159;
}

Also, the compiler synthesizes one class per module to group all "free-standing" global variables (if any).

For example, the IL code that is generated out of this D source


static int x = 42;

void main() {
  writefln(x);
}

is equivalent to the code generated for:


class ModuleData {
  static int x = 42;
}
void main() {
  writefln(ModuleData.x);
}

only that in the first case the ModuleData class is implicit and not accessible from D code. This strategy allows for the initializers of global variables to be moved inside the .cctor of the hidden module data class.
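
As a point of reference, here is a minimal sketch in plain D (native compiler, no .NET involved) of the behavior that the hidden module data class has to preserve: module-level variables and module static constructors are set up before main runs. The names answer, squares and sumOfSquares are made up for illustration.

```d
import std.stdio : writeln;

int answer = 42;   // compile-time initializer, like "static int x = 42" above
int[] squares;     // filled in at start-up by the module constructor

static this()
{
    // Runs before main(), after the compile-time initializers are in place.
    foreach (i; 0 .. 4)
        squares ~= i * i;
}

int sumOfSquares()
{
    int total = 0;
    foreach (s; squares)
        total += s;
    return total;
}

void main()
{
    writeln(answer, " ", squares);
    assert(answer == 42);
    assert(sumOfSquares() == 14);   // 0 + 1 + 4 + 9
}
```

In the .NET back-end both the initializer of answer and the body of the static constructor would presumably end up in the .cctor of the synthesized class.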

IL guarantees that the class constructors are invoked right before any static fields are first referenced. If the compiler flags the class with the beforefieldinit attribute, then the class constructors are called even earlier, i.e. right when the class is referenced, even if no members of the class are ever accessed (the C# compiler sets the beforefieldinit attribute on classes without explicit class constructors).

Serge Lidin explains all the mechanics in his great book Expert .NET 2.0 IL Assembler, and recommends avoiding beforefieldinit, on grounds that invoking a .cctor is a slow business. I am considering using it though, on the synthesized module data class.

In conjunction with a compiler-generated reference to the module data class, the beforefieldinit attribute will guarantee that the global variables are initialized on the main thread, and will avoid strange race conditions and bugs.

No Teleprompter to Blame

2009-04-04T21:31:00.000-07:00

I will never run for President of the United States. Not because I wouldn't like to, but because I can't: I am a Naturalized Citizen, one step below the Natural Born One. As depressing as this can be, there's a good side to it, too: I can always recant.

I have the luxury to lightheartedly declare "Folks, I don't know what I was smoking when I said that D structs cannot be implemented as value types in .net", without being afraid of losing any points in any poll.

Further research proved that my initial argument (in .net, value types do not participate in garbage collection) was... er, irrelevant. That's because Walter Bright's D compiler front-end is smart enough to insert calls to the structs' destructors wherever necessary! Now that's what I call a bright design. It doesn't matter that the CLR does not garbage-collect new-ed value types, because the front-end generates the code that deletes them.

I was running some compiler back-end tests for the D language post-blit operator when I realized that copying value types in IL is straight-forward; you LOAD the source variable / field / argument and STORE into the destination (boom!) whereas bit-copying managed classes is not as trivial.

I did not give up right away. Hanging on to my structs-as-classes implementation, I wrote a run-time helper blitting routine:


        using System.Runtime.InteropServices;

        // assumes src and dest are of the same type
        static public void blit(Object dest, Object src)
        {
            int size = Marshal.SizeOf(src.GetType());
            IntPtr p = Marshal.AllocHGlobal(size);
            try
            {
                Marshal.StructureToPtr(src, p, false);
                Marshal.PtrToStructure(p, dest);
            }
            finally
            {
                Marshal.FreeHGlobal(p);
            }
        }

It did not take long for the truth to dawn upon me: Wow, bit-blitting non-value types is a major pain in the (oh) bum (ah). Efficient that code ain't (or should I say ISN'T? man do those consonants hurt). Honestly, I am not even sure that code is kosher. Better not to get into a pickle if one can avoid it... so back to the struct-as-value type implementation I am.
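
For readers who have not met the post-blit operator mentioned above, here is a small sketch in plain D of what it does for a struct (a value type): the struct is bit-copied first, and then this(this) runs on the copy. The Buffer type and the copyAndModify helper are hypothetical, invented for this illustration.

```d
struct Buffer
{
    int[] data;
    static int copies;     // counts post-blit invocations

    // Post-blit: runs on the destination right after the bitwise copy.
    this(this)
    {
        data = data.dup;   // give the copy its own storage
        ++copies;
    }
}

int copyAndModify()
{
    auto a = Buffer([1, 2, 3]);
    auto b = a;            // bit-copy, then b's post-blit runs
    b.data[0] = 99;        // must not affect a
    return a.data[0];
}

void main()
{
    assert(copyAndModify() == 1);
    assert(Buffer.copies == 1);
}
```

With structs represented as .NET value types, the "bit-copy" half of this is just a load followed by a store; only the call to the post-blit body has to be generated explicitly.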

D for Programmers

2009-03-28T22:28:00.000-07:00

I wrote a while ago about implementing thread support in the D .net compiler. The idea was to generate code that constructs delegates that are passed to the Start method of the System.Threading.Thread class. I discussed some details of constructing delegates out of nested functions, and I concluded that my next thread-related task was to implement support for the synchronized keyword.

Like the postman who goes for a walk in the park after coming home from work, I relax in the after hours by working on pet projects like C++ debuggers and D compilers. So this Saturday morning I sat down to implement the code generation for synchronized. The D language accepts two statement forms:

synchronized ScopeStatement
synchronized (Expression) ScopeStatement

The former can be easily reduced to the latter by synthesizing a global object and an expression that references it.

Here is a sample of D code that illustrates the use of the synchronized keyword:


import System;

class Semaphore {
  bool go;
}
Semaphore sem = new Semaphore;

void main() {
  void asyncWork() {
    while (true) {           // busy wait
      synchronized(sem) {
          if (sem.go) break;
      }
    }
  }
  Threading.Thread t = new Threading.Thread(&asyncWork);
  t.Start();
  synchronized(sem) {
    sem.go = true;
  }
  t.Join();
}

A synchronized statement can be transformed into the following pseudo-code:

object m = Expression();
try {
  lock(m);
  ScopeStatement();
}
finally {
  unlock(m);
}

Implementing lock / unlock maps naturally to the Enter / Exit methods of the System.Threading.Monitor class. That's really all there is to it; generating the method calls is trivial.
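
The same lowering can be written by hand in native D. The sketch below is only an illustration under a few assumptions: it uses druntime's core.sync.mutex.Mutex in place of System.Threading.Monitor, it takes the lock just before entering the try block (a common convention, slightly different from the pseudo-code above), and the names locked, runWorkers and counter are made up.

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

__gshared int counter;   // deliberately shared by all threads

// Hand-written equivalent of: synchronized (m) { scopeStatement(); }
void locked(Mutex m, scope void delegate() scopeStatement)
{
    m.lock();
    try
    {
        scopeStatement();
    }
    finally
    {
        m.unlock();      // runs even if scopeStatement throws
    }
}

int runWorkers(int threads, int iterations)
{
    counter = 0;
    auto m = new Mutex;
    Thread[] workers;
    foreach (i; 0 .. threads)
    {
        workers ~= new Thread({
            foreach (j; 0 .. iterations)
                locked(m, { ++counter; });
        });
    }
    foreach (t; workers) t.start();
    foreach (t; workers) t.join();
    return counter;
}

void main()
{
    assert(runWorkers(4, 1000) == 4000);
}
```

Without the lock the increments could interleave and the final count could come out lower than expected.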

I was a bit disappointed by how easy the task turned out to be, but on the bright side I had plenty of time left to spend with my kid. I took him to Barnes and Noble to check out the Thomas trains and board books and the computer section, where I found the most helpful book title ever: "C# for Programmers". I guess no lawyer accidentally wandering through the computer book section can claim that he bought the book because the title was misleading. I wish that all book titles were so clear: "Finance for Accountants" or "Elmo Board Book for Toddlers".

Another Perl in the Wall

2009-03-25T23:44:00.000-07:00

The most successful computer languages out there were born out of concrete problems: Perl in the beginning was nothing more than a Reporting Language; C came out of the need of writing OS-es in a portable fashion; PHP emerged from somebody's need for expressing dynamic web content as server-side macros. C# solves the problem of doing component programming without intricate COM knowledge and 2 PhD-s per developer.

Typically, after the problem is solved, and the engineers scratched their itch, wrote the code, and shipped the (working!) products, some rather academic type decides: "Now I am going to redo it the right way" (and that's how UNIX got rewritten as Plan 9, a great OS that nobody uses).

Interestingly enough, the redesigned products rarely end up being as successful as the original. People have been trying for years to replace things such as Windows, Office and the C programming language with "better" rewrites, but the market does not seem to care much. If it was good enough the first time around, it got adopted and used. Who cares how neat (or messy) the internal plumbing is?

The C and C++ languages are great for doing low level systems programming; they may be harder to use for constructing applications, and I would definitely advise against using C++ for web development. The D programming language is fancying itself as a better C++ and I think that is true in the application development area. But I do not see D as a systems language. I will never write a task scheduler in a garbage-collected language. When I write system level stuff, I want the good old WYSIWYG behavior: no garbage collector thread running willy-nilly and no strange array slices (that are like arrays except for when they aren't). And thanks but no thanks, I want no memory fences, no thingamajica inserted with parental care by the compiler to protect me from shooting my own foot on many-core systems. That is the point of systems programming: I want the freedom to shoot myself in the foot.

I have been trying (unsuccessfully) to argue with some folks on the digitalmars.d newsgroup that the new 2.0 D language should not worry much about providing support for custom allocation schemes. D is designed to help productivity: it relieves programmers from the grueling tasks of managing memory, and it encourages good coding practices, but a systems language it is not. We already have C, C++ and assembly languages to do that low-level tweak when we need it.

Sadly, some of the people involved with the design of D 2.0 are aiming for the moral absolute, rather than focusing on shipping something that works well enough. I think it is a bad decision to allow for mixing the managed and unmanaged memory paradigms; it is even worse that there are no separate pointer types to disambiguate between GC-ed and explicitly-managed objects. C++ for .NET went that route in its first incarnation (Managed C++), and it wasn't long before people realized that it was really hard to keep track of what objects live on the garbage collected heap and what objects are explicitly managed. A new pointer type had to be invented (the one denoted by the caret) to solve the problem.

If folks really want to use D to program OS-es and embedded devices and rewrite the code that controls the brakes in their cars, they should at least make a separate D dialect and name it for what it is, Systems D, Embedded D or something like that. The garbage collection and other non-WYSIWYG features should be stripped out from such a dialect.

The ambition of making D 2.0 an all-encompassing, moral absolute language may cause 2.0 to never ship, never mind get wide adoption. Perl started out with more modest goals and ended up enjoying a huge mind share.

So the "ship it today if it works" hacker in me feels like dedicating a song to the language purists:

We don't need no education
Perl's best language of them all,
We don't need no education
Thanks a bunch to Larry Wall!

Associative Arrays in D.NET

2009-03-20T02:45:00.001-07:00

The D Programming Language supports foreach loops over associative arrays.

Associative arrays are data structures that look much like "normal" arrays, but the index types are not integers. Here's an example of an array of doubles indexed by strings, expressed in D:

double [string] a = [ "pi" : 3.14159, "e" : 2.718281828, "phi" : 1.6180339 ];

The D language does not explicitly specify how associative arrays should be implemented. In C++ associative arrays can be implemented as standard STL maps, hash_maps (to be replaced by unordered_maps in C++0x) or with Google's sparse hash maps, to name a few possibilities.

In other languages such as Python, associative arrays are called dictionaries. The family of .NET languages takes advantage of the System.Collections.Generic.Dictionary class (this is also what the D compiler for .NET does: it implements associative arrays as system dictionaries).

D provides an easy way to iterate over an associative array, the foreach keyword. This keyword should be familiar to anyone programming in C#, UNIX shell, or managed C++. Here is an example of how it is used in D:


foreach (string key, double value; a)  {
  version(D_NET) {
    Console.WriteLine("{0}={1}".sys, key, value);
  }
  else {
    writefln("%s=%f", key, value);
  }
}

The types for the key and value arguments can be explicitly specified, but that is not necessary as the compiler can infer them automatically. The foreach line can be re-written more compactly as:


foreach (key, value; a) {

Another legal form for the statement, used to iterate over the values (and ignore the keys) is:


foreach (value; a) {

The current implementation of the compiler front-end synthesizes a nested function out of the loop's body. The .NET back-end constructs a delegate out of this nested function and its closure; then it "wimps out" and calls a run-time helper written in C#:


    public class AssocArray
    {
        public delegate int Callback<V>(ref V value);
        public delegate int Callback<K, V>(K key, ref V value);

        static public void Foreach<K, V>(Dictionary<K, V> aa,
                   int unused,
                   Callback<V> callback)
        /// ...
        static public void Foreach<K, V>(Dictionary<K, V> aa,
                  int unused,
                  Callback<K, V> callback)
        /// ...
    }

The generic Foreach function has two overloads, to accommodate both forms of the foreach statement.

D rules do not allow an array to be modified from within the loop, but the elements of the array can be modified if the value argument has a ref storage class:


foreach (key, ref value; a)  {
    value = 0.0;
}
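
Putting the three forms together, here is a small self-contained sketch for the native D compiler. The helper names sumValues and resetValues are invented for the example, and the sample values are chosen so that the sum is exact regardless of iteration order.

```d
import std.stdio : writefln;

double sumValues(double[string] a)
{
    double total = 0;
    foreach (value; a)            // values only, keys ignored
        total += value;
    return total;
}

void resetValues(double[string] a)
{
    foreach (key, ref value; a)   // ref: writes go to the stored element
        value = 0.0;
}

void main()
{
    double[string] a = ["one": 1.0, "two": 2.0, "four": 4.0];
    assert(sumValues(a) == 7.0);

    foreach (key, value; a)       // keys and values, types inferred
        writefln("%s=%f", key, value);

    resetValues(a);
    assert(sumValues(a) == 0.0);
    assert(a.length == 3);        // elements changed, the array itself did not
}
```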

C#'s rules are stricter: while iterating, one can neither modify the collection (by adding / removing elements) nor change the individual elements. To work around this restriction, the run-time helper code does two passes over the dictionary that corresponds to the D associative array:


static public void Foreach<K, V>(Dictionary<K, V> aa,
       int unused, Callback<K, V> callback)
{
       Dictionary<K, V> changed = new Dictionary<K, V>();
       foreach (KeyValuePair<K, V> kvp in aa)
       {
            V temp = kvp.Value;
            int r = callback(kvp.Key, ref temp);
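            // EqualityComparer copes with null values, unlike kvp.Value.Equals(temp)
            if (!EqualityComparer<V>.Default.Equals(kvp.Value, temp))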
            {
                changed[kvp.Key] = temp;
            }
            if (r != 0)
            {
                break;
            }
        }
        foreach (KeyValuePair<K, V> kvp in changed)
        {
            aa[kvp.Key] = kvp.Value;
        }
}

The Callback delegate is constructed from the address of a closure object and a nested foreach function, both synthesized in the compiler. The generated code looks something like this:


  newobj instance void 'vargs.main.Closure_2'::.ctor()
  stloc.s 1 // '$closure3'
  ldloc.1 // '$closure3'
  ldftn instance int32 'vargs.main.Closure_2'::'__foreachbody1' (float64& '__applyArg0')
  // construct Foreach delegate
  newobj instance void class [dnetlib]runtime.AssocArray/Callback`1::.ctor(object, native int)
  .line 14
  call void [dnetlib]runtime.AssocArray::Foreach(
 class [mscorlib]System.Collections.Generic.Dictionary`2,
 int32,
 class [dnetlib]runtime.AssocArray/Callback`1)

Edit: After writing this piece I noticed that I forgot to mention one interesting side effect of my implementation: because there is no try / catch around the Callback call in the C# run-time support code, the foreach loop has all-or-nothing transactional semantics.

For example, this program has different outputs when compiled with DMD from when it is compiled with my D / .NET compiler:


version(D_NET)
{
  import System;
  import dnet;
}
else
{
  import std.stdio;
}

void main()
{
    int [string] x = ["one" : 1, "two" : 2, "three" : 3];

    try
    {
        foreach (ref v; x) 
        {
            if (v == 3)
                throw new Exception("kaboom");
            v = 0;                        
        }
    }
    catch (Exception e)
    {
        version(D_NET)
        {
            Console.WriteLine(e.toString().sys);
        }
        else
        {
            writefln("%s", e.toString());
        }
    }
    foreach (k, v; x) 
    {
        version(D_NET)
        {
            Console.WriteLine("{0}, {1}".sys, k, v);
        }
        else
        {
            writefln("%s, %d", k, v);
        }
    }
}

Under D/.NET it prints:


kaboom
one, 1
two, 2
three, 3

while the native compilation gives:


object.Exception: kaboom
two, 0
three, 3
one, 1

It would be very easy to get my compiler to emulate the native behavior, but I kind of like the "transactional" flavor...
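
To make the "transactional" effect concrete, here is a rough model of the two-pass helper written in plain D rather than C#. It is a simplification, not the actual run-time code: it is specialized to int[string] instead of being generic, and the names foreachTwoPass and runKaboom are made up.

```d
// Two-pass update: run the callback on copies, apply the changes afterwards.
void foreachTwoPass(int[string] aa,
                    scope int delegate(string key, ref int value) callback)
{
    int[string] changed;
    foreach (key, value; aa)
    {
        int temp = value;
        int r = callback(key, temp);   // the callback only ever sees the copy
        if (temp != value)
            changed[key] = temp;
        if (r != 0)
            break;
    }
    // Only reached if no callback threw an exception.
    foreach (key, value; changed)
        aa[key] = value;
}

int[string] runKaboom()
{
    int[string] x = ["one": 1, "two": 2, "three": 3];
    try
    {
        foreachTwoPass(x, (string k, ref int v) {
            if (v == 3)
                throw new Exception("kaboom");
            v = 0;
            return 0;
        });
    }
    catch (Exception e)
    {
        // swallowed on purpose; only the state of x matters here
    }
    return x;
}

void main()
{
    auto x = runKaboom();
    // All-or-nothing: the exception prevented the second pass from running.
    assert(x["one"] == 1 && x["two"] == 2 && x["three"] == 3);
}
```

Because the exception escapes before the second loop, none of the pending changes are written back, whatever order the keys are visited in.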

D Conditional Compilation

2009-03-15T22:08:00.000-07:00

The D programming language supports conditional compilation using version identifiers and version numbers, a solution that is slightly better than the #ifdef, pre-processor driven, way of C/C++ that most of us are used to.

When using the .NET compiler for D that I am developing, one will be able to import and take advantage of .NET assemblies. For example the System.Console.WriteLine family of functions may come in handy. But such code would not compile when fed to the native Digital Mars D compiler.

Conditional compilation and the version identifier D_NET do the trick, like in this example:


version(D_NET)
{
 import System;
 import dnet;
}
else
{
 import std.stdio;
}

void main()
{
   int [string] x;

   x["one"] = 1;
   x["two"] = 2;

   foreach (k, v; x)
   {
       version(D_NET)
       {
           Console.WriteLine("{0}, {1}".sys, k, v);
       }
       else
       {
           writefln("%s, %d", k, v);
       }
   }
}

So I hacked the front-end of the D for .NET compiler to predefine D_NET.

Of course, abusing conditional compilation will yield code that is as unreadable and hard to grasp as C++ source littered with #ifdef ... #else ... (or the US tax code).

But I am a strong supporter of The Second Amendment of the Internet Constitution: "the right of the People to keep and bear compilers that let them shoot themselves in the foot shall not be infringed".

Strumming Strings In the D Chord

2009-03-11T18:14:00.000-07:00

In my recent interview with the InfoQ technology magazine I was asked about compatibility issues between D and .NET. I replied with a brief description of how array slices in D raise a conceptual incompatibility: System.Array and System.ArraySegment are distinct, unrelated types. In D, array slices are indistinguishable from arrays and this creates the problems that I mentioned in the interview.

But there are other incompatibilities between D and .NET that I did not mention because I wanted to keep the interview lean and focused.

Take for example the built-in support for strings.

The keyword string is shorthand in both IL and C# for System.String (essentially a sequence of Unicode characters in the range U+0000 to U+FFFF).

In the D programming language, string is shorthand for invariant(char)[] and characters are unsigned bytes.

Side notes: D uses UTF-8 to support foreign languages, and also sports the types wstring (a sequence of 2-byte-wide characters, compatible with Microsoft's Unicode) and dstring (for UTF-32 strings). Wide and double-wide (UTF-32) string literals are denoted by the "w" and "d" suffixes respectively, as in: "Hello"w, "Good Bye"d (UTF-32 dchar works in the native D compiler, but is not currently supported in my .NET compiler).

When a variable of type string is encountered in a D source, the compiler emits a corresponding IL variable with the type unsigned int8[].
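
A quick way to see the "array of bytes" nature of D strings is to look at lengths and raw code units in the native compiler. This is only an illustrative sketch; utf8Bytes is a made-up helper, and std.string.representation is the Phobos function that exposes a string's underlying bytes.

```d
import std.string : representation;

immutable(ubyte)[] utf8Bytes(string s)
{
    return s.representation;    // the same memory, viewed as raw bytes
}

void main()
{
    string  s = "h\u00E9llo";   // five characters; e-acute needs two UTF-8 bytes
    wstring w = "h\u00E9llo"w;  // UTF-16 code units
    dstring d = "h\u00E9llo"d;  // UTF-32 code units

    assert(s.length == 6);      // length counts code units, not characters
    assert(w.length == 5);
    assert(d.length == 5);

    auto bytes = utf8Bytes(s);
    assert(bytes[1] == 0xC3 && bytes[2] == 0xA9);   // UTF-8 encoding of U+00E9
}
```

A System.String holding the same text has a Length of 5, which is one reason the conversion between the two representations cannot be a simple reinterpretation of memory.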

In IL there is a special instruction, ldstr, for loading string literals on the evaluation stack. This code

ldstr "Hello"

loads a "Hello" [mscorlib]System.String. If this literal is to be stored into a variable (say "x"), then my compiler will insert conversion code that looks somewhat like this:


call class [mscorlib]System.Text.Encoding
         [mscorlib]System.Text.Encoding::get_UTF8()
ldstr "Hello"
callvirt instance uint8[]
         [mscorlib]System.Text.Encoding::GetBytes(string)

stloc 'x' // store byte array into variable x

For the cases where a D string (array of bytes) has to be converted to a System.String I provide an explicit string property, called sys, with the following D prototype:


static public String sys(string x);

The D programmer would write something like this:


import System;
// ... snip ...
string x = "Hello";
Console.WriteLine(x.sys);
Console.WriteLine("Hello .NET".sys);

The compiler figures out that in the case of


Console.WriteLine("Hello .NET".sys);

the call to the sys function can be elided, and generates straightforwardly:


ldstr "Hello .NET"
call void [mscorlib]System.Console::'WriteLine' (string)

Matters get more interesting when we consider associative arrays. D offers a great convenience to programmers by supporting associative arrays directly in the language, for example


int [string] dict;

dict["one"] = 1;
dict["two"] = 2;
// ...

introduces an array of integers indexed by strings. By contrast, in other languages such data structures are implemented "externally" in a library; in C++ for example, std::map<std::string, int> is implemented in the STL; the C# equivalent of an associative array is the System.Collections.Generic.Dictionary. My friend and colleague Ionut Gabriel Burete contributed an implementation of associative arrays in the D compiler for .NET using exactly that class.

An associative array / dictionary with string keys is an interesting case, because System.String::Equals does the right thing out of the box, namely it compares the contents of the two strings; System.Array::Equals, however, simply compares object references; it does not iterate over and compare elements. This means that a Dictionary<string, int> will behave as you expect, but if the key is of the unsigned int8[] type you may be in for a surprise.
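
The behavior that D programmers expect, and that the .NET representation has to reproduce, can be checked with a few lines of native D. The helpers sameContent and sameArray are hypothetical names used only in this sketch.

```d
bool sameContent(const(char)[] x, const(char)[] y)
{
    return x == y;    // == compares arrays element by element
}

bool sameArray(const(char)[] x, const(char)[] y)
{
    return x is y;    // 'is' compares identity (same pointer and length)
}

void main()
{
    char[] a = "one".dup;
    char[] b = "one".dup;       // equal content, separate storage

    assert(sameContent(a, b));
    assert(!sameArray(a, b));

    int[string] dict;
    dict[a.idup] = 1;
    assert(dict[b.idup] == 1);  // lookup goes by content, not by identity
}
```

A dictionary keyed on raw byte arrays with reference equality would fail the last assertion, which is the surprise alluded to above.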

For this reason, I put a hack in the compiler back-end: for the case of associative arrays I break away from representing D strings as byte arrays in .NET, and use System.String instead, which works great (or so I thought up until I ran into the problem of generating the code for a foreach iteration loop).

For D code such as:


import System;
int [string] x;
x["one"] = 1;
x["two"] = 2;
// ...
foreach (k, v; x) {
  Console.WriteLine("{0}, {1}", k, v);
}

the compiler synthesizes a function that corresponds to the body of the loop, and binds the key and value ("k" and "v" in the code snippet) to local variables in that function (this happens all inside the front-end).

The back-end must be able to detect the variables bound to foreach arguments and reconcile the data types where necessary. In the example above the type of the "k" variable in the IL will thus be System.String, and not unsigned int8[].

Reaching Closure...

2009-02-22T00:17:00.000-08:00

A diligent reader commented on my previous post that the implementation of nested functions in D for .NET was buggy for multi-threaded programs.

Indeed, in the code below asyncWork() never returned. That's because a copy by-value of the variable go was used in the closure.


void main()
{
    bool go;
 
    void asyncWork()
    {
        while (!go)
        {
            //busy wait
        }
    }
    Threading.Thread t = new Threading.Thread(&asyncWork);
    t.Start();

    go = true;
}

I am trying to avoid generating unverifiable code in this compiler project: IL does not allow managed pointers as fields in a class; ILASM accepts unmanaged pointers but that yields unverifiable code. My first instinct for closures was to use a copy for all the referenced variables, but as observed by my reader that approach did not work for multi-threaded programs.

One way to solve the problem is to use unmanaged pointers in the closure; under this implementation the example above runs correctly. There is at least one other solution: wrap each referenced variable into an object, and have both the nested function and the surrounding context share the object reference; I found it too convoluted and pursued the unmanaged pointers route instead.

This is how the generated IL looks for the D code in the example:


.module 'example'
.custom instance void [mscorlib]System.Security.UnverifiableCodeAttribute::.ctor() = ( 01 00 00 00 )


//--------------------------------------------------------------
// main program
//--------------------------------------------------------------
.method public hidebysig static void _Dmain ()
{
  .entrypoint
  .maxstack 3
  .locals init (
    [0] bool pinned 'go',
    [1] class [mscorlib]System.Threading.Thread 't',
    [2] class example.main.closure1 '$closure3'
  )
  newobj instance void example.main.closure1::.ctor()
  stloc.s 2 // '$closure3'
  ldloc.2 // '$closure3'
  ldloca 0 // 'go'
  stfld bool* 'example.main.closure1'::go2
  ldloc.2 // '$closure3'
  dup
  ldvirtftn instance void example.main.closure1::'asyncWork' ()
  newobj instance void class [mscorlib]System.Threading.ThreadStart::.ctor(object, native int)
  newobj instance void [mscorlib]System.Threading.Thread::.ctor (ThreadStart)
  stloc.s 1 // 't'
  ldloc.1 // 't'
  callvirt instance void [mscorlib]System.Threading.Thread::'Start' ()
  ldc.i4 1
  stloc.s 0 // 'go'
  ret
}

.class private auto example.main.closure1 extends [dnetlib]core.Object
{

  .method public virtual newslot hidebysig instance void 'asyncWork' ()
  {
    .maxstack 2
L1_example:
    ldarg.0
    ldfld bool* 'example.main.closure1'::go2
    ldind.i1
    ldc.i4 0
    beq L1_example
    ret
  }
  .field public bool* go2
  // default ctor, compiler-generated
  .method public hidebysig instance void .ctor()
  {
    ldarg.0
    call instance void [dnetlib]core.Object::.ctor()
    ret
  }
} // end of example.main.closure1

Now there is only one more small problem to address: synchronization between threads...

Nested Functions and Delegates

2009-02-11T22:42:00.000-08:00

My previous post missed one aspect of delegates in D: nested functions. Walter Bright gave me this example:


int delegate() foo(int i)
{
  int bar() { return i; }
  return &bar;
}

Function bar is nested inside foo; foo wraps bar into a delegate which is returned. My blog post is guilty of overlooking this use case for delegates; yet my compiler implementation is innocent: the example compiles and runs correctly.

The code example may look like a new use case at first, but is in fact similar to making a delegate from an object instance and a method:


class Foo {
    int i;
    int bar() { return i; }
}
...
Foo f = new Foo;
int delegate() dg = &f.bar;

The reason is that there is an invisible object in the nested function case. In the D programming language, nested functions have access to the surrounding lexical scope (note how function bar uses i which is declared as a parameter of foo); the .NET D compiler represents internally the lexical context of the nested function as an object. The fields of the context object are shallow copies of the variables in the "parent" scope. The IL class declaration for the context is synthesized by the compiler, which also instantiates the context. The context is populated on the way in (before calling the nested function) and used to update the variables in the parent scope on the way out (after the call has completed).

The constructor of a delegate object takes two parameters: an object reference and a pointer to a function; in the case of nested functions, the first parameter that the compiler passes under the hood is the context object. This is why constructing a delegate from a nested function is not different from using an object and one of its methods.
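
The symmetry between the two cases can be observed from D itself, since every delegate exposes its context through the .ptr property. The sketch below targets the native compiler and simply restates the two examples above side by side; the variable names fromNested and fromMethod are made up.

```d
int delegate() foo(int i)
{
    int bar() { return i; }
    return &bar;              // context: the hidden frame that holds i
}

class Foo
{
    int i;
    int bar() { return i; }
}

void main()
{
    int delegate() fromNested = foo(7);

    auto f = new Foo;
    f.i = 7;
    int delegate() fromMethod = &f.bar;   // context: the object f

    assert(fromNested() == 7);
    assert(fromMethod() == 7);

    // Both delegates are a (context, function) pair.
    assert(fromMethod.ptr is cast(void*) f);
    assert(fromNested.ptr !is null);
}
```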

What if the nested function is declared inside of a class method? (you ask). In this case there is no need to synthesize a class declaration to model the context of the nested call. The class to which the method belongs is augmented with hidden fields that shadow the variables in the parent scope.

Delegates in D for .NET

2009-02-10T22:40:00.000-08:00

This past weekend I typed "Joe Newman" in Pandora and sat down for a couple of hours to implement delegates in my .NET back-end for the D compiler.

I began by studying the documentation on MSDN and I noticed some differences in the way delegates work in .NET and D.

In .NET (and C#) delegates are objects that wrap pointers to functions so that they can be manipulated and invoked safely. The functions may be either standalone or members of a class. In D, the concept of delegates applies only to member functions. Delegates may be called asynchronously in .NET (I am not aware of a similar feature in the D programming language). The concept of delegates is thus simpler in D.

The implementation that I came up with is straightforward: classes that derive from [mscorlib]System.MulticastDelegate are generated for each delegate type. The classes are sealed and each has a virtual Invoke method that matches the signature of the D delegate.

For the following D code snippet


class Test
{
    void print(int i)
    { ...
      ...
    }
}
Test t = new Test;
void delegate(int) dg = &t.print;

the generated IL looks like this:


.class sealed $Delegate_1 extends [mscorlib]System.MulticastDelegate
{
  .method public instance void .ctor(object, native int) runtime managed {}
  .method public virtual void Invoke(int32) runtime managed {}
}
...
...
.locals init (
    [0] class Test 't',
    [1] class $Delegate_1 'dg'
 )
newobj instance void delegate.Test::.ctor ()
stloc.s 0 // 't'

ldloc.0 // 't'
dup
ldvirtftn instance void delegate.Test::'print' (int32 'i')
newobj instance void class $Delegate_1::.ctor(object, native int)
stloc.1

One small (and annoying) surprise that I had was that although the IL standard contains code samples with user-defined classes derived directly from [mscorlib]System.Delegate, such code did not pass PEVERIFY and, more tragically, crashed at run-time. The error message ("Unable to resolve token", or something like that) was not helpful; but the ngen utility dispelled the confusion by stating bluntly that my class could not inherit System.Delegate directly. Replacing System.Delegate with System.MulticastDelegate closed the issue.

Once I got delegates to work for class methods, I realized that the code can be reused to support D pointers to functions as well. In D, pointers to functions are a different concept from delegates; in .NET however, a delegate can be constructed from a standalone function by simply passing a null for the object in the constructor. It is trivial for the compiler to generate code that instantiates .NET delegates in lieu of function pointers.
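
Native D keeps the two concepts apart at the type level, but a function pointer can still be wrapped into a delegate when an API asks for one. A small sketch, assuming the Phobos helper std.functional.toDelegate; the functions twice and callThrough are invented for the example.

```d
import std.functional : toDelegate;

int twice(int x) { return 2 * x; }

// Accepts delegates only, like an API designed around member functions.
int callThrough(int delegate(int) dg, int arg)
{
    return dg(arg);
}

void main()
{
    int function(int) fp = &twice;   // plain pointer to a standalone function
    auto dg = toDelegate(fp);        // wrapped: now usable where a delegate is expected
    assert(callThrough(dg, 21) == 42);
}
```

In the .NET back-end no such wrapper is needed, since the function pointer is already represented as a delegate with a null object.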

One nice side-effect of representing pointers to functions as delegates is that they can be aggregated as class members, unlike pointers to other data types that cannot be aggregated as struct or class fields (an IL-imposed restriction for managed pointers).

I hope that one day D decides to support asynchronous delegate calls. I have yet to imagine the possibilities for asynchronous, pure methods.

Until then, the .NET back-end is moving along getting closer and closer to a public release.

D-elegating Constructors

2009-02-08T21:46:00.000-08:00

The D programming language allows a constructor of a class to call another constructor of the same class, for the purpose of sharing initialization code. This feature is called "delegating constructors"; it is also present in C# and in the emerging C++0x.

C#'s syntax for delegating constructors resembles the initializer lists in C++, and strictly enforces that the delegated constructor is called before any other code in the body of the caller constructor; the feature is masterfully explained in More Effective C#: 50 Specific Ways to Improve Your C# (Effective Software Development Series).

D is more flexible: a constructor can be called from another constructor's body pretty much like any other "regular" method, provided that some simple rules are observed (for example, it is not permitted to call a constructor from within a loop).

A D compiler must detect constructor delegation and ensure that some initialization code is not executed more than once. Let's consider an example:


class Example
{
    int foo = 42;
    int bar;

    this()
    { 
        bar = 13; 
    }
    this(int i)
    {
        foo = i;
        this();
    }
}

In the first constructor, before the field bar is assigned the value 13, some "invisible" code executes: first, the constructor of the base class is invoked. The Example class does not have an explicit base; but in D, similar to Java and C#, all classes have an implicit root Object base. It is as if we wrote:


class Example : Object
{ ...
}

After generating the call to Object's constructor, the compiler generates the code that initializes foo to 42. The explicit assignment as written by the programmer executes afterwards.

The compiler must be careful so that the initialization steps described above happen only once in the second constructor. This is not simply a matter of efficiency; it is, more importantly, a matter of correctness. If calling the base Object constructor and the initialization of foo were generated blindly inside the body of each constructor, then the following would happen in the second constructor's case:

Object's ctor is invoked (compiler generated)

foo = 42 (compiler generated)

foo = i (programmer's code)

constructor delegation occurs (programmer's code), which means that:

Object's ctor is invoked

foo = 42 (compiler generated)

This is obviously incorrect, since it leaves the Example object in a different state than the programmer intended.
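The faulty sequence can be replayed outside of D with a rough C++ model. This is only a sketch under stated assumptions: the "constructors" are written as free functions so that the compiler-inserted steps can be spelled out by hand, and the names naive_prologue, ctor_default, ctor_int and base_ctor_calls are made up for the illustration.

```cpp
// Plain-data stand-in for the D class Example from the listing above.
struct Example {
    int foo = 0;
    int bar = 0;
    int base_ctor_calls = 0;   // hypothetical counter, for illustration only
};

// The prologue that a naive back-end would paste into every constructor.
void naive_prologue(Example& self) {
    ++self.base_ctor_calls;    // stands in for the call to Object's ctor
    self.foo = 42;             // the field initializer
}

// Models: this() { bar = 13; }
void ctor_default(Example& self) {
    naive_prologue(self);
    self.bar = 13;
}

// Models: this(int i) { foo = i; this(); }
void ctor_int(Example& self, int i) {
    naive_prologue(self);
    self.foo = i;
    ctor_default(self);        // delegation runs the prologue a second time
}
```

Calling ctor_int(e, 7) on a fresh Example leaves e.foo at 42 instead of 7, and the stand-in for the base constructor runs twice.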

Such a scenario is very easily avoided by a native compiler. Object creation is translated to several distinct steps:

memory for the object is allocated

invocation of base ctor is generated

initializers are generated (this is where foo = 42 happens)

constructor as written by programmer is invoked

The important thing to note is that in the native compiler's case the compiler leaves the constructors alone, as written by the programmer, and inserts its magic "pre-initialization" steps in between the memory allocation and constructor invocation.
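The four steps can be sketched in C++ as well. Again this is a model rather than what any D compiler actually emits: new_example, body_default and body_int are hypothetical names, and std::make_unique merely stands in for the allocation step.

```cpp
#include <memory>

struct NativeExample {
    int foo = 0;
    int bar = 0;
    int base_ctor_calls = 0;   // hypothetical counter, for illustration only
};

// Constructor bodies exactly as the programmer wrote them; no hidden code.
void body_default(NativeExample& self) { self.bar = 13; }

void body_int(NativeExample& self, int i) {
    self.foo = i;
    body_default(self);        // delegation is now just an ordinary call
}

// Models what gets emitted at the site of `new Example(i)`.
std::unique_ptr<NativeExample> new_example(int i) {
    auto obj = std::make_unique<NativeExample>();  // 1. allocate memory
    ++obj->base_ctor_calls;                        // 2. base ctor stand-in
    obj->foo = 42;                                 // 3. field initializers
    body_int(*obj, i);                             // 4. constructor body
    return obj;
}
```

Because the pre-initialization lives at the creation site, new_example(7) yields foo == 7 and bar == 13, with the base constructor stand-in run exactly once.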

When writing a compiler back-end for .NET things are slightly different: the creation of an object is expressed in one compact, single line of MSIL (Microsoft Intermediate Language) assembly code:


newobj <constructor call>

In our example, that would be


newobj void class Example::.ctor()

and


newobj void class Example::.ctor(int32)

respectively. So the compiler-generated magic steps of calling the base constructor, etc. have to happen inside the constructor body. To prevent the erroneous scenario of double-initialization from happening, I had to generate a hidden, "guard" Boolean field for classes that use constructor delegation. The flag is set right before a constructor delegates to another one, and the delegated constructor checks it before calling the base constructor and running the field initializers. Here's what the generated IL code looks like:


//--------------------------------------------------------------
// ctor.d compiled: Sun Feb 08 23:04:49 2009
//--------------------------------------------------------------
.assembly extern mscorlib {}
.assembly extern dnetlib {}
.assembly 'ctor' {}

.module 'ctor'


.class public auto ctor.Example extends [dnetlib]core.Object
{
  .field public int32 foo
  .field public int32 bar
  .method public hidebysig instance void .ctor ()
  {
    .maxstack 3
    ldarg.0
    ldfld bool 'ctor.Example'::$in_ctor
    brtrue L0_ctor
    ldarg.0
    call instance void [dnetlib]core.Object::.ctor()
    ldarg.0
    ldc.i4 42
    stfld int32 'ctor.Example'::foo
L0_ctor:
    ldarg.0 // 'this'
    ldc.i4 13
    stfld int32 'ctor.Example'::bar
    ret
  }
  .method public hidebysig instance void .ctor (int32 'i')
  {
    .maxstack 3
    ldarg.0
    call instance void [dnetlib]core.Object::.ctor()
    ldarg.0
    ldc.i4 42
    stfld int32 'ctor.Example'::foo
    ldarg.0 // 'this'
    ldarg.1 // 'i'
    stfld int32 'ctor.Example'::foo
    ldarg.0 // 'this'
    ldc.i4 1
    stfld bool 'ctor.Example'::$in_ctor
    ldarg.0
    call instance void ctor.Example::.ctor ()
    ret
  }
  .field bool $in_ctor
} // end of ctor.Example

As a side note, in the second constructor's case a small redundancy still exists: foo is assigned to 42 only to be set to another value right away. I am hoping that this isn't much of an issue if the JIT engine detects it and optimizes it out. I'd be happy to hear any informed opinions.
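For readers who do not speak IL, the control flow of the two generated constructors can be mirrored in C++. This is a hand-written model of the listing, not the compiler's output; the in_ctor member corresponds to the hidden $in_ctor field, while guarded_ctor_default, guarded_ctor_int and base_ctor_calls are names invented for the sketch.

```cpp
struct GuardedExample {
    int foo = 0;
    int bar = 0;
    bool in_ctor = false;      // mirrors the hidden $in_ctor field
    int base_ctor_calls = 0;   // hypothetical counter, for illustration only
};

// Mirrors .ctor(): the prologue is skipped when the guard is already set.
void guarded_ctor_default(GuardedExample& self) {
    if (!self.in_ctor) {
        ++self.base_ctor_calls;   // call to the base class constructor
        self.foo = 42;            // field initializer
    }
    self.bar = 13;
}

// Mirrors .ctor(int32): run the prologue, set the guard, then delegate.
void guarded_ctor_int(GuardedExample& self, int i) {
    ++self.base_ctor_calls;       // call to the base class constructor
    self.foo = 42;                // field initializer (the redundant store)
    self.foo = i;
    self.in_ctor = true;          // tell the delegated ctor to skip its prologue
    guarded_ctor_default(self);
}
```

With the guard in place, guarded_ctor_int(e, 7) leaves foo at 7 and bar at 13, and the base constructor stand-in runs once.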

Stepping Over STL Code

2009-02-01T12:49:00.000-08:00

When debugging C++ code written using the Standard Template Library (STL) it is not unusual to find yourself stepping through STL code. Most of the time, this is not very useful: The STL implementation typically comes bundled with the C++ compiler, and it has been thoroughly tested by the vendor; it is unlikely that the bug you are after is caused by the STL.

So when a statement such as myVector.push_back(x) is encountered while tracing with the debugger, you normally want to step over it, not into it. Most debuggers offer a "step over" and a "step into" function. So you would choose "step over".

But how about this? You want to debug a routine named my_func(size_t containerSize) and want to step into the body of my_func when this statement is hit: my_func(myVector.size()). If you select "step into", the debugger will first take you into the guts of STL's vector<T>::size() implementation before stepping into my_func.
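A minimal program for trying this out might look as follows. The body of my_func is a placeholder and run_example is a hypothetical wrapper; the sketch also assumes an unoptimized debug build, since an optimizing compiler will typically inline size() and hide the effect.

```cpp
#include <cstddef>
#include <vector>

// The routine we actually want to debug (placeholder body).
std::size_t my_func(std::size_t containerSize) {
    return containerSize * 2;
}

std::size_t run_example() {
    std::vector<int> myVector{1, 2, 3};
    // Set a breakpoint on the next line and choose "step into": the
    // argument is evaluated first, so the debugger enters
    // vector<int>::size() before it ever reaches my_func.
    return my_func(myVector.size());
}
```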

The ZeroBUGS debugger allows you to avoid such annoyances. Once inside size(), you can right click, and select to "Always step over..." that function, all functions in that file, or all files in the same directory. The debugger will remember your option, and you don't have to see the guts of size(), or any other vector function, or any other STL function, respectively.

The functionality can be used not just with the STL but any code. If you later change your mind, the "Manage" menu allows you to remove functions, files or directories from the step-into blacklist.

To Destruct a D Struct

2009-02-01T00:02:00.000-08:00

I wrote a while ago about similarities between D and .NET (and implicitly C#). My interest in mapping D features to .NET is driven by a research project that I took on a few months ago: a D 2.0 language compiler for .NET (D 2.0 is a branch version of D that includes experimental features). I mentioned that in both D and C# structs are lightweight value types.

After working on struct support in more detail, I have come to the realization that D structs cannot be implemented as .NET value type classes. Rather, they have to be implemented as reference type classes.

The short explanation is that while in IL value classes do not participate in garbage collection, D expects the GC to reap structs after they are no longer in use.

Interestingly enough, value types may be newobj-ed (not just created on the stack).

We can use a simple example to demonstrate the difference between value classes and reference classes. If we compile the following program using the IL assembler (ILASM) and run it, nothing gets printed on the screen:


.assembly extern mscorlib {}
.assembly 'test' {}

.class public value auto Test
{
  .field public int32 i

  .method public void .ctor()
  {
      ret
  }
  .method virtual family void Finalize()
  {
    ldstr "finalizing..."
    call void [mscorlib]System.Console::WriteLine(string)
    ret
  }
}
//--------------------------------------------------------------
// main program
//--------------------------------------------------------------
.method public static void main ()
{
  .entrypoint
  .locals init (
    class Test t
  )
  newobj instance void Test::.ctor()
  stloc 't'
  ret
}

But if we changed the declaration of the Test class from a value type to class, like this:


.class public auto Test

we could see "finalizing..." printed, a confirmation that the destructor (the Finalize method) is being invoked by the garbage collector. All it takes is removing "value" from the declaration.

In IL, value types have no self-describing type information attached. I suspect that the reason they are not garbage collected is that, without type information, the system cannot possibly know which (virtual) Finalize method to call (note that although C# structs are implemented as sealed value classes, "sealed" and "value" are orthogonal).

D supports the contract programming paradigm, and class invariants are one of its core concepts.

The idea is that the user can write a special method named "invariant", which tests that certain properties of a class or struct hold. In debug mode, the D compiler inserts "probing points" throughout the lifetime of the class (or struct), ensuring that this function is automatically called: after construction, before and after execution of public methods, and before destruction.

The natural mechanism for implementing the last statement is to generate a call to the invariant method at the top of the destructor function body. But if the destructor is never called then we've got a problem.

So having destructors work correctly is not just a matter of collecting memory after the struct expires, but it is also crucial to contract programming in D.
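The probing points can be emulated by hand in C++, which makes it easy to see why a destructor that never runs means a check that never happens. This is only an approximation of what the D compiler inserts automatically: Counter, Probe, check_count and checks_for_one_lifetime are hypothetical names, and assert plays the role of the debug-mode check.

```cpp
#include <cassert>

class Counter {
public:
    static int check_count;        // hypothetical: counts invariant runs

    explicit Counter(int limit) : limit_(limit), value_(0) {
        invariant();               // probe: after construction
    }
    ~Counter() {
        invariant();               // probe: at the top of the destructor
    }
    void increment() {
        Probe probe(*this);        // probes: on entry and again on exit
        if (value_ < limit_) ++value_;
    }

private:
    // Plays the role of D's invariant block.
    void invariant() const {
        ++check_count;
        assert(value_ >= 0 && value_ <= limit_);
    }
    // RAII helper: checks the invariant when created and when destroyed.
    struct Probe {
        const Counter& self;
        explicit Probe(const Counter& c) : self(c) { self.invariant(); }
        ~Probe() { self.invariant(); }
    };
    int limit_;
    int value_;
};

int Counter::check_count = 0;

// Counts the checks over one object lifetime with one public method call.
int checks_for_one_lifetime() {
    Counter::check_count = 0;
    {
        Counter c(3);
        c.increment();
    }                              // destructor runs here
    return Counter::check_count;
}
```

One lifetime with a single increment() call triggers four checks: one after construction, two around the method, and one in the destructor. If the destructor were never invoked, the last check would silently be lost.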

Implementing structs as reference type classes may make assigning structs and passing them to and from functions heavier in D.NET than in the native Digital Mars D compiler (although this is something that I still have to measure), but it is necessary in order to support important D language features.

42

2009-01-22T19:42:00.000-08:00

Last night, at the monthly NWCPP meeting, Walter Bright gave a presentation on meta-programming using the D language. Once again, D put C++ to shame.

Because of transportation arrangements I could not accompany Walter and company to the watering hole after the lecture. Instead I went home and decided to test how my D.NET work-in-progress compiler handles templates, and what kind of code it generates.

I picked a variadic template for my test, which computes the maximum of an arbitrarily long list of numbers (adapted from a version written by Andrei Alexandrescu):

import System;

auto max(T1, T2, Tail...)(T1 first, T2 second, Tail args)
{
 auto r = second > first ? second : first;
 static if (Tail.length == 0) {
   return r;
 }
 else {
   return max(r, args);
 }
}

void main()
{
  uint k = 42;
  auto i = max(3, 2, k, 2.5);
  Console.WriteLine(i);
}

The program above prints 42 (of course), and here's what the generated IL looks like:


//--------------------------------------------------------------
// max.d compiled: Thu Jan 22 19:38:26 2009
//--------------------------------------------------------------
.assembly extern mscorlib {}
.assembly extern dnetlib {}
.assembly 'max' {}

.module 'max'

//--------------------------------------------------------------
// main program
//--------------------------------------------------------------
.method public hidebysig static void _Dmain ()
{
.entrypoint
.maxstack 4
.locals init (
[0] unsigned int32 'k',
[1] float64 'i'
)
ldc.i4 42
stloc.s 0    // 'k'
ldc.i4 3
ldc.i4 2
ldloc.0    // 'k'
ldc.r8 2.5
call float64 _D3max16__T3maxTiTiTkTdZ3maxFiikdZd (
  int32 'first', int32 'second', unsigned int32, float64)
stloc.s 1    // 'i'
ldloc.1    // 'i'
call void [mscorlib]System.Console::'WriteLine' (float64)
ret
}
.method public hidebysig static float64 _D3max16__T3maxTiTiTkTdZ3maxFiikdZd (
 int32 'first', int32 'second', unsigned int32, float64)
{
.maxstack 4
.locals init (
[0] int32 'r'
)
ldarg.1    // 'second'
ldarg.0    // 'first'
bgt L0_max
ldarg.0    // 'first'
br L1_max
L0_max:
ldarg.1    // 'second'
L1_max:
stloc.s 0    // 'r'
ldloc.0    // 'r'
ldarg.2    // '_args_field_0'
ldarg.3    // '_args_field_1'
call float64 _D3max14__T3maxTiTkTdZ3maxFikdZd (
int32 'first', unsigned int32 'second', float64)
ret
}
.method public hidebysig static float64 _D3max14__T3maxTiTkTdZ3maxFikdZd (
 int32 'first', unsigned int32 'second', float64)
{
.maxstack 3
.locals init (
[0] unsigned int32 'r'
)
ldarg.1    // 'second'
ldarg.0    // 'first'
conv.u4
bgt L2_max
ldarg.0    // 'first'
conv.u4
br L3_max
L2_max:
ldarg.1    // 'second'
L3_max:
stloc.s 0    // 'r'
ldloc.0    // 'r'
ldarg.2    // '_args_field_0'
call float64 _D3max12__T3maxTkTdZ3maxFkdZd (unsigned int32 'first', float64 'second')
ret
}
.method public hidebysig static float64 _D3max12__T3maxTkTdZ3maxFkdZd (
 unsigned int32 'first', float64 'second')
{
.maxstack 2
.locals init (
[0] float64 'r'
)
ldarg.1    // 'second'
ldarg.0    // 'first'
conv.r8
bgt L4_max
ldarg.0    // 'first'
conv.r8
br L5_max
L4_max:
ldarg.1    // 'second'
L5_max:
stloc.s 0    // 'r'
ldloc.0    // 'r'
ret
}
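For comparison, roughly the same chain of instantiations (four arguments, then three, then two) can be written in C++14. This is a sketch and not a translation of the D.NET output: max_of is a hypothetical name, two overloads stand in for D's static if, and std::common_type_t is used to pick the result type of each pairwise comparison.

```cpp
#include <type_traits>

// Base case: the larger of two values, in their common arithmetic type.
template <typename T1, typename T2>
auto max_of(T1 first, T2 second) {
    using R = std::common_type_t<T1, T2>;
    R a = first;
    R b = second;
    return b > a ? b : a;
}

// Recursive case: fold the first two arguments, then recurse on the rest.
template <typename T1, typename T2, typename... Tail>
auto max_of(T1 first, T2 second, Tail... args) {
    return max_of(max_of(first, second), args...);
}

// int, int, unsigned, double collapse to double, as in the IL above.
static_assert(
    std::is_same<decltype(max_of(3, 2, 42u, 2.5)), double>::value,
    "result type is double");
```

As in the D version, max_of(3, 2, 42u, 2.5) evaluates to 42 as a double.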

Edit: One more reason for loving D templates: pasting D code into HTML does not require replacing angle brackets with &lt; and &gt; respectively!