Saturday, March 28, 2009

D for Programmers

I wrote a while ago about implementing thread support in the D .net compiler. The idea was to generate code that constructs delegates that are passed to the Start method of the System.Threading.Thread class. I discussed some details of constructing delegates out of nested functions, and I concluded that my next thread-related task was to implement support for the synchronized keyword.

Like the postman who goes for a walk in the park after coming home from work, I relax in the after hours by working on pet projects like C++ debuggers and D compilers. So this Saturday morning I sat down to implement the code generation for synchronized. The D language accepts two statement forms:
synchronized ScopeStatement
synchronized (Expression) ScopeStatement
The former can be easily reduced to the latter by synthesizing a global object and an expression that references it.

Here is a sample of D code that illustrates the use of the synchronized keyword:

import System;

class Semaphore {
bool go;
}
Semaphore sem = new Semaphore;

void main() {
void asyncWork() {
while (true) { // busy wait
synchronized(sem) {
if (sem.go) break;
}
}
}
Threading.Thread t = new Threading.Thread(&asyncWork);
t.Start();
synchronized(sem) {
sem.go = true;
}
t.Join();
}
A synchronized statement can be transformed into the following pseudo-code:
object m = Expression();
try {
lock(m);
ScopeStatement();
}
finally {
unlock(m);
}
Implementing lock / unlock maps naturally to the Enter / Exit methods of the System.Threading.Monitor class. That's really all there is to it, generating the method calls is trivial.

I was a bit disappointed by how easy the task turned out to be, but on the bright side I had plenty of time left to spend with my kid. I took him to Barnes and Noble to check out the Thomas trains and board books and the computer section, where I found the most helpful book title ever: "C# for Programmers". I guess no lawyer accidentally wandering through the computer book section can claim that he bought the book because the title was misleading. I wish that all book titles be so clear: "Finance for Accountants" or "Elmo Board Book for Toddlers".

Wednesday, March 25, 2009

Another Perl in the Wall

The most successful computer languages out there were born out of concrete problems: Perl in the beginning was nothing more than a Reporting Language; C came out of the need of writing OS-es in a portable fashion; PHP emerged of somebody's need for expressing dynamic web content as server-side macros. C# solves the problem of doing component programming without intricate COM knowledge and 2 PhD-s per developer.

Typically, after the problem is solved, and the engineers scratched their itch, wrote the code, and shipped the (working!) products, some rather academic type decides: "Now I am going to redo it the right way" (and that's how UNIX got rewritten as Plan 9, a great OS that nobody uses).

Interestingly enough, the redesigned products rarely end up being as successful as the original. People have been trying for years to replace things such as Windows, Office and the C programming language with "better" rewrites, but the market does not seem to care much. If it was good enough the first time around, it got adopted and used. Who cares how neat (or messy) the internal plumbings are?

The C and C++ languages are great for doing low level systems programming; they may be harder to use for constructing applications, and I would definitely advise against using C++ for web development. The D programming language is fancying itself as a better C++ and I think that is true in the application development area. But I do not see D as a systems language. I will never write a task scheduler in a garbage-collected language. When I write system level stuff, I want the good old WYSIWYG behavior: no garbage collector thread running willy-nilly and no strange array slices (that are like arrays except for when they aren't). And thanks but no thanks, I want no memory fences, no thingamajica inserted with parental care by the compiler to protect me from shooting my own foot on many-core systems. That is the point of systems programming: I want the freedom to shoot myself in the foot.

I have been trying (unsuccessfully) to argue with some folks on the digitalmars.d newsgroup that the new 2.0 D language should not worry much about providing support for custom allocation schemes. D is designed to help productivity: it relieves programmers from the grueling tasks of managing memory, and it encourages good coding practices, but a systems language it is not. We already have C, C++ and assembly languages to do that low-level tweak when we need it.

Sadly, some of the people involved with the design of D 2.0 are aiming for the moral absolute, rather than focus on shipping something that works well enough. I think it is a bad decision to allow for mixing the managed and unmanaged memory paradigms; it is even worse that there are no separate pointer types to disambiguate between GC-ed and explicitly-managed objects. C++ went that route in its first incarnation, and it wasn't long before people realized that it was really hard to keep track of what objects live on the garbage collected heap and what objects are explicitly managed. A new pointer type had to invented (the one denoted by the caret) to solve the problem.

If folks really want to use D to program OS-es and embedded devices and rewrite the code that controls the breaks in their cars, they should at least make a separate D dialect and name it for what it is, Systems D, Embedded D or something like that. The garbage collection and other non-WYSIWYG features should be stripped out from such a dialect.

The ambition of making D 2.0 an all encompassing, moral absolute language may cause 2.0 to never ship, never mind get wide adoption. Perl started out with more modest goals and ended up enjoying a huge mind share.

So the ship-it today if it works hacker in me feels like dedicating a song to the language purists:
We don't need no education
Perl's best language of them all,
We don't need no education
Thanks a bunch to Larry Wall!

Friday, March 20, 2009

Associative Arrays in D.NET

The D Programming Language supports foreach loops over associative arrays.

Associative arrays are data structures that look much like "normal" arrays, but the index types are not integers. Here's an example of an array of doubles indexed by strings, expressed in D:

double [string] a = [ "pi" : 3.14159, "e" : 2.718281828, "phi" : 1.6180339 ];

The D language does not explicitly specify how associative arrays should be implemented. In C++ associative arrays can be implemented as standard STL maps, hash_maps (to be replaced by unordered_maps in C++0x) or with Google's sparse hash maps, to name a few possibilities.

In other languages such as Python, associative arrays are called dictionaries. The family of .NET languages take advantage of the System.Collections.Generic.Dictionary class (this is also what the D compiler for .NET does: it implements associative arrays as system dictionaries).

D provides an easy way to iterate over an associative array, the foreach keyword. This keyword should be familiar to anyone programming in C#, UNIX shell, or managed C++. Here is an example for how it is being used in D:

foreach (string key, double value; a) {
version(D_NET) {
Console.WriteLine("{0}={1}".sys, key, value);
}
else {
writefln("%s=%f", key, value);
}
}

The types for the key and value arguments can be explicitly specified, but that is not necessary as the compiler can infer them automatically. The foreach line can be re-written more compactly as:

foreach (key, value; a) {

Another legal form for the statement, used to iterate over the values (and ignore the keys) is:

foreach (value; a) {


The current implementation of the compiler front-end synthesizes a nested function out of the loop's body. The .NET back-end constructs a delegate out of this nested function and its closure; then it "wimps out" and calls a run-time helper written in C#:

public class AssocArray
{
public delegate int Callback<V>(ref V value);
public delegate int Callback<K, V>(K key, ref V value);

static public void Foreach<K, V>(Dictionary<K, V> aa,
int unused,
Callback<V> callback)
/// ...
static public void Foreach<K, V>(Dictionary<K, V> aa,
int unused,
Callback<K, V> callback)
/// ...
}

The generic Foreach function has two overloads, to accommodate both forms of the foreach statement.

D rules do not allow an array to be modified from within the loop, but the elements of the array can be modified if the value argument has a ref storage class:

foreach (key, ref value; a) {
value = 0.0;
}
C#'s rules are stricter, one cannot modify neither the collection (by adding / removing elements) nor change the individual elements. To work around this restriction, the run-time helper code does two passes over the dictionary that corresponds to the D associative array:

static public void Foreach<K, V>(Dictionary<K, V> aa,
int unused, Callback<K, V> callback)
{
Dictionary<K, V> changed = new Dictionary<K, V>();
foreach (KeyValuePair<K, V> kvp in aa)
{
V temp = kvp.Value;
int r = callback(kvp.Key, ref temp);
if (!kvp.Value.Equals(temp))
{
changed[kvp.Key] = temp;
}
if (r != 0)
{
break;
}
}
foreach (KeyValuePair<K, V> kvp in changed)
{
aa[kvp.Key] = kvp.Value;
}
}

The Callback delegate is constructed from the address of a closure object and a nested foreach function, both synthesized in the compiler. The generated code looks something like this:

newobj instance void 'vargs.main.Closure_2'::.ctor()
stloc.s 1 // '$closure3'
ldloc.1 // '$closure3'
ldftn instance int32 'vargs.main.Closure_2'::'__foreachbody1' (float64& '__applyArg0')
// construct Foreach delegate
newobj instance void class [dnetlib]runtime.AssocArray/Callback`1::.ctor(object, native int)
.line 14
call void [dnetlib]runtime.AssocArray::Foreach(
class [mscorlib]System.Collections.Generic.Dictionary`2,
int32,
class [dnetlib]runtime.AssocArray/Callback`1)

Edit: After writing this piece I noticed that I forgot to mention one interesting side effect of my implementation: because there is no try / catch around the Callback call in the C# run-time support code, the foreach loop has all-or-nothing transactional semantics.

For example, this program has different outputs when compiled with DMD from when it is compiled with my D / .NET compiler:

version(D_NET)
{
import System;
import dnet;
}
else
{
import std.stdio;
}

void main()
{
int [string] x = ["one" : 1, "two" : 2, "three" : 3];

try
{
foreach (ref v; x)
{
if (v == 3)
throw new Exception("kaboom");
v = 0;
}
}
catch (Exception e)
{
version(D_NET)
{
Console.WriteLine(e.toString().sys);
}
else
{
writefln("%s", e.toString());
}
}
foreach (k, v; x)
{
version(D_NET)
{
Console.WriteLine("{0}, {1}".sys, k, v);
}
else
{
writefln("%s, %d", k, v);
}
}
}

Under D/.NET it prints:

kaboom
one, 1
two, 2
three, 3

while the native compilation gives:

object.Exception: kaboom
two, 0
three, 3
one, 1

It would be very easy to get my compiler to emulate the native behavior, but I kind of like the "transactional" flavor...

Sunday, March 15, 2009

D Conditional Compilation

The D programming language supports conditional compilation using version identifiers and version numbers, a solution that is slightly better than the #ifdef, pre-processor driven, way of C/C++ that most of us are used to.

When using the .NET compiler for D that I am developing, one will be able to import and take advantage of .NET assemblies. For example the System.Console.WriteLine family of functions may come in handy. But such code would not compile when fed to the native Digital Mars D compiler.

Conditional compilation and the version identifier D_NET do the trick, like in this example:

version(D_NET)
{
import System;
import dnet;
}
else
{
import std.stdio;
}

void main()
{
int [string] x;

x["one"] = 1;
x["two"] = 2;

foreach (k, v; x)
{
version(D_NET)
{
Console.WriteLine("{0}, {1}".sys, k, v);
}
else
{
writefln("%s, %d", k, v);
}
}
}

So I hacked the front-end of the D for .NET compiler to predefine D_NET.

Of course, abusing conditional compilation will yield code that is unreadable and hard to grasp as a C++ source littered with #ifdef ... #else ... (or the US tax code).

But I am a strong supporter of The Second Amendment of the Internet Constitution: "the right of the People to keep and bear compilers that let them shoot themselves in the foot shall not be infringed".

Wednesday, March 11, 2009

Strumming Strings In the D Chord

In my recent interview with the InfoQ technology magazine I was asked about compatibility issues between D and .NET. I replied with a brief description of how array slices in D raise a conceptual incompatibility: System.Array and System.ArraySegment are distinct, unrelated types. In D arrays slices are indistinguishable from arrays and this creates the problems that I mentioned in the interview.

But there are other incompatibilities between D and .NET that I did not mention because I wanted to keep the interview lean and focused.

Take for example the built-in support for strings.

The keyword string is shorthand in both IL and C# for System.String (essentially a sequence of Unicode characters in the range U+0000 to U+FFFF).

In the D programming language, string is shorthand for invariant char[] and characters are unsigned bytes.

Side notes: D uses UTF8 to support foreign languages, and also sports the types wstring (a sequence of 2 byte-wide characters, compatible with Microsoft's Unicode) and dstring (for UTF-32 strings). Wide and double-wide (UTF-32) string literals are denoted by the "w" and "d" respective suffixes, as in: "Hello"w, "Good Bye"d (UTF-32 dchar works in the native D compiler, but is not currently supported in my .NET compiler).

When a variable of type string is encountered in a D source, the compiler emits a corresponding IL variable with the type unsigned int8[].

In IL there is a special instruction, ldstr, for loading string literals on the evaluation stack. This code
ldstr "Hello"
loads a "Hello" [mscorlib]System.String. If this literal is to be stored into a variable (say "x"), then my compiler will insert conversion code that looks somewhat like this:

call class [mscorlib]System.Text.Encoding
[mscorlib]System.Text.Encoding::get_UTF8()
ldstr "Hello"
callvirt instance uint8[]
[mscorlib]System.Text.Encoding::GetBytes(string)

stloc 'x' // store byte array into variable x

For the cases where a D string (array of bytes) has to be converted to a System.String I provide an explicit string property, called sys, with the following D prototype:

static public String sys(string x);

The D programmer would write something like this:

import System;
// ... snip ...
string x = "Hello";
Console.WriteLine(x.sys);
Console.WriteLine("Hello .NET".sys);


The compiler figures out that in the case of
Console.WriteLine("Hello .NET".sys);
the call to the sys function can be elided, and generates straightforwardly:
ldstr "Hello .NET"
call void [mscorlib]System.Console::'WriteLine' (string)

Matters get more interesting when we consider associative arrays. D offers a great convenience to programmers by supporting associative arrays directly in the language, for example

int [string] dict;

dict["one"] = 1;
dict["two"] = 2;
// ...

introduces an array of integers indexed by strings. By contrast, in other languages such data structures are implemented "externally" in a library; in C++ for example, std::map<std::string, int> is implemented in the STL; the C# equivalent of an associative array is the System.Collections.Generic.Dictionary. My friend and colleague Ionut Gabriel Burete contributed an implementation of associative arrays in the D compiler for .NET using exactly that class.

An associative array / dictionary with string keys is an interesting case, because System.String::Equals does the right thing out of the box, namely performs a lexicographical comparison of two strings; System.Array::Equals however simply compares object references, it does not iterate over and compare elements. This means that a Dictionary(string, int) will behave as you expect, but if the key is of the unsigned int8[] type you may be in for a surprise.

For this reason, I put a hack in the compiler back-end: for the case of associative arrays I break away from representing D strings as byte arrays in .NET, and use System.String instead, which works great (or so I thought up until I ran into the problem of generating the code for a foreach iteration loop).

For D code such as:

import System;
int [string] x;
x["one"] = 1;
x["two"] = 2;
// ...
foreach (k, v; x) {
Console.WriteLine("{0}, {1}", k, v);
}

the compiler synthesizes a function that corresponds to the body of the loop, and binds the key and value ("k" and "v" in the code snippet) to local variables in that function (this happens all inside the front-end).

The back-end must be able to detect the variables bound to foreach arguments and reconcile the data types where necessary. In the example above the type of the "k" variable in the IL will thus be System.String, and not unsigned int8[].