Tuesday, December 30, 2008

Jagging Away

The D programming language does not support multi-dimensional arrays.

Instead, multi-dimensional matrices can be implemented with arrays of arrays (aka jagged arrays), same as in C and C++.

When a static, multidimensional array needs to be initialized, in a statement such as:

int foo[3][4][5][6];

the native compiler back-end implicitly initializes the array by reserving the memory and filling it with zeros.

In the .NET back-end for the D compiler that I am working on, things are different: explicit newarr calls are required, in conjunction with navigating the data structure and initializing the individual elements.

And this is where it gets interesting. The array may have arbitrary rank, so the compiler needs to figure out the types of the nested arrays; for the example above, they are:

int32 [][][][]
int32 [][][]
int32 [][]
int32 []
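
The list above follows mechanically from the rank and the element type. As a minimal sketch in Python (an illustration only, not the compiler's code; the function name nested_array_types is made up for this example), the type names can be produced like this:

```python
def nested_array_types(element_type, rank):
    """Return IL-style type names for a jagged array, outermost level first.

    Hypothetical helper for illustration; `rank` is the number of dimensions.
    """
    if rank < 1:
        raise ValueError("rank must be at least 1")
    # Each level down drops one pair of square brackets.
    return [f"{element_type} {'[]' * depth}" for depth in range(rank, 0, -1)]


for name in nested_array_types("int32", 4):
    print(name)
```

For rank 4 and int32 this prints the four types listed above, from int32 [][][][] down to int32 [].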

My implementation uses a runtime helper function in the dnetlib.dll assembly; rather than trying to determine the rank of the array and the types involved, the compiler back-end simply generates a call to the runtime helper, which does the heavy lifting. This solution works for jagged arrays of any rank.

The helper code itself is written in C# and uses generic recursion; it appends square brackets [] to the generic parameter at each recursion level, as shown below.

namespace runtime
{
    public class Array
    {
        // helper for initializing a jagged array
        static public void Init<T>(System.Array a, uint[] sizes, int start, int length)
        {
            if (length == 2)
            {
                uint n = sizes[start];
                uint m = sizes[start + 1];
                for (uint i = 0; i != n; ++i)
                {
                    a.SetValue(new T[m], i);
                }
            }
            else
            {
                --length;
                // call recursively, changing template parameter from T to T[]
                Init<T[]>(a, sizes, start, length);
                uint n = sizes[start];
                for (uint i = 0; i != n; ++i)
                {
                    Init<T>((System.Array)a.GetValue(i), sizes, start + 1, length);
                }
            }
        }

        // called at runtime
        static public void Init<T>(System.Array a, uint[] sizes)
        {
            Init<T>(a, sizes, 0, sizes.Length);
        }
    }
} // namespace runtime
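
To make the shape of the result concrete, here is a rough Python analogue of what the helper produces. It is a sketch under assumptions, not a port: Python lists are untyped, so the T to T[] bookkeeping disappears, and unlike the C# helper (which fills in an outermost array that already exists) this version also allocates the outermost level. The name init_jagged is made up for the example.

```python
def init_jagged(sizes, fill=0):
    """Build a jagged array as nested lists; level k has sizes[k] slots.

    Hypothetical illustration: innermost slots are set to `fill`.
    """
    if not sizes:
        raise ValueError("sizes must not be empty")
    head, rest = sizes[0], sizes[1:]
    if not rest:
        # Innermost level: a flat row of default values.
        return [fill] * head
    # Outer level: every slot gets its own, separately allocated sub-array.
    return [init_jagged(rest, fill) for _ in range(head)]


foo = init_jagged([3, 4, 5, 6])
print(len(foo), len(foo[0]), len(foo[0][0]), len(foo[0][0][0]))
```

The important property, shared with the C# helper, is that each slot holds a distinct sub-array rather than a reference to one shared row.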

Sunday, December 28, 2008

Is There A Point in Using Pointers?

A few people wrote back in response to a previous blog post on the D for .NET project, some asking, well, why .NET?

Part of the answer is that .NET and D seem to be made for each other:

A common fragrance imbues both designs; for example, in D, structs are not objects but value types -- same as in C#. In D all objects inherit from a root object, which has methods such as toString, toHash and opEquals; in .NET, [mscorlib]System.Object sports ToString, GetHashCode, and Equals.

Still not convinced? How about array properties, then? In D there are properties such as sort, reverse, and dup; in .NET we have System.Array.Sort(), System.Array.Reverse(), and (tadaaa) System.Array.Clone(). Coincidence? Perhaps. Or maybe powerful memes were floating free in the air and found propitious hosts in both .NET and D (not unlike the idea of Python-scripting a debugger, which was pioneered by ZeroBUGS and is now being adopted by GDB).

But the cute metaphors have to stop somewhere (no honeymoon lasts forever) and so we come upon the thorny issue of pointers. D allows pointers, although it does not encourage them. But unmanaged pointers (and even managed pointer arithmetic) do not yield verifiable code in .NET. I have experimented with both managed and unmanaged pointers, and generated textual IL that compiles and runs; PEVERIFY, however, refuses to put the seal of approval on such code.

And so I am very tempted to disallow pointers in class and struct members (in D, as in .NET, objects are manipulated via references anyway, so what's the point of a pointer?).

A Good Idea is Worth Stealing

As always, the freetards in the Open Source community are stealing good ideas. Scripting the debugger with Python, pioneered by my work in ZeroBUGS, is now copied by GDB: http://sourceware.org/ml/gdb/2008-02/msg00140.html

Too bad their implementation is awfully buggy.

And too bad that back in 2006 I did not think the idea was patentable :)

Sunday, December 21, 2008

Hello .NET, D Here Calling

A piece of advice from someone who spent fifteen years writing software professionally: if some "experts" ever say "printf debugging" is a poor technique, tell them to get out of town.

Printf debugging is helpful in a great many situations, for example when you are writing a compiler. The debugger cannot be trusted, because the work-in-progress compiler may not output complete debug information just yet. But you can trust what's printed white on black on the screen.

There is a chicken-and-egg problem with printf though: how does one compile the implementation of printf (or writefln, as is the case with the D programming language) if the compiler itself is not there?

Luckily, in my implementation of D for .NET (a project that I plan to release under the BSD license in 2009) I was able to completely circumvent the problem. See, .NET already has a rich set of libraries and services. As a matter of fact, the very purpose of D.NET is to enable D programmers to take advantage of this state-of-the-art computing platform.

The obvious choice is to use one of the System.Console.WriteLine overloads instead of writefln. In order to do that, the front-end needs to know about System.Console. The good news is that Walter Bright's implementation of the D compiler front-end (which I am using) already handles imports.

In order to get this code to work

import System;

void main() {
System.Console.WriteLine("hello D.NET");
}

I wrote a System.d file, containing the D version of the Console class declaration (not complete, but good enough to get me going):

public class Console
{
static public void WriteLine();
static public void WriteLine(string);
static public void WriteLine(string, ...);

static public void WriteLine(char);
static public void WriteLine(bool);
static public void WriteLine(int);
static public void WriteLine(uint);
static public void WriteLine(long);
static public void WriteLine(float);
static public void WriteLine(double);

static public void Write(char);
static public void Write(string);
static public void Write(string, ...);

static public void Write(bool);
static public void Write(int);
static public void Write(long);
static public void Write(float);
static public void Write(double);
}

In the future, I plan to write a program that produces this kind of declaration automatically from .NET assemblies, using reflection.

Okay, so at this point the front-end compiles the "hello D.NET" code happily, but ILASM cannot resolve the System.Console::WriteLine symbol. This is because in IL the class names need to be qualified by assembly name, so what ILASM expects is a statement like:

call void class [mscorlib]System.Console::WriteLine(string)

One of my design guidelines for this project is to modify the front-end as little as possible, if at all. So then how do I get the imports to be fully qualified by assembly names?

With a clever hack, of course. I added these lines at the top of System.d:

class mscorlib { }
class assembly : mscorlib { }

Then I tweaked my back-end to recognize the "class assembly" construct and prefix the imported module names with whatever the name of the base class of the assembly class is.
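
The lookup itself is simple. The following Python sketch is only an illustration of the idea: the actual back-end works on the front-end's symbol tables, not on source text, and the names CLASS_DECL, assembly_name and qualify are invented for this example.

```python
import re

# Matches "class Name {" and "class Name : Base {" (illustration only).
CLASS_DECL = re.compile(r"class\s+(\w+)\s*(?::\s*(\w+))?\s*\{")


def assembly_name(module_source):
    """Return the base class of the marker class 'assembly', or None."""
    for name, base in CLASS_DECL.findall(module_source):
        if name == "assembly" and base:
            return base
    return None


def qualify(module_source, type_name):
    """Prefix type_name with [assembly] when the module declares one."""
    asm = assembly_name(module_source)
    return f"[{asm}]{type_name}" if asm else type_name


SYSTEM_D = """
class mscorlib { }
class assembly : mscorlib { }
public class Console
{
}
"""
print(qualify(SYSTEM_D, "System.Console"))
```

With the two marker classes present, System.Console comes out as [mscorlib]System.Console, which is the form ILASM expects; a module without the marker is left unqualified.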

Sunday, December 07, 2008

Dee Dot

It does not seem that long ago when I thought .NET and C# were mere toys compared to "real" languages such as C++. Oh well. One always lives to learn!

A couple of months ago I was tasked (at my day job) with making a legacy piece of C++ business logic inter-operate with another team's C# code. It was a small, one-week project, but it forced me to pick up some .NET stuff along the way. Long story short, I was so intrigued with the maturity and elegance of the technology that I decided to push the exploration further, on my own time.

A friend and colleague of mine had expressed interest in writing a compiler back-end for the D language that would generate .NET assemblies. So we met, discussed the project again, and decided to give it a go. The D language features map onto .NET quite well: there's no multiple inheritance of implementation, but one class can implement any number of interfaces; structs are value types; the D language is garbage-collected, with provisions for explicit destructors (just implement IDisposable); and there are static constructors (.cctors in .NET).

D has static destructors also, which I found really easy to implement by generating code that registers a ProcessExit handler, like this:

// register static dtor as ProcessExit event handler
call class [mscorlib]System.AppDomain [mscorlib]System.AppDomain::get_CurrentDomain()
ldnull
ldftn void dtor.Base::'_staticDtor1'(object, class [mscorlib]System.EventArgs)
newobj instance void [mscorlib]System.EventHandler::.ctor(object, native int)
callvirt instance void [mscorlib]System.AppDomain::add_ProcessExit(class [mscorlib]System.EventHandler)
ret

Our approach so far is to generate IL assembly code, and then invoke ILASM. This way the compiler is portable (the code so far works on Windows as well as with Novell's Mono) and its output is easy to debug.
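
Because the output is plain text, a code generator for a snippet like the ProcessExit registration above amounts to string formatting. Here is a minimal Python sketch of that idea; it is not our back-end's code, and the function name static_dtor_registration is invented for the example.

```python
def static_dtor_registration(dtor):
    """Return the IL lines that register `dtor` as a ProcessExit handler.

    `dtor` is the fully qualified IL name of the static destructor.
    Hypothetical helper for illustration.
    """
    return [
        "call class [mscorlib]System.AppDomain "
        "[mscorlib]System.AppDomain::get_CurrentDomain()",
        "ldnull",
        f"ldftn void {dtor}(object, class [mscorlib]System.EventArgs)",
        "newobj instance void "
        "[mscorlib]System.EventHandler::.ctor(object, native int)",
        "callvirt instance void [mscorlib]System.AppDomain::add_ProcessExit("
        "class [mscorlib]System.EventHandler)",
        "ret",
    ]


print("\n".join(static_dtor_registration("dtor.Base::'_staticDtor1'")))
```

The resulting text would then be written to a .il file and handed to ILASM; that step is left out here.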

We got some basic flow control working, and exception handling and assertions are also supported. But there's a long way to a fully-working compiler.

We intend to make the project open source (as soon as we decide upon the license -- I kind of hate the GPL for being anti-business and anti-profit, but the jury is still out on this one).

I think this is not just a great opportunity to bring the D language to the .NET family's table, but also to give D programmers access to the wealth of libraries and utilities written for .NET to date.