Thursday, January 22, 2009

42

Last night, at the monthly NWCPP meeting, Walter Bright gave a presentation on metaprogramming in the D language. Once again, D put C++ to shame.

Because of transportation arrangements I could not accompany Walter and company to the watering hole after the lecture. Instead I went home and decided to test how my work-in-progress D.NET compiler handles templates, and what kind of code it generates.

For my test I picked a variadic template that computes the maximum of an arbitrarily long list of numbers (adapted from a version written by Andrei Alexandrescu):

import System;

auto max(T1, T2, Tail...)(T1 first, T2 second, Tail args)
{
    auto r = second > first ? second : first;
    static if (Tail.length == 0) {
        return r;
    }
    else {
        return max(r, args);
    }
}

void main()
{
    uint k = 42;
    auto i = max(3, 2, k, 2.5);
    Console.WriteLine(i);
}


The program above prints 42 (of course), and here's what the generated IL looks like:


//--------------------------------------------------------------
// max.d compiled: Thu Jan 22 19:38:26 2009
//--------------------------------------------------------------
.assembly extern mscorlib {}
.assembly extern dnetlib {}
.assembly 'max' {}

.module 'max'

//--------------------------------------------------------------
// main program
//--------------------------------------------------------------
.method public hidebysig static void _Dmain ()
{
    .entrypoint
    .maxstack 4
    .locals init (
        [0] unsigned int32 'k',
        [1] float64 'i'
    )
    ldc.i4 42
    stloc.s 0 // 'k'
    ldc.i4 3
    ldc.i4 2
    ldloc.0 // 'k'
    ldc.r8 2.5
    call float64 _D3max16__T3maxTiTiTkTdZ3maxFiikdZd (
        int32 'first', int32 'second', unsigned int32, float64)
    stloc.s 1 // 'i'
    ldloc.1 // 'i'
    call void [mscorlib]System.Console::'WriteLine' (float64)
    ret
}
.method public hidebysig static float64 _D3max16__T3maxTiTiTkTdZ3maxFiikdZd (
    int32 'first', int32 'second', unsigned int32, float64)
{
    .maxstack 4
    .locals init (
        [0] int32 'r'
    )
    ldarg.1 // 'second'
    ldarg.0 // 'first'
    bgt L0_max
    ldarg.0 // 'first'
    br L1_max
L0_max:
    ldarg.1 // 'second'
L1_max:
    stloc.s 0 // 'r'
    ldloc.0 // 'r'
    ldarg.2 // '_args_field_0'
    ldarg.3 // '_args_field_1'
    call float64 _D3max14__T3maxTiTkTdZ3maxFikdZd (
        int32 'first', unsigned int32 'second', float64)
    ret
}
.method public hidebysig static float64 _D3max14__T3maxTiTkTdZ3maxFikdZd (
    int32 'first', unsigned int32 'second', float64)
{
    .maxstack 3
    .locals init (
        [0] unsigned int32 'r'
    )
    ldarg.1 // 'second'
    ldarg.0 // 'first'
    conv.u4
    bgt L2_max
    ldarg.0 // 'first'
    conv.u4
    br L3_max
L2_max:
    ldarg.1 // 'second'
L3_max:
    stloc.s 0 // 'r'
    ldloc.0 // 'r'
    ldarg.2 // '_args_field_0'
    call float64 _D3max12__T3maxTkTdZ3maxFkdZd (unsigned int32 'first', float64 'second')
    ret
}
.method public hidebysig static float64 _D3max12__T3maxTkTdZ3maxFkdZd (
    unsigned int32 'first', float64 'second')
{
    .maxstack 2
    .locals init (
        [0] float64 'r'
    )
    ldarg.1 // 'second'
    ldarg.0 // 'first'
    conv.r8
    bgt L4_max
    ldarg.0 // 'first'
    conv.r8
    br L5_max
L4_max:
    ldarg.1 // 'second'
L5_max:
    stloc.s 0 // 'r'
    ldloc.0 // 'r'
    ret
}
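
The IL contains one method per template instantiation: max over four arguments calls max over three, which calls max over two. As a rough illustration of how the recursion peels off one argument per step, here is a small Python sketch. It is Python rather than D, it does not model D's compile-time type deduction, and the name `vmax` and the `trace` parameter are made up for this example:

```python
def vmax(first, second, *tail, trace=None):
    """Return the largest argument, reducing the list pairwise like the D template."""
    if trace is not None:
        # Record the arity of this call; each arity corresponds to one
        # template instantiation in the D version.
        trace.append(2 + len(tail))
    r = second if second > first else first
    if not tail:  # plays the role of `static if (Tail.length == 0)`
        return r
    return vmax(r, *tail, trace=trace)


calls = []
result = vmax(3, 2, 42, 2.5, trace=calls)
print(result, calls)  # 42 [4, 3, 2]
```

The three recorded arities match the three max methods in the IL. Unlike the Python version, which just returns the int 42, the D version deduces a float64 result, which is why the IL calls WriteLine (float64).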

Edit: One more reason for loving D templates: pasting D code into HTML does not require replacing angle brackets with &lt; and &gt; respectively!

Friday, January 16, 2009

Deeper and D-per

I am very excited about the D for .NET compiler project because of all the things I am learning. The books on my lab desk these days are: Compiling for the .NET Common Language Runtime (CLR), Distributed Virtual Machines: Inside the Rotor CLI, Concepts of Programming Languages (8th Edition), and the Dragon Book.

As I am digging deeper and deeper I am discovering interesting design challenges.

1. Enum

In .NET, the base class for enumerated types is System.Enum, which allows neither non-integral nor char-based underlying types.

D allows strings in enumerated types, like so:

enum : string
{
    A = "hello",
    B = "betty"
}

Possible solutions: base D.NET's enums on [mscorlib]System.Enum and forbid non-integrals (the current implementation), or make my own [dnetlib]core.Enum base class. The problem with the latter is that it would preclude interoperation with other languages. Walter Bright's suggestion (it is so clever that I wish I had come up with it myself) is to combine both solutions: generate structures derived from System.Enum when possible, and base them on a custom class when not. The D.NET compiler will split enums into two groups: those that are integral types, and those that are not, plus the "char" case. This allows interoperability without crippling language features.

2. Strings

D strings are UTF-8, while System.String is UTF-16. My attempt to cleverly use System.String under the hood (rather than represent D strings as byte arrays) created far more complications than I initially expected, and prompted a major refactoring effort that ate up almost my entire weekend.
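
To make the mismatch concrete: the same text occupies a different number of code units in the two encodings, so lengths and indices do not line up. A small Python sketch, used here only for illustration (the sample string is arbitrary):

```python
text = "na\u00efve"  # "naive" with a diaeresis on the i: 5 characters, one non-ASCII

utf8 = text.encode("utf-8")       # 8-bit code units, as in a D string
utf16 = text.encode("utf-16-le")  # 16-bit code units, as in System.String

utf8_units = len(utf8)            # number of bytes
utf16_units = len(utf16) // 2     # two bytes per UTF-16 code unit

print(utf8_units, utf16_units)    # 6 5
```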

An interesting wrinkle is that in D.NET associative arrays are implemented using Dictionary objects under the hood, which use the Equals method to compare keys. For System.String, Equals compares the contents of the strings, as one would normally expect; for System.Array, the implementation of Equals simply compares object references.

D strings are now represented as arrays of bytes; in order to make associative arrays work correctly, extra work had to be done.
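
The key-comparison problem can be reproduced in miniature. The Python sketch below is only an analogy, not D.NET or .NET code: `RefArray` is a made-up class that, like System.Array, keeps identity-based equality, and `ValueKey` is a hypothetical wrapper that compares contents instead. It does not claim to show how D.NET actually solves the problem.

```python
class RefArray:
    """Stand-in for a byte array whose equality is reference-based."""

    def __init__(self, data):
        self.data = bytes(data)
    # No __eq__/__hash__ overrides: Python falls back to object identity.


class ValueKey:
    """Hypothetical wrapper that compares and hashes by content."""

    def __init__(self, arr):
        self.arr = arr

    def __eq__(self, other):
        return isinstance(other, ValueKey) and self.arr.data == other.arr.data

    def __hash__(self):
        return hash(self.arr.data)


a = RefArray(b"hello")
b = RefArray(b"hello")  # same contents, different object

by_ref = {a: 1}
found_by_ref = b in by_ref                # identity comparison: not found

by_value = {ValueKey(a): 1}
found_by_value = ValueKey(b) in by_value  # content comparison: found

print(found_by_ref, found_by_value)       # False True
```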

3. Pointers

Pointers cannot be members of classes or elements of an array.

This stems from a restriction in the IL: managed pointers cannot be class fields or array elements.

Interesting consequences:
3.a) In D.NET, a nested method cannot access variables of pointer type in the surrounding lexical context (because the implementation constructs a delegate under the hood, and the object part has all the accessed variables copied as its members).

3.b) Pointers cannot be passed to variadic functions (because I pass the variable argument list as an array, for compatibility with System.Console.WriteLine).

How severe are these limitations? Are there any reasonable workarounds? I guess I'll have to dig D-per to find out.