Friday, January 16, 2009

Deeper and D-per

I am very excited about the D for .NET compiler project because of all the things I am learning. Books on my lab's desk these days are: Compiling for the .NET Common Language Runtime (CLR), Distributed Virtual Machines: Inside the Rotor CLI, Concepts of Programming Languages (8th Edition), The Dragon Book.

As I am digging deeper and deeper I am discovering interesting design challenges.

1. Enum

In .NET, the base class for enumerated types is System.Enum, which supports only integral underlying types (and not char).

D allows strings in enumerated types, like so:

enum : string
{
    A = "hello",
    B = "betty"
}

Possible solutions: base D.NET's enums on [mscorlib]System.Enum and forbid non-integral base types (the current implementation), or write my own [dnetlib]core.Enum base class. The problem with the latter is that it would preclude interoperation with other languages. Walter Bright's suggestion (so clever that I wish I had come up with it myself) is to combine the two: generate structures derived from System.Enum when possible, and base them on a custom class when not. The D.NET compiler will split enums into two groups: those with integral base types, and those without, plus the "char" case. This allows interoperability without crippling language features.
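The split can be sketched in plain D as a compile-time test on the enum's base type. This is only an illustration, not the D.NET implementation: mapsToSystemEnum, Color, Greeting and Letter are hypothetical names, and the sketch assumes that char-based enums land in the custom-class group.

```d
import std.stdio : writeln;
import std.traits : OriginalType, isIntegral;

// Hypothetical helper (not part of D.NET): true when the enum's base type
// is integral, so the enum could derive from [mscorlib]System.Enum.
// isIntegral is false for char types, so char-based enums are excluded.
enum bool mapsToSystemEnum(E) = isIntegral!(OriginalType!E);

enum Color : int { Red = 1, Green = 2 }
enum Greeting : string { A = "hello", B = "betty" }
enum Letter : char { X = 'x', Y = 'y' }

// Classification happens entirely at compile time.
static assert( mapsToSystemEnum!Color);
static assert(!mapsToSystemEnum!Greeting);
static assert(!mapsToSystemEnum!Letter);

void main()
{
    // A string-based enum member converts implicitly to its base type.
    string s = Greeting.A;
    assert(s == "hello");
    writeln(s);
}
```

A real compiler would make this decision on its internal type representation rather than through Phobos traits; the template only shows where the line between the two groups falls.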

2. Strings

D strings are UTF-8, whereas System.String is UTF-16. My attempt to cleverly use System.String under the hood (rather than represent D strings as byte arrays) created far more complications than I initially expected, and prompted a major refactoring effort that ate up almost my entire weekend.
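A small example of why the two encodings do not map onto each other transparently: the same text has different lengths and different slice boundaries in UTF-8 and UTF-16. This uses only standard D (std.utf.toUTF16); the sample text is arbitrary.

```d
import std.utf : toUTF16;

void main()
{
    string s = "héllo";        // D string: immutable(char)[], UTF-8 code units
    wstring w = s.toUTF16;     // UTF-16 code units, the encoding System.String uses

    assert(s.length == 6);     // 'é' takes two UTF-8 code units
    assert(w.length == 5);     // but a single UTF-16 code unit

    // Indexing and slicing count code units, so offsets differ per encoding.
    assert(s[1 .. 3] == "é");
    assert(w[1 .. 2] == "é"w);
}
```

Any D code that relies on .length or on slicing by index would observe different results if its strings were silently stored as UTF-16.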

An interesting wrinkle is that in D.NET associative arrays are implemented using Dictionary objects under the hood, which by default use the keys' GetHashCode and Equals methods to look up and compare keys. For System.String, Equals compares the strings by value (character by character), as one would normally expect; for System.Array, the implementation of Equals simply compares object references.

D strings are now represented as arrays of bytes; in order to make associative arrays work correctly, extra work had to be done.
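For reference, this is the behavior that D code expects from associative arrays with string keys, and that a byte-array representation has to preserve: two keys with equal contents are the same key, even when they live in different memory blocks. The variable names are made up for the example.

```d
void main()
{
    int[string] counts;
    counts["hello"] = 1;

    // Build an equal key at run time, in a separate memory block.
    char[] buf = "hel".dup;
    buf ~= "lo";
    string key = buf.idup;

    assert(key !is "hello");   // different arrays (identity differs)
    assert(key == "hello");    // same contents
    assert(key in counts);     // the lookup goes by contents, not identity
    assert(counts[key] == 1);
}
```

With reference-based Equals, the last two assertions would fail, which is why the array-backed representation needs content-based hashing and comparison for its keys.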

3. Pointers

D.NET can't support pointers as members of classes or as elements of an array.

This stems from a restriction in the IL: managed pointers cannot be used as class fields or array elements.

Interesting consequences:
3.a) in D.NET a nested method cannot access variables of pointer type in the surrounding lexical context (because the implementation constructs a delegate under the hood, and the object part has all the accessed variables copied as its members)

3.b) can't pass pointers to variadic functions (because I send the variable argument list in as an array -- for compatibility with System.Console.WriteLine)
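To make the two cases concrete, here is the kind of ordinary D code they affect. It compiles with a native D compiler; bump and sumAll are names invented for the illustration, and the typesafe variadic form is just one possible shape of a variadic function.

```d
// Typesafe variadic function: the arguments arrive as an array of pointers.
int sumAll(int*[] ptrs...)
{
    int total = 0;
    foreach (q; ptrs)
        total += *q;
    return total;
}

void main()
{
    int a = 40, b = 2;
    int* p = &a;

    // 3.a) a nested function using a pointer variable from the enclosing scope
    void bump() { *p += 1; }
    bump();
    assert(a == 41);

    // 3.b) pointers passed through a variable argument list
    assert(sumAll(p, &b) == 43);
}
```

In both cases the pointer ends up stored somewhere the IL does not allow: in a field of the closure object for bump, and in an array element for sumAll.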

How severe are these limitations? Are there any reasonable workarounds? I guess I'll have to dig D-per to find out.

2 comments:

Unknown said...

Hello there,

I like the idea of a D.NET compiler a lot. I was just wondering if you had looked at the MSIL backend for LLVM, now that the LDC compiler has an official release.

The MSIL backend was just a proof of concept, but the issues you are having with enums, etc. when converting to .NET would disappear when using LLVM/MSIL, wouldn't they? Anything that can be expressed in LLVM IR should be directly translatable to MSIL, I think... I may be off on that though.

Just wondering what your thoughts are.

Thanks,
Phizzzt

Cristache said...

K, great comments.

As a matter of fact my first attempt / prototype was LLVM-based, but it did not take me very far.

When I looked at it (back in October of 2008) I found it too Low Level (a Virtual Machine :)) for my purposes. For example, as I remember, to construct a class you have to lay out vtables, specify alignments, etc. In managed IL you just say .class auto ... etc and do not have to worry about the details. After that it is up to ILASM, the execution engine and the JIT-er to make sure things work right.

By going from D to IL with no intermediary layer, I get the freedom of generating IL whichever way I want (well, along with the issues that I outlined in the blog article).

Another consideration was that there is another D/LLVM effort under way, so if/when the LLVM MSIL backend becomes mature enough, that avenue can be explored independently of my project.

And one of my goals for the D.NET project is to learn more about the CLR. LLVM would've been a distraction from that.