Sunday, February 01, 2009

To Destruct a D Struct

I wrote a while ago about similarities between D and .NET (and implicitly C#). My interest in mapping D features to .NET is driven by a research project that I took on a few months ago: a D 2.0 language compiler for .NET (D 2.0 is a branch of D that includes experimental features). In that post I mentioned that in both D and C# structs are lightweight value types.

After working on struct support in more detail, I have come to the realization that D structs cannot be implemented as .NET value type classes. Rather, they have to be implemented as reference type classes.

The short explanation is that while in IL value classes do not participate in garbage collection, D expects the GC to reap structs after they are no longer in use.

Interestingly enough, value types may be newobj-ed (not just created on the stack).

We can use a simple example to demonstrate the difference between value classes and reference classes. If we compile the following program using the IL assembler (ILASM) and run it, nothing gets printed on the screen:

.assembly extern mscorlib {}
.assembly 'test' {}

.class public value auto Test
{
    .field public int32 i

    .method public void .ctor()
    {
        ret
    }
    .method virtual family void Finalize()
    {
        ldstr "finalizing..."
        call void [mscorlib]System.Console::WriteLine(string)
        ret
    }
}
//--------------------------------------------------------------
// main program
//--------------------------------------------------------------
.method public static void main ()
{
    .entrypoint
    .locals init (
        class Test t
    )
    newobj instance void Test::.ctor()
    stloc 't'
    ret
}


But if we changed the declaration of the Test class from a value type to a reference type, like this:

.class public auto Test

we could see "finalizing..." printed, a confirmation that the destructor (the Finalize method) is being invoked by the garbage collector. All it takes is removing "value" from the declaration.

In IL, value types have no self-describing type information attached. I suspect that they are not garbage collected because, without type information, the system cannot possibly know which (virtual) Finalize method to call (note that although C# structs are implemented as sealed value classes, "sealed" and "value" are orthogonal).

D supports the contract programming paradigm, and class invariants are one of its core concepts.

The idea is that the user can write a special method named "invariant", which tests that certain properties of a class or struct hold. In debug mode, the D compiler inserts "probing points" throughout the lifetime of the class (or struct), ensuring that this function is automatically called: after construction, before and after execution of public methods, and before destruction.

The natural mechanism for implementing the last of these checks (the one before destruction) is to generate a call to the invariant method at the top of the destructor function body. But if the destructor is never called then we've got a problem.

So having destructors work correctly is not just a matter of collecting memory after the struct expires, but it is also crucial to contract programming in D.
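To make the "probing points" concrete, here is a minimal D sketch of a struct with an invariant and a destructor. The Account type, its balance field and the withdraw method are hypothetical names made up for illustration; the comments mark where a non-release build is expected to check the invariant.

```d
import std.stdio;

// Hypothetical example type, not taken from the D.NET compiler sources.
struct Account
{
    int balance;

    // The invariant: a property that must hold for every live Account.
    invariant()
    {
        assert(balance >= 0, "balance must be non-negative");
    }

    this(int initial)
    {
        balance = initial;
    }   // invariant is checked here, once construction completes

    void withdraw(int amount)
    {
        balance -= amount;
    }   // public method: invariant is checked on entry and on exit

    ~this()
    {
        // invariant is checked before this body runs
        writeln("destroying account with balance ", balance);
    }
}

void main()
{
    auto a = Account(100);
    a.withdraw(30);       // leaves balance at 70, invariant still holds
    // a.withdraw(500);   // would leave balance negative and trip the invariant
}
```

If the runtime never invokes the destructor, the final check simply never happens, which is the problem described above. Note that invariant checks are compiled out when building with the -release switch.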

Because D.NET implements structs as reference type classes, assigning structs and passing them to functions may become heavier weight than in the native Digital Mars D compiler (although this is something that I have yet to measure), but it is necessary in order to support important D language features.
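The cost comes from the copy semantics that D guarantees for structs. The sketch below uses two hypothetical types, Point and Node, to show the behavior a back end has to preserve. It is an assumption on my part, not something measured, that a reference-class implementation would need an explicit copy wherever the struct assignment appears.

```d
// Hypothetical types, for illustration only.
struct Point { int x; }   // value semantics
class  Node  { int x; }   // reference semantics

void main()
{
    Point a;
    a.x = 1;
    Point b = a;          // copies the whole struct
    b.x = 2;
    assert(a.x == 1);     // the original is unaffected

    auto n = new Node;
    n.x = 1;
    auto m = n;           // copies only the reference
    m.x = 2;
    assert(n.x == 2);     // both names refer to the same object
}
```

A compiler that backs Point with a .NET reference class still has to make the first pair of assertions hold, so a plain reference copy is not enough for struct assignment or for passing a struct by value.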

3 comments:

Max Lybbert said...

Very interesting.

I tried the IL example, and was not able to get it to compile with my copy of ILASM until I added "class" in front of Test t (line 26):

.locals init(
class Test t
)

This was necessary for both examples.

Cristache said...

I just tried with the versions of ILASM as shipped with Visual Studio 2003, 2005 and 2008 and I stand corrected.

You are absolutely right, thanks for pointing out the error.

I was logged in to a Linux box when I wrote the post; mono's ILASM compiled the code just fine. This adds another piece of evidence against mono :)

Max Lybbert said...

I simply wanted to make sure anybody else trying the examples won't have much trouble.

I don't have an opinion either way on mono's (probably unintended) extension. But your posts are getting me interested in VM assembly code (so long as it's not JVM bytecode, I gave up on finding anything useful in that a long time ago).