Programming and Debugging (in my Underhøøsen): April 2009

Tuesday, April 28, 2009

Argument against _argptr

Variadic functions work slightly different in my D.NET implementation than under the native D compiler.

For functions with variable numbers of arguments, the native compiler synthesizes two parameters: _arguments and _argptr; _arguments is an array of TypeInfo objects, and _argptr is a pointer to the beginning of the variable arguments on the stack. The user is supposed to query the type information in _arguments, and do the proper pointer arithmetic to navigate the arguments. You can see some examples at http://www.digitalmars.com/d/2.0/function.html:


void printargs(int x, ...)
{
    writefln("%d arguments", _arguments.length);
    for (int i = 0; i < _arguments.length; i++)
    {   _arguments[i].print();

        if (_arguments[i] == typeid(int))
        {
            int j = *cast(int *)_argptr;
            _argptr += int.sizeof;
            writefln("\t%d", j);
        }
        else if (_arguments[i] == typeid(long))
        {
            long j = *cast(long *)_argptr;
            _argptr += long.sizeof;
            writefln("\t%d", j);
        }
        // ...

The pointer arithmetic is not verifiable in managed code. A separate array of type descriptors is not necessary in .net, because the type meta-data can be passed in with the arguments.

In D.NET, the variable arguments are passed as an array of objects. For example, for a D function with the prototype

void fun(...)

the compiler outputs:


.method public void '_D23funFYv' (object[] _arguments)

I handled variadic support slightly differently from the native compiler: I dropped _argptr and provided a new helper function, _argtype, that can be used as demonstrated in this example:


void fun(...)
{
    foreach(arg; _arguments)
    {
        if (_argtype(arg) == typeid(int))
        {
            int i = arg;
            Console.WriteLine("int={0}".sys, i);
        }
        else if (_argtype(arg) == typeid(string))
        {
            string s = arg;
            Console.WriteLine(s.sys);
        }
    }
}

If the type of the arguments is known, there is no need to check for the typeid:


void fun(...)
{
    foreach(arg; _arguments)
    {
        int i = arg;
        Console.WriteLine(i);
    }
}

If an incorrect type is passed in, it is still okay, because the error is detected at runtime.


fun("one", "two", "three"); // int i = arg will throw

The downside of this approach is that it is not compatible with the native code. This does not affect template variadic functions, which should be perfectly portable.

Slice and D-ice

It is amazing how much insight one can get into a language by simply writing a compiler for it... Today I am going to spend half a lunch break copying and pasting into this post a stack of notes related to array slices (collected over the past few months of working on a .net back-end for the D compiler).

D sports a data type called array slice that is intended to boost the performance of array-intensive computations. A slice is a lightweight value type, conceptually equivalent to a range of array elements, or a "view" into an array. It can be thought of as consisting of a reference to an array, and a pair of begin-end indexes into the array:


struct (T) ArraySlice {
    T[] a; // reference to array
    int begin; // index where slice begins
    int end; // one past index where slice ends
}

The actual representation of a slice is currently internal to the compiler, and completely opaque to the programmer. The template struct above is not what the layout of a slice actually looks like, it is intended for illustrative purposes only.

Consider this code:


int a[] = [1, 3, 5, 7, 9, 11, 13, 17];
int[] s = a[2..5];

The second declaration introduces "s" as a slice of array "a", starting at position two and ending at (but not including) the fifth element. Using the template pseudo-definition of ArraySlice, the code is conceptually equivalent to:


int a[] = [1, 3, 5, 7, 9, 11, 13, 17];
ArraySlice!(int) s = { a, 2, 5 };

To understand how array slices may help performance, consider an application that reads and parses XML files. The input can be loaded as an array of characters (a huge string). The application builds a DOM representation of the input, and each node in the DOM contains a token string. This approach is wasteful, because the token strings hold copies of character sequences that are already present in the input; copying tokens around has a linear complexity (it is directly proportional with the number of characters in the token) and the same is true for the spatial complexity (how much memory is being used). But XML tokens could be modeled as slices of character arrays ("views" into the original XML input string), and complexity in both time and space would drop down to a constant value.

This design can be implemented in other languages than D, but memory management issues may add unnecessary complexity. In C++ for example we'll have to make sure that the life-time of the token slices does not exceed the scope of the original input string. D belongs to the family of garbage-collected languages; by default, objects are reference-counted, and holding a reference to an array slice indirectly keeps the original array "alive", because the slice contains a reference to it.

Now that the design rationale behind array slices is understood, let's take another look at the syntax. You have probably noticed that in the statement:
int[] s = a[2..5];

The declaration part introduces "s" as an array of integers; it is not until you see the assignment that the lightweight, array slice, true nature of "s" is revealed.
D has no special syntax for disambiguating between "true" arrays and array slices in declarations; they can be used interchangeably. As a matter of fact, a function signature with array parameters will happily accept a slice as argument. In the following code, both "a" and "s" are legal arguments for the count function:


int count(int[] arr) {
    return arr.length; // return the number of elements in arr
}
writefln("%d", count(a)); // ok
writefln("%d", count(s)); // also ok

Resizing Slices

Both arrays and slices support the built-in length property. As you expect, an expression such as a.length tells you how many elements are present in the array; the property applies to array slices as well, and in that case it gives the number of elements within the slice range. For example, the output of the following code is "3":


int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
int[] s = a[2..5];
writefln("%d", s.length); // prints 3

So far so good, but I forgot to mention that the length property is read-write: not only can you query the size of an array, you can also change it, like this:


a.length = 100; // extend the array to 100 elements

Assignment to the length property resizes the array. This begs the question: what happens when an array slice is being resized? The answer of course is "it depends".
With "a" and "s" defined as above, let's say we resize the "s" slice from 3 elements to 7:


s.length = 7;

This extends the view of "s" into "a" up to the ninth element of "a" ("s" starts at 2). It is as we have said:


s = a[2..9];

The slice is still within the bounds of the "a" array. Resizing it is a constant-time operation that changes one field inside the internal representation of "s". If instead of the built-in slice we had used the template ArraySlice struct introduced above, resizing the slice would have amounted to:


int a[] = [1, 3, 5, 7, 9, 11, 13, 17];
ArraySlice!(int) s = { a, 2, 5 };
s.end = 9; // resize the slice

Because "s" is simply a "view" of the array, modifying an element in the array is immediately reflected into the slice, for example:


writefln("%d %d", a[2], s[0]); // prints "5 5"
a[2] = 23;
writefln("%d %d", a[2], s[0]); // prints "23 23"

What happens if we re-size "s" past the end of "a"?


s.length = 100; // what does this do?

The answer is that the behavior is up to the compiler. The current native compiler from Digital Mars changes the type of "s" from a lightweight view into an array to a full-fledged array, and re-sizes it to fit 100 elements.


int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
int[] s = a[2..5];
writefln("%d %d", a[2], s[0]); // prints "5 5"
s.length = 100;
a[2] = 23;
writefln("%d %d", a[2], s[0]); // prints "23 5"

In other words, resizing a slice past the end of the original array breaks up the relationship between the two, and from that point they go their separate merry ways.
This behavior also underlines the schizophrenic nature of array slices that are not full copies of arrays, unless they change their mind.

Concatenating Slices

We saw how array slices may be re-sized via the “length” property. Slices may be re-sized implicitly via concatenation, as in the following example:


int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
int s[] = a[0..5];
s ~= [23, 29];

The tilde is the concatenation operator for arrays, strings and slices. In this example, the slice is resized to accommodate the elements 23 and 29. Note that even that in this situation the resizing is different from had we written:
s.length += 2;

Extending the length by two elements simply bumps up the upper limit of the slice (because there is still room in the original array, "a"). As we saw in the previous section, if the new length exceeds the bounds of the original array, the slice will be "divorced" from the original array, and promoted from a light-weight view to a full, first-class array. If we just extend the length by two elements, the bounds of "a" are not exceeded.

However, in the case of appending (as in s ~= [23, 29]) in addition to resizing we are also setting the values of two additional elements. The slice needs to be divorced from the array, so the a[5] and a[6] are not overwritten with the values 23 and 29. The compiler turns "s" into a full array of length + 2 == 7 elements, copies the elements of “a” from 0 to 5, then appends values 23 and 29.

The problem, as with resizing past the bounds of the original array, is that after the array and the slice part their ways, it is no longer possible to modify value types in the original array via the slice (which has now been promoted to a standalone array). This is a run time behavior, hard to predict by statically analyzing (or visually inspecting) the code.

Rolling Your Own Array Slices

It is impossible to determine statically whether a D function parameter is an array or a slice by examining the function's code alone. It is up to the function's caller to pass in an array or a slice.


void f (int[] a) {
// ...
}
int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
f(a); // call f with an array argument
f(a[2..5]); // call f with a slice argument;

In some cases you may want to better communicate that your function is intended to work with slices rather than arrays. You may also want to have better control over the slice's properties. Say for example you want to make sure that a slice is never re-sized.

You can accomplish these things by rolling your own ArraySlice. The template struct that was introduced earlier is a good starting point. The signature of "f" can be changed to:


void f (ArraySlice!(int) a) { // ...

That's a good start, but the struct is not compatible with a built-in array slice. The following code does not compile:


struct (T) ArraySlice {
    T[] a; // reference to array
    int begin; // index where slice begins
    int end; // one past index where slice ends
}

int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
ArraySlice!(int) s = { a, 2, 5 };
foreach(i; s) { // error, does not compile
    writefln("%d", i);
}

You could of course rewrite the foreach loop to use the begin..end range:


foreach(i; s.begin..s.end) {
    writefln("%d", s.a[i]);
}

In addition to being more verbose such code is not very well encapsulated, since it explicitly accesses the struct's public members. If we later decide to factor out ArraySlice into its own module, and make the “a”, “begin”, and “end” members private, the code above will not compile anymore.

All that's preventing the compact version of the foreach loop from compiling is that the opApply operator is missing. So let's add one:


struct (T) ArraySlice {
  // ...
  int opApply(int delegate(ref int) dg) {
      foreach (i;begin..end) {
          dg(a[i]);
      }
      return 0;
  }
}

Great! This gets us past the compilation error. The foreach loop now compiles and prints out all the elements in the slice. There is a small and subtle bug in this code though. Suppose that instead of printing all elements in the slice, you're doing a linear search, for example:


foreach(i; s) {
    if (i == 5) { // found it!
        break;
    }
}

Astonishingly enough, this code will not break out of the foreach loop, instead it will continue through all the elements in the slice.

The http://www.digitalmars.com/d/2.0/statement.html website prescribes the behavior of the opApply operator:

"The body of the apply function iterates over the elements it aggregates, passing them each to the dg function. If the dg returns 0, then apply goes on to the next element. If the dg returns a nonzero value, apply must cease iterating and return that value".

The D compiler synthesizes the delegate function from the body of the foreach loop, and the code above is transformed internally to this equivalent form:


int dg(ref int i) {
    if (i == 5) {
        return 1;
    }
    return 0;
}
s.opApply(&dg);

The bug in the opApply operator is that the loop should be broken out of when dg returns non-zero:


int opApply(int delegate(ref int) dg) {
 foreach (i; begin..end) {
     if (dg(a[i])) break;
 }
 return 0;
}

Now foreach works correctly. To support foreach_reverse, just add a opApplyReverse member function to the ArraySlice template struct:


int opApplyReverse(int delegate(ref int) dg) {
 foreach_reverse (i; begin..end) {
     if (dg(a[i])) break;
 }
 return 0;
}

What about manipulating the length of the slice? Neither of the lines below compiles:


writefln("%d", s.length);
s.length = 100;

To support the length property, we have to add these methods:


struct (T) ArraySlice {
    // ...
    // return the length of the slice
    int length() { return end - begin; }

    // resize the slice, preventing it to grow past
    // the original array's length
    void length(int newLength) {
        end = begin + newlength;
        if (end > a.length) { end = a.length; }
    }
}

To prevent resizing the slice, all there is to do is to leave the second overload undefined, effectively turning length into a read-only property.

Clients of the struct can still set its individual members to inconsistent values. To disallow incorrect usage, the "a", "begin", and "end" members can be made private (and the struct will have to move to its own module, because in D private access is not enforced if the class or struct lives in the same module as the client code).

To make the ArraySlice struct even more source-compatible with built-in slices, you can give it an opIndex operator:


struct (T) ArraySlice {
    // ...
    T opIndex(size_t i) { return a[i]; }
}

The opIndex operator allows this to work:
writefln("%d", s[3]);

Assignment to array elements is not allowed:


s[3] = 42; // error, does not compile

If you want the array elements to be modified via the slice like that, just define
opIndexAssign:


struct (T) ArraySlice {
    // ...
    void opIndexAssign(size_t i, T val) { a[i] = val; }
}

When you put it all together, the ArraySlice struct will look something like this:


struct (T) ArraySlice {
private:
    T[] a; // reference to array
    int begin; // index where slice begins
    int end; // one past index where slice ends
public:
    T opIndex(size_t i) { return a[i]; }
    void opIndexAssign(size_t i, T val) { a[i] = val; }
    int length() { return end - begin; }
    // comment this function out to prevent resizing
    void length(int newLength) {
        end = begin + newlength;
        if (end > a.length) { end = a.length; }
    }

    // support foreach
    int opApply(int delegate(ref int) dg) {
        foreach (i;begin..end) {
            if (dg(a[i])) break;
        }
        return 0;
    }
}

In conclusion, by rolling your own array slice implementation the intent of your code becomes clearer, your level of control over it increases, and you can still retain the brevity of the built-in slices.

Friday, April 24, 2009

Static ctors in D.NET (Part 2)

D allows multiple static constructors per class (all sharing the same signature). For example, the following code is legal:


version(D_NET)
{
    import System;
    alias Console.WriteLine println;
}
else
{
    import std.stdio;
    alias writefln println;
}
class A
{
    static int i = 42;
    static this() 
    {
        println("static A.this 1");
    }
    static this() 
    {
        println("static A.this 2");
    }
}
void main()
{
    println(A.i);
}

The program prints:


static A.this 1
static A.this 2
42

Because IL does not allow duplicate methods with the same signature, instead of mapping static constructors directly to .cctor methods, my compiler generates one .cctor per class (where needed) that makes function calls to the static this() constructors. The .cctor is not called if the class is never referenced -- this behavior is different from the native Digital Mars D compiler. If we comment out the one line in main, it will still print the constructor messages in native mode, but not under the .net compiler.

D classes may also have one or more static destructors, as in this example:


class A
{
    static int i = 42;
    static this() 
    {
        println("static A.this 1");
    }
    static this() 
    {
        println("static A.this 2");
    }
    static ~this()
    {
        println("static A.~this 1");
    }
    static ~this()
    {
        println("static A.~this 2");
    }
}

Unlike with the class constructors, there is no special IL method to map static destructors to. My compiler supports them with AppDomain.ProcessExit event handlers, registered in reverse order of their lexical occurrences. IL allows non-member .cctor methods, and the compiler takes advantage of this feature to synthesize code that registers the static destructors as ProcessExit handlers.

It is interesting to observe that the global .cctor does reference the class when it constructs the event handler delegates:


.method static private void .cctor()
{
  // register static dtor as ProcessExit event handler
  call class [mscorlib]System.AppDomain [mscorlib]System.AppDomain::get_CurrentDomain()
  ldnull
  ldftn void 'example.A'::'_staticDtor4'(object, class [mscorlib]System.EventArgs)
  newobj instance void [mscorlib]System.EventHandler::.ctor(object, native int)
  callvirt instance void [mscorlib]System.AppDomain::add_ProcessExit(class [mscorlib]System.EventHandler)
  // register static dtor as ProcessExit event handler
  call class [mscorlib]System.AppDomain [mscorlib]System.AppDomain::get_CurrentDomain()
  ldnull
  ldftn void 'example.A'::'_staticDtor3'(object, class [mscorlib]System.EventArgs)
  newobj instance void [mscorlib]System.EventHandler::.ctor(object, native int)
  callvirt instance void [mscorlib]System.AppDomain::add_ProcessExit(class [mscorlib]System.EventHandler)
  ret
}

This means that the .cctor of the class will be called, even if no user code ever references it.

In addition to class static constructors and destructors, D also features module static constructors and destructors. These are expressed as non-member functions with the signature static this() and static ~this(), respectively.
For example:


//file b.d
import a;
version(D_NET)
{
    import System;
    alias Console.WriteLine println;
}
else
{
    import std.stdio;
    alias writefln println;
}

static this()
{
    println("module B");
    map["foo"] = "bar";
}
static this()
{
    println("boo");
}
static ~this()
{
    println("~boo");
}

//file a.d
version(D_NET)
{
    import System;
    alias Console.WriteLine println;
}
else
{
    import std.stdio;
    alias writefln println;
}

string map[string];

static this()
{
    println("module A");
}
static ~this()
{
    println("~module A");
}

void main()
{
    foreach (k, v; map)
    {
        version(D_NET)
        {
            Console.WriteLine("{0} -> {1}".sys, k, v.sys);
        }
        else
        {
            writefln("%s -> %s", k, v);
        }
    }
}

It is noteworthy that regardless in which order the two files above are compiled the resulting program prints the same output:


module A
module B
boo
foo -> bar
~boo
~module A

The explanation lay in the D language rules: if a module B imports a module A, the imported module (A) must be statically initialized first (before B).

As in the case of static constructors and destructors for classes, the compiler uses the global, free-standing .cctor method to stick calls to module ctors and register ProcessExit events that call the module's static dtors.

Thanks to BCSd for prompting this post with his comment and code sample.

Tuesday, April 14, 2009

Static Constructors in D.NET

The D programming language features static constructors that are similar to class constructors in C#: they are called automatically to initialize the static fields (shared data) of a class.

At the IL level, static constructors are implemented by the special .cctor methods. The experimental compiler for D that I am working on in my virtual garage groups together user-written static constructor code with static field initializers into .cctors (and I believe that the C# compiler does the same).


class Example {
  static int solution = 42; // assignment is moved inside .cctor
  static double pi;

  static this() { // explicit static ctor ==> .cctor
    pi = 3.14159;
  }
}

The code above produces the same IL as:


class Example {
  static int solution = 42;
  static double pi = 3.14159;
}

Also, the compiler synthesizes one class per module to group all "free-standing"global variables (if any).

For example, the IL code that is generated out of this D source


static int x = 42;

void main() {
  writefln(x);
}

is equivalent to the code generated for:


class ModuleData {
  static int x = 42;
}
void main() {
  writefln(ModuleData.x);
}

only that in the first case the ModuleData class is implicit and not accessible from D code. This strategy allows for the initializers of global variables to be moved inside the .cctor of the hidden module data class.

IL guarantees that the class constructors are invoked right before any static fields are first referenced. If the compiler flags the class with the beforefieldinit attribute, then the class constructors are called even earlier, i.e. right when the class is referenced, even if no members of the class are ever accessed (the C# compiler sets the beforefieldinit attribute on classes without explicit class constructors).

Serge Lidin explains all the mechanics in his great book Expert .NET 2.0 IL Assembler, and recommends avoiding beforefieldinit, on grounds that invoking a .cctor is a slow business. I am considering using it though, on the synthesized module data class.

In conjunction with a compiler-generated reference to the module data class, the beforefieldinit attribute will guarantee that the global variables are initialized on the main thread, and will avoid strange race conditions and bugs.

Saturday, April 04, 2009

No Teleprompter to Blame

I will never run for President of the United States. Not because I wouldn't like to, but because I can't: I am a Naturalized Citizen, one step below the Natural Born One. As depressing as this can be, there's a good side to it, too: I can always recant.

I have the luxury to lightheartedly declare "Folks, I don't know what I was smoking when I said that D structs cannot be implemented as value types in .net", without being afraid of losing any points in any poll.

Further research proved that my initial argument, in .net value types do not participate in garbage collection was... er irrelevant. That's because Walter Bright' s D compiler front-end is smart enough to insert calls to the structs' destructors wherever necessary! Now that's what I call a bright design. It doesn't matter that the CLR does not garbage-collect new-ed value types, because the front-end generates the code that deletes them.

I was running some compiler back-end tests for the D language post-blit operator when I realized that copying value types in IL is straight-forward; you LOAD the source variable / field / argument and STORE into the destination (boom!) whereas bit-copying managed classes is not as trivial.

I did not give up right away. Hanging on to my structs-as-classes implementation, I wrote a run-time helper blitting routine:


        using System.Runtime.InteropServices;

        // assumes src and dest are of the same type
        static public void blit(Object dest, Object src)
        {
            int size = Marshal.SizeOf(src.GetType());
            IntPtr p = Marshal.AllocHGlobal(size);
            try
            {
                Marshal.StructureToPtr(src, p, false);
                Marshal.PtrToStructure(p, dest);
            }
            finally
            {
                Marshal.FreeHGlobal(p);
            }
        }

It did not take long for the truth to dawn upon me: Wow, bit-blitting non-value types is a major pain in the (oh) bum (ah). Efficient that code ain't (or should I say ISN'T? man do those consonants hurt). Honestly, I am not even sure that code is kosher. Better not to get into a pickle if one can avoid it... so back to the struct-as-value type implementation I am.

Programming and Debugging (in my Underhøøsen)