Friday, March 20, 2009

Associative Arrays in D.NET

The D Programming Language supports foreach loops over associative arrays.

Associative arrays are data structures that look much like "normal" arrays, but the index types are not integers. Here's an example of an array of doubles indexed by strings, expressed in D:

double [string] a = [ "pi" : 3.14159, "e" : 2.718281828, "phi" : 1.6180339 ];

The D language does not explicitly specify how associative arrays should be implemented. In C++ associative arrays can be implemented as standard STL maps, hash_maps (to be replaced by unordered_maps in C++0x) or with Google's sparse hash maps, to name a few possibilities.

In other languages such as Python, associative arrays are called dictionaries. The family of .NET languages take advantage of the System.Collections.Generic.Dictionary class (this is also what the D compiler for .NET does: it implements associative arrays as system dictionaries).

D provides an easy way to iterate over an associative array, the foreach keyword. This keyword should be familiar to anyone programming in C#, UNIX shell, or managed C++. Here is an example for how it is being used in D:

foreach (string key, double value; a) {
version(D_NET) {
Console.WriteLine("{0}={1}".sys, key, value);
}
else {
writefln("%s=%f", key, value);
}
}

The types for the key and value arguments can be explicitly specified, but that is not necessary as the compiler can infer them automatically. The foreach line can be re-written more compactly as:

foreach (key, value; a) {

Another legal form for the statement, used to iterate over the values (and ignore the keys) is:

foreach (value; a) {


The current implementation of the compiler front-end synthesizes a nested function out of the loop's body. The .NET back-end constructs a delegate out of this nested function and its closure; then it "wimps out" and calls a run-time helper written in C#:

public class AssocArray
{
public delegate int Callback<V>(ref V value);
public delegate int Callback<K, V>(K key, ref V value);

static public void Foreach<K, V>(Dictionary<K, V> aa,
int unused,
Callback<V> callback)
/// ...
static public void Foreach<K, V>(Dictionary<K, V> aa,
int unused,
Callback<K, V> callback)
/// ...
}

The generic Foreach function has two overloads, to accommodate both forms of the foreach statement.

D rules do not allow an array to be modified from within the loop, but the elements of the array can be modified if the value argument has a ref storage class:

foreach (key, ref value; a) {
value = 0.0;
}
C#'s rules are stricter, one cannot modify neither the collection (by adding / removing elements) nor change the individual elements. To work around this restriction, the run-time helper code does two passes over the dictionary that corresponds to the D associative array:

static public void Foreach<K, V>(Dictionary<K, V> aa,
int unused, Callback<K, V> callback)
{
Dictionary<K, V> changed = new Dictionary<K, V>();
foreach (KeyValuePair<K, V> kvp in aa)
{
V temp = kvp.Value;
int r = callback(kvp.Key, ref temp);
if (!kvp.Value.Equals(temp))
{
changed[kvp.Key] = temp;
}
if (r != 0)
{
break;
}
}
foreach (KeyValuePair<K, V> kvp in changed)
{
aa[kvp.Key] = kvp.Value;
}
}

The Callback delegate is constructed from the address of a closure object and a nested foreach function, both synthesized in the compiler. The generated code looks something like this:

newobj instance void 'vargs.main.Closure_2'::.ctor()
stloc.s 1 // '$closure3'
ldloc.1 // '$closure3'
ldftn instance int32 'vargs.main.Closure_2'::'__foreachbody1' (float64& '__applyArg0')
// construct Foreach delegate
newobj instance void class [dnetlib]runtime.AssocArray/Callback`1::.ctor(object, native int)
.line 14
call void [dnetlib]runtime.AssocArray::Foreach(
class [mscorlib]System.Collections.Generic.Dictionary`2,
int32,
class [dnetlib]runtime.AssocArray/Callback`1)

Edit: After writing this piece I noticed that I forgot to mention one interesting side effect of my implementation: because there is no try / catch around the Callback call in the C# run-time support code, the foreach loop has all-or-nothing transactional semantics.

For example, this program has different outputs when compiled with DMD from when it is compiled with my D / .NET compiler:

version(D_NET)
{
import System;
import dnet;
}
else
{
import std.stdio;
}

void main()
{
int [string] x = ["one" : 1, "two" : 2, "three" : 3];

try
{
foreach (ref v; x)
{
if (v == 3)
throw new Exception("kaboom");
v = 0;
}
}
catch (Exception e)
{
version(D_NET)
{
Console.WriteLine(e.toString().sys);
}
else
{
writefln("%s", e.toString());
}
}
foreach (k, v; x)
{
version(D_NET)
{
Console.WriteLine("{0}, {1}".sys, k, v);
}
else
{
writefln("%s, %d", k, v);
}
}
}

Under D/.NET it prints:

kaboom
one, 1
two, 2
three, 3

while the native compilation gives:

object.Exception: kaboom
two, 0
three, 3
one, 1

It would be very easy to get my compiler to emulate the native behavior, but I kind of like the "transactional" flavor...

6 comments:

Anonymous said...

object.Exception: kaboom
two, 0
three, 3
one, 1


Just curious, I'm not overly familiar with D; however, if the entire set was iterated over, what determines the order? From the example above, you defined the elements in a different order (ala: int [string] x = ["two" : 2, "three" : 3, "one" : 1];) from the example provided. Just noticed by the fact that "one":1 was still 1.

If you didn't use a different set of code, could the exception be de-synchronizing the associative array (based upon the underlying implementation, of course)? I'm probably over-thinking this.

On a side note, why did you forgo a transformation on the foreach iteration of the associative array? Wouldn't it be easier, were you to omit the transactional flavor, to emit the proper CIL to handle a C#-ish foreach block (to get proper code flow on exceptions)?

Cristache said...

Good observation. The order is implementation-dependent:

In D.NET the associative array is a generic Dictionary while the native D compiler rolls its own hash container under the wraps.

The D language spec leaves the implementation of associative arrays up to the compiler.

Throwing an exception from the iteration loop does not break the internals of the array, if that's what you meant.

The transactional flavor was unintentional, and it is easy to eliminate. I am not sure that I understand what you mean by "forgo a transformation on the foreach iteration of the associative array".

Anonymous said...

For the most part, I meant in your CIL emission phase. Rather than utilize a helper class which uses a lambda-extract (based upon how I'm guessing you call it, for it to properly maintain scope. Though what I do know of D, sub-methods are simply a feature, so it was probably easy once that feature was implemented), you could have emitted CIL similar to how C# implements its for-each iterations on enumerable targets. Replacing references of 'key' and 'value', or whatever the named elements resolve to, with KeyValuePair<..., ...>.Key and ..Value respectively. This should be pretty simple once you properly resolve all symbols, right?

Also, couldn't there be issues with alternative implementations of core constructs (ie. associative arrays)? Especially when related to basic behavior, the code on two compilers would yield different functionality, ignoring the semantic differences between versions (such as version(D_NET)). Such functional differences would make code confusing in cases where you use conditional compilation arguments, since one would think that the native compiler and the D_NET compiler would emit functionally equal code.

Certain predictability assumptions would be more difficult to make until you became fully aware of the differences between the versions maintained in the code (for example, the order in which elements in an associative array might be important in certain cases, it's not related to a system feature either, so different functionality would be odd, to say the least).

Naturally, I might be missing something, I stumbled upon your blog while reading about 'D' from another link I received from someone else (so what I know about it is 5 minutes worth of reading, your blog was listed in a page about The Astoria Seminar).

If there's something I missed about D, implementing it, and so on, please clue me in (I'm interested in language design and implementation, code generation, and so on).

Cristache said...

It was easier to go the helper / delegate way because the front-end (which I did not write) does the lambda-extract already.

My first instinct when I sat down to implement foreach over associative arrays was exactly what you are suggesting -- but then I ran into the "foreach over opApply" language feature.

As for the ordering question: I believe that if some code depends on a particular order of the elements in the associative array then something is wrong with the design, and a different data structure should be used. In C++ for example, if you needed the elements to be ordered you would use a map over a hash_map (which BTW in C++ox will be renamed to unordered_map, to make it unambiguous that one should not rely on any particular ordering).

Feel free to email if you prefer that medium over the comment format.

Anonymous said...
This comment has been removed by a blog administrator.
Anonymous said...

Have you considered the fact that this might work another way? I am wondering if anyone else has come across something
similar in the past? Let me know your thoughts...