Sunday, July 17, 2011

ZeroBUGS Lambda Fun



A friend of mine tried to compile the code from my previous post using GCC, only to get a bunch of error messages. As I suspected, the fix was to specify -std=c++0x on the command line. But before answering my friend's G+ message, I had to verify that the code worked with GCC. And one thing lead to another. After compiling, I was curious to see how my ZeroBUGS debugger copes with lambdas. What else to do on a rainy Sunday afternoon in Seattle, other than playing with some old C++ code while listening to Judas Priest?

ZeroBUGS is a visual debugger for Linux, a project that ate up all of my spare time between 2004 and 2008. I kept making small changes to it since, but very limited in scope. For some reason, since the Big Recession I found myself having lesser and lesser time for on-the-side projects. I tried for a while making ZeroBUGS a commercial enterprise, hoping to leave my day time job(s) and become self employed. I learned the hard way that selling proprietary software to Linux geeks is not very lucrative. Or maybe I should have partnered with a savvy sales guy, the kind that can sell refrigerators to penguins.

In late 09 I put ZeroBUGS on ice and went working on Microsoft Windows for a short bit (just long enough to write a few lines of code that will hopefully make it into the upcoming Windows 8.)
After leaving Microsoft and joining TableauSoftware, I yanked the closed source product off my web site, and re-released ZeroBUGS as open source (and free as in free beer under the Boost License.)
I have not come upon major bugs in the debugger since a few years ago, when I discovered that the "step-over" functionality was broken for recursive functions.

So I was pretty confident the debugger will handle the new C++0X just fine. Except it didn't!

After some debugging, I traced the problem to the unnamed classes that the compiler generates to capture the surrounding variables. My debugger cashes data types by name for performance reasons. Unnamed classes normally occur in some scope, and thus there is no clash. Except that in the case of lambda functions, GCC generates unnamed classes at the outer most scope (i.e. the DWARF entries describing their type is at level 1, immediately nested in the compilation unit scope.) The data structures visualization was completely off, because the debugger used the wrong datatype (the first "unnamed" always won).

A simple hack that appends the file index and the line number to the offending "unnamed" solves the problem for now, as the snapshot above can testify.

While I think of a better solution this one will have to do. I am done with the computer for now, off to enjoy the weather and barbecue in the rain for the rest of the night!



Saturday, July 16, 2011

Template Template Method

I recently had to write a bunch of C++ functions that shared a common pattern:

1) acquire a resource lock
2) TRY
3) do some work
4) CATCH exceptions of interest, and log them
5) release resource lock

I was eager to implement the "do some work" bits but did not want to bore myself silly by repeating the steps 1, 2, 4, 5 in each function (about thirty or so of them.) Now, you may recognize the problem, it is what the Template Method Design Pattern solves: "avoid duplication in the code: the general workflow structure is implemented once in the abstract class's algorithm, and necessary variations are implemented in each of the subclasses."

I could not however apply the Template Method pattern "as is" because my functions did not share a common signature. So wrapping them with a non-virtual function and then implement the "do some work" as virtual methods would not work in my case.

One alternative was to code steps 1 and 2 above as a BEGIN_CALL macro, wrap up steps 4 and 5 into a END_CALL and decorate each of my methods with these archaic C-isms. The approach would indeed work for C, but it is utterly indecorous in C++.

The spirit of the Template Method pattern can be preserved very elegantly by making the template method a C++ template method, and wrap the variant "do some work" bits into a lambda block.

The code sample below illustrates the idea (I added some mock objects to give more context).

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

// Mock a resource that needs exclusive locking
//
// -- think CComCriticalSection for example:
// http://msdn.microsoft.com/en-us/library/04tsf4b5(v=vs.80).aspx
class Resource
{
public:
void Lock() { cout << "Resource locked." << endl; }
void Unlock() { cout << "Resource unlocked." << endl; }
};

// Generic stack-based lock. Works with any resource
// that implements Lock() and Unlock() methods.
// See:
// http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization
template<typename T>
class Lock
{
Lock(const Lock&); // non-copyable
Lock& operator=(const Lock&); // non-assignable

T& m_resource; // Lock MUST NOT outlive resource

public:
explicit Lock( T& resource ) : m_resource( resource )
{
m_resource.Lock( );
}

~Lock( )
{
m_resource.Unlock( );
}
};

/////////////////////////////////////////////////////////////////////////////
// Template Method Wrapper for an arbitrary lambda block:
// 1) lock a resource
// 2) log exceptions
// This is a variant of the Template Method Design Pattern
// -- implemented as a C++ template method.
template<typename F>
auto Execute(Resource& r, F f) -> decltype(f())
{
Lock<Resource> lock(r);
try
{
return f();
}
catch (const exception& e)
{
// log error
clog << e.what() << endl;
}
typedef decltype(f()) result_type;
return result_type();
}
/////////////////////////////////////////////////////////////////////////////

// Usage example:

static Resource globalResource;


int f (int i)
{
return Execute(globalResource, [&]()->int
{
return i + 1;
});
}

string g (unsigned j)
{
return Execute(globalResource, [&]()->string
{
ostringstream ss;
ss << '[' << j << ']';
return ss.str();
});
}

int main()
{
cout << f(41) << endl;
cout << g(42) << endl;

return 0;
}

Update: to compile the code above using GCC you may need to specify "-std=c++0x" on the command line.

One thing that I grappled with for a bit was how to make the Execute template function figure out the return type of the wrapped lambda. After pinging Andrei Alexandrescu at Facebook (or is it "on Facebook"? No matter -- my English as a second language works either way, because Andrei does work for Facebook) and some googling around, I found the magic incantation: decltype(f()).

Sunday, December 12, 2010

A Voyage to the Center of the Borg

A great while has passed since my last blog post. To my defense: I had been very busy playing Locutus of Borg. Microsoft bought out my employer back in 2007, and consequently I spent the past three years working in their Online Services Division, and Windows Server (yes, I did contribute a few lines of code to the upcoming version of Windows).

For months, if not years of my being a Redmondite I envisioned how I was going to blog about the strange and at times surreal experience of working for Microsoft. After all, I am the kind of geek who did not use Windows on his home machines. I only installed it after the acquisition, so that I could work from home every once in a while. I had imagined how the day would look like, when free of the Borg, I was going to report on how life on the inside felt like. I yearned to write about Microsoft politics, coding style and culture, the famous never-ending meetings and (some really amazing) people that I met there. I thought that I should document how their caste system (oops, I mean, career path model) twists people.

But now that I am on the outside, I cannot think of anything interesting to write about my days at Microsoft. Life goes on.

I'd rather blog on what a great product Tableau Software (my new employer) has. Or I could write about what an exciting weekend this one was, living on the web with my new Chrome OS laptop from Google (thanks again Sorin for the great surprise that got delivered Friday night!) Or I could write about enjoying the great read that the D Programming Language is.

Later.

Tuesday, June 23, 2009

The Power of Foreach

In D, arrays can be traversed using the foreach construct:


int [] a = [1, 5, 10, 42, 13];
foreach (i;a) {
writefln(“%d”, i);
}

The array of integers in this example is traversed and each element printed, in the natural order in which elements appear in the sequence. To visit the elements in reverse order, simply replace foreach with foreach_reverse. It is as intuitive as it gets.

Moreover, linear searches can be implemented with foreach: simply break out of the loop when the searched-for value is found:

foreach (i;a) {
writefln(“%d”, i);
if (i == 42) {
break;
}
}

Consider now a tree data structure where a tree node is defined as:

class TreeNode {
TreeNode left;
TreeNode right;
int value;
}

What is the meaning of a statement such as foreach (node; tree) { … }? The simple answer is that with the above definition of TreeNode, the code does not compile.

But if it were to compile, what should it do? Visit the nodes in order, or in a breadth-first fashion? No answer is the right one, unless we get to know more about the problem at hand. If we’re using a binary search tree to sort some data, then foreach would most likely visit the nodes in-order; if we’re evaluating a Polish-notation expression tree, we might want to consider post-order traversal.


Foreach Over Structs and Classes



Tree data structures and tree traversal occur often in computer science problems (and nauseatingly often in job interviews). Balanced binary trees are routinely used to implement associative containers, as C++ programmers are certainly familiar with the standard template collections set and map.

One difference between sequence containers (such as lists, arrays, and queues) on one hand, and containers implemented with trees on the other, is that there are more ways to iterate over the elements of a tree than there are ways to enumerate the elements of a sequence. A list (for example) can be traversed from begin to end and, in the case of double-linked lists, in reverse, from the end to the beginning; that’s it. But a tree can be traversed in order, in pre-order, post-order, or breadth first.

No built-in traversal algorithm will fit all possible application requirements. D’s approach is to provide the opApply operator as "standard plumbing" where users can plug their own algorithms for iterating over the elements of a class or struct. The operator is supposed to implement the iteration logic, and delegate the actual processing of the objects to a ... delegate:


class TreeNode {
public:
TreeNode left;
TreeNode right;
int value;
int opApply(int delegate(ref TreeNode) processNode) {
// ... tree traversal
// ...
return 0;
}
}

When the programmer writes a foreach loop, the compiler syntesizes a delegate function from the body of the loop, and passes it to opApply. In this example, the body of the delegate will have exactly one line that contains the writefln statement:

TreeNode tree = constructTree();
foreach(node; tree) {
writefln(“%d”, node.value);
}

For an in-order traversal, the implementation of opApply may look something like this:

int opApply(int delegate(ref TreeNode) processNode) {
return (left && left.opApply(processNode))
|| processNode(this)
|| (right && right.opApply(processNode));
}

The delegate that the compiler synthesizes out of the foreach body returns an integer (which is zero by default). A break statement in the foreach loop translates to the delegate function returning a non-zero value. As you can see in code above, a correct implementation of opApply should make sure that the iteration is "cancelled" when the delegate returns a non-zero value.

The traversal function’s argument must match the delegate argument type in the signature of opApply. In the example above the processNode function could modify the tree node that is passed in. If the TreeNode class writer wanted to outlaw such use, the opApply operator should have been declared to take a delegate that takes a const TreeNode parameter:

class TreeNode {
// …
int opApply(int delegate(ref const TreeNode) processNode) {
return processNode(this);
}
}

The new signature demands that the client code changes the parameter type from TreeNode to const TreeNode. Any attempt to modify the node object from within the user-supplied traversal function will fail to compile.

Another possible design is to encode all traversal algorithms as TreeNode methods. The following shows an example for the in-order algorithm (other traversal algorithms are left as an exercise for the reader):

class TreeNode {
// …
int traverseInOrder(int delegate(ref int) dg) {
if (left) {
int r = left.traverseInOrder(dg);
if (r) {
return r;
}
}
int r = dg(value);
if (r) {
return r;
}
if (right) {
r = right.traverseInOrder(dg);
if (r) {
return r;
}
}
return 0;
}
}

foreach(val; &tree.traverseInOrder) {
Console.WriteLine(val);
}


D Generators



A generator is a function or functor that returns a sequence, but instead of building an array or vector containing all the values and returning them all at once, a generator yields the values one at a time. Languages such as C# and Python have a yield keyword for this purpose. In D a generator can be implemented with foreach and a custom opApply operator. Assume one wants to print the prime numbers up to N, like this:

foreach (i; PrimeNumbers()) {
if (i > N) {
break;
}
writeln(i);
}

To make this work, the PrimeNumbers struct could be implemented like this:

struct PrimeNumbers {
int n = 1;
int primes[];

int opApply(int delegate(ref int) dg) {
loop:
while (true) {
++n;
foreach (p; primes) {
if (n % p == 0) {
continue loop;
}
}
primes ~= n;
if (dg(n)) {
break;
}
}
return 1;
}
}

Thursday, June 18, 2009

Pragma Assembly

Publishing the source code for D.NET on CodePlex in its current (rough) form turned out to be a great idea, as I received very good feedback. Tim Matthews of New Zealand has submitted several bug reports and patches and convinced me to change my stance on the "assembly class hack".

The hack was a very limited solution to the problem of qualifying imported declarations by the name of the assembly where they live. I described the problem and the attempt to hack around it at the end of a post back in December; a diligent reader commented that I should have used the pragma mechanism instead. I resisted the suggestion at the time, mainly because I am trying to avoid changing the compiler front-end if I can help it. (Front-end changes have to ultimately be merged back into Walter' s source tree, and he is a very very busy guy.)

A D implementation on .NET is not going to be of much use without the ability to generate code that imports and interfaces with existing assemblies. Guided by this idea, Tim Matthews did a lot of trail-blazing and prototyping and showed that because in .NET namespaces can span across assemblies, there has to be a way of specifying an arbitrary number of assemblies in one import file. My "assembly class" hack allowed for one and only one assembly name to be specified.

So I had to byte the bullet and do the right thing: assembly names are now specified with a pragma, like this:


pragma (assembly, "mscorlib")
{
// imported declarations here...
}

Any number of such blocks can appear within a D import file. And in the future, the code will be extended to allow a version and a public key to be specified after the assembly name.

Another thing that has to be fixed is to make the compiler adhere to the convention of using a slash between enclosing classes and nested types, as described in Serge Lidin's Expert .NET 2.0 IL Assembler, CHAPTER 7 page 139:
Since the nested classes are identified by their full name and their encloser (which is in turn identified by its scope and full name), the nested classes are referenced in ILAsm as a concatenation of the encloser reference, nesting symbol / (forward slash), and full name of the nested class


Tim also submitted a front-end patch that allows directories and import files of the same name to coexist, so that the code below compiles without errors:

import System;
import System.Windows.Forms;


An alternative solution that I proposed was to create import files named "this.d", so that the above would have read:

import System.this;
import System.Windows.Forms;


After some consideration, I bought Tim's point that this is not what .NET programmers would expect, and went on to apply his patch.

Sunday, May 10, 2009

D .NET on Codeplex

I will be taking a few days off next week and I decided to upload the code for the D compiler .net back-end on Codeplex before I close shop. I hope that it will provide context for the last few months worth of blog posts.

Most core language features are usable, but there's no Phobos port and if you need to import functionality from external DLLs, you'll have to hand-write some import files (following the model in druntime/import/System.di); I hope to get around to writing a tool that automates the process one of these days -- and it will most likely be in D.

There is a Visual Studio 8 (works with Visual Studio 8 Express) solution file in the source tree, and there is also a Makefile for the adventurous. The project compiles on Linux but it does not work quite well with Mono -- I strongly suspect it is Mono's segfault.

Check it out at http://dnet.codeplex.com/!

Tuesday, April 28, 2009

Argument against _argptr

Variadic functions work slightly different in my D.NET implementation than under the native D compiler.

For functions with variable numbers of arguments, the native compiler synthesizes two parameters: _arguments and _argptr; _arguments is an array of TypeInfo objects, and _argptr is a pointer to the beginning of the variable arguments on the stack. The user is supposed to query the type information in _arguments, and do the proper pointer arithmetic to navigate the arguments. You can see some examples at http://www.digitalmars.com/d/2.0/function.html:


void printargs(int x, ...)
{
writefln("%d arguments", _arguments.length);
for (int i = 0; i < _arguments.length; i++)
{ _arguments[i].print();

if (_arguments[i] == typeid(int))
{
int j = *cast(int *)_argptr;
_argptr += int.sizeof;
writefln("\t%d", j);
}
else if (_arguments[i] == typeid(long))
{
long j = *cast(long *)_argptr;
_argptr += long.sizeof;
writefln("\t%d", j);
}
// ...


The pointer arithmetic is not verifiable in managed code. A separate array of type descriptors is not necessary in .net, because the type meta-data can be passed in with the arguments.

In D.NET, the variable arguments are passed as an array of objects. For example, for a D function with the prototype
void fun(...)
the compiler outputs:

.method public void '_D23funFYv' (object[] _arguments)

I handled variadic support slightly differently from the native compiler: I dropped _argptr and provided a new helper function, _argtype, that can be used as demonstrated in this example:

void fun(...)
{
foreach(arg; _arguments)
{
if (_argtype(arg) == typeid(int))
{
int i = arg;
Console.WriteLine("int={0}".sys, i);
}
else if (_argtype(arg) == typeid(string))
{
string s = arg;
Console.WriteLine(s.sys);
}
}
}

If the type of the arguments is known, there is no need to check for the typeid:

void fun(...)
{
foreach(arg; _arguments)
{
int i = arg;
Console.WriteLine(i);
}
}

If an incorrect type is passed in, it is still okay, because the error is detected at runtime.

fun("one", "two", "three"); // int i = arg will throw


The downside of this approach is that it is not compatible with the native code. This does not affect template variadic functions, which should be perfectly portable.

Slice and D-ice

It is amazing how much insight one can get into a language by simply writing a compiler for it... Today I am going to spend half a lunch break copying and pasting into this post a stack of notes related to array slices (collected over the past few months of working on a .net back-end for the D compiler).

D sports a data type called array slice that is intended to boost the performance of array-intensive computations. A slice is a lightweight value type, conceptually equivalent to a range of array elements, or a "view" into an array. It can be thought of as consisting of a reference to an array, and a pair of begin-end indexes into the array:


struct (T) ArraySlice {
T[] a; // reference to array
int begin; // index where slice begins
int end; // one past index where slice ends
}

The actual representation of a slice is currently internal to the compiler, and completely opaque to the programmer. The template struct above is not what the layout of a slice actually looks like, it is intended for illustrative purposes only.

Consider this code:

int a[] = [1, 3, 5, 7, 9, 11, 13, 17];
int[] s = a[2..5];

The second declaration introduces "s" as a slice of array "a", starting at position two and ending at (but not including) the fifth element. Using the template pseudo-definition of ArraySlice, the code is conceptually equivalent to:

int a[] = [1, 3, 5, 7, 9, 11, 13, 17];
ArraySlice!(int) s = { a, 2, 5 };

To understand how array slices may help performance, consider an application that reads and parses XML files. The input can be loaded as an array of characters (a huge string). The application builds a DOM representation of the input, and each node in the DOM contains a token string. This approach is wasteful, because the token strings hold copies of character sequences that are already present in the input; copying tokens around has a linear complexity (it is directly proportional with the number of characters in the token) and the same is true for the spatial complexity (how much memory is being used). But XML tokens could be modeled as slices of character arrays ("views" into the original XML input string), and complexity in both time and space would drop down to a constant value.

This design can be implemented in other languages than D, but memory management issues may add unnecessary complexity. In C++ for example we'll have to make sure that the life-time of the token slices does not exceed the scope of the original input string. D belongs to the family of garbage-collected languages; by default, objects are reference-counted, and holding a reference to an array slice indirectly keeps the original array "alive", because the slice contains a reference to it.

Now that the design rationale behind array slices is understood, let's take another look at the syntax. You have probably noticed that in the statement:
int[] s = a[2..5];

The declaration part introduces "s" as an array of integers; it is not until you see the assignment that the lightweight, array slice, true nature of "s" is revealed.
D has no special syntax for disambiguating between "true" arrays and array slices in declarations; they can be used interchangeably. As a matter of fact, a function signature with array parameters will happily accept a slice as argument. In the following code, both "a" and "s" are legal arguments for the count function:

int count(int[] arr) {
return arr.length; // return the number of elements in arr
}
writefln("%d", count(a)); // ok
writefln("%d", count(s)); // also ok


Resizing Slices


Both arrays and slices support the built-in length property. As you expect, an expression such as a.length tells you how many elements are present in the array; the property applies to array slices as well, and in that case it gives the number of elements within the slice range. For example, the output of the following code is "3":

int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
int[] s = a[2..5];
writefln("%d", s.length); // prints 3

So far so good, but I forgot to mention that the length property is read-write: not only can you query the size of an array, you can also change it, like this:

a.length = 100; // extend the array to 100 elements

Assignment to the length property resizes the array. This begs the question: what happens when an array slice is being resized? The answer of course is "it depends".
With "a" and "s" defined as above, let's say we resize the "s" slice from 3 elements to 7:

s.length = 7;

This extends the view of "s" into "a" up to the ninth element of "a" ("s" starts at 2). It is as we have said:

s = a[2..9];

The slice is still within the bounds of the "a" array. Resizing it is a constant-time operation that changes one field inside the internal representation of "s". If instead of the built-in slice we had used the template ArraySlice struct introduced above, resizing the slice would have amounted to:

int a[] = [1, 3, 5, 7, 9, 11, 13, 17];
ArraySlice!(int) s = { a, 2, 5 };
s.end = 9; // resize the slice

Because "s" is simply a "view" of the array, modifying an element in the array is immediately reflected into the slice, for example:

writefln("%d %d", a[2], s[0]); // prints "5 5"
a[2] = 23;
writefln("%d %d", a[2], s[0]); // prints "23 23"

What happens if we re-size "s" past the end of "a"?

s.length = 100; // what does this do?

The answer is that the behavior is up to the compiler. The current native compiler from Digital Mars changes the type of "s" from a lightweight view into an array to a full-fledged array, and re-sizes it to fit 100 elements.

int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
int[] s = a[2..5];
writefln("%d %d", a[2], s[0]); // prints "5 5"
s.length = 100;
a[2] = 23;
writefln("%d %d", a[2], s[0]); // prints "23 5"

In other words, resizing a slice past the end of the original array breaks up the relationship between the two, and from that point they go their separate merry ways.
This behavior also underlines the schizophrenic nature of array slices that are not full copies of arrays, unless they change their mind.

Concatenating Slices


We saw how array slices may be re-sized via the “length” property. Slices may be re-sized implicitly via concatenation, as in the following example:

int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
int s[] = a[0..5];
s ~= [23, 29];

The tilde is the concatenation operator for arrays, strings and slices. In this example, the slice is resized to accommodate the elements 23 and 29. Note that even that in this situation the resizing is different from had we written:
s.length += 2;

Extending the length by two elements simply bumps up the upper limit of the slice (because there is still room in the original array, "a"). As we saw in the previous section, if the new length exceeds the bounds of the original array, the slice will be "divorced" from the original array, and promoted from a light-weight view to a full, first-class array. If we just extend the length by two elements, the bounds of "a" are not exceeded.

However, in the case of appending (as in s ~= [23, 29]) in addition to resizing we are also setting the values of two additional elements. The slice needs to be divorced from the array, so the a[5] and a[6] are not overwritten with the values 23 and 29. The compiler turns "s" into a full array of length + 2 == 7 elements, copies the elements of “a” from 0 to 5, then appends values 23 and 29.

The problem, as with resizing past the bounds of the original array, is that after the array and the slice part their ways, it is no longer possible to modify value types in the original array via the slice (which has now been promoted to a standalone array). This is a run time behavior, hard to predict by statically analyzing (or visually inspecting) the code.

Rolling Your Own Array Slices



It is impossible to determine statically whether a D function parameter is an array or a slice by examining the function's code alone. It is up to the function's caller to pass in an array or a slice.

void f (int[] a) {
// ...
}
int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
f(a); // call f with an array argument
f(a[2..5]); // call f with a slice argument;

In some cases you may want to better communicate that your function is intended to work with slices rather than arrays. You may also want to have better control over the slice's properties. Say for example you want to make sure that a slice is never re-sized.

You can accomplish these things by rolling your own ArraySlice. The template struct that was introduced earlier is a good starting point. The signature of "f" can be changed to:

void f (ArraySlice!(int) a) { // ...

That's a good start, but the struct is not compatible with a built-in array slice. The following code does not compile:

struct (T) ArraySlice {
T[] a; // reference to array
int begin; // index where slice begins
int end; // one past index where slice ends
}

int a[] = [1, 2, 3, 5, 7, 9, 11, 13, 17];
ArraySlice!(int) s = { a, 2, 5 };
foreach(i; s) { // error, does not compile
writefln("%d", i);
}

You could of course rewrite the foreach loop to use the begin..end range:

foreach(i; s.begin..s.end) {
writefln("%d", s.a[i]);
}

In addition to being more verbose such code is not very well encapsulated, since it explicitly accesses the struct's public members. If we later decide to factor out ArraySlice into its own module, and make the “a”, “begin”, and “end” members private, the code above will not compile anymore.

All that's preventing the compact version of the foreach loop from compiling is that the opApply operator is missing. So let's add one:

struct (T) ArraySlice {
// ...
int opApply(int delegate(ref int) dg) {
foreach (i;begin..end) {
dg(a[i]);
}
return 0;
}
}

Great! This gets us past the compilation error. The foreach loop now compiles and prints out all the elements in the slice. There is a small and subtle bug in this code though. Suppose that instead of printing all elements in the slice, you're doing a linear search, for example:

foreach(i; s) {
if (i == 5) { // found it!
break;
}
}

Astonishingly enough, this code will not break out of the foreach loop, instead it will continue through all the elements in the slice.

The http://www.digitalmars.com/d/2.0/statement.html website prescribes the behavior of the opApply operator:
"The body of the apply function iterates over the elements it aggregates, passing them each to the dg function. If the dg returns 0, then apply goes on to the next element. If the dg returns a nonzero value, apply must cease iterating and return that value".
The D compiler synthesizes the delegate function from the body of the foreach loop, and the code above is transformed internally to this equivalent form:

int dg(ref int i) {
if (i == 5) {
return 1;
}
return 0;
}
s.opApply(&dg);

The bug in the opApply operator is that the loop should be broken out of when dg returns non-zero:

int opApply(int delegate(ref int) dg) {
foreach (i; begin..end) {
if (dg(a[i])) break;
}
return 0;
}

Now foreach works correctly. To support foreach_reverse, just add a opApplyReverse member function to the ArraySlice template struct:

int opApplyReverse(int delegate(ref int) dg) {
foreach_reverse (i; begin..end) {
if (dg(a[i])) break;
}
return 0;
}

What about manipulating the length of the slice? Neither of the lines below compiles:

writefln("%d", s.length);
s.length = 100;

To support the length property, we have to add these methods:

struct (T) ArraySlice {
// ...
// return the length of the slice
int length() { return end - begin; }

// resize the slice, preventing it to grow past
// the original array's length
void length(int newLength) {
end = begin + newlength;
if (end > a.length) { end = a.length; }
}
}

To prevent resizing the slice, all there is to do is to leave the second overload undefined, effectively turning length into a read-only property.

Clients of the struct can still set its individual members to inconsistent values. To disallow incorrect usage, the "a", "begin", and "end" members can be made private (and the struct will have to move to its own module, because in D private access is not enforced if the class or struct lives in the same module as the client code).

To make the ArraySlice struct even more source-compatible with built-in slices, you can give it an opIndex operator:

struct (T) ArraySlice {
// ...
T opIndex(size_t i) { return a[i]; }
}

The opIndex operator allows this to work:
writefln("%d", s[3]);

Assignment to array elements is not allowed:

s[3] = 42; // error, does not compile

If you want the array elements to be modified via the slice like that, just define
opIndexAssign:

struct (T) ArraySlice {
// ...
void opIndexAssign(size_t i, T val) { a[i] = val; }
}

When you put it all together, the ArraySlice struct will look something like this:

struct (T) ArraySlice {
private:
T[] a; // reference to array
int begin; // index where slice begins
int end; // one past index where slice ends
public:
T opIndex(size_t i) { return a[i]; }
void opIndexAssign(size_t i, T val) { a[i] = val; }
int length() { return end - begin; }
// comment this function out to prevent resizing
void length(int newLength) {
end = begin + newlength;
if (end > a.length) { end = a.length; }
}

// support foreach
int opApply(int delegate(ref int) dg) {
foreach (i;begin..end) {
if (dg(a[i])) break;
}
return 0;
}
}

In conclusion, by rolling your own array slice implementation the intent of your code becomes clearer, your level of control over it increases, and you can still retain the brevity of the built-in slices.