Tuesday, October 02, 2007

The Strange Tale of Two Exits

Since my early days as a C programmer, I have always been confused by standard library functions that come in two flavors, just one leading underscore apart (as in open and _open). A simple rule of thumb, “choose either one for the job at hand”, makes life easier (almost).

There are exceptions to this rule. For example, what is the difference between _exit and exit?

If you answered “unlike exit, _exit doesn't clean up global objects” then congratulations, you may skip this article; go do something fun. But if you thought that _exit is the ISO standard synonym for exit then I beg you to read on.

Both functions will cause the application to… well, exit, after which the operating system reclaims all resources held by the process (memory, file handles, and so on).

The underscore variant does just that, without further ado. The non-underscore form, however, does a little bit more: before yielding control back to the operating system, it loops through all the functions that have been previously registered with atexit, and calls them in the reverse order of their registration. It also flushes any open stdio streams, which _exit does not do.

Let’s take a look at the following C program:

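1  #include <stdio.h>
2  #include <stdlib.h>
3  #include <unistd.h>   /* write */
4  void finish(void)
5  {
6      const char msg[] = "Good Bye.\n";
7      write(1, msg, sizeof msg - 1);   /* do not write the trailing '\0' */
8  }
10 int main(void)
11 {
12     atexit(finish);   /* finish() runs when exit() is called */
13     fprintf(stdout, "returning from main...\n");
14     exit(0);
15 }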

Line 14 is equivalent to saying “return 0”; when the function main() returns, control is transferred to the C runtime which invokes exit(), passing it the return code.

The output of this program is:

returning from main...
Good Bye.

If the exit call in line 14 is replaced with _exit, the finish function will not be called, and the “Good Bye” message does not show in the output.
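To see both behaviors side by side, along with the reverse order in which the handlers run, here is a minimal POSIX-only sketch. It runs the experiment in a forked child and captures what the child writes to file descriptor 1. The helper name output_of_child and the two handlers are made up for this illustration, and error handling is reduced to assertions.

```cpp
#include <cassert>
#include <stdio.h>      // fflush
#include <stdlib.h>     // atexit, exit
#include <string>
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // _exit, close, dup2, fork, pipe, read, write

// Write one character to file descriptor 1, bypassing stdio buffering.
static void emit(char c) { ssize_t ignored = write(1, &c, 1); (void)ignored; }

static void first_registered()  { emit('A'); }
static void second_registered() { emit('B'); }

// Hypothetical helper: registers two handlers in a child process, ends the
// child with exit() or _exit(), and returns everything the child wrote.
std::string output_of_child(bool use_underscore_exit)
{
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0); (void)rc;

    fflush(nullptr);              // keep pending stdio output out of the child
    pid_t pid = fork();
    assert(pid >= 0);

    if (pid == 0) {               // child
        close(fds[0]);
        dup2(fds[1], 1);          // the child's "stdout" is now the pipe
        close(fds[1]);
        atexit(first_registered);
        atexit(second_registered);
        if (use_underscore_exit)
            _exit(0);             // handlers are skipped
        exit(0);                  // handlers run, last registered first
    }

    close(fds[1]);                // parent: read until the child is gone
    std::string out;
    char buf[64];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        out.append(buf, static_cast<size_t>(n));
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return out;
}
```

Under these assumptions, output_of_child(false) should return "BA" (the handler registered last runs first), while output_of_child(true) should return an empty string.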

But how does this old C stuff affect us, C++ programmers? The short answer is that global objects (which live outside function main()) are cleaned up by functions registered with atexit().

Let's consider the following code:

1  #include <stdio.h>
2  #include <unistd.h>   // _exit (on Windows, <stdlib.h> declares it)
3  #include <stdlib.h>   // exit
4  struct Fred
5  {
6      ~Fred()
7      {
8          fprintf(stderr, "Good Bye\n");
9      }
10 };
12 Fred fred;   // global object: destroyed when exit() runs
14 int main()
15 {
16     fprintf(stderr, "Returning from main...\n");
17     exit(0);
18 }

Again, line 17 is equivalent to saying “return 0”, and this program behaves the same way as our first example: “Good Bye” is printed after “Returning from main...”. If we change line 17 to read _exit(0), the string “Good Bye” is never printed.

“But we are not registering anything with atexit!” you may say. True, we are not, but the compiler is, on our behalf. Because an instance of Fred is being constructed outside of main(), the compiler generates some cleanup code, arranging for the destructor to be invoked as if it were registered with atexit.
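One way to observe the “as if it were registered with atexit” behavior is to interleave a static object with two explicit handlers and look at the order in which everything runs. The POSIX-only sketch below does that in a forked child. It uses a function-local static instead of a true global so that the moment of construction is explicit; the names trace_of_child, before_fred and after_fred are invented for this illustration.

```cpp
#include <cassert>
#include <stdio.h>      // fflush
#include <stdlib.h>     // atexit, exit
#include <string>
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // _exit, close, dup2, fork, pipe, read, write

// Write one character to file descriptor 1, bypassing stdio buffering.
static void emit(char c) { ssize_t ignored = write(1, &c, 1); (void)ignored; }

struct Fred {
    Fred()  { emit('c'); }   // marks the moment of construction
    ~Fred() { emit('d'); }   // marks the moment of destruction
};

static void before_fred() { emit('1'); }
static void after_fred()  { emit('2'); }

// Hypothetical helper: runs the experiment in a child process and returns
// everything the child wrote to its standard output.
std::string trace_of_child(bool use_underscore_exit)
{
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0); (void)rc;

    fflush(nullptr);              // keep pending stdio output out of the child
    pid_t pid = fork();
    assert(pid >= 0);

    if (pid == 0) {               // child
        close(fds[0]);
        dup2(fds[1], 1);
        close(fds[1]);
        atexit(before_fred);      // registered before fred exists
        static Fred fred;         // constructed here, in the child only
        atexit(after_fred);       // registered after fred exists
        if (use_underscore_exit)
            _exit(0);             // no handlers, no destructor
        exit(0);                  // unwinds in reverse: after_fred, ~Fred, before_fred
    }

    close(fds[1]);                // parent: read until the child is gone
    std::string out;
    char buf[64];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        out.append(buf, static_cast<size_t>(n));
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return out;
}
```

With exit, the trace should read "c2d1": the destructor slots in between the two handlers, exactly where an atexit registration made at construction time would sit. With _exit, only "c" should appear.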

And you might not even be aware that you are calling exit. Maybe all you do is invoke a third-party library, or a library function developed by some other person or group in your company, and that function calls exit when it encounters an error.

This can become a serious issue in a multithreaded program that has global objects. If any thread calls exit(), the global objects are destroyed while the other threads are still running, which may lead to the untimely demise of an object that another thread depends upon (such as a global mutex wrapper class instance).
