Saturday, May 19, 2007

Exceptions and their treatment

Depending on your preferred language, exceptions can range from being scary black magic and a sign that all has gone horribly wrong ('C', even with Windows Structured Exception Handling) to being familiar to the point of contempt (Java, C#).

As a rule of thumb, exceptions should indicate that, well, something exceptional has happened -- not that a user has said "No" rather than "Yes" at a particular interaction, but rather that a part of the infrastructure has not lived up to its expected contract (memory cannot be allocated; a network connection cannot be made). There is, even so, a grey area in the middle -- things like "File not found", where the low-level code is reacting helplessly to the caller having not lived up to its side of the bargain -- where an exception generalises the concept of an error code.

When should an exception be raised (i.e. when should a throw statement appear in code)?  When something happens that the code at this point cannot sensibly deal with because it can't see the big picture.
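For example, a low-level routine that fails to open a file has no idea whether that matters or not; all it can sensibly do is report the fact and let someone further up decide. A minimal sketch (the function name and wording are mine, purely for illustration):

    #include <fstream>
    #include <iterator>
    #include <stdexcept>
    #include <string>

    // Low-level helper: it cannot know whether a missing file is a
    // disaster or a routine event, so it reports the failure upwards.
    std::string readWholeFile(const std::string& path)
    {
        std::ifstream in(path.c_str());
        if (!in)
            throw std::runtime_error("cannot open file: " + path);

        return std::string(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
    }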

When should it be caught? Two answers here -- first, when the code reached by the stack unwinding knows enough about what it is doing to be able to respond sensibly to the problem ("This file does not exist, please choose another"); secondly, when the code knows enough about what is going on that it can describe the error in a better fashion to the code higher up -- a process of catch, augment (or wrap), and re-throw.
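A sketch of the second case, building on the readWholeFile example above: the middle layer cannot fix the problem, but it does know which document was being loaded, so it catches, adds that context, and re-throws (again, the names are illustrative only):

    #include <stdexcept>
    #include <string>

    std::string readWholeFile(const std::string& path); // may throw, as above

    // Catch, augment, re-throw: we can't recover here, but we can
    // describe the failure better than the low-level code could.
    std::string loadDocument(const std::string& docName)
    {
        try
        {
            return readWholeFile(docName + ".doc");
        }
        catch (const std::exception& e)
        {
            throw std::runtime_error(
                "while loading document '" + docName + "': " + e.what());
        }
    }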

So, what happens in the middle?

In some cases, what has happened is that the system has gone so horribly wrong that we might as well throw up our hands and let the process terminate; in others, what has happened is essentially trivial (like a file not found), so we really want to be able to pick up and continue.

How do we characterise the behaviour of the system, then, when we perform an operation that may throw?

There is an established terminology for this. We say that an operation has a basic level of exception safety if the system remains in a consistent and usable state (no leaks, all objects valid) after the exception. The operation is strongly exception safe if, after the exception is handled, the system has returned to the state it was in before the operation began.

And some operations can be classed as not failing (and without them we would indeed be building on sand). Destructor/deallocator and assignment/swap operations must fall into this class -- without them we cannot reliably or safely perform any clean-up or recovery.

RAII (or using, or try/finally) is the idiom that (with no-fail deallocation) most cleanly supports the two weaker types of behaviour: this is where we can prevent resources leaking, and restore objects to a consistent state (if not always the original one).
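In C++ that means an object which acquires the resource in its constructor and releases it in its destructor, so that stack unwinding does the clean-up for us whether we leave the scope normally or via an exception. A minimal sketch wrapping a C FILE* (the class is illustrative, not production code):

    #include <cstdio>
    #include <stdexcept>
    #include <string>

    // RAII wrapper: the destructor guarantees the file is closed,
    // however we leave the scope.
    class FileHandle
    {
    public:
        explicit FileHandle(const std::string& path)
            : file_(std::fopen(path.c_str(), "rb"))
        {
            if (!file_)
                throw std::runtime_error("cannot open " + path);
        }

        ~FileHandle() { std::fclose(file_); }   // no-fail clean-up

        std::FILE* get() const { return file_; }

    private:
        FileHandle(const FileHandle&);            // non-copyable
        FileHandle& operator=(const FileHandle&);

        std::FILE* file_;
    };

    void process(const std::string& path)
    {
        FileHandle file(path);
        // ... work that may throw; the file is closed either way ...
    }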

The easiest way of ensuring strong safety, i.e. a roll-back to the original state, is to do all the operations on temporary variables. Only when all the tricky work has been done do we mutate the external state, using no-fail assignment or swap operations -- any exception that could happen happens before the objects are changed from their original pristine state, and roll-back is a no-op.
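The canonical C++ form of this is copy-and-swap: do the fallible copying off to one side, then commit the result with a swap that cannot throw. A sketch with a made-up Widget class:

    #include <algorithm>
    #include <string>
    #include <vector>

    class Widget
    {
    public:
        Widget() : count_(0) {}

        // No-fail: exchanging a vector's internals and an int cannot throw.
        void swap(Widget& other)
        {
            names_.swap(other.names_);
            std::swap(count_, other.count_);
        }

        // Strongly exception-safe assignment: all the fallible work (the
        // copy) happens on a temporary; the commit is the no-fail swap.
        Widget& operator=(const Widget& rhs)
        {
            Widget temp(rhs);   // may throw -- *this is still untouched
            swap(temp);         // cannot throw -- the commit point
            return *this;       // temp's destructor releases the old state
        }

    private:
        std::vector<std::string> names_;
        int count_;
    };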

Working to improve the resilience of code under exceptions is another good reason to strive to separate responsibilities within the code -- a routine that has many effects, especially if some are dependent on others, can be more difficult to separate out into fallible and safe sections. As always, this is an ideal to be striven for, rather than an absolute -- consider the operation of popping a stack in C++, where the stack is mutated and an object (rather than an object reference) is copied.
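This is why std::stack keeps top() and pop() separate: the caller takes the copy first (the step that might throw) and only then removes the element with a pop() that does not throw; a single pop-that-returns-the-value could lose the element if the copy out to the caller failed after the container had already been mutated. At the call site:

    #include <iostream>
    #include <stack>
    #include <string>

    int main()
    {
        std::stack<std::string> items;
        items.push("hello");

        // Copy first -- if this throws, the stack is untouched...
        std::string value = items.top();
        // ...and only then mutate, with an operation that does not throw.
        items.pop();

        std::cout << value << '\n';
    }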
