Conman Laboratories

Better living through software …

Managing Magic Numbers

Mark Grosberg

The Problem

Programmers are often faced with the problem of maintaining “magic numbers,” or “code points.” The term “magic numbers” refers to any kind of internal identifier for an enumeration in a software system. Typical examples of magic numbers are:

As software grows in complexity, the task of managing the magic numbers becomes much harder. The first and most obvious problem is extensibility. As a simple example, let us assume a software system that is layered atop a portability insulator. The interface for the portability layer defines several error codes that are returned from operating system functions. For example (in C):

typedef enum
{
  ERR_OS_NONE = 0,
  ERR_OS_NO_MEMORY,
  ERR_OS_IO_FAILED,
  ERR_OS_FILE_NOT_FOUND,
  ERR_OS_BAD_PARAMETER
} ErrorCode;

As development progresses, modules are built upon the portability layer. For convenience, most routines return a value of type ErrorCode in case an operating system call fails. Eventually, some module will need to return either an error from the portability layer or some other internal error.

For example, a networking library may wish to indicate a failure from the portability layer or an internal error (such as a timeout, say ERR_NETWK_TIMEOUT). It is important that the value of ERR_NETWK_TIMEOUT is unique across the possible return values.

The Obvious Solution

The simplest solution is to extend the enumeration above to include the new code points:

typedef enum
{
  /* OS Errors */
  ERR_OS_NONE = 0,
  ERR_OS_NO_MEMORY,
  ERR_OS_IO_FAILED,
  ERR_OS_FILE_NOT_FOUND,
  ERR_OS_BAD_PARAMETER,

  /* Network Errors */
  ERR_NETWK_TIMEOUT,
  ERR_NETWK_NO_ROUTE,
  ERR_NETWK_SERVICE_UNAVAILABLE
} ErrorCode;

This simplistic approach has several disadvantages. The first is that modules are no longer independent: each module requires that its error numbers be declared in this common location. For small systems, this isn't much of a problem. For large systems, the error list can become a point of contention and confusion.

Another problem is binary compatibility. Unless all modules are compiled against the same error definitions, the same numeric value can identify different errors in different modules. This problem becomes significant in systems where new code can be loaded dynamically.
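The compatibility problem can be illustrated with a small sketch. It assumes two builds of the shared header: an older one, and a newer one in which a hypothetical ERR_OS_ACCESS_DENIED was inserted. The OLD_ and NEW_ prefixes and the new_error_name helper are invented here only so that both versions fit in one file.

```c
/* Definitions seen by a module that was compiled earlier. */
typedef enum
{
  OLD_ERR_OS_NONE = 0,
  OLD_ERR_OS_NO_MEMORY,
  OLD_ERR_OS_IO_FAILED,
  OLD_ERR_OS_FILE_NOT_FOUND,
  OLD_ERR_OS_BAD_PARAMETER,
  OLD_ERR_NETWK_TIMEOUT          /* == 5 */
} OldErrorCode;

/* Definitions after a hypothetical OS error was inserted. */
typedef enum
{
  NEW_ERR_OS_NONE = 0,
  NEW_ERR_OS_NO_MEMORY,
  NEW_ERR_OS_IO_FAILED,
  NEW_ERR_OS_FILE_NOT_FOUND,
  NEW_ERR_OS_BAD_PARAMETER,
  NEW_ERR_OS_ACCESS_DENIED,      /* == 5, hypothetical new error */
  NEW_ERR_NETWK_TIMEOUT          /* == 6 */
} NewErrorCode;

/* How a module built against the newer definitions names a raw value. */
static const char *new_error_name(int code)
{
  switch (code)
  {
    case NEW_ERR_OS_ACCESS_DENIED: return "ERR_OS_ACCESS_DENIED";
    case NEW_ERR_NETWK_TIMEOUT:    return "ERR_NETWK_TIMEOUT";
    default:                       return "(other)";
  }
}
```

A timeout reported by the older module arrives as the value 5, which the newer module reads as an access failure.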

Managing Magic Numbers With Dynamic Loading of Code

Until now, our solutions have assumed a very traditional form of software development: a software system, consisting of one or more modules, is compiled and linked. Once built, the management of the error numbers is of no importance. Of course, not all systems are so simple.

For example, if the software being developed were an operating system, new code could be loaded (in the form of drivers, subsystems, etc.) after the system is compiled. In this situation, the magic numbers become a much greater problem. Just imagine the kinds of errors a tape drive can produce; they are most likely very different from the kinds of errors a disk drive can produce.

The obvious solution is to add more structure to the error numbers. A very common approach is to reserve some number of upper bits to identify the module and to use the lower bits for a code that is unique within that module. This doesn't really solve the problem; it simply shifts it from maintaining error numbers to maintaining module numbers.
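A minimal sketch of the bit-field scheme follows. The 16/16 split, the StructuredError type, the macro names, and the module numbers are all assumptions made for illustration; real systems choose their own layout.

```c
#include <stdint.h>

/* Upper 16 bits: module number. Lower 16 bits: code within the module. */
typedef uint32_t StructuredError;

#define ERR_MODULE_SHIFT 16
#define ERR_CODE_MASK    0xFFFFu

#define ERR_MAKE(module, code) \
  ((((StructuredError)(module)) << ERR_MODULE_SHIFT) | \
   (((StructuredError)(code)) & ERR_CODE_MASK))
#define ERR_MODULE(err) (((StructuredError)(err)) >> ERR_MODULE_SHIFT)
#define ERR_CODE(err)   (((StructuredError)(err)) & ERR_CODE_MASK)

/* The module numbers still have to be assigned in one shared place. */
enum
{
  MODULE_OS    = 0,
  MODULE_NETWK = 1
};

/* Each module can now number its own codes independently. */
#define ERR_OS_NO_MEMORY  ERR_MAKE(MODULE_OS, 1)
#define ERR_NETWK_TIMEOUT ERR_MAKE(MODULE_NETWK, 1)
```

Both modules use the code 1 without clashing, but only because MODULE_OS and MODULE_NETWK were allocated centrally.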

A Different Approach

The problem can be solved by spending a bit of time and space more effectively. If integers were, say, 128 bits in size, simply picking a random number would almost certainly avoid a collision. This solution, while elegant and pure, is unwieldy from a coding standpoint, since it requires random number generation and large-integer support.
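As a rough check on this claim, the usual birthday approximation gives the chance of at least one collision among n identifiers drawn uniformly at random from 2^128 values. The figure of one million identifiers is an assumption chosen only for illustration.

```latex
P(\text{collision}) \approx 1 - e^{-n(n-1)/(2 \cdot 2^{128})} \approx \frac{n^{2}}{2^{129}},
\qquad
n = 10^{6}: \quad \frac{10^{12}}{2^{129}} \approx 1.5 \times 10^{-27}
```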

The important thing to understand is that as the number of bits in an identifier increases, the chance of a collision decreases. Although programmers rarely deal with large (128-bit) numbers, strings are commonplace. While not as efficient as small integers, strings provide a lot more headroom for structure. In fact, because of their variable length, they are quite expandable.
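One way this could look in C is sketched below. The ErrorName type, the error_is helper, the netwk_receive routine, and the dotted names are hypothetical; the point is only that each module can define its own identifiers without touching a shared enumeration.

```c
#include <stddef.h>
#include <string.h>

/* NULL means "no error"; any other value names the failure. */
typedef const char *ErrorName;

/* Each module defines its own names; there is no common list to extend. */
static const char ERR_OS_NO_MEMORY[]  = "os.no_memory";
static const char ERR_NETWK_TIMEOUT[] = "netwk.timeout";

/* Compare by content, so separately compiled modules still agree. */
static int error_is(ErrorName err, ErrorName expected)
{
  if (err == NULL || expected == NULL)
    return err == expected;
  return strcmp(err, expected) == 0;
}

/* Hypothetical network routine: 0 succeeds, 1 times out, and anything
   else passes up an error from the portability layer. */
static ErrorName netwk_receive(int outcome)
{
  if (outcome == 0)
    return NULL;
  if (outcome == 1)
    return ERR_NETWK_TIMEOUT;
  return ERR_OS_NO_MEMORY;
}
```

The cost is a string comparison per test rather than an integer comparison.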

Of course, for many applications strings are overkill. Nevertheless, there is one set of applications where strings are the only realistic solution: distributed systems. Moreover, with a little bit of trickery they can be made to perform with minimal overhead.

Improving Efficiency

Since most magic numbers are known at compile time, there is a technique we can apply to efficiently decide which identifier we have: perfect hashing. A perfect hash function is guaranteed to generate no collisions for a given set of keys. Such functions suit this technique well, requiring only an iteration over the key string (not over the set of possible values).

The good thing about perfect hash functions is that they can be generated automatically. In fact, the GNU tool gperf will do this on most modern systems.

The interesting thing about perfect hash generators is that they seem to prefer strings with lots of structure (it allows for faster hash functions).

Magic numbers can then be given well structured names. For example:

package.path.module.codepoint

can then be used as the identifier of a particular code point. Although the string takes up more space, we gain human-readable identifiers, which can be useful in many situations. Using a perfect hash function when a code point must be tested against a set of candidates removes almost all of the speed penalty.
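The lookup might be sketched as follows. The hash here (string length plus final character) was worked out by hand for this particular key set and is not gperf output; a generator would derive its own function. The key names, constants, and function names are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

#define HASH_MIN   112u  /* smallest hash value over the key set */
#define TABLE_SIZE 22u   /* largest hash value - HASH_MIN + 1 */

/* Sparse table indexed by (hash - HASH_MIN); unused slots stay NULL. */
static const char *const code_points[TABLE_SIZE] =
{
  [0]  = "os.io_failed",
  [3]  = "netwk.no_route",
  [5]  = "os.file_not_found",
  [14] = "netwk.service_unavailable",
  [17] = "netwk.timeout",
  [18] = "os.bad_parameter",
  [21] = "os.no_memory"
};

/* Collision-free for the seven keys above: length plus last character. */
static unsigned hash_code_point(const char *key, size_t len)
{
  return (unsigned)len + (unsigned char)key[len - 1];
}

/* Returns the table slot of a known code point, or -1 if unknown. */
static int lookup_code_point(const char *key)
{
  size_t len = strlen(key);
  unsigned h;

  if (len == 0)
    return -1;

  h = hash_code_point(key, len);
  if (h < HASH_MIN || h - HASH_MIN >= TABLE_SIZE)
    return -1;

  /* One string comparison confirms the candidate or rejects a stranger. */
  h -= HASH_MIN;
  if (code_points[h] != NULL && strcmp(code_points[h], key) == 0)
    return (int)h;
  return -1;
}
```

The returned slot is a small integer, so the caller can switch on it just as it would on an enumeration value.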

Using strings as magic numbers can add a great deal of flexibility to software. Besides making extensibility easier, well-structured magic numbers can also make debugging easier and increase the interoperability of different protocol versions.