Anyone trying to write a generic data structure in C will run across a few problems, namely how do you manage objects that you don't know anything about.

These were exactly the kinds of problems I was facing when I was 13 years old and trying to write a generic doubly-linked list for a computer simulation system I was writing.

The first problem I encountered was destruction. I had these routines of linked-lists that could handle any object. But if the objects in the list had linked-lists (or other dynamic data structures) as members, those lists would have to be cleaned up as well.

I wanted a system that would do this automatically. The most obvious solution was to add a function pointer to my linked list node structure that would be called to destroy each node. The overhead was only one pointer for each node. Quite a good tradeoff for automatic memory cleanup.

That scheme served me so well that I got the great idea of saving and loading linked-lists from disk files (a primitive database system). So I added function pointers to save and load nodes from disk and some external machinery to build a linked-list out of a file. The problem is that each node would take one pointer per operation of overhead. This is no longer an acceptable waste of space because each object could be carrying around several dozen function pointers. One possible solution (taken by some C++ compilers) is to give each object a pointer to an array of function pointers: The overhead for this solution was still only four bytes, this time it was a pointer to a structure containing the function pointers for the various actions.

[Structure pointing to array of function pointers]

The table could then be shared between multiple instances of that object type. The problem with this approach is that to treat objects in the same manner, the function pointer for a certain action must always reside at the same slot in the table, for example:

[Two objects supporting disjoint interface supersets]
Object A, which has lots of methods requires that the function table for Object B also be equally large. While it is possible to remove much of the empty space by pushing common routines closer towards the front of the array there are still many drawbacks to this approach:

Of course, there is a better way. Each object gets a single function to handle all of the work for every object of that type. The action requested of the object is specified with an integer identifier. This is similar to how the UNIX operating system handles the ioctl interface.

With this in mind I began to rewrite all of my data structure library routines around this concept. I wrote a header file called objects.h that acted as a simple object manager. The main points of interest in the file were the following declarations:


typedef struct object_tag OBJECT; typedef int (*METHOD)(OBJECT *obj, int msgid, ...); struct object_tag { METHOD service_function; }; #define GetMethod(o) (((OBJECT *)o)->service_function)
Because of the following rule of ANSI-C:
A pointer to a structure is equivalent to a pointer to the first member of the structure when casted to a pointer to the type of the first member. This rule holds recursively if the first element is a structure.
In plain English that means given a generic pointer (void *) that can point to any number of possible structures which all have the same first member, we can always get the first member. The GetMethod macro, defined above, will get a function pointer from an OBJECT.

We can then call this function pointer and command the object to do something, for example:


GetMethod(myShape)(myShape, DRAW_SHAPE, 100, 200);
The above can be read as "Send the DRAW_SHAPE message to the myShape object with the parameters 100 and 200."

So at this point all of my generic data structures contained an OBJECT structure as their first member (so the service function could be found). In addition, all of the objects that my generic data structures contained had the same structure. This scheme worked as designed, memory was cleaned up, complex structures could be written to disk with a single function call and the performance was actually quite good. Unfortunately, that solution is good in theory but cumbersome in practice:

Because of the above limitations I abandoned the notion of rewriting all of my data structure library code. One day I was having dinner with a friend of mine, Sean Connor, at TGI Fridays. The topic of conversation happened to drift towards the Amiga's operating system.

After discussing various IPC mechanisms of other operating systems and how the Amiga has two simple objects: A message port (receives messages) and a message (the item being transferred) I got a better idea:

What are we really doing? We are sending messages to objects. What are messages? Why, they are objects of course!
This certainly seemed like a much more powerful idea. So I went back to my computer after dinner and wrote up the following declarations:

typedef struct object_tag OBJECT; typedef struct message_tag MESSAGE; typedef unsigned long MSGID; typedef BOOL (*METHOD)(OBJECT *self, MESSAGE *msg); typedef struct { METHOD method; } TYPE; struct object_tag { TYPE *type; }; struct message_tag { OBJECT object; MSGID msg; }; #define GetMessage(m) (((MESSAGE *)(m))->msg) #define GetObject(o) (((OBJECT  *)(o)) #define CallMethod(o) (GetObject(o)->type->method)
A careful examination reveals that we have essentially packed the parameters into a structure (which happens to be called a message). The fact that a MESSAGE is also an OBJECT means that we can send messages to another message (if this sounds like distributed computing, then you have the idea). This scheme has some major advantages: Of course, there is still one thorn with this solution:

Coding message packers.

I am of the opinion that actions should be as simple as possible to state within a programming language. If every time you had to send a message to an object you needed to write the following code:

MYMESSAGE m1; m1.object.type = &msg_type; m1.msg = ANIMATE; m1.scene = GetCurrentScene(); m1.script = GetScript("CHEESE SHOP"); m1.bgcolor = BLACK; m1.music = "dadada.wav"; CallMethod(actor)(actor, &m1);
Chances are that you would spend more time concentrating on how to send the message rather than why you should send the message in the first place. I consider this a cardinal sin of any programming system. It leads to unclear, cluttered code (where its purpose for being doesn't "jump out" at you). Worse, it can derail your train of thought when writing the code. Definatly not a good thing.

To solve this problem really required that the computer build macros that do this for you. Of course, the ANSI C preprocessor is so limited that I decided to write a small "tool" to do it for me. Of course, the tool has been my major focus for several years now.

My first attempt at writing such a tool (the first in a long line of AMC implementations) was more like YACC. It compiled over a module, scanning for things that it could pick out from the source code and performed simple substitutions. It then ran the C compiler on that module for you. All of the C code that this version generated in the output was hard-coded in the AMC executable.

This version of the tool was written for OS/2. At the time I had a 486. I downgraded the 486 to a 386 running Linux and ported the tool over to POSIX.

With the temptation to tinker mounting; I decided to have a whirl at letting AMC generate the message numbers automatically. This proved to be a little more complicated and I felt that to ensure that no two messages had the same number AMC had to be in charge of the compilation process as well. I have never been happy with make or any of the "integrated development environments" that most people use to build software. I have worked on many large-scale (million line plus) software packages, and building them always seemed so difficult because of the tools we used.

Several months later (and into my sophomore semister at college), the first version of AMC (that resembles this version) was born.

Because (at the time) the company I was working for was using HP-UX boxes, I got in contact with their local dealer and picked myself up a HP 710/50 workstation for a reasonable price. This machine has been the development machine for all subsequent AMC versions up until v3.4. After v3.4 portability became very important to me, so I purchased as many machines with different processor types as I could.

At this point I intend to use AMC to write an operating system/compiler package. The operating system would be based on the object-oriented architecture described above. An object-oriented operating system is quite a bit different from a traditional one. For example, the file system is replaced with an OO "database engine." And memory management starts to extend to the point where objects that are in memory, on disk, out on a network, in a piece of hardware are all accessed in a uniform way.

The language would also be based on the object-oriented architecture. There would be very little syntax (similar to AMC in concept). Each syntactic element would be an object (that receives a Parse method). Simply name the construct, the OO "database engine" in the operating system retrieves that from a catalog somewhere and requests that that object parse its syntax from the input stream and generate the appropriate code in the output stream.

Such a system would be much more tightly integrated than current programming systems. In the example given above, the symbol table for the language is actually a "directory" or database index in the operating system. Of course, file system directories and symbol tables are actually very similar. In fact, the IFP language uses a filesystem as its symbol table (with the limitation that traditional filesystems have way too much overhead for that kind of data).


As I find free time to code more of this stuff, I will make it freely available just as with AMC.