While getting one of my Linux 2.0 file servers to serve files via AppleTalk, I discovered a bug in the Linux AppleTalk kernel code: restarting the user-mode daemons (atalkd and afpd) required rebooting the machine.
Because I did not want to reboot the machine every time I changed the configuration of the daemons, I developed a fix in the Linux kernel. For educational purposes, I describe that fix here.
I do not run modular kernels because of the security risks involved, so there was no simple fix that could be applied from outside the kernel.
When debugging something like the kernel, it is usually helpful
to litter things with printk() calls to get an idea of
the flow of control.
I traced the error back in the user mode code to a bind
system call returning an EADDRINUSE error. So my first
step in isolating the problem was to put a distinctive
printk() in each place where the kernel code for
AppleTalk could return such an error.
After performing this experiment, I realized that the following code was where the error was being returned:
if (atalk_find_socket(addr)!=NULL) return -EADDRINUSE;
Apparently, the daemon was trying to bind a socket to an address that another socket in the system already held.
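To make the failing check concrete, here is a minimal user-space sketch of a duplicate-address test over a linked list of sockets. It is not the kernel's code: the struct layouts and the names sock_model, find_socket and bind_socket are hypothetical stand-ins for the real socket structure, atalk_find_socket() and the bind path.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an AppleTalk socket address. */
struct at_addr {
    unsigned short net;
    unsigned char node;
    unsigned char port;
};

/* Hypothetical stand-in for a kernel socket on a singly linked list. */
struct sock_model {
    struct at_addr addr;
    struct sock_model *next;
};

static struct sock_model *socket_list = NULL;

/* Linear search for a socket already bound to this exact address. */
static struct sock_model *find_socket(const struct at_addr *a)
{
    struct sock_model *s;

    for (s = socket_list; s != NULL; s = s->next) {
        if (s->addr.net == a->net && s->addr.node == a->node &&
            s->addr.port == a->port)
            return s;
    }
    return NULL;
}

/* Refuse the bind if the address is taken, otherwise link the socket in. */
static int bind_socket(struct sock_model *s, const struct at_addr *a)
{
    if (find_socket(a) != NULL)
        return -EADDRINUSE;
    s->addr = *a;
    s->next = socket_list;
    socket_list = s;
    return 0;
}
```

In this model, a second bind to the same network, node and port fails with -EADDRINUSE, while a bind that differs in any of the three fields succeeds.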
So, my initial guess was that some sockets were being left over
even after the AppleTalk daemons exited. The immediate solution to
this problem was to allow user-mode code to clear out the socket
table. I knew that IPv4 forwarding could be controlled via the
/proc file system, so I wrote some skeletal code to
handle writes to a file
/proc/sys/net/appletalk/ddp_reset.
This required two things:
The body of the handler did the following:
while (atalk_socket_list != NULL) atalk_destroy_socket(atalk_socket_list);
I derived this loop by studying atalk_find_socket(), which showed me that the socket list is implemented as a singly linked list.
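The loop only terminates because destroying a socket also unlinks it from the list, so the head pointer advances on every pass. The following user-space sketch shows that pattern under that assumption; sock_model, insert_socket, destroy_socket and reset_sockets are hypothetical names, and malloc/free stand in for the kernel's own allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a socket on a singly linked list. */
struct sock_model {
    int port;
    struct sock_model *next;
};

static struct sock_model *socket_list = NULL;

/* Push a new socket onto the head of the list. */
static void insert_socket(int port)
{
    struct sock_model *s = malloc(sizeof *s);

    assert(s != NULL);
    s->port = port;
    s->next = socket_list;
    socket_list = s;
}

/* Unlink s from the list, then free it (models atalk_destroy_socket). */
static void destroy_socket(struct sock_model *s)
{
    struct sock_model **pp;

    for (pp = &socket_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == s) {
            *pp = s->next;
            break;
        }
    }
    free(s);
}

/* Destroy the head until the list is empty; returns the number destroyed. */
static int reset_sockets(void)
{
    int n = 0;

    while (socket_list != NULL) {
        destroy_socket(socket_list);
        n++;
    }
    return n;
}
```

If the destroy routine did not unlink the socket, the same head would be destroyed forever, so this pattern depends on that side effect.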
I compiled the kernel, rebooted, and tried to run my reset code.
It did clear out the socket table (which I confirmed was empty
via printk() output), but the AppleTalk daemons still could
not be restarted.
To figure out what was going on I put some
logging in the loop of atalk_find_socket(). This
allowed me to see that during the first (and error-free) startup of
atalkd, the network and node addresses of the socket
being bound were zero.
During a failed attempt at starting atalkd, however,
the node was 21 and the network was 255. This caused a duplicate
socket name to be present because atalkd binds to
socket 6 (the Zone Information Protocol) twice.
The first time, it assumes network 0, node 0; it then sets
the interface to a different network and node and re-binds socket 6
under that address.
The problem is that the interface's address is not cleared back to
zero when atalkd exits, so on a subsequent atalkd startup the first
bind already lands on the configured address and the second bind
collides with it.
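The sequence can be reproduced in a small user-space model. This is a sketch of my reading of the behaviour described above, not the kernel's logic: it assumes the first bind picks up whatever address the interface currently holds, and every name in it (iface_model, bind_port, daemon_start, daemon_exit, iface_reset) is hypothetical.

```c
#include <errno.h>

#define MAX_SOCKS 8

struct at_addr {
    unsigned short net;
    unsigned char node;
    unsigned char port;
};

/* Hypothetical interface state that survives a daemon exit. */
struct iface_model {
    unsigned short net;
    unsigned char node;
};

static struct iface_model iface = {0, 0};
static struct at_addr bound[MAX_SOCKS];
static int nbound = 0;

/* Bind a port at (net, node), rejecting an exact duplicate. */
static int bind_port(unsigned short net, unsigned char node, unsigned char port)
{
    int i;

    for (i = 0; i < nbound; i++) {
        if (bound[i].net == net && bound[i].node == node &&
            bound[i].port == port)
            return -EADDRINUSE;
    }
    if (nbound == MAX_SOCKS)
        return -ENOBUFS;
    bound[nbound].net = net;
    bound[nbound].node = node;
    bound[nbound].port = port;
    nbound++;
    return 0;
}

/* Model of the startup: bind socket 6, configure the interface, bind again. */
static int daemon_start(unsigned short net, unsigned char node)
{
    int err;

    /* Assumption: the first bind uses the interface's current address. */
    err = bind_port(iface.net, iface.node, 6);
    if (err != 0)
        return err;
    iface.net = net;
    iface.node = node;
    return bind_port(iface.net, iface.node, 6);
}

/* Sockets go away on exit, but the interface address is left in place. */
static void daemon_exit(void)
{
    nbound = 0;
}

/* The missing step: put the interface back to network 0, node 0. */
static void iface_reset(void)
{
    iface.net = 0;
    iface.node = 0;
}
```

In this model the first start succeeds, a restart fails with -EADDRINUSE even though the socket table is empty, and a restart after iface_reset() succeeds again.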
After determining the problem, it was a trivial matter to write
some kernel code to clean the interface list when
ddp_reset is written to. The relevant code is as
follows:
struct atalk_iface *iface;
for(iface = atalk_iface_list; iface != NULL; iface = iface->next)
{
iface->status = 0;
iface->address.s_net = 0;
iface->address.s_node = 0;
}
This code must run with interrupts disabled, that is, under the protection of cli().
It simply walks the interface list (which is usually very short) and
resets a few bytes to zero.
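For readers unfamiliar with the idiom, the sketch below wraps the same walk in the save_flags()/cli()/restore_flags() sequence commonly used in 2.0-era kernel code. So that it compiles outside the kernel, the three macros here are user-space stand-ins that merely track a flag, and the struct layouts and the reset_ifaces name are hypothetical.

```c
#include <stddef.h>

/* User-space stand-ins for the kernel macros; they only track a flag. */
static int irqs_disabled_model = 0;
#define save_flags(f)    ((f) = (unsigned long)irqs_disabled_model)
#define cli()            (irqs_disabled_model = 1)
#define restore_flags(f) (irqs_disabled_model = (int)(f))

/* Hypothetical, simplified versions of the kernel structures. */
struct at_addr_model {
    unsigned short s_net;
    unsigned char s_node;
};

struct iface_model {
    int status;
    struct at_addr_model address;
    struct iface_model *next;
};

static struct iface_model *iface_list = NULL;

/* Zero every interface's status and address; returns how many were reset. */
static int reset_ifaces(void)
{
    struct iface_model *iface;
    unsigned long flags;
    int n = 0;

    save_flags(flags);   /* remember the previous interrupt state */
    cli();               /* keep interrupt handlers away from the list */
    for (iface = iface_list; iface != NULL; iface = iface->next) {
        iface->status = 0;
        iface->address.s_net = 0;
        iface->address.s_node = 0;
        n++;
    }
    restore_flags(flags); /* put the interrupt state back as it was */
    return n;
}
```

Restoring the saved flags, rather than unconditionally re-enabling interrupts, keeps the routine safe to call from a context where interrupts were already off.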
Finally, I added a line to the shell script that brings up the AppleTalk servers so that it resets the DDP stack in the kernel before starting the daemons.
This allows them to be restarted at any time without resorting to a modular kernel (which I consider a security risk) or rebooting a critical server.