Last week, I wrote three blogs about the situation with starting child processes on Unix and being notified of their exit. I raised several problems with the current implementation, which I have tried to solve and for which I now have a proposal. If you haven’t yet, you should take some time to read the previous three blogs:
- Part 1: Launching processes on Unix;
- Part 2: Finding out that a child process exited on Unix (http://www.macieira.org/blog/2012/07/forkfd-part-2-finding-out-that-a-child-process-exited-on-unix/);
- Part 3: QProcess’s requirements and current solution;
The road so far
I explained in the first blog how one launches processes on Unix, by way of the fork and execve system calls, and the problem of file descriptors being inherited by child processes instead of being closed. I also showed how Linux has solved this problem. Since no other Unix system has yet done the same, I am leaving them out of the discussion from here on. They need to be brought into the 2010s first and leave the 1970s behind.
In the second blog, I went over the contortions required to be notified that a child process has exited, which uses the SIGCHLD signal. Designed in the early Unix times, signal handlers have conceptual problems with two modern requirements: libraries and multi-threading. And in the third blog, I explained the requirements that QProcess presents and how Qt has tried so far to solve those problems.
Unfortunately, there are two issues that can’t be solved. One is the race condition involved in the installation of the SIGCHLD signal handler, and the other is its uninstallation when the library is being unloaded. With the current API, unless I missed something, it’s not possible to do this cleanly. That leads me to the conclusion that signal handlers should really have been left in the 1970s.
The solution I propose
The solution I’d like to see implemented requires another change to the Linux kernel. Attentive readers may have guessed what I want by the title of the blog: I want a new system call named forkfd. Similar to Linux additions such as signalfd, timerfd_create and eventfd, this would be a function that opens a new file descriptor.
Its man page would be something like the following:
- NAME: forkfd - create a child process and a file descriptor for being notified of its exit
- SYNOPSIS: int forkfd(int flags, pid_t *pid);
- DESCRIPTION: forkfd() creates a file descriptor that can be used to be notified of when a child process exits. This file descriptor can be monitored using select(2), poll(2) or similar mechanisms.
The flags parameter can contain the following values ORed to change the behaviour of forkfd():
- Set the O_NONBLOCK file status flag on the new open file descriptor. Using this flag saves extra calls to fcntl(2) to achieve the same result.
- Set the close-on-exec (FD_CLOEXEC) flag on the new file descriptor. This flag applies to the parent process side of the fork and new processes created after that. The child process created by forkfd() does not have this file descriptor open.
The file descriptor returned by forkfd() supports the following operations:
- read(2)
- When the child process exits, the buffer supplied to read(2) is used to return information about the status of the child in the form of one siginfo_t structure. The buffer must be at least sizeof(siginfo_t) bytes. The return value of read(2) is the total number of bytes read.
- poll(2), select(2) (and similar)
- The file descriptor is readable (the select(2) readfds argument; the poll(2) POLLIN flag) if the child has exited or signalled via SIGCHLD.
- close(2)
- When the file descriptor is no longer required it should be closed.
RETURN VALUE: On success, in the parent process forkfd() returns a new forkfd file descriptor and stores the PID of the child process in *pid; in the child process, it returns FFD_CHILD_PROCESS and sets *pid to zero. On error, -1 is returned and errno is set to indicate the error, with no process being created.
This solution has the following benefits:
- No signal handler installation or uninstallation is necessary, which avoids both outstanding unfixable issues;
- If no signal handler is needed, there is no need to start a thread for managing the child process status;
- Notification is sent via a read notification on a file descriptor, which all event-driven applications know how to handle, plus it matches the requirements that QProcess has in its own waitFor functions;
- The child process is automatically reaped by the read() call, which avoids the need to call wait or waitpid.
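To make the read-notification point concrete, here is a minimal sketch of the consumer side only. forkfd() does not exist, so nothing here calls it: the function name wait_for_exit_record is hypothetical, and it works on any descriptor that delivers one siginfo_t per read, which is what the man page above proposes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: waits up to timeout_ms for fd to become readable,
 * then reads one siginfo_t from it. Returns 1 if a record was read, 0 on
 * timeout and -1 on error (errno is set). */
static int wait_for_exit_record(int fd, int timeout_ms, siginfo_t *info)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ready;
    ssize_t n;

    assert(info != NULL);
    do {
        /* Sketch simplification: the full timeout restarts after EINTR. */
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready == -1 && errno == EINTR);
    if (ready <= 0)
        return ready;               /* 0: timeout, -1: poll() failed */

    do {
        n = read(fd, info, sizeof *info);
    } while (n == -1 && errno == EINTR);
    if (n != (ssize_t)sizeof *info) {
        if (n >= 0)
            errno = EIO;            /* short read: not a whole record */
        return -1;
    }
    return 1;
}
```

An event loop would not call a helper like this at all; it would simply add the descriptor to the set it already polls, which is the whole appeal of the design.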
Implementing it in userland
I tried to implement the above function in userland, in pure C using only POSIX calls. The idea was that this code could be used in many different libraries to solve their process-management problems. I came up with three implementations:
The first attempt was a direct rewrite of the QProcess solution in C, using only POSIX calls. The code has a global pthread_mutex_t that protects a doubly-linked list of currently-running processes. It installs the SIGCHLD handler under a mutex lock, creates a pipe, and forks. In the child side of the fork, it closes the pipe and returns the magic constants. In the parent side, it adds the writing end of the pipe and the PID to the list, and returns the reading end.
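The fork side of that description can be sketched roughly as follows. This is not the code linked from this post: the names forkfd_pthread_sketch, child_entry and SKETCH_CHILD_PROCESS are made up for illustration (the last one stands in for FFD_CHILD_PROCESS, whose value is not given here), and the signal handler installation is reduced to a comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_CHILD_PROCESS (-2)   /* stand-in for FFD_CHILD_PROCESS */

struct child_entry {
    pid_t pid;
    int notify_fd;                  /* writing end of the pipe */
    struct child_entry *prev, *next;
};

static pthread_mutex_t children_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct child_entry *children;    /* head of the doubly-linked list */

static int forkfd_pthread_sketch(pid_t *ppid)
{
    int fds[2];
    struct child_entry *entry;
    pid_t pid;

    assert(ppid != NULL);
    entry = malloc(sizeof *entry);
    if (entry == NULL)
        return -1;
    if (pipe(fds) == -1) {
        free(entry);
        return -1;
    }

    pthread_mutex_lock(&children_mutex);
    /* The SIGCHLD handler would be installed here, under the same lock. */
    pid = fork();
    if (pid == 0) {
        /* Child: the copied mutex stays locked and entry is leaked, which is
         * harmless for a process that is about to call execve. */
        close(fds[0]);
        close(fds[1]);
        *ppid = 0;
        return SKETCH_CHILD_PROCESS;
    }
    if (pid == -1) {
        int saved_errno = errno;
        pthread_mutex_unlock(&children_mutex);
        close(fds[0]);
        close(fds[1]);
        free(entry);
        errno = saved_errno;
        return -1;
    }

    /* Parent: remember the PID and the writing end, hand out the reading end. */
    entry->pid = pid;
    entry->notify_fd = fds[1];
    entry->prev = NULL;
    entry->next = children;
    if (children != NULL)
        children->prev = entry;
    children = entry;
    pthread_mutex_unlock(&children_mutex);

    *ppid = pid;
    return fds[0];
}
```

Because the list is only touched with the mutex held, a process manager thread taking the same mutex cannot see a child before its entry has been added.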
Since I wrote this code before I realised the fatal flaw with SA_SIGINFO, this code is still using it. The signal handler writes the siginfo_t structure it received to the process manager thread, by way of a private pipe. The process manager, in turn, reads the PID from the structure and writes the structure again to the user, via the writing end of the pipe that was saved in the forkfd() call.
This code is fixable, by making the signal handler write one byte to the pipe (or use eventfd) and have the process manager thread loop over the currently-known child processes, calling waitpid on each and synthesising siginfo_t for the user.
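A minimal sketch of that fix, assuming a private pipe rather than eventfd and a plain array of PIDs instead of the linked list. All the names (wake_pipe, install_wakeup_handler, wait_for_wakeup, reap_child, drain_and_reap) are hypothetical, and the synthesised siginfo_t only fills in the fields a caller is likely to look at.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static int wake_pipe[2] = { -1, -1 };   /* hypothetical private pipe */

/* Signal handler: only write(2) is called, and errno is preserved. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;
    char byte = 0;
    ssize_t ignored = write(wake_pipe[1], &byte, 1);
    (void)ignored;      /* a full pipe already means a wake-up is pending */
    (void)signo;
    errno = saved_errno;
}

static int install_wakeup_handler(void)
{
    struct sigaction sa;
    if (pipe(wake_pipe) == -1)
        return -1;
    for (int i = 0; i < 2; ++i) {
        /* Not atomic with pipe(): this is the inheritance race from part 1. */
        if (fcntl(wake_pipe[i], F_SETFD, FD_CLOEXEC) == -1 ||
            fcntl(wake_pipe[i], F_SETFL, O_NONBLOCK) == -1)
            return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;         /* plain handler, no SA_SIGINFO */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Process manager side: 1 = woken up, 0 = timeout, -1 = error. */
static int wait_for_wakeup(int timeout_ms)
{
    struct pollfd pfd = { .fd = wake_pipe[0], .events = POLLIN, .revents = 0 };
    int ready;
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready == -1 && errno == EINTR);
    return ready;
}

/* Non-blocking waitpid on one child; on success, builds a siginfo_t by hand. */
static int reap_child(pid_t pid, siginfo_t *info)
{
    int status;
    pid_t r;
    assert(info != NULL);
    do {
        r = waitpid(pid, &status, WNOHANG);
    } while (r == -1 && errno == EINTR);
    if (r <= 0)
        return (int)r;                  /* 0: still running, -1: error */

    memset(info, 0, sizeof *info);
    info->si_signo = SIGCHLD;
    info->si_pid = pid;
    if (WIFEXITED(status)) {
        info->si_code = CLD_EXITED;
        info->si_status = WEXITSTATUS(status);
    } else if (WIFSIGNALED(status)) {
        info->si_code = CLD_KILLED;     /* core-dump detection omitted */
        info->si_status = WTERMSIG(status);
    }
    return 1;
}

/* Drains the wake-up bytes, then checks every known child. Returns how many
 * children were reaped; their records are stored in out[0..n-1]. */
static int drain_and_reap(pid_t *pids, int count, siginfo_t *out)
{
    char buf[64];
    int reaped = 0;
    while (read(wake_pipe[0], buf, sizeof buf) > 0)
        ;                               /* how many bytes doesn't matter */
    for (int i = 0; i < count; ++i) {
        if (pids[i] > 0 && reap_child(pids[i], &out[reaped]) == 1) {
            pids[i] = 0;                /* slot is free again */
            ++reaped;
        }
    }
    return reaped;
}
```

The handler no longer needs to know which child exited, so several SIGCHLD signals collapsing into one costs nothing: the loop asks about every known child each time it wakes up. In the real thing, each record would then be written to the pipe saved for that child.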
Source code: same header, source code
The problem with the first implementation, besides relying on SA_SIGINFO, is that it requires pthreads and mutexes. I wanted something lock-free, so I started writing that. I wrote a lock-free structure to replace the doubly-linked list of PID and writing pipe pairs, based on previous experience with Qt’s lock-free timer ID allocator. I haven’t done exhaustive testing on it, but it’s simple enough that it’s probably correct (pending further reviews).
This solution is, unfortunately, still based on SA_SIGINFO (why? because I hadn’t realised it was a problem by then; I only did so when writing the blog). The way it works is that the signal handler will read from this structure and figure out, based on the PID from siginfo_t, what the file descriptor to write is. The signal handler also does the necessary waitpid call to reap the child process.
Unfortunately, this solution doesn’t work, since it introduces several race conditions. One is an extremely rare situation, in which the library with this code is being unloaded in one thread while the signal handler is still running in another. More importantly, though, there’s a race condition between the time of the fork and the addition of the PID to the list of children. This condition didn’t happen before because of the mutex: the process manager thread would not read from the list until the forkfd function released the lock, after adding the child process.
This implementation is still salvageable, though. First, it needs to stop relying on SA_SIGINFO, which means it must iterate over all the known children inside the signal handler, doing waitpid calls on each. Second, with the absence of a lock, it must prevent the child process from exiting before its PID and pipe are added to the list. That can be done by adding an extra, blocking pipe between the parent and child process: the child process tries to read() from it, suspending itself, until the parent process releases it by writing something.
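The blocking pipe in the second point could look something like the sketch below. The names fork_held and release_child are hypothetical, and this only shows the hand-off between parent and child, not the lock-free list itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), for the caller's side */
#include <unistd.h>

/* Forks with the child parked on a blocking read(). In the parent, returns
 * the child's PID and stores the writing end of the pipe in *release_fd. In
 * the child, returns 0 once the parent has released it. Returns -1 on error. */
static pid_t fork_held(int *release_fd)
{
    int hold[2];
    pid_t pid;

    assert(release_fd != NULL);
    if (pipe(hold) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        int saved_errno = errno;
        close(hold[0]);
        close(hold[1]);
        errno = saved_errno;
        return -1;
    }
    if (pid == 0) {
        char byte;
        ssize_t n;
        close(hold[1]);     /* or read() would never see end-of-file */
        do {
            n = read(hold[0], &byte, 1);
        } while (n == -1 && errno == EINTR);
        close(hold[0]);
        return 0;           /* released by a write, or by EOF if the parent died */
    }

    close(hold[0]);
    *release_fd = hold[1];
    return pid;
}

/* Parent side: call this only after the PID and pipe are in the list. */
static int release_child(int release_fd)
{
    char byte = 0;
    ssize_t n;
    int saved_errno;

    do {
        n = write(release_fd, &byte, 1);
    } while (n == -1 && errno == EINTR);
    saved_errno = errno;
    close(release_fd);
    errno = saved_errno;
    return n == 1 ? 0 : -1;
}
```

The child cannot exit on its own before release_child() runs, so the signal handler can never see a PID that isn't in the list yet. It can still be killed from outside while parked, which this sketch does not guard against.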
Adding a spin lock
Source code: same header, source code
The solution I ended up writing to the race conditions of the previous implementation was to add a spin lock (why this and not the pipe lock I described above? Because it hadn’t occurred to me until just now). It’s a step back from the fully lock-free solution, but not all the way back to the pthreads implementation. For one thing, it doesn’t start a thread for the process management. For another, since it implements the spinlock on its own, it can lock inside the signal handler (note that pthread_mutex_lock is not a permitted function inside one).
I just had to be careful about one thing: before locking the spin lock, the calling thread must block SIGCHLD using pthread_sigmask. If it didn’t do that, the signal handler could be called asynchronously in the same thread as the one where the spin lock is locked, producing a deadlock.
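As an illustration of that ordering, here is a minimal sketch using a C11 atomic_flag as the spin lock. It is not the code linked above: the names children_lock, lock_children and unlock_children are hypothetical, and the sketch assumes the handler was installed without SA_NODEFER, so that SIGCHLD is already blocked while the handler runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>

static atomic_flag children_lock = ATOMIC_FLAG_INIT;   /* hypothetical name */

/* Raw lock operations. atomic_flag is always lock-free, so these are usable
 * inside the SIGCHLD handler, which calls them directly. */
static void spin_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&children_lock,
                                             memory_order_acquire))
        ;   /* busy-wait: another thread holds the lock */
}

static void spin_release(void)
{
    atomic_flag_clear_explicit(&children_lock, memory_order_release);
}

/* Lock for normal (non-handler) code: SIGCHLD is blocked in this thread
 * before the lock is taken, so the handler cannot interrupt the holder and
 * spin forever on a lock that its own thread owns. */
static int lock_children(sigset_t *old_mask)
{
    sigset_t block;
    int err;

    assert(old_mask != NULL);
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    err = pthread_sigmask(SIG_BLOCK, &block, old_mask);
    if (err != 0)
        return err;         /* pthread_sigmask returns an error number */
    spin_acquire();
    return 0;
}

/* Releases the lock first, then restores the caller's signal mask. */
static int unlock_children(const sigset_t *old_mask)
{
    assert(old_mask != NULL);
    spin_release();
    return pthread_sigmask(SIG_SETMASK, old_mask, NULL);
}
```

The order matters in both directions: block before acquiring, release before unblocking. If SIGCHLD was pending, it is delivered as soon as the mask is restored, at which point the handler finds the lock free.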
Choosing a solution
To be honest, none of the three solutions is ideal. If I had to choose one of them, I’d go for the lock-free one for personal reasons, but the spin-lock one might have fewer bugs in the threading code.
But that’s not what I want. What I really want is for forkfd to be implemented in the kernel, so that no signal handler is involved, eliminating the unsolvable problems that signal handlers introduce.
If there are any kernel hackers listening in, do you think there’s a chance?