Handling sub-process hierarchies in Python on Linux, OS X and Windows

TL;DR—Windows doesn’t support Posix signals. I’ll show you how you can work around this to cleanly terminate process hierarchies. Take a look at the final example.

In this article, I’m trying to solve a problem that looks simple at first glance:

You have a Python process that starts one or more sub-processes, which again might start their own sub-processes. If you press Ctrl-C on the command line, your main process and all of its sub-processes should stop, maybe performing some clean-up before they terminate. Also, one process may decide to stop one of its children (including all of its sub-processes).

Example: You have a server process that you start via the command line and that starts several sub-processes. You usually stop that server and all sub-processes by hitting Ctrl-C.

You also want to start your server from within a test case to test its replies to certain requests. When the test is done, you have to send some kind of message or signal to the server to stop it and all of its sub-processes. The message should obviously be nothing that normal users can send to the server.

This should not only work on Linux and OS X (which would be trivial), but also on Windows.

Before I show you some code, I’ll briefly describe what happens when you press Ctrl-C in a shell and which types of inter-process communication exist to tell another process to stop. After that, I’ll introduce a simple example which I’m then going to extend until the requirements stated above are met.

I’m using Python 3.3 for all examples. The code is tested on Ubuntu 13.04, OS X 10.8 and Windows 7.

Signals and inter-process communication

Signals are a simple form of inter-process communication (IPC) on Posix-compliant operating systems (like Linux or OS X). If one process sends a signal to another one, the OS interrupts the receiving process to deliver the signal. The receiving process may then handle the signal.

Some interesting signals for this article are (quoted from Wikipedia):

SIGINT
“The SIGINT signal is sent to a process by its controlling terminal when a user wishes to interrupt the process. This is typically initiated by pressing Ctrl-C, but on some systems, the ‘delete’ character or ‘break’ key can be used.”
SIGTERM
“The SIGTERM signal is sent to a process to request its termination. Unlike the SIGKILL signal, it can be caught and interpreted or ignored by the process. This allows the process to perform nice termination releasing resources and saving state if appropriate. It should be noted that SIGINT is nearly identical to SIGTERM.”
SIGKILL
“The SIGKILL signal is sent to a process to cause it to terminate immediately. In contrast to SIGTERM and SIGINT, this signal cannot be caught or ignored, and the receiving process cannot perform any clean-up upon receiving this signal.”

Though SIGINT and SIGTERM are quite similar, the difference between them is, from my point of view, that SIGINT is something directly initiated by the user (you press Ctrl-C in your terminal), while SIGTERM is rather used programmatically (e.g., processes receive a SIGTERM when the OS is shutting down).

You can also group processes into process groups and send a signal to all processes in that group at once. That’s for example what your terminal does when you press Ctrl-C.
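
As a rough sketch of what signalling a whole group looks like on Linux or OS X (this does not work on Windows): the child below is started in its own session via start_new_session, a Popen parameter available since Python 3.2, which also makes it the leader of a new process group. The helper name stop_group and the sleeping child are made up for illustration.

```python
import os
import signal
import subprocess
import sys


def stop_group(proc):
    """Send SIGTERM to every process in *proc*'s process group (Posix only)."""
    # Assumption: *proc* was started with start_new_session=True, so it
    # leads its own process group.
    os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    return proc.wait()


def demo():
    # The child only sleeps; it stands in for a real worker process.
    child = subprocess.Popen(
        [sys.executable, '-c', 'import time; time.sleep(30)'],
        start_new_session=True,
    )
    return stop_group(child)
```

Anything the child spawns stays in that group unless it creates a new one itself, so a single os.killpg() call reaches all of them.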

If a Python program receives a SIGINT, the default handler raises a KeyboardInterrupt. You can catch that exception and handle the interrupt in any way you want (e.g., terminate immediately, do some clean-up first or just ignore it). Martin Cracauer discusses how a SIGINT should be handled properly.
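
A minimal sketch of that pattern, assuming Linux or OS X. To keep it self-contained, the process sends the SIGINT to itself with os.kill(), which stands in for you pressing Ctrl-C. The function name and the explicit re-installation of Python’s default handler are my additions for the demo.

```python
import os
import signal
import time


def interrupted_sleep():
    """Return 'interrupted' if a SIGINT arrives, 'finished' otherwise."""
    # Assumption for the demo: make sure Python's default SIGINT handler,
    # the one that raises KeyboardInterrupt, is installed.
    signal.signal(signal.SIGINT, signal.default_int_handler)
    try:
        # Stand-in for Ctrl-C: send SIGINT to ourselves.
        os.kill(os.getpid(), signal.SIGINT)
        time.sleep(5)
        return 'finished'
    except KeyboardInterrupt:
        # Clean-up code would go here.
        return 'interrupted'
```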

To handle a SIGTERM or suppress the KeyboardInterrupt exception in Python, you can register your own handler via signal.signal().

Since Windows is not Posix-compliant (what a shame), most of the signals won’t work there, but we’ll come to this later. Tim Golden also wrote about this. Unix process groups also don’t work on Windows.

Other, more complicated (and more powerful) ways for IPC are, for example, files, pipes and sockets. Using these, you can just send a message like “plzdiekthxbye” to another process to tell it to stop.
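
To make this a bit more concrete, here is a small sketch that uses a pipe to the child’s stdin as the message channel. The child program, the names STOP_MESSAGE, CHILD_CODE and run_and_stop(), and the line-based protocol are all assumptions for illustration, not something the examples later in this article rely on.

```python
import subprocess
import sys

STOP_MESSAGE = 'plzdiekthxbye'

# Hypothetical child: reads lines from stdin and exits cleanly as soon as
# it sees the stop message that was passed as its first argument.
CHILD_CODE = '''
import sys
for line in sys.stdin:
    if line.strip() == sys.argv[1]:
        print('child cleaning up')
        break
'''


def run_and_stop():
    """Start the child, send it the stop message and collect the result."""
    child = subprocess.Popen(
        [sys.executable, '-c', CHILD_CODE, STOP_MESSAGE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # communicate() writes the message, closes the pipe and waits.
    out, _ = child.communicate(STOP_MESSAGE + '\n')
    return child.returncode, out.strip()
```

This works the same way on Linux, OS X and Windows, because no signal is involved.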

Stopping processes via Ctrl-C

For all examples, we’ll use a simple process hierarchy: Process A starts process B and process B starts process C. Every process will sleep for ten seconds and wait for a KeyboardInterrupt:

import subprocess
import sys
import time


PYTHON = sys.executable
SCRIPT = __file__


def main(name):
    print('%s started' % name)

    # A and B spawn a subprocess
    if name == 'A':
        child = subproc('B')
    elif name == 'B':
        child = subproc('C')
    else:
        child = None

    # Sleep and wait for a Ctrl-C
    try:
        time.sleep(10)
        print('%s done' % name)
    except KeyboardInterrupt:
        print('%s got KeyboardInterrupt' % name)
    finally:
        if child:
            child.wait()


def subproc(name):
    """Create and return a new subprocess named *name*."""
    proc = subprocess.Popen([PYTHON, SCRIPT, name])
    return proc


if __name__ == '__main__':
    name = sys.argv[1] if len(sys.argv) > 1 else 'A'
    main(name)

Running this script and pressing Ctrl-C after about one or two seconds gives us the following output:

$ python subprocs_1.py
A started
B started
C started
^CA got KeyboardInterrupt
B got KeyboardInterrupt
C got KeyboardInterrupt

So far so good. It will get more complicated soon.

Stopping a sub-process and all of its children

To request the termination of a subprocess, we’ll send a SIGTERM to it (on Linux and OS X, this signal is sent if you call a Popen object’s terminate() method). For the process being terminated, this means the same as a SIGINT—it is allowed to do some clean-up or even to ignore the signal.
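
A small sketch to check this on Linux or OS X: on Posix systems, Popen reports a child that was killed by signal N with the return code -N. The function name and the sleeping child are just placeholders.

```python
import signal
import subprocess
import sys


def terminate_child():
    """Start a sleeping child, terminate it and return its exit status."""
    child = subprocess.Popen(
        [sys.executable, '-c', 'import time; time.sleep(30)'])
    # Sends SIGTERM on Posix systems. On Windows, terminate() calls
    # TerminateProcess() instead, so the check below does not apply there.
    child.terminate()
    return child.wait()
```

The child here installs no handler, so the SIGTERM simply ends it and terminate_child() returns -signal.SIGTERM.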

To catch the signal, you have to register a custom handler, though. To make things easier, you should also register the same handler for the SIGINT signal, so that you don’t end up having the same code in the handler function and in an except KeyboardInterrupt block:

import functools
import signal
import subprocess
import sys
import time
import traceback


PYTHON = sys.executable
SCRIPT = __file__
SIGNALS = {
    signal.SIGINT: 'SIGINT',
    signal.SIGTERM: 'SIGTERM',
}


def main(name, terminate):
    """If *terminate* is ``True`` (should only be the case if *name* is ``A``),
    A will try to terminate B.

    B and C will always just sleep and wait for things to happen ...

    """
    print('%s started' % name)

    # A and B spawn a subprocess
    if name == 'A':
        child = subproc('B')
    elif name == 'B':
        child = subproc('C')
    else:
        child = None

    # Curry our cleanup func and register it as handler for SIGINT and SIGTERM
    handler = functools.partial(cleanup, name, child)
    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)

    if terminate:
        # A tries to terminate B
        time.sleep(1)
        term(child)
        print('%s ended' % name)
    else:
        time.sleep(10)
        print('%s done' % name)
        if child:
            child.wait()


def subproc(name):
    """Create and return a new subprocess named *name*."""
    proc = subprocess.Popen([PYTHON, SCRIPT, name])
    return proc


def term(proc):
    """Send a SIGTERM to *proc* and wait for it to terminate."""
    proc.terminate()  # Sends SIGTERM
    proc.wait()


def cleanup(name, child, signum, frame):
    """Stop the sub-process *child* if *signum* is SIGTERM. Then terminate."""
    try:
        print('%s got a %s' % (name, SIGNALS[signum]))
        if child and signum != signal.SIGINT:
            term(child)
    except:
        traceback.print_exc()
    finally:
        sys.exit()


if __name__ == '__main__':
    terminate = False
    if len(sys.argv) == 1:
        name = 'A'
    elif sys.argv[1] == 'term':
        terminate = True
        name = 'A'
    else:
        name = sys.argv[1]  # B or C

    main(name, terminate)

Our processes now handle SIGINT and SIGTERM signals via the cleanup() handler. The curried version of this handler will have the name of the process and its child process set by default, leaving its signature as handler(signum, frame), which is exactly what signal.signal() expects. Note that we don’t need to terminate our sub-processes when we get a SIGINT, since the sub-processes will also receive one.
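
If the currying part is unclear, this stripped-down sketch shows just that step in isolation. The cleanup() here is a stand-in that only records its arguments, and the handler is called by hand instead of being registered with signal.signal().

```python
import functools
import signal

calls = []


def cleanup(name, child, signum, frame):
    """Stand-in for the real handler; it only records its arguments."""
    calls.append((name, child, signum))


# Bind the first two positional arguments now ...
handler = functools.partial(cleanup, 'B', None)

# ... so the result accepts exactly the (signum, frame) pair that
# signal.signal() passes to a handler. Here we call it by hand.
handler(signal.SIGTERM, None)
```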

We can now also pass a terminate argument to our script. A will then try to terminate B after a second. B will then stop C.

If you run this script without arguments, you will get (nearly) the same output as in the last example. If you pass term, you’ll get the following output from a Linux or OS X shell:

$ python subprocs_2.py term
A started
B started
C started
B got a SIGTERM
C got a SIGTERM
A ended

If you run the same thing on Windows, you’ll get:

>python subprocs_2.py term
A started
B started
C started
A ended

>C done

Our code now magically stopped working on Windows :-). B seems to receive something from A, but immediately terminates (no output is printed). It also does not forward the signal to C, so C just waits ten seconds and prints its message before exiting on its own.

Clearly not the desired behavior, so this is where the fun begins …

Fixing it on Windows

If you search for this problem on the Interwebs, you’ll find a lot of different answers—some more helpful, some less. This Reddit post helped me the most. According to the MSDN, you can use GenerateConsoleCtrlEvent to send two types of signals to a process: CTRL_C_EVENT and CTRL_BREAK_EVENT. The former translates to a SIGINT, the latter to a SIGBREAK. Instead of GenerateConsoleCtrlEvent, which is only available via the win32 API, you can fortunately also use os.kill or Popen.send_signal() to send the signal.

A CTRL_C_EVENT cannot be directed to a single process and is always received by all processes that share the current console. CTRL_BREAK_EVENT, on the other hand, can be sent to specific processes. However, in order to use it, we have to pass creationflags=subprocess.CREATE_NEW_PROCESS_GROUP to the Popen constructor. This parameter is only available on Windows and has nothing to do with Unix process groups. When you set this flag, you can no longer send CTRL_C_EVENT, and if you press Ctrl-C in your console, only the root process will receive a SIGINT signal. Also note that sending any signal other than CTRL_C_EVENT and CTRL_BREAK_EVENT will unconditionally kill the process.

We’ll end up with the following preconditions:

If on Linux or OS X:

  • Your terminal sends a SIGINT to all processes if you press Ctrl-C. There’s no need to forward it to sub-processes.
  • If a process receives a SIGTERM, it should forward it to its sub-processes.

If on Windows:

  • You should start sub-processes with creationflags=CREATE_NEW_PROCESS_GROUP.
  • This enables you to send a CTRL_BREAK_EVENT to a specific process. The process receives a SIGBREAK and should forward it to its sub-processes.
  • If you press Ctrl-C in your console, only the root process receives a SIGINT and should send a CTRL_BREAK_EVENT to its sub-processes.

Incorporating what we know now, our example looks like this:

import functools
import signal
import subprocess
import sys
import time
import traceback


PYTHON = sys.executable
SCRIPT = __file__
ON_WINDOWS = (sys.platform == 'win32')
SIGNALS = {
    signal.SIGINT: 'SIGINT',
    signal.SIGTERM: 'SIGTERM',
}
if ON_WINDOWS:
    SIGNALS[signal.SIGBREAK] = 'SIGBREAK'


def main(name, terminate):
    """If *terminate* is ``True`` (should only be the case if *name* is ``A``),
    A will try to terminate B.

    B and C will always just sleep and wait for things to happen ...

    """
    print('%s started' % name)

    # A and B spawn a subprocess
    if name == 'A':
        child = subproc('B')
    elif name == 'B':
        child = subproc('C')
    else:
        child = None

    # Curry our cleanup func and register it as handler for SIGINT and
    # for SIGBREAK (Windows) or SIGTERM (everywhere else)
    handler = functools.partial(cleanup, name, child)
    signal.signal(signal.SIGINT, handler)
    if ON_WINDOWS:
        signal.signal(signal.SIGBREAK, handler)
    else:
        signal.signal(signal.SIGTERM, handler)

    if terminate:
        # A tries to terminate B
        time.sleep(1)
        term(child)
        print('%s ended' % name)
    else:
        # SIGBREAK cannot interrupt sleep(), so we sleep 10 * 1s instead
        for i in range(10):
            time.sleep(1)
        print('%s done' % name)
        if child:
            child.wait()


def subproc(name):
    """Create and return a new subprocess named *name*."""
    kwargs = {}
    if ON_WINDOWS:
        kwargs['creationflags'] = subprocess.CREATE_NEW_PROCESS_GROUP
    proc = subprocess.Popen([PYTHON, SCRIPT, name], **kwargs)
    return proc


def term(proc):
    """Send a SIGTERM/SIGBREAK to *proc* and wait for it to terminate."""
    if ON_WINDOWS:
        proc.send_signal(signal.CTRL_BREAK_EVENT)
    else:
        proc.terminate()
    proc.wait()


def cleanup(name, child, signum, frame):
    """Stop the sub-process *child* if *signum* is SIGTERM. Then terminate."""
    try:
        print('%s got a %s' % (name, SIGNALS[signum]))
        if child and (ON_WINDOWS or signum != signal.SIGINT):
            # Forward SIGTERM on Linux or any signal on Windows
            term(child)
    except:
        traceback.print_exc()
    finally:
        sys.exit()


if __name__ == '__main__':
    terminate = False
    if len(sys.argv) == 1:
        name = 'A'
    elif sys.argv[1] == 'term':
        terminate = True
        name = 'A'
    else:
        name = sys.argv[1]  # B or C

    main(name, terminate)

We can register a SIGINT handler in any case. If on Windows we also register a SIGBREAK handler. For all remaining cases we register a handler for SIGTERM.

Since SIGBREAK cannot interrupt a time.sleep() call, I changed it from time.sleep(10) to for i in range(10): time.sleep(1) so that the sub-processes terminate faster. I don’t know if this is intended behavior or a bug in Python.
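
If you need this in more than one place, the loop can be wrapped in a small helper. This is only a sketch: the name sleep_in_slices and the injectable sleep parameter (handy for testing) are my own invention, not part of the standard library.

```python
import time


def sleep_in_slices(total, step=1.0, sleep=time.sleep):
    """Sleep for *total* seconds in slices of at most *step* seconds.

    Between two slices, Python gets a chance to run pending signal
    handlers, so a handler has to wait for at most one slice.
    """
    slept = 0.0
    while slept < total:
        chunk = min(step, total - slept)
        sleep(chunk)
        slept += chunk
```

A call to sleep_in_slices(10) then does the same as the loop in the example above.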

When terminating a process, you have to send a CTRL_BREAK_EVENT on Windows. Otherwise, you can just use Popen.terminate(), which will send a SIGTERM to the process.

In our clean-up function, we only forward the signal if it’s not a SIGINT or if we are on Windows. And if there is a child process, of course.

The output on Linux stays the same, but on Windows, we now get:

>python subprocs_3.py term
A started
B started
C started
B got a SIGBREAK
C got a SIGBREAK
A ended

It finally works on Linux, OS X and Windows! \o/

Conclusion

If you don’t have to (or don’t want to) support Windows, handling sub-processes and sending signals to them is not that hard. But I hope I could demonstrate that, despite all the problems, it’s possible to get the job done on Windows, too, without too much overhead.

An alternative to signals, if you are using some other kind of IPC anyway, might be to use it to send termination requests to your processes instead of sending a signal. There would be no need for the CREATE_NEW_PROCESS_GROUP flag then, and Ctrl-C would work nicely on all platforms. You have to take care, though, that clients are not able to send a termination message to your server, but that depends on your use case.

In the end, I wish Windows was Posix-compliant. Would make life so much easier …