The other day at work we encountered an unusual exception in our nightly pounder test run after landing some new code that exposes internal state via a monitoring API. The problem occurred on shutdown: the new monitoring code was trying to log some information and was hitting an exception. Our logging code was built on top of Python's logging module, and we suspected that something was shutting down the logging system without our knowledge. We never shut it down explicitly ourselves, since we wanted it to live until the process exited.
The monitoring was done inside a daemon thread. The Python docs say only:
"A thread can be flagged as a 'daemon thread'. The significance of this flag is that the entire Python program exits when only daemon threads are left."
Which sounds pretty good, right? This thread is just occasionally grabbing some data, and we don’t need to do anything special when the program shuts down. Yeah, I remember when I used to believe in things too.
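For reference, flagging a thread as a daemon is a single constructor argument in the threading module. The sketch below is hypothetical (poll_forever and samples are made-up names, and our real thread class isn't shown here): it starts a polling thread the way the quoted docs invite you to, and deliberately never joins it.

```python
import threading
import time

def poll_forever(samples):
    # Hypothetical monitoring loop: grab a data point now and then, forever.
    while True:
        samples.append(time.monotonic())
        time.sleep(0.01)

samples = []
monitor = threading.Thread(target=poll_forever, args=(samples,), daemon=True)
monitor.start()

# Nothing ever joins `monitor`. When the main thread finishes, the interpreter
# starts shutting down while this thread may still be in the middle of its loop.
```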
Even though the global interpreter lock keeps Python from being truly concurrent, there is a very real possibility that daemon threads are still executing after the Python runtime has started its own tear-down process. One step of that process appears to be setting the values in each module's globals() to None, so any lookup through a module-level name (somemodule.SomeException, say) raises an AttributeError complaining that a NoneType object has no such attribute. Other variations, such as calling a module-level function whose name is now None, raise TypeError instead.
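Both failure modes are easy to imitate without actually shutting down an interpreter. This is a simulation under stated assumptions, not the real teardown: it builds a throwaway module (fake_module, json, and helper are illustrative choices), keeps references to its functions the way a running thread would, and then rebinds the module's globals to None by hand.

```python
import types

# Source for a hypothetical module. Both functions look up module-level
# names at call time, which is exactly what breaks once those names are None.
SOURCE = '''
import json

def attribute_lookup():
    return json.dumps({"ok": True})   # needs the global 'json'

def helper():
    return "helper ran"

def call_global():
    return helper()                   # needs the global 'helper'
'''

mod = types.ModuleType("fake_module")
exec(SOURCE, mod.__dict__)
# Hold references to the functions, as a still-running daemon thread would.
functions = [mod.attribute_lookup, mod.call_global]

# Mimic the teardown step by hand: rebind every non-dunder module global to None.
for name in list(vars(mod)):
    if not name.startswith("__"):
        setattr(mod, name, None)

errors = []
for fn in functions:
    try:
        fn()
    except (AttributeError, TypeError) as exc:
        errors.append(type(exc).__name__)
        print(f"{fn.__name__}: {type(exc).__name__}: {exc}")
```

The first call fails with AttributeError ('NoneType' object has no attribute 'dumps'), the second with TypeError ('NoneType' object is not callable).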
The code that triggered this looked something like the following, although with more abstraction layers, which made hunting it down a little harder:
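```python
import logging

import somemodule  # hypothetical: the module whose exception the loop catches

log = logging.getLogger(__name__)

def thread_body():
    try:
        log.info("Some thread started!")
        try:
            # Placeholder for the real polling loop.
            do_something_every_so_often_in_a_loop_and_sleep()
        except somemodule.SomeException:
            pass
    finally:
        # Runs even if shutdown has already rebound module globals to None.
        log.info("Some thread exiting!")
```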
The exception we were seeing was an AttributeError on the last line, the log.info() call. But that wasn't even the original exception. The original was another AttributeError, raised while evaluating somemodule.SomeException in the except clause: because all the modules had been reset, somemodule was None too.
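The whole chain can be reproduced with the same kind of simulation. In this hypothetical sketch, poll() stands in for the moment teardown happens and wipes the module's globals itself; somemodule is a made-up namespace whose SomeException is just RuntimeError, and time.sleep(0) stands in for whatever in the real loop first tripped over a None. Walking __context__ from the exception that finally escapes shows the log.info() failure on top, with the except-clause failure and the first failure underneath it.

```python
import types

# Source for a hypothetical module shaped like the thread body above.
SOURCE = '''
import logging
import time
import types

somemodule = types.SimpleNamespace(SomeException=RuntimeError)  # hypothetical
log = logging.getLogger("demo")

def poll():
    # Simulated teardown: rebind every non-dunder module global to None.
    g = globals()
    for name in list(g):
        if not name.startswith("__"):
            g[name] = None

def thread_body():
    try:
        log.info("Some thread started!")
        try:
            while True:
                poll()
                time.sleep(0)                 # 'time' is None by now
        except somemodule.SomeException:      # evaluating this raises too
            pass
    finally:
        log.info("Some thread exiting!")      # ...and so does this
'''

demo = types.ModuleType("demo")
exec(SOURCE, demo.__dict__)
body = demo.thread_body  # hold a reference, as a running thread would

chain = []
try:
    body()
except AttributeError as exc:
    # Walk the implicit exception chain from the last error back to the first.
    err = exc
    while err is not None:
        chain.append(str(err))
        err = err.__context__

for message in chain:
    print(message)
```

Under Python 3 the default traceback prints all three, joined by "During handling of the above exception, another exception occurred", which makes the real culprit far easier to spot than a lone error on the finally line.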
Unfortunately the docs are completely devoid of this information, at least in the threading sections you would actually consult. The best information I was able to find was this email to python-list from a few years back, plus a few other emails that don't really put the issue front and center.
In the end, the solution for us was simply to make them non-daemon threads, notice when the app is shutting down, and join them from the main thread. Another option was to catch AttributeError in our thread wrapper class, which is what the author of the aforementioned email does, but that seems like papering over a real bug and a real error. Because of this misbehavior, daemon threads lose almost all of their appeal, yet oddly I can't find anyone publicly saying "don't use them" outside of scattered emails. It seems like underground information known only to the Python cabal. (There is no cabal.)
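A minimal sketch of that non-daemon approach, assuming a threading.Event serves as the shutdown signal (how we actually noticed shutdown isn't described here, and monitor_loop and run_app are hypothetical names):

```python
import logging
import threading
import time

log = logging.getLogger("monitor")

def monitor_loop(stop_event, interval, samples):
    log.info("Monitor thread started")
    try:
        # Event.wait() returns True as soon as the event is set, so the loop
        # exits promptly instead of finishing a full sleep first.
        while not stop_event.wait(interval):
            samples.append(time.monotonic())  # stand-in for gathering monitoring data
    finally:
        # The main thread joins us before interpreter teardown begins, so
        # module globals such as `log` are still intact here.
        log.info("Monitor thread exiting")

def run_app(work_seconds=0.1):
    stop_event = threading.Event()
    samples = []
    monitor = threading.Thread(
        target=monitor_loop, args=(stop_event, 0.01, samples), daemon=False
    )
    monitor.start()
    try:
        time.sleep(work_seconds)  # the application's real work goes here
    finally:
        # Shutdown path: signal the thread, then wait for it to finish.
        stop_event.set()
        monitor.join()
    return monitor, samples

if __name__ == "__main__":
    run_app()
```

Using Event.wait(timeout) as the sleep means setting the event wakes the thread immediately, so join() doesn't have to wait out a full polling interval.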
So, I am going to say it. When I went searching there weren’t any helpful hints in a Google search of “python daemon threads considered harmful”. So, I am staking claim to that phrase. People of The Future: You’re welcome.