Debugging a Hung Python Process in Production

When a production service freezes, you cannot always rely on logs or breakpoints. Here is how I investigate hangs on CentOS 7.

General-purpose tools

These commands apply to more than just Python.

List processes:

ps auxf

See which system call a process is stuck in:

strace -p <pid>

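Replace <pid> with the actual process ID. If the process is blocked, the last line of output is the unfinished system call it is waiting in. For I/O calls such as read or recvfrom, the first argument is the file descriptor number.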
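
If the strace output is saved to a file, the blocked call and its descriptor can be extracted mechanically. This is a minimal sketch, assuming lines shaped like `read(5, `. The function name parse_blocked_syscall is made up for illustration, and treating a decimal first argument as a descriptor is only a heuristic (it is wrong for calls such as select or kill).

```python
import re

# Matches the start of an strace line such as "read(5, " or "futex(0x7f..., ".
_SYSCALL_RE = re.compile(r"^(?P<name>[a-z_][a-z0-9_]*)\((?P<first>[^,)]*)")


def parse_blocked_syscall(line):
    """Return (syscall_name, fd_or_None) for one strace line, or None.

    Hypothetical helper: assumes the plain single-process output format,
    without "[pid N]" prefixes or timestamps.
    """
    match = _SYSCALL_RE.match(line.strip())
    if match is None:
        return None
    first = match.group("first").strip()
    # Heuristic: a plain decimal first argument is taken to be a descriptor.
    fd = int(first) if first.isdigit() else None
    return match.group("name"), fd
```

For a line such as `read(5, ` this returns the call name with descriptor 5, which is the number to look up in the next step.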

Look up the descriptor from the previous step:

ls -l /proc/<pid>/fd

Each entry is a symlink named after a descriptor number. Its target is the file, socket, or pipe that the descriptor refers to.
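
The same lookup can be scripted by reading the symlinks directly. This is a small sketch. The helper name list_fds and its proc_root parameter (there only so the function can be pointed at a test directory) are assumptions of this example.

```python
import os


def list_fds(pid, proc_root="/proc"):
    """Map each open descriptor number of `pid` to its symlink target."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets
```

Reading another user's process this way needs the same privileges as ls -l /proc/<pid>/fd, so it may have to run under sudo.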

/proc/<pid> also holds other useful metadata, such as status (process state and thread count) and cmdline, if you want to dig deeper.
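
One piece of that metadata worth checking early is the State field in /proc/<pid>/status: a value of "D (disk sleep)" means uninterruptible sleep, which usually points at blocked I/O inside the kernel. A minimal sketch follows. The helper names parse_proc_status and read_state are hypothetical.

```python
def parse_proc_status(text):
    """Parse the "Key:<tab>value" lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def read_state(pid, proc_root="/proc"):
    """Return the State field of a process, e.g. "S (sleeping)"."""
    with open("{}/{}/status".format(proc_root, pid)) as handle:
        return parse_proc_status(handle.read()).get("State")
```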

List all open files for the process:

lsof -p <pid>
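
lsof can also emit machine-readable output with its -F option, which is easier to post-process than the default table. This is a rough sketch, assuming output from `lsof -p <pid> -F ftn`, where every line starts with a one-character field identifier (f for descriptor, t for type, n for name). The function name parse_lsof_fields is hypothetical.

```python
def parse_lsof_fields(output):
    """Turn `lsof -F ftn` output into a list of {"fd", "type", "name"} dicts."""
    files = []
    current = None
    for line in output.splitlines():
        if not line:
            continue
        key, value = line[0], line[1:]
        if key == "f":
            # An "f" line starts a new file entry.
            current = {"fd": value}
            files.append(current)
        elif current is not None and key == "t":
            current["type"] = value
        elif current is not None and key == "n":
            current["name"] = value
        # Other fields, such as the leading "p<pid>" line, are ignored.
    return files
```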

Inspect the Python stack

To see the exact line of Python code that is stuck, attach with gdb.

Alibaba Cloud’s CentOS mirrors do not ship debuginfo, so add the repo first:

sudo vim /etc/yum.repos.d/CentOS-Debug.repo

#Debug Info
[debug]
name=CentOS-$releasever - DebugInfo
baseurl=http://debuginfo.centos.org/$releasever/$basearch/
gpgcheck=0
enabled=1
protect=1
priority=1

Install the tooling:

sudo yum install gdb
sudo yum install yum-utils
sudo debuginfo-install glibc
sudo yum install python-debuginfo

Attach to the running interpreter (the process is paused while gdb is attached):

gdb python <pid>

Show the Python source around the line that is currently executing:

py-list

Show the Python-level stack trace:

py-bt
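
Because gdb pauses the process while attached, it can help to keep the attach as short as possible by running gdb non-interactively. The sketch below uses gdb's -p, -batch and -ex options to run py-bt and exit. The helper names build_gdb_command and dump_python_stack are assumptions of this example, and it only works once the debuginfo packages above are installed.

```python
import subprocess


def build_gdb_command(pid, gdb_commands=("py-bt",)):
    """Build a gdb command line that attaches, runs the commands, and exits."""
    command = ["gdb", "-p", str(pid), "-batch"]
    for gdb_command in gdb_commands:
        command.extend(["-ex", gdb_command])
    return command


def dump_python_stack(pid):
    """Run gdb in batch mode against `pid` and return its output as text."""
    return subprocess.check_output(
        build_gdb_command(pid),
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
```

Calling dump_python_stack(<pid>) from a root shell returns the same trace that py-bt prints interactively, and gdb detaches from the process when it exits.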
