Debugging a Hung Python Process in Production
When a production service freezes, you cannot always rely on logs or breakpoints. Here is how I investigate hangs on CentOS 7.
General-purpose tools
These commands apply to more than just Python.
List processes:
ps auxf
See which system call a process is stuck in:
strace -p <pid>
Replace <pid> with the actual process ID. The output shows the current syscall and file descriptor.
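To practice on something harmless, it helps to have a process that hangs on purpose. The following is a minimal sketch, not taken from any real incident: block_on_read is a hypothetical helper that blocks in recv() on a socket nobody writes to. Attaching strace to such a process should show it parked in a receive call on the printed descriptor.

```python
import socket


def block_on_read(sock, timeout=None):
    """Block in recv() until data arrives; timeout=None waits forever."""
    sock.settimeout(timeout)
    return sock.recv(1024)


# To hang on purpose, run this in a throwaway script:
#   import os
#   reader, writer = socket.socketpair()
#   print(os.getpid(), reader.fileno())
#   block_on_read(reader)   # never returns: nothing writes to `writer`
```

Keep a reference to writer alive in that script; if it is garbage-collected the socket closes and recv() returns immediately with an empty result.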
Look up the descriptor from the previous step:
ls -l /proc/<pid>/fd
Each entry is a symlink named after a file descriptor number; its target is the file, socket, or pipe behind that descriptor.
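The same lookup can be scripted when you want to record descriptors repeatedly. This is a rough sketch that assumes a Linux /proc filesystem and permission to read the target process; list_fds is a name made up for illustration.

```python
import os


def list_fds(pid):
    """Map each open descriptor number of `pid` to the target of its /proc symlink."""
    fd_dir = "/proc/{}/fd".format(pid)
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets
```

With the descriptor number reported by strace in hand, list_fds(pid).get(fd) returns the path or socket it refers to, or None if it has since been closed.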
/proc/<pid> contains lots of other useful metadata if you want to dig deeper.
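Two entries that are often worth a look are /proc/<pid>/status, whose State line gives the scheduler state, and /proc/<pid>/wchan, which names the kernel function the process is sleeping in. The sketch below reads both; process_state is a hypothetical helper, and it assumes a Linux /proc where wchan may be unreadable or absent.

```python
def process_state(pid):
    """Return the scheduler state and kernel wait channel of `pid` from /proc."""
    info = {"state": None, "wchan": None}
    with open("/proc/{}/status".format(pid)) as f:
        for line in f:
            if line.startswith("State:"):
                # Example values: "R (running)", "S (sleeping)", "D (disk sleep)".
                info["state"] = line.split(":", 1)[1].strip()
                break
    try:
        with open("/proc/{}/wchan".format(pid)) as f:
            info["wchan"] = f.read().strip() or None
    except (IOError, OSError):
        # wchan is not always readable; leave it as None in that case.
        pass
    return info
```

A state starting with D (uninterruptible sleep) usually points at the kernel or I/O rather than at Python code, which tells you whether attaching gdb is likely to help.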
List all open files for the process:
lsof -p <pid>
Inspect the Python stack
To see the exact line of Python code that is stuck, attach with gdb.
Alibaba Cloud’s CentOS mirrors do not ship debuginfo, so add the repo first:
sudo vim /etc/yum.repos.d/CentOS-Debug.repo
1 | Debug Info |
Install the tooling:
sudo yum install gdb
sudo yum install yum-utils
sudo debuginfo-install glibc
sudo yum install python-debuginfo
Attach to the running interpreter:
gdb python <pid>
Show the current source location:
py-list
Show the Python-level stack trace:
py-bt
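In production you may prefer to grab the trace non-interactively so the process is stopped for as short a time as possible. The sketch below builds a gdb invocation using gdb's -p, -batch, and -ex options and runs it; the helper names are hypothetical, and it assumes gdb and the matching python-debuginfo are installed so that py-list and py-bt are available.

```python
import subprocess


def gdb_batch_command(pid, gdb_commands=("py-list", "py-bt")):
    """Build a gdb command line that attaches to `pid`, runs the commands, and exits."""
    cmd = ["gdb", "-p", str(pid), "-batch"]
    for gdb_command in gdb_commands:
        cmd += ["-ex", gdb_command]
    return cmd


def dump_python_stack(pid):
    """Run gdb in batch mode and return its combined output as bytes."""
    return subprocess.check_output(gdb_batch_command(pid), stderr=subprocess.STDOUT)
```

The process is paused while gdb is attached and resumes once gdb exits, so a short batch run is gentler on a live service than an interactive session left open.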
References:
DebuggingWithGdb
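Using strace to find why a process hangs
Installing debuginfo on CentOS 7
Debugging a running Python process with gdb
Locating hangs by inspecting the Linux kernel-mode call stack through /proc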