GDB: find the thread which has locked the mutex

Q. On Linux, if a multithreaded program appears to hang, how do you find which thread holds the mutex it is waiting on?
Ans:
1. Attach gdb to the hung process.

$ sudo gdb -p <pid>

2. List all the threads of the process.

(gdb) info threads
 ......
 20 Thread 0x7f3804dde700 (LWP 19453) "XYZ" 0x00007f38db3afb9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
 19 Thread 0x7f37f82dd700 (LWP 19454) "XYZ" 0x00007f38db3afb9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
 18 Thread 0x7f37f7adc700 (LWP 19455) "XYZ" 0x00007f38db3af6dd in accept () at ../sysdeps/unix/syscall-template.S:81
 17 Thread 0x7f37f70d0700 (LWP 19460) "XYZ" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 16 Thread 0x7f37f68cf700 (LWP 19461) "XYZ" 0x00007f38dbcecc0b in __memp_get_bucket () from /usr/lib/x86_64-linux-gnu/libdb_cxx-5.1.so
 15 Thread 0x7f37f60ce700 (LWP 19463) "XYZ" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 14 Thread 0x7f37f58cd700 (LWP 19464) "XYZ" 0x00007f38db3af7eb in __libc_recv (fd=27, buf=0x7f37f58ccd88, n=4, flags=-616892437)
 at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
 13 Thread 0x7f37f50cc700 (LWP 19466) "XYZ" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 12 Thread 0x7f37f48cb700 (LWP 19467) "XYZ" 0x00007f38db3af7eb in __libc_recv (fd=30, buf=0x7f37f48cad88, n=4, flags=-616892437)
 at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
 .......

3. Threads 13, 15 and 17 are all sitting in __lll_lock_wait (), which means each of them is waiting for a mutex. Take thread 17 as the example and look at its stack.

(gdb) thread 17
(gdb) where
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f38db3aa664 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f38db3aa4c6 in __GI___pthread_mutex_lock (mutex=0x7f37e8004da8) at ../nptl/pthread_mutex_lock.c:114
#3  0x000000000041602f in LinuxMutex::lock (this=0x7f37e8004da0) at linux/mutex.cpp:34

Print the mutex that the thread is trying to lock. In my case it is a LinuxMutex, a wrapper class around pthread_mutex_t, so I’m displaying the class object; its private member m_mutex is the pthread_mutex_t. If your code uses pthread_mutex_t directly, print that variable itself, for example p *(pthread_mutex_t *) 0x7f37e8004da8 using the mutex argument from frame #2.

(gdb) p *((LinuxMutex *) 0x7f37e8004da0)
$4 = {<Mutex> = {_vptr.Mutex = 0x433130}, m_mutex = {__data = {__lock = 2, __count = 1, __owner = 19461, __nusers = 1, __kind = 1,
      __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\001\000\000\000\005L\000\000\001\000\000\000\001", '\000' <repeats 22 times>,
    __align = 4294967298}}

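In glibc, pthread_mutex_t has a __data.__owner field that holds the kernel thread ID of the thread currently holding the mutex. This is the same number gdb shows as LWP in info threads. Note that __owner is a glibc implementation detail, not part of POSIX.
In our case the thread holding the mutex is LWP 19461.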
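To convince yourself that __owner really is the LWP number, here is a minimal C sketch. It assumes Linux with glibc, and the helper name owner_matches_tid is made up for this illustration. It locks a recursive mutex (__kind = 1, the same kind as in the dump above) and compares the internal __owner field with the calling thread's kernel thread ID.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for syscall() under strict -std modes */
#endif
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of any library.
 * Returns 1 if __owner equals the caller's kernel thread ID,
 * 0 if it does not, and -1 if the mutex could not be set up. */
int owner_matches_tid(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE) != 0 ||
        pthread_mutex_init(&m, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);
    pid_t tid = (pid_t)syscall(SYS_gettid);  /* the number gdb prints as LWP */
    int match = (m.__data.__owner == tid);   /* glibc-internal field, not portable */
    pthread_mutex_unlock(&m);

    pthread_mutex_destroy(&m);
    return match;
}
```

Reading __data directly is only reasonable for debugging and experiments like this one; production code should not depend on it.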

4. Look at the info threads output again and find the thread whose LWP is 19461.

(gdb) info threads
 ......
 20 Thread 0x7f3804dde700 (LWP 19453) "XYZ" 0x00007f38db3afb9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
 19 Thread 0x7f37f82dd700 (LWP 19454) "XYZ" 0x00007f38db3afb9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
 18 Thread 0x7f37f7adc700 (LWP 19455) "XYZ" 0x00007f38db3af6dd in accept () at ../sysdeps/unix/syscall-template.S:81
 17 Thread 0x7f37f70d0700 (LWP 19460) "XYZ" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 16 Thread 0x7f37f68cf700 (LWP 19461) "XYZ" 0x00007f38dbcecc0b in __memp_get_bucket () from /usr/lib/x86_64-linux-gnu/libdb_cxx-5.1.so
 15 Thread 0x7f37f60ce700 (LWP 19463) "XYZ" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 14 Thread 0x7f37f58cd700 (LWP 19464) "XYZ" 0x00007f38db3af7eb in __libc_recv (fd=27, buf=0x7f37f58ccd88, n=4, flags=-616892437)
 at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
 13 Thread 0x7f37f50cc700 (LWP 19466) "XYZ" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 12 Thread 0x7f37f48cb700 (LWP 19467) "XYZ" 0x00007f38db3af7eb in __libc_recv (fd=30, buf=0x7f37f48cad88, n=4, flags=-616892437)
 at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
 .......

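So thread 16 (LWP 19461) is the thread holding the mutex that thread 17 needs. Switch to it with thread 16 and run where to see what it is doing and why it has not released the mutex.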

Multithreading: Errno 9 (EBADF) – Bad File Number

If you’re randomly hitting the “Bad File Number” error (errno 9, EBADF) in a multithreaded application on Unix, the most likely cause is an “opened once but closed twice” bug.

What is this “Opened once but closed twice” bug?

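Consider the code below:

#include <fcntl.h>
#include <unistd.h>

void foo(void)
{
    int fd = open("/tmp/example", O_RDONLY);  /* illustrative path */
    /* some processing */
    close(fd);
    /* some processing */
    close(fd);  /* bug: fd was already closed above */
}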

In a single-threaded application the above code does no harm: the second close() simply fails with EBADF. In a multithreaded application it is dangerous. File descriptors are shared by all the threads of a process, and open() always returns the lowest unused descriptor. So if thread “X” calls open() (in the same function foo or in any other function bar) while thread “Y” is between its two close() calls, X can receive the very descriptor number that Y has just closed. When Y then carries out its second close(), it actually closes X's descriptor, which was valid and in use.

Say the descriptor number is “n”. Once thread “Y” has closed “n” for the second time, “n” no longer refers to an open file, so the next read(), write() or other call that thread “X” makes on “n” fails with “Bad File Number” (errno 9). The error shows up in X's code, far from the double close in Y that caused it, and only when the timing lines up, which is why it looks random.
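The interleaving can be replayed deterministically in a single thread, which makes the mechanism easier to see than a real race. The following is a minimal sketch under a few assumptions: the function name replay_double_close is made up, /dev/null is used only because it always exists on Linux, and no other thread opens files while it runs.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper that replays the race step by step.
 * Returns the errno seen by "thread X" when it writes,
 * 0 if the write unexpectedly succeeds, -1 if open() fails. */
int replay_double_close(void)
{
    int fd_y = open("/dev/null", O_WRONLY);  /* thread Y: open */
    if (fd_y < 0)
        return -1;
    close(fd_y);                             /* thread Y: first close */

    /* thread X: open() hands out the lowest free number,
     * which is the one Y has just released */
    int fd_x = open("/dev/null", O_WRONLY);
    if (fd_x < 0)
        return -1;

    close(fd_y);                             /* thread Y: second close, hits X's descriptor */

    if (write(fd_x, "x", 1) == -1)           /* thread X: uses its descriptor */
        return errno;

    close(fd_x);
    return 0;
}
```

In a real program the two "threads" run concurrently, so the second close() only hits another thread's descriptor when an open() happens to fall between the two close() calls.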

Thus closing a descriptor twice in a multithreaded application leads to random “Bad File Number” errors (errno 9).
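One common defensive idiom, shown here only as a sketch, is to reset the variable to -1 as soon as the descriptor is closed, so that a repeated close becomes a no-op. The helper name close_once is made up for this illustration.

```c
#include <unistd.h>

/* Hypothetical helper: close *fd at most once.
 * After the first call *fd is -1, so later calls do nothing. */
int close_once(int *fd)
{
    if (*fd < 0)
        return 0;      /* already closed: nothing to do */

    int rc = close(*fd);
    *fd = -1;          /* on Linux the number is released even if close() fails */
    return rc;
}
```

This only helps when every close of that descriptor goes through the same variable. If several threads share the variable, access to it still needs its own synchronization.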