Debugging Fawkes
Here are some notes on what to keep in mind when debugging Fawkes or plugins running in Fawkes. Of course, you should read about the known PadawanPitfalls before you start debugging.
If you start coding with Fawkes, you should become familiar with the GNU Debugger (GDB) and Valgrind. Both tools are extremely powerful and essential for debugging your application. Below are some general remarks on each. If you use GDB, also have a look at the Data Display Debugger (DDD), a graphical frontend for GDB. Although its GUI is not the most modern, it makes debugging a lot easier.
GNU Debugger (GDB)
Fawkes has been programmed with GDB in mind, so we avoided anything that would make GDB unnecessarily awkward to use, such as using signals internally. It is often easier to use DDD as the frontend, because it can show two lists on screen at once: the active threads and a backtrace. All commands in this subsection are GDB commands unless stated otherwise.
Starting a debug run
At the most basic level, start Fawkes in GDB like this:
# gdb fawkes
GNU gdb Red Hat Linux (6.6-42.fc8rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
Using host libthread_db library "/lib64/libthread_db.so.1".
(gdb) run
This starts Fawkes. To interrupt execution and enter GDB commands, press Ctrl+C; use the command cont to continue execution. Arguments for Fawkes are passed by appending them to the run command. It is often convenient to pass the -C argument, which makes Fawkes automatically clean up any dead BlackBoard shared memory segments. If you are debugging a plugin (for example, one that segfaults), you can also use the -p argument to load it automatically on startup, so that you do not have to use the plugin tool or GUI every time.
Identifying a particular thread
Since Fawkes is heavily multi-threaded, it can be challenging to identify a particular thread. To list all threads currently running, use:
(gdb) info threads
  8 Thread 1147169104 (LWP 24390)  0x000000361200a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 1136679248 (LWP 24389)  0x000000361200a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 1126189392 (LWP 24388)  0x000000361200a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 1115699536 (LWP 24387)  0x000000361200a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 1105209680 (LWP 24386)  0x00000036114cbd66 in poll () from /lib64/libc.so.6
  3 Thread 1094719824 (LWP 24385)  0x000000361200a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 1084229968 (LWP 24384)  0x00000036114d567b in accept () from /lib64/libc.so.6
* 1 Thread 46912517298272 (LWP 24381)  0x00000036120076dd in pthread_join () from /lib64/libpthread.so.0
The output above is typical for a freshly started Fawkes: every thread is idle, waiting for something to do. The first column is the GDB-internal thread number, which you use to select a specific thread. The asterisk marks the currently selected thread. After the address and "in" comes the function the thread is currently executing.
Usually this does not help much, because you do not know which thread is behind which number. To make this easier, every Fawkes thread has a name, stored in the internal variable Thread::__name. Switch to a thread with thread N, where N is the number shown by info threads, then move up the stack with up until you reach a frame inside a Thread method (such as Thread::run), and print the variable. For example, for thread 8 above you would do:
(gdb) thread 8
[Switching to thread 8 (Thread 1147169104 (LWP 24390))]#2 0x000000000041c444 in FawkesThreadManager::wait_for_timed_threads (this=0x636870)
at /home/tim/robocup/fawkes/src/mainapp/thread_manager.cpp:477
477 wait_for_timed->wait();
(gdb) bt
#0 0x000000361200a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaaaf99a9 in WaitCondition::wait (this=0x636b60, timeout_sec=0, timeout_nanosec=0) at /home/tim/robocup/fawkes/src/libs/core/threading/wait_condition.cpp:155
#2 0x000000000041c444 in FawkesThreadManager::wait_for_timed_threads (this=0x636870) at /home/tim/robocup/fawkes/src/mainapp/thread_manager.cpp:477
#3 0x0000000000413e20 in FawkesMainThread::loop (this=0x62c620) at /home/tim/robocup/fawkes/src/mainapp/main_thread.cpp:320
#4 0x00002aaaaaaf3877 in Thread::run (this=0x62c620) at /home/tim/robocup/fawkes/src/libs/core/threading/thread.cpp:763
#5 0x00002aaaaaaf3949 in Thread::entry (pthis=0x62c620) at /home/tim/robocup/fawkes/src/libs/core/threading/thread.cpp:498
#6 0x0000003612006407 in start_thread () from /lib64/libpthread.so.0
#7 0x00000036114d4b0d in clone () from /lib64/libc.so.6
(gdb) up
#3 0x0000000000413e20 in FawkesMainThread::loop (this=0x62c620) at /home/tim/robocup/fawkes/src/mainapp/main_thread.cpp:320
320 thread_manager->wait_for_timed_threads();
(gdb) up
#4 0x00002aaaaaaf3877 in Thread::run (this=0x62c620) at /home/tim/robocup/fawkes/src/libs/core/threading/thread.cpp:763
763 loop();
(gdb) print __name
$3 = 0x62c700 "FawkesMainThread"
So in this case thread 8 is the main thread. If you run the very same setup several times, a given thread will usually get the same number, although this is not guaranteed.
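The same idea helps in your own multi-threaded code outside the Fawkes Thread class. The following is a minimal sketch, not Fawkes code: a hypothetical NamedThread wrapper keeps the full name in a member (similar in spirit to Thread::__name) and, on Linux with glibc, also hands a truncated copy to the kernel via pthread_setname_np. Recent GDB versions on Linux show such kernel thread names directly in info threads. The class name and the thread name "ExampleWorkerThread" are illustrative assumptions.

```cpp
// Minimal sketch (not Fawkes code): a hypothetical NamedThread wrapper that
// keeps a human-readable name, similar in spirit to Fawkes' Thread::__name.
// Linux/glibc only: pthread_setname_np/pthread_getname_np are GNU extensions.
#include <pthread.h>
#include <cassert>
#include <functional>
#include <string>
#include <thread>
#include <utility>

class NamedThread {
public:
  NamedThread(std::string name, std::function<void()> body)
    : name_(std::move(name)), body_(std::move(body)) {}

  void start() {
    thread_ = std::thread([this] {
      // Linux limits thread names to 15 characters plus the terminating NUL,
      // so hand the kernel a truncated copy.
      std::string short_name = name_.substr(0, 15);
      pthread_setname_np(pthread_self(), short_name.c_str());
      body_();
    });
  }

  void join() { thread_.join(); }
  const std::string& name() const { return name_; }

private:
  std::string name_;            // untruncated name, kept for inspection
  std::function<void()> body_;  // the work the thread performs
  std::thread thread_;
};

// Starts a thread with a long name and returns the name the kernel recorded.
std::string kernel_name_of_worker() {
  std::string seen;
  NamedThread worker("ExampleWorkerThread", [&seen] {
    char buf[16] = {};
    pthread_getname_np(pthread_self(), buf, sizeof(buf));
    seen = buf;
  });
  worker.start();
  worker.join();
  assert(worker.name() == "ExampleWorkerThread");
  return seen;
}
```

The kernel only keeps the first 15 characters ("ExampleWorkerTh"), which is why the sketch also stores the untruncated name in the object.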
Valgrind
Valgrind is a very powerful tool for finding memory errors and uses of uninitialized variables in your code. A simple run looks like this:
valgrind --leak-check=full --show-reachable=yes ./fawkes
Again, consider appending the -C and -p flags to the Fawkes command line.
Debugging a deadlock
This section describes strategies for debugging deadlocks. Currently only one is documented.
A deadlock can occur, for example, when two threads both need to lock the same two mutexes. Thread A locks mutex m1 while, at the same time, thread B locks mutex m2. Now A tries to lock m2 and B tries to lock m1: a classic deadlock. Neither thread can acquire both mutexes, so both wait forever. The challenge is then to fix the locking, for example by merging the two mutexes into one or by always acquiring them in the same order. But often it is not even easy to find the places where the mutexes are locked.
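The two-mutex scenario can be sketched in standard C++17. This is an illustration only and does not use the Fawkes Mutex API; the function and variable names are hypothetical. Thread A wants m1 and m2, thread B lists them in the opposite order. Acquiring both with std::scoped_lock, which locks several mutexes using a deadlock-avoidance algorithm, removes the deadlock; always locking m1 before m2 in every thread would work as well.

```cpp
// Minimal sketch in standard C++17 (not the Fawkes Mutex API).
#include <cassert>
#include <mutex>
#include <thread>

std::mutex m1, m2;
int shared_a = 0, shared_b = 0;  // data protected by m1 and m2

// Deadlock-prone pattern (do NOT do this): thread A locks m1 then m2 while
// thread B locks m2 then m1, so each can end up waiting for the other:
//   std::lock_guard<std::mutex> l1(m1); std::lock_guard<std::mutex> l2(m2);

void thread_a(int iterations) {
  for (int i = 0; i < iterations; ++i) {
    // scoped_lock acquires both mutexes with a deadlock-avoidance algorithm,
    // regardless of the order in which they are listed.
    std::scoped_lock lock(m1, m2);
    ++shared_a;
    ++shared_b;
  }
}

void thread_b(int iterations) {
  for (int i = 0; i < iterations; ++i) {
    std::scoped_lock lock(m2, m1);  // opposite order, still safe
    ++shared_a;
    ++shared_b;
  }
}

// Runs both threads to completion and returns the final value of shared_a.
int run_both(int iterations) {
  std::thread a(thread_a, iterations);
  std::thread b(thread_b, iterations);
  a.join();
  b.join();
  assert(shared_a == shared_b);
  return shared_a;
}
```

If the commented-out lock_guard pattern were used in both threads instead, the program could hang, and info threads in GDB would show both threads blocked in a mutex lock call.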
Identifying a lock holder
In the scenario described above, we want to identify the thread that locked a particular mutex. To do this, enable debug support for threading by adding -DDEBUG_THREADING to CFLAGS_BASE, for example in trunk/etc/buildsys_local/config_fawkes.mk. Then run make clean; make in the src directory to rebuild the whole software (this is absolutely necessary!). With this option, each mutex stores additional data, including which thread locked it.
To identify the lock holder, switch to a thread that is blocked on the mutex. Then move up the stack until you are in a frame of the mutex code (one or two up commands should do the trick). Then you can print the lock holder with
(gdb) print mutex_data->lock_holder
This prints the name of the thread holding the lock. Although thread names do not have to be unique, the name is usually a good indicator of where to look.
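To illustrate the idea behind DEBUG_THREADING, here is a minimal sketch in standard C++. It is not Fawkes' actual Mutex implementation: the DebugMutex class, its lock(const char*) signature and the thread name "SensorThread" are hypothetical. The mutex records the name of the thread that currently holds it in a lock_holder field, so any other thread (or a debugger) can see who owns the lock.

```cpp
// Minimal sketch (hypothetical, not Fawkes' Mutex): a mutex that remembers
// which named thread holds it, similar in spirit to mutex_data->lock_holder.
#include <atomic>
#include <cassert>
#include <future>
#include <mutex>
#include <string>
#include <thread>

class DebugMutex {
public:
  void lock(const char* thread_name) {
    mutex_.lock();
    lock_holder.store(thread_name);  // record the owner once the lock is held
  }

  void unlock() {
    lock_holder.store(nullptr);      // clear the owner before releasing
    mutex_.unlock();
  }

  // Readable from other threads while the lock is held.
  std::atomic<const char*> lock_holder{nullptr};

private:
  std::mutex mutex_;
};

// Locks the mutex in a worker thread and reports the holder seen from outside.
std::string holder_seen_from_other_thread() {
  DebugMutex m;
  std::promise<void> locked, release;
  std::future<void> release_f = release.get_future();
  std::thread worker([&] {
    m.lock("SensorThread");
    locked.set_value();   // tell the observer that the lock is now held
    release_f.wait();     // keep holding the lock until the observer is done
    m.unlock();
  });
  locked.get_future().wait();
  std::string seen = m.lock_holder.load();  // the holder's recorded name
  release.set_value();
  worker.join();
  assert(m.lock_holder.load() == nullptr);
  return seen;
}
```

Storing the holder only after the underlying lock succeeds, and clearing it before unlocking, ensures the field never names a thread that does not actually own the mutex.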

