As an alternative to gdb-remote, Simics comes with some symbolic debugging facilities of its own in the symtable module. They are less full-featured than GDB but are easy to use, and they can be scripted in Python.
Each processor in the simulated system has a current context, which represents the virtual address space currently visible to code running on the processor. This context is embodied by a context object. A context object has various attributes, such as virtual-address breakpoints and symbolic information for the address space (contained in a symtable object).
The correctness of the simulation does not depend on contexts in any way; the concept of multiple virtual address spaces is useful for understanding the simulated software, but not necessary for just running it. What contexts to create and how to use them is entirely your business; Simics does not care.
By default, each processor has the object primary-context as its current context. You may create new contexts and switch between them at any time. This allows you, for example, to maintain separate debugging symbols and breakpoints for different processes in your target machine. When a context is used in this manner (active when and only when a certain simulated process is active), the context is said to follow the process.
One handy tool for making a context follow a simulated process is a process tracker. A process tracker knows just enough about the target machine and its operating system to tell when a given process is active. (Simics itself knows nothing about abstractions, such as processes, that are implemented by the simulated software.) By listening to the haps triggered by the process tracker, you can switch contexts at exactly the right moment.
Simics comes with process trackers for some targets, but far from all. Chapter 21 describes process trackers in more detail, including how to build your own.
Here we inspect a user-space program—the zsh shell—running on a 32-bit PowerPC target. Two things are required for this session: a zsh binary built with debug info (see section 12.3.5), and its source code.
Start by creating a process tracker:
simics> new-linux-process-tracker kernel = ppc32-linux-2.4.17
Using parameters suitable for ppc32 Linux 2.4.17.
New process tracker tracker0 created.
simics> tracker0.add-processor cpu0
simics> new-context-switcher tracker = tracker0
New context switcher switcher0 created.
Note that we had to tell the process tracker which kernel we are using. More precisely, what it needs is the values of a number of numerical parameters:
simics> tracker0.status
Status of tracker0 [class linux-process-tracker]
================================================
        Processors : cpu0
    Processor type : ppc32
Process tracking parameters:
           ts_comm : 582
           ts_next : 72
  ts_next_relative : 0
            ts_pid : 128
           ts_prev : 76
          ts_state : 0
  ts_thread_struct : 624
ppc32-linux-2.4.17 is simply a convenient name for the set of parameters that work with the 32-bit PowerPC Linux 2.4.17 kernel; such predefined parameter sets exist for some of the kernels in the machines shipped by Virtutech. If you want to do process tracking on a kernel for which there is no such predefined parameter set, you will want to look up the process tracker's autodetect-parameters command in the Reference Manual.
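To give a feel for what such numerical parameters could mean, here is a minimal sketch that treats ts_pid and ts_comm as byte offsets into a raw image of the kernel's per-task structure. That interpretation is an assumption, as are the 16-byte command-name field, the big-endian 32-bit pid, and the helper name read_pid_and_comm; none of this is taken from the process tracker itself.

```python
import struct

# Values copied from the tracker0.status output above.
TS_PID = 128
TS_COMM = 582
COMM_LEN = 16  # assumption: fixed-size, NUL-padded command name field

def read_pid_and_comm(task_struct):
    """Extract (pid, command name) from a raw task-structure image.

    Assumes the ts_* parameters are byte offsets and that the pid is a
    big-endian 32-bit integer (the target is a 32-bit PowerPC).
    """
    (pid,) = struct.unpack_from(">i", task_struct, TS_PID)
    raw = task_struct[TS_COMM:TS_COMM + COMM_LEN]
    comm = raw.split(b"\0", 1)[0].decode("ascii")
    return pid, comm

# Build a fake task-structure image purely for demonstration.
image = bytearray(1024)
struct.pack_into(">i", image, TS_PID, 321)
image[TS_COMM:TS_COMM + 4] = b"zsh\0"

result = read_pid_and_comm(bytes(image))
print(result)
```

With a different kernel build the structure layout changes, which would explain why a tracker needs one parameter set per kernel.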
We also created a context switcher. It handles the rather boring task of listening to the process tracker and actually switching contexts at the right moment, so that one context will follow the process that runs zsh. (Context switchers are covered in more detail in section 21.3.)
Now create the symbol table and load the symbols. Note that for this to work, the zsh binary must have been built with debug info.
simics> new-symtable zsh_sym
Created symbol table 'zsh_sym'
zsh_sym set for context primary_context
simics> zsh_sym.load-symbols ~/zsh-4.2.3/Src/zsh
found load segment at 0x10000000
[symtable] Symbols loaded at 0x10000000
Tell the context switcher to use a special context for the zsh process. Make sure that the new context uses the symbol table:
simics> switcher0.track-bin zsh zsh_context
Context 'zsh_context' will be tracking the first process that executes the binary 'zsh'.
simics> @conf.zsh_context.symtable = conf.zsh_sym
We would like to start debugging the program at the beginning of its main function. The symbol table can tell us where that is:
simics> psym main
{int (int, char **)} 0x100001f8
simics> whereis (sym main)
in main() at /home/jane/zsh-4.2.3/Src/main.c:92
sym is like psym, except that it only returns the value, and not its type—which is exactly what other commands are expecting as input.
Let us set a breakpoint at main, and let the simulation run:
simics> zsh_context.break -x (sym main)
Breakpoint 1 set on address 0x100001f8 with access mode 'x'
1
simics> c
Note that as long as you do not execute a binary named "zsh", you can run whatever program you want without triggering this breakpoint. That is because it is set on the zsh_context context, which will not be activated until zsh is run:
$ ls /
bin dev home lib mnt proc sbin usr
boot etc host lost+found opt root tmp var
$ sh -c 'echo foo'
foo
$ zsh
Now the simulation stops:
Code breakpoint 1 reached.
main (argc=0, argv=0x0) at /home/jane/zsh-4.2.3/Src/main.c:92
92 {
[cpu0] v:0x100001f8 p:0x079d41f8 stwu r1,-32(r1)
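The behavior just seen, where a breakpoint fires only while its context is active, can be illustrated with a toy model. The classes Context and ToySwitcher below are hypothetical stand-ins written for this sketch; they are not Simics classes, and the real context switcher follows the first process that executes a binary rather than matching on the name alone.

```python
class Context(object):
    """Toy stand-in for a context object: a name plus breakpoint addresses."""
    def __init__(self, name):
        self.name = name
        self.breakpoints = set()

class ToySwitcher(object):
    """Maps binary names to contexts and selects the one for the active process."""
    def __init__(self, default):
        self.default = default
        self.by_binary = {}
        self.current = default

    def track_bin(self, binary, context):
        self.by_binary[binary] = context

    def process_activated(self, binary):
        # Called when the (hypothetical) tracker reports a newly active process.
        self.current = self.by_binary.get(binary, self.default)

    def hits_breakpoint(self, address):
        # Only breakpoints of the currently active context are considered.
        return address in self.current.breakpoints

primary = Context("primary_context")
zsh_ctx = Context("zsh_context")
zsh_ctx.breakpoints.add(0x100001f8)   # address of main from the session above

switcher = ToySwitcher(primary)
switcher.track_bin("zsh", zsh_ctx)

hits = []
for binary in ["ls", "sh", "zsh"]:
    switcher.process_activated(binary)
    hits.append((binary, switcher.hits_breakpoint(0x100001f8)))
print(hits)
```

In this model ls and sh run in the default context, so the address is never matched for them; only activating zsh selects the context that owns the breakpoint.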
We can single-step through the code:
simics> zsh_context.step
93 return (zsh_main(argc, argv));
simics>
zsh_main (argc=1, argv=0x7ffffe14) at /home/jane/zsh-4.2.3/Src/init.c:1205
1205 {
simics>
1206 char **t;
simics>
1209 setlocale(LC_ALL, "");
simics>
1212 init_jobs(argv, environ);
simics>
init_jobs (argv=0x1, envp=0x7ffffe14) at /home/jane/zsh-4.2.3/Src/jobs.c:1465
1465 {
(Just pressing Return at the prompt repeats the last stepping command.)
Note how the function setlocale was skipped. It is part of the C library linked into zsh, which was not compiled with line number information.
We can also examine the contents of variables (note that some C expressions must be quoted to prevent the command-line parser from trying to parse them):
simics> psym envp
(char **) 0x7ffffe14
simics> psym "envp[0]"
(char *) 0x7ffffefa "zsh"
Looking at the stack, we can see that we have made two function calls that have not returned since we started single-stepping:
simics> stack-trace
#0 0x10043914 in init_jobs (argv=0x1, envp=0x7ffffe14) at /home/jane/zsh-4.2.3/Src/jobs.c:1465
#1 0x1003d394 in zsh_main (argc=1, argv=0x7ffffe14) at /home/jane/zsh-4.2.3/Src/init.c:1219
#2 0x10000220 in main (argc=1, argv=0x7ffffe14) at /home/jane/zsh-4.2.3/Src/main.c:93
#3 0x10129830 in __libc_start_main () in zsh
#4 0x0 in ?? ()
There are other source code stepping functions besides step. next, for example, steps to the next source line without descending into function calls like step does. (This is exactly what happened when step skipped setlocale, but next will do this with every function call whether or not we have line number information for them.) And finish runs the simulation until the current function returns:
simics> zsh_context.finish
zsh_main (argc=1, argv=0x7ffffe14) at /home/jane/zsh-4.2.3/Src/init.c:1219
1219 typtab['\0'] |= IMETA;
If Hindsight is enabled, all these stepping commands have reverse counterparts: rstep, rnext, and uncall.
Note that the stepping commands may not work reliably in the presence of multiple threads, or of multiple processes not hidden by a process tracker. The reason is that all but the simplest stepping commands rely on the stack pointer being well-behaved; in particular, they assume that it keeps pointing to the same stack. Multiple threads, or multiple processes not hidden by a process tracker, break this assumption.
We saw earlier how sym could be used to set a breakpoint on a function. pos can be used to set a breakpoint on a source line:
simics> pos jobs.c:1465
268712212
simics> hex (pos jobs.c:1465)
0x10043914
simics> zsh_context.break -x (pos jobs.c:1465)
Breakpoint 26 set on address 0x10043914 with access mode 'x'
26
It is also possible to set a breakpoint on data (a watchpoint). The following example sets a data breakpoint on the variable "argc", causing the simulation to stop whenever this variable is read from or written to. The second parameter is the extent of the breakpoint, in bytes.
simics> zsh_context.break -r -w (sym "&argc") (sym "sizeof argc")
Breakpoint 27 set on address 0x7ffffd88, length 4 with access mode 'rw'
27
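The role of the extent parameter can be pictured as a range-overlap test. The sketch below assumes that a data breakpoint triggers on any access that touches at least one byte of its range; that is an assumption about the matching rule rather than something stated here, and access_hits is a hypothetical helper name.

```python
def access_hits(bp_addr, bp_len, acc_addr, acc_len):
    """True if [acc_addr, acc_addr + acc_len) overlaps [bp_addr, bp_addr + bp_len)."""
    return acc_addr < bp_addr + bp_len and bp_addr < acc_addr + acc_len

# Address and length of the watchpoint on argc from the session above.
ARGC_ADDR, ARGC_SIZE = 0x7ffffd88, 4

accesses = [
    (0x7ffffd88, 4),   # the whole variable
    (0x7ffffd8b, 1),   # its last byte only
    (0x7ffffd8c, 4),   # the word just after it
    (0x7ffffd84, 4),   # the word just before it
]
results = [access_hits(ARGC_ADDR, ARGC_SIZE, a, n) for a, n in accesses]
for (a, n), hit in zip(accesses, results):
    print("0x%x len %d -> %s" % (a, n, hit))
```

Passing (sym "sizeof argc") as the extent means the range covers the variable exactly, so that neighboring stack slots do not trigger the breakpoint.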
See section 12.1 for more information about how to use breakpoints.
Symbolic information is normally read from a file using the <symtable>.load-symbols command, as in the example above. Currently only ELF binaries can be used, and the debug info must be in the STABS format. Also, the files must be present on the host machine; Simics cannot read directly from the file system of the simulated machine.
Here are some things to think about when preparing a binary for debugging:
Sometimes it is desirable to read symbols from a source other than a binary file—perhaps all you have is a text file listing the symbols. The <symtable>.plain-symbols command reads symbols from a file in the output format of the BSD nm command. Example:
000000000046b7e0 T iunique
000000000062ba40 B ivector_table
00000000005a6338 D jiffies
The hexadecimal number is the symbol value, usually the address. The letter is a type code; for this purpose, D, B, and R are treated as data and anything else as code.
The symbols do not have any C type or line number information associated with them, but you will at least be able to print stack traces and find the location of statically allocated variables.
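The nm-style format and the code/data rule described above are simple enough to sketch in a few lines of Python. This is only an illustration of the file format, not the plain-symbols implementation; parse_nm_line is a hypothetical helper, and lines without a value (such as undefined symbols in real nm output) are not handled.

```python
def parse_nm_line(line):
    """Parse one 'value type name' line of BSD nm output.

    Returns (name, value, kind), where kind is "data" for the type
    codes D, B and R, and "code" for anything else.
    """
    value, type_code, name = line.split()
    kind = "data" if type_code in ("D", "B", "R") else "code"
    return name, int(value, 16), kind

sample = """\
000000000046b7e0 T iunique
000000000062ba40 B ivector_table
00000000005a6338 D jiffies
"""

symbols = [parse_nm_line(l) for l in sample.splitlines() if l.strip()]
for name, value, kind in symbols:
    print("%-14s 0x%x %s" % (name, value, kind))
```

A script like this can be handy for checking a hand-written symbol file before loading it.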
The process tracker and context switcher can handle separate contexts (and symbol tables) for two or more processes at once without problems:
simics> new-symtable encode_sym
Created symbol table 'encode_sym'
encode_sym set for context primary_context
simics> encode_sym.load-symbols ~/sharutils-4.3.80/src/uuencode
[symtable] Symbols loaded at 0x10000000
simics> switcher0.track-bin uuencode context = encode_context
Context 'encode_context' will be tracking the first process that executes the binary 'uuencode'.
simics> @conf.encode_context.symtable = conf.encode_sym

simics> new-symtable decode_sym
Created symbol table 'decode_sym'
simics> decode_sym.load-symbols ~/sharutils-4.3.80/src/uudecode
[symtable] Symbols loaded at 0x10000000
simics> switcher0.track-bin uudecode context = decode_context
Context 'decode_context' will be tracking the first process that executes the binary 'uudecode'.
simics> @conf.decode_context.symtable = conf.decode_sym
Here, we have created separate contexts for the programs uuencode and uudecode, loaded their symbols into two symtables, and asked the context switcher to associate these contexts with the first processes that execute "uuencode" and "uudecode", respectively.
We would like to step through uudecode first:
simics> decode_context.step
The simulation just runs freely now, waiting for us to reach a source line while decode_context is active—which will happen as soon as we start uudecode:
$ ls -laR / | uuencode - | uudecode | wc
main (argc=0, argv=0x0) at /home/jane/sharutils-4.3.80/src/uudecode.c:432
432 {
We started four programs at once, here. First, ls prints a listing of every file on the target machine. This listing is fed to uuencode, which encodes it, then feeds the coded result to uudecode, which decodes it. The decoded file listing (which is identical to the original listing produced by ls) is then fed to wc, which counts the number of words in it and prints the result on the terminal.
This description makes it sound like the four programs are run in sequence, one after the other. This is not the case. They all run simultaneously—or rather, since this is a single-processor system, they run interleaved. It works more or less like this:
Things may not happen in exactly that order; the only constraint is that a process that is waiting for input cannot be run until some input is available. As long as the amount of data to be passed is large enough to fill the programs' output buffers several times over (and a directory listing of the entire file system should be large enough), execution will alternate between the different programs. So, when we reach the first source line in uudecode, uuencode should still be running.
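The interleaving described above can be mimicked with a small simulation. It is a deliberately crude model under stated assumptions: data moves in abstract "chunks", every pipe holds at most capacity chunks, and a round-robin scheduler lets each program run until it blocks. The function simulate and its parameters are hypothetical and say nothing about how the target's Linux kernel really schedules processes.

```python
from collections import deque

def simulate(stages, chunks, capacity=2):
    """Run a pipeline of programs; return a list of (program, chunks handled) turns."""
    n = len(stages)
    pipes = [deque() for _ in range(n - 1)]   # pipes[i] connects stage i to stage i+1
    remaining = chunks                        # chunks the first stage still has to emit
    consumed = 0                              # chunks the last stage has consumed
    trace = []
    while consumed < chunks:
        for i, name in enumerate(stages):
            ran = 0
            while True:
                has_input = remaining > 0 if i == 0 else len(pipes[i - 1]) > 0
                has_room = i == n - 1 or len(pipes[i]) < capacity
                if not (has_input and has_room):
                    break                     # blocked on input or on a full pipe
                if i == 0:
                    remaining -= 1
                else:
                    pipes[i - 1].popleft()
                if i == n - 1:
                    consumed += 1
                else:
                    pipes[i].append(1)
                ran += 1
            if ran:
                trace.append((name, ran))
    return trace

programs = ["ls", "uuencode", "uudecode", "wc"]
big = simulate(programs, chunks=4)
small = simulate(programs, chunks=1)
print(big)
print(small)
```

When the data fills the pipes more than once, every program gets several turns and the turns alternate; when everything fits in a single buffer, each program runs once, which looks like plain sequential execution.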
Let's first step a few lines in uudecode:
simics> decode_context.step
433 int opt;
simics>
437 program_name = argv[0];
This is just like when we were debugging a single program. Now, let's test our assumption by stepping to the next line in uuencode. The step command will let the simulation run until we reach a new line in that context, so we will either stop when uuencode gets to run again, or continue running forever if uuencode has already finished.
simics> encode_context.step
try_putchar (c=34) at /home/jane/sharutils-4.3.80/src/uuencode.c:130
130 if (putchar (c) == EOF)
As expected, uuencode was still running. In fact, it was busy outputting text for uudecode when it was interrupted so that uudecode could be started.
If we step to the next line in uudecode again, we see that it has now run to the point where it was blocking, waiting for uuencode to produce more output:
simics> decode_context.step
read_stduu (inname=0x100464ec "stdin", outname=0x7fffbcf8 "-") at /home/jane/sharutils-4.3.80/src/uudecode.c:126
126 if (fgets ((char *) buf, sizeof(buf), stdin) == NULL)
It is often useful to access data symbolically from Python scripts. Scripts access the debugging facilities using the symtable interface and attributes of the symtable class. These are documented in the Simics Reference Manual.
For instance, here is a short script to print out the contents of one of the linked lists that zsh uses. It uses the eval_sym function, which takes a C expression and returns a (type, value) pair. The expression parsed by eval_sym may contain casts, struct member selection and indexing.
eval_sym = SIM_get_class_interface("symtable", "symtable").eval_sym

def eval_expr(cpu, expr):
    return eval_sym(cpu, expr, [], 'v')

def ptr_str(typed_val):
    (type, val) = typed_val
    return "((%s)0x%x)" % (type, val)

def print_linklist(list):
    cpu = current_processor()
    ll = eval_expr(cpu, list)
    first = eval_expr(cpu, ptr_str(ll) + "->first")
    l = []
    def print_tail(node):
        type, val = node
        if val == 0:
            return # end of list
        type, val = eval_expr(cpu, ptr_str(node) + "->dat")
        type, val = eval_expr(cpu, ptr_str(("char *", val)))
        l.append(val)
        next = eval_expr(cpu, ptr_str(node) + "->next")
        print_tail(next)
    print_tail(first)
    print l
zsh uses these lists for lots of things, among them to store the directory stack. After having given the command pushd a few times on the zsh prompt, we can inspect the directory stack by stopping in the bin_cd function and printing the linked list "dirstack":
simics> zsh_context.break -x (sym bin_cd)
Breakpoint 1 set on address 0x1000282c with access mode 'x'
1
simics> c
Code breakpoint 1 reached.
bin_cd (nam=0x300001a8 "/usr/bin", argv=0x0, ops=0x101b0000, func=2147481984) at /home/jane/zsh-4.2.3/Src/builtin.c:772
772 {
simics> @print_linklist("dirstack")
['/var/tmp', '/tmp', '/usr', '/home', '/sbin', '/bin', '/root']
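The list-walking script above recurses once per node, so a very long list could run into Python's recursion limit. The following sketch shows the same traversal written as a loop. To keep it runnable outside Simics, the evaluator is passed in as a parameter and replaced here by fake_eval, a hypothetical stub that understands only the expression shapes used by the walker; the addresses, the two-entry list, and the type names in the fake memory are made up for illustration.

```python
import re

# Fake target memory for the stub evaluator (illustrative values only).
NODES = {
    0x1000: {"first": ("LinkNode", 0x2000)},
    0x2000: {"dat": ("void *", 0x9000), "next": ("LinkNode", 0x2010)},
    0x2010: {"dat": ("void *", 0x9010), "next": ("LinkNode", 0)},
}
STRINGS = {0x9000: "/var/tmp", 0x9010: "/tmp"}

def fake_eval(expr):
    """Stand-in for eval_sym: returns a (type, value) pair for a few expression forms."""
    if expr == "dirstack":
        return ("LinkList", 0x1000)
    m = re.match(r"\(\((.+)\)0x([0-9a-f]+)\)(?:->(\w+))?$", expr)
    ctype, addr, field = m.group(1), int(m.group(2), 16), m.group(3)
    if field:
        return NODES[addr][field]
    return (ctype, STRINGS[addr])   # a "char *" cast yields the string itself

def ptr_str(typed_val):
    ctype, val = typed_val
    return "((%s)0x%x)" % (ctype, val)

def linklist_to_list(eval_expr, name):
    """Collect the strings stored in a linked list, using a loop instead of recursion."""
    items = []
    node = eval_expr(ptr_str(eval_expr(name)) + "->first")
    while node[1] != 0:                       # a null pointer marks the end of the list
        _, dat = eval_expr(ptr_str(node) + "->dat")
        _, text = eval_expr(ptr_str(("char *", dat)))
        items.append(text)
        node = eval_expr(ptr_str(node) + "->next")
    return items

dirs = linklist_to_list(fake_eval, "dirstack")
print(dirs)
```

Inside Simics, the stub would be replaced by a small wrapper around eval_sym that supplies the current processor, as eval_expr does in the script above.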