
13.1   Instruction Profiling


Note: Instruction profiling is currently implemented only for SPARC, PowerPC, x86, and MIPS targets, and is available only when Simics is started with -stall.

Simics can maintain an exact execution profile: every single taken branch is counted, showing you exactly what code was executed and how often branches were taken.

To get started with instruction profiling, type cpu0.start-instruction-profiling at the prompt (assuming the machine you are simulating has a processor called cpu0). This has the same effect as if you had executed the following two commands:

    simics> new-branch-recorder cpu0_branch_recorder physical
    simics> cpu0.attach-branch-recorder cpu0_branch_recorder
    

These commands create a branch recorder and attach it to the cpu0 processor. All branches taken by cpu0 will now be recorded in cpu0_branch_recorder.

You need to use these separate commands instead of start-instruction-profiling if you want a branch recorder with a particular name, or if you want to record virtual instead of physical addresses.

A branch recorder remembers the source and destination address of each branch, as well as the number of times the branch has been taken. It does not remember the order in which the branches happened, so it cannot be used to reconstruct an execution trace (for that, you have to use the trace module, which is slower and generates much more profiling data). There is, however, enough information to compute a number of interesting statistics. To get a list of what the branch recorder can do, type:

    simics> cpu0_branch_recorder.address-profile-info
    cpu0_branch_recorder has 6 address profiler views:
    View 0: execution count
       64-bit addresses, granularity 4 bytes
    View 1: branches from
       64-bit addresses, granularity 4 bytes
    View 2: branches to
       64-bit addresses, granularity 4 bytes
    View 3: interrupted execution count
       64-bit addresses, granularity 4 bytes
    View 4: exceptions from
       64-bit addresses, granularity 4 bytes
    View 5: exceptions to
       64-bit addresses, granularity 4 bytes
    

An address profiler is an object whose data can be meaningfully displayed as a count for each address interval. The output of the last command indicates that cpu0_branch_recorder is an address profiler with six separate views; i.e., there are six separate ways of displaying its data as counts for each four-byte interval.

View 0 is the execution count per instruction, conveniently represented as one count per four-byte address interval since the instructions of the simulated machine (a SPARC in this case) are all four bytes long and aligned on four-byte boundaries. Views 1 and 2 count the number of times the processor has branched from and to each instruction, except when the branch is caused by an exception (or exception return); those branches are counted separately by views 4 and 5. View 3, finally, counts the number of times an instruction was started but not completed because an exception occurred.
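
To make the relationship between recorded branches and these views more concrete, here is a toy model in plain Python. It is only a sketch under simplifying assumptions (fixed four-byte instructions, no exceptions, execution starting and stopping at known addresses) and makes no claim about how Simics computes its views internally. The names record_branches and views_from_arcs, and all addresses, are hypothetical.

```python
from collections import Counter

STEP = 4  # assumed fixed instruction size in bytes


def record_branches(branches):
    """Count each (source, destination) pair; the order of branches is discarded."""
    arcs = Counter()
    for src, dst in branches:
        arcs[(src, dst)] += 1
    return arcs


def views_from_arcs(arcs, start, stop, lo, hi):
    """Derive per-address counts from (src, dst) -> count arcs.

    start: address where execution began (treated as one arrival).
    stop:  address of the last instruction executed (treated as one departure).
    lo, hi: inclusive address range to report; must contain start and stop.
    """
    branches_from = Counter()
    branches_to = Counter()
    for (src, dst), n in arcs.items():
        branches_from[src] += n
        branches_to[dst] += n

    # Arrivals and departures, including pseudo-branches for start and stop.
    arrive = Counter(branches_to)
    depart = Counter(branches_from)
    arrive[start] += 1
    depart[stop] += 1

    executed = {}
    running = 0  # number of control flows currently passing through addr
    for addr in range(lo, hi + STEP, STEP):
        running += arrive[addr]
        executed[addr] = running
        running -= depart[addr]
    return executed, branches_from, branches_to


# A three-instruction loop at 0x100..0x108 followed by one instruction at
# 0x10c. The branch at 0x108 jumps back to 0x100 twice, then falls through.
arcs = record_branches([(0x108, 0x100), (0x108, 0x100)])
executed, branches_from, branches_to = views_from_arcs(
    arcs, start=0x100, stop=0x10C, lo=0x100, hi=0x10C)

for addr in sorted(executed):
    print(hex(addr), executed[addr], branches_from[addr], branches_to[addr])
```

In this model, the execution count at an address equals the count at the preceding address, minus the branches taken away from that preceding address, plus the branches arriving at the current one. Because only counts per source and destination pair are kept, two runs that take the same branches in a different order produce identical results.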

When you are done recording for the moment, type:

    simics> cpu0.detach-branch-recorder cpu0_branch_recorder
    

to detach the branch recorder; it will stop recording new branches, but the branches already collected will remain in the branch recorder (until you type cpu0_branch_recorder.clean at the prompt). You can reattach it at any time:

    simics> cpu0.attach-branch-recorder cpu0_branch_recorder
    

Section 13.3 explains how to access the recorded data.

13.1.1   Virtual Instruction Profiling

Branch recorders can record virtual instead of physical addresses; just say so when you create them:

    simics> new-branch-recorder ls_profile virtual
    

Once it has been created, the new branch recorder object behaves just like a physical profiler, except for one thing: when you want a physical profile, you typically expect the profiler to collect statistics from the whole system, but when you want a virtual profile, you are probably interested in only one process (or the kernel).

To be consistent with the name of the branch recorder we just created, let us collect a virtual profile of a run of the ls program. The problem is to have the branch recorder attached when ls is running, and detached when other processes are running. This is the same problem that we had in section 12.3, when we wanted a context object to be active precisely when a given process was active.

Unfortunately, we cannot use the same convenient tools for branch recorders as we did for contexts, since Simics does not yet support that. We can use the process tracker directly, though, with Python scripts very similar to those in section 21.2:

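    def exec_hap(user_arg, tracker, tid, cpu, binary):
        # Called on every exec. Note that endswith("ls") matches any
        # binary path ending in "ls", not only ls itself.
        if binary.endswith("ls"):
            def set_profiler(user_arg, tracker, tid, cpu, active):
                # Attach ls_profile while the process is active;
                # detach all branch recorders when it is not.
                if active:
                    cpu.branch_recorders = [conf.ls_profile]
                else:
                    cpu.branch_recorders = []
            # Watch activity changes for this process (tid) only.
            SIM_hap_add_callback_obj_index("Core_Trackee_Active", tracker,
                                           0, set_profiler, None, tid)
    # Watch exec calls reported by tracker0.
    SIM_hap_add_callback_obj("Core_Trackee_Exec", conf.tracker0,
                             0, exec_hap, None)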
    

This script assumes the existence of a process tracker called tracker0, and the branch recorder ls_profile that we created above.


Note: Remember to use the CLI command <tracker>.activate (or, equivalently, call the activate function of the tracker interface) before trying to use a tracker.

The script first listens for the Core_Trackee_Exec hap, which is triggered when any process in the system calls the exec system call. When that happens, and the path of the binary being executed ends in "ls", the script starts listening for the Core_Trackee_Active hap for that process. It then attaches the branch recorder ls_profile to the processor when the process becomes active, and detaches it when the process becomes inactive.
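
The control flow of the script can be tried out without Simics by replacing the processor and the process tracker with small stand-ins. The sketch below is plain Python; FakeCpu, FakeTracker, fire_exec, fire_active, and LS_PROFILE are hypothetical names invented for this illustration and are not part of the Simics API. Only the callback signatures and the attach/detach logic mirror the script above.

```python
class FakeCpu:
    """Stand-in for a processor; only the attribute the script touches."""
    def __init__(self):
        self.branch_recorders = []


class FakeTracker:
    """Stand-in for a process tracker that delivers exec and active events."""
    def __init__(self):
        self.exec_callbacks = []
        self.active_callbacks = {}  # tid -> list of callbacks

    def fire_exec(self, tid, cpu, binary):
        for cb in self.exec_callbacks:
            cb(None, self, tid, cpu, binary)

    def fire_active(self, tid, cpu, active):
        for cb in self.active_callbacks.get(tid, []):
            cb(None, self, tid, cpu, active)


LS_PROFILE = "ls_profile"  # placeholder for the branch recorder object


def exec_hap(user_arg, tracker, tid, cpu, binary):
    if binary.endswith("ls"):
        def set_profiler(user_arg, tracker, tid, cpu, active):
            # Attach while the process is active, detach otherwise.
            cpu.branch_recorders = [LS_PROFILE] if active else []
        # Register the activity callback for this tid only.
        tracker.active_callbacks.setdefault(tid, []).append(set_profiler)


tracker = FakeTracker()
tracker.exec_callbacks.append(exec_hap)
cpu = FakeCpu()

log = []
tracker.fire_exec(7, cpu, "/bin/ls")    # process 7 execs ls
tracker.fire_active(7, cpu, True)       # ls is scheduled in
log.append(list(cpu.branch_recorders))
tracker.fire_active(7, cpu, False)      # ls is scheduled out
log.append(list(cpu.branch_recorders))
tracker.fire_exec(8, cpu, "/bin/cat")   # unrelated process
tracker.fire_active(8, cpu, True)       # no callback registered for tid 8
log.append(list(cpu.branch_recorders))
print(log)
```

The recorder is attached only while the process that executed ls is active; activity of other processes leaves the processor's recorder list untouched.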
