Introduction to Cache Simulation with Simics

18.1 Introduction to Cache Simulation with Simics

By default, Simics does not model any cache system. It uses its own memory system to obtain high speed simulation, and modeling a hardware cache model would only slow it down.

Note: Although it is often said that Simics models a processor, such as an UltraSPARC III or an x86 P4, remember that Simics is an instruction-set simulator, not a processor simulator. Transactions coming out of a real x86 P4 processor have already gone through the L1 and L2 caches, so they consist mainly of cache misses to be fetched from memory. In Simics, all transactions performed by the processor execution core are made visible.

For simplicity and performance, Simics does not model incoherence. In Simics, the memory is always up to date with the latest CPU and device transactions. Memory accesses take no time to execute and are always atomic.

The possibility to observe and alter memory transactions (both in timing and in execution) makes Simics very suitable for cache simulation:

Cache Profiling: The goal is to gather information about the cache behavior of a system or an application. Unless the application runs on multi-processors, takes a lot of interrupts or runs a lot of system-level code, the timing of the memory operations is often irrelevant, thus no stalling is necessary. The timing-model interface is a good place to be informed of all transactions sent by the processor.
Note that this type of simulation does not modify the execution of the target program. It could be done by using Simics as a simple memory transaction trace generator, and then computing the cache state evolution afterward. However, doing the cache simulation at the same time as the execution enables a number of optimizations that Simics models make good use of.
Cache Timing: The goal is to study the timing behavior of the transactions, in which case a transaction to memory should take much more time than, for example, a transaction to an L1 cache. This is useful when studying interactions between several CPUs, or to grossly estimate the CPI of an application. To be able to stall the processor, a cache timing model should be connected to the timing-model interface. Simics models can be used for such a simulation.
This type of simulation modifies the execution, since interrupts and multi-processor interaction will be influenced by the timing provided by the cache model. However, unless the target program is not written properly, the execution will always be correct, although different from the execution obtained without any cache model.
Cache Content Simulation: It is possible to change Simics coherency model by allowing a cache model to contain data that is different from the contents of the memory. Such a model needs to use both the timing-model and the snoop-memory interfaces to properly handle the memory transactions (it must be able to change the values of loads and stores or to prevent their execution to main memory).
Note that this kind of simulation is difficult to do and requires a well-written, bug-free cache model, since it can prevent the target program from executing properly.

Simics comes with cache models that allow for the two first types of cache simulation:

g-cache is the standard cache model. It handles one transaction at a time in a flat way: all needed operations (copy-back, fetch, etc.) are performed in order and at once. The cache returns the sum of the stall times reported for each operation.

g-cache-ooo is a more complex model adapted to Simics MAI. It handles multiple outstanding transactions and keeps track of their current status (copy-back or fetch ongoing). g-cache-ooo is the standard cache model for Simics out-of-order; it is described in detail in Simics Micro-Architectural Interface.

The source code of these caches is available in [simics]/src/extensions/.

Previous - Up - Next