
17.3   Multiprocessor Simulation

Simics can model systems with several processors, each with its own clock frequency. In this case the length of a cycle becomes processor-dependent. Ideally, Simics would advance time one cycle at a time, scheduling processors according to their frequencies. However, such perfect synchronization is exceedingly slow, so Simics serializes execution to improve performance.

Simics does this by dividing time into segments and serializing the execution of the processors within each segment. The length of a segment is called the quantum and is a span of simulated time, measured in seconds. This is similar to the way operating systems implement multitasking on a single-processor machine: each process is given access to the processor and runs for a certain time quantum. The processors are scheduled in a round-robin fashion: when a particular processor P has finished its quantum, all other processors finish their quanta before execution returns to P. The length of the quantum is set with the cpu-switch-time command, whose argument is given in cycles of the first processor in the system rather than in absolute time.

As in the single-processor case, instruction execution and latency are defined with execution modes and timing interfaces. Simics does not define the order in which the processors are serialized, which means that if causality is to be preserved, processor-to-processor communications must have a minimum latency of one quantum. Another consequence of serializing the execution is that Simics will maintain strict sequential consistency. However, through careful use of the memory hierarchy interface, the user can choose to simulate other consistency models.

As an example, consider a dual-processor system where the first processor runs at 4 MHz and the second at 1 MHz. Setting cpu-switch-time to 10 will give a quantum of 2.5 simulated microseconds. During each quantum, the first processor will execute 10 steps, and the second 2 or 3 steps, not necessarily in that order. Breakpoints do not affect this schedule, so that interaction remains non-intrusive.
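The arithmetic in this example can be reproduced with a small toy model. The sketch below is not Simics code: the function name simulate is hypothetical, one step per cycle is assumed, and the processors are visited in a fixed order even though Simics does not define the order. It only illustrates how a 2.5-cycle share of each quantum turns into alternating counts of 2 and 3 steps.

```python
from fractions import Fraction


def simulate(freqs_hz, switch_cycles, num_quanta):
    """Toy model of quantum-based serialization (hypothetical helper).

    Returns, for each quantum, the number of cycles each processor runs.
    Assumes one step per cycle and a fixed scheduling order.
    """
    # The quantum is defined in cycles of the first processor.
    quantum_s = Fraction(switch_cycles, freqs_hz[0])
    done = [0] * len(freqs_hz)  # cycles executed so far by each processor
    history = []
    for q in range(1, num_quanta + 1):
        end_time = q * quantum_s  # simulated time at the end of this quantum
        steps_this_quantum = []
        for i, hz in enumerate(freqs_hz):  # serialized: one processor at a time
            # Whole cycles that fit before end_time (int() floors here).
            target = int(end_time * hz)
            steps_this_quantum.append(target - done[i])
            done[i] = target
        history.append(steps_this_quantum)
    return history


# 4 MHz and 1 MHz processors, cpu-switch-time of 10 cycles.
history = simulate([4_000_000, 1_000_000], switch_cycles=10, num_quanta=4)
for number, steps in enumerate(history, start=1):
    print(f"quantum {number}: first={steps[0]} second={steps[1]}")
```

Exact fractions are used instead of floating point so that the half-cycle remainder carried from one quantum to the next is not lost to rounding.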

Note that if you are single-stepping (step-instruction) on a processor P that has just executed the last cycle of its quantum, the next single-step will cause all other processors to advance an entire quantum before P stops after one step. This behavior makes it convenient to follow the execution of instructions on a particular processor. You can use the <processor>.ptime command to see the flow of time on each processor in the simulated machine.

For a multi-processor simulation to run efficiently, the quantum should not be set too low, since a CPU switch causes simulator overhead. It should not be set below 10, and should preferably be set to 50 or higher. The default value is 1000. For a perfectly synchronized simulation, set the switch time to 1 (which will give a very slow simulation but is useful for detailed cache studies, for example). Note that all of the above remains essentially the same when running a distributed simulation (see next section).

Time events in Simics are executed when the processor on which they were posted runs the triggering cycle during its quantum. However, it is possible to post synchronizing time events, which ensure that all processors have the same local time when the event is executed, independently of the time quantum. Synchronizing events cannot be posted less than one time quantum in the future unless the simulation is already synchronized.

Simics MAI has limited support for multiprocessor simulation; processors are always scheduled in a round-robin fashion, one cycle at a time.

Let us look at a two-machine setup containing two SPARC SunFire machines (with one processor each) to illustrate multiprocessor simulation. The processor in the first machine runs at 168 MHz; the other runs at 56 MHz (equal to 168/3). The time quantum (configured via the cpu-switch-time command) is 1000 cycles of the first processor, or about 6 microseconds.
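The quantum length and the per-quantum share of the slower processor follow directly from these numbers. The short sketch below is plain Python arithmetic, not part of Simics, and the constant names are chosen here for illustration only.

```python
from fractions import Fraction

# Values taken from the setup described above.
FIRST_HZ = 168_000_000   # first processor, 168 MHz
SECOND_HZ = 56_000_000   # second processor, 56 MHz
SWITCH_CYCLES = 1000     # cpu-switch-time, in cycles of the first processor

# Quantum length in simulated seconds, kept exact with Fraction.
quantum_s = Fraction(SWITCH_CYCLES, FIRST_HZ)
quantum_us = quantum_s * 1_000_000

# Cycles the second processor can run in one quantum.
second_cycles = quantum_s * SECOND_HZ

print(f"quantum: {float(quantum_us):.2f} microseconds")
print(f"second-processor cycles per quantum: {float(second_cycles):.2f}")
```

The quantum comes out slightly under 6 microseconds, and the second processor gets one third of 1000 cycles per quantum, so its cycle count per quantum is not always the same whole number.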

                                
  +----------------+    Copyright 1998-2005 by Virtutech, All Rights Reserved
  |   Virtutech    |    Version: 
  |     Simics     |    Build: 
  +----------------+
    www.simics.com      "Virtutech" and "Simics" are trademarks of Virtutech AB


simics> @conf.d1_cpu0.freq_mhz
168
simics> @conf.d2_cpu0.freq_mhz
56
simics> @conf.sim.cpu_switch_time
1000
simics> c 10000
[d1_cpu0] v:0xfffffffff0001364 p:0x1fff0001364  bne,pt %xcc, 0xfffffffff0001360
simics> ptime -all
processor                 steps             cycles    time [s]
d1_cpu0                   10000              10000       0.000
d2_cpu0                    3333               3333       0.000

While the first processor executed 10000 steps, the second processor completed 3333 steps, which corresponds to the ratio between the two frequencies (168MHz compared to 56MHz). Let us now examine the effects of the time quantum:

simics> c 30
[d1_cpu0] v:0xfffffffff0001364 p:0x1fff0001364  bne,pt %xcc, 0xfffffffff0001360
simics> ptime -all
processor                 steps             cycles    time [s]
d1_cpu0                   10030              10030       0.000
d2_cpu0                    3333               3333       0.000

Although the first processor ran 30 steps further, the second processor has not run the 10 steps that we would expect, and the frequency ratio is no longer respected. This is the effect of the 1000-cycle time quantum: the first processor is scheduled for the next 1000 cycles and no other processor will run until that quantum is finished. If we switch to the second processor and try to make it run one step further, we will observe the following:

simics> pselect d2_cpu0
simics> c 1
[d2_cpu0] v:0xfffffffff0001364 p:0x1fff0001364  bne,pt %xcc, 0xfffffffff0001360
simics> ptime -all
processor                 steps             cycles    time [s]
d1_cpu0                   11000              11000       0.000
d2_cpu0                    3334               3334       0.000

The second processor has run 1 step further as requested, but the first had to finish its time quantum before the second processor could be allowed to run, which explains its step count of 11000 compared to 10030 before. Let us now set the time quantum to 1:

simics> cpu-switch-time 1
The switch time will change to 1 cycles (for CPU-0) once all
 processors have synchronized.
simics> c 1
[d2_cpu0] v:0xfffffffff0001368 p:0x1fff0001368  nop
simics> ptime -all
processor                 steps             cycles    time [s]
d1_cpu0                   11000              11000       0.000
d2_cpu0                    3335               3335       0.000

Note that the new time quantum length will only become valid once all processors have finished their current time quantum. This is why stepping one more step forward with the second processor hasn't affected the first yet. Now let us select the first processor again, and run three steps:

simics> pselect d1_cpu0
simics> c 3
[d1_cpu0] v:0xfffffffff0001368 p:0x1fff0001368  nop
simics> ptime -all
processor                 steps             cycles    time [s]
d1_cpu0                   11003              11003       0.000
d2_cpu0                    3668               3668       0.000
simics> c 3
[d1_cpu0] v:0xfffffffff0001368 p:0x1fff0001368  nop
simics> ptime -all
processor                 steps             cycles    time [s]
d1_cpu0                   11006              11006       0.000
d2_cpu0                    3669               3669       0.000
simics>

All processors finished their 1000-cycle time quantum and started to run with the new 1-cycle value, which means that they are now advancing in lockstep. For every 3 steps performed by the first processor, the second executes 1.
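Two of the numbers in the session above can be sanity-checked with simple arithmetic. The sketch below is an illustration only: next_quantum_boundary is a hypothetical helper, not a Simics function, and it assumes quanta start at cycle 0 of the first processor.

```python
def next_quantum_boundary(cycles_done, quantum_cycles):
    """Smallest multiple of quantum_cycles that is >= cycles_done."""
    # Ceiling division written with integer arithmetic only.
    return -(-cycles_done // quantum_cycles) * quantum_cycles


# d1_cpu0 had run 10030 cycles when d2_cpu0 was selected; it must finish
# its current 1000-cycle quantum before d2_cpu0 is allowed to run.
boundary = next_quantum_boundary(10030, 1000)
print(boundary)

# In lockstep (switch time 1), compare the last two ptime outputs.
d1_delta = 11006 - 11003
d2_delta = 3669 - 3668
print(d1_delta, d2_delta)
```

The boundary matches the step count of 11000 reported for d1_cpu0, and the deltas show the 3-to-1 ratio given by the two clock frequencies.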
