A More Complex Cache System

18.4 A More Complex Cache System

Using g-cache, we can simulate more complex cache systems. Let us consider the structure described by figure 5.

Figure 5. A More Complex Cache System

What we want to simulate is a system with separate instruction and data caches at level 1 backed by a level 2 cache. We will connect this memory hierarchy to an x86 processor. The dotted components in the diagram represent elements that are introduced in Simics to complete the simulation. Let us have a look at these components:

id-splitter: This module is used by Simics to separate instruction and data accesses and send them to separate L1 caches.
splitter: Since we are simulating an x86 machine, accesses can cross a cache-line boundary. To avoid that, we connect two splitters before the caches. The splitters will let uncacheable and correctly aligned accesses go through untouched, whereas others will be split in two accesses.
trans-staller: The trans-staller is a very simple device that will simulate the memory latency. It will stall all accesses by a fixed amount of cycles.

In terms of script, this configuration gives us the following:

#
# Transaction staller for memory
#
@staller = pre_conf_object('staller', 'trans-staller')
@staller.stall_time = 200
#
# l2 cache: 512Kb Write-back
#
@l2c = pre_conf_object('l2c', 'g-cache')
@l2c.cpus = conf.cpu0
@l2c.config_line_number = 4096
@l2c.config_line_size = 128
@l2c.config_assoc = 8
@l2c.config_virtual_index = 0
@l2c.config_virtual_tag = 0
@l2c.config_write_back = 1
@l2c.config_write_allocate = 1
@l2c.config_replacement_policy = 'lru'
@l2c.penalty_read = 10
@l2c.penalty_write = 10
@l2c.penalty_read_next = 0
@l2c.penalty_write_next = 0
@l2c.timing_model = staller
#
# instruction cache: 16Kb
#
@ic = pre_conf_object('ic', 'g-cache')
@ic.cpus = conf.cpu0
@ic.config_line_number = 256
@ic.config_line_size = 64
@ic.config_assoc = 2
@ic.config_virtual_index = 0
@ic.config_virtual_tag = 0
@ic.config_replacement_policy = 'lru'
@ic.penalty_read = 3
@ic.penalty_write = 0
@ic.penalty_read_next = 0
@ic.penalty_write_next = 0
@ic.timing_model = l2c
#
# data cache: 16Kb Write-through
#
@dc = pre_conf_object('dc', 'g-cache')
@dc.cpus = conf.cpu0
@dc.config_line_number = 256
@dc.config_line_size = 64
@dc.config_assoc = 4
@dc.config_virtual_index = 0
@dc.config_virtual_tag = 0
@dc.config_replacement_policy = 'lru'
@dc.penalty_read = 3
@dc.penalty_write = 3
@dc.penalty_read_next = 0
@dc.penalty_write_next = 0
@dc.timing_model = l2c
#
# transaction splitter for instruction cache
#
@ts_i = pre_conf_object('ts_i', 'trans-splitter')
@ts_i.cache = ic
@ts_i.timing_model = ic
@ts_i.next_cache_line_size = 64
#
# transaction splitter for data cache
#
@ts_d = pre_conf_object('ts_d', 'trans-splitter')
@ts_d.cache = dc
@ts_d.timing_model = dc
@ts_d.next_cache_line_size = 64
#
# instruction-data splitter
#
@id = pre_conf_object('id', 'id-splitter')
@id.ibranch = ts_i
@id.dbranch = ts_d

@SIM_add_configuration([staller, l2c, ic, dc, ts_i, ts_d, id], None)

Once this is done, we can simply plug the id-splitter to the main memory:

@conf.phys_mem0.timing_model = conf.id

Note the way the penalties have been set: we don't use _next penalties but let the next level report penalties in case they are accessed. In this configuration, a read hit in L1 would take 3 cycles; a read miss that goes to memory would take 3 + 10 + 200 = 213 cycles if no copy-back is performed in the L2 cache. There's no best way to set up the penalties so it's up to you to decide how your model should behave.

Previous - Up - Next