Memoization tabulates the results to the problems as they are generated. The table of stored solutions is consulted as each problem is presented; the problem needs to be solved only if the result to the problem cannot be found in the table.
The Dragon Book
All applications dynamically link against libraries because of myriad benefits of dynamically-linked libraries. However, dynamic linking costs performance as library symbols are read from memory and not written into the binary. These trampolines perform the same work for calls to the same target (read the same memory address to get the address of function, and the function address remains the same). In this work, we present a hardware solution to skip redundant trampolines and show how this improves the performance of applications.
Dynamic Linking Mechanism
Applications compiled against libraries do not include library source code. The compiler uses function prototypes from library headers to prevent potential errors from application linking against the library. The compiler creates a lookup table for all the library symbols that are used by the application. At launch time, the dynamic linker loads the library in memory and resolves the library symbols and populates the lookup table of the application.
For library functions, the compiler inserts trampolines that can read the function address from the lookup table and branch to the function. The compiler replaces the target of CALL instruction to library functions, with the CALL instruction to their respective trampolines.
All processors use branch predictors to speculatively fetch instructions by predicting the target of branches. In the case of library function calls, we observe that there are two simultaneous branches – CALL instruction and the INDIRECT BRANCH instruction. Our hardware modifies the behavior of branch predictor, such that the branch predictor predicts the target of the CALL instruction as the target of the INDIRECT BRANCH instruction. Thus, skipping the INDIRECT BRANCH instruction.
We use a hardware structure, called Alternate Branch Target Buffer (ABTB), that stores the address of functions indexed by the library function call targets, which are trampolines. When the branch resolver resolves the target of a call instruction as the trampoline, the ABTB changes the resolved address to the function address, tricking the branch predictor to set the target of function calls to actual function address instead of trampoline address.
Skipping a single instruction is typically not that beneficial, but the instruction skipped in this case is an expensive instruction that performs a branch and a memory load. Skipping the trampoline has microarchitectural benefits apart from reduced instruction count:
- Reduced pressure on D-$, D-TLB
- Reduced pressure on I-$, I-TLB
- Reduced pressure on branch predictor