Massively Parallel CPUs

July 2020

Today's data centers are under constant pressure to deliver higher performance and better energy efficiency. However, server processors have not improved significantly in either respect in more than a decade. Most of the energy spent by a modern CPU goes not to the functional units but to the peripheral logic responsible for fetching, decoding, and scheduling instructions and memory accesses. Hence, processor designs that amortize this overhead across many concurrent threads of execution can achieve significantly higher energy efficiency. The Single-Instruction-Multiple-Threads (SIMT) computational paradigm is a successful example of such a design: given abundant data-level parallelism, SIMT amortizes instruction overheads across a large number of computations. Indeed, the SIMT model has been applied successfully in the other major user of massive-scale computers, high-performance computing (HPC), allowing that domain to take several phenomenal leaps forward over the past decade.

This project builds on the observation that busy high-performance servers frequently service similar or even identical requests concurrently and that, despite periodic control-flow divergence, server workloads repeatedly execute the same instruction sequences across requests. This leads to the conjecture that such request-level parallelism can be harnessed with a SIMT processor design to achieve orders-of-magnitude improvement in the efficiency of server processors. We also observe that existing SIMT designs are inadequate for exploiting this request-level parallelism effectively, and that significant innovation in many layers of the computing stack is required to fulfill this promise.
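
As a rough illustration of the conjecture above, the following toy Python model counts instruction fetches for a batch of concurrent requests, first when each request runs on its own and then when the requests advance in lockstep and share a fetch whenever they are about to execute the same instruction. It is only a sketch under simplifying assumptions: the function names, the step labels ("parse", "lookup", and so on), and the eight-request batch are hypothetical, each label stands in for a run of instructions, and the model ignores reconvergence policy, memory divergence, and everything else a real SIMT core must handle.

```python
def scalar_fetches(traces):
    """Every request fetches and decodes each of its instructions itself."""
    return sum(len(trace) for trace in traces)


def lockstep_fetches(traces):
    """Requests advance one step at a time in lockstep.

    At each step, all requests about to execute the same instruction
    share a single fetch/decode; divergent requests each pay for their
    own path at that step.
    """
    steps = max((len(trace) for trace in traces), default=0)
    fetches = 0
    for i in range(steps):
        distinct = {trace[i] for trace in traces if i < len(trace)}
        fetches += len(distinct)
    return fetches


# Hypothetical batch: eight similar requests, two of which diverge
# at the third step (a cache miss instead of a hit).
hit_path = ["parse", "lookup", "hit", "reply"]
miss_path = ["parse", "lookup", "miss", "reply"]
batch = [hit_path] * 6 + [miss_path] * 2

print("scalar fetches:  ", scalar_fetches(batch))
print("lockstep fetches:", lockstep_fetches(batch))
```

In this toy batch, the divergent step costs two fetches instead of one, while every other step is fetched once for all eight requests. The more similar the concurrent requests are, the closer the lockstep count comes to the length of a single request.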

We conduct a detailed cross-layer study of the microarchitecture and memory system to develop a high-performance, energy-efficient SIMT server architecture that meets the stringent quality-of-service demands of server workloads. The project targets designs that permit the adoption of this new type of server processor without a dramatic departure from modern software development practices. To achieve this vision, we develop innovations in the SIMT microarchitecture (branch prediction, out-of-order execution, register files, and other core parameters), the memory subsystem (cache and memory hierarchy, virtual memory coherence, instruction and data prefetch), and the operating system (vector system calls and thread scheduling).