Spring 2018 :: CSE 502 — Computer Architecture

Software Setup

Each student enrolled in the class will be given a Linux virtual machine on a departmental teaching cluster with the basic required software installed. You will have root access to this machine and can install additional tools (editors, debuggers, etc.) as you see fit.

You are also welcome to install the needed software on your own machine. The course staff will provide you with necessary source files, but is not available to help you debug your personal laptop configuration.

You will need to install Verilator, qemu-sparc, gtkwave and the SPARC cross-compiler toolchain.

  • For Verilator, you can download the code from its website and compile it.
  • You can install qemu-sparc and gtkwave using apt-get (or the equivalent package manager of your Linux distribution), or compile them from source.
  • For the cross compiler, you will need to compile it from source. The course staff will provide you with a tarball of the necessary source files to compile.

Good citizenship. You have administrator access to this VM and can install anything you like. That said, to keep load down, DO NOT INSTALL A GRAPHICAL DESKTOP on the VM. You may tunnel the X protocol (using ssh -Y or ssh -X) to your local machine and display your editor in a window.

Accessing Your VMs

Student VMs are already provisioned with the basic set of software packages needed for the lab assignments, including Verilator, qemu-sparc, gtkwave and the SPARC cross-compiler toolchain. You can access your VMs over SSH. Your account's username is student but there is no password associated with that account. Instead, you will use public key authentication to access the VMs.

You will receive your VM's IP address and authentication key pairs in a private email. The SSH server on the VM is configured to use port 130. Assuming you have saved your private key as vm506_id, you can SSH to your VM using the following command:

$ ssh -X student@<your_VM_IP_address> -i vm506_id -p 130

Git

The files you will need for the labs are distributed using the Git version control system. Git is a powerful, but tricky, version control system. We highly recommend taking time to understand Git so that you will be comfortable using it during the labs.

We recommend the following resources to learn more about Git:


SystemVerilog

Hardware designers usually use a Hardware Description Language (HDL) to describe their designs in a way that is amenable to automatic translation to hardware by so-called "synthesis tools". HDLs are used for many tasks in a hardware design flow, including hardware description, testing and verification. In this course you will use an HDL for describing your design, and perhaps for writing some test cases.

You will implement your designs in a subset of the SystemVerilog HDL. Although it is a hardware description language, SystemVerilog has many features that make it resemble high-level programming languages such as C or C++. Many of these advanced features, however, are primarily intended for testing and verification, and not for hardware description. In this course, we will use the "synthesizable" subset of the language for describing our processors. The synthesizable subset is the part of the language that a synthesis tool can automatically translate to hardware.

Do not panic if you have not used an HDL before! We will teach and discuss SystemVerilog and its synthesizable subset (which is frankly very simple) in enough detail in the class. We will also provide a SystemVerilog-to-C++ translator (called Verilator) to translate your SystemVerilog code to C++ code that can be compiled and run to simulate your design. We will provide the necessary testing infrastructure that you will compile together with Verilator's output to create a fully functional simulator for your design.


Homework 1

The goal of this homework is to get you started on SystemVerilog and writing test cases. To get the code, do git clone http://compas.cs.stonybrook.edu/~nhonarmand/courses/sp18/cse502/hw1.git.

Please carefully read the README file for an overview of what you need to do for this homework. You should work on this homework individually and not as a group. Each student should submit a separate solution. To submit, type make submit.


Homework 2

The goal of this homework is to implement a direct-mapped cache that you can use in your course project later. To get the code, do git clone http://compas.cs.stonybrook.edu/~nhonarmand/courses/sp18/cse502/hw2.git.

Please carefully read the README file for an overview of what you need to do for this homework. You should work on this homework individually and not as a group. Each student should submit a separate solution. To submit, type make submit.
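As a warm-up for the cache design, the following is a minimal behavioral sketch of how a direct-mapped cache splits an address and decides between a hit and a miss. It is written in Python purely for illustration; it is not the homework's SystemVerilog interface. The class name, the number of lines (64) and the line size (32 bytes) are assumptions chosen for the example, not values from the assignment.

```python
class DirectMappedCache:
    """Behavioral model: tracks hits and misses only, not the cached data."""

    def __init__(self, num_lines=64, line_bytes=32):
        # Both sizes are assumed values for illustration (powers of two).
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.valid = [False] * num_lines
        self.tags = [0] * num_lines

    def split(self, addr):
        """Split a byte address into (tag, index, offset)."""
        offset = addr % self.line_bytes
        index = (addr // self.line_bytes) % self.num_lines
        tag = addr // (self.line_bytes * self.num_lines)
        return tag, index, offset

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        tag, index, _ = self.split(addr)
        if self.valid[index] and self.tags[index] == tag:
            return True
        # Miss: the single candidate line is replaced (no choice of victim).
        self.valid[index] = True
        self.tags[index] = tag
        return False


cache = DirectMappedCache()
# 0x1000 and 0x1800 map to the same index with different tags,
# so they keep evicting each other (a conflict miss).
results = [cache.access(a) for a in (0x1000, 0x1004, 0x1800, 0x1000)]
print(results)  # [False, True, False, False]
```

The last access misses even though 0x1000 was loaded earlier, because 0x1800 replaced it in the only line it can occupy. This is the behavior that a set-associative cache is meant to reduce.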


Project

Overview

In this course, you will design and implement a SPARCv8-compatible processor. At a minimum, your processor will include a 5-stage pipeline (similar to the one covered in class), multiple functional units with varying latencies, and direct-mapped instruction and data caches. For more points, you can add other features (see below) to your processor. The grading scheme is as follows:

  • 5-Stage pipeline + direct-mapped caches (40 pts)
  • 5-Stage pipeline + set-associative caches (45 pts)
  • Above + super-scalar pipeline (60 pts)
  • All of the above + out-of-order execution (80 pts)
  • Multi-cycle divider and pipelined multiplier on top of any of the above (5 extra pts)
  • Branch prediction and speculative execution on top of any of the above (10-20 extra pts)
  • SMT on top of any of the above (10-20 extra pts)

I will provide the code for the multi-cycle divider and pipelined multiplier as part of the project skeleton code. Please read the code and comments carefully to make sure you know how to use them.
  • A pipelined multiplier accepts a new multiplication operation every cycle but generates the result after multiple cycles (just like a normal pipeline)
  • A multi-cycle divider accepts a new operation when the "start" signal is given and takes "n" cycles to finish it. After that, it can accept another operation.
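The difference in timing between the two units can be sketched with a small cycle-level model. This is an illustrative Python sketch only, not the interface of the provided skeleton code: the class names, the tick() method, the multiplier latency of 3 cycles and the divider latency of n = 4 cycles are all assumptions, and plain integer arithmetic stands in for the actual SPARC multiply and divide semantics.

```python
from collections import deque


class PipelinedMultiplier:
    """Accepts one operation every cycle; each result appears `latency` cycles later."""

    def __init__(self, latency=3):  # latency is an assumed value
        self.stages = deque([None] * latency)

    def tick(self, op=None):
        """Advance one cycle. `op` is an (a, b) pair or None. Returns a product or None."""
        self.stages.append(None if op is None else op[0] * op[1])
        return self.stages.popleft()


class MultiCycleDivider:
    """Accepts an operation only when idle; finishes it `n` cycles after start."""

    def __init__(self, n=4):  # n is an assumed value
        self.n = n
        self.remaining = 0
        self.result = None

    @property
    def busy(self):
        return self.remaining > 0

    def tick(self, start=False, op=None):
        """Advance one cycle. Returns the quotient on the cycle it completes, else None."""
        if self.busy:
            # A start request arriving while busy is ignored.
            self.remaining -= 1
            return self.result if self.remaining == 0 else None
        if start:
            self.result = op[0] // op[1]
            self.remaining = self.n
        return None


mul = PipelinedMultiplier(latency=3)
mul_out = [mul.tick(op) for op in [(2, 3), (4, 5), (6, 7), None, None, None]]
print(mul_out)  # [None, None, None, 6, 20, 42]

div = MultiCycleDivider(n=4)
div_out = [div.tick(start=(cycle == 0), op=(20, 5)) for cycle in range(6)]
print(div_out)  # [None, None, None, None, 4, None]
```

Three back-to-back multiplications finish on three consecutive cycles, whereas the divider would need a full n cycles per operation. In a pipeline, this means a divide has to stall later divides, while multiplies only need their results tracked for dependent instructions.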

Getting the Skeleton Code

To get the code, do git clone http://compas.cs.stonybrook.edu/~nhonarmand/courses/sp18/cse502/proj.git. You should implement your processor by modifying the existing SystemVerilog files and adding new ones. top.sv is the top-level SystemVerilog file and Core.sv is your processor core. You may also need to modify system.cpp to emulate some OS features (e.g., support for register window overflow/underflow). Read the README file for more information.
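For the register window handling mentioned above, the following Python sketch models when SAVE and RESTORE trap according to the SPARCv8 rules: SAVE moves to window (CWP - 1) mod NWINDOWS, RESTORE moves to window (CWP + 1) mod NWINDOWS, and a trap is raised if the corresponding bit of the Window Invalid Mask (WIM) is set. The class name and the choice of NWINDOWS = 8 are assumptions for illustration; how system.cpp is expected to service these traps is described in the README, not here.

```python
NWINDOWS = 8  # assumed for the example; SPARCv8 allows 2 to 32 windows


class WindowState:
    """Tracks only CWP and WIM, not the register contents."""

    def __init__(self, cwp=0, wim=0b10):
        self.cwp = cwp  # current window pointer
        self.wim = wim  # window invalid mask, one bit per window

    def save(self):
        """Model SAVE: return a trap name, or None if the window switch succeeds."""
        new_cwp = (self.cwp - 1) % NWINDOWS
        if (self.wim >> new_cwp) & 1:
            return "window_overflow"  # CWP is left unchanged on a trap
        self.cwp = new_cwp
        return None

    def restore(self):
        """Model RESTORE: return a trap name, or None if the window switch succeeds."""
        new_cwp = (self.cwp + 1) % NWINDOWS
        if (self.wim >> new_cwp) & 1:
            return "window_underflow"  # CWP is left unchanged on a trap
        self.cwp = new_cwp
        return None


# Window 1 is marked invalid. Starting at window 0, six nested calls fit
# (windows 7, 6, 5, 4, 3, 2); the seventh SAVE would enter window 1 and traps.
w = WindowState(cwp=0, wim=0b10)
traps = [w.save() for _ in range(7)]
print(traps)
print(w.cwp)  # 2
```

On an overflow trap, the handler typically spills a window to the stack and rotates WIM before the SAVE is retried. Underflow is the mirror image on RESTORE.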

Project Report

In addition to submitting your code (using "make submit"), you should also submit a short report (ideally, no more than 3 pages). It should provide a high-level overview of your processor pipeline, the implemented features and details of each pipeline stage—in particular,

  • General flavor of your processor: scalar or super-scalar, out-of-order or in-order, etc.
  • Instruction and data cache details
  • How your caches connect to the main memory (e.g., how you arbitrate between them)
  • Functional units implementations
  • Memory unit implementation
  • How you handle data and control flow dependencies
  • How you implement register window overflow and underflow situations

A well-organized and comprehensive report can substantially improve your project grade. It can help me and the TA identify important aspects of your design that we might overlook otherwise.

You should email me your reports by the project deadline.

Target Instruction Set

Your processor should implement the user-mode (non-privileged) subset of SPARCv8 instructions. You can ignore the ISA subset related to the "Floating Point" instructions, "Alternate Address Spaces", "Ancillary State Registers" and the "Co-Processor" as well as any instruction that is marked as "privileged" in the SPARCv8 manual. Specifically, your processor needs to implement all the instructions (and the requisite architectural state) described in the following sections of the manual:

  1. Load/store instructions: B.1, B.4, B.7, B.8
  2. Arithmetic/logical/shift instructions: B.11, B.12, B.13, B.14, B.15, B.16, B.17, B.18, B.19
  3. Control transfer instructions: B.21, B.24, B.25, B.27
  4. Misc. instructions: B.9, B.10, B.20, B.28 (only RDY), B.29 (only WRY), B.30 (treat as NOP), B.31, B.32 (treat as NOP unless you have caches)
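All of these instructions use one of the three SPARCv8 instruction formats, selected by the two-bit op field in bits 31:30. The Python sketch below extracts the fields of each format from a 32-bit instruction word. It is an illustrative model of the field layout only, not a full decoder and not part of the skeleton code; the function names and the dictionary keys are assumptions.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode(word):
    """Extract the format-specific fields of a 32-bit SPARCv8 instruction word."""
    op = (word >> 30) & 0x3
    if op == 1:
        # Format 1: CALL with a 30-bit word displacement.
        return {"format": 1, "disp30": sign_extend(word & 0x3FFFFFFF, 30)}
    if op == 0:
        # Format 2: SETHI and branches, distinguished by op2.
        op2 = (word >> 22) & 0x7
        fields = {"format": 2, "op2": op2}
        if op2 == 0b100:  # SETHI
            fields["rd"] = (word >> 25) & 0x1F
            fields["imm22"] = word & 0x3FFFFF
        elif op2 == 0b010:  # Bicc (integer condition-code branch)
            fields["a"] = (word >> 29) & 0x1
            fields["cond"] = (word >> 25) & 0xF
            fields["disp22"] = sign_extend(word & 0x3FFFFF, 22)
        return fields
    # Format 3 (op = 2 or 3): arithmetic, logical, load/store, and others.
    fields = {
        "format": 3,
        "op": op,
        "rd": (word >> 25) & 0x1F,
        "op3": (word >> 19) & 0x3F,
        "rs1": (word >> 14) & 0x1F,
        "i": (word >> 13) & 0x1,
    }
    if fields["i"]:
        fields["simm13"] = sign_extend(word & 0x1FFF, 13)
    else:
        fields["rs2"] = word & 0x1F
    return fields


add_fields = decode(0x84006005)  # add %g1, 5, %g2
nop_fields = decode(0x01000000)  # nop, encoded as sethi 0, %g0
print(add_fields)
print(nop_fields)
```

In hardware, the same extraction is just wiring: each field is a fixed bit slice of the instruction word, and only the interpretation of bits 12:0 depends on the i bit.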

Traps and Interrupts

Your processor need not deal with external interrupts. It should treat all "exceptions" (i.e., traps caused by instructions) as precise—no deferred interrupts. Your implementation should correctly check for and generate all the exceptions described in the semantics of your implemented instructions.

Virtual Memory and MMU

Your processor need not include any virtual memory support. Hence, you don't need to implement an MMU and can ignore the issues related to the Address Space Identifiers (ASI).


Project Resources

SPARC Instruction Set

SystemVerilog Documents

Verilator

Waveform Viewers

  • GTKWave: Open-source, multi-platform waveform viewer (also installed on SBRocks)
  • SynaptiCAD WaveViewer: A more powerful and free waveform viewer for both Windows and Linux. Because they contain other tools, some of which are not free, the installation files are rather large.