To become familiar with Unix-style forking and x86 memory management, you will convert the simple fork() implementation in xv6 to a copy-on-write fork(). This will involve writing a trap handler for page faults, augmenting the physical memory management code, and, of course, manipulating page tables.
This assignment does not involve writing that many lines of code (hundreds to a thousand), but the hard part is figuring out what to change and writing careful unit tests for each change. We strongly recommend starting early and writing many test cases.
In these labs, you will progressively build up your kernel. With each new lab, we will provide you with some additional source. Every new lab handout will be distributed on a different branch of the same read-only repository (http://compas.cs.stonybrook.edu/~nhonarmand/courses/fa17/cse306/xv6-labs.git). Lab1 handout will be on branch lab1, Lab2 on branch lab2 and so on.
At the beginning of each lab, you will need to pull the new code for that lab from the repo, and merge it with your code from the previous labs, before you can start working on the new lab. Do the following to pull and merge the Lab 2 code with your existing Lab 1 code (after making sure you have committed your Lab 1 code on branch lab1):
    $ git commit -am "final lab1 commit"
    $ git pull
    $ git checkout -b lab2 origin/lab2
    Branch lab2 set up to track remote branch refs/remotes/origin/lab2.
    Switched to a new branch "lab2"
The git checkout -b command shown above actually does two things: it first creates a local branch lab2 that is based on the origin/lab2 branch provided by us, and second, it changes the contents of your xv6 directory to reflect the files stored on the lab2 branch. Git allows switching between existing branches using git checkout <branch-name>, though you should commit any outstanding changes on one branch before switching to a different one.
Note: In the above commands, we are assuming that the "origin" remote refers to the read-only repository. If you have changed the name of the read-only remote, make sure to use the correct name.
Next, you will need to merge the changes you made in your last lab (lab1) branch into the new (lab2) branch, as follows:
    $ git merge lab1
    Merge made by recursive.
    ...
     x files changed, y insertions(+), z deletions(-)
In some cases, Git may not be able to figure out how to merge your changes with the new lab code (e.g. if you modified some of the code that is changed in the new lab handout). In that case, the git merge command will tell you which files are conflicted, and you should first resolve the conflicts (by editing the relevant files) and then commit the resulting files with git commit -a.
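You will convert the xv6 fork() implementation to use copy-on-write. The current version eagerly copies every page of the parent's address space into the child. With copy-on-write, parent and child instead share the same physical pages until one of them writes to a page, at which point only that page is copied.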
Currently, xv6 does not allow physical page frames to be shared. The first step in copy-on-write support will be adding a reference count to each physical page descriptor.
First, you will need to understand how physical pages are allocated. Begin by reading kalloc.c. Here, each 4KB page of free physical memory is represented as a struct run, and these structures are organized into a free list.
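For orientation, the free-list idea in kalloc.c can be sketched as a small user-space simulation. This is not xv6's code: the sim_ prefixed names, the static array standing in for physical memory, and the absence of kmem.lock are assumptions made so the sketch runs on its own. What it shows is that a free page stores its struct run inside itself, so the free list needs no memory beyond the free pages.

```c
#include <assert.h>
#include <stddef.h>

#define PGSIZE  4096
#define NFRAMES 8   /* hypothetical: size of the simulated physical memory */

/* A free page holds the list link in its own first bytes. */
struct run {
  struct run *next;
};

static struct {
  struct run *freelist;   /* xv6 also keeps a spinlock here; omitted */
} kmem;

/* Simulated physical memory, aligned so each frame starts on a 4KB boundary. */
static _Alignas(PGSIZE) char mem[NFRAMES * PGSIZE];

/* Push a page onto the free list (cf. kfree()). */
void sim_kfree(char *v) {
  assert((size_t)(v - mem) % PGSIZE == 0);
  struct run *r = (struct run *)v;
  r->next = kmem.freelist;
  kmem.freelist = r;
}

/* Pop a page off the free list; NULL when memory is exhausted (cf. kalloc()). */
char *sim_kalloc(void) {
  struct run *r = kmem.freelist;
  if (r)
    kmem.freelist = r->next;
  return (char *)r;
}

/* Put every simulated frame on the free list (cf. freerange()). */
void sim_kinit(void) {
  kmem.freelist = NULL;
  for (int i = 0; i < NFRAMES; i++)
    sim_kfree(mem + i * PGSIZE);
}
```

Because freeing pushes and allocating pops at the head, the list is LIFO: the most recently freed page is the next one handed out.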
Exercise 1. Add a reference count to this page descriptor structure. You should set the count to one when a page is allocated, write a helper function to increment and decrement the count (using appropriate locks or atomic instructions!), and assert that the count is one when a page is freed.
Checking that a page is not in use by more than one process at the time of freeing will help you find bugs later.
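A minimal sketch of the bookkeeping for Exercise 1, assuming one counter per physical frame indexed by physical address >> PGSHIFT. The names ref_init, ref_inc, ref_dec, and ref_on_free are hypothetical, and the sketch uses C11 atomics so it can run in user space; inside xv6 you would call such helpers from kalloc() and kfree() with V2P(v) as the address, and could protect the counts with kmem.lock instead. The counts live in a separate array here because, in xv6, a struct run occupies the first bytes of the free page itself, so anything stored there is overwritten once the page is allocated and used.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define PGSHIFT 12
#define PGSIZE  (1u << PGSHIFT)
#define NFRAMES 64   /* hypothetical: number of simulated physical frames */

/* One reference count per physical frame, indexed by physical address >> PGSHIFT. */
static atomic_int refcount[NFRAMES];

static unsigned frame_index(uint32_t pa) {
  assert(pa % PGSIZE == 0);
  assert((pa >> PGSHIFT) < NFRAMES);
  return pa >> PGSHIFT;
}

/* Hook for kalloc(): a freshly allocated frame has exactly one owner. */
void ref_init(uint32_t pa) {
  atomic_store(&refcount[frame_index(pa)], 1);
}

/* Add a sharer; atomic_fetch_add returns the old value, so add 1 for the new count. */
int ref_inc(uint32_t pa) {
  return atomic_fetch_add(&refcount[frame_index(pa)], 1) + 1;
}

/* Drop a sharer; returns the new count. */
int ref_dec(uint32_t pa) {
  int old = atomic_fetch_sub(&refcount[frame_index(pa)], 1);
  assert(old > 0);   /* decrementing a free frame is a bug */
  return old - 1;
}

/* Hook for kfree(): only a frame with a single owner may return to the free list. */
void ref_on_free(uint32_t pa) {
  assert(atomic_load(&refcount[frame_index(pa)]) == 1);
  atomic_store(&refcount[frame_index(pa)], 0);
}
```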
The second building block you will need for copy-on-write fork is the ability to catch page faults. Read trap.c: you will see how trap() dispatches on the trap number, handling system calls (T_SYSCALL) and several hardware interrupts, and falling back to a default case that kills a misbehaving user process. You will add a handler for T_PGFLT in the trap() function. Initially, this handler can simply replicate the default behavior, but you will later extend it to handle writes to copy-on-write pages.
Exercise 2. Register a handler for page faults that prints a slightly different error message from the default one. Write a unit test that deliberately accesses an invalid address, and verify that your handler is the one that kills the process.
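A sketch of the dispatch for Exercise 2, written as a self-contained simulation. The trimmed trapframe and proc structures, sim_trap(), pgflt_handler(), the sim_cr2 variable standing in for xv6's rcr2(), and the PF_* macro names are all hypothetical; the bit meanings (bit 0 protection vs. not-present, bit 1 write, bit 2 user mode) are those the x86 pushes as the error code for vector 14. A real handler should also keep xv6's behavior of panicking when the fault comes from kernel mode.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define T_PGFLT 14   /* x86 page-fault vector (xv6 names it T_PGFLT in traps.h) */

/* Page-fault error-code bits pushed by the CPU; the macro names are hypothetical. */
#define PF_PRESENT 0x1  /* 1: protection violation on a present page; 0: not present */
#define PF_WRITE   0x2  /* 1: the faulting access was a write */
#define PF_USER    0x4  /* 1: the fault happened in user mode */

/* Trimmed-down stand-ins for xv6's struct trapframe and struct proc. */
struct trapframe { uint32_t trapno, err, eip; };
struct proc { int pid; int killed; };

/* Stand-in for rcr2(): on x86 the faulting linear address is in %cr2. */
static uint32_t sim_cr2;

static void pgflt_handler(struct proc *p, struct trapframe *tf) {
  printf("pid %d: page fault on %s of va 0x%x (%s), eip 0x%x; killing process\n",
         p->pid, (tf->err & PF_WRITE) ? "write" : "read", (unsigned)sim_cr2,
         (tf->err & PF_PRESENT) ? "protection" : "not present", (unsigned)tf->eip);
  p->killed = 1;   /* same outcome as xv6's default case for user-mode faults */
}

void sim_trap(struct proc *p, struct trapframe *tf) {
  switch (tf->trapno) {
  case T_PGFLT:
    pgflt_handler(p, tf);
    break;
  default:
    /* Other traps and interrupts keep their existing handling (omitted here). */
    break;
  }
}
```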
The main part of the assignment is changing the fork implementation. We highly recommend that you keep the old version for easy comparison and debugging, for example by selecting between the two versions with a #define.
Begin by reading and understanding the default fork() implementation. fork() is implemented in proc.c (sys_fork() in sysproc.c simply calls it), but most of the work is done by copyuvm(), defined in vm.c. Before you write any code, make sure you completely understand what this function and its helpers do. The following paragraphs give some explanation, as do the relevant Intel manual sections referenced below.
Exercise 3. Look at chapters 5 and 6 of the Intel 80386 Reference Manual, if you haven't done so already. Read the sections about page translation and page-based protection closely (5.2 and 6.4). We recommend that you also skim the sections about segmentation; while xv6 uses paging for virtual memory and protection, segment translation and segment-based protection cannot be disabled on the x86, so you will need a basic understanding of it.
In x86 terminology, a virtual address consists of a segment selector and an offset within the segment. A linear address is what you get after segment translation but before page translation. A physical address is what you finally get after both segment and page translation and what ultimately goes out on the hardware bus to your RAM.
               Selector  +--------------+         +-----------+
              ---------->|              |         |           |
                         | Segmentation |         |  Paging   |
    Software             |              |-------->|           |---------->  RAM
                Offset   |  Mechanism   |         | Mechanism |
              ---------->|              |         |           |
                         +--------------+         +-----------+
                Virtual                   Linear                Physical
A C pointer is the "offset" component of the virtual address.
In vm.c, xv6 installs a Global Descriptor Table (GDT) that effectively makes segment translation a no-op by setting all segment base addresses to 0 and limits to 0xffffffff. Hence the "selector" has no translation effect, and the linear address always equals the offset of the virtual address. So we can ignore segmentation throughout the xv6 labs and focus solely on page translation.
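The translation pipeline in the diagram can be modeled in a few lines. This sketch assumes a tiny simulated physical memory (four 4KB frames) and hypothetical helper names (seg_translate, page_translate, translate, frame); the PDX/PTX/PTE_ADDR formulas match xv6's mmu.h. With xv6's flat segment (base 0, limit 0xffffffff), seg_translate() returns the offset unchanged, so only the two-level page walk affects the result. Permission bits other than present are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Field extraction for x86 32-bit two-level paging (same formulas as xv6's mmu.h). */
#define PDX(la)     (((uint32_t)(la) >> 22) & 0x3FF)   /* page-directory index */
#define PTX(la)     (((uint32_t)(la) >> 12) & 0x3FF)   /* page-table index */
#define PGOFF(la)   ((uint32_t)(la) & 0xFFF)           /* offset within the page */
#define PTE_ADDR(e) ((uint32_t)(e) & ~0xFFFu)          /* frame address in an entry */
#define PTE_P       0x001

/* A segment reduced to the two fields that matter for translation. */
struct seg { uint32_t base, limit; };

/* Segment translation: linear = base + offset (limit check only). */
static int seg_translate(struct seg s, uint32_t off, uint32_t *la) {
  if (off > s.limit) return -1;   /* real hardware would raise #GP */
  *la = s.base + off;
  return 0;
}

#define NFRAMES 4
static uint32_t physmem[NFRAMES * 1024];   /* four simulated 4KB frames, as 32-bit words */

/* Pointer to the frame at simulated physical address pa. */
static uint32_t *frame(uint32_t pa) {
  assert(pa < NFRAMES * 4096u);
  return &physmem[pa / 4];
}

/* Two-level walk from the page directory at cr3; -1 if either level is not present. */
static int page_translate(uint32_t cr3, uint32_t la, uint32_t *pa) {
  uint32_t pde = frame(cr3)[PDX(la)];
  if (!(pde & PTE_P)) return -1;
  uint32_t pte = frame(PTE_ADDR(pde))[PTX(la)];
  if (!(pte & PTE_P)) return -1;
  *pa = PTE_ADDR(pte) | PGOFF(la);
  return 0;
}

/* Virtual (segment, offset) -> linear -> physical. */
int translate(struct seg s, uint32_t cr3, uint32_t off, uint32_t *pa) {
  uint32_t la;
  if (seg_translate(s, off, &la) < 0) return -1;
  return page_translate(cr3, la, pa);
}
```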
Exercise 4. While GDB can only access QEMU's memory by virtual address, it's often useful to be able to inspect physical memory while setting up virtual memory. Review the QEMU monitor commands from the lab tools guide, especially the xp command, which lets you inspect physical memory. To access the QEMU monitor, press Ctrl-a c in the terminal (the same binding returns to the serial console).
Use the xp command in the QEMU monitor and the x command in GDB to inspect memory at corresponding physical and virtual addresses and make sure you see the same data.
QEMU also provides an info mem command that shows an overview of which ranges of virtual memory are mapped and with what permissions.
From code executing on the CPU, once we're in protected mode (which we entered first thing in bootasm.S), there's no way to directly use a physical address. All memory references are interpreted as virtual addresses and translated by the MMU, which means all pointers in C are virtual addresses.
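Since the kernel cannot dereference physical addresses either, xv6 maps all physical memory from 0 to PHYSTOP at virtual address KERNBASE and converts between the two with the V2P() and P2V() macros in memlayout.h. Copy-on-write code typically needs both directions: V2P() to turn a kalloc()'d pointer into a physical address (for a PTE or a reference-count index), and P2V() to reach a frame's contents (for example, when copying a page). The integer-only functions below mirror those macros so the sketch can run as an ordinary 64-bit process; the lower-case names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define KERNBASE 0x80000000u   /* xv6: first kernel virtual address (memlayout.h) */
#define PHYSTOP  0x0E000000u   /* xv6: top of the physical memory it manages */

/* Integer mirror of V2P(): kernel virtual address -> physical address. */
uint32_t v2p(uint32_t va) {
  assert(va >= KERNBASE);
  return va - KERNBASE;
}

/* Integer mirror of P2V(): physical address -> kernel virtual address. */
uint32_t p2v(uint32_t pa) {
  assert(pa < PHYSTOP);
  return pa + KERNBASE;
}
```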
You will need to define a page table flag for copy-on-write. The x86 architecture leaves three bits of each page table entry (bits 9 through 11, i.e., 0x200, 0x400, and 0x800) available for use by the OS; we recommend 0x800. Define this flag as PTE_COW in mmu.h. Note that the CPU does not interpret this flag; it is there only so that your code can identify COW pages.
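A small sketch of how PTE_COW might be applied to an entry. The hardware flag values match xv6's mmu.h, PTE_COW is the recommended 0x800, and pte_make_cow() and pte_is_cow() are hypothetical helper names. One design choice shown here is that only writable entries become COW: a page that was read-only before fork stays plain read-only, so a write to it still faults as a genuine error.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t pte_t;

#define PTE_P   0x001   /* present */
#define PTE_W   0x002   /* writeable */
#define PTE_U   0x004   /* user-accessible */
#define PTE_COW 0x800   /* software-only: bit 11, ignored by the CPU */

#define PTE_ADDR(pte) ((uint32_t)(pte) & ~0xFFFu)

/* Turn a writable mapping into a read-only COW mapping of the same frame. */
pte_t pte_make_cow(pte_t pte) {
  if (pte & PTE_W)
    pte = (pte & ~(pte_t)PTE_W) | PTE_COW;
  return pte;
}

/* True if a write fault on this entry should be resolved by copying. */
int pte_is_cow(pte_t pte) {
  return (pte & (PTE_P | PTE_COW)) == (PTE_P | PTE_COW);
}
```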
Exercise 5. Implement a variant of copyuvm called cowuvm that does the following:
Note that your cowuvm implementation will not need to allocate new page frames using kalloc() for the process memory. Rather, this will be done lazily in the page fault handler.
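One possible shape for cowuvm, shown as a user-space simulation with a single-level page table, a plain int reference-count array, and hypothetical names (sim_cowuvm, struct pgtab). It is an illustration under those assumptions, not the required list of steps: it shares each present frame with the child, removes write permission on both sides while tagging the entry with PTE_COW, and increments the frame's reference count. It omits the locking and the page-table-page allocation (via walkpgdir() and mappages() in vm.c) that a kernel version would need, and, unlike copyuvm(), it skips non-present entries instead of panicking.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t pte_t;
#define PGSHIFT 12
#define PTE_P   0x001
#define PTE_W   0x002
#define PTE_U   0x004
#define PTE_COW 0x800
#define PTE_ADDR(pte) ((uint32_t)(pte) & ~0xFFFu)

#define NVPAGES 8    /* hypothetical: user pages per simulated address space */
#define NFRAMES 16   /* hypothetical: simulated physical frames */

/* A one-level "page table" is enough to show the bookkeeping. */
struct pgtab { pte_t pte[NVPAGES]; };

static int refcount[NFRAMES];   /* one count per physical frame */

/* Share every present parent page with the child; no data pages are allocated. */
void sim_cowuvm(struct pgtab *parent, struct pgtab *child) {
  for (int i = 0; i < NVPAGES; i++) {
    pte_t pte = parent->pte[i];
    if (!(pte & PTE_P)) {
      child->pte[i] = 0;
      continue;
    }
    if (pte & PTE_W) {
      /* Both sides must fault on their next write, so the parent loses PTE_W too. */
      pte = (pte & ~(pte_t)PTE_W) | PTE_COW;
      parent->pte[i] = pte;
    }
    child->pte[i] = pte;                  /* same frame, same flags */
    uint32_t f = PTE_ADDR(pte) >> PGSHIFT;
    assert(f < NFRAMES);
    refcount[f]++;                        /* one more sharer */
  }
  /* In the kernel, the parent's TLB must now be flushed (e.g., by reloading %cr3
     with lcr3()), or it could keep writing through stale writable translations. */
}
```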
Finally, you will need to implement COW support in the page fault handler you registered above. This handler will need to:
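As an illustration only (one possible approach, not the official list of requirements), the core of a COW write-fault path could look like the simulation below. It assumes the faulting entry has already been located (in the kernel, from the address in %cr2 via a page-directory walk such as vm.c's walkpgdir()), uses a static array as physical memory and a trivial allocator, and the names cow_fault() and sim_kalloc_frame() are hypothetical. If the frame has other sharers, the page is copied into a new frame and the old count is decremented; if this process is the last sharer, the entry is simply made writable again.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t pte_t;
#define PGSHIFT 12
#define PGSIZE  (1u << PGSHIFT)
#define PTE_P   0x001
#define PTE_W   0x002
#define PTE_COW 0x800
#define PTE_ADDR(pte)  ((uint32_t)(pte) & ~0xFFFu)
#define PTE_FLAGS(pte) ((uint32_t)(pte) & 0xFFFu)

#define NFRAMES 8   /* hypothetical: simulated physical frames */

static unsigned char physmem[NFRAMES][PGSIZE];   /* frame n is "physical" n << PGSHIFT */
static int refcount[NFRAMES];

/* Trivial stand-in for kalloc(): first frame with count 0, or -1 if none. */
static int sim_kalloc_frame(void) {
  for (int f = 0; f < NFRAMES; f++)
    if (refcount[f] == 0) { refcount[f] = 1; return f; }
  return -1;
}

/* Resolve a write fault on *ptep: 0 if handled, -1 if the process should be killed. */
int cow_fault(pte_t *ptep) {
  pte_t pte = *ptep;
  if (!(pte & PTE_P) || !(pte & PTE_COW))
    return -1;                                 /* a genuine protection error */
  int old = (int)(PTE_ADDR(pte) >> PGSHIFT);
  assert(old < NFRAMES);
  uint32_t flags = (PTE_FLAGS(pte) & ~(uint32_t)PTE_COW) | PTE_W;

  if (refcount[old] == 1) {                    /* last sharer: reuse the frame */
    *ptep = PTE_ADDR(pte) | flags;
    return 0;
  }
  int f = sim_kalloc_frame();
  if (f < 0)
    return -1;                                 /* out of memory */
  memcpy(physmem[f], physmem[old], PGSIZE);    /* kernel would use P2V() on both frames */
  refcount[old]--;                             /* drop this process's share */
  *ptep = ((uint32_t)f << PGSHIFT) | flags;
  /* Kernel: flush the TLB entry for the faulting address before returning. */
  return 0;
}
```

In the kernel, checking the count and decrementing it must happen under the same lock (or atomically); otherwise two processes faulting on the same shared frame at the same time could race on its count.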
At this point, you should be able to pass forktest as well as all of the provided usertests. Of course, we also want you to write additional unit tests.
Exercise 6. Write at least 3 additional unit tests for fork. Extra credit is possible for particularly clever or tricky ways to detect edge cases in fork. Thorough testing is very important going forward, as bugs in fork can cause other problems in future assignments.
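One kind of test checks isolation: after fork(), writes by either process must be invisible to the other, and the child must still see data the parent wrote before forking. The sketch below is written against POSIX (fork(), waitpid(), _exit()) so that it can run on an ordinary host; an xv6 version would use xv6's user library instead (fork(), wait() and exit() with no arguments, printf(1, ...)). The buffer spans several pages so that each page is handled separately by COW; cow_isolation_test() and the TEST_* constants are hypothetical names.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define TEST_PAGES  4
#define TEST_PGSIZE 4096

/* Spans several pages so that each page is a separate COW unit. */
static char buf[TEST_PAGES * TEST_PGSIZE];

/* Returns 0 if parent and child saw isolated copies of buf after fork(). */
int cow_isolation_test(void) {
  memset(buf, 'P', sizeof buf);              /* parent's data, written before fork */
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    /* Child: check it inherited the parent's data, then write to every page. */
    for (size_t i = 0; i < sizeof buf; i++)
      if (buf[i] != 'P') _exit(1);
    for (size_t i = 0; i < sizeof buf; i += TEST_PGSIZE)
      buf[i] = 'C';
    for (size_t i = 0; i < sizeof buf; i += TEST_PGSIZE)
      if (buf[i] != 'C') _exit(2);
    _exit(0);
  }
  /* Parent: write one page concurrently, then confirm the child's writes are invisible. */
  buf[TEST_PGSIZE] = 'Q';
  int status;
  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
    return -2;
  for (size_t i = 0; i < sizeof buf; i += TEST_PGSIZE) {
    char want = (i == TEST_PGSIZE) ? 'Q' : 'P';
    if (buf[i] != want) return -3;
  }
  return 0;
}
```

Because it only checks observable behavior, this test should also pass with the original copying fork(), which makes it a useful regression test when switching between the two implementations.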
Aside from testing the proper functionality of your code, we will also evaluate the quality of your code. Be sure to use a consistent style, document your code well, and break it into separate functions and/or source files where it makes sense.
Your code must also be clean enough to compile with "gcc -Wall -Werror" without any errors or warnings!
If several source files need common definitions, do not duplicate them. Put the shared declarations in a header file and #include it wherever they are needed.
This completes the lab.
You must include a file named README-lab2 with this assignment. The file should describe what you did, what approach you took, results of any measurements you made, which new files are included in your submission and what they are for, etc. Feel free to include any other information you think is helpful to us in this file; it can only help your grade.
If you are handing in late, add an entry to the file slack.txt noting how many late hours you have used both for this assignment and in total. (This is to help us agree on the number that you have used.) This file should contain a single line formatted as follows (where n is the number of late hours):
late hours taken: n
If you submit multiple times, we will take the latest submission and count late hours accordingly.
To submit your lab, type make handin in the xv6 directory. If submission fails, please double check that you have committed all of your changes, and read any error messages carefully before emailing the course staff for help.