Fall 2017 :: CSE 306 — Operating Systems

Lab 4 Introduction

In this lab, we will add inter-process shared memory and synchronization primitives (locks and condition variables) to xv6. We will do so in three steps. First, we will add a mechanism that allows multiple processes to communicate with each other through shared memory. For example, if a shared buffer is to be used by producer and consumer processes, it can be allocated in this shared memory, along with the locks and condition variables those processes need to coordinate their access to it.

Next, we would like to implement a library of synchronization primitives that lets our communicating processes do so safely. To do this efficiently, we will need some help from the kernel to put a process to sleep until the lock or condition variable it is waiting on is released or signaled, respectively. In class, we saw how to do that using the Solaris-inspired system calls park(), unpark(), and setpark(). In this lab, we will instead use a mechanism similar to Linux's futex.

Armed with proper kernel support, we will then roll our own (reasonably) efficient synchronization library: a user-mode library providing locks — which we call mutexes — and condition variables.

As in previous labs, this assignment does not involve writing many new lines of code. The difficulty is figuring out what to change in xv6 and writing careful unit tests for each change. We strongly recommend starting early, carefully thinking about — and vetting — the changes before developing any code. Also, write as many test cases as you can.


Getting the New Code

Do the following to pull and merge the Lab 4 code with your existing Lab 3 code (after making sure you have committed your Lab 3 code on branch lab3):

$ git commit -am "final lab3 commit"
$ git pull
$ git checkout -b lab4 origin/lab4
Branch lab4 set up to track remote branch refs/remotes/origin/lab4.
Switched to a new branch "lab4"

The git checkout -b command shown above actually does two things: it first creates a local branch lab4 that is based on the origin/lab4 branch provided by us, and second, it changes the contents of your xv6 directory to reflect the files stored on the lab4 branch. Git allows switching between existing branches using git checkout <branch-name>, though you should commit any outstanding changes on one branch before switching to a different one.

Note: In the above commands, we are assuming that the "origin" remote refers to the read-only repo. If you have changed the name of the read-only remote, make sure to use the correct name.

Next, you will need to merge the changes you made in your last lab (lab3) branch into the new (lab4) branch, as follows:

$ git merge lab3
Merge made by recursive.
 ...
 x files changed, y insertions(+), z deletions(-)

In some cases, Git may not be able to figure out how to merge your changes with the new lab code (e.g. if you modified some of the code that is changed in the new lab handout). In that case, the git merge command will tell you which files are conflicted, and you should first resolve the conflicts (by editing the relevant files) and then commit the resulting files with git commit -a.


Part 1: Inter-Process Shared Memory

As locks and condition variables are just memory objects shared between communicating tasks (in our case, processes), we first need to add some sort of inter-process memory-sharing mechanism to xv6. There are several standard Unix APIs for this purpose: examples include the POSIX API, which uses shm_open() and mmap() to create and get access to shared memory objects, and the System V API, which uses shmget() and shmat() to do the same. To keep things simple, since we are more interested in the synchronization part of this lab, we will roll our own mechanism.

The idea is to set aside a piece of the process address space for shared memory. We will call it the SHM area; it will be located between the stack and the heap. Also, similar to the sbrk() system call that allocates physical pages for the heap, we will add a new shmbrk() system call to allocate SHM pages.

SHM pages are used to enable inter-process communication between a group of processes. Therefore, each process in that group should be able to see the writes of other processes to those pages. This means all processes in that group should have the same physical pages mapped to their address spaces, and not different or COW pages.

To keep things simple, we use the following protocol to create a group of communicating processes: all processes in a group are directly or indirectly forked from a single common parent. That parent is in charge of allocating SHM memory by calling shmbrk(). When a parent forks, the child inherits the parent's SHM area and all of the pages in it. Such SHM pages are not marked COW during the fork, because we want them shared among the processes; only their reference counts increase.

Exercise 1. Implement the SHM mechanism explained above.

  1. During exec(), set aside MAX_SHM bytes of address space for the SHM area after the stack VMA. Initially, there will be no physical pages mapped to this range. Physical pages will be mapped here using the shmbrk system call. To keep track of how much allocated space there is in the SHM area, you need to add two new fields to struct proc: one to mark the first byte of the SHM area, and one to mark the last valid (allocated) location. In this lab, we will use the term SHM break to refer to the first virtual address after the allocated part of a process's SHM area.
  2. Implement the shmbrk() system call. The system call takes a single integer argument, say n, and returns a virtual address. The returned virtual address is always the process's SHM break before the call takes effect (i.e., the old SHM break). If n is zero, returning this value is all that shmbrk() does. If n is positive, shmbrk() expands the allocated part of the SHM area by at least n bytes; if the expansion would overflow the SHM area, shmbrk() returns −1 to indicate failure. If n is negative, shmbrk() releases all pages in the process's SHM area (i.e., decrements their reference counts and returns them to the free list if no references remain).
  3. During fork(), any page in the SHM area should be directly inherited by the child — i.e., not COWed — to enable inter-process shared memory. There are two ways to identify such a page during fork(): either check its address to see if it is in the SHM area, or use a new user-defined PTE flag to mark SHM pages (similar to the PTE_COW flag). You can call this flag PTE_SHM; we suggest using PTE bit 0x400 for it.
  4. Think about what happens if a process makes a system call — e.g., read() — using a memory address in the SHM area. Make the necessary changes in xv6 (in particular, the arg*() functions) accordingly.

Exercise 2. Write a few test cases to exercise different aspects of your SHM implementation. In particular, make sure the SHM pages are properly inherited even after multiple levels of forking, to allow large groups of processes to talk to each other.


Part 2: In-Kernel Support for Synchronization Primitives

The SHM support above allows processes to share memory. They can put their shared data structures and synchronization objects (locks and condition variables) in that shared memory. However, to support efficient synchronization, we still need more kernel support. In particular, we need a mechanism for processes to put themselves to sleep when a resource (e.g., a lock) is not available, and to be woken up by other processes when the resource becomes available.

As discussed in the class lectures, the two operations of 1) checking a resource's availability, and 2) putting oneself to sleep if the resource is not available, must be made atomic with respect to the operation of waking up the sleepers (by whoever is holding the resource), to avoid race conditions that result in indefinite sleeps. In class, we saw, for example, how to build a blocking lock using a pair of system calls (park() and unpark()) to enable sleep/wakeup, plus a spinlock and another system call (setpark()) to solve the race problem.

In this lab, we use a different flavor of system calls inspired by Linux's futex to add the necessary support in xv6. We will add two new system calls for this purpose:

  • futex_wait(int *loc, int val): Checks the value in memory location loc; if it equals val, it puts the current process to sleep. The process will be woken up when a corresponding futex_wake() call is made.
  • futex_wake(int *loc): Wakes up all the processes that are sleeping on location loc.

The key aspect of futex is that checking loc for val and putting the current process to sleep in futex_wait() must happen atomically with respect to the operation of waking up sleeping processes in futex_wake(). In analogy to atomic compare-and-swap, you can think of futex_wait() as an atomic compare-and-block operation.

Exercise 3. Implement futex_wait() and futex_wake().

  1. The xv6 kernel has a mechanism to put processes to sleep on a so-called "channel". The relevant functions are sleep() and wakeup() in proc.c. (Note: this sleep function is internal to the kernel and is different from the sleep system call.) They are explained in detail in Chapters 4 and 5 of the xv6 book. Read the book and the code in proc.c, and study how these functions are used in the rest of the kernel, to figure out how to put processes to sleep and wake them up.
  2. Implement futex_wait(int *loc, int val) using sleep(). If the check succeeds and the process is put to sleep, futex_wait() returns 0 (after waking up, of course). Otherwise, it returns −1 immediately.
    The key point in a correct implementation of futex_wait() is figuring out how to make "checking loc" and "putting the process to sleep" atomic with respect to calls to futex_wake(). You can achieve this atomicity by acquiring a particular lock before checking loc. Can you identify which lock it is?
  3. Implement futex_wake(int *loc) using wakeup(). This should wake up all processes that are sleeping as a result of calling futex_wait() on the same loc.
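Putting the steps together, the overall shape of the two handlers looks roughly like the pseudocode below. This is a sketch, not working code: L stands for the lock the previous step asks you to identify, and safely fetching the integer from the user address loc is left to you.

```
sys_futex_wait(loc, val):
    acquire(L)          # makes the check + sleep atomic w.r.t. futex_wake
    if *loc != val:
        release(L)
        return -1       # value already changed; do not sleep
    sleep(loc, L)       # xv6 sleep(): releases L, sleeps on channel loc,
                        # and reacquires L when woken up
    release(L)
    return 0

sys_futex_wake(loc):
    wakeup(loc)         # wakes every process sleeping on channel loc
    return 0
```

Note that loc is a user virtual address; think about why it nevertheless works as a channel identifier across all the processes of a SHM group in our design.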

Exercise 4. Run the provided test program futextest.c to test your implementation. You should not get a deadlock. Also, using the code in futextest.c as an example, develop your own test programs to play with these system calls and familiarize yourself with different corner cases that can arise with them.


Part 3: Locks and Condition Variables

Armed with SHM and futexes, we are ready to roll our own user-mode inter-process synchronization library. The library will support two primitives: mutexes (locks) and condition variables. You can see the prototypes of the functions you need to implement in user.h. It helps to review the relevant sections of the OSTEP book (locks and condition variables) before starting on your synchronization library.

Our mutex is a blocking lock: when the lock is not available, mutex_lock() should put the current process to sleep using futex_wait(). When the lock becomes available, mutex_unlock() should wake up all processes waiting for that mutex (using futex_wake()). mutex_trylock() attempts to acquire the lock without blocking: if the lock is available, it acquires it and returns 0; otherwise, it returns −1 but does not put the process to sleep.

Our condition variable is similar to what we discussed in class. The only difference is that there is no signal function, just a broadcast function (cv_bcast). This is because our futex wakes up everyone sleeping on a given location, instead of just one process. Other than that, our condition variables adhere to the same semantics as in the class lectures:

  • cv_wait(cond_var_t *cv, mutex_t *m) waits on cv (using futex_wait()) until woken up. It should atomically release the mutex m when going to sleep. Here, atomicity is with respect to corresponding calls to cv_bcast(). After waking up, cv_wait() should reacquire m before returning.
  • cv_bcast(cond_var_t *cv) uses futex_wake() to awaken all processes sleeping on cv.

Exercise 5. Implement the mutex functionality. You decide what field(s) the mutex_t structure needs. To avoid weird behavior caused by compiler optimizations on these fields, you should declare them as volatile. Hint: you don't need more than one integer. If you need to atomically read-modify-write a variable in your implementation, you can use GCC's atomic builtins, which are documented in the GCC manual. You can also use a spinlock to do the same, but atomics are much more elegant.

Exercise 6. Implement condition variable functionality. Again, the field(s) of cond_var_t should be declared as volatile and it is possible to implement the condition variable with just one field. Atomic builtins are your friends here, and you will probably need something more powerful than test-and-set. You can also use a spinlock, but atomics are much more elegant.

Exercise 7. Write a test program called checksum.c to test your implementation. The program will emulate a producer/consumer situation. The program initially allocates some SHM memory to be shared between producer processes and consumer processes. Then, it forks 4 producer processes and 4 consumer processes, and it waits for all of them to exit before exiting itself.

Each producer opens the file "README" that is part of the xv6 image, reads a chunk of up to 12 bytes from the file, appends the chunk to the tail of a shared circular buffer located in the SHM area, and repeats this in a loop until EOF is reached. Each consumer process takes up to 8 unconsumed bytes from the head of the circular buffer and adds these bytes, one at a time, to a variable local to the consumer. In other words, each consumer's local variable holds the byte-level checksum of all the bytes that consumer has taken from the buffer. The consumers should exit when all the producers are done producing and the buffer is empty. At the end, each consumer should add its calculated checksum to a global checksum that will eventually hold the checksum of all the bytes read by all the producers.

If the buffer is full, producers should wait until the buffer becomes non-full, and if it is empty, consumers should wait until it becomes non-empty.

You should use the SHM area to keep at least the following items: 1) the circular buffer of size 128 bytes, 2) the head and tail pointers for the circular buffer, 3) the global checksum value that gets updated by the consumers at the end of each consumer process, and 4) any mutexes and condition variables needed to protect the shared data structures and coordinate producers and consumers. With condition variables, keep in mind that when calling cv_bcast() your code should have already acquired the same mutex that was used in the corresponding cv_wait(), i.e., the mutex that protects the shared data structure upon which the condition depends.

Since addition is commutative and associative, it does not matter in what order the bytes are read, the processes are interleaved, or the additions are performed. If the code is correct, the final result will always be the same: the byte-level checksum of 4 copies of the README file. You can calculate this value offline and compare it against the final global checksum to test the correctness of your program.


Hand-In Procedure

This completes the lab.

You must include a file named README-lab4 with this assignment. The file should describe what you did, what approach you took, results of any measurements you made, which new files are included in your submission and what they are for, etc. Feel free to include any other information you think is helpful to us in this file; it can only help your grade.

If you are handing in late, add an entry to the file slack.txt noting how many late hours you have used both for this assignment and in total. (This is to help us agree on the number that you have used.) This file should contain a single line formatted as follows (where n is the number of late hours):

late hours taken: n

If you submit multiple times, we will take the latest submission and count late hours accordingly.

To submit your lab, type make handin in the xv6 directory. If submission fails, please double check that you have committed all of your changes, and read any error messages carefully before emailing the course staff for help.