Fall 2017 :: CSE 306 — Operating Systems

Lab 5 Introduction

In this lab, you will implement the core performance optimizations of the Unix Fast File System (FFS), including block groups and the large file exception, in the xv6 file system. The current xv6 file system uses a simple layout that places the superblock and the journal log at the front of the disk, followed by all the metadata blocks (the inodes and the free-block bitmap), followed by the data blocks. This lab will implement a more performant variant of this design, using ideas from FFS.

Unix FFS Background

The absolute first step for this lab is to understand the Fast File System (FFS) design. If you have not already, please read Chapter 41 of the course textbook. You will be implementing several features of this design in xv6.

xv6 File System Background

A second essential task before starting is to read Chapter 6 of the xv6 book. This chapter explains the basics of how the xv6 file system is implemented and contains a number of useful code pointers and explanations that will be invaluable in completing the assignment.

The current xv6 file system is basic and functional, but will not get good performance on a real disk. Similar to the basic strawman in the lecture slides, xv6 places all of the metadata at the front of the disk, followed by the data blocks. Thus, there is guaranteed to be a large seek between reading an inode and reading data. A technique such as FFS block groups can reduce the likelihood of seeks by placing inodes and data blocks relatively close to each other on disk. Similarly, the xv6 file system simply picks the first free inode and block, rather than making an attempt to place contents of a file together.


Getting the New Code

Do the following to pull and merge the Lab 5 code with your existing Lab 4 code (after making sure you have committed your Lab 4 code on branch lab4):

$ git commit -am "final lab4 commit"
$ git pull
$ git checkout -b lab5 origin/lab5
Branch lab5 set up to track remote branch refs/remotes/origin/lab5.
Switched to a new branch "lab5"

The git checkout -b command shown above actually does two things: it first creates a local branch lab5 that is based on the origin/lab5 branch provided by us, and second, it changes the contents of your xv6 directory to reflect the files stored on the lab5 branch. Git allows switching between existing branches using git checkout <branch-name>, though you should commit any outstanding changes on one branch before switching to a different one.

Note: In the above commands, we are assuming that the "origin" remote refers to the read-only repo. If you have changed the name of the read-only remote, make sure to use the correct name.

Next, you will need to merge the changes you made in your last lab (lab4) branch into the new (lab5) branch, as follows:

$ git merge lab4
Merge made by recursive.
 ...
 x files changed, y insertions(+), z deletions(-)

In some cases, Git may not be able to figure out how to merge your changes with the new lab code (e.g. if you modified some of the code that is changed in the new lab handout). In that case, the git merge command will tell you which files are conflicted, and you should first resolve the conflicts (by editing the relevant files) and then commit the resulting files with git commit -a.


Part 1: Block Groups

Your first coding task will be to implement support in xv6 for block groups. The initial xv6 layout is created in mkfs.c, using a hard-coded, but configurable size (FSSIZE) and number of inodes (NINODES).

One can think of the current design as one large block group, with this format:

[ boot block | sb block | log | inode blocks | bit map | data blocks ]

With multiple block groups, the layout would look like this:

[ boot block | sb block | log | inode blocks | bit map | data blocks | inode blocks | bit map | data blocks | ... ]
                               \------------Block Group 1-----------/ \------------Block Group 2-----------/  ...

Exercise 1. Your first task is to modify the file system to stripe the inodes, free data block bitmap, and data blocks across multiple groups. This will require modifying both mkfs.c, which creates the file system image, and the in-kernel FS code that uses the file system at runtime.

You should make the size of each block group a compile-time macro of mkfs.c. A reasonable way to do this is to make sure that the free-block bitmap of each group fits in a single sector. One sector stores 4096 bits (512 bytes * 8 bits/byte).
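As a rough sketch of the sizing arithmetic above, the following assumes xv6's 512-byte block size and assumes that each group's bitmap tracks every block in the group (metadata included), the way the stock xv6 bitmap covers the whole disk. The names GROUP_BLOCKS and ngroups_for are hypothetical and are not part of the handout code.

```c
/* Assumed: 512-byte blocks, as in the stock x86 xv6 tree. */
#define BSIZE 512
#define BPB (BSIZE * 8)      /* bitmap bits per block: 4096 */

/* Hypothetical: the largest group whose bitmap fits in one block. */
#define GROUP_BLOCKS BPB

/* Number of groups needed to cover nblocks blocks, rounding up. */
unsigned int
ngroups_for(unsigned int nblocks)
{
  return (nblocks + GROUP_BLOCKS - 1) / GROUP_BLOCKS;
}
```

A smaller GROUP_BLOCKS is also workable; the only constraint this sketch captures is that one bitmap block can describe at most BPB blocks.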

Similarly, you should pick a reasonable value for the number of inodes in each block group's inode table. Again, you can count how many inodes fit in a single sector, and make the inode count a multiple of that. Also, you should pick some reasonable ratio between the number of data blocks in each group and its inode count.
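One possible way to work out the per-group inode count is sketched below. The struct mirrors the on-disk inode in the stock x86 xv6 fs.h (64 bytes, so 8 inodes per 512-byte block); BLOCKS_PER_INODE and inodes_per_group are hypothetical names, and the ratio of 8 is only an illustrative choice.

```c
#define BSIZE 512
#define NDIRECT 12

/* Mirrors the on-disk inode layout in stock xv6 (64 bytes). */
struct dinode {
  short type;
  short major;
  short minor;
  short nlink;
  unsigned int size;
  unsigned int addrs[NDIRECT + 1];
};

/* Inodes per block. */
#define IPB ((unsigned int)(BSIZE / sizeof(struct dinode)))

/* Hypothetical policy knob: one inode per this many data blocks. */
#define BLOCKS_PER_INODE 8

/* Inodes for a group, rounded up to fill whole inode blocks. */
unsigned int
inodes_per_group(unsigned int data_blocks)
{
  unsigned int want = data_blocks / BLOCKS_PER_INODE;
  unsigned int nblocks = (want + IPB - 1) / IPB;
  return nblocks * IPB;
}
```

Rounding up to a multiple of IPB means no inode block in a group is left partially used.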

Please note that a block group size picked according to the above guidelines may not divide the disk space evenly. In such a case, your last block group may be smaller than the others and your mkfs.c should be able to handle such cases.
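A minimal sketch of how a short last group could be computed is shown below. The function name group_len and its parameters are hypothetical; total is assumed to be the number of blocks left for groups after the boot block, superblock, and log.

```c
/*
 * Length in blocks of group g when `total` blocks are divided into
 * groups of `gsize` blocks. Returns 0 if g lies past the end of the
 * disk. Only the last group can come out shorter than gsize.
 */
unsigned int
group_len(unsigned int total, unsigned int gsize, unsigned int g)
{
  unsigned int start = g * gsize;
  unsigned int rem;

  if (start >= total)
    return 0;
  rem = total - start;
  return rem < gsize ? rem : gsize;
}
```

With this approach, mkfs.c would presumably also need to check that a truncated last group is still large enough to hold its own inode blocks and bitmap, and drop the group otherwise.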

For the kernel file system, you should store the size of each block group in the super block, rather than hard-code this value, so that the kernel can correctly handle multiple file systems with different numbers of block groups.

For simplicity, you are welcome to add assertions to mkfs.c that the number of inodes and blocks must be evenly divisible by the number of block groups, and suggest alternative values to the user that would divide evenly, rather than deal with edge cases where the last block group is not completely full.

Hint: Macros such as IBLOCK() and BBLOCK() in fs.h will be helpful in adjusting how inodes and data blocks are located on disk. You can change them or add new ones if you need to.
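To illustrate, here is one way the macros might be adapted, under several assumptions: the superblock gains three fields (gstart, gsize, ginodes, all hypothetical names), inode numbers are assigned so that each group owns a contiguous range of ginodes inodes, and each group is laid out as inode blocks, then one bitmap block, then data blocks. GSTART, IGROUP, and BGROUP are new helper macros invented for this sketch; IBLOCK and BBLOCK keep the same arguments as the stock versions.

```c
typedef unsigned int uint;

#define BSIZE 512
#define IPB 8   /* inodes per block with the stock 64-byte dinode */

/* Only the group-related fields are shown; all three are
 * hypothetical additions to xv6's struct superblock. */
struct superblock {
  uint gstart;    /* first block of group 0 (just past the log) */
  uint gsize;     /* blocks per group, metadata included */
  uint ginodes;   /* inodes per group, a multiple of IPB */
};

/* First block of group g. */
#define GSTART(g, sb)  ((sb).gstart + (g) * (sb).gsize)
/* Group that owns inode i. */
#define IGROUP(i, sb)  ((i) / (sb).ginodes)
/* Block containing inode i. */
#define IBLOCK(i, sb)  (GSTART(IGROUP(i, sb), sb) + ((i) % (sb).ginodes) / IPB)
/* Group that owns block b; valid only for b >= sb.gstart. */
#define BGROUP(b, sb)  (((b) - (sb).gstart) / (sb).gsize)
/* Bitmap block covering block b (it follows the group's inode blocks). */
#define BBLOCK(b, sb)  (GSTART(BGROUP(b, sb), sb) + (sb).ginodes / IPB)
```

If a group later gains an extra leading block, such as a per-group header, the IBLOCK and BBLOCK offsets in this sketch would each need to grow by one.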

As always, make sure you haven't broken anything by passing the FS-related tests in usertests.c before moving on to the next part.


Part 2: Spreading and Packing Heuristics

FFS included a number of heuristics for maintaining locality, including

  1. placing new directories in one of the least-utilized block groups,
  2. placing files in the same directory in the same block group (when possible), and
  3. chunking and spreading large files across multiple block groups.

Currently, xv6 just places new files and data blocks in the first available inode or block.
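The third heuristic can be sketched as a mapping from a file's logical block index to a target block group. This is only one possible policy: it assumes the blocks reachable through direct pointers stay in the inode's home group and that later blocks move to the next group every CHUNK_BLOCKS blocks. CHUNK_BLOCKS and chunk_group are hypothetical names, and the chunk size of 32 is arbitrary.

```c
typedef unsigned int uint;

#define NDIRECT 12        /* direct pointers in the stock xv6 inode */
#define CHUNK_BLOCKS 32   /* hypothetical chunk size, in blocks */

/*
 * Preferred group for logical block bn of a file whose inode lives
 * in group `home`. Assumes ngroups >= 1 and home < ngroups.
 */
uint
chunk_group(uint home, uint bn, uint ngroups)
{
  if (bn < NDIRECT)
    return home;
  return (home + 1 + (bn - NDIRECT) / CHUNK_BLOCKS) % ngroups;
}
```

The result is only a preference; the allocator would still need a fallback when the preferred group has no free blocks.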

Exercise 2. Implement each of these three heuristics in xv6. Note that this requires keeping some statistics about how full each block group is, so that load can be spread across block groups. Think about what statistics you need and whether they should be persisted to disk. If so, you may want a per-group metadata block, which we will call the group header, to store them. You can place the group header as the first block of each group.
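As an illustration of the kind of statistics a group header might hold, the sketch below defines a hypothetical struct ghdr and uses it for the first heuristic. The selection rule (among groups with at least the average number of free inodes, pick the one with the fewest directories) is one reading of the FFS directory policy, not a required design; every name here is an assumption.

```c
typedef unsigned int uint;

/* Hypothetical contents of a per-group header block. */
struct ghdr {
  uint free_blocks;   /* unallocated data blocks in this group */
  uint free_inodes;   /* unallocated inodes in this group */
  uint ndirs;         /* directories whose inode is in this group */
};

/*
 * Choose a group for a new directory: consider only groups with at
 * least the average number of free inodes, and among those take the
 * one holding the fewest directories. Returns -1 if no group has a
 * free inode.
 */
int
pick_dir_group(const struct ghdr *g, int ngroups)
{
  unsigned long total = 0;
  uint avg;
  int i, best = -1;

  if (ngroups <= 0)
    return -1;
  for (i = 0; i < ngroups; i++)
    total += g[i].free_inodes;
  if (total == 0)
    return -1;
  avg = (uint)(total / (unsigned long)ngroups);
  for (i = 0; i < ngroups; i++) {
    if (g[i].free_inodes == 0 || g[i].free_inodes < avg)
      continue;
    if (best < 0 || g[i].ndirs < g[best].ndirs)
      best = i;
  }
  return best;
}
```

Returning -1 lets the caller report that the disk is out of inodes instead of panicking.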

An important aspect of using heuristics is handling the cases where they don't work neatly, such as when a block group is full and more files need to be added. For instance, if a directory's block group is full, it would be best for all newly created files in that directory to go into the same alternate block group, rather than being scattered across the disk. Similarly, if space in the directory's original block group becomes available later, new files in the directory should again be placed there. You should think through these corner cases and plan how to handle each of them. A kernel panic is not an acceptable response when a heuristic cannot be satisfied.
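One simple fallback that has the properties described above is a deterministic scan starting at the directory's own group. The sketch below is an assumption about how this could be done; pick_file_group and its parameters are hypothetical names.

```c
typedef unsigned int uint;

/*
 * Choose a group for a new file whose parent directory lives in
 * group `parent`. Groups are scanned in a fixed circular order
 * starting at the parent's group, and the first one with a free
 * inode wins. Returns -1 if every group is full or the arguments
 * are out of range.
 */
int
pick_file_group(const uint *free_inodes, int ngroups, int parent)
{
  int k, g;

  if (ngroups <= 0 || parent < 0 || parent >= ngroups)
    return -1;
  for (k = 0; k < ngroups; k++) {
    g = (parent + k) % ngroups;
    if (free_inodes[g] > 0)
      return g;
  }
  return -1;
}
```

Because the scan order depends only on the parent's group, files that overflow from one directory all land in the same alternate group, and placement returns to the parent's group as soon as it has a free inode again. The -1 result is meant to be turned into an ordinary allocation failure by the caller.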

Exercise 3. Write at least three test cases that exercise these heuristics (and block groups in general). These tests should be enough to show that related files and blocks are being placed in the same block group. You may need to add a debugging output mode that can log or print the mapping of the directory hierarchy to block groups. Be sure to document how these tests work and the expected output in your README-lab5 file.
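A test might check placement with a small helper like the one below, which reports whether a set of inode numbers all fall in the same block group. It assumes that each group owns a contiguous range of ginodes inode numbers; all_same_group is a hypothetical name, and how a test gathers the inode numbers (for example via stat) is left open.

```c
typedef unsigned int uint;

/*
 * Returns 1 if all n inode numbers map to the same block group,
 * 0 otherwise. Assumes ginodes > 0 and that inode i belongs to
 * group i / ginodes. An empty or single-element list counts as
 * "same group".
 */
int
all_same_group(const uint *inums, int n, uint ginodes)
{
  int i;

  for (i = 1; i < n; i++)
    if (inums[i] / ginodes != inums[0] / ginodes)
      return 0;
  return 1;
}
```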


Hand-In Procedure

This completes the lab.

You must include a file named README-lab5 with this assignment. The file should describe the details of your block groups and how you changed the FS format to accommodate them. Details such as block group size and inode count, group header content, block group format, and changes to superblock content are necessary to help us understand your design. You should also explain how you implemented the FFS heuristics and how you handle the corner cases. Feel free to include any other information you think is helpful to us in this file; it can only help your grade.

There will be no late submissions for this lab. The deadline is hard.

If you submit multiple times, we will take the latest submission and count late hours accordingly.

To submit your lab, type make handin in the xv6 directory. If submission fails, please double check that you have committed all of your changes, and read any error messages carefully before emailing the course staff for help.