If you are pursuing a systems-related degree, you are strongly encouraged to take this course.

Course Overview

This is a new course that examines the state of the art in modern datacenters, looking into the software and hardware that power the world’s largest and most popular online services, such as Google, Bing, Facebook, Twitter, and Amazon. The key goal of the course is to offer an understanding of how today’s systems are built, why they are built this way, and what promising directions are on the horizon.

Course readings and discussion will provide an overview of the subject, while projects will target in-depth knowledge of a specific area. The course will include advanced material spanning a wide range of topics, including multiprocessing, compilers, operating systems, virtualization, storage and database systems, and datacenter organization. Students will select the topic of their own project and may work individually or in groups of any size. Projects that touch upon a student’s outside interests or other computer-science disciplines (e.g., computer vision, security, databases) are highly encouraged.

Students will be expected to attend lectures, read papers and discuss them in class, and work on a major project. As part of this course, students must produce either (1) a conference-quality paper or (2) public-release-quality working software and documentation. Students will be expected to spend at least 8 hours per week on course work in addition to attending lectures.

What this course is:

A user-centric overview of datacenters. This course is about using and optimizing very-large-scale online systems.

What this course is not:

A builder-centric advanced micro-architecture course. This course is not about building processors.

Policies

The collaboration and academic integrity policy in this class is likely to be significantly different from that of other courses you’ve taken in the past. Absolutely all forms of collaboration are permitted, including collaboration with other students in and outside of class, as well as with people outside of the university. There are only three explicit requirements for all submitted work:

  • All submitted work must have an explicit copyright label containing your name.
  • All submitted work must have an explicit license.
  • All copyright laws of the United States must be respected.

For example, for all submitted materials, you must include a label similar to one of the following:

  • Copyright © 2012 by (Your Name). All rights reserved.
  • Copyright © 2012 by (Your Name). Permission to copy and distribute verbatim copies is granted.
  • Copyright © 2012 by (Your Name). This work is licensed under GPLv3; details are in the accompanying COPYING file.

Applying an open-source license to your work (BSD, GPL, or others) is encouraged but not required. Any willful violation of this policy will result in a failing grade for the course.

Evaluation

The final grade will be split roughly evenly between class participation and the project. Outstanding performance in either class participation or project work (above and beyond the expected) will positively bias the grade for the other. There will be no exams.

Prerequisites

There are no formal prerequisites for this course, although prior advanced course work in related fields (computer architecture, compilers, operating systems, databases) will likely help. If you are unsure whether you have the necessary background, or if you are unable to sign up via the web, please contact the instructor.


Readings (to be read by the specified date)

  • August 30
      • Perspectives – Cost of Power in Large-Scale Data Centers (link)
      • Perspectives – Overall Data Center Costs (link)
      • Scale-up x Scale-out: A Case Study using Nutch/Lucene (IBM T. J. Watson)
      • Base Operating System Provisioning and Bringup for a Commercial Supercomputer (IBM T. J. Watson)
  • September 6
      • Power Provisioning for a Warehouse-sized Computer (Google)
  • September 11
      • Cutting the electric bill for internet-scale systems (MIT, Akamai, CMU)
      • Impact of Hot and Cold Aisle Containment on Data Center Temperature and Efficiency (APC) (link)
      • Cool job allocation: measuring the power savings of placing jobs at cooling-efficient locations… (HP)
  • September 13
      • The Datacenter as a Computer, chapters 5 and 6 (Google) (link)
      • Understanding the Performance-Temperature Interactions in Disk I/O of Server Workloads (Penn State, UVA)
  • September 26
      • Bigtable: A Distributed Storage System for Structured Data (Google)
  • October 2
      • Dynamo: Amazon’s Highly Available Key-value Store (Amazon)
  • October 4
      • Finding a needle in Haystack: Facebook’s photo storage (Facebook)
  • October 9
      • The Hadoop Distributed File System (Yahoo!)
      • A Cost-Effective, High-Bandwidth Storage Architecture (CMU)
  • October 11
      • UpRight Cluster Services (UT Austin)
      • Spanner: Google’s Globally-Distributed Database (Google)
  • October 16
      • Commercial Fault Tolerance: A Tale of Two Systems (HP, IBM)
      • Understanding Failures in Petascale Computers (CMU)
  • October 18
      • Apologies (Microsoft, Microsoft, Amazon, Amazon, Amazon, Rackspace)
  • October 23
      • On Designing and Deploying Internet-Scale Services (Microsoft)
      • Perspectives – Observations on Errors, Corrections, & Trust of Dependent Systems (link)
  • October 25
      • The Datacenter as a Computer, chapter 7 (Google) (link)
      • Explosion at The Planet (link)
  • October 30
      • (Class cancelled)
  • November 1
      • (Class cancelled)
  • November 6
      • Understanding Full Virtualization, Paravirtualization, and Hardware Assist (VMware)
  • November 8
      • Xen and the Art of Virtualization (Cambridge)
      • Solaris Zones: Operating System Support for Consolidating Commercial Workloads (Sun)
  • November 13
      • A Comparison of Software and Hardware Techniques for x86 Virtualization (VMware)
      • Memory Resource Management in VMware ESX Server (VMware)
      • Software techniques for avoiding hardware virtualization exits (VMware)
  • November 15
      • Utilizing IOMMUs for Virtualization in Linux and Xen (IBM, Intel, AMD)
      • QEMU, a fast and portable dynamic translator (Bellard)
  • November 20
      • NOTE: Papers are on IEEE Xplore, access is FREE from the campus network!
      • Performance Studies of Commercial Workloads on a Multi-core System (IBM)
      • Workload Characterization for the Design of Future Servers (IBM)
  • November 22
      • (Thanksgiving holiday)
  • November 27
      • Intel Embraces Multithreading (MPR by Kevin Krewell)
      • The Future of Microprocessors (Stanford)
  • November 29
      • System Overview for the SM10000 Family (SeaMicro)
      • Many-Core Key-Value Store (Facebook, Tilera)
      • HP Project Moonshot and the Redstone Development Server Platform (HP)