Course Overview

The goal of this course is to introduce advanced-year undergraduate students to the field of Cloud Computing. This course will explain how large-scale systems such as Google, Facebook, and Twitter are built and provide students with the foundation needed to join or start a company that creates such systems.

Creating online services capable of handling millions of users requires a different mindset compared to traditional software development and deployment. Rather than building monolithic software packages from the ground up, bringing up modern online services calls for architecting systems by gluing together mature existing technologies deployed across many unreliable servers, working in concert to provide high-availability robust services. In this course, students will be exposed to the concepts and technologies behind deploying and scaling online services on the computing resources available in modern datacenters.

Outcomes

Students will gain theoretical and hands-on knowledge of concepts and software packages used to create modern online services. In lecture, the students will be introduced to high-level concepts of cloud computing and will receive an overview of the server software, libraries, and tools used for developing and deploying cloud applications. The concepts introduced in lecture will be reinforced by small hands-on homework assignments that put those concepts into practice. Ultimately, a final course project will have students combine these technologies to develop and deploy a robust and scalable online service on a cloud infrastructure such as Amazon Web Services, Google Cloud Platform, or Microsoft Azure.

Course Topics

Likely course topics include: introduction to cloud services, virtualization, paravirtualization, advanced networking, web services, server-side scripting languages and frameworks, cloud programming paradigms, cloud deployment and machine management, scale-up vs scale-out, cloud storage, cloud service topologies, message serialization and transport, load balancing, content distribution networks, security, authentication, QoS, managing tail latencies, and performance monitoring.

Office Hours

NCS 343, Tuesdays, 2:30PM-3:50PM or by appointment

Evaluation

  • 10 homeworks – 30
  • Warm-up Project 1 – 10 (due: Feb 15)
  • Warm-up Project 2 – 10 (due: Mar 1)
  • Course Project – 40 (due: day of scheduled final)
  • Mid-term 1 – 15 (Mar 6)
  • Mid-term 2 – 15 (May 3)

Prerequisites

Solid programming and debugging experience is a must. Students enrolling in this course are expected to have working knowledge of programming and debugging in at least one scripting language (e.g., JavaScript, Python, PHP, Ruby, Perl), be familiar with at least one version control system (e.g., git, svn, hg), and have at least a cursory understanding of command-line use and system administration. Although these skills can be picked up within the first few weeks of the course, these topics will not be covered in class lectures. If you are unsure whether or not you have the necessary background, please contact the instructor.

Books

None.

Policies

For the homeworks, you must work individually. For the projects, you may work in groups of any size; however, groups larger than two must explicitly request permission from the instructor.

If you work alone, you submit your own work. If you work with partners, you submit your assignments jointly. Whether or not you work in a group, you may discuss the assignment details, designs, debugging techniques, or anything else with anyone you like in general terms, but you may not provide, receive, or take code to or from anyone outside of your group (unmodified third-party open-source libraries and packages are permitted). The code that you submit must be your own work and only your own work. Any evidence that source code has been copied, shared, or transmitted in any way between non-partners will be regarded as evidence of academic dishonesty.

You must declare your group via the course web interface.  You may change group composition for each assignment, as long as each change is announced within 5 days of that assignment’s handout.

Larger group sizes allow you to take on more challenging projects.  To balance out the advantages of a larger group compared to individuals working alone, grading strictness depends on the size of the group.  In the past, large groups have succeeded in submitting amazing projects.  However, beware of accepting deadbeats into your group: they are likely to hurt your grade beyond repair.

Some more-specific guidelines for the assignments:

  • You may not look at code from previous years of this course.
  • You may not look at code from similar courses at other universities.

Assignment Hand-in Policy

All deadlines are 3:59PM on the due date.  Late submissions will be accepted, but assignments submitted after the due date will be assessed a penalty of 1 point per late day (multiplied by the number of group members), counted in 24-hour increments.

Course Mailing List

Subscription to the course mailing list is mandatory.

This term we will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates, the TA, and the instructor. Rather than emailing questions to the teaching staff, you should post your questions on Piazza. If you have any problems or feedback for the developers, email team@piazza.com.

Find our class page at: https://piazza.com/stonybrook/spring2018/cse356/home

Disability Support Services

If you have a physical, psychological, medical or learning disability that may impact your course work, please contact Disability Support Services, ECC (Educational Communications Center) Building, room 128, (631) 632-6748. They will determine with you what accommodations, if any, are necessary and appropriate. All information and documentation is confidential. http://studentaffairs.stonybrook.edu/dss/

Academic Integrity

Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person’s work as your own is always wrong. Faculty are required to report any suspected instances of academic dishonesty to the Academic Judiciary. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/uaa/academicjudiciary/

Critical Incident Management

Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of University Community Standards any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students’ ability to learn. Further information about most academic matters can be found in the Undergraduate Bulletin, the Undergraduate Class Schedule, and the Faculty-Employee Handbook.


Directions for logging into OpenStack

To get started, you need to request access to our OpenStack cloud by submitting this form. Note that you must use your @cs credentials to submit the form. If you don’t have an @cs account, request one by emailing rt@cs or visiting the IT staff on the second floor of the New CS building.

After your account is authorized, you may access the OpenStack web interface at https://cloud.compas.cs.stonybrook.edu.


Homework #0 (web server)

Due: Jan 30

  • Step 1: Create a new m.micro Linux server
  • Step 2: Assign a public IP to it and log into it
  • Step 3: Create a static web page in the server’s document root called hw0.html that contains the string “Hello world” and one image

Homework #1 (ansible, git)

Due: Feb 6

  • Step 1: Place your HW#0 files into a public git repository (use a service such as github or bitbucket)
  • Step 2: Create an Ansible playbook to deploy your HW#0 on Ubuntu 16.04 servers, checking out the files from git and using “hw1” as the name for hosts: in your inventory
  • Step 3: Place your playbook at http://yourserver/hw1.yml

Warm-up Project #1

Due: Feb 15

  • Step 1: Create a front page at http://yourserver/ttt/ – the page must include at least one CSS file which changes the appearance of something on the page and a POST FORM that requests and submits a field called ‘name’. (The FORM ACTION should point to this page’s own URL)
  • Step 2: If the page receives a POST parameter called “name”, it should output “Hello $name, $date” with the name and date filled in dynamically. (Do not use client-side JavaScript for this part)
  • Step 3: Create a REST-based Tic-Tac-Toe service at http://yourserver/ttt/play that takes as input a JSON object including a ‘grid’ property and returns a JSON object including a ‘grid’ property and a ‘winner’ property. The ‘grid’ property is an array of 9 characters, each being a space (‘ ’), ‘X’, or ‘O’. The ‘winner’ property is a single character to indicate who won.
  • Step 4: Integrate the REST-based tic-tac-toe service into your front page that starts operating when the page is loaded with a ‘name’ specified. (Use client-side JavaScript for this part)
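
The grid and winner logic behind the /ttt/play service can be sketched as below. This is only an illustration, not a reference solution: the names WIN_LINES, find_winner, and play are hypothetical, the server is assumed to play ‘O’ with a placeholder strategy (first free square), and a space is assumed to be the ‘winner’ value when nobody has won yet, which the assignment text does not specify.

```python
from typing import List

# The eight winning lines on a 3x3 board, as indices into the 9-element grid.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def find_winner(grid: List[str]) -> str:
    """Return 'X' or 'O' if that player has three in a row, else ' ' (assumed)."""
    for a, b, c in WIN_LINES:
        if grid[a] != " " and grid[a] == grid[b] == grid[c]:
            return grid[a]
    return " "


def play(request: dict) -> dict:
    """Handle one decoded JSON request body and build the JSON response body.

    Assumption: the server plays 'O' and simply takes the first free square,
    unless the game is already won or the board is full.
    """
    grid = list(request["grid"])
    if find_winner(grid) == " " and " " in grid:
        grid[grid.index(" ")] = "O"
    return {"grid": grid, "winner": find_winner(grid)}
```

A web framework handler would decode the request JSON, pass it to play, and serialize the returned dictionary as the response.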

Homework #2 (mongodb)

Due: Feb 20

  • Step 1: Install mongodb and configure it to listen for network connections
  • Step 2: Create a database called “hw2”
  • Step 3: Create a collection called “factbook”
  • Step 4: Populate the collection with data from https://github.com/opendatajson/factbook.json
    (hint, write a script to do it)
  • Step 5: Open TCP port 27017 from IP 130.245.168.156 in the Security Group
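
The import script hinted at in Step 4 might look like the sketch below. It assumes a local clone of the factbook.json repository in which each record is a separate .json file holding one JSON object; the function names iter_documents and load_into_mongo, the default connection URI, and the use of the third-party pymongo driver are illustrative choices, not requirements of the assignment.

```python
import json
from pathlib import Path
from typing import Dict, Iterator


def iter_documents(root: str) -> Iterator[Dict]:
    """Yield one dict per *.json file found anywhere under root.

    Files whose top-level value is not a JSON object are skipped, because
    MongoDB documents must be objects.
    """
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            doc = json.load(f)
        if isinstance(doc, dict):
            yield doc


def load_into_mongo(root: str, uri: str = "mongodb://localhost:27017") -> int:
    """Insert every document under root into hw2.factbook; return the count."""
    # pymongo is a third-party package; it is imported lazily so that
    # iter_documents can be used and tested without it.
    from pymongo import MongoClient

    docs = list(iter_documents(root))
    if docs:
        MongoClient(uri)["hw2"]["factbook"].insert_many(docs)
    return len(docs)
```

MongoDB creates the database and the collection on first insert, so Steps 2 and 3 need no separate commands when loading this way.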

Warm-up Project #2

Due: Mar 1

  • Step 1: Develop a user-creation system validated with email
    /adduser, { username:, password:, email: }

    creates a disabled user

    /verify, { email:, key: }

    key sent via email (backdoor key is “abracadabra”). Optionally, IN ADDITION to a JSON POST request, you may also make this API call accept a GET request with the two parameters in the query string, to allow for a direct link from the verification email.

  • Step 2: Add cookie-based session support
    /login, { username:, password: }
    /logout
  • Step 3: Modify your Tic-Tac-Toe REST service at http://yourserver/ttt/play to take as input a JSON object including a ‘move’ property to indicate on which square (0-indexed, in reading order) the human is making a move in the current game. The server should respond with a JSON object that includes a ‘grid’ property and a ‘winner’ property as in WP#1. Making a request with { move:null } should return the current grid without making a move. Once a winning or tying move has been sent to the server, the server should consider the game completed and reset the grid.
  • Step 4: Maintain the history of previously played games by each user on the server.
    /listgames

    to get { status:"OK", games:[ {id:, start_date:}, …] }

    /getgame, { id: }

    to get { status:"OK", grid:["X","O",…], winner:"X" }

    /getscore

    to get { status:"OK", human:0, wopr: 5, tie: 10 }

  • Clarification: all of the above API calls must be POST requests with a JSON object for the request and JSON object as a response of either { status:"OK" } or { status:"ERROR" } (unless otherwise specified).
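
The move handling described in Step 3 can be sketched as a function over per-user session state. Everything named here is hypothetical: the session is assumed to be a plain dictionary kept by the cookie-based session layer, the human is assumed to play ‘X’ and the server ‘O’ (taking the first free square as a placeholder strategy), and a space is assumed to mean "no winner". Recording finished games for Step 4 is left out.

```python
from typing import Dict, List, Optional

WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def winner_of(grid: List[str]) -> str:
    """Return 'X' or 'O' for three in a row, else ' ' (assumed convention)."""
    for a, b, c in WIN_LINES:
        if grid[a] != " " and grid[a] == grid[b] == grid[c]:
            return grid[a]
    return " "


def handle_move(session: Dict, move: Optional[int]) -> Dict:
    """Apply one { move: } request to the game stored in the session."""
    grid = session.setdefault("grid", [" "] * 9)

    # { move: null } returns the current grid without making a move.
    if move is None:
        return {"status": "OK", "grid": list(grid), "winner": winner_of(grid)}

    # Reject non-integer, out-of-range, or already occupied squares.
    if not isinstance(move, int) or not 0 <= move <= 8 or grid[move] != " ":
        return {"status": "ERROR"}

    grid[move] = "X"
    winner = winner_of(grid)
    if winner == " " and " " in grid:
        grid[grid.index(" ")] = "O"  # placeholder server reply
        winner = winner_of(grid)

    response = {"status": "OK", "grid": list(grid), "winner": winner}

    # A win or a full board ends the game; the next request starts fresh.
    if winner != " " or " " not in grid:
        session["grid"] = [" "] * 9
    return response
```

The response is built before the reset so that the client still sees the final position of the completed game.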

Homework #3 (rabbitmq)

Due: Mar 8

  • Step 1: Install rabbitmq
  • Step 2: Create a direct exchange called “hw3”
  • Step 3: Create a REST service
    /listen { keys: [array] }

    Creates an exclusive queue, binds to “hw3” with all provided keys, waits to receive a message and returns it as { msg: }

  • Step 4: Create a REST service
    /speak { key:, msg: }

    Publishes the message to exchange hw3 with provided key


Homework #4 (cassandra)

Due: Mar 20

  • Step 1: Install Cassandra
  • Step 2: Create “hw4” keyspace (replication factor 1)
  • Step 3: Create a table “imgs” that includes a filename (string) and contents (blob) columns
  • Step 4: Create a POST form target
    /deposit { filename: (type=text), contents: (type=file) }

    Uploaded files should be deposited into hw4/imgs in Cassandra

  • Step 5: Create a GET service
    /retrieve { filename: }

    to get the previously uploaded image (make sure to respond with the appropriate image/… content type)

(note: use Cassandra 2.2 (22x) for this homework)
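
One way to pick the content type for /retrieve is to derive it from the stored filename with Python's standard mimetypes module, as in the sketch below. The helper name image_content_type and the fallback to application/octet-stream for unrecognized names are assumptions; storing the uploaded file's content type alongside the blob would be an equally valid design.

```python
import mimetypes


def image_content_type(filename: str) -> str:
    """Guess an image/... content type from the filename extension.

    Falls back to a generic binary type when the extension is missing or
    does not map to an image type.
    """
    guessed, _encoding = mimetypes.guess_type(filename)
    if guessed and guessed.startswith("image/"):
        return guessed
    return "application/octet-stream"
```

The returned string would be sent as the Content-Type header together with the blob read from the imgs table.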


Course Project

  • Milestone 1: Mar 22
  • Milestone 2: Apr 5
  • Milestone 3: Apr 19
  • Milestone 4: May 3

Implement a Twitter clone with the following features. At a minimum, you must implement the API we provide.

  • M1 – Log in/out
  • M1 – Post tweets
  • M1 – See feed of tweets

Homework #5 (load balancer)

Due: Mar 27

  • Step 1: Install nginx
  • Step 2: Configure it as a round-robin reverse proxy between backends http://grader.cse356.compas.cs.stonybrook.edu:9000/ , http://grader.cse356.compas.cs.stonybrook.edu:9001/ , and http://grader.cse356.compas.cs.stonybrook.edu:9002/
  • Step 3: Make sure failures of a backend server (e.g., timeouts or 50x responses) are not fatal and allow the other backends to handle requests

Homework #6 (elasticsearch)

Due: Apr 3

  • Step 1: Install Elasticsearch with Kibana
  • Step 2: Create an index called “hw6”
  • Step 3: Populate the index with IMDB data about some relatively recent movies (https://grader.cse356.compas.cs.stonybrook.edu/static/movies.json)
    (hint, use logstash)
  • Step 4: Create a visualization to chart the top rated movie for every year and the average movie earnings for each year.

(note: don’t forget to open the appropriate default port(s) for both Elasticsearch and Kibana in the security group settings)


Homework #7 (mysql, memcached)

Due: Apr 10

  • Step 1: Install a mysql variant (mysql, maria, percona, …)
  • Step 2: Create a database called “hw7”
  • Step 3: Create a table called “assists” and import MLS 2017 data for assists by soccer players (https://github.com/jokecamp/FootballData/blob/master/MLS/2017/assists.csv)
  • Step 4: Create a REST service to access the data and return the top assisting player for a given club in the given position, and the average number of assists by all players of that club in that position
    /hw7?club=HOU&pos=M

    to get { club:, pos:, max_assists:, player:, avg_assists:}

  • Note: For players who have equal assists (A field), use the higher value of goals scored (GS field) as the tiebreaker.

  • Step 5: Install memcached
  • Step 6: Integrate memcached caching to speed up the REST-based service
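
The selection rule from Step 4 and the caching from Step 6 can be sketched together as follows. The sketch works on rows already fetched as dictionaries rather than issuing SQL, and several names are assumptions: the row keys "Player", "Club", and "POS" (only the A and GS fields are named in the assignment), the functions summarize and cached_summary, the "hw7:" key prefix, and a cache object exposing get(key) and set(key, value) in the style of common memcached client libraries.

```python
import json
from typing import Callable, Dict, List, Optional


def summarize(rows: List[Dict], club: str, pos: str) -> Optional[Dict]:
    """Build the /hw7 response for one club and position, or None if no rows."""
    matching = [r for r in rows if r["Club"] == club and r["POS"] == pos]
    if not matching:
        return None
    # Most assists wins; equal assists are broken by the higher GS value.
    top = max(matching, key=lambda r: (r["A"], r["GS"]))
    avg = sum(r["A"] for r in matching) / len(matching)
    return {
        "club": club,
        "pos": pos,
        "max_assists": top["A"],
        "player": top["Player"],
        "avg_assists": avg,
    }


def cached_summary(
    cache,
    club: str,
    pos: str,
    compute: Callable[[str, str], Optional[Dict]],
) -> Optional[Dict]:
    """Cache-aside lookup: try the cache first, compute and store on a miss."""
    # Memcached keys may not contain whitespace or control characters, so
    # real code should validate club and pos before building the key.
    key = f"hw7:{club}:{pos}"
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = compute(club, pos)
    if result is not None:
        cache.set(key, json.dumps(result))
    return result
```

In the real service, compute would run the database query (for example, ordering by A and then GS in descending order), and the cached JSON string could be returned to the client directly.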