Course Overview

The goal of this course is to introduce advanced-year undergraduate students to the field of Cloud Computing. This course will explain how large-scale systems such as Google, Facebook, and Twitter are built and provide students with the foundation needed to join or start a company that creates such systems.

Creating online services capable of handling millions of users requires a different mindset compared to traditional software development and deployment. Rather than building monolithic software packages from the ground up, bringing up modern online services calls for architecting systems by gluing together mature existing technologies deployed across many unreliable servers, working in concert to provide high-availability robust services. In this course, students will be exposed to the concepts and technologies behind deploying and scaling online services on the computing resources available in modern datacenters.

Outcomes

Students will gain theoretical and hands-on knowledge of the concepts and software packages used to create modern online services. In lecture, students will be introduced to high-level concepts of cloud computing and will receive an overview of the server software, libraries, and tools used for developing and deploying cloud applications. The concepts introduced in lecture will be reinforced by small hands-on homework assignments that put those concepts into practice. Ultimately, a final course project will have students combine these technologies to develop and deploy a robust and scalable online service on a cloud infrastructure such as Amazon Web Services, Google Cloud Platform, or Microsoft Azure.

Course Topics

Likely course topics will include: introduction to cloud services, virtualization, paravirtualization, advanced networking, web services, server-side scripting languages and frameworks, cloud programming paradigms, cloud deployment and machine management, scale-up vs scale-out, cloud storage, cloud service topologies, message serialization and transport, load balancing, content distribution networks, security, authentication, QoS, managing tail latencies, and performance monitoring.

Office Hours

NCS 343, Tuesdays, 2:30PM-3:50PM or by appointment

Evaluation

  • 10 homeworks – 30
  • Warm-up Project 1 – 10 (due: Feb 3)
  • Warm-up Project 2 – 10 (due: Mar 2)
  • Course Project – 40 (due: day of scheduled final)
  • Mid-term 1 – 15 (Mar 7)
  • Mid-term 2 – 15 (May 4)
  • In-class Demo(s) or PPTX lecture slides – 6 each

Sign up for PPTX lecture slide slots and In-class Demo slots. Both should be done in groups of two (each group member signs up for one slot).

Prerequisites

Solid programming and debugging experience is a must. Students enrolling in this course are expected to have working knowledge of programming and debugging in at least one scripting language (e.g., JavaScript, Python, PHP, Ruby, Perl), be familiar with at least one version control system (e.g., git, svn, hg), and have at least a cursory understanding of command-line use and system administration. Although these skills can be picked up within the first few weeks of the course, these topics will not be covered in class lectures. If you are unsure whether or not you have the necessary background or if you are unable to sign up via the web, please contact the instructor.

Books

None.

Policies

For the homeworks, you must work individually. For the projects, you may work in groups of any size; however, groups larger than two must explicitly request permission from the instructor.

If you work alone, you submit your own work. If you work with partners, you submit your assignments jointly. Whether or not you work in a group, you may discuss the assignment details, designs, debugging techniques, or anything else with anyone you like in general terms, but you may not provide, receive, or take code to or from anyone outside of your group (unmodified third-party open-source libraries and packages are permitted). The code that you submit must be your own work and only your own work. Any evidence that source code has been copied, shared, or transmitted in any way between non-partners will be regarded as evidence of academic dishonesty.

You must declare your group via email to the instructor and TA at most 5 days after the assignment handout.  You may change group composition for each assignment, as long as each change is announced within 5 days of that assignment’s handout.

Larger group sizes allow you to take on more challenging projects.  To balance out the advantages of a larger group compared to individuals working alone, grading strictness depends on the size of the group.  In the past, large groups have succeeded in submitting amazing projects.  However, beware of accepting deadbeats into your group: they are likely to hurt your grade beyond repair.

Some more-specific guidelines for the assignments:

  • You may not look at code from previous years of this course.
  • You may not look at code from similar courses at other universities.

Assignment Hand-in Policy

All deadlines are 11:59PM on the due date.  Submissions will still be accepted after the due date, but they will be assessed a penalty of 1 point (multiplied by the number of group members) for each late day, counted in 24-hour increments.
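As a rough illustration of the late-penalty arithmetic, here is a minimal Python sketch. The function name late_penalty is made up for this example, and it assumes that any started 24-hour period after the deadline counts as a full late day; the instructor's actual rounding may differ.

```python
from datetime import datetime, timedelta
import math


def late_penalty(due: datetime, submitted: datetime, group_size: int = 1) -> int:
    """Points deducted for a submission (hypothetical helper).

    Assumption: every started 24-hour period after the deadline
    counts as one full late day.
    """
    if submitted <= due:
        return 0
    late_days = math.ceil((submitted - due) / timedelta(hours=24))
    return late_days * group_size


# Example deadline: 11:59PM on Feb 3 (Warm-up Project 1).
due = datetime(2017, 2, 3, 23, 59)

# 26 hours late with a group of two: 2 late days x 2 members.
print(late_penalty(due, due + timedelta(hours=26), group_size=2))
```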

Course Mailing List

Subscription to the course mailing list is mandatory.

This term we will be using Piazza for class discussion. The system is highly catered to getting you help fast and efficiently from classmates, the TA, and the instructor. Rather than emailing questions to the teaching staff, you should post your questions on Piazza. If you have any problems or feedback for the developers, email team@piazza.com.

Find our class page at: https://piazza.com/stonybrook/spring2017/cse356/home

Disability Support Services

If you have a physical, psychological, medical or learning disability that may impact your course work, please contact Disability Support Services, ECC (Educational Communications Center) Building, room 128, (631) 632-6748. They will determine with you what accommodations, if any, are necessary and appropriate. All information and documentation is confidential. http://studentaffairs.stonybrook.edu/dss/

Academic Integrity

Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person’s work as your own is always wrong. Faculty are required to report any suspected instances of academic dishonesty to the Academic Judiciary. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/uaa/academicjudiciary/

Critical Incident Management

Stony Brook University expects students to respect the rights, privileges, and property of other people. Faculty are required to report to the Office of Judicial Affairs any disruptive behavior that interrupts their ability to teach, compromises the safety of the learning environment, or inhibits students’ ability to learn.


Directions for logging into work containers

To get started, you need to request a container by submitting this form.
After your container is created, you may retrieve your ssh port number (#####) and your password here.
You can reach your web server at
http://yournetid.cse356.compas.cs.stonybrook.edu/

To log into your container, run:

ssh -p##### root@130.245.169.83

using the password retrieved above.
To copy your ssh key from a local *nix machine, run:

ssh-copy-id -i ~/.ssh/id_rsa.pub -p##### root@130.245.169.83

If you need to create an ssh key, on your local machine, run:

ssh-keygen

Homework #0 (web server)

Due: Jan 26

  • Step 1: Log into Linux server
  • Step 2: Create a static web page in the server’s document root called hw0.html that contains at least the string “Hello world” and one image

Warm-up Project #1

Due: Feb 3

  • Step 1: Create a front page at http://yourserver/eliza/ – the page must include at least one CSS file which changes the appearance of something on the page and a POST form that requests and submits a field called “name” (the FORM action should point to this page’s own URL).
  • Step 2: If the page receives a POST parameter called “name”, it should output “Hello $name, $date” with the name and date filled in dynamically. (do not use client-side JavaScript for this part)
  • Step 3: Create a REST-based ELIZA service at http://yourserver/eliza/DOCTOR that takes as input a JSON object including a “human” property and returns a JSON object including an “eliza” property, each containing the corresponding next phrase of the therapy session.
  • Step 4: Integrate the REST-based ELIZA service into your front page so that the conversation starts when the page is loaded with a “name” specified. (use client-side JavaScript for this part)
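The request/response shape of the DOCTOR service in Step 3 can be sketched as follows, using only the Python standard library. The two-rule script, the default reply, and the names eliza_reply, handle_doctor and DoctorHandler are illustrative assumptions; any server-side language or framework would do equally well.

```python
import json
import re
from http.server import BaseHTTPRequestHandler

# Hypothetical, deliberately tiny rule set; a real ELIZA script has many more.
RULES = [
    (re.compile(r"\bi am (.+)", re.I), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.I), "How long have you felt {0}?"),
]
DEFAULT_REPLY = "Please tell me more."


def eliza_reply(human: str) -> str:
    """Return the next phrase of the session for one line of human input."""
    for pattern, template in RULES:
        match = pattern.search(human)
        if match:
            return template.format(match.group(1).rstrip(".!?"))
    return DEFAULT_REPLY


def handle_doctor(body: bytes) -> bytes:
    """Map a JSON body with a "human" property to one with an "eliza" property."""
    request = json.loads(body)
    response = {"eliza": eliza_reply(str(request.get("human", "")))}
    return json.dumps(response).encode("utf-8")


class DoctorHandler(BaseHTTPRequestHandler):
    """Minimal HTTP wrapper; serve it with http.server.HTTPServer."""

    def do_POST(self):
        if self.path != "/eliza/DOCTOR":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = handle_doctor(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

Keeping the JSON handling in a plain function (handle_doctor) separate from the HTTP wrapper makes the service easy to test without starting a server.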

Homework #1 (ansible, git)

Due: Feb 21

  • Step 1: Place your HW#0 files into a public git repository (use a service such as github or bitbucket)
  • Step 2: Create an Ansible playbook to deploy your HW#0 on Ubuntu 16.04 servers, checking out the files from git and using “hw1” as the name for hosts: in your inventory
  • Step 3: Place your playbook at http://yourserver/hw1.yml

Warm-up Project #2

Due: Mar 2

  • Step 1: Log into cloud (https://cloud.compas.cs.stonybrook.edu/, Keystone Credentials, domain: cse356, username@cs.stonybrook.edu), create a VM using the cse356-small flavor (add security group, ssh key, public IP) [if using public cloud, coordinate with TAs, especially with regard to instance size]
  • Step 2: Migrate ELIZA setup into cloud VM, hosting it in the root directory of your server (e.g., http://yourserver/DOCTOR)
  • Step 3: Add user creation with email
    /adduser, { username:, password:, email: }

    creates a disabled user

    /verify, { email:, key: }

    key sent via email (backdoor key is “abracadabra”). Optionally, IN ADDITION to a JSON POST request, you may also make this API call accept a GET request with the two parameters in the query string, to allow for a direct link from the verification email.

  • Step 4: Add cookie-based session support
    /login, { username:, password: }
    /logout
  • Step 5: Maintain history of ELIZA therapy conversations
    /listconv

    to get JSON response of { status:”OK”, conversations:[ {id:, start_date:}, …] }

    /getconv, { id: }

    to get JSON response of { status:”OK”, conversation:[ {timestamp:, name:, text:}, …] }

  • Clarification: all of the above API calls must be POST requests with a JSON object for the request and JSON object as a response of either { status:”OK” } or { status:”ERROR” } (unless otherwise specified).
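One possible shape for the account calls in Steps 3 and 4 is sketched below in Python, with the HTTP layer, email delivery, and cookie handling left out. The in-memory users dictionary, the rejection of duplicate usernames and emails, and the plain-text password are simplifying assumptions for illustration only; a real service would use a database and store a salted password hash.

```python
import secrets

BACKDOOR_KEY = "abracadabra"  # backdoor key named in the assignment text
users = {}  # username -> record; in-memory stand-in for a real database

OK = {"status": "OK"}
ERROR = {"status": "ERROR"}


def adduser(req: dict) -> dict:
    """Create a disabled user; the generated key would be sent by email."""
    username, password, email = req.get("username"), req.get("password"), req.get("email")
    if not (username and password and email) or username in users:
        return ERROR
    if any(user["email"] == email for user in users.values()):
        return ERROR  # assumption: one account per email address
    users[username] = {
        "password": password,  # illustration only; store a salted hash instead
        "email": email,
        "key": secrets.token_hex(16),
        "enabled": False,
    }
    return OK


def verify(req: dict) -> dict:
    """Enable the user whose email matches, given the emailed or backdoor key."""
    for user in users.values():
        if user["email"] == req.get("email"):
            if req.get("key") in (user["key"], BACKDOOR_KEY):
                user["enabled"] = True
                return OK
            return ERROR
    return ERROR


def login(req: dict) -> dict:
    """Accept only verified users with a matching password."""
    user = users.get(req.get("username"))
    if user and user["enabled"] and user["password"] == req.get("password"):
        return OK  # the HTTP layer would also set a session cookie here
    return ERROR
```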

Homework #2 (mongodb)

Due: 2/28

  • Step 1: Install mongodb
  • Step 2: Create a database called “hw2”
  • Step 3: Create a collection called “factbook”
  • Step 4: Populate the collection with data from https://github.com/opendatajson/factbook.json
    (hint, write a script to do it)
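A loading script along the lines of the hint might look like the sketch below. It assumes a local checkout of the repository containing one JSON object per file, a MongoDB server on the default local port, and the third-party pymongo driver; the source_file field and both function names are hypothetical additions for this example.

```python
import json
from pathlib import Path


def load_documents(root: str) -> list:
    """Read every *.json file below root into a list of dicts."""
    docs = []
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            doc = json.load(handle)
        if isinstance(doc, dict):
            # Hypothetical extra field recording where the document came from.
            doc["source_file"] = path.relative_to(root).as_posix()
            docs.append(doc)
    return docs


def populate(root: str, uri: str = "mongodb://localhost:27017") -> int:
    """Insert all documents into hw2.factbook and return how many were loaded."""
    from pymongo import MongoClient  # third-party driver: pip install pymongo

    docs = load_documents(root)
    collection = MongoClient(uri)["hw2"]["factbook"]
    if docs:
        collection.insert_many(docs)
    return len(docs)
```

Calling populate("factbook.json") from the directory that holds the checkout would then fill the collection in one pass.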

Homework #3 (rabbitmq)

Due: 3/7

  • Step 1: Install rabbitmq
  • Step 2: Create a direct exchange called “hw3”
  • Step 3: Create a REST service
    /listen { keys: [array] }

    Creates an exclusive queue, binds to “hw3” with all provided keys, waits to receive a message and returns it as { msg: }

  • Step 4: Create a REST service
    /speak { key:, msg: }

    Publishes the message to exchange hw3 with provided key
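The routing behaviour that /listen and /speak rely on can be illustrated with a toy in-memory model of a direct exchange. This is not RabbitMQ client code: the class and method names are invented for the sketch, and a real solution would talk to the broker through a client library.

```python
from collections import defaultdict, deque


class DirectExchange:
    """Toy in-memory model of a direct exchange (not a RabbitMQ client)."""

    def __init__(self):
        self.bindings = defaultdict(set)  # routing key -> names of bound queues
        self.queues = {}                  # queue name -> pending messages

    def bind_queue(self, queue: str, keys: list) -> None:
        """Create a queue and bind it with every key, as /listen does."""
        self.queues[queue] = deque()
        for key in keys:
            self.bindings[key].add(queue)

    def speak(self, key: str, msg: str) -> None:
        """Deliver msg only to queues bound with exactly this routing key."""
        for queue in self.bindings.get(key, ()):
            self.queues[queue].append(msg)

    def receive(self, queue: str):
        """Return the oldest pending message as { msg: }, or None if empty."""
        pending = self.queues[queue]
        return {"msg": pending.popleft()} if pending else None
```

A message published with a key that no queue is bound to is simply dropped, which is why /listen has to bind its queue before the matching /speak call arrives.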


Course Project

  • Milestone 1: Mar 23
  • Milestone 2: Apr 6
  • Milestone 3: Apr 20
  • Milestone 4: May 15

Implement a Twitter clone with the following features. At a minimum, you must implement the API we provide.

  • M1 – Log in/out
  • M1 – Post tweets
  • M1 – See feed of tweets
  • M2 – Delete tweets
  • M2 – Search for tweets
  • M2 – Follow users
  • M3 – Reply to tweets
  • M3 – Like and retweet tweets
  • M3 – Rank tweets in feed based on interest (likes, retweets)
  • M3 – Media (Images and Videos) on tweets
  • M4 – Handle large volume of tweets
  • M4 – Meet strict QoS (performance) guarantees

Homework #4 (cassandra)

Due: 3/21

  • Step 1: Install Cassandra
  • Step 2: Create “hw4” keyspace (replication factor 1)
  • Step 3: Create a table “imgs” that includes a filename (string) and contents (blob) columns
  • Step 4: Create a POST form target
    /deposit { filename: (type=text), contents: (type=file) }

    Uploaded files should be deposited into hw4/imgs in Cassandra

  • Step 5: Create a GET service
    /retrieve { filename: }

    to get the previously uploaded image (make sure to respond with the appropriate image/… content type)

(note: use Cassandra 2.2 (22x) for this homework)
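For the content type in Step 5, one option is to derive it from the stored filename. The sketch below uses Python's standard mimetypes module; the helper name and the fallback to application/octet-stream for non-image names are assumptions made for this example.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Guess an image/... content type from the filename extension."""
    guessed, _encoding = mimetypes.guess_type(filename)
    if guessed and guessed.startswith("image/"):
        return guessed
    # Assumed fallback when the extension is missing or not an image type.
    return "application/octet-stream"


print(content_type_for("logo.png"))
```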


Homework #5 (elasticsearch)

Due: 3/28

  • Step 0: Create an Ubuntu 16.04 VM in OpenStack
  • Step 1: Install elasticsearch with kibana
  • Step 2: Create an index called “hw5”
  • Step 3: Populate the index with department of education’s college scorecard (most recent cohorts) (https://catalog.data.gov/dataset)
    (hint, use logstash)
  • Step 4: Create a visualization to rank the top 300 colleges by SAT score (SAT_AVG) and by median earnings 10 years after graduation (MD_EARN_WNE_P10).

(note: don’t forget to open the appropriate port(s) in the security group settings)


Homework #6 (load balancer)

Due: 4/4

  • Step 1: Install nginx
  • Step 2: Configure it as a round-robin reverse proxy between backends http://bryang.cse356.compas.cs.stonybrook.edu:9000/ , http://bryang.cse356.compas.cs.stonybrook.edu:9001/ , and http://bryang.cse356.compas.cs.stonybrook.edu:9002/
  • Step 3: Make sure failures of a backend server (e.g., timeouts or 50x responses) are not fatal and allow the other backends to handle requests
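nginx provides this behaviour through its own configuration (see its upstream and proxy_next_upstream directives), so no code is needed for the homework itself. Purely as a model of what Steps 2 and 3 ask for, the Python sketch below rotates through the backends and moves on to the next one after a timeout or a 50x response. The class name, the send callback, and the 502 fallback are assumptions made for this illustration.

```python
from itertools import cycle


class RoundRobinProxy:
    """Toy model of round-robin proxying with failover (not nginx itself)."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._order = cycle(range(len(self.backends)))

    def forward(self, request, send):
        """Try each backend at most once, starting with the next in rotation.

        send(backend, request) is a caller-supplied function that returns
        (status_code, body) or raises TimeoutError.
        """
        for _ in range(len(self.backends)):
            backend = self.backends[next(self._order)]
            try:
                status, body = send(backend, request)
            except TimeoutError:
                continue  # a timeout is not fatal: try the next backend
            if status < 500:
                return status, body
            # 50x response: fall through and try the next backend
        return 502, "all backends failed"
```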

Homework #7 (mysql, memcached)

Due: 4/11

  • Step 1: Install a mysql variant (mysql, maria, percona, …)
  • Step 2: Create a database called “hw7”
  • Step 3: Create a table called “electric” and import U.S. Electric Utility Companies and Rates: Look-up by Zipcode (2013) IOU rates into it (https://catalog.data.gov/dataset)
  • Step 4: Create a REST service to access the data and return the averages across all matching ZIP codes
    /hw7 { state:, service_type: }

    to get { status: “OK”, comm_rate_avg:, ind_rate_avg:, res_rate_avg: }

  • Step 5: Install memcached
  • Step 6: Integrate memcached caching to speed up the REST-based service

Homework #8 (hadoop)

Due: 4/18

  • Step 1: Install Hortonworks Sandbox
  • Step 2: Go through Lab 2 of Getting Started with HDP
  • Step 3: Go through Lab 3 of Getting Started with HDP
  • Step 4: Compute the average MPG for all trucks whose ID ends in the same digit as your SBUID
  • Step 5: Among the drivers whose ID ends in the same digit as your SBUID, determine the number of miles driven by the least-risky driver

Homework #9 (spark)

Due: 4/25

  • Step 1: Make a clone of the Lab 201: Intro to Machine Learning with Spark Zeppelin notebook in Hortonworks Sandbox.
  • Step 2: Go through the copy of the Lab that you created.
  • Step 3: Construct “your” feature vector from the first 8 digits of your SBUID: compute feature N (1-8) from digit N (counting from the left) as (SBUID_N/5-1.0), where SBUID_N is the Nth digit. Ignore any unused digits from your SBUID.
  • Step 4: Use the entire provided diabetes data to train a decision tree in Spark, and then use the trained model to predict your diagnosis (by testing on your feature vector).
  • Step 5: Submit your prediction and the conditional probability as reported by the decision tree.
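The transformation in Step 3 maps each digit 0-9 to a value between -1.0 and 0.8. A minimal Python sketch of that arithmetic follows; the function name is invented for the example, and the ID used is made up rather than a real SBUID.

```python
def sbuid_features(sbuid: str) -> list:
    """Map the first 8 digits of an ID to features via digit / 5 - 1.0."""
    digits = [int(ch) for ch in sbuid if ch.isdigit()][:8]
    if len(digits) < 8:
        raise ValueError("need at least 8 digits")
    return [digit / 5 - 1.0 for digit in digits]


# Made-up ID for illustration; the ninth digit is ignored.
features = sbuid_features("123456789")
print(features)
```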