Problem:

Modern operating systems treat all cores as equivalent, assuming that every core in the system is a full-fledged core with all protection domains (primarily ring zero and ring three on x86). This project is an initial effort to extend the Linux kernel with support for cores that lack some of these protection domains. An OS that can handle cores without ring zero makes it possible to experiment with heterogeneous systems built from simpler cores that do not support ring zero. The primary target workload for such a system consists of tasks that run mostly in user space and rarely execute in ring zero, e.g., a web server.

Design:

The modified Linux kernel targets systems with a mix of cores. Some cores, called complex cores, have all protection domains, while others, called simple cores, have only the ring three protection domain. This project modifies the Linux kernel to add a system call execution framework, a demand paging framework and a signal handling framework for tasks that run on simple cores. Currently, the modified kernel supports an x86 system with two cores, one complex and one simple. The simple core is simulated on a homogeneous two-core system by disabling all traps and interrupts, including timer interrupts, on the second core. In addition, the “isolcpus=1” kernel parameter was passed at boot time to isolate the second core and thus exclude it from load balancing. The task on the simple core uses a modified musl libc so that it never executes system calls directly through hardware trap instructions such as “syscall” or “int 0x80”.
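One way to confirm that “isolcpus=1” took effect is to read /sys/devices/system/cpu/isolated, which lists the isolated CPUs in the kernel's cpulist format (for example “1” or “1-3,5”). The C sketch below is illustrative only and not part of the project; the helper names cpulist_contains() and cpu_is_isolated() are hypothetical. Because an isolated CPU is excluded from load balancing, a task only runs there when it is explicitly pinned, for example with sched_setaffinity(2).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if `cpu` appears in a cpulist string such as "1", "1-3,5" or
 * "0,2-4\n" (the format used by isolcpus= and sysfs), 0 otherwise.
 * Malformed input is treated as "not listed". */
int cpulist_contains(const char *list, int cpu)
{
    assert(list != NULL);
    const char *p = list;

    while (*p != '\0' && *p != '\n') {
        char *end;
        long lo = strtol(p, &end, 10);
        if (end == p)
            return 0;                 /* no number where one was expected */
        long hi = lo;
        if (*end == '-') {            /* range such as "1-3" */
            p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p)
                return 0;
        }
        if (cpu >= lo && cpu <= hi)
            return 1;
        if (*end != ',')
            break;                    /* end of list or trailing newline */
        p = end + 1;
    }
    return 0;
}

/* Returns 1 if `cpu` is listed in /sys/devices/system/cpu/isolated,
 * 0 if it is not, and -1 if the file cannot be read. */
int cpu_is_isolated(int cpu)
{
    char buf[256];
    FILE *f = fopen("/sys/devices/system/cpu/isolated", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL)
        buf[0] = '\0';                /* empty file: no CPU is isolated */
    fclose(f);
    return cpulist_contains(buf, cpu);
}
```

On the two-core setup described above, cpu_is_isolated(1) would be expected to return 1 once the system has booted with “isolcpus=1”.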

System Call - R3 (2)

  1. syscall_thread: A kernel thread called “syscall_thread” is spawned on the complex core and is responsible for executing system call requests from the task running on the simple core. The thread is pinned to the complex core, which has all protection domains, and it serves every system call request on behalf of the simple-core task by impersonating that task on the complex core.
  2. System Call Execution: System call arguments are passed from the task on the simple core to syscall_thread through a system-wide shared page that is mapped from boot time. This page, called the “Magic page”, is mapped into every process address space at a fixed virtual address: the page immediately after the vsyscall page. The task on the simple core spins until syscall_thread completes the system call, and the result is returned through the same magic page.
  3. Signal Handling: Kernel threads normally ignore signals, but syscall_thread is a special kernel thread that receives signals on behalf of the task on the simple core. Although syscall_thread receives signals, it handles them only after completing the next system call from the simple-core task. It also bypasses the regular signal subsystem: it only removes the signal from its pending signal queues and writes the signal handler to the magic page. When the task on the simple core receives the system call result through the magic page, it also checks whether a signal is pending. If one is, it jumps to the signal handler. Because the handler is invoked as an ordinary user-space function call, control returns to the instruction following the system call once the handler completes.
  4. Demand Paging: When the task on the simple core takes a page fault, it enters ring zero and copies the faulting address and other relevant fields to the magic page. syscall_thread on the complex core then services the page fault on behalf of the task. Entering ring zero on the simple core is necessary on x86 because the cr2 register, which holds the faulting address, can only be read from ring zero. On an actual heterogeneous system, reading cr2 on the simple core should not be a problem, since the simple core has no protection domain other than ring three and cannot execute kernel code.
  5. Musl libc changes: The musl libc system call interfaces are modified so that a system call copies its arguments to the magic page and then spins until syscall_thread, running on the complex core, returns the result through the magic page.
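The magic-page handshake in items 2 and 5 can be illustrated with a user-space simulation in C, where two POSIX threads stand in for the simple core and syscall_thread. This is a sketch under stated assumptions, not the project's code: the page layout (struct magic_page and the MP_* states), the function names and the toy handler are hypothetical, and the real kernel side would impersonate the calling task and run the actual system call instead of calling a function pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAGIC_PAGE_SIZE 4096

/* Hypothetical handshake states for the shared page. */
enum { MP_IDLE, MP_REQUEST, MP_DONE };

/* Hypothetical magic page layout: one outstanding request at a time. */
struct magic_page {
    atomic_int state;       /* MP_IDLE -> MP_REQUEST -> MP_DONE -> MP_IDLE */
    long nr;                /* system call number */
    long args[6];           /* x86-64 system calls take up to six arguments */
    long ret;               /* result written by the complex core */
};

static_assert(sizeof(struct magic_page) <= MAGIC_PAGE_SIZE,
              "request must fit in one page");

/* Simple-core side: what a modified libc system call wrapper does instead
 * of executing the "syscall" instruction. */
long magic_syscall(struct magic_page *mp, long nr, const long args[6])
{
    mp->nr = nr;
    for (int i = 0; i < 6; i++)
        mp->args[i] = args[i];
    /* Release: nr and args become visible before the request flag. */
    atomic_store_explicit(&mp->state, MP_REQUEST, memory_order_release);
    /* Busy-wait: the simple core has no interrupts to sleep on. */
    while (atomic_load_explicit(&mp->state, memory_order_acquire) != MP_DONE)
        ;
    long ret = mp->ret;
    atomic_store_explicit(&mp->state, MP_IDLE, memory_order_relaxed);
    return ret;
}

/* Complex-core side, simulated: `handler` stands in for executing the
 * system call while impersonating the simple-core task. */
struct server_ctx {
    struct magic_page *mp;
    long (*handler)(long nr, const long *args);
    int requests;           /* serve this many requests, then exit */
};

void *syscall_thread_sim(void *arg)
{
    struct server_ctx *ctx = arg;
    struct magic_page *mp = ctx->mp;

    for (int served = 0; served < ctx->requests; served++) {
        while (atomic_load_explicit(&mp->state, memory_order_acquire) != MP_REQUEST)
            ;
        mp->ret = ctx->handler(mp->nr, mp->args);
        atomic_store_explicit(&mp->state, MP_DONE, memory_order_release);
    }
    return NULL;
}

/* Toy "system call": encodes its inputs so the result is easy to check. */
static long toy_handler(long nr, const long *args)
{
    return nr * 100 + args[0];
}

/* Runs two requests through the handshake and returns the sum of results. */
long magic_page_demo(void)
{
    struct magic_page mp = { .state = MP_IDLE };
    struct server_ctx ctx = { .mp = &mp, .handler = toy_handler, .requests = 2 };
    pthread_t tid;
    long args[6] = { 7 };

    if (pthread_create(&tid, NULL, syscall_thread_sim, &ctx) != 0)
        return -1;
    long r1 = magic_syscall(&mp, 1, args);     /* 1 * 100 + 7 = 107 */
    args[0] = 9;
    long r2 = magic_syscall(&mp, 2, args);     /* 2 * 100 + 9 = 209 */
    pthread_join(tid, NULL);
    return r1 + r2;
}
```

The release store on the flag publishes the arguments before the request becomes visible, and the acquire load on the other side pairs with it; in portable C, a plain volatile flag would not provide that ordering guarantee.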
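The user-space half of the signal path in item 3 amounts to a check performed after each system call result arrives. The C sketch below assumes hypothetical magic-page fields (signo and handler in struct magic_sig) and hypothetical function names; it does not model signal masks, sigaction flags or default actions.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical signal fields in the magic page. In the described design
 * syscall_thread fills these in alongside the system call result, so the
 * same completion handshake makes them visible to the simple core. */
struct magic_sig {
    int signo;                  /* 0 means no signal pending */
    void (*handler)(int);       /* user-space handler chosen by the task */
};

/* Complex-core side (sketch): after dequeuing a pending signal, publish
 * the handler instead of going through the regular signal delivery path. */
void post_signal(struct magic_sig *ms, int signo, void (*handler)(int))
{
    assert(ms != NULL && signo > 0);
    ms->handler = handler;
    ms->signo = signo;
}

/* Simple-core side: runs after the result arrives. The handler is called
 * as an ordinary function, so when it returns, execution continues right
 * after the system call, which then returns `ret` to its caller. */
long finish_syscall(struct magic_sig *ms, long ret)
{
    if (ms->signo != 0) {
        int signo = ms->signo;
        void (*handler)(int) = ms->handler;

        ms->signo = 0;          /* consume the signal before running it */
        if (handler != NULL)
            handler(signo);
    }
    return ret;
}

static volatile sig_atomic_t seen_signal;    /* set by the demo handler */

static void demo_handler(int signo)
{
    seen_signal = signo;
}

/* Posts SIGINT, completes a fake system call returning 42, and reports
 * which signal the handler saw (or -1 if something went wrong). */
int signal_demo(void)
{
    struct magic_sig ms = { 0, NULL };

    post_signal(&ms, SIGINT, demo_handler);
    if (finish_syscall(&ms, 42) != 42 || ms.signo != 0)
        return -1;
    return seen_signal;
}
```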

 

Current Status:

This OS has been tested to demonstrate that the Linux kernel can run on the proposed heterogeneous system. Test programs ranged from simple ones such as “Hello world” to more complex, signal-intensive ones such as a Tetris game; because Tetris relies on signals, it was used to exercise the signal handling framework of the OS. Rigorous testing of the system is still pending.

Open Items:

1) Support for executing multi-threaded programs on the simple core.
2) Multiprocessor support for systems with more than one simple core.
3) A custom IPI to schedule tasks on the simple core.
4) Performance and security improvements to the OS.

Report:

For more design and implementation details, please refer to the project report: Project Report