CSCI 2310
Operating Systems

Bowdoin College
Fall 2014
Instructor: Sean Barker

Project 3 - Virtual Memory Manager

This project will help you understand address spaces and virtual memory management.

In this project, you will implement an external pager, which is a process that handles virtual memory requests for application processes. The external pager ("pager" for short) is analogous to the virtual-memory portion of a normal operating system. The pager will handle address space creation, read and write faults, address space destruction, and simple argument passing between spaces. Your pager will manage a fixed range of the address space (called the arena) of each application that uses it. Your pager will be single threaded, handling each request to completion before processing the next request. Valid pages in the arena will be stored in (simulated) physical memory or in (simulated) disk. Your pager will manage these two resources on behalf of all the applications using the pager.

In addition to handling page faults, your pager will handle two system calls: vm_extend and vm_syslog. An application uses vm_extend to ask the pager to make another virtual page of its arena valid. An application uses vm_syslog to ask the pager to print a message to its console.

This handout is organized as follows. Section 1 describes how the infrastructure for this project performs the same tasks as the MMU and exception mechanism on normal computer systems. Section 2 describes the MMU used in this project. Section 3 describes the system calls that applications can use to communicate explicitly with the pager. Section 4 is the main section; it describes the functionality that you will implement in the external pager. Section 5 describes how your pager will maintain the emulated page tables and access physical memory and disk. Section 6 gives some hints for doing the project, Section 7 describes the test suite, and Sections 8-11 cover project logistics, grading, submission, and the writeup.

Download the starter files, or fetch them directly (e.g., on the autograder machine) using wget:

$ wget http://lass.cs.umass.edu/~sbarker/teaching/courses/fall14/os/files/proj3.tar.gz
$ tar xzvf proj3.tar.gz

1. Infrastructure and System Structure

In a normal computer system, the CPU's MMU (memory management unit) and exception mechanism perform several important tasks. The MMU is invoked on every virtual memory access, and these tasks include:

  1. For accesses to non-resident or protected memory, the MMU triggers a page fault or protection exception, transfers control to the kernel's fault handler, then retries the faulting instruction after the fault handler finishes.
  2. For resident memory accesses that are allowed by the page's protection, the MMU translates the virtual address to a physical address and accesses that physical address.
  3. Some MMUs automatically maintain dirty and reference bits; other MMUs leave this task to be handled in software. The MMU in this project does NOT automatically maintain dirty or reference bits.

On normal computer systems, system call instructions also invoke the exception mechanism. When a system call instruction is executed, the exception mechanism transfers control to the registered kernel handler for this exception.

We provide software infrastructure to emulate the MMU and exception functionality of normal hardware. To use this infrastructure, each application that uses the external pager must include "vm_app.h" and link with "libvm_app.a", and your external pager must include "vm_pager.h" and link with "libvm_pager.a". You do not need to understand the mechanisms used to emulate this functionality (in case you're curious, the infrastructure uses mmap, mprotect, SEGV handlers, named pipes, and remote procedure calls).

Linking with these libraries enables application processes to communicate with the pager process in the same manner as applications on real hardware communicate with real operating systems. Applications issue load and store instructions (compiled from normal C++ variable accesses), and these are translated or faulted by the infrastructure exactly as in the above description of the MMU. The infrastructure transfers control on faults and system calls to the pager, which receives control via function calls.

The following diagram shows how your pager will interact with applications that use the pager. An application makes a request to the system via the function calls vm_extend, vm_syslog and vm_yield, or by trying to load or store an address that is non-resident or protected.

+-----------+           +----------------+                +----------------+
|APPLICATION|           | INFRASTRUCTURE |                | EXTERNAL PAGER |
+-----------+           +----------------+                +----------------+

                        initialize VM system         -->    vm_init
                        create process               -->    vm_create
                        end process                  -->    vm_destroy
                        switch to new process        -->    vm_switch

vm_yield         -->    may switch to new process
vm_extend        -->    system call handler          -->    vm_extend
vm_syslog        -->    system call handler          -->    vm_syslog
faulting load    -->    exception handler            -->    vm_fault
faulting store   -->    exception handler            -->    vm_fault

Note that there are two versions of vm_extend and vm_syslog: one for applications and one for the pager. The application-side vm_extend/vm_syslog is implemented in libvm_app.a and is called by the application process. The pager-side vm_extend/vm_syslog is implemented by you in your pager. Think of the vm_extend/vm_syslog in libvm_app.a as a system call wrapper, and think of the vm_extend/vm_syslog in your pager as the code that is invoked by the system call. When the application calls its vm_extend/vm_syslog (the one in libvm_app.a), the infrastructure takes care of invoking the system call vm_extend/vm_syslog in your pager. See the header files vm_app.h and vm_pager.h for the actual function declarations.

2. Simulated MMU

The MMU being simulated in this project is a single-level, fixed-size page table. A virtual address is composed of a virtual page number and a page offset:

         bit 63-13              bit 12-0
   | virtual page number  |   page offset     |
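As a quick illustration of this split, the following C++ sketch extracts the virtual page number and page offset with a shift and a mask. The constants kPageSize and kOffsetBits are hypothetical names standing in for VM_PAGESIZE; the sketch does not use the project infrastructure.

```cpp
#include <cassert>
#include <cstdint>

// Page geometry from this handout: VM_PAGESIZE is 8192 == 2^13 bytes,
// so the low 13 bits of an address are the page offset.
constexpr std::uintptr_t kPageSize = 8192;   // hypothetical stand-in for VM_PAGESIZE
constexpr unsigned kOffsetBits = 13;
static_assert((std::uintptr_t{1} << kOffsetBits) == kPageSize,
              "page size must equal 2^kOffsetBits");

// Virtual page number: bits 63-13 of the address.
constexpr std::uintptr_t vpn_of(std::uintptr_t addr) {
    return addr >> kOffsetBits;
}

// Page offset: bits 12-0 of the address.
constexpr std::uintptr_t offset_of(std::uintptr_t addr) {
    return addr & (kPageSize - 1);
}

// The arena starts on a page boundary: 0x60000000 / 8192 == 0x30000.
static_assert(vpn_of(0x60000000) == 0x30000, "arena base page");
static_assert(offset_of(0x60000000) == 0, "arena base is page aligned");
```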

The page table used by this MMU is an array of page table entries (PTEs), one PTE per virtual page in the arena. The MMU locates the page table through the page table base register (PTBR). The PTBR is a variable that is declared and defined by the infrastructure (but will be controlled by your pager). The following portion of vm_pager.h describes the arena, PTE, page table, and PTBR.

/*
 * ***********************
 * * Definition of arena *
 * ***********************
 */

/* pagesize for the machine */
#define VM_PAGESIZE 8192

/* virtual address at which application's arena starts */
#define VM_ARENA_BASEADDR    ((void *) 0x60000000)

/* virtual page number at which application's arena starts */

/* size (in bytes) of arena */
#define VM_ARENA_SIZE    0x20000000

/*
 * **************************************
 * * Definition of page table structure *
 * **************************************
 */

/*
 * Format of page table entry.
 * read_enable=0 ==> loads to this virtual page will fault
 * write_enable=0 ==> stores to this virtual page will fault
 * ppage refers to the physical page for this virtual page (unused if
 * both read_enable and write_enable are 0)
 */
typedef struct {
    unsigned long ppage : 51;   /* bit 0-50 */
    unsigned int read_enable : 1; /* bit 51 */
    unsigned int write_enable : 1;  /* bit 52 */
} page_table_entry_t;

/*
 * Format of page table.  Entries start at virtual page VM_ARENA_BASEPAGE,
 * i.e. ptes[0] is the page table entry for virtual page VM_ARENA_BASEPAGE.
 */
typedef struct {
    page_table_entry_t ptes[VM_ARENA_SIZE/VM_PAGESIZE];
} page_table_t;

/*
 * MMU's page table base register.  This variable is defined by the
 * infrastructure, but it is controlled completely by the student's pager code.
 */
extern page_table_t *page_table_base_register;
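To see how these definitions fit together, here is a minimal sketch of the lookup the simulated MMU performs on a load: find the arena-relative PTE index, check read_enable, and combine ppage with the page offset. The types are local copies of the vm_pager.h definitions so the sketch compiles on its own, and pte_index and translate_read are hypothetical helpers, not part of the infrastructure. The returned value is a byte offset into simulated physical memory (pm_physmem, described in Section 5).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Local copies of the vm_pager.h constants and types (a real pager
// includes "vm_pager.h" instead of redefining them).
constexpr std::uintptr_t kPageSize      = 8192;        // VM_PAGESIZE
constexpr std::uintptr_t kArenaBaseAddr = 0x60000000;  // VM_ARENA_BASEADDR
constexpr std::uintptr_t kArenaSize     = 0x20000000;  // VM_ARENA_SIZE
constexpr std::size_t    kArenaPages    = kArenaSize / kPageSize;  // 65536 PTEs

typedef struct {
    unsigned long ppage : 51;
    unsigned int read_enable : 1;
    unsigned int write_enable : 1;
} page_table_entry_t;

typedef struct {
    page_table_entry_t ptes[kArenaPages];
} page_table_t;

// Index of the PTE covering addr, or -1 if addr lies outside the arena.
long pte_index(std::uintptr_t addr) {
    if (addr < kArenaBaseAddr || addr >= kArenaBaseAddr + kArenaSize) return -1;
    return static_cast<long>((addr - kArenaBaseAddr) / kPageSize);
}

// Mimics the MMU on a load: returns the physical byte offset, or -1
// where the real MMU would raise a fault.
long translate_read(const page_table_t &pt, std::uintptr_t addr) {
    long i = pte_index(addr);
    if (i < 0 || !pt.ptes[i].read_enable) return -1;  // fault
    return static_cast<long>(pt.ptes[i].ppage * kPageSize + addr % kPageSize);
}
```

Because the arena holds 65536 pages, a full page table is typically 512 KB on 64-bit Linux (8 bytes per entry), which is why a sketch like this allocates it on the heap rather than on the stack.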

3. Interface used by applications of the external pager

Applications use three system calls to communicate explicitly with the simulated operating system: vm_extend, vm_syslog, and vm_yield. The prototypes for these system calls are given in the file "vm_app.h":

/*
 * vm_app.h
 * Public routines for clients of the external pager
 */

#ifndef _VM_APP_H_
#define _VM_APP_H_

/*
 * vm_extend() -- ask for the lowest invalid virtual page in the process's
 * arena to be declared valid.  Returns the lowest-numbered byte of the new
 * valid virtual page.  E.g., if the valid part of the arena before calling
 * vm_extend is 0x60000000-0x60003fff, the return value will be 0x60004000,
 * and the resulting valid part of the arena will be 0x60000000-0x60005fff.
 * vm_extend will return NULL if the disk is out of swap space.
 */
extern void *vm_extend(void);

/*
 * vm_syslog() -- ask external pager to log a message (message data must
 * be in address space controlled by external pager).  Logs message of length
 * len.  Returns 0 on success, -1 on failure.
 */
extern int vm_syslog(void *message, unsigned int len);

/*
 * vm_yield() -- ask operating system to yield the CPU to another process.
 * The infrastructure's scheduler is non-preemptive, so a process runs until
 * it calls vm_yield() or exits.
 */
extern void vm_yield(void);

#define VM_PAGESIZE 8192

#endif /* _VM_APP_H_ */

The arena of a process is the range of addresses from (VM_ARENA_BASEADDR) to (VM_ARENA_BASEADDR + VM_ARENA_SIZE). The arena is initialized to have no valid virtual pages. An application calls vm_extend to ask for the lowest invalid page in its arena to be declared valid. vm_extend returns the lowest-numbered byte of the newly allocated memory. E.g., if the valid part of the arena before calling vm_extend is 0x60000000-0x60003fff, the return value of the next vm_extend call will be 0x60004000, and the resulting valid part of the arena will be 0x60000000-0x60005fff. Each byte of a newly extended virtual page is defined to be initialized with the value 0. Applications can load or store to any address on a valid arena page. Depending on the protections and residencies set by the pager, some of these loads and stores will result in calls to the pager's vm_fault routine; however, these faults are serviced without the application's knowledge. An application calls vm_syslog to ask the pager to print a message (all message data should be in the valid part of the arena). vm_syslog returns 0 on success and -1 on failure.

FYI, the vm_extend interface is similar to the sbrk call provided by Linux (and FreeBSD). The interfaces you are used to using to manage dynamic memory (new/malloc and delete/free) are user-level libraries built on top of sbrk.

The following is a sample application program that uses the external pager.

#include <iostream>
#include "vm_app.h"

using namespace std;

int main()
{
    char *p;
    p = (char *) vm_extend();   // first valid arena page, zero-filled
    p[0] = 'h';
    p[1] = 'e';
    p[2] = 'l';
    p[3] = 'l';
    p[4] = 'o';
    vm_syslog(p, 5);            // pager prints "hello"
    return 0;
}

4. Pager Specification

This section describes the functions you will implement in your external pager: vm_init, vm_create, vm_switch, vm_fault, vm_destroy, vm_extend, and vm_syslog. Note that you will not implement main(); instead, main() is included in libvm_pager.a. The infrastructure will invoke your pager functions as described below.

The following portion of vm_pager.h describes your pager functions:

/*
 * vm_init
 * Called when the pager starts.  It should set up any internal data structures
 * needed by the pager, e.g. physical page bookkeeping, process table, disk
 * usage table, etc.
 * vm_init is passed both the number of physical memory pages and the number
 * of disk blocks in the raw disk.
 */
extern void vm_init(unsigned int memory_pages, unsigned int disk_blocks);

/*
 * vm_create
 * Called when a new process, with process identifier "pid", is added to the
 * system.  It should create whatever new elements are required for each of
 * your data structures.  The new process will only run when it's switched
 * to via vm_switch().
 */
extern void vm_create(pid_t pid);

/*
 * vm_switch
 * Called when the kernel is switching to a new process, with process
 * identifier "pid".  This allows the pager to do any bookkeeping needed to
 * register the new process.
 */
extern void vm_switch(pid_t pid);

/*
 * vm_fault
 * Called when current process has a fault at virtual address addr.  write_flag
 * is true if the access that caused the fault is a write.
 * Should return 0 on success, -1 on failure.
 */
extern int vm_fault(void *addr, bool write_flag);

/*
 * vm_destroy
 * Called when current process exits.  It should deallocate all resources
 * held by the current process (page table, physical pages, disk blocks, etc.)
 */
extern void vm_destroy();

/*
 * vm_extend
 * A request by current process to declare as valid the lowest invalid virtual
 * page in the arena.  It should return the lowest-numbered byte of the new
 * valid virtual page.  E.g., if the valid part of the arena before calling
 * vm_extend is 0x60000000-0x60003fff, the return value will be 0x60004000,
 * and the resulting valid part of the arena will be 0x60000000-0x60005fff.
 * vm_extend should return NULL on error, e.g., if the disk is out of swap
 * space.
 */
extern void * vm_extend();

/*
 * vm_syslog
 * A request by current process to log a message that is stored in the process'
 * arena at address "message" and is of length "len".
 * Should return 0 on success, -1 on failure.
 */
extern int vm_syslog(void *message, unsigned int len);

4.1. vm_init

The infrastructure calls vm_init when the pager starts. Its parameters are the number of physical pages provided in physical memory and the number of disk blocks available on the disk. vm_init should set up whatever data structures you need to begin accepting vm_create and subsequent requests from processes.

4.2. vm_create

The infrastructure calls vm_create when a new application process starts. You should initialize whatever data structures you need to handle this process and its subsequent calls to the library. Its initial page table should be empty, since there are no valid virtual pages in its arena until vm_extend is called. Note that the new process will not run until it is switched to via vm_switch.

4.3. vm_switch

The infrastructure calls vm_switch when the OS scheduler runs a new process. This allows your pager to do whatever bookkeeping it needs to register the fact that a new process is running.
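A minimal sketch of vm_create and vm_switch follows, assuming a hypothetical per-process record keyed by pid. Its only hardware-facing step is loading page_table_base_register. Here page_table_t and page_table_base_register are shortened stand-ins for the vm_pager.h declarations, and the sketch_ function names are illustrative only.

```cpp
#include <cassert>
#include <memory>
#include <sys/types.h>
#include <unordered_map>

// Shortened stand-in for the page_table_t in vm_pager.h (the real one
// holds one page_table_entry_t per arena page).
struct page_table_t {
    unsigned long ptes[4];
};

// Stand-in for the PTBR that the infrastructure defines.
page_table_t *page_table_base_register = nullptr;

// Hypothetical per-process record; a real pager tracks much more.
struct process_t {
    page_table_t page_table{};  // all zero: no page is readable or writable
};

static std::unordered_map<pid_t, std::unique_ptr<process_t>> processes;
static process_t *current = nullptr;

// Hypothetical vm_create body: allocate state, but do not switch to it.
void sketch_vm_create(pid_t pid) {
    processes[pid] = std::make_unique<process_t>();
}

// Hypothetical vm_switch body: remember the running process and point
// the simulated MMU at its page table.
void sketch_vm_switch(pid_t pid) {
    current = processes.at(pid).get();
    page_table_base_register = &current->page_table;
}
```

Whatever container you choose, a page table must not move while the PTBR points at it. Node-based containers such as std::unordered_map keep element addresses stable across rehashing, and the unique_ptr indirection here gives the same guarantee independently of the container.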

4.4. vm_extend

vm_extend is called when a process wants to make another virtual page in its arena valid. vm_extend should return the lowest-numbered byte of the new valid virtual page. E.g., if the valid part of the arena before calling vm_extend is 0x60000000-0x60003fff, the return value of vm_extend() will be 0x60004000, and the resulting valid part of the arena will be 0x60000000-0x60005fff.

vm_extend should ensure that there are enough available disk blocks to hold all valid virtual pages (this is called "eager" swap allocation). If there are no free disk blocks, vm_extend should return NULL. The benefit of eager swap allocation is that applications know at the time of vm_extend that there is no more swap space, rather than when a page needs to be evicted to disk.
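One way to meet this requirement is to reserve a disk block for each page at vm_extend time and release it only when the process is destroyed. The sketch below assumes a simple stack of free block numbers; SwapAllocator, arena_t, and sketch_vm_extend are hypothetical names, and the arena and page constants mirror vm_pager.h. Note that it neither zero-fills nor touches the disk, in keeping with the deferral rules in Section 4.8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uintptr_t kPageSize      = 8192;        // VM_PAGESIZE
constexpr std::uintptr_t kArenaBaseAddr = 0x60000000;  // VM_ARENA_BASEADDR
constexpr std::size_t    kArenaPages    = 0x20000000 / 8192;

// Hypothetical pager-wide pool of unused disk block numbers.
class SwapAllocator {
public:
    explicit SwapAllocator(unsigned int disk_blocks) {
        for (unsigned int b = disk_blocks; b > 0; --b) free_.push_back(b - 1);
    }
    // Reserve one block; returns false if the disk is full.
    bool reserve(unsigned int &block) {
        if (free_.empty()) return false;
        block = free_.back();
        free_.pop_back();
        return true;
    }
    void release(unsigned int block) { free_.push_back(block); }
private:
    std::vector<unsigned int> free_;
};

// Hypothetical per-process arena state.
struct arena_t {
    std::size_t valid_pages = 0;           // arena pages [0, valid_pages) are valid
    std::vector<unsigned int> swap_block;  // swap_block[i] backs arena page i
};

// Returns the new page's base address, or nullptr if no swap space is left.
void *sketch_vm_extend(arena_t &a, SwapAllocator &swap) {
    unsigned int block = 0;
    if (a.valid_pages == kArenaPages || !swap.reserve(block)) {
        return nullptr;  // arena exhausted or disk out of swap space
    }
    a.swap_block.push_back(block);
    std::uintptr_t addr = kArenaBaseAddr + kPageSize * a.valid_pages;
    ++a.valid_pages;
    // No zero-fill and no disk I/O here: both wait until the page is touched.
    return reinterpret_cast<void *>(addr);
}
```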

Remember that an application should see each byte of a newly extended virtual page as initialized with the value 0. However, the actual data initialization needed to provide this abstraction should be deferred as long as possible.

4.5. vm_fault

The vm_fault routine is called in response to a read or write fault by the application. Your pager determines which accesses in the arena will generate faults by setting the read_enable and write_enable fields in the page table. Your pager determines which physical page is associated with a virtual page by setting the ppage field in the page table. A physical page should be associated with at most one virtual page in one process at any given time (no sharing).

vm_fault should return 0 after successfully handling a fault. vm_fault should return -1 if the address is not on a valid page of the arena.

4.5.1. Non-resident pages

If a fault occurs on a virtual page that is not resident, you must find a physical page to associate with the virtual page. If there are no free physical pages, you must create a free physical page by evicting a virtual page that is currently resident.

Use the second-chance (clock) algorithm to select a victim. The clock queue is an ordered list of all valid, resident virtual pages in the system. When a virtual page is assigned to a physical page, that physical page should be placed on the tail of the clock queue (and marked as referenced). To select a victim, remove and examine the physical page at the head of the queue. If it has been accessed in any way since it was last placed on the queue, it should be added to the tail of the queue (with its reference bit cleared and its protection updated appropriately), and victim selection should proceed to the next page in the queue.

If the page at the head has not been accessed since it was last enqueued, then its virtual page should be evicted. Dirty and clean pages are treated the same when selecting a victim page to evict.

Hint: many pages, even those that are not zero-filled, do not need to be paged out.

Also note that the order of pages in the clock queue may differ from the order of their physical page numbers.
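The victim-selection loop can be sketched with a std::deque of physical page numbers and a per-frame reference bit. In this model (an assumption, not a required design) the caller re-enqueues a frame when it is assigned to a new virtual page, and touch() stands in for the reference information your pager would learn from faults. ClockQueue and frame_t are hypothetical names.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical per-physical-page state; a real pager would also record
// which process and virtual page currently occupy the frame.
struct frame_t {
    bool referenced = false;
};

class ClockQueue {
public:
    explicit ClockQueue(unsigned int memory_pages) : frames_(memory_pages) {}

    // A virtual page was just assigned to physical page ppage.
    void enqueue(unsigned int ppage) {
        frames_[ppage].referenced = true;
        queue_.push_back(ppage);
    }

    // Record an access (a real pager learns this from a fault).
    void touch(unsigned int ppage) { frames_[ppage].referenced = true; }

    // Second-chance victim selection; the victim leaves the queue.
    unsigned int select_victim() {
        assert(!queue_.empty());
        for (;;) {
            unsigned int ppage = queue_.front();
            queue_.pop_front();
            if (!frames_[ppage].referenced) return ppage;  // evict this frame
            // Second chance: clear the bit and move to the tail. A real pager
            // would also revoke access here so the next access faults.
            frames_[ppage].referenced = false;
            queue_.push_back(ppage);
        }
    }

private:
    std::vector<frame_t> frames_;
    std::deque<unsigned int> queue_;
};
```

If every page in the queue is referenced, one full sweep clears all the bits and the original head becomes the victim, so the loop always terminates.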

4.5.2. Resident pages

Your pager controls the page protections for resident pages. Its goal in controlling protections is to maintain any state it needs to defer work and implement the clock replacement algorithm (e.g. dirty and reference bits). An access to a resident page will generate a page fault if the page's protection does not allow the access. On these faults, vm_fault should update state as needed, change the protections on the virtual page, and continue.
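A simplified sketch of how faults can stand in for hardware reference and dirty bits follows. It assumes a hypothetical resident_page_t that keeps the software bits next to the two PTE protection bits, and it ignores non-resident pages and deferred zero-fill; a real pager needs more states than this.

```cpp
#include <cassert>

// Hypothetical software bits for one resident virtual page, plus copies
// of the two protection bits the pager writes into its PTE.
struct resident_page_t {
    bool referenced = false;
    bool dirty = false;
    bool read_enable = false;
    bool write_enable = false;
};

// Fault on a resident page: record the access, then grant only as much
// access as keeps the software bits accurate.
void on_resident_fault(resident_page_t &p, bool write_flag) {
    p.referenced = true;
    if (write_flag) p.dirty = true;
    p.read_enable = true;
    // Allow stores only once the page is known to be dirty, so the first
    // store to a clean page still faults and sets the dirty bit.
    p.write_enable = p.dirty;
    assert(!(p.write_enable && !p.read_enable));  // never write-only
}

// Clock gives the page a second chance: clear the reference bit and
// revoke all access so the next load or store faults.
void on_second_chance(resident_page_t &p) {
    p.referenced = false;
    p.read_enable = false;
    p.write_enable = false;
}
```

The last test assertion shows one case where a read fault legitimately leaves the page writable, which is why Section 6 says read faults make a page read-only only "typically".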

4.6. vm_syslog

vm_syslog is called with a pointer to an array of bytes in the current process's virtual address space and the length of that array. Your pager should first check that the entire message is in valid pages of the arena. Return -1 (and don't print anything) if any part of the message is not on a valid arena page, or if length is zero.

After checking for invalid addresses, your pager should next copy the entire array into a C++ string in the pager's address space, then print the C++ string to cout. You should use the following snippet of code for your print statement (this assumes the C++ string variable is named "s"):

    cout << "syslog \t\t\t" << s << endl;

Remember to print using exactly this formatting!

Most of the work in vm_syslog will be copying the array into the pager's C++ string. vm_syslog must handle virtual to physical address translation while copying the message from one address space to another. (Hint: you should treat vm_syslog's accesses to the application's data exactly as if they came from the application program for the purposes of protection, residence, and reference bits. Why?). vm_syslog should copy the application's data starting at the lowest virtual address and proceeding toward the highest virtual address.
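The copy loop can be sketched against a toy process model: validate the whole range first, then walk from the lowest address to the highest, translating through the page table and taking the fault path whenever the page is not readable, just as an application load would. toy_process_t, toy_pte_t, read_fault, and copy_message are hypothetical; in your pager the fault path would be your own vm_fault logic and the bytes would come from pm_physmem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

constexpr std::uintptr_t kPageSize      = 8192;        // VM_PAGESIZE
constexpr std::uintptr_t kArenaBaseAddr = 0x60000000;  // VM_ARENA_BASEADDR

// Hypothetical minimal view of one process.
struct toy_pte_t { unsigned long ppage = 0; bool read_enable = false; };

struct toy_process_t {
    std::size_t valid_pages = 0;
    std::vector<toy_pte_t> ptes;          // one entry per valid arena page
    std::vector<char> *physmem = nullptr; // stand-in for pm_physmem
    // Stand-in for the fault path: make vpage readable or return false.
    std::function<bool(std::size_t vpage)> read_fault;
};

// Returns false (so nothing is printed) if len == 0 or any byte lies
// outside the valid part of the arena.
bool copy_message(toy_process_t &p, std::uintptr_t message,
                  unsigned int len, std::string &out) {
    if (len == 0 || message < kArenaBaseAddr) return false;
    std::uintptr_t end = message + len;  // one past the last byte
    if (end > kArenaBaseAddr + p.valid_pages * kPageSize) return false;

    out.clear();
    for (std::uintptr_t a = message; a < end; ++a) {  // lowest address first
        std::size_t vpage = (a - kArenaBaseAddr) / kPageSize;
        toy_pte_t &pte = p.ptes[vpage];
        // Treat the pager's read exactly like an application load.
        if (!pte.read_enable && !p.read_fault(vpage)) return false;
        out += (*p.physmem)[pte.ppage * kPageSize + a % kPageSize];
    }
    return true;
}
```

Copying one byte per iteration keeps the sketch short; copying up to a page boundary per iteration is an easy refinement.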

4.7. vm_destroy

vm_destroy is called by the infrastructure when the corresponding application exits. This routine must deallocate all resources held by that process. This includes page tables, physical pages, and disk blocks. Physical pages that are released should be put back on the free list.
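A minimal sketch of the release step follows, assuming hypothetical pager-wide free lists of frame and block numbers (owned_resources_t and release_all are illustrative names). No disk_write is issued for the exiting process's dirty pages, since their contents will never be read again; a real pager must also remove the freed frames from the clock queue.

```cpp
#include <cassert>
#include <vector>

// Hypothetical bookkeeping for one exiting process.
struct owned_resources_t {
    std::vector<unsigned int> resident_frames;  // physical pages it occupies
    std::vector<unsigned int> swap_blocks;      // blocks reserved by vm_extend
};

// Return every frame and block to the pager-wide free lists.
void release_all(owned_resources_t &r,
                 std::vector<unsigned int> &free_frames,
                 std::vector<unsigned int> &free_blocks) {
    free_frames.insert(free_frames.end(),
                       r.resident_frames.begin(), r.resident_frames.end());
    free_blocks.insert(free_blocks.end(),
                       r.swap_blocks.begin(), r.swap_blocks.end());
    r.resident_frames.clear();
    r.swap_blocks.clear();
}
```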

4.8. Deferring and avoiding work

There are many points in this project where you have some freedom over when zero-fills, faults, and disk I/O happen. You must defer such work as far into the future as possible.

Similarly, there are points in this project where careful state maintenance can help you avoid doing work. Whenever possible, avoid work. For example, if a page that is being evicted does not need to be written to disk, don't do so. (However, the victim selection algorithm in Section 4.5.1 must be used as specified; e.g., don't change the victim selection to avoid writing a page to disk.)

Note that you will need to maintain reference and dirty bits to defer work and to implement the clock algorithm. Since the MMU for this project does not maintain dirty or reference bits, your pager will maintain these bits by generating page faults on appropriate accesses.

If deferring or avoiding one action might make another action necessary, keep in mind that incurring a fault (about 5 microseconds on current hardware) is cheaper than zero-filling a page (30 microseconds), which is in turn cheaper than a disk I/O (10 milliseconds). For instance, if you have a choice between taking an extra page fault and causing an extra disk I/O, you should prefer to take the extra fault.
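Taking the quoted costs at face value, the ratios work out as follows:

```latex
\frac{t_{\text{disk}}}{t_{\text{fault}}} = \frac{10\ \text{ms}}{5\ \mu\text{s}} = \frac{10\,000\ \mu\text{s}}{5\ \mu\text{s}} = 2000,
\qquad
\frac{t_{\text{zero}}}{t_{\text{fault}}} = \frac{30\ \mu\text{s}}{5\ \mu\text{s}} = 6,
\qquad
\frac{t_{\text{disk}}}{t_{\text{zero}}} = \frac{10\,000\ \mu\text{s}}{30\ \mu\text{s}} \approx 333
```

So one avoided disk I/O pays for roughly 2000 extra faults, and one avoided zero-fill pays for about 6.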

5. Interface used by external pager to access the simulated hardware

This section describes how your external pager will access the simulated hardware, i.e. physical memory, disk, and MMU.

The following portion of vm_pager.h describes the variables and utility functions for accessing this hardware.

/*
 * *********************************************
 * * Public interface for the disk abstraction *
 * *********************************************
 *
 * Disk blocks are numbered from 0 to (disk_blocks-1), where disk_blocks
 * is the parameter passed in vm_init().
 */

/*
 * disk_read
 * read block "block" from the disk into physical memory page "ppage".
 */
extern void disk_read(unsigned int block, unsigned int ppage);

/*
 * disk_write
 * write the contents of physical memory page "ppage" onto disk block "block".
 */
extern void disk_write(unsigned int block, unsigned int ppage);

/*
 * ********************************************************
 * * Public interface for the physical memory abstraction *
 * ********************************************************
 *
 * Physical memory pages are numbered from 0 to (memory_pages-1), where
 * memory_pages is the parameter passed in vm_init().
 * Your pager accesses the data in physical memory through the variable
 * pm_physmem, e.g. ((char *)pm_physmem)[5] is byte 5 in physical memory.
 */
extern void * pm_physmem;

Physical memory is structured as a contiguous collection of N pages, numbered from 0 to N-1. N is set with the -m option when you run the external pager (e.g., by running "pager -m 4"). The minimum number of physical pages is 2, the maximum is 128, and the default is 4. Your pager can access the data in physical memory through the pointer pm_physmem.

The disk is modeled as a single device that is disk_blocks "blocks" long, where each disk block is the same size as a physical memory page. Your pager will use two functions: disk_read and disk_write. Arguments to each function are a disk block number and a physical page number. The disk_write function is used to write data from a physical page out to disk, and the disk_read function is used to read data from disk into a physical page.
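The following sketch shows the address arithmetic for physical pages: page ppage starts VM_PAGESIZE * ppage bytes into pm_physmem, and a deferred zero-fill is a single memset of that page. Because the real pm_physmem comes from the infrastructure, the sketch uses a local stand-in buffer; fake_physmem, pm_physmem_sketch, page_ptr, and zero_fill are hypothetical names.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

constexpr unsigned int kPageSize = 8192;  // VM_PAGESIZE

// Stand-in for the infrastructure's pm_physmem (4 physical pages here).
static std::vector<char> fake_physmem(4 * kPageSize);
static void *pm_physmem_sketch = fake_physmem.data();

// Start of physical page ppage inside simulated physical memory.
char *page_ptr(unsigned int ppage) {
    return static_cast<char *>(pm_physmem_sketch) + ppage * kPageSize;
}

// Zero-fill a physical page, e.g. the first time a newly extended
// virtual page is actually touched.
void zero_fill(unsigned int ppage) {
    std::memset(page_ptr(ppage), 0, kPageSize);
}
```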

Your pager controls the operation of the MMU by modifying the contents of the page table and the variable page_table_base_register.

6. Hints

The first thing you should do is to write down a finite state machine for the life of a virtual page, from creation via vm_extend to destruction via vm_destroy. Ask yourself what events can happen to a page at each stage of its lifetime, and what state you will need to keep to represent each state. As you design the state machine, try to identify all of the places in the state machine where work can be deferred or avoided. A large portion of the credit in this project hinges on having this state machine correct.

Use assertion statements copiously in your pager to check for unexpected conditions generated by bugs in your program. These error checks are essential in debugging complex programs, because they help flag error conditions early.

Read-faults should typically make the virtual page read-only (read_enable=1, write_enable=0), but NOT always.

Virtual pages will never be write-only (read_enable=0, write_enable=1).

7. Test cases

An integral (and graded) part of writing your pager will be to write a suite of test cases to validate any pager. This is common practice in the real world--software companies maintain a suite of test cases for their programs and use this suite to check the program's correctness after a change. Writing a comprehensive suite of test cases will deepen your understanding of virtual memory, and it will help you a lot as you debug your pager. To construct a good test suite, trace through different transition paths that a page can take through a pager's state machine, then write a short test case that causes a page to take each path.

Each test case for the pager will be a short C++ application program that uses a pager via the interface described in Section 3 (e.g. the example program in Section 3). Each test case should be run without any arguments and should not use any input files.

Your test suite may contain up to 20 test cases. Each test case may cause a correct pager to generate at most 256 KB of output and must take less than 60 seconds to run. These limits are much larger than needed for full credit. You will submit your suite of test cases together with your pager, and we will grade your test suite according to how thoroughly it exercises a pager. See Section 9 for how your test suite will be graded.

Each test case will specify the number of physical memory pages to use when running the pager (the -m option) for the test case. This parameter will be communicated via the name of the test case file, which should have the following format:

    <name>.<memoryPages>.cc

where name is any name you choose, memoryPages identifies the number of physical memory pages to use with the pager, and the parts are separated by periods (e.g., test1.4.cc). Remember that the minimum number of physical memory pages is 2 and the maximum is 128.

You should test your pager with both single and multiple applications running. However, your submitted test suite need only be a single process; none of the buggy pagers used to evaluate your test suite require multi-process applications to be exposed. If you do want to try your hand at multi-process test cases, I suggest you run a parent process that then forks child processes to do the actual pager requests. The parent process can communicate with child processes by creating pipes before forking. We'll assume the entire set of application processes is done when the initial process exits, so it should exit after all the other application processes. Writing a correct multi-process test case is challenging (which is why we don't require multi-process test cases in the submitted test suite), but you'll learn some interesting things.

8. Project logistics

Write your pager in C++ on Linux. The public functions in vm_pager.h are declared "extern", but all other functions and global variables in your pager should be declared "static" to prevent naming conflicts with other libraries.

Use g++ (/usr/bin/g++) to compile your programs. You may use any functions included in the standard C++ library, including (and especially) the STL. You should not use any libraries other than the standard C++ library. To compile a pager "pager.cc", use the command "g++ pager.cc libvm_pager.a" (of course, you can add things like -g for debugging and -o to name the executable). You compile an application just like any other C++ program, except that you must link with libvm_app.a, e.g., "g++ app.cc libvm_app.a".

Your pager must be in a single file and must be named "pager.cc".

Here's how to run your pager and an application. First start the pager. The infrastructure will print a message saying "Pager started with # physical memory pages", where "#" refers to the number of physical memory pages. After the pager starts, you can run one or more application processes which will interact with the pager via the infrastructure. The same user must run the pager and the applications that use the pager, and all processes must run on the same computer.

Copies of vm_app.h, vm_pager.h, libvm_app.a, and libvm_pager.a are included in the starter files. You should store copies of these files in your own directory so you can write #include "vm_app.h" in your applications and #include "vm_pager.h" in your pager (without needing the -I flag to g++).

9. Grading, auto-grading, and formatting

To help you validate your programs, your submissions will be graded automatically, and the result will be mailed back to you. You may then continue to work on the project and re-submit.

For this project, you can use up to 5 bonus submissions to the autograder.

The student suite of test cases will be graded according to how thoroughly they test a pager. We will judge thoroughness of the test suite by how well it exposes potential bugs in a pager. The auto-grader will first run a test case with a correct pager and generate the correct output FROM THE PAGER (on stdout, i.e. the stream used by cout) for this test case. The auto-grader will then run the test case with a set of buggy pagers. A test case exposes a buggy pager by causing the buggy pager to generate output (on stdout) that differs from the correct output. The test suite is graded based on how many of the buggy pagers were exposed by at least one test case. This is known as "mutation testing" in the research literature on automated testing.

Because your programs will be auto-graded, you must be careful to follow the exact rules in the project description:

  1. Your code should not print any output other than that specified for vm_syslog. The pager infrastructure also prints messages to help you debug (and to allow the auto-grader to understand what the pager is doing); you can disable these messages by running the pager with the "-q" flag.
  2. Do not modify the header files or libraries provided in the project directory.

In addition to the auto-grader's evaluation of your programs' correctness, I will evaluate your programs on issues such as the clarity and completeness of your documentation, coding style, the efficiency, brevity, and understandability of your code, etc. Your pager documentation should give an overall picture of your solution, with enough detail that I can easily read and understand your code. You should present a list of all of the places in your solution that you deferred work; give both the event that you deferred, and the time at which you had to do the work.

10. Turning in the project

Use the submit2310 program to submit your files. submit2310 submits the set of files associated with a project part, and is called as follows:

    submit2310 <project-part> <file1> <file2> <file3> ... <filen>

Here are the files you should submit for this project:

  1. C++ program for your pager, which should be named "pager.cc".
  2. A suite of test cases (each test case is a C++ program in a separate file). The name of each test case must follow the format described in Section 7, e.g.:
        submit2310 3 pager.cc test1.4.cc test2.32.cc

The official time of submission for your project will be the time of your last submission. If you send in anything after the due date, your project will be considered late.

11. Project Writeup

For your project writeup, I would like you to write a short (3-5 pages) paper that summarizes your project. In particular, you should include i) an introductory section that highlights the purpose of the project, ii) an architectural overview/design section that describes the structure of your code (if a figure makes your discussion more clear, use one!), iii) an evaluation section that discusses your test cases and how you verified the correct behavior of your pager, and iv) a conclusion that summarizes your findings and reflects on the assignment in general. The purpose of the writeup is to help you gain experience with technical writing. Send me the writeup as a PDF (via email) no later than 48 hours after the due date for the code.