CS 220 - Lab 10

CS 210 - Lab 9: Modeling the Student Registration Process
Part 1 due April 27, 1999

General Lab Goals

This lab gives the class an opportunity to design and implement a significant piece of software: a program that can register students in a complete semester's worth of courses at Bowdoin. The program will be written in C++ using the data structures we have studied in this course. The data we will use is actual Bowdoin registration and course data for the Spring 1998 semester.

This project will serve as the final exam for the course. It has a collaborative part and an individual part, which are explained below. All software and data files for this lab will be in the course folder.

Overview:

The picture in Figure 1 provides an overview of the registration process. The "Registration.cpp" program has three major input files: the class schedule, the students' registration cards, and a file listing time slots and conflicts. In practice, the output of the program would have three major parts: class lists (which are distributed to instructors), student schedules (which are distributed to students), and printed statistics that summarize the results of the scheduling process. For the purposes of this exercise, we will skip the generation of class lists and student schedules, focusing instead on registering students in courses and generating enrollment statistics, as explained below.

The files classes.data and cards.data represent the actual input to the Spring 1998 registration process; they are provided in the folder for your use.

The file classes.data contains one line of text for each course section offered in Spring 1998. For simplicity, this file excludes lab sections. A sample of the first few lines in this file is shown in Figure 1.

Figure 1. First few lines of the "classes.data" file

Each line in this file contains information about a specific course -- its department, course number, scheduled meeting time, and capacity (maximum enrollment). Some lines continue with an equals sign (=) and another course number, indicating a course with which this course is cross-listed. The other (non-cross-listed) lines contain the course title and the name of the instructor.

Below is the beginning of a class definition for a single course in this file.
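As a rough illustration only, a class of this kind might look like the sketch below. Apart from capacity and enrollment, every field name here is an assumption made for the example, std::string stands in for whatever String type the skeleton uses, and the parsing of classes.data is left out because the exact column layout is not reproduced in this text.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of a Course record. Only "capacity" and
// "enrollment" are named in the handout; all other names are assumed.
class Course {
public:
    std::string department;    // e.g. "CHEM"
    std::string number;        // e.g. "226A"
    std::string meeting_time;  // a time-slot code such as "MWF9"
    int capacity = 0;          // maximum enrollment
    int enrollment = 0;        // students enrolled so far
    std::string cross_listed;  // cross-listed course number, empty if none
    std::string title;         // empty on cross-listed lines
    std::string instructor;    // empty on cross-listed lines

    // True when no seats remain.
    bool isFull() const { return enrollment >= capacity; }

    // Adds one student if a seat is free and reports whether it succeeded.
    bool addStudent() {
        if (isFull()) return false;
        ++enrollment;
        return true;
    }
};
```

Keeping the capacity test and the enrollment update together in one member function makes it harder to forget to update the count when a student is added.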

The file cards.data contains the information that students fill in on their registration cards. It has one line of text for each student who is registering for courses in Spring 1998. For simplicity, it also excludes lab preferences. A sample of the first few lines in this file is shown in Figure 2.

Figure 2. First few lines of the "cards.data" file.

Each line in this file contains information about a specific student: a student id number, class, number of courses desired, total number of courses listed, and the course numbers of all the courses listed (in order of preference, reading left-to-right across the first line of the registration card, then the second, and then the third). For example, the first line shows that student #1, class of '90, wanted four courses and listed six courses on the card. The first line of that student's choices lists CHEM 226A, ES 216, BIO 272, and SOC 215; courses ES 391 and BIO 214 were listed as alternates on the second line.

Below is the beginning of a class definition of the Student class that can be used to read lines from this file. The names of the fields in this class correspond with the information on each line of Figure 2, from left to right. Since a student can list a variable number of course choices, these are defined below as an array of Strings.
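One possible shape for such a class is sketched here. Only student_choices is a name taken from this handout; the other field names and the helper parseStudentLine are hypothetical. The sketch also assumes that the fields on a line are separated by whitespace and that each course appears as a single token (for example CHEM226A), which may not match the real cards.data layout.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch of a Student record; fields follow the order
// described for each line of cards.data.
class Student {
public:
    int id = 0;               // student id number
    int class_year = 0;       // class, as written in the file
    int courses_desired = 0;  // number of courses wanted
    int courses_listed = 0;   // number of courses on the card
    std::vector<std::string> student_choices;  // in order of preference
};

// ASSUMPTION: whitespace-separated fields, one token per course.
// Returns false (leaving `out` untouched) if the line is malformed.
bool parseStudentLine(const std::string& line, Student& out) {
    std::istringstream in(line);
    Student s;
    if (!(in >> s.id >> s.class_year >> s.courses_desired >> s.courses_listed))
        return false;
    std::string choice;
    while (in >> choice) s.student_choices.push_back(choice);
    // The count on the card must agree with the courses actually listed.
    if (static_cast<int>(s.student_choices.size()) != s.courses_listed)
        return false;
    out = s;
    return true;
}
```

A std::vector is used instead of a fixed-size array so that the variable number of choices needs no separate length bookkeeping.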

In solving this problem, it is important to think about the basic algorithm for determining how courses are selected for a student. Each course listed in the student_choices array must be looked up in a data structure that contains the information about all courses (that data structure must be loaded from the file classes.data before this algorithm can begin). Once that course is found, two criteria must be met before that student can be enrolled in that class:

That course must not be full to capacity (the field enrollment keeps track of the number of students already enrolled in that course, so it must be updated each time a new student is added to the course), and
That course's meeting time must not be in conflict with the meeting time of any course in which the student is already enrolled.

The first criterion is easy to check. The second criterion can be checked by keeping a list of all the possible meeting times, and along with each one a list of all times that are in conflict with that time. Below is a partial list of these meeting times and their respective conflicting times; their interpretation should be self-explanatory.

A complete list of all class meeting times, alongside all conflicting times, is given in the file conflicts.data. Below is a list of the first few lines in that file. Here, the first line says that the time MWF8 conflicts with the time M-F8, while the third line says that MWF9 conflicts with each of the three times MW915, MW9, and MW930.
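The two enrollment criteria and the conflict lookup can be sketched roughly as follows. Everything here is an assumption made for illustration: the names ConflictTable, addConflictLine, timesConflict, and canEnroll are hypothetical, each line of conflicts.data is assumed to be a meeting time followed by its conflicting times separated by whitespace, and two courses at the identical time are treated as conflicting. The sketch uses std::map purely as a stand-in; the lab itself asks you to write your own Binary Tree.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Maps each meeting time to the set of times that conflict with it.
using ConflictTable = std::map<std::string, std::set<std::string>>;

// ASSUMPTION: a line is "<time> <conflict> <conflict> ...".
void addConflictLine(const std::string& line, ConflictTable& table) {
    std::istringstream in(line);
    std::string time, other;
    if (!(in >> time)) return;  // skip blank lines
    std::set<std::string>& conflicts = table[time];
    while (in >> other) conflicts.insert(other);
}

// True if the two slots are identical or either one lists the other.
bool timesConflict(const ConflictTable& table,
                   const std::string& a, const std::string& b) {
    if (a == b) return true;
    auto it = table.find(a);
    if (it != table.end() && it->second.count(b) > 0) return true;
    it = table.find(b);
    return it != table.end() && it->second.count(a) > 0;
}

// Checks both criteria: a free seat, and no clash with any meeting
// time the student already holds.
bool canEnroll(const ConflictTable& table, int enrollment, int capacity,
               const std::string& courseTime,
               const std::vector<std::string>& enrolledTimes) {
    if (enrollment >= capacity) return false;
    for (const std::string& held : enrolledTimes)
        if (timesConflict(table, courseTime, held)) return false;
    return true;
}
```

Checking the table in both directions is a defensive choice, in case the file lists a conflict under only one of the two times.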

Finally, the program should produce an output that has the information shown below.

In addition to the summary enrollment data for each course, the program should keep track of and display additional "Summary Statistics" and "Run Time Statistics."

The summary statistics reveal how many students did and did not get their desired number of courses, the number of times students were rejected from courses for time conflicts, and the number of times students were rejected from courses because the course was full.

The run time statistics reveal how much time it took to run the two major parts of your program, the loading of the class data structure and the registering of students. These numbers, of course, will vary depending on your choice of data structure and search strategy for registering students.
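One way to collect such timings is sketched below using the standard std::chrono clock. The helper name secondsToRun is hypothetical, and the suggestions commented into Register.cpp may rely on a different timing facility.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename Work>
double secondsToRun(Work work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Each major part of the program (loading the courses, registering the students) could be wrapped in its own call, for example secondsToRun([&] { /* load courses here */ }), and the two results printed with the other statistics.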

To assist you with this task, the skeleton program Register.cpp has already been provided. It simply reads records from the classes.data file into a Vector (an enhanced Array type), and then reads individual student records one-by-one. It contains the beginnings of a class definition for a student (called Student) and a course (called Course). You should already have some experience working with this from last week's class.

Major Lab Tasks

The work for this project can be divided into several major tasks. These will roughly correspond to the breakdown of the remaining labs.

Designing efficient data structures and fleshing out the overall structure of the program.

As already noted, for the courses you must use a HashTable.
For the conflicts use a Binary Tree.
The major operations on each data structure will be Insert and Find. You will not need to write Delete functions.
We will discuss a data structure for Students next week.

Addition of appropriate variables and calls to track the run time statistics for the two major parts of the program (see additional suggestions commented into the sample Register.cpp program).
Determining, for a given student and course, whether that course's time is in conflict with any other courses which the student has been assigned so far.
Registering a single student with the number of courses he/she desires.
Keeping track of the appropriate summary statistics.
Displaying the registration results, summary statistics, and run time statistics after all the student cards have been processed.
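For the first task, a hash table that supports only Insert and Find might be sketched as below. This is one possible design (separate chaining with a simple multiplicative string hash), not necessarily the one developed in class; the default bucket count and the choice to overwrite on a duplicate key are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Separate-chaining hash table from string keys to values of type V.
template <typename V>
class HashTable {
public:
    explicit HashTable(std::size_t buckets = 101) : table_(buckets) {
        assert(buckets > 0);  // hash() divides by the bucket count
    }

    // Inserts a key/value pair; an existing key has its value replaced.
    void Insert(const std::string& key, const V& value) {
        auto& chain = table_[hash(key)];
        for (auto& entry : chain)
            if (entry.first == key) { entry.second = value; return; }
        chain.emplace_back(key, value);
    }

    // Returns a pointer to the stored value, or nullptr if absent.
    V* Find(const std::string& key) {
        for (auto& entry : table_[hash(key)])
            if (entry.first == key) return &entry.second;
        return nullptr;
    }

private:
    // Polynomial hash of the key, reduced to a bucket index.
    std::size_t hash(const std::string& key) const {
        std::size_t h = 0;
        for (unsigned char c : key) h = h * 31 + c;
        return h % table_.size();
    }

    std::vector<std::list<std::pair<std::string, V>>> table_;
};
```

Returning a pointer from Find lets the caller update a course's enrollment in place without a second lookup.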

Part 1: Implement the major data structures for Courses and Conflicts. Your goal in the first week is to get the two major data structures of the program implemented and debugged. Courses are to be stored in a Hash Table and Conflicts in a Binary Tree. Your implementations of these data structures should not merely work; they should be robust and efficient. For example, if data is inserted into a Binary Tree in sorted order, the result is a degenerate tree that is no better than a linked list, so every Find takes time proportional to the number of entries. This is not desirable! The usual remedy in cases like this is a process called "massaging the data." For a Binary Tree, this can be accomplished by randomizing the order in which the data is inserted. A simple test of this part of the lab is to put all of the courses and conflicts into their appropriate data structures and then, as you read in students, check their choices for conflicts.
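The effect of insertion order can be seen with a small sketch like the following. The class layout, the Height member, and the helper buildShuffled are all hypothetical and exist only to illustrate the point; the tree stores bare keys, whereas your tree will also need to hold each time's list of conflicts.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Unbalanced binary search tree of string keys (Insert and Find only).
class BinaryTree {
    struct Node {
        explicit Node(std::string k) : key(std::move(k)) {}
        std::string key;
        std::unique_ptr<Node> left, right;
    };
    std::unique_ptr<Node> root_;

    static int height(const Node* n) {
        return n ? 1 + std::max(height(n->left.get()), height(n->right.get())) : 0;
    }

public:
    // Inserts key; duplicates are ignored.
    void Insert(const std::string& key) {
        std::unique_ptr<Node>* link = &root_;
        while (*link) {
            if (key == (*link)->key) return;
            link = key < (*link)->key ? &(*link)->left : &(*link)->right;
        }
        link->reset(new Node(key));
    }

    bool Find(const std::string& key) const {
        const Node* n = root_.get();
        while (n && n->key != key)
            n = key < n->key ? n->left.get() : n->right.get();
        return n != nullptr;
    }

    // Number of nodes on the longest root-to-leaf path.
    int Height() const { return height(root_.get()); }
};

// Shuffles a copy of the keys with the given seed, then inserts them.
BinaryTree buildShuffled(std::vector<std::string> keys, unsigned seed) {
    std::mt19937 rng(seed);
    std::shuffle(keys.begin(), keys.end(), rng);
    BinaryTree tree;
    for (const std::string& k : keys) tree.Insert(k);
    return tree;
}
```

Inserting n keys in sorted order gives a tree of height n, while a random insertion order gives an expected height proportional to log n. Comparing Height() for the two cases on your own conflict data is a quick way to confirm that the shuffling is doing its job.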

Suggestion: Tackle this part of the project one piece at a time. You should have already begun working on the Hash Table. Finish that first. Then work on reading in the conflicts. First just get the program to read them into a simple data structure (like an array). Once you can do that, then work on your Binary Tree. I will not help you debug one part of the program until you can prove to me that the previous parts of the program already work.

What to hand in: Hardcopy of your code and a listing of all the potential course conflicts. A simple format for this would be the student number followed by the names of the potentially conflicting courses. As usual, place copies of your code (appropriately named) in the drop box.

Part 2: The final data structure you design should hold student information. The basic registration algorithm will proceed in a fashion similar to the algorithm you used in Lab 6. Essentially, you will make multiple passes over all of the students, first scheduling all of the students' first choices, then all of their second choices, and so on. Your data structure will need to hold all of the student information and give access to the students in priority order: seniors first, juniors second, and so on. Within a class you may break ties arbitrarily. For example, suppose a senior and a junior both list the same course as their first choice. The senior will be able to register for the course before the junior has a chance. If two seniors list the same course, on the other hand, the first senior you process has higher priority.
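The order in which requests are attempted under this scheme can be sketched as follows. The types Card and Request and the function processingOrder are hypothetical, and the sketch assumes each student already carries a precomputed rank (0 for seniors, 1 for juniors, and so on); how that rank is derived from the class field is left open. The actual enrollment step (the capacity and conflict checks, and stopping once a student has the desired number of courses) is deliberately omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, trimmed-down view of one registration card.
struct Card {
    int id;
    int rank;  // ASSUMPTION: 0 = senior, 1 = junior, ... (precomputed)
    std::vector<std::string> choices;  // in order of preference
};

// One attempt to place a student in a course.
struct Request {
    int id;
    std::string course;
};

// Pass k visits every student's k-th choice, most senior students first.
std::vector<Request> processingOrder(std::vector<Card> cards) {
    // stable_sort keeps the original file order within a class, so the
    // first senior read is also the first senior processed.
    std::stable_sort(cards.begin(), cards.end(),
                     [](const Card& a, const Card& b) { return a.rank < b.rank; });
    std::size_t longest = 0;
    for (const Card& c : cards) longest = std::max(longest, c.choices.size());

    std::vector<Request> order;
    for (std::size_t pass = 0; pass < longest; ++pass)
        for (const Card& c : cards)
            if (pass < c.choices.size())
                order.push_back({c.id, c.choices[pass]});
    return order;
}
```

Sorting once up front is only one option; a priority queue or one list per class would give the same order of access.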

Add in appropriate statistical information.

Part 3: Implement the registration algorithm described in part 2. Add in functionality to display all of the relevant statistical information.

Final: Your final (take home) will be a theoretical analysis of the time complexity of the finished project. Calculate the theoretical complexity of this program, taking into account all data structures used, methods called, and the number of steps required to run each part of the program. You should develop this calculation by identifying how the relevant parts of the program text contribute to its complexity. The final expression should be written in the form O(t), where t involves at least the following variables:
the number of students ns in the cards.data file,
the number of courses nc in the classes.data file, and
the (worst-case) number of possible conflicts nf that can occur when registering each student.
