Csci 210 Lab: Small World

(Laura Toma)
(adapted from Sedgewick & Wayne, Stanford)

Overview

In this project you will investigate the degree of separation of Hollywood actors, also known as the Kevin Bacon game. As you may know, Kevin Bacon is a prolific actor who has appeared in many movies. We assign Kevin Bacon himself a Kevin-Bacon-number of 0. Any actor (except Kevin Bacon himself) who has appeared in a movie with Kevin Bacon has a Kevin-Bacon-number of 1. Any remaining actor who has been in the same cast as an actor whose Kevin-Bacon-number is 1 has a Kevin-Bacon-number of 2, and so on.

Check this out: The Oracle of Bacon. And this: The six degrees of Kevin Bacon.

For example, Meryl Streep has a Kevin-Bacon-number of 1 because she appeared in The River Wild with Kevin Bacon. Nicole Kidman has a Kevin-Bacon-number of 2 because she has not appeared with Kevin Bacon in any movie, but she was in Cold Mountain with Donald Sutherland, and Sutherland appeared in Animal House with Kevin Bacon.

Generally speaking, the goal of this lab is to find Kevin-Bacon-numbers: given the name of an actor, find his/her Kevin-Bacon-number and the shortest alternating sequence of actor-movie pairs that leads to Kevin Bacon. Another question to explore is the largest degree of separation from Kevin Bacon. In other words, what is the largest Kevin-Bacon-number in Hollywood?

You'll see that it is much smaller than you expect. This phenomenon is known as the small-world phenomenon, or six degrees of separation. It is a concept that was discovered in the 1960s in the social sciences and has been researched ever since in many disciplines. Take a few minutes to search for "six degrees of separation" on the Internet; it is a fascinating topic. When it comes to Hollywood, the idea is that even if every actor has a relatively small number of co-actors, there is a relatively short chain of movies/actors separating any two actors from each other. If the theory of six degrees of separation is true for Hollywood, it would imply that most actors have a Kevin-Bacon-number of 6 or less. In particular, the average KB-number should be < 6. Checking this theory is your task for the lab.

Here are various lists of movies and their actors:

Each line gives the name of a movie followed by the cast. Since names have spaces and commas in them, the / character is used as a delimiter.
'Breaker' Morant (1980)/Fitz-Gerald, Lewis/Steele, Rob (I)/Wilson, Frank (II)/Tingwell, Charles 'Bud'/Cassell, Alan (I)/Rodger, Ron/Knez, Bruno/Woodward, Edward/Cisse, Halifa/Quin, Don/Kiefel, Russell/Meagher, Ray/Procanin, Michael/Bernard, Hank/Gray, Ian (I)/Brown, Bryan (I)/Ball, Ray (I)/Mullinar, Rod/Donovan, Terence (I)/Ball, Vincent (I)/Pfitzner, John/Currer, Norman/Thompson, Jack (I)/Nicholls, Jon/Haywood, Chris (I)/Smith, Chris (I)/Mann, Trevor (I)/Henderson, Dick (II)/Lovett, Alan/Bell, Wayne (I)/Waters, John (III)/Osborn, Peter/Peterson, Ron/Cornish, Bridget/Horseman, Sylvia/Seidel, Nellie/West, Barbara/Radford, Elspeth/Reed, Maria/Erskine, Ria/Dick, Judy/Walton, Laurie (I)
'burbs, The (1989)/Gage, Kevin/Hahn, Archie/Feldman, Corey/Gordon, Gale/Drier, Moosie/Theodore, Brother/Katt, Nicky/Miller, Dick (I)/Hanks, Tom/Dern, Bruce/Turner, Arnold F./Howard, Rance/Ducommun, Rick/Danziger, Cory/Ajaye, Franklyn/Scott, Carey/Kramer, Jeffrey (I)/Olsen, Dana (I)/Gains, Courtney/Picardo, Robert/Hays, Gary/Davis, Sonny Carl/Gibson, Henry (I)/Jayne, Billy/Stevenson, Bill (I)/Katz, Phyllis/Vorgan, Gigi/Darbo, Patrika/Schaal, Wendy/French, Leigh/Fisher, Carrie/Benner, Brenda/Newman, Tracy (I)/Stewart, Lynne Marie/Haase, Heather (I)
...
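The format above can be split with a single call. Here is a minimal sketch, assuming each line has the movie title first and the cast after it; the class name MovieLineParser and the method parseLine are made up for illustration and are not part of the lab's required interface. The sample line is a shortened excerpt of the 'burbs entry shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the lab's required API.
class MovieLineParser {
    // Splits one line of the data file on the '/' delimiter.
    // Field 0 is the movie title; the remaining fields are cast members.
    static List<String> parseLine(String line) {
        return new ArrayList<>(Arrays.asList(line.split("/")));
    }

    public static void main(String[] args) {
        // Shortened excerpt of a line from the data file.
        String line = "'burbs, The (1989)/Gage, Kevin/Hahn, Archie/Feldman, Corey";
        List<String> fields = parseLine(line);
        String movie = fields.get(0);
        List<String> cast = fields.subList(1, fields.size());
        System.out.println(movie + " has " + cast.size() + " listed actors");
        for (String actor : cast) {
            System.out.println("  " + actor);
        }
    }
}
```

Note that the commas inside names and titles are left alone, which is exactly why the data uses / as the delimiter.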

Reading and representing the data

Your first task will be to read and load the data in memory into a data structure that will facilitate computing degrees of separation between actors and such.

You have movies, and you have actors. Actors are linked to the movies that they played in, and the other way around. The mathematical model for such a structure that stores pairwise connections between entities is called a graph.

A graph consists of a set of vertices and a set of edges. Each edge represents a connection between two vertices. A graph represents a network on the set of vertices. Many, many problems in the world can be modeled as graphs, from telephone and computer networks, to transportation networks, to the Internet (websites and links), to social networks, to genetic networks.

Not surprisingly, you'll use graphs to model the movie-actor relationship. The first question is how to model the Hollywood world with a graph:

The second question is what is a good way to store the graph. To decide on a representation you need to understand what exactly you need to do with the graph.

Once you decide what the graph represents and what data structure you'll use to represent it, you'll start developing a MovieGraph class. This class should be able to construct a movie-graph from a file. Encapsulate all necessary getters and setters, and all basic functionality that you may expect from a class that implements a MovieGraph. For example,

//create an empty movie graph
MovieGraph()

//read graph from the file
MovieGraph(String fname)

//add edge u-v
void addEdge(String u, String v)

//number of vertices
int nV()

//number of edges
int nE()

//return the vertices adjacent to vertex v
Iterable<String> neighbors(String v)

//return the degree of vertex v (degree = nb of neighbors)
int degree(String v)

//is v a vertex in the graph
boolean hasVertex(String v)

//is u-v an edge in the graph
boolean hasEdge(String u, String v)
Include testing functions that allow you to print the vertices and edges in your graph.
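To make the interface above concrete, here is one possible sketch, assuming an undirected graph whose vertices are strings and whose edges are stored as adjacency sets in a hash map. This is only one of several reasonable representations, and the class name SketchGraph is made up so it is not mistaken for the MovieGraph you are asked to design. The vertex names in main are illustrative and do not follow the data file's exact spelling.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an undirected graph stored as adjacency sets.
class SketchGraph {
    private final Map<String, Set<String>> adj = new HashMap<>();
    private int edges = 0;

    // Add the undirected edge u-v, creating the vertices if needed.
    void addEdge(String u, String v) {
        if (hasEdge(u, v)) return;  // ignore duplicate edges
        adj.computeIfAbsent(u, k -> new HashSet<>()).add(v);
        adj.computeIfAbsent(v, k -> new HashSet<>()).add(u);
        edges++;
    }

    int nV() { return adj.size(); }

    int nE() { return edges; }

    boolean hasVertex(String v) { return adj.containsKey(v); }

    boolean hasEdge(String u, String v) {
        return hasVertex(u) && adj.get(u).contains(v);
    }

    // Vertices adjacent to v (empty if v is not in the graph).
    Iterable<String> neighbors(String v) {
        return adj.getOrDefault(v, Collections.<String>emptySet());
    }

    // Number of neighbors of v (0 if v is not in the graph).
    int degree(String v) {
        return adj.getOrDefault(v, Collections.<String>emptySet()).size();
    }

    public static void main(String[] args) {
        SketchGraph g = new SketchGraph();
        g.addEdge("Animal House", "Bacon, Kevin");
        g.addEdge("Animal House", "Sutherland, Donald");
        System.out.println(g.nV() + " vertices, " + g.nE() + " edges");
        System.out.println("degree(Animal House) = " + g.degree("Animal House"));
    }
}
```

With hash-based sets, hasEdge and addEdge take expected constant time, which matters when the input has hundreds of thousands of actor-movie pairs. Whether movies should be vertices at all is a modeling decision that is still yours to make.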

Querying the graph

Once you have created the graph, you want to add the capability to query the graph with the following two types of questions:
  1. Given an actor, find all the movies he/she played in.
  2. Given a movie, find all the actors who starred in the movie.
Write methods that take a movie or actor as an argument, and print out the result of the query. To test these methods I envision a text interface something like this:
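void queryMovies() {
    Scanner in = new Scanner(System.in);  //needs: import java.util.Scanner;
    while (true) {
        //ask the user to enter a movie name or Q to exit
        System.out.print("Enter a movie name (Q to exit): ");
        if (!in.hasNextLine()) break;
        String movie = in.nextLine().trim();
        if (movie.equals("Q")) break;
        //call queryMovie on the movie that the user entered
        queryMovie(movie);
    }
}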

Computing Kevin-Bacon numbers

Given two vertices in a graph, a path is a sequence of edges connecting them. There may be more than one path in a graph connecting two vertices. A shortest path is a path with minimum length (number of edges) among all paths between two vertices.

Note that to find the Kevin-Bacon-number of an actor X, we need to find the shortest path connecting X to Kevin Bacon.

Your goal is to write a method that takes an actor name and finds the Kevin-Bacon-number of the actor and displays nicely the movie-actor chain to Kevin Bacon.

It turns out that you can compute shortest paths in a graph using a strategy that you have seen while searching: breadth-first search. Start from the vertex representing Kevin Bacon; add all its neighbors to a queue. These are all the actors with a KB-number of 1. Then add to the queue all neighbors of these neighbors, and so on. It is easy to see that using breadth-first search you find the shortest path connecting a vertex to Kevin Bacon.

Some things to think of:

  1. When is a node in the queue final?
  2. How do you represent a node in the queue while doing BFS? Well, it is a String representing a vertex in a MovieGraph. But you also need to keep track of the actual path from a queue node to the start vertex. At the end, you want to trace back the path to Kevin Bacon.
  3. How do you handle duplicate nodes in the queue? That is, you may be about to enqueue a vertex that is already in the queue.
  4. How do you keep track of the cost of a node in the queue to the start node (Kevin Bacon)? At the end, you need to print this distance, which is actually the Kevin-Bacon-number.
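One way to answer the questions above is sketched below, under stated assumptions: the graph is handed in as a plain map from each vertex to its list of neighbors, distances are kept in a map, and a second map records the vertex from which each vertex was first reached. The class name BfsSketch and the toy vertex names are made up for illustration. If you choose to make movies vertices as well as actors (an assumption, not a requirement), the distance counted in edges is twice the Kevin-Bacon-number.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of breadth-first search on adjacency lists.
class BfsSketch {
    // Returns the distance in edges from source to every reachable vertex.
    // Fills parent so that parent.get(v) is the vertex from which v was first reached.
    static Map<String, Integer> bfs(Map<String, List<String>> adj, String source,
                                    Map<String, String> parent) {
        Map<String, Integer> dist = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        dist.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty()) {
            String u = queue.remove();
            for (String v : adj.getOrDefault(u, Collections.<String>emptyList())) {
                // A vertex's distance is final the first time it is seen,
                // so vertices that already have a distance are never enqueued again.
                if (!dist.containsKey(v)) {
                    dist.put(v, dist.get(u) + 1);
                    parent.put(v, u);
                    queue.add(v);
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // Toy graph in which movies and actors are both vertices.
        Map<String, List<String>> adj = new HashMap<>();
        adj.put("Bacon", Arrays.asList("Movie A"));
        adj.put("Movie A", Arrays.asList("Bacon", "Actor X"));
        adj.put("Actor X", Arrays.asList("Movie A", "Movie B"));
        adj.put("Movie B", Arrays.asList("Actor X", "Actor Y"));
        adj.put("Actor Y", Arrays.asList("Movie B"));

        Map<String, String> parent = new HashMap<>();
        Map<String, Integer> dist = bfs(adj, "Bacon", parent);
        System.out.println("Actor Y is " + dist.get("Actor Y") + " edges from Bacon");
    }
}
```

In this toy graph Actor Y is 4 edges from Bacon, which corresponds to a Kevin-Bacon-number of 2. Each vertex is enqueued at most once, so the search takes time proportional to the number of vertices plus edges.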

Note that there is nothing special about Kevin Bacon, and that the same approach can be used to compute shortest paths between any two actors in Hollywood. You want to make your methods general enough, not customized for Kevin Bacon.

In terms of style, you will probably want to implement computing paths as a separate class. Call it MoviePath. This class has to essentially perform BFS from a given vertex on a given graph and has to store all the necessary data for this. I imagine you will have a couple of methods in MoviePath. First, you'll have a constructor that takes as parameters a MovieGraph and a vertex in this graph and runs BFS from this vertex in the graph. Then you'll have functions that will return the actual path and distance to the source vertex.

//run BFS from kevinbacon in g and save results in whatever instance
//variables you may need so that they can be queried
MoviePath(MovieGraph g, String kevinbacon)

//assume BFS has been run. Return the shortest path from actor to the
//start vertex of the BFS
Iterable<String> pathTo(String actor)


//assume BFS has been run. Return the distance from actor to the
//start vertex of the BFS
int distanceTo(String actor)

//test functions
To test this class, write a function that asks the user repeatedly for an actor name, and prints the distance and path to Kevin Bacon.
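For the path itself, one common approach is to follow "parent" links back from the actor to the start vertex. The sketch below assumes BFS has already recorded, for each reached vertex, the vertex it was first reached from; here that map is filled in by hand using the Nicole Kidman example from the overview. The class name PathSketch, the three-argument pathTo, and the short vertex names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rebuilding a path from BFS parent links.
class PathSketch {
    // Follows parent links from v back to source.
    // Returns an empty list if v was never reached by the search.
    static List<String> pathTo(Map<String, String> parent, String source, String v) {
        List<String> path = new ArrayList<>();
        if (!v.equals(source) && !parent.containsKey(v)) return path;
        for (String x = v; x != null; x = parent.get(x)) {
            path.add(x);
            if (x.equals(source)) break;
        }
        return path;
    }

    public static void main(String[] args) {
        // Parent links as a BFS from "Bacon" might record them (filled in by hand here).
        Map<String, String> parent = new HashMap<>();
        parent.put("Animal House", "Bacon");
        parent.put("Sutherland", "Animal House");
        parent.put("Cold Mountain", "Sutherland");
        parent.put("Kidman", "Cold Mountain");

        List<String> path = pathTo(parent, "Bacon", "Kidman");
        System.out.println(String.join(" -> ", path));
    }
}
```

Storing one parent per vertex costs far less memory than storing a whole path in every queue node, and the path is only rebuilt for the actors the user actually asks about.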

More questions

Here are some more things to explore:

Final comments

This is your final project. Hopefully it will show you some interesting facts about graphs (and about movies). The interface is open ended, so feel free to shine.

Due last day of classes. You can work with one partner. You are strongly encouraged to find a partner. Once you have the background, working with a partner is both fun and challenging.

When you turn in the code, include a brief README file that describes the structure of your code, instructs the user how to run it, and answers the questions posed in the lab.

Since the lab gives you little guidance on how to structure the code, you will find that the amount of time you put into this project is directly proportional to how clean your design is.

These are some things to think of as you think of how to model the problem. You need to understand that there is not one "right" way to do it. There are easier ways, and there are harder ways. There are more efficient ways, and less efficient ways. There are ways that will be easy to program, and there are ways that will take a lot of effort to make work. YOU are the creator of your world. Understand what it is that your world needs to do, decide how to model your world, keep it consistent, and make it work.

Lessons to learn:

Have fun!
Last modified: Thu Dec 11 13:03:47 EST 2008