(adapted from Sedgewick & Wayne, Stanford)

In this project you will investigate the degrees of separation of Hollywood actors, also known as the Kevin Bacon game. As you may know, Kevin Bacon is a prolific actor who has appeared in many movies. We assign Kevin Bacon himself a Kevin-Bacon-number of 0. Any actor (except Kevin Bacon himself) who has appeared in a movie with Kevin Bacon has a Kevin-Bacon-number of 1. Any remaining actor who has been in the same cast as an actor whose Kevin-Bacon-number is 1 has a Kevin-Bacon-number of 2, and so on.

Check this out: The Oracle of Bacon. And this: The six degrees of Kevin Bacon.

For example, Meryl Streep has a Kevin-Bacon-number of 1 because she appeared in The River Wild with Kevin Bacon. Nicole Kidman has a Kevin-Bacon-number of 2 because she did not appear with Kevin Bacon in any movie, but she was in Cold Mountain with Donald Sutherland, and Sutherland appeared in Animal House with Kevin Bacon.

Generally speaking, the goal of this lab is to find Kevin-Bacon-numbers: given the name of an actor, find his/her Kevin-Bacon-number and the shortest alternating sequence of actors and movies that leads to Kevin Bacon. Another question to explore is the largest degree of separation from Kevin Bacon. In other words, what is the largest Kevin-Bacon-number in Hollywood?

You'll see that it is much smaller than you expect. This
phenomenon is known as the

Here are various lists of movies and their actors:

- cast.06.txt: movies released in 2006 [movies=8780, actors=84236]
- cast.00-06.txt: movies released since 2000 [movies=52195, actors=348497]
- cast.all.txt: movies [movies=285462, actors=933874]
- cast.action.txt: action movies [movies=14938, actors=139861]
- cast.rated.txt: popular movies [movies=4527, actors=122406]

    'Breaker' Morant (1980)/Fitz-Gerald, Lewis/Steele, Rob (I)/Wilson, Frank (II)/Tingwell, Charles 'Bud'/Cassell, Alan (I)/Rodger, Ron/Knez, Bruno/Woodward, Edward/Cisse, Halifa/Quin, Don/Kiefel, Russell/Meagher, Ray/Procanin, Michael/Bernard, Hank/Gray, Ian (I)/Brown, Bryan (I)/Ball, Ray (I)/Mullinar, Rod/Donovan, Terence (I)/Ball, Vincent (I)/Pfitzner, John/Currer, Norman/Thompson, Jack (I)/Nicholls, Jon/Haywood, Chris (I)/Smith, Chris (I)/Mann, Trevor (I)/Henderson, Dick (II)/Lovett, Alan/Bell, Wayne (I)/Waters, John (III)/Osborn, Peter/Peterson, Ron/Cornish, Bridget/Horseman, Sylvia/Seidel, Nellie/West, Barbara/Radford, Elspeth/Reed, Maria/Erskine, Ria/Dick, Judy/Walton, Laurie (I)
    'burbs, The (1989)/Gage, Kevin/Hahn, Archie/Feldman, Corey/Gordon, Gale/Drier, Moosie/Theodore, Brother/Katt, Nicky/Miller, Dick (I)/Hanks, Tom/Dern, Bruce/Turner, Arnold F./Howard, Rance/Ducommun, Rick/Danziger, Cory/Ajaye, Franklyn/Scott, Carey/Kramer, Jeffrey (I)/Olsen, Dana (I)/Gains, Courtney/Picardo, Robert/Hays, Gary/Davis, Sonny Carl/Gibson, Henry (I)/Jayne, Billy/Stevenson, Bill (I)/Katz, Phyllis/Vorgan, Gigi/Darbo, Patrika/Schaal, Wendy/French, Leigh/Fisher, Carrie/Benner, Brenda/Newman, Tracy (I)/Stewart, Lynne Marie/Haase, Heather (I)
    ...
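Judging from the sample above, each record appears to hold a movie title followed by the names of its cast, with `/` as the separator. Here is a minimal sketch of how one such record could be split up. The class `CastLine` and its method `parse` are hypothetical names, not part of the lab, and the sketch assumes that `/` never occurs inside a title or a name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the required API.
class CastLine {
    final String movie;
    final List<String> actors;

    CastLine(String movie, List<String> actors) {
        this.movie = movie;
        this.actors = actors;
    }

    // Split one record into the movie title (first field) and its actors.
    // Assumption: '/' never occurs inside a title or a name.
    static CastLine parse(String line) {
        String[] fields = line.split("/");
        List<String> actors = Arrays.asList(fields).subList(1, fields.length);
        return new CastLine(fields[0], actors);
    }

    public static void main(String[] args) {
        CastLine c = parse("'burbs, The (1989)/Gage, Kevin/Hahn, Archie/Hanks, Tom");
        System.out.println(c.movie + ": " + c.actors.size() + " actors");
    }
}
```

Reading a whole file then amounts to calling `parse` on every line and handing the result to whatever graph structure you choose.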

You have movies, and you have actors. Actors are linked to the
movies that they played in, and the other way around. The
mathematical model for such a structure that stores pairwise
connections between entities is called a graph.

A graph consists of a set of vertices and a set of edges. Each edge represents a connection between two vertices. A graph represents a network on the set of vertices. Many, many problems in the world can be modeled as graphs, from telephone and computer networks, to transportation networks, to the Internet (websites and links), to social networks, to genetic networks.

Not surprisingly, you'll use graphs to model the movie-actor relationship. The first question is how to model the Hollywood world with a graph:

- What should the vertices and edges in this graph be?
- Should the vertices be movies, with edges between two movies if they share a common actor?
- Should the vertices be actors, with edges connecting two actors if they both played in the same movie?
- Should we have vertices for both movies and actors, with edges connecting each movie to the actors who appear in it?

Once you decide what the graph represents and what data structure you'll use to represent it, you'll start developing a MovieGraph class. This class should be able to construct a movie-graph from a file. Encapsulate all necessary getters and setters, and all basic functionality that you may expect from a class that implements a MovieGraph. For example,

    // create an empty movie graph
    MovieGraph()
    // read graph from the file
    MovieGraph(String fname)
    // add edge u-v
    void addEdge(String u, String v)
    // number of vertices
    int nV()
    // number of edges
    int nE()
    // return the vertices adjacent to vertex v
    Iterable<String> neighbors(String v)
    // return the degree of vertex v (degree = number of neighbors)
    int degree(String v)
    // is v a vertex in the graph?
    boolean hasVertex(String v)
    // is u-v an edge in the graph?
    boolean hasEdge(String u, String v)

Include testing functions that print the vertices and edges in your graph.
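To make the interface above concrete, here is one possible way to back it: a map from each vertex to the set of its neighbors. This is only a sketch under that assumption, not the required design. The class name `SketchGraph` is hypothetical, the file-reading constructor is left out, and `neighbors` is assumed to return the adjacent vertices as an `Iterable<String>`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an undirected graph on String vertices,
// stored as a map from each vertex to the set of its neighbors.
class SketchGraph {
    private final Map<String, Set<String>> adj = new HashMap<>();
    private int edges = 0;

    // Add the undirected edge u-v, creating the vertices if needed.
    void addEdge(String u, String v) {
        if (hasEdge(u, v)) return;              // ignore duplicate edges
        adj.computeIfAbsent(u, k -> new HashSet<>()).add(v);
        adj.computeIfAbsent(v, k -> new HashSet<>()).add(u);
        edges++;
    }

    int nV() { return adj.size(); }

    int nE() { return edges; }

    // Neighbors of v; empty if v is not a vertex.
    Iterable<String> neighbors(String v) {
        return adj.getOrDefault(v, Collections.emptySet());
    }

    int degree(String v) {
        return adj.getOrDefault(v, Collections.emptySet()).size();
    }

    boolean hasVertex(String v) { return adj.containsKey(v); }

    boolean hasEdge(String u, String v) {
        return adj.containsKey(u) && adj.get(u).contains(v);
    }
}
```

With hash-based maps and sets, `addEdge`, `hasVertex` and `hasEdge` take expected constant time, which matters once you load the larger cast files. Other representations are just as legitimate; pick the one that fits your model.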

- Given an actor, find all the movies he/she played in.
- Given a movie, find all the actors who starred in the movie.

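    // requires: import java.util.Scanner;
    void queryMovies() {
        Scanner in = new Scanner(System.in);
        while (true) {
            // ask the user to enter a movie name, or Q to exit
            System.out.print("Movie name (Q to exit): ");
            String movie = in.nextLine().trim();
            if (movie.equals("Q")) break;
            queryMovie(movie);  // call queryMovie on the movie that the user entered
        }
    }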

Note that to find the Kevin-Bacon-number of an actor X, we need to find the shortest path connecting X to Kevin Bacon.

Your goal is to write a method that takes an actor name and finds the Kevin-Bacon-number of the actor and displays nicely the movie-actor chain to Kevin Bacon.

It turns out that you can compute shortest paths in a graph using a strategy that you have seen while searching: breadth-first search. Start from the vertex representing Kevin Bacon; add all its neighbors to a queue. These are all the actors with a KB-number of 1. Then add to the queue all neighbors of these neighbors, and so on. It is easy to see that using breadth-first search you find the shortest path connecting a vertex to Kevin Bacon.
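The search described above can be sketched as follows. This is an illustration, not the required solution: the class `BfsSketch`, its methods `distances` and `link`, and the use of a plain adjacency map as the graph are all assumptions. The demo also assumes the model with vertices for both movies and actors, in which actors sit at even distances from the source, so the Kevin-Bacon-number is half the number of edges.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch; a plain adjacency map stands in for your graph class.
class BfsSketch {
    // Number of edges on a shortest path from source to every reachable vertex.
    static Map<String, Integer> distances(Map<String, List<String>> graph, String source) {
        Map<String, Integer> dist = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        dist.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty()) {
            String v = queue.remove();
            for (String w : graph.getOrDefault(v, Collections.emptyList())) {
                if (!dist.containsKey(w)) {     // the first visit gives the shortest distance
                    dist.put(w, dist.get(v) + 1);
                    queue.add(w);
                }
            }
        }
        return dist;
    }

    // Helper for the demo: record the undirected edge u-v.
    static void link(Map<String, List<String>> graph, String u, String v) {
        graph.computeIfAbsent(u, k -> new ArrayList<>()).add(v);
        graph.computeIfAbsent(v, k -> new ArrayList<>()).add(u);
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = new HashMap<>();
        link(g, "Kevin Bacon", "Animal House");
        link(g, "Donald Sutherland", "Animal House");
        link(g, "Donald Sutherland", "Cold Mountain");
        link(g, "Nicole Kidman", "Cold Mountain");
        Map<String, Integer> dist = distances(g, "Kevin Bacon");
        // Four edges separate Kidman from Bacon: two movies, two actor hops.
        System.out.println(dist.get("Nicole Kidman") / 2);   // prints 2
    }
}
```

A vertex is marked (given a distance) at the moment it is enqueued, so it can never enter the queue twice.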

Some things to think of:

- When is a node in the queue final?
- How do you represent a node in the queue while doing BFS? Well, it is a String representing a vertex in a MovieGraph. But you also need to keep track of the actual path from a queue node to the start vertex. At the end, you want to trace back the path to Kevin Bacon.
- How do you handle duplicate nodes in the queue? That is, you may want to enqueue a vertex that is already in the queue.
- How do you keep track of the cost of a node in the queue, that is, its distance to the start node (Kevin Bacon)? At the end, you need to print this distance, which is actually the Kevin-Bacon-number.

Note that there is nothing special about Kevin Bacon, and that the same approach can be used to compute shortest paths between any two actors in Hollywood. You want to make your methods general enough, not customized for Kevin Bacon.

In terms of style, you will probably want to implement computing
paths as a separate class. Call it `MoviePath`. This class has
to essentially perform BFS from a given vertex on a given graph and
has to store all the necessary data for this.
I imagine you will have a couple of methods in MoviePath. First,
you'll have a constructor that takes as parameters a MovieGraph and a
vertex in this graph and runs BFS from this vertex in the graph. Then
you'll have functions that will return the actual path and distance to
the source vertex.

    // run BFS from kevinbacon in g and save results in whatever instance
    // variables you may need so that they can be queried
    MoviePath(MovieGraph g, String kevinbacon)
    // assume BFS has been run. Return the shortest path from actor to the
    // start vertex of the BFS
    Iterable<String> pathTo(String actor)
    // assume BFS has been run. Return the distance from actor to the
    // start vertex of the BFS
    int distanceTo(String actor)
    // test functions

To test this class, write a function that asks the user repeatedly for an actor name, and prints the distance and path to Kevin Bacon.
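One common way to make the path recoverable is to remember, for each vertex, the vertex from which BFS first reached it, and then to walk these links back to the source. Below is a sketch of that idea. The class name `PathSketch` is hypothetical, a plain adjacency map stands in for `MovieGraph`, and returning -1 or an empty path for an unreachable vertex is a choice made here, not something the lab prescribes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch; a plain adjacency map stands in for MovieGraph.
class PathSketch {
    // parent.get(v) is the vertex from which BFS first reached v.
    private final Map<String, String> parent = new HashMap<>();
    // dist.get(v) is the number of edges between v and the source.
    private final Map<String, Integer> dist = new HashMap<>();

    // Run BFS from source and keep the results for later queries.
    PathSketch(Map<String, List<String>> graph, String source) {
        Queue<String> queue = new ArrayDeque<>();
        dist.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty()) {
            String v = queue.remove();
            for (String w : graph.getOrDefault(v, Collections.emptyList())) {
                if (!dist.containsKey(w)) {
                    dist.put(w, dist.get(v) + 1);
                    parent.put(w, v);
                    queue.add(w);
                }
            }
        }
    }

    // Number of edges between v and the source, or -1 if v is unreachable.
    int distanceTo(String v) {
        return dist.getOrDefault(v, -1);
    }

    // Vertices from v back to the source; empty if v is unreachable.
    List<String> pathTo(String v) {
        List<String> path = new ArrayList<>();
        if (!dist.containsKey(v)) return path;
        // The source has no parent, so the walk stops there.
        for (String x = v; x != null; x = parent.get(x)) path.add(x);
        return path;
    }
}
```

Because the search runs once in the constructor, each later query costs only the length of the path it returns. In a graph with both movie and actor vertices, the returned list alternates between actors and movies, which is exactly the chain you are asked to display.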

- What is the average degree of a Movie Graph? That is, what is the average number of movies an actor plays in? What is the average number of actors a movie has?
- Are all actors connected to Kevin Bacon? What is the maximum KB-number of an actor? Print all actors with KB-number 8. Write a method that takes a number k and prints the number of actors with KB-number equal to k.
- Write a method Histogram to print a histogram of Kevin Bacon numbers, indicating how many actors have a Bacon number of 0, 1, 2, 3, etc.
- What is the average KB-number across all actors?
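For the histogram and the average in the questions above, a small counting sketch may help. It assumes you have already collected the KB-numbers of the actors connected to Kevin Bacon into a collection of integers. The class `HistogramSketch` and its methods are hypothetical names, and actors with no connection are assumed to be left out, since they have no finite number.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: statistics over a collection of KB-numbers.
class HistogramSketch {
    // Map each KB-number to the count of actors that have it, in increasing order.
    static Map<Integer, Integer> histogram(Collection<Integer> kbNumbers) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int k : kbNumbers) counts.merge(k, 1, Integer::sum);
        return counts;
    }

    // Average KB-number; 0 if the collection is empty.
    static double average(Collection<Integer> kbNumbers) {
        if (kbNumbers.isEmpty()) return 0.0;
        long sum = 0;
        for (int k : kbNumbers) sum += k;
        return (double) sum / kbNumbers.size();
    }

    public static void main(String[] args) {
        Collection<Integer> sample = Arrays.asList(0, 1, 1, 2, 2, 2);
        for (Map.Entry<Integer, Integer> e : histogram(sample).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
        System.out.println("average = " + average(sample));
    }
}
```

A `TreeMap` keeps the keys sorted, so the histogram prints in order of increasing KB-number, and the last key is the maximum.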

Due last day of classes. You can work with one partner. You are strongly encouraged to find a partner. Once you have the background, working with a partner is both fun and challenging.

When you turn in the code, include a brief README file that describes the structure of your code, instructs the user how to run it, and answers the questions posed in the lab.

Since the lab gives you little guidance on how to structure the code, you will find that the amount of time you put into this project is directly proportional to how clean your design is.

These are some things to think of as you think of how to model the
problem. You need to understand that there is not one "right" way to
do it. There are easier ways, and there are harder ways. There are
more efficient ways, and less efficient ways. There are ways that
will be easy to program, and there are ways that will take a lot of
effort to make work. **YOU** are the creator of your world. Understand
what it is that your world needs to do, decide how to model your
world, keep it consistent, and make it work.

Lessons to learn:

- Think before you start! Sketch the layout. Encapsulate the functionality.
- Develop incrementally. Write a few lines, compile, test, debug, repeat.
- Keep testing and checking.
- Performance matters.

Last modified: Thu Dec 11 13:03:47 EST 2008