(inspired by Sedgewick & Wayne, Stanford and Jeff Forbes, Duke)

In this project you will investigate the degree of separation of Hollywood actors, also known as the Kevin Bacon game. As you may know, Kevin Bacon is a prolific actor who has appeared in many movies. We assign Kevin Bacon himself a Kevin-Bacon-number of 0. Any actor (except Kevin Bacon himself) who has starred in a movie with Kevin Bacon has a Kevin-Bacon-number of 1. Any remaining actor who has been in the same cast as an actor whose Kevin-Bacon-number is 1 has a Kevin-Bacon-number of 2, and so on.

For example, Meryl Streep has a Kevin-Bacon-number of 1 because she appeared in The River Wild with Kevin Bacon. Nicole Kidman has a Kevin-Bacon-number of 2 because she did not appear with Kevin Bacon in any movie, but she was in Cold Mountain with Donald Sutherland, and Sutherland appeared in Animal House with Kevin Bacon.

Check out the Wikipedia page on the Six Degrees of Kevin Bacon, and try the online version of this game, The Oracle of Bacon.

Generally speaking, the goals of this lab are to:

- Find Kevin-Bacon-numbers: given the name of an actor, find his/her Kevin-Bacon-number and the shortest alternating sequence of actor-movie pairs that leads to Kevin Bacon.
- What is the average Kevin-Bacon-number in Hollywood? (For this, we'll ignore the actors, if any, that are **not** connected to Kevin Bacon.) This gives a measure of how good Kevin Bacon is as "the center of Hollywood".

You may ask: is there anything special about Kevin Bacon? One
should be able to compute shortest paths between *any* two
actors, and one should be able to evaluate *any* actor as a
center of Hollywood. Your lab should handle the following:

- Find the link from Actor A to Actor B. In other words, find the A-number of B.
- Evaluate how good a center Actor A is. In other words, find the average A-number in Hollywood.

For Kevin Bacon, you'll see that the average KB-number is much
smaller than you might expect. This is known as the small-world phenomenon.

Here are various lists of movies and the actors that you'll be using:

- cast.06.txt: movies released in 2006 [movies=8780, actors=84236]
- cast.00-06.txt: movies released since 2000 [movies=52195, actors=348497]
- cast.all.txt: movies [movies=285462, actors=933874]
- cast.action.txt: action movies [movies=14938, actors=139861]
- cast.rated.txt: popular movies [movies=4527, actors=122406]

Each line of these files describes one movie. The line starts with the title, including its year in parentheses, and then lists the names of the actors in its cast. All fields are separated by `/`. For example, here are two lines:

'Breaker' Morant (1980)/Fitz-Gerald, Lewis/Steele, Rob (I)/Wilson, Frank (II)/Tingwell, Charles 'Bud'/Cassell, Alan (I)/Rodger, Ron/Knez, Bruno/Woodward, Edward/Cisse, Halifa/Quin, Don/Kiefel, Russell/Meagher, Ray/Procanin, Michael/Bernard, Hank/Gray, Ian (I)/Brown, Bryan (I)/Ball, Ray (I)/Mullinar, Rod/Donovan, Terence (I)/Ball, Vincent (I)/Pfitzner, John/Currer, Norman/Thompson, Jack (I)/Nicholls, Jon/Haywood, Chris (I)/Smith, Chris (I)/Mann, Trevor (I)/Henderson, Dick (II)/Lovett, Alan/Bell, Wayne (I)/Waters, John (III)/Osborn, Peter/Peterson, Ron/Cornish, Bridget/Horseman, Sylvia/Seidel, Nellie/West, Barbara/Radford, Elspeth/Reed, Maria/Erskine, Ria/Dick, Judy/Walton, Laurie (I)

'burbs, The (1989)/Gage, Kevin/Hahn, Archie/Feldman, Corey/Gordon, Gale/Drier, Moosie/Theodore, Brother/Katt, Nicky/Miller, Dick (I)/Hanks, Tom/Dern, Bruce/Turner, Arnold F./Howard, Rance/Ducommun, Rick/Danziger, Cory/Ajaye, Franklyn/Scott, Carey/Kramer, Jeffrey (I)/Olsen, Dana (I)/Gains, Courtney/Picardo, Robert/Hays, Gary/Davis, Sonny Carl/Gibson, Henry (I)/Jayne, Billy/Stevenson, Bill (I)/Katz, Phyllis/Vorgan, Gigi/Darbo, Patrika/Schaal, Wendy/French, Leigh/Fisher, Carrie/Benner, Brenda/Newman, Tracy (I)/Stewart, Lynne Marie/Haase, Heather (I) ...
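The sketch below splits one such line into a movie title and a cast list. It assumes that `/` never appears inside a title or an actor's name. `CastLine` is a hypothetical helper, not part of the required API. To read a whole file, you could call `parse` on each line returned by `java.io.BufferedReader.readLine()`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: one parsed line of a cast file.
// Assumes '/' separates the title and every actor name.
class CastLine {
    final String movie;
    final List<String> actors;

    CastLine(String movie, List<String> actors) {
        this.movie = movie;
        this.actors = actors;
    }

    static CastLine parse(String line) {
        // "/" is not a regex metacharacter, so this splits on literal slashes.
        String[] fields = line.split("/");
        String movie = fields[0];
        // Everything after the title is an actor name (empty list if none).
        List<String> actors = Arrays.asList(fields).subList(1, fields.length);
        return new CastLine(movie, actors);
    }
}
```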

You have movies, and you have actors. Actors are linked to the
movies that they played in, and the other way around. The
mathematical model for such a structure that stores pairwise
connections between entities is called a *graph*.

A graph consists of a set of *vertices* and a set of *edges*. Each
edge represents a connection between two vertices. A graph represents
a network on the set of vertices. Many, many problems in the world can
be modeled as graphs, from telephone and computer networks, to
transportation networks, to the Internet (websites and links), to social
networks, to genetic and neural networks.

Not surprisingly, you'll use graphs to model the movie-actor relationship. The first question is how to model the Hollywood world with a graph:

- What should the vertices and edges in this graph be?
- Should the vertices be movies, with edges between movies if they share a common actor?
- Should the vertices be actors, with edges connecting two actors if they both played in the same movie?
- Should we have vertices for both movies and actors, with edges connecting each movie to the actors who appear in it?

To decide on a representation, you need to understand exactly what you need to do with the graph. Think of the pros and cons of each of the options above. Keep in mind that whatever structure you choose to represent the graph, you have to build it from one of the text files above.

The second question is how to store the graph. The graph consists of a set of vertices, which you can store as an array/vector, list, or map. For each vertex, you need a list of the edges connected to it; you can store these "adjacency lists" as arrays/vectors, lists, or maps.
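Here is one possible storage choice, offered as a sketch rather than the required design. It uses a `HashMap` from vertex name to a `HashSet` of neighbor names. Sets make `hasEdge` fast and silently drop duplicate edges. `AdjacencyGraph` is a hypothetical name. Its method names follow the MovieGraph API suggested in this handout, but the storage choice is an assumption.

```java
import java.util.*;

// Sketch of an undirected graph whose vertices are strings (movie titles
// and/or actor names). Each vertex maps to the set of its neighbors.
class AdjacencyGraph {
    private final Map<String, Set<String>> adj = new HashMap<>();
    private int edges = 0;

    // Adds the undirected edge u-v, creating either vertex if needed.
    void addEdge(String u, String v) {
        Set<String> nu = adj.computeIfAbsent(u, k -> new HashSet<>());
        Set<String> nv = adj.computeIfAbsent(v, k -> new HashSet<>());
        if (nu.add(v)) {      // false if u-v was already present
            nv.add(u);        // store the edge in both directions
            edges++;          // count each undirected edge once
        }
    }

    int nV() { return adj.size(); }
    int nE() { return edges; }

    boolean hasVertex(String v) { return adj.containsKey(v); }

    boolean hasEdge(String u, String v) {
        return adj.containsKey(u) && adj.get(u).contains(v);
    }

    int degree(String v) {
        return adj.getOrDefault(v, Collections.emptySet()).size();
    }

    // Read-only view of v's neighbors (empty if v is not a vertex).
    Iterable<String> neighbors(String v) {
        return Collections.unmodifiableSet(adj.getOrDefault(v, Collections.emptySet()));
    }
}
```

With both movie and actor vertices, building the graph amounts to calling `addEdge(movie, actor)` once for each actor on each line of the file.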

Once you decide what the graph represents and what data structure you'll use to represent it, you'll start developing a MovieGraph class. This class should be able to construct a movie-graph from a file. Encapsulate all necessary getters and setters, and all basic functionality that you may expect from a class that implements a MovieGraph. For example,

```java
// create an empty movie graph
MovieGraph()
// read a graph from the file fname
MovieGraph(String fname)
// add edge u-v
void addEdge(String u, String v)
// number of vertices
int nV()
// number of edges
int nE()
// return the vertices adjacent to vertex v
Iterable<String> neighbors(String v)
// return the degree of vertex v (degree = number of neighbors)
int degree(String v)
// is v a vertex in the graph?
boolean hasVertex(String v)
// is u-v an edge in the graph?
boolean hasEdge(String u, String v)
```

Include testing functions that print the vertices and edges of your graph.

Once the MovieGraph class works, use it to answer simple queries:

- Given an actor, find all the movies he/she played in.
- Given a movie, find all the actors who starred in it.

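For example, an interactive loop for movie queries might look like this:

```java
// Repeatedly ask the user for a movie name; Q exits.
// queryMovie(movie) prints the actors of the given movie.
void queryMovies() {
    Scanner in = new Scanner(System.in);   // requires import java.util.Scanner;
    while (true) {
        System.out.print("Enter a movie name (or Q to exit): ");
        if (!in.hasNextLine()) break;      // end of input
        String movie = in.nextLine().trim();
        if (movie.equals("Q")) break;
        queryMovie(movie);
    }
}
```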
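Suppose the graph has both movie and actor vertices (the third option above). Then both queries reduce to listing a vertex's neighbors: a movie's neighbors are its cast, and an actor's neighbors are that actor's movies. The sketch below assumes the adjacency structure is a `Map<String, Set<String>>`. `CastQueries`, `castOf` and `moviesOf` are hypothetical names, and it also shows one possible `queryMovie` for the loop above.

```java
import java.util.*;

// Sketch: lookup queries on a movie-actor graph stored as
// vertex name -> set of neighbor names.
class CastQueries {
    private final Map<String, Set<String>> adj;

    CastQueries(Map<String, Set<String>> adj) { this.adj = adj; }

    // A movie's neighbors are exactly its cast ...
    List<String> castOf(String movie) { return sortedNeighbors(movie); }

    // ... and an actor's neighbors are exactly the movies they appeared in.
    List<String> moviesOf(String actor) { return sortedNeighbors(actor); }

    // Neighbors in alphabetical order (empty list for an unknown vertex).
    private List<String> sortedNeighbors(String v) {
        List<String> result = new ArrayList<>(adj.getOrDefault(v, Collections.emptySet()));
        Collections.sort(result);
        return result;
    }

    // Prints the cast of a movie, or a message if the title is unknown.
    void queryMovie(String movie) {
        if (!adj.containsKey(movie)) {
            System.out.println("No such movie: " + movie);
            return;
        }
        for (String actor : castOf(movie)) {
            System.out.println("  " + actor);
        }
    }
}
```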

Note that to find the Kevin-Bacon-number of an actor X, we need to find the shortest path connecting X to Kevin Bacon. More generally, to find the A-number of X for an arbitrary actor A, we need to find the shortest path connecting X to A.

Your goal is to write a method that takes two actor names A and B, finds the A-number of B (that is, the length of a shortest path from A to B), and nicely displays the movie-actor chain from B to A. Shortest paths are not necessarily unique: there may be several paths of the same minimum length connecting A to B. In that case, we just want to compute one of them (it does not matter which one).

It turns out that you can compute shortest paths in a graph using a search strategy you have already seen: breadth-first search (BFS). Start from the vertex representing the source (actor A) and add all its neighbors to a queue. In a graph whose vertices are actors only, these are all the actors with an A-number of 1. If your graph also has movie vertices, these neighbors are A's movies, and the actors with A-number 1 are one more step away. Then add to the queue all neighbors of these neighbors, and so on. It is not hard to see (and we'll argue this in class) that breadth-first search from A finds the shortest paths from A to all other vertices that are connected to A.
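Below is a minimal BFS sketch that returns the number of edges from the source to every reachable vertex. It assumes the graph is given as a `Map<String, List<String>>` and contains both movie and actor vertices. In that model an actor's A-number is half of its BFS distance. `Bfs.distances` is a hypothetical name, and the test uses a toy graph built from the three movies in the Kevin Bacon example.

```java
import java.util.*;

// Sketch: breadth-first search on an undirected graph stored as
// vertex name -> list of neighbor names.
class Bfs {
    // Returns the edge distance from source to every vertex reachable from it.
    static Map<String, Integer> distances(Map<String, List<String>> adj, String source) {
        Map<String, Integer> dist = new HashMap<>();   // also acts as the "marked" set
        Deque<String> queue = new ArrayDeque<>();
        dist.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty()) {
            String v = queue.remove();                  // FIFO: closest vertices first
            for (String w : adj.getOrDefault(v, Collections.emptyList())) {
                if (!dist.containsKey(w)) {             // first visit = shortest distance
                    dist.put(w, dist.get(v) + 1);
                    queue.add(w);
                }
            }
        }
        return dist;
    }
}
```

In the toy graph, Meryl Streep ends up 2 edges from Kevin Bacon (A-number 1) and Nicole Kidman 4 edges away (A-number 2), matching the example at the top of this handout.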

Some things to think of:

- How do you represent a node in the queue during BFS? It is a String representing a vertex in a MovieGraph, but you also need to keep track of the path from a queue node back to the start vertex, because at the end you want to trace back the path to A. Hint: think of how we stored the path out of a maze (we went over this in class).
- How do you handle duplicate nodes, that is, a vertex you want to enqueue that is already in the queue or was already visited? Hint: you'll need to mark nodes.
- How do you keep track of the distance from a queue node to the start node? At the end, you need to print this distance, which is the A-number. Hint: you'll need to store the distance of each node.

Note that there is nothing special about Kevin Bacon: the same approach can be used to compute shortest paths between any two actors in Hollywood. Make your methods general, not customized for Kevin Bacon.

In terms of style, you will probably want to implement path
computation as a separate class. Call it `MoviePath`. This class
essentially performs BFS from a given vertex on a given graph and
stores all the data needed for this as instance variables.
I imagine you will have a couple of methods in MoviePath. First,
a constructor that takes as parameters a MovieGraph and a
vertex in this graph and runs BFS from this vertex. Then,
methods that return the actual path to the source vertex and the
distance to it.
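Here is a minimal sketch of such a class. For this example only, the graph is passed as a `Map<String, List<String>>` adjacency structure rather than a MovieGraph. The `distTo` map serves both as the set of marked vertices and as the distance table. `edgeTo` records the vertex from which each vertex was first reached, and the path is traced back through it, much like the maze path mentioned earlier. The method names `hasPathTo`, `distanceTo` and `pathTo` are hypothetical.

```java
import java.util.*;

// Sketch of a MoviePath-style class: the constructor runs BFS from source,
// and the query methods read the stored results.
class MoviePath {
    private final String source;
    private final Map<String, Integer> distTo = new HashMap<>(); // marked + distance
    private final Map<String, String> edgeTo = new HashMap<>();  // previous vertex on a shortest path

    MoviePath(Map<String, List<String>> graph, String source) {
        this.source = source;
        Deque<String> queue = new ArrayDeque<>();
        distTo.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty()) {
            String v = queue.remove();
            for (String w : graph.getOrDefault(v, Collections.emptyList())) {
                if (!distTo.containsKey(w)) {
                    distTo.put(w, distTo.get(v) + 1);
                    edgeTo.put(w, v);
                    queue.add(w);
                }
            }
        }
    }

    boolean hasPathTo(String v) { return distTo.containsKey(v); }

    // Number of edges on a shortest path from source to v, or -1 if unreachable.
    int distanceTo(String v) { return distTo.getOrDefault(v, -1); }

    // Vertices on a shortest path from source to v (empty list if unreachable).
    List<String> pathTo(String v) {
        LinkedList<String> path = new LinkedList<>();
        if (!hasPathTo(v)) return path;
        for (String x = v; !x.equals(source); x = edgeTo.get(x)) {
            path.addFirst(x);   // walk parent links back toward the source
        }
        path.addFirst(source);
        return path;
    }
}
```

Storing one parent per vertex, instead of a whole path inside each queue node, keeps memory proportional to the size of the graph.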

**Efficiency:** One thing to think about is efficiency. Some of
the graphs are very large. To compute a path from A to B, you
run BFS from A until you reach B. So one way to compute the
average path length from A to all actors is to repeat this process
for each actor B. This is extremely inefficient, and you will not be
able to use it on anything but the smallest graph. Instead, think
about running BFS from A to completion (until it has reached every
node that can be reached from A). A single BFS computes all the
paths from A at the same time.
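As an illustration, the sketch below computes the average A-number from the distance map of a single complete BFS from A. It assumes a movie-actor graph, so an A-number is the edge distance divided by 2, and a set of actor names used to skip movie vertices. Actors that BFS never reached are absent from the map and are therefore ignored, as the handout asks. A itself is counted with A-number 0. `AverageNumber` is a hypothetical name.

```java
import java.util.*;

// Sketch: average A-number over all actors reachable from A, using the
// distance map from ONE full BFS (no per-actor searches needed).
class AverageNumber {
    static double averageNumber(Map<String, Integer> dist, Set<String> actors) {
        long sum = 0;
        int count = 0;
        for (Map.Entry<String, Integer> e : dist.entrySet()) {
            if (actors.contains(e.getKey())) {  // skip movie vertices
                sum += e.getValue() / 2;        // actor-movie-actor hops = edges / 2
                count++;
            }
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }
}
```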

Once your code works, use it to answer the following questions:

- What is the average degree in a movie graph? That is, what is the average number of movies an actor plays in, and what is the average number of actors in a movie?
- Are all actors connected to Kevin Bacon (or, in general, to Actor A)?
- What is the average A-number across all actors who are connected to A?
- Print all actors with A-number k. That is, write a method that takes a number k and an actor A, and prints the actors whose A-number equals k, along with how many there are.
- Write a method to print a histogram of Kevin Bacon (or Actor A) numbers, indicating how many actors have an A-number of 0, 1, 2, 3, etc.
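For the histogram item, here is a sketch built on the same assumptions as before: a movie-actor graph, where an A-number is the BFS edge distance divided by 2, a distance map from one complete BFS, and a set of actor names. `NumberHistogram` is a hypothetical name.

```java
import java.util.*;

// Sketch: histogram of A-numbers from one BFS distance map.
class NumberHistogram {
    // counts[k] = number of actors whose A-number is k (unreachable actors are skipped).
    static int[] histogram(Map<String, Integer> dist, Set<String> actors) {
        int max = 0;
        for (String a : actors) {
            Integer d = dist.get(a);
            if (d != null) max = Math.max(max, d / 2);
        }
        int[] counts = new int[max + 1];
        for (String a : actors) {
            Integer d = dist.get(a);
            if (d != null) counts[d / 2]++;
        }
        return counts;
    }

    // Prints one "k <tab> count" row per A-number.
    static void print(int[] counts) {
        for (int k = 0; k < counts.length; k++) {
            System.out.printf("%d\t%d%n", k, counts[k]);
        }
    }
}
```

`counts[k]` answers "how many actors have A-number k". Actors missing from the distance map are not connected to A, so `actors.size()` minus the sum of all counts tells you whether every actor is connected.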

The lab is due on Wednesday, December 2nd. You can work with one partner, and you are strongly encouraged to find one. Once you have the background, working with a partner is both fun and challenging.

When you turn in the code, include a brief README file that describes the structure of your code, instructs the user how to run it, and specifies how each team member contributed to the lab.

Since the lab gives you little guidance on how to structure the code, you will find that the cleaner your design, the less time this project will take.

These are some things to consider as you decide how to model the
problem. You need to understand that there is not one "right" way to
do it. There are easier ways, and there are harder ways. There are
more efficient ways, and less efficient ways. There are ways that
will be easy to program, and there are ways that will take a lot of
effort to make work. **YOU** are the creator of your world. Understand
what it is that your world needs to do, decide how to model your
world, keep it consistent, and make it work.

Lessons to learn:

- Think before you start! Sketch the layout. Encapsulate the functionality.
- Develop incrementally. Write a few lines, compile, test, debug, repeat.
- Keep testing and checking.
- Performance matters.

Last modified: Thu Dec 10 11:45:22 EST 2009