Project 2

In this project you will experiment with matrix layouts and their effect on the performance of a couple of simple operations.

Let A be a square matrix with n elements, that is, with sqrt n rows and sqrt n columns. Matrices have many applications: you can think of A as representing the adjacency matrix of a graph, the elevations in a grid terrain, or a matrix used in numerical simulations in physics. Because matrices appear in so many applications, we want to be able to implement the fundamental operations on them efficiently.

In this project you will implement matrix transposition and matrix multiplication, and you will experimentally analyze the performance of your algorithms as a function of n (matrix size).

Algorithms

Matrix transposition: write a function that computes the transpose of A, call it B. Implement two approaches: the straightforward one, and the recursive one described in class.
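The recursive approach from class is not reproduced in this handout. The sketch below assumes it is the usual divide-and-conquer scheme: split the longer dimension of the current block in half and recurse until the block is small. The names transpose_naive, transpose_recursive, get_a, set_b and cutoff are illustrative choices, not part of the assignment.

```python
def transpose_naive(get_a, set_b, m):
    """Straightforward transposition of an m x m matrix: B[j][i] = A[i][j]."""
    for i in range(m):
        for j in range(m):
            set_b(j, i, get_a(i, j))


def transpose_recursive(get_a, set_b, r0, c0, rows, cols, cutoff=16):
    """Transpose the block of A with top-left corner (r0, c0) into B.

    Assumption: the recursion halves the longer side of the block; the
    version described in class may differ in its details.
    """
    if rows <= cutoff and cols <= cutoff:
        # Base case: the block is small, so copy it element by element.
        for i in range(r0, r0 + rows):
            for j in range(c0, c0 + cols):
                set_b(j, i, get_a(i, j))
    elif rows >= cols:
        half = rows // 2
        transpose_recursive(get_a, set_b, r0, c0, half, cols, cutoff)
        transpose_recursive(get_a, set_b, r0 + half, c0, rows - half, cols, cutoff)
    else:
        half = cols // 2
        transpose_recursive(get_a, set_b, r0, c0, rows, half, cutoff)
        transpose_recursive(get_a, set_b, r0, c0 + half, rows, cols - half, cutoff)
```

Both functions reach the matrices only through accessor functions, so the same code can run on any storage layout. A top-level call for an m x m matrix would be transpose_recursive(get_a, set_b, 0, 0, m, m).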

Matrix multiplication: write a function that computes the product of two matrices of the same size, A and B. Implement two approaches: the straightforward one, and the recursive one described in class.
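As a rough illustration, the two approaches might look like the sketch below. It assumes that the recursive version splits each matrix into four quadrants and performs eight recursive block products, and that the side length is a power of two; the version described in class may differ. All function and parameter names here are hypothetical.

```python
def multiply_naive(get_a, get_b, set_c, m):
    """Straightforward product of two m x m matrices: C = A * B."""
    for i in range(m):
        for j in range(m):
            total = 0
            for k in range(m):
                total += get_a(i, k) * get_b(k, j)
            set_c(i, j, total)


def multiply_recursive(get_a, get_b, get_c, set_c,
                       ar, ac, br, bc, cr, cc, size, cutoff=8):
    """Add (block of A) * (block of B) into a block of C.

    (ar, ac), (br, bc) and (cr, cc) are the top-left corners of the blocks.
    Assumptions: size is a power of two, cutoff >= 1, and C starts as zeros.
    """
    if size <= cutoff:
        # Base case: multiply the small blocks directly, accumulating into C.
        for i in range(size):
            for j in range(size):
                total = get_c(cr + i, cc + j)
                for k in range(size):
                    total += get_a(ar + i, ac + k) * get_b(br + k, bc + j)
                set_c(cr + i, cc + j, total)
        return
    h = size // 2
    # C[I][J] += A[I][K] * B[K][J] for every combination of quadrants I, J, K.
    for di in (0, h):
        for dj in (0, h):
            for dk in (0, h):
                multiply_recursive(get_a, get_b, get_c, set_c,
                                   ar + di, ac + dk,
                                   br + dk, bc + dj,
                                   cr + di, cc + dj,
                                   h, cutoff)
```

For m x m matrices the top-level call would pass 0 for all six corner coordinates and m for size. The cutoff only controls where the recursion stops; its best value is something to determine experimentally.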

Matrix layouts

You'll implement 3 layouts:
1. row-major order
2. column-major order
3. z-order
In what follows I will describe how to structure your code so that you can think generically about layouts and switch between layouts in a simple way.

To store the matrix A, we'll use a 1-dimensional array, call it Z[0..n-1]. To access an element (i,j) from the matrix we'll define a function that maps pairs (i,j) to indices z in {0,1,...,n-1}. This function is bijective, that is, every pair (i,j) maps to a unique z(i,j) and the other way around.

For simplicity, let's assume that sqrt n = 2^k, for some k in N. Let bin(x) denote the binary representation of a number x in {0,1,..., sqrt n -1}. Note that the binary representation of a row or column index in the matrix requires k digits, and the binary representation of an index in Z requires 2k digits.

Some mappings:

1. z1: the binary representation of z1(i,j) consists of the digits of bin(i) followed by the digits of bin(j). The effect is that the matrix is stored row by row, and thus z1 gives row-major order.
2. z2: the binary representation of z2(i,j) consists of the digits of bin(j) followed by the digits of bin(i). The effect is that the matrix is stored column by column, and thus z2 gives column-major order.
3. z3: the binary representation of z3(i,j) consists of the digits of bin(i) interleaved with the digits of bin(j); that is, we start with the first digit of bin(i), then the first digit of bin(j), then the second digit of bin(i), then the second digit of bin(j), and so on. This is called z-order.
You will write your code so that it works generically with the matrix, separating out the details of how the matrix is stored. That is, whenever you need to access A[i][j], you'll access Z[z1(i,j)] or Z[z2(i,j)] or Z[z3(i,j)]. Wrap this in a function get(i,j) so that the code is clear and simple. That is, when somebody reads your code for matrix transposition or matrix multiplication, they should not see any references to layouts or index computations.
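The three mappings and the get(i,j) wrapper can be sketched as follows. This is only one possible structure: the Matrix class, its layout parameter and the set method are hypothetical names added for illustration, and "first digit" is taken to mean the most significant bit.

```python
def z1(i, j, k):
    """Row-major: the k bits of i followed by the k bits of j."""
    return (i << k) | j


def z2(i, j, k):
    """Column-major: the k bits of j followed by the k bits of i."""
    return (j << k) | i


def z3(i, j, k):
    """Z-order: interleave the bits of i and j, most significant bit first."""
    z = 0
    for b in range(k - 1, -1, -1):
        z = (z << 1) | ((i >> b) & 1)  # next bit of i
        z = (z << 1) | ((j >> b) & 1)  # next bit of j
    return z


class Matrix:
    """A 2^k x 2^k matrix stored in a 1-dimensional list Z (illustrative)."""

    def __init__(self, k, layout):
        self.k = k
        self.m = 1 << k              # number of rows and of columns
        self.layout = layout         # one of z1, z2, z3
        self.Z = [0] * (self.m * self.m)

    def get(self, i, j):
        return self.Z[self.layout(i, j, self.k)]

    def set(self, i, j, value):
        self.Z[self.layout(i, j, self.k)] = value
```

For example, with k = 2, i = 2 (binary 10) and j = 3 (binary 11), z1 gives binary 1011 = 11, z2 gives binary 1110 = 14, and z3 gives binary 1101 = 13. Switching layouts then amounts to constructing the matrix with a different mapping, such as Matrix(k, z3) instead of Matrix(k, z1).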

Experimental analysis

To initialize the matrices with values, use random numbers.

For experiments use the grid, and run experiments only on the 1g nodes.

There are 3 algorithms, and 3 matrix layouts, therefore you'll have 9 different modules to test.

Plots: For each algorithm, show the running times for each of the 3 layouts. Experiment with both small and large values of n, and show them on separate plots. There will therefore be 6 plots, 2 per algorithm: one for small values of n, so that we can see the effect of the caches, and one for large n, so that we can see the I/O bottleneck.
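One possible way to collect the numbers for these plots is sketched below. The helper names random_values and measure are hypothetical. The sketch records both wall-clock time (time.perf_counter) and CPU time (time.process_time), on the assumption that a gap between the two is a useful hint that a run is waiting on I/O rather than computing.

```python
import random
import time


def random_values(count, seed=None):
    """Return `count` random floats in [0, 1) to fill a matrix with."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]


def measure(fn, *args, repeats=3):
    """Run fn(*args) `repeats` times.

    Returns (best wall-clock seconds, best CPU seconds) over the runs.
    """
    best_wall = best_cpu = float("inf")
    for _ in range(repeats):
        wall0, cpu0 = time.perf_counter(), time.process_time()
        fn(*args)
        wall = time.perf_counter() - wall0
        cpu = time.process_time() - cpu0
        best_wall = min(best_wall, wall)
        best_cpu = min(best_cpu, cpu)
    return best_wall, best_cpu


if __name__ == "__main__":
    for k in range(4, 9):
        n = (1 << k) ** 2                    # n elements, sqrt n = 2^k
        values = random_values(n, seed=0)
        wall, cpu = measure(sorted, values)  # sorted() is only a stand-in workload
        print(n, wall, cpu)
```

In a real experiment the stand-in workload would be replaced by one of your transposition or multiplication functions, and the printed rows would feed the plots, one series per layout.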

Hand in: Email me the code so that I can test it. Bring to class a hardcopy of the code, and a paper summarizing your work. The paper is a very important part of your work, so plan to spend a fair amount of time on it.

The paper should include:

• a clear description of your assignment
• a description of the algorithm(s) you implemented
• an analysis of the running time, both CPU and IO, expressed using Theta-notation.
• a description of all the experiments you did and the results.
• a discussion of the experimental results
• conclusions (about which algorithm works best under what circumstances).

Type your paper in LaTeX. You can use the following template.