Let A be a matrix of n elements, and assume it has sqrt n rows and sqrt n columns. Matrices have many applications, as you know: A could represent the adjacency matrix of a graph, the elevations in a grid terrain, or a matrix used in numerical simulations in physics. Because matrices appear in so many applications, we want to be able to implement the fundamental operations on them efficiently.

In this project you will implement matrix transposition and matrix multiplication, and you will experimentally analyze the performance of your algorithms as a function of n (matrix size).

Matrix transposition: write a function that computes the transpose of A, call it B. Implement two approaches: the straightforward one, and the recursive one described in class.
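
As a rough illustration of the two approaches, here is a minimal Python sketch. The recursive version described in class is not reproduced here; this sketch assumes the common divide-and-conquer scheme that halves the longer dimension of the current block until the block is small. The names `row_major`, `transpose_naive`, `transpose_recursive`, `idx` and `base` are hypothetical, and `idx` stands for whichever layout mapping is in use.

```python
def row_major(side):
    """Hypothetical helper: index function for a side x side row-major matrix."""
    return lambda i, j: i * side + j


def transpose_naive(A, side, idx):
    """Straightforward transpose: visit every (i, j) and write it to (j, i)."""
    B = [0] * (side * side)
    for i in range(side):
        for j in range(side):
            B[idx(j, i)] = A[idx(i, j)]
    return B


def transpose_recursive(A, B, idx, r0, c0, rows, cols, base=4):
    """Transpose the block of A starting at (r0, c0) with the given shape into B.

    Assumed scheme: split the longer dimension in half and recurse;
    blocks of at most base x base are transposed directly.
    """
    if rows <= base and cols <= base:
        for i in range(r0, r0 + rows):
            for j in range(c0, c0 + cols):
                B[idx(j, i)] = A[idx(i, j)]
    elif rows >= cols:
        half = rows // 2
        transpose_recursive(A, B, idx, r0, c0, half, cols, base)
        transpose_recursive(A, B, idx, r0 + half, c0, rows - half, cols, base)
    else:
        half = cols // 2
        transpose_recursive(A, B, idx, r0, c0, rows, half, base)
        transpose_recursive(A, B, idx, r0, c0 + half, rows, cols - half, base)
```

A call such as `transpose_recursive(A, B, idx, 0, 0, side, side)` fills a preallocated `B` with the transpose of the whole matrix. The cutoff `base` is a tuning parameter chosen arbitrarily here.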

Matrix multiplication: write a function that computes the product of two matrices of the same size, A and B. Implement two approaches: the straightforward one, and the recursive one described in class.
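
The following Python sketch shows one possible shape for the two multiplication approaches. The recursive version is an assumption, not necessarily the one from class: it splits each matrix into four quadrants and performs eight recursive block multiplications, accumulating into C. It also assumes the side length is a power of two. The names `row_major`, `multiply_naive`, `multiply_recursive`, `idx` and `base` are hypothetical.

```python
def row_major(side):
    """Hypothetical helper: index function for a side x side row-major matrix."""
    return lambda i, j: i * side + j


def multiply_naive(A, B, side, idx):
    """Straightforward triple loop: C[i][j] = sum over l of A[i][l] * B[l][j]."""
    C = [0] * (side * side)
    for i in range(side):
        for j in range(side):
            total = 0
            for l in range(side):
                total += A[idx(i, l)] * B[idx(l, j)]
            C[idx(i, j)] = total
    return C


def multiply_recursive(A, B, C, idx, i0, j0, l0, size, base=4):
    """Accumulate the product of two size x size blocks into C.

    Adds (block of A at (i0, l0)) * (block of B at (l0, j0)) to the block
    of C at (i0, j0). C must start out filled with zeros.
    """
    if size <= base:
        for i in range(size):
            for j in range(size):
                total = 0
                for l in range(size):
                    total += A[idx(i0 + i, l0 + l)] * B[idx(l0 + l, j0 + j)]
                C[idx(i0 + i, j0 + j)] += total
        return
    h = size // 2
    # Eight half-size block products, two per quadrant of C.
    for di in (0, h):
        for dj in (0, h):
            for dl in (0, h):
                multiply_recursive(A, B, C, idx, i0 + di, j0 + dj, l0 + dl, h, base)
```

The top-level call would be `multiply_recursive(A, B, C, idx, 0, 0, 0, side)` with `C` initialized to zeros.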

Implement each algorithm for three matrix layouts:

- row-major order
- column-major order
- z-order

To store the matrix A, we'll use a 1-dimensional array, call it Z[0..n-1]. To access an element (i,j) of the matrix, we'll define a function that maps pairs (i,j) to indices z in {0,1,...,n-1}. This function is bijective; that is, every pair (i,j) maps to a unique index z(i,j), and every index corresponds to exactly one pair.

For simplicity, let's assume that sqrt n=2^k, for some k in N. Let bin(x) denote the binary representation of a number x, with x in {0,1,..., sqrt n -1}. Note that the binary representation of a row or column index in the matrix requires k digits, and the binary representation of an index in Z requires 2k digits.

Some mappings:

- z1: the binary representation of z1(i,j) consists of the digits of bin(i) followed by the digits of bin(j). The effect is that the matrix is stored row by row, and thus z1 gives row-major order.
- z2: the binary representation of z2(i,j) consists of the digits of bin(j) followed by the digits of bin(i). The effect is that the matrix is stored column by column, and thus z2 gives column-major order.
- z3: the binary representation of z3(i,j) consists of the digits of bin(i) interleaved with the digits of bin(j); that is, we start with the first digit of bin(i), then the first digit of bin(j), then the second digit of bin(i), then the second digit of bin(j), and so on. This is called z-order.
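
The three mappings can be sketched with bit operations as follows. This is a minimal Python illustration, not a required interface: the extra parameter `k` (the number of digits per index) is an assumption of this sketch, and "first digit" is read as the leftmost, most significant digit of the k-digit representation.

```python
def z1(i, j, k):
    """Row-major: the k digits of bin(i) followed by the k digits of bin(j)."""
    return (i << k) | j


def z2(i, j, k):
    """Column-major: the k digits of bin(j) followed by the k digits of bin(i)."""
    return (j << k) | i


def z3(i, j, k):
    """Z-order: interleave the digits of bin(i) and bin(j), starting with i.

    Assumption: digits are taken from most significant to least significant.
    """
    z = 0
    for b in range(k - 1, -1, -1):
        z = (z << 1) | ((i >> b) & 1)  # next digit of bin(i)
        z = (z << 1) | ((j >> b) & 1)  # next digit of bin(j)
    return z
```

For example, with k = 2, i = 2 (binary 10) and j = 3 (binary 11), z1 gives binary 1011 = 11, z2 gives binary 1110 = 14, and z3 gives binary 1101 = 13. The loop in `z3` costs k steps per access, so a real implementation may want a faster bit-interleaving trick or a precomputed table.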

To initialize the matrices with values, use random numbers.

Run your experiments on the grid, and only on the 1g nodes.

There are 3 algorithms and 3 matrix layouts, so you'll have 9 different modules to test.

Plots: For each algorithm, show the running times for each of the 3 layouts. Experiment with both small and large values of n, and show them on separate plots. This gives 6 plots in total: for each algorithm, one plot for small values of n, so that we can see the effect of the caches, and one for large n, so that we can see the IO bottleneck.
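
One possible way to collect the (n, running time) pairs for such plots is sketched below in Python. The helper names `time_once`, `best_of`, `sweep` and `make_args` are hypothetical. The sketch measures wall-clock time with `time.perf_counter`; `time.process_time` could be substituted if CPU time is wanted instead.

```python
import time


def time_once(func, *args):
    """Wall-clock seconds taken by a single call func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def best_of(func, args_factory, repeats=3):
    """Smallest time over several runs, building fresh inputs for each run."""
    return min(time_once(func, *args_factory()) for _ in range(repeats))


def sweep(func, make_args, ks):
    """Return a list of (n, seconds) pairs for side lengths 2^k, k in ks.

    make_args(side) is assumed to return the argument tuple for func.
    """
    results = []
    for k in ks:
        side = 1 << k
        seconds = best_of(func, lambda: make_args(side))
        results.append((side * side, seconds))
    return results
```

Taking the minimum over a few repetitions is one common way to reduce noise from other processes; reporting the mean or median instead is equally defensible.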

**Hand in:** Email me the code so that I can test it. Bring a hardcopy of the code to class, along with a paper summarizing your work. The paper is a very important part of your work, so plan to spend a fair amount of time on it.

The paper should include:

- a clear description of your assignment
- a description of the algorithm(s) you implemented
- an analysis of the running time, both CPU and IO, expressed using Theta-notation
- a description of all the experiments you did and the results
- a discussion of the experimental results
- conclusions (about what algorithm works best under what circumstances)

Type your paper in LaTeX. You can use the following template.

Last modified: Tue Mar 29 15:12:39 EDT 2011