Homework 1: MPI and MATLAB*P

The purpose of this assignment is to get you started with parallel computing.

MIT students are welcome to use our Beowulf cluster or any other cluster at their disposal.

Take a look at Beowulf instructions for an introduction on using our class Beowulf.

Use of the PBS system is mandatory .


You are reminded to start on the homework early. Late homework will not be accepted, no matter what your reason is.

Instructions for submission:

For our sanity's sake, please follow the submission procedure:

Part I: Parallel substring search
Part II: Merge Sort
Part III: Power method
Part IV: How fast are those clusters

Part I: Parallel substring search

In lecture we talked about the "basic six" MPI routines:
(refer to
the list of MPI routines for definitions)

This problem will be an opportunity to try out these routines as well as learning about some more advanced routines.

Consider the following problem:

Let S be a random string of the alphabet Q = { A, C, G, T } with length mn. S is distributed among m processors such that processor i has a substring S_i of length n. S equals concat(S_0, S_1, ..., S_(m-1)).

Let I be an input string of the alphabet Q.

Write a program which, when run on m processors (-np m) does the following:

What should you submit


Part II:

Merge Sort

Using MATLAB*P's mm mode, try to run a merge sort.

With np processors, generate a random vector of length 10000*np. Something like randn(10000*np*p,1) would work. Then, using mm mode, do a merge sort. The issue here is how do you proceed beyond the initial sort step - this is for you to figure out. MATLAB*P <-> frontend traffic is allowed, but can only be done in blocks of size 1000.

Constraint - at any time you are not allowed to have more than 10000 elements of the input data on each node, no matter where they come from.

What should you submit


Part III: Power method

We will investigate the largest eigenvalue of an infinite matrix A using power method. The entries in the matrix are defined as follows: for the entry at row i and column j, the value is A(i,j)=1/ [(i+j-2)(i+j-1)/2 + i] for 1<=i,j<=n. (Remember c is "0 based" but matrices are often "1 based")

Implement the following algorithm in MATLAB*P:

Starting with a random vector x, repeat the following:

Use a matrix of size at least 2^16 square to approximate the infinite matrix. Note that you should NEVER form the matrix explicitly, as it will not fit in the memory of the machine. Instead, use the definition of the entries in your y = Ax computation. Use the 'mm mode' of MATLAB*P for this step.

The second and third step can be done in 'regular' MATLAB*P.

When to stop? n is an approximation to the largest eigenvalue of the matrix and should converge towards the true value. When the difference between n from the previous iteration and the current n is less than tol, stop and return n. Use tol=10>> data of ^x means any 10000 data of x, no matter which part they come

What to submit


Part IV: How fast are those clusters

This question tries to bring your attention to the issues affecting the performance of a cluster.

Refer to the
Supercomping @ MIT page, which list some clusters at MIT that we have information on. Within that page, pick 5 clusters. Your 5 choices must include all 3 different type of networks (Fast Ethernet, Gigabit Ethernet, Myrinet). Then try to estimate the High Performance Linpack benchmark (the benchmark used by top500) result on those clusters.

This is an open ended question - we don't know the answers either. So any answer that is reasonable and based on criteria relevant to cluster performance will be accepted.

How do you get started on this? Suggestions:

What to submit

Ron Choy
Last modified: Sun Feb 9 22:32:13 GMT 2003