Replace the non-blocking receive with a blocking probe (MPI_Probe). Interrogate the returned status to find out the size of the message, then dynamically allocate the appropriate amount of space and receive the message into it.
Replace the MPI_Wait in task 1 with a loop that calls MPI_Test until the request completes.
Write a short program for 8 tasks that passes an arbitrary message around using non-blocking sends and receives combined with MPI_Wait. Make every task write the message to standard output immediately after it has started the send to the next task, but before that send completes. Generate a trace file and use the visualization tool, vt, to observe the progress of the program. Observe that the communication effectively serializes the program.

