Is there a way to parallelise one-dimensional Fast Fourier Transform?

Yes.

The way is as follows. Think of a one dimensional data vector
as being a collapsed two-dimensional matrix, whose indexes (*j*, *J*),
where
and
collapse onto
*f*(*Jm* + *j*).

The 2-dimensional Discrete Fourier Transform of the matrix would be:

This yields the following prescription:

- 1.
- Reshape the original 1-dimensional array into an array in Fortran column-major order.
- 2.
- Parallel FFT on the second index, i.e., columns.
- 3.
- Multiply each component by a phase factor .
- 4.
- Transpose.
- 5.
- Parallel FFT on the second index, i.e., still columns.
- 6.
- Reshape the two-dimensional array into one-dimensional output.

Remember that even on the fastest communication fabrics, e.g., switched SMPs, inter-processor communication operations are at least an order of magnitude slower than local memory accesses. On slower fabrics, e.g., external switched communication networks, inter-processor communications may be 3 or more orders of magnitude slower than local memory access.

In turn, local memory operations may be still between one and two orders of magnitude slower than arithmetic CPU operations.

Moving data around is by far the costliest part of every computational process.