Benchmarks
This page reports the results of some scalability tests, which are available in the under example. The tests performed are the following:
Transpose of a real 3D array.
Transpose of a complex 3D array.
3D FFT transform (fft_r2c_x) of a real 3D array starting from the X physical direction. The transform is applied both forward and backward to recover the input array.
3D FFT transform (fft_c2c_x) of a complex 3D array starting from the X physical direction. The transform is applied both forward and backward to recover the input array.
3D FFT transform (fft_r2c_z) of a real 3D array starting from the Z physical direction. The transform is applied both forward and backward to recover the input array.
3D FFT transform (fft_c2c_z) of a complex 3D array starting from the Z physical direction. The transform is applied both forward and backward to recover the input array.
All timings are collected by averaging 50 repetitions of the test, with iteration 0 discarded. Two resolutions have been tested:
NX=NY=NZ=512, which corresponds to roughly 130 million points.
NX=NY=NZ=1024, which corresponds to roughly 1 billion points.
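The timing procedure described above (50 repetitions, iteration 0 discarded) can be sketched as follows. This is only a serial illustration of the averaging, not the code of the actual tests, which are MPI programs. The function name `average_time` and the callable `run_once` are hypothetical, and the sketch assumes that the 50 averaged repetitions do not include the discarded iteration 0.

```python
import time


def average_time(run_once, repetitions=50):
    """Average wall-clock time of run_once over `repetitions` timed runs.

    Assumption: iteration 0 is an extra warm-up run that is executed
    but excluded from the average.
    """
    samples = []
    for iteration in range(repetitions + 1):
        start = time.perf_counter()
        run_once()
        elapsed = time.perf_counter() - start
        if iteration > 0:  # discard iteration 0
            samples.append(elapsed)
    return sum(samples) / len(samples)
```

Discarding the first iteration keeps one-off start-up costs out of the reported average.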
A 2D label for the results indicates a 2D (i.e. pencils) decomposition using the optimal automatic configuration (that generally corresponds to the closest decomposition to NR=NC).
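As an illustration of what "closest decomposition to NR=NC" means for a given number of MPI ranks, the sketch below picks the factor pair of the rank count that is nearest to a square. It is not the library's automatic configuration procedure, which may use other criteria; the function name `closest_square_grid` is hypothetical.

```python
import math


def closest_square_grid(nranks):
    """Return (nr, nc) with nr * nc == nranks, nr <= nc and nc - nr minimal."""
    if nranks < 1:
        raise ValueError("nranks must be positive")
    # Start from the integer square root and walk down to the first divisor.
    nr = math.isqrt(nranks)
    while nranks % nr != 0:
        nr -= 1
    return nr, nranks // nr
```

For example, `closest_square_grid(128)` returns `(8, 16)`, while a prime rank count such as 7 can only be factored as `(1, 7)`, which is a 1D grid.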
A 1D label for the results indicates a 1D (i.e. slabs) decomposition. With 2DECOMP&FFT this is obtained by forcing one of the two decomposition directions to 1. If N_ROW=1, an initial Z slabs configuration (i.e. local memory data are in the XY plane) is obtained; conversely, N_COL=1 starts from an X slabs configuration (i.e. local memory data are in the YZ plane).
Generally only one set of slab data is plotted, since the performance of the two configurations is relatively similar.
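The two slab configurations discussed above can be written as process grids with one direction forced to 1. The sketch below lists them and shows how the planes of the single split direction could be shared among the ranks. The helper names are hypothetical, and the even split with the remainder given to the first ranks is an assumption made for illustration, not necessarily how the library distributes data.

```python
def slab_grids(nranks):
    """The two 1D (slab) process grids as (N_ROW, N_COL) pairs."""
    return [(1, nranks), (nranks, 1)]


def planes_per_rank(n, nranks):
    """Split n planes over nranks as evenly as possible (assumed scheme)."""
    base, extra = divmod(n, nranks)
    # The first `extra` ranks take one additional plane each.
    return [base + 1 if rank < extra else base for rank in range(nranks)]
```

Since a slab decomposition splits only one direction, a rank count larger than the grid size in that direction leaves some ranks without any plane in this scheme.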
The library has been benchmarked on the following systems: