Benchmarks

This page reports the results of some scalability tests, which are available in the examples. The tests performed are the following:

  1. Transpose of a real 3D array.

  2. Transpose of a complex 3D array.

  3. 3D FFT transform (fft_r2c_x) of a real 3D array starting from the X physical direction. The transform is applied forward and then backward to recover the input array.

  4. 3D FFT transform (fft_c2c_x) of a complex 3D array starting from the X physical direction. The transform is applied forward and then backward to recover the input array.

  5. 3D FFT transform (fft_r2c_z) of a real 3D array starting from the Z physical direction. The transform is applied forward and then backward to recover the input array.

  6. 3D FFT transform (fft_c2c_z) of a complex 3D array starting from the Z physical direction. The transform is applied forward and then backward to recover the input array.
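The forward-then-backward pattern used by the FFT tests can be illustrated with a small serial sketch. This is not the library's implementation: it uses a naive O(n²) DFT from the Python standard library, a full complex transform even for real input (no half-spectrum storage as in a true r2c transform), and a tiny grid. The names `dft_1d`, `dft_along_axis` and `roundtrip_error` are hypothetical, and the assumption that the backward pass visits the axes in reverse order is made only for illustration.

```python
import cmath
import random


def dft_1d(values, inverse=False):
    """Naive DFT of a list of complex numbers (the inverse is scaled by 1/n)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * m * k / n)
        out.append(acc / n if inverse else acc)
    return out


def dft_along_axis(data, shape, axis, inverse=False):
    """Transform every 1D line of `data` (dict keyed by (i, j, k)) along `axis`."""
    out = {}
    others = [ax for ax in range(3) if ax != axis]
    for p in range(shape[others[0]]):
        for q in range(shape[others[1]]):
            keys = []
            for m in range(shape[axis]):
                key = [0, 0, 0]
                key[axis], key[others[0]], key[others[1]] = m, p, q
                keys.append(tuple(key))
            line = dft_1d([data[key] for key in keys], inverse)
            for key, value in zip(keys, line):
                out[key] = value
    return out


def roundtrip_error(shape=(4, 3, 2), first_axis=0, seed=0):
    """Forward 3D transform starting from X (0) or Z (2), then backward.

    Returns the maximum absolute difference from the input array.
    """
    rng = random.Random(seed)
    original = {
        (i, j, k): complex(rng.uniform(-1.0, 1.0), 0.0)
        for i in range(shape[0])
        for j in range(shape[1])
        for k in range(shape[2])
    }
    forward_order = [0, 1, 2] if first_axis == 0 else [2, 1, 0]
    data = original
    for axis in forward_order:
        data = dft_along_axis(data, shape, axis)
    for axis in reversed(forward_order):  # assumed backward ordering
        data = dft_along_axis(data, shape, axis, inverse=True)
    return max(abs(data[key] - original[key]) for key in original)


if __name__ == "__main__":
    print("start from X:", roundtrip_error(first_axis=0))
    print("start from Z:", roundtrip_error(first_axis=2))
```

In both cases the reported error should be at round-off level, which is the property the forward and backward tests rely on to recover the input array.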

All timings are averaged over 50 repetitions of the test, with iteration 0 discarded. Two resolutions have been tested:

  • NX=NY=NZ=512, which corresponds to roughly 130 million points.

  • NX=NY=NZ=1024, which corresponds to roughly 1 billion points.
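The timing protocol described above (averaging over repetitions and discarding iteration 0) can be sketched in a few lines. This is a generic illustration, not the benchmark code itself: `average_timing` and the workload are hypothetical, and the sketch assumes that iteration 0 is an extra warm-up run executed in addition to the averaged repetitions.

```python
import time


def average_timing(run_once, repetitions=50):
    """Return (mean wall-clock time, number of samples) for `run_once`.

    Iteration 0 is executed but its timing is discarded (assumed warm-up);
    the following `repetitions` iterations are averaged.
    """
    samples = []
    for iteration in range(repetitions + 1):
        start = time.perf_counter()
        run_once()
        elapsed = time.perf_counter() - start
        if iteration > 0:
            samples.append(elapsed)
    return sum(samples) / len(samples), len(samples)


if __name__ == "__main__":
    def workload():
        # Stand-in for a transpose or FFT call.
        sum(i * i for i in range(10000))

    mean, count = average_timing(workload)
    print(f"mean over {count} repetitions: {mean:.6e} s")
```

Discarding the first iteration keeps one-off costs, such as first-touch memory allocation or plan creation, out of the average.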

A 2D label in the results indicates a 2D (i.e. pencil) decomposition using the optimal automatic configuration, which generally corresponds to the decomposition closest to NR=NC. A 1D label indicates a 1D (i.e. slab) decomposition. With 2DECOMP&FFT this is obtained by forcing one of the two decomposition directions to 1. If N_ROW=1, the initial configuration is Z slabs (i.e. the data in local memory lie in the XY plane); conversely, N_COL=1 starts from an X slabs configuration (i.e. the data in local memory lie in the YZ plane). Generally only one set of slab data is plotted, since the performance is relatively similar.
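As a rough illustration of what "closest to NR=NC" means, the sketch below picks the factor pair of the process count that is nearest to a square and reports the average number of grid points per rank. It is an assumption-laden sketch: `closest_square_grid` and `points_per_rank` are hypothetical helpers, not part of 2DECOMP&FFT, and the library's automatic configuration may select a different pair.

```python
import math


def closest_square_grid(nproc):
    """Return (n_row, n_col) with n_row * n_col == nproc and n_row <= n_col,
    choosing the factor pair closest to a square process grid."""
    if nproc < 1:
        raise ValueError("nproc must be a positive integer")
    n_row = math.isqrt(nproc)
    while nproc % n_row != 0:
        n_row -= 1
    return n_row, nproc // n_row


def points_per_rank(n, nproc):
    """Average number of grid points per rank for an n x n x n grid."""
    return n ** 3 / nproc


if __name__ == "__main__":
    for nproc in (128, 256, 1024):
        n_row, n_col = closest_square_grid(nproc)
        print(f"{nproc} ranks: 2D grid {n_row} x {n_col}, "
              f"1D grids 1 x {nproc} or {nproc} x 1, "
              f"{points_per_rank(512, nproc):.0f} points per rank at 512^3")
```

For a prime process count the only factor pair is 1 by N, so the 2D and 1D decompositions coincide.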

The library has been benchmarked on the following systems:

  • The UK National Supercomputing Service ARCHER2

  • The GPU partition of the EPCC Cirrus service

Here is the detailed list of results: