2D Pencil Decomposition API¶
This page explains the key public interfaces of the 2D decomposition library. After reading this section, users should be able to easily build applications using this domain decomposition strategy. The library interface is designed to be very simple. One can refer to the example applications for a quick start.
The 2D Pencil Decomposition API is defined in three Fortran module which should be used by applications as:
use decomp_2d_constants
use decomp_2d_mpi
use decomp_2d
The use decomp_2d_constants
defines all the parameters, use decomp_2d_mpi
introduces all the MPI
related interfaces and use decomp_2d
cointains the main decomposition and transposition APIs.
Module decomp_2d_constant: Global Variables¶
The decomp_2d_constants
cointains global parameters that used to define the KIND of floating
point data (e.g. single or double precision).
These are used to consistently define the precision of the data type for the viariables
and for MPI operations.
mytype
- Use this variable to define the KIND of floating-point data, e.g.real(mytype) :: var
orcomplex(mytype) :: cvar
. Depending on configuring options this type will point to single or double.real_type, complex_type
- These are the proper MPI datatypes to be used (for real and complex numbers, respectively) if applications need to call MPI library routines directly. These types will point to single of double depending on the configuring options.real2_type
- This type double the precision of the baselinereal_type
.mytype_single, real_type_single
- These two types are used to define the data type fpr the IO operations.
The module contains additional parameters to control :
the log on output,
the log on debug,
activate the degugger (caliper)
activate the different FFT backends (generic, FFTW, MKL, cuFFT)
define the release major and minor version
Module decomp_2d_mpi: MPI communication¶
The decomp_2d_mpi
cointains global parameters that are used for MPI operation:
nproc
- the total number of MPI processes. [INT].nrank
- the rank of the current MPI process. [INT].decomp_2d_comm
- global MPI communicator [INT].decomp_2d_about
- interface to display error message and call MPI_ABORT function.decomp_2d_warning
- interface to display error message together with line number and function.
Module decomp_2d: decompostion module¶
The decomp_2d
module cointains the variables and the routines to perform the global transpostion operations.
The important variables are
nx_global, ny_global, nz_global
- size of the global data.xsize(i), ysize(i), zsize(i), i=1,2,3
- sizes of the sub-domains held by the current process. The first letter refers to the pencil orientation and the three 1D array elements contain the sub-domain sizes in X, Y and Z directions, respectively. In a 2D pencil decomposition, there is always one dimension which completely resides in local memory. So by definitionxsize(1)==nx_global
,ysize(2)==ny_global
andzsize(3)==nz_global
.xstart(i), ystart(i), zstart(i), xend(i), yend(i), zend(i), i=1,2,3
- the starting and ending indices for each sub-domain, as in the global coordinate system. Obviously, it can be seen thatxsize(i)=xend(i)-xstart(i)+1
. It may be convenient for certain applications to use global coordinate (for example when extracting a 2D plane from a 3D domain, it is easier to know which process owns the plane if global index is used).
Decomposition informations are also available using the data type DECOMP_INFO
which provides the
following derived types:
xst(i), yst(i), zst(i), i=1,2,3
- the starting indices for each sub-domain, as in the global coordinate system.xen(i), yen(i), zen(i), i=1,2,3
- the end indices for each sub-domain, as in the global coordinate system.xsz(i), ysz(i), zsz(i), i=1,2,3
- the size for each sub-domain, as in the global coordinate system.
Arrays can also be stored on smaller mesh sizes where points are skipped:
iskipS, jskipS, kskipS
- points skipped in the x, y and z direction for a generic scalar fieldiskipV, jskipV, kskipV
- points skipped in the x, y and z direction for the velocity fieldiskipP, jskipP, kskipP
- points skipped in the x, y and z direction for the pressure fieldxszS, yszS, zszS, xstS, ystS, zstS, xenS, yenS, zenS
- size, starting and final indexes for the reduce size mesh for a generic scalarxszV, yszV, zszV, xstV, ystV, zstV, xenV, yenV, zenV
- size, starting and final indexes for the reduce size mesh for the velocity fieldxszP, yszP, zszP, xstP, ystP, zstP, xenP, yenP, zenP
- size, starting and final indexes for the reduce size mesh for the pressure field
The module provides memory allocations API that are recomended to correctly define its major data structures
within the main program. It is recomended that all major arrays are defined as allocable
.
alloc_x(var, decomp, global)
- allocation using a x-pencil decompositionalloc_y(var, decomp, global)
- allocation using a y-pencil decompositionalloc_y(var, decomp, global)
- allocation using a z-pencil decomposition
where var
is the allocable array name,
decomp[optional]
the relative DECOMP_INFO
data type and
global[optional]
is a logical [True/False] flag to indicate if the array is allocated in the global coordinate system.
The allocation for a x pencil decomposition would be equivalent to the statement:
allocate(var(decomp%xsz(1), decomp%xsz(2), decomp%xsz(3))) ! if global==.false.
allocate(var(decomp%xst(1):decomp%xen(1), decomp%xst(2):decomp%xen(2), &
decomp%xst(3):decomp%xen(3))) ! if global==.true.
Allocated arrays can be simply released with an deallocate(var)
statement.
Basic 2D Decomposition API¶
All the global variables described above, the defualt common type decomp
and the MPI initialization is done
using the following call
call decomp_2d_init(nx, ny, nz, p_row, p_col)
where nx
, ny
and nz
are the size of 3D global data to be distributed over
a 2D processor grid p_row x p_col
.
Note that none of the dimensions need to be divisible by p_row
or p_col
, i.e. the library can handle non-evenly distributed data.
In case of p_row=p_col=0
an automatic decomposition is selected among all possible combination available.
The algorithm will choose the closest combination such as
$$ n_{row} = n_{col} = {nproc}^{1/2} $$
In case the root is not exact the closest combination to $n_{row} = n_{col}$ with $n_{row} < n_{col}$ is used.
If a 1D slab decomposition is needed instead of a 2D pencil one, it is recommended to set p_row
to unity and p_col
to nproc
.
An optional parameter may be passed to this initialisation routine:
call decomp_2d_init(nx, ny, nz, p_row, p_col, periodic_bc)
Here periodic_bc
is a 1D array containing 3 logical values that specify whether periodic boundary condition
should apply in certain dimensions. Note this is only applicable if halo-cell communication is to be used.
Another optional parameter may be passed at the initialization stage:
call decomp_2d_init(nx, ny, nz, p_row, p_col, periodic_bc, comm)
Here comm
is the MPI communicator that the library will use. By default, MPI_COMM_WORLD is used.
A key element of this library is a set of communication routines that actually perform the data transpositions. As mentioned, one needs to perform 4 global transpositions to go through all 3 pencil orientations. Correspondingly, the library provides 4 communication subroutines:
call transpose_x_to_y(var_in,var_out)
call transpose_y_to_z(var_in,var_out)
call transpose_z_to_y(var_in,var_out)
call transpose_y_to_x(var_in,var_out)
The input array var_in
and var_output
array out should have been defined
and contain distributed data for the correct pencil orientations.
Note that the library is written using Fortran’s generic interface so different data types are supported without user intervention. That means in and out above can be either real arrays or complex arrays, the latter being useful for FFT-type of applications.
As seen, the communication details are packed within a black box. From a user’s perspective, it is not necessary to understand the internal logic of these transposition routines. From the developer’s perspective, he has the freedom to change the implementation without breaking user codes.
It is however noted that the communication routines are expensive, especially when running on large number of processors. So applications should try to minimize the number of calls to them by adjusting the algorithms in use, even sometimes by duplicating computations.
Finally, before exit, applications should clean up the memory by:
call decomp_2d_finalize
Advanced 2D Decomposition API¶
While the basic decomposition API is very user-friendly, there may be situations in which applications need to handle more complex data structures. There are quite a few examples:
While using real-to-complex FFTs, applications need to store both the real input (say, of global size nx*ny*nz) and the corresponding complex output (of smaller global size - such as (nx/2+1)*ny*nz - where roughly half the output is dropped due to conjugate symmetry).
Many CFD applications use a staggered mesh system which requires different storage for global quantities (e.g. cell-centred vs. cell-interface storage).
In applications using spectral method, for anti-aliasing purpose, it is a common practice to enlarge the spatial domain before applying the Fourier transforms.
In all these examples, there are multiple global sizes and applications need to be able to distributed different data sets as 2D pencils. 2DECOMP&FFT provides a powerful and flexible programming interface to handle this:
TYPE(DECOMP_INFO) :: new_decomp
call decomp_info_init(n1, n2, n3, new_decomp)
Here decomp is an instance of Fortran derived data type DECOMP_INFO encapsulating
the 2D decomposition information associated with one particular global size n1 x n2 x n3
.
The decomposition object can be initialised using the decomp_info_init
routine as:
call decomp_info_init(n1,n2,n3, new_decomp)
This object then can be passed to the communication routines defined in the basic interface as a third parameter. For example:
call transpose_x_to_y(var_in, var_out, new_decomp)
The input and output arrays can be allocated as:
call alloc_x(var_in, new_decomp, .true.)
call alloc_y(var_out, new_decomp, .true.)
Finally the defined type needs also to be nullified using:
call decomp_info_finalize(new_decomp)