# 2D Pencil Decomposition API¶

This page explains the key public interfaces of the 2D decomposition library. After reading this section, users should be able to easily build applications using this domain decomposition strategy. The library interface is designed to be very simple. One can refer to the example applications for a quick start.

The 2D Pencil Decomposition API is defined in three Fortran module which should be used by applications as:

```
use decomp_2d_constants
use decomp_2d_mpi
use decomp_2d
```

The `use decomp_2d_constants`

defines all the parameters, `use decomp_2d_mpi`

introduces all the MPI
related interfaces and `use decomp_2d`

cointains the main decomposition and transposition APIs.

## Module **decomp_2d_constant**: Global Variables¶

The `decomp_2d_constants`

cointains global parameters that used to define the KIND of floating
point data (e.g. single or double precision).
These are used to consistently define the precision of the data type for the viariables
and for MPI operations.

`mytype`

- Use this variable to define the KIND of floating-point data, e.g.`real(mytype) :: var`

or`complex(mytype) :: cvar`

. Depending on configuring options this type will point to single or double.`real_type, complex_type`

- These are the proper MPI datatypes to be used (for real and complex numbers, respectively) if applications need to call MPI library routines directly. These types will point to single of double depending on the configuring options.`real2_type`

- This type double the precision of the baseline`real_type`

.`mytype_single, real_type_single`

- These two types are used to define the data type fpr the IO operations.

The module contains additional parameters to control :

the log on output,

the log on debug,

activate the degugger (caliper)

activate the different FFT backends (generic, FFTW, MKL, cuFFT)

define the release major and minor version

## Module **decomp_2d_mpi**: MPI communication¶

The `decomp_2d_mpi`

cointains global parameters that are used for MPI operation:

`nproc`

- the total number of MPI processes. [INT].`nrank`

- the rank of the current MPI process. [INT].`decomp_2d_comm`

- global MPI communicator [INT].`decomp_2d_about`

- interface to display error message and call MPI_ABORT function.`decomp_2d_warning`

- interface to display error message together with line number and function.

## Module **decomp_2d**: decompostion module¶

The `decomp_2d`

module cointains the variables and the routines to perform the global transpostion operations.
The important variables are

`nx_global, ny_global, nz_global`

- size of the global data.`xsize(i), ysize(i), zsize(i), i=1,2,3`

- sizes of the sub-domains held by the current process. The first letter refers to the pencil orientation and the three 1D array elements contain the sub-domain sizes in X, Y and Z directions, respectively. In a 2D pencil decomposition, there is always one dimension which completely resides in local memory. So by definition`xsize(1)==nx_global`

,`ysize(2)==ny_global`

and`zsize(3)==nz_global`

.`xstart(i), ystart(i), zstart(i), xend(i), yend(i), zend(i), i=1,2,3`

- the starting and ending indices for each sub-domain, as in the global coordinate system. Obviously, it can be seen that`xsize(i)=xend(i)-xstart(i)+1`

. It may be convenient for certain applications to use global coordinate (for example when extracting a 2D plane from a 3D domain, it is easier to know which process owns the plane if global index is used).

Decomposition informations are also available using the data type `DECOMP_INFO`

which provides the
following derived types:

`xst(i), yst(i), zst(i), i=1,2,3`

- the starting indices for each sub-domain, as in the global coordinate system.`xen(i), yen(i), zen(i), i=1,2,3`

- the end indices for each sub-domain, as in the global coordinate system.`xsz(i), ysz(i), zsz(i), i=1,2,3`

- the size for each sub-domain, as in the global coordinate system.

Arrays can also be stored on smaller mesh sizes where points are skipped:

`iskipS, jskipS, kskipS`

- points skipped in the x, y and z direction for a generic scalar field`iskipV, jskipV, kskipV`

- points skipped in the x, y and z direction for the velocity field`iskipP, jskipP, kskipP`

- points skipped in the x, y and z direction for the pressure field`xszS, yszS, zszS, xstS, ystS, zstS, xenS, yenS, zenS`

- size, starting and final indexes for the reduce size mesh for a generic scalar`xszV, yszV, zszV, xstV, ystV, zstV, xenV, yenV, zenV`

- size, starting and final indexes for the reduce size mesh for the velocity field`xszP, yszP, zszP, xstP, ystP, zstP, xenP, yenP, zenP`

- size, starting and final indexes for the reduce size mesh for the pressure field

The module provides memory allocations API that are recomended to correctly define its major data structures
within the main program. It is recomended that all major arrays are defined as `allocable`

.

`alloc_x(var, decomp, global)`

- allocation using a x-pencil decomposition`alloc_y(var, decomp, global)`

- allocation using a y-pencil decomposition`alloc_y(var, decomp, global)`

- allocation using a z-pencil decomposition

where `var`

is the allocable array name,
`decomp[optional]`

the relative `DECOMP_INFO`

data type and
`global[optional]`

is a logical [True/False] flag to indicate if the array is allocated in the global coordinate system.
The allocation for a x pencil decomposition would be equivalent to the statement:

```
allocate(var(decomp%xsz(1), decomp%xsz(2), decomp%xsz(3))) ! if global==.false.
allocate(var(decomp%xst(1):decomp%xen(1), decomp%xst(2):decomp%xen(2), &
decomp%xst(3):decomp%xen(3))) ! if global==.true.
```

Allocated arrays can be simply released with an `deallocate(var)`

statement.

### Basic 2D Decomposition API¶

All the global variables described above, the defualt common type `decomp`

and the MPI initialization is done
using the following call

```
call decomp_2d_init(nx, ny, nz, p_row, p_col)
```

where `nx`

, `ny`

and `nz`

are the size of 3D global data to be distributed over
a 2D processor grid `p_row x p_col`

.
Note that none of the dimensions need to be divisible by `p_row`

or `p_col`

, i.e. the library can handle non-evenly distributed data.
In case of `p_row=p_col=0`

an automatic decomposition is selected among all possible combination available.
The algorithm will choose the closest combination such as

$$ n_{row} = n_{col} = {nproc}^{1/2} $$

In case the root is not exact the closest combination to $n_{row} = n_{col}$ with $n_{row} < n_{col}$ is used.
If a 1D slab decomposition is needed instead of a 2D pencil one, it is recommended to set `p_row`

to unity and `p_col`

to `nproc`

.

An optional parameter may be passed to this initialisation routine:

```
call decomp_2d_init(nx, ny, nz, p_row, p_col, periodic_bc)
```

Here `periodic_bc`

is a 1D array containing 3 logical values that specify whether periodic boundary condition
should apply in certain dimensions. Note this is only applicable if halo-cell communication is to be used.

Another optional parameter may be passed at the initialization stage:

```
call decomp_2d_init(nx, ny, nz, p_row, p_col, periodic_bc, comm)
```

Here `comm`

is the MPI communicator that the library will use. By default, MPI_COMM_WORLD is used.

A key element of this library is a set of communication routines that actually perform the data transpositions. As mentioned, one needs to perform 4 global transpositions to go through all 3 pencil orientations. Correspondingly, the library provides 4 communication subroutines:

```
call transpose_x_to_y(var_in,var_out)
call transpose_y_to_z(var_in,var_out)
call transpose_z_to_y(var_in,var_out)
call transpose_y_to_x(var_in,var_out)
```

The input array `var_in`

and `var_output`

array out should have been defined
and contain distributed data for the correct pencil orientations.

Note that the library is written using Fortran’s generic interface so different data types are supported without user intervention. That means in and out above can be either real arrays or complex arrays, the latter being useful for FFT-type of applications.

As seen, the communication details are packed within a black box. From a user’s perspective, it is not necessary to understand the internal logic of these transposition routines. From the developer’s perspective, he has the freedom to change the implementation without breaking user codes.

It is however noted that the communication routines are expensive, especially when running on large number of processors. So applications should try to minimize the number of calls to them by adjusting the algorithms in use, even sometimes by duplicating computations.

Finally, before exit, applications should clean up the memory by:

```
call decomp_2d_finalize
```

### Advanced 2D Decomposition API¶

While the basic decomposition API is very user-friendly, there may be situations in which applications need to handle more complex data structures. There are quite a few examples:

While using real-to-complex FFTs, applications need to store both the real input (say, of global size nx*ny*nz) and the corresponding complex output (of smaller global size - such as (nx/2+1)*ny*nz - where roughly half the output is dropped due to conjugate symmetry).

Many CFD applications use a staggered mesh system which requires different storage for global quantities (e.g. cell-centred vs. cell-interface storage).

In applications using spectral method, for anti-aliasing purpose, it is a common practice to enlarge the spatial domain before applying the Fourier transforms.

In all these examples, there are multiple global sizes and applications need to be able to distributed different data sets as 2D pencils. 2DECOMP&FFT provides a powerful and flexible programming interface to handle this:

```
TYPE(DECOMP_INFO) :: new_decomp
call decomp_info_init(n1, n2, n3, new_decomp)
```

Here decomp is an instance of Fortran derived data type DECOMP_INFO encapsulating
the 2D decomposition information associated with one particular global size `n1 x n2 x n3`

.
The decomposition object can be initialised using the `decomp_info_init`

routine as:

```
call decomp_info_init(n1,n2,n3, new_decomp)
```

This object then can be passed to the communication routines defined in the basic interface as a third parameter. For example:

```
call transpose_x_to_y(var_in, var_out, new_decomp)
```

The input and output arrays can be allocated as:

```
call alloc_x(var_in, new_decomp, .true.)
call alloc_y(var_out, new_decomp, .true.)
```

Finally the defined type needs also to be nullified using:

```
call decomp_info_finalize(new_decomp)
```