Numerical Relativity on your home computer: Finite Differences on a GPU Machine

Classical physics initial value problems, including those in General Relativity, are built upon the modeling of an initial state and then allowing the system to evolve in terms of time and motion. Typically, the equations of evolution are a system of partial differential equations. This means we have to figure out how to use a GPU machine to perform partial differentiation.

Finite-Differences are a method to approximate partial differentiation on a computer. The key formulas are:

for first order derivatives:

for second order derivatives:

For scalar variables in our evolution equations, we will use the CUDA routine scalardxdydz. Note that we are using flattened arrays in this routine to perform a 3-dimensional central finite difference.

A copy of the routine is presented below.

The inputs to this routine are defined in the statement that starts with '__global__',which designates this routine as one performed by the GPU, and not the CPU.

The variables include the array of floats, Wx, and flattened NxNxN array which holds the GPU values for a scalar field. Note this routine is used for all scalars in the simulation, and not just the scalar field W. (Actually, the input is a pointer to the flattened array of floats, *Wx, not the array itself, but that is a C/C++ detail.)

The main output for this routine is written to the flattened-array of vectors, parWx, which holds the result of the 3-dimensional central finite difference. So, the second element passed to the GPU routine is a pointer to that array, *parWx.

This is followed by the grid size variables for x, y, and z. So, a non-uniform grid could be used, though point of fact, for this example, we'll just the simple form

The final input to the routine is the number of elements in each dimension of the flattened-array, n. Note this is not the same as the size of the flattened array, which, for a 3-dimensional scalar filed, would be NxNxN floats.

A fairly standard method of getting the flattened-array index, Indx, follows. CUDA C/C++ built-in structures provide the x, y, and z block and thread IDs. These IDs will allow each thread to perform a task on a specific element of the array, Wx, since each thread will have a different combination of block and thread IDs.

The local variable delh is set to 1/2 ∆x, which is used to perform the central finite difference operation in the x-dimension. Then a boundary check is performed. If the x-index is between zero and n-1, then the central finite difference operation is performed. Note that in order to increment or decrement in x-direction in the flattened-array, we need to go Indx+n*n elements. If the element is on the boundary of the x-axis, in other words, if the index i=0 or i=n-1, we set the result to zero.

A similar operation is performed in the y-dimension and the z-dimension. The big difference is that in order to increment in the y-direction, we must skip ahead to the Indx+n element. For the z-direction, there is no skipping ahead required.

Note that the partial difference with respect to x is placed in the 0-th position of the array of vectors parWx. Likewise, the partial difference with respect to y is placed in the 1-st position of the array of vectors parWx and the partial difference with respect to z is placed in the 2-nd position of the array of vectors parWx.

An important point here is that if the user needs to print the results stored in parWx[Indx].v[i], we normally first copy those results to a host (CPU) array, and print from there. Copying large arrays back and forth between the GPU memory and the CPU memory is quite time consuming, relatively speaking. So, it is done as little as possible. More during de-bugging. Otherwise, keeping the results in GPU memory is probably good enough.

//find the partial derivatives of a 3-D scalar field
__global__ void scalardxdydz(float *Wx, vector *parWx, float delx, float dely, float delz, int n){

const int i = blockIdx.x*blockDim.x + threadIdx.x;

const int j = blockIdx.y*blockDim.y + threadIdx.y;

const int k = blockIdx.z*blockDim.z + threadIdx.z;

int Indx= i*n*n+j*n+k;

float delh = 0.5/delx;

if((i>0) && (i<n-1)){ //check B.C. on i ==> x

parWx[Indx].v[0] = delh*(Wx[Indx+n*n] - Wx[Indx-n*n]);

}

if ((i == 0)||(i == n-1)){

parWx[Indx].v[0] = 0.0;

delh = 0.5/dely;

if((j>0) && (j<n-1)){ //check B.C. on j ==> y

parWx[Indx].v[1] = delh*(Wx[Indx+n] - Wx[Indx-n]);

}

if ((j == 0)||(j == n-1) ){

parWx[Indx].v[1] = 0.0;

delh = 0.5/delz;

if((k>0) && (k<n-1)){ //check B.C. on k ==> z

parWx[Indx].v[2] = delh*(Wx[Indx+1] - Wx[Indx-1]);

}

if ((k == 0)||(k == n-1)){

parWx[Indx].v[2] = 0.0;

}

}

<--Back: Implementation on a GPU example
<--Back: Overview

Numerical Relativity on your home computer

Friday, December 28, 2018

Finite Differences on a GPU Machine

No comments:

Post a Comment

Overview -- Numerical Relativity Using CUDA C/C++ Is Easier Than You Think!

Report Abuse