Sunday, December 30, 2018

Implementation on a CPU Example

With respect to arrays in CUDA C/C++, many programmers have found it convenient to use so-called ‘flattened arrays’. They take a little getting used to, but they do make creation and parameter passing to the GPU much simpler.

So, in this example, that is what we will use. First, though, let's look at how one might create a flattened array of structures on any old CPU in C/C++. Then we'll jump into how to do this on a GPU in CUDA C/C++.

The basic idea of a flattened array is to treat any NxN matrix in our equations as if it were an N²-element one-dimensional array. As long as we are consistent in how we do this, it all works out fine. So, instead of creating a 3x3 two-dimensional array, we just create a 9-element one-dimensional array.
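To make the 3x3 case concrete, here is a minimal sketch in plain C. The helper names flat_index_2d and fill_matrix are made up for illustration and are not part of this series; the sketch assumes the common row-major convention, where the column index varies fastest.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the original post): map (row, col) of an
   n x n matrix to its position in a one-dimensional array, row-major. */
static size_t flat_index_2d(size_t row, size_t col, size_t n)
{
    assert(row < n && col < n);   /* guard against out-of-range indices */
    return n * row + col;
}

/* Fill a flattened n x n matrix so that element (row, col) holds
   10*row + col, which makes the layout easy to inspect. */
static void fill_matrix(double *m, size_t n)
{
    for (size_t row = 0; row < n; row++) {
        for (size_t col = 0; col < n; col++) {
            m[flat_index_2d(row, col, n)] = 10.0 * (double)row + (double)col;
        }
    }
}
```

Calling fill_matrix(m, 3) on a 9-element array m stores element (1, 2) at m[5], since 3*1 + 2 = 5.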

Note that we want to create arrays of structures. For our purposes, these arrays will hold the values of our variables at every point in our 3-D grid. We might want to store, for example, the shift variable, β, at each point in a 100x100x100 grid. To do so, we would create a 1,000,000-element one-dimensional array in which each element is a vector (the vector structure we created).
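As a rough sketch of what that allocation might look like on the CPU, consider the following. The Vec3 type and the alloc_vector_field function are hypothetical stand-ins; the actual vector structure is the one defined earlier in this series and may well differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the vector structure from the previous post;
   the field names here are assumptions. */
typedef struct {
    double x, y, z;
} Vec3;

/* Allocate one zero-initialized Vec3 per grid point of an n x n x n grid.
   Returns NULL if the allocation fails; the caller releases it with free(). */
static Vec3 *alloc_vector_field(size_t n)
{
    /* The cast keeps the code valid as both C and C++. */
    return (Vec3 *)calloc(n * n * n, sizeof(Vec3));
}
```

For the 100x100x100 grid, alloc_vector_field(100) reserves 1,000,000 elements, so valid indices run from 0 to 999,999.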

All well and good. We just have to figure out how to map the 3D grid to the 1D flattened array. Let's look at how that is done on a CPU machine, using C.

For example, without flattened arrays, a loop that steps through the elements of an NxNxN tensor, B, might be coded as follows:

for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        for (int k = 0; k < N; k++) {
            B[i][j][k] = something;   /* "something" is a placeholder value */
        }
    }
}

With flattened arrays, every array is one-dimensional, so we need only one small change: a way to compute the one-dimensional index (called Indx below) from i, j, and k. An example of such a loop written in standard C (not using the GPU functions) is shown below:

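for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        for (int k = 0; k < N; k++) {
            int Indx = N*N*i + N*j + k;   /* position of (i, j, k) in the 1-D array */
            B[Indx] = something;          /* "something" is a placeholder value */
        }
    }
}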
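The index formula in the loop above can also be run in reverse, which is a handy way to check that the mapping is consistent. The sketch below wraps both directions in small helper functions; the names flat_index_3d and unflatten_3d are assumptions made for illustration, not functions from this series.

```c
#include <assert.h>
#include <stddef.h>

/* Same formula as Indx in the loop: k varies fastest, then j, then i. */
static size_t flat_index_3d(size_t i, size_t j, size_t k, size_t n)
{
    assert(i < n && j < n && k < n);
    return n * n * i + n * j + k;
}

/* Inverse mapping: recover (i, j, k) from a flattened index. */
static void unflatten_3d(size_t indx, size_t n,
                         size_t *i, size_t *j, size_t *k)
{
    assert(indx < n * n * n);
    *i = indx / (n * n);     /* how many full n x n slabs precede indx */
    *j = (indx / n) % n;     /* row within the slab */
    *k = indx % n;           /* position within the row */
}
```

With n = 4, the point (1, 2, 3) maps to 16 + 8 + 3 = 27, and unflattening 27 gives back (1, 2, 3). Because k varies fastest, the innermost loop walks through consecutive memory locations.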

