So, in this example, that is what we will use. First, though, let's look at how one might create a flattened array of a structure on any old CPU in C/C++. Then, we'll jump into how to do this on a GPU in CUDA C/C++.
The basic idea of a flattened array is to treat any NxN matrix in our equations as if it were an N²-element one-dimensional array. As long as we are consistent about how we do this, it all works out fine. So, instead of creating a 3x3 two-dimensional array, we just create a 9-element one-dimensional array instead.
Note that we want to create arrays of structures. For our purposes, we'll be creating arrays of structures that will hold the values of our variables for every grid point in our 3-D grid. We might want to store, for example, the shift variable, β, at each point in a 100x100x100 grid. Therefore, we would create a 1,000,000-element one-dimensional array, with each element being a vector (the vector structure we created).
All well and good. We just have to figure out how to map the 3-D grid to the 1-D flattened array. Let's look at how that is done on a CPU, using C.
So, for example, a loop that scrolls through the elements of an NxNxN tensor, B, might be coded like the following without the use of flattened arrays:

for(int i = 0; i < N; i++){
    for(int j = 0; j < N; j++){
        for(int k = 0; k < N; k++){
            B[i][j][k] = something;
        }
    }
}
Now, using flattened arrays, which is to say all of our arrays are one-dimensional, there is just a slight modification we need: a way to compute and track the index (called Indx below). An example of such a loop written in standard C (not using the GPU functions) is shown below:
for(int i = 0; i < N; i++){
    for(int j = 0; j < N; j++){
        for(int k = 0; k < N; k++){
            Indx = N*N*i + N*j + k;
            B[Indx] = something;
        }
    }
}