I wrote CUDA code for a speculative DFA membership test [1].
A brief description would be:
We partition the input string into chunks of appropriate size. We then assign each chunk to one thread, which computes a mapping from each state S[i] to the state reached when simulating the assigned chunk with S[i] as the starting state. So, we get one mapping per chunk.
Now, I am trying to merge these mappings using a reduction operation in CUDA. The following is the code that I wrote:
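To make the per-chunk step concrete, here is a minimal host-side sketch of what a single thread would compute for its chunk. It is only an illustration under assumptions: the function name `chunkMapping`, the flat transition table `delta` (with `delta[s * alphabetSize + c]` being the next state from state `s` on symbol `c`), and the encoding of input symbols as integers are all hypothetical and not taken from the actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes the mapping for one chunk.
// delta is an assumed flat transition table of size M * alphabetSize,
// where delta[s * alphabetSize + c] is the next state from state s on symbol c.
// chunk holds the input symbols of this chunk, encoded as integers in [0, alphabetSize).
std::vector<int> chunkMapping(const std::vector<int>& delta, int M, int alphabetSize,
                              const std::vector<int>& chunk) {
    assert(delta.size() == static_cast<std::size_t>(M) * alphabetSize);
    std::vector<int> map(M);
    for (int s = 0; s < M; ++s) {
        int cur = s;  // speculate that the chunk starts in state s
        for (int c : chunk) {
            cur = delta[cur * alphabetSize + c];
        }
        map[s] = cur;  // state reached at the end of the chunk
    }
    return map;
}
```

For a two-state DFA that tracks the parity of 1s (`delta = {0, 1, 1, 0}`), the chunk `1 0 1` leaves both states unchanged, while the chunk `1` swaps them.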
    __global__ void finalStateUsingRedn(int M, int *fStates, int q0, unsigned int maxThreads, int *d_check) {
        long long unsigned int idx=threadIdx.x+blockIdx.x*blockDim.x;
        if(idx>=maxThreads)
            return;
        int i;
        for(long long unsigned int s=1; s<maxThreads; s*=2) {
            if(idx%(2*s)==0) {
                if((idx+s)<maxThreads) {
                    for(i=0; i<M; ++i) {
                        int x=fStates[i+idx*M];
                        fStates[i+idx*M]=fStates[x+(idx+s)*M];
                    }
                }
            }
            __syncthreads();
        }
    }
Here, M is the total number of states and maxThreads is the total number of chunks. fStates is a 1D integer array of size M*maxThreads that stores, for each chunk and each state S[i], the final state reached when simulating that chunk starting from S[i].
Description of what I mean by merging mappings:
Suppose map[i] and map[i+1] are two mappings. Merging them sets map[i][j]=map[i+1][map[i][j]] for every state j. In my code these 2D map matrices are flattened into the 1D array fStates, so map[c][j] is stored at fStates[j+c*M].
The problem is that I am not getting the correct overall final state, and I am sure that something is wrong with the above kernel code. I've checked every other part of the code. The final state is correct when I compute it sequentially.
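As a reference for the merge rule described above, the following host-side sketch applies it to the flat layout (map[c][j] stored at `fStates[j + c*M]`) in two ways: once by walking the chunks left to right, and once as a tree reduction that uses the same stride pattern as the kernel but runs sequentially. The function names are hypothetical, and this is only meant as a way to cross-check results on small inputs, not as the intended GPU implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Walks the chunks left to right, starting from q0.
int finalStateSequential(const std::vector<int>& fStates, int M, int numChunks, int q0) {
    assert(fStates.size() == static_cast<std::size_t>(M) * numChunks);
    int state = q0;
    for (int c = 0; c < numChunks; ++c) {
        state = fStates[state + c * M];
    }
    return state;
}

// Merges the mapping of chunk `right` into chunk `left`:
// map[left][j] = map[right][map[left][j]] for every state j.
void mergeMappings(std::vector<int>& fStates, int M, int left, int right) {
    for (int j = 0; j < M; ++j) {
        int x = fStates[j + left * M];
        fStates[j + left * M] = fStates[x + right * M];
    }
}

// Tree reduction on the host with strides 1, 2, 4, ...
// Works on a copy of fStates; after the last step, chunk 0 holds the
// composed mapping of all chunks.
int finalStateTree(std::vector<int> fStates, int M, int numChunks, int q0) {
    for (int s = 1; s < numChunks; s *= 2) {
        for (int idx = 0; idx + s < numChunks; idx += 2 * s) {
            mergeMappings(fStates, M, idx, idx + s);
        }
    }
    return fStates[q0];
}
```

With M = 3 and the three chunk mappings {1,2,0}, {0,0,2}, {2,1,1}, both functions agree for every start state. On the host each stride finishes completely before the next one begins, which is the ordering the tree reduction relies on.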
Can someone please help me out? I've spent nearly 12 hours trying to find the bug.
P.S. This is my first question. So, I'm sorry if the question is not framed correctly.