0-Basics

KernelA<<<nBlk, nTid>>>(args);
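A minimal sketch of how `nBlk` and `nTid` are commonly chosen for a 1-D problem of `n` elements: fix the threads per block, then round the block count up so the grid covers every element. The kernel's parameters, the value 256, and the helper names `numBlocks` and `launchKernelA` are hypothetical, chosen only for illustration.

```cuda
#include <cuda_runtime.h>

// Ceiling division: smallest number of blocks of nTid threads covering n elements.
__host__ __device__ inline int numBlocks(int n, int nTid) {
    return (n + nTid - 1) / nTid;
}

// Hypothetical kernel: scales each of the n elements of x by a.
__global__ void KernelA(float *x, float a, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {  // the last block may contain surplus threads
        x[i] *= a;
    }
}

// Host-side launch; d_x is assumed to already point to n floats in device memory.
void launchKernelA(float *d_x, float a, int n) {
    const int nTid = 256;                 // threads per block (assumed value)
    const int nBlk = numBlocks(n, nTid);  // blocks in the grid
    KernelA<<<nBlk, nTid>>>(d_x, a, n);
}
```

The bounds check inside the kernel is what makes rounding up safe: with `n = 1000` and `nTid = 256`, four blocks launch 1024 threads, and the last 24 simply do nothing.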

A grid of blocks

  • gridDim, the number of blocks in each dimension of the grid

  • blockIdx, the index of the block within the grid

  • blockDim, the number of threads in each dimension of a block

A block of threads

  • threadIdx, the index of the thread in the block
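To see the four built-in variables side by side, the sketch below launches a tiny grid in which every thread prints its `blockIdx.x`, `gridDim.x`, `threadIdx.x`, and `blockDim.x`. The helper `totalThreads` (hypothetical) shows that the total thread count is the product of the grid and block extents in every dimension. The kernel name `whoAmI` and the 2 x 4 launch shape are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Total threads launched = (blocks in the grid) * (threads per block), over x, y, z.
__host__ __device__ inline unsigned int totalThreads(dim3 grid, dim3 block) {
    return grid.x * grid.y * grid.z * block.x * block.y * block.z;
}

// Each thread reports which block it belongs to and its position inside that block.
__global__ void whoAmI() {
    printf("block %u of %u, thread %u of %u\n",
           blockIdx.x, gridDim.x, threadIdx.x, blockDim.x);
}

void runWhoAmI() {
    dim3 grid(2, 1, 1);   // gridDim.x == 2
    dim3 block(4, 1, 1);  // blockDim.x == 4
    whoAmI<<<grid, block>>>();
    cudaDeviceSynchronize();  // wait for the kernel so device printf output is flushed
}
```

The eight printed lines may appear in any order, since blocks and threads are not guaranteed to execute in a fixed sequence.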

// 1-d
int global_thread_id = blockDim.x * blockIdx.x + threadIdx.x;

// 2-d
int global_x = blockDim.x * blockIdx.x + threadIdx.x;
int global_y = blockDim.y * blockIdx.y + threadIdx.y;

// 3-d
int global_x = blockDim.x * blockIdx.x + threadIdx.x;
int global_y = blockDim.y * blockIdx.y + threadIdx.y;
int global_z = blockDim.z * blockIdx.z + threadIdx.z;
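The 2-D and 3-D global indices are usually mapped back onto a flat array in row-major order. The sketch below assumes matrices stored row-major with `x` as the column and `y` as the row; the kernel `matAdd2D` and the helpers `rowMajor` and `rowMajor3` are hypothetical names used only for illustration.

```cuda
#include <cuda_runtime.h>

// Row-major flattening of element (x, y) in an array that is `width` columns wide.
__host__ __device__ inline int rowMajor(int x, int y, int width) {
    return y * width + x;
}

// Row-major flattening of element (x, y, z) in a width x height x depth volume.
__host__ __device__ inline int rowMajor3(int x, int y, int z, int width, int height) {
    return (z * height + y) * width + x;
}

// Hypothetical kernel: C = A + B for width x height matrices stored row-major.
__global__ void matAdd2D(const float *A, const float *B, float *C,
                         int width, int height) {
    int global_x = blockDim.x * blockIdx.x + threadIdx.x;  // column
    int global_y = blockDim.y * blockIdx.y + threadIdx.y;  // row
    if (global_x < width && global_y < height) {  // skip threads past the matrix edge
        int i = rowMajor(global_x, global_y, width);
        C[i] = A[i] + B[i];
    }
}
```

Both coordinates need their own bounds check, because the grid is rounded up independently in x and y.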

Note

There is no gridIdx: each kernel launch creates a single grid, so only blocks and threads need indices.

  • __host__: the default; called from the CPU, runs on the CPU

  • __device__: called from the GPU, runs on the GPU

  • __global__: called from the CPU, runs on the GPU (a kernel; must return void)
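A small sketch showing each qualifier in use. It also combines `__host__ __device__`, which compiles one function for both sides. The function names and the block size of 128 are hypothetical.

```cuda
#include <cuda_runtime.h>

// __host__ __device__: compiled for both CPU and GPU, callable from either side.
__host__ __device__ inline float square(float v) { return v * v; }

// __device__: callable only from GPU code (a kernel or another __device__ function).
__device__ float squareOnGpu(float v) { return square(v); }

// __global__: a kernel, launched from the host with <<<...>>>; must return void.
__global__ void squareKernel(float *x, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) x[i] = squareOnGpu(x[i]);
}

// No qualifier means __host__: an ordinary CPU function that launches the kernel.
// d_x is assumed to point to n floats in device memory.
void launchSquare(float *d_x, int n) {
    const int nTid = 128;
    squareKernel<<<(n + nTid - 1) / nTid, nTid>>>(d_x, n);
}
```

Calling `squareOnGpu` from `launchSquare` would be a compile error, while `square` can be called from either.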

dim3 dimGrid(32, 1, 1);
dim3 dimBlock(128, 1, 1);
vecAddKernel<<<dimGrid, dimBlock>>>(xxxx);
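The launch above uses 32 blocks of 128 threads, i.e. 32 x 128 = 4096 threads. A minimal end-to-end sketch of a vector add built around such a launch, assuming the kernel takes two input arrays, an output array, and a length (the original argument list is not given). The helpers `gridFor` and `vecAdd` are hypothetical, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// c[i] = a[i] + b[i] for every i < n.
__global__ void vecAddKernel(const float *a, const float *b, float *c, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// 1-D grid with enough blocks of blockSize threads to cover n elements.
inline dim3 gridFor(int n, int blockSize) {
    return dim3((n + blockSize - 1) / blockSize, 1, 1);
}

// Host wrapper: copy inputs to the device, launch, copy the result back.
void vecAdd(const float *h_a, const float *h_b, float *h_c, int n) {
    size_t bytes = n * sizeof(float);
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    dim3 dimBlock(128, 1, 1);
    dim3 dimGrid = gridFor(n, dimBlock.x);  // 32 blocks when n == 4096
    vecAddKernel<<<dimGrid, dimBlock>>>(d_a, d_b, d_c, n);

    // This copy is issued on the default stream, so it waits for the kernel to finish.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
}
```

Deriving `dimGrid` from `n` instead of hard-coding 32 keeps the launch correct when the vector length is not exactly 4096.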