This lesson is being piloted (Beta version)

GPU Programming (N8 CIR): Glossary

Key Points

Introduction
  • CPUs and GPUs are both useful and each has its own place in our toolbox

  • In the context of GPU programming, we often refer to the GPU as the device and the CPU as the host

  • Using GPUs to accelerate computation can provide large performance gains

  • Using the GPU with Python is not particularly difficult

Using your GPU with CuPy
  • CuPy provides GPU-accelerated versions of many NumPy functions.

  • Keep both CPU and GPU versions of your code, so that you can compare performance and validate the GPU results against the CPU ones.
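
The point above can be sketched as follows. This is a minimal, illustrative example (the function name and tolerance are not from the lesson): a NumPy reference implementation is kept next to the CuPy version, and the GPU result is checked against it. The GPU part only runs if CuPy and a CUDA-capable GPU are available.

```python
import numpy as np

# CPU reference implementation (NumPy)
def scale_cpu(x):
    # rescale values to the range [0, 1]
    return (x - x.min()) / (x.max() - x.min())

x = np.random.rand(10_000).astype(np.float32)
reference = scale_cpu(x)

try:
    import cupy as cp  # requires CuPy and a CUDA-capable GPU

    x_gpu = cp.asarray(x)  # copy the host array to the device
    # CuPy mirrors the NumPy API, so the GPU version reads the same
    result_gpu = (x_gpu - x_gpu.min()) / (x_gpu.max() - x_gpu.min())
    # validate the GPU result against the CPU reference
    assert np.allclose(reference, cp.asnumpy(result_gpu), atol=1e-6)
except ImportError:
    pass  # CuPy not available: only the CPU reference runs
```

Timing both versions (for example with `timeit`) on the same input then gives a like-for-like performance comparison.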

Accelerate your Python code with Numba
  • Numba can be used to run your own Python functions on the GPU.

  • Functions may need to be changed to run correctly on a GPU.

A Better Look at the GPU
Your First GPU Kernel
  • Precede your kernel definition with the __global__ keyword

  • Use built-in variables threadIdx, blockIdx, gridDim and blockDim to identify each thread
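
A minimal kernel illustrating both points (the kernel name is illustrative, not from the lesson):

```cuda
// __global__ marks a kernel: a function launched from the host, run on the device.
extern "C" __global__ void vector_add(const float *a, const float *b,
                                      float *c, int n)
{
    // blockIdx.x:  index of this block within the grid
    // blockDim.x:  number of threads per block
    // threadIdx.x: index of this thread within its block
    // gridDim.x:   number of blocks in the grid
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global thread index
    if (i < n) {   // guard: the grid may contain more threads than elements
        c[i] = a[i] + b[i];
    }
}
```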

Registers, Global, and Local Memory
  • Registers can be used to locally store data and avoid repeated memory operations

  • Global memory is the main memory space of the GPU and is used to share data between host and device

  • Local memory is thread-private memory used to store data that does not fit in registers
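
A small illustrative kernel (not from the lesson) showing the three memory spaces in one place:

```cuda
__global__ void cube(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // "v" is a scalar local variable: the compiler keeps it in a register,
        // so global memory (in) is read once instead of three times
        float v = in[i];
        out[i] = v * v * v;  // "in" and "out" live in global memory
    }
    // by contrast, a large per-thread array declared here (e.g. float tmp[256])
    // may not fit in registers and be spilled to local memory: still private
    // to the thread, but much slower than registers
}
```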

Shared Memory and Synchronization
  • Shared memory is faster than global memory and local memory

  • Shared memory can be used as a user-controlled cache to speed up code

  • The size of shared memory arrays must be known at compile time if they are allocated statically inside the kernel

  • It is possible to declare shared memory arrays as extern and pass their size at kernel invocation

  • Use __shared__ to allocate memory in the shared memory space

  • Use __syncthreads() to wait for shared memory operations to be visible to all threads in a block
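
The points above can be sketched in one small kernel (illustrative, not from the lesson): data is staged in a statically sized `__shared__` array, and `__syncthreads()` ensures every write is visible before any thread reads a neighbour's element.

```cuda
#define BLOCK_SIZE 256

__global__ void reverse_block(const float *in, float *out)
{
    // static allocation: the size must be known at compile time
    __shared__ float tile[BLOCK_SIZE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    tile[threadIdx.x] = in[i];  // stage data in fast shared memory
    __syncthreads();            // wait until all writes are visible to the block

    // each thread reads an element written by a different thread in the block
    out[i] = tile[blockDim.x - 1 - threadIdx.x];
}

// dynamic alternative: declare the array extern, with no size, and pass the
// size in bytes as the third launch parameter, e.g.
//   extern __shared__ float tile[];
//   reverse_block<<<blocks, threads, threads * sizeof(float)>>>(in, out);
```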

Constant Memory
  • Globally scoped arrays, whose size is known at compile time, can be stored in constant memory using the __constant__ identifier
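
As an illustrative sketch (the kernel and array names are not from the lesson): a small, read-only coefficient table is placed in constant memory, where every thread reading the same element is served efficiently from the constant cache.

```cuda
// globally scoped, size fixed at compile time: eligible for constant memory
__constant__ float coeffs[4];

__global__ void polynomial(const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        // all threads read the same coefficients: broadcast from constant cache
        y[i] = coeffs[0] + v * (coeffs[1] + v * (coeffs[2] + v * coeffs[3]));
    }
}

// the host fills the array before launch, e.g. with
//   cudaMemcpyToSymbol(coeffs, host_coeffs, 4 * sizeof(float));
```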

Concurrent access to the GPU

Glossary

FIXME