Introduction
CPUs and GPUs are both useful, and each has its own place in our toolbox.
In the context of GPU programming, we often refer to the GPU as the device and to the CPU as the host.
Using GPUs to accelerate computation can provide large performance gains.
Using the GPU from Python is not particularly difficult.
Using your GPU with CuPy
CuPy provides GPU-accelerated versions of many NumPy functions.
Always keep both CPU and GPU versions of your code, so that you can compare performance as well as validate the GPU results.
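The pattern above can be sketched as follows. The code is written against an array module `xp` that is either NumPy (the CPU version) or CuPy (the GPU version, if a CUDA-capable GPU and CuPy are available); the names `xp` and `normalize` are illustrative, not part of either library:

```python
import numpy as np

try:
    import cupy as cp   # GPU version; needs CuPy and a CUDA-capable GPU
    xp = cp
except ImportError:
    xp = np             # fall back to the CPU version

def normalize(data):
    # Scale data to zero mean and unit variance.
    # The same source works for NumPy (CPU) and CuPy (GPU) arrays,
    # because CuPy mirrors the NumPy API.
    return (data - xp.mean(data)) / xp.std(data)

data = xp.asarray([1.0, 2.0, 3.0, 4.0])
result = normalize(data)
if xp is not np:
    result = cp.asnumpy(result)   # copy the device array back to the host

# Validate the (possibly GPU) result against a pure-CPU reference
reference = (np.array([1.0, 2.0, 3.0, 4.0]) - 2.5) / np.std([1.0, 2.0, 3.0, 4.0])
assert np.allclose(result, reference)
```

Because both libraries share the same function names, one implementation doubles as the CPU reference for validating and timing the GPU run.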
Accelerate your Python code with Numba
A Better Look at the GPU
Your First GPU Kernel
Precede your kernel definition with the __global__ keyword.
Use the built-in variables threadIdx, blockIdx, gridDim and blockDim to identify each thread.
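A minimal kernel following these two points might look as below. This is a sketch that assumes CuPy and a CUDA-capable GPU, compiling the CUDA C source with `cupy.RawKernel`; the kernel name `vector_add` is illustrative:

```python
import cupy as cp

vector_add = cp.RawKernel(r'''
extern "C" __global__
void vector_add(const float *a, const float *b, float *c, int n)
{
    // Combine block and thread indices into one unique global index
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)          // guard threads that fall past the end of the array
        c[i] = a[i] + b[i];
}
''', 'vector_add')

n = 1024
a = cp.random.random(n, dtype=cp.float32)
b = cp.random.random(n, dtype=cp.float32)
c = cp.zeros(n, dtype=cp.float32)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block  # round up
vector_add((blocks,), (threads_per_block,), (a, b, c, n))

assert cp.allclose(c, a + b)   # validate against the CuPy CPU-style API
```

Each thread computes exactly one output element, which is why the global index built from blockIdx, blockDim and threadIdx must be bounds-checked against n.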
Registers, Global, and Local Memory
Registers can be used to locally store data and avoid repeated memory operations.
Global memory is the main memory space; it is used to share data between the host and the device.
Local memory is a particular type of memory that can be used to store data that does not fit in registers; it is private to each thread.
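To illustrate the register point, the hypothetical kernel below (a sketch assuming CuPy and a CUDA-capable GPU) reads each input value from global memory once into a local variable, which the compiler can keep in a register and reuse:

```python
import cupy as cp

scale_shift = cp.RawKernel(r'''
extern "C" __global__
void scale_shift(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];            // one global memory read into a register
        out[i] = x * x + 2.0f * x;  // reuse the register instead of re-reading in[i]
    }
}
''', 'scale_shift')
```

Without the local variable, the expression would read in[i] from global memory twice; with it, the repeated accesses become cheap register operations.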
Shared Memory and Synchronization
Shared memory is faster than global memory and local memory.
Shared memory can be used as a user-controlled cache to speed up code.
The size of a shared memory array must be known at compile time if it is allocated inside the kernel.
It is also possible to declare extern shared memory arrays and pass their size at kernel invocation.
Use the __shared__ keyword to allocate memory in the shared memory space.
Use __syncthreads() to wait until shared memory operations are visible to all threads in a block.
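A sketch combining these points, again assuming CuPy and a CUDA-capable GPU: the illustrative kernel `block_reverse` stages its block's chunk in an extern shared array whose size is passed at launch time, and uses __syncthreads() before any thread reads another thread's element:

```python
import cupy as cp

block_reverse = cp.RawKernel(r'''
extern "C" __global__
void block_reverse(const float *in, float *out, int n)
{
    extern __shared__ float tmp[];   // size supplied at kernel invocation
    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;
    if (i < n) tmp[t] = in[i];       // stage the chunk in shared memory
    __syncthreads();                 // all writes to tmp are now visible
    if (i < n) out[i] = tmp[blockDim.x - 1 - t];
}
''', 'block_reverse')

n = 256
a = cp.arange(n, dtype=cp.float32)
out = cp.zeros_like(a)
# shared_mem passes the extern shared array size (in bytes) at launch
block_reverse((1,), (n,), (a, out, n), shared_mem=n * 4)
assert cp.allclose(out, a[::-1])
```

Omitting the __syncthreads() call would let a thread read tmp entries that another thread has not written yet, a classic shared-memory race.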
Constant Memory
Concurrent access to the GPU