GPU Programming with CUDA

NSM Nodal Centre for Training in HPC and AI


Syllabus
Introduction
- history, graphics processors, graphics processing units, GPGPUs
- clock speeds, CPU / GPU comparisons, heterogeneity
- accelerators, parallel programming, CUDA / OpenCL / OpenACC, Hello World
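
To illustrate the Hello World topic above, a minimal sketch assuming the CUDA C++ toolchain (nvcc); the kernel name and launch configuration are only illustrative:

```cuda
#include <cstdio>

// Kernel: every GPU thread prints its block and thread index.
__global__ void hello()
{
    printf("Hello World from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main()
{
    hello<<<2, 4>>>();          // launch 2 blocks of 4 threads each
    cudaDeviceSynchronize();    // wait for the kernel (and its printf output) to finish
    return 0;
}
```

Compiled with nvcc (e.g. `nvcc hello.cu -o hello`), this prints one line per GPU thread.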

Computation
- kernels, launch parameters
- thread hierarchy, warps / wavefronts, thread blocks / workgroups, streaming multiprocessors
- 1D / 2D / 3D thread mapping, device properties, simple programs
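
A sketch of launch parameters and 1D thread mapping, built around an illustrative vector-add kernel and a device-property query:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles one element; its global index comes from the thread hierarchy.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // 1D mapping
    if (i < n)                                      // guard: the grid may have extra threads
        c[i] = a[i] + b[i];
}

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);              // query device 0
    printf("%s: %d SMs, up to %d threads per block\n",
           prop.name, prop.multiProcessorCount, prop.maxThreadsPerBlock);

    int n = 1 << 20;
    int threadsPerBlock = 256;                      // launch parameters
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    // vecAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);  // device buffers as in the Memory section
    return 0;
}
```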

Memory
- memory hierarchy, DRAM / global, local / shared, private / local, textures, constant memory
- pointers, parameter passing, arrays and dynamic memory, multi-dimensional arrays
- memory allocation, memory copying across devices
- programs with matrices, performance evaluation with different memories
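
An allocation-and-copy sketch for the topics above, assuming a square matrix stored in row-major order in global memory; sizes and names are illustrative:

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

int main()
{
    int n = 1024;
    size_t bytes = (size_t)n * n * sizeof(float);

    // Host (CPU) and device (GPU global memory) allocations.
    float *h_A = (float *)malloc(bytes);
    float *d_A = nullptr;
    cudaMalloc((void **)&d_A, bytes);

    // Copy host -> device, run kernels on d_A, then copy the result back.
    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);
    // ... kernel launches operating on d_A ...
    cudaMemcpy(h_A, d_A, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_A);
    free(h_A);
    return 0;
}
```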

Synchronization
- memory consistency
- barriers (local versus global), atomics, memory fence
- prefix sum, reduction
- programs for concurrent data structures such as worklists, linked lists
- synchronization across CPU and GPU
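
A block-wise reduction sketch combining shared memory, the block-level barrier __syncthreads(), and a global atomicAdd; it assumes blockDim.x is 256 and the output was zero-initialised, and all names are illustrative:

```cuda
// Each block reduces its chunk of the input in shared memory, then one atomicAdd
// per block accumulates the partial sums into the global result.
__global__ void reduceSum(const float *in, float *out, int n)
{
    __shared__ float partial[256];                  // assumes blockDim.x == 256
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    partial[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                                // barrier: shared memory now consistent block-wide

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        atomicAdd(out, partial[0]);                 // one atomic per block
}
```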

Functions
- device functions, host functions, kernels, functors
- using libraries (such as Thrust), developing libraries
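
A short Thrust sketch in which library algorithms and a functor (thrust::plus) replace hand-written kernels; the vector size is arbitrary:

```cuda
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <cstdio>

int main()
{
    thrust::device_vector<int> v(1000);      // data lives in GPU global memory
    thrust::sequence(v.begin(), v.end());    // fill with 0, 1, 2, ...
    int sum = thrust::reduce(v.begin(), v.end(), 0, thrust::plus<int>());
    printf("sum = %d\n", sum);               // 0 + 1 + ... + 999 = 499500
    return 0;
}
```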

Support
- debugging GPU programs
- profiling, profiling tools, performance aspects
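
A common first debugging aid is to check the error code returned by every CUDA runtime call; a minimal sketch with an illustrative CUDA_CHECK macro:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap runtime calls so a failure reports the file and line before aborting.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err = (call);                                          \
        if (err != cudaSuccess) {                                          \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                    \
                    cudaGetErrorString(err), __FILE__, __LINE__);          \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

int main()
{
    float *d_p = nullptr;
    CUDA_CHECK(cudaMalloc((void **)&d_p, 1024 * sizeof(float)));
    CUDA_CHECK(cudaFree(d_p));
    return 0;
}
```

Beyond this, tools such as cuda-gdb, compute-sanitizer, Nsight Systems and Nsight Compute cover interactive debugging and profiling.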

Streams
- asynchronous processing, tasks, task-dependence
- overlapped data transfers, default stream, synchronization with streams
- events, event-based synchronization
- overlapping data transfer and kernel execution, pitfalls
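
An overlap sketch with two streams, assuming pinned host memory (required for truly asynchronous copies); the kernel, chunking and names are illustrative:

```cuda
#include <cuda_runtime.h>

__global__ void process(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20, half = n / 2;
    float *h = nullptr, *d = nullptr;
    cudaMallocHost((void **)&h, n * sizeof(float));  // pinned host memory
    cudaMalloc((void **)&d, n * sizeof(float));

    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);

    // Each half: copy in, process, copy out -- work in the two streams can overlap.
    for (int k = 0; k < 2; ++k) {
        float *hp = h + k * half, *dp = d + k * half;
        cudaMemcpyAsync(dp, hp, half * sizeof(float), cudaMemcpyHostToDevice, s[k]);
        process<<<(half + 255) / 256, 256, 0, s[k]>>>(dp, half);
        cudaMemcpyAsync(hp, dp, half * sizeof(float), cudaMemcpyDeviceToHost, s[k]);
    }
    cudaDeviceSynchronize();                         // wait for both streams

    cudaStreamDestroy(s[0]); cudaStreamDestroy(s[1]);
    cudaFree(d); cudaFreeHost(h);
    return 0;
}
```

cudaEventRecord / cudaStreamWaitEvent can express finer-grained dependences between streams than the device-wide synchronisation used here.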

Case studies
- graph algorithms
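
As one illustrative fragment, one level of a level-synchronous BFS, assuming a CSR graph (row offsets, column indices) and a distance array initialised to a large sentinel everywhere except the source; all names are hypothetical:

```cuda
// One BFS level: every thread owns a vertex; vertices on the current frontier
// relax their neighbours. The host relaunches this kernel until *changed stays 0.
__global__ void bfsLevel(const int *rowPtr, const int *colIdx,
                         int *dist, int level, int nVertices, int *changed)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= nVertices || dist[v] != level) return;   // only frontier vertices work

    for (int e = rowPtr[v]; e < rowPtr[v + 1]; ++e) {
        int u = colIdx[e];
        if (dist[u] > level + 1) {                     // still unvisited
            dist[u] = level + 1;                       // benign race: all writers store the same value
            *changed = 1;
        }
    }
}
```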

Advanced topics
- dynamic parallelism
- unified virtual memory
- multi-GPU processing
- peer access
- heterogeneous processing
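
A unified (managed) memory sketch: one pointer is valid on both host and device, with the runtime migrating pages on demand (names illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 3.0f;
}

int main()
{
    int n = 1 << 10;
    float *x = nullptr;
    cudaMallocManaged((void **)&x, n * sizeof(float));  // single pointer, migrated on demand
    for (int i = 0; i < n; ++i) x[i] = 1.0f;            // written on the host

    scale<<<(n + 255) / 256, 256>>>(x, n);              // read and written on the device
    cudaDeviceSynchronize();                            // synchronise before the host touches x again

    printf("x[0] = %f\n", x[0]);                        // prints 3.0
    cudaFree(x);
    return 0;
}
```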