GPU Programming with OpenMP
Tim Mattson (Intel) and Larry Meadows (Intel)
OpenMP 1.0 was released in 1997, when the primary concern was symmetric multiprocessors. Since then, hardware has evolved: deeper memory hierarchies forced us to embrace NUMA machines, and vector units on the CPU led to the SIMD constructs. Current trends in hardware bring co-processors such as GPUs into the fold. A modern platform is often a heterogeneous system with CPU cores, GPU cores, and eventually other specialized accelerators. OpenMP has responded by adding directives that map code and data onto a device. We refer to this family of directives as the target directives.
In this tutorial, we will explore these directives as they apply to programming GPUs. We will very briefly review the fundamentals of OpenMP and then move directly to the target directives and their use in complex application programs. We will run the tutorial in what we call “demo mode”, where we write code together to explore OpenMP. In addition to using OpenMP to write programs, we will also explore how the technology works; i.e., a look under the hood to understand how the OpenMP compiler and runtime system work together to map code onto a GPU.
Tim Mattson is a parallel programmer obsessed with every variety of science (Ph.D. Chemistry, UCSC, 1985). He is a senior principal engineer in Intel’s parallel computing lab.
Tim has been with Intel since 1993 and has worked with brilliant people on great projects including: (1) the first TFLOP computer, (2) the OpenMP and OpenCL programming languages, (3) two different research processors (Intel’s TFLOP chip and the 48-core SCC), (4) data management systems (polystore systems and array-based storage engines), and (5) the GraphBLAS API for expressing graph algorithms as sparse linear algebra.
Tim is passionate about teaching. He’s been teaching OpenMP longer than anyone on the planet, with OpenMP tutorials at numerous venues including every SC’XY conference but one since 1998. He has published four books on different aspects of parallel computing, with a new one due November 2019 titled “The OpenMP Common Core: Making OpenMP Simple Again”.
Larry Meadows was a founder of the Portland Group (PGI) in 1989. PGI provided the compilers for the first teraflop machine (ASCI Red) and its precursor (the Intel Paragon). After PGI he worked at Sun Microsystems from 1999 to 2004, when he joined Intel, where he is now a Senior Principal Engineer. The interesting bits are still HPC, but that has expanded to include AI and machine learning, big data analytics, and most anything that needs to do a lot of work in parallel.