Topic 2: Programming GPUs and Accelerators with Directives

Learning outcomes

• Concepts of heterogeneous computing: modern heterogeneous architectures and the programming models they support
• Evaluation of scalability and speedup on GPUs
• Bottleneck detection with profiler tools
• Programming on GPUs/accelerators with standards such as OpenMP and OpenACC

Overview of the Lecture Content

The lecture starts with a review of heterogeneous parallel architectures. Heterogeneous systems are now widely used to raise performance with energy efficiency in mind: rather than adding more processors of the same type, they add coprocessors. The Graphics Processing Unit (GPU) is the most widely used coprocessor, although others exist, such as FPGAs and neural processors. GPUs were initially designed for 3D graphics rendering, but their large number of vector units also lets them speed up compute-intensive workloads. Although these devices offer more FLOPS than general-purpose CPUs, their programmability remains a challenge. The GPU manufacturer NVIDIA has promoted the CUDA programming model, which has popularized these devices even though CUDA runs only on NVIDIA GPUs. Other initiatives such as OpenCL aim to ease migration and portability across hardware from other manufacturers such as Intel, AMD, or ARM, but still suffer from overhead in code development and maintenance.

The lecture continues with a review of modern accelerators and their programming models, together with an overview of the main use cases and possible drawbacks. GPU programming is addressed through the OpenACC programming model. OpenACC makes it possible to express the code sections to be offloaded to the accelerator in a friendlier way than CUDA or OpenCL. This is done with directives, which mark the kernels or code sections to run on the accelerator and specify the data to be transferred between host and device. During a practical session, attendees will analyze and evaluate the parallel performance of OpenACC code with profiler tools.
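As an illustration of the directive-based style described above, here is a minimal sketch of an OpenACC loop in C. The function name saxpy_acc and the choice of a SAXPY kernel are assumptions made for this example and are not taken from the lecture material. The directive marks the loop to be offloaded, and the data clauses state what is transferred between host and device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel (not from the lecture): y = a*x + y. */
void saxpy_acc(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* copyin: x is only read on the device, so it is copied host -> device.
       copy:   y is read and written, so it is copied in and back out.
       A compiler without OpenACC support ignores the pragma and runs
       the loop serially on the host. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With an OpenACC-capable compiler this would typically be built with a flag such as -acc (NVIDIA HPC SDK) or -fopenacc (GCC). Without such a flag the code still compiles and produces the same result on the CPU, which is convenient for checking correctness before profiling.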

The OpenMP programming standard is then addressed. The well-known OpenMP standard was proposed in the late 1990s to express parallelism on multiprocessor-based systems by means of directives. Among its successive revisions, support for accelerators was introduced in version 4.0. OpenMP allows parallel codes to be developed not only for NVIDIA GPUs; it is also currently supported on other types of accelerators, such as integrated and discrete GPUs from Intel or AMD, including the recently announced Intel Ponte Vecchio GPU.
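For comparison, accelerator offloading in OpenMP can be sketched as follows. This is an illustrative example under stated assumptions: the function name dot_omp_target and the dot-product kernel are hypothetical, and the combined construct used here assumes a compiler implementing OpenMP 4.5 or later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel (not from the lecture): dot product of x and y. */
double dot_omp_target(size_t n, const double *x, const double *y)
{
    assert(x != NULL && y != NULL);

    double sum = 0.0;

    /* target:   offload the region to the default device.
       map(to):  x and y are inputs, copied host -> device only.
       map(tofrom: sum) with reduction(+:sum): partial sums are combined
       on the device and the result is copied back to the host.
       Without OpenMP enabled the pragma is ignored and the loop runs serially. */
    #pragma omp target teams distribute parallel for reduction(+:sum) \
            map(to: x[0:n], y[0:n]) map(tofrom: sum)
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The structure mirrors the OpenACC version: one directive on the loop plus explicit data-movement clauses. Which devices are actually targeted depends on the compiler and its offloading flags.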

Finally, OpenMP or OpenACC will be used to implement a complete algorithm for a particular remote sensing application. The directives shown in the theory part will be applied to obtain a significant speedup over the serial C version. Throughout this example, profiler tools will be used to get the best possible performance.
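The acceleration factor relative to the serial version is usually reported as speedup = T_serial / T_accelerated. A minimal timing sketch in C could look like the following. The helper names wall_seconds, time_call, and speedup are hypothetical, and the sketch assumes a C11 library that provides timespec_get.

```c
#include <assert.h>
#include <time.h>

/* Wall-clock time in seconds, using C11 timespec_get.
   Note: TIME_UTC is calendar time, not a monotonic clock. */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Elapsed wall-clock time of a single call fn(arg). */
double time_call(void (*fn)(void *), void *arg)
{
    double t0 = wall_seconds();
    fn(arg);
    return wall_seconds() - t0;
}

/* Speedup of the accelerated run over the serial run. */
double speedup(double t_serial, double t_accel)
{
    assert(t_serial >= 0.0 && t_accel > 0.0);
    return t_serial / t_accel;
}
```

For example, a serial run of 8.0 s against an accelerated run of 2.0 s gives a speedup of 4.0. The first offloaded call often includes device initialization and data transfer time, so it is common to time several repetitions; a profiler gives a finer breakdown than this coarse measurement.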

Meet the Instructors

Prof. Sergio Bernabé García

Bio

Sergio Bernabé received the degree in computer engineering and the M.Sc. degree in computer engineering from the University of Extremadura, Cáceres, Spain, in 2010, and the joint Ph.D. degree from the University of Iceland, Reykjavík, Iceland, and the University of Extremadura, Badajoz, Spain, in 2014.

He has been a Visiting Researcher with the Institute for Applied Microelectronics, University of Las Palmas de Gran Canaria, Las Palmas, Spain, and also with the Computer Vision Laboratory, Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil. He was a Post-Doctoral Researcher (funded by FCT) with the Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal, and a Post-Doctoral Researcher (funded by the Spanish Ministry of Economy and Competitiveness) with the Complutense University of Madrid (UCM), Madrid, Spain. He is currently an Assistant Professor with the Department of Computer Architecture and Automation, UCM. His research interests include the development and efficient processing of parallel techniques for different types of high-performance computing architectures.

Dr. Bernabé was a recipient of the Best Paper Award of the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING (JSTARS) in 2013 and the Best Ph.D. Dissertation Award at the University of Extremadura, Cáceres, in 2015. He is an active reviewer for international conferences and journals, including the IEEE JSTARS, the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (TGRS), and the IEEE GEOSCIENCE AND REMOTE SENSING LETTERS (GRSL).


Prof. Carlos García Sánchez

Bio

Carlos García received his B.S. and M.S. degrees in Physics in 1999 and his Ph.D. degree in 2007, all from the Complutense University of Madrid (UCM), Spain. He has been an Associate Professor at the Computer Architecture Department at UCM since 2019. His research interests include high-performance computing for heterogeneous parallel architectures, focusing on efficient parallel exploitation of modern devices such as multicore and manycore processors, GPUs, and FPGAs.

He has been a member of several competitive national research projects (CICYT) since 2000. He has also been a member and head of several projects with industry, whose most relevant results include production software to predict and avoid river flooding. Regarding publications, he is the first or second author of several articles in relevant international journals and conferences, and the author of more than 25 JCR-indexed publications and several conference papers. He has also been editor of two special issues in indexed journals.

In his teaching, he has mainly taught courses on "Operating Systems", "Computer Architecture Introduction", "GPUs and accelerator programming" and "High Performance Computing" in the undergraduate and master's curricula at UCM.
