460-4056/01 – Programming of Parallel Applications II (PPA II)
Guarantor department | Department of Computer Science | Credits | 5 |
Subject guarantor | doc. Ing. Petr Gajdoš, Ph.D. | Subject version guarantor | doc. Ing. Petr Gajdoš, Ph.D. |
Study level | undergraduate or graduate | Requirement | Optional |
Year | 1 | Semester | summer |
| | Study language | Czech |
Year of introduction | 2012/2013 | Year of cancellation | 2019/2020 |
Intended for the faculties | FEI | Intended for study types | Follow-up Master |
Subject aims expressed by acquired skills and competences
The main goal is to extend students' knowledge in the area of programming parallel applications. The lessons build on an existing subject (Programming of Parallel Applications). All topics focus on the use of graphics processing units (GPUs). Students will become familiar with existing GPU architectures and frameworks for parallel programming. The CUDA architecture will be explained in more detail, with respect to the fact that an NVIDIA Research Center has been established at VŠB-TU Ostrava. Students acquire the knowledge necessary to solve practical tasks on the GPU, which they can apply in their diploma theses or in several grant projects running at VŠB-TU Ostrava.
Knowledge and skills:
- orientation in the basic concepts of graphics processor architecture
- knowledge of the software architecture of a parallel program: problem decomposition into grids, blocks, and threads
- knowledge of a selected framework for parallel programming on the GPU
- understanding of the conversion of an algorithm from serial to parallel form
- task distribution over several GPUs and clusters
- ability to solve practical tasks in the area of data processing
Teaching methods
Lectures
Individual consultations
Tutorials
Project work
Summary
The subject follows the existing Programming of Parallel Applications I. The knowledge acquired there is a prerequisite for understanding the new topics. Selected lecture notes provide the foundation for the practical exercises. The NVIDIA CUDA architecture will be presented in more detail, together with related tools for parallel programming on the GPU. Mastery of parallel programming techniques, combined with the solving of practical tasks, is the most important prerequisite for passing the final exam.
Compulsory literature:
1. Nir Shavit, Maurice Herlihy: The Art of Multiprocessor Programming, Morgan Kaufmann (March 14, 2008) , ISBN-13: 978-0123705914
2. Edward Kandrot, Jason Sanders: CUDA by Example: An Introduction to General-Purpose GPU Programming, Addison-Wesley Professional; 1 edition (July 29, 2010), ISBN-13: 978-0131387683
3. David Kirk: Programming Massively Parallel Processors: A Hands-on Approach (Applications of GPU Computing Series), Morgan Kaufmann; 1 edition (February 5, 2010), 978-0123814722
Recommended literature:
1. Timothy G. Mattson: Patterns for Parallel Programming, Addison-Wesley Professional; 1 edition (September 25, 2004), ISBN-13: 978-0321228116
Way of continuous check of knowledge in the course of semester
Students will work individually or in groups of at most two on an assigned project. Progress on the project will be checked during the semester. The final evaluation of the achieved results takes place at the end of the semester.
E-learning
Other requirements
Additional requirements are placed on the student.
Prerequisities
Co-requisities
Subject has no co-requisities.
Subject syllabus:
The lecture notes are designed so that they form the basis for practical exercises in the computer labs.
1. Introduction to parallel programming on GPU, a brief history, CUDA
- evolution of graphics processors
- the beginning of programming on GPU, GPGPU
- programmable pipeline
- current architectures
2. CUDA architecture and its integration within standard C++ project
- key features of selected architecture
- decomposition of an algorithm
- hardware vs. software decomposition
3. Threads and kernel functions
- thread and its meaning in GPU
- threads hierarchy, basic thread life cycle, limits
- calling of kernel functions, parameters and restrictions
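To illustrate the thread hierarchy and kernel calls covered above, a minimal sketch might look as follows (the kernel name `addOne` and the launch configuration are illustrative assumptions, not part of the course materials):

```cuda
#include <cstdio>

// Minimal kernel: each thread increments one element of the array.
__global__ void addOne(int *data, int n)
{
    // Global thread index computed from block and thread coordinates.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)          // guard: the grid may be larger than n
        data[i] += 1;
}

int main()
{
    const int n = 1024;
    int *d_data;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemset(d_data, 0, n * sizeof(int));

    // Launch configuration: a 1D grid of blocks with 256 threads each.
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    addOne<<<blocks, threads>>>(d_data, n);
    cudaDeviceSynchronize();   // kernel launches are asynchronous

    cudaFree(d_data);
    return 0;
}
```

The guard `if (i < n)` reflects one of the restrictions mentioned above: the grid is sized in whole blocks, so some threads may fall outside the data range.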
4. CUDA memories, patterns and usage
- global, shared, and constant memory, registers and texture memory
- allocation and deallocation of memory
- memory alignment
- copying data from RAM to VRAM and vice versa
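A hedged sketch of the allocation, copy, and deallocation pattern listed above, assuming a simple host buffer (the variable names are illustrative):

```cuda
#include <vector>

int main()
{
    const size_t n = 1 << 20;
    std::vector<float> host(n, 1.0f);

    float *device = nullptr;
    cudaMalloc(&device, n * sizeof(float));   // allocate global memory (VRAM)

    // RAM -> VRAM
    cudaMemcpy(device, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // ... kernels operating on `device` would run here ...

    // VRAM -> RAM
    cudaMemcpy(host.data(), device, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(device);                         // deallocate
    return 0;
}
```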
5. Memory bank conflicts
- access optimization
- suitable data structures
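A classic example of the access optimization mentioned above is padding a shared-memory tile; the sketch below (a hypothetical transpose kernel, not from the course materials) shows the idiom:

```cuda
#define TILE 32

// Tile-based transpose. Without the "+1" padding, column accesses would
// map all 32 threads of a warp to the same shared-memory bank and be
// serialized; the extra column shifts successive rows across banks.
__global__ void transposeTile(const float *in, float *out, int width)
{
    __shared__ float tile[TILE][TILE + 1];   // +1 avoids bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = in[y * width + x];
    __syncthreads();

    x = blockIdx.y * TILE + threadIdx.x;     // transposed block coordinates
    y = blockIdx.x * TILE + threadIdx.y;
    out[y * width + x] = tile[threadIdx.x][threadIdx.y];
}
```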
6. Program execution control, distribution of an algorithm
- streams, parallel calling of kernel functions
- synchronization on several levels – threads, blocks, GPU vs. CPU
- program distribution across multiple GPUs
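The streams and synchronization levels above can be sketched as follows; the kernel `scale` and the two-stream setup are illustrative assumptions:

```cuda
__global__ void scale(float *a, int n, float s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= s;
}

int main()
{
    const int n = 1 << 20;
    float *d0, *d1;
    cudaMalloc(&d0, n * sizeof(float));
    cudaMalloc(&d1, n * sizeof(float));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Kernels launched into different streams may execute concurrently.
    scale<<<(n + 255) / 256, 256, 0, s0>>>(d0, n, 2.0f);
    scale<<<(n + 255) / 256, 256, 0, s1>>>(d1, n, 3.0f);

    cudaDeviceSynchronize();   // GPU vs. CPU synchronization point

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(d0);
    cudaFree(d1);
    return 0;
}
```

Within a kernel, `__syncthreads()` synchronizes the threads of one block; across the whole device, `cudaDeviceSynchronize()` makes the CPU wait for all queued GPU work.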
7. Algorithm performance with respect to its parallelization on GPU
- case study, experiments with multiple variants of the same program
8. Vectors and matrices
- case study, large data processing
- parallel reduction
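A common shape of the parallel reduction mentioned above is a tree-based sum in shared memory; the following is a minimal sketch (the kernel name and load pattern are illustrative), producing one partial sum per block that the host then combines:

```cuda
// One block reduces its portion of the input into a single partial sum.
__global__ void reduceSum(const float *in, float *out, int n)
{
    extern __shared__ float sdata[];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x * 2 + tid;

    // Each thread loads two elements to halve the number of blocks.
    float v = 0.0f;
    if (i < n)              v += in[i];
    if (i + blockDim.x < n) v += in[i + blockDim.x];
    sdata[tid] = v;
    __syncthreads();

    // Tree-based reduction in shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];   // per-block partial sum
}
```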
9. Support library CUBLAS
- introduction to several support libraries for linear algebra
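As a small sketch of using CUBLAS, a single-precision matrix multiply might be wrapped as below (the helper `gemm` and the square-matrix assumption are illustrative; device pointers are assumed to be already allocated and filled):

```cuda
#include <cublas_v2.h>

// C = alpha * A * B + beta * C for n x n matrices already in VRAM.
void gemm(const float *dA, const float *dB, float *dC, int n)
{
    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    // cuBLAS uses column-major storage, as in Fortran BLAS.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n,
                &alpha, dA, n,
                        dB, n,
                &beta,  dC, n);

    cublasDestroy(handle);
}
```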
10. Performance optimization
- case study, image manipulation
- double buffering
- optimization at the level of blocks, registers, etc.
11. Case study
- interesting research topics
- outline of possible solutions
- experiments
- program tuning, debugging with NVIDIA Nsight
Conditions for subject completion
Occurrence in study plans
Occurrence in special blocks
Assessment of instruction