9600-1020/01 – Libraries for parallel data processing (KPZD)

Guarantor department: IT4Innovations
Credits: 4
Subject guarantor: Ing. Stanislav Böhm, Ph.D.
Subject version guarantor: Ing. Stanislav Böhm, Ph.D.
Study level: undergraduate or graduate
Requirement: Compulsory
Year: 1
Semester: winter
Study language: Czech
Year of introduction: 2019/2020
Year of cancellation:
Intended for the faculties: FEI
Intended for study types: Follow-up Master
Instruction secured by
Login | Name | Tutor | Teacher giving lectures
BOH126 | Ing. Stanislav Böhm, Ph.D.
Extent of instruction for forms of study
Form of study | Way of completion | Extent
Full-time | Graded credit | 2+2
Part-time | Graded credit | 9+9

Subject aims expressed by acquired skills and competences

Students gain an overview of libraries and frameworks for parallel processing of large data and basic hands-on experience with the most widely used ones. The course introduces basic concepts of working with big data, along with the basic paradigms and programming models for processing it. Exercises use Python, a language in which all of the well-known frameworks can be used.

Teaching methods

Lectures
Tutorials
Project work

Summary

Compulsory literature:

• Pandas documentation: http://pandas.pydata.org/
• Spark documentation: https://spark.apache.org/docs/latest/
• TensorFlow documentation: https://www.tensorflow.org/
• Keras documentation: https://keras.io/

Recommended literature:

• Nathan Marz and James Warren: Big Data - Principles and best practices of scalable realtime data systems, Manning, April 2015 ISBN 9781617290343.

Continuous assessment during the semester

Student project

E-learning

Other requirements

No other requirements.

Prerequisites

Subject has no prerequisites.

Co-requisites

Subject has no co-requisites.

Subject syllabus:

After completing the course, students will have an overview of libraries for parallel processing of large data and basic experience with the best-known ones. The course introduces basic concepts for manipulating large data, together with the basic paradigms and programming models for processing it. Exercises are held in Python, which has libraries for all of the well-known frameworks.

Course outline:
1. Introduction to big data processing
2. Basic data manipulation (Pandas, NumPy)
3. Map & Reduce model (Hadoop, Spark, Flink)
4. Parallel processing of numerical data in Python (Dask)
5. Libraries for neural networks I (TensorFlow, Theano)
6. Libraries for neural networks II (Keras)
7. Parallelization of general tasks (HyperLoom)
8. Workflow systems (Luigi, Airflow)

Conditions for subject completion

Full-time form (validity from: 2019/2020 Winter semester)
Task name | Type of task | Max. number of points (act. for subtasks) | Min. number of points
Graded credit | Graded credit | 100 | 51
Mandatory attendance participation: attendance at exercises.

Occurrence in study plans

Academic year | Programme | Field of study | Spec. | Focus | Form | Study language | Tut. centre | Year | W | S | Type of duty
2021/2022 (N0541A170007) Computational and Applied Mathematics (S02) Computational Methods and HPC K Czech Ostrava 1 Compulsory study plan
2021/2022 (N0541A170007) Computational and Applied Mathematics (S02) Computational Methods and HPC P Czech Ostrava 1 Compulsory study plan
2020/2021 (N0541A170007) Computational and Applied Mathematics (S02) Computational Methods and HPC P Czech Ostrava 1 Compulsory study plan
2020/2021 (N0541A170007) Computational and Applied Mathematics (S02) Computational Methods and HPC K Czech Ostrava 1 Compulsory study plan
2019/2020 (N0541A170007) Computational and Applied Mathematics (S02) Computational Methods and HPC P Czech Ostrava 1 Compulsory study plan
2019/2020 (N0541A170007) Computational and Applied Mathematics (S02) Computational Methods and HPC K Czech Ostrava 1 Compulsory study plan

Occurrence in special blocks

Block name | Academic year | Form of study | Study language | Year | W | S | Type of block | Block owner