9600-0004/02 – Big Data Processing (ZRD)

Guarantor department: IT4Innovations
Credits: 10
Subject guarantor: doc. Mgr. Jiří Dvorský, Ph.D.
Subject version guarantor: doc. Mgr. Jiří Dvorský, Ph.D.
Study level: postgraduate
Requirement: Choice-compulsory
Year:
Semester: winter + summer
Study language: English
Year of introduction: 2015/2016
Year of cancellation:
Intended for the faculties: USP
Intended for study types: Doctoral
Instruction secured by
Login    Name                             Tutor    Teacher giving lectures
DVO26    doc. Mgr. Jiří Dvorský, Ph.D.
Extent of instruction for forms of study
Form of study    Way of completion    Extent
Full-time        Examination          2+0
Combined         Examination          10+0

Subject aims expressed by acquired skills and competences

The aim of the subject is to introduce PhD students to the field of big data processing and to the technologies currently available for it, including technologies and approaches still under development.

Teaching methods

Lectures
Individual consultations

Summary

Big data has become indispensable for business decisions, company strategy, research into human behaviour on social networks, and targeted advertising. Leading scientific centres such as CERN have likewise shown the need to routinely store previously unimaginable amounts of data. The first key issue in big data processing is storing extremely large data sets: document collections, streaming data from sensor networks, time series (e.g. share prices on the stock market, transport data), graph databases representing social networks and the web, satellite images of the Earth’s surface, and so on. Standard relational databases have proved unsuitable for processing such volumes of data; massively parallel software running on hundreds or thousands of servers is needed instead. Part of the course presents the technologies that form the current state of the art in big data processing, such as the Hadoop Distributed File System, NoSQL databases, and the HDF5 hierarchical data format. The course also covers data structures suitable for various kinds of data and their manipulation, efficient querying, the cost of I/O operations, specific kinds of data compression, and algorithms and data structures suited to computational accelerators (CUDA, Intel Xeon Phi).

Compulsory literature:

• S. Sakr, M. Gaber: Large Scale and Big Data: Processing and Management, Auerbach Publications, 2014, ISBN 978-1466581500
• T. White: Hadoop: The Definitive Guide, Yahoo Press, 2014, ISBN 978-1449311520
• P. J. Sadalage: NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, Addison-Wesley Professional, 2012, ISBN 978-0321826626

Recommended literature:

• J. Jeffers, J. Reinders: Intel Xeon Phi Coprocessor High-Performance Programming, Morgan Kaufmann, 2013, ISBN 978-0124104143
• G. Barlas: Multicore and GPU Programming: An Integrated Approach, Morgan Kaufmann, 2014, ISBN 978-0124171374
• J. Leskovec, A. Rajaraman, J. D. Ullman: Mining of Massive Datasets, Cambridge University Press, 2014, ISBN 978-1107077232
• V. S. Agneeswaran: Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark, and More Hadoop Alternatives, Pearson FT Press, 2014, ISBN 978-0133837940

Continuous assessment of knowledge during the semester

E-learning

Other requirements for students

No other requirements.

Prerequisites

The subject has no prerequisites.

Co-requisites

The subject has no co-requisites.

Subject syllabus:

Big data has become indispensable for business decisions, company strategy, research into human behaviour on social networks, and targeted advertising. Leading scientific centres such as CERN have likewise shown the need to routinely store previously unimaginable amounts of data. The first key issue in big data processing is storing extremely large data sets: document collections, streaming data from sensor networks, time series (e.g. share prices on the stock market, transport data), graph databases representing social networks and the web, satellite images of the Earth’s surface, and so on. Standard relational databases have proved unsuitable for processing such volumes of data; massively parallel software running on hundreds or thousands of servers is needed instead. Part of the course presents the technologies that form the current state of the art in big data processing, such as the Hadoop Distributed File System, NoSQL databases, and the HDF5 hierarchical data format. The course also covers data structures suitable for various kinds of data and their manipulation, efficient querying, the cost of I/O operations, specific kinds of data compression, and algorithms and data structures suited to computational accelerators (CUDA, Intel Xeon Phi).

Conditions for subject completion

Full-time form (validity from: 2015/2016 Winter semester)
Task name      Type of task    Max. number of points (act. for subtasks)    Min. number of points
Examination    Examination
Mandatory attendance participation:


Occurrence in study plans

Academic year    Programme    Field of study    Spec.    Form    Study language    Tut. centre    Year    W    S    Type of duty
2019/2020 (P2658) Computational Sciences (2612V078) Computational Sciences P English Ostrava Choice-compulsory study plan
2019/2020 (P2658) Computational Sciences (2612V078) Computational Sciences K English Ostrava Choice-compulsory study plan
2018/2019 (P2658) Computational Sciences (2612V078) Computational Sciences P English Ostrava Choice-compulsory study plan
2018/2019 (P2658) Computational Sciences (2612V078) Computational Sciences K English Ostrava Choice-compulsory study plan
2017/2018 (P2658) Computational Sciences (2612V078) Computational Sciences P English Ostrava Choice-compulsory study plan
2017/2018 (P2658) Computational Sciences (2612V078) Computational Sciences K English Ostrava Choice-compulsory study plan
2016/2017 (P2658) Computational Sciences (2612V078) Computational Sciences P English Ostrava Choice-compulsory study plan
2016/2017 (P2658) Computational Sciences (2612V078) Computational Sciences K English Ostrava Choice-compulsory study plan
2015/2016 (P2658) Computational Sciences (2612V078) Computational Sciences P English Ostrava Choice-compulsory study plan
2015/2016 (P2658) Computational Sciences (2612V078) Computational Sciences K English Ostrava Choice-compulsory study plan

Occurrence in special blocks

Block name    Academic year    Form of study    Study language    Year    W    S    Type of block    Block owner