9600-0004/01 – Big Data Processing (ZRD)
Guarantor department | IT4Innovations | Credits | 10 |
Subject guarantor | doc. Mgr. Jiří Dvorský, Ph.D. | Subject version guarantor | doc. Mgr. Jiří Dvorský, Ph.D. |
Study level | postgraduate | Requirement | Choice-compulsory |
Year | | Semester | winter + summer |
| | Study language | Czech |
Year of introduction | 2015/2016 | Year of cancellation | |
Intended for the faculties | FEI, USP | Intended for study types | Doctoral |
Subject aims expressed by acquired skills and competences
The aim of the subject is to introduce PhD students to the field of big data processing and to currently available technologies, including technologies and approaches still under development.
Teaching methods
Lectures
Individual consultations
Summary
Big data has recently become indisputably important for business decisions, company strategies, research into human behaviour on social networks, and targeted advertising. Moreover, leading scientific centres (e.g. CERN) have demonstrated the need to routinely store previously unimaginable amounts of data. The key issue in big data processing is, first of all, the storage of extremely large data sets. Examples include document collections, streaming data from sensor networks, time series (e.g. stock prices, transport data), graph databases representing social networks and the web, and satellite images of the Earth’s surface. It has been shown that standard relational databases are not suitable for processing such enormous amounts of data, and that massively parallel software running on hundreds or thousands of servers must be employed. Part of the course presents the technologies that form the current state of the art in big data processing, such as the Hadoop Distributed File System, NoSQL databases, and the HDF5 hierarchical data format. The course also covers several further topics:
- data structures suitable for various kinds of data and their manipulation
- efficient querying
- the cost of I/O operations
- specific kinds of data compression
- algorithms and data structures suitable for computational accelerators (CUDA, Intel Xeon Phi)
Compulsory literature:
Recommended literature:
Way of continuous check of knowledge in the course of semester
oral exam
E-learning
Other requirements
No other requirements.
Prerequisites
Subject has no prerequisites.
Co-requisites
Subject has no co-requisites.
Subject syllabus:
Conditions for subject completion
Occurrence in study plans
Occurrence in special blocks
Assessment of instruction
The subject has no assessment.