[ 921CGELBDEK21 ] KV Big Data Engineering

Workload 3 ECTS
Education level M1 - Master's programme 1. year
Study areas Computer Science
Responsible person Werner Retschitzegger
Hours per week 2 hpw
Coordinating university Johannes Kepler University Linz
Detailed information
Original study plan Master's programme Computer Science 2021S
Objectives Big data can unfold its full potential only if its application is made more accessible by explicitly supporting the engineering process of the corresponding systems, thereby bridging theory and practice. In this course, students therefore learn systematic concepts and techniques for designing and maintaining scalable software systems that are able to gather, store, process and analyze huge volumes of varying data, even at high velocities. In particular, students will not only acquire in-depth knowledge of the challenges and the current state of the art in this highly active and diverse field of research, but will also gain a deep understanding of well-established and long-standing engineering theories and techniques such as forward and reverse engineering, design patterns, model-driven development, and schema evolution. Students will gain insight into the challenges these techniques face when handling huge volumes of data in different formats at high speeds while maintaining resiliency. Finally, since big data applications are most often built by combining existing systems and tools, this course should help students to think about engineering big data in new ways, and especially about how to select appropriate systems and tools for the problem at hand.

  1. Foundations of Big Data Engineering: Big Data reference architectures, technology classification and selection frameworks, requirements and architecture definition for Big Data applications
  2. Big Data Storage Models: Key-value, Column-Family, Document- and Graph-based, “polyglot” data models
  3. Big Data Processing Models: Events, Batch & Stream Processing, Real-time DBS, Log-based solid data infrastructures, Kafka & the Unix philosophy of distributed data, turning the DBS “inside out”
  4. Engineering of Big Data Schemas: Model-driven techniques for forward engineering (schema-first) & reverse engineering from data & code (schema-on-read), schema-driven consistency checking
  5. Design Patterns for Big Data Schemas: Key-value patterns, Column-Family patterns, Document- and Graph-based patterns, “polyglot” data model patterns
  6. Evolution of Big Data Schemas: Empirical analysis of existing NoSQL schemas, schema-driven DB evolution, schema transformation, data migration
Criteria for evaluation
  • Literature studies of students and presentations
  • Oral Exam
Methods
  • Introduction to the course topics based on slide presentations
  • Seminar-style literature studies of students and presentations
Language English
Study material
  • Volk, M., Staegemann, D., Pohl, M. and Turowski, K., “Challenging Big Data Engineering: Positioning of Current and Future Development”, In Proc. of the 4th Int. Conf. on Internet of Things, Big Data and Security (IoTBDS), 2019, pages 351-358
  • Kleppmann, M., “Designing Data-Intensive Applications – The Big Ideas Behind Reliable, Scalable, and Maintainable Systems”, O'Reilly, March 2017
  • Kleppmann, M., “Making Sense of Stream Processing, The Philosophy behind Data Streaming Platforms”, O’Reilly, March 2016
  • Isah, H., et al., “A Survey of Distributed Data Stream Processing Frameworks”, IEEE Access Journal, Volume 7, Oct. 2019, pages 154300-154316
  • Röger, H., Mayer, R., “A Comprehensive Survey on Parallelization and Elasticity in Stream Processing”, ACM Computing Surveys (CSUR), Vol. 52, No. 2, Article 36, April 2019, pages 1-36
  • Störl, U., Klettke, M., Scherzinger, S., “NoSQL Schema Evolution & Data Migration: State-of-the-Art & Opportunities”, In Proc. of the 22nd Int. Conf. on Extending Database Technology (EDBT), March, 2020
Changing subject? No
On-site course
Maximum number of participants -
Assignment procedure Direct assignment