[ 921DASIBDMK17 ] KV Big Data Management and Processing

Workload 3 ECTS
Education level M1 - Master's programme 1. year
Study areas Computer Science
Responsible person Birgit Pröll
Hours per week 2 hpw
Coordinating university Johannes Kepler University Linz
Detailed information
Original study plan Master's programme Computer Science 2021W
Objectives Students know advanced concepts and techniques for managing and processing big data, bridging theory and practice. They have in-depth knowledge of the current state of the art in this highly active and diverse field of research, and a deep understanding of the often well-established and longstanding theories underlying the wide variety of emerging big data systems and tools. Overall, students think about big data systems and tools in new ways: not just how they work, but also why they were designed that way and how to select appropriate systems and tools for a given problem.
  1. Foundations of NoSQL Data Management: Reliable, Scalable and Maintainable Data-Intensive Applications; NoSQL Data Models and Query Languages; NoSQL Data Modeling
  2. Distributed Data in NoSQL Systems: Replication, Partitioning, Transactions, Consistency and Consensus
  3. Derived Data in NoSQL Systems: Batch Processing, Stream Processing, Lambda vs. Kappa Architectures, Situation Assessment Techniques, Situation & Process Mining
  4. Queries in Computational Data Analytics: Query Languages & Execution (Index Structures, Similarity Queries)
  5. Natural Language Processing and Social Media Mining on the Web: Web Search, Web Extraction and Mining, Question Answering and Dialogue Systems
Criteria for evaluation Exercises and written exam at the end of the semester.
Methods Slide presentation with case studies and hands-on sessions.
Language English
Study material
  • Martin Kleppmann, “Designing Data-Intensive Applications – The Big Ideas Behind Reliable, Scalable, and Maintainable Systems”, O'Reilly, March 2017
  • Lena Wiese, “Advanced Data Management for SQL, NoSQL, Cloud and Distributed Databases”, De Gruyter Oldenbourg, 2015
  • Kai-Uwe Sattler, Gunter Saake and Erhard Rahm, “Verteiltes und Paralleles Datenmanagement – Von verteilten Datenbanken zu Big Data und Cloud”, Springer, 2015
  • Nathan Marz and James Warren, “Big Data: Principles and Best Practices of Scalable Realtime Data Systems”, Manning Publications Co., Greenwich, CT, USA, 2015
  • Wil van der Aalst, “Process Mining – Data Science in Action”, Springer, 2016
  • Ricardo Baeza-Yates and Berthier Ribeiro-Neto, “Modern Information Retrieval”, Addison-Wesley, 2011
  • Bruce Croft, Donald Metzler and Trevor Strohman, “Search Engines: Information Retrieval in Practice”, Pearson, 2009
Changing subject? No
On-site course
Maximum number of participants -
Assignment procedure Direct assignment