[ 921DASIBDMK17 ] KV Big Data Management and Processing

Workload 3 ECTS
Education level M1 - Master's programme 1st year
Study areas Computer Science
Responsible person Birgit Pröll
Hours per week 2 hpw
Coordinating university Johannes Kepler University Linz
Detailed information
Original study plan Master's programme Computer Science 2018W
Objectives In this course, students learn advanced concepts and techniques for managing and processing big data, bridging theory and practice. Students will gain in-depth knowledge of the current state of the art in this highly active and diverse field of research, as well as a deep understanding of the well-established, longstanding theories underlying the wide variety of emerging big data systems and tools. Overall, the course should help students think about big data systems and tools in new ways: not just how they work, but why they were designed that way and how to select appropriate systems and tools for a given problem.
Subject
  1. Foundations of NoSQL Data Management: Reliable, Scalable and Maintainable Data-Intensive Applications; NoSQL Data Models and Query Languages; NoSQL Data Modeling
  2. Distributed Data in NoSQL Systems: Replication, Partitioning, Transactions, Consistency and Consensus
  3. Derived Data in NoSQL Systems: Batch Processing, Stream Processing, Lambda vs. Kappa Architectures, Situation Assessment Techniques, Situation & Process Mining
  4. Queries in Computational Data Analytics: Query Languages & Execution (Index Structures, Similarity Queries)
  5. Natural Language Processing on the Web: NLP foundations, Web Search, Web Extraction and Mining, Question Answering and Dialogue Systems
Criteria for evaluation Exercises and written exam at the end of the semester.
Methods Slide presentation with case studies and hands-on sessions.
Language English
Study material
  • Martin Kleppmann, “Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems”, O'Reilly, 2017
  • Lena Wiese, “Advanced Data Management for SQL, NoSQL, Cloud and Distributed Databases”, De Gruyter Oldenbourg, 2015
  • Kai-Uwe Sattler, Gunter Saake and Erhard Rahm, “Verteiltes und Paralleles Datenmanagement: Von verteilten Datenbanken zu Big Data und Cloud”, Springer, 2015
  • Nathan Marz and James Warren, “Big Data: Principles and Best Practices of Scalable Realtime Data Systems”, Manning Publications, 2015
  • Wil van der Aalst, “Process Mining: Data Science in Action”, Springer, 2016
  • Ricardo Baeza-Yates and Berthier Ribeiro-Neto, “Modern Information Retrieval”, Addison-Wesley, 2011
  • Bruce Croft, Donald Metzler and Trevor Strohman, “Search Engines: Information Retrieval in Practice”, Pearson, 2009
Changing subject? No
On-site course
Maximum number of participants -
Assignment procedure Direct assignment