[ 951SMDSSPDK20 ] KV Statistical Principles of Data Science

Workload 6 ECTS
Education level M1 - Master's programme 1. year
Study areas Statistics
Responsible person Andreas Futschik
Hours per week 3 hpw
Coordinating university Johannes Kepler University Linz
Detailed information
Original study plan Master's programme Statistics 2021W
Objectives Students know the basic concepts and tools of statistics for data analysis. They can apply methods designed for big data and high-dimensional inference, and they know which pitfalls to avoid in data analysis.
Subject Basic concepts of statistics: estimation, testing, prediction and classification, clustering.

Basic statistical tools: frequentist vs. Bayesian inference; common statistical models; model selection and model averaging.

Pitfalls: correlation vs. causation; all models are wrong; garbage in, garbage out; common sources of bias; Simpson's paradox and the perils of aggregating data; data mining, multiple hypothesis testing and the false discovery rate; curse of dimensionality, spurious correlation, incidental endogeneity.

Big data and large-scale inference: big "n" vs. big "p"; introduction to and application of a specific advanced statistical method, for example sparse modelling and the lasso, random forests, boosting, shrinkage, or empirical Bayes.

Criteria for evaluation Homework plus written exam.
Language English
Study material Bradley Efron and Trevor Hastie: Computer Age Statistical Inference. Cambridge University Press 2016.
Changing subject? Yes
Corresponding lecture 951SMDSSPDK17: KV Statistical Principles of Data Science (6 ECTS)
On-site course
Maximum number of participants 25
Assignment procedure Assignment according to priority