StatLect

Machine learning

This is a short course in machine learning, aimed at those who are already proficient with the basics of statistical methodology, and in particular with linear regressions.

Foundations

Overfitting

Some models fit previously seen data very well, but are bad at forecasting unseen data

Predictive model

A model used to predict unseen outputs given observed inputs

Choice of a regularization parameter

Use train-val-test splits to choose the amount of regularization

Training, validation and test

How to split the data in order to test and validate predictive models
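As a rough illustration of the split-and-select idea in this section, the sketch below shuffles a synthetic sample into training, validation and test sets, fits a one-variable ridge regression for several candidate regularization parameters, keeps the one with the lowest validation error, and only then looks at the test set. The data, the 60/20/20 proportions, the candidate grid and all function names are assumptions made for the example, not taken from the course.

```python
import random

def split_indices(n, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle 0..n-1 and cut the result into train / validation / test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def fit_ridge_1d(xs, ys, lam):
    """Ridge estimate for y = beta * x (no intercept): sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(xs, ys, beta):
    """Mean squared prediction error of the fitted slope on a sample."""
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def pick(idx, values):
    """Select the observations whose positions are listed in idx."""
    return [values[i] for i in idx]

# Synthetic data (assumed for illustration): y = 2x + noise.
rng = random.Random(42)
x = [rng.uniform(-1, 1) for _ in range(100)]
y = [2.0 * xi + rng.gauss(0, 0.5) for xi in x]

train, val, test = split_indices(len(x))

# Fit on the training set, score each candidate on the validation set.
candidates = [0.0, 0.1, 1.0, 10.0]
val_errors = {}
for lam in candidates:
    beta_lam = fit_ridge_1d(pick(train, x), pick(train, y), lam)
    val_errors[lam] = mse(pick(val, x), pick(val, y), beta_lam)

best_lam = min(val_errors, key=val_errors.get)

# The test set is touched only once, after the parameter has been chosen.
beta = fit_ridge_1d(pick(train, x), pick(train, y), best_lam)
test_error = mse(pick(test, x), pick(test, y), beta)
```

Keeping the test set out of the selection loop is the point of the three-way split: the validation error of the chosen parameter is optimistically biased by the selection, while the test error is not.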

Popular models

Gradient boosting

A generalization of the algorithm used in boosted linear regressions

Boosted linear regression

An algorithm to train high-dimensional linear regression models without overfitting

Boosted tree

A gradient-boosted model where the base learners are decision trees

Decision tree

A predictive model built by performing sample splits based on the input values

Boosted classifier

A classification model in which the scores are obtained by boosting
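To make the boosting entries above more concrete, here is a minimal sketch of gradient boosting with squared-error loss, where each base learner is a one-split regression tree (a "stump") on a single input. It is one common textbook variant, not necessarily the exact algorithm presented in the course; the function names, the learning rate, the number of rounds and the toy data are all assumptions.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: one threshold, one constant on each side."""
    best = None
    # Candidate thresholds exclude the largest x so both sides are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - p)^2 the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Toy data (assumed): a step function that a constant model cannot fit.
xs = [i / 10 for i in range(20)]
ys = [0.0 if x < 1.0 else 1.0 for x in xs]

model = gradient_boost(xs, ys)
train_mse = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
baseline_mse = sum((y - sum(ys) / len(ys)) ** 2 for y in ys) / len(ys)
```

The small learning rate means each stump corrects only a fraction of the remaining error, which is why many rounds are needed; in practice the number of rounds would be chosen on validation data rather than fixed in advance.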

Example of what can be achieved by machine learning methods: a model receives a text prompt as input and produces a beautiful computer-generated image as output.

Techniques to improve predictive performance

K-fold cross-validation

A method to validate and test predictive models that lets every observation be used for both training and validation

Ensembling

It is advantageous to average the predictions from many different models
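The two techniques in this section can be sketched in a few lines. The first function below generates K-fold train/validation index pairs so that every observation lands in exactly one validation fold; the second averages the predictions of several models. Both function names are hypothetical, and the sketch assumes the observations have already been shuffled.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; each index is in exactly one validation fold."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size

def average_predictions(predictions):
    """Ensemble by simple averaging; predictions is a list of equal-length lists."""
    m = len(predictions)
    return [sum(column) / m for column in zip(*predictions)]

# Ten observations split into three folds.
folds = list(k_fold_indices(10, 3))
fold_sizes = [len(val_idx) for _, val_idx in folds]

# Two hypothetical models, each predicting two outputs.
ensemble = average_predictions([[1.0, 2.0], [3.0, 4.0]])
```

A natural way to combine the two ideas is to train one model per fold and average their predictions on new data, although other ensembling schemes are equally possible.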

Addressing disappointment on production data

Domain shift

What to do when production data does not come from the same distribution as the learning data

Large language models explained

The books

Most of the learning materials found on this website are now available in a traditional textbook format.