This is a short course in machine learning, aimed at those who are already proficient with the basics of statistical methodology, and in particular with linear regression.
Some models fit previously seen data very well, but are bad at forecasting unseen data
A model used to predict unseen outputs given observed inputs
Choice of a regularization parameter
Use train-val-test splits to choose the amount of regularization
How to split the data in order to test and validate predictive models
A generalization of the algorithm used in boosted linear regression
An algorithm to train high-dimensional linear regression models without overfitting
A gradient-boosted model where the base learners are decision trees
A predictive model built by performing sample splits based on the input values
A classification model in which the scores are obtained by boosting
A method to validate and test predictive models that makes efficient use of the data
It is advantageous to average the predictions from many different models
What to do when production data does not come from the same distribution as the training data
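To give a flavor of the topics above, here is a minimal sketch of choosing a regularization parameter with a train-validation-test split. It uses synthetic data and a closed-form ridge solver; all names, the 60/20/20 split, and the grid of regularization strengths are illustrative choices, not the course's own code.

```python
import numpy as np

# Synthetic regression data (illustrative, not from the course)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
w_true = rng.normal(size=20)
y = X @ w_true + rng.normal(scale=0.5, size=300)

# 60/20/20 train / validation / test split
X_tr, y_tr = X[:180], y[:180]
X_va, y_va = X[180:240], y[180:240]
X_te, y_te = X[240:], y[240:]

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution: (X'X + alpha*I)^(-1) X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def mse(X, y, w):
    return np.mean((X @ w - y) ** 2)

# Pick the regularization strength with the lowest validation error
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_alpha = min(alphas, key=lambda a: mse(X_va, y_va, ridge_fit(X_tr, y_tr, a)))

# Report generalization error once, on the held-out test set
test_error = mse(X_te, y_te, ridge_fit(X_tr, y_tr, best_alpha))
print(best_alpha, round(test_error, 4))
```

The key discipline is that the test set is touched exactly once, after the regularization parameter has been fixed on the validation set.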
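Similarly, a toy sketch of gradient boosting with tree base learners: for squared loss, each round fits a small tree (here, a one-split "stump", the simplest decision tree) to the current residuals and adds a damped version of it to the ensemble. The learning rate, number of rounds, and data are illustrative assumptions.

```python
import numpy as np

# Noisy 1-D regression problem (illustrative)
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3, 3, size=200))
y = np.sin(x) + rng.normal(scale=0.2, size=200)

def fit_stump(x, r):
    """Best single-split regression stump for residuals r."""
    best = None
    for t in x[::5]:  # candidate thresholds, subsampled for speed
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        lm, rm = left.mean(), right.mean()
        sse = ((left - lm) ** 2).sum() + ((right - rm) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda z: np.where(z <= t, lm, rm)

# Gradient boosting for squared loss: residuals are the negative gradient,
# so each round fits a stump to them and takes a small step
pred = np.zeros_like(y)
learning_rate = 0.3
for _ in range(100):
    stump = fit_stump(x, y - pred)
    pred += learning_rate * stump(x)

train_mse = np.mean((y - pred) ** 2)
print(round(train_mse, 4))
```

Each stump alone is a very weak predictor, but the damped sum of many of them drives the training error well below that of the constant (mean) predictor.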
Most of the learning materials found on this website are now available in a traditional textbook format.