Day 14 - XGBoost & Data Leakage (Intermediate ML Lessons 6 & 7)

Welcome to Day 14 of Kaggle 30 Days of Machine Learning. In this video, I walk through Lessons 6 and 7 of the Kaggle Intermediate Machine Learning course.

In Lesson 6, we are introduced to XGBoost (short for "extreme gradient boosting"), a state-of-the-art ensemble method. Like a random forest, XGBoost builds multiple decision trees, but it builds them in sequence, with each new tree correcting the errors of the trees before it, which yields highly accurate predictions. In this lesson, we learn the overall structure of an XGBoost model, how it works, and how to tune its key parameters: the number of estimators, the learning rate, and the number of early stopping rounds.

In Lesson 7, we learn what data leakage is, why it can be a costly problem in machine learning, and how to prevent it. At a high level, data leakage happens when your training data contains information about the target, but similar data will not be available when the model is used for prediction. This can lead to high accuracy on the training (and even validation) data, yet poor performance once the model is deployed.
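One common form of leakage, train-test contamination, can be avoided by doing all preprocessing inside a pipeline, so that steps such as imputation are fit only on the training portion of each fold. Here is a minimal scikit-learn sketch (the synthetic data and model choice are illustrative, not from the lesson):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.1] = np.nan  # inject some missing values
y = rng.normal(size=200)

# Leaky approach (avoid): fitting the imputer on ALL rows before
# cross-validation lets the validation folds influence the fill values
# seen during training.
#   X_filled = SimpleImputer().fit_transform(X)
#   scores = cross_val_score(LinearRegression(), X_filled, y, cv=5)

# Safe approach: the pipeline re-fits the imputer inside each CV fold,
# using only that fold's training data.
pipeline = make_pipeline(SimpleImputer(), LinearRegression())
scores = cross_val_score(pipeline, X, y, cv=5)
```

The fix for target leakage, by contrast, is not mechanical: it requires understanding whether each feature would actually be available at prediction time, and dropping those that would not.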