Machine Learning: Decision Trees 🌳🌲

Yasemin Derya Dilli
3 min readJan 21, 2024

In my previous blog I wrote about regression models; in this one I will discuss decision trees.

Classification and Regression Trees (CART)

CART forms the foundation of Random Forest. It converts complex patterns in the dataset into simple decision rules, splitting a heterogeneous dataset into increasingly homogeneous groups.

Caution: CART tends to overfit.

Regression Problems in CART

Values are sorted from small to large, and the data is split at the point where the SSE (Sum of Squared Errors) is minimized. This process continues up to a specified maximum depth.
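As a sketch of that split search, here is a pure-Python version on a toy dataset (the sqfeet/rent numbers are made up for illustration):

```python
# Find the split point on one feature that minimizes total SSE.
def sse(values):
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    pairs = sorted(zip(x, y))                  # sort values from small to large
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        left  = [yv for _, yv in pairs[:i]]
        right = [yv for _, yv in pairs[i:]]
        total = sse(left) + sse(right)         # SSE of the two candidate groups
        if total < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (total, threshold)
    return best  # (minimum SSE, split threshold)

x = [50, 60, 80, 100, 120]                     # hypothetical sqfeet values
y = [900, 1000, 1400, 1800, 2000]              # hypothetical rent values
print(best_split(x, y))                        # best split is at sqfeet = 90
```

A real tree would apply this search recursively to each resulting group, stopping at the maximum depth.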

Classification Problem in CART

The branching process continues based on Entropy and Gini impurity values. The aim is to keep these values as small as possible.
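Both measures can be computed in a few lines; 0 means a perfectly homogeneous (pure) node, which is what each split aims for:

```python
import math

def gini(labels):
    # Gini impurity: 1 - sum(p_k^2), where p_k is the share of class k.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def entropy(labels):
    # Entropy: sum(-p_k * log2(p_k)); also 0 for a pure node.
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return sum(-p * math.log2(p) for p in probs)

pure  = ["yes", "yes", "yes", "yes"]
mixed = ["yes", "yes", "no", "no"]
print(gini(pure), entropy(pure))    # both 0.0: a pure node
print(gini(mixed), entropy(mixed))  # 0.5 and 1.0: maximally impure for 2 classes
```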

Random Forest

Random Forest is built on aggregating the predictions of many decision trees. It emerged from the combination of the Bagging and Random Subspace methods.

Observations for trees are selected using the bootstrap random sampling method, and variables are selected using the random subspace method.
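A minimal sketch of those two sampling steps (the dataset sizes and feature names here are hypothetical):

```python
import random

random.seed(42)

n_rows, n_features = 10, 5
feature_names = ["f0", "f1", "f2", "f3", "f4"]

def sample_for_one_tree():
    # Bootstrap: draw n_rows row indices WITH replacement, so some rows
    # repeat and some are left out entirely (the "out-of-bag" observations).
    rows = [random.randrange(n_rows) for _ in range(n_rows)]
    # Random subspace: each tree sees only a random subset of the features;
    # a common default for classification is sqrt(n_features).
    k = max(1, int(n_features ** 0.5))
    features = random.sample(feature_names, k)
    return rows, features

for tree_id in range(3):
    rows, features = sample_for_one_tree()
    print(f"tree {tree_id}: rows={rows} features={features}")
```

Each tree is then trained only on its own row/feature sample, and the forest averages (or votes over) the trees' predictions.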

Gradient Boosting Machine(GBM)

AdaBoost (Adaptive Boosting) is the foundation of GBM.

AdaBoost relies on the idea of combining weak classifiers to create a strong classifier.

GBM builds a series of models on the errors, combining them into a single additive predictive model.

y = rent = dependent variable = real values

F0 = the initial predicted value

y − F0 = differences, errors, residuals = the new dependent variable

sqfeet = independent variable

F1 = F0 + Δ1
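To make the notation concrete, here is a toy numeric illustration (the rent values and the Δ1 corrections are made up):

```python
y = [1000, 1500, 2000]               # y = rent = real values
F0 = sum(y) / len(y)                 # initial prediction: the mean, 1500.0
residuals = [yi - F0 for yi in y]    # y - F0 = the new dependent variable
print(residuals)                     # [-500.0, 0.0, 500.0]

# Suppose the next model (fit on sqfeet) predicts these corrections:
delta1 = [-400.0, 0.0, 400.0]        # hypothetical model outputs
F1 = [F0 + d for d in delta1]        # F1 = F0 + delta1
print(F1)                            # [1100.0, 1500.0, 1900.0]
```

Each F1 prediction is already closer to the real rent than F0 was; later steps repeat this on the remaining residuals.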

How does the GBM algorithm work?

The initial predicted value is subtracted from the real values to obtain the errors (residuals).

A new model is built and trained on these error values, splitting the dataset as a decision tree does.

The Δ1 value predicted by this new model is added to the previous predicted value to create a new prediction. This process repeats and stops where the error is minimized.
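The steps above can be sketched as a small boosting loop. This is a simplified illustration on made-up data, not a full GBM: each weak learner is a "stump" that splits on one threshold and predicts the mean residual on each side.

```python
def fit_stump(x, y):
    # Fit a one-split regression stump minimizing SSE of the two sides.
    best = None
    for t in sorted(set(x))[1:]:
        left  = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = (sum((yi - ml) ** 2 for yi in left)
               + sum((yi - mr) ** 2 for yi in right))
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    _, t, ml, mr = best
    return lambda xi: ml if xi < t else mr

x = [50, 60, 80, 100, 120]            # hypothetical sqfeet
y = [900, 1000, 1400, 1800, 2000]     # hypothetical rent

F = [sum(y) / len(y)] * len(y)        # F0: start from the mean
lr = 0.5                              # learning rate shrinks each delta
for step in range(30):
    residuals = [yi - fi for yi, fi in zip(y, F)]      # y - F_m
    stump = fit_stump(x, residuals)                    # model trained on errors
    F = [fi + lr * stump(xi) for fi, xi in zip(F, x)]  # F_{m+1} = F_m + delta

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, F))
print(round(sse, 2))                  # error shrinks toward 0 as steps accumulate
```

In practice the weak learners are deeper trees, and training is stopped (or the number of steps tuned) where validation error is minimized rather than training error.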

XGBOOST (eXtreme Gradient Boosting)

XGBoost is an optimized version of Gradient Boosting Machine (GBM) designed to enhance speed and prediction performance. It also features scalability and can be seamlessly integrated into different platforms.

LightGBM

LightGBM is a type of gradient boosting machine (GBM) developed to improve training-time performance, addressing the training speed of XGBoost in particular.

XGBoost uses a level-wise growth strategy, while LightGBM uses a leaf-wise growth strategy, which is why LightGBM is faster.

CatBoost

Another fast and successful type of GBM that automatically handles categorical variables is CatBoost.

Big thanks to Vahit Keskin and Miuul

Contact me on LinkedIn :) yaseminderyadilli

