Machine Learning: Decision Trees 🌳🌲

Yasemin Derya Dilli
3 min readJan 21, 2024

In my previous blog I wrote about regression models; in this one I will discuss decision trees.

Classification and Regression Trees (CART)

CART forms the foundation of Random Forest. It converts complex patterns in the dataset into simple decision rules, splitting a heterogeneous dataset into increasingly homogeneous groups.

Caution: CART tends to overfit.

Regression Problems in CART

Values are sorted from small to large, and the data is split at the point where the SSE (Sum of Squared Errors) is minimized. This process continues up to a specified maximum depth.
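As a sketch of that split search, here is a pure-Python version on a toy dataset (the sqfeet/rent numbers are made up for illustration):

```python
# Find the split point on one feature that minimizes total SSE.
def sse(values):
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    pairs = sorted(zip(x, y))                  # sort values from small to large
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        left  = [yv for _, yv in pairs[:i]]
        right = [yv for _, yv in pairs[i:]]
        total = sse(left) + sse(right)         # SSE of the two candidate groups
        if total < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (total, threshold)
    return best  # (minimum SSE, split threshold)

x = [50, 60, 80, 100, 120]                     # hypothetical sqfeet values
y = [900, 1000, 1400, 1800, 2000]              # hypothetical rent values
print(best_split(x, y))                        # best split is at sqfeet = 90
```

A real tree would apply this search recursively to each resulting group, stopping at the maximum depth.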

Classification Problem in CART

The branching process continues based on Entropy and Gini impurity values. The aim is to keep these values as small as possible.
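Both measures can be computed in a few lines; 0 means a perfectly homogeneous (pure) node, which is what each split aims for:

```python
import math

def gini(labels):
    # Gini impurity: 1 - sum(p_k^2), where p_k is the share of class k.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def entropy(labels):
    # Entropy: sum(-p_k * log2(p_k)); also 0 for a pure node.
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return sum(-p * math.log2(p) for p in probs)

pure  = ["yes", "yes", "yes", "yes"]
mixed = ["yes", "yes", "no", "no"]
print(gini(pure), entropy(pure))    # both 0.0: a pure node
print(gini(mixed), entropy(mixed))  # 0.5 and 1.0: maximally impure for 2 classes
```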

Random Forest

Random Forest is built on aggregating the predictions of many decision trees. It emerged from the combination of the Bagging and Random Subspace methods.

Observations for trees are selected using the bootstrap random sampling method, and variables are selected using the random subspace method.
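A minimal sketch of those two sampling steps (the dataset sizes and feature names here are hypothetical):

```python
import random

random.seed(42)

n_rows, n_features = 10, 5
feature_names = ["f0", "f1", "f2", "f3", "f4"]

def sample_for_one_tree():
    # Bootstrap: draw n_rows row indices WITH replacement, so some rows
    # repeat and some are left out entirely (the "out-of-bag" observations).
    rows = [random.randrange(n_rows) for _ in range(n_rows)]
    # Random subspace: each tree sees only a random subset of the features;
    # a common default for classification is sqrt(n_features).
    k = max(1, int(n_features ** 0.5))
    features = random.sample(feature_names, k)
    return rows, features

for tree_id in range(3):
    rows, features = sample_for_one_tree()
    print(f"tree {tree_id}: rows={rows} features={features}")
```

Each tree is then trained only on its own row/feature sample, and the forest averages (or votes over) the trees' predictions.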

Gradient Boosting Machine(GBM)

AdaBoost (Adaptive Boosting) is the foundation of GBM.

AdaBoost relies on the idea of combining weak classifiers to create a strong classifier.

GBM builds a series of models on the errors, combining them into a single additive predictive model.

y = rent = dependent variable = real values

F0 = the initial predicted value

y − F0 = differences, errors, residuals = the new dependent variable

sqfeet = independent variable

F1 = F0 + Δ1
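To make the notation concrete, here is a toy numeric illustration (the rent values and the Δ1 corrections are made up):

```python
y = [1000, 1500, 2000]               # y = rent = real values
F0 = sum(y) / len(y)                 # initial prediction: the mean, 1500.0
residuals = [yi - F0 for yi in y]    # y - F0 = the new dependent variable
print(residuals)                     # [-500.0, 0.0, 500.0]

# Suppose the next model (fit on sqfeet) predicts these corrections:
delta1 = [-400.0, 0.0, 400.0]        # hypothetical model outputs
F1 = [F0 + d for d in delta1]        # F1 = F0 + delta1
print(F1)                            # [1100.0, 1500.0, 1900.0]
```

Each F1 prediction is already closer to the real rent than F0 was; later steps repeat this on the remaining residuals.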

How does the GBM algorithm work?

The initial predicted value is subtracted from the real values to obtain the errors (residuals).

A new model is built and trained on these error values, splitting the dataset as a decision tree does.

The Δ1 value predicted by this new model is added to the previous predicted value to create a new prediction. This process repeats and stops where the error is minimized.
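The steps above can be sketched as a small boosting loop. This is a simplified illustration on made-up data, not a full GBM: each weak learner is a "stump" that splits on one threshold and predicts the mean residual on each side.

```python
def fit_stump(x, y):
    # Fit a one-split regression stump minimizing SSE of the two sides.
    best = None
    for t in sorted(set(x))[1:]:
        left  = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = (sum((yi - ml) ** 2 for yi in left)
               + sum((yi - mr) ** 2 for yi in right))
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    _, t, ml, mr = best
    return lambda xi: ml if xi < t else mr

x = [50, 60, 80, 100, 120]            # hypothetical sqfeet
y = [900, 1000, 1400, 1800, 2000]     # hypothetical rent

F = [sum(y) / len(y)] * len(y)        # F0: start from the mean
lr = 0.5                              # learning rate shrinks each delta
for step in range(30):
    residuals = [yi - fi for yi, fi in zip(y, F)]      # y - F_m
    stump = fit_stump(x, residuals)                    # model trained on errors
    F = [fi + lr * stump(xi) for fi, xi in zip(F, x)]  # F_{m+1} = F_m + delta

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, F))
print(round(sse, 2))                  # error shrinks toward 0 as steps accumulate
```

In practice the weak learners are deeper trees, and training is stopped (or the number of steps tuned) where validation error is minimized rather than training error.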

XGBOOST (eXtreme Gradient Boosting)

XGBoost is an optimized version of Gradient Boosting Machine (GBM) designed to enhance speed and prediction performance. It also features scalability and can be seamlessly integrated into different platforms.

LightGBM

LightGBM is a type of gradient boosting machine (GBM) developed to improve training-time performance, addressing the training speed of XGBoost in particular.

XGBoost uses a level-wise growth strategy, while LightGBM uses a leaf-wise growth strategy, which is why LightGBM is faster.

CatBoost

Another fast and successful type of GBM that automatically handles categorical variables is CatBoost.

Big thanks to Vahit Keskin and Miuul

Contact me on LinkedIn :) yaseminderyadilli

