Machine Learning: Decision Trees 🌳🌲
In my previous blog, I wrote about regression models; in this one, I will be discussing decision trees.

Classification and Regression Trees (CART)
CART forms the foundation of Random Forest. It converts complex structures in the dataset into simple decision rules, separating heterogeneous data into homogeneous groups.
Caution: CART tends to overfit.

Regression Problems in CART
Values are sorted from smallest to largest, and the data is split at the point where the SSE (Sum of Squared Errors) is minimized. This process continues up to a specified maximum depth.
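To make this concrete, here is a minimal sketch of a CART regression tree with scikit-learn's DecisionTreeRegressor; the tiny dataset is made up for illustration, and the default squared-error criterion corresponds to minimizing SSE.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up data: one feature, six observations.
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([5.0, 5.5, 6.0, 12.0, 12.5, 13.0])

# max_depth limits how far the SSE-minimizing splits can go,
# which also guards against overfitting.
tree = DecisionTreeRegressor(max_depth=2, random_state=42)
tree.fit(X, y)

print(tree.predict([[3.5]]))  # prediction falls on one side of the learned split
```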

Classification Problems in CART
Branching continues based on Entropy and Gini impurity values; the aim is to keep these values as small as possible, i.e., to produce pure (homogeneous) groups.
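To illustrate, here is a minimal sketch with scikit-learn's DecisionTreeClassifier on the built-in iris dataset; the criterion parameter switches between Gini and entropy impurity.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion can be "gini" (the default) or "entropy"; splits are chosen
# to make the resulting groups as pure (homogeneous) as possible.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=42)
clf.fit(X, y)

print(clf.score(X, y))  # training accuracy of the fitted tree
```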

Random Forest
Random Forest is built on aggregating the predictions of many decision trees. It emerged from the combination of the Bagging and Random Subspace methods.
Observations for each tree are selected with the bootstrap random sampling method (sampling with replacement), and the variables considered at each split are selected with the random subspace method.
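A minimal sketch with scikit-learn's RandomForestClassifier; bootstrap sampling is enabled by default, and max_features controls the size of the random subspace considered at each split (the dataset and hyperparameters are illustrative).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators: number of bagged trees; max_features="sqrt": random
# subspace of features evaluated at each split.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=42)
rf.fit(X_train, y_train)

print(rf.score(X_test, y_test))  # accuracy on held-out data
```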

Gradient Boosting Machine (GBM)
AdaBoost (Adaptive Boosting) is the foundation of GBM.
AdaBoost relies on the idea of combining weak classifiers to create a strong classifier.
GBM builds a series of models on the errors (residuals) of the previous ones, combining them into a single additive predictive model.

Consider predicting rent from square footage:
y = rent = dependent variable = the real value
F0 = the initial predicted value
y − F0 = differences, errors, residuals = the new dependent variable
sqfeet = independent variable
F1 = F0 + Δ1
How does the GBM algorithm work?
The initial predicted value (F0) is subtracted from the real values to obtain the residuals.
A new model is built and trained on these residuals, and the dataset is split accordingly.
The Δ1 value obtained from this new model is added to the previous prediction to create the new prediction (F1 = F0 + Δ1). This process repeats and stops where the error is minimized.
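Here is a minimal hand-rolled sketch of this residual-fitting loop, using scikit-learn's DecisionTreeRegressor as the weak learner; the rent/sqfeet numbers are made up to match the example above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

sqfeet = np.array([[750], [800], [850], [900], [950]])
rent = np.array([1160.0, 1200.0, 1280.0, 1450.0, 1500.0])

learning_rate = 0.1
n_rounds = 50

# F0: start with a constant prediction (the mean of y).
prediction = np.full_like(rent, rent.mean())

for _ in range(n_rounds):
    residuals = rent - prediction        # y - F_m: the new dependent variable
    weak = DecisionTreeRegressor(max_depth=2)
    weak.fit(sqfeet, residuals)          # fit a weak model to the errors
    delta = weak.predict(sqfeet)         # Δ for this round
    prediction += learning_rate * delta  # F_{m+1} = F_m + learning_rate * Δ

print(prediction.round(1))  # predictions move toward the real rents
```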
XGBoost (eXtreme Gradient Boosting)
XGBoost is an optimized version of Gradient Boosting Machine (GBM) designed to enhance speed and prediction performance. It also features scalability and can be seamlessly integrated into different platforms.
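A minimal sketch of XGBoost's scikit-learn-style API (this assumes the xgboost package is installed; the hyperparameter values are illustrative, not tuned).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Boosted trees with shrinkage (learning_rate) applied to each round's contribution.
model = XGBClassifier(n_estimators=300, learning_rate=0.1, max_depth=4)
model.fit(X_train, y_train)

print(model.score(X_test, y_test))
```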
LightGBM
LightGBM is a type of gradient boosting machine (GBM) developed to reduce training time; in particular, it aims to improve on XGBoost's speed.
XGBoost uses a level-wise growth strategy, while LightGBM uses a leaf-wise growth strategy, which is why it is faster.
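A minimal sketch with LightGBM's scikit-learn-style API (this assumes the lightgbm package is installed); num_leaves is the key knob of the leaf-wise growth strategy.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# num_leaves bounds tree complexity under leaf-wise growth.
model = LGBMClassifier(n_estimators=300, learning_rate=0.1, num_leaves=31)
model.fit(X_train, y_train)

print(model.score(X_test, y_test))
```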

CatBoost
Another fast and successful type of GBM that automatically handles categorical variables is CatBoost.
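A minimal sketch with CatBoost (this assumes the catboost package is installed); the toy DataFrame and column names are hypothetical, and cat_features marks the categorical columns so no manual encoding is needed.

```python
import pandas as pd
from catboost import CatBoostClassifier

# Hypothetical toy data with one categorical and one numeric feature.
df = pd.DataFrame({
    "city": ["A", "B", "A", "C", "B", "C"],
    "sqfeet": [700, 850, 600, 900, 800, 750],
    "expensive": [0, 1, 0, 1, 1, 0],
})

model = CatBoostClassifier(iterations=100, verbose=False)
# cat_features tells CatBoost which columns are categorical.
model.fit(df[["city", "sqfeet"]], df["expensive"], cat_features=["city"])

print(model.predict(df[["city", "sqfeet"]]))
```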
Big thanks to Vahit Keskin and Miuul
Contact me on LinkedIn :) yaseminderyadilli