This notebook fits a gradient boosting machine (GBM) with scikit-learn's `GradientBoostingRegressor` and, along the way, explores the main tuning parameters of that class.
# Load Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Generate Data
X, y = make_regression(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
The number of weak learners (i.e., regression trees) is controlled by the parameter `n_estimators`. The size of each tree can be controlled either by setting the tree depth via `max_depth` or by setting the number of leaf nodes via `max_leaf_nodes`. The `learning_rate` is a hyper-parameter in the range (0.0, 1.0] that controls overfitting via shrinkage. Finally, setting `subsample` below 1.0 fits each tree on a random fraction of the training data, which turns the procedure into stochastic gradient boosting.
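As an aside, a stump can equivalently be requested via `max_leaf_nodes`: a tree capped at two leaves makes exactly one split, the same as `max_depth=1`. A minimal sketch (the name `stump_gbm` is illustrative and is not used below):

# A 2-leaf tree is a stump: one split, equivalent to max_depth=1.
stump_gbm = GradientBoostingRegressor(n_estimators=100,
                                      learning_rate=0.1,
                                      max_leaf_nodes=2,
                                      random_state=42)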
gbm = GradientBoostingRegressor(n_estimators=1000,
                                learning_rate=0.1,
                                max_depth=1,
                                random_state=42,
                                subsample=0.5)
gbm.fit(X_train, y_train)
mean_squared_error(y_test, gbm.predict(X_test))
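Because shrinkage and the number of trees interact, it helps to watch the held-out error as boosting proceeds rather than looking only at the final MSE. A minimal sketch using `staged_predict` (the variable name `test_errors` is my own, not from the original):

# Compute test MSE after each of the boosting stages.
test_errors = [mean_squared_error(y_test, y_pred)
               for y_pred in gbm.staged_predict(X_test)]

plt.plot(np.arange(1, len(test_errors) + 1), test_errors)
plt.xlabel("Boosting iterations")
plt.ylabel("Test MSE")
plt.title("Test error vs. number of weak learners")
plt.show()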
gbm.feature_importances_
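The raw importances array above is easier to read as a bar chart; a quick sketch (variable names are illustrative):

# Bar chart of the impurity-based feature importances.
importances = gbm.feature_importances_
plt.bar(np.arange(len(importances)), importances)
plt.xlabel("Feature index")
plt.ylabel("Importance")
plt.title("GBM feature importances")
plt.show()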