Fan Gong 1/16/2018
This notebook shows how to construct a binary logistic regression model from scratch.
We will again use the classic IRIS data as our sample data.
# Load some libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from scipy.optimize import approx_fprime
from sklearn.metrics import roc_curve, auc
# Transform the sklearn Bunch object into a pandas DataFrame
iris = datasets.load_iris()
iris = pd.DataFrame(np.hstack((iris.data, iris.target.reshape(-1,1))),
columns = iris.feature_names + ['target'])
# Rename the columns and keep only the first two species' data
iris.rename(columns = {'sepal length (cm)':'sepal_l',
'sepal width (cm)': 'sepal_w',
'petal length (cm)': 'petal_l',
'petal width (cm)': 'petal_w'}, inplace = True)
iris = iris.query('target in [0.0, 1.0]')
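A quick sanity check (a minimal sketch; the printed values only confirm the filtering above) that we kept exactly the first two classes, with 50 observations each:
# Check the shape and class balance of the filtered data
print(iris.shape)                     # expect (100, 5)
print(iris['target'].value_counts())  # expect 50 observations for each of the two classes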
This part shows what the sigmoid function looks like.
# Define the sigmoid function
def sigmoid(x):
    y = 1/(1+np.exp(-x))
    return y
X = np.random.uniform(low=-10.0, high=10.0, size =10000)
y = sigmoid(X)
plt.scatter(X,y)
plt.axvline(x=0)
plt.title('sigmoid function')
plt.show()
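One side note: np.exp(-x) can overflow (and raise a RuntimeWarning) for large negative inputs. Below is a numerically stable variant, just as a sketch; the simple version above is kept for the rest of the notebook.
# Numerically stable sigmoid (sketch): np.exp is only ever evaluated at non-positive values,
# so it cannot overflow; mathematically equivalent to 1/(1+exp(-x))
def sigmoid_stable(x):
    x = np.asarray(x, dtype=float)
    z = np.exp(-np.abs(x))
    return np.where(x >= 0, 1/(1+z), z/(1+z))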
In this part I would like to construct the loss function. We know the loss function (the negative log-likelihood) looks like this:
$$L(\beta)=-\log L(\mathbf{y},\mathbf{X}\mid\beta)=-\sum_{i=1}^{n}\left[y_i\log(\sigma(x_i\beta))+(1-y_i)\log(1-\sigma(x_i\beta))\right]$$
If we rewrite the above formula in matrix form, it becomes:
$$L(\beta)=-\left(\mathbf{y}^T\log(\sigma(\mathbf{X}\beta))+(1-\mathbf{y})^T\log(1-\sigma(\mathbf{X}\beta))\right)$$
where $\mathbf{y}$ is an $n \times 1$ column vector; $\mathbf{X}$ is an $n \times p$ matrix, with $n$ the number of observations and $p$ the number of independent variables; and $\beta$ is a $p \times 1$ column vector.
def logloss(beta, y, X):
    '''
    This function calculates the log loss of a binary logistic regression model.
    Parameters
    ----------
    beta: A p x 1 column vector of coefficients
    y: An n x 1 column vector of labels
    X: An n x p matrix of observations
    Return
    ------
    loss: The log loss (negative log-likelihood)
    '''
    part1 = y.reshape(1,-1).dot(np.log(sigmoid(X.dot(beta))))
    part2 = (1-y).reshape(1,-1).dot(np.log(1-sigmoid(X.dot(beta))))
    loss = -(part1 + part2)
    return loss.item()  # return a plain scalar so approx_fprime sees a scalar-valued function
y = iris.target.values
y.shape
X = iris.loc[:, iris.columns != 'target'].values
X.shape
beta = np.random.rand(4)
beta.shape
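Before running the optimization, here is a quick sanity check of the loss function, as a sketch: sklearn.metrics.log_loss with normalize=False sums (rather than averages) the per-observation losses, so the two printed values should be essentially identical.
# Compare the hand-rolled loss with sklearn's summed log loss at the random initial beta
from sklearn.metrics import log_loss
print(logloss(beta, y, X))
print(log_loss(y, sigmoid(X.dot(beta)), normalize=False))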
Then we construct a gradient descent function to minimize the loss and obtain the optimal beta.
def gradient_descent(func, beta, step = 0.01, max_iter = 1000, stop_val = 1e-6, y=y, X=X):
    '''
    This function implements the gradient descent algorithm.
    Parameters
    ----------
    func: The function we would like to minimize
    beta: The initial parameter value
    step: The learning rate
    max_iter: The maximum number of iterations
    stop_val: The stopping criterion
    y: The label vector
    X: All the observations
    Return
    ------
    beta: The final parameter value
    '''
    # initialization
    x_matrix = np.zeros(shape=(max_iter, beta.shape[0]))
    x_matrix[0,:] = beta
    for i in range(max_iter-1):
        # calculate the (numerical) gradient
        grad = approx_fprime(beta, func, 0.00001, y, X)
        # update the parameter
        beta = beta - grad * step
        x_matrix[i+1, :] = beta
        if i % 200 == 0:
            print('Now it has iterated {0} times, and the loss is {1}'.format(i, logloss(beta, y, X)))
        # stop when the mean absolute change in the parameters is small enough
        if np.absolute(x_matrix[i+1,:] - x_matrix[i,:]).mean() < stop_val:
            break
    return beta
optimal_beta = gradient_descent(logloss,beta)
optimal_beta
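Inside gradient_descent the gradient is approximated numerically with approx_fprime. For this loss the gradient also has a simple closed form, $\nabla_\beta L = \mathbf{X}^T(\sigma(\mathbf{X}\beta)-\mathbf{y})$; the sketch below (not used above) just confirms that the numerical and analytic gradients agree at the fitted beta.
# Analytic gradient of the log loss: X^T (sigmoid(X beta) - y)
def logloss_grad(beta, y, X):
    return X.T.dot(sigmoid(X.dot(beta)) - y)

print(approx_fprime(optimal_beta, logloss, 1e-5, y, X))  # numerical gradient
print(logloss_grad(optimal_beta, y, X))                  # analytic gradient, should match closely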
Now that we have used gradient descent (GD) to obtain the optimal beta, we can evaluate our model.
I would like to use the ROC curve to assess our results and also plot the decision boundary.
logits = X.dot(optimal_beta)                 # the linear predictor (logits)
p_1 = sigmoid(logits).reshape(-1,1)          # the probability that each observation belongs to class 1
p_2 = 1 - p_1                                # the probability of class 0
y_score = np.hstack((p_1, p_2))
predict_label = (logits >= 0).astype(float)  # the predicted label when using 0.5 as our threshold
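A quick accuracy check on the training data (just a sketch; setosa and versicolor are linearly separable, so the accuracy should be at or very near 1.0):
# Training accuracy with the 0.5 threshold
print('accuracy:', (predict_label == y).mean())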
# calculate ROC curve
fpr, tpr, threshold = roc_curve(y, y_score[:,0])
roc_auc = auc(fpr, tpr)
plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc, lw=2)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()
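As one more cross-check, we can compare our class predictions with sklearn's LogisticRegression. This is only a sketch: because the two classes are linearly separable, the unregularized likelihood has no finite maximizer, so the coefficient values themselves depend on the stopping rule; a large C merely approximates no regularization, and fit_intercept=False matches our model, which has no intercept column in X.
# Cross-check the predicted labels against sklearn's LogisticRegression
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(C=1e10, fit_intercept=False, max_iter=1000)
clf.fit(X, y)
print((clf.predict(X) == predict_label).mean())  # fraction of matching labels; expect (nearly) 1.0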
class_1 = p_1[p_1 >= 0.5]   # predicted probabilities of the observations classified as class 1
class_0 = p_1[p_1 < 0.5]    # predicted probabilities of the observations classified as class 0
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter([i for i in range(len(class_1))], class_1, s=25, c='b', marker="o", label='Class_1')
ax.scatter([i for i in range(len(class_0))], class_0, s=25, c='r', marker="o", label='Class_0')
plt.legend(loc='center right')
ax.set_title("Decision Boundary")
ax.set_xlabel('Observation index within each class')
ax.set_ylabel('Predicted Probability')
plt.axhline(.5, color='black')
plt.show()