Logistic Regression

Fan Gong 1/16/2018

This notebook shows how to construct a binary logistic regression model from scratch.

We will again use the classic Iris dataset as our sample data.

In [317]:
# Load some libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from scipy.optimize import approx_fprime
from sklearn.metrics import roc_curve, auc
In [318]:
# Transform the sklearn Bunch into a pandas DataFrame
iris = datasets.load_iris()
iris = pd.DataFrame(np.hstack((iris.data, iris.target.reshape(-1,1))),
                   columns = iris.feature_names + ['target'])
In [319]:
# Rename the columns and keep only the data for the first two species
iris.rename(columns = {'sepal length (cm)':'sepal_l',
                      'sepal width (cm)': 'sepal_w',
                      'petal length (cm)': 'petal_l',
                      'petal width (cm)': 'petal_w'}, inplace = True)
iris = iris.query('target in [0.0, 1.0]')

1. Sigmoid Function

This part shows the sigmoid function, $\sigma(x) = 1/(1+e^{-x})$, which maps any real number into the interval $(0, 1)$.

In [320]:
# Define the sigmoid function
def sigmoid(x):
    y = 1/(1+np.exp(-x))
    return y
In [321]:
X = np.random.uniform(low=-10.0, high=10.0, size =10000)
y = sigmoid(X)

plt.scatter(X,y)
plt.axvline(x=0)
plt.title('sigmoid function')
plt.show()
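
As a quick sanity check (an addition, not one of the original cells), we can verify that $\sigma(0) = 0.5$ and that the derivative satisfies $\sigma'(x) = \sigma(x)(1-\sigma(x))$, a property used when deriving the gradient of the log loss:

# Sanity check: sigmoid(0) = 0.5 and sigma'(x) = sigma(x) * (1 - sigma(x))
print(sigmoid(0))                                            # 0.5
x0 = np.array([1.5])
numeric_grad = approx_fprime(x0, lambda v: sigmoid(v[0]), 1e-6)
analytic_grad = sigmoid(x0) * (1 - sigmoid(x0))
print(numeric_grad, analytic_grad)                           # should be (almost) equal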

2. Log loss for logistic regression

In this part I construct the loss function. For binary logistic regression, the loss is the negative log-likelihood:

$$L(\beta)=-\log L(\mathbf{y},\mathbf{X}|\beta)=-\sum_{i=1}^{n}\left[y_i\log(\sigma(x_i \beta))+(1-y_i)\log(1-\sigma(x_i \beta))\right]$$

Rewriting the above formula in matrix form, it becomes:

$$L(\beta)=-\left(\mathbf{y}^T \log(\sigma(\mathbf{X}\beta)) + (1-\mathbf{y})^T \log(1-\sigma(\mathbf{X}\beta))\right)$$

Here $\mathbf{y}$ is an $n \times 1$ column vector, $\mathbf{X}$ is an $n \times p$ matrix (where $n$ is the number of observations and $p$ the number of independent variables), and $\beta$ is a $p \times 1$ column vector.

In [322]:
def logloss(beta, y, X):
    '''
    This function calculates the log loss of a binary logistic regression model.
    
    Parameters
    ----------
    y: An n x 1 column vector
    X: An n x p matrix
    beta: A p x 1 column vector
    
    Returns
    -------
    loss: The log loss
    '''
    # y^T * log(sigmoid(X beta))
    part1 = y.reshape(1,-1).dot(np.log(sigmoid(X.dot(beta))))
    # (1 - y)^T * log(1 - sigmoid(X beta))
    part2 = (1-y).reshape(1,-1).dot(np.log(1-sigmoid(X.dot(beta))))
    
    loss = - (part1 + part2)
    return loss
In [323]:
y = iris.target.values
y.shape
Out[323]:
(100,)
In [324]:
X = iris.loc[:, iris.columns != 'target'].values
X.shape
Out[324]:
(100, 4)
In [325]:
beta = np.random.rand(4)
beta.shape
Out[325]:
(4,)
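
Before optimizing, a quick cross-check (an addition to the original cells): with normalize=False, sklearn's log_loss returns the same unnormalized sum of per-sample losses, so the two values should agree:

from sklearn.metrics import log_loss

# Compare our matrix-based loss with sklearn's unnormalized log loss
print(logloss(beta, y, X))
print(log_loss(y, sigmoid(X.dot(beta)), normalize=False))  # should be (almost) identical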

3. Gradient Descent for Optimization

Next, we construct a gradient descent function to minimize the loss and obtain the optimal beta.
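
Below, the gradient is approximated numerically with scipy's approx_fprime. For reference, the log loss also has a simple closed-form gradient,

$$\nabla L(\beta) = \mathbf{X}^T(\sigma(\mathbf{X}\beta) - \mathbf{y})$$

so a small helper (an addition, not part of the original cells) can be used to double-check the numerical gradient:

def logloss_grad(beta, y, X):
    # Analytic gradient of the log loss: X^T (sigmoid(X beta) - y)
    return X.T.dot(sigmoid(X.dot(beta)) - y)

# The numerical and analytic gradients should agree up to finite-difference error
print(approx_fprime(beta, lambda b: logloss(b, y, X).item(), 1e-5))
print(logloss_grad(beta, y, X))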

In [333]:
def gradient_descent(func, beta, step = 0.01, max_iter = 1000, stop_val = 1e-6, y=y, X=X):
    '''
    This function implements the gradient descent algorithm
    
    Parameters
    ----------
    func: The function we would like to minimize with gradient descent
    beta: The initial parameter value
    step: The learning rate
    max_iter: The maximum number of iterations
    stop_val: The stopping criterion (minimum average parameter change)
    y: The label vector
    X: All the observations
    
    Return
    ------
    beta: The final parameter value
    '''
    
    # initialization
    x_matrix = np.zeros(shape=(max_iter, beta.shape[0]))
    x_matrix[0,:] = beta
    
    
    for i in range(max_iter-1):
        
        # calculate gradient
        grad = approx_fprime(beta, func, 0.00001, y, X)
        
        # update the parameter
        beta = beta - grad * step
        
        x_matrix[i+1, :] = beta
        
        if i % 200 == 0:
            print('Now it has iterated {0} times, and the loss is {1}'.format(i, logloss(beta,y,X)))
        
        if np.absolute((x_matrix[i+1,:] - x_matrix[i,:]).mean()) < stop_val:
            break 
    
    return beta
In [334]:
optimal_beta = gradient_descent(logloss,beta)
optimal_beta
Now it has iterated 0 times, and the loss is [ 729.06781017]
Now it has iterated 200 times, and the loss is [ 0.33350056]
Now it has iterated 400 times, and the loss is [ 0.1963901]
Now it has iterated 600 times, and the loss is [ 0.14095517]
Now it has iterated 800 times, and the loss is [ 0.11064282]
Out[334]:
array([-1.14648903, -2.74273989,  4.59965078,  2.71457418])

4. Model Evaluation

Now that we have used gradient descent to obtain the optimal beta, we can evaluate our model.

I would like to use an ROC curve to assess the results and also plot the decision boundary.

In [265]:
logits = X.dot(optimal_beta) # the linear predictor (logits)
p_1 = sigmoid(logits).reshape(-1,1) # the probability that each observation belongs to class 1
p_2 = 1-p_1 # the probability of belonging to class 0
y_score = np.hstack((p_1,p_2))
predict_label = (logits >= 0).astype(float) # predicted label using 0.5 as the threshold (logit >= 0 <=> probability >= 0.5)
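
As an extra sanity check (not one of the original cells), the training accuracy under the 0.5 threshold is:

# Proportion of correctly classified training observations
accuracy = (predict_label == y).mean()
print('Training accuracy: {0}'.format(accuracy))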
In [345]:
# calculate ROC curve
fpr, tpr, threshold = roc_curve(y, y_score[:,0])
roc_auc = auc(fpr, tpr)
In [351]:
plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc, lw=2)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()
In [375]:
class_1 = p_1[p_1 >= 0.5]  # predicted probabilities for observations assigned to class 1
class_0 = p_1[p_1 < 0.5]   # predicted probabilities for observations assigned to class 0
In [378]:
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter([i for i in range(len(class_1))], class_1, s=25, c='b', marker="o", label='Class_1')
ax.scatter([i for i in range(len(class_0))], class_0, s=25, c='r', marker="o", label='Class_0')
plt.legend(loc='center right');
ax.set_title("Decision Boundary")
ax.set_xlabel('N/2')
ax.set_ylabel('Predicted Probability')
plt.axhline(.5, color='black')
plt.show()
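
Finally, as a rough cross-check (an addition to the original notebook), we can compare the coefficients with sklearn's LogisticRegression. sklearn applies L2 regularization by default, so a large C approximates the unregularized fit, and fit_intercept=False matches our model, which has no intercept term. Because the two classes are linearly separable, the unregularized coefficients are not uniquely determined, so only the general direction of the estimates is comparable:

from sklearn.linear_model import LogisticRegression

# Large C ~ (almost) no regularization; no intercept to match our design matrix
clf = LogisticRegression(C=1e6, fit_intercept=False)
clf.fit(X, y)
print(clf.coef_)
print(optimal_beta)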