Evaluating Classification Models with ROC AUC and Log Loss

Classification models output discrete classes, typically 0 or 1. But sometimes, instead of predicting classes directly, computing probabilities can provide deeper insight into the predictions, and those probabilities can be used to evaluate a classification model. In this tutorial, you will learn about ROC AUC and Log Loss (Cross-Entropy) for evaluating classification models.

Before going through this tutorial, I hope you are familiar with terms like TP, TN, FP, FN, confusion matrix, precision, recall, sensitivity, and specificity. If you are not familiar with these terms, I suggest you go through this post first, because you need to understand them before you can understand ROC AUC and Log Loss.

ROC AUC (Receiver Operating Characteristic / Area Under the Curve)

As you know, probability ranges from 0 to 1. So, to find an optimal threshold value, we would need to calculate a confusion matrix for each threshold and check which threshold gives us a high True Positive Rate. But this is not a feasible approach by hand, because there are many possible threshold values between 0 and 1. So, is there another way?

Yes. The ROC graph is the solution to this problem. ROC provides a simple way to summarise the confusion matrix information for every threshold value in a single graph. The Y-axis represents the True Positive Rate (Sensitivity) and the X-axis represents the False Positive Rate (1 - Specificity).

True Positive Rate (Sensitivity)

The True Positive Rate is the proportion of samples correctly classified as positive, out of all positive samples: TPR = TP / (TP + FN).

False Positive Rate (1 - Specificity)

The False Positive Rate is the proportion of samples that were actually negative but were incorrectly classified as positive, out of all negative samples: FPR = FP / (FP + TN).

Low values on the x-axis indicate a low False Positive Rate and a high True Negative Rate.

High values on the y-axis indicate a high True Positive Rate and a low False Negative Rate.
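
To make the threshold-sweeping idea above concrete, here is a minimal sketch of the brute-force approach; the labels and probabilities are made up purely for illustration:

import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predicted probabilities, for illustration only
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.20, 0.90, 0.55])

# Compute TPR and FPR at a few candidate thresholds
for t in [0.3, 0.5, 0.7]:
    y_pred = (y_prob >= t).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(f"threshold={t}: TPR={tp / (tp + fn):.2f}, FPR={fp / (fp + tn):.2f}")

scikit-learn's roc_curve, used below, automates exactly this sweep over every distinct threshold.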

Let's first implement it in Python; then you will get a better understanding of ROC.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
import matplotlib.pyplot as plt

## Generating a dataset with 2 classes
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)

## Splitting the dataset into train and test sets
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.5, random_state=2)

## Creating the model
lr = LogisticRegression()
lr.fit(train_X, train_y)

lr_probs = lr.predict_proba(test_X)

## Keeping probabilities only for the positive (1) class
lr_probs = lr_probs[:, 1]

## Calculating the score
lr_auc_score = roc_auc_score(test_y, lr_probs)

## Printing the score
print("Logistic ROC Score: " + str(lr_auc_score))

## Plotting the ROC curve
fpr, tpr, threshold = roc_curve(test_y, lr_probs)
plt.plot([0, 1], [0, 1], linestyle="--", label="Generalized Model Curve")
plt.plot(fpr, tpr, marker=".", label="ROC Curve")
plt.legend()
plt.show()

OUTPUT: Logistic ROC Score: 0.902804487179487

In the first 6 lines, we import the necessary functions and classes. make_classification generates random classification data with 2 classes and 1000 sample points. Next, we split the data into train and test sets with the train_test_split function. predict_proba is a method that returns probabilities instead of predicting classes as 0 or 1. In ROC, we only focus on the positive class, so we extract the probabilities of the positive class with lr_probs = lr_probs[:, 1]. Finally, we call roc_curve, which returns the fpr, tpr, and threshold arrays, and we pass fpr and tpr to matplotlib.pyplot to plot them against each other.

ROC AUC Curve

This is how a ROC curve looks. The blue diagonal line is a generalised ROC curve: it represents a model with no real discriminative skill, where the True Positive Rate always equals the False Positive Rate. Hence, the area above the curve = the area below the curve = 0.5.

We want this Area Under the Curve (AUC) to be close to 1. Notice that the code outputs roc_score = 0.9028, which is a very good score.
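
As a quick sanity check, a predictor with no skill, one that assigns the same probability to every sample, scores exactly 0.5. This is a minimal sketch continuing from the test_y variable above (the numpy import is added here):

import numpy as np
from sklearn.metrics import roc_auc_score

# A no-skill predictor: the same score for every test sample
no_skill_probs = np.full(len(test_y), 0.5)
print(roc_auc_score(test_y, no_skill_probs))  # prints 0.5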

Notice the 3 black points in the graph. These are points with very high values on the Y-axis (True Positive Rate). Depending on your business case, you can choose any threshold you want. Take the first point, where x = 0.3549 and y = 0.9570: the True Positive Rate is about 95% and the False Positive Rate is about 35%. Now look at the third point, where x = 0.5732 and y = 0.9863. This threshold gives a True Positive Rate of about 98%, but a False Positive Rate of about 57%, which is higher than the first point's 35%.

So, it all depends on your business case and how many False Positives you can accept. Choose a threshold that suits your business needs. In this case, you might choose the threshold corresponding to the first point, which has a high True Positive Rate and a low False Positive Rate, as the sketch below shows.
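
Here is a minimal sketch of that kind of threshold selection, continuing from the fpr, tpr, and threshold arrays returned by roc_curve above; the 40% cap on the False Positive Rate is an arbitrary stand-in for a real business constraint:

import numpy as np

# Business rule (illustrative): accept at most a 40% False Positive Rate
acceptable = fpr <= 0.40

# Among acceptable points, pick the one with the highest True Positive Rate
best_idx = np.argmax(np.where(acceptable, tpr, 0.0))
print("Chosen threshold:", threshold[best_idx])
print("TPR:", tpr[best_idx], "FPR:", fpr[best_idx])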

Why is ROC AUC useful?

ROC AUC is useful because we can directly compare the performance of one model to another. Suppose you have 2 models, one a Logistic Regression and the other an SVM. You can compare the ROC AUC of both models; the model with the higher Area Under the Curve is the better one.
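
As a sketch of such a comparison, reusing the train/test split from the ROC section (the SVM here is added purely for illustration; its decision_function scores can be passed to roc_auc_score just like probabilities):

from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

# Train an SVM on the same split used for the logistic regression model
svm = SVC()
svm.fit(train_X, train_y)

# Score both models with ROC AUC and compare
svm_auc_score = roc_auc_score(test_y, svm.decision_function(test_X))
print("Logistic Regression ROC AUC:", lr_auc_score)
print("SVM ROC AUC:", svm_auc_score)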

LOG-LOSS

Suppose a model predicts a sample as 1 (positive), but when you look at the probability, you find it is just 0.51, barely above 0.50. Since the probability is above 0.50, the default threshold for classification, the sample is classified as 1 (positive). But do you think the prediction is reliable? Just because the probability is above 0.50 doesn't mean the actual output will be 1, because the probability is only slightly above the threshold.

In log loss, we look at the model's confidence in predicting the outcome. In the case above, although the model predicts 1, it is not very confident about its prediction, as the probability is close to the 0.50 threshold.
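
A small sketch makes this concrete: two sets of predictions that are equally "accurate" at the 0.50 threshold but differ sharply in confidence, and therefore in log loss (the probabilities are invented for illustration):

from sklearn.metrics import log_loss

y_true = [1, 1, 1, 0]

# Both sets classify every sample correctly at a 0.50 threshold...
barely_confident = [0.51, 0.52, 0.51, 0.49]
very_confident = [0.95, 0.97, 0.96, 0.03]

# ...but log loss rewards the confident, correct predictions
print(log_loss(y_true, barely_confident))  # roughly 0.67
print(log_loss(y_true, very_confident))    # roughly 0.04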

Log loss is a function in which each predicted probability is compared to the actual class (0 or 1), and a score is calculated based on the distance between the actual and predicted output. Because we use the logarithm, the penalty grows on a logarithmic scale: the score is small when the predicted probability is close to the actual output, and it grows rapidly as the distance between them increases.

Log loss for binary classification is given by the following formulas:

1. When the Actual Class is 1

-log(ĥ(x)); when y = 1

ĥ(x) is the predicted probability for sample x, when the actual output is 1.

If we plot this function, with the x-axis as the probability (confidence) of the prediction and the y-axis as the cost (loss), it looks like this:

Log loss when the Actual Output is 1

From the graph, you can see that when the actual output is 1 and the predicted probability is also 1, the cost is 0 and the confidence is 1. But when the actual output is 1 and the predicted probability approaches 0, the cost grows toward its maximum and the confidence is 0.

2. When the Actual Class is 0

-log(1 - ĥ(x)); when y = 0

ĥ(x) is the predicted probability for sample x, when the actual output is 0.

If we plot this function, with the x-axis as the probability (confidence) of the prediction and the y-axis as the cost (loss), it looks like this:

Log loss when the Actual Output is 0

From the graph, you can see that when the actual output is 0 and the predicted probability is also 0, the cost is 0 and the confidence is 1. But when the actual output is 0 and the predicted probability approaches 1, the cost grows toward its maximum and the confidence is 0.

The above two log-loss functions can be combined into a single expression:

-y log(ĥ(x)) - (1 - y) log(1 - ĥ(x))

The above formula is the log-loss (cross-entropy) function for binary classification; in practice, the score for a dataset is the average of this quantity over all samples. Its graph looks like this:

Log loss function

Notice the crossover at 0.5, and that as predictions move away from the actual output, the cost rises and the probability (confidence) in the prediction falls.

Log-loss implementation in Python

Scikit-learn provides a log_loss function in its metrics module to calculate the log-loss score.

from sklearn.metrics import log_loss
from matplotlib import pyplot

# Creating 101 predicted probabilities, evenly spaced from 0.0 to 1.0
yhat = [x * 0.01 for x in range(0, 101)]

# Calculating the loss when the actual output is 0
losses_0 = [log_loss([0], [x], labels=[0, 1]) for x in yhat]
# Calculating the loss when the actual output is 1
losses_1 = [log_loss([1], [x], labels=[0, 1]) for x in yhat]

# Plotting predicted probability against loss
pyplot.plot(yhat, losses_0, label='true=0')
pyplot.plot(yhat, losses_1, label='true=1')
pyplot.xlabel("Probability")
pyplot.ylabel("Cost")
pyplot.legend()
pyplot.show()
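
Finally, to tie the two metrics together, here is a one-line sketch that scores the logistic regression model from the ROC section with log loss, reusing the test_y and lr_probs variables defined there:

from sklearn.metrics import log_loss

# lr_probs holds the predicted probabilities of the positive class from earlier
print("Logistic Regression log loss:", log_loss(test_y, lr_probs))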

That's all for this tutorial. I hope I have explained all the concepts clearly. If you have any doubts or suggestions, feel free to comment.

Thank You.

Amarjeet
