In machine-learning classification models, one common measure of model accuracy is AUC, or Area Under the Curve. The curve implied is the ROC curve. ROC stands for Receiver Operating Characteristic, a term coined by radar engineers in World War II.

While the name ROC has stuck, in machine learning it has nothing to do with radar signals or any signal receiver. It may, therefore, be better to keep the abbreviation ROC as it is without attempting to expand it. In our explanation of AUC we will keep technical terms to a minimum.

In an ROC curve, we plot the '**True Positive Rate**' on the Y-axis and the '**False Positive Rate**' on the X-axis. Let us explain both these terms. Suppose we have a dataset with 200 records. Assume that in this data, 100 records actually belong to class 1 and the other 100 records to class 0. A predictive classification model is built and its predictions are tested. Say that, of the 100 actual 1s, the model predicts 70 as 1s and the remaining 30 as 0s. The true positive rate (TPR) is then 70/100, or 0.70.

Let us now say that, of the 100 records actually belonging to class 0, the model predicts 70 as 1s and 30 as 0s. Therefore, the false positive rate (FPR) is 70/100, or 0.70. (See the first and third columns of the table below.) What happens when the FPR of 0.70 equals the TPR of 0.70? Per the TPR, we expect 70 predicted 1s from among the 100 records actually marked 1, and per the FPR, another 70 predicted 1s from among the 100 records actually marked 0; that is, a total of 140 predicted 1s. Of these 140, 50% are correct and 50% are incorrect. Now a new record is tested and comes out as 1. There is a 50% chance that it is among the true positives and a 50% chance that it is among the false positives.
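The arithmetic above can be checked with a few lines of Python (the counts are the worked example's, not real data):

```python
# Counts from the worked example above (hypothetical data).
actual_positives = 100   # records whose true class is 1
actual_negatives = 100   # records whose true class is 0

true_positives = 70      # actual 1s the model predicts as 1
false_positives = 70     # actual 0s the model predicts as 1

tpr = true_positives / actual_positives    # 70/100 = 0.70
fpr = false_positives / actual_negatives   # 70/100 = 0.70

# Of all records predicted as 1, what fraction are genuinely 1?
predicted_ones = true_positives + false_positives   # 140
p_correct = true_positives / predicted_ones         # 70/140 = 0.5

print(tpr, fpr, p_correct)  # 0.7 0.7 0.5
```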

What if the TPR were 40% and the FPR also 40%? Again there is the same 50% chance of a predicted one (1) being from either of the two groups. Thus, if FPR is plotted on the X-axis and TPR on the Y-axis, all along the diagonal we will have points where TPR and FPR are equal. Points along the diagonal represent a situation akin to making predictions by tossing an unbiased coin; that is, the classification model is random.

Now let us say TPR is 70% and FPR is 40%. A one (1) is predicted for a new record. There is a 70/(70+40) = 70/110 chance of it coming from the true-positive group and a 40/110 chance of it coming from the false-positive group. Our classification model is now an improvement over tossing a coin. This point, (FPR, TPR) = (0.4, 0.7), lies above the diagonal. And this model is certainly still better than one with an (FPR, TPR) combination of (0.4, 0.6). That is, the more vertically above the diagonal a point is placed, the better the classification model.

Consider now the opposite: an (FPR, TPR) point of (0.7, 0.4). A one (1) is predicted. There is a greater chance (70/110) that it belongs to the false-positive group than to the true-positive group (40/110). This situation is worse than a model on the diagonal with its 50:50 chance. The point (0.7, 0.4) lies below the diagonal.
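These posterior chances can be wrapped in a small helper (a sketch; the default class sizes of 100 each are the example's assumption):

```python
def p_true_positive(tpr, fpr, n_pos=100, n_neg=100):
    """Chance that a record predicted as 1 really belongs to class 1,
    given the model's TPR and FPR and the actual class sizes."""
    tp = tpr * n_pos   # expected true positives
    fp = fpr * n_neg   # expected false positives
    return tp / (tp + fp)

print(p_true_positive(0.7, 0.4))  # 70/110, about 0.636 (above the diagonal)
print(p_true_positive(0.4, 0.7))  # 40/110, about 0.364 (below the diagonal)
```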

All models with points below the diagonal perform worse than a model that makes predictions randomly.

What if our initial dataset contained 100 ones (1s) and 50 zeros (0s) instead of 100 ones and 100 zeros? What would the diagonal represent? In this case the diagonal would still consist of points with equal TPR and FPR, but all along it the probability of a predicted one coming from the true-positive group would be 2/3 and that of coming from the false-positive group would be 1/3; this can again be simulated by a random process, say, a biased coin.
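With these unequal class sizes, the same arithmetic gives the 2/3 vs 1/3 split at any point on the diagonal (the rate of 0.6 below is an arbitrary choice):

```python
n_ones, n_zeros = 100, 50
rate = 0.6  # any common TPR = FPR value along the diagonal

tp = rate * n_ones   # expected true positives  (60)
fp = rate * n_zeros  # expected false positives (30)

p_true = tp / (tp + fp)   # 60/90 = 2/3
p_false = fp / (tp + fp)  # 30/90 = 1/3
print(p_true, p_false)
```

Note that `rate` cancels out, which is why the split is the same everywhere on the diagonal.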

**How an ROC curve is drawn**: Suppose you have built a classification model with a TPR of 0.70 and an FPR of 0.30, so the model stands well above the diagonal. All binary classification models judge a record to be in one class or the other based on a certain threshold applied to the model's output. Suppose you now decrease this threshold. Records that were earlier classified as zeros may then be reclassified as ones. That is, the total number of predicted 1s will increase, while the total number of predicted zeros cannot increase, though it may decrease. Consequently, both TPR and FPR will tend to increase. See the following table.

Actual class | Model output | Predicted class (cut-off 0.5) | Predicted class (cut-off 0.45)
---|---|---|---
1 | 0.60 | 1 | 1
1 | 0.50 | 1 | 1
1 | 0.50 | 1 | 1
1 | 0.45 | 0 | 1
1 | 0.44 | 0 | 0
— | — | — | —
0 | 0.50 | 1 | 1
0 | 0.50 | 1 | 1
0 | 0.45 | 0 | 1
0 | 0.45 | 0 | 1
0 | 0.40 | 0 | 0
— | — | TPR = 3/5, FPR = 2/5 | TPR = 4/5, FPR = 4/5
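The table's last row can be reproduced with a short script (the ten records are those listed above; a score at or above the cut-off is predicted as 1):

```python
# The ten (actual class, model output) pairs from the table above.
records = [
    (1, 0.60), (1, 0.50), (1, 0.50), (1, 0.45), (1, 0.44),
    (0, 0.50), (0, 0.50), (0, 0.45), (0, 0.45), (0, 0.40),
]

def rates(records, threshold):
    """TPR and FPR when scores >= threshold are predicted as 1."""
    tp = sum(1 for y, s in records if y == 1 and s >= threshold)
    fp = sum(1 for y, s in records if y == 0 and s >= threshold)
    pos = sum(1 for y, _ in records if y == 1)
    neg = sum(1 for y, _ in records if y == 0)
    return tp / pos, fp / neg

print(rates(records, 0.50))  # (0.6, 0.4), i.e. TPR = 3/5, FPR = 2/5
print(rates(records, 0.45))  # (0.8, 0.8), i.e. TPR = 4/5, FPR = 4/5
```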

TPR may increase (or remain the same) because an actual one (1) that was wrongly classified as zero (0) under the higher cut-off may, with the reduced threshold, be correctly classified as 1, increasing the number of true positives. FPR will increase because an actual zero that was being correctly classified as zero may get reclassified as 1, creating another false positive. As both FPR and TPR increase, our point on the ROC plane moves up and to the right. Similarly, increasing the threshold will decrease both TPR and FPR, and the point will move down and to the left (in the diagram above, point 'a' moves to point 'b'). An ROC curve is drawn by varying the value of this threshold. The shape of the ROC curve depends upon how discriminating the model's output is.
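Sweeping the threshold over the table's data traces out the ROC points directly (a sketch; the starting threshold of 1.01 is chosen to sit above every score, giving the corner point (0, 0)):

```python
# Same ten (actual class, model output) pairs as in the table above.
records = [
    (1, 0.60), (1, 0.50), (1, 0.50), (1, 0.45), (1, 0.44),
    (0, 0.50), (0, 0.50), (0, 0.45), (0, 0.45), (0, 0.40),
]

def roc_point(threshold):
    """(FPR, TPR) when scores >= threshold are predicted as 1."""
    tp = sum(1 for y, s in records if y == 1 and s >= threshold)
    fp = sum(1 for y, s in records if y == 0 and s >= threshold)
    return (fp / 5, tp / 5)  # 5 actual 1s and 5 actual 0s

# Sweep the threshold from above the highest score down to the lowest.
thresholds = [1.01] + sorted({s for _, s in records}, reverse=True)
points = [roc_point(t) for t in thresholds]
print(points)
# [(0.0, 0.0), (0.0, 0.2), (0.4, 0.6), (0.8, 0.8), (0.8, 1.0), (1.0, 1.0)]
```

Lowering the threshold step by step moves the point up and to the right, exactly as described above.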

The more an ROC curve is lifted up and away from the diagonal, the better the model. Since the X-axis scale runs at most to 1 (FPR varies from 0 to 100%, or 1) and the Y-axis scale also runs at most to 1 (TPR varies from 0 to 100%, or 1), the total area of the enclosing rectangle (actually a square) is 1 × 1 = 1. The more the ROC curve is lifted away from the diagonal (and hugs the top horizontal line, TPR = 1), the larger the area under it, approaching 1. Thus, the maximum possible area under an ROC curve is 1. If the ROC curve coincides with the diagonal, the area under it is 0.5 (a diagonal divides a square in half). We have just learnt that an ROC curve along the diagonal corresponds to a random classification model. The area under the (ROC) curve is known as AUC. This area, therefore, should be greater than 0.5 for a model to be acceptable; a model with an AUC of 0.5 or less is of no use as it stands. Understandably, this area is a measure of the predictive accuracy of the model.
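The AUC can be computed from the ROC points with the trapezoidal rule; using the points traced from the table's data gives about 0.64 (the points below are those the table yields, not real-model output):

```python
# ROC points (FPR, TPR) from the table's data, threshold swept high to low.
points = [(0.0, 0.0), (0.0, 0.2), (0.4, 0.6), (0.8, 0.8), (0.8, 1.0), (1.0, 1.0)]

# Trapezoidal rule: sum the area of each vertical strip under the curve.
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(auc)  # about 0.64, comfortably above the 0.5 of a random model
```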

You may have a look at the two ROC curves in Wikipedia.

Tags: area under curve, machine learning model accuracy, predictive accuracy of classification model, ROC, ROC curve, TPR, understanding ROC curve, what is an AUC

June 12, 2017 at 10:26 pm |

If a model results in an AUC less than 0.5, wouldn't just flipping the binary decision of the model result in an acceptable model?

April 11, 2018 at 2:03 pm |

Yes, if you have a model with an AUC under 0.5, just take the opposite of what the model predicts and you have a better model with AUC > 0.5.

Whether the model is acceptable depends on how well you expect your model to perform, which depends on the problem you are working on.
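The flip in the reply above can be verified numerically. The snippet below (toy scores) computes AUC via its rank-statistic interpretation, i.e. the probability that a randomly chosen positive outscores a randomly chosen negative, which equals the area under the ROC curve:

```python
def auc(records):
    """AUC as the chance a random positive outscores a random negative
    (ties count half). Equal to the area under the ROC curve."""
    pos = [s for y, s in records if y == 1]
    neg = [s for y, s in records if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A deliberately bad scorer: every actual 0 outscores every actual 1.
bad = [(1, 0.2), (1, 0.3), (0, 0.8), (0, 0.9)]
flipped = [(y, -s) for y, s in bad]  # negate scores to reverse the ranking

print(auc(bad), auc(flipped))  # 0.0 1.0
```

Flipping the ranking turns an AUC of a into 1 − a, so a model below the diagonal becomes one above it.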

July 26, 2018 at 5:17 am |

Ultimate explanation

October 3, 2018 at 10:58 am |

[…] oceanic conditions, we can predict where species are most likely to be in real time. When we tested the models, we found they performed well in distinguishing between where species were or were not […]


August 5, 2019 at 4:04 pm |

[…] getting too technical, AUC measures how much more likely the AI solution is to correctly classify a positive result (say, to correctly detect a pulmonary embolism in a scan) versus how likely the […]
