This blog aims to answer the following questions:
- What is a confusion matrix and why do you need it?
- What are the types of errors in a confusion matrix?
- How is the confusion matrix connected with cyber security?
What is a Confusion Matrix and why do you need it?
A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. For a binary classifier it is a table with 4 different combinations of predicted and actual values; with N classes it grows to an N x N table.
It is extremely useful for measuring Recall, Precision, Specificity and Accuracy, and the ROC curve (and its AUC) is built from the same counts taken at different decision thresholds.
Let’s understand TP, FP, FN, TN
◼ True Positive:
Interpretation: You predicted positive and it’s true.
◼ True Negative:
Interpretation: You predicted negative and it’s true.
◼ False Positive: (Type 1 Error)
Interpretation: You predicted positive and it’s false.
◼ False Negative: (Type 2 Error)
Interpretation: You predicted negative and it’s false.
Just remember: Positive and Negative describe the predicted value, while True and False describe whether that prediction matches the actual value.
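The four cases above can be counted directly from a list of actual labels and a list of predicted labels. The following is a minimal sketch, not a library implementation: the function name `confusion_counts`, the sample labels, and the convention that `1` means positive are all assumptions made for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN and TN for a binary classifier.

    Assumption: any label other than `positive` is treated as negative.
    """
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, and it's true
        elif p == positive and a != positive:
            fp += 1  # predicted positive, and it's false
        elif p != positive and a == positive:
            fn += 1  # predicted negative, and it's false
        else:
            tn += 1  # predicted negative, and it's true
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Made-up example labels (1 = positive, 0 = negative).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
```

Each pair of labels lands in exactly one of the four cells, so the four counts always add up to the number of samples.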
What can we learn from this?
A valid question is what we can do with this matrix. Several important metrics are derived from it:
Precision :
It is the portion of the model's positive predictions that are actually positive. In other words, out of all the results the model reported as positive, it is the share that really are positive. Therefore, its formula is TP / (TP + FP).
Recall :
It is the portion of actual positive values that the model correctly identifies as positive. It is also termed True Positive Rate or Sensitivity. Its formula is TP / (TP + FN).
F-1 Score :
It is the harmonic mean of Precision and Recall. Because the harmonic mean is pulled toward the smaller of the two values, a model cannot score well by being strong on only one of them. When comparing two models, this metric therefore accounts for both False Positives and False Negatives at the same time. Its formula is 2 * Precision * Recall / (Precision + Recall).
Accuracy :
It is the portion of values that are identified correctly, irrespective of whether they are positives or negatives. It means that all True Positives and True Negatives are included in this. The formula for this is (TP + TN) / (TP + TN + FP + FN).
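The formulas above can be checked with a few lines of code. This is a small sketch under stated assumptions: the function name `classification_metrics` and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the formulas themselves define.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute Precision, Recall, F-1 Score and Accuracy from the four counts."""

    def ratio(numerator, denominator):
        # Assumption: report 0.0 instead of failing when there is nothing to divide by.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these example counts, Precision is 40/50 = 0.8 and Recall is 40/60, roughly 0.667. The F-1 Score of about 0.727 sits closer to the lower of the two, which is the effect of the harmonic mean.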
Confusion Matrix in Cyber Crime :
Cyber attacks are becoming a critical issue for organizational information systems. A number of cyber attack detection and classification methods have been introduced, with different levels of success, as countermeasures to preserve data integrity and system availability. Classifying attacks against computer networks is becoming a harder problem in the field of network security, and a confusion matrix shows exactly which kinds of mistakes a detection model makes.
Type I Error :
This type of error is not very dangerous: the system is safe in reality, but the model predicted an attack. The team gets notified and checks for any malicious activity. This does not cause any direct harm. These False Positive cases can be termed False Alarms.
Type II Error :
This type of error can prove to be very dangerous. The model predicted no attack, but in reality an attack takes place. In that case no notification reaches the security team and nothing can be done to prevent it. These False Negative cases are the ones a detection model should aim to minimize.
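To make the trade-off between false alarms and missed attacks concrete, here is a rough sketch of a detector that flags traffic as an attack whenever its score reaches a threshold. Everything in it is hypothetical: the labels, the scores, the thresholds and the function name `alarms_and_misses` are invented for illustration and do not come from any real detection system.

```python
# Hypothetical data: 1 = attack, 0 = normal traffic.
actual_labels = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
# Hypothetical model scores: higher means "more likely an attack".
attack_scores = [0.9, 0.6, 0.3, 0.2, 0.7, 0.1, 0.4, 0.8, 0.05, 0.55]


def alarms_and_misses(actual, scores, threshold):
    """Return (false_alarms, missed_attacks) for a given alert threshold."""
    false_alarms = 0    # normal traffic flagged as an attack (False Positive)
    missed_attacks = 0  # real attack that was not flagged (False Negative)
    for label, score in zip(actual, scores):
        flagged = score >= threshold
        if flagged and label == 0:
            false_alarms += 1
        elif not flagged and label == 1:
            missed_attacks += 1
    return false_alarms, missed_attacks


for threshold in (0.25, 0.5, 0.75):
    false_alarms, missed_attacks = alarms_and_misses(
        actual_labels, attack_scores, threshold
    )
    print(f"threshold={threshold}: "
          f"false alarms={false_alarms}, missed attacks={missed_attacks}")
```

On this toy data, a low threshold produces more false alarms but misses no attacks, while a high threshold silences the false alarms at the cost of letting attacks through. Since a missed attack is the costlier mistake, a security team would usually accept some extra false alarms to keep missed attacks low.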