Confusion Matrix Calculator - Accuracy, Precision, Recall, and F1
Use this confusion matrix calculator with TP, FN, FP, and TN counts to compute accuracy, precision, recall, F1, and the Matthews correlation coefficient.
What Is a Confusion Matrix Calculator?
A confusion matrix calculator turns the four-cell summary of a binary classification result into the standard evaluation metrics used in machine learning, statistics, and academic work. Type the true positives, false negatives, false positives, and true negatives, and the tool returns accuracy, precision, recall, F1, specificity, false positive and false negative rates, prevalence, and the Matthews correlation coefficient in one step.
- • Grading a homework classifier: Compare a student logistic regression against a target confusion matrix from a textbook without retyping the formulas.
- • Evaluating a medical screening tool: Check sensitivity, specificity, and the Matthews correlation coefficient where false negatives carry real cost.
- • Tuning a spam or fraud filter: See how a new threshold changes precision, recall, and F1 when the team debates flagging more messages.
- • Comparing two model versions: Drop the confusion matrix counts from each candidate model side by side and compare every metric at once.
Most reports print a 2-by-2 matrix of TP, FN, FP, and TN, but the derived metrics are scattered across dashboards. A single page that takes the four counts and turns them into the standard ratios makes reading a model quicker.
The top row (TP and FN) covers actual positive cases, and the bottom row (FP and TN) covers actual negative cases. The left column (TP and FP) holds the model's positive predictions, and the right column (FN and TN) holds the negative ones.
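As a rough sketch of how the four cells arise, the counts can be tallied from paired actual and predicted labels. The function name `tally_confusion` and the sample labels are illustrative assumptions, not part of the calculator.

```python
def tally_confusion(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN for binary labels; `positive` marks the positive class."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1  # actual positive, predicted positive
            else:
                fn += 1  # actual positive, predicted negative
        else:
            if predicted == positive:
                fp += 1  # actual negative, predicted positive
            else:
                tn += 1  # actual negative, predicted negative
    return tp, fn, fp, tn


# Hypothetical labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fn, fp, tn = tally_confusion(y_true, y_pred)

# Print in the layout described above: rows are actual, columns are predicted.
print(f"actual positive: TP={tp} FN={fn}")
print(f"actual negative: FP={fp} TN={tn}")
```

Some libraries arrange the cells in a different order, so it is worth checking the layout before copying counts across.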
For a broader view of descriptive statistics on the same dataset, the Statistics Calculator accepts a raw list of values and reports mean, median, variance, and standard deviation alongside it.
How the Confusion Matrix Calculator Works
The calculator reads the four cell counts, sums them into a total, and divides the right combinations to produce each ratio. Denominators of zero are treated as undefined so the page does not silently misreport metrics that have no meaning in your matrix.
- TP: True positives: actual positives predicted as positive.
- FN: False negatives: actual positives predicted as negative.
- FP: False positives: actual negatives predicted as positive.
- TN: True negatives: actual negatives predicted as negative.
- N: Total samples: TP + FN + FP + TN.
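The definitions above can be sketched in a few lines. This is a minimal illustration, not the calculator's actual source: the names `safe_div` and `basic_metrics` are assumptions, and `None` stands in for "undefined" when a denominator is zero.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def basic_metrics(tp, fn, fp, tn):
    """Derive the standard ratios from the four cell counts."""
    n = tp + fn + fp + tn
    return {
        "n": n,
        "accuracy": safe_div(tp + tn, n),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        # Equivalent to the harmonic mean of precision and recall.
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "specificity": safe_div(tn, tn + fp),
        "false_positive_rate": safe_div(fp, fp + tn),
        "false_negative_rate": safe_div(fn, fn + tp),
        "prevalence": safe_div(tp + fn, n),
    }


for name, value in basic_metrics(tp=5, fn=1, fp=4, tn=90).items():
    print(name, value)
```

Returning `None` rather than 0 keeps an undefined precision (no predicted positives) visibly different from a genuinely zero one.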
Precision answers 'when the model predicts positive, how often is it right?' and recall answers 'of all actual positives, how many did the model catch?' Raising one usually lowers the other, which is why the F1 score, the harmonic mean of the two, is so widely used.
The Matthews correlation coefficient (MCC) is harder to compute by hand because it uses all four cells and the square root of a product of four sums in the denominator. According to Wikipedia's Confusion matrix article, accuracy is the proportion of correct predictions, precision is TP divided by (TP plus FP), and recall is TP divided by (TP plus FN).
Worked example: imbalanced medical screening
TP = 5, FN = 1, FP = 4, TN = 90 (100 patients, 6 truly positive).
Accuracy = 0.95; Precision = 5 / 9 = 0.5556; Recall = 5 / 6 = 0.8333; F1 = 0.6667; MCC = 0.6562.
Rounded to two decimal places: accuracy 0.95, precision 0.56, recall 0.83, F1 0.67, MCC 0.66.
Accuracy looks high because the dataset is imbalanced, but F1 and MCC penalize the false positives. Relying on accuracy alone can mislead a clinical decision in this situation.
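The MCC figure in the worked example can be checked with a short sketch of the standard formula, (TP*TN - FP*FN) divided by the square root of (TP+FP)(TP+FN)(TN+FP)(TN+FN). The function name `mcc` is an assumption for illustration, and `None` marks the undefined case.

```python
import math


def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient, or None when the denominator is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return None
    return (tp * tn - fp * fn) / denominator


# Worked example: TP = 5, FN = 1, FP = 4, TN = 90.
print(round(mcc(5, 1, 4, 90), 4))  # 0.6562
```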
If you want to manipulate the four cells as a real 2-by-2 array, the Matrix Calculator lets you add, multiply, and invert matrices built from the same numbers.
Key Concepts Explained
These four concepts are the vocabulary that lets you talk about a confusion matrix calculator result out loud. Skim the names first, then read the explanations once you have a count to look at.
True and false positives vs negatives
Positive and negative refer to the model prediction, while true and false refer to whether the prediction matched reality. A true positive is a positive case the model got right, and a false negative is a missed positive case.
Precision and recall
Precision looks down the predicted-positive column and asks what fraction of those predictions are right. Recall looks across the actual-positive row and asks what fraction of true cases were caught.
Specificity and false positive rate
Specificity is the share of actual negatives that the model labeled correctly, while the false positive rate is the share it labeled incorrectly. They always add up to one, and the result panel shows both so you can read whichever your field uses.
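A small sketch makes the "add up to one" relationship concrete. Both rates share the denominator FP + TN, so their sum is always 1 when that denominator is nonzero. The function name is an illustrative assumption.

```python
def specificity_and_fpr(fp, tn):
    """Return (specificity, false positive rate), or (None, None) with no actual negatives."""
    negatives = fp + tn
    if negatives == 0:
        return None, None
    return tn / negatives, fp / negatives


spec, fpr = specificity_and_fpr(fp=4, tn=90)
print(round(spec, 4), round(fpr, 4))  # 0.9574 0.0426
```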
Matthews correlation coefficient
The MCC summarizes the matrix into a single number between minus one and one by combining all four cells with the formula (TP*TN - FP*FN) divided by the square root of (TP+FP)(TP+FN)(TN+FP)(TN+FN). It stays informative when classes are imbalanced: 1 means a perfect classifier, 0 means no better than chance, and minus one means every prediction is inverted.
Once you can name these four ideas, the rest of the confusion matrix vocabulary - false discovery rate, false omission rate, balanced accuracy - falls into place as a small rearrangement of the same four cells.
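To illustrate the "small rearrangement" point, here is a hedged sketch of the three metrics named above, using their usual textbook definitions. The function name `derived_rates` is an assumption, and `None` marks an undefined ratio.

```python
def derived_rates(tp, fn, fp, tn):
    """False discovery rate, false omission rate, and balanced accuracy."""

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    recall = ratio(tp, tp + fn)        # true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    if recall is None or specificity is None:
        balanced = None
    else:
        balanced = (recall + specificity) / 2
    return {
        "false_discovery_rate": ratio(fp, fp + tp),  # 1 - precision
        "false_omission_rate": ratio(fn, fn + tn),
        "balanced_accuracy": balanced,
    }


print(derived_rates(tp=5, fn=1, fp=4, tn=90))
```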
Many textbooks pair a confusion matrix with a hypothesis test. The chi-square test uses the same observed-versus-expected structure, so the four cells you type here often feed straight into a chi-square analysis.
When you want to know whether the pattern in the four cells is statistically significant, the Chi-Square Calculator turns the same observed counts into a chi-square statistic and a p-value.
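As a rough sketch, the Pearson chi-square statistic for a 2-by-2 table can be computed directly from the four cells. This version applies no continuity correction, which is an assumption; the Chi-Square Calculator may use a different convention. With one degree of freedom, the p-value equals erfc(sqrt(statistic / 2)).

```python
import math


def chi_square_2x2(tp, fn, fp, tn):
    """Pearson chi-square statistic and p-value (1 degree of freedom, uncorrected)."""
    n = tp + fn + fp + tn
    row_totals = (tp + fn, fp + tn)   # actual positive, actual negative
    col_totals = (tp + fp, fn + tn)   # predicted positive, predicted negative
    if n == 0 or 0 in row_totals or 0 in col_totals:
        return None, None             # expected counts would be zero
    observed = ((tp, fn), (fp, tn))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


stat, p = chi_square_2x2(tp=50, fn=10, fp=5, tn=100)
print(round(stat, 2), p)
```

For a 2-by-2 table this uncorrected statistic equals N times the square of the MCC, which is one way to cross-check the two tools.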
How to Use This Calculator
Five quick steps cover the typical workflow: pull the four counts out of your report, type them in, read the result panel, scan for undefined metrics, then save the results for your write-up.
- 1 Locate the four counts in your report: Open the classification report or library output that shows TP, FN, FP, and TN. If you only have class-level precision and recall, you can back out the counts with the formulas on this page, provided you also know how many actual positives and negatives the report covers.
- 2 Enter the four counts: Type the integer value of each cell into the matching input. Use whole numbers - the calculator rounds down any decimal so the metrics always match a real count.
- 3 Read the result panel: Watch the result panel update as you type. Accuracy, precision, recall, F1, specificity, false positive and negative rates, prevalence, and the Matthews correlation coefficient all update from one recalculation.
- 4 Spot divide-by-zero issues: If a metric shows as undefined (or as zero where you expected a real value), check the denominator. Precision needs at least one predicted positive (TP + FP greater than zero) and recall needs at least one actual positive (TP + FN greater than zero).
- 5 Save and compare to a baseline: Copy the result table into your lab notebook, slide deck, or pull request, and enter the baseline model's matrix the same way. The side-by-side comparison makes it much easier to argue a new threshold is moving the metrics you care about.
Example workflow: read off TP = 50, FN = 10, FP = 5, TN = 100 from a confusion matrix image, type them in, and the panel reports accuracy 0.91, precision 0.91, recall 0.83, F1 0.87, MCC 0.80 - the same numbers you would get by hand.
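When a report lists only precision and recall, recovering the four counts also requires the number of actual positives and actual negatives (often called the support). The sketch below assumes those two totals are known; the function name and the rounding to whole counts are illustrative choices.

```python
def counts_from_report(precision, recall, n_positive, n_negative):
    """Recover (TP, FN, FP, TN) from precision, recall, and the class totals."""
    if precision <= 0:
        raise ValueError("FP cannot be recovered when precision is zero")
    tp = round(recall * n_positive)               # recall = TP / (TP + FN)
    fn = n_positive - tp
    fp = round(tp * (1 - precision) / precision)  # precision = TP / (TP + FP)
    tn = n_negative - fp
    return tp, fn, fp, tn


# Rounded values from the example workflow, with 60 actual positives
# and 105 actual negatives.
print(counts_from_report(0.91, 0.83, 60, 105))  # (50, 10, 5, 100)
```

Because reports usually round their ratios, the recovered counts can be off by one on larger datasets, so treat them as an estimate.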
If you are comparing two confusion matrices from two different models, the T-Test Calculator tells you whether the difference in their accuracies is large enough to be statistically meaningful.
Benefits of Using This Calculator
These are the concrete benefits students, instructors, and machine learning practitioners report from a single-page tool.
- • Cuts the metric confusion: Every standard classification metric appears in the same place, so you stop mixing up precision with recall, sensitivity with specificity, or F1 with accuracy.
- • Makes imbalanced classes honest: By showing F1 and MCC alongside accuracy, the result panel reveals when a high accuracy number hides a useless model, a common failure on imbalanced datasets.
- • Saves a stats homework step: Enter four cells once and copy the row of metrics straight into the answer box instead of typing formulas into a spreadsheet.
- • Speeds up model reviews: When a teammate pastes a confusion matrix in a pull request, the reviewer can punch in the four numbers and confirm the metrics within seconds.
- • Reinforces the matrix layout: Typing the counts by hand forces you to remember which cell is TP and which is FN, a common source of bugs in student code.
The biggest pay-off comes the first time you build a model that scores 99 percent accuracy on a credit-card-fraud dataset, then look at the F1 and see that the model is predicting 'not fraud' for almost everything. The result panel makes that surprise visible in seconds.
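A quick sketch shows how this can happen. The counts below are hypothetical, invented purely for illustration: 10,000 transactions, 100 of them fraudulent, and a model that flags only 10.

```python
def accuracy_and_f1(tp, fn, fp, tn):
    """Return (accuracy, F1); F1 is None when 2*TP + FP + FN is zero."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else None
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else None
    return accuracy, f1


# Hypothetical fraud matrix: the model misses 95 of 100 fraud cases.
accuracy, f1 = accuracy_and_f1(tp=5, fn=95, fp=5, tn=9895)
print(accuracy, round(f1, 4))  # 0.99 0.0909
```

Accuracy is carried almost entirely by the 9,895 true negatives, while F1 ignores that cell and exposes the missed fraud cases.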
For instructors, putting the page on a class site lets students check homework without giving them the answer key.
If the four cells come from a randomized experiment with a known positive rate, the Binomial Distribution Calculator helps you decide whether the observed false negative count is unusual under the null hypothesis.
Factors That Affect Your Results
Four factors change the numbers you see in the result panel, and two limitations remind you which metrics are safe to quote and which need extra context.
Class imbalance
When the positive class is rare, accuracy is dominated by the true negative cell and the model can look strong while recall collapses. The result panel prints prevalence next to accuracy so the imbalance is obvious at a glance.
Choice of positive class label
Swapping which class you call positive flips TP with TN and FP with FN, which changes precision, recall, and F1. Pick the label that matches the cost you actually care about, usually the rarer class.
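The effect of relabeling can be sketched by swapping the cells and recomputing. The helper names here are assumptions for illustration; the counts reuse the medical screening example from earlier on this page.

```python
def precision_recall_f1(tp, fn, fp, tn):
    """Return (precision, recall, F1), with None for undefined ratios."""
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else None
    return precision, recall, f1


def swap_positive(tp, fn, fp, tn):
    """Relabel the classes: TP trades places with TN, and FP with FN."""
    return tn, fp, fn, tp


original = (5, 1, 4, 90)
swapped = swap_positive(*original)

print([round(x, 3) for x in precision_recall_f1(*original)])  # [0.556, 0.833, 0.667]
print([round(x, 3) for x in precision_recall_f1(*swapped)])   # [0.989, 0.957, 0.973]
```

The same predictions look far stronger when the common class is treated as positive, which is why the choice of label matters.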
Decision threshold
The same model produces a different confusion matrix at every probability threshold. Lowering the threshold usually raises recall and lowers precision, and the calculator helps you see the trade-off by typing in the four cells for each candidate threshold.
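The sketch below shows one way to produce the four cells for each candidate threshold before typing them in. The scores and labels are hypothetical, and treating a score at or above the threshold as a positive prediction is an assumed convention.

```python
def confusion_at_threshold(y_true, scores, threshold):
    """Return (TP, FN, FP, TN) when scores >= threshold count as positive."""
    tp = fn = fp = tn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


# Hypothetical labels and model scores, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]

for threshold in (0.5, 0.25):
    tp, fn, fp, tn = confusion_at_threshold(y_true, scores, threshold)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    print(threshold, (tp, fn, fp, tn), round(precision, 3), round(recall, 3))
```

In this made-up data, lowering the threshold from 0.5 to 0.25 lifts recall from 0.75 to 1.0 while precision falls from 0.75 to about 0.571.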
Choice of summary metric
Accuracy, F1, and MCC can tell different stories on the same matrix. As published by Google's Machine Learning Crash Course, accuracy can be misleading for imbalanced classes and a high recall or F1 is often more informative when positive cases are rare.
- • This tool is built for binary classification. Multi-class problems have to be collapsed into one-versus-rest counts before the four cells make sense.
- • Counts are entered as integers and rounded down. If your pipeline reports floating point counts, normalize to the nearest whole number first.
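For the multi-class case, a one-versus-rest collapse can be sketched as follows. The layout (rows are actual classes, columns are predicted classes) is an assumption to check against your own report, and the 3-class matrix is hypothetical.

```python
def one_vs_rest(matrix, k):
    """Collapse a square multi-class matrix into (TP, FN, FP, TN) for class k.

    Assumes matrix[i][j] counts samples of actual class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # actual k, predicted as another class
    fp = sum(row[k] for row in matrix) - tp   # another class, predicted as k
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


# Hypothetical 3-class confusion matrix, for illustration only.
matrix = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
for k in range(3):
    print(k, one_vs_rest(matrix, k))
```

Each class yields its own set of four cells, which can then be entered one class at a time.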
If you need a single number to summarize a binary classifier on an imbalanced dataset, prefer the Matthews correlation coefficient over accuracy. It is symmetric across the two classes, so swapping the positive label leaves its value unchanged; only inverting the model's predictions flips the sign.
The matrix is only as honest as the labels you started with. If the ground-truth labels are noisy, common in medical imaging or user-tagged datasets, the matrix inherits that noise.
Frequently Asked Questions
Q: How do you read a confusion matrix?
A: Read row by row. The top row shows actual positive cases: predicted correctly (TP) versus missed (FN). The bottom row shows actual negative cases: correctly rejected (TN) versus wrongly flagged (FP).
Q: What is the difference between precision and recall?
A: Precision is TP divided by (TP plus FP) and answers 'when the model predicts positive, how often is it right?' Recall is TP divided by (TP plus FN) and answers 'how many actual positives did the model catch?'
Q: How do I calculate the F1 score?
A: Compute precision and recall first, then take the harmonic mean: F1 = 2 times precision times recall, divided by the sum of precision and recall. The harmonic mean punishes extreme values.
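A short sketch shows how the harmonic mean punishes an extreme value compared with a plain average. The function name `f1_from` is an assumption, and returning 0.0 when both inputs are zero is a convention chosen here for illustration.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


balanced = f1_from(0.5, 0.5)   # equal inputs: harmonic mean equals the inputs
lopsided = f1_from(0.9, 0.1)   # one extreme value drags F1 toward the lower one
print(round(balanced, 2), round(lopsided, 2))  # 0.5 0.18
```

Both pairs have an arithmetic mean of 0.5, but the lopsided pair scores an F1 of only 0.18.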
Q: What is a good accuracy for a confusion matrix?
A: There is no universal threshold. Compare accuracy against the naive baseline of always predicting the majority class: about 0.50 on a balanced dataset, but already 0.95 on an imbalanced dataset with only 5 percent positives, where predicting negative for everything is right 95 percent of the time.
Q: When should I use the Matthews correlation coefficient?
A: Use MCC whenever the two classes are not balanced, when the cost of false positives differs from the cost of false negatives, or when you need a single number that summarizes the matrix honestly. MCC ranges from minus one to one.
Q: What does it mean if my confusion matrix shows only false results?
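A: A matrix with only false positives and false negatives (TP and TN both zero) means the model is inverting every prediction. Accuracy is 0, and the Matthews correlation coefficient is minus one as long as both FP and FN are above zero; if either is also zero, the MCC denominator is zero and the metric is undefined.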