Enhancing Financial Fraud Detection using XGBoost, LSTM, and KNN with SMOTE for Imbalanced Datasets
Abstract
The surge in digital financial activity has led to increasingly sophisticated forms of fraud, creating serious challenges for financial institutions. One of the core obstacles in fraud detection is the substantial class imbalance present in transactional datasets, where fraudulent records represent a small minority. This study presents a robust machine learning framework that integrates the Synthetic Minority Over-sampling Technique (SMOTE) with three distinct classifiers—XGBoost, Long Short-Term Memory (LSTM), and K-Nearest Neighbors (KNN)—to enhance the detection of fraudulent activities. Using a real-world dataset of six million banking transactions, we assess each model’s performance through accuracy, precision, recall, F1-score, and both PR and ROC AUC metrics. Our findings show that SMOTE significantly boosts model recall and AUC scores. Among the models, XGBoost consistently delivers superior results with near-perfect metrics, while KNN maximizes recall, albeit at a slight cost to precision. LSTM produces more moderate but stable performance. Visual diagnostics, such as ROC/PR curves and confusion matrices, further confirm the reliability of XGBoost when combined with SMOTE. Overall, the integration of data balancing with advanced classifiers proves to be a powerful approach for real-time fraud detection.
Introduction
With the rapid evolution of digital banking, the financial industry faces a mounting threat from fraudulent transactions. Fraud not only leads to significant monetary losses but also undermines consumer confidence in online financial platforms [1]. According to a 2022 report by the Association of Certified Fraud Examiners (ACFE), global fraud resulted in losses exceeding $3.6 billion, underscoring the urgent need for effective detection systems [2].
A major challenge in identifying fraudulent behavior lies in the highly skewed nature of fraud datasets, where valid transactions vastly outnumber fraudulent ones. This skewness often leads to machine learning models performing poorly on minority classes, resulting in high false negative rates and overlooked fraud [3].
To improve detection, a wide range of techniques has been investigated—ranging from rule-based heuristics to deep learning systems. While machine learning excels at identifying complex, non-linear patterns in high-dimensional data, its effectiveness is hindered by class imbalance [4]. To mitigate this, methods like the Synthetic Minority Over-sampling Technique (SMOTE) have been employed to create a more balanced training distribution by synthesizing new examples from the minority class [5].
Conclusion
This study explored the application of three machine learning models—XGBoost, KNN, and LSTM—for detecting fraudulent financial transactions, with a particular focus on addressing class imbalance through the use of SMOTE.
Our findings reveal clear differences in model behavior both before and after applying SMOTE. Without balancing, all models—especially KNN and LSTM—struggled with recall due to the scarcity of fraudulent instances in the training data. Among the unbalanced results, XGBoost stood out with the fewest false negatives, but still suffered from reduced sensitivity overall.
The introduction of SMOTE significantly improved each model’s ability to detect fraudulent transactions. Recall increased across the board, most notably for KNN (from 0.1247 to 1.0000) and LSTM (from 0.2688 to 0.9822), as evidenced by confusion matrices. However, these gains were accompanied by a rise in false positives, particularly for KNN, reflecting the classic precision-recall trade-off that arises in imbalanced classification problems.
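The trade-off described above can be read directly off a confusion matrix. The sketch below uses scikit-learn with made-up counts (not the study's results) to show how perfect recall can coexist with degraded precision once false positives rise.

```python
# Illustrative: deriving precision and recall from a confusion matrix.
# The counts are invented to mirror the trade-off described in the text.
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = np.array([0] * 990 + [1] * 10)             # 1% fraud
y_pred = np.array([0] * 960 + [1] * 30 + [1] * 10)  # all fraud caught, 30 FPs

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")

# Precision = TP / (TP + FP); recall = TP / (TP + FN).
print("precision:", precision_score(y_true, y_pred))  # 10 / 40 = 0.25
print("recall:   ", recall_score(y_true, y_pred))     # 10 / 10 = 1.00
```

Here every fraudulent transaction is flagged (recall = 1.0), but the 30 false positives pull precision down sharply, analogous to KNN's behavior after SMOTE.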
Despite these trade-offs, XGBoost with SMOTE consistently emerged as the best-performing model, achieving outstanding results across all key metrics. It reached perfect PR AUC and ROC AUC scores (1.00) and maintained a strong balance between precision (0.9943) and recall (0.9840). While KNN achieved flawless recall, it did so at the cost of precision (0.8671) and overall F1 score stability. LSTM demonstrated considerable improvement post-SMOTE, but still lagged behind in precision and balanced performance.
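For reference, the two AUC metrics used to compare the models can be computed from predicted fraud scores as follows; the scores below are toy values, not model outputs from the study.

```python
# Illustrative: computing ROC AUC and PR AUC (average precision) from
# predicted scores. The scores are toy values, not the study's outputs.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])
y_score = np.array([0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.8, 0.9])

# ROC AUC: probability that a random fraud case is scored above a
# random legitimate one.
print("ROC AUC:", roc_auc_score(y_true, y_score))
# PR AUC (average precision): generally more informative than ROC AUC
# under heavy class imbalance, since it ignores true negatives.
print("PR AUC: ", average_precision_score(y_true, y_score))
```

Because the two positive cases receive the highest scores, both metrics equal 1.00 in this toy example, matching the perfect separation XGBoost achieved after SMOTE.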
These results were further reinforced by visual tools such as ROC and PR curves, histograms, and confusion matrices. The consistency of XGBoost’s dominance across all metrics and visual diagnostics makes it a robust and scalable solution for real-world fraud detection systems, especially when augmented with data balancing techniques like SMOTE.