Telecom fraud is a criminal activity that uses phone calls, the internet, and text messages to deceive victims into transferring funds. As technology advances, telecom fraud methods keep evolving, which makes identifying and preventing these activities increasingly important. Using data provided by the organizing committee, this paper analyzes fraud patterns and builds a prediction model for telecom and bank card fraud.
In Problem 1, we used data visualization to analyze the environments in which telecom and bank card fraud occurs. Pie charts and bar charts show how often fraud occurs and the ratio of online to offline fraud. The analysis shows that telecom fraud occurs more frequently in online environments, which provides a basis for the preventive strategies developed later.
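The counts and ratios behind such charts reduce to simple tallies. The sketch below is illustrative only: it assumes each record is a hypothetical `(is_online, is_fraud)` pair of 0/1 flags, and the competition data's real columns and layout may differ.

```python
from collections import Counter


def fraud_by_channel(records):
    """Summarize fraud counts and rates for online vs offline transactions.

    `records` is an iterable of (is_online, is_fraud) pairs of 0/1 flags;
    this layout is a hypothetical stand-in for the competition data.
    """
    totals, frauds = Counter(), Counter()
    for is_online, is_fraud in records:
        channel = "online" if is_online else "offline"
        totals[channel] += 1
        frauds[channel] += is_fraud
    # Fraud rate within each channel (input for a bar chart).
    rates = {ch: frauds[ch] / totals[ch] for ch in totals}
    # Share of all fraud cases that happened online (input for a pie chart).
    n_fraud = sum(frauds.values())
    online_share = frauds["online"] / n_fraud if n_fraud else 0.0
    return dict(frauds), rates, online_share


# Tiny made-up sample, not the competition data.
sample = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0)]
counts, rates, online_share = fraud_by_channel(sample)
print(counts, rates, online_share)
```

A plotting library such as matplotlib could then draw `rates` as a bar chart and `counts` as a pie chart.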
In Problem 2, we used a logistic regression model to analyze how bank card usage, namely whether the card was used for transfers on a device and whether a PIN was used for the transaction, affects the probability of telecom fraud. The results show that PIN use significantly reduces the likelihood of fraud, suggesting that promoting PIN use is an effective strategy for preventing telecom fraud.
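To make the interpretation concrete, here is a minimal sketch of a two-feature logistic regression fitted by plain gradient descent on the mean log-loss. It is a stand-in, not necessarily the fitting procedure used in the paper, and the data are synthetic (50% fraud without a PIN, 5% with a PIN, no device effect). A negative PIN coefficient means an odds ratio `exp(b_pin)` below 1, i.e. lower odds of fraud when a PIN is used, holding device usage fixed.

```python
import math


def fit_logistic(X, y, lr=1.0, n_iter=3000):
    """Fit logistic regression by full-batch gradient descent on mean log-loss.

    X: list of feature rows (an intercept column is prepended here);
    y: list of 0/1 labels. Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for r, t in zip(rows, y):
            z = sum(wi * xi for wi, xi in zip(w, r))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted fraud probability
            for j, xj in enumerate(r):
                grad[j] += (p - t) * xj / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Synthetic toy data (NOT the competition data): 20 rows per
# (used_device, used_pin) cell; 10/20 frauds without PIN, 1/20 with PIN.
X, y = [], []
for device in (0, 1):
    for pin, n_fraud in ((0, 10), (1, 1)):
        for i in range(20):
            X.append((device, pin))
            y.append(1 if i < n_fraud else 0)

b0, b_device, b_pin = fit_logistic(X, y)
print(f"PIN coefficient {b_pin:.3f}, odds ratio {math.exp(b_pin):.3f}")
```

Because the toy fraud rates are identical across device values, the fitted device coefficient should be close to 0 and the PIN coefficient close to log(1/19), roughly -2.94.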
In Problem 3, we tested the variables for linearity and normality, then used Spearman's rank correlation analysis and chi-square tests to measure how factors such as the transaction amount ratio, whether the transaction was conducted with the same bank, and whether it was made online relate to fraud. Analyzing these indicators in depth helps identify potential fraudulent activities more accurately and informs the feature importance used by the prediction model.
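Both statistics can be computed from first principles. The sketch below is illustrative, not the paper's code: Spearman's rho is the Pearson correlation of average ranks (ties share the mean rank), suited to a continuous variable such as the amount ratio against the fraud flag, while the 2x2 chi-square compares a binary factor (e.g. online vs offline) with fraud. This version omits Yates' continuity correction, which `scipy.stats.chi2_contingency` applies to 2x2 tables by default, and assumes no row or column of the table is empty.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def chi_square_2x2(table):
    """Pearson chi-square statistic (no Yates correction) and p-value."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_sums, col_sums = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed - expected) ** 2 / expected
    # 1 degree of freedom: P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, `chi_square_2x2([[20, 0], [0, 20]])` returns a statistic of 40.0 with a very small p-value, while a table with identical cells gives 0.0 and a p-value of 1.0.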
In Problem 4, we introduced convolutional neural network (CNN) models, long short-term memory (LSTM) models, and attention mechanisms. We integrated these models through stacking ensemble learning, improved the attention mechanism, and added measures to prevent overfitting. Finally, we calculated the accuracy, precision, recall, and F1 score of each model. The telecom and bank card fraud prediction model established in this paper achieved an accuracy of 99.99%, showing that it predicts fraud effectively. Based on the data analysis throughout this paper, we provide recommendations for public security departments, banks, and citizens to reduce the probability of telecom fraud.
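The four evaluation metrics follow directly from the confusion-matrix counts. The sketch below uses the standard definitions and assumes fraud is encoded as the positive class (label 1); the labels in the example are made up. Because fraud cases are usually a small minority, precision and recall on the fraud class show how well fraud itself is caught, which accuracy alone does not reveal.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 3 frauds caught, 1 missed, 1 false alarm, 5 true negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```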
In conclusion, this paper applies modern data analysis and machine learning techniques, with its main innovations in model integration and the application of attention mechanisms. These innovations improve the predictive ability of the models, and the accuracy and error analysis validate their applicability and robustness. A horizontal comparison of the results with existing methods demonstrates the model's potential for future extension and offers a promising outlook for telecom and bank card fraud detection.