Case Study | Data Analytics | Financial Services
Problem Statement
A regional bank in Tanzania was losing $3 million annually due to fraudulent transactions, including credit card fraud and account takeovers. The existing rule-based fraud detection system flagged too many false positives, overwhelming the fraud investigation team and delaying legitimate transactions.
Solution
The bank implemented a data analytics solution using machine learning to enhance fraud detection accuracy and efficiency.
Technical Approach:
- Data Collection and ETL Process:
  - Extracted data from transaction logs, customer profiles, and device information stored in an Oracle database.
  - Built an ETL pipeline using Apache NiFi to process and transform data, including anonymization of sensitive customer information to comply with GDPR.
  - Handled class imbalance (fraud cases were <1% of transactions) by oversampling the fraud class with SMOTE during model training.
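The SMOTE step above can be illustrated with a minimal, dependency-free sketch of the core idea: each synthetic fraud row is placed at a random point on the line segment between a real fraud row and one of its nearest fraud neighbours. This is not the bank's code; a production pipeline would more likely use a library implementation such as the `SMOTE` class in imbalanced-learn. The function name `smote_oversample`, the feature columns, and the sample values are assumptions made for illustration.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    sampled row and one of its k nearest minority neighbours.
    Requires at least two minority rows."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority rows
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sq_dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical fraud rows: (amount, transactions_last_hour)
fraud_rows = [(950.0, 7.0), (1200.0, 9.0), (1010.0, 8.0), (1500.0, 12.0)]
new_rows = smote_oversample(fraud_rows, n_new=6, k=2)
```

Oversampling of this kind should be applied only to the training split; validation and test data keep the original class ratio so that reported metrics reflect real traffic.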
- Feature Engineering:
  - Created features such as transaction velocity (number of transactions in the last hour), geolocation mismatches (measured with Haversine distance), and unusual spending patterns (z-score deviations).
  - Stored features in a Redis-backed feature store for low-latency access during real-time inference.
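The three features named above can be sketched in plain Python as follows. The function names, the one-hour window expressed in seconds, and the choice of the sample standard deviation for the z-score are assumptions for illustration, not details taken from the project.

```python
import math
from statistics import mean, stdev

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def transaction_velocity(timestamps, now, window_s=3600):
    """Number of transactions in the window_s seconds up to `now`."""
    return sum(1 for t in timestamps if 0 <= now - t <= window_s)


def amount_zscore(amount, history):
    """Deviations of `amount` from the historical mean, in standard
    deviations. `history` needs at least two values."""
    sigma = stdev(history)
    return 0.0 if sigma == 0 else (amount - mean(history)) / sigma


# Hypothetical usage: timestamps are seconds since an arbitrary epoch.
velocity = transaction_velocity([0, 1000, 4000, 4500], now=5000)
z = amount_zscore(50.0, [10.0, 20.0, 30.0])
one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)
```

A geolocation mismatch can then be expressed as the Haversine distance between the transaction location and the customer's last known location, compared against whatever distance threshold the fraud team considers plausible.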
- Fraud Detection Model:
  - Built an unsupervised anomaly detection model with Isolation Forest (scikit-learn) to identify outliers in transaction data.
  - Developed a supervised classifier with XGBoost to label transactions as fraudulent or legitimate, achieving a precision of 92%.
  - Combined both models in an ensemble approach to improve detection accuracy, using a weighted voting mechanism.
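One common way to realize the weighted voting described above is soft voting: a weighted average of the two model scores, compared against a threshold. The sketch below assumes that form. The weights (0.3 and 0.7), the 0.5 threshold, and the function names are hypothetical, and both inputs are assumed to be already scaled to [0, 1], with higher values meaning more suspicious.

```python
def ensemble_fraud_score(anomaly_score, classifier_prob,
                         w_anomaly=0.3, w_classifier=0.7):
    """Weighted average of the anomaly score and the classifier probability.

    Both inputs must lie in [0, 1]; the weights are normalized so the
    result also lies in [0, 1].
    """
    for name, value in (("anomaly_score", anomaly_score),
                        ("classifier_prob", classifier_prob)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total_weight = w_anomaly + w_classifier
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = w_anomaly * anomaly_score + w_classifier * classifier_prob
    return weighted / total_weight


def is_flagged(score, threshold=0.5):
    """Flag a transaction for review when the combined score is high."""
    return score >= threshold


# Hypothetical scores for a single transaction.
combined = ensemble_fraud_score(anomaly_score=0.8, classifier_prob=0.9)
flagged = is_flagged(combined)
```

In practice the weights and threshold would be tuned on a validation set, trading precision against recall according to how many cases the investigation team can review.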
- Real-Time Monitoring:
  - Deployed the model as a streaming application using Apache Flink to process transactions in real time, with a latency of under 100 ms.
  - Integrated the model with the bank’s transaction processing system using REST APIs built with FastAPI.
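The scoring endpoint can be pictured as a small request handler: validate the incoming payload, score it, and return a decision together with the measured scoring latency. The sketch below is deliberately framework-independent so that it runs with the standard library alone; in the deployed system this logic would sit behind a FastAPI route. The field names, the `review`/`approve` decisions, the threshold, and `demo_score_fn` are all hypothetical stand-ins.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRequest:
    transaction_id: str
    amount: float
    velocity_1h: int
    distance_km: float


def parse_request(payload):
    """Validate and coerce the raw request body; raise ValueError if malformed."""
    try:
        return ScoreRequest(
            transaction_id=str(payload["transaction_id"]),
            amount=float(payload["amount"]),
            velocity_1h=int(payload["velocity_1h"]),
            distance_km=float(payload["distance_km"]),
        )
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"invalid scoring request: {exc!r}") from exc


def handle_score(payload, score_fn, threshold=0.5):
    """Score one transaction and report how long the scoring call took."""
    request = parse_request(payload)
    started = time.perf_counter()
    score = float(score_fn(request))
    latency_ms = (time.perf_counter() - started) * 1000.0
    return {
        "transaction_id": request.transaction_id,
        "fraud_score": score,
        "decision": "review" if score >= threshold else "approve",
        "latency_ms": latency_ms,
    }


def demo_score_fn(request):
    # Stand-in for the real ensemble; NOT the bank's model.
    risky = request.velocity_1h > 5 and request.distance_km > 100
    return 0.9 if risky else 0.1


response = handle_score(
    {"transaction_id": "tx-001", "amount": 1200, "velocity_1h": 8,
     "distance_km": 350.0},
    demo_score_fn,
)
```

Returning the latency with each response makes it straightforward to monitor the sub-100 ms target from the caller's side.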
- Case Management Dashboard:
  - Developed a dashboard using Dash (Python) to prioritize high-risk cases and provide model explanations using SHAP (SHapley Additive exPlanations) values.
  - Hosted the dashboard on an internal Kubernetes cluster for scalability and security.
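To show what a SHAP-style explanation contains, the sketch below computes exact Shapley values by brute force for a toy three-feature risk score, replacing "absent" features with a baseline value. This is only feasible for a handful of features; the SHAP library uses far more efficient algorithms for real models. The toy model `toy_risk_score`, its coefficients, and the sample and baseline values are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley value of each feature of `x` relative to `baseline`.

    Features outside a coalition take their baseline value. The cost grows
    exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                contribution += weight * (value(members | {i}) - value(members))
        phi.append(contribution)
    return phi


def toy_risk_score(features):
    # Hypothetical linear risk score, NOT the bank's model.
    amount, velocity_1h, distance_km = features
    return 0.002 * amount + 0.05 * velocity_1h + 0.001 * distance_km


case = [1000.0, 8.0, 300.0]     # flagged transaction (made-up values)
typical = [100.0, 1.0, 0.0]     # baseline "typical" transaction (made up)
phi = shapley_values(toy_risk_score, case, typical)
```

The contributions always sum to the difference between the case's score and the baseline score, which is what lets an investigator read each value as that feature's share of the extra risk.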
Tech Stack:
- Data Storage: Oracle, Redis
- Data Processing: Apache NiFi (ETL), Apache Flink (streaming)
- Programming: Python
- Visualization: Dash
- Deployment: FastAPI, Kubernetes
Implementation:
- Collaborated with the bank’s in-house technical and functional teams to build the solution.
- Conducted a 3-month PoC in the credit card division before full deployment.
- Trained the fraud team on interpreting model outputs and managing flagged cases.
Impact:
- Efficiency: Reduced false positives by 70%, cutting the fraud investigation team's workload by 40%.
- Productivity: The team resolved fraud cases 50% faster due to prioritized case management and accurate flagging.
- Decision-Making: The bank’s leadership gained insights into fraud trends, enabling proactive updates to security policies and customer authentication processes.
- Overall Growth: Fraud losses decreased by $0.5 million annually, customer trust improved, and the bank attracted 10% more high-value customers due to enhanced security measures.
