Fraud Detection Using Active Learning to Reduce False Positives
Lead Data Scientist, Mudano
Abstract: Financial fraud is normally detected using a combination of statistical methods, automated rules, and human verification. Because fraud is very rare, this process produces a large number of false positives. We frame the problem as active learning and propose a novel methodology to reduce the number of examples that need to be labelled by humans. First, an unsupervised approach is used to select anomalies, which are passed to human fraud experts to verify. These labelled examples are then used to build a supervised classifier, which can be applied to the rest of the dataset to spot other possible fraudulent examples and to optimise the selection of the next examples to be labelled. This process is iterated, and we name the method Iteratively Retrained Autoencoder (IRAE). Using an open-source dataset, we show that IRAE achieves better fraud detection accuracy than other published unsupervised or semi-supervised methods on this dataset. IRAE significantly reduces false positive detections, giving human experts more time to focus on fraudulent cases. We show that with humans labelling only 1% of the data, we can obtain performance comparable to a supervised method that uses all labelled examples. The talk aims to gently introduce unsupervised, semi-supervised and active learning.
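The loop described in the abstract (unsupervised anomaly selection, human labelling, a supervised classifier, then re-ranking for the next batch) can be sketched roughly as follows. This is not IRAE itself: to stay dependency-free, the autoencoder's reconstruction error is replaced by a simple z-score anomaly score, the supervised classifier is plain logistic regression, and the next batch is chosen by highest predicted fraud probability. All of these are assumptions. The function names (`anomaly_scores`, `train_logreg`, `predict_proba`, `active_learning_loop`) and the `oracle` callback, which simulates the human fraud experts, are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def anomaly_scores(X):
    """Unsupervised score per row: squared z-score distance from the feature means.
    Assumption: stands in for an autoencoder's reconstruction error."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(d)]
    return [sum(((row[j] - means[j]) / stds[j]) ** 2 for j in range(d)) for row in X]


def train_logreg(X, y, epochs=500, lr=0.1):
    """Supervised step: batch gradient-descent logistic regression on labelled rows."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def predict_proba(model, x):
    """Predicted probability that row x is fraudulent."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def active_learning_loop(X, oracle, batch_size=10, rounds=3):
    """Hypothetical IRAE-style loop: seed with anomalies, then retrain and re-rank.

    oracle(i) returns the expert's label for row i (1 = fraud, 0 = legitimate).
    """
    labels = {}  # row index -> label supplied by the (simulated) human expert
    scores = anomaly_scores(X)
    for _ in range(rounds):
        unlabelled = [i for i in range(len(X)) if i not in labels]
        if not unlabelled:
            break
        if len(set(labels.values())) == 2:
            # Both classes seen: rank unlabelled rows by predicted fraud probability.
            model = train_logreg([X[i] for i in labels], [labels[i] for i in labels])
            rank = {i: predict_proba(model, X[i]) for i in unlabelled}
        else:
            # Cold start (or only one class so far): fall back to the anomaly score.
            rank = {i: scores[i] for i in unlabelled}
        for i in sorted(unlabelled, key=rank.get, reverse=True)[:batch_size]:
            labels[i] = oracle(i)
    # Final classifier on everything labelled, for scoring the rest of the dataset.
    model = None
    if len(set(labels.values())) == 2:
        model = train_logreg([X[i] for i in labels], [labels[i] for i in labels])
    return labels, model


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: 990 legitimate rows around (0, 0), 10 fraud rows around (4, 4).
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(990)]
    X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(10)]
    y = [0] * 990 + [1] * 10
    labels, model = active_learning_loop(X, oracle=lambda i: y[i], batch_size=10, rounds=3)
    print(f"labelled {len(labels)} of {len(X)} rows, fraud found: {sum(labels.values())}")
```

The fallback to the anomaly score matters on the first round: a classifier cannot be trained until the experts have confirmed at least one fraudulent and one legitimate example. Choosing the highest-probability rows favours finding more fraud; an uncertainty-based rule (rows with probability near 0.5) would be an equally plausible alternative.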
Bio: Boris has extensive experience solving business problems using machine learning. He obtained his BSc in Artificial Intelligence from the University of Edinburgh, did postgraduate research in machine learning, and has applied machine learning in a commercial setting. He now specialises in formulating and prototyping ML solutions for client problems at Mudano.