Anomaly and fraud detection with
Machine Learning
Ahmed Rebai
Esprit Prépa & Esprit School of Business
Plan
● Anomaly/Fraud detection with machine learning/Deep
learning
● Outlier detection general applications
● Applications to financial sector (banking, insurance): Towards
Fintech
● Supervised vs. Unsupervised
● Algorithms and methods
● Conclusions: ensemble method (Isolation Forest) and density
method (DBSCAN/HDBSCAN)
Anomaly/Fraud detection with
machine learning/Deep learning
● Machine learning: data mining, predictive analytics, artificial
intelligence (in practice: statistics and numerical methods for
statistical analysis)
● Anomaly: deviation from “normal”/“expected”
Anomaly = Outlier = Deviant or Unusual Data Point
● Anomaly detection: detecting outlier events or observations,
i.e. deviations from the expected pattern of a data set.
The real challenge in anomaly detection is to construct the right
data model to separate outliers from noise and normal data.
In one dimension: what do anomalies
look like?
Real Example: Stock market :
The May 6, 2010, Flash Crash at 2:45 pm
Real Example from the Stock market :
A real financial fraud!
● On April 21, 2015, nearly five years after the
incident, the U.S. Department of Justice laid "22
criminal counts, including fraud and market
manipulation" against Navinder Singh Sarao, a
trader. The charges included the use of
spoofing algorithms: just prior to the
Flash Crash, he placed orders for thousands of E-mini
S&P 500 stock index futures contracts, which he
planned to cancel later.
Stock market: Flash Crash
(continued)
In two dimensions: what do anomalies
look like?
Results with the DBSCAN algorithm from sklearn.cluster
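The figure for this slide is not available, so here is a minimal sketch of how DBSCAN can flag two-dimensional outliers. The data is synthetic, and the eps and min_samples values are assumptions chosen for this toy example, not the settings used in the talk.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Two tight synthetic clusters plus three far-away points (illustrative data).
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2))
cluster_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(100, 2))
planted = np.array([[10.0, -5.0], [-6.0, 8.0], [12.0, 12.0]])
X = np.vstack([cluster_a, cluster_b, planted])

# eps and min_samples are assumed values that suit this toy data only.
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

# DBSCAN gives the label -1 to points that belong to no dense region.
anomaly_idx = np.flatnonzero(labels == -1)
print("rows flagged as noise:", anomaly_idx)
```

In practice eps is the sensitive parameter: too small and ordinary points become noise, too large and outliers get absorbed into a cluster.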
In two dimensions: what do anomalies
look like?
Results with a 2D PCA projection from sklearn.decomposition
Kernel PCA is also an option
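One common way to turn a 2D PCA projection into an anomaly score is the reconstruction error: points that the two components describe badly stand out. The sketch below uses synthetic data and is only an illustration of that idea; it is not necessarily how the figure on the slide was produced.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Synthetic "normal" data: large variance on the first two of five features,
# tiny noise on the other three.
X_normal = rng.normal(size=(300, 5)) * np.array([3.0, 2.0, 0.05, 0.05, 0.05])
# One planted anomaly (row 300): unremarkable on features 1-2, far off on feature 3.
X = np.vstack([X_normal, [0.0, 0.0, 4.0, 0.0, 0.0]])

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)            # 2D coordinates, e.g. for a scatter plot
X_back = pca.inverse_transform(X_2d)   # map the 2D points back to 5D

# Reconstruction error: distance between each point and its 2D approximation.
recon_error = np.linalg.norm(X - X_back, axis=1)
print("most anomalous row:", int(np.argmax(recon_error)))
```

sklearn.decomposition.KernelPCA can be substituted when the normal data lies on a curved rather than flat structure; inverse_transform is then only available with fit_inverse_transform=True.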
January 28, 1986: the Challenger space mission
Dalal et al., 1989, Journal of the American Statistical Association
Real Example: Rocket science
In two dimensions: what do anomalies
look like?
In two dimensions: what do anomalies
look like?
Logarithmic and log-log plots can help
to display outliers
pyplot.hist can log-scale the y axis for you with the keyword argument log=True
Example: plt.hist(numpy.log10(data), log=True)
[Figure, two panels. Left: linear histogram, frequency vs. log10 price.
Right: log-log histogram, log10 frequency vs. log10 price.]
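A minimal sketch of the two histograms described above, using synthetic log-normal "prices" with a few extreme values added by hand. The data and the bin count are assumptions for illustration only.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

# Synthetic heavy-tailed prices plus three extreme values.
prices = np.concatenate([
    rng.lognormal(mean=3.0, sigma=1.0, size=10_000),
    [5e5, 8e5, 1e6],
])

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

# Linear histogram: the extremes stretch the x axis and their bars are invisible.
ax_lin.hist(prices, bins=50)
ax_lin.set(title="Linear histogram", xlabel="price", ylabel="frequency")

# Log-log view: histogram of log10(price) with a log-scaled y axis.
counts, edges, _ = ax_log.hist(np.log10(prices), bins=50, log=True)
ax_log.set(title="Log-log histogram", xlabel="log10(price)",
           ylabel="frequency (log scale)")

fig.tight_layout()
```

Call fig.savefig(...) or switch to an interactive backend and plt.show() to look at the result; in the right-hand panel the three extreme prices appear as isolated bars far from the bulk.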
Logarithmic and log-log plots can help
to display outliers
[Figure, two panels: linear plot; log-log plot]
matplotlib.pyplot.loglog
Outlier detection general applications
● Cyber security: detect cyber-attacks on networks, for more
trusted connections
● Equipment failure (risk theory/reliability theory) in the
industrial sector
● Fraud detection
● Detecting cheaters in mobile gaming
● Preprocessing task for analysis or machine learning
● Reducing false declines and thereby growing revenue
● Detecting French regions where Le Pen’s scores at the
2017 presidential election deviate from predictions based
on socio-economic variables (towardsdatascience)
Applications to financial sector (banking,
insurance): Towards Fintech
● Retail bank: Credit card fraud
● Private bank: Market abuse, Anti-money laundering
● Investment bank: Market abuse (Flash Crash), Anti-money
laundering
● Insurance companies: Fraudulent operations
● Central banks: Detecting tax havens (Panama Papers)
Each requires a different approach because red flags are specific to
the type of bank (specific business expertise is needed)
Methodology?
Supervised learning vs.
Unsupervised learning
Supervised vs. Unsupervised
● In credit card fraud detection, the target variable is
known.
How? Customers tell us. A supervised approach can be used
because the true class is self-revealing.
● In market abuse or money laundering detection, we don’t
really know the classes (the way money is laundered changes
all the time, even within the same criminal organisation)
Why is supervised learning difficult?
● Severe class imbalance: we estimate that
99.9% of operations are trusted vs. 0.1% that
are fraudulent
● This causes a problem during the train/test split: a purely
random split can leave very few fraud cases in one of the sets
● Solution: use the stratify option of the
train_test_split function in sklearn
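A minimal sketch of a stratified split on synthetic data with the 0.1% fraud rate quoted above. The features are random placeholders, and the 80/20 split is an assumed choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

n = 100_000
X = rng.normal(size=(n, 4))    # placeholder features
y = np.zeros(n, dtype=int)
y[:100] = 1                    # 100 fraud cases out of 100,000 = 0.1%

# stratify=y keeps the class proportions identical in both parts of the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print("fraud rate, train:", y_train.mean())
print("fraud rate, test: ", y_test.mean())
```

Without stratify=y the number of fraud cases in the test set would vary from one random split to the next, which makes metrics on the rare class unstable.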
And now, why is unsupervised learning
difficult?
● Severe class overlap: money laundering is mixed with legal
financial activity, especially in investment banks
● Uncertainty around the data model
● The complexity of the data
● The huge volume of data (time complexity)
● Next we will discuss the data models
● To reduce computation time: use dask, numba (speeds up
numpy code), ray and the newer modin module (speeds up
pandas) + demonstration
Algorithms
● We can differentiate between three families of methods:
- Distance-based algorithms (similarity)
K-NN for classification
K-Means for clustering
- Density-based (fitting a density)
DBSCAN and HDBSCAN
Local outlier factor (LOF)
- Parametric
Gaussian mixture models (GMM)
One-class SVM
Extreme value theory: Tukey outlier labeling
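Of the methods listed, Tukey outlier labeling is the simplest to show in a few lines: a value is labeled an outlier when it lies more than k times the interquartile range beyond the quartiles. The sketch below uses the conventional k = 1.5 and made-up transaction amounts; the function name is hypothetical.

```python
import numpy as np

def tukey_outliers(values, k=1.5):
    """Return a boolean mask marking values outside Tukey's fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

# Made-up transaction amounts with one obviously unusual value.
amounts = np.array([12.0, 15.0, 14.0, 13.0, 16.0, 15.0, 14.0, 250.0])
mask = tukey_outliers(amounts)
print(amounts[mask])  # -> [250.]
```

Here the quartiles are 13.75 and 15.25, so the fences sit at 11.5 and 17.5 and only 250 falls outside them.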
Tree ensemble algorithm: Isolation Forest
(used at Credit Suisse)
(F. T. Liu et al., Isolation Forest, ICDM ’08: Eighth IEEE
International Conference on Data Mining, 2008)
from sklearn.ensemble import IsolationForest
An ensemble method that uses the concept of isolation to separate
anomalies from the rest of the data
● No point-based distance calculation
● Instead, Isolation Forest builds an ensemble of random trees for a given
data set; anomalies are the points with the shortest average path length
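A minimal sketch of IsolationForest on synthetic data: 500 points from a standard normal distribution plus three planted points far away. The data and parameter values are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(4)

X_normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
X_planted = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -10.0]])
X = np.vstack([X_normal, X_planted])   # planted anomalies are the last 3 rows

forest = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
forest.fit(X)

# score_samples: the lower the score, the shorter the average path length,
# i.e. the easier the point was to isolate.
scores = forest.score_samples(X)
labels = forest.predict(X)             # -1 = anomaly, +1 = normal

print("labels of the planted points:", labels[-3:])
print("rows with the lowest scores:", np.argsort(scores)[:3])
```

With contamination="auto" the library applies the threshold from the original paper; passing a float instead fixes the expected share of anomalies, which is a business decision rather than something the algorithm can infer.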
Unsupervised learning for anomaly
detection (conclusion)
● Unsupervised learning is all about finding structure in data
● Techniques: Clustering (K-means, spectral clustering)
● Principal Component Analysis
● Support vector machine
● Autoencoder deep neural networks (DNN):
autoencoder-based anomaly detection (exotic)
● Filtering, sequential Bayesian filtering
● Gaussian mixture model clustering via EM
● LightGBM (exotic)
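As one example from the list above, a Gaussian mixture model fitted with EM can score new points by their log-likelihood, with low-likelihood points flagged as anomalies. The sketch uses synthetic data, and the 1% cut-off is an assumed threshold.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)

# Synthetic "normal" behaviour: two well-separated clusters.
X_normal = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[4.0, 4.0], scale=0.5, size=(200, 2)),
])
# New observations: the last row is far from both clusters.
X_new = np.array([[0.1, -0.2], [4.2, 3.9], [2.0, -3.0]])

# GaussianMixture.fit runs the EM algorithm.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X_normal)

# Assumed cut-off: flag anything less likely than the bottom 1% of the training data.
threshold = np.percentile(gmm.score_samples(X_normal), 1)
is_anomaly = gmm.score_samples(X_new) < threshold
print(is_anomaly)  # -> [False False  True]
```

The number of components is the main modelling choice; too many components let the mixture fit the anomalies themselves.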