Near Real-Time Fraud Detection in Telecommunication Industry

Burak IŞIKLI

Who am I?
Burak IŞIKLI
Senior Software Engineer, Turkcell
burak.isikli at turkcell dot com.tr
@burakisikli, github.com/burakisikli

Turkcell
• It is the first and only Turkish company ever to be listed on the NYSE.

Fraud Overview
• Roaming Fraud – $6.11B
• Premium Rate Service Fraud – $4.73B
• Abuse or Arbitrage Fraud – $2.2B
• International Revenue Share Fraud (IRSF) – $1.8B
• Domestic Revenue Share Fraud (DRSF) – $0.8B
• Call and SMS Spamming – $0.8B
• Subscription Fraud – $5.2B
• Wangiri Fraud – $2B annually
• SMS Phishing/Pharming – $1.7B
• (V)PBX Fraud – $4.42B

Our Fraud Journey
• Wangiri
• Anomaly Detection
• Simbox

Wangiri
• Wangiri (one ring and cut)
• Premium services
• Both local and international services

Wangiri
Problem: Someone places a call and hangs up after one ring. The curious
subscriber calls back without knowing that the number is a premium
service, so the next bill is unexpectedly expensive.

Wangiri
Solution: Design a system that detects these numbers so that we can react
• There is no historical data, so it is an analytical solution
• Python + Spark = PySpark
• Send an email alert

Wangiri
• Process call detail records (CDRs) in near real time (NRT) using Spark
• Generate features from this data in NRT
• 1 TB of data per day
• Wangiri detection rate > 50%
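
The feature-generation step above can be sketched in plain Python. This is only an illustration, not the production PySpark job: the three-field CDR layout `(caller, callee, duration_seconds)`, the function names, and every threshold are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical CDR layout: (caller, callee, duration_seconds).
# Real CDRs carry many more fields; these three are assumed for illustration.
def wangiri_features(cdrs, max_ring_seconds=3):
    """Aggregate per-caller features that hint at one-ring-and-cut behaviour."""
    calls = defaultdict(int)        # total calls placed by each caller
    short_calls = defaultdict(int)  # calls no longer than max_ring_seconds
    callees = defaultdict(set)      # distinct numbers each caller dialled
    for caller, callee, duration in cdrs:
        calls[caller] += 1
        callees[caller].add(callee)
        if duration <= max_ring_seconds:
            short_calls[caller] += 1
    return {
        caller: {
            "calls": calls[caller],
            "short_call_ratio": short_calls[caller] / calls[caller],
            "distinct_callees": len(callees[caller]),
        }
        for caller in calls
    }

def suspects(features, min_calls=3, min_short_ratio=0.9):
    """Flag callers that place several calls, almost all of them very short."""
    return sorted(
        caller for caller, f in features.items()
        if f["calls"] >= min_calls and f["short_call_ratio"] >= min_short_ratio
    )

sample_cdrs = [
    ("A", "x1", 1), ("A", "x2", 0), ("A", "x3", 2), ("A", "x4", 1),
    ("B", "y1", 120), ("B", "y2", 45), ("B", "y1", 300),
]
features = wangiri_features(sample_cdrs)
flagged = suspects(features)
```

In a Spark job the same aggregation would typically be a group-by on the caller over a time window; the rule-based thresholds reflect the point that, without historical labels, the solution has to be analytical.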

Anomaly Detection
Problem: Identify anomalies in international calls and determine
whether action should be taken.

Solution: Design a system that detects unexpected changes. Whenever
there is a change, raise an alarm so that the business can quickly
examine it and take action.

Anomaly Detection - Architecture
• Lambda Architecture

Anomaly Detection - Methods
• 68–95–99.7 (three sigma) Rule
• K-means clustering
• One-class SVM
• Seasonal Hybrid Extreme Studentized Deviates (S-H-ESD)
• Holt-Winters Method
• Deep Learning

68-95-99.7 Rule
• Data should be normally distributed

68-95-99.7 Rule
• But data changes over time…

68-95-99.7 Rule
• Implemented in Spark and SQL
• Too many false-positive alarms
• Current data should be weighted more heavily than historical data
• Not enough data to explain an anomaly
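
A minimal sketch of the three-sigma check, written in plain Python rather than Spark and SQL. The function name and the sample values are assumptions for illustration; `k=3.0` corresponds to the 99.7% band of the rule.

```python
import statistics

def is_anomaly(history, current, k=3.0):
    """Return True if `current` lies more than k standard deviations
    away from the mean of `history`."""
    mean = statistics.mean(history)
    std = statistics.pstdev(history)  # population standard deviation
    if std == 0:
        # A perfectly flat history: any different value is a deviation.
        return current != mean
    return abs(current - mean) > k * std

# Hypothetical call counts for one destination (mean 100, sigma about 1.87).
history = [100, 102, 98, 101, 99, 100, 103, 97]

spike_flagged = is_anomaly(history, 110)   # outside the 3-sigma band
normal_flagged = is_anomaly(history, 104)  # inside the 3-sigma band
```

Every point in `history` carries the same weight here, which is exactly the limitation noted above: when the data drifts over time, an unweighted mean and standard deviation lag behind and raise false alarms.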

One-Class SVM
• Clustering
• All data in one class
• Too many false positives, so not good enough

One-Class SVM
• Easier to implement using Oracle Data Miner (ODM)

One-Class SVM
• How can the results be explained?

K-Means Clustering
• The number of clusters is predefined
• It’s impractical to define the number of clusters for each country

S-H-ESD
• Global (seasonal) and local (trend) analysis
• R library open-sourced by Twitter
• Prototype in R
• Ported to Python for a production-ready version
• Easier to explain, successful results
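
The sketch below shows only the robust core idea that S-H-ESD builds on: remove a seasonal component, then score the residuals with the median and the median absolute deviation (MAD) instead of the mean and standard deviation. It is a simplification, not the Twitter library or the ported production code: the seasonal component is a per-phase median rather than a proper decomposition, and a fixed threshold replaces the generalized ESD hypothesis test. All names and data are assumptions.

```python
import statistics

def seasonal_mad_anomalies(series, period, threshold=3.0):
    """Indices whose seasonally adjusted residual has a robust z-score
    (based on median and MAD) above `threshold`."""
    # Seasonal component: median of all points that share the same phase.
    seasonal = [statistics.median(series[phase::period]) for phase in range(period)]
    residuals = [value - seasonal[i % period] for i, value in enumerate(series)]

    center = statistics.median(residuals)
    mad = statistics.median([abs(r - center) for r in residuals])
    if mad == 0:
        # Degenerate case: more than half of the residuals are identical.
        return [i for i, r in enumerate(residuals) if r != center]

    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
    return [i for i, r in enumerate(residuals) if abs(r - center) / scale > threshold]

# Hypothetical series with a period of 4 and one spike at index 14.
series = [
    10, 20, 30, 20,
    11, 21, 29, 20,
    9, 19, 31, 21,
    10, 20, 60, 19,
    11, 20, 30, 20,
]
anomalies = seasonal_mad_anomalies(series, period=4)
```

Because the median and MAD are barely moved by the spike itself, the outlier does not inflate its own threshold, which is one reason this family of methods is easier to explain than a black-box model.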

Simbox
[Diagram: a call between Ali (+49 5321234567) and Ayşe (+90 5321112233). Turkcell gets 0.050 €.]

Simbox
[Diagram: a call between Ali (+90 5321111111) and Ayşe (+90 5321112233). Turkcell gets 0.01 €.]

Simbox - Methods
• Analytical
• Machine Learning
✓ Supervised Learning
✓ Crime Ring Detection (In Progress)

Simbox - Machine Learning
• We have simbox numbers that were blocked before, along with their
activity (CDRs)
• Supervised Learning!

• ROC = 0.99
• Good enough?
• What about other metrics such as precision,
recall, confusion matrix and so on?

• ROC = 0.99
• Rookie Mistake
▪ 0 numbers predicted as simbox!
• Imbalanced data
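
To illustrate why one headline score is not enough on imbalanced data, the sketch below builds a confusion matrix for a model that never predicts simbox. The labels are hypothetical (990 normal numbers, 10 simbox numbers), not the talk's data, and the example uses accuracy as the misleading headline score.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = simbox."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall(tp, fp, fn):
    """Precision and recall, defined as 0.0 when the denominator is empty."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical imbalanced labels: 990 normal numbers (0), 10 simbox numbers (1).
y_true = [0] * 990 + [1] * 10
# A model that labels everything as normal.
y_pred = [0] * 1000

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision, recall = precision_recall(tp, fp, fn)
```

The accuracy looks excellent while recall is zero: every simbox number is missed. Checking the confusion matrix first would expose the problem immediately.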

• Imbalanced Data
1. Undersampling
2. Oversampling
3. SMOTE (Synthetic Minority Over-sampling Technique)
4. Different algorithms
▪ Decision Tree, Random Forest
5. Penalized Models
▪ Penalized-SVM, Penalized-LDA
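
Of the options above, SMOTE is the least self-explanatory, so here is a minimal pure-Python sketch of its core step: each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. The function name, parameters, and sample points are assumptions; a production setup would normally rely on an established library rather than this loop.

```python
import math
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Create `n_synthetic` points by interpolating between minority samples
    and their k nearest minority neighbours. Needs at least two samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical two-feature minority samples (for example, scaled CDR features).
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote(minority, n_synthetic=6)
```

Unlike plain oversampling, which duplicates existing rows, the synthetic points are new feature vectors that stay inside the region spanned by the minority class.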

Spark:
• Undersampling
▪ filter & union, or sample
• Oversampling
▪ sample(withReplacement=True, fraction=67000.0, seed=1234)
• Different algorithms
▪ Decision Tree, Random Forest
• Other models in progress
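
The two resampling strategies can be sketched on plain Python lists; the Spark version would use DataFrame `filter`, `union`, and `sample` instead. The helper names and the toy rows are assumptions, and both helpers assume the majority class is the larger one.

```python
import random

def undersample(majority, minority, seed=1234):
    """Keep all minority rows and an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)) + list(minority)

def oversample(majority, minority, seed=1234):
    """Keep all rows and add minority rows, drawn with replacement,
    until both classes have the same size."""
    rng = random.Random(seed)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return list(majority) + list(minority) + extra

# Hypothetical rows: (label, row_id).
majority = [("normal", i) for i in range(100)]
minority = [("simbox", i) for i in range(5)]

under = undersample(majority, minority)
over = oversample(majority, minority)

under_counts = (
    sum(1 for label, _ in under if label == "normal"),
    sum(1 for label, _ in under if label == "simbox"),
)
over_counts = (
    sum(1 for label, _ in over if label == "normal"),
    sum(1 for label, _ in over if label == "simbox"),
)
```

Undersampling discards most of the majority data, while oversampling repeats the few minority rows many times; both should be applied to the training split only, so that evaluation still reflects the real class ratio.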

Results:
Accuracy = 88.95%
F1 = 88.91%
AUROC = 79.87%
AUPR = 82.23%

Results:
• Oversampling or undersampling does not change the accuracy much
• Random forest and decision tree handle the imbalanced data set
• The accuracy can still be improved
