Reducing False Positives:
BSA AML Transaction Monitoring Re-Tuning Approach
Written by Mayank Johri and Erik De Monte
Introduction
Institutions waste millions of dollars per year analyzing false positives produced by low-efficacy models.
In an era of heightened regulatory scrutiny, coupled with institutions’ desire to control compliance
costs, there is a need for a sound methodology to improve the overall efficacy of alerts. High efficacy
and sound methodology allow institutions to better channel their time and resources to true
suspicious activities and improve the overall quality of a BSA/AML program.
Certain proposed solutions to this problem include automated alert closures, whitelists, etc. These
solutions do not genuinely address the root causes of false positives and do not represent sound
principles for a robust BSA/AML program.
Instead of relying on “out-of-the-box” rules from the transaction monitoring software, custom rules
that encapsulate multiple scenarios, combined with automated learned behavior (based on past
dispositions), customer segmentation, and peer group analysis, may help improve alert efficacy;
however, these rules still have to be tuned to determine the most effective thresholds.
Below is a summary of the steps/approach that can stand the scrutiny of examiners and fulfill the
desired objective of generating quality alerts.
Approach
Assessment & Prioritization
On a regular basis, evaluate the efficacy of the suspicious activity detection rules currently in
production, rank the rules from lowest to highest efficacy, and create a prioritization list.
This list then drives the tuning schedule/plan.
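As an illustration of how such a prioritization list could be assembled, a minimal R sketch is shown below. The rule names, counts, and the efficacy measure (SARs filed per alert) are hypothetical and would come from the institution’s own alert and case management data.

# Hypothetical per-rule disposition counts pulled from the case management system
rule_stats <- data.frame(
  Rule   = c("RULE_CASH_STRUCT", "RULE_WIRE_HIGH_RISK", "RULE_RAPID_MOVEMENT"),
  Alerts = c(1200, 450, 300),
  SARs   = c(6, 18, 27),
  stringsAsFactors = FALSE
)

# Efficacy defined here as SARs filed per alert generated (one possible measure)
rule_stats$Efficacy <- rule_stats$SARs / rule_stats$Alerts

# Prioritization list: lowest-efficacy rules are scheduled for re-tuning first
prioritization_list <- rule_stats[order(rule_stats$Efficacy), ]
print(prioritization_list)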
Data Acquisition
Three sets of data are pulled for the re-tuning analysis:
1) All historic transaction data since the most recent tuning was implemented;
2) For the rule in question, all historic alerted transactions, and subsequent disposition
(escalated cases and SAR) data. This data can be collected by querying the backend
databases of the transaction monitoring system.
3) Various relevant customer data elements (e.g., entity/consumer flag, cash-intensive business
indicator, AI, etc.) for the customers alerted.
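As a sketch of the data pull described above, the snippet below queries the backend database over ODBC (the RODBC package is also loaded in the Appendix code). The DSN, table, and column names are hypothetical placeholders and will differ by transaction monitoring system.

library(RODBC)

# Hypothetical DSN and table/column names; adjust to the monitoring system's actual schema
conn <- odbcConnect("TM_SYSTEM_DSN")

# 1) All transactions since the last tuning was implemented
transactions <- sqlQuery(conn, "SELECT * FROM TRANSACTIONS WHERE TXN_DATE >= '2016-01-01'")

# 2) Historic alerted transactions and their dispositions for the rule in question
alert_history <- sqlQuery(conn, "SELECT * FROM ALERTS WHERE RULE_ID = 'RULE_01'")

# 3) Customer data elements (entity/consumer flag, cash-intensive business indicator, etc.)
customers <- sqlQuery(conn, "SELECT * FROM CUSTOMERS")

odbcClose(conn)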
Data Analysis
Stratify the data as required (for example, grouping like attributes or ‘non-tunable parameters’ such as
entity/consumer, cash-intensive businesses, etc.) to account for like-attribute behavior patterns.
Subsequent to stratification, perform a series of data analyses to better understand the data. This
data analysis consists of, but is not limited to, identifying whether suitable transaction codes and details are
all available, confirming the completeness and accuracy of the data set, and performing a series of
correlation tests to identify whether certain data elements are correlated. This stage helps the institution
understand the data specific to its own customers and data set. For example, two data elements may
prove to be correlated for one institution and not for another. If two data elements are found to be
correlated, it may be in the institution’s best interest (from a resource or time perspective) to analyze
those two elements together rather than independently.
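As one illustration of the correlation tests mentioned above, the sketch below checks whether two attributes in the sample Transactions.csv from the Appendix move together. The 0.7 cutoff is an arbitrary example, not a prescribed value.

transactions <- read.csv("Transactions.csv", stringsAsFactors = FALSE)

# Pearson correlation between two candidate data elements
corr <- cor(transactions$Attribute_01, transactions$Attribute_02, method = "pearson")

# Formal test of whether the correlation differs from zero
cor.test(transactions$Attribute_01, transactions$Attribute_02)

# Illustrative decision rule: if strongly correlated, consider anchoring the related
# thresholds together during tuning (see the Build Detection Engine section)
if (abs(corr) > 0.7) {
  message("Attributes appear strongly correlated; consider anchoring their thresholds together.")
}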
Build Detection Engine
Using the transaction monitoring manual as a guide, recreate the rule in an object-oriented
programming language (a statistically driven language is preferable; R, MATLAB, or Python is
recommended) to build an external engine for analyzing the rule thresholds.
A threshold range is determined for each threshold being tuned, and a matrix is created for all
combinations of each of the different possible threshold values. As mentioned above, in the event
that two thresholds are discovered to be directly correlated, choose to anchor these two thresholds
together as one to eliminate unnecessary noise in the permutation matrix.
Determine a de-minimis value to serve as the lowest threshold value in the re-tuning range for that
threshold. Professional judgment is used to identify the highest threshold value in the re-tuning
range, but it will typically mirror, in the opposite direction, the delta between the current threshold
value and the de-minimis threshold value.
For some rules, it is expected that this permutation matrix can easily create upwards of a thousand
different threshold combinations. A simple example is included below to visualize the permutation
matrix discussed above.
Threshold | Current Threshold | Lower Range (de-minimis) | Upper Range
1         | 10,000            | 9,500                    | 10,500
2         | 4                 | 2                        | 6
Figure 1.1 Sample Thresholds and Ranges
Permutation | Threshold 1 | Threshold 2
1           | 9,500       | 2
2           | 9,500       | 4
3           | 9,500       | 6
4           | 10,000      | 2
5           | 10,000      | 4
6           | 10,000      | 6
7           | 10,500      | 2
8           | 10,500      | 4
9           | 10,500      | 6
Figure 1.2 Sample Permutation Matrix
(of Thresholds and Ranges from Figure 1.1)
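The permutation matrix in Figure 1.2 can be generated in R with expand.grid, as sketched below using the ranges from Figure 1.1; the Appendix code uses the same approach for three thresholds. (expand.grid may order the rows differently, but it yields the same nine combinations.)

# Threshold ranges from Figure 1.1
threshold_1 <- c(9500, 10000, 10500)
threshold_2 <- c(2, 4, 6)

# All combinations of the two thresholds: nine permutations, as in Figure 1.2
permutation_matrix <- expand.grid(Threshold_1 = threshold_1, Threshold_2 = threshold_2)
print(permutation_matrix)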
Once both the permutation matrix and the rule engine have been built, the transaction data is
clustered, and all transactions falling into clusters outside of the threshold ranges are excluded.
The two sets of transaction data (the full set of transaction data and the transaction data related to
historic alerts, cases, and SARs) are then run through the rule engine against a loop of all threshold
combinations in the permutation matrix.
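A minimal sketch of this clustering step is shown below, assuming a simple k-means on a single attribute of the sample data from the Appendix; the choice of k, the attribute, and the exclusion rule are illustrative only.

transactions <- read.csv("Transactions.csv", stringsAsFactors = FALSE)

# Illustrative k-means on one attribute; k = 3 is an arbitrary choice for this small sample
set.seed(1)
km <- kmeans(transactions$Attribute_03, centers = 3)
transactions$Cluster <- km$cluster

# Exclude clusters whose centers fall below the lowest threshold value being tested
# (800,000 is the de-minimis value used later in the Appendix example)
keep <- which(km$centers >= 800000)
in_scope <- subset(transactions, Cluster %in% keep)
nrow(in_scope)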
The full set of all transactions is run through the engine first, to output a count of events, or
“alerts,” for each permutation combination. Before proceeding, identify the threshold combination
in the matrix which contains all current thresholds and compare this count against the actual alert
count per the historic transaction monitoring system data. This provides a check for completeness
over the data pull as well as validating the rule engine’s accuracy.
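The reconciliation described above reduces to a single comparison once the engine has produced its counts. The sketch below assumes the x_Final table built by the Appendix code and the hypothetical current thresholds used there (10, 2, and 1,000,000).

# Current production thresholds (hypothetical, matching the Appendix example)
current <- list(t1 = 10, t2 = 2, t3 = 1000000)

# Row of the permutation matrix that corresponds to the current thresholds
current_row <- subset(x_Final,
                      Example_Threshold_01 == current$t1 &
                      Example_Threshold_02 == current$t2 &
                      Example_Threshold_03 == current$t3)

# The simulated count at current thresholds should equal the historic alert count;
# a mismatch points to an incomplete data pull or an inaccurate rule recreation
if (current_row$`Transaction Data - Count` == current_row$`Transaction Alert - Historic`) {
  message("Engine reproduces the historic alert count at the current thresholds.")
} else {
  warning("Counts do not reconcile; review data completeness and the recreated rule logic.")
}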
Once confirmed, the second set of transaction data, linked to the historic cases and SARs, is run
through the engine and logged as separate event counts in new columns in the matrix, as shown
below.
Permutation | Threshold 1 | Threshold 2 | Transaction Alert – Historic | All Transaction Data – Count | Case Event Count | SAR Event Count
1 (Current)  | 10,000      | 4           | 65                           | 65                           | 14               | 2
2 (New)      | 10,000      | 2           | 65                           | 58                           | 11               | 2
3 (New)      | 10,500      | 2           | 65                           | 51                           | 10               | 1
4 (New)      | …           | …           | …                            | …                            | …                | …
Figure 1.3 Re-Tuning Permutation Matrix Event Counts
As seen above, the “Transaction Alert – Historic” count and the “All Transaction Data – Count” for
permutation 1 (the current thresholds) are equal, which confirms the rule engine is simulating
the rule accurately. In permutation 2, where the thresholds have been adjusted to a new combination,
there is a slight decline in the “All Transaction Data – Count,” as expected with the adjusted
thresholds. (Note that the “Transaction Alert – Historic” count will be anchored at 65, as this logic only
produces the alert count at the current thresholds.)
Note that the SAR count of 2 will be used as an anchor in the analysis of the
results to set the rule threshold or parameter. Best practice instructs that recent SARs serve as a
benchmark for tuning thresholds and should be weighed heavily in the analysis. As seen above in
permutation 3, the threshold combination would cause one of the historic SARs to evade detection,
and thus this permutation (and any additional permutations which do not detect both historic SARs)
should be eliminated from any consideration for re-tuning.
A sample transaction data set and shell code (written in R) for the detection engine discussed above
is provided in the Appendix.
Quantitative Analysis
Identify the remaining permutation combinations and focus the analysis on the case and SAR
retention proportions (the SAR proportion is usually weighed the most in the analysis). Any threshold
combinations in the matrix with undesirable SAR and/or case retention ratios are eliminated from
the list of possibilities. No single line of demarcation is identified at the end of the quantitative
analysis for a re-tuning exercise. Instead, all remaining threshold combinations in the permutation
matrix continue through to the qualitative assessment, and the subsequent qualitative analysis is
performed to solidify a new proposed line of demarcation.
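A minimal sketch of this retention screen is shown below. It assumes the x_Final permutation table produced by the Appendix code; the baseline counts come from the sample data, and the 100% SAR / 80% case retention cutoffs are illustrative values that would in practice be set by the institution’s risk appetite.

# Baseline dispositions in the sample data (3 cases, 2 SARs)
baseline_cases <- 3
baseline_sars  <- 2

# Retention proportions for every permutation still in the matrix
x_Final$Case_Retention <- x_Final$`Case Event Count` / baseline_cases
x_Final$SAR_Retention  <- x_Final$`SAR Event Count`  / baseline_sars

# Illustrative screen: keep permutations retaining all SARs and at least 80% of cases
candidates <- subset(x_Final, SAR_Retention >= 1 & Case_Retention >= 0.8)
nrow(candidates)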
Qualitative Analysis
Using the historical data from the quantitative analysis, set indicators for Above-the-Line (ATL) and
Below-the-Line (BTL) transactions and pull the qualitative samples to be reviewed by the FIU.
These samples, when flagged as ‘ATL’, are essentially pseudo alerts and are treated as such in the
FIU’s investigative analysis. BTL samples are included in the sample to further validate the threshold
line, as the expectation is that less than x% of BTL samples (this percentage will depend on the
institution’s risk appetite) would return as escalated cases.
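One way to flag ATL and BTL records for the FIU’s review is sketched below, assuming a single proposed threshold on Attribute_03 of the sample data; the threshold value and BTL sample size are placeholders (the sample size formula is given in the next section).

transactions <- read.csv("Transactions.csv", stringsAsFactors = FALSE)

# Illustrative proposed threshold on a single attribute
proposed_threshold <- 800000

# ATL: transactions that would alert at the proposed threshold (treated as pseudo alerts)
# BTL: transactions below the proposed line, sampled to validate the cut-off
transactions$Line <- ifelse(transactions$Attribute_03 >= proposed_threshold, "ATL", "BTL")
table(transactions$Line)

# Draw the qualitative review samples (sizes here are placeholders)
set.seed(2)
atl_sample <- subset(transactions, Line == "ATL")
btl_pool   <- subset(transactions, Line == "BTL")
btl_sample <- btl_pool[sample(nrow(btl_pool), size = min(5, nrow(btl_pool))), ]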
Sampling
Determine the appropriate sample size using sampling without replacement from a finite population
(hypergeometric sampling). The number of transactions which fall into the ATL or BTL category will determine
the number of random samples required for a statistically significant qualitative assessment. Another
random sample of the same size would have roughly the same chance of producing a similar
result. Below is the formula to be used for determining sample size.
Included below is a sample size example:
Variable | Value | Notes
N        | 620   | BTL population, determined through data segmentation analysis (e.g., clustering, etc.).
CI       | 1.96  | Target confidence level is 95%; the associated factor (“z-value”) is 1.96. In MS Excel: =NORM.S.INV(1-((1-0.95)/2))
Prec     | 0.05  | Precision is set by risk appetite; the smaller this value, the larger the required sample size.
P        | 10%   | Occurrence rate which needs to be detected.
n        | 113   | Based on the values listed above, n = 113.

n = \frac{CI^2 \cdot P \cdot Q / Prec^2}{1 + \frac{1}{N}\left( CI^2 \cdot P \cdot Q / Prec^2 - 1 \right)}
Legend
N = population size
P = expected occurrence rate of an attribute
Q = 1 - P
Prec = desired precision level
CI = associated factor at a given confidence level
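The sample size formula above can be implemented directly in R; the sketch below reproduces n = 113 for the example values (N = 620, CI = 1.96, Prec = 0.05, P = 0.10).

# Sample size with finite population correction, per the formula and legend above
sample_size <- function(N, P, Prec, CI) {
  Q  <- 1 - P
  n0 <- (CI^2 * P * Q) / Prec^2      # unadjusted sample size
  round(n0 / (1 + (n0 - 1) / N))     # finite population correction
}

sample_size(N = 620, P = 0.10, Prec = 0.05, CI = 1.96)   # returns 113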
The table below shows how each variable impacts sample size:
N   | Prec | P   | CI   | n
620 | 0.05 | 0.1 | 1.96 | 113
620 | 0.03 | 0.1 | 1.96 | 237
620 | 0.05 | 0.2 | 1.96 | 176
620 | 0.05 | 0.1 | 1.64 | 135
Figure 1.4 Sample Size
Investigator Analysis
The purpose of generating these samples is for the FIU to qualitatively evaluate the efficacy of the
quantitatively calculated thresholds. A group of investigators should be selected for the exercise and
randomly assigned pseudo ‘alerts’ to review as if they were authentic alerts from the transaction
monitoring system. In theory, if the threshold is appropriately tuned, then a transaction marked
‘ATL’ should most likely also be classified as ‘suspicious’ during this qualitative analysis, and all
sample transactions that are marked ‘BTL’ would be flagged as not suspicious.
The investigator’s evaluation must include consideration of the intent of each rule, and they will
generally evaluate each transaction through a lens akin to: “Given what is known from KYC, the
origin/destination of funds, the beneficiary, etc., is it explainable that this consumer/entity would
transact this dollar amount at this frequency, velocity, pattern, etc.?” To maintain the integrity of
this assessment, the investigator does not make the qualitative assessment based only on the value
of the flagged transaction, but rather looks holistically at various qualities of the transaction, such as
who the transaction is from/to (for example, is it a wire transfer between two branches of the same
company, or between firms dealing in a similar commodity such as computers and semiconductors),
and whether any fields, such as an individual’s last name, contain keywords that caused the rule to
misinterpret the field and generate a false positive.
Proportion and Efficacy Tests
All threshold combinations will need a review to identify which threshold combination has the best
efficacy both from a quantitative and qualitative perspective.
The outcome of the investigator’s qualitative analysis and the subsequent statistical analysis determine whether
the line of demarcation identified during the quantitative analysis remains at the current level or is
revised. The risk appetite determines the acceptable magnitude of the proportion defective (the proportion
of suspicious transactions), also known as the “efficacy rate”. The range of outcomes and the
corresponding decisions are listed below.
1. BTL has acceptable proportion of suspicious transactions and ATL proportion is
significantly different (i.e., larger) than BTL’s proportion; threshold remains at the current
level: the threshold meaningfully separates BTL and ATL populations and the separation is
at the “correct” level (in terms of the risk appetite).
2. Both BTL and ATL proportions are low. Regardless of the statistical difference between the
two populations, if the proportions are low, the threshold most likely needs to become less
stringent to reduce the level of false positives.
3. Both BTL and ATL proportions are higher than the acceptable level of suspicious
transactions. The threshold needs to become more stringent.
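The comparison of ATL and BTL suspicious proportions described above can be run with a standard two-proportion test, as sketched below; the counts are hypothetical investigator results, not figures from this paper.

# Hypothetical investigator dispositions: suspicious counts out of samples reviewed
atl_suspicious <- 22
atl_reviewed   <- 113
btl_suspicious <- 3
btl_reviewed   <- 113

# Two-proportion test: is the ATL suspicious rate significantly higher than the BTL rate?
prop.test(x = c(atl_suspicious, btl_suspicious),
          n = c(atl_reviewed, btl_reviewed),
          alternative = "greater")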
Approval and Implementation
Per the institution’s review and approval process, receive all necessary approvals from key personnel
prior to making any changes in production. Once all pertinent parties are in agreement, create a
functional specification document, which should include a brief overview of the rule change, what is
currently configured, and the desired configuration changes to be made. It is imperative that the
functional specification document is thoroughly vetted and signed off, validating that the document
provides all necessary and accurate information to make the desired implementation changes.
Authors
Mayank Johri and Erik De Monte both work in the Bank Secrecy Act/Anti-Money Laundering
Analytics group at First Republic Bank in San Francisco, California. Their contact information is
included below.
Mayank Johri, Vice President Analytics
https://www.linkedin.com/in/johrim
Erik De Monte, Data Scientist
https://www.linkedin.com/in/edemonte
Appendix: Detection Engine Shell Code (R)
Included below is a sample of transaction data and detection engine shell code written in R, through
which the data can be run to illustrate the methodology discussed above. Please note that the table
below should be saved, with the headers included, as a comma-separated file (CSV) named
“Transactions.csv”.
The R code was built using RStudio Version 0.99.902 and has been commented to navigate the user
through each step of the methodology.
Sample Transaction File (Save as “Transactions.csv”)
Transaction_Key,Date,Alert_Nbr,Case_Nbr,SAR_Nbr,Attribute_01,Attribute_02,Attribute_03
TXN001,1/1/2016,NULL,NULL,NULL,6,0,70000
TXN002,1/15/2016,NULL,NULL,NULL,1,1,40
TXN003,2/1/2016,ALRT001,NULL,NULL,11,2,1300000
TXN004,2/15/2016,NULL,NULL,NULL,5,1,340
TXN005,3/1/2016,NULL,NULL,NULL,7,0,126
TXN006,3/15/2016,NULL,NULL,NULL,7,0,986
TXN007,4/1/2016,NULL,NULL,NULL,5,0,1400
TXN008,4/15/2016,NULL,NULL,NULL,2,1,9765
TXN009,5/1/2016,NULL,NULL,NULL,3,0,2098
TXN010,5/15/2016,ALRT002,CASE001,SAR001,16,5,1000001
TXN011,6/1/2016,ALRT003,NULL,NULL,15,3,1800765
TXN012,6/15/2016,NULL,NULL,NULL,3,1,65433
TXN013,1/1/2016,NULL,NULL,NULL,3,0,765889
TXN014,1/15/2016,NULL,NULL,NULL,4,1,12
TXN015,2/1/2016,NULL,NULL,NULL,7,1,2345
TXN016,2/15/2016,NULL,NULL,NULL,9,0,97800
TXN017,3/1/2016,NULL,NULL,NULL,6,0,5422
TXN018,3/15/2016,ALRT004,NULL,NULL,12,2,1005678
TXN019,4/1/2016,NULL,NULL,NULL,6,1,9845
TXN020,4/15/2016,NULL,NULL,NULL,3,0,998
TXN021,5/1/2016,ALRT005,CASE002,NULL,18,4,1009876
TXN022,5/15/2016,NULL,NULL,NULL,4,0,12333
TXN023,6/1/2016,ALRT006,NULL,NULL,10,5,1200000
TXN024,6/15/2016,ALRT007,CASE003,SAR002,20,10,34087264
Detection Engine Shell Code (R)
#//////////////////////////////////////////////////////////////////////
# Name: Re-Tuning Permutation Analysis - Example R Script
# Date: October 2016
# Developers: Erik De Monte, Mayank Johri
#//////////////////////////////////////////////////////////////////////
# Assumptions:
#
# i. There are 4 tables of Transactions available to be run through the engine:
# - All transactions for the date period identified
# - All transactions related to historic alerts for the date period identified
# - All transactions related to historic alerts that were escalated to case
# - All transactions related to historic alerts that were escalated to SAR
#
# ii. The data available for the relevant thresholds being re-tuned are available.
#//////////////////////////////////////////////////////////////////////
#0. Preliminary Procedures
#//////////////////////////////////////////////////////////////////////
# Load relevant preinstalled R Packages
library(cluster)
library(doBy)
library(base)
library(lubridate)
library(utils)
library(RODBC)
library(reshape)
library(dplyr)
# Upload and Format Data Frame
transactions <- read.csv(file='Transactions.csv', sep=',', header=TRUE, stringsAsFactors = FALSE)
transactions[,1] <- as.character(transactions[,1])
transactions[,2] <- as.Date(transactions[,2], format = "%m/%d/%Y")
transactions[,3] <- as.character(transactions[,3])
transactions[,4] <- as.character(transactions[,4])
transactions[,5] <- as.character(transactions[,5])
transactions[,6] <- as.numeric(transactions[,6])
transactions[,7] <- as.numeric(transactions[,7])
transactions[,8] <- as.numeric(transactions[,8])
#//////////////////////////////////////////////////////////////////////
# 1. Create a reference table for permutation matrix.
#//////////////////////////////////////////////////////////////////////
# 1a. Define Threshold Variables
# For the sake of this example, let us assume that the current thresholds are set at:
# threshold_01 = 10
# threshold_02 = 2
# threshold_03 = 1000000
# To define exact values to a threshold, assign it to a vector ("c")
# To define a sequence of values, use the "seq" function under the syntax:
# threshold = seq(a,b,c) ; Go from a to b in increments of c
threshold_01 = c(7, 10, 12)
threshold_02 = c(1,2,3)
threshold_03 = seq(800000,1200000,200000)
# 1b. Create the Threshold Table
x_Threshold_Table <- expand.grid(threshold_01,threshold_02,threshold_03)
# 1c. Accurately define the columns in the new table
names(x_Threshold_Table)[names(x_Threshold_Table) == 'Var1'] <- 'Example_Threshold_01'
names(x_Threshold_Table)[names(x_Threshold_Table) == 'Var2'] <- 'Example_Threshold_02'
names(x_Threshold_Table)[names(x_Threshold_Table) == 'Var3'] <- 'Example_Threshold_03'
# 1d. Clean up your environment and remove unnecessary variables.
rm(threshold_01)
rm(threshold_02)
rm(threshold_03)
#//////////////////////////////////////////////////////////////////////
# 2. Loop transactions through each permutation in the Permutation Matrix (x_Threshold_Table)
# Count the number of events
#//////////////////////////////////////////////////////////////////////
# Count of Transactions - Current Thresholds
#//////////////////////////////////////////////////////////////////////
# 2a. Set the baseline alert count based on current transactions
# For the sake of this example, let us assume that the current thresholds are set at:
# threshold_01 = 10
# threshold_02 = 2
# threshold_03 = 1000000
# In this example, there are 7 historic alerts for the transaction set.
alerts <- subset(transactions, transactions$Alert_Nbr != 'NULL')
alert_count <- as.numeric(length(alerts$Transaction_Key))
x_Final <- data.frame(x_Threshold_Table[1:3], alert_count)
names(x_Final)[names(x_Final) == 'alert_count'] <- 'Transaction Alert - Historic'
rm(alert_count)
#//////////////////////////////////////////////////////////////////////
# Count of Transactions - Permutation Thresholds
#//////////////////////////////////////////////////////////////////////
# 2b. Create a variable which logs the number of events which fit the respective loop
Var_Event <- rep(NA,nrow(x_Threshold_Table))
# 2c. Loop through all threshold permutation combinations and create a subset of the transactions
#     that would alert
# var_index is used to temporarily hold the count of alerts between loops
for (i in 1:nrow(x_Threshold_Table)){
var_index <- subset(transactions, (
(transactions$Attribute_01 >= x_Threshold_Table$Example_Threshold_01[i])
& (transactions$Attribute_02 >= x_Threshold_Table$Example_Threshold_02[i])
& (transactions$Attribute_03 >= x_Threshold_Table$Example_Threshold_03[i])
))
#Count
Var_Event[i] <- as.numeric(length(var_index$Transaction_Key))
rm(var_index)
}
Event_Count=as.matrix(Var_Event)
x_Final = cbind(x_Final, Event_Count)
names(x_Final)[names(x_Final) == 'Event_Count'] <- 'Transaction Data - Count'
rm(Event_Count)
rm(Var_Event)
rm(i)
#//////////////////////////////////////////////////////////////////////
# Count of Historic Case Transactions
#//////////////////////////////////////////////////////////////////////
# Emulate the logic above using only the transactions related to historic cases.
# Append ("cbind") the results to the final permutation table as done above.
# Name it "Case Event Count"
cases <- subset(transactions, transactions$Case_Nbr != 'NULL')
Var_Event <- rep(NA,nrow(x_Threshold_Table))
for (i in 1:nrow(x_Threshold_Table)){
var_index <- subset(cases, (
(cases$Attribute_01 >= x_Threshold_Table$Example_Threshold_01[i])
& (cases$Attribute_02 >= x_Threshold_Table$Example_Threshold_02[i])
& (cases$Attribute_03 >= x_Threshold_Table$Example_Threshold_03[i])
))
#Count
Var_Event[i] <- as.numeric(length(var_index$Transaction_Key))
rm(var_index)
}
Event_Count=as.matrix(Var_Event)
x_Final = cbind(x_Final, Event_Count)
names(x_Final)[names(x_Final) == 'Event_Count'] <- 'Case Event Count'
rm(Event_Count)
rm(Var_Event)
rm(i)
#//////////////////////////////////////////////////////////////////////
# Count of Historic SAR Transactions
#//////////////////////////////////////////////////////////////////////
# Emulate the logic above using only the transactions related to historic SARs.
# Append ("cbind") the results to the final permutation table as done above.
# Name it "SAR Event Count"
sars <- subset(transactions, transactions$SAR_Nbr != 'NULL')
Var_Event <- rep(NA,nrow(x_Threshold_Table))
for (i in 1:nrow(x_Threshold_Table)){
var_index <- subset(sars, (
(sars$Attribute_01 >= x_Threshold_Table$Example_Threshold_01[i])
& (sars$Attribute_02 >= x_Threshold_Table$Example_Threshold_02[i])
& (sars$Attribute_03 >= x_Threshold_Table$Example_Threshold_03[i])
))
#Count
Var_Event[i] <- as.numeric(length(var_index$Transaction_Key))
rm(var_index)
}
Event_Count=as.matrix(Var_Event)
x_Final = cbind(x_Final, Event_Count)
names(x_Final)[names(x_Final) == 'Event_Count'] <- 'SAR Event Count'
rm(Event_Count)
rm(Var_Event)
rm(i)
#//////////////////////////////////////////////////////////////////////
# Anchor your analysis to the number of SARs filed, remove any combinations which would have
# missed a prior filed SAR.
sar_count <- as.numeric(length(sars$Transaction_Key))
x_Final <- subset(x_Final, x_Final$`SAR Event Count` >= sar_count)
rm(sar_count)
#//////////////////////////////////////////////////////////////////////
#//////////////////////////////////////////////////////////////////////
#//////////////////////////////////////////////////////////////////////
#//////////////////////////////////////////////////////////////////FIN.