This document discusses Swedbank's work on anomaly detection from research to production. Some key points:
- Swedbank's Analytics & AI team works on advanced analytics and AI research, delivering applications to address business needs in the short and long term.
- They describe their approach to building deep anomaly detection models using generative adversarial networks and deploying them using TensorFlow Serving.
- The team has developed an "Analytical Ops" framework to streamline the process from building models in a research environment to translating, packaging, and publishing them for production use.
- Lessons learned include the importance of joint data science and engineering efforts, using a feature store for reuse, and infrastructure to support hyperparameter tuning.
Deep Anomaly Detection from Research to Production Leveraging Spark and TensorFlow
1. Deep Anomaly Detection
From research to production
Leveraging Spark and TensorFlow
Davit Bzhalava, Shaheer Mansoor, Erik Ekerot
What’s in this talk:
We present our work on anomaly detection: we describe overall results, how Swedbank goes from research to production, and what we think is important when building AI infrastructure.
5. Analytics & AI
R&D: Data exploration, advanced analytics discovery and AI research. Long term.
Delivery & Service: Deliver on-demand analytics and AI applications addressing immediate business requirements. Short term.
Business development: Effective support and execution of prioritized value streams and strategic business projects. Medium term.
› Established in 2016, consolidating disparate analytics functions across the organization.
› Today 30 colleagues.
› A mix of data scientists, business analysts, developers, engineers and project managers.
› A center of excellence, promoting a data-driven mindset in the bank.
6. Why this now?
[Chart: share of the public reporting high/quite high trust in Swedish institutions (the police, universities, healthcare, the national bank, the royal house, radio & TV, the church, the parliament, unions, the government, the press, large companies, the EU commission, banks, political parties), in %, 2010-2019. Source: Kantar SIFO, Medieakademin, Trust Barometer.]
9. A platform- and framework-agnostic approach
Collect data → Feature engineer → Analyse → Model → Deploy → Monitor → Act
10. From R&D to production
Research-research platform: Prepare datasets. Build and train model.
Production-research platform: Translate and package model.
Production platform: Publish and run model.
14. Why Deep Anomaly Detection?
Fraud schemes are living organisms, far from static scenarios.
The complexity of schemes is growing due to a variety of disruptions.
Lack of access to abundant (or any) recent cases.
We don't always know exactly what we are looking for, but we do know what we are not looking for.
15. Contrasting old and new
IF
Transactions count > 3
AND
Transactions amount > 50 000 SEK
AND
Country is on red list
THEN
Flag
model.compile(…)
model.fit(…)
model.evaluate(…)
model.predict(…)
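To make the "new" column concrete, below is a minimal, hedged sketch of a learned anomaly scorer in Keras. It uses a plain autoencoder rather than the GAN described later in the talk, and the feature count, layer sizes, placeholder data and threshold are illustrative assumptions, not Swedbank's production model.

# Minimal sketch: an autoencoder anomaly scorer in Keras (illustrative only).
# Feature count, architecture, placeholder data and threshold are assumptions.
import numpy as np
import tensorflow as tf

n_features = 16  # e.g. amount, destination encoding, transaction type, ...

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(n_features,)),
    tf.keras.layers.Dense(4, activation="relu"),   # bottleneck
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(n_features),             # reconstruct the input
])
model.compile(optimizer="adam", loss="mse")

# Train on "normal" transactions only, so anomalous ones reconstruct poorly.
x_normal = np.random.rand(1000, n_features).astype("float32")  # placeholder data
model.fit(x_normal, x_normal, epochs=5, batch_size=64, verbose=0)

# Score new transactions: high reconstruction error -> flag for review.
x_new = np.random.rand(10, n_features).astype("float32")       # placeholder data
errors = np.mean((model.predict(x_new) - x_new) ** 2, axis=1)
flagged = errors > np.quantile(errors, 0.95)

The hand-written threshold rule on the left is replaced by a threshold on a learned score.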
17. Feature engineering: transactions2vec
Writing to the feature store
// Map raw transactions to the feature schema (Hopsworks feature store Scala API).
val transactionsFeatureDs = rawTransactionDs.map(
  (rawTransaction: RawTransaction) => TransactionFeature(
    amount = rawTransaction.amount,
    destination = rawTransaction.destination,
    type_of_transaction = rawTransaction.type_of_transaction
  ))

// Register the features as a versioned feature group.
Hops.createFeaturegroup(TRANSACTIONS_FEATUREGROUP)
  .setDataframe(transactionsFeatureDs.toDF)
  .setVersion(1)
  .setDescription("Features of card transactions")
  .setPrimaryKey("customer_id")
  .write()
Reading from the feature store
// Read the feature group back (by name) and materialise a versioned training dataset.
val transactionsFeaturesDF = Hops.getFeaturegroup(TRANSACTIONS_FEATUREGROUP)
  .read()

Hops.createTrainingDataset("card_transactions_prediction")
  .setDataframe(transactionsFeaturesDF)
  .setVersion(1)
  .write()

// Later: fetch the latest version of the training dataset for model training.
val latestVersionTrainDF = Hops.getLatestTrainingDatasetVersion("card_transactions_prediction")
  .read()
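Downstream, the materialised training dataset is consumed by the TensorFlow training job. Below is a minimal, hedged sketch of what such an input function could look like, assuming the training dataset has been exported as TFRecords at some path; the path, feature names and dtypes are illustrative assumptions, not taken from the slides.

# Hedged sketch: a tf.data input_fn over a TFRecord export of the training
# dataset. The path, feature names and dtypes are illustrative assumptions.
import tensorflow as tf

TRAIN_PATH = "hdfs:///path/to/card_transactions_prediction/*.tfrecords"  # hypothetical

feature_spec = {
    "amount": tf.io.FixedLenFeature([], tf.float32),
    "type_of_transaction": tf.io.FixedLenFeature([], tf.int64),
}

def train_input_fn(batch_size=256):
    files = tf.data.Dataset.list_files(TRAIN_PATH)
    dataset = files.interleave(tf.data.TFRecordDataset, cycle_length=4)
    dataset = dataset.map(lambda rec: tf.io.parse_single_example(rec, feature_spec))
    return dataset.shuffle(10000).batch(batch_size).repeat()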
26. Feature engineering (transactions2vec): GAN estimator API
# Pseudocode from the talk: the GAN is wrapped in a custom Estimator model_fn.
def model_fn(features, labels, mode, params):
    gan_network = build_gan(features, params)        # generator + discriminator
    gan_optimizer = build_gan_optimizer(params)
    # ... assemble losses and training ops into an EstimatorSpec
    return tf.estimator.EstimatorSpec(mode, ...)

# Hyperparameters are collected in tf.contrib.training.HParams (TF 1.x).
hparams = tf.contrib.training.HParams(
    n_neurons=n_neurons,
    n_layers=n_layers,
    activation_fn=activation_fn,
    batch_norm=batch_norm,
    batch_dropout=batch_dropout,
    # …
    dropout_rate=dropout_rate,
    kernel_bias_reg=kernel_bias_reg,
    L1_rate=L1_rate,
    L2_rate=L2_rate,
    learning_rate=learning_rate,
    loss_type=loss_type,
    opt_type=opt_type
)

estimator = tf.estimator.Estimator(
    model_fn=model_fn,              # pass the function itself, not its result
    config=run_config,
    params=hparams
)

experiment = tf.contrib.learn.Experiment(
    estimator,
    train_input_fn=train_input_fn,  # input functions are passed as callables
    eval_input_fn=eval_input_fn
)
experiment.train_and_evaluate()
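For a fuller picture of how these pieces could fit together, here is a hedged TF 1.x sketch of a GAN-style model_fn returning an EstimatorSpec. The generator/discriminator helpers, layer sizes, losses, the "transaction_vector" feature name and the anomaly score are illustrative assumptions, and the two optimizer updates are grouped rather than alternated; this is not the model from the slides.

# Hedged sketch of a GAN-style model_fn for tf.estimator (TF 1.x APIs).
# Layer sizes, losses, feature name and the anomaly score are assumptions.
import tensorflow as tf

def generator(noise, n_out):
    with tf.variable_scope("generator", reuse=tf.AUTO_REUSE):
        h = tf.layers.dense(noise, 64, activation=tf.nn.relu, name="h1")
        return tf.layers.dense(h, n_out, name="out")

def discriminator(x):
    with tf.variable_scope("discriminator", reuse=tf.AUTO_REUSE):
        h = tf.layers.dense(x, 64, activation=tf.nn.relu, name="h1")
        return tf.layers.dense(h, 1, name="logit")

def model_fn(features, labels, mode, params):
    x = features["transaction_vector"]                   # hypothetical input name
    n_out = int(x.shape[-1])
    noise = tf.random.normal([tf.shape(x)[0], params.n_neurons])

    fake = generator(noise, n_out)
    d_real, d_fake = discriminator(x), discriminator(fake)

    if mode == tf.estimator.ModeKeys.PREDICT:
        # Anomaly score: transactions the discriminator finds "unlike" normal data.
        return tf.estimator.EstimatorSpec(
            mode, predictions={"anomaly_score": tf.nn.sigmoid(-d_real)})

    xent = tf.nn.sigmoid_cross_entropy_with_logits
    d_loss = tf.reduce_mean(xent(labels=tf.ones_like(d_real), logits=d_real) +
                            xent(labels=tf.zeros_like(d_fake), logits=d_fake))
    g_loss = tf.reduce_mean(xent(labels=tf.ones_like(d_fake), logits=d_fake))

    opt = tf.train.AdamOptimizer(params.learning_rate)
    train_op = tf.group(  # a real implementation would alternate these updates
        opt.minimize(d_loss, var_list=tf.trainable_variables("discriminator")),
        opt.minimize(g_loss, var_list=tf.trainable_variables("generator"),
                     global_step=tf.train.get_global_step()))
    return tf.estimator.EstimatorSpec(mode, loss=d_loss + g_loss, train_op=train_op)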
30. TF graph in map function
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataInputStream, FileSystem, Path}

// Read the frozen TF graph from HDFS into a byte array on the driver.
val path: Path = new Path(inputFrozenGraph)
val hadoopConfig: Configuration = new Configuration()
val fileSystem: FileSystem = FileSystem.get(hadoopConfig)
val reader: FSDataInputStream = fileSystem.open(path)
val length: Int = reader.available()
val graph = new Array[Byte](length)
reader.readFully(graph)

// Deserialise the graph and wrap it in an in-process model server.
val model: TFModel = ModelFromByteArray.read(graph)
val server = TFModelServer.create(model)

// Score each row by converting it to a tensor and running the graph.
val scored_dataset: DataFrame = df.rdd.map(row =>
  server.serve(RDD2Tensor(row), inputNodeName, outputNodeName)).toDF()
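The same pattern in PySpark, as a hedged sketch: broadcast the serialized frozen graph to the executors and score each partition with a local TF session. The active SparkSession (spark), the input DataFrame df, the graph path and the tensor node names are all illustrative assumptions.

# Hedged sketch: scoring with a broadcast frozen TF graph inside mapPartitions.
# The spark session, df, path and tensor names are illustrative assumptions.
import tensorflow as tf  # TF 1.x-style graph execution

with open("/models/anomaly_gan/frozen_graph.pb", "rb") as f:  # hypothetical path
    graph_bytes = f.read()
graph_bc = spark.sparkContext.broadcast(graph_bytes)

def score_partition(rows):
    graph_def = tf.compat.v1.GraphDef()
    graph_def.ParseFromString(graph_bc.value)
    with tf.Graph().as_default() as g:
        tf.compat.v1.import_graph_def(graph_def, name="")
        with tf.compat.v1.Session(graph=g) as sess:
            x = g.get_tensor_by_name("input:0")          # hypothetical node names
            y = g.get_tensor_by_name("anomaly_score:0")
            for row in rows:
                score = sess.run(y, feed_dict={x: [list(row)]})
                yield tuple(row) + (float(score[0]),)

scored_df = df.rdd.mapPartitions(score_partition).toDF(df.columns + ["anomaly_score"])

Broadcasting the serialized graph avoids shipping it in every task closure and keeps one session per partition.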
36. From R&D to production: Analytical Ops framework
Research-research platform: Prepare datasets. Build and train model.
Production-research platform: Translate and package model.
Production platform: Publish and run model.
37. From R&D to production

                 Research-research   Production-research   Production
Capacity         High                Medium                Low
Service level    Low                 Medium                High
Accessibility    High                Medium                Low
Capability       High                Medium                Low
Organization     R&D                 R&D +                 Many
38. Analytical Ops at Swedbank v1.0
[Pipeline diagram: stages Training → Evaluate → Scoring → Business acceptance; each step is either a manual task or an automated task.]
Training: Configure model trainer, Run training, Execute data scripts, Collect artefacts, Collect diagnostics.
Evaluate: Run evaluation, Sample testing, Collect diagnostics, Dashboarding.
Scoring: Configure model scoring, Run model scoring.
39. Analytical Ops at Swedbank today
[Pipeline diagram: same stages and tasks as v1.0, plus three new steps: Download model artefacts, Configure model evaluator, and an additional Collect diagnostics.]
Self-service framework component: the framework provides the toolbox to achieve the goal, but workflows need to be run manually.
Automated framework component: the framework takes care of creating the necessary workflows to reach the desired outcome.
User Interface: a data scientist-friendly UI that enables an easy way from training to scoring.
40. Analytical Ops at Swedbank today
Data: Feature store, Datasets, Model metadata, Diagnostics.
Compute: GPU, CPU, Spark batch, TF serving.
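For online scoring, a published model can be queried through TensorFlow Serving's REST API. A minimal, hedged sketch follows; the host, port, model name and feature layout are illustrative assumptions, not Swedbank's deployment.

# Hedged sketch: querying a TensorFlow Serving REST endpoint.
# Host, port, model name and the instance layout are illustrative assumptions.
import requests

instance = {"amount": 52000.0, "type_of_transaction": 3}  # hypothetical features
resp = requests.post(
    "http://tf-serving.internal:8501/v1/models/anomaly_gan:predict",
    json={"instances": [instance]},
)
resp.raise_for_status()
score = resp.json()["predictions"][0]
print("anomaly score:", score)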
42. Conclusion
› Data science and engineering need to go hand in hand. There is a push in the industry towards separating the two, but that is not our model: the production-research platform is the meeting place.
› Feature store: good for decreasing redundant computation and work, and a single point of truth.
› Infrastructure that enables effective hyperparameter tuning is important. GANs are very sensitive to hyperparameter changes.
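As a closing illustration of that last point, here is a minimal, hedged sketch of sweeping a few GAN hyperparameters and training one configuration at a time; the grid values and the train_and_score helper are hypothetical, not Swedbank's tuning infrastructure.

# Hedged sketch: a simple grid sweep over a few GAN hyperparameters.
# The grid values and the train_and_score helper are hypothetical.
import itertools

grid = {
    "learning_rate": [1e-4, 5e-4, 1e-3],
    "n_layers": [2, 3],
    "dropout_rate": [0.0, 0.3],
}

results = []
for lr, n_layers, dropout in itertools.product(*grid.values()):
    params = {"learning_rate": lr, "n_layers": n_layers, "dropout_rate": dropout}
    metric = train_and_score(params)  # hypothetical: trains a GAN, returns an eval metric
    results.append((metric, params))

best_metric, best_params = min(results, key=lambda r: r[0])  # assuming lower is better
print("best configuration:", best_params)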