This document discusses Swedbank's work on anomaly detection from research to production. Some key points:
- Swedbank's Analytics & AI team works on advanced analytics and AI research, delivering applications to address business needs in the short and long term.
- They describe their approach to building deep anomaly detection models using generative adversarial networks and deploying them using TensorFlow Serving.
- The team has developed an "Analytical Ops" framework to streamline the process from building models in a research environment to translating, packaging, and publishing them for production use.
- Lessons learned include the importance of joint data science and engineering efforts, using a feature store for reuse, and infrastructure to support hyperparameter tuning.
Deep Anomaly Detection from Research to Production Leveraging Spark and TensorFlow
1. Deep Anomaly Detection
From research to production
Leveraging Spark and TensorFlow
Davit Bzhalava, Shaheer Mansoor, Erik Ekerot
What’s in this talk:
We present our work on anomaly detection: we describe overall results, how Swedbank goes from research to production, and what we think is important when building AI infrastructure.
5. Analytics & AI
R&D: Data exploration, advanced analytics discovery and AI research. Long term.
Delivery & Service: Deliver on-demand analytics and AI applications addressing immediate business requirements. Short term.
Business development: Effective support and execution of prioritized value streams and strategic business projects. Medium term.
› Established in 2016, consolidating disparate analytics functions across the organization.
› Today 30 colleagues.
› A mix of data scientists, business analysts, developers, engineers and project managers.
› A center of excellence, promoting a data-driven mindset in the bank.
6. Why this now?
[Chart: share of the public reporting high/quite high trust in Swedish institutions (the police, universities, healthcare, the national bank, the royal house, radio & TV, the church, the parliament, unions, the government, the press, large companies, the EU commission, banks, political parties), in %, 2010-2019. Source: Kantar SIFO, Medieakademin, Trust Barometer.]
9. A platform- and framework-agnostic approach
Collect data → Feature engineer → Analyse → Model → Deploy → Monitor → Act
10. From R&D to production
Research-research platform: Prepare datasets. Build and train model.
Production-research platform: Translate and package model.
Production platform: Publish and run model.
14. Why Deep Anomaly Detection?
Fraud schemes are living organisms, far from static scenarios.
The complexity of schemes is growing due to a variety of disruptions.
Lack of access to abundant (or any) recent cases.
We don't always know exactly what we are looking for, but we do know what we are not looking for.
15. Contrasting old and new
IF
Transactions count > 3
AND
Transactions amount > 50 000 SEK
AND
Country is on red list
THEN
Flag
model.compile(…)
model.fit(…)
model.evaluate(…)
model.predict(…)
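To make the "new" column concrete, below is a minimal, hedged sketch of a learned anomaly scorer in Keras. It uses a plain autoencoder rather than the GAN described later in the talk, and the feature count, layer sizes, placeholder data and threshold are illustrative assumptions, not Swedbank's production model.

# Minimal sketch: an autoencoder anomaly scorer in Keras (illustrative only).
# Feature count, architecture, placeholder data and threshold are assumptions.
import numpy as np
import tensorflow as tf

n_features = 16  # e.g. amount, destination encoding, transaction type, ...

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(n_features,)),
    tf.keras.layers.Dense(4, activation="relu"),   # bottleneck
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(n_features),             # reconstruct the input
])
model.compile(optimizer="adam", loss="mse")

# Train on "normal" transactions only, so anomalous ones reconstruct poorly.
x_normal = np.random.rand(1000, n_features).astype("float32")  # placeholder data
model.fit(x_normal, x_normal, epochs=5, batch_size=64, verbose=0)

# Score new transactions: high reconstruction error -> flag for review.
x_new = np.random.rand(10, n_features).astype("float32")       # placeholder data
errors = np.mean((model.predict(x_new) - x_new) ** 2, axis=1)
flagged = errors > np.quantile(errors, 0.95)

The hand-written threshold rule on the left is replaced by a threshold on a learned score.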
17. Feature engineering: transactions2vec
Writing to the feature store
// Map raw transactions to the feature schema (Hopsworks feature store Scala API).
val transactionsFeatureDs = rawTransactionDs.map(
  (rawTransaction: RawTransaction) => TransactionFeature(
    amount = rawTransaction.amount,
    destination = rawTransaction.destination,
    type_of_transaction = rawTransaction.type_of_transaction
  ))

// Register the features as a versioned feature group.
Hops.createFeaturegroup(TRANSACTIONS_FEATUREGROUP)
  .setDataframe(transactionsFeatureDs.toDF)
  .setVersion(1)
  .setDescription("Features of card transactions")
  .setPrimaryKey("customer_id")
  .write()
Reading from the feature store
// Read the feature group back (by name) and materialise a versioned training dataset.
val transactionsFeaturesDF = Hops.getFeaturegroup(TRANSACTIONS_FEATUREGROUP)
  .read()

Hops.createTrainingDataset("card_transactions_prediction")
  .setDataframe(transactionsFeaturesDF)
  .setVersion(1)
  .write()

// Later: fetch the latest version of the training dataset for model training.
val latestVersionTrainDF = Hops.getLatestTrainingDatasetVersion("card_transactions_prediction")
  .read()
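Downstream, the materialised training dataset is consumed by the TensorFlow training job. Below is a minimal, hedged sketch of what such an input function could look like, assuming the training dataset has been exported as TFRecords at some path; the path, feature names and dtypes are illustrative assumptions, not taken from the slides.

# Hedged sketch: a tf.data input_fn over a TFRecord export of the training
# dataset. The path, feature names and dtypes are illustrative assumptions.
import tensorflow as tf

TRAIN_PATH = "hdfs:///path/to/card_transactions_prediction/*.tfrecords"  # hypothetical

feature_spec = {
    "amount": tf.io.FixedLenFeature([], tf.float32),
    "type_of_transaction": tf.io.FixedLenFeature([], tf.int64),
}

def train_input_fn(batch_size=256):
    files = tf.data.Dataset.list_files(TRAIN_PATH)
    dataset = files.interleave(tf.data.TFRecordDataset, cycle_length=4)
    dataset = dataset.map(lambda rec: tf.io.parse_single_example(rec, feature_spec))
    return dataset.shuffle(10000).batch(batch_size).repeat()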
26. Feature engineering (transactions2vec): GAN estimator API
# Pseudocode from the talk: the GAN is wrapped in a custom Estimator model_fn.
def model_fn(features, labels, mode, params):
    gan_network = build_gan(features, params)        # generator + discriminator
    gan_optimizer = build_gan_optimizer(params)
    # ... assemble losses and training ops into an EstimatorSpec
    return tf.estimator.EstimatorSpec(mode, ...)

# Hyperparameters are collected in tf.contrib.training.HParams (TF 1.x).
hparams = tf.contrib.training.HParams(
    n_neurons=n_neurons,
    n_layers=n_layers,
    activation_fn=activation_fn,
    batch_norm=batch_norm,
    batch_dropout=batch_dropout,
    # …
    dropout_rate=dropout_rate,
    kernel_bias_reg=kernel_bias_reg,
    L1_rate=L1_rate,
    L2_rate=L2_rate,
    learning_rate=learning_rate,
    loss_type=loss_type,
    opt_type=opt_type
)

estimator = tf.estimator.Estimator(
    model_fn=model_fn,              # pass the function itself, not its result
    config=run_config,
    params=hparams
)

experiment = tf.contrib.learn.Experiment(
    estimator,
    train_input_fn=train_input_fn,  # input functions are passed as callables
    eval_input_fn=eval_input_fn
)
experiment.train_and_evaluate()
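For a fuller picture of how these pieces could fit together, here is a hedged TF 1.x sketch of a GAN-style model_fn returning an EstimatorSpec. The generator/discriminator helpers, layer sizes, losses, the "transaction_vector" feature name and the anomaly score are illustrative assumptions, and the two optimizer updates are grouped rather than alternated; this is not the model from the slides.

# Hedged sketch of a GAN-style model_fn for tf.estimator (TF 1.x APIs).
# Layer sizes, losses, feature name and the anomaly score are assumptions.
import tensorflow as tf

def generator(noise, n_out):
    with tf.variable_scope("generator", reuse=tf.AUTO_REUSE):
        h = tf.layers.dense(noise, 64, activation=tf.nn.relu, name="h1")
        return tf.layers.dense(h, n_out, name="out")

def discriminator(x):
    with tf.variable_scope("discriminator", reuse=tf.AUTO_REUSE):
        h = tf.layers.dense(x, 64, activation=tf.nn.relu, name="h1")
        return tf.layers.dense(h, 1, name="logit")

def model_fn(features, labels, mode, params):
    x = features["transaction_vector"]                   # hypothetical input name
    n_out = int(x.shape[-1])
    noise = tf.random.normal([tf.shape(x)[0], params.n_neurons])

    fake = generator(noise, n_out)
    d_real, d_fake = discriminator(x), discriminator(fake)

    if mode == tf.estimator.ModeKeys.PREDICT:
        # Anomaly score: transactions the discriminator finds "unlike" normal data.
        return tf.estimator.EstimatorSpec(
            mode, predictions={"anomaly_score": tf.nn.sigmoid(-d_real)})

    xent = tf.nn.sigmoid_cross_entropy_with_logits
    d_loss = tf.reduce_mean(xent(labels=tf.ones_like(d_real), logits=d_real) +
                            xent(labels=tf.zeros_like(d_fake), logits=d_fake))
    g_loss = tf.reduce_mean(xent(labels=tf.ones_like(d_fake), logits=d_fake))

    opt = tf.train.AdamOptimizer(params.learning_rate)
    train_op = tf.group(  # a real implementation would alternate these updates
        opt.minimize(d_loss, var_list=tf.trainable_variables("discriminator")),
        opt.minimize(g_loss, var_list=tf.trainable_variables("generator"),
                     global_step=tf.train.get_global_step()))
    return tf.estimator.EstimatorSpec(mode, loss=d_loss + g_loss, train_op=train_op)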
30. TF graph in map function
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataInputStream, FileSystem, Path}

// Read the frozen TF graph from HDFS into a byte array on the driver.
val path: Path = new Path(inputFrozenGraph)
val hadoopConfig: Configuration = new Configuration()
val fileSystem: FileSystem = FileSystem.get(hadoopConfig)
val reader: FSDataInputStream = fileSystem.open(path)
val length: Int = reader.available()
val graph = new Array[Byte](length)
reader.readFully(graph)

// Deserialise the graph and wrap it in an in-process model server.
val model: TFModel = ModelFromByteArray.read(graph)
val server = TFModelServer.create(model)

// Score each row by converting it to a tensor and running the graph.
val scored_dataset: DataFrame = df.rdd.map(row =>
  server.serve(RDD2Tensor(row), inputNodeName, outputNodeName)).toDF()
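The same pattern in PySpark, as a hedged sketch: broadcast the serialized frozen graph to the executors and score each partition with a local TF session. The active SparkSession (spark), the input DataFrame df, the graph path and the tensor node names are all illustrative assumptions.

# Hedged sketch: scoring with a broadcast frozen TF graph inside mapPartitions.
# The spark session, df, path and tensor names are illustrative assumptions.
import tensorflow as tf  # TF 1.x-style graph execution

with open("/models/anomaly_gan/frozen_graph.pb", "rb") as f:  # hypothetical path
    graph_bytes = f.read()
graph_bc = spark.sparkContext.broadcast(graph_bytes)

def score_partition(rows):
    graph_def = tf.compat.v1.GraphDef()
    graph_def.ParseFromString(graph_bc.value)
    with tf.Graph().as_default() as g:
        tf.compat.v1.import_graph_def(graph_def, name="")
        with tf.compat.v1.Session(graph=g) as sess:
            x = g.get_tensor_by_name("input:0")          # hypothetical node names
            y = g.get_tensor_by_name("anomaly_score:0")
            for row in rows:
                score = sess.run(y, feed_dict={x: [list(row)]})
                yield tuple(row) + (float(score[0]),)

scored_df = df.rdd.mapPartitions(score_partition).toDF(df.columns + ["anomaly_score"])

Broadcasting the serialized graph avoids shipping it in every task closure and keeps one session per partition.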
36. From R&D to production: Analytical Ops framework
Research-research platform: Prepare datasets. Build and train model.
Production-research platform: Translate and package model.
Production platform: Publish and run model.
37. From R&D to production

                 Research-research   Production-research   Production
Capacity         High                Medium                Low
Service level    Low                 Medium                High
Accessibility    High                Medium                Low
Capability       High                Medium                Low
Organization     R&D                 R&D +                 Many
38. Analytical Ops at Swedbank v1.0
[Pipeline diagram: stages Training → Evaluate → Scoring → Business acceptance; each step is either a manual task or an automated task.]
Training: Configure model trainer, Run training, Execute data scripts, Collect artefacts, Collect diagnostics.
Evaluate: Run evaluation, Sample testing, Collect diagnostics, Dashboarding.
Scoring: Configure model scoring, Run model scoring.
39. Analytical Ops at Swedbank today
[Pipeline diagram: same stages and tasks as v1.0, plus three new steps: Download model artefacts, Configure model evaluator, and an additional Collect diagnostics.]
Self-service framework component: the framework provides the toolbox to achieve the goal, but workflows need to be run manually.
Automated framework component: the framework takes care of creating the necessary workflows to reach the desired outcome.
User Interface: a data scientist-friendly UI that enables an easy way from training to scoring.
40. Analytical Ops at Swedbank today
Data: Feature store, Datasets, Model metadata, Diagnostics.
Compute: GPU, CPU, Spark batch, TF serving.
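For online scoring, a published model can be queried through TensorFlow Serving's REST API. A minimal, hedged sketch follows; the host, port, model name and feature layout are illustrative assumptions, not Swedbank's deployment.

# Hedged sketch: querying a TensorFlow Serving REST endpoint.
# Host, port, model name and the instance layout are illustrative assumptions.
import requests

instance = {"amount": 52000.0, "type_of_transaction": 3}  # hypothetical features
resp = requests.post(
    "http://tf-serving.internal:8501/v1/models/anomaly_gan:predict",
    json={"instances": [instance]},
)
resp.raise_for_status()
score = resp.json()["predictions"][0]
print("anomaly score:", score)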
42. Conclusion
› Data science and engineering need to go hand in hand. There is a push in the industry towards separating the two, but that is not our model: the production-research platform is the meeting place.
› Feature store: good for decreasing redundant computation and work, and a single point of truth.
› Infrastructure that enables effective hyperparameter tuning is important. GANs are very sensitive to hyperparameter changes.
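As a closing illustration of that last point, here is a minimal, hedged sketch of sweeping a few GAN hyperparameters and training one configuration at a time; the grid values and the train_and_score helper are hypothetical, not Swedbank's tuning infrastructure.

# Hedged sketch: a simple grid sweep over a few GAN hyperparameters.
# The grid values and the train_and_score helper are hypothetical.
import itertools

grid = {
    "learning_rate": [1e-4, 5e-4, 1e-3],
    "n_layers": [2, 3],
    "dropout_rate": [0.0, 0.3],
}

results = []
for lr, n_layers, dropout in itertools.product(*grid.values()):
    params = {"learning_rate": lr, "n_layers": n_layers, "dropout_rate": dropout}
    metric = train_and_score(params)  # hypothetical: trains a GAN, returns an eval metric
    results.append((metric, params))

best_metric, best_params = min(results, key=lambda r: r[0])  # assuming lower is better
print("best configuration:", best_params)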