Deep Anomaly Detection
From research to production
Leveraging Spark and TensorFlow
Davit Bzhalava, Shaheer Mansoor, Erik Ekerot
What’s in this talk:
We present our work on anomaly
detection: describe overall results, how
Swedbank goes from research to
production, and what we think is important
when building AI infrastructure.
Introduction
Sweden
Population: 10.2 million
Private customers: 4.0 million
Corporate customers: 335 000
Branches: 248
Employees: 8 600
Estonia
Population: 1.3 million
Private customers: 0.9 million
Corporate customers: 141 000
Branches: 35
Employees: 2 600
Latvia
Population: 2.0 million
Private customers: 0.9 million
Corporate customers: 91 000
Branches: 41
Employees: 1 800
Lithuania
Population: 2.9 million
Private customers: 1.5 million
Corporate customers: 86 000
Branches: 65
Employees: 2 500
The leading bank in our home markets
Swedbank HQ, Stockholm
Analytics & AI
R&D
Data exploration, advanced
analytics discovery and AI
research.
Long term.
Delivery & Service
Deliver on-demand analytics and
AI applications addressing
immediate business requirements.
Short term.
Business development
Effective support and execution of
prioritized value streams and
strategic business projects.
Medium term.
› Established in 2016, consolidating disparate analytics
functions across the organization.
› Today 30 colleagues.
› A mix of data scientists, business analysts, developers,
engineers and project managers.
› A center of excellence, promoting a data-driven mindset
in the bank.
Why this now?
[Figure: line chart of the share of respondents (%) reporting high or quite high trust in institutions, 2010 to 2019. Series: the police, universities, healthcare, the national bank, the royal house, radio & TV, the church, the parliament, unions, the government, the press, large companies, the EU commission, banks, political parties. Source: Kantar SIFO, Medieakademin, Trust Barometer.]
Our approach to data science
A multi-lingual team
Data
scientists
Python
Statisticians
R
Data
analysts
Python
SQL
Machine learning
engineers
Java
Scala
A platform- and framework-agnostic approach
Collect
data
Feature
engineer
Analyse
Model
Deploy
Monitor
Act
From R&D to production
Research-research
platform
Prepare datasets
Build and train model
Production-research
platform
Translate and package model
Production
platform
Publish and run model
Anomaly detection
Why Deep Anomaly Detection?
Fraud schemes are living organisms, far from static
scenarios.
The complexity of schemes is growing due to a variety of
disruptions.
Few recent cases are available to learn from, and sometimes none.
We don’t always know exactly what we are looking for,
but we do know what we are not looking for.
Contrasting old and new
IF
Transactions count > 3
AND
Transactions amount > 50 000 SEK
AND
Country is on red list
THEN
Flag
model.compile(…)
model.fit(…)
model.evaluate(…)
model.predict(…)
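The hand-written rule above can be sketched in Python as follows. This is a minimal illustration and not Swedbank's rule engine: the `Transaction` fields, the contents of `RED_LIST`, and the reading of "transactions amount" as a customer's summed amount are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical red list; the slide does not say which countries are on it.
RED_LIST: Set[str] = {"XX", "YY"}


@dataclass
class Transaction:
    amount_sek: float
    country: str


def flag_customer(transactions: List[Transaction],
                  red_list: Set[str] = RED_LIST) -> bool:
    """Return True when all three conditions of the static rule hold."""
    count_ok = len(transactions) > 3
    # Assumption: "transactions amount" is the total over the customer's transactions.
    amount_ok = sum(t.amount_sek for t in transactions) > 50_000
    country_ok = any(t.country in red_list for t in transactions)
    return count_ok and amount_ok and country_ok
```

A rule like this is easy to audit, but it only catches the pattern someone thought to write down, which is the contrast the slide draws with a learned model.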
Feature engineering
Feature engineering: transactions2vec
Writing to the feature store
// Map raw transactions to the feature representation.
val transactionsFeatureDs = rawTransactionDs.map(
  (rawTransaction: RawTransaction) => TransactionFeature(
    amount = rawTransaction.amount,
    destination = rawTransaction.destination,
    type_of_transaction = rawTransaction.type_of_transaction
  ))

// Register the features as version 1 of a feature group.
Hops.createFeaturegroup(TRANSACTIONS_FEATUREGROUP)
  .setDataframe(transactionsFeatureDs.toDF)
  .setVersion(1)
  .setDescription("Features of card transactions")
  .setPrimaryKey("customer_id")
  .write()
Reading from the feature store
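// Look the feature group up by the name it was written under.
val transactions_featuresDF = Hops
  .getFeaturegroup(TRANSACTIONS_FEATUREGROUP)
  .read()

// Materialize the features as a versioned training dataset.
Hops
  .createTrainingDataset("card_transactions_prediction")
  .setDataframe(transactions_featuresDF)
  .setVersion(1)
  .write()

// Read back the latest version of the training dataset.
val latestVersionTrainDF = Hops
  .getLatestTrainingDatasetVersion("card_transactions_prediction")
  .read()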
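To make the write-once, read-many idea concrete, here is a toy in-memory stand-in written in Python. It is not the Hops API: `ToyFeatureStore` and its methods are hypothetical names, and a real feature store persists data and metadata rather than holding them in a dict.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


class ToyFeatureStore:
    """Hypothetical in-memory store keyed by (name, version)."""

    def __init__(self) -> None:
        self._groups: Dict[Tuple[str, int], List[Row]] = {}

    def write(self, name: str, version: int, rows: List[Row]) -> None:
        # Versions are immutable: writing the same (name, version) twice is an error.
        if (name, version) in self._groups:
            raise ValueError(f"{name} v{version} already exists")
        self._groups[(name, version)] = list(rows)

    def latest_version(self, name: str) -> int:
        versions = [v for (n, v) in self._groups if n == name]
        if not versions:
            raise KeyError(name)
        return max(versions)

    def read(self, name: str, version: Optional[int] = None) -> List[Row]:
        # With no version given, fall back to the most recent one.
        if version is None:
            version = self.latest_version(name)
        return self._groups[(name, version)]
```

Because every consumer reads the same named, versioned data, features are computed once and a model can be traced back to the exact version it was trained on.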
Generative Adversarial Networks
for anomaly detection
Generative Adversarial Networks cheat sheet
R: Real data
G: Generator
(forger)
D: Discriminator
GANs in anomaly detection
R: Normal
transactions
G: Random
noise
D: Discriminator
Probability: Fake / Real
GANs in anomaly detection
D: Discriminator
R: Normal
transactions
G: Imitation of
normal transactions
Probability: Fake / Real
Probability: Fake / Real
GANs in anomaly detection
D: Discriminator
All transactions
Probability: Criminal / Normal
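Once trained, the discriminator can be reused as a scorer: transactions it considers unlikely to be real are candidates for review. A minimal Python sketch of that last step, assuming the discriminator emits one logit per transaction. The function names and the 0.5 threshold are illustrative assumptions, not values from the talk.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    # Numerically stable for large negative inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def anomaly_scores(d_logits: List[float]) -> List[float]:
    """Score = 1 - P(real): higher means less like the normal transactions seen in training."""
    return [1.0 - sigmoid(x) for x in d_logits]


def flag_anomalies(d_logits: List[float], threshold: float = 0.5) -> List[bool]:
    # Hypothetical cut-off; in practice it would be tuned to the tolerated false-positive rate.
    return [score > threshold for score in anomaly_scores(d_logits)]
```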
GAN estimator API
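import tensorflow as tf  # written against the TensorFlow 1.x API

def build_gan(noise, real_input):
    # The generator turns noise into imitation samples; the discriminator scores both batches.
    fake_input = generator_network(noise)
    d_logit_real = discriminator_network(real_input)
    d_logit_fake = discriminator_network(fake_input)
    return d_logit_real, d_logit_fake

def gan_loss(d_logit_real, d_logit_fake):
    # Discriminator target: real samples labelled 1, generated samples labelled 0.
    real_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logit_real, labels=tf.ones_like(d_logit_real)))
    fake_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logit_fake, labels=tf.zeros_like(d_logit_fake)))
    d_loss = real_loss + fake_loss
    # Generator target: have the discriminator label its samples 1.
    g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_logit_fake, labels=tf.ones_like(d_logit_fake)))
    return d_loss, g_loss

def gan_optimizer(d_loss, g_loss, d_learning_rate, opt_type):
    # Optimizer construction is not shown on the slide.
    return d_opt, g_opt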
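The loss above can be checked by hand with a dependency-free Python version. This sketch assumes the logits arrive as plain lists of floats; `sigmoid_cross_entropy` and `gan_losses` are names chosen for the example, and the formula is the numerically stable one documented for `tf.nn.sigmoid_cross_entropy_with_logits`.

```python
import math
from typing import List, Tuple


def sigmoid_cross_entropy(logit: float, label: float) -> float:
    # Stable form of -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x)):
    # max(x, 0) - x*z + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def gan_losses(d_logit_real: List[float],
               d_logit_fake: List[float]) -> Tuple[float, float]:
    """Return (d_loss, g_loss) for the non-saturating GAN objective."""
    real_loss = _mean([sigmoid_cross_entropy(x, 1.0) for x in d_logit_real])
    fake_loss = _mean([sigmoid_cross_entropy(x, 0.0) for x in d_logit_fake])
    d_loss = real_loss + fake_loss
    # The generator is rewarded when the discriminator calls its samples real.
    g_loss = _mean([sigmoid_cross_entropy(x, 1.0) for x in d_logit_fake])
    return d_loss, g_loss
```

With all logits at zero the discriminator is undecided, so `d_loss` is 2·ln 2 and `g_loss` is ln 2. As the discriminator grows confident, `d_loss` falls and `g_loss` rises.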
GAN estimator API
# Hyperparameter names only; values are omitted on the slide.
hparams = tf.contrib.training.HParams(
    n_neurons,
    n_layers,
    activation_fn,
    batch_norm,
    batch_dropout,
    …,
    dropout_rate,
    kernel_bias_reg,
    L1_rate,
    L2_rate,
    learning_rate,
    loss_type,
    opt_type
)

def model_fn():
    # Argument lists are omitted on the slide.
    gan_network = build_gan()
    optimizers = gan_optimizer()  # renamed so the local does not shadow gan_optimizer()

estimator = tf.estimator.Estimator(
    model_fn=model_fn,  # pass the function itself, not its result
    config=run_config(),
    params=hparams
)

experiment = tf.contrib.learn.Experiment(
    estimator,
    train_input_fn=train_input_fn(),
    eval_input_fn=eval_input_fn()
)
experiment.train_and_evaluate()
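Since every hyperparameter sits in one `HParams` object, a tuning run reduces to generating candidate configurations and training one model per candidate. A minimal random-search sketch in plain Python follows. The names in the search space follow the list above, but the candidate values, `SEARCH_SPACE`, and `sample_configs` are hypothetical and not taken from the talk.

```python
import random
from typing import Any, Dict, List

# Hypothetical search space: names follow the slide, values are illustrative only.
SEARCH_SPACE: Dict[str, List[Any]] = {
    "n_neurons": [32, 64, 128],
    "n_layers": [2, 3, 4],
    "dropout_rate": [0.0, 0.2, 0.5],
    "learning_rate": [1e-4, 1e-3, 1e-2],
}


def sample_configs(space: Dict[str, List[Any]],
                   n_trials: int,
                   seed: int = 0) -> List[Dict[str, Any]]:
    """Draw n_trials independent configurations, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_trials)
    ]
```

Each sampled dict could then be passed as the `params` of a separate estimator, with the trials run in parallel where the infrastructure allows it.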
Proof-of-concept results:
[Figure: true positives and false positives compared for the rule-based approach and the GAN-based approach.]
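The comparison rests on two counts per approach: flagged cases that were truly fraudulent (true positives) and flagged cases that were not (false positives). A small Python sketch of how those counts could be computed from flags and ground-truth labels is given here. The function names are chosen for the example, and no figures from the proof of concept are reproduced.

```python
from typing import List, Tuple


def true_false_positives(flags: List[bool], labels: List[bool]) -> Tuple[int, int]:
    """Count flagged items that are truly positive and flagged items that are not."""
    if len(flags) != len(labels):
        raise ValueError("flags and labels must have the same length")
    tp = sum(1 for f, y in zip(flags, labels) if f and y)
    fp = sum(1 for f, y in zip(flags, labels) if f and not y)
    return tp, fp


def precision(tp: int, fp: int) -> float:
    # Share of flagged cases that were worth an investigator's time.
    return tp / (tp + fp) if (tp + fp) else 0.0
```

Running both approaches over the same labelled sample and comparing the two pairs of counts gives the kind of side-by-side view the slide shows.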
Model Deployment (TensorFlow serving)
TF graph in map function
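import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataInputStream, FileSystem, Path}
import org.apache.spark.sql.DataFrame

// Read the frozen TensorFlow graph from the Hadoop file system into a byte array.
val path: Path = new Path(inputFrozenGraph)
val hadoopConfig: Configuration = new Configuration()
val fileSystem: FileSystem = FileSystem.get(hadoopConfig)
val reader: FSDataInputStream = fileSystem.open(path)
val length: Int = reader.available()
val graph = new Array[Byte](length)
reader.readFully(graph)
reader.close()

// Wrap the graph in a model server and score each row inside a Spark map.
val model: TFModel = ModelFromByteArray.read(graph)
val server = TFModelServer.create(model)
val scored_dataset: DataFrame = df.rdd.map(row =>
  server.serve(RDD2Tensor(row), inputNodeName, outputNodeName)).toDF()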
Analytical Ops
From R&D to production
Prepare datasets
Build and train model
Translate and package model
Publish and run model
Analytical Ops framework
Research-research
platform
Production-research
platform
Production
platform
From R&D to production
                Research-research   Production-research   Production
                platform            platform              platform
Capacity        High                Medium                Low
Service level   Low                 Medium                High
Accessibility   High                Medium                Low
Capability      High                Medium                Low
Organization    R&D                 R&D +                 Many
Analytical Ops at Swedbank v1.0
Stages: Training, Scoring, Evaluate, Business acceptance
Legend: Manual task, Automated task
Configure model trainer
Run training
Execute data scripts
Collect artefacts
Collect diagnostics
Sample testing
Run evaluation
Collect diagnostics
Dashboarding
Configure model scoring
Run model scoring
Analytical Ops at Swedbank today
Stages: Training, Scoring, Evaluate, Business acceptance
Configure model trainer
Run training
Execute data scripts
Collect artefacts
Collect diagnostics
Sample testing
Run evaluation
Collect diagnostics
Dashboarding
Configure model scoring
Run model scoring
Download model artefacts (NEW)
Configure model evaluator (NEW)
Collect diagnostics (NEW)
Self-service framework component
The framework provides the toolbox to achieve the goal, but
workflows need to be run manually.
Automated framework component
The framework takes care of creating the necessary workflows to
reach the desired outcome.
Stages: Training, Scoring, Evaluate, Business acceptance
User Interface
A data scientist-friendly UI that makes it easy to go from
training to scoring.
Analytical Ops at Swedbank today
Data
■ Feature store
■ Datasets
■ Model metadata
■ Diagnostics
Compute
■ GPU
■ CPU
■ Spark batch
■ TF serving
Learnings
Conclusion
› Data science and engineering need to go hand in hand. There is a push in industry towards
separating the two, but that is not our model. The production-research (P-R) platform is the meeting place.
› A feature store reduces redundant computation and work, and gives one point of
truth.
› Infrastructure that enables effective hyperparameter tuning is important. GANs are
very sensitive to hyperparameter changes.
Thank you.
Davit Bzhalava, Shaheer Mansoor, Erik Ekerot
Analytics & AI, Swedbank

Mais conteúdo relacionado

Mais procurados

Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezBig Data Spain
 
Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!DataWorks Summit
 
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...Big Data Spain
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016MLconf
 
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...Herman Wu
 
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S OptimizerDeep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S OptimizerSpark Summit
 
Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark Turi, Inc.
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsJen Aman
 
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...Turi, Inc.
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Spark Summit
 
MLlib and Machine Learning on Spark
MLlib and Machine Learning on SparkMLlib and Machine Learning on Spark
MLlib and Machine Learning on SparkPetr Zapletal
 
Data Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonData Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonMOHITKUMAR1379
 
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDistributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDatabricks
 
Latent Semantic Analysis of Wikipedia with Spark
Latent Semantic Analysis of Wikipedia with SparkLatent Semantic Analysis of Wikipedia with Spark
Latent Semantic Analysis of Wikipedia with SparkSandy Ryza
 
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Fully Automated QA System For Large Scale Search And Recommendation Engines U...Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Fully Automated QA System For Large Scale Search And Recommendation Engines U...Spark Summit
 
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlowTensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlowDatabricks
 

Mais procurados (20)

Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!
 
Deploying Machine Learning Models to Production
Deploying Machine Learning Models to ProductionDeploying Machine Learning Models to Production
Deploying Machine Learning Models to Production
 
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
 
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
 
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S OptimizerDeep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
 
Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark Better {ML} Together: GraphLab Create + Spark
Better {ML} Together: GraphLab Create + Spark
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable Statistics
 
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
 
Intro to Python
Intro to PythonIntro to Python
Intro to Python
 
MLlib and Machine Learning on Spark
MLlib and Machine Learning on SparkMLlib and Machine Learning on Spark
MLlib and Machine Learning on Spark
 
Data Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonData Wrangling and Visualization Using Python
Data Wrangling and Visualization Using Python
 
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDistributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
 
Latent Semantic Analysis of Wikipedia with Spark
Latent Semantic Analysis of Wikipedia with SparkLatent Semantic Analysis of Wikipedia with Spark
Latent Semantic Analysis of Wikipedia with Spark
 
Data visualization
Data visualizationData visualization
Data visualization
 
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Fully Automated QA System For Large Scale Search And Recommendation Engines U...Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
 
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlowTensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
 

Semelhante a Deep Anomaly Detection from Research to Production Leveraging Spark and Tensorflow

Types Working for You, Not Against You
Types Working for You, Not Against YouTypes Working for You, Not Against You
Types Working for You, Not Against YouC4Media
 
Social media analytics using Azure Technologies
Social media analytics using Azure TechnologiesSocial media analytics using Azure Technologies
Social media analytics using Azure TechnologiesKoray Kocabas
 
Balancing Infrastructure with Optimization and Problem Formulation
Balancing Infrastructure with Optimization and Problem FormulationBalancing Infrastructure with Optimization and Problem Formulation
Balancing Infrastructure with Optimization and Problem FormulationAlex D. Gaudio
 
Making the Most of Customer Data
Making the Most of Customer DataMaking the Most of Customer Data
Making the Most of Customer DataWSO2
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave Club
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave ClubJoining the Club: Using Spark to Accelerate Big Data at Dollar Shave Club
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave ClubData Con LA
 
Linked data business models
Linked data business modelsLinked data business models
Linked data business modelsJesus Contreras
 
Scikit-learn for text mining at Jurismarchés
Scikit-learn for text mining at JurismarchésScikit-learn for text mining at Jurismarchés
Scikit-learn for text mining at JurismarchésPyDataParis
 
Beyond php it's not (just) about the code
Beyond php   it's not (just) about the codeBeyond php   it's not (just) about the code
Beyond php it's not (just) about the codeWim Godden
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
[GDSC-GNIOT] Google Cloud Study Jams Day 2- Cloud AI GenAI Overview.pptx
[GDSC-GNIOT] Google Cloud Study Jams Day 2- Cloud AI GenAI Overview.pptx[GDSC-GNIOT] Google Cloud Study Jams Day 2- Cloud AI GenAI Overview.pptx
[GDSC-GNIOT] Google Cloud Study Jams Day 2- Cloud AI GenAI Overview.pptxOWAISSALAUDDINKHAN
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaChetan Khatri
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupJim Dowling
 
The Dynamic Language is not Enough
The Dynamic Language is not EnoughThe Dynamic Language is not Enough
The Dynamic Language is not EnoughLukas Renggli
 
Velocity Conference - What do cats and APIs have in common? They are both awe...
Velocity Conference - What do cats and APIs have in common? They are both awe...Velocity Conference - What do cats and APIs have in common? They are both awe...
Velocity Conference - What do cats and APIs have in common? They are both awe...Stephen Fishman
 
Brownfield Domain Driven Design
Brownfield Domain Driven DesignBrownfield Domain Driven Design
Brownfield Domain Driven DesignNicolò Pignatelli
 
@RISK Unchained Webinar
@RISK Unchained Webinar@RISK Unchained Webinar
@RISK Unchained WebinarAndrew Sich
 
Tokens, Complex Systems, and Nature
Tokens, Complex Systems, and NatureTokens, Complex Systems, and Nature
Tokens, Complex Systems, and NatureTrent McConaghy
 

Semelhante a Deep Anomaly Detection from Research to Production Leveraging Spark and Tensorflow (20)

Types Working for You, Not Against You
Types Working for You, Not Against YouTypes Working for You, Not Against You
Types Working for You, Not Against You
 
F sharp - an overview
F sharp - an overviewF sharp - an overview
F sharp - an overview
 
Social media analytics using Azure Technologies
Social media analytics using Azure TechnologiesSocial media analytics using Azure Technologies
Social media analytics using Azure Technologies
 
Balancing Infrastructure with Optimization and Problem Formulation
Balancing Infrastructure with Optimization and Problem FormulationBalancing Infrastructure with Optimization and Problem Formulation
Balancing Infrastructure with Optimization and Problem Formulation
 
Making the Most of Customer Data
Making the Most of Customer DataMaking the Most of Customer Data
Making the Most of Customer Data
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave Club
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave ClubJoining the Club: Using Spark to Accelerate Big Data at Dollar Shave Club
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave Club
 
Linked data business models
Linked data business modelsLinked data business models
Linked data business models
 
Scikit-learn for text mining at Jurismarchés
Scikit-learn for text mining at JurismarchésScikit-learn for text mining at Jurismarchés
Scikit-learn for text mining at Jurismarchés
 
Beyond php it's not (just) about the code
Beyond php   it's not (just) about the codeBeyond php   it's not (just) about the code
Beyond php it's not (just) about the code
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
[GDSC-GNIOT] Google Cloud Study Jams Day 2- Cloud AI GenAI Overview.pptx
[GDSC-GNIOT] Google Cloud Study Jams Day 2- Cloud AI GenAI Overview.pptx[GDSC-GNIOT] Google Cloud Study Jams Day 2- Cloud AI GenAI Overview.pptx
[GDSC-GNIOT] Google Cloud Study Jams Day 2- Cloud AI GenAI Overview.pptx
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
 
The Dynamic Language is not Enough
The Dynamic Language is not EnoughThe Dynamic Language is not Enough
The Dynamic Language is not Enough
 
Velocity Conference - What do cats and APIs have in common? They are both awe...
Velocity Conference - What do cats and APIs have in common? They are both awe...Velocity Conference - What do cats and APIs have in common? They are both awe...
Velocity Conference - What do cats and APIs have in common? They are both awe...
 
Brownfield Domain Driven Design
Brownfield Domain Driven DesignBrownfield Domain Driven Design
Brownfield Domain Driven Design
 
@RISK Unchained Webinar
@RISK Unchained Webinar@RISK Unchained Webinar
@RISK Unchained Webinar
 
Tokens, Complex Systems, and Nature
Tokens, Complex Systems, and NatureTokens, Complex Systems, and Nature
Tokens, Complex Systems, and Nature
 

Mais de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Deep Anomaly Detection from Research to Production Leveraging Spark and Tensorflow

  • 1. Deep Anomaly Detection From research to production Leveraging Spark and TensorFlow Davit Bzhalava, Shaheer Mansoor, Erik Ekerot What’s in this talk: We present our work on anomaly detection: describe overall results, how Swedbank goes from research to production, and what we think is important when building AI infrastructure.
  • 3. The leading bank in our home markets
      Sweden: Population 10.2 million; Private customers 4.0 million; Corporate customers 335 000; Branches 248; Employees 8 600
      Estonia: Population 1.3 million; Private customers 0.9 million; Corporate customers 141 000; Branches 35; Employees 2 600
      Latvia: Population 2.0 million; Private customers 0.9 million; Corporate customers 91 000; Branches 41; Employees 1 800
      Lithuania: Population 2.9 million; Private customers 1.5 million; Corporate customers 86 000; Branches 65; Employees 2 500
  • 5. Analytics & AI
      R&D: Data exploration, advanced analytics discovery and AI research. Long term.
      Delivery & Service: Deliver on-demand analytics and AI applications addressing immediate business requirements. Short term.
      Business development: Effective support and execution of prioritized value streams and strategic business projects. Medium term.
      › Established in 2016, consolidating disparate analytics functions across the organization.
      › Today 30 colleagues.
      › A mix of data scientists, business analysts, developers, engineers and project managers.
      › A center of excellence, promoting a data-driven mindset in the bank.
  • 6. Why this now? [Chart: share of respondents with high/quite high trust (%), 2010 to 2019, for: The police, Universities, Healthcare, National bank, Royal house, Radio & TV, The church, The parliament, Unions, Government, The press, Large companies, EU commission, Banks, Political parties.] Source: Kantar SIFO, Medieakademin, Trust Barometer.
  • 7. Our approach to data science
  • 9. A platform- and framework agnostic approach Collect data Feature engineer Analyse Model Deploy Monitor Act
  • 10. From R&D to production Research-research platform: prepare datasets; build and train model. Production-research platform: translate and package model. Production platform: publish and run model.
  • 11. From R&D to production Research-research platform: prepare datasets; build and train model. Production-research platform: translate and package model. Production platform: publish and run model.
  • 14. Why Deep Anomaly Detection? Fraud schemes are living organisms, far from static scenarios. The complexity of schemes is growing due to a variety of disruptions. We lack access to abundant, or even any, recent cases. We don’t always know exactly what we are looking for, but we do know what we are not looking for.
  • 15. Contrasting old and new
      Old (hand-written rule): IF Transactions count > 3 AND Transactions amount > 50 000 SEK AND Country is on red list THEN Flag
      New (learned model): model.compile(…) model.fit(…) model.evaluate(…) model.predict(…)
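As a rough illustration of the "old" side of this contrast, the hand-written rule on the slide can be expressed as a plain Python function. The field names (`count`, `amount_sek`, `country`) and the contents of `RED_LIST` are assumptions made for this sketch, not Swedbank's actual schema or list.

```python
from typing import NamedTuple

# Hypothetical red list; the slide does not name any countries.
RED_LIST = {"XX", "YY"}


class TransactionSummary(NamedTuple):
    count: int         # number of transactions in the window
    amount_sek: float  # total amount in SEK
    country: str       # country code of the counterparty


def rule_flag(t: TransactionSummary) -> bool:
    """Flag only when all three hand-written conditions hold."""
    return t.count > 3 and t.amount_sek > 50_000 and t.country in RED_LIST
```

Every threshold in such a rule is fixed by hand, which is the limitation the learned model on the same slide is meant to address.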
  • 17. Feature engineering: transactions2vec
      Writing to the feature store
      val transactionsFeatureDs = rawTransactionDs.map(
        (rawTransaction: RawTransaction) => TransactionFeature(
          amount = rawTransaction.amount,
          destination = rawTransaction.destination,
          type_of_transaction = rawTransaction.type_of_transaction
        ))
      Hops.createFeaturegroup(TRANSACTIONS_FEATUREGROUP)
        .setDataframe(transactionsFeatureDs.toDF)
        .setVersion(1)
        .setDescription("Features of card transactions")
        .setPrimaryKey("customer_id")
        .write()
      Reading from the feature store
      val transactions_featuresDF = Hops.getFeaturegroup("Features of card transactions").read()
      Hops.createTrainingDataset("card_transactions_prediction")
        .setDataframe(transactions_featuresDF)
        .setVersion(1)
        .write()
      val latestVersionTrainDF = Hops.getLatestTrainingDatasetVersion("card_transactions_prediction").read()
  • 19. Generative Adversarial Networks cheat sheet R: Real data G: Generator (forger) D: Discriminator
  • 20. GANs in anomaly detection R: Normal transactions G: Random noise D: Discriminator Probability: Fake / Real
  • 21. GANs in anomaly detection R: Normal transactions G: Imitation of normal transactions D: Discriminator Probability: Fake / Real
  • 22. GANs in anomaly detection All transactions D: Discriminator Probability: Criminal / Normal
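The last step in these slides, reusing the trained discriminator to score all transactions, might look like the following sketch. The `discriminator_probability` function is a hypothetical stand-in (a fixed logistic model on a single feature) for a trained discriminator, and the 0.5 threshold is an arbitrary assumption; the slides do not say how the cut-off was chosen.

```python
import math
from typing import Callable, List, Sequence


def discriminator_probability(features: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained discriminator D.

    Returns the probability that a transaction looks like the normal
    transactions D was trained on.
    """
    logit = 4.0 - features[0]  # toy model: a larger first feature looks less normal
    return 1.0 / (1.0 + math.exp(-logit))


def flag_anomalies(
    rows: Sequence[Sequence[float]],
    score: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> List[bool]:
    """Flag rows whose probability of being normal falls below the threshold."""
    return [score(row) < threshold for row in rows]


flags = flag_anomalies([[1.0], [9.0]], discriminator_probability)
```

In this toy setup the first row scores as normal and the second is flagged, because only the second falls below the threshold.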
  • 23. GAN estimator API
      def build_gan():
          fake_input = generator_network(noise)
          d_logit_real = discriminator_network(real_input)
          d_logit_fake = discriminator_network(fake_input)
          return d_logit_real, d_logit_fake
      def gan_loss():
          real_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logit_real, labels=tf.ones_like(d_logit_real)))
          fake_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logit_fake, labels=tf.zeros_like(d_logit_fake)))
          d_loss = real_loss + fake_loss
          g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logit_fake, labels=tf.ones_like(d_logit_fake)))
          return d_loss, g_loss
      def gan_optimizer(d_loss, g_loss, d_learning_rate, opt_type):
          return d_opt, g_opt
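To make the loss arithmetic on this slide concrete, here is a dependency-free sketch of the same computation in plain Python. It is not the authors' TensorFlow code: `mean_loss` and `gan_losses` are names invented for this example, and the per-element formula is the numerically stable form of sigmoid cross-entropy on logits.

```python
import math
from typing import Sequence, Tuple


def sigmoid_cross_entropy(logit: float, label: float) -> float:
    """Stable form of -[z*log(s(x)) + (1-z)*log(1-s(x))] for logit x, label z."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean_loss(logits: Sequence[float], label: float) -> float:
    """Average cross-entropy of all logits against one constant label."""
    return sum(sigmoid_cross_entropy(x, label) for x in logits) / len(logits)


def gan_losses(
    d_logit_real: Sequence[float], d_logit_fake: Sequence[float]
) -> Tuple[float, float]:
    # Discriminator: real samples should score 1, generated samples 0.
    d_loss = mean_loss(d_logit_real, 1.0) + mean_loss(d_logit_fake, 0.0)
    # Generator: wants the discriminator to score its samples as 1.
    g_loss = mean_loss(d_logit_fake, 1.0)
    return d_loss, g_loss


# An undecided discriminator (all logits 0) gives log(2) per term.
d_loss, g_loss = gan_losses([0.0, 0.0], [0.0, 0.0])
```

When the discriminator is confident and correct (large positive logits on real data, large negative logits on generated data), `d_loss` approaches zero while `g_loss` grows, which is the tension the two optimizers play out during training.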
  • 26. GAN estimator API
      def model_fn():
          gan_network = build_gan()
          optimizers = gan_optimizer()  # renamed: a local called gan_optimizer would shadow the function
      hparams = tf.contrib.training.HParams( n_neurons, n_layers, activation_fn, batch_norm, batch_dropout, …, dropout_rate, kernel_bias_reg, L1_rate, L2_rate, learning_rate, loss_type, opt_type )
      # The second positional argument of tf.estimator.Estimator is model_dir, so the run config is passed by keyword.
      estimator = tf.estimator.Estimator( model_fn(), config=run_config(), params=hparams )
      experiment = tf.contrib.learn.Experiment( estimator, train_input_fn = train_input_fn(), eval_input_fn = eval_input_fn() )
      experiment.train_and_evaluate()
  • 28. Proof-of-concept results: [Chart comparing true positives and false positives for the rule-based approach and the GAN-based approach.]
  • 30. TF graph in map function
      val path: Path = new Path(inputFrozenGraph)
      val hadoopConfig: Configuration = new Configuration()
      val fileSystem: FileSystem = FileSystem.get(hadoopConfig)
      val reader: FSDataInputStream = fileSystem.open(path)
      val length: Int = reader.available()
      val graph = new Array[Byte](length)
      reader.readFully(graph)
      val model: TFModel = ModelFromByteArray.read(graph)
      val server = TFModelServer.create(model)
      val scored_dataset: DataFrame = df.rdd.map(row => server.serve(RDD2Tensor(row), inputNodeName, outputNodeName)).toDF()
  • 36. From R&D to production Prepare datasets Build and train model Translate and package model Publish and run model Analytical Ops framework Research-research platform Production-research platform Production platform
  • 37. From R&D to production (Research-research platform / Production-research platform / Production platform)
      Capacity: High / Medium / Low
      Service level: Low / Medium / High
      Accessibility: High / Medium / Low
      Capability: High / Medium / Low
      Organization: R&D R&D + Many
  • 38. Analytical Ops at Swedbank v1.0
      Stages: Training, Scoring, Evaluate, Business acceptance
      Legend: Manual task, Automated task
      Tasks: Configure model trainer; Run training; Execute data scripts; Collect artefacts; Collect diagnostics; Sample testing; Run evaluation; Collect diagnostics; Dashboarding; Configure model scoring; Run model scoring
  • 39. Analytical Ops at Swedbank today
      Stages: Training, Scoring, Evaluate, Business acceptance
      Tasks: Configure model trainer; Run training; Execute data scripts; Collect artefacts; Collect diagnostics; Sample testing; Run evaluation; Collect diagnostics; Dashboarding; Configure model scoring; Run model scoring; Download model artefacts (NEW); Configure model evaluator (NEW); Collect diagnostics (NEW)
      Self-service framework component: The framework provides the toolbox to achieve the goal, but workflows need to be run manually.
      Automated framework component: The framework takes care of creating necessary workflows to reach the desired outcome.
      User Interface: A data scientist-friendly UI that enables an easy way from training to scoring.
  • 40. Analytical Ops at Swedbank today
      Self-service framework component: The framework provides the toolbox to achieve the goal, but workflows need to be run manually.
      Automated framework component: The framework takes care of creating necessary workflows to reach the desired outcome.
      Stages: Training, Scoring, Evaluate, Business acceptance
      User Interface: A data scientist-friendly UI that enables an easy way from training to scoring.
      Data Compute ■ Diagnostics ■ GPU ■ CPU ■ Feature store ■ Datasets ■ Model metadata ■ Spark batch ■ TF serving
  • 42. Conclusion
      › Data science and engineering need to go hand in hand. There is a push in industry towards separating the two, but that’s not our model. The production-research (P-R) platform is the meeting place.
      › Feature store. Good for decreasing redundant computation and work. One point of truth.
      › Infrastructure that enables effective hyperparameter tuning is important. GANs are very sensitive to hyperparameter changes.
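The point about hyperparameter tuning can be illustrated with a minimal random-search loop. Everything in it is an assumption made for illustration: the search ranges, the `toy_objective` function (a stand-in for training and evaluating a GAN), and the trial count. The slides do not describe which tuning method Swedbank uses.

```python
import math
import random
from typing import Callable, Dict, Tuple

HParams = Dict[str, float]


def sample_hparams(rng: random.Random) -> HParams:
    """Draw one configuration; the ranges are hypothetical."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -2),  # log-uniform
        "n_layers": rng.randint(1, 5),
        "dropout_rate": rng.uniform(0.0, 0.5),
    }


def toy_objective(h: HParams) -> float:
    """Stand-in for a validation loss (lower is better)."""
    return (
        abs(math.log10(h["learning_rate"]) + 3.5)
        + 0.1 * abs(h["n_layers"] - 3)
        + h["dropout_rate"]
    )


def random_search(
    objective: Callable[[HParams], float], n_trials: int, seed: int = 0
) -> Tuple[HParams, float]:
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_h, best_score = None, float("inf")
    for _ in range(n_trials):
        h = sample_hparams(rng)
        score = objective(h)
        if score < best_score:
            best_h, best_score = h, score
    return best_h, best_score


best_h, best_score = random_search(toy_objective, n_trials=50)
```

Because each trial is independent, a loop like this parallelizes naturally across workers, which is one reason shared compute infrastructure matters for models as sensitive as GANs.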
  • 43. Thank you. Davit Bzhalava, Shaheer Mansoor, Erik Ekerot Analytics & AI, Swedbank