SlideShare uma empresa Scribd logo
1 de 35
Baixar para ler offline
Privacy-Preserving Sharing
of Financial Transaction Data
with Deep Generative Models
Michael Platzer
Founder & CEO
Mostly AI
Christoph Töglhofer
George Labs
Erste Group
Big Data
➔ drives scientific progress
➔ fosters innovation
➔ improves services, and
➔ helps optimize processes
But if Data is the new Oil...
...then let’s talk side effects
Data Protection is here for a reason
My position is not that there
should be no regulation.
Mark Zuckerberg in 2018
The Privacy vs. Innovation Clash in Finance
Data privacy restricts sharing of data
and thus hampers digital innovation.
1
The Privacy vs. Innovation Clash in Finance
Data privacy restricts sharing of data
and thus hampers digital innovation.
1
Pseudonymization offers no safety, while
Full Anonymization falls short for big data.
2
Pseudonymization Fails for Big Data
Günther Baumgartner Not So Anonymous User #3289384
Classic Anonymization...
Still Not So AnonymousStill Not So Anonymous
20-30€
40-50€
Classic Anonymization Falls Short for Big Data
AnonymousAnonymous
0-30€
30-60€
0-30€
Classic Anonymization Falls Short for Big Data
i.e. for High-Dimensional, Highly Correlated Data
masking - obfuscation adding noise - perturbations
generalizationgeneralization
Simple Demographics Often Identify People Uniquely (Sweeney, 2000)
→ 87% of US citizens identified by date-of-birth, gender and ZIP
AOL search data leak (2006)
Robust De-anonymization of Large Sparse Datasets (Narayanan, 2008)
→ Partial re-identification of the Netflix dataset
“Sanitization techniques from the k-anonymity literature [..] do not provide meaningful privacy guarantees, and in any
case fail on high-dimensional data.”
The privacy bounds of human mobility (Montjoye, 2013)
→ 2 spatio-temporal points are enough to uniquely identify 55% of 1.5m people
Unique in the shopping mall: On the reidentifiability of credit card metadata (Montjoye, 2015)
→ 4 spatio-temporal points are enough to uniquely identify 90% of 1.1m people
→ “even coarse datasets provide little anonymity”
Stalking Celebrities in NYC Taxi Dataset (2014; viz)
→ Everyone’s Digital Trail is Highly Unique
The Underestimated De-Anonymization Risk
amount of captured information
amount of
shareable
information
Classic Anonymization
can only share very limited amount of
information per person
...
Growing Gap
The Problem Untapped Big Data Potential
The Solution
Synthetic Data
Synthetic Data ?
John Doe Jane Doe
hand-crafted: simplistic and biased
Synthetic Data ?
John Doe Jane Doe
hand-engineered: simplistic and biased
AI-Generated Synthetic Data !
state-of-the-art deep neural networks, trained by NVIDIA on 30’000 celebrity photos
the generated new images look like real persons, but they aren’t
AI generated = realistic and representative
Scales with Data Growth
while fully preserving privacy of actual customers
amount of captured information
amount of
shareable
information
Classic Anonymization
can only share very limited amount of
information per person
AI-generated Synthetic Data
retains structure and variation, but no 1:1
relationship to actual persons
...
...
The Synthetic Data Engine by Mostly AI
AI-generated rich synthetic worlds of
customers and their behavior
actual
synthetic
Bayesian Model of Customer Transactions (Platzer and Reutterer, 2016)
flexible MCMC estimation of parametric customer model !
Bayesian Model of Customer Transactions (Platzer and Reutterer, 2016)
But computation & model capacity did not scale to big data
Learning by Being Taught
→ Supervised Learning
Learning by Observation
→ Unsupervised learning
Learning by Exploration
→ Reinforcement Learning
Machine Learning
Generative Deep Models - VAEs
Variational Autoencoders
Encoder Decoder
actual data
Latent Space Representation
synthetic data
- VAE introduced by Kingma & Welling in 2013
- 600+ papers published on VAE in 2017
Generative Deep Models - GANs
Generative Adversarial Networks
- Introduced by Goodfellow et al. in 2014
- 1500+ papers published on GANs in 2017
Generative Deep Models - ARNs
Autoregressive Neural Networks
Synthetic Shakespeare Synthetic Linux Source Code
The Customer Story
Product Development in Finance Industry
Customer Story The Business Problem
- UX Development & Testing w/ realistic data
- platform for 3rd
party development
- feature dev: forecasting account balances
- open research collaboration with university
- deep generative model trained on 100k+ customers w/ 100m+ financial transactions
- ability to simulate an unlimited number of synthetic profiles, accounts and transactions
- results are highly realistic and representative; retain detail, structure and variation
- independent audit by bank’s analytics team: ”over-achieved”
Customer Story The Solution
Hofer
Merkur
Starbucks
Mediamarkt
Thalia
Fressnapf
AirBnB
EVN
OMV
A1
Bauhaus
Santander
Generali
Amazon
Finanzamt
...
100+ merchants
Customer Story Data Quality
Transaction Level
Billa
Spar
Kirchenbeitrag
Ikea
Libro
Kik
Drei
UPC
ORF GIS
Uniqa
ÖBB
WGKK Zahlung
Wiener Netze
Radatz
...
→ actual & synthetic patterns near identical for 100+ merchants
synthetic
actuals
Customer Story Data Quality
Customer Level
€ / month correlations
→ actual & synthetic patterns near identical for 100+ merchants
Customer Story System Setup
- on-premise, secure
- GPU environment
2x Quadro P4000 (8GB)
(training runs ~18h)
- anywhere
- anytime
- CLI / REST API
integration /
pre-processing
SDE
Training Moduleactual
data
SDE
Generator Module synthetic
data
quality
assurance
reports
target
holdout
synthetic
subset
aggregation
integration /
pre-processing
integration /
pre-processing
integration /
post-processing
deep generative model
deep generative model
architecture, encodings & weights
Customer Story Integration
Synthetic data is anonymous.
Key Takeaway
Data privacy restricts sharing of data and
thus hampers digital innovation.
1
Pseudonymization offers no safety, while
Full Anonymization falls short for big data.
2
PROBLEM
SOLUTION
1
2
Generative AI allows highly accurate
synthetic data to be generated at scale.
Christoph Töglhofer
Data Scientist
YES,WE’RE
HIRINGYES, WE’RE
HIRING
george@erstegroup.com
https://george-labs.com/
michael.platzer@mostly.ai
https://mostly.ai/
Contact Details
christoph.toeglhofer@erstegroup.com
Erste Group IT International GmbH
Michael Platzer
Founder & CEO

Mais conteúdo relacionado

Mais procurados

GraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial FraudGraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial FraudNeo4j
 
How to make business using AI and Big Data?
How to make business using AI and Big Data?How to make business using AI and Big Data?
How to make business using AI and Big Data?Ivano Digital
 
Big Data: Industry trends and key players
Big Data: Industry trends and key playersBig Data: Industry trends and key players
Big Data: Industry trends and key playersCM Research
 
The Web3 Data Economy: Ocean Protocol
The Web3 Data Economy: Ocean ProtocolThe Web3 Data Economy: Ocean Protocol
The Web3 Data Economy: Ocean ProtocolTrent McConaghy
 
29 08-2018 blockchain-introduction_digital wednesday
29 08-2018 blockchain-introduction_digital wednesday29 08-2018 blockchain-introduction_digital wednesday
29 08-2018 blockchain-introduction_digital wednesdayDigital Wednesday
 
A visual approach to fraud detection and investigation - Giuseppe Francavilla
A visual approach to fraud detection and investigation - Giuseppe FrancavillaA visual approach to fraud detection and investigation - Giuseppe Francavilla
A visual approach to fraud detection and investigation - Giuseppe FrancavillaData Driven Innovation
 
Big data asap 2016 v1
Big data asap 2016 v1Big data asap 2016 v1
Big data asap 2016 v1Eric Meger
 
Unlocking Value in the Fragmented World of Big Data Analytics (POV Paper)
Unlocking Value in the Fragmented World of Big Data Analytics (POV Paper)Unlocking Value in the Fragmented World of Big Data Analytics (POV Paper)
Unlocking Value in the Fragmented World of Big Data Analytics (POV Paper)Cisco Service Provider Mobility
 
Innovations from GAFAM, BATX threats or innovations for businesses
Innovations from GAFAM, BATX threats or innovations for businessesInnovations from GAFAM, BATX threats or innovations for businesses
Innovations from GAFAM, BATX threats or innovations for businessesEric Li
 
Fy13 performance final
Fy13 performance finalFy13 performance final
Fy13 performance finalKognitio
 
Future of value of data asia iot-asia presentation - 22 march 2018
Future of value of data asia   iot-asia presentation - 22 march 2018Future of value of data asia   iot-asia presentation - 22 march 2018
Future of value of data asia iot-asia presentation - 22 march 2018Future Agenda
 
Dwika sharing bisnis Big Data v2a IDBigData Meetup 3rd UI Jakarta
Dwika sharing  bisnis Big Data v2a IDBigData Meetup 3rd UI JakartaDwika sharing  bisnis Big Data v2a IDBigData Meetup 3rd UI Jakarta
Dwika sharing bisnis Big Data v2a IDBigData Meetup 3rd UI JakartaDwika Sudrajat
 
Big Data in Asia
Big Data in AsiaBig Data in Asia
Big Data in AsiaTom Simpson
 
Hacked: Threats, Trends and the Power of Connected Data
Hacked: Threats, Trends and the Power of Connected DataHacked: Threats, Trends and the Power of Connected Data
Hacked: Threats, Trends and the Power of Connected DataNeo4j
 
Synthetic Data for Big Data Privacy
Synthetic Data for Big Data PrivacySynthetic Data for Big Data Privacy
Synthetic Data for Big Data PrivacyMOSTLY AI
 
Present & Future of Marketing - Disruption: Social Media vs Social Networks =...
Present & Future of Marketing - Disruption: Social Media vs Social Networks =...Present & Future of Marketing - Disruption: Social Media vs Social Networks =...
Present & Future of Marketing - Disruption: Social Media vs Social Networks =...Dinis Guarda
 

Mais procurados (19)

GraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial FraudGraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
 
How to make business using AI and Big Data?
How to make business using AI and Big Data?How to make business using AI and Big Data?
How to make business using AI and Big Data?
 
Big Data: Industry trends and key players
Big Data: Industry trends and key playersBig Data: Industry trends and key players
Big Data: Industry trends and key players
 
The Web3 Data Economy: Ocean Protocol
The Web3 Data Economy: Ocean ProtocolThe Web3 Data Economy: Ocean Protocol
The Web3 Data Economy: Ocean Protocol
 
29 08-2018 blockchain-introduction_digital wednesday
29 08-2018 blockchain-introduction_digital wednesday29 08-2018 blockchain-introduction_digital wednesday
29 08-2018 blockchain-introduction_digital wednesday
 
A visual approach to fraud detection and investigation - Giuseppe Francavilla
A visual approach to fraud detection and investigation - Giuseppe FrancavillaA visual approach to fraud detection and investigation - Giuseppe Francavilla
A visual approach to fraud detection and investigation - Giuseppe Francavilla
 
Big data asap 2016 v1
Big data asap 2016 v1Big data asap 2016 v1
Big data asap 2016 v1
 
Unlocking Value in the Fragmented World of Big Data Analytics (POV Paper)
Unlocking Value in the Fragmented World of Big Data Analytics (POV Paper)Unlocking Value in the Fragmented World of Big Data Analytics (POV Paper)
Unlocking Value in the Fragmented World of Big Data Analytics (POV Paper)
 
"Big Data Dreams"
"Big Data Dreams""Big Data Dreams"
"Big Data Dreams"
 
Innovations from GAFAM, BATX threats or innovations for businesses
Innovations from GAFAM, BATX threats or innovations for businessesInnovations from GAFAM, BATX threats or innovations for businesses
Innovations from GAFAM, BATX threats or innovations for businesses
 
IT FUTURE- Big data
IT FUTURE- Big dataIT FUTURE- Big data
IT FUTURE- Big data
 
Fy13 performance final
Fy13 performance finalFy13 performance final
Fy13 performance final
 
Future of value of data asia iot-asia presentation - 22 march 2018
Future of value of data asia   iot-asia presentation - 22 march 2018Future of value of data asia   iot-asia presentation - 22 march 2018
Future of value of data asia iot-asia presentation - 22 march 2018
 
Tomorrow's Data Heros
Tomorrow's Data HerosTomorrow's Data Heros
Tomorrow's Data Heros
 
Dwika sharing bisnis Big Data v2a IDBigData Meetup 3rd UI Jakarta
Dwika sharing  bisnis Big Data v2a IDBigData Meetup 3rd UI JakartaDwika sharing  bisnis Big Data v2a IDBigData Meetup 3rd UI Jakarta
Dwika sharing bisnis Big Data v2a IDBigData Meetup 3rd UI Jakarta
 
Big Data in Asia
Big Data in AsiaBig Data in Asia
Big Data in Asia
 
Hacked: Threats, Trends and the Power of Connected Data
Hacked: Threats, Trends and the Power of Connected DataHacked: Threats, Trends and the Power of Connected Data
Hacked: Threats, Trends and the Power of Connected Data
 
Synthetic Data for Big Data Privacy
Synthetic Data for Big Data PrivacySynthetic Data for Big Data Privacy
Synthetic Data for Big Data Privacy
 
Present & Future of Marketing - Disruption: Social Media vs Social Networks =...
Present & Future of Marketing - Disruption: Social Media vs Social Networks =...Present & Future of Marketing - Disruption: Social Media vs Social Networks =...
Present & Future of Marketing - Disruption: Social Media vs Social Networks =...
 

Semelhante a Nvidia GTC18 Platzer Töglhofer

APD Presents Best of the Next
APD Presents Best of the Next APD Presents Best of the Next
APD Presents Best of the Next dgmAustralia
 
Eric van Tol - Businesscases & Verdienmodellen
Eric van Tol - Businesscases & VerdienmodellenEric van Tol - Businesscases & Verdienmodellen
Eric van Tol - Businesscases & VerdienmodellenMedia Perspectives
 
BI, AI/ML, Use Cases, Business Impact and how to get started
BI, AI/ML, Use Cases, Business Impact and how to get startedBI, AI/ML, Use Cases, Business Impact and how to get started
BI, AI/ML, Use Cases, Business Impact and how to get startedKarthick S
 
Social Media Monitoring: your data with destiny
Social Media Monitoring: your data with destinySocial Media Monitoring: your data with destiny
Social Media Monitoring: your data with destinySMLXL Ltd
 
From Info Science to Data Science & Smart Nation
From Info Science to Data Science & Smart Nation From Info Science to Data Science & Smart Nation
From Info Science to Data Science & Smart Nation CK Toh
 
Keynote Sales Kickoff Interoute
Keynote Sales Kickoff InterouteKeynote Sales Kickoff Interoute
Keynote Sales Kickoff Interoute247 Invest
 
How to design ai functions to the cloud native infra
How to design ai functions to the cloud native infraHow to design ai functions to the cloud native infra
How to design ai functions to the cloud native infraChun Myung Kyu
 
Fontys Eric van Tol
Fontys Eric van TolFontys Eric van Tol
Fontys Eric van TolTalentEvent
 
Policy paper need for focussed big data & analytics skillset building throu...
Policy  paper  need for focussed big data & analytics skillset building throu...Policy  paper  need for focussed big data & analytics skillset building throu...
Policy paper need for focussed big data & analytics skillset building throu...Ritesh Shrivastava
 
South By South Best 2018
South By South Best 2018 South By South Best 2018
South By South Best 2018 James Quinlan
 
Qu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air France
Qu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air FranceQu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air France
Qu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air FranceJedha Bootcamp
 
Ps095 - London College of Fashion
Ps095 - London College of FashionPs095 - London College of Fashion
Ps095 - London College of FashionIan Jindal
 
Neo4j Graph Data Platform: Making Your Data More Intelligent
Neo4j Graph Data Platform: Making Your Data More IntelligentNeo4j Graph Data Platform: Making Your Data More Intelligent
Neo4j Graph Data Platform: Making Your Data More IntelligentNeo4j
 
From Paris Hilton to Walmart: welcome to the Big Data Revolution
From Paris Hilton to Walmart: welcome to the Big Data RevolutionFrom Paris Hilton to Walmart: welcome to the Big Data Revolution
From Paris Hilton to Walmart: welcome to the Big Data RevolutionWilliam Visterin
 
Digital marketing for ninja e lazio innova giulia decina 2018
Digital marketing for ninja e lazio innova giulia decina 2018Digital marketing for ninja e lazio innova giulia decina 2018
Digital marketing for ninja e lazio innova giulia decina 2018Giulia Decina
 

Semelhante a Nvidia GTC18 Platzer Töglhofer (20)

APD Presents Best of the Next
APD Presents Best of the Next APD Presents Best of the Next
APD Presents Best of the Next
 
Eric van Tol - Businesscases & Verdienmodellen
Eric van Tol - Businesscases & VerdienmodellenEric van Tol - Businesscases & Verdienmodellen
Eric van Tol - Businesscases & Verdienmodellen
 
BI, AI/ML, Use Cases, Business Impact and how to get started
BI, AI/ML, Use Cases, Business Impact and how to get startedBI, AI/ML, Use Cases, Business Impact and how to get started
BI, AI/ML, Use Cases, Business Impact and how to get started
 
Social Media Monitoring: your data with destiny
Social Media Monitoring: your data with destinySocial Media Monitoring: your data with destiny
Social Media Monitoring: your data with destiny
 
2014 11 24 big data bmm aformationdigitale2014 v print guy huyberechts
2014 11 24 big data bmm aformationdigitale2014 v print guy huyberechts2014 11 24 big data bmm aformationdigitale2014 v print guy huyberechts
2014 11 24 big data bmm aformationdigitale2014 v print guy huyberechts
 
From Info Science to Data Science & Smart Nation
From Info Science to Data Science & Smart Nation From Info Science to Data Science & Smart Nation
From Info Science to Data Science & Smart Nation
 
Keynote Sales Kickoff Interoute
Keynote Sales Kickoff InterouteKeynote Sales Kickoff Interoute
Keynote Sales Kickoff Interoute
 
How to design ai functions to the cloud native infra
How to design ai functions to the cloud native infraHow to design ai functions to the cloud native infra
How to design ai functions to the cloud native infra
 
Trend Report dld2017
Trend Report dld2017Trend Report dld2017
Trend Report dld2017
 
Fontys Eric van Tol
Fontys Eric van TolFontys Eric van Tol
Fontys Eric van Tol
 
Policy paper need for focussed big data & analytics skillset building throu...
Policy  paper  need for focussed big data & analytics skillset building throu...Policy  paper  need for focussed big data & analytics skillset building throu...
Policy paper need for focussed big data & analytics skillset building throu...
 
South By South Best 2018
South By South Best 2018 South By South Best 2018
South By South Best 2018
 
Qu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air France
Qu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air FranceQu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air France
Qu'est ce que le Big Data ? Avec Victoria Galano Data Scientist chez Air France
 
Ps095 - London College of Fashion
Ps095 - London College of FashionPs095 - London College of Fashion
Ps095 - London College of Fashion
 
Rulex big data and analytics
Rulex big data and analyticsRulex big data and analytics
Rulex big data and analytics
 
Neo4j Graph Data Platform: Making Your Data More Intelligent
Neo4j Graph Data Platform: Making Your Data More IntelligentNeo4j Graph Data Platform: Making Your Data More Intelligent
Neo4j Graph Data Platform: Making Your Data More Intelligent
 
Talk_AI_in_Africa
Talk_AI_in_AfricaTalk_AI_in_Africa
Talk_AI_in_Africa
 
From Paris Hilton to Walmart: welcome to the Big Data Revolution
From Paris Hilton to Walmart: welcome to the Big Data RevolutionFrom Paris Hilton to Walmart: welcome to the Big Data Revolution
From Paris Hilton to Walmart: welcome to the Big Data Revolution
 
Digital marketing for ninja e lazio innova giulia decina 2018
Digital marketing for ninja e lazio innova giulia decina 2018Digital marketing for ninja e lazio innova giulia decina 2018
Digital marketing for ninja e lazio innova giulia decina 2018
 
Stop Being A Third-Party Victim - Treat Your Customer Data Like A Pro - Mo Mi...
Stop Being A Third-Party Victim - Treat Your Customer Data Like A Pro - Mo Mi...Stop Being A Third-Party Victim - Treat Your Customer Data Like A Pro - Mo Mi...
Stop Being A Third-Party Victim - Treat Your Customer Data Like A Pro - Mo Mi...
 

Mais de MOSTLY AI

Everything You Always Wanted to Know About Synthetic Data
Everything You Always Wanted to Know About Synthetic DataEverything You Always Wanted to Know About Synthetic Data
Everything You Always Wanted to Know About Synthetic DataMOSTLY AI
 
Everything you always wanted to know about Synthetic Data
Everything you always wanted to know about Synthetic DataEverything you always wanted to know about Synthetic Data
Everything you always wanted to know about Synthetic DataMOSTLY AI
 
Synthetic Population Data with MOSTLY AI
Synthetic Population Data with MOSTLY AISynthetic Population Data with MOSTLY AI
Synthetic Population Data with MOSTLY AIMOSTLY AI
 
AI-based re-identification of behavioral data
AI-based re-identification of behavioral dataAI-based re-identification of behavioral data
AI-based re-identification of behavioral dataMOSTLY AI
 
Artificial Intelligence - How Machines Learn
Artificial Intelligence - How Machines LearnArtificial Intelligence - How Machines Learn
Artificial Intelligence - How Machines LearnMOSTLY AI
 
PhD Seminar Riezlern 2016
PhD Seminar Riezlern 2016PhD Seminar Riezlern 2016
PhD Seminar Riezlern 2016MOSTLY AI
 
My Entry to the DMEF CLV Contest
My Entry to the DMEF CLV ContestMy Entry to the DMEF CLV Contest
My Entry to the DMEF CLV ContestMOSTLY AI
 
Stochastic Models of Noncontractual Consumer Relationships
Stochastic Models of Noncontractual Consumer RelationshipsStochastic Models of Noncontractual Consumer Relationships
Stochastic Models of Noncontractual Consumer RelationshipsMOSTLY AI
 
Incorporating Regularity into Models of Noncontractual Customer-Firm Relation...
Incorporating Regularity into Models of Noncontractual Customer-Firm Relation...Incorporating Regularity into Models of Noncontractual Customer-Firm Relation...
Incorporating Regularity into Models of Noncontractual Customer-Firm Relation...MOSTLY AI
 

Mais de MOSTLY AI (9)

Everything You Always Wanted to Know About Synthetic Data
Everything You Always Wanted to Know About Synthetic DataEverything You Always Wanted to Know About Synthetic Data
Everything You Always Wanted to Know About Synthetic Data
 
Everything you always wanted to know about Synthetic Data
Everything you always wanted to know about Synthetic DataEverything you always wanted to know about Synthetic Data
Everything you always wanted to know about Synthetic Data
 
Synthetic Population Data with MOSTLY AI
Synthetic Population Data with MOSTLY AISynthetic Population Data with MOSTLY AI
Synthetic Population Data with MOSTLY AI
 
AI-based re-identification of behavioral data
AI-based re-identification of behavioral dataAI-based re-identification of behavioral data
AI-based re-identification of behavioral data
 
Artificial Intelligence - How Machines Learn
Artificial Intelligence - How Machines LearnArtificial Intelligence - How Machines Learn
Artificial Intelligence - How Machines Learn
 
PhD Seminar Riezlern 2016
PhD Seminar Riezlern 2016PhD Seminar Riezlern 2016
PhD Seminar Riezlern 2016
 
My Entry to the DMEF CLV Contest
My Entry to the DMEF CLV ContestMy Entry to the DMEF CLV Contest
My Entry to the DMEF CLV Contest
 
Stochastic Models of Noncontractual Consumer Relationships
Stochastic Models of Noncontractual Consumer RelationshipsStochastic Models of Noncontractual Consumer Relationships
Stochastic Models of Noncontractual Consumer Relationships
 
Incorporating Regularity into Models of Noncontractual Customer-Firm Relation...
Incorporating Regularity into Models of Noncontractual Customer-Firm Relation...Incorporating Regularity into Models of Noncontractual Customer-Firm Relation...
Incorporating Regularity into Models of Noncontractual Customer-Firm Relation...
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Último (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Nvidia GTC18 Platzer Töglhofer

  • 1. Privacy-Preserving Sharing of Financial Transaction Data with Deep Generative Models Michael Platzer Founder & CEO Mostly AI Christoph Töglhofer George Labs Erste Group
  • 2. Big Data ➔ drives scientific progress ➔ fosters innovation ➔ improves services, and ➔ helps optimize processes
  • 3. But if Data is the new Oil...
  • 4. ...then let’s talk side effects
  • 5. Data Protection is here for a reason
  • 6. My position is not that there should be no regulation. Mark Zuckerberg in 2018
  • 7. The Privacy vs. Innovation Clash in Finance Data privacy restricts sharing of data and thus hampers digital innovation. 1
  • 8. The Privacy vs. Innovation Clash in Finance Data privacy restricts sharing of data and thus hampers digital innovation. 1 Pseudonymization offers no safety, while Full Anonymization falls short for big data. 2
  • 9. Pseudonymization Fails for Big Data Günther Baumgartner Not So Anonymous User #3289384
  • 10. Classic Anonymization... Still Not So AnonymousStill Not So Anonymous 20-30€ 40-50€
  • 11. Classic Anonymization Falls Short for Big Data AnonymousAnonymous 0-30€ 30-60€ 0-30€
  • 12. Classic Anonymization Falls Short for Big Data i.e. for High-Dimensional, Highly Correlated Data masking - obfuscation adding noise - perturbations generalizationgeneralization
  • 13. Simple Demographics Often Identify People Uniquely (Sweeney, 2000) → 87% of US citizens identified by date-of-birth, gender and ZIP AOL search data leak (2006) Robust De-anonymization of Large Sparse Datasets (Narayanan, 2008) → Partial re-identification of the Netflix dataset “Sanitization techniques from the k-anonymity literature [..] do not provide meaningful privacy guarantees, and in any case fail on high-dimensional data.” The privacy bounds of human mobility (Montjoye, 2013) → 2 spatio-temporal points are enough to uniquely identify 55% of 1.5m people Unique in the shopping mall: On the reidentifiability of credit card metadata (Montjoye, 2015) → 4 spatio-temporal points are enough to uniquely identify 90% of 1.1m people → “even coarse datasets provide little anonymity” Stalking Celebrities in NYC Taxi Dataset (2014; viz) → Everyone’s Digital Trail is Highly Unique The Underestimated De-Anonymization Risk
  • 14. amount of captured information amount of shareable information Classic Anonymization can only share very limited amount of information per person ... Growing Gap The Problem Untapped Big Data Potential
  • 16. Synthetic Data ? John Doe Jane Doe hand-crafted: simplistic and biased
  • 17. Synthetic Data ? John Doe Jane Doe hand-engineered: simplistic and biased
  • 18. AI-Generated Synthetic Data ! state-of-the-art deep neural networks, trained by NVIDIA on 30’000 celebrity photos the generated new images look like real persons, but they aren’t AI generated = realistic and representative
  • 19. Scales with Data Growth while fully preserving privacy of actual customers amount of captured information amount of shareable information Classic Anonymization can only share very limited amount of information per person AI-generated Synthetic Data retains structure and variation, but no 1:1 relationship to actual persons ... ...
  • 20. The Synthetic Data Engine by Mostly AI AI-generated rich synthetic worlds of customers and their behavior actual synthetic
  • 21. Bayesian Model of Customer Transactions (Platzer and Reutterer, 2016) flexible MCMC estimation of parametric customer model !
  • 22. Bayesian Model of Customer Transactions (Platzer and Reutterer, 2016) But computation & model capacity did not scale to big data
  • 23. Learning by Being Taught → Supervised Learning Learning by Observation → Unsupervised learning Learning by Exploration → Reinforcement Learning Machine Learning
  • 24. Generative Deep Models - VAEs Variational Autoencoders Encoder Decoder actual data Latent Space Representation synthetic data - VAE introduced by Kingma & Welling in 2013 - 600+ papers published on VAE in 2017
  • 25. Generative Deep Models - GANs Generative Adversarial Networks - Introduced by Goodfellow et al. in 2014 - 1500+ papers published on GANs in 2017
  • 26. Generative Deep Models - ARNs Autoregressive Neural Networks Synthetic Shakespeare Synthetic Linux Source Code
  • 27. The Customer Story Product Development in Finance Industry
  • 28. Customer Story The Business Problem - UX Development & Testing w/ realistic data - platform for 3rd party development - feature dev: forecasting account balances - open research collaboration with university
  • 29. - deep generative model trained on 100k+ customers w/ 100m+ financial transactions - ability to simulate an unlimited number of synthetic profiles, accounts and transactions - results are highly realistic and representative; retain detail, structure and variation - independent audit by bank’s analytics team: ”over-achieved” Customer Story The Solution
  • 30. Hofer Merkur Starbucks Mediamarkt Thalia Fressnapf AirBnB EVN OMV A1 Bauhaus Santander Generali Amazon Finanzamt ... 100+ merchants Customer Story Data Quality Transaction Level Billa Spar Kirchenbeitrag Ikea Libro Kik Drei UPC ORF GIS Uniqa ÖBB WGKK Zahlung Wiener Netze Radatz ... → actual & synthetic patterns near identical for 100+ merchants synthetic actuals
  • 31. Customer Story Data Quality Customer Level € / month correlations → actual & synthetic patterns near identical for 100+ merchants
  • 32. Customer Story System Setup - on-premise, secure - GPU environment 2x Quadro P4000 (8GB) (training runs ~18h) - anywhere - anytime - CLI / REST API integration / pre-processing SDE Training Moduleactual data SDE Generator Module synthetic data quality assurance reports target holdout synthetic subset aggregation integration / pre-processing integration / pre-processing integration / post-processing deep generative model deep generative model architecture, encodings & weights
  • 34. Synthetic data is anonymous. Key Takeaway Data privacy restricts sharing of data and thus hampers digital innovation. 1 Pseudonymization offers no safety, while Full Anonymization falls short for big data. 2 PROBLEM SOLUTION 1 2 Generative AI allows highly accurate synthetic data to be generated at scale.
  • 35. Christoph Töglhofer Data Scientist YES,WE’RE HIRINGYES, WE’RE HIRING george@erstegroup.com https://george-labs.com/ michael.platzer@mostly.ai https://mostly.ai/ Contact Details christoph.toeglhofer@erstegroup.com Erste Group IT International GmbH Michael Platzer Founder & CEO