2. Bas Geerdink
IT Manager at ING Analytics
Academic background: A.I. & Informatics
Working in IT since 2004
At ING since 2013
@bgeerdink https://nl.linkedin.com/in/geerdink
4. ING is a top financial enterprise, operating since 1881
Customers: 33 million (private, corporate and institutional customers)
Countries: 41 (in Europe, Asia, Australia, North and South America)
Employees: 52,000
Market leaders Benelux
Growth markets
Commercial Banking
Challengers
6. We’re making a shift from batch to real-time
Secure and reliable
Relevant
Personal
Omnichannel
Predictive
Actionable
7. We’re building a Streaming Analytics platform with high-end capabilities
• High performance & low latency to manage and process users’ events.
• Scalability & availability to build new features, scenarios and integration with other external systems.
• Fault tolerance mechanism to ensure high quality output with strong information consistency.
• Online machine learning and business rules management capabilities.
8. Main use cases for a Streaming Analytics Platform
• Fraud detection:
  • Detect possibly fraudulent transactions
  • Identify possible money laundering accounts
• Actionable Insights:
  • Small financial insights and robotized advice
  • Customer-defined alerts and push notifications
  • Offer chat/call when customers don’t finish their customer journey
  • Marketing and commercial offerings (‘Next Best Action’)
9. Future scenarios
• Wisdom of the crowd:
  • making use of similar customers to make better predictions
• Online learning:
  • making use of customer feedback in real time to train models
• Beyond banking:
  • enabling IoT
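The online learning scenario can be illustrated with a minimal sketch: a logistic regression model that takes one gradient step per feedback event instead of being retrained in batch. The class name, the two features and the feedback stream are hypothetical and only serve the illustration; nothing here is taken from the ING platform.

```python
import math


class OnlineLogisticModel:
    """Logistic regression updated one feedback event at a time (SGD)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        """Return the estimated probability that the customer acts on a notification."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, features, clicked):
        """Take a single gradient step on the log-loss for one feedback event."""
        error = self.predict(features) - (1.0 if clicked else 0.0)
        self.weights = [w - self.learning_rate * error * x
                        for w, x in zip(self.weights, features)]
        self.bias -= self.learning_rate * error


model = OnlineLogisticModel(n_features=2)
# Hypothetical feedback stream: (features, did the customer act on the notification?)
feedback = [([1.0, 0.0], True), ([0.0, 1.0], False)] * 50
for features, clicked in feedback:
    model.learn(features, clicked)
```

After processing the stream, the model scores the first kind of customer above 0.5 and the second kind below it, without ever having seen the data as a batch.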
11. The Streaming Data Platform has to be probabilistic in nature
Deterministic:
• Business rules / decision trees
• Fixed logic (if-then-else)
• Rigid
Probabilistic:
• Machine learning
• Fuzzy logic
• Flexible
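A small sketch of the contrast, under assumed inputs: the deterministic variant is a fixed if-then-else rule, while the probabilistic variant produces a score between 0 and 1 that is compared to a tunable threshold. The feature names, thresholds and weights are invented for illustration; in practice the weights would come from a trained model.

```python
import math


def rule_based_alert(balance, monthly_spend):
    """Deterministic: fixed logic with hard thresholds."""
    return balance < 100 and monthly_spend > 1000


def probabilistic_alert(balance, monthly_spend, threshold=0.5):
    """Probabilistic: return (score, decision) from a logistic score.

    The weights below are made up for illustration, not learned.
    """
    z = 1.0 - 0.01 * balance + 0.001 * monthly_spend
    score = 1.0 / (1.0 + math.exp(-z))
    return score, score >= threshold


# A customer just above the hard balance threshold is missed by the rule,
# but still receives a high score from the probabilistic variant.
rule_decision = rule_based_alert(101, 1500)
score, score_decision = probabilistic_alert(101, 1500)
```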
12. Streaming Data Platform Architecture
[Diagram: CEP Engine (“detect pattern”), Machine Learning Engine (“determine relevancy”), Post-Processor (“produce notification”), with rows labelled Application Flow and Business Flow]
Steps shown: Get Raw Events, Create Intermediate Event, Get Intermediate Event, Detect Pattern, Create Business Event, Get Business Event, Get Customer Profile, Apply Selection Criteria, Score Notifications, Format Event, Send Output
“Data gatherer”: Load stream of raw events and combine/aggregate them into a meaningful business event, e.g. multiple transactions within one hour lead to detection of ‘out of money’ pattern
“Event scorer”: Scoring of PMML models to determine the personal and relevant follow-up actions, e.g. given customer is running out of money, give a suggestion to transfer money from savings to checking account
“Data generator”: Formats outgoing messages, e.g. create push message in HTML format
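The ‘out of money’ example can be sketched as a per-customer sliding window over raw events. This is plain Python, not the CEP engine used on the platform; the class name, the threshold of three transactions and the shape of the business event are assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600   # "within one hour", as on the slide
MIN_TRANSACTIONS = 3    # assumed meaning of "multiple transactions"


class OutOfMoneyDetector:
    """Aggregates raw debit events into a (hypothetical) business event."""

    def __init__(self):
        # customer_id -> deque of (timestamp, amount) inside the window
        self._recent = defaultdict(deque)

    def on_raw_event(self, customer_id, timestamp, amount):
        """Consume one raw event; return a business event dict or None."""
        window = self._recent[customer_id]
        window.append((timestamp, amount))
        # Evict events that fell out of the one-hour window.
        while window and window[0][0] <= timestamp - WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MIN_TRANSACTIONS:
            return {
                "type": "OUT_OF_MONEY",
                "customer_id": customer_id,
                "transactions": len(window),
                "total_amount": sum(a for _, a in window),
            }
        return None


detector = OutOfMoneyDetector()
first = detector.on_raw_event("c1", 0, 20.0)
second = detector.on_raw_event("c1", 600, 30.0)
third = detector.on_raw_event("c1", 1200, 50.0)    # third debit within the hour
later = detector.on_raw_event("c1", 10000, 5.0)    # earlier debits have expired
```

Only the third event produces a business event; the later one starts a fresh window.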
13. Streaming Data Platform Architecture
[Diagram: CEP Engine (“detect pattern”), Machine Learning Engine (“determine relevancy”), Post-Processor (“produce notification”), with rows labelled Application Flow, Business Flow, Kafka Events, State and System configuration]
Steps shown: Get Raw Events, Create Intermediate Event, Get Intermediate Event, Detect Pattern, Create Business Event, Get Business Event, Get Customer Profile, Apply Selection Criteria, Score Notifications, Format Event, Send Output
Events shown: Raw Event, Intermediate Event, Business Event, Notification Event
Users shown: Business Users, Data Scientists
Other labelled elements: Customer Features, Notification Definitions, Models, Machine Learning Environment, Configuration GUI (future scope), Data Lake, CEP Functions, Feedback loop, heap, openscoring.io
15. What is Flink, and why use it?
Flink:
• originated as research project “Stratosphere” at Technische Universität Berlin
• is an open source framework for distributed, in-memory (big) data analytics
• contains Java, Scala, and Python APIs for streams (DataStream), batch (DataSet), and relational data (Table API)
Benefits:
• true streaming: per-event processing, no micro-batching
• high volumes, low latency
Features:
• state management and fault tolerance: savepointing, checkpointing, exactly-once semantics
• time windowing: event time, flexible (e.g. sum all transactions in the past minute)
• Complex Event Processing, Machine Learning, SQL (at various levels of maturity)
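The idea behind event-time windowing can be shown without Flink. The sketch below is plain Python, not the Flink API: it assigns each transaction to a one-minute window based on the timestamp carried by the event itself, so an event that arrives late still lands in the correct window. The event layout and names are assumptions. A real streaming engine additionally needs a mechanism (watermarks, in Flink) to decide when a window is complete; that part is left out here.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # one-minute tumbling windows


def sum_per_minute(events):
    """Sum amounts per customer and per one-minute event-time window.

    events: iterable of (customer_id, event_time_ms, amount).
    Returns {(customer_id, window_start_ms): total}.
    """
    totals = defaultdict(float)
    for customer_id, event_time_ms, amount in events:
        # The window is derived from the event's own timestamp,
        # not from the moment the event is processed.
        window_start = event_time_ms - (event_time_ms % WINDOW_MS)
        totals[(customer_id, window_start)] += amount
    return dict(totals)


events = [
    ("c1", 5_000, 10.0),
    ("c1", 65_000, 7.5),   # belongs to the second minute
    ("c1", 59_999, 2.5),   # arrives late, still counted in the first minute
]
totals = sum_per_minute(events)
```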
16. PMML (Predictive Model Markup Language) bridges the gap between data scientists and data engineers
• PMML is the de facto standard for model representation
• A PMML file contains all pre- and post-processing as well as one or more machine learning models
• Support for hot-replacing an algorithm (configuration) at runtime by just providing an updated PMML model and interpreting (scoring) it with openscoring.io
• PMML models can be created by a large number of tools, including: Spark, SAS, R, KNIME, Python, H2O.ai
• Large number of supported machine learning models
Type | Model support
Simple business rules | Rulesets, score cards
Complex business rules | Sequences, association rules
Predictive models | Naive Bayes; k-Nearest Neighbors; Regression: linear, logistic, regularized; Trees: decision trees, random forest; Support Vector Machines
Advanced models | Bayesian Network; Neural Network (Deep Learning); Gaussian Process
Other | Time Series; Cluster Models
Openscoring.io
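Hot-replacing a model can be sketched as follows. This is not the openscoring.io API: it is a hand-written stand-in that reads only the intercept and coefficients of a PMML `RegressionTable` using the Python standard library, and swaps the parsed model behind a lock. The `HotSwappableScorer` class, the `make_pmml` helper and the single `balance` predictor are hypothetical.

```python
import threading
import xml.etree.ElementTree as ET

NS = {"pmml": "http://www.dmg.org/PMML-4_3"}


def load_linear_model(pmml_text):
    """Parse intercept and coefficients from a PMML RegressionTable.

    Handles only the tiny linear-regression subset used in this sketch.
    """
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {p.get("name"): float(p.get("coefficient"))
                    for p in table.findall("pmml:NumericPredictor", NS)}
    return intercept, coefficients


class HotSwappableScorer:
    """Scores events with the current model; the model can be replaced at runtime."""

    def __init__(self, pmml_text):
        self._lock = threading.Lock()
        self._model = load_linear_model(pmml_text)

    def replace_model(self, pmml_text):
        new_model = load_linear_model(pmml_text)  # parse first, swap only on success
        with self._lock:
            self._model = new_model

    def score(self, features):
        with self._lock:
            intercept, coefficients = self._model
        return intercept + sum(c * features[name] for name, c in coefficients.items())


def make_pmml(intercept, coefficient):
    """Build a minimal, hypothetical PMML document with one predictor, "balance"."""
    return f"""<PMML version="4.3" xmlns="http://www.dmg.org/PMML-4_3">
  <Header/>
  <DataDictionary>
    <DataField name="balance" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="balance"/>
      <MiningField name="score" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="{intercept}">
      <NumericPredictor name="balance" coefficient="{coefficient}"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


scorer = HotSwappableScorer(make_pmml(1.0, 0.5))
before = scorer.score({"balance": 10.0})     # scored with the first model
scorer.replace_model(make_pmml(2.0, 0.25))   # swap without restarting the scorer
after = scorer.score({"balance": 10.0})      # scored with the replacement
```

Parsing the new document before taking the lock means a malformed PMML file raises an error and leaves the running model untouched.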
17. Development & operations according to the ING way of working
• Cross-functional squads
  • business, dev, ops, data scientist
• Reusing ING standard building blocks and processes
• Continuous integration & delivery in DTAP environment
  • Develop: GitLab, Jenkins, Artifactory
• Using Flink’s savepointing mechanism
  • No data loss, guaranteed
  • Limited downtime
• Scalability, resilience and robustness
  • Flink gives us automatic restart on failure, using its checkpointing mechanism
  • Rolling updates of infrastructure and middleware lead to no downtime
[Diagram: Design, Develop, Test, Deploy, Monitor]
18. The best of both worlds – Functional and Modular Flexibility
The application, in binary form:
• provides a framework for its business users
• gives just enough configurability to be effective but not overcomplex
• gives flexibility in the form of a DSL and GUI (future scope)
The application’s internal components (the code, the artifacts):
• provide a toolbox for developers
• contain clearly defined building blocks, for maximum reusability
• are open source within ING
19. In ING’s one way of working, ‘business’ and ‘IT’ go hand-in-hand
[Organisation diagram with the labels Tribe, Squad, Chapter and Guild; squad members are labelled DS, DA, CJE, Dev and Ops]
21. Why is Streaming Analytics with Flink the right choice for ING, moving forward?
• Bigger streaming data volumes to handle, with higher performance demands
• Enterprise-readiness and security of Flink are continuously improving
• With PMML we have a standard format that is very well suited for multidisciplinary teams, where data scientists work on the same user stories as engineers
• We set up a streaming analytics community within ING, and work together with industry leaders (Lightbend, DataArtisans)
• ING builds its streaming analytics platform as a generic, multi-purpose solution, with fraud detection and actionable insights as its first use cases
23. Follow us to stay a step ahead
We’re hiring!
ING.com
YouTube.com/ING
SlideShare.net/ING
@ING_News
LinkedIn.com/company/ING
Flickr.com/INGGroup
Facebook.com/ING