Improving business performance is never easy! The Natixis Pack is like a rugby pack: working together is the key to scrum success. Our data journey would undoubtedly have been much more difficult if we had not made the move together.
This session is the story of how ‘The Natixis Pack’ has driven change in its current IT architecture so that legacy systems can leverage some of the many components in Hortonworks Data Platform in order to improve the performance of business applications. During this session, you will hear:
• How and why the business and IT requirements originated
• How we leverage the platform to fulfill security and production requirements
• How we organize a community to:
o Guard all the players, no one gets left on the ground!
o Use the platform appropriately (not every problem is a Big Data problem, and standard databases are not dead)
• The most usable, most interesting and most promising technologies in the Apache Hadoop community
We will finish the story of this successful rugby team with insight into the special skills each player needs to win the match!
DETAILS
This session is part business, part technical. We will talk about infrastructure, security and project management as well as the industrial usage of Hive, HBase, Kafka, and Spark within an industrial Corporate and Investment Bank environment, framed by regulatory constraints.
1. How did Hadoop make the Natixis Pack more efficient?
A short story by Front Office, PnL, Risks and Finance
Pierre Alexandre PAUTRAT, Cyril MONTAGNON, Emmanuel VAIE
DataWorks Summit Munich, April 6th 2017 #DWS17
2. How did Hadoop make the Natixis Pack more efficient? Let us play the match again:
1 How did we meet Hadoop?
2 A game phase: the Front Office, PnL, Risks and Finance departments
3 Hakas and openings
4 Take away
5 Q&A
3. How did we meet Hadoop?
Experiments done by two separate departments in 2014
Credit card analysis POC and site creation for Marketing (internal anonymized test only)
• Hadoop, Elasticsearch, Kibana
NoSQL persistence for simulated profits and losses by the Market Risks department
• HBase
Informal exchanges between the Front Office and IT Risks by the end of 2014
4. How did we meet Hadoop?
“Big Data Thursday”: an open meeting with a positive mood!
June 2015: we built a first platform - secured as a production platform should be - to host our DEV!
Target: go live with a PROD platform in Summer 2016,
if pilot projects were OK,
and if sharing a platform worked for everybody (FO and IT Risks)
January 2016: first project results accelerated the decision to move forward, especially for regulatory hot topics
5. Our Technical Journey
[Stack word cloud, 2S 2014 → March 2017 production authorization]
Core platform: HDFS, HBase, Hive, LLAP, Spark, Spark ML, Phoenix, Kafka, Sqoop, Zeppelin, Ambari
Security: Kerberos, Ranger, SSO, integration with authorization
Languages and tooling: Python, Anaconda, R, Scala, Java, quick BI, Indexima, data integration, workflows, schedulers, standards
Operations: backups, backup and continuity plan, archiving cluster, training, billing
Governance: data governance (Atlas, WhereHows, Collibra…), platform governance committee, sponsor, Hadoop committers
Production version at Natixis: 2.5.3
6. A game phase: the Front Office, PnL, Risks and Finance departments ecosystem
Regulatory evolutions
RIM (Regulatory Initial Margin)
• the initial amount for your loan!
FRTB (Fundamental Review of the Trading Book)
• the new vision of market risk for the ECB
7. A game phase: the Front Office, PnL, Risks and Finance departments ecosystem
More efficient
Stop processing data sequentially
Do not waste time transferring data from one NAS to another
Go beyond the limits of the (usual) monolithic and centralized systems
Process data where it lives, in a common and secured place -> HDFS
Precise and secured synchronization -> Kafka
NoSQL persistence versus standard SQL -> HBase
Connecting the Big Data universe to the Big Compute paradigm
Added value: making golden sources available on the cluster
8. A game phase: the Front Office, PnL, Risks and Finance departments ecosystem
[Ecosystem diagram] PnL: PnL certification. Finance: regulatory, provision, accountancy. Front Office: positions, market data, Big Compute. Risks: risk scenario compute, sensitivities certification.
9. HAKA
If you are interested and want to know more: welcome on board!
Diversity improves knowledge
Our infrastructure team is on board and curious by nature
Open your minds, exchange with others, contribute to Hadoop!
Inspiration from the web champions (GAFA)
Within the banking industry
Try to optimize the architecture during the meeting through guided debate
An iterative way… a progressive way
Exchange: Big Data Thursday
“To boldly go where no man has gone before”
10. HAKA
Try your own solution as early as possible
Proceed iteratively; work in DEV with real data at real size
Ride the wave between DEV and PROD
Find a Minimum Viable Solution for each project
A reference, a starter kit: publish everything on the Enterprise Social Network
An integrated BI solution (with a Big Data cluster) is crucial: Indexima…
Demonstrate use cases to build platform legitimacy
A machine-learning-enabled platform
With flagship successes in the community
Done: more than 40 POCs & projects, with 10 in production
11. Openings: Technologies
Infrastructure and security helpers:
Ambari: setup comfort
Ranger KMS, Ranger: security
Ambari Metrics: monitoring
ETL, stored-procedure-like processes, data appenders:
HDFS DFS copy, WebHDFS
Hive
• High latency (if not using LLAP…)
Low latency, versioning and NoSQL container:
HBase and Phoenix
12. Openings: Age of discovery
Hive, what else?
Pros: best learning curve; JDBC compatible
Cons: not easy to test; not iterative; hard to maintain; latency; not really ACID…; the UDF API is not friendly
For: data scientists, operations staff, POCs
13. Openings: Age of discovery
Hive use cases
Explore data
“I am comfortable with SQL”
Business is pushing hard to produce results
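The “comfortable with SQL” point is the whole appeal: a Hive exploration is just a SQL statement submitted over JDBC. As a minimal local sketch (using Python's built-in sqlite3 as a stand-in for a Hive connection; the trades table and its columns are hypothetical), the exploratory pattern looks like:

```python
import sqlite3

# Stand-in for a JDBC connection to Hive; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (desk TEXT, notional REAL)")
conn.executemany("INSERT INTO trades VALUES (?, ?)",
                 [("rates", 100.0), ("fx", 250.0), ("rates", 50.0)])

# A typical exploratory aggregation -- the same query shape runs in HiveQL.
rows = conn.execute(
    "SELECT desk, SUM(notional) FROM trades GROUP BY desk ORDER BY desk"
).fetchall()
print(rows)  # [('fx', 250.0), ('rates', 150.0)]
```

The low barrier to entry is exactly the “best learning curve” pro above: anyone who knows SQL can start exploring the data lake on day one.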
14. Openings: Age of reason
OK, now I want a computation engine for developers… Spark!
Pros: reduced latency; iterative jobs; easy to test; easy to maintain; friendly API; large community
Cons: slow learning curve (Scala…); memory greedy; evolving very quickly
15. Openings: Age of reason
Spark use cases
Iterative computations (cache data!)
Streaming data
I want to test my code
Machine learning
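“Iterative computations (cache data!)” is the core Spark habit: materialize an expensive transform once, then reuse it across passes. A Spark-free local sketch of that idea (assumption: plain Python lists stand in for cached RDDs/DataFrames; no Spark cluster involved):

```python
from functools import reduce

def expensive_transform(x):
    # Stand-in for a costly per-record computation (parsing, pricing, ...)
    return x * x

raw = list(range(1, 6))

# Like rdd.map(expensive_transform).cache(): materialize the transform once,
# then reuse it across iterations instead of recomputing it each pass.
cached = [expensive_transform(x) for x in raw]

# Two "actions" over the same cached data, as in an iterative algorithm.
total = reduce(lambda a, b: a + b, cached)
peak = reduce(lambda a, b: a if a > b else b, cached)
print(total, peak)  # 55 25
```

Without the cache step, every pass would redo the expensive transform, which is exactly the waste Spark's persistence avoids (at the cost of the “memory greedy” con above).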
16. Openings: Age of reason
Another tool to read and write data very fast: HBase!
Use cases: logs, time series…
Pros: very fast; low latency; TTL support
Cons: just a distributed multimap; REST API…; data model less flexible (than Hive)
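“Just a distributed multimap” is also why HBase suits time series: rows are stored sorted by key, so a composite row key turns a time-range query into a contiguous scan. A minimal local sketch of that row-key design (assumption: a Python dict stands in for an HBase table; metric names and timestamps are illustrative):

```python
# A sorted-by-row-key dict stands in for an HBase table (no real HBase here).
table = {}

def put(metric, ts, value):
    # Zero-pad the timestamp so lexicographic order == chronological order.
    table[f"{metric}#{ts:013d}"] = value

def scan(metric, ts_from, ts_to):
    # Like an HBase range scan with start/stop row keys (stop is exclusive).
    start, stop = f"{metric}#{ts_from:013d}", f"{metric}#{ts_to:013d}"
    return [v for k, v in sorted(table.items()) if start <= k < stop]

put("cpu", 1000, 0.42)
put("cpu", 2000, 0.57)
put("cpu", 3000, 0.61)
put("mem", 1500, 0.80)

print(scan("cpu", 1000, 3000))  # [0.42, 0.57]
```

The inflexibility cut both ways for us: the schema is just the row key, but getting that key design right up front is what makes the scans fast.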
17. Openings: Technologies used NOW
Inter-application messaging
Kafka
Database import
Sqoop
Data science and prototyping
Zeppelin with Livy
BI and reporting
Indexima: 10 billion records in 10 ms!
SQL Server and PolyBase
Data governance
Atlas,
Collibra
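The inter-application messaging pattern above - one application publishing records, another consuming them asynchronously - can be sketched locally without a broker (assumption: a queue.Queue stands in for a Kafka topic; real code would use a Kafka client and a broker, and the message contents are illustrative):

```python
import queue
import threading

topic = queue.Queue()  # stands in for a Kafka topic partition

def producer(records):
    for r in records:
        topic.put(r)   # like sending a record to the topic
    topic.put(None)    # end-of-stream marker (a local convention, not Kafka's)

received = []

def consumer():
    while True:
        msg = topic.get()  # like polling the topic
        if msg is None:
            break
        received.append(msg)

t = threading.Thread(target=consumer)
t.start()
producer(["pnl_update:EUR", "risk_scenario:42"])
t.join()
print(received)  # ['pnl_update:EUR', 'risk_scenario:42']
```

What Kafka adds over this sketch - durable, replicated, replayable logs with consumer groups - is precisely what made it fit the “precise and secured synchronization” requirement between departments.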
18. Take Aways
Associate people: dynamic, iterative, positive-mood weekly meetings
Manage your projects as a community
Minimum Viable Solutions, then iterate
Integrated BI solution: an open window on the big data
DEV cluster secured as a PROD cluster: Kerberos is key
Hadoop providers: get them involved in your project!
In our case: thank you, Hortonworks, for your involvement!
Speaker notes
Introduction of each speaker:
PAP
Cyril
Emmanuel
We are going to tell you about the experience we had at Natixis in the big Hadoop shift from our legacy IT architecture to a new, data-centric one.
How did we structure and design the change process?
How did we convince the different IT actors to embrace the change?
And how did we avoid conflicts?
Pierre Alexandre will tell you the history of the project, the way the infrastructure was built, and about security and cluster governance.
Cyril will talk about technologies: our successes and our ambitions.
I'll provide some final points AND try to make the journey as pleasant as possible!
The first point will deal with history: pioneering
The second is about consolidation
The third point will deal with our feedback
The 4th point will be a synthesis
And the last point is for you
What events could have made us consider Hadoop as a viable solution?
Why?
The data volume
The intrinsic durability of HDFS: data is never lost
The ability to scale out horizontally
The additional features:
As a matter of fact, our different departments collaborate in the following way:
Present the data lake
What were the key points of success : weekly meetings
Openings : the tools sorted by needs