Improving business performance is never easy! The Natixis Pack is like a rugby pack: working together is the key to scrum success. Our data journey would undoubtedly have been much more difficult if we had not made the move together.
This session is the story of how ‘The Natixis Pack’ has driven change in its current IT architecture so that legacy systems can leverage some of the many components in Hortonworks Data Platform in order to improve the performance of business applications. During this session, you will hear:
• How and why the business and IT requirements originated
• How we leverage the platform to fulfill security and production requirements
• How we organize a community to:
o Guard all the players, no one gets left on the ground!
o Use the platform appropriately (not every problem is a Big Data problem, and standard databases are not dead)
• The most usable, most interesting and most promising technologies in the Apache Hadoop community
We will finish the story of this successful rugby team with insight into the special skills each player needs to win the match!
DETAILS
This session is part business, part technical. We will talk about infrastructure, security and project management as well as the industrial usage of Hive, HBase, Kafka, and Spark within an industrial Corporate and Investment Bank environment, framed by regulatory constraints.
1. How did Hadoop make the Natixis Pack more efficient?
A short story by Front Office, PnL, Risks and Finance
Pierre Alexandre PAUTRAT, Cyril MONTAGNON, Emmanuel VAIE
DataWorks Summit Munich, April 6th 2017 #DWS17
2. How did Hadoop make the Natixis Pack more efficient? Let us play the match again:
1 How did we meet Hadoop?
2 A game phase: the Front Office, PnL, Risks and Finance departments
3 Hakas and openings
4 Take away
5 Q&A
3. How did we meet Hadoop?
Experiments done by two separate departments in 2014
Credit card analysis POC and site creation for Marketing (internal anonymized test only)
• Hadoop, Elasticsearch, Kibana
NoSQL persistence for simulated profits and losses by the Market Risks department
• HBase
Informal exchanges between the Front Office and IT Risks by the end of 2014
4. How did we meet Hadoop?
“Big Data Thursday”: an open meeting with a positive mood!
June 2015: we built a first platform - secured as a production platform should be - to host our DEV!
Target: go live with a PROD platform in Summer 2016,
if pilot projects were OK,
and if sharing a platform worked for everybody (FO and IT Risks)
January 2016: first project results accelerated the decision to move forward, especially for regulatory hot topics
5. Our Technical Journey
[Stack word cloud, 2S 2014 → March 2017 production authorization]
Core platform: HDFS, HBase, Hive, LLAP, Spark, Spark ML, Phoenix, Kafka, Sqoop, Zeppelin, Ambari
Security: Kerberos, Ranger, SSO, integration with authorization
Languages and tooling: Python, Anaconda, R, Scala, Java, quick BI, Indexima, data integration, workflows, schedulers, standards
Operations: backups, backup and continuity plan, archiving cluster, training, billing
Governance: data governance (Atlas, WhereHows, Collibra…), platform governance committee, sponsor, Hadoop committers
Production version at Natixis: 2.5.3
6. A game phase: the Front Office, PnL, Risks and Finance departments ecosystem
Regulatory evolutions
RIM (Regulatory Initial Margin)
• the initial amount for your loan!
FRTB (Fundamental Review of the Trading Book)
• the new vision of market risk for the ECB
7. A game phase: the Front Office, PnL, Risks and Finance departments ecosystem
More efficient
Stop processing data sequentially
Do not waste time transferring data from one NAS to another
Go beyond the limits of the (usual) monolithic and centralized systems
Process data where it lives, in a common and secured place -> HDFS
Precise and secured synchronization -> Kafka
NoSQL persistence versus standard SQL -> HBase
Connecting the Big Data universe to the Big Compute paradigm
Added value: making golden sources available on the cluster
8. A game phase: the Front Office, PnL, Risks and Finance departments ecosystem
[Ecosystem diagram] PnL: PnL certification. Finance: regulatory, provision, accountancy. Front Office: positions, market data, Big Compute. Risks: risk scenario compute, sensitivities certification.
9. HAKA
If you are interested and want to know more: welcome on board!
Diversity improves knowledge
Our infrastructure team is on board and curious by nature
Open your minds, exchange with others, contribute to Hadoop!
Inspiration from the web champions (GAFA)
Within the banking industry
Try to optimize the architecture during the meeting through guided debate
An iterative way… a progressive way
Exchange: Big Data Thursday
“To boldly go where no man has gone before”
10. HAKA
Try your own solution as early as possible
Proceed iteratively; work in DEV with real data at real size
Ride the wave between DEV and PROD
Find a Minimum Viable Solution for each project
A reference, a starter kit: publish everything on the Enterprise Social Network
An integrated BI solution (with a Big Data cluster) is crucial: Indexima…
Demonstrate use cases to build platform legitimacy
A machine-learning-enabled platform
With flagship successes in the community
Done: more than 40 POCs & projects, with 10 in production
11. Openings: Technologies
Infrastructure and security helpers:
Ambari: setup comfort
Ranger KMS, Ranger: security
Ambari Metrics: monitoring
ETL, stored-procedure-like processes, data appenders:
HDFS DFS copy, WebHDFS
Hive
• High latency (if not using LLAP…)
Low latency, versioning and NoSQL container:
HBase and Phoenix
12. Openings: Age of discovery
Hive, what else?
Pros: best learning curve; JDBC compatible
Cons: not easy to test; not iterative; hard to maintain; latency; not really ACID…; the UDF API is not friendly
For: data scientists, operations staff, POCs
13. Openings: Age of discovery
Hive use cases
Explore data
“I am comfortable with SQL”
Business is pushing hard to produce results
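The “comfortable with SQL” point is the whole appeal: a Hive exploration is just a SQL statement submitted over JDBC. As a minimal local sketch (using Python's built-in sqlite3 as a stand-in for a Hive connection; the trades table and its columns are hypothetical), the exploratory pattern looks like:

```python
import sqlite3

# Stand-in for a JDBC connection to Hive; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (desk TEXT, notional REAL)")
conn.executemany("INSERT INTO trades VALUES (?, ?)",
                 [("rates", 100.0), ("fx", 250.0), ("rates", 50.0)])

# A typical exploratory aggregation -- the same query shape runs in HiveQL.
rows = conn.execute(
    "SELECT desk, SUM(notional) FROM trades GROUP BY desk ORDER BY desk"
).fetchall()
print(rows)  # [('fx', 250.0), ('rates', 150.0)]
```

The low barrier to entry is exactly the “best learning curve” pro above: anyone who knows SQL can start exploring the data lake on day one.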
14. Openings: Age of reason
OK, now I want a computation engine for developers… Spark!
Pros: reduced latency; iterative jobs; easy to test; easy to maintain; friendly API; large community
Cons: slow learning curve (Scala…); memory greedy; evolving very quickly
15. Openings: Age of reason
Spark use cases
Iterative computations (cache data!)
Streaming data
I want to test my code
Machine learning
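“Iterative computations (cache data!)” is the core Spark habit: materialize an expensive transform once, then reuse it across passes. A Spark-free local sketch of that idea (assumption: plain Python lists stand in for cached RDDs/DataFrames; no Spark cluster involved):

```python
from functools import reduce

def expensive_transform(x):
    # Stand-in for a costly per-record computation (parsing, pricing, ...)
    return x * x

raw = list(range(1, 6))

# Like rdd.map(expensive_transform).cache(): materialize the transform once,
# then reuse it across iterations instead of recomputing it each pass.
cached = [expensive_transform(x) for x in raw]

# Two "actions" over the same cached data, as in an iterative algorithm.
total = reduce(lambda a, b: a + b, cached)
peak = reduce(lambda a, b: a if a > b else b, cached)
print(total, peak)  # 55 25
```

Without the cache step, every pass would redo the expensive transform, which is exactly the waste Spark's persistence avoids (at the cost of the “memory greedy” con above).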
16. Openings: Age of reason
Another tool to read and write data very fast: HBase!
Use cases: logs, time series…
Pros: very fast; low latency; TTL support
Cons: just a distributed multimap; REST API…; data model less flexible (than Hive)
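“Just a distributed multimap” is also why HBase suits time series: rows are stored sorted by key, so a composite row key turns a time-range query into a contiguous scan. A minimal local sketch of that row-key design (assumption: a Python dict stands in for an HBase table; metric names and timestamps are illustrative):

```python
# A sorted-by-row-key dict stands in for an HBase table (no real HBase here).
table = {}

def put(metric, ts, value):
    # Zero-pad the timestamp so lexicographic order == chronological order.
    table[f"{metric}#{ts:013d}"] = value

def scan(metric, ts_from, ts_to):
    # Like an HBase range scan with start/stop row keys (stop is exclusive).
    start, stop = f"{metric}#{ts_from:013d}", f"{metric}#{ts_to:013d}"
    return [v for k, v in sorted(table.items()) if start <= k < stop]

put("cpu", 1000, 0.42)
put("cpu", 2000, 0.57)
put("cpu", 3000, 0.61)
put("mem", 1500, 0.80)

print(scan("cpu", 1000, 3000))  # [0.42, 0.57]
```

The inflexibility cut both ways for us: the schema is just the row key, but getting that key design right up front is what makes the scans fast.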
17. Openings: Technologies used NOW
Inter-application messaging
Kafka
Database import
Sqoop
Data science and prototyping
Zeppelin with Livy
BI and reporting
Indexima: 10 billion records in 10 ms!
SQL Server and PolyBase
Data governance
Atlas,
Collibra
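The inter-application messaging pattern above - one application publishing records, another consuming them asynchronously - can be sketched locally without a broker (assumption: a queue.Queue stands in for a Kafka topic; real code would use a Kafka client and a broker, and the message contents are illustrative):

```python
import queue
import threading

topic = queue.Queue()  # stands in for a Kafka topic partition

def producer(records):
    for r in records:
        topic.put(r)   # like sending a record to the topic
    topic.put(None)    # end-of-stream marker (a local convention, not Kafka's)

received = []

def consumer():
    while True:
        msg = topic.get()  # like polling the topic
        if msg is None:
            break
        received.append(msg)

t = threading.Thread(target=consumer)
t.start()
producer(["pnl_update:EUR", "risk_scenario:42"])
t.join()
print(received)  # ['pnl_update:EUR', 'risk_scenario:42']
```

What Kafka adds over this sketch - durable, replicated, replayable logs with consumer groups - is precisely what made it fit the “precise and secured synchronization” requirement between departments.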
18. Take Aways
Associate people: dynamic, iterative, positive-mood weekly meetings
Manage your projects as a community
Minimum Viable Solutions, then iterate
Integrated BI solution: an open window on the big data
DEV cluster secured as a PROD cluster: Kerberos is key
Hadoop providers: get them involved in your project!
In our case: thank you, Hortonworks, for your involvement!
Speaker notes
Introduction of each speaker:
PAP
Cyril
Emmanuel
We are going to tell you about the experience we had at Natixis in the big Hadoop shift from our legacy IT architecture to a new, data-centric one.
How did we structure and design the change process?
How did we convince the different IT actors to embrace the change?
And how did we avoid conflicts?
Pierre Alexandre will tell you the history of the project, the way the infrastructure was built, and about security and cluster governance.
Cyril will talk about technologies: our successes and our ambitions.
I'll provide some final points AND try to make the journey as pleasant as possible!
The first point will deal with history: pioneering
The second is about consolidation
The third point will deal with our feedback
The 4th point will be a synthesis
And the last point is for you
What events could have made us consider Hadoop as a viable solution?
Why?
The data volume
The intrinsic durability of HDFS: data is never lost
The ability to scale out horizontally
The additional features:
As a matter of fact, our different departments collaborate in the following way:
Present the data lake
What were the key points of success : weekly meetings
Openings : the tools sorted by needs