COBOL to Apache Spark

"What does it take to transform a legacy mainframe COBOL system into a state-of-the-art Java EE platform? How does the Apache Spark clustering framework fit into all of this? Attend this session to find out, with concrete solutions to some of the major problems of turning a procedural program object-oriented and parallelizing sequential processing."

Published in: Technology

  1. Oct 28, 2017. Ville Misaki, System Strategy Department, Rakuten Card Co., Ltd.
  2. Ville Misaki
     • Senior Software Engineer
     • Technology Strategy Group, System Strategy Department, Rakuten Card Co., Ltd.
     • Career
       • 15+ years; 3 years at Rakuten
       • In Finland, the Netherlands, Japan
     • Java (EE), Perl, C++, web systems, relational databases, performance optimization & security
  3. Oracle OpenWorld 2017: "Case Study: Credit Card Core System with Exalogic, Exadata, Oracle Cloud Machine" (CON4994)
     JavaOne 2017: "Java EE 7 with Apache Spark for the World’s Largest Credit Card Core Systems" (CON4998)
  4. Part 1 – Perfect Design
       1. About Rakuten Card
       2. Background
       3. Platform Migration
       4. Data Migration
       5. Software Migration
     Part 2 – Harsh Reliability
       6. Performance
       7. Apache Spark
       8. Judgement Day
       9. Into the Future
  5. (No text on this slide.)
  6. Unified brand, ecosystems around the world.
  7. • Top-level credit card company in Japan
     • Core of the Rakuten ecosystem
     • 3rd in total transaction volume in 2016; growing rapidly
  8. (No text on this slide.)
  9. Diagram labels: Core Systems, Web Systems, External Systems, Intra Systems
  10. Mainframe
      • Old architecture: more than 20 years
      • High cost
      • Limited capacity and performance
      • Low maintainability
      • Vendor lock-in
      • Limited security
      • For more details, see the session “From Mainframe to Java EE” at 16:00 today
  11. Phase of the improvement – 3.0
      • 1.0 Initial phase: outsource based, just started. Vendor lock-in.
      • 2.0 In-house development: differentiate with lower costs and faster delivery.
      • 3.0 Standardization: standardized system architecture, both for hardware and software.
      • Diagram labels: Achieved, Current, Standard Architecture
  12. (No text on this slide.)
  13. Core Systems
      • Old: Mainframe
      • New: Oracle Exalogic + Exadata + ZFS Servers
  14. Old: COBOL, Network DB
      New:
      • App Server: WebLogic Server
        • Financial de facto standard
        • Java EE compliant
        • Mature, since 1997
      • Database: Oracle Database
        • Financial de facto standard
        • ISO/IEC 9075 SQL compliant
        • Mature, since 1983
  15. (No text on this slide.)
  16. ISAM, VSAM, NDB → Copy & Convert → Oracle Database
  17. • Data Conversion
        • Network database to relational database
        • ISAM/VSAM data to relational database
        • Legacy Japanese character set to Unicode
        • Fix data inconsistencies
      • Scale
        • Terabytes of live production data
        • Less than 24 hours
  18. • Offline migration
        • Freeze data during migration
        • Full migration, not incremental
        • Customers mostly unaffected
      • Data & System migration
        • At the same time
        • Cannot be split into phases
      • Diagram label: Cached
  19. ISAM, VSAM, NDB → Replication → Mirror (ISAM, VSAM, NDB) → Copy & Convert → Oracle Database
  20. (No text on this slide.)
  21. Diagram labels: Req., Source code, Application, Platform, Hardware; Reimplement, Convert, Emulate
  22. Reimplement
      • Pro: optimal performance; low maintenance cost
      • Con: expensive; takes a long time; risky; difficult to test
      Emulate
      • Pro: development unchanged; easy to test; easy to migrate
      • Con: development unchanged; low performance; future questionable
      Convert
      • Pro: flexible cost vs. schedule; case-by-case fixes; easy to test
      • Con: legacy code remains; low-performance points need to be addressed
      Requirements?
  23. Reimplement
      • Pro: optimal performance; low maintenance cost
      • Con: expensive; takes a long time; risky; difficult migration
      Emulate
      • Pro: development unchanged; easy to test; easy to migrate
      • Con: development unchanged; low performance; future questionable
      Convert
      • Pro: flexible cost vs. schedule; case-by-case fixes; easy to test
      • Con: legacy code remains; low-performance points need to be addressed
      Requirements: 2x performance, no regression, minimal downtime
  24. (Repeats the text of slide 23.)
  25. Japanese COBOL source code → Customized source code converter → Java source code
      • Convert from Japanese COBOL to Java EE
      • Keep original core business logic
  26. “Dual Source Architecture”
      • Old: Japanese COBOL
        • Japanese source code
        • Almost abandoned
        • No books, no community
      • New
        • Java: from web systems, for new logic
        • COBOL: from old system, converted to Java
      • Ease of migration, resource re-use
      • Introduce power of Java EE
      • Introduce converter from YPS to Java
  27. • Legacy Logic (Mainframe): Japanese COBOL → Convert to COBOL → COBOL → Convert to Java (Converter) → Java → Compile → Java byte code
      • New Logic (Java EE): Java → Compile → Java byte code
      • Build → WAR → Deploy → Application Server (Java EE)
      • Two sources, single binary
      • Easy to operate
  28. Diagram labels: External customers, Web browser, Rich clients, Operation terminal, Mail, Form, Scheduler, BIG-IP, Façade, Real-time Servers (WebLogic), Batch Servers (Spark & Java), Core Business Logic APIs, Exadata, Intranet, External, Intra, Old, New
  29. Part 1 – Perfect Design
        1. About Rakuten Card
        2. Background
        3. Platform Migration
        4. Data Migration
        5. Software Migration
      Part 2 – Harsh Reliability
        6. Performance
        7. Apache Spark
        8. Judgement Day
        9. Into the Future
  30. (No text on this slide.)
  31. vs.
  32. vs.
  33. • Batches are run as networks
        • Hierarchical
        • Critical path
        • Time window
      • Diagram labels: Start, Slow, Slow
  34. • Automatic code conversion
        • COBOL program flow emulated in Java
        • COBOL-like data structures in Java
      • DB access logic
        • Business logic built on network DB
        • NDB and RDB are good at different tasks
  35. COBOL vs. Java
      • Goto statement: imitation is complex
      • Sub-program calls: heavy
      • No local variables: tight coupling
      • No libraries: copy & paste code
      • Few shared data structures: copy & paste definitions
      • No shared enums/constants: magic numbers
  36. COBOL data structures
      • Fixed length, hard-coded
      • String-based
      • Data block inside program
      • Often thousands of fields
      • Hierarchical fields
        • Content is joined/split automatically
        • Variable namespace under each parent
        • Even five levels deep
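
The fixed-length, string-based, hierarchical layout on slide 36 can be pictured with a small sketch. This is an illustration only, not the converter's actual output: the class `FixedRecord`, the 14-character layout, and the field names `ID` and `NAME` are all hypothetical. A parent field is just a wider slice of the same buffer, which is why its content appears "joined" from its children automatically.

```java
import java.util.Arrays;

/** Hypothetical fixed-width record, e.g. CUSTOMER(14) = ID(6) + NAME(8). */
final class FixedRecord {
    private final char[] buf;

    FixedRecord(int length) {
        buf = new char[length];
        Arrays.fill(buf, ' ');              // unset fields read back as spaces
    }

    /** Reads a field as a slice of the shared buffer. */
    String get(int offset, int length) {
        return new String(buf, offset, length);
    }

    /** Writes a field, space-padding or truncating to its fixed length. */
    void set(int offset, int length, String value) {
        for (int i = 0; i < length; i++) {
            buf[offset + i] = i < value.length() ? value.charAt(i) : ' ';
        }
    }

    public static void main(String[] args) {
        FixedRecord customer = new FixedRecord(14);
        customer.set(0, 6, "000042");       // child field ID
        customer.set(6, 8, "TARO");         // child field NAME
        // Reading the parent range returns both children joined together.
        System.out.println("[" + customer.get(0, 14) + "]");
    }
}
```

Writing to the parent range overwrites every child at once, which is convenient in COBOL but makes the Java equivalent hard to type-check, since every field is effectively a string slice.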
  37. (No text on this slide.)
  38. • Logic optimized for NDB
        • Read sequentially
        • Data pre-sorted
        • Data pre-formatted
      • Emulate in RDB
        • Uphill battle

      Operation           NDB    RDB
      Search              Slow   Fast
      Sequential access   Fast   Slow
      Sorting             Slow   Fast
      Formatting          Fast   Slow
  39. • New system must be faster
      • Time until launch: 1 year
  40. Options?
      • Redesign and re-implement from scratch
        • Not feasible
      • Optimize framework
        • Limited effectiveness
      • Parallelize batches
        • Elastic brute-force
  41. (No text on this slide.)
  42. Diagram labels: Time, Sequential, Parallel
  43. Diagram labels: Bootstrap, Scheduler, Cluster Node (×6), Shared Memory
  44. 1. Making business logic parallel
         • Independent processing
      2. I/O
         • Data transferred over network
      3. Data ordering
         • Shuffles
  45. • Problem: input data rows are not independent!
      • Red flags
        • Fields not initialized for each row
        • Code forks early (header & data?)
      • Legacy code analysis
      • Refactor
        • Fields to local variables
        • Extract data structures
        • Initialize data for each row
      • Run & see
      • Diagram labels: 3, 2, 1; Reference?
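
The first red flag on slide 45, a field that is not initialized for each row, can be shown with a minimal sketch. The class names and the `H:`/`D:` row prefixes are hypothetical and not taken from the converted code; the point is only that hidden state makes the result depend on which rows were processed earlier on the same node.

```java
final class RowStateDemo {

    /** Legacy style: a field set only on header rows leaks into later rows. */
    static final class LegacyProcessor {
        private String category = "";       // never reset per row

        String process(String row) {
            if (row.startsWith("H:")) {
                category = row.substring(2); // header row updates hidden state
            }
            return category + "/" + row;     // data rows silently reuse it
        }
    }

    /** Refactored style: everything the row needs is passed in explicitly. */
    static String processIndependent(String category, String row) {
        return category + "/" + row;
    }

    public static void main(String[] args) {
        LegacyProcessor sameNode = new LegacyProcessor();
        sameNode.process("H:A");
        System.out.println(sameNode.process("D:1"));              // saw the header first
        System.out.println(new LegacyProcessor().process("D:1")); // header went elsewhere
        System.out.println(processIndependent("A", "D:1"));       // no hidden dependency
    }
}
```

The two legacy calls return different values for the same data row, which is exactly the kind of difference that only shows up once rows are spread across nodes.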
  46. 1. Group related rows together
      2. Process header rows separately
      3. Modify business logic
  47. Group related rows together
      • Custom data reader
        • Multiple rows behave like one row
        • Process each group row in a loop, on the same node
      • Pro
        • Business logic not modified
      • Con
        • Relationships may be too complex
        • Groups may grow too big

      ID  Data
      1   …
      1   …
      2   …
      3   …
      3   …
      4   …
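
A minimal sketch of the grouping idea on slide 47, assuming related rows are adjacent and share an `id`. The `Row` type and `groupConsecutive` are hypothetical names; the custom data reader described in the talk is not shown in the slides. Each resulting group can then be handed to one worker and processed in a loop with the unmodified business logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RowGrouper {

    /** Hypothetical input row: a grouping key plus an opaque payload. */
    static final class Row {
        final int id;
        final String data;

        Row(int id, String data) {
            this.id = id;
            this.data = data;
        }
    }

    /** Groups consecutive rows that share the same id, preserving input order. */
    static List<List<Row>> groupConsecutive(List<Row> rows) {
        List<List<Row>> groups = new ArrayList<>();
        List<Row> current = null;
        for (Row row : rows) {
            if (current == null || current.get(0).id != row.id) {
                current = new ArrayList<>();   // id changed: start a new group
                groups.add(current);
            }
            current.add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, "a"), new Row(1, "b"), new Row(2, "c"),
                new Row(3, "d"), new Row(3, "e"), new Row(4, "f"));
        for (List<Row> group : groupConsecutive(rows)) {
            System.out.println("id " + group.get(0).id + ": " + group.size() + " row(s)");
        }
    }
}
```

The "groups may grow too big" drawback is visible here: a group is held in memory as a whole, so one very large id limits how evenly the work can be split.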
  48. Process header rows separately
      • Run business logic for header rows first
        • Collect result in NavigableMap
      • Run business logic for data rows
        • Initialize data from previous header
        • floorKey(dataRowIndex)
      • Pro
        • Minimal changes to business logic
      • Con
        • Relationships may be too complex

      ID  Type  Data
      1   Head  …
      1   Data  …
      1   Data  …
      2   Head  …
      2   Data  …
      3   Head  …
      3   Data  …
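
The `floorKey(dataRowIndex)` lookup on slide 48 can be sketched with `java.util.TreeMap`, the standard `NavigableMap` implementation. The class `HeaderLookup`, the use of a `String` as the header result, and the row indexes are assumptions for illustration; the slides do not show what the real header results contain.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

final class HeaderLookup {
    // Key: index of a header row in the input. Value: result of processing it.
    private final NavigableMap<Long, String> headersByRowIndex = new TreeMap<>();

    /** First pass: record the result computed for a header row. */
    void putHeader(long rowIndex, String headerResult) {
        headersByRowIndex.put(rowIndex, headerResult);
    }

    /** Second pass: find the nearest header at or before the given data row, or null. */
    String headerFor(long dataRowIndex) {
        Long key = headersByRowIndex.floorKey(dataRowIndex);
        return key == null ? null : headersByRowIndex.get(key);
    }

    public static void main(String[] args) {
        HeaderLookup lookup = new HeaderLookup();
        lookup.putHeader(0, "header of ID 1");   // rows 1 and 2 are its data rows
        lookup.putHeader(3, "header of ID 2");   // row 4 is its data row
        lookup.putHeader(5, "header of ID 3");   // row 6 is its data row
        System.out.println(lookup.headerFor(4));
    }
}
```

Once the map is populated, it is only read, so in this sketch each data row can be initialized independently of the other data rows. How the map is distributed to worker nodes is outside what the slide describes.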
  49. Modify business logic
      • Row relationship could be removed, if it’s
        • Unintentional (a bug)
        • For unnecessary optimization
        • Data that could be retrieved otherwise
      • Pro
        • High chance for good performance
      • Con
        • High chance for new bugs
  50. • Input and output data must be shared
        • Network storage
        • How long does it take to copy 200 GB?
      • Diagram labels: Transfer, Process, Heavy Process
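
Slide 50 asks how long it takes to copy 200 GB. The slides give no link speed, so the following back-of-the-envelope sketch uses assumed values (1 and 10 Gbit/s links, and a hypothetical `efficiency` factor for protocol and storage overhead) purely to show the arithmetic: time = size / (link rate / 8 × efficiency).

```java
final class TransferTime {

    /**
     * Seconds needed to move sizeGigabytes over a link of linkGigabitsPerSecond,
     * where efficiency (0 < efficiency <= 1) is the usable share of the raw rate.
     */
    static double seconds(double sizeGigabytes, double linkGigabitsPerSecond, double efficiency) {
        double gigabytesPerSecond = linkGigabitsPerSecond / 8.0 * efficiency;
        return sizeGigabytes / gigabytesPerSecond;
    }

    public static void main(String[] args) {
        // Assumed link speeds; neither figure comes from the slides.
        System.out.println("10 Gbit/s: " + seconds(200, 10, 1.0) + " s");
        System.out.println(" 1 Gbit/s: " + seconds(200, 1, 1.0) + " s");
    }
}
```

Under these assumptions the ideal transfer alone takes minutes on the faster link and close to half an hour on the slower one, and it is paid on both the input and the output side, which is why light processing steps gain little from being distributed.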
  51. • Sequential batches rely on ordering
        • Tricky to keep in Spark
        • Safe operations: map, filter, zip
        • Unsafe operations: join, group, sort
      • Diagram labels: Process, Shuffle
  52. • Good for
        • Heavy processing
        • Independent input data records
        • One input, multiple outputs
        • Unordered data
      • Not so great for
        • Little processing
        • Dependencies between data records
        • Merging multiple data sources
  53. (No text on this slide.)
  54. (No text on this slide.)
  55. Diagram labels: Data (3, 2, 1); Saturday, Sunday, Monday
  56. vs.
  57. (No text on this slide.)
  58. Next Phase
      • 1.0 Initial phase: outsource based, just started. Vendor lock-in.
      • 2.0 In-house development: differentiate with lower costs and faster delivery.
      • 3.0 Standardization: standardized system architecture, both for hardware and software.
      • 4.0 Data Optimized: overwhelming differentiation, with an enabling architecture for customer-centric service.
      • Diagram labels: Achieved, Next, Current, Standard Architecture
