Atzmon Hen-Tov & Lior Schachter, Pontis
Businesses everywhere are increasingly challenged by their dependencies on legacy platforms. The dramatic increase in the volume, speed, and variety of data is quickly outstripping the capabilities of these legacy systems. By transitioning from a legacy RDBMS to a Hadoop-based platform, Pontis was able to process and analyze billions of mobile subscriber events every day. In this talk, we’ll provide a quick overview of our legacy system, as well as our process for migrating to our target architecture. We’ll continue with a review of our Hadoop platform selection process, which involved a thorough RFP and a detailed analysis of the top Hadoop platform vendors. This session will focus on how we gradually transitioned to our big data platform over the course of several product versions, achieving higher scalability and a lower TCO with each version. We’ll outline the benefits of the target architecture, and detail how we successfully integrated Hadoop into our organization. Our session will conclude with a look at technical solutions for dealing with big data deficiencies.
11. We conducted an RFP to select the most telco-grade platform.
The RFP focused on non-functional capabilities such as sustainable
performance, high availability, and manageability.
14. The approach
Each step should increase scalability and reduce TCO.
Runtime (OLTP) processing:
We replace the underlying plumbing – minimal changes to business logic.
All changes can be turned on/off by GUI configurations:
Modular hybrid architecture.
Ability to work in dual mode – good for QA… but also for production (legacy fallback)…
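The on/off switching described above can be sketched as a simple feature toggle: each plumbing module has a legacy and a Hadoop implementation behind one interface, and a configuration flag (set from the GUI) selects which one runs. This is a minimal, illustrative sketch – the interface and mode names are assumptions, not Pontis code:

```java
// Sketch of configuration-driven plumbing selection. In DUAL mode both
// implementations receive every event, which is useful for QA comparison
// and for a cautious production rollout alongside the legacy path.
public class PlumbingSwitch {
    interface EventSink { void write(String event); }

    enum Mode { LEGACY, HADOOP, DUAL }

    static EventSink select(Mode mode, EventSink legacy, EventSink hadoop) {
        return switch (mode) {
            case LEGACY -> legacy;
            case HADOOP -> hadoop;
            // Dual mode: fan each event out to both plumbings.
            case DUAL -> e -> { legacy.write(e); hadoop.write(e); };
        };
    }

    public static void main(String[] args) {
        StringBuilder legacyLog = new StringBuilder();
        StringBuilder hadoopLog = new StringBuilder();
        EventSink sink = select(Mode.DUAL, legacyLog::append, hadoopLog::append);
        sink.write("evt-1");
        System.out.println(legacyLog + " / " + hadoopLog); // evt-1 / evt-1
    }
}
```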
Analytics processing:
Calculate the Profile in M/R (Java).
Scalable.
We have the best Java developers.
Wrap it with a DSL (Domain-Specific Language)
That’s how we’ve worked for years – (ModelTalk paper)
Non-Java programmers can do the job.
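The profile calculation in M/R boils down to a group-by-subscriber aggregation. A minimal, Hadoop-free sketch of that map/reduce shape in plain Java – the `Event` and `Profile` fields are illustrative assumptions, not the actual Pontis schema:

```java
import java.util.*;
import java.util.stream.*;

// Sketch of the profile calculation: "map" each raw event to its
// subscriber key, "shuffle" by grouping, then "reduce" each group
// into a profile record. The real job runs as Java MapReduce,
// wrapped by a DSL so non-Java programmers can define aggregations.
public class ProfileCalc {
    record Event(String subscriberId, String type, long bytes) {}
    record Profile(long eventCount, long totalBytes) {}

    // Reduce step: fold all of one subscriber's events into a profile.
    static Profile reduce(List<Event> events) {
        long bytes = events.stream().mapToLong(Event::bytes).sum();
        return new Profile(events.size(), bytes);
    }

    // Map + shuffle: group events by subscriber key, reduce each group.
    static Map<String, Profile> run(List<Event> events) {
        return events.stream()
                .collect(Collectors.groupingBy(Event::subscriberId))
                .entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                        e -> reduce(e.getValue())));
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("sub-1", "sms", 10),
                new Event("sub-1", "data", 500),
                new Event("sub-2", "voice", 0));
        System.out.println(run(events).get("sub-1"));
        // Profile[eventCount=2, totalBytes=510]
    }
}
```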
20. Phase 2 – Introducing MapR Hadoop Cluster
Resulting Context
MapR FS + NFS :
Horizontally scalable
Cheap compared to high-end NFS solutions.
Fast and highly available (using VIPs)
Avoids the extra hop into HDFS (no Flume/Kafka layer needed).
Many small files are stored in HDFS (100s of millions) – no need to merge files
Phase   | # Customers | # Events
Legacy  | 10M         | 120M
Phase 1 | 10M         | 200M
Phase 2 | unlimited   | 200M
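Because MapR-FS is exposed over NFS, runtime machines can land event files with ordinary file I/O instead of an extra ingestion hop. A minimal sketch of that write path – the mount point and file layout are assumptions for illustration (MapR clusters commonly mount the file system under `/mapr/<cluster>`):

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch: runtime machines write small per-event files straight onto the
// MapR-FS NFS mount with plain file I/O -- no Flume/Kafka hop. MapR-FS
// copes with 100s of millions of small files, so no merge step is needed.
public class EventWriter {
    static Path writeEvent(Path eventRoot, String subscriberId,
                           String payload) throws IOException {
        Path dir = eventRoot.resolve(subscriberId);
        Files.createDirectories(dir);
        // One small file per event; nanoTime keeps names unique per writer.
        Path file = dir.resolve("event-" + System.nanoTime() + ".rec");
        return Files.writeString(file, payload);
    }

    public static void main(String[] args) throws IOException {
        // In production eventRoot would point at the NFS mount,
        // e.g. Paths.get("/mapr/cluster/events") -- illustrative only.
        Path root = Files.createTempDirectory("events");
        Path f = writeEvent(root, "sub-42", "2014-06-01,data,512");
        System.out.println(Files.readString(f));
    }
}
```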
21. Phase 2 – Introducing MapR Hadoop Cluster
Resulting Context
Avro files:
Complex Object Graph
Troubleshooting with Pig
Out-of-the-box schema upgrades (e.g. adding a field)
Map/Reduce is incremental – the Avro record captures the subscriber state
Map/Reduce efficiency – avoids huge joins
Subscriber Profile calculation:
Performance: 2-3 hours.
Linear scalability: no limit on number of subscribers/raw data (just buy more nodes)
Fast runs over historical data allow for an early launch
Sqoop – very fast insertions into MS-SQL (tens of millions of records in minutes).
Data analysts started working in the Hive environment.
No HA for Oozie yet…
Hue is immature
MS-SQL and ODBC over Hive are slow and limited
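The incremental point above works like this: yesterday's Avro record already captures the subscriber's state, so each daily run only folds the new day's events into it – no join against the full event history. A sketch of that merge step, with plain Java records standing in for the Avro-generated classes (the field names are assumptions):

```java
import java.util.List;

// Sketch of the incremental reduce: the reducer receives a subscriber's
// previous state (deserialized from yesterday's Avro output) plus only
// the new day's events, and emits an updated state record. The full
// history never needs to be re-read or re-joined.
public class IncrementalReduce {
    record State(String subscriberId, long totalEvents, long totalBytes) {}
    record Event(long bytes) {}

    static State merge(State previous, List<Event> todaysEvents) {
        long newBytes = todaysEvents.stream().mapToLong(Event::bytes).sum();
        return new State(previous.subscriberId(),
                previous.totalEvents() + todaysEvents.size(),
                previous.totalBytes() + newBytes);
    }

    public static void main(String[] args) {
        State yesterday = new State("sub-7", 1000, 50_000);
        State today = merge(yesterday, List.of(new Event(100), new Event(200)));
        System.out.println(today);
        // State[subscriberId=sub-7, totalEvents=1002, totalBytes=50300]
    }
}
```

Adding a field to the state is the "out-of-the-box upgrade" case: Avro schema evolution fills the new field with its default when old records are read.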
23. Phase 3 – Introducing MapR M7 Table
Extensive YCSB load tests to find the best table structure and read/update
granularity. Main conclusions:
M7 knows how to handle a very big heap – 90GB.
Update granularity: small updates (using columns) = fast reads
(*) whereas other KV stores need to update the entire BLOB
CSR tables migrated from Oracle to M7 Table:
Tens of billions of records
Need sub-second random access per subscriber
99.9% writes – by runtime machines (almost every event-processing operation
produces an update)
0.1% reads – by the customer’s CSR representatives.
Rows – one per subscriber key; tens of millions
2 CFs – TTL 365 days. 1 version.
Qualifier:
key: [date_class_event_id], value: record
Up to thousands per row
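The CSR table layout above – one row per subscriber, one qualifier per event, qualifiers keyed `date_class_event_id` – can be modeled with a sorted map, with each event landing as a small single-column put. This is a plain-Java model standing in for the M7/HBase API, so the names are illustrative:

```java
import java.util.*;

// Model of the CSR table: rowKey = subscriber, one column qualifier per
// event. Putting a single column makes each event a small write -- unlike
// a KV store holding one BLOB per subscriber, which would be rewritten
// in full on every one of the 99.9%-writes operations.
public class CsrTable {
    // rowKey -> sorted qualifiers (date_class_eventId -> record)
    private final Map<String, NavigableMap<String, String>> rows = new HashMap<>();

    // Write path: touches only one qualifier, not the whole row.
    void putEvent(String subscriberId, String date, String eventClass,
                  String eventId, String record) {
        String qualifier = date + "_" + eventClass + "_" + eventId;
        rows.computeIfAbsent(subscriberId, k -> new TreeMap<>())
            .put(qualifier, record);
    }

    // Read path (the rare 0.1%): one subscriber's events, already sorted
    // by qualifier -- i.e. by date -- for the CSR screen.
    NavigableMap<String, String> eventsOf(String subscriberId) {
        return rows.getOrDefault(subscriberId, new TreeMap<>());
    }

    public static void main(String[] args) {
        CsrTable t = new CsrTable();
        t.putEvent("sub-1", "20140602", "sms", "e2", "1 msg");
        t.putEvent("sub-1", "20140601", "data", "e1", "512 bytes");
        System.out.println(t.eventsOf("sub-1").firstKey()); // 20140601_data_e1
    }
}
```

The date-first qualifier keeps a subscriber's events time-ordered within the row, which is what makes the sub-second CSR lookup a single sorted-row scan.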
24. Phase 3 – Introducing MapR M7 Table
Choosing the right features – not too demanding performance-wise.
Easy to create and manage tables – still some tweaking needed.
No cross-table ACID – we need to develop a solution for keeping consistency
across M7 Table/Oracle/file system.
Hard for QA compared to an RDBMS – no easy way to query; we need to develop tools.
Phase   | # Customers | # Events
Legacy  | 10M         | 120M
Phase 1 | 10M         | 200M
Phase 2 | unlimited   | 200M
Phase 3 | unlimited   | 300M
26. Phase 4 – Migrating OLTP features to M7 tables
Subscriber State table migrated from Oracle to M7 Table:
25% writes – by runtime machines updating the state
100% reads – by runtime.
Rows – one per subscriber key; tens of millions
1 CF – TTL -1. 1 version.
YCSB load tests to validate the solution and the sizing model
Qualifier:
key: state_name, value: state value.
Dozens per row.
But… only ~10% are updated per event
Subscriber Profile Table migrated from MS-SQL to M7 Table.
Bulk insert once a day
Outbound Queue Table migrated from MS-SQL to M7 Table.
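The subscriber state table follows the same column-granularity idea: every event reads the full row, but writes back only the states it changed (~10% of the dozens of qualifiers). A plain-Java model of that read-all/write-few pattern – the map stands in for the M7/HBase API, and the state names are assumptions:

```java
import java.util.*;

// Model of the Phase 4 state table: row per subscriber, one qualifier
// per state_name. An event reads the whole row once (the 100%-reads
// path) but writes back only the changed qualifiers (the 25%-writes
// path), avoiding a rewrite of the entire state on every event.
public class StateTable {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Read path: fetch the full row for event processing.
    Map<String, String> readRow(String subscriberId) {
        return rows.computeIfAbsent(subscriberId, k -> new HashMap<>());
    }

    // Write path: one put per changed qualifier only.
    void writeChanged(String subscriberId, Map<String, String> changed) {
        readRow(subscriberId).putAll(changed);
    }

    public static void main(String[] args) {
        StateTable t = new StateTable();
        t.writeChanged("sub-9", Map.of("balance", "100", "plan", "gold"));
        Map<String, String> row = t.readRow("sub-9");   // event arrives
        // Only 1 of the row's states actually changed -> 1 small put.
        t.writeChanged("sub-9", Map.of("balance", "90"));
        System.out.println(t.readRow("sub-9").get("balance")); // 90
    }
}
```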
27. Phase 4 – Migrating OLTP features to M7 tables
Resulting Context
No longer dependent on Oracle for OLTP.
Real-time processing can handle billions of events per day.
Sizing is linear and easy to calculate:
Number of subscribers * state size * 80% should reside in cache.
HW spec: 128GB RAM, 12 SAS drives.
Consistency management is very complicated.
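The sizing rule above reduces to simple arithmetic. A sketch with illustrative inputs – the slide gives the formula and the 128GB node spec, but the subscriber count and state size below are assumed numbers, not from the talk:

```java
// The linear sizing model from the slide:
//   cache RAM needed = number of subscribers * state size * 80%
// Inputs in main() are illustrative assumptions.
public class Sizing {
    static long cacheBytes(long subscribers, long stateSizeBytes) {
        return (long) (subscribers * stateSizeBytes * 0.8);
    }

    public static void main(String[] args) {
        long subscribers = 50_000_000L;  // assumed
        long stateSize = 2 * 1024L;      // assumed 2 KB of state per subscriber
        long bytes = cacheBytes(subscribers, stateSize);
        // Round up to whole nodes of the slide's 128GB RAM spec.
        long nodes = (long) Math.ceil(bytes / (128.0 * 1024 * 1024 * 1024));
        System.out.printf("cache: %.1f GB -> %d node(s) of 128GB RAM%n",
                bytes / (1024.0 * 1024 * 1024), nodes);
    }
}
```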
Phase   | # Customers | # Events
Legacy  | 10M         | 120M
Phase 1 | 10M         | 200M
Phase 2 | unlimited   | 200M
Phase 3 | unlimited   | 300M
Phase 4 | unlimited   | unlimited
29. Phase 5 – Decommission legacy RDBMS
Resulting Context
MySQL is not a new technology in our stack (it is part of the MapR distribution).
Removing Oracle/MS-SQL from our architecture has a significant impact on
system cost, deployment, monitoring, etc.