Atzmon Hen-Tov & Lior Schachter, Pontis
Businesses everywhere are increasingly challenged by their dependencies on legacy platforms. The dramatic increase in the volume, speed, and variety of data is quickly outstripping the capabilities of these legacy systems. By transitioning from a legacy RDBMS to a Hadoop-based platform, Pontis was able to process and analyze billions of mobile subscriber events every day. In this talk, we’ll provide a quick overview of our legacy system, as well as our process for migrating to our target architecture. We’ll continue with a review of our Hadoop platform selection process, which involved a thorough RFP and a detailed analysis of the top Hadoop platform vendors. This session will focus on how we gradually transitioned to our big data platform over the course of several product versions, achieving higher scalability and a lower TCO with each version. We’ll outline the benefits of the target architecture, and detail how we successfully integrated Hadoop into our organization. Our session will conclude with a look at technical solutions for dealing with big data deficiencies.
11. We conducted an RFP to select the most telco-grade platform.
The RFP focused on non-functional capabilities such as sustainable
performance, high availability, and manageability.
14. The approach
Each step should increase scalability and reduce TCO.
Runtime (OLTP) processing:
We replace the underlying plumbing – minimal changes to business logic.
All changes can be turned on/off by GUI configurations:
Modular hybrid architecture.
Ability to work in dual mode – good for QA… but also for production (legacy fallback)…
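The on/off switching described above can be sketched as a simple feature toggle: each plumbing module has a legacy and a Hadoop implementation behind one interface, and a configuration flag (set from the GUI) selects which one runs. This is a minimal, illustrative sketch – the interface and mode names are assumptions, not Pontis code:

```java
// Sketch of configuration-driven plumbing selection. In DUAL mode both
// implementations receive every event, which is useful for QA comparison
// and for a cautious production rollout alongside the legacy path.
public class PlumbingSwitch {
    interface EventSink { void write(String event); }

    enum Mode { LEGACY, HADOOP, DUAL }

    static EventSink select(Mode mode, EventSink legacy, EventSink hadoop) {
        return switch (mode) {
            case LEGACY -> legacy;
            case HADOOP -> hadoop;
            // Dual mode: fan each event out to both plumbings.
            case DUAL -> e -> { legacy.write(e); hadoop.write(e); };
        };
    }

    public static void main(String[] args) {
        StringBuilder legacyLog = new StringBuilder();
        StringBuilder hadoopLog = new StringBuilder();
        EventSink sink = select(Mode.DUAL, legacyLog::append, hadoopLog::append);
        sink.write("evt-1");
        System.out.println(legacyLog + " / " + hadoopLog); // evt-1 / evt-1
    }
}
```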
Analytics processing:
Calculate the Profile in M/R (Java).
Scalable.
We have the best Java developers.
Wrap it with a DSL (Domain-Specific Language)
That’s how we’ve worked for years – (ModelTalk paper)
Non-Java programmers can do the job.
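The profile calculation in M/R boils down to a group-by-subscriber aggregation. A minimal, Hadoop-free sketch of that map/reduce shape in plain Java – the `Event` and `Profile` fields are illustrative assumptions, not the actual Pontis schema:

```java
import java.util.*;
import java.util.stream.*;

// Sketch of the profile calculation: "map" each raw event to its
// subscriber key, "shuffle" by grouping, then "reduce" each group
// into a profile record. The real job runs as Java MapReduce,
// wrapped by a DSL so non-Java programmers can define aggregations.
public class ProfileCalc {
    record Event(String subscriberId, String type, long bytes) {}
    record Profile(long eventCount, long totalBytes) {}

    // Reduce step: fold all of one subscriber's events into a profile.
    static Profile reduce(List<Event> events) {
        long bytes = events.stream().mapToLong(Event::bytes).sum();
        return new Profile(events.size(), bytes);
    }

    // Map + shuffle: group events by subscriber key, reduce each group.
    static Map<String, Profile> run(List<Event> events) {
        return events.stream()
                .collect(Collectors.groupingBy(Event::subscriberId))
                .entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                        e -> reduce(e.getValue())));
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("sub-1", "sms", 10),
                new Event("sub-1", "data", 500),
                new Event("sub-2", "voice", 0));
        System.out.println(run(events).get("sub-1"));
        // Profile[eventCount=2, totalBytes=510]
    }
}
```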
20. Phase 2 – Introducing MapR Hadoop Cluster
Resulting Context
MapR FS + NFS :
Horizontally scalable
Cheap compared to high-end NFS solutions.
Fast and highly available (using VIPs)
Avoids the extra hop into HDFS (no Flume/Kafka layer needed).
Many small files are stored in HDFS (100s of millions) – no need to merge files
Phase   | # Customers | # Events
Legacy  | 10M         | 120M
Phase 1 | 10M         | 200M
Phase 2 | unlimited   | 200M
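Because MapR-FS is exposed over NFS, runtime machines can land event files with ordinary file I/O instead of an extra ingestion hop. A minimal sketch of that write path – the mount point and file layout are assumptions for illustration (MapR clusters commonly mount the file system under `/mapr/<cluster>`):

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch: runtime machines write small per-event files straight onto the
// MapR-FS NFS mount with plain file I/O -- no Flume/Kafka hop. MapR-FS
// copes with 100s of millions of small files, so no merge step is needed.
public class EventWriter {
    static Path writeEvent(Path eventRoot, String subscriberId,
                           String payload) throws IOException {
        Path dir = eventRoot.resolve(subscriberId);
        Files.createDirectories(dir);
        // One small file per event; nanoTime keeps names unique per writer.
        Path file = dir.resolve("event-" + System.nanoTime() + ".rec");
        return Files.writeString(file, payload);
    }

    public static void main(String[] args) throws IOException {
        // In production eventRoot would point at the NFS mount,
        // e.g. Paths.get("/mapr/cluster/events") -- illustrative only.
        Path root = Files.createTempDirectory("events");
        Path f = writeEvent(root, "sub-42", "2014-06-01,data,512");
        System.out.println(Files.readString(f));
    }
}
```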
21. Phase 2 – Introducing MapR Hadoop Cluster
Resulting Context
Avro files:
Complex Object Graph
Troubleshooting with Pig
Out-of-the-box schema upgrades (e.g. adding a field)
Map/Reduce is incremental – the Avro record captures the subscriber state
Map/Reduce efficiency – avoids huge joins
Subscriber Profile calculation:
Performance: 2-3 hours.
Linear scalability: no limit on number of subscribers/raw data (just buy more nodes)
Fast runs over historical data allow for an early launch
Sqoop – very fast insertions into MS-SQL (tens of millions of records in minutes).
Data analysts started working in the Hive environment.
No HA for Oozie yet…
Hue is immature
MS-SQL and ODBC over Hive are slow and limited
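The incremental point above works like this: yesterday's Avro record already captures the subscriber's state, so each daily run only folds the new day's events into it – no join against the full event history. A sketch of that merge step, with plain Java records standing in for the Avro-generated classes (the field names are assumptions):

```java
import java.util.List;

// Sketch of the incremental reduce: the reducer receives a subscriber's
// previous state (deserialized from yesterday's Avro output) plus only
// the new day's events, and emits an updated state record. The full
// history never needs to be re-read or re-joined.
public class IncrementalReduce {
    record State(String subscriberId, long totalEvents, long totalBytes) {}
    record Event(long bytes) {}

    static State merge(State previous, List<Event> todaysEvents) {
        long newBytes = todaysEvents.stream().mapToLong(Event::bytes).sum();
        return new State(previous.subscriberId(),
                previous.totalEvents() + todaysEvents.size(),
                previous.totalBytes() + newBytes);
    }

    public static void main(String[] args) {
        State yesterday = new State("sub-7", 1000, 50_000);
        State today = merge(yesterday, List.of(new Event(100), new Event(200)));
        System.out.println(today);
        // State[subscriberId=sub-7, totalEvents=1002, totalBytes=50300]
    }
}
```

Adding a field to the state is the "out-of-the-box upgrade" case: Avro schema evolution fills the new field with its default when old records are read.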
23. Phase 3 – Introducing MapR M7 Table
Extensive YCSB load tests to find the best table structure and read/update
granularity. Main conclusions:
M7 knows how to handle a very big heap – 90GB.
Update granularity: small updates (using columns) = fast reads
(*) whereas other KV stores need to update the entire BLOB
CSR tables migrated from Oracle to M7 Table:
Tens of billions of records
Need sub-second random access per subscriber
99.9% writes – by runtime machines (almost every event-processing operation
produces an update)
0.1% reads – by the customer’s CSR representatives.
Rows – one per subscriber key; tens of millions
2 CFs – TTL 365 days. 1 version.
Qualifier:
key: [date_class_event_id], value: record
Up to thousands per row
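The CSR table layout above – one row per subscriber, one qualifier per event, qualifiers keyed `date_class_event_id` – can be modeled with a sorted map, with each event landing as a small single-column put. This is a plain-Java model standing in for the M7/HBase API, so the names are illustrative:

```java
import java.util.*;

// Model of the CSR table: rowKey = subscriber, one column qualifier per
// event. Putting a single column makes each event a small write -- unlike
// a KV store holding one BLOB per subscriber, which would be rewritten
// in full on every one of the 99.9%-writes operations.
public class CsrTable {
    // rowKey -> sorted qualifiers (date_class_eventId -> record)
    private final Map<String, NavigableMap<String, String>> rows = new HashMap<>();

    // Write path: touches only one qualifier, not the whole row.
    void putEvent(String subscriberId, String date, String eventClass,
                  String eventId, String record) {
        String qualifier = date + "_" + eventClass + "_" + eventId;
        rows.computeIfAbsent(subscriberId, k -> new TreeMap<>())
            .put(qualifier, record);
    }

    // Read path (the rare 0.1%): one subscriber's events, already sorted
    // by qualifier -- i.e. by date -- for the CSR screen.
    NavigableMap<String, String> eventsOf(String subscriberId) {
        return rows.getOrDefault(subscriberId, new TreeMap<>());
    }

    public static void main(String[] args) {
        CsrTable t = new CsrTable();
        t.putEvent("sub-1", "20140602", "sms", "e2", "1 msg");
        t.putEvent("sub-1", "20140601", "data", "e1", "512 bytes");
        System.out.println(t.eventsOf("sub-1").firstKey()); // 20140601_data_e1
    }
}
```

The date-first qualifier keeps a subscriber's events time-ordered within the row, which is what makes the sub-second CSR lookup a single sorted-row scan.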
24. Phase 3 – Introducing MapR M7 Table
Choosing the right features – not too demanding performance-wise.
Easy to create and manage tables – still some tweaking needed.
No cross-table ACID – we need to develop a solution for keeping consistency
across M7 Table/Oracle/file system.
Hard for QA compared to an RDBMS – no easy way to query; we need to develop tools.
Phase   | # Customers | # Events
Legacy  | 10M         | 120M
Phase 1 | 10M         | 200M
Phase 2 | unlimited   | 200M
Phase 3 | unlimited   | 300M
26. Phase 4 – Migrating OLTP features to M7 tables
Subscriber State table migrated from Oracle to M7 Table:
25% writes – by runtime machines updating the state
100% reads – by runtime.
Rows – one per subscriber key; tens of millions
1 CF – TTL -1. 1 version.
YCSB load tests to validate the solution and the sizing model
Qualifier:
key: state_name, value: state value.
Dozens per row.
But… only ~10% are updated per event
Subscriber Profile Table migrated from MS-SQL to M7 Table.
Bulk insert once a day
Outbound Queue Table migrated from MS-SQL to M7 Table.
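The subscriber state table follows the same column-granularity idea: every event reads the full row, but writes back only the states it changed (~10% of the dozens of qualifiers). A plain-Java model of that read-all/write-few pattern – the map stands in for the M7/HBase API, and the state names are assumptions:

```java
import java.util.*;

// Model of the Phase 4 state table: row per subscriber, one qualifier
// per state_name. An event reads the whole row once (the 100%-reads
// path) but writes back only the changed qualifiers (the 25%-writes
// path), avoiding a rewrite of the entire state on every event.
public class StateTable {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Read path: fetch the full row for event processing.
    Map<String, String> readRow(String subscriberId) {
        return rows.computeIfAbsent(subscriberId, k -> new HashMap<>());
    }

    // Write path: one put per changed qualifier only.
    void writeChanged(String subscriberId, Map<String, String> changed) {
        readRow(subscriberId).putAll(changed);
    }

    public static void main(String[] args) {
        StateTable t = new StateTable();
        t.writeChanged("sub-9", Map.of("balance", "100", "plan", "gold"));
        Map<String, String> row = t.readRow("sub-9");   // event arrives
        // Only 1 of the row's states actually changed -> 1 small put.
        t.writeChanged("sub-9", Map.of("balance", "90"));
        System.out.println(t.readRow("sub-9").get("balance")); // 90
    }
}
```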
27. Phase 4 – Migrating OLTP features to M7 tables
Resulting Context
No longer dependent on Oracle for OLTP.
Real-time processing can handle billions of events per day.
Sizing is linear and easy to calculate:
Number of subscribers * state size * 80% should reside in cache.
HW spec: 128GB RAM, 12 SAS drives.
Consistency management is very complicated.
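The sizing rule above reduces to simple arithmetic. A sketch with illustrative inputs – the slide gives the formula and the 128GB node spec, but the subscriber count and state size below are assumed numbers, not from the talk:

```java
// The linear sizing model from the slide:
//   cache RAM needed = number of subscribers * state size * 80%
// Inputs in main() are illustrative assumptions.
public class Sizing {
    static long cacheBytes(long subscribers, long stateSizeBytes) {
        return (long) (subscribers * stateSizeBytes * 0.8);
    }

    public static void main(String[] args) {
        long subscribers = 50_000_000L;  // assumed
        long stateSize = 2 * 1024L;      // assumed 2 KB of state per subscriber
        long bytes = cacheBytes(subscribers, stateSize);
        // Round up to whole nodes of the slide's 128GB RAM spec.
        long nodes = (long) Math.ceil(bytes / (128.0 * 1024 * 1024 * 1024));
        System.out.printf("cache: %.1f GB -> %d node(s) of 128GB RAM%n",
                bytes / (1024.0 * 1024 * 1024), nodes);
    }
}
```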
Phase   | # Customers | # Events
Legacy  | 10M         | 120M
Phase 1 | 10M         | 200M
Phase 2 | unlimited   | 200M
Phase 3 | unlimited   | 300M
Phase 4 | unlimited   | unlimited
29. Phase 5 – Decommission legacy RDBMS
Resulting Context
MySQL is not a new technology in our stack (it is part of the MapR distribution).
Removing Oracle/MS-SQL from our architecture has a significant impact on
system cost, deployment, monitoring, etc.