“Customer experience is the next big battle ground for telcos,” Amit Akhelikar, Global Director of Lynx Analytics, recently proclaimed at TM Forum Live! Asia in Singapore. But how do you fight this battle? A common approach has been to keep some well-known network quality indicators “under control”, such as dropped calls, radio access congestion and availability. This has proven insufficient to keep customers happy, just as a siege weapon alone is not enough to conquer a city. But what if it were possible to know how customers perceive services, at least the most demanded ones, such as web browsing or video streaming? That would be like having a squad of archers ready for battle. And even then, how do you extract value from that knowledge and act on it in no time, giving our skilled archers the right targets? Meet CANVAS (Customer And Network Visualization and AnalyticS), one of the first LATAM implementations of a Flink-based stream processing use case for a telco. It combines leading and innovative technologies such as Apache Hadoop, YARN, Kafka, NiFi, Druid and advanced visualizations with Flink core features such as non-trivial stateful stream processing (joins, windows and aggregations on event time) and CEP capabilities for alarm generation, delivering a next-generation tool for SOC (Service Operation Center) teams.
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time monitoring of Mobile Internet Quality of Experience using Flink"
1. everis, an NTT DATA Company
San Francisco
April 2018
Real-time monitoring
of Mobile Internet
QoE using Flink
2. Speakers
• Senior Telecom Analytics Consultant at everis.
• 5+ years of experience in Telecom sector.
• Working in batch and streaming analytics
innovative use cases.
• Interests: next-generation stream processing
engines, business-applied machine learning use
cases…
• Analytics Manager at everis with 10+ years of
experience in the Telecom sector.
• Analytics for: Commercial, Sales, Operational,
Technical, Regulatory compliance.
• Interests: tackling BDA challenges from the
definition to the implementation with disruptive
technologies. Data should be played.
3. everis, an NTT Data Company
17,000 professionals
15 countries
4. The Challenge: Mobile Internet Service Quality
Are customers
satisfied?
Is network
performance good
enough?
How do customers
perceive the
service?
CeX is the next big
Battle Ground!
5. The Challenge has evolved
Voice connectivity Voice quality Internet connectivity Mobile voice calls Smartphones mobile
data connectivity
Applications usage
6. Classic tools and processes are not enough
Customer complaints
analysis, Net Promoter
Score, Churn over
technical issues
Network performance
& trouble
Business support
systems management
New problems require
new tools!
8. Getting ready to face the challenge
• How does the customer perceive Mobile Internet Quality?
• How do main players measure customer perception?
• What tools can be used to calculate experience metrics?
• Which data is available to calculate CeX metrics?
Best Mean Opinion Score metric possible for:
• Video Streaming
• Web Browsing
• Smartphone Apps
Research Define
All logos are registered trademarks or trademarks of their respective owners and are only used for illustrative purposes
9. Let’s hit the target, with the right tools
Mean Opinion Score for
Video Streaming, Web
Browsing and Apps
Usage
Service operations
center for customer
experience monitoring
Root cause analysis and
CapEx / OpEx
prioritization
Hit straight into service
perception
10. Who are the users of this new tool?
Service Operations Center
Identify massive MOS problems in
geographical zones or per
individual customer
Analyze root causes over the
network that affect MOS
Real-time End to End monitoring
of service perception
11. Benefits
• Proactively avoid customer contacts
• Identify customer satisfaction and enable troubleshooting
• Characterize high value customers and prioritize monitoring efforts
• As a result:
• Enhance customer recommendation
• Reduce Churn
• Save costs and expenses
• Invest better
12. Why Flink?
Per event processing
Event time processing
Session Windowing
Multiple Window Firings
Complex event processing
13. Architecture
Ingestion, Processing, Storage
DPI Data
Visualization
KPI Calculations &
Enrichment
Custom
Dashboards
Exploratory
Analysis
Network Performance
Data
Enriched KPIs Ingested Sources
Analytical Datastore
Raw data long-
term storage
Complex Event
Processing
Alarms
Interface to
Alarms Manager
KPI Definitions &
Alarm Rules
Other Datasources for
enrichment
• Leverages existing HDP® stack
• Loosely coupled
• Fault tolerant (HA)
• Easily extendable
• Flexible reporting
14. How Big is Big?
• We’re growing really fast!
• Now processing 23 billion
records a day.
• Processing roughly half
of the ingested data.
• Aiming to more than
double that soon.
[Chart: TB and number of events per day (in billions), by quarter, 2017Q3 to 2018Q3; data labels shown: 7, 23, 23, 53]
15. Challenge #1 – Multiple Stream Enriching
TL;DR: Keep the stream flowing!
And a quick tip: Keep intermediate
data objects as flexible as possible
Sources for enrichment = 1
Sources for enrichment = 2 (2nd Attempt)
Sources for enrichment = 2 (1st Attempt)
val res = NetworkElementXGeoInfo(...)
val res = DPIInfoXNetworkElementXGeoInfo(...)
val res = (DPIInfo, NetworkElement)
val res = (DPIInfo, NetworkElement, GeoInfo)
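One reading of the tip above is that widening a tuple at each enrichment step avoids defining a new combined class every time a source is added. The following is a minimal plain-Scala sketch of that idea, not the authors' code: the fields of `DPIInfo`, `NetworkElement` and `GeoInfo`, the `cellId` join key and the lookup maps are all assumptions made for illustration.

```scala
// Hypothetical record types; the field names are assumptions for illustration.
case class DPIInfo(userKey: String, cellId: String, bytes: Long)
case class NetworkElement(cellId: String, vendor: String)
case class GeoInfo(cellId: String, zone: String)

object EnrichmentSketch {
  // First enrichment step: pair each DPI record with its network element.
  def withNetworkElement(
      dpi: DPIInfo,
      elements: Map[String, NetworkElement]): Option[(DPIInfo, NetworkElement)] =
    elements.get(dpi.cellId).map(ne => (dpi, ne))

  // Second step: widen the tuple instead of defining a new combined class.
  def withGeoInfo(
      pair: (DPIInfo, NetworkElement),
      geo: Map[String, GeoInfo]): Option[(DPIInfo, NetworkElement, GeoInfo)] =
    geo.get(pair._2.cellId).map(g => (pair._1, pair._2, g))
}
```

Adding a third source would then mean adding one more function and one more tuple slot, leaving the earlier steps unchanged.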
16. Challenge #2 – Session Windowing
The problem
Long gone are the days when
web pages were simple!
Now 100s of requests per web
page are the norm.
Solution
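import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows
import org.apache.flink.streaming.api.windowing.time.Time

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
val hdrStream = env.addSource(kafkaConsumer).filter(...)
// One session per user and web page; it closes after GAP_MINUTES without requests
hdrStream.keyBy(e => (e.userKey, e.webPageKey))
  .window(EventTimeSessionWindows.withGap(Time.minutes(GAP_MINUTES)))
  .aggregate(new CalcWebQoEAggregateFunction)
  .addSink(kafkaProducer)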
So simple!
We defined a “session” as the
minimum interaction of the end
user to which the system
assigns a score.
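To make the session definition concrete, here is a minimal plain-Scala sketch of gap-based session grouping, independent of Flink. The `Request` type and its fields are assumptions for illustration; in the actual job this grouping is done by `EventTimeSessionWindows`.

```scala
object SessionSketch {
  // Hypothetical request record; timestamps are event-time milliseconds.
  case class Request(userKey: String, webPageKey: String, timestamp: Long)

  // Splits one key's requests into sessions: a new session starts whenever
  // the gap to the previous request exceeds gapMillis.
  def sessions(requests: Seq[Request], gapMillis: Long): Vector[Vector[Request]] =
    requests.sortBy(_.timestamp).foldLeft(Vector.empty[Vector[Request]]) { (acc, r) =>
      acc.lastOption match {
        case Some(current) if r.timestamp - current.last.timestamp <= gapMillis =>
          acc.init :+ (current :+ r)
        case _ =>
          acc :+ Vector(r)
      }
    }

  // Groups by the same key as the slide's keyBy, then sessionizes each group.
  def sessionsPerKey(
      requests: Seq[Request],
      gapMillis: Long): Map[(String, String), Vector[Vector[Request]]] =
    requests
      .groupBy(r => (r.userKey, r.webPageKey))
      .map { case (key, rs) => key -> sessions(rs, gapMillis) }
}
```

Each resulting inner vector corresponds to one page load that would receive a score.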
17. Challenge #3 – Multiple Window Firings (I)
The problem
f(v, w, x, y, z) = av + bw + cx + dy + ez
We need to compute these
formulas for each window on
hourly data, but variables often
come at (very) different processing
times.
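import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
val consumer = new FlinkKafkaConsumer010[...]("pmTopic", MySerdeSchema, readerProps)
  .assignTimestampsAndWatermarks(
    // Watermarks trail the highest timestamp seen by 60 minutes
    new BoundedOutOfOrdernessTimestampExtractor[...](Time.minutes(60)) {
      override def extractTimestamp(element: ...): Long =
        element.startTimestamp
    })
val pmStream = env.addSource(consumer).flatMap(...)
pmStream.keyBy(e => (e.time, e.networkElementId, e.formulaId))
  .window(TumblingEventTimeWindows.of(Time.minutes(60)))
  .aggregate(new CalcFormulasAggregateFunction)
  .addSink(kafkaProducer)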
Solution
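[Diagram: processing-time axis (8:00, 9:00, 9:10, 9:15, 10:00) showing variables v, w, x, y, z with event time 8:00 arriving at different processing times, a v with event time 9:00, and the watermark advancing through 7:00, 8:00 and 9:00.]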
Too much latency!
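The latency can be checked with a little arithmetic. The sketch below is plain Scala, not Flink code; it only mirrors two rules of Flink's event-time model: a bounded-out-of-orderness watermark equals the highest timestamp seen minus the bound, and a tumbling event-time window fires once the watermark reaches the window end minus one millisecond. The object and function names are made up for this illustration.

```scala
object WatermarkLatencySketch {
  val minute: Long = 60 * 1000L

  // Watermark under a bounded-out-of-orderness strategy:
  // highest event time seen so far minus the allowed out-of-orderness.
  def watermark(maxEventTimeSeen: Long, boundMillis: Long): Long =
    maxEventTimeSeen - boundMillis

  // A tumbling event-time window [start, start + size) fires once the
  // watermark reaches its last included timestamp, start + size - 1.
  def canFire(windowStart: Long, sizeMillis: Long, wm: Long): Boolean =
    wm >= windowStart + sizeMillis - 1
}
```

With a 60-minute window starting at 8:00 and a 60-minute bound, the window cannot fire until an event stamped 10:00 or later has been seen, which is two hours after the window opened.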
18. Challenge #3 – Multiple Window Firings (II)
New problems
• If data comes early, we still need to
wait until the next hour to trigger the
window.
• If data comes later (it does all the
time!) we’d need to make our out-of-
orderness parameter bigger, making
our latency problem even worse.
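import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime) // not EventTime
val consumer = new FlinkKafkaConsumer010[...]("pmTopic", MySerdeSchema, readerProps)
val pmStream = env.addSource(consumer).flatMap(...)
pmStream.keyBy(e => (e.time, e.networkElementId, e.formulaId))
  .timeWindow(Time.days(1)) // max time we’re waiting for data
  .trigger(ContinuousProcessingTimeTrigger.of[TimeWindow](Time.minutes(10))) // fire every 10 minutes
  .evictor(new RemoveAfterCalculateEvictor()) // removes all data of a calc’d window
  .aggregate(new CalcFormulaAggregateFunction()) // returns Option[...]
  .filter(_.isDefined).map(_.get) // drop firings where the formula is still incomplete
  .addSink(kafkaProducer)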
New Solution
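[Diagram: processing-time axis (8:00, 9:00, 9:10, 9:15, 10:00) showing variables v, w, x, y, z with event time 8:00 arriving at different processing times, and a v with event time 9:00.]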
TL;DR: Much better latency
and scalability at the cost of
a few more CPU cycles and a
bigger state.
9:20
1st Firing = None 2nd Firing = Some(...)
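The None/Some behaviour of repeated firings can be sketched in plain Scala as follows. This is an illustration under assumptions, not the talk's `CalcFormulaAggregateFunction`: the coefficient values and the map-based accumulator are hypothetical.

```scala
object FormulaSketch {
  // Hypothetical coefficients a..e for f(v, w, x, y, z) = av + bw + cx + dy + ez.
  val coefficients: Map[String, Double] =
    Map("v" -> 1.0, "w" -> 2.0, "x" -> 3.0, "y" -> 4.0, "z" -> 5.0)

  // Accumulator: the variable values seen so far for one window key.
  type Acc = Map[String, Double]

  // Records (or overwrites) the value of one variable.
  def add(acc: Acc, variable: String, value: Double): Acc =
    acc + (variable -> value)

  // Returns None until every variable is present, Some(result) afterwards.
  def result(acc: Acc): Option[Double] =
    if (coefficients.keySet.subsetOf(acc.keySet))
      Some(coefficients.map { case (name, coeff) => coeff * acc(name) }.sum)
    else None
}
```

A firing that happens before all five variables have arrived yields `None` and is filtered out downstream; the first firing after the last variable arrives yields `Some(...)`.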
19. Challenge #4 – Making CEP Dynamic
• We need to run several complex event
patterns on several different data sources.
• New rules are added, and the existing
ones are changed from time to time.
• Hard-coding this is not an option!
{
  "name": "VideoQoE CEP",
  "eventClass": "com.everis.stream.qoe.example.VideoQoEEvent",
  "inputTopic": "video_qoe",
  "outputTopic": "alarms",
  "alarmType": 12345,
  "alarmBody": "Alarm! The current Video QoE score in zone ${geographicalArea} is ${score}",
  "patterns": [
    {"name": "start", "quantifier": "+", "condition": "score > 3 && score <= 3.5", "optional": false, "greedy": true},
    {"name": "end", "quantifier": "1", "condition": "score < 3", "optional": false, "greedy": false, "contiguity": "relaxed", "within": "10 min"}
  ]
}
The problem
• Make it dynamic! (more or less)
• Use JSONs to configure patterns.
• Flink parses the JSONs and starts
running the pattern matching.
• Rule changes still require a job restart.
Solution
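One small piece of such a JSON-driven setup is filling the `${...}` placeholders of `alarmBody` from the matched event. The sketch below shows how that could be done in plain Scala. The `render` function and the idea of passing event fields as a map are assumptions for illustration; the talk does not show how the placeholders are actually resolved.

```scala
object AlarmTemplateSketch {
  // Matches placeholders of the form ${fieldName}.
  private val Placeholder = """\$\{(\w+)\}""".r

  // Replaces each placeholder with the matching field value; unknown
  // placeholders are left untouched so a misconfigured rule stays visible.
  def render(template: String, fields: Map[String, Any]): String =
    Placeholder.replaceAllIn(template, m =>
      java.util.regex.Matcher.quoteReplacement(
        fields.get(m.group(1)).map(_.toString).getOrElse(m.matched)))
}
```

Because the template lives in the configuration, the alarm text can change together with the rule without touching the job's code.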
20. What’s Next
• Add more sources to the mix (Fixed network, Contact Center, etc.)
• Use ML to enhance our anomaly detection capabilities.
• Further develop interfaces to external systems.
• Calibrate math models to better resemble quality expectations of customers.
• Improve our CEP module.
• Data Science / ML work area.