Self-Tuning and Managing Services

Reza Rahimi, PhD.
Principal Staff Algorithm and Software Architect,
Huawei R&D (Futurewei) Cloud Storage Lab (Global CTO Office),
Santa Clara, USA.

Related Recent R&D Experience
• QoE-aware smart home wireless network management and optimization.
• SLA-aware intelligent cloud management and optimization.
• PhD topic: QoS-aware resource management in mobile cloud computing.
• Optimal algorithm design.
• Low-complexity secure code for big data in cloud storage.
5+ Years

PhD Thesis: “QoS-Aware Middleware for Optimal Service Allocation in Mobile Cloud Computing: An Opportunistic Approach to Internet of Things”
Initial Core Idea
The Next Big Thing

Mobile Cloud Computing (MCC) Ecosystem
Ecosystem players:
• Devices, users, apps and sensors
• Wired and wireless network providers
• Local and private cloud providers
• Public cloud providers
• Content and service providers
Tier 1: Public Cloud
(+) Scalable, elastic, available
(+) Fault tolerant
(-) Price, delay, privacy and security
Tier 2: Local/Private Cloud
(+) Low delay, low power
(+) Privacy and security
(-) Limited capacity
IBM: by 2018, 61% of enterprises would be on a tiered cloud.
Heavy Reading predicts that the direct revenue of the mobile cloud market will grow to about $68 billion by 2018.
M. Reza Rahimi, Jian Ren, Chi Harold Liu, Athanasios V. Vasilakos, and Nalini Venkatasubramanian, "Mobile Cloud Computing: A Survey, State of Art and Future Directions", in ACM/Springer Mobile Networks and Applications (MONET), Special Issue on Mobile Cloud Computing, Nov. 2014 (cited: 210+).

Problem Statement
QoS-aware optimal service allocation for mobile users, based on important criteria such as:
• Delay
• Power
• Price
Services involved:
• Context as a Service, e.g. mobility patterns, service usage in different locations and times, group awareness, social context, …
• Network as a Service, e.g. wireless connectivity (Wi-Fi, 3G/4G/5G, Bluetooth, …)
• Computation/Storage as a Service (2-tier), e.g. computation, storage, platform, …

Outline
Problem Formulation
• Location-Time Workflow
• QoS and Normalization
• Group/Single Mobile Applications, Fairness Utility
Mobility-Aware Service Allocation Algorithms
• Simulated Annealing Approach
• Greedy Approach
• Genetic Algorithm Approach
• Scalability
Middleware Architecture
• Orchestrating all Components in MAPCloud SOA Architecture
Experimental Results
• Performance Results of the Algorithms

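Modeling Services in MCC
• Applications and system queries can be modeled as a workflow.
• A workflow consists of a number of logical steps, each known as a service, combined through different composition patterns:
  • SEQ: services run one after another (S1, S2, S3).
  • LOOP: a service is repeated k times.
  • AND (concurrent functions): the branches run concurrently.
  • XOR (conditional functions): one branch is taken, with P1 + P2 = 1 and P1, P2 ∈ {0, 1}.
[Figure: example workflow from Start to End over services S1 to S8 with parallel branches Par1 and Par2, and one small diagram per composition pattern.]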

Location-Time Workflow
• It can be formally defined as:
\[
W(u_k)_{T}^{L} \triangleq \left( w(u_k)_{t_{m_1}}^{l_{n_1}},\; w(u_k)_{t_{m_2}}^{l_{n_2}},\; \ldots,\; w(u_k)_{t_{m_k}}^{l_{n_k}} \right)
\]
[Figure: workflows W1, …, Wj, Wj+1, …, Wk, Wk+1 placed along a time axis t1, …, tN at locations l1, …, ln.]
M. Reza Rahimi, Nalini Venkatasubramanian, Sharad Mehrotra, Athanasios Vasilakos, "On Optimal and Fair Service Allocation in Mobile Cloud Computing", in IEEE Transactions on Cloud Computing, 2015.
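A location-time workflow is an ordered tuple of workflows, each tagged with the location and time at which the user runs it. The following is a minimal Python sketch of that data structure; the class names, field names and sample workflow names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WorkflowAt:
    """One element of the tuple: a workflow run at a location and time."""
    workflow: str   # hypothetical workflow name
    location: str   # location label, e.g. "l1"
    time: int       # time slot index, e.g. 1 for t1


@dataclass(frozen=True)
class LocationTimeWorkflow:
    """Ordered tuple of workflows a user runs along a trajectory."""
    user: str
    steps: Tuple[WorkflowAt, ...]

    def at_time(self, time: int) -> Tuple[WorkflowAt, ...]:
        # Return every workflow scheduled for the given time slot.
        return tuple(s for s in self.steps if s.time == time)


ltw = LocationTimeWorkflow(
    user="u1",
    steps=(
        WorkflowAt("navigation", "l1", 1),
        WorkflowAt("video_ar", "l2", 2),
        WorkflowAt("file_sharing", "l3", 4),
    ),
)
```

Frozen dataclasses keep the tuple immutable, which matches treating the workflow as a fixed input to the allocation problem.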

Quality of Service (QoS)
• QoS can be defined at two different levels:
  • Atomic service level
  • Composite service level (workflow level)
• At the atomic service level, taking power as an example,
\[
q\big(u_k^{s_i,\,l_j,\,t_m}\big)_{power}
\]
is the power consumed on the cellphone of user u_k at location l_j and time t_m while using service s_i.
• The workflow QoS depends on the composition pattern:
\[
\begin{array}{l|l}
\text{Pattern} & W_{power} \\ \hline
\text{SEQ} & \sum_{i=1}^{n} q\big(u_k^{s_i,\,l_j,\,t_m}\big)_{power} \\
\text{AND (PAR)} & \sum_{i=1}^{n} q\big(u_k^{s_i,\,l_j,\,t_m}\big)_{power} \\
\text{XOR (IF-ELSE-THEN)} & \max_{i}\, q\big(u_k^{s_i,\,l_j,\,t_m}\big)_{power} \\
\text{LOOP} & q\big(u_k^{s_i,\,l_j,\,t_m}\big)_{power} \times k
\end{array}
\]
Shivajit Mohapatra, M. Reza Rahimi, Nalini Venkatasubramanian, Handbook of Energy-Aware and Green Computing (Two Volume Set), CRC Press, Taylor & Francis Group, 2012 (cited: 100+).
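The per-pattern aggregation of power can be sketched in a few lines of Python. This is an illustrative sketch that assumes power adds up across SEQ and AND, takes the worst case across XOR, and scales by the iteration count k for LOOP; the function names and sample values are hypothetical.

```python
from typing import Sequence


def seq_power(powers: Sequence[float]) -> float:
    # SEQ: every service runs, so the power values add up.
    return sum(powers)


def and_power(powers: Sequence[float]) -> float:
    # AND (PAR): all concurrent branches run, so power also adds up.
    return sum(powers)


def xor_power(powers: Sequence[float]) -> float:
    # XOR: only one branch runs; take the most expensive one.
    return max(powers)


def loop_power(power: float, k: int) -> float:
    # LOOP: the same service is executed k times.
    return power * k


# A composite workflow: one service, then two parallel branches,
# then a conditional, then a service looped three times.
total = seq_power([
    2,
    and_power([3, 5]),
    xor_power([1, 4]),
    loop_power(2, 3),
])
```

Nesting the helpers mirrors the way composite workflows nest the patterns.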

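QoS Normalization
• Different QoS metrics have different dimensions (price in $, power in joules, delay in seconds).
• A normalization step is needed to make them comparable. For power:
\[
W(u_k)_{T,\,power}^{L} \triangleq
\begin{cases}
\dfrac{W(u_k)_{T,\,\max}^{L} - W(u_k)_{T}^{L}}{W(u_k)_{T,\,\max}^{L} - W(u_k)_{T,\,\min}^{L}}, & W(u_k)_{T,\,\max}^{L} \neq W(u_k)_{T,\,\min}^{L} \\[2ex]
1, & \text{else}
\end{cases}
\]
Here the left-hand side is the normalized power; on the right-hand side, W(u_k) without a label is the raw power of the workflow, and max and min are its largest and smallest values.
• The normalized power, price and delay are real numbers in the interval [0, 1].
• The higher the normalized QoS, the better the execution plan.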
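The normalization can be written as a small helper. This Python sketch follows the (max − value) / (max − min) form with the fallback value 1 when max equals min; the function and parameter names are hypothetical.

```python
def normalize(value: float, w_max: float, w_min: float) -> float:
    """Map a raw QoS value into [0, 1], where higher means better.

    value, w_max and w_min are the raw metric and its largest and
    smallest possible values (same unit, e.g. joules for power).
    """
    if w_max == w_min:
        # Degenerate case: every plan costs the same.
        return 1.0
    return (w_max - value) / (w_max - w_min)


# A plan that consumes 30 J when plans range from 10 J to 50 J.
score = normalize(30.0, w_max=50.0, w_min=10.0)
```

Because the cheapest plan maps to 1 and the most expensive to 0, power, price and delay become directly comparable savings.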

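Optimal Service Allocation for Single Mobile User
\[
\max \; \frac{1}{|U|} \sum_{u_k} \min\left\{ W(u_k)_{T,\,power}^{L},\; W(u_k)_{T,\,price}^{L},\; W(u_k)_{T,\,delay}^{L} \right\}
\]
\[
\begin{aligned}
\text{subject to:}\quad
& \frac{1}{|U|} \sum_{u_k} W(u_k)_{T,\,power}^{L} \le B_{power}, \\
& \frac{1}{|U|} \sum_{u_k} W(u_k)_{T,\,price}^{L} \le B_{price}, \\
& \frac{1}{|U|} \sum_{u_k} W(u_k)_{T,\,delay}^{L} \le B_{delay}, \\
& \kappa \le Cap(Local\_Clouds), \\
& \forall\, u_k \in \{u_1, \ldots, u_{|U|}\}
\end{aligned}
\]
κ ≜ number of mobile users using services on the local cloud.
• In this optimization problem the goal is to maximize the minimum saving of power, price and delay of the mobile applications.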
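To make the objective concrete, the following Python sketch evaluates the fairness utility (the average over users of the minimum normalized saving) and checks the budget constraints. It assumes the budgets bound the per-user average of each raw metric; the function names, dictionary layout and sample numbers are hypothetical.

```python
from typing import Dict

Metrics = Dict[str, float]  # keys: "power", "price", "delay"


def fairness_utility(normalized: Dict[str, Metrics]) -> float:
    # For each user take the worst normalized saving, then average.
    per_user = [min(q["power"], q["price"], q["delay"])
                for q in normalized.values()]
    return sum(per_user) / len(per_user)


def within_budget(raw: Dict[str, Metrics], budgets: Metrics) -> bool:
    # Assumption: each budget bounds the average raw value over users.
    n = len(raw)
    return all(sum(q[m] for q in raw.values()) / n <= budgets[m]
               for m in budgets)


normalized = {
    "u1": {"power": 0.5, "price": 0.75, "delay": 1.0},
    "u2": {"power": 0.25, "price": 0.5, "delay": 0.75},
}
raw = {
    "u1": {"power": 4.0, "price": 2.0, "delay": 1.0},
    "u2": {"power": 6.0, "price": 2.0, "delay": 3.0},
}
budgets = {"power": 5.0, "price": 2.0, "delay": 2.0}

utility = fairness_utility(normalized)
feasible = within_budget(raw, budgets)
```

Taking the minimum per user rewards plans that balance the three criteria instead of optimizing one at the expense of the others.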

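Optimal Service Allocation for Mobile/Social Groups
\[
\max \; \frac{1}{|G|} \sum_{g_i \in G} \frac{1}{|g_i|} \sum_{u_k \in g_i} \min\left\{ W(u_k)_{T,\,power}^{L},\; W(u_k)_{T,\,price}^{L},\; W(u_k)_{T,\,delay}^{L} \right\}
\]
\[
\begin{aligned}
\text{subject to:}\quad
& \frac{1}{|g_i|} W(g_i)_{T,\,power}^{L} \le B_{power}, \\
& \frac{1}{|g_i|} W(g_i)_{T,\,price}^{L} \le B_{price}, \\
& \frac{1}{|g_i|} W(g_i)_{T,\,delay}^{L} \le B_{delay}, \\
& \kappa \le Cap(Local\_Clouds), \\
& \forall\, u_k \in \{u_1, \ldots, u_{|U|}\}, \quad \forall\, g_i \in \{g_1, \ldots, g_{|G|}\}
\end{aligned}
\]
κ ≜ number of mobile users using services on the local cloud.
These optimization problems are NP-hard (Knapsack is a special case), so we look for heuristics and approximations to solve them.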

Mobility-Aware Service Allocation Algorithms on 2-Tier Cloud
Service allocation algorithms for single mobile user and mobile group/social applications:
• Brute-Force Search (BFS)
• MuSIC
• Genetic Based
• Greedy Based (Different Policies)
• Random Service Allocation (RSA)
• We start with the main algorithm, MuSIC: Mobility-Aware Service AllocatIon on Cloud.
• Its core is based on a simulated annealing approach.
M. Reza Rahimi, Nalini Venkatasubramanian, Athanasios Vasilakos, "MuSIC: On Mobility-Aware Optimal Service Allocation in Mobile Cloud Computing", in the IEEE 6th International Conference on Cloud Computing (Cloud 2013), Silicon Valley, CA, USA, July 2013 (cited: 110+).
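Since the core of MuSIC is described as simulated annealing, a generic annealing loop over service placements may help fix ideas. The sketch below is textbook simulated annealing, not the published MuSIC algorithm; the function name, parameters, cooling schedule and toy utility are all assumptions made for illustration.

```python
import math
import random


def simulated_annealing(candidates, utility, steps=2000,
                        t0=1.0, cooling=0.995, seed=0):
    """candidates[i] lists the placement options for service i.

    Returns (best_plan, best_utility) found during the search.
    """
    rng = random.Random(seed)
    plan = [rng.choice(options) for options in candidates]
    current = utility(plan)
    best_plan, best = list(plan), current
    temp = t0
    for _ in range(steps):
        # Neighbor: re-place one randomly chosen service.
        i = rng.randrange(len(candidates))
        neighbor = list(plan)
        neighbor[i] = rng.choice(candidates[i])
        value = utility(neighbor)
        # Accept improvements always, worse plans with a probability
        # that shrinks as the temperature drops.
        if value >= current or rng.random() < math.exp((value - current) / temp):
            plan, current = neighbor, value
            if current > best:
                best_plan, best = list(plan), current
        temp = max(temp * cooling, 1e-9)
    return best_plan, best


# Toy problem: three services, each placed on a local or public cloud.
score = {"local": 2, "public": 1}   # hypothetical per-placement utility
options = [["local", "public"]] * 3
best_plan, best = simulated_annealing(
    options, lambda plan: sum(score[c] for c in plan))
```

In a real allocator the utility would be the fairness objective and infeasible plans (over budget or over local-cloud capacity) would be rejected or penalized.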

Scaling Out MuSIC: Simplified
• Partition mobile users and local clouds based on their proximity, and run the service allocation algorithms for each region in parallel (using clustering techniques).
[Figure: a public cloud connected to several regions of local clouds.]
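The partition-then-parallelize idea can be sketched as follows. As a simplification, this Python example replaces a full clustering technique with nearest-local-cloud assignment and uses a placeholder in place of the per-region allocation algorithm; all names and coordinates are hypothetical.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_nearest_cloud(users, clouds):
    """Group users by the local cloud closest to their position."""
    regions = defaultdict(list)
    for name, pos in users.items():
        nearest = min(clouds, key=lambda c: math.dist(pos, clouds[c]))
        regions[nearest].append(name)
    return dict(regions)


def allocate_region(item):
    cloud, members = item
    # Placeholder: a real system would run the allocation
    # algorithm for this region here.
    return cloud, sorted(members)


users = {"u1": (0, 0), "u2": (1, 1), "u3": (9, 9), "u4": (10, 8)}
clouds = {"lc1": (0, 1), "lc2": (10, 10)}

regions = partition_by_nearest_cloud(users, clouds)
with ThreadPoolExecutor() as pool:
    # Regions are independent, so they can be solved in parallel.
    results = dict(pool.map(allocate_region, regions.items()))
```

Each region sees only its own users and clouds, which shrinks the search space per solver run.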

Scalable Brute-Force Search Using Big Data Processing (Apache Pig)
Processing steps over Mobile Users × Service 1 × … × Service n:
• Apply system constraints.
• Compute the utility function of each solution for the mobile users.
• Group the solutions for each mobile user.
• Find the maximum utility for each mobile user and emit it as the best solution.
Pig Latin pseudocode:
/* Load data */
LOAD mobile users, services, …
/* Cartesian product to produce the solution space */
CROSS mobile users, Service1, …
/* Apply optimization constraints to the solution space */
FILTER by Constraints1, …
/* Find the best solution */
FOREACH mobile user GENERATE utility value
GROUP solutions BY mobile users
FOREACH solution GENERATE MAX
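The same cross, filter, score and maximize pipeline can be sketched on a single machine in Python. This is an illustrative single-user brute-force search, not the Pig job itself; the candidate tuples, utilities, prices and budget are hypothetical.

```python
from itertools import product


def brute_force(services, budget):
    """services[i] lists (name, utility, price) candidates for service i."""
    best_plan, best_value = None, float("-inf")
    # CROSS: enumerate the full solution space.
    for plan in product(*services):
        # FILTER: drop plans that violate the price budget.
        if sum(price for _, _, price in plan) > budget:
            continue
        # GENERATE utility, then keep the MAX.
        value = sum(utility for _, utility, _ in plan)
        if value > best_value:
            best_plan, best_value = plan, value
    return best_plan, best_value


services = [
    [("s1@local", 5, 4), ("s1@public", 3, 1)],
    [("s2@local", 5, 3), ("s2@public", 2, 1)],
]
plan, value = brute_force(services, budget=5)
names = tuple(name for name, _, _ in plan)
```

The solution space grows as the product of the candidate counts, which is why the slide distributes the search with Pig.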

MAPCloud SOA Architecture
Components:
• Mobile Client
• MAPCloud Web Service Interface
• MAPCloud Runtime
• MAPCloud LTW Engine
• Optimal Service Scheduler
• Cloud Service Registry
• QoS-Aware Service DB
• Mobile User Log DB
• Local and Public Cloud Pool
MAPCloud Video Demo
M. Reza Rahimi, Nalini Venkatasubramanian, Sharad Mehrotra and Athanasios Vasilakos, "MAPCloud: Mobile Applications on an Elastic and Scalable 2-Tier Cloud Architecture", in the 5th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2012), USA, Nov 2012 (cited: 110+).

Experimental and Simulation Results: Mobile Applications Benchmarks
• OCR + Speech (OCRS)
• Video Augmented Reality (VAR), YouTube link
• Multimedia File Sharing (MFS)
Mobile Apps Processing Storage Bandwidth Social Application
OCRS × ×
VAR × × ×
MFS × × × ×

Simulation Setup
• Public cloud: Amazon EC2 and S3. A large instance is equivalent to a PC with 7.5 GB of memory and 850 GB of storage.
• Local clouds (Local Cloud 1, …, Local Cloud n), each hosting services S1, …, Sn: 64-bit Windows dual-core server with 8 GB of memory and 500 GB of storage; LAN speed.
• Profiling of sample applications was used to tune the system environment.
• Java Network Simulator (JNS) and CloudSim were used to model the delay between local clouds.
• Random Waypoint (RWP) and Manhattan mobility models were used, with speed V in [0 m/s, 10 m/s].
• Some error was also added to the mobility models to check the robustness of the service allocation algorithms.

Performance Results
[Figure: average throughput of the MuSIC, Genetic, Greedy, RSA and G-MuSIC (5 groups) algorithms with uncertainty in LTW prediction in the range [0%, 30%].]

Scalability Studies
          RSA-Par/Pig-Based   MuSIC-Par/Pig-Based   Greedy-Par/Pig-Based   GA-Par/Pig-Based
Single    48%                 79%                   65%                    70%
Group     47%                 74%                   62%                    68%
Settings: 10,000 uniformly distributed mobile users; 6 different services per mobile user, with 10 candidates per service (on local or public cloud); 50 uniformly distributed local clouds; 10 public-cloud Amazon large instances.
In general, the parallel version of our algorithms is ~4 times faster than the Pig-based distributed version.

QoE-Aware smart home wireless
network management and optimization.

SMART Connectivity
Providing innovative means of connecting devices and things through a new connectivity paradigm.
• 2014: growing "connected life"
• 2018: need for "intelligent" networking
Problem: bad coverage, low throughput, high interference, high power/energy.
Solution: better coverage, higher throughput, lower interference, lower power/energy.

Core Enabler Services
• Ambient network sensing: UX, network, devices, apps
• Analytics: interference, battery life, service cost
• QoE modeling: predict Quality of Experience
Outcome
• Context-aware connectivity
• Context-aware handoffs, including vertical handoffs (VHOs)

SMART Connectivity Service Diagram
[Figure: ambient network and application sensing by the ANS Profiler (RSSI, link speed, app, event listener) yields the objective context; QoE analytics (initial buffering time, buffering ratio, buffering frequency) feed a decision tree that predicts QoE on a scale from 1 ("Poor") to 5 ("Excellent"); QoE heat maps compare the QoE for WiFi1, WiFi2 and BT, and the device connects to WiFi1.]
Sample values shown in the figure: app YouTube, RSSI -85 dBm, link speed 1 Mbps, 647 sec, 17 events/sec, 0.425, 1.9; QoE heat-map values 2.9, 1.9 and 1.2.
Results obtained with the technology:
• Users' QoE (MOS score) and engagement increased by up to 30%.
• Energy consumption on smart devices reduced by up to 2x.
• Delay and buffering time reduced by up to 4x.

Technology transferred to Samsung's newly acquired company, SmartThings:
• Sensing dongle
• MIH (IEEE 802.21) implementation on the Linux kernel

SLA-Aware intelligent cloud management and optimization
M. Reza Rahimi, “Self-Tuning Data Centers”, Big Data Innovation Summit, Las Vegas, 2017 (Keynote Talk).

Applications Spectrum
Applications:
• Self-driving cars
• Robotic/AI applications
• Data management systems
• Video streaming/IoT, …
Resources:
• Computing (CPU, GPU, DSP, FPGA, ...)
• Storage (DRAM, SSD, HDD, ...)
• Network (wired, Wi-Fi, 4G, …)
These applications will be fully or partially supported by data center services (cloud-based).

Typical Data Center Architecture
[Figure: hosts Host 1, Host 2, …, Host n, each running VMs (VM-1-1 … VM-1-k, VM-2-1 … VM-2-m, VM-n-1 … VM-n-l); apps run on the VMs; the hosts send logs to a storage pool, which serves data center management and big data engineering and science.]
As a simple rule of thumb, enterprise data center size:
• 100 hosts
• 1000 VMs
• Logs: ~40 GB/day

Some Data Center Management Challenges/Opportunities
Service Level Agreement (SLA):
Throughput/latency (e-commerce applications):
► US e-commerce reached $304 billion in 2014 and is growing 15.4% yearly.
► 100 ms of latency costs a 1% decrease in sales.
► Pages should load in less than 2 seconds to avoid losing customers; slower loading decreases overall sales by 7%.
Resource utilization (capacity planning):
► ~90 percent of VMs utilize < 15% of their assigned cores/storage.
► ~90 percent of VMs have < 10 IOPS.
Scalable and elastic (on demand):
► The system should know when and how to scale to satisfy the SLA dynamically.

Energy Efficiency:
► A 30% reduction of energy cost by 2020, based on European law (Green DC).
► US data centers consume ~90 billion kilowatt-hours annually, equal to the household consumption of New York for two years.
► They emit over 150 million tons of carbon yearly in the USA.
► The average server runs at 12% to 18% of its capacity most of the time while still consuming 30% to 60% of its maximum power.
► High utilization -> savings in power consumption -> low carbon footprint.
Dynamic Service Pricing:
► Computing, network and storage are utilities for workloads.
► A model is needed to find a dynamic and sound pricing policy in the competitive market of cloud providers while increasing revenue/profit.

Software Compliance and License:
► ~$500,000 is spent on software licensing for an average-size data center.
► Licensing can be per user/device/VM/core/…
► Different license models and policies, such as:
1) Running licensed workloads on bare metal (no virtualization),
2) Running licensed workloads on a dedicated cluster,
3) Migrating licensed workloads,
4) …
► Workload and cluster growth bring challenges for software licensing.
► The challenge is how to minimize the cost of software in data centers without violating license policy.

Self-Tuning Data Center: Simplified Service Flow
Services: VM; Scheduling and Orchestrating Services and Resources; Real-time Log and Monitoring Service; Watcher and Policy Service; Recommendation/Prediction Service.
Initial State:
1) Request resource.
2) Ask for the correct size, type and location of the resource based on the request.
3) Correct configuration, resource size and placement.
4) Allocate the required resources.
Operational and Recovery State:
1) Telemetry and log sending.
2) Query logs for policy and alert checks.
3) Collected data.
4) Check for violations and warnings.
5) Alert of violation.
6) Ask for a recommendation.
7) Send recommendations and recipes.
8) Apply the recommendation.
Self-Tuning State:
1) Ask for a self-tuning recommendation (for example in a low-traffic state).
2) Send the tuning plan and recommendation (like VM migration or resizing).
3) Apply the self-tuning recommendation.
Recommendation/prediction tools:
-- Resource sizing tool,
-- Time series prediction (CAP/PERF (IOPS/LAT)),
-- Anomaly detection,
-- Performance prediction,
-- Simulation,
-- …
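The operational loop (check telemetry against policy, alert on violation, ask for a recommendation) can be sketched in a few lines of Python. The policy structure, metric names, thresholds and recipes below are hypothetical placeholders, not part of any described product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Policy:
    metric: str    # hypothetical telemetry field name
    limit: float   # SLA threshold; values above it are violations


def check_violations(telemetry: Dict[str, float],
                     policies: List[Policy]) -> List[Policy]:
    # Watcher and policy service: compare telemetry against each policy.
    return [p for p in policies if telemetry.get(p.metric, 0.0) > p.limit]


def recommend(violations: List[Policy]) -> List[str]:
    # Recommendation service: map each violation to a recipe.
    recipes = {
        "latency_ms": "migrate VM to a less loaded host",
        "cpu_util": "resize VM (add cores)",
    }
    return [recipes.get(v.metric, "escalate to operator") for v in violations]


policies = [Policy("latency_ms", 100.0), Policy("cpu_util", 0.9)]
telemetry = {"latency_ms": 120.0, "cpu_util": 0.4}

violations = check_violations(telemetry, policies)
actions = recommend(violations)
```

A production system would replace the static recipe table with the prediction and sizing tools listed above.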

Technology transferred to Huawei eService (Huawei OceanStor DJ):
• Scalable, reliable and available telemetry platform
• Modeling as a Service:
-- Storage required capacity prediction,
-- Performance prediction (IOPS/LAT/BW),
-- Cache sizing tool for OLTP/VDI/Exchange,
-- …

Low Complexity Secure Code for Big Data in Cloud Storage Systems
Mohsen Kiskani, Hamid Sadjadpour, M. Reza Rahimi, Fred Etemadieh, "Low Complexity Secure Code (LCSC) Design for Big Data in Cloud Storage Systems", in IEEE ICC, Kansas City, MO, USA, May 2018.

Current industry approach:
• Data reliability: data replication or coding
• Data security and privacy: encryption/decryption
Proposed approach: hybrid codes that provide data reliability together with security and privacy
• Processing speed increased by ~400%
• 100% security in data (SLA)
• No more need to use HTTPS
• PIR (Private Information Retrieval)

Sample Encoding and Decoding
Both encoding and decoding are simple, cheap XOR operations.
Encoding (x1 x2 x3 x4 → y1 y2 y3 y4):
y1 = x3 ⨁ x4
y2 = x1 ⨁ x3
y3 = x1 ⨁ x2
y4 = x2 ⨁ x3 ⨁ x4
Decoding (y1 y2 y3 y4 → x1 x2 x3 x4):
x1 = y1 ⨁ y3 ⨁ y4
x2 = y1 ⨁ y4
x3 = y1 ⨁ y2 ⨁ y3 ⨁ y4
x4 = y2 ⨁ y3 ⨁ y4
Property                                            Repetition / RAID   Erasure / MDS      Proposed Coding
Storage overhead                                    Significant         Very small         Very small
Reparability                                        Possible            Possible           Possible
Security                                            Needs encryption    Needs encryption   No encryption
Computational complexity compared with encryption   Low                 High               Very low
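The sample code can be checked mechanically. The Python sketch below implements exactly the four encoding and four decoding equations over byte strings and confirms the round trip; the helper names and sample data are hypothetical.

```python
def xor(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def encode(x1, x2, x3, x4):
    # y1 = x3^x4, y2 = x1^x3, y3 = x1^x2, y4 = x2^x3^x4
    return xor(x3, x4), xor(x1, x3), xor(x1, x2), xor(x2, x3, x4)


def decode(y1, y2, y3, y4):
    # x1 = y1^y3^y4, x2 = y1^y4, x3 = y1^y2^y3^y4, x4 = y2^y3^y4
    return (xor(y1, y3, y4), xor(y1, y4),
            xor(y1, y2, y3, y4), xor(y2, y3, y4))


data = (b"self", b"tune", b"data", b"code")
coded = encode(*data)
restored = decode(*coded)
```

Every stored block is an XOR of at least two data blocks, so no coded block equals a plain data block on its own.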

Service Architecture and Data Retrieval Process
• Step 1: The user requests file B1 and sends the decoding instructions to the storage unit.
• Step 2: The storage unit combines the encoded files using the decoding instructions in the storage processing service.
• Step 3: The storage unit transmits the result to the user.
• Step 4: The user combines the received data with its own encoded files and creates B1:
\[
\sum_{i} g_i C_i + g_{13} C_{13} + g_{14} C_{14} + g_{15} C_{15} = B_1
\]
[Figure: the original data B1 to B6 passes through the new encoding scheme; the storage unit holds coded blocks C1 to C12 and the user holds C13, C14 and C15. The user sends decoding instructions (1), the storage processing service computes Σ g_i C_i (2) and returns it (3), and the user recovers B1 (4).]
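The four retrieval steps can be illustrated with a toy XOR code. This Python sketch reuses a 4-block XOR code with coefficients in {0, 1}; the split of coded blocks between storage unit and user (three blocks stored remotely, one kept locally) and all names are assumptions made for illustration, not the published scheme.

```python
def xor(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def encode(x1, x2, x3, x4):
    # Toy code: y1 = x3^x4, y2 = x1^x3, y3 = x1^x2, y4 = x2^x3^x4
    return xor(x3, x4), xor(x1, x3), xor(x1, x2), xor(x2, x3, x4)


def storage_combine(stored, instructions):
    # Step 2: XOR the stored blocks selected by the 0/1 instructions.
    return xor(*[blk for blk, g in zip(stored, instructions) if g])


def user_recover(partial, own_blocks):
    # Step 4: combine the storage result with the user's own blocks.
    return xor(partial, *own_blocks)


original = (b"B1__", b"B2__", b"B3__", b"B4__")
y1, y2, y3, y4 = encode(*original)
stored, own = [y1, y2, y3], [y4]      # assumed split of coded blocks

# Step 1: to get the first block, x1 = y1 ^ y3 ^ y4, so the user
# asks the storage unit to combine y1 and y3.
partial = storage_combine(stored, [1, 0, 1])   # Steps 2 and 3
recovered = user_recover(partial, own)         # Step 4
```

The storage unit only ever handles coded blocks and their combination, and the final XOR with the locally held block happens on the user side.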

Conclusion and Future Directions
• Self-tuning and managing services in different application domains and contexts.
• Architectural and algorithmic patterns that these systems have in common, and some best practices for the problems they pose.
• More things (physical/virtual) are connected together, physically, virtually and semantically.
• Managing this environment is very challenging for humans; autonomous systems with low human interaction are needed.
• AI/ML/DL and big data processing tools are some of the best industrial practices to tackle these problems.
• We are at the beginning of the Web 4.0 era: the self-tuning and managing web.
