1. Self-Tuning and Managing Services
Reza Rahimi, PhD.
Principal Staff Algorithm and Software Architect,
Huawei R&D (Futurewei) Cloud Storage Lab (Global CTO Office),
Santa Clara, USA.
R&D
2. Self-Tuning and Managing Services
Related Recent R&D Experience
QoE-aware smart home wireless network management and optimization.
SLA-aware intelligent cloud management and optimization.
R&D
PhD topic: QoS-aware resource management in mobile cloud computing.
Optimal Algorithm Design
R&D
Low-complexity secure code for big data in cloud storage.
5+ Years
R&D
3. PhD Thesis:
“QoS-Aware Middleware for Optimal Service Allocation in Mobile Cloud Computing: An Opportunistic Approach to Internet of Things”
Initial Core Idea
The Next Big Thing
4. Mobile Cloud Computing (MCC) Ecosystem
M. Reza Rahimi, Jian Ren, Chi Harold Liu, Athanasios V. Vasilakos, and Nalini Venkatasubramanian, "Mobile Cloud Computing: A Survey, State of Art and Future Directions", in ACM/Springer Mobile Networks and Applications (MONET), Special Issue on Mobile Cloud Computing, Nov. 2014 (cited: 210+).
Tier 1: Public Cloud
(+) Scalable, elastic, available
(+) Fault tolerant
(-) Price, delay, privacy and security
Tier 2: Local/Private Cloud
(+) Low delay, low power
(+) Privacy and security
(-) Limited capacity
IBM: by 2018, 61% of enterprises would be on a tiered cloud.
Wired and Wireless Network Providers
Local and Private Cloud Providers
Devices, Users, Apps and Sensors
Public Cloud Providers
Content and Service Providers
Heavy Reading predicts that the direct revenue of the mobile cloud market will grow to about $68 billion by 2018.
5. Problem Statement
Context as a Service: e.g., mobility patterns, service usage in different locations and times, group-aware and social context, …
Network as a Service: e.g., wireless connectivity (Wi-Fi, 3G/4G/5G, Bluetooth, …)
Computation/Storage as a Service (2-Tier): e.g., computation, storage, platform, …
QoS-aware optimal service allocation for mobile users based on important criteria such as:
Delay
Power
Price
6. Outline
Problem Formulation
• Location-Time Workflow
• QoS and Normalization
• Group/Single Mobile Applications, Fairness Utility
Mobility-Aware Service Allocation Algorithms
• Simulated Annealing Approach
• Greedy Approach
• Genetic Algorithm Approach
• Scalability
Middleware Architecture
• Orchestrating all Components in the MAPCloud SOA Architecture
Experimental Results
• Performance Results of the Algorithms
7. Modeling Services in MCC
• Applications and system queries can be modeled as a workflow.
• A workflow consists of a number of logical steps, each known as a service, combined through different composition patterns:
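[Figure: an example workflow from Start to End composed of services S1 to S8 with parallel branches Par1 and Par2, followed by the four composition patterns: SEQ (S1, S2, S3 in sequence), LOOP (S1 repeated k times), AND (concurrent functions), and XOR (conditional functions selected by P1 and P2).]
For the XOR pattern: $P_1 + P_2 = 1,\quad P_1, P_2 \in \{0, 1\}$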
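8. Location-Time Workflow
[Figure: a user path over time steps t1, t2, t3, t4, …, tN and locations l1, l2, l3, …, ln, with workflows W1, …, Wj, Wj+1, …, Wk, Wk+1 attached along the path.]
• It can be formally defined as:
$$
W(u_k)_T^L \triangleq \left( w(u_k)_{t_1}^{l_1},\ w(u_k)_{t_2}^{l_2},\ \ldots,\ w(u_k)_{t_n}^{l_n} \right)
$$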
M. Reza Rahimi, Nalini Venkatasubramanian, Sharad Mehrotra, Athanasios Vasilakos, "On Optimal and Fair Service Allocation in Mobile Cloud Computing", in IEEE Transactions on Cloud Computing, 2015.
9. Quality of Service (QoS)
• QoS can be defined at two different levels:
• Atomic service level
• Composite service level, or workflow level.
• At the atomic service level it can be defined as follows (taking power as an example):
$q(u_k^{s_i}, l_j, t_j)_{\text{power}} \triangleq$ power consumed on user $u_k$'s cellphone when the user is at location $l_j$ at time $t_j$ using service $s_i$.
• The workflow QoS depends on the composition pattern:
QoS | SEQ | AND (PAR) | XOR (IF-ELSE-THEN) | LOOP
$W_{\text{power}}$ | $\sum_{i=1}^{n} q(u_k^{s_i}, l_j, t_j)_{\text{power}}$ | $\sum_{i=1}^{n} q(u_k^{s_i}, l_j, t_j)_{\text{power}}$ | $\max_{i}\ q(u_k^{s_i}, l_j, t_j)_{\text{power}}$ | $q(u_k^{s_i}, l_j, t_j)_{\text{power}} \times k$
Shivajit Mohapatra, M. Reza Rahimi, Nalini Venkatasubramanian, Handbook of Energy-Aware and Green Computing (Two-Volume Set), CRC Press, Taylor & Francis Group, 2012 (cited: 100+).
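The pattern rules for workflow power can be sketched in a few lines of Python. This is a minimal illustration, assuming the power rule is additive for SEQ and AND, worst-branch for XOR, and k-fold for LOOP; the function names and the sample values are hypothetical and not taken from the slides.

```python
from typing import Sequence


def seq_power(q: Sequence[float]) -> float:
    """SEQ: services run one after another, so their power costs add up."""
    return sum(q)


def and_power(q: Sequence[float]) -> float:
    """AND (PAR): every concurrent branch runs, so power is again the sum."""
    return sum(q)


def xor_power(q: Sequence[float]) -> float:
    """XOR: only one branch runs; take the most expensive one (worst case)."""
    return max(q)


def loop_power(q_single: float, k: int) -> float:
    """LOOP: a single service repeated k times."""
    return q_single * k


# Hypothetical per-service power values (joules) for three services.
branch = [2.0, 3.5, 1.5]
print(seq_power(branch), and_power(branch), xor_power(branch), loop_power(2.0, 3))
```

Other QoS dimensions would need their own rules (for example, delay in a parallel block is usually governed by the slowest branch rather than the sum).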
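10. QoS Normalization
• Different QoS metrics have different units (price in $, power in joules, delay in seconds).
• A normalization step is needed to make them comparable.
$$
\overline{W}(u_k)_T^L{}_{\text{power}} \triangleq
\begin{cases}
\dfrac{W(u_k)_T^L{}_{\text{power}}^{\max} - W(u_k)_T^L{}_{\text{power}}}{W(u_k)_T^L{}_{\text{power}}^{\max} - W(u_k)_T^L{}_{\text{power}}^{\min}}, & W(u_k)_T^L{}_{\text{power}}^{\max} \neq W(u_k)_T^L{}_{\text{power}}^{\min} \\
1, & \text{else}
\end{cases}
$$
Here $\overline{W}$ denotes the normalized value and $W$ the raw value; price and delay are normalized in the same way.
✓ The normalized power, price, and delay are real numbers in the interval [0, 1].
✓ The higher the normalized QoS, the better the execution plan.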
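As a quick illustration of the normalization rule, (max − value) / (max − min) with the value 1 when max equals min, here is a minimal Python sketch. The function name and the sample numbers are assumptions made for the example.

```python
def normalize_qos(value: float, lowest: float, highest: float) -> float:
    """Map a raw QoS cost (power, price, or delay) to [0, 1].

    The cheapest plan maps to 1 and the most expensive plan to 0, so a
    higher normalized value means a better execution plan.
    """
    if highest == lowest:
        # Degenerate case: all plans cost the same.
        return 1.0
    return (highest - value) / (highest - lowest)


# Hypothetical power range of 2 J to 10 J across candidate plans.
for joules in (2.0, 6.0, 10.0):
    print(joules, normalize_qos(joules, lowest=2.0, highest=10.0))
```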
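11. Optimal Service Allocation for Single Mobile User
$$
\max \ \frac{1}{|U|} \sum_{u_k} \min\left\{ \overline{W}(u_k)_T^L{}_{\text{power}},\ \overline{W}(u_k)_T^L{}_{\text{price}},\ \overline{W}(u_k)_T^L{}_{\text{delay}} \right\}
$$
Subject to:
$$
\frac{1}{|U|} \sum_{u_k} W(u_k)_T^L{}_{\text{power}} \le B_{\text{power}}, \quad
\frac{1}{|U|} \sum_{u_k} W(u_k)_T^L{}_{\text{price}} \le B_{\text{price}}, \quad
\frac{1}{|U|} \sum_{u_k} W(u_k)_T^L{}_{\text{delay}} \le B_{\text{delay}},
$$
$$
X \le \mathrm{Cap}(\text{Local\_Clouds}), \qquad X \triangleq \text{number of mobile users using services on local cloud},
$$
$$
\forall u_k \in \{u_1, \ldots, u_{|U|}\}
$$
Here $\overline{W}$ denotes a normalized QoS value in [0, 1] (a saving) and $W$ the corresponding raw value.
• In this optimization problem the goal is to maximize the minimum saving in power, price, and delay of the mobile applications.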
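12. Optimal Service Allocation for Mobile/Social Groups
$$
\max \ \frac{1}{|G|} \sum_{g_j \in G} \frac{1}{|g_j|} \sum_{u_k \in g_j} \min\left\{ \overline{W}(u_k)_T^L{}_{\text{power}},\ \overline{W}(u_k)_T^L{}_{\text{price}},\ \overline{W}(u_k)_T^L{}_{\text{delay}} \right\}
$$
Subject to:
$$
\frac{1}{|g_j|} \sum_{u_k \in g_j} W(u_k)_T^L{}_{\text{power}} \le B_{\text{power}}, \quad
\frac{1}{|g_j|} \sum_{u_k \in g_j} W(u_k)_T^L{}_{\text{price}} \le B_{\text{price}}, \quad
\frac{1}{|g_j|} \sum_{u_k \in g_j} W(u_k)_T^L{}_{\text{delay}} \le B_{\text{delay}},
$$
$$
X \le \mathrm{Cap}(\text{Local\_Clouds}), \qquad X \triangleq \text{number of mobile users using services on local cloud},
$$
$$
\forall u_k \in \{u_1, \ldots, u_{|U|}\}, \qquad \forall g_j \in \{g_1, \ldots, g_{|G|}\}
$$
Here $\overline{W}$ denotes a normalized QoS value in [0, 1] (a saving) and $W$ the corresponding raw value.
These optimization problems are NP-hard (Knapsack is a special case), so we look for heuristics and approximations to solve them.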
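The max-min objectives can be made concrete with a short Python sketch. It only evaluates the objective and one budget constraint for a given allocation; it does not search for the optimum. The function names and sample numbers are hypothetical, and each user is represented by a tuple of already-normalized (power, price, delay) savings.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Savings = Tuple[float, float, float]  # normalized (power, price, delay)


def user_utility(savings: Savings) -> float:
    """Per-user utility: the smallest of the three normalized savings."""
    return min(savings)


def single_user_objective(users: Sequence[Savings]) -> float:
    """Average of the per-user minimum savings over all users."""
    return mean(user_utility(s) for s in users)


def group_objective(groups: Sequence[Sequence[Savings]]) -> float:
    """Average over groups of the within-group average utility."""
    return mean(single_user_objective(g) for g in groups)


def within_budget(raw_costs: Sequence[float], budget: float) -> bool:
    """Budget constraint: the average raw cost must not exceed the budget."""
    return mean(raw_costs) <= budget


users: List[Savings] = [(0.8, 0.6, 0.9), (0.4, 0.7, 0.5)]
groups = [users, [(1.0, 1.0, 0.2)]]
print(single_user_objective(users), group_objective(groups))
print(within_budget([3.0, 5.0], budget=4.5))
```

Averaging inside each group before averaging across groups keeps a large group from dominating the objective, which is one way to read the fairness goal of the group formulation.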
13. Mobility-Aware Service Allocation Algorithms on 2-Tier Cloud
Service allocation algorithms for single mobile user and mobile group/social applications:
Brute-Force Search (BFS)
MuSIC
Genetic Based
Greedy Based (Different Policies)
Random Service Allocation (RSA)
• We start with our main algorithm, MuSIC: Mobility-Aware Service AllocatIon on Cloud.
• Its core is based on a simulated annealing approach.
M. Reza Rahimi, Nalini Venkatasubramanian, Athanasios Vasilakos, "MuSIC: On Mobility-Aware Optimal Service Allocation in Mobile Cloud Computing", in the IEEE 6th International Conference on Cloud Computing (Cloud 2013), Silicon Valley, CA, USA, July 2013 (cited: 110+).
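To make the simulated annealing idea concrete, here is a generic sketch in Python. It is not the MuSIC algorithm itself: the neighbor move (re-pick one service's candidate), the geometric cooling schedule, the parameter defaults, and the toy utility are all assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List, Sequence


def anneal_allocation(num_candidates: Sequence[int],
                      utility: Callable[[List[int]], float],
                      steps: int = 2000,
                      t_start: float = 1.0,
                      t_end: float = 1e-3,
                      seed: int = 0) -> List[int]:
    """Return the best allocation seen: one candidate index per service."""
    rng = random.Random(seed)
    current = [rng.randrange(n) for n in num_candidates]
    current_u = utility(current)
    best, best_u = list(current), current_u
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Neighbor: re-pick the candidate of one randomly chosen service.
        neighbor = list(current)
        i = rng.randrange(len(neighbor))
        neighbor[i] = rng.randrange(num_candidates[i])
        neighbor_u = utility(neighbor)
        delta = neighbor_u - current_u
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_u = neighbor, neighbor_u
            if current_u > best_u:
                best, best_u = list(current), current_u
    return best


# Toy problem: 3 services, 3 candidate hosts each, with hypothetical scores.
scores = [[0.2, 0.9, 0.5], [0.7, 0.1, 0.4], [0.3, 0.6, 0.8]]


def toy_utility(plan: List[int]) -> float:
    return sum(scores[s][c] for s, c in enumerate(plan))


print(anneal_allocation([3, 3, 3], toy_utility))
```

In a real setting the utility would be the max-min saving objective, and infeasible plans (budget or capacity violations) would be rejected or penalized.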
14. Scaling Out MuSIC: Simplified
✓ Partition mobile users and local clouds based on their proximity, and run the service allocation algorithms for each region in parallel (using clustering techniques).
[Figure: a public cloud connected to several local clouds.]
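A minimal sketch of the partitioning step, in Python. The slide mentions clustering techniques without naming one; this example swaps in the simplest stand-in, assigning each user to the nearest local cloud by Euclidean distance. The identifiers and coordinates are hypothetical.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def partition_by_nearest_cloud(users: Dict[str, Point],
                               clouds: Dict[str, Point]) -> Dict[str, List[str]]:
    """Group user ids by the id of their closest local cloud."""
    regions: Dict[str, List[str]] = defaultdict(list)
    for user_id, position in users.items():
        nearest = min(clouds, key=lambda c: dist(position, clouds[c]))
        regions[nearest].append(user_id)
    return dict(regions)


users = {"u1": (0.0, 0.0), "u2": (9.0, 9.0), "u3": (1.0, 2.0)}
clouds = {"lc1": (0.0, 1.0), "lc2": (10.0, 10.0)}
print(partition_by_nearest_cloud(users, clouds))
```

Each resulting region can then be handed to an independent allocation run, for example through a process pool, since regions do not share local cloud capacity under this assumption.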
15. Scalable Brute-Force Search Using Big Data Processing (Apache Pig)
[Figure: mobile users combined with Service 1 … Service n.]
Pig Latin pseudocode:
/* Load data */
LOAD mobile users, services, …
/* Cartesian product to produce the solution space */
CROSS mobile users, Service 1, …
/* Apply optimization constraints to the solution space */
FILTER BY Constraint 1, …
/* Find the best solution */
FOREACH mobile user GENERATE utility value
GROUP solutions BY mobile user
FOREACH solution group GENERATE MAX
Processing stages:
1) Apply system constraints.
2) Compute the utility function of each solution for the mobile users.
3) Group the solutions for each mobile user.
4) Find the maximum utility for each mobile user and emit it as the best solution.
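The same cross, filter, score, and max pipeline can be sketched for a single user in plain Python. This is an in-memory illustration of the stages, not a Pig script; the candidate tuples, the budget, and the function names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Candidate = Tuple[str, float, float]  # (host name, normalized score, price)
Plan = Tuple[Candidate, ...]


def brute_force_best(candidates_per_service: Sequence[Sequence[Candidate]],
                     utility: Callable[[Plan], float],
                     is_feasible: Callable[[Plan], bool]
                     ) -> Tuple[Optional[Plan], float]:
    """Enumerate every plan and keep the feasible one with maximum utility."""
    best_plan: Optional[Plan] = None
    best_u = float("-inf")
    for plan in product(*candidates_per_service):   # CROSS
        if not is_feasible(plan):                   # FILTER
            continue
        u = utility(plan)                           # GENERATE utility
        if u > best_u:                              # MAX
            best_plan, best_u = plan, u
    return best_plan, best_u


services: List[List[Candidate]] = [
    [("local-A", 0.9, 4.0), ("public", 0.6, 1.0)],
    [("local-B", 0.8, 3.0), ("public", 0.5, 1.0)],
]
plan, value = brute_force_best(
    services,
    utility=lambda p: min(c[1] for c in p),
    is_feasible=lambda p: sum(c[2] for c in p) <= 5.0,
)
print([c[0] for c in plan], value)
```

The solution space grows as the product of the candidate counts, which is why the slide distributes the enumeration over a big data engine.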
16. MAPCloud SOA Architecture
[Figure: MAPCloud architecture with the following components: Mobile Client, MAPCloud Web Service Interface, MAPCloud Runtime, MAPCloud LTW Engine, Optimal Service Scheduler, Cloud Service Registry, QoS-Aware Service DB, Mobile User Log DB, and Local and Public Cloud Pool.]
MAPCloud Video Demo
M. Reza Rahimi, Nalini Venkatasubramanian, Sharad Mehrotra, and Athanasios Vasilakos, "MAPCloud: Mobile Applications on an Elastic and Scalable 2-Tier Cloud Architecture", in the 5th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2012), USA, Nov. 2012 (cited: 110+).
17. Experimental and Simulation Results: Mobile Application Benchmarks
OCR + Speech (OCRS)
Video Augmented Reality (VAR): YouTube Link
Multimedia File Sharing (MFS)
Mobile Apps Processing Storage Bandwidth Social Application
OCRS × ×
VAR × × ×
MFS × × × ×
18. Simulation Setup
[Figure: Amazon EC2 and S3 as the public cloud, connected to local clouds 1, 2, 4, 5, 7, …, n, each hosting services S1, …, Sn.]
Amazon large instance: equivalent to a PC with 7.5 GB of memory and 850 GB of storage.
Local cloud: 64-bit Windows dual-core server with 8 GB of memory and 500 GB of storage, connected at LAN speed.
Profiling of sample applications was used to tune the system environment.
Java Network Simulator (JNS) and CloudSim were used to model the delay between local clouds.
The RWP (random waypoint) and Manhattan mobility models are used, with speed V in the range [0 m/s, 10 m/s].
We also add some error to the mobility models to check the robustness of the service allocation algorithms.
19. Performance Results
[Figure: average throughput of the MuSIC, Genetic, Greedy, RSA, and G-MuSIC (5 groups) algorithms with uncertainty in LTW prediction in the range [0%, 30%].]
20. Scalability Studies
Application type | RSA-Par/Pig-Based | MuSIC-Par/Pig-Based | Greedy-Par/Pig-Based | GA-Par/Pig-Based
Single | 48% | 79% | 65% | 70%
Group | 47% | 74% | 62% | 68%
Settings: 10,000 mobile users, uniformly distributed; 6 different services per mobile user, with 10 candidates for each service (on local or public cloud); 50 local clouds, uniformly distributed; 10 public cloud Amazon large instances.
In general, the parallel version of our algorithm is ~4 times faster than the Pig-based distributed version.
22. SMART Connectivity
Providing innovative means of connecting devices and things through a new connectivity paradigm.
2014: Growing “Connected Life”
2018: Need for “Intelligent” Networking
Problem: bad coverage, low throughput, high interference, high power/energy.
Solution: better coverage, higher throughput, lower interference, lower power/energy.
24. SMART Connectivity Service Diagram
[Figure: SMART connectivity service pipeline, steps 1 to 6. Labels: ambient network and application sensing (ANS Profiler, Event Listener App; RSSI -85 dBm, link speed 1 Mbps, YouTube); objective context (initial buffering time, buffering ratio, buffering frequency; sample values 647 sec, 0.425, 17 events/sec); QoE analytics (decision tree; sample predicted QoE 1.9); QoE heat maps (QoE for WiFi1, QoE for WiFi2, QoE for BT; values 2.9, 1.9, 1.2); decision: connect to WiFi1.]
QoE scale: 1 for “Poor” to 5 for “Excellent”.
With this technology in use we obtained:
-- an increase in users' QoE (MOS score) and user engagement of up to 30%,
-- a reduction in energy consumption on smart devices of up to 2x,
-- a reduction in delay and buffering time of up to 4x.
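The final decision step of the diagram, picking the connection with the highest predicted QoE, can be sketched in Python as follows. The function name is hypothetical, and the pairing of the values 2.9, 1.9, and 1.2 with WiFi1, WiFi2, and BT is an assumption based on the order in which they appear in the diagram.

```python
from typing import Dict


def pick_network(predicted_qoe: Dict[str, float]) -> str:
    """Return the network whose predicted QoE (1 = poor, 5 = excellent) is highest."""
    return max(predicted_qoe, key=predicted_qoe.get)


# Assumed reading of the heat-map values from the diagram.
qoe = {"WiFi1": 2.9, "WiFi2": 1.9, "BT": 1.2}
print(pick_network(qoe))
```

The predicted scores themselves would come from the QoE analytics stage (a decision tree over buffering metrics in the diagram), which is not modeled here.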
25. Technology transferred to Samsung's newly acquired company, SmartThings:
• Sensing dongle,
• MIH (IEEE 802.21) implementation on the Linux kernel
26. SLA-Aware Intelligent Cloud Management and Optimization
R&D
M. Reza Rahimi, “Self-Tuning Data Centers”, Big Data Innovation Summit, Las Vegas, 2017 (Keynote Talk).
27. Applications Spectrum
Computing (CPU, GPU, DSP, FPGA, …)
Storage (DRAM, SSD, HDD, …)
Network (Wired, Wi-Fi, 4G, …)
Self-Driving Cars
Robotic/AI Applications
Data Management Systems
Video Streaming/IoT, …
These applications will be fully or partially supported by data center services (cloud-based).
28. Typical Data Center Architecture
As a simple rule of thumb, enterprise data center size:
100 hosts
1,000 VMs
Logs: ~40 GB/day
[Figure: a data center management layer over Host 1, Host 2, …, Host n. Host 1 runs VM-1-1 … VM-1-k, Host 2 runs VM-2-1 … VM-2-m, and Host n runs VM-n-1 … VM-n-l. Each host sends logs to a storage pool used for big data engineering and science. Apps run on the VMs.]
29. Some Data Center Management Challenges/Opportunities
Service Level Agreement (SLA):
Throughput/Latency (e-commerce applications):
► US e-commerce reached $304 billion in 2014, increasing 15.4% yearly,
► 100 ms of latency costs a 1% decrease in sales,
► Pages should load in under 2 seconds so as not to lose customers; slower loading will decrease overall sales by 7%,
Resource Utilization (Capacity Planning):
► ~90 percent of VMs utilize < 15% of their assigned cores/storage,
► ~90 percent of VMs have < 10 IOPS,
Scalable and Elastic (on Demand):
► The system should know when and how to scale to satisfy the SLA dynamically,
30. Some Data Center Management Challenges/Opportunities
Energy Efficiency:
► By 2020, a 30% reduction in energy cost, based on European law (Green DC),
► US data centers consume ~90 billion kilowatt-hours annually, equal to household consumption in NY for two years,
► They emit over 150 million tons of carbon yearly in the USA,
► The average server runs at 12% to 18% of its capacity most of the time while still consuming 30% to 60% of its maximum power,
► High utilization -> savings in power consumption -> low carbon footprint
Dynamic Service Pricing:
► Computing, network, and storage are utilities for workloads.
► Providers need a model to find a dynamic and sound pricing policy in the competitive market of cloud providers while increasing revenue/profit.
31. Some Data Center Management Challenges/Opportunities
Software Compliance and License:
► ~$500,000 is spent on software licensing for an average-size data center,
► Licenses can be per user/device/VM/core/…
► Different models and policies exist for licenses, such as:
1) Running the licensed workload on bare metal (no virtualization),
2) Running the licensed workload on a dedicated cluster,
3) Migrating the licensed workload,
4) …
► Workload and cluster growth bring challenges for software licensing,
► This raises the challenge of minimizing software cost in data centers without violating license policy,
32. Self-Tuning Data Center: Simplified Service Flow
Components: VM; Scheduling and Orchestrating Services and Resources; Real-time Log and Monitoring Service; Watcher and Policy Service; Recommendation/Prediction Service.
Initial State:
1) Request resource
2) Ask for the correct size, type, and location of the resource based on the request
3) Correct configuration, resource size, and placement
4) Allocate required resources
Operational and Recovery State:
1) Telemetry and log sending
2) Query logs for policy and alert checks
3) Collected data
4) Check for violations and warnings
5) Alert of violation
6) Ask for recommendation
7) Send recommendations and recipes
8) Apply recommendation
Self-Tuning State:
1) Ask for a self-tuning recommendation (for example, in a low-traffic state)
2) Send tuning plan and recommendation (like VM migration or resizing)
3) Apply self-tuning recommendation
Models and tools:
-- Resource sizing tool,
-- Time series prediction (CAP/PERF (IOPS/LAT)),
-- Anomaly detection,
-- Performance prediction,
-- Simulation,
-- …
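The check-for-violation and ask-for-recommendation steps of the operational state can be sketched as below. This is a toy illustration only: the `Policy` class, the metric names, the thresholds, and the rule table that maps a violated metric to an action are all hypothetical, with the actions borrowed from the examples named on the slide (VM migration, resizing).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Policy:
    metric: str        # telemetry key, e.g. "latency_ms"
    max_value: float   # SLA threshold; exceeding it is a violation


def check_violations(telemetry: Dict[str, float],
                     policies: List[Policy]) -> List[str]:
    """Watcher step: return the metrics whose latest value breaks a policy."""
    return [p.metric for p in policies
            if telemetry.get(p.metric, 0.0) > p.max_value]


def recommend(violations: List[str]) -> List[str]:
    """Recommendation step: look up a recipe for each violated metric."""
    recipes = {"latency_ms": "resize VM", "cpu_util": "migrate VM"}
    return [recipes.get(v, "escalate to operator") for v in violations]


policies = [Policy("latency_ms", 200.0), Policy("cpu_util", 0.9)]
telemetry = {"latency_ms": 250.0, "cpu_util": 0.4}
violated = check_violations(telemetry, policies)
print(violated, recommend(violated))
```

In the flow on the slide the recommendation would come from prediction models rather than a static lookup table; the table stands in for that service here.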
33. Technology transferred to Huawei eService (→ Huawei OceanStor DJ)
• Scalable, reliable, and available telemetry platform,
• Modeling as a Service:
-- Storage required capacity prediction,
-- Performance prediction (IOPS/LAT/BW),
-- Cache sizing tool for OLTP/VDI/Exchange,
-- …
34. Low Complexity Secure Code for Big Data in Cloud Storage Systems
R&D
Mohsen Kiskani, Hamid Sadjadpour, M. Reza Rahimi, Fred Etemadieh, "Low Complexity Secure Code (LCSC) Design for Big Data in Cloud Storage Systems", in IEEE ICC, Kansas City, MO, USA, May 2018.
35. Current Industry Approach
Data Reliability: data replication or coding
Data Security and Privacy: encryption/decryption
Hybrid Codes: Data Reliability + Security and Privacy
Increased processing speed by ~400%,
100% security in data (SLA),
No more need to use https,
PIR (Private Information Retrieval),
36. Sample Encoding and Decoding
Both encoding and decoding are simple, cheap XOR operations.
Encoding:
y1 = x3 ⊕ x4
y2 = x1 ⊕ x3
y3 = x1 ⊕ x2
y4 = x2 ⊕ x3 ⊕ x4
Decoding:
x1 = y1 ⊕ y3 ⊕ y4
x2 = y1 ⊕ y4
x3 = y1 ⊕ y2 ⊕ y3 ⊕ y4
x4 = y2 ⊕ y3 ⊕ y4
[Figure: data blocks x1 x2 x3 x4 are encoded into y1 y2 y3 y4 and decoded back into x1 x2 x3 x4.]
Property | Repetition / RAID | Erasure / MDS | Proposed Coding
Storage overhead | Significant | Very small | Very small
Reparability | Possible | Possible | Possible
Security | Needs encryption | Needs encryption | No encryption
Computational complexity (compared with encryption) | Low | High | Very low
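The sample XOR code from this slide can be checked directly in Python. The sketch below uses the encoding and decoding equations exactly as listed (y1 = x3 ⊕ x4, …, x4 = y2 ⊕ y3 ⊕ y4); treating each block as an integer (for example one byte) is an assumption made for the example.

```python
from itertools import product
from typing import Tuple

Blocks = Tuple[int, int, int, int]


def encode(x1: int, x2: int, x3: int, x4: int) -> Blocks:
    """Encode four data blocks into four coded blocks using XOR only."""
    return (x3 ^ x4, x1 ^ x3, x1 ^ x2, x2 ^ x3 ^ x4)


def decode(y1: int, y2: int, y3: int, y4: int) -> Blocks:
    """Recover the four data blocks from the four coded blocks."""
    return (y1 ^ y3 ^ y4, y1 ^ y4, y1 ^ y2 ^ y3 ^ y4, y2 ^ y3 ^ y4)


# Exhaustive check over all 4-bit inputs: decoding inverts encoding.
assert all(decode(*encode(*bits)) == bits for bits in product((0, 1), repeat=4))

data = (0x12, 0xAB, 0x00, 0xFF)
coded = encode(*data)
print([hex(b) for b in coded], decode(*coded) == data)
```

Because XOR acts bitwise, the same functions work unchanged on whole bytes or machine words, which is what keeps the per-block cost low.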
37. Service Architecture and Data Retrieval Process
• Step 1: The user requests file B1 and sends the decoding instructions to the storage unit,
• Step 2: The storage unit combines the encoded files using the decoding instructions in the storage processing unit,
• Step 3: The storage unit transmits the result to the user,
• Step 4: The user combines the received data with its own encoded files and creates B1,
[Figure: the original data B1 to B6 is encoded with the new encoding scheme into C1 to C15; the storage unit holds C1 to C12 and the user holds C13, C14, and C15. (1) The user sends decoding instructions; (2) the storage processing service computes $\sum_i g_i C_i$ over its stored files; (3) the result is returned to the user; (4) the user computes $\sum_{i=1}^{12} g_i C_i + g_{13} C_{13} + g_{14} C_{14} + g_{15} C_{15} = B_1$.]
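The four retrieval steps can be illustrated with a toy Python sketch. It does not use the 15-file layout from the figure; instead it reuses a small 4-block XOR code (y1 = x3 ⊕ x4, y2 = x1 ⊕ x3, y3 = x1 ⊕ x2, y4 = x2 ⊕ x3 ⊕ x4) and assumes binary coefficients, with the storage unit holding y1 to y3 and the user keeping y4. That split and all function names are assumptions for the example.

```python
from functools import reduce
from typing import Sequence


def combine(blocks: Sequence[int], coeffs: Sequence[int]) -> int:
    """XOR together the blocks whose coefficient is 1 (a sum over GF(2))."""
    selected = (b for b, g in zip(blocks, coeffs) if g)
    return reduce(lambda acc, b: acc ^ b, selected, 0)


def encode(x1: int, x2: int, x3: int, x4: int):
    return (x3 ^ x4, x1 ^ x3, x1 ^ x2, x2 ^ x3 ^ x4)


data = (0x12, 0xAB, 0x00, 0xFF)
y = encode(*data)
stored, local = y[:3], y[3:]   # storage unit keeps y1..y3, user keeps y4

# Step 1: the user wants x1 = y1 ^ y3 ^ y4 and sends instructions for y1..y3.
instructions = (1, 0, 1)
# Step 2: the storage unit combines its blocks as instructed.
partial = combine(stored, instructions)
# Step 3: only the combined result is transmitted to the user.
# Step 4: the user finishes decoding with its own block.
x1 = partial ^ combine(local, (1,))
print(hex(x1))
```

The storage unit never sees y4, so under this split it cannot complete the decoding on its own, which mirrors the role of the user-held files C13 to C15 in the figure.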
38. Conclusion and Future Directions
✓ Self-tuning and managing services in different application domains and contexts.
✓ Architectural and algorithmic patterns that these systems have in common, and some best practices for solving them.
✓ More things (physical/virtual) are connected together, physically/virtually/semantically.
✓ Managing this environment is very challenging for humans; autonomous systems with low human interaction are needed.
✓ AI/ML/DL and big data processing tools are some of the best industrial practices for tackling these problems.
✓ We are at the beginning of the era of Web 4.0: the self-tuning and managing web.