4. • Application owners/developers do not care about the underlying infrastructure unless it is a problem.
• Microservices-based architectures demand inherently granular application design.
• SLAs for applications must be holistic and independent of the underlying infrastructure.
Vision
[Diagram: Host; Virtualization layers; Containers; services (Srvc) grouped into Application A and Application B]
5. Enable business/application owners to easily define the aspects that are relevant to running their applications, within the budget constraints imposed by IT.
Vision
6. Monitoring is now holistic: it has to consider the various levels of virtualization and harmonize data across the different layers. Containers are short-lived and are moved around the available infrastructure.
Vision
[Diagram: Host; Virtualization layers; Containers]
7. When an application owner’s soft limit is crossed, an alarm is sent back to the owner; when a hard limit is crossed, the corresponding action is performed.
Vision
9. Underutilized Servers
OPS/NOC Policy Example
error(vm, email) :-
    nova:server_owner(vm, owner),
    two_months_before_today(start, end),
    ceilometer:statistics(vm, start, end, "cpu-util", cpu),
    cpu < 5,
    keystone:email(owner, email)

two_months_before_today(start, end) :-
    date:today(end),
    date:minus(end, "2 months", start)
If a VM has had less than 5% CPU utilization for the last 2 months, then notify its owner via email.
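The time window and threshold used by this policy can be sketched in Python. This is only an illustration: the helper names `two_months_before` and `is_underutilized` are hypothetical, and it assumes that "2 months" means two calendar months, which the slide does not specify.

```python
import calendar
from datetime import date


def two_months_before(today: date) -> tuple[date, date]:
    """Return (start, end): end is today, start is two calendar months earlier.

    Assumption: the day of month is clamped when the target month is shorter
    (e.g. 30 April maps to the last day of February).
    """
    month_index = today.year * 12 + (today.month - 1) - 2
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day)), today


def is_underutilized(avg_cpu_percent: float, threshold: float = 5.0) -> bool:
    """Mirror the policy's `cpu < 5` condition."""
    return avg_cpu_percent < threshold


start, end = two_months_before(date(2016, 4, 30))
print(start, end, is_underutilized(2))
```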
10. Current Solution
Components: Congress API, Policy Engine, Ceilometer Datasource, Ceilometer API
Ceilometer Datasource: poll every <n>s
GET /v2/meters/cpu_util/statistics?resource_id=…
VM UUID (Resource ID) | CPU
xxxxxxxx-0001-xxxx-xxxxxxxxxxx | 40
xxxxxxxx-0002-xxxx-xxxxxxxxxxx | 30
xxxxxxxx-0003-xxxx-xxxxxxxxxxx | 2
xxxxxxxx-0004-xxxx-xxxxxxxxxxx | 70
xxxxxxxx-0005-xxxx-xxxxxxxxxxx | 55
11. Current Solution
Components: Congress API, Policy Engine, Ceilometer Datasource, Nova Datasource (Nova API), Keystone Datasource (Keystone API)
Ceilometer Datasource:
VM UUID (Resource ID) | CPU
xxxxxxxx-0001-xxxx | 40
xxxxxxxx-0002-xxxx | 30
xxxxxxxx-0003-xxxx | 2
xxxxxxxx-0004-xxxx | 70
xxxxxxxx-0005-xxxx | 55
Nova Datasource:
VM | Owner
xxxxxxxx-0001-xxxx | Ann
xxxxxxxx-0002-xxxx | Fabio
xxxxxxxx-0003-xxxx | Fabio
xxxxxxxx-0004-xxxx | Ken
xxxxxxxx-0005-xxxx | Ken
Keystone Datasource:
Owner | Email
Ann | AnnNotRealEmail@cisco.com
Fabio | FabioNotRealEmail@cisco.com
Ken | KenNotRealEmail@cisco.com
Policy result:
VM | Email
xxxxxxxx-0003-xxxx | FabioNotRealEmail@cisco.com
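The way the policy engine combines the three tables can be illustrated with a plain Python join over the sample rows from this slide. This is a sketch of the logic only, not of how Congress evaluates Datalog internally; the function name `underutilized_owners` is hypothetical.

```python
# Sample rows copied from the slide.
CPU = {
    "xxxxxxxx-0001-xxxx": 40,
    "xxxxxxxx-0002-xxxx": 30,
    "xxxxxxxx-0003-xxxx": 2,
    "xxxxxxxx-0004-xxxx": 70,
    "xxxxxxxx-0005-xxxx": 55,
}
OWNER = {
    "xxxxxxxx-0001-xxxx": "Ann",
    "xxxxxxxx-0002-xxxx": "Fabio",
    "xxxxxxxx-0003-xxxx": "Fabio",
    "xxxxxxxx-0004-xxxx": "Ken",
    "xxxxxxxx-0005-xxxx": "Ken",
}
EMAIL = {
    "Ann": "AnnNotRealEmail@cisco.com",
    "Fabio": "FabioNotRealEmail@cisco.com",
    "Ken": "KenNotRealEmail@cisco.com",
}


def underutilized_owners(cpu, owner, email, threshold=5):
    """Join CPU -> owner -> email, keeping VMs whose CPU is below the threshold."""
    return [
        (vm, email[owner[vm]])
        for vm, util in cpu.items()
        if util < threshold and vm in owner and owner[vm] in email
    ]


print(underutilized_owners(CPU, OWNER, EMAIL))
```

Only the third VM falls below the 5% threshold, so the join yields the single (VM, Email) row shown on the slide.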
12. From Policy to Alarm
error(vm, email) :-
    nova:server_owner(vm, owner),
    two_months_before_today(start, end),
    monasca_alarms:stats(vm, start, end, "cpu.user_perc", cpu),
    cpu < 5,
    keystone:email(owner, email)

two_months_before_today(start, end) :-
    date:today(end),
    date:minus(end, "2 months", start)
{
  "name":"Average CPU percent is less than 5",
  "description":"The average CPU percent is less than 5",
  "expression":"(avg(cpu.user_perc{resource_id=vm}) < 5)",
  "match_by":[
    "resource_id"
  ],
  "severity":"HIGH",
  "ok_actions":[
    "action_id_for_ok"
  ],
  "alarm_actions":[
    "action_id_for_alarm"
  ]
}
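One possible shape for the policy-to-alarm step is a small function that fills in the alarm-definition fields shown above from the policy's metric name and threshold. The function `build_alarm_definition` and its parameters are hypothetical; only the JSON field names and sample values come from the slide.

```python
import json


def build_alarm_definition(metric, threshold, match_by, ok_action, alarm_action,
                           placeholder="vm", severity="HIGH"):
    """Build an alarm-definition body mirroring the fields on the slide.

    Assumption: the policy condition is always `avg(metric) < threshold`.
    """
    expression = f"(avg({metric}{{{match_by}={placeholder}}}) < {threshold})"
    return {
        "name": f"Average CPU percent is less than {threshold}",
        "description": f"The average CPU percent is less than {threshold}",
        "expression": expression,
        "match_by": [match_by],
        "severity": severity,
        "ok_actions": [ok_action],
        "alarm_actions": [alarm_action],
    }


definition = build_alarm_definition(
    "cpu.user_perc", 5, "resource_id", "action_id_for_ok", "action_id_for_alarm"
)
print(json.dumps(definition, indent=2))
```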
13. Proposed Solution (receiving notifications)
Monasca components: Monasca Agents, Monasca API, Kafka Cluster, Persister, Threshold Engine, Notification Engine, Metrics DB, Settings DB
Congress components: Congress API, Policy Engine, Monasca Alarm Datasource
POST /v2.0/alarm-definitions
monasca notification-create congress WEBHOOK http:…/v1/data-sources/monasca_alarm?execute&action=handle_ala
Webhook: …/v1/data-sources/monasca_alarm?execute&action=handle_alarm
handle_alarm(params)
VM UUID (Resource ID) | CPU
xxxxxxxx-0003-xxxx | 2
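A rough sketch of what the `handle_alarm(params)` action might do when the webhook fires is given below. The payload layout (`state`, `alarm_name`, and a `metrics` list carrying `dimensions`) is an assumption made for illustration, not a confirmed Monasca or Congress contract, and the in-memory `table` merely stands in for the datasource's table.

```python
def handle_alarm(params: dict, table: dict) -> dict:
    """Update the datasource table from one webhook notification.

    Assumed payload shape (hypothetical):
      {"state": "ALARM" | "OK" | ..., "alarm_name": str,
       "metrics": [{"name": str, "dimensions": {"resource_id": str}}]}
    """
    state = params.get("state")
    for metric in params.get("metrics", []):
        resource_id = metric.get("dimensions", {}).get("resource_id")
        if resource_id is None:
            continue  # Nothing to key the row on.
        if state == "ALARM":
            table[resource_id] = params.get("alarm_name")
        else:
            # Any other state clears the row for that resource.
            table.pop(resource_id, None)
    return table


table = {}
handle_alarm(
    {
        "state": "ALARM",
        "alarm_name": "Average CPU percent is less than 5",
        "metrics": [
            {"name": "cpu.user_perc",
             "dimensions": {"resource_id": "xxxxxxxx-0003-xxxx"}}
        ],
    },
    table,
)
print(table)
```

With this push model the datasource table holds only the resources that are currently in alarm, instead of one polled row per VM.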
14. Proposed Solution (receiving notifications)
Components: Congress API, Policy Engine, Monasca Alarm Datasource, Nova Datasource (Nova API), Keystone Datasource (Keystone API)
Monasca Alarm Datasource:
VM UUID (Resource ID) | CPU
xxxxxxxx-0003-xxxx | 2
Nova Datasource:
VM | Owner
xxxxxxxx-0003-xxxx | Fabio
Keystone Datasource:
Owner | Email
Fabio | FabioNotRealEmail@cisco.com
Policy result:
VM | Email
xxxxxxxx-0003-xxxx | FabioNotRealEmail@cisco.com
16. VM Evacuation for a Business-Critical App if the Host Has Potential Health Issues
App Intent Policy Example
error(vm) :-
    nova:show(vm, hostID),
    monasca_alarm:host_issues(hostID)
Flag a VM if its host has issues, for instance:
1. Unhealthy: cannot be pinged and/or reached over SSH
2. Network errors and packet loss
3. Disk usage beyond a certain threshold
17. App Intent Policy: Metrics Correlation
error(vm) :-
    nova:show(vm, hostID),
    monasca_alarm:host_issues(hostID)
Metric Name | Dimensions | Value
host_alive_status | observer_host=fqdn, hostname=supplied hostname being checked, test_type=ping or ssh | 0=online, 1=offline
disk.space_used_perc | device, mount_point | The percentage of disk space that is being used on a device
net.in_packets_dropped_sec | device | Number of inbound network packets dropped per second
net.out_packets_dropped_sec | device | Number of outbound network packets dropped per second
18. App Intent Policy: Multi-Alarms #1
{
  "name":"Host is Unhealthy",
  "description":"The host is considered unhealthy",
  "expression":"(host_alive_status{host_id=hostID} = 1)",
  "match_by":[
    "host_id"
  ],
  ...
}
{
  "name":"Host disk getting full",
  "description":"The host disk is reaching capacity",
  "expression":"(disk.space_used_perc{host_id=hostID} > 90)",
  "match_by":[
    "host_id"
  ],
  ...
}
Metric Name | Value
host_alive_status | 0=online, 1=offline
disk.space_used_perc | The percentage of disk space that is being used on a device
net.in_packets_dropped_sec | Number of inbound network packets dropped per second
net.out_packets_dropped_sec | Number of outbound network packets dropped per second
19. App Intent Policy: Multi-Alarms #2
{
  "name":"Host is dropping inbound packets",
  "description":"The host is dropping too many inbound network packets",
  "expression":"(net.in_packets_dropped_sec{host_id=hostID} > 30)",
  "match_by":[
    "host_id"
  ],
  ...
}
{
  "name":"Host is dropping outbound packets",
  "description":"The host is dropping too many outbound network packets",
  "expression":"(net.out_packets_dropped_sec{host_id=hostID} > 30)",
  "match_by":[
    "host_id"
  ],
  ...
}
Metric Name | Value
host_alive_status | 0=online, 1=offline
disk.space_used_perc | The percentage of disk space that is being used on a device
net.in_packets_dropped_sec | Number of inbound network packets dropped per second
net.out_packets_dropped_sec | Number of outbound network packets dropped per second
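How the four alarms could feed `host_issues` and then `error(vm)` is sketched below in Python. The thresholds are the ones used in the alarm expressions on slides 18 and 19; the host names, VM names, and measurements are made-up sample data, and the function names are hypothetical.

```python
import operator

# (metric, comparison, limit) taken from the alarm expressions on the slides.
RULES = [
    ("host_alive_status", operator.eq, 1),
    ("disk.space_used_perc", operator.gt, 90),
    ("net.in_packets_dropped_sec", operator.gt, 30),
    ("net.out_packets_dropped_sec", operator.gt, 30),
]


def host_issues(measurements):
    """Return the hosts for which at least one rule fires.

    measurements: {host_id: {metric_name: latest_value}}
    """
    return {
        host
        for host, metrics in measurements.items()
        if any(name in metrics and op(metrics[name], limit)
               for name, op, limit in RULES)
    }


def error_vms(vm_host, measurements):
    """Mirror error(vm) :- nova:show(vm, hostID), host_issues(hostID)."""
    bad_hosts = host_issues(measurements)
    return sorted(vm for vm, host in vm_host.items() if host in bad_hosts)


# Hypothetical sample data.
VM_HOST = {"vm-a": "host-1", "vm-b": "host-2", "vm-c": "host-3"}
MEASUREMENTS = {
    "host-1": {"host_alive_status": 0, "disk.space_used_perc": 95},
    "host-2": {"host_alive_status": 0, "disk.space_used_perc": 40,
               "net.in_packets_dropped_sec": 3},
    "host-3": {"host_alive_status": 1},
}
print(error_vms(VM_HOST, MEASUREMENTS))
```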
22. • Done:
  • Developed a Monasca Datasource to validate the integration.
  • Designed the solution and identified the main integration points.
• To be done:
  • Develop a Monasca Alarm Datasource leveraging the RPC capabilities in Congress.
  • Create a Congress notification webhook for Monasca.
  • Develop a policy-to-alarm conversion component to convert policies prefixed with monasca-alarm.
Current Status and Next Steps