Unidata S.p.A. is an Italian cloud service provider whose public and private cloud is based on OpenStack. It developed monitoring solutions for two perspectives: operators (infrastructure) and end users. For operators, it uses Zenoss for infrastructure monitoring, plus tools such as swatch and logwatch for log analysis. For users, it built a monitoring system in which Collectd gathers per-instance metrics through its LibVirt plugin and stores them in RRD files, while a Rails application uses the OpenStack API to map instances to those files and render graphs. This gives each user performance monitoring of CPU, network, and disk usage for their instances.
2. Agenda
• What is Unidata S.p.A.?
• (Cloud) monitoring
• OpenStack Monitoring
• Unidata case report
3. Unidata S.p.A.
• established in 1985
• pioneer of microcomputer technology in Italy
• today one of the leading ISPs in Italy
• PoPs at NaMeX, MiX, AMS-IX
• large fiber infrastructure (Rome and province of Rome)
• a large number of WiFi installations (based on the OpenWISP project), including for the Italian public administration (PA)
• institutional partners
• AIIP - first Italian ISP association - founder and member, 1995
• NaMeX - Internet exchange and interconnection point - founder and member, 1995
• strong vocation for innovation (making significant investments in R&D)
4. Unidata S.p.A.
• since 2012 - public and private cloud services
• UniCloud [3] - yep, it’s OpenStack! ;-)
• Folsom release
• Full access to OpenStack API (SSL)
• IPv6 enabled
7. Puppies vs Cattle
• (crude) analogy that describes the most appropriate use of the cloud paradigm
• “The servers in today’s data center are like puppies – they’ve got names and when they get sick, everything grinds to a halt while you nurse them back to health” -- Joshua McKenty, co-founder of Piston Cloud
• treat servers like cattle
• a single server should be easily replaceable
• it should be possible to (seamlessly) increase or decrease their number for a given application
8. Puppies vs Cattle
• ...not only for VMs...
• it also makes sense for bare metal
• this also changes something for monitoring, doesn’t it?
9. Cloud Monitoring
• for cloud monitoring there are two points of view
• operators
• infrastructure monitoring
• end users
• monitoring of cloud infrastructure resources (IaaS), e.g. cloud servers
• cloud services monitoring (SaaS/PaaS)
10. Cloud Monitoring
• in both cases: what should be monitored, and for what purpose?
• availability - to fix anomalies proactively
• efficiency - for (proactive) capacity planning
• what is needed?
• alerting systems
• instantaneous measures
• historical data
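Historical data is commonly kept in fixed-size round-robin archives, the model behind the RRD files used later in this deck: old points are overwritten and raw samples are consolidated (e.g. averaged) into coarser points. A minimal sketch of that idea, assuming one archive with AVERAGE consolidation; the class `RoundRobinArchive` is hypothetical and does not reproduce the RRDtool file format.

```python
from collections import deque
from typing import Deque, List


class RoundRobinArchive:
    """Fixed-size archive that keeps only the newest `rows` consolidated points."""

    def __init__(self, rows: int, steps: int) -> None:
        self.steps = steps                                # raw samples per consolidated point
        self.points: Deque[float] = deque(maxlen=rows)    # oldest points drop out automatically
        self._pending: List[float] = []

    def update(self, value: float) -> None:
        self._pending.append(value)
        if len(self._pending) == self.steps:
            # AVERAGE consolidation, as in an RRD round-robin archive
            self.points.append(sum(self._pending) / self.steps)
            self._pending.clear()


rra = RoundRobinArchive(rows=3, steps=2)
for v in [1, 3, 5, 7, 9, 11, 13, 15]:
    rra.update(v)
print(list(rra.points))  # [6.0, 10.0, 14.0]
```

Storage stays constant no matter how long the system runs, which is why this layout suits long-term historical data.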
12. OpenStack Monitoring
• as of today (Grizzly release) there is no integrated and ready-to-use monitoring system [1]
• what about Ceilometer?
• general-purpose measurement collector
13. OpenStack Monitoring
• Healthnmon (uses Ceilometer) [2]
• inventory management
• alerts and notifications
• utilization data (CPU, RAM, network, storage) for guests and hosts
14. ...meanwhile...
• those who already offer cloud services based on OpenStack have had to develop (semi-)ad hoc solutions
• OpenStack is massively scalable...
• ...so the monitoring system should be scalable too
• the good news is that we have all the ingredients
• and they are free and open source ;-)
15. What to monitor?
• load average / CPU / RAM / swap / disk & network usage
• alerts based on absolute (and relative) thresholds
• health of storage resources
• log analysis
• system integrity checks
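Alerting on absolute and relative thresholds, as listed above, can be sketched as follows. This is a minimal illustration under stated assumptions: "relative" is interpreted here as a fraction of the resource's capacity (e.g. 90% of a disk), and the `Threshold` class and `check` function are hypothetical names, not part of Zenoss or any other tool in this deck.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Threshold:
    metric: str
    absolute_max: Optional[float] = None  # limit in the metric's own unit
    relative_max: Optional[float] = None  # fraction of capacity, e.g. 0.9 == 90%


def check(th: Threshold, value: float, capacity: Optional[float] = None) -> List[str]:
    """Return alert messages for one sample (empty list if within limits)."""
    alerts = []
    if th.absolute_max is not None and value > th.absolute_max:
        alerts.append(f"{th.metric}: {value} exceeds absolute limit {th.absolute_max}")
    if th.relative_max is not None and capacity:
        ratio = value / capacity
        if ratio > th.relative_max:
            alerts.append(f"{th.metric}: {ratio:.0%} of capacity exceeds {th.relative_max:.0%}")
    return alerts


disk = Threshold("disk_used_gb", absolute_max=900, relative_max=0.9)
print(check(disk, 950, capacity=1000))  # both limits exceeded
print(check(Threshold("load1", absolute_max=8.0), 12.5))
```

A relative threshold keeps working when the same check is rolled out to nodes with different disk sizes, which fits the "cattle" approach.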
16. What to monitor?
• OpenStack specific
• availability and logs of the following services
• nova-*
• glance-*
• cinder-*
• keystone
• horizon
• misc (dnsmasq, swift, rabbitmq)
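A basic availability check for the services listed above compares the processes actually running on a node with the ones expected for its role. A minimal sketch, assuming the list of running process names is obtained elsewhere (how is left open); the `EXPECTED` patterns and the `missing_services` function are hypothetical and would differ per node role.

```python
from fnmatch import fnmatchcase
from typing import List

# Hypothetical expectations for a controller node; adapt per node role.
EXPECTED = ["nova-api", "nova-scheduler", "glance-api", "glance-registry",
            "cinder-*", "keystone*"]


def missing_services(running: List[str], expected: List[str] = EXPECTED) -> List[str]:
    """Return the expected patterns that match none of the running process names."""
    return [pattern for pattern in expected
            if not any(fnmatchcase(name, pattern) for name in running)]


running = ["nova-api", "nova-scheduler", "glance-api", "keystone-all", "cinder-volume"]
print(missing_services(running))  # ['glance-registry']
```

Glob patterns such as `cinder-*` keep the check compact when a service family is split into several processes.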
20. UniCloud Monitoring
• Zenoss Core, for infrastructure monitoring
• open source (GPLv2)
• SNMP and network protocol monitoring of applications, servers and network devices
• auto-discovery / auto-modeling
• crucial for automation (puppies vs cattle)
• just add the SNMP agent to the configuration of new nodes (e.g. with Puppet)
21. UniCloud - Zenoss core
• Web UI with events and infrastructure summary
• historical data browsing
• customizable reports
• real-time email or user-defined alerts
• simple integration with an SMS gateway
22. UniCloud Monitoring
• OpenStack/Systems logs
• swatch - email alerts for errors/anomalies
• logwatch - daily system status review
• system integrity (and security)
• smartmontools - health of hard drives with email notifications
• rkhunter - daily system status analysis and (possible) alerting
• arpwatch - real-time ARP monitoring (detection of duplicate IPs)
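The log-alerting idea behind swatch (match each new log line against regular expressions and trigger an action on a match) can be sketched in a few lines. This is not swatch's configuration syntax or code: the `RULES` patterns, labels, and sample log lines are hypothetical, and sending the e-mail is left out.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical rules in the spirit of a swatch configuration: each regex maps
# to a label that could be used, for instance, as the alert e-mail subject.
RULES = [
    (re.compile(r"\b(ERROR|CRITICAL)\b"), "openstack-error"),
    (re.compile(r"Traceback \(most recent call last\)"), "python-traceback"),
]


def scan(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (label, line) for each line matching a rule; first match wins."""
    for line in lines:
        for pattern, label in RULES:
            if pattern.search(line):
                yield label, line.rstrip("\n")
                break


sample = [
    "2013-05-20 10:00:01 ERROR nova.compute.manager [-] Instance failed to spawn\n",
    "2013-05-20 10:00:02 INFO nova.compute.manager [-] Instance spawned\n",
    "Traceback (most recent call last):\n",
]
for label, line in scan(sample):
    print(label, "|", line)
```

Because `scan` is a generator, it could be fed from a followed log file without loading the whole file into memory.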
24. UniCloud Monitoring
• ad hoc monitoring system based on
• OpenStack API
• Collectd [5]
• collects, transfers and stores performance data of computers and network equipment
• modular architecture
• we used RRD, LibVirt, and network plugins
• free and open source (GPLv2)
• we wrote a patch for the LibVirt plugin - included since version 5.2 [6]
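The network plugin ships values from the hypervisors to the collector using collectd's binary protocol, in which a packet is a sequence of "parts", each with a 16-bit type and a 16-bit total length in network byte order. The sketch below encodes string parts and a values part following that protocol as we understand it; treat the details as assumptions to verify against the collectd protocol documentation. A real packet also carries time and interval parts, which are omitted here, and the host name `instance-0000000a` is a hypothetical example.

```python
import struct
from typing import List, Tuple

# Part type codes of collectd's binary network protocol (subset).
TYPE_HOST, TYPE_PLUGIN, TYPE_TYPE, TYPE_TYPE_INSTANCE = 0x0000, 0x0002, 0x0004, 0x0005
TYPE_VALUES = 0x0006
# Data source type codes used inside the values part.
DS_COUNTER, DS_GAUGE, DS_DERIVE, DS_ABSOLUTE = 0, 1, 2, 3


def string_part(part_type: int, text: str) -> bytes:
    """Header (type, total length) + NUL-terminated string."""
    payload = text.encode("ascii") + b"\x00"
    return struct.pack("!HH", part_type, 4 + len(payload)) + payload


def values_part(values: List[Tuple[int, float]]) -> bytes:
    """values: list of (data_source_type, number) tuples."""
    body = struct.pack("!H", len(values))      # number of values
    body += bytes(ds for ds, _ in values)      # one type byte per value
    for ds, v in values:
        if ds == DS_GAUGE:
            body += struct.pack("<d", v)       # gauges: little-endian double
        elif ds == DS_DERIVE:
            body += struct.pack("!q", int(v))  # derive: signed 64-bit, big-endian
        else:
            body += struct.pack("!Q", int(v))  # counter/absolute: unsigned 64-bit
    return struct.pack("!HH", TYPE_VALUES, 4 + len(body)) + body


packet = (
    string_part(TYPE_HOST, "instance-0000000a")
    + string_part(TYPE_PLUGIN, "libvirt")
    + string_part(TYPE_TYPE, "if_octets")
    + string_part(TYPE_TYPE_INSTANCE, "vnet0")
    + values_part([(DS_DERIVE, 123456), (DS_DERIVE, 654321)])  # rx, tx bytes
)
print(len(packet))  # 82
```

Such bytes would travel over UDP to the collector (25826 is collectd's default port); in practice the network plugin does all of this, so the sketch only illustrates why the transport is cheap.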
25. UniCloud Monitoring
• Front-end
• Web UI in Ruby on Rails (RoR), written from scratch
• OpenStack ActiveResource - Ruby binding for the OpenStack API by Unidata S.p.A. [7]
26. UniCloud Monitoring
• hypervisors
• acquire “raw” data from LibVirt (localhost)
• send structured data to the collector
• collector
• receives data from the network
• (efficiently) writes RRD files
• RoR application
• establishes a mapping between OpenStack cloud instances and RRD files (via API)
• renders performance graphs in response to user requests (for given instances and timespans)
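The mapping step performed by the RoR application can be illustrated by building the RRD file path for a given instance and metric. A minimal sketch (in Python rather than Ruby), relying on several assumptions that are not stated in the slides: nova's default `instance_name_template` of `instance-%08x` applied to the instance's integer id, the LibVirt plugin reporting the libvirt domain name as host and `libvirt` as plugin name, and the rrdtool plugin writing under `/var/lib/collectd/rrd` with collectd's usual `<host>/<plugin>/<type>-<type_instance>.rrd` layout. How the integer id is obtained from the API is left out, and `rrd_path` is a hypothetical helper.

```python
import posixpath

DATA_DIR = "/var/lib/collectd/rrd"        # assumed rrdtool plugin DataDir
INSTANCE_NAME_TEMPLATE = "instance-%08x"  # assumed nova instance_name_template


def rrd_path(instance_id: int, rrd_type: str, type_instance: str = "",
             data_dir: str = DATA_DIR) -> str:
    """Path of the RRD file holding one LibVirt metric of one nova instance."""
    host = INSTANCE_NAME_TEMPLATE % instance_id  # libvirt domain name
    name = f"{rrd_type}-{type_instance}" if type_instance else rrd_type
    return posixpath.join(data_dir, host, "libvirt", name + ".rrd")


print(rrd_path(10, "if_octets", "vnet0"))
# /var/lib/collectd/rrd/instance-0000000a/libvirt/if_octets-vnet0.rrd
```

Keeping the mapping in the front-end means the collectors need no knowledge of OpenStack at all; they only write files named after libvirt domains.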
27. UniCloud Monitoring
• What gets monitored?
• all the measurements that the collectd LibVirt plugin makes available
• for each vCPU - utilization rate (%)
• for each network interface - pps, bps and eps (in+out)
• for each disk - bps and ops (read+write)
• including “extra volumes” from nova-volume (or Cinder)
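The rates listed above (pps, bps, ops, vCPU %) are derived from cumulative counters: a rate is the counter difference divided by the time elapsed between two samples. A minimal sketch, assuming vCPU time is reported as cumulative nanoseconds (as libvirt does) and network/disk counters as cumulative bytes or operations; `rate` and `vcpu_utilization` are hypothetical helpers, not collectd functions.

```python
def rate(prev_value: float, prev_t: float, value: float, t: float) -> float:
    """Per-second rate between two cumulative counter samples (pps, bytes/s, ops/s)."""
    if t <= prev_t:
        raise ValueError("samples must be in increasing time order")
    return (value - prev_value) / (t - prev_t)


def vcpu_utilization(prev_ns: int, prev_t: float, ns: int, t: float) -> float:
    """vCPU utilization (%) from cumulative CPU time expressed in nanoseconds."""
    return rate(prev_ns, prev_t, ns, t) / 1e9 * 100.0


# 5 s of CPU time consumed in a 10 s interval -> 50 % utilization
print(vcpu_utilization(0, 0.0, 5_000_000_000, 10.0))  # 50.0
# 250,000 bytes received in 10 s -> 25,000 bytes/s, i.e. 200,000 bits/s
print(rate(1_000_000, 0.0, 1_250_000, 10.0) * 8)      # 200000.0
```

Note that "bps" can mean bytes or bits per second; the byte counters must be multiplied by 8 when bits are wanted.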
28. UniCloud Monitoring
• Does it scale?
• collectd is not a new product...
• it has proven itself to be very reliable and scalable
• it’s possible to use multiple collectors
• for HA (using multicast) or load balancing (LB)
• puppies vs cattle?
• automatic discovery of new cloud instances
• collectd installation and configuration should be made by means of a configuration management system (e.g. Puppet)
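For load balancing across multiple collectors, each hypervisor can be pointed at one collector deterministically, so that the configuration management system computes the same answer on every run without extra state. A minimal sketch of one possible approach (hash of the hostname modulo the number of collectors), which is our assumption rather than the method described in the deck; `pick_collector` and the collector names are hypothetical.

```python
import hashlib
from typing import List


def pick_collector(hypervisor: str, collectors: List[str]) -> str:
    """Deterministically map a hypervisor hostname to one collector."""
    # A stable hash (unlike Python's salted hash()) gives the same result on every run.
    digest = hashlib.sha256(hypervisor.encode("utf-8")).digest()
    return collectors[int.from_bytes(digest[:4], "big") % len(collectors)]


collectors = ["collector-a", "collector-b"]
for host in ["hv01", "hv02", "hv03"]:
    print(host, "->", pick_collector(host, collectors))
```

Plain modulo hashing reassigns many hypervisors when a collector is added or removed; consistent hashing would limit that churn, at the cost of a little more code.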