Configuration Crawler for Cloud Appliances Meta-Data
1. A Configuration Crawler for
Cloud Appliances
Michael Menzel, Markus Klems, Hoang Anh-Le, Stefan Tai
eOrganization Research Group
Karlsruhe Institute of Technology (KIT)
March 27, 2013, International Conference on Cloud Engineering (IC2E)
2. Agenda
1. Foundations, Motivation & Existing Work
2. Method: A Configuration Crawler
3. Validation: Implementation for AWS EC2
4. Conclusion & Outlook
4. Cloud Appliances in Compute IaaS*
• Differently configured Virtual Machine Images
[Figure: two VM Images side by side. Left, Operating System only: Operating System. Right, Full/Partial Software Stack: Operating System, Libraries, Software Platforms, Executables & Data]
* Infrastructure as a Service (IaaS)
5. Appliances in Today's Public Clouds
• Not all Providers offer Appliances
[Figure: Rackspace, AWS EC2, and GoGrid grouped by offering (Simple VM Images, Cloud Appliances, or Both) and by packaging (Centralized Packaging, Decentralized Packaging)]
• Engaged Users create many Appliances
Top 3 public AMI owners in US-East-1, April 13 2012
6. Meta-Data on Cloud Appliances
• There is Meta-Data, but not on Configuration
• Crawling needed to gain more information
7. Applications
• Interoperability: Convert Appliances to
Configuration Management Manifests
• Decision Support: Consider Configuration
Data in Virtual Machine Selection
• Statistics: Aggregate Configuration Data
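The interoperability application above could look roughly like the following sketch. It assumes, purely for illustration, that the crawled configuration of an appliance has already been reduced to a mapping from package name to version, and it renders that mapping as Puppet `package` resources. The function name `to_puppet_manifest` and the sample data are hypothetical and not part of the presented crawler.

```python
def to_puppet_manifest(packages):
    """Render a {package name: version} mapping as Puppet package resources."""
    lines = []
    for name in sorted(packages):
        version = packages[name]
        lines.append(f"package {{ '{name}':")
        lines.append(f"  ensure => '{version}',")
        lines.append("}")
    return "\n".join(lines)


# Hypothetical crawl result for one appliance (assumed shape, made-up values).
crawled_packages = {"ruby": "1.9.3", "nginx": "1.2.6"}
manifest = to_puppet_manifest(crawled_packages)
print(manifest)
```

Sorting the package names keeps the generated manifest stable between runs, which makes manifests from different appliances easier to compare.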
8. Existing Work
• Meta-Data bundled with VM Image Files [1]
• Configuration Mgmt. to upgrade Appliances [2]
• Chef Ohai and Puppet Facter collect the
libraries installed on a system
– For most Operating Systems
– For most Package Managers
[1] D. Lutterkort and M. McLoughlin, “Manageable virtual appliances,” Linux Symposium, 2007.
[2] R. Filepp, L. Shwartz, C. Ward, R. Kearney, K. Cheng, C. Young, and Y. Ghosheh, “Image selection as a
service for cloud computing environments,” in Service-Oriented Computing and Applications
(SOCA), 2010 IEEE International Conference on, Dec. 2010, pp. 1–8.
9. A METHOD FOR CRAWLING
VIRTUAL APPLIANCE CONFIGURATIONS
10. Method for Configuration Crawling
• Procedure Model for Crawling Virtual Appliance Configurations
[Figure: procedure model diagram. Legend: Parameter Input, Operation, Data Artifact]
12. Crawling Configuration Data
• Split Function allows parallel
processing
• Instantiate & Crawl multiple
Virtual Appliances in parallel
• Leverage configuration mgmt.
Agents* to detect configuration
• Collect configuration meta-data
from started Appliance Instance
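The split-and-crawl step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `split`, `crawl_appliance`, `crawl_chunk`, and `crawl_all` are hypothetical names, the appliance IDs are made up, and `crawl_appliance` is a placeholder where a real crawler would start an instance, run a configuration management agent on it, and collect the reported meta-data.

```python
from concurrent.futures import ThreadPoolExecutor


def split(appliance_ids, parts):
    """Split the ID list into at most `parts` non-empty round-robin chunks."""
    chunks = [appliance_ids[i::parts] for i in range(parts)]
    return [chunk for chunk in chunks if chunk]


def crawl_appliance(appliance_id):
    """Placeholder for: instantiate the appliance, run the agent, collect data."""
    return {"appliance_id": appliance_id, "packages": {}}


def crawl_chunk(chunk):
    """Crawl the appliances of one chunk one after another."""
    return [crawl_appliance(appliance_id) for appliance_id in chunk]


def crawl_all(appliance_ids, workers=4):
    """Crawl all appliances, processing the chunks in parallel threads."""
    chunks = split(appliance_ids, workers)
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        chunk_results = list(pool.map(crawl_chunk, chunks))
    return [record for result in chunk_results for record in result]


# Hypothetical appliance IDs, for illustration only.
ids = ["ami-00000001", "ami-00000002", "ami-00000003", "ami-00000004", "ami-00000005"]
records = crawl_all(ids, workers=2)
print(len(records))
```

Threads fit this sketch because crawling is dominated by waiting on remote instances rather than by local computation.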
13. Data Persistence
• Centralized storage of crawled configuration meta-data
• A persistent, centralized data store enables reuse of the
data in several applications
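To illustrate why a single store helps reuse, here is a small sketch in which one store serves both a per-appliance lookup and an aggregate statistic. It uses Python's built-in `sqlite3` module as a stand-in for whatever database is actually deployed; the table layout, the function names, and the `packages` key in the stored JSON are assumptions made for this example.

```python
import json
import sqlite3
from collections import Counter


def open_store(path=":memory:"):
    """Open (or create) the central store for crawled configuration data."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS appliances ("
        "appliance_id TEXT PRIMARY KEY, config TEXT NOT NULL)"
    )
    return conn


def save(conn, appliance_id, config):
    """Store one crawl result as a JSON document, replacing older results."""
    conn.execute(
        "INSERT OR REPLACE INTO appliances VALUES (?, ?)",
        (appliance_id, json.dumps(config)),
    )
    conn.commit()


def load(conn, appliance_id):
    """Return the stored configuration of one appliance, or None if unknown."""
    row = conn.execute(
        "SELECT config FROM appliances WHERE appliance_id = ?", (appliance_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def package_counts(conn):
    """Count in how many stored appliances each package name occurs."""
    counts = Counter()
    for (config,) in conn.execute("SELECT config FROM appliances"):
        counts.update(json.loads(config).get("packages", {}).keys())
    return counts


# Hypothetical crawl results, for illustration only.
store = open_store()
save(store, "ami-00000001", {"packages": {"nginx": "1.2.6"}})
save(store, "ami-00000002", {"packages": {"nginx": "1.2.6", "ruby": "1.9.3"}})
print(package_counts(store).most_common())
```

The same rows answer a decision-support question (`load`) and a statistics question (`package_counts`) without crawling the appliances again.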
18. Implementation for AWS EC2 [3]
• Ruby Discoverer with filter & blacklist
• Ruby Crawler on EC2 instances injects Chef Ohai [4] into
instantiated Appliances
– Ohai requires Ruby
– Intermediate results are collected in AWS S3
• Crawling one Appliance takes 21 min. on average and costs 1 EC2 instance-hour
• MongoDB stores the JSON data; a copy on Google
AppEngine serves the Web App
[3] Available at http://github.com/myownthemepark/ami-crawler
[4] http://wiki.opscode.com/display/chef/Ohai
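The Discoverer's filter and blacklist might work along the lines of the sketch below. The actual implementation is written in Ruby and is not shown on the slides, so everything here is an assumption: the `discover` function, the image fields `id`, `platform`, and `architecture`, and the sample image list are hypothetical.

```python
def discover(images, required=None, blacklist=()):
    """Return the IDs of images that match all required attributes
    and are not on the blacklist."""
    required = required or {}
    blocked = set(blacklist)
    selected = []
    for image in images:
        if image["id"] in blocked:
            continue  # never crawl blacklisted appliances
        if all(image.get(key) == value for key, value in required.items()):
            selected.append(image["id"])
    return selected


# Hypothetical image meta-data, for illustration only.
images = [
    {"id": "ami-00000001", "platform": "linux", "architecture": "x86_64"},
    {"id": "ami-00000002", "platform": "windows", "architecture": "x86_64"},
    {"id": "ami-00000003", "platform": "linux", "architecture": "i386"},
    {"id": "ami-00000004", "platform": "linux", "architecture": "x86_64"},
]
to_crawl = discover(images, required={"platform": "linux"}, blacklist=["ami-00000004"])
print(to_crawl)
```

A filter narrows the crawl to appliances the agent can run on, while a blacklist excludes individual appliances, for example ones that failed in an earlier crawl.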
19. Find it online!
You can find the Crawler Database as a Web App on
myownthemepark.com
... we are enhancing it continuously.
21. Conclusion
• Crawling Configuration Data of Cloud
Appliances is feasible
– Proposed a procedure and data model
– Validated the approach with a Proof-of-Concept
• Several Applications for collected
Configuration Meta-Data of Appliances
– Configuration Manifests for Interoperability
– Statistics and Decision Support
22. Outlook
• Extend implementation with support for more
Cloud compute services
• Use Crawler Data in Decision Support
Frameworks for Web Applications (e.g.,
CloudGenius [5])
[5] M. Menzel and R. Ranjan, “CloudGenius: Decision Support for Web
Server Cloud Migration,” in Proceedings of the 21st International
Conference on World Wide Web. New York, NY, USA: ACM, 2012.
23. Discussion on the findings
THANK YOU!
TIME FOR QUESTIONS AND COMMENTS
24. Contact Me
For Questions, Discussions,
or Initiating Research Exchange:
Michael Menzel
Karlsruhe Institute of Technology (KIT)
Englerstr. 11
76131 Karlsruhe
Email: menzel@kit.edu
26. Related Work
• Security Analysis:
– T. Garfinkel and M. Rosenblum, “A virtual machine introspection based architecture for
intrusion detection,” in NDSS, 2003.
• Configuration Management:
– R. Filepp, L. Shwartz, C. Ward, R. Kearney, K. Cheng, C. Young, and Y. Ghosheh, “Image
selection as a service for cloud computing environments,” in Service-Oriented Computing
and Applications (SOCA), 2010 IEEE International Conference on, Dec. 2010, pp. 1–8.
– K. Magoutis, M. Devarakonda, N. Joukov, and N. G. Vogl, “Galapagos: Model-driven discovery
of end-to-end application-storage relationships in distributed systems,” IBM Journal of
Research and Development, vol. 52, no. 4.5, pp. 367–377, Jul. 2008.
– IBM, “Tivoli application dependency discovery manager,”
http://www-01.ibm.com/software/tivoli/products/taddm/, accessed 25th April 2012.
– A. V. Dastjerdi, S. G. H. Tabatabaei, and R. Buyya, “An Effective Architecture for Automated
Appliance Management System Applying Ontology-Based Cloud Discovery,” in Proceedings
of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing,
IEEE Computer Society, 2010, pp. 104–112.
• Meta-Data in VM Image Files
– D. Lutterkort and M. McLoughlin, “Manageable virtual appliances,” Linux Symposium, 2007.
27. Appliances in Today's Public Clouds
• Cloud Appliances: Centralized Packaging, Decentralized Packaging
• Simple VM Images: Centralized Packaging
28. Appliances in AWS's Public Cloud
• Amazon accounts for more than 50,000 AMIs, growing
daily
• AMIs differ in multiple attributes, including their
software configuration