1. Grid-based VM Provisioning Using STAR / Nimbus. Date: 2 Apr 2009. Presented by: Arunabh Das, School of Computer Science. Sources: http://workspace.globus.org and http://www.gridvm.org Other sources: Sotomayor, B., K. Keahey, I. Foster, T. Freeman. Enabling Cost-Effective Resource Leases with Virtual Machines. HPDC 2007 Hot Topics session, Monterey Bay, CA, June 2007. Keahey, K., T. Freeman, J. Lauret, D. Olson. Virtual Workspaces for Scientific Applications. SciDAC 2007 Conference, Boston, MA, June 2007.
2. Topics covered in this presentation: What are the goals? What problems are we trying to solve? Benefits of grid computing. Issues with grid computing. Why grid computing != cloud computing. The case for grid VMs. Brief history of grid VMs. Introduction to STAR and description of provisioning with STAR (schematics + deployment). Introduction to Nimbus and description of the provisioning mechanism with Nimbus.
3. What do we seek? What are we looking to find? Ginormous amounts of compute power, available 24x7x365. Humongous amounts of storage, also available 24x7x365. The ability to access the above from the cupboards that professors, post-doc fellows and the millions of starving graduate students in North America, Europe, Asia and Africa live in.
4. Why? Who needs all that compute power? Many people, but just as an example: The data stream from the LHC detectors is approximately 300 GB/s. The CERN computer center has a dedicated 10 Gbit/s connection to the counting room. 27 TB of raw data + 10 TB of event summary data per day. The LHC Computing Grid has hundreds of Tier 1 and Tier 2 institutions connected via dedicated 10 Gbit/s links. Source: http://en.wikipedia.org/wiki/LHC_Computing_Grid
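To put these figures in perspective, here is a back-of-the-envelope sketch. It assumes the 27 TB + 10 TB volumes are per day, that the dedicated links run at 10 Gbit/s (gigabits, not gigabytes), and that units are decimal (1 TB = 10^12 bytes). The function name is made up for illustration.

```python
# Back-of-the-envelope transfer time; every input is an assumption
# stated in the text above, not a measured value.

def transfer_hours(terabytes: float, link_gbit_per_s: float) -> float:
    """Hours needed to move `terabytes` over a link of `link_gbit_per_s`."""
    bits = terabytes * 1e12 * 8                 # decimal terabytes -> bits
    seconds = bits / (link_gbit_per_s * 1e9)    # Gbit/s -> bit/s
    return seconds / 3600


daily_tb = 27 + 10                       # raw data + event summary data
hours = transfer_hours(daily_tb, 10)     # one dedicated 10 Gbit/s link
print(f"{daily_tb} TB over 10 Gbit/s takes about {hours:.1f} hours")
```

Under these assumptions a single link needs roughly a third of a day to ship one day of data, before any protocol overhead.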
5. So we still haven't found what we're looking for? Kate Keahey is a scientist at Argonne National Laboratory and a Computation Institute fellow at the University of Chicago. She created and leads the Nimbus Project. She calls it Infrastructure as a Service (IaaS), which makes perfect sense!!
6. A Brief History of Nimbus [Timeline figure, 2003 to 2009. Milestones shown include: research on agreement-based services; Xen released; first WSRF Workspace Service release; EC2 goes online; support for EC2 interfaces; EC2 gateway available; Context Broker release; Nimbus Cloud comes online; first STAR production run on EC2.] Source: http://colab.cim3.net/file/work/Expedition_Workshop/2009_02_09_LeveragingSOA_Cyberinfrastructure/magic-Keahey.cloudcomputing.ppt
7. Grid Technologies: A brief overview. Infrastructure ("middleware") for establishing, managing and evolving multi-organization federations. Secure, coordinated sharing. Dynamic, autonomous, domain independent. On-demand, ubiquitous access to computing, data and services. Globus Toolkit: an implementation of the most basic capabilities and a de facto implementation standard.
8. A typical grid use-case: (1) User logs into the Grid (single sign-on) with grid-proxy-init: Grid Security Infrastructure (GSI), gridmapfile. (2) Finds available resources: Monitoring and Discovery Service (MDS). (3) Starts a remote computation: Grid Resource Allocation and Management (GRAM). (4) Transfers data from a remote location: Data Transfer (GridFTP).
9. The case for VM on grid: Most grid applications and grid infrastructure just need to handle heavy computation and heavy lifting of data. However, certain applications (e.g., the nuclear physics STAR experiment) rely heavily on dynamically loading external libraries depending on the task to be performed. Configuring an environment for such an application is complex, and deployment on a non-dedicated platform is effort-consuming. Even when the application compiles on a new platform, validating it is a controlled process subject to quality assurance and regression testing to ensure physics reproducibility and result uniformity. Because such an application (e.g., a physics engine) relies heavily on dependencies deeply embedded in the environment, porting it would be easiest if we could take the full software stack from the operating system up and simply install that environment on remote resources.
10. The case for VM on grid (contd): A virtual machine provides a software-based virtualization of a physical host machine. It is dedicated and configured with a full software stack. Once configured, it deploys on a remote resource in a matter of milliseconds. Resource provisioning via VMs is attractive.
11. More benefits of VM on grid: A scientist can develop his or her application within a familiar environment and port this environment between local and remote resources as the need arises. This facilitates provisioning resources for an application. The virtual machine can be run as easily on local resources as on remote resources or resources outsourced commercially.
12. A quick look at Virtual Workspaces (STAR): Virtual Workspaces is the predecessor of Nimbus and was developed by Kate Keahey and Tim Freeman at ANL. The Solenoidal Tracker at RHIC (STAR) is a detector that specializes in tracking the thousands of particles produced by each ion collision at the Relativistic Heavy Ion Collider (RHIC). STAR is a massive detector. It is used to search for signatures of the form of matter that RHIC was designed to create: the quark-gluon plasma. It is also used to investigate the behavior of matter at high energy densities by making measurements over a large area. Provisioning STAR nodes with Virtual Workspaces is a proof-of-concept strategy developed for the High Energy and Nuclear Physics (HENP) group.
13. A Brief Look at VM: A VM is a virtualization abstraction of a physical machine (hardware resources + software infrastructure). Software running on a host that supports VM deployment, typically called a VMM (Virtual Machine Monitor) or hypervisor, is responsible for supporting this abstraction by intercepting and emulating instructions issued by the guest machine. The hypervisor provides an interface allowing a client to start, pause, serialize, and shut down multiple guests. A VM image is composed of a full image of the VM's RAM, disk images and configuration files. Thus, a VM can be paused, its state serialized, and later resumed at a different time and in a different location => decouples image preparation from deployment => easy migration. Sources: Sotomayor et al., HPDC 2007; Keahey et al., SciDAC 2007.
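The start / pause / serialize / resume cycle described on this slide can be sketched as a small state machine. This is a toy model under stated assumptions, not the interface of Xen or any real hypervisor: the class, state names and transition table are all hypothetical.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    PAUSED = "paused"
    SERIALIZED = "serialized"


# Allowed transitions: action -> (required current state, resulting state).
TRANSITIONS = {
    "start": (State.STOPPED, State.RUNNING),
    "pause": (State.RUNNING, State.PAUSED),
    "serialize": (State.PAUSED, State.SERIALIZED),
    "resume": (State.SERIALIZED, State.RUNNING),
    "shutdown": (State.RUNNING, State.STOPPED),
}


class ToyGuest:
    """Hypothetical guest VM tracked by a toy hypervisor interface."""

    def __init__(self, name: str, host: str):
        self.name = name
        self.host = host
        self.state = State.STOPPED

    def apply(self, action: str, host: Optional[str] = None) -> None:
        required, result = TRANSITIONS[action]
        if self.state is not required:
            raise ValueError(f"cannot {action} while {self.state.value}")
        if action == "resume" and host is not None:
            self.host = host  # a serialized image may resume elsewhere
        self.state = result


guest = ToyGuest("star-worker", host="site-a")
for action in ("start", "pause", "serialize"):
    guest.apply(action)
guest.apply("resume", host="site-b")  # later, in a different location
print(guest.state.value, guest.host)
```

Because the serialized image carries RAM, disk and configuration together, the resume step does not care which site prepared the image. That is the decoupling the slide points to.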
14. Paravirtualization: A virtualization technique that presents a software interface to virtual machines that is similar but not identical to that of the underlying hardware. Example: the virtual machine monitor can present the guest operating system with an intelligent NIC that supports DMA-based sending of packets, even though the NIC on the real system lacks this capability. Sending packets is then done entirely by the monitor, and NIC interrupts may be processed by the monitor too. Since delivering interrupts to the guest operating system is expensive, performance can improve. Who'd a thunk it? Paravirtualization actually helps performance!! Source: http://en.wikipedia.org/wiki/Paravirtualization
15. Virtual Workspace Features: The workspace service provides interfaces based on WSRF. It allows an authorized Grid client to deploy, shut down, pause and reactivate VMs.
16. Worker node deployment workflow: Worker node deployment is requested on demand by an authorized off-site grid client. The resource allocation request asks for 2 GB of memory and the full use of a CPU for each virtual node. On deployment, each node reports to the Condor head node and joins the Condor pool. A web application displays current virtual cluster node information based on Condor pool properties. A client can then start jobs on the deployed VMs using GRAM2 deployed on the static CE (compute element).
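The workflow above can be sketched as follows. This is an illustration only: it does not call the Workspace Service or Condor, and the class names, node names and summary fields are assumptions. Only the 2 GB / one CPU per node figures come from the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AllocationRequest:
    """Per-node resources, using the figures from the slide."""
    memory_mb: int = 2048   # 2 GB of memory per virtual node
    cpu_share: float = 1.0  # full use of one CPU


@dataclass
class ToyPool:
    """Stand-in for the Condor pool; only records which nodes reported in."""
    nodes: dict = field(default_factory=dict)

    def join(self, node_name: str, request: AllocationRequest) -> None:
        self.nodes[node_name] = request

    def summary(self) -> dict:
        """Aggregate view, similar in spirit to the web application's display."""
        return {
            "nodes": len(self.nodes),
            "memory_mb": sum(r.memory_mb for r in self.nodes.values()),
            "cpus": sum(r.cpu_share for r in self.nodes.values()),
        }


def deploy(pool: ToyPool, count: int, request: AllocationRequest) -> None:
    """Deploy `count` virtual nodes; each reports to the pool once it is up."""
    for i in range(count):
        pool.join(f"star-node-{i}", request)


pool = ToyPool()
deploy(pool, 3, AllocationRequest())
print(pool.summary())
```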
17. Schematic for provisioning of STAR nodes [Diagram labels: TeraPort nodes; Workspace Service; provisioning STAR nodes; STAR nodes; execute new STAR instance; OSG CE; GRAM]
18. Current cloud providers: GoGrid, Amazon Web Services, Google App Engine, Mosso, Slicehost, Media Temple, Flexiscale, Joyent. Although they provide web services and compute power on demand, they are not virtual machines on grid.
19. Cloud Computing: Everything as a Service. Elastic computing, pay-as-you-go, capital expense becomes operational expense. Source: Cloud Computing with Nimbus, FNAL, January 2009, Kate Keahey, University of Chicago, ANL
20. Cloud Computing: Everything-as-a-Service. Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS). The analogy to the real world: it used to be that if you wanted to go to the airport, you could call a cab and pay the cab driver. Then they said, you know what, if you pay us, we can let you rent the car and you can have the car, but you can't be setting the car on fire. Now you can lease a car, have it for as long as you want and do whatever you want to it. Of course, you are going to be able to do a lot more with the car than just drive to the airport. Source: Cloud Computing with Nimbus, FNAL, January 2009, Kate Keahey, University of Chicago, ANL
21. Main problems we're trying to solve: code complexity; resource control. Source: The Nimbus Toolkit, http://workspace.globus.org
22. The concept of 'workspaces': Dynamically provisioned environment. Environment control. Resource control. Hardware implementations vs. virtualization. Source: Cloud Computing with Nimbus, FNAL, January 2009, Kate Keahey, University of Chicago, ANL
23. Nimbus Overview. Goal: open source, extensible IaaS implementation and tools. Specifically targeting the scientific community. A platform for experimentation with features for scientific needs. Set up private clouds (privacy, expense considerations). Tools: IaaS layer (Workspace Service); orchestration layer (Context Broker, gateway). http://workspace.globus.org/ Source: Cloud Computing with Nimbus, FNAL, January 2009, Kate Keahey, University of Chicago, ANL
24. Workspace Pilot and the concept of resource leases: Resource leases allow users to request direct access to resources rather than ask for a job to be run on those resources. Examples: a static long-term agreement with a hosting company; on-demand provisioning of a physical cluster partition with a specified configuration (Cluster-on-Demand); dynamically deploying a virtual machine for an hour on resources provided by Amazon's EC2 service.
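The difference between a lease and a job can be made concrete with a minimal sketch: a lease grants a time window on a resource and says nothing about what runs inside it. The `Lease` class and its fields are hypothetical and are not taken from Workspace Pilot.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Lease:
    """Hypothetical lease: direct access to a resource for a time window."""
    resource: str
    start: datetime
    duration: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.duration

    def active_at(self, moment: datetime) -> bool:
        """True while the holder may use the resource (end is exclusive)."""
        return self.start <= moment < self.end


# The third example from the slide: a virtual machine for one hour.
start = datetime(2009, 4, 2, 12, 0)
lease = Lease("virtual-machine", start, timedelta(hours=1))
print(lease.active_at(start + timedelta(minutes=30)))  # inside the window
print(lease.active_at(start + timedelta(hours=1)))     # window has closed
```

A job submission would instead name a program and its inputs. Here the holder decides what to do with the resource while the lease is active.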
25. Advantages of 'Flying Low' (Workspace Pilot): A user can adapt the resource to his or her needs: use it to support an interactive session, run computations requiring an application-specific scheduler, or support portability tests across a variety of environments. Exemplified by 'pilot job' approaches that use batch scheduler installations on sites to deliver a lease rather than submit a job to that scheduler. Source: Flying Low: Simple Leases with Workspace Pilot, Tim Freeman and Kate Keahey, University of Chicago, ANL
26. Implementation of VWS (Virtual Workspace Service) [Diagram: the VWS Service in front of a pool of nodes] Source: Cloud Computing with Nimbus, FNAL, January 2009, Kate Keahey, University of Chicago, ANL
27. The Workspace Service: The workspace service publishes information on each workspace as standard WSRF Resource Properties. Users can interact directly with their workspaces the same way they would with a physical machine. [Diagram: the VWS Service in front of a pool of nodes; Trusted Computing Base (TCB)] Source: Cloud Computing with Nimbus, FNAL, January 2009, Kate Keahey, University of Chicago, ANL
28. Workspace Service Interfaces and Clients: Web Services based. Web Services Resource Framework (WSRF): GT-based. Elastic Compute Cloud (EC2): Supported: ec2-describe-images, ec2-run-instances, ec2-describe-instances, ec2-terminate-instances, ec2-reboot-instances, ec2-add-keypair, ec2-delete-keypair. Unsupported: availability zones, security groups, elastic IP assignment, REST. Used alongside WSRF interfaces. E.g., the University of Chicago cloud allows you to connect via the cloud client or via the EC2 client.
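A client script could check a command against the supported list before sending it to a Nimbus cloud. The sketch below only encodes the list from this slide; the helper name is hypothetical, and the check never contacts any service.

```python
# EC2 commands listed as supported on the slide.
SUPPORTED = frozenset({
    "ec2-describe-images",
    "ec2-run-instances",
    "ec2-describe-instances",
    "ec2-terminate-instances",
    "ec2-reboot-instances",
    "ec2-add-keypair",
    "ec2-delete-keypair",
})


def is_supported(command: str) -> bool:
    """Return True if `command` is in the slide's supported list."""
    return command.strip().lower() in SUPPORTED


for cmd in ("ec2-run-instances", "ec2-describe-availability-zones"):
    status = "supported" if is_supported(cmd) else "not supported"
    print(f"{cmd}: {status}")
```

The second command is used here only as an example from outside the list; it corresponds to the availability-zone feature the slide marks as unsupported.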
29. Nimbus Schematic [Diagram labels: storage service; workspace resource manager; workspace control; workspace service; workspace pilot; WSRF and EC2 interfaces; IaaS gateway to EC2 and potentially other providers; context broker; context client; workspace client; cloud client] Source: Cloud Computing with Nimbus, FNAL, January 2009, Kate Keahey, University of Chicago, ANL
33. Start with EC2-like functionality and evolve to serve scientific projects: virtual clusters, diverse resource leases
34. Federating clouds: moving between cloud resources in academic and commercial space. Source: Cloud Computing with Nimbus, FNAL, January 2009, Kate Keahey, University of Chicago, ANL