2. Anatomy of a Cloud
[Figure: the cloud stack, bottom to top]
• Infrastructure as a Service
  • Data Centers, Clusters, Storage, Other Grids/Clouds
  • Virtualization, VM Management & Deployment
  • Amazon S3, EC2; OpenNebula, Eucalyptus
• Platform as a Service
  • Web 2.0 Interface, Programming API, Scripting & Programming Languages
  • Google AppEngine, Microsoft Azure, Manjrasoft Aneka
• Software as a Service
  • Google Apps (Gmail, Docs,…), Salesforce.com
• Public Cloud, Private Cloud
3. The Next Revolution in IT
• Cloud Computing
  • Subscribe
  • Use
  • $ - pay for what you use, based on QoS
• Classical Computing
5. Platform as a Service (PaaS)
• Platform as a Service (PaaS) cloud systems provide a
software execution environment that application services
can run on
• The environment is not just a pre-installed operating
system but is also integrated with a programming-
language-level platform
• PaaS users do not have to handle resource management or
allocation themselves; the platform takes care of concerns
such as automatic scaling and load balancing.
6. Common PaaS Scenario
[Figure: a Scheduler and four Executors, reached over the internet]
Programming / Deployment Model
public class DumbTask : ITask
{
    // …
    public void Execute()
    {
        // ……
    }
}

for (int i = 0; i < n; i++)
{
    // …
    DumbTask task = new DumbTask();
    app.SubmitExecution(task);
}
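As a rough Python analogue of the snippet above (not the API used on the slide), the same submit-and-execute pattern can be sketched with the standard library's `concurrent.futures`. The thread pool plays the role of the scheduler and its worker threads stand in for executors. The squaring workload and `max_workers=4` are illustrative assumptions, since the slide elides the task body.

```python
from concurrent.futures import ThreadPoolExecutor


class DumbTask:
    """Unit of work, mirroring the slide's DumbTask (the workload is an assumption)."""

    def __init__(self, value):
        self.value = value

    def execute(self):
        # Placeholder computation standing in for the elided task body.
        return self.value * self.value


def run_tasks(n, max_workers=4):
    # The pool acts as the scheduler; its worker threads act as executors.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(DumbTask(i).execute) for i in range(n)]
        # Collect the results in submission order.
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(run_tasks(5))  # [0, 1, 4, 9, 16]
```

The caller only creates tasks and submits them; which worker runs which task is decided by the pool, which is the division of labour the slide attributes to a PaaS.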
7. PaaS Providers
PaaS provider | Programming Environments | Infrastructure
Google AppEngine | Python, Java and Go | Google Data Center
Azure | .NET (Microsoft Visual Studio) | Microsoft Data Centers
Force.com | Apex Programming and Java | Salesforce Data Center
Heroku | Ruby, Java, Python and Scala | Amazon EC2 and S3
Hadoop | MapReduce Model (Java, Python) | Private Cloud, Elastic MapReduce
AppScale | Java, Python | Private Cloud
8. Google App Engine
• Google App Engine lets you run your web applications on
Google's infrastructure
• With App Engine, there are no servers to maintain: You
just upload your application, and it's ready to serve your
users.
9. Google AppEngine
• Full support for common web technologies
• Program in Java, Go, or Python
• Automatic scaling, load balancing
• Scheduled tasks & queues
• Persistent storage
• Sandboxing
11. Storing Data
• App Engine Datastore
  • NoSQL datastore
• Google Cloud SQL
  • RDBMS-based databases (MySQL)
• Google Cloud Storage
  • Provides a storage service for objects and files up to terabytes in size
12. App Engine Services
• Mail
• Memcache
• Image Manipulation
• Full Text Search API
• Google Cloud Storage API
• Datastore API
• Blobstore API
14. PaaS Advantages
• Seemingly infinite compute resources available on demand
• Pay-per-use pricing
• Reduced costs due to dynamic resource provisioning
• Scalability - No need to plan for peak load
• Easy management
• Software versioning and upgrading
• Elastic: only use what you need
16. Risks
• Privacy
  • Who accesses your data?
• Security
  • How much do you trust your provider?
  • What about recovery, tracing, and data integrity?
• Political and legal issues
  • Who owns the data?
  • Who uses your personal data?
    • Government
  • Where is your data?
    • Amazon Availability Zones
• Vendor lock-in
17. Hadoop Platform
• Google papers
  • The Google File System (2003)
  • MapReduce: Simplified Data Processing on Large Clusters (2004)
• A framework for storing and processing petabytes of data
using commodity hardware and storage
• Hadoop partitions data and computation across many
(thousands of) hosts and executes application
computations in parallel, close to their data.
18. Hadoop clusters
• Yahoo has ~20,000 machines running Hadoop
• The largest clusters are currently 3,000 nodes
• Data load: 30-50 TB/day
20. Hadoop projects
• HDFS: a distributed filesystem that runs on large clusters of
commodity machines
• MapReduce: a distributed data processing model
• HBase: a distributed, column-oriented database
• Hive: a distributed data warehouse. Hive manages data
stored in HDFS and provides a query language based on SQL
• Pig: a data flow language and execution environment for
exploring very large datasets
21. Hadoop Characteristics
• Commodity HW + Horizontal scaling
  • Add inexpensive servers
  • Storage servers and their disks are not assumed to be highly reliable and available
  • Use replication across servers to deal with unreliable storage/servers
• Support for moving computation close to data
• Automatic re-execution on failure/distribution
• Metadata-data separation - simple design
  • Storage scales horizontally
  • Metadata scales vertically (today)
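To make the replication point concrete, here is a minimal sketch of how copying each block onto several servers keeps data readable when some servers fail. The round-robin placement, the server names, and the replication factor of 3 are assumptions chosen for illustration; this is not HDFS's actual placement policy.

```python
def place_replicas(num_blocks, servers, replication=3):
    """Assign each block to `replication` distinct servers, round-robin."""
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    placement = {}
    for block in range(num_blocks):
        # Consecutive offsets modulo len(servers) are always distinct.
        placement[block] = [
            servers[(block + r) % len(servers)] for r in range(replication)
        ]
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one live replica."""
    return [
        block
        for block, hosts in placement.items()
        if any(host not in failed for host in hosts)
    ]


if __name__ == "__main__":
    servers = ["s1", "s2", "s3", "s4", "s5"]  # hypothetical server names
    placement = place_replicas(num_blocks=10, servers=servers)
    # With three replicas per block, two failed servers cannot hide any block.
    print(len(readable_blocks(placement, failed={"s1", "s2"})))  # 10
```

Adding a server to the list increases storage capacity without changing the placement logic, which is the horizontal-scaling property the slide describes.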
28. MapReduce Model
• Developing MapReduce-based applications
  • Define map and reduce operations
  • Provide the data
  • Run the MapReduce engine
• The MapReduce library does most of the hard work for us!
  • Parallelization
  • Fault tolerance
  • Data distribution
  • Load balancing
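The three development steps above can be sketched with a tiny single-process stand-in for the engine. `run_mapreduce`, `map_sale`, `reduce_total`, and the sales records are all hypothetical names and data invented for this illustration. A real engine would also handle the parallelization, fault tolerance, data distribution, and load balancing that this sketch leaves out.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process stand-in for a MapReduce engine."""
    # Map phase: each record yields zero or more (key, value) pairs,
    # which are grouped by key as they arrive.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # Reduce phase: one call per distinct key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Step 1: define the map and reduce operations.
def map_sale(record):
    region, amount = record
    yield region, amount


def reduce_total(region, amounts):
    return sum(amounts)


if __name__ == "__main__":
    # Step 2: provide the data (made-up sales records).
    sales = [("north", 10), ("south", 5), ("north", 7)]
    # Step 3: run the engine.
    print(run_mapreduce(sales, map_sale, reduce_total))
    # {'north': 17, 'south': 5}
```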
29. Map and Reduce
• Map()
  • Map workers read in the contents of their corresponding input partition
  • Process a key/value pair to generate intermediate key/value pairs
• Reduce()
  • Merge all intermediate values associated with the same key
    • e.g. <key, [value1, value2,..., valueN]>
  • The output of the user's reduce function is written to an output file on the global file system
[Figure: Input data, map & reduce, MapReduce engine, Map & Reduce network]
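The step between Map() and Reduce(), where intermediate values are merged per key into the `<key, [value1, ..., valueN]>` form, can be sketched as a sort followed by a group-by. The function name `shuffle` and the sample pairs are assumptions for illustration; a real engine performs this step across the network between map and reduce workers.

```python
from itertools import groupby
from operator import itemgetter


def shuffle(intermediate_pairs):
    """Group (key, value) pairs into (key, [values]) entries, sorted by key."""
    # sorted() is stable, so values keep their original order within a key.
    ordered = sorted(intermediate_pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


if __name__ == "__main__":
    pairs = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]
    print(shuffle(pairs))  # [('a', [1, 4]), ('b', [2, 3])]
```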
30. MapReduce Example
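[Figure: word count, shown as Input, Map, Reduce, Result]
Input (three splits):
• the quick brown fox
• the fox ate the mouse
• how now brown cow
Map output (one Map task per split):
• the, 1; quick, 1; brown, 1; fox, 1
• the, 1; fox, 1; the, 1; ate, 1; mouse, 1
• how, 1; now, 1; brown, 1; cow, 1
Reduce output (Result):
• Reduce 1: brown, 2; fox, 2; how, 1; now, 1; the, 3
• Reduce 2: ate, 1; cow, 1; mouse, 1; quick, 1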
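The word-count example can be reproduced in a few lines of Python. This is a single-process sketch using the slide's three input splits; the function names are hypothetical, and the two Reduce tasks of the figure are collapsed into one dictionary for brevity.

```python
from collections import defaultdict

SPLITS = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]


def map_words(split):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in split.split()]


def reduce_counts(word, ones):
    """Reduce: sum the counts collected for one word."""
    return sum(ones)


def word_count(splits):
    grouped = defaultdict(list)
    for split in splits:  # one map task per split
        for word, one in map_words(split):
            grouped[word].append(one)  # shuffle: group values by key
    return {word: reduce_counts(word, ones) for word, ones in sorted(grouped.items())}


if __name__ == "__main__":
    print(word_count(SPLITS))
    # {'ate': 1, 'brown': 2, 'cow': 1, 'fox': 2, 'how': 1,
    #  'mouse': 1, 'now': 1, 'quick': 1, 'the': 3}
```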
32. MapReduce Components
• Master-slave architecture
• JobTracker
  • Accepts jobs submitted by users
  • Assigns Map and Reduce tasks to TaskTrackers
  • Makes all scheduling decisions
  • Schedules tasks on nodes close to data
  • Monitors task and TaskTracker status, re-executes tasks upon failure
• TaskTracker
  • Asks for new tasks, executes, monitors, reports status
  • Runs Map and Reduce tasks upon instruction from the JobTracker
  • Manages storage and transmission of intermediate output
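Two of the JobTracker responsibilities listed above, scheduling tasks close to their data and re-executing tasks upon failure, can be illustrated with a small simulation. The classes below are toy stand-ins, not Hadoop's implementation. As a simplification, this JobTracker pushes tasks to trackers, whereas the slide describes TaskTrackers asking for work. The block names and the `fail_on` failure injection are assumptions.

```python
class TaskTracker:
    def __init__(self, name, local_blocks, fail_on=()):
        self.name = name
        self.local_blocks = set(local_blocks)  # data blocks stored on this node
        self.fail_on = set(fail_on)            # blocks whose task fails here (simulated)

    def run(self, block):
        if block in self.fail_on:
            raise RuntimeError(f"{self.name} failed on {block}")
        return f"{block}@{self.name}"


class JobTracker:
    def __init__(self, trackers):
        self.trackers = trackers

    def candidates(self, block):
        # Prefer trackers that hold the block locally, then all others.
        local = [t for t in self.trackers if block in t.local_blocks]
        remote = [t for t in self.trackers if block not in t.local_blocks]
        return local + remote

    def run_job(self, blocks):
        results = {}
        for block in blocks:
            for tracker in self.candidates(block):
                try:
                    results[block] = tracker.run(block)
                    break  # task succeeded
                except RuntimeError:
                    continue  # re-execute the task on the next tracker
            else:
                raise RuntimeError(f"no tracker could process {block}")
        return results


if __name__ == "__main__":
    trackers = [
        TaskTracker("tt1", ["b1"], fail_on=["b1"]),
        TaskTracker("tt2", ["b2"]),
    ]
    print(JobTracker(trackers).run_job(["b1", "b2"]))
    # {'b1': 'b1@tt2', 'b2': 'b2@tt2'}
```

Block `b1` is first tried on `tt1`, which holds it locally. When that attempt fails, the task is re-executed on `tt2`.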
35. Private Cloud
• Hadoop and Eucalyptus integration
  • A Hadoop cluster can be built from virtual machines created by Eucalyptus
[Figure: layered deployment]
• Physical Node1 … Physical Node7 …, each with a Hypervisor
• Infrastructure Manager
• VM 1 … VM 27
• Distributed File System / Platform Manager: DFS-M with DFS-N nodes; Master with Slave1 … Slave9
36. Case study - Evolutionary algorithms
• In artificial intelligence, an evolutionary algorithm (EA) is
a subset of evolutionary computation, a generic
population-based metaheuristic optimization algorithm:
• Genetic algorithm
  • Populations
  • Fitness function
  • Mutation
  • Crossover
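The four ingredients listed above fit together as in the following minimal genetic algorithm. It is a sketch on the OneMax toy problem (maximize the number of 1 bits), chosen here purely for illustration. The truncation selection, the parameter values, and the fixed seed are assumptions, not the case study's actual setup.

```python
import random


def fitness(individual):
    """Fitness function: number of 1 bits (the OneMax toy problem)."""
    return sum(individual)


def crossover(a, b, rng):
    """Single-point crossover of two parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(individual, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in individual]


def evolve(length=20, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    # Population: a list of random bit strings.
    population = [
        [rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: keep the fitter half as parents (truncation selection).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            children.append(mutate(child, rng))
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(fitness(best), best)
```

Because the parents survive unchanged into the next generation, the best fitness never decreases from one generation to the next.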