2. AGENDA
Intro to Big data world & Cloud apps
Challenges of big data applications
Outages & disaster recovery
Complexity of multi-tier applications
Lock-in to a specific cloud
Monitoring & Scaling
PaaS makes our life easier …
® Copyright 2013 GigaSpaces Ltd. All Rights Reserved
3. WE’RE LIVING IN A REAL TIME WORLD…
Homeland Security
Real Time Search
Social
eCommerce
User Tracking & Engagement
Financial Services
4. BIG DATA PREDICTIONS
“Over the next few years we'll see the adoption of scalable frameworks and platforms for handling streaming, or near real-time, analysis and processing. In the same way that Hadoop has been borne out of large-scale web applications, these platforms will be driven by the needs of large-scale location-aware mobile, social and sensor use.”
Edd Dumbill, O’REILLY
5. THE BIG GUYS…
• AWS – around 0.5M servers
• Facebook – less than 0.1M servers
• Google – around 1M servers
6. STORE 135+ BILLION MESSAGES A MONTH
14. Moving Apps At Massive Scale – The Right Way
• No Code Change
• No lock-in
• Full Monitor & Control
15. CLOUDIFY POSITIONING IN THE CLOUD STACK
[Diagram: the cloud stack plotted on two axes, Productivity and Control]
• PaaS: CloudFoundry, Heroku, GAE, OpenShift
• DevOps (Automation): Chef, Puppet
• IaaS: public clouds (AWS, Rackspace, ..) and private clouds (VMware, OpenStack, ..)
• Also shown: Rightscale, Enstratus
• Cloudify: high productivity with full control
16. HOW DOES IT WORK?
1. Upload your recipes
2. Cloudify creates VMs & installs agents
3. Agents install and manage your app
4. Cloudify automates the scaling
23. UNDERSTAND APPLICATION STRUCTURE & DEPENDENCIES
application {
    name = "petclinic"

    service {
        name = "mongod"
    }

    service {
        name = "mongoConfig"
    }

    service {
        name = "apacheLB"
    }

    service {
        name = "mongos"
        dependsOn = ["mongoConfig", "mongod"]
    }

    service {
        name = "tomcat"
        dependsOn = ["mongos", "apacheLB"]
    }
}
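The dependsOn entries above imply a start order: a service can start only after everything it depends on is up. The following is a minimal Python sketch of that idea using a plain topological sort. It is an illustration only, not Cloudify's implementation; the function name start_order and the DEPENDS_ON table are made up for this example, with the service names copied from the recipe.

```python
from collections import deque

# Service -> services it depends on, copied from the petclinic recipe.
DEPENDS_ON = {
    "mongod": [],
    "mongoConfig": [],
    "apacheLB": [],
    "mongos": ["mongoConfig", "mongod"],
    "tomcat": ["mongos", "apacheLB"],
}


def start_order(depends_on):
    """Return one order in which each service comes after its dependencies."""
    # Copy the table so the caller's data is not modified.
    remaining = {svc: set(deps) for svc, deps in depends_on.items()}
    # Services with no dependencies can start immediately.
    ready = deque(sorted(svc for svc, deps in remaining.items() if not deps))
    order = []
    while ready:
        svc = ready.popleft()
        order.append(svc)
        # Mark svc as satisfied for every service still waiting on it.
        for other in sorted(remaining):
            deps = remaining[other]
            if svc in deps:
                deps.remove(svc)
                if not deps:
                    ready.append(other)
    if len(order) != len(remaining):
        raise ValueError("dependency cycle or unknown dependency")
    return order


if __name__ == "__main__":
    print(start_order(DEPENDS_ON))
```

With this table, tomcat always comes last, because it waits for mongos, which in turn waits for mongoConfig and mongod.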
25. AUTO SCALE YOUR WAY
scalingRules ([
    scalingRule {
        serviceStatistics {
            metric "Total Requests Count"
            statistics Statistics.maximumThroughput
            movingTimeRangeInSeconds 20
        }
        highThreshold {
            value 1
            instancesIncrease 1
        }
        lowThreshold {
            value 0.2
            instancesDecrease 1
        }
    }
])
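To make the rule above concrete, here is a small Python sketch of how a threshold rule over a moving time window could be evaluated. It is not Cloudify code. It assumes the samples are throughput readings, that the statistic is the maximum reading inside the window, and that thresholds are compared strictly; the class and method names are hypothetical. The default values mirror the recipe (20-second window, high threshold 1, low threshold 0.2, one instance per step).

```python
from collections import deque


class ScalingRule:
    """Threshold rule over a moving time window (illustrative only)."""

    def __init__(self, window_seconds=20, high=1.0, low=0.2,
                 increase=1, decrease=1):
        self.window_seconds = window_seconds
        self.high = high
        self.low = low
        self.increase = increase
        self.decrease = decrease
        self.samples = deque()  # (timestamp_seconds, value) pairs, oldest first

    def record(self, timestamp, value):
        """Add a sample and drop samples that fell out of the window."""
        self.samples.append((timestamp, value))
        cutoff = timestamp - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()

    def instance_delta(self):
        """Return +N to scale out, -N to scale in, or 0 to hold."""
        if not self.samples:
            return 0
        peak = max(value for _, value in self.samples)
        if peak > self.high:
            return self.increase
        if peak < self.low:
            return -self.decrease
        return 0


if __name__ == "__main__":
    rule = ScalingRule()
    rule.record(0, 0.5)
    rule.record(5, 1.5)
    print(rule.instance_delta())
```

Keeping a gap between the low and high thresholds means a reading between them changes nothing, which avoids adding and removing instances on every small fluctuation.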
26. 5 MINUTES SETUP
/* Credentials - You must enter your
* cloud provider account credentials
*/
user="ENTER_USER_HERE"
apiKey="ENTER_API_KEY_HERE"
keyFile="ENTER_KEY_FILE_HERE"
keyPair="ENTER_KEY_PAIR_HERE"
// Advanced usage
hardwareId="m1.small"
locationId="us-east-1"
linuxImageId="us-east-1/ami-1624987f"
ubuntuImageId="us-east-1/ami-82fa58eb"
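Before bootstrapping, it can help to confirm that none of the ENTER_..._HERE placeholders were left in place. The following is a minimal Python sketch, assuming the settings are simple name="value" lines as on the slide. The function name and the sample text are made up for illustration and are not part of Cloudify.

```python
import re

# A value such as ENTER_USER_HERE that was never replaced.
PLACEHOLDER = re.compile(r"^ENTER_[A-Z_]+_HERE$")
# A line of the form name="value"; comments and blank lines do not match.
ASSIGNMENT = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*$')


def unfilled_settings(properties_text):
    """Return the names of settings that still hold a placeholder value."""
    missing = []
    for line in properties_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match and PLACEHOLDER.match(match.group(2)):
            missing.append(match.group(1))
    return missing


# Hypothetical sample: one placeholder left, two values filled in.
SAMPLE = '''
// Credentials
user="ENTER_USER_HERE"
apiKey="dummy-key-for-illustration"
hardwareId="m1.small"
'''

if __name__ == "__main__":
    print(unfilled_settings(SAMPLE))
```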
27. SUMMARY
Intro to Big data world & Cloud apps
Challenges of big data applications
Outages & disaster recovery
Complexity of multi-tier applications
Lock-in to a specific cloud
Monitoring & Scaling
PaaS makes our life easier - Cloudify
We live almost every aspect of our lives in a real-time world. Think about our social communications: we update our friends online via social networks and micro-blogging, we text from our mobiles, or message from our laptops. But it's not just our social lives: we shop online whenever we want, we search the web for immediate answers to our questions, we trade stocks online, we pay our bills, and do our banking. All online and all in real time. Real time doesn't just affect our personal lives. Enterprises and government agencies need real-time insights to be successful, whether they are investment firms that need fast access to market views and risk analysis, or retailers that need to adjust their online campaigns and recommendations. Even homeland security has come to increasingly rely on real-time monitoring. The amount of data that flows in these systems is huge. Major social networking platforms like Facebook and Twitter have developed their own architectures for handling the need for real-time analytics on huge amounts of data. However, not every company has the need or resources to build its own Twitter-like solution.
Big data is expected to keep growing: the amount of data is increasing, and so is the demand for it. Analytics in real time is becoming a must.
A high-ranking Amazon executive said there are 60,000 different customers across the various Amazon Web Services, and most of them are not the startups that are normally associated with on-demand computing. Rather, the biggest customers, in both number and amount of computing resources consumed, are divisions of banks, pharmaceutical companies and other large corporations that try AWS once for a temporary project and then get hooked.
According to Statspotting.com (March 2012), a researcher estimates that Amazon Web Services is using at least 454,400 servers in seven data center hubs around the globe. To put that in perspective: Google is powered by about a million servers, maybe a little more, and Amazon has about half a million. Facebook, the service that takes up one fourth of all our time online, is powered by fewer than 100,000 servers.
Biggest customers: Pinterest, Instagram, Netflix, Heroku, Quora, Foursquare, etc.
Amazon Web Services runs more than 835,000 requests per second for hundreds of thousands of customers in 190 countries, including 300 government agencies and 1,500 educational institutions.
Source: http://highscalability.com/blog/2010/11/16/facebooks-new-real-time-messaging-system-hbase-to-store-135.html
You may have read somewhere that Facebook has introduced a new Social Inbox integrating email, IM, SMS, text messages and on-site Facebook messages. All in all, they need to store over 135 billion messages a month. Where do they store all that stuff? One of the posts gave the surprise answer: HBase beat out MySQL, Cassandra, and a few others.
Why a surprise? Facebook created Cassandra, and it was purpose-built for an inbox-type application, but they found Cassandra's eventual consistency model wasn't a good match for their new real-time Messages product. Facebook also has an extensive MySQL infrastructure, but they found performance suffered as data sets and indexes grew larger. And they could have built their own, but they chose HBase.
HBase is a scale-out table store supporting very high rates of row-level updates over massive amounts of data, which is exactly what a messaging system needs. HBase is also a column-based key-value store built on the BigTable model. It's good at fetching rows by key or scanning ranges of rows and filtering, which a messaging system also needs. Complex queries are not supported, however. Queries are generally given over to an analytics tool like Hive, which Facebook created to make sense of their multi-petabyte data warehouse. Hive is based on Hadoop's file system, HDFS, which is also used by HBase.
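The two access patterns named above, fetching a row by key and scanning a range of rows, both rely on row keys being kept in sorted order. The following is a toy in-memory Python sketch of that idea. It does not use the HBase API, and the class name, the "user:sequence" key layout and the sample rows are all hypothetical. The scan is start-inclusive and stop-exclusive.

```python
import bisect


class SortedRowStore:
    """Toy table that keeps row keys sorted (illustrative only)."""

    def __init__(self):
        self._keys = []  # row keys in sorted order
        self._rows = {}  # row key -> row data

    def put(self, key, row):
        """Insert or overwrite a row, keeping the key list sorted."""
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def get(self, key):
        """Fetch a single row by its key, or None if it is absent."""
        return self._rows.get(key)

    def scan(self, start_key, stop_key):
        """Yield (key, row) pairs with start_key <= key < stop_key."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


if __name__ == "__main__":
    store = SortedRowStore()
    store.put("bob:0001", "hello")
    store.put("alice:0002", "second message")
    store.put("alice:0001", "first message")
    # ";" sorts right after ":", so this range covers every "alice:" key.
    for key, row in store.scan("alice:", "alice;"):
        print(key, row)
```

Putting the user first in the key keeps one user's messages next to each other, so reading an inbox becomes a single range scan rather than many separate lookups.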
The Amazon EC2 outage of April 2011 was the worst in cloud computing’s history up to that point. It made the front page of many news sites, including the New York Times, probably because many people were shocked by how many web sites and services rely on EC2. The Amazon cloud proved itself in that sufficient resources were available worldwide, so many well-prepared users could continue operating with relatively little downtime. But because Amazon’s reliability had been so high, many users were not well prepared, which led to widespread outages.
Microsoft Azure outages:
• Dec 28, 2012: some owners of Microsoft's Xbox 360 game console were unable to access some of their cloud-based save storage files.
• July 26, 2012: service for Microsoft’s Windows Azure Europe region went down for more than two hours.
• Feb 29, 2012: the ultimate result was service impacts of 8-10 hours for users of Azure data centers in Dublin, Ireland; Chicago; and San Antonio.
Each cloud is unique in many aspects, offering a different API and different functionality for managing resources:
• Different sets of available resources
• Different formats, encodings and versions
• Different security groups, machine images, snapshots, etc.
• DR: the processes and procedures you follow to restore your system after a catastrophic event.
• Cloud infrastructure has made DR much easier and more affordable compared with previous options.
• Clouds can also suffer large-scale failures caused by network, power or other IT failures.
• Application owners need to take responsibility for HA and DR. They can use multiple servers, availability zones (AZs), regions and even clouds.
• Zones within a region share a LAN, so they have high bandwidth, low latency and private IP access. Zones use separate power resources. Regions are “islands”: they share no resources.
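One simple way to act on the multi-zone advice is to spread instances evenly across zones, so that losing one zone takes out only part of the capacity. The following is a minimal Python sketch of round-robin placement. The function names are hypothetical and the zone names are example values written in AWS style. Real placement would go through the cloud provider's API.

```python
from itertools import cycle


def spread_across_zones(instance_count, zones):
    """Assign instances to zones round-robin; return zone -> instance count."""
    if not zones:
        raise ValueError("at least one zone is required")
    placement = {zone: 0 for zone in zones}
    for _, zone in zip(range(instance_count), cycle(zones)):
        placement[zone] += 1
    return placement


def surviving_instances(placement, failed_zone):
    """Count the instances left running if one zone fails completely."""
    return sum(count for zone, count in placement.items() if zone != failed_zone)


if __name__ == "__main__":
    zones = ["us-east-1a", "us-east-1b", "us-east-1c"]  # example zone names
    placement = spread_across_zones(5, zones)
    print(placement)
    print(surviving_instances(placement, "us-east-1a"))
```

The same function works one level up: passing region or cloud names instead of zone names spreads the instances across "islands" that share no resources.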
There are lots of patterns for avoiding failure. It took Netflix a lot of development work to build a framework that handles them well. Most users and startups don't have the luxury of implementing them themselves. You need a tool that enables you to automate those patterns in a consistent way. Enter Cloudify.