This document discusses building a highly available solution based on the Cloud Foundry PaaS. It describes selecting AWS and OpenStack as technologies, implementing a pilot project on AWS across two regions, and using Cloud Foundry for application deployment. The solution provides a scalable and distributed platform for managing devices as a service, leveraging technologies like Cassandra, MariaDB, and open source components.
1. Building a Highly Available Solution
Based on the Cloud Foundry PaaS
By Sergey Sverchkov,
Software Architect at Altoros
sergey.sverchkov@altoros.com
www.altoros.com | @altoros
2. What We Are Going to Discuss
❏ Project requirements (from the business point of view)
❏ Selecting the technology stack
❏ The pilot project on Amazon Web Services (AWS)
❏ The private cloud solution based on OpenStack
❏ Adding the Cloud Foundry (CF) Platform-as-a-Service (PaaS)
❏ Building a distributed system with high availability (HA)
3. ❏ A solution for device management
❏ Delivered as Software-as-a-Service
❏ Built as a private cloud
❏ Distributed between several regions
❏ Scalability to millions of devices
❏ Based on open source components
Project Requirements from the Business Point of View
4. ❏ Amazon Web Services (AWS)—the platform for prototyping
❏ Cassandra—the main data storage
❏ MariaDB Galera Cluster—the clustered SQL database
❏ Cloud Foundry (CF)—Platform-as-a-Service
❏ OpenStack—the platform for building the private cloud
Selecting the Technology Stack
5. ❏ A cloud in 2 AWS regions with data synchronization enabled
❏ A secure channel between the regions
❏ Web socket for device communication
❏ Scalable up to 150,000 devices
❏ Device data packet size ~1–2 KB
❏ Every device sends several data packets every minute
The Pilot Project on AWS: Requirements
6. ❏ 2 virtual private clouds
❏ An IPSEC VPN tunnel
❏ Route 53 DNS with failover or a latency policy
❏ Elastic Load Balancer
❏ A full replica in Cassandra and MariaDB
❏ Device emulation for workload
The Pilot Project on AWS: Implementation
8. ❏ Supports deployment of Java, Ruby, Node.js, etc.
❏ Linux containers for application isolation
❏ Automated deployment of runtime environments (JRE, Tomcat, etc.)
❏ Organizations, users, spaces, and resource limits
❏ Supported on AWS, OpenStack, and VMware
Cloud Foundry: The Application Platform (PaaS)
9. What We Get from Cloud Foundry
❏ Application management from a console
❏ Health monitoring and scaling
❏ Application load balancing and routing
❏ HTTP, HTTPS, and WebSocket
❏ Databases as external services
❏ Think about development, not deployment
11. ❏ Server chassis: SuperMicro 5037 with 8 nodes
❏ Xeon E3-1230, 32 GB of RAM, 2*3.5” SATA3
❏ 3 nodes for OpenStack management
❏ 5 nodes for OpenStack virtual machines
❏ 20 CPU cores, 160 GB of RAM, 10 TB of storage
Hardware Configuration for OpenStack with High Availability
15. ❏ A complex system requires complex approaches
❏ Verify if the technology stack meets your project requirements
❏ High availability needs to be supported on all layers/components
❏ Open source is free, but you take all the risks on your own
❏ Demonstrate business value at every phase
Lessons Learned
17. WHAT WE DO
We bring “software assembly lines” into organizations through
deployment and integration of solutions offered by the Cloud Foundry ecosystem
Consulting, integration, and managed services (delivered by Altoros)
Training (delivered by partners from the CF ecosystem)
My name is Sergey Sverchkov
I’m a Solution Architect at Altoros. Altoros is a software company that specializes in cloud technologies and big data.
This presentation will be about one of our ongoing projects that uses a technology stack for building service-based solutions. One of its major components is a platform for application deployment, also known as Platform-as-a-Service.
So, today I will speak about the requirements for this type of system from the business point of view. I will also explain:
- how we selected technologies for this project;
- how we built a pilot implementation of the solution on the Amazon Web Services platform;
- why we decided to create a private cloud based in a datacenter;
- and what issues we can resolve with the Cloud Foundry PaaS.
In addition, I will share some insights on how to make a distributed system highly available.
As you can see from the business requirements on this slide, the project is a device management system available as Software-as-a-Service. We plan to deploy this system to a private cloud located in a datacenter. The system will be distributed across several regions. The architecture must be highly scalable, with the possibility to extend the number of devices to millions. The application’s core will be written in Java.
And last but not least, we have to use open source components when building this system.
We started working on this project from selecting the technology stack.
Since we are building Software-as-a-Service, it was reasonable to use the Amazon platform at the beginning of the project. Using AWS, we built a pilot solution that was distributed across several regions. Amazon is a great platform for applications and offers a lot of useful services for developers.
We chose Cassandra for storing unstructured data generated by the devices. It is a scalable (in both storage and computing capacity), distributed, fault-tolerant NoSQL data store, and it supports multi-region deployments. But Cassandra may not be the best choice for storing structured data, so we decided to use a clustered SQL database for that: MariaDB Galera Cluster. This database is based on MySQL, and it supports active-active replication between the nodes in a cluster. As a result, applications can write to any node.
As I have mentioned, the project is Java-based. The system consists of many applications, and each of them implements a particular business function. We need to implement every application as a cluster for fault tolerance and load balancing. This causes certain issues that we had to think about during design and implementation:
- we need to manage deployment of applications in a cluster and monitor their health
- we need to balance the workload
- and we have to think carefully about infrastructure for application clusters
This is why we started considering a platform for applications that would automate these tasks. We decided to use the Cloud Foundry PaaS. It is an open source platform, and it is being actively developed by the community. Finally, since Amazon is a proprietary platform and our project is to be deployed in a private cloud in a datacenter, we had to find a solution for building private clouds. Eventually, we settled on OpenStack. Like Cloud Foundry, OpenStack has a large community of developers, including people from Rackspace, Intel, NASA, and others.
After completing the technology stack selection, we started implementing the project prototype on Amazon Web Services. Now, let’s take a look at the requirements that we compiled for the pilot project:
- We needed to build a distributed system in two regions with data synchronization between these regions.
- Resources in the two regions had to be connected through a secure channel.
- The devices had to communicate with the system over the WebSocket protocol. Each device sends data several times a minute. The size of a typical data packet is 1 to 2 KB.
- And last, we had to demonstrate that this system can handle the workload generated by 150,000 devices.
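To get a feel for the aggregate load these requirements imply, here is a back-of-the-envelope calculation. The packet rate per device is an assumption (the requirements only say "several times a minute"), and the packet size is taken at the upper bound of the stated 1–2 KB range:

```python
# Rough throughput estimate for the pilot workload.
# Assumption: each device sends 3 packets per minute ("several").
devices = 150_000
packets_per_minute = 3          # assumed value for "several"
packet_size_kb = 2              # upper bound of the 1-2 KB range

packets_per_second = devices * packets_per_minute / 60
throughput_kb_per_second = packets_per_second * packet_size_kb

print(f"{packets_per_second:.0f} packets/s")          # 7500 packets/s
print(f"{throughput_kb_per_second / 1024:.1f} MB/s")  # ~14.6 MB/s
```

Even at the upper bound, the aggregate write throughput is modest, which is consistent with the pilot's result of serving 100,000+ devices on fewer than a dozen virtual machines.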
The pilot project in Amazon was completed in around 3 months. On this slide, you can see a simplified architecture of the solution.
To ensure the security of information, we created two virtual private clouds in data centers located in the Virginia and Oregon AWS regions. The virtual private clouds are connected through a VPN tunnel.
Each private cloud has Cassandra and MariaDB clusters. Each cluster consists of three machines with data replicas. So, overall we have six Cassandra nodes and six MariaDB nodes. The main difference between Cassandra and MariaDB is that, in Cassandra, we can control the number of replicas in the cluster. In MariaDB, each node contains a full copy of the database.
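A quick sketch of what this replication difference means for storage per node. The dataset size and the Cassandra replication factor here are hypothetical (in Cassandra, the replication factor is configured per keyspace); the six-node figure comes from the pilot setup:

```python
# Storage footprint per node for a hypothetical 1 TB dataset.
# Cassandra: data is spread across the cluster, so each node holds
# roughly (replication_factor / total_nodes) of the dataset.
# MariaDB Galera: every node holds a full copy of the database.
dataset_tb = 1.0          # hypothetical dataset size
total_nodes = 6           # six Cassandra nodes across the two regions
replication_factor = 3    # hypothetical RF, configurable per keyspace

cassandra_per_node = dataset_tb * replication_factor / total_nodes
mariadb_per_node = dataset_tb  # full replica on every node

print(f"Cassandra: {cassandra_per_node} TB/node")  # 0.5 TB/node
print(f"MariaDB:   {mariadb_per_node} TB/node")    # 1.0 TB/node
```

This is why Cassandra scales in storage capacity by adding nodes, while a Galera cluster's capacity is bounded by the disk of a single node.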
In order to verify the workload, we created a device emulator and an application that writes data to Cassandra and MariaDB. The workload generated by these devices is distributed by Elastic Load Balancer, a load balancing service available on AWS.
The entry point of our distributed system is the Route 53 service, which handles domain name resolution. We tested Route 53 with both a failover and a latency policy. The main difference between the two is that, with the failover policy, we need to assign a primary and a secondary region. With the latency policy, Route 53 identifies a preferred region for each connection, and the applications work in both regions in active mode.
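The decision rules behind the two Route 53 policies we tested can be sketched as two small selection functions. This is illustrative logic only, not the Route 53 API; the region names and latency figures are made up:

```python
# Illustrative decision rules for the two Route 53 routing policies.
# Not the actual Route 53 implementation.

def failover_choice(primary_healthy: bool) -> str:
    """Failover policy: always the primary region unless it is down."""
    return "primary" if primary_healthy else "secondary"

def latency_choice(latencies_ms: dict) -> str:
    """Latency policy: the region with the lowest measured latency."""
    return min(latencies_ms, key=latencies_ms.get)

print(failover_choice(True))                                # primary
print(latency_choice({"us-east-1": 80, "us-west-2": 35}))   # us-west-2
```

The practical consequence is that under the failover policy the secondary region sits idle until the primary fails, while under the latency policy both regions serve traffic in active-active mode.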
Using this prototype, we performed load testing, which proved that we can serve more than 100,000 devices with less than a dozen virtual machines. But Amazon is a public and proprietary cloud provider, so many of its services cannot be ported to another infrastructure, for example Elastic Load Balancer or Amazon Relational Database Service (Amazon RDS).
So why do we need to care about infrastructure, clouds, and platforms for applications? This diagram shows the key differences between the hosting and deployment models for applications and data.
If you use the traditional approach and host your application on servers located in a datacenter or in your company’s office, you have to manage all the resources. This includes the servers, storage, networking, configuration of the operating systems, application runtime, databases, and applications.
Infrastructure-as-a-Service providers, such as Amazon or Google, manage the hardware and virtualization layers for you. You choose the region and the datacenter where the virtual machines will be located, and whether they are available in a public or private address space. You have to choose the operating system, install the application runtime, upload data, and deploy applications. You also need to implement scaling, fault tolerance, and management of the allocated resources.
When you use the Platform-as-a-Service model, you only manage applications and data. All the complexity related to allocating resources, installing runtime, and monitoring is managed by the platform for applications. This can save a considerable amount of time and speed up the release cycle.
Now, let’s discuss the platform for applications in a bit more detail. In our case, it’s Cloud Foundry:
The PaaS helps us automate tasks related to managing the application lifecycle.
It supports applications written in different programming languages and can be deployed to different infrastructures; for Cloud Foundry, these are OpenStack, VMware, and AWS.
The PaaS also provides a single approach to testing and deploying applications to production.
Applications within the platform run in isolated containers that are managed by the PaaS.
The PaaS automatically configures the application runtime inside these containers. For example, it installs the Java Runtime and the Tomcat application server, or Ruby, or Node.js.
The platform makes it possible to divide resources between organizations, users, and spaces, and it helps to control resource allocation.
Finally, it helps developers to abstract away from the infrastructure and focus on development instead of deployment.
So, how does a developer work with a PaaS? To work with the PaaS, we use the Cloud Foundry command line client. This client enables developers to push applications to the platform, monitor their health, and scale them by increasing the amount of memory or by adding new application instances.
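A typical session with the cf command line client might look like the following sketch. The application name and resource sizes are hypothetical, and exact flags can vary between client versions:

```shell
# Push an application with 512 MB of memory and two instances
cf push device-api -m 512M -i 2

# Check the health and resource usage of the running instances
cf app device-api

# Scale out to four instances with 1 GB of memory each
cf scale device-api -i 4 -m 1G
```

Scaling up an application this way requires no changes to infrastructure or load balancer configuration; the platform reschedules the instances and updates routing automatically.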
Cloud Foundry already has a load balancer and supports the main internet communication protocols.
All these features are available due to the complex internal architecture of the PaaS, which contains a number of management components. It takes some time to understand how it works and how to deploy it.
Now let’s move on to the OpenStack implementation of the project. OpenStack is another complex system that consists of many services; it’s not just a hypervisor for launching VMs. OpenStack helps to build private clouds that combine networking, computing, and storage resources.
OpenStack can be deployed on commodity hardware. It’s open source, so you can play with an OpenStack deployment on your laptop. But to build a functional OpenStack cloud, you will need a sufficient amount of resources.
This diagram shows a sample server configuration for OpenStack. It has eight nodes, each with 32 or 64 GB of memory and two hot-swappable hard drives. Three nodes are dedicated to the OpenStack management services, and five servers are allocated to run virtual machines.
On this slide, you can see our OpenStack deployment on hardware with eight nodes.
There are two major roles that group different services in OpenStack. They are OpenStack Controller and OpenStack Compute. OpenStack Controller contains services that manage the cloud, while OpenStack Compute services run virtual machines.
On the diagram, you can see a group of five nodes for OpenStack Compute and three OpenStack Controller nodes. Virtual machines for MariaDB, Cassandra, and applications are hosted on the OpenStack Compute nodes. Every data cluster contains from three to five virtual machines. Applications may be deployed as clusters in virtual machines. To provide fault tolerance, the components need to be redundant on every layer, including hardware, virtual machines, and application instances.
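The three-to-five-node cluster sizes follow from simple quorum arithmetic: a majority-based cluster of N nodes tolerates the loss of floor((N - 1) / 2) nodes. A minimal sketch (this applies to Galera, and to Cassandra when clients use QUORUM consistency):

```python
# Failure tolerance of a majority-quorum cluster.
def tolerated_failures(nodes: int) -> int:
    """Number of nodes that can fail while a majority remains."""
    return (nodes - 1) // 2

for n in (3, 4, 5):
    print(f"{n} nodes -> tolerates {tolerated_failures(n)} failure(s)")
```

Note that a four-node cluster tolerates no more failures than a three-node one, which is why odd cluster sizes are the usual choice.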
But OpenStack doesn’t provide application load balancing, which is why we need to deploy an additional service. We tested HAProxy as an implementation example for application load balancing.
We have also tested this deployment with OpenStack and applications on virtual machines using a device emulator. We were able to serve up to 40,000 device connections concurrently on the sample hardware I showed you on the previous slide.
So, we have seen a deployment diagram for OpenStack. It is a redundant infrastructure built on an eight-node server chassis, and it is enough if you manage all the applications directly. But we wanted to get rid of the complexity of the application lifecycle and significantly reduce the time each developer spends on deployment. That’s why we deployed the Cloud Foundry PaaS on OpenStack.
On this slide I wanted to demonstrate how the components of the PaaS are deployed to OpenStack. Each component is deployed on a separate virtual machine. To make the PaaS deployment highly available, the components of the PaaS need to be deployed on at least two VMs.
Here I wanted to highlight that application instances run inside the Cloud Foundry PaaS. The components that run applications are called Droplet Execution Agents, or DEAs. The component responsible for load balancing is called the Router. The Router handles all incoming requests to applications and knows where each application instance is running, so it knows where to send an incoming request. Once again, to make the entire system highly available, we need to ensure redundancy on all layers, including Cloud Foundry.
Finally, we came to the architecture view of the two regions. Each region has its own deployment of OpenStack and Cloud Foundry. The same set of applications is deployed to Cloud Foundry in both regions. Database clusters are distributed. Connections between devices and one of the regions are served by a DNS service. For example, it can be Amazon Route 53. This helps us to make sure that the system operates even if only one region is available.
To sum it up, I would like to list some of the most important lessons we’ve learned from working on this solution.
A complex system requires an equally complex approach to design, implementation, and testing.
It is highly important to keep the project requirements in mind when selecting the technology stack. You should always verify each component and make sure that it meets these requirements, both separately and as part of the system.
If you want to build a fault-tolerant system, all architectural layers and components will need to support high availability.
When choosing an open source solution, be prepared to dig into its internals. Any open source product has some issues; this is the price you pay for free usage, and it is why you may need a team of experienced engineers.
You will most likely need about six months of research and self-learning before you can start using OpenStack and Cloud Foundry in production (if you do it on your own). Of course there are companies that provide training and support, which can help.
Finally, to persuade your customers to switch to a new platform, you need to demonstrate its business value at every phase. This means the project needs to achieve business objectives despite the underlying technical complexity.
With 250+ employees across 8 countries, Altoros is behind some of the world’s largest Cloud Foundry deployments.
We are one of the community leaders in the Cloud Foundry ecosystem and a founding member of the Cloud Foundry Foundation (alongside SAP, IBM, Pivotal Software, VMware, and others).
Altoros integrates software assembly lines into large organizations by deploying solutions offered by the Cloud Foundry ecosystem. Our customers are among the first to create and monetize application-driven competitive advantages with Cloud Foundry.
At Altoros, our mission is to reduce the time and cost of application delivery by using Cloud Foundry-based “software assembly lines” and “data lakes”. We integrate solutions offered by the Cloud Foundry ecosystem and thereby help customers deliver and operate their “software assembly lines”.
Altoros is behind some of the world’s largest Cloud Foundry deployments.
That is all. Thank you for your attention. I’ll be glad to answer your questions.