10. Service Breadth & Depth
(Platform overview diagram, reconstructed as a list:)
- Technical & Business Support: Account Management, Support, Professional Services, Solutions Architects, Training & Certification, Security & Pricing Reports, Partner Ecosystem
- AWS Marketplace: Backup, Big Data & HPC, Business Apps, Databases, Development, Industry Solutions, Security
- Management Tools: Queuing, Notifications, Search, Orchestration, Email
- Enterprise Apps: Virtual Desktops, Storage Gateway, Sharing & Collaboration, Email & Calendaring, Directories
- Hybrid Cloud Management: Backups, Deployment, Direct Connect, Identity Federation, Integrated Management
- Security & Management: Virtual Private Networks, Identity & Access, Encryption Keys, Configuration, Monitoring, Dedicated
- Infrastructure Services: Regions, Availability Zones, Compute, Storage (Objects, Blocks, Files), Databases (SQL, NoSQL, Caching), CDN, Networking
- Platform Services: App (Mobile & Web Front-end, Functions, Identity, Data Store, Real-time); Development (Containers, Source Code, Build Tools, Deployment, DevOps); Mobile (Sync, Identity, Push Notifications, Mobile Analytics, Mobile Backend); Analytics (Data Warehousing, Hadoop, Streaming, Data Pipelines, Machine Learning)
11. AWS building blocks
Services that are highly available and fault tolerant by design:
- Amazon CloudFront
- Amazon Route 53
- Amazon S3
- Amazon DynamoDB
- Elastic Load Balancing
- Amazon SQS
- Amazon SNS
- Amazon SES
- Amazon SWF
- …

Services that are highly available with the right architecture:
- Amazon EC2
- Amazon Elastic Block Store
- Amazon RDS
- Amazon VPC
13. Day 1, user 1
- A single Amazon EC2 instance
  - The full stack on this one host
    - Web application
    - Database
    - Management
    - And so on…
- A single public IP
- Amazon Route 53 for DNS
(Diagram: User → Amazon Route 53 → Elastic IP → Amazon EC2 instance)
14. "We're going to need a bigger box"
- Simplest approach
- Can use PIOPS
- High-I/O instances
- High-memory instances
- High-CPU instances
- High-storage instances
- Easy to change instance size
- Eventually you hit a limit
(Diagram: t2.micro → m3.2xlarge → c3.8xlarge)
16. Day 1, user 1
- Could potentially serve anywhere from a few hundred to a few thousand users, depending on application complexity
- No failover
- No redundancy
- Too many eggs in one basket
(Diagram: User → Amazon Route 53 → Elastic IP → EC2 instance)
17. Day 2, user > 1
First, let's split our single host into more than one:
- Web
- Database
  - Use a database service?
(Diagram: User → Amazon Route 53 → Elastic IP → Web instance → Database instance)
18. Database options
Unmanaged:
- Database on Amazon EC2: your decision to run the database on Amazon EC2; bring your own license (BYOL)

Managed:
- Amazon RDS: Microsoft SQL Server, Oracle, MySQL, or PostgreSQL as a managed service; flexible licensing (BYOL or license included)
- Amazon DynamoDB: NoSQL database service on SSD storage; seamless scalability with zero administration
- Amazon Redshift: large-scale, massively parallel data warehouse service; fast, powerful, and easy to scale
22. Why start with SQL?
- Established, well-understood technology
- Hundreds of books, communities, tools, code samples, and more
- Clear patterns for scalability
- You aren't going to break a SQL database in your first 10 million users. No, you really won't*

*Unless you're doing something VERY unusual or have MASSIVE amounts of data; even then, SQL will still have a place in your stack.
24. If your usage is such that you will generate several TB (> 5) of data in your first year, OR you will have an incredibly data-intensive workload, then you might need NoSQL.
25. What other reasons might you have to need NoSQL?
- Very low latency applications
- Metadata-driven datasets
- Highly unrelational data
- A need for schema-less data constructs*
- Massive amounts of data (again, in the TB range)
- Rapid data ingestion (thousands of records/sec)

*Need != "It's easier to develop without schemas"
26. User > 100
First, let's split our single host into more than one:
- Web
- Database
  - Use Amazon RDS to make your life easier
(Diagram: User → Amazon Route 53 → Elastic IP → Web instance → RDS DB instance)
27. User > 1000
Next, let's address our lack of failover and our redundancy problems:
- Elastic Load Balancing (ELB)
- Another web instance, in another Availability Zone
- RDS Multi-AZ
(Diagram: User → Amazon Route 53 → ELB → web instances in two Availability Zones → RDS DB instance active (Multi-AZ) and RDS DB instance standby (Multi-AZ))
29. User > 10,000s–100,000s
(Diagram: User → Amazon Route 53 → ELB → eight web instances across two Availability Zones → RDS DB instance active (Multi-AZ) with a standby (Multi-AZ) and four read replicas)
30. This will take us a long way, but we also care about performance and efficiency, so let's improve it a bit more:
31. Let's shift the load
Lighten the load on our web and database instances:
- Move static content from the web instances to Amazon S3 and Amazon CloudFront
- Move sessions/state out and create a cache for the database using Amazon ElastiCache or Amazon DynamoDB
(Diagram: User → Amazon Route 53 → Amazon CloudFront → Amazon S3 / ELB → web instances → RDS DB instance active (Multi-AZ))
32. Let's shift the load
Lighten the load on our web and database instances:
- Move static content from the web instances to Amazon S3 and Amazon CloudFront
- Move sessions/state out and create a cache for the database using Amazon ElastiCache or Amazon DynamoDB
(Diagram: as before, now with ElastiCache and DynamoDB alongside the web instances and RDS DB instance active (Multi-AZ))
33. Let's shift the load
Lighten the load on our web and database instances:
- Move static content from the web instances to Amazon S3 and Amazon CloudFront
- Move sessions/state out and create a cache for the database using Amazon ElastiCache or Amazon DynamoDB
- Move dynamic content from the ELB to Amazon CloudFront
(Diagram: User → Amazon Route 53 → Amazon CloudFront → Amazon S3 / ELB → web instances, ElastiCache, DynamoDB → RDS DB instance active (Multi-AZ))
34. Now that our web tier is much lighter, we can go back to the beginning of our talk…
43. User > 500,000+
(Diagram: User → Amazon Route 53 → Amazon CloudFront / Amazon S3 → ELB → web instances and ElastiCache in each of two Availability Zones → an RDS read replica in each AZ, plus RDS DB instance active (Multi-AZ) and standby (Multi-AZ); DynamoDB)
44. Use automation
Managing your infrastructure will take up a bigger part of your time every day. Use automation tools for repetitive tasks:
- Tools to manage your AWS resources
- Tools to manage the software and configuration on your instances
- Automate the analysis of logs and user actions
45. AWS application management solutions
From convenience to control, from higher-level services to do-it-yourself:
AWS Elastic Beanstalk | AWS OpsWorks | AWS CloudFormation | Amazon EC2
46. User > 500,000+
At this point you may start running into issues with the speed and performance of your application:
- Make sure you have monitoring, metrics, alarms, and logs.
  - If you can't build an in-house solution, use a third party such as Nagios, New Relic, and others.
- Pay attention to how many customers speak well of your application vs. how many don't, and use that information.
- Try to squeeze as much performance as possible out of every service or component you use.
47. Host-level metrics. Aggregate metrics per tier. External site performance. Log analysis.
49. SOA (Service-Oriented Architecture)
- Move services into their own tiers or modules. Treat each one as a completely separate piece of your infrastructure and scale it independently.
- Amazon.com and AWS do this extensively! It provides flexibility and a better understanding of each component.
50. Loose coupling + SOA = a winner
Don't reinvent the wheel. Examples:
- Email
- Queuing
- Transcoding
- Search
- Databases
- Monitoring
- Metrics
- Logging
- Compute
Services: Amazon SES, Amazon SQS, Amazon SNS, Amazon CloudSearch, Amazon Elastic Transcoder, Amazon SWF, AWS Lambda
If someone has already built a service that meets your needs, use it instead of building it yourself.
51. Loose coupling sets you free!
The more decoupled your components, the more they scale:
- Independent components
- Design everything as a black box
- Decouple interactions
- Favor services that already offer redundancy and scalability over building your own
(Diagram: an S3 bucket pushes an event notification to Lambda; Lambda pulls from a DynamoDB Stream; Amazon Kinesis pulls from a DynamoDB Stream; one instance puts SQS messages and another gets them; an Amazon SNS topic publishes notifications to a queue subscribed to the topic)
52. User > 1 million+
Reaching a million and beyond will require a bit of everything we have discussed so far:
- Multi-AZ
- Elastic Load Balancing between tiers
- Auto Scaling
- Service-oriented architecture
- Serve content smartly (S3/CloudFront)
- Database caching
- Remove state from tiers that auto scale
53. User > 1 million+
(Diagram: User → Amazon Route 53 → Amazon CloudFront / Amazon S3 → ELB → web instances → RDS DB instance active (Multi-AZ) with read replicas; ElastiCache and DynamoDB for caching and state; Amazon SQS feeding worker instances; internal app instances; Amazon SES; Lambda; Amazon CloudWatch)
55. User > 5 million–10 million
At this point you may start having database problems around write contention on the master instance.
How can you solve it?
- Federation: split into multiple DBs by function (forums, users, products, …)
- Sharding: split the data across multiple hosts
- Move some functionality to other types of databases (NoSQL, graph)
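The federation and sharding approaches above can be sketched in a few lines. This is a minimal illustration, not an AWS API; the shard hosts, database names, and key scheme are all hypothetical:

```python
import hashlib

# Sharding: split one logical table across multiple hosts by key.
# Host names below are made up for illustration.
SHARDS = ["users-db-0", "users-db-1", "users-db-2", "users-db-3"]

def shard_for(user_id: str) -> str:
    """Map a user ID to one shard host via a stable hash."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Federation: separate databases by function rather than by key.
FEDERATED = {"forums": "forums-db", "users": "users-db", "products": "products-db"}
```

The key property is that `shard_for` is deterministic, so every tier of the application routes a given user's reads and writes to the same host.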
57. Summary
- Multi-AZ infrastructure
- Use services that scale on their own: ELB, Amazon S3, Amazon SNS, Amazon SQS, Amazon SWF, Amazon SES, and others.
- Build in redundancy at every level.
- Start with SQL. Seriously!
- Cache data both inside and outside your infrastructure.
- Use automation tools in your infrastructure.
58. Summary
- Make sure you have good tools for metrics/monitoring/logs.
- Split tiers into individual services (SOA).
- Use Auto Scaling when you're ready for it.
- Don't try to reinvent the wheel.
- Move to NoSQL only when it makes sense.
59. Putting all of this together means you'll easily be able to handle 10+ million users!
Scaling your application is a big topic, with lots of opinions, guides, and how-tos. If you are new to scaling on AWS, you might ask yourself this question: “So how do I scale?”
If you are like me, you'll start where I usually start when I want to learn how to do something: with a search engine. In this case I've gone and searched for "scaling on AWS" on Google.
It’s important to note something about the results here. First off, there are a lot of things to read. This search was from yesterday, and there were over half a million posts on how to scale on AWS…so quite a few!
Unfortunately for us and our search engine here, the first result back is actually not what we are looking for. Auto Scaling IS an AWS service, and it's great, but it's not your starting point.
Auto Scaling is not where you want to start. It is a tool we will get to later in this discussion, but not the starting point.
Back to Auto Scaling. "Auto Scaling is a tool and a destination for your infrastructure. It isn't a single thing. It's not a check-box you can click when launching something. Your infrastructure really has to be built with the right properties in mind for Auto Scaling to work." So again, where do we start?
What do we need first?
We need some basics to lay the foundations we’ll need to build our knowledge of AWS on top of.
AWS serves hundreds of thousands of customers in more than 190 countries.
Amazon CloudFront and Amazon Route 53 services are offered at AWS Edge Locations
AWS has developed the broadest collection of services available from any cloud provider.
Our approach to regions, availability zones, and POPs provides global coverage for high availability, low latency applications.
Foundation services across compute, storage, security, and networking offer customers flexibility in their architecture. We have a full spectrum of options to meet most price-to-performance scenarios.
We offer the capability for both managed and unmanaged database options.
The offerings for Analytics and Application Services enable advanced data processing and workloads.
Amazon Redshift, our cloud-based data warehouse, is the fastest growing service in the history of AWS.
Our management tools offer a lot of insight and flexibility to let you manage your AWS resources through either our tools or the management tools you’re already familiar with.
Recent expansion into enterprise applications has been entirely driven by customer feedback on where they’d like us to deliver value.
AWS has global services, regional services, and services that are local to an AZ or datacenter. Many of these global and regional services are inherently fault tolerant.
Examples of global services include CloudFront, our CDN, and Route 53, our DNS service. Both of these services use the Edge locations to distribute load closest to the user.
We have many platform services that are regional in nature. Many of these regional services are inherently fault tolerant by nature. When building an app, it is always much easier to use a fault tolerant service than work at building your own fault tolerant service. Here are a couple of good examples.
S3 replicates data around the region and provides 11 9s of data durability on objects.
DynamoDB writes to at least 2 AZs before responding with a successful commit, and it does all of this in single-digit milliseconds.
ELB allows you to load balance to EC2 instances, both publicly and privately, in multiple AZs in a region, and is self-healing and scales organically with your traffic.
SQS replicates your messages around the region, and these messages can be reached by API endpoints in any of the AZs.
Lastly, there are many services that are specific to an AZ or datacenter. Many of these services can be deployed in patterns to make the architecture HA or FT, but the services are not FT by default.
EC2 can be run in multiple AZs, something we are going to talk about more.
So let’s get started at day one, user one…which is most likely you
This here is the most basic set up you would need to serve up a web application.
Any user would first hit Route53 for DNS resolution.
Behind the DNS service is an EC2 instance running our webapp and database on a single server.
We will need to attach an Elastic IP so Route 53 can direct traffic to our webstack at that IP address.
To scale this infrastructure, the only real option we have is to get a bigger EC2 instance…
Vertically scaling the one EC2 instance we have to a larger one is the simplest approach to start with, and that is where we will begin.
There are a lot of different AWS instance types to choose from depending on your workload. We call these instance families, as they have different characteristics.
We have our instances that have a linear CPU to Memory ratio such as the M3s.
Some have high I/O such as our I2 instance, others are CPU optimized such as our C3 and C4 instances,
Others are Memory optimized such as our R3 instances, and we have some that are IO or storage optimized such as our I2 or D2 instances.
Inside of each instance family are different sizes ranging from Micro in the T2 family to 8XL in many of the other families. This allows you to vertically scale inside the family that best supports your workload.
You can also make use of EBS-Optimized instances and Provisioned IOPS to help scale the storage for this instance quite a bit.
This is all great at the beginning, but the key concern is that you WILL hit a ceiling: at some point there simply isn't a bigger instance class available yet, so scaling this way, while it can get you over an initial hump, really isn't going to get you very far.
We also have to consider some other issues with this architecture: no failover, no redundancy, and too many eggs in one basket, since we have both the database and the webapp on the same instance.
We need to be more strategic and start to break apart our application for both scaling and redundancy reasons.
The first thing we can do to address too many eggs in one basket is to split out our webapp and database into two instances. This gives us more flexibility to scale these two tiers independently. And since we are breaking out the database, this is a great time to think about using a database service instead of managing the DB ourselves…
So what options do we have?
At AWS there are a lot of different options for running databases. One is to install pretty much any database you can think of on an EC2 instance and manage all of it yourself. If you are really comfortable doing DBA-like activities, such as backups, patching, security, and tuning, this could be an option for you. Also, if you need something highly specialized or customized and need to manage the hardware to achieve it, again this might be for you.
If not, then we have a few options that we think are a better idea:
First is Amazon RDS, or Relational Database Service. With RDS you get a managed database instance of MySQL, Oracle, PostgreSQL, or SQL Server, with features such as automated daily backups, simple scaling, patch management, snapshots and restores, high availability, and read replicas, depending on the engine you go with.
Next up we have DynamoDB, a NoSQL database built on top of SSDs. DynamoDB is based on the Dynamo whitepaper published by Amazon.com back in 2007. That whitepaper is considered the grandfather of most modern NoSQL databases like Cassandra, and DynamoDB is an evolution of it. One of the key concepts in DynamoDB is what we call "zero administration". With DynamoDB the only knobs to tweak are the reads and writes per second you want the DB to be able to perform at. You set it, and it will give you that capacity, with query responses averaging single-digit milliseconds. We've had customers with loads such as half a million reads and writes per second without DynamoDB even blinking.
Lastly we have Amazon Redshift, a multi-petabyte-scale data warehouse service. Redshift is managed and massively parallel, and you speak ANSI SQL to it over the wire. With Redshift, much like most AWS services, the idea is that you can start small and scale as you need to, while only paying for what you are using. What this means is that you can start on a smaller single node and scale your DW cluster as your workload requires. Redshift is also several times cheaper than most other data warehouse providers.
Given that we have all these different options, from running pretty much anything you want yourself, to making use of one of the database services AWS provides, how do you choose? How do you decide between SQL and NoSQL?
So why start with SQL databases? Generally speaking, SQL-based databases are established, well-worn technology.
There's a good chance SQL is older than most people in this room. It has, however, continued to power most of the largest web applications we deal with on a daily basis.
There is a lot of existing code, and there are books, tools, communities, and people who know and understand SQL. Some of these newer NoSQL databases might have only a handful of companies using them at scale.
People are key here, in addition to all of the other points, as you may need to hire people to manage your database.
You also aren't going to break SQL databases in your first 10 million users. And yes, there is an asterisk here; we'll get to that in a second. Lastly, there are a lot of clear patterns for scalability that we'll discuss throughout this talk.
So as for my point at the bottom, I again strongly recommend SQL-based technology, unless your application is doing something SUPER weird with the data, or you'll have MASSIVE amounts of it; even then, SQL will be in your stack.
So you are thinking to yourself… AH HA! You said "massive amounts", and we will have massive amounts of data. There could be a few of you in this room who at the start would need something like NoSQL… well, let's clarify this a bit.
If your usage is such that you will be generating several TB of data in the first year, OR you will have an incredibly data-intensive workload, then you might need NoSQL.
So why else might you need NoSQL? There are definitely use cases where it makes sense to go NoSQL right off the bat. Some examples:
Super low latency applications.
Metadata-driven datasets.
Highly unrelational data.
Going along with the previous point, cases where you really NEED schema-less data constructs. Let's highlight the word NEED here. This isn't just developers saying it's easy to make apps without schemas; that's just laziness.
Massive amounts of data, again from the previous slide, in the several-TB range.
Rapid ingest of data, where you need to ingest potentially thousands of records per second into a single dataset.
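The rapid-ingest case usually means batching writes rather than issuing one request per record, the way a DynamoDB or Kinesis batch write would. Here is a minimal sketch; `BatchWriter` and its `sink` callback are hypothetical names, not a real AWS client:

```python
# Batching sketch for high-rate ingest: buffer records in memory and
# flush them to the sink in fixed-size batches.
class BatchWriter:
    def __init__(self, sink, batch_size=25):
        self.sink = sink            # callable that accepts a list of records
        self.batch_size = batch_size
        self.buf = []

    def write(self, record):
        """Buffer one record; flush automatically when the batch is full."""
        self.buf.append(record)
        if len(self.buf) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send any buffered records to the sink as one batch."""
        if self.buf:
            self.sink(list(self.buf))
            self.buf.clear()
```

Batch size 25 mirrors the kind of per-request item limit batch APIs tend to impose; a real writer would also retry partial failures.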
So for this scenario today, and based upon our discussion, we're going to go with RDS and MySQL as our database engine.
Next up we need to address the lack of failover and redundancy in our infrastructure.
We’re going to do this by adding in another webapp instance, and enabling the Multi-AZ feature of RDS, which will give us a standby instance in a different AZ from the Primary.
We’re also going to replace our EIP with an Elastic Load Balancer to share the load between our two web instances
Now we have an app that is a bit more scalable and has some fault tolerance built in as well.
The architecture: Route 53 to the ELB; the ELB to the web instances; the master for writes and read replicas for reads.
The stack is fault tolerant and has scaled quite a bit from where we started.
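The "master for writes, replicas for reads" routing can be sketched as a tiny dispatcher. The endpoint names are made up for illustration; a real app would use its driver's connection handling rather than inspecting SQL strings:

```python
import random

# Assumed endpoint names: writes go to the Multi-AZ master,
# reads are spread across the RDS read replicas.
MASTER = "mydb-master.example.internal"
REPLICAS = ["mydb-rr1.example.internal", "mydb-rr2.example.internal"]

def endpoint_for(query: str) -> str:
    """Pick a DB endpoint based on whether the statement writes."""
    verb = query.lstrip().split()[0].upper()
    is_write = verb in {"INSERT", "UPDATE", "DELETE"}
    return MASTER if is_write else random.choice(REPLICAS)
```

Note the trade-off this implies: replicas lag the master slightly, so read-your-own-writes paths should still go to the master.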
Most of you will get to this point and be pretty well off honestly.
You can take this really far for most web applications.
We could scale this out over another AZ maybe. Add in another tier of read replicas.
but it's not that efficient in either performance or cost, and since those are important too, let's clean up this infrastructure a bit.
As mentioned, we can start by moving any static assets from our webapp instances to S3, and then serving those objects via CloudFront.
This would be all of your images, videos, css, javascript and any other heavy static content.
These files can be served via an S3 origin (more on S3 in the next slide) and then globally cached and distributed via CloudFront.
This will take load off your webservers and allow you to reduce your footprint in that web tier.
We can also move things like session information to a NoSQL DB like DynamoDB, or to a cache like ElastiCache. For our scenario, we will use DynamoDB for this.
We can also use ElastiCache to store some of our common database query results, which will prevent us from hitting the database too much.
This should take load off of our DB tier.
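The query-result caching described here is the classic cache-aside pattern. In this minimal sketch an in-process dict stands in for ElastiCache, and `db_query` is a hypothetical stand-in for a real database call:

```python
# Cache-aside sketch: check the cache first, hit the database only on
# a miss, then populate the cache for the next caller.
cache = {}

def get_user(user_id, db_query):
    """Return the cached result for user_id, querying the DB on a miss."""
    if user_id in cache:
        return cache[user_id]          # cache hit: no DB load
    result = db_query(user_id)         # cache miss: one DB round trip
    cache[user_id] = result
    return result
```

A real deployment would also set a TTL on each entry so stale data eventually expires, which ElastiCache (memcached/Redis) supports natively.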
Removing session state from our web / app tier is also very key as it allows us to scale up and down without losing session information when this horizontal scaling happens.
Lastly, let’s move CloudFront in front of our dynamic origin or webstack as well as our static origin on S3.
We are going to pump the entire site through CloudFront. This could allow us to cache all sorts of page content at the edge, and greatly speed up our site performance to our end users, while significantly lowering the load on our infrastructure.
As mentioned, CloudFront will also accelerate connections back to your dynamic origin.
Imagine a “typical” week of traffic to Amazon.com. This pattern might look a lot like your own traffic, with a peak during the middle of the day, and a low down in the middle of the night.
Given this pattern, it becomes really easy to do capacity planning. We can provision, say, 15% over what we see our normal peak to be, and be happy with the capacity we have for a while, so long as our traffic matches this pattern. BUT, what if I told you that this was the first week of November for Amazon.com?
And this was the month of November! We can see a pretty big growth here at the end of the month with Black Friday and Cyber Monday sales in the US.
If we applied our "add 15% capacity for spikes" rule, we'd be in trouble for the month of November.
That's a lot of potentially wasted infrastructure and cost: potentially 76% wasted, while only 24% of it on average gets utilized over the month. Traditionally this is how IT did things: you bought servers based on a 6-12 month vision of what growth might be. So, since we can all agree this is bad, what is the solution?
Well what if we could map our capacity directly to our needs based on our end-users? We could make sure that at any point in time, we were only provisioning what we needed, vs some arbitrary capacity guess.
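The waste argument can be made concrete with a few lines of arithmetic. The demand numbers below are purely illustrative (not Amazon.com's actual traffic); the point is how little of a fixed, spike-sized fleet gets used on an average day:

```python
def wasted_fraction(demand, provisioned):
    """Fraction of a fixed provisioned capacity left idle over a period."""
    used = sum(min(d, provisioned) for d in demand)
    return 1 - used / (provisioned * len(demand))

# Illustrative only: flat provisioning sized for one end-of-month spike.
demand = [20, 25, 30, 25, 20, 100]
waste = wasted_fraction(demand, provisioned=115)  # roughly two thirds idle
```

Mapping capacity to demand with Auto Scaling drives this fraction toward zero, since `provisioned` tracks `demand` instead of the worst-case peak.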
If we add in Auto Scaling, our caching layers (both inside and outside our infrastructure), and the read replicas with MySQL, we can now handle a pretty serious load. This could potentially even get us into the millions of users by itself if it continued to be scaled horizontally and vertically.
But this is a monolith. All of the application logic is running on each server. Can we make this better? We will dive into this topic after another short discussion.
To do this, you will need to add automation to your deployments.
There are tools to automate deployment of AWS resources
There are other tools that manage deployment of your software and configuration of an instance
Lastly, you will want to monitor your application and analyze what users are doing on your application. This can be done with metrics, logs, and analytics.
AWS provides a lot of tools to choose from in this area and we will discuss a few.
Elastic Beanstalk is the easiest to start with, but offers less control. EB lets you avoid worrying about managing the infrastructure for your application. You simply deploy your app, such as a Ruby app in a Ruby container, and EB takes care of scaling it and managing it.
OpsWorks allows you to manage the lifecycle of your application in layers with Chef recipes. We have out-of-the-box recipes for managing many different types of layers, and you can write custom Chef recipes to manage any layers we don't support.
CloudFormation is infrastructure as code. It is a template-based tool with its own language, so there's a bit of a learning curve, but it is very, very powerful. You define JSON templates that describe what infrastructure you want to build and any relationships between the pieces.
Lastly, you could do all this manually, but at scale it's nearly impossible without a huge team.
Recently we also introduced a number of services for storing your code, deploying your code, and managing continuous delivery and release automation. These services are called CodeCommit, CodeDeploy, and CodePipeline. I would take a look at these as well.
So now we are at a load of greater than 500K users.
At this point you could start running into issues with the speed and performance of your application.
You will want to make sure your application is properly instrumented and that you are monitoring, gathering metrics, and performing log collection.
There are many providers out there offering solutions in this space. AWS has tools such as CloudWatch, and there are many other choices out there that integrate well with AWS.
Also listen to your customers. Send out a survey…gather feedback from super users, etc. Your users probably know best what works and what doesn’t perform well.
Remember your goal is to squeeze as much performance out of every tier as possible, and without instrumenting your app, it is very hard to tell how you are performing.
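Instrumenting a tier can start as small as a latency-recording decorator. This is a sketch only; in production the sink would be CloudWatch or a third-party agent rather than an in-memory list:

```python
import functools
import time

def timed(metrics):
    """Decorator sketch: append (function name, seconds) to `metrics`
    for every call, even when the call raises."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                metrics.append((fn.__name__, time.perf_counter() - start))
        return inner
    return wrap
```

Aggregating the recorded pairs per function name gives exactly the "aggregate metrics per tier" view the slides call for.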
Here are some examples.
Host-level metrics are great for deep diving on problems,
but aggregate metrics will be more valuable as a bigger picture of what is going on with your infrastructure.
Log analysis is also very much needed, and incredibly powerful to have in your infrastructure. Don't skimp on it; log everything centrally. Here is an example of logging with CloudWatch Logs, but there are many other solutions out there from third-party providers as well.
Lastly we have external site metrics. It's amazing how many people don't think about this last one. You need to understand how your site is performing from the point of view of your end users. You might do this with tools such as New Relic, which can also dig into application performance monitoring.
(The top two are from CloudWatch, the bottom left is from CloudWatch Logs, and the bottom right is Pingdom.)
We are still in a position where we have a very monolithic application, with all logic and functionality running on the webserver. We have a single webapp tier doing all of our application workload. While that works for some sites and applications, for many it doesn't, which brings us to our next topic…
So with SOA, you want to move different services your application is performing into different tiers or modules.
From there, you can treat them as separate pieces of your infrastructure that you can scale them and make them fault tolerant separately.
Amazon.com and AWS have hundreds of services under the hood that represent the sites and services you see. It’s a core principle in application/service development at Amazon.
AWS has quite a few services that can solve key functionality areas in your application.
Combining loose coupling, SOA, and prebuilt services, can also really have some huge advantages.
Instead of writing all these mini services yourself, try and leverage already existing services and applications, especially when you are starting out.
DON’T REINVENT THE WHEEL! For example, at AWS we have services to help you with Email, Queues, Transcoding, Search, Databases, and Monitoring and Metrics. Lean on other 3rd parties for more.
Loosely coupling the different tiers of your architecture and using SOA gives you the ability to move quickly.
When services are loosely coupled, they can scale and be made fault tolerant independently of each other.
The looser the coupling between services, the larger they can scale.
So remember:
Design everything as a black box.
Build separate services instead of something that interacts tightly with something else.
Use common interfaces or common APIs between components.
And remember to favor services with built-in redundancy and scalability rather than building your own, such as the two examples provided below.
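The queue-in-the-middle pattern behind this advice can be sketched with the standard library; here `queue.Queue` stands in for Amazon SQS, so the names and the single-process setup are illustrative only:

```python
import queue

# Decoupling sketch: the producer tier never talks to the worker tier
# directly; both only know about the queue between them.
jobs = queue.Queue()

def producer(task):
    """Front-end tier: enqueue work and return immediately."""
    jobs.put(task)

def drain():
    """Worker tier: pull and process queued messages at its own pace."""
    done = []
    while not jobs.empty():
        done.append(jobs.get())
        jobs.task_done()
    return done
```

Because the two sides share only the queue, either one can be scaled, replaced, or taken down without the other noticing, which is exactly what SQS gives you across machines.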
To reach a million, you will want to iterate on many of the previous things such as:
Use of multi-AZ; maybe introduce a 3rd AZ
Use of ELBs between tiers to loosely couple the tiers
Use SOA principles and services that already exist that are built to scale with you
Offload your web tier by removing static content from the web tier and using caching in front of your static and dynamic origins
Make sure to cache in front of your DB and use read replicas where appropriate for application reads or any offline reporting or analytics
Also, move your session state off your tiers that autoscale; making them stateless
This diagram is missing the 2nd (and maybe third) AZ, but we’ve only got so much room on the slide.
But we can see we’ve added in some internal pools for different tasks perhaps.
Maybe we're using SQS for something, and we have SES for sending our outbound email.
We may have Lambda catching events or items from S3 and DynamoDB and processing them.
We are using caching for the database and have built stateless tiers for our web and app tiers
Again our users will still talk to Route53, and then to CloudFront to get to our site and our content hosted back by our ELB and S3.
So what are the next big steps we need to think about?
When we start getting into the 5M user plus range, we may start seeing database contention issues on writes to the Master
We are going to drill into a couple of techniques to solve these types of issues, and those include Federation and Sharding
Multi-AZ your infrastructure. Maybe add a third AZ when you get to large scale, or do that from the beginning.
Make use of self-scaling and fault tolerant services such as ELB, S3, SNS, SQS, SWF, and SES.
Build in redundancy at every level of your application.
Start with SQL. Seriously.
Cache data both inside and outside your infrastructure with the techniques we discussed.
Use automation tools in your infrastructure to deploy your app and make it repeatable.
Make sure you have instrumented your app so you know what's going on.
Use SOA principles to split things into separate modules, and reuse services that are available where you can.
Take advantage of Auto Scaling when it is time and your workload shows some variance.
Remember not to reinvent the wheel, and only move to NoSQL when it makes sense.