3. Storage options on AWS
• Scalable object storage
• Low-cost storage for archiving
• Persistent block storage
• Shared file system
• Gateway for integration
8. S3 usage
[chart: growth, 2012–2015]
102% annual growth in data transferred to and from S3
(Q4 2014 vs. Q4 2013, excluding Amazon's own usage)
13. S3 pricing – pay only for what you use!
1 PB of raw storage
800 TB of usable storage
600 TB of allocated storage
400 TB of application data
Amazon S3
14. Continuous cost reduction: S3
• Available globally in 11 regions
• Billed per GB-month
• 8 price reductions since launch
• 51% average price reduction on April 1, 2014
• TCO: comparing on-premises with S3
  – Can be challenging for some customers
  – We can help!
17. S3 event notifications
Sends notifications to Amazon SNS, Amazon SQS, or AWS Lambda when an S3 event occurs
[diagram: S3 events → notifications → SNS topic, SQS queue, or Lambda function]
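The Lambda leg of this flow can be sketched as a minimal handler. The event layout (Records → s3 → bucket/object) follows the documented S3 notification format; the handler name and the per-object processing step are illustrative assumptions:

```python
def handle_s3_event(event, context):
    """Minimal sketch of a Lambda handler for S3 event notifications.

    The event shape (Records -> s3 -> bucket/object) is the standard
    S3 notification format; what you do with each object is up to you.
    """
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # A real handler might generate a thumbnail here, push a message
        # to a queue, or update a search index.
        processed.append(f"s3://{bucket}/{key}")
    return processed
```

A handler like this would be wired to the bucket through the notification configuration, so it runs automatically on each PUT.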
18. Versioning in S3
• Preserve, retrieve, and restore every version of every object stored in a bucket
• S3 automatically adds new versions and preserves deleted objects using delete markers
• Easily control the number of stored versions using object expiration and lifecycle policies
• Can be enabled easily through the web console
19. S3 cross-region replication
Automated, fast, and reliable asynchronous replication of objects across AWS regions
Source (Virginia) → Destination (Oregon)
• Replicates only new PUTs. Once configured, all new uploads to a bucket are replicated
• Entire bucket, or based on a prefix
• 1:1 replication between 2 regions
• Requires versioning
Use cases:
• Compliance – store your data hundreds or thousands of kilometers away
• Lower latency – distribute your data closer to regional customers
• Security – create remote replicas managed by different AWS accounts
21. S3 use cases
• Web-scale storage capacity and performance
• Origin for content delivered through Amazon CloudFront
• Temporary and persistent storage for Big Data applications
• Storage for backup and archiving
28. Glacier benefits
• Reduce the cost of archiving your data long term
• Unlimited storage capacity
• Replace tape
• Increase durability
29. Amazon S3 – Glacier integration
Policy-based archival service
30. S3 lifecycle policies
Key prefix "logs/"
Move objects to Glacier 30 days after creation
Delete 365 days after creation
<LifecycleConfiguration>
<Rule>
<ID>archive-in-30-days</ID>
<Prefix>logs/</Prefix>
<Status>Enabled</Status>
<Transition>
<Days>30</Days>
<StorageClass>GLACIER</StorageClass>
</Transition>
<Expiration>
<Days>365</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>
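As a sanity check, a lifecycle document like the one above can be inspected with any XML library; this sketch just reads back the transition and expiration values with Python's standard library:

```python
import xml.etree.ElementTree as ET

# The lifecycle rule from the slide: archive to Glacier after 30 days,
# expire after 365, for objects under the "logs/" key prefix.
LIFECYCLE_XML = """\
<LifecycleConfiguration>
  <Rule>
    <ID>archive-in-30-days</ID>
    <Prefix>logs/</Prefix>
    <Status>Enabled</Status>
    <Transition>
      <Days>30</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
    <Expiration>
      <Days>365</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>"""

rule = ET.fromstring(LIFECYCLE_XML).find("Rule")
prefix = rule.findtext("Prefix")                    # "logs/"
to_glacier_days = int(rule.findtext("Transition/Days"))
expire_days = int(rule.findtext("Expiration/Days"))
```

In practice this XML is what you would upload as the bucket's lifecycle configuration (via the console, CLI, or an SDK).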
31. SoundCloud – uses Glacier in its audio transcoding workflow
• The leading social network in the music and audio market
• Audio files need to be transcoded and stored in multiple formats
• Stores petabytes of data
• Transcoded files are served from S3
• Originals are moved to Glacier to reduce costs
32. Use cases for S3 lifecycle policies
• Data tiering in the cloud
• Managing object versioning to logically protect data
• Policy-based deletion of data in Glacier
40. EBS
General Purpose (SSD)
up to 16 TB
10,000 IOPS
up to 160 MB/s
Provisioned IOPS (SSD)
up to 16 TB
20,000 IOPS
up to 320 MB/s
41. EBS: price and performance
Magnetic
• Use cases: infrequently accessed data
• Storage media: magnetic-disk backed
• Max IOPS: 40–200
• Latency (random read): 20–40 ms
• Availability: designed for 99.999%
• Price: $.05/GB-month + $.05/million I/O
General Purpose (SSD)
• Use cases: boot volumes; small to medium databases; development and test
• Storage media: SSD backed
• Max IOPS: 10,000
• Latency (random read): 1–2 ms
• Availability: designed for 99.999%
• Price: $.10/GB-month
Provisioned IOPS (SSD)
• Use cases: I/O-intensive workloads; relational databases; NoSQL databases
• Storage media: SSD backed
• Max IOPS: 20,000
• Latency (random read): 1–2 ms
• Availability: designed for 99.999%
• Price: $.125/GB-month + $.065/provisioned IOPS
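The pricing dimensions above combine differently per volume type; a small sketch makes the comparison concrete (these are the circa-2015 prices from the table, used purely for illustration, not current pricing):

```python
def ebs_monthly_cost(volume_type, size_gb, provisioned_iops=0, millions_of_io=0.0):
    """Rough monthly cost using the illustrative prices in the table above."""
    if volume_type == "magnetic":
        # Magnetic charges per GB plus per million I/O requests.
        return 0.05 * size_gb + 0.05 * millions_of_io
    if volume_type == "general_purpose":
        # gp2 is a flat per-GB charge.
        return 0.10 * size_gb
    if volume_type == "provisioned_iops":
        # PIOPS charges per GB plus per provisioned IOPS.
        return 0.125 * size_gb + 0.065 * provisioned_iops
    raise ValueError(f"unknown volume type: {volume_type}")

# A 500 GB gp2 volume vs. a 500 GB PIOPS volume provisioned at 4,000 IOPS:
gp_cost = ebs_monthly_cost("general_purpose", 500)            # $50.00
piops_cost = ebs_monthly_cost("provisioned_iops", 500, 4000)  # $322.50
```

The gap between the two is exactly why the talk recommends gp2 as the default and PIOPS only for the most I/O-intensive workloads.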
44. EBS use cases
• Persistent block storage for Amazon EC2
• Transactional workloads
• Single-instance file systems – NTFS, ext file systems, etc.
46. What is EFS?
• A managed file system for EC2 instances
• Uses the standard commands and interfaces of traditional file systems
• Grows elastically to petabyte scale
• Performance for a wide variety of workloads
• Highly available and durable
Simple (1) · Elastic (2) · Scalable (3)
47. EFS is designed for a broad variety of use cases, such as…
• Content repositories
• Development environments
• Home directories
• Big data
48. (1) EFS is simple
• Fully managed by AWS
  – No hardware or network to provision
  – Create a scalable file system in seconds!
• Seamless integration with existing tools and applications
  – NFS v4 – widely adopted and open
  – Standard file system operations and commands
  – Works with standard OS APIs
• Simple pricing = simple cost estimation
49. (2) EFS is elastic
• File systems grow and shrink automatically as files are added or removed
• No need to provision size or performance
• Pay only for what you use, with no minimum fees or up-front investments
50. (3) EFS is scalable
• File systems can grow to petabyte scale
• Throughput and IOPS scale automatically as the file system grows
• Consistently low latency, regardless of file system size
• Supports thousands of concurrent NFS connections
51. EFS architecture
[diagram: a REGION with AVAILABILITY ZONES 1–3 inside a VPC; EC2 instances in each AZ mount the customer's file system]
52. Why does this matter…
… to app owners and developers?
• Easier migration of existing applications and code that use NFS
• Simple file storage for new cloud-native applications
… to your business?
• Predictable pricing, with no up-front investment
• Increased agility
• Spend less time managing storage and more time focusing on your business
… to IT administrators?
• Eliminates the need to maintain and manage large-scale storage
54. Storage Gateway
Your on-ramp to AWS cloud storage services:
• Backups to S3
• DR to EC2
• Archiving to Amazon Glacier
• iSCSI or VTL
55. Summary: AWS storage services
S3
• Object storage: data presented as objects in buckets
• Data accessed via APIs over the Internet
EFS
• File storage (analogous to a NAS): data presented as a file system
• Low-latency access, shared across multiple EC2 instances
Glacier
• Archival storage: data presented as vaults/archives of objects
• Lowest-cost storage, for data that is accessed infrequently
Storage Gateway
• Backup and archiving of data into Amazon S3 and Amazon Glacier
EBS
• Block storage (analogous to a SAN): data presented as disks or volumes
• Lower-latency access from EC2 instances
So why do we provide a variety of cloud storage services? Data comes in many shapes and sizes. Data sets and applications demand different levels of performance, cost and access interface. It is important to understand what technology will meet your data needs, giving the right performance at the right price.
We'll start with object storage. If you can get away with using object storage for production, it's always going to be the most durable and economical.
One of the most common workloads in the mobile sector is multi-media. With the proliferation of mobile devices, the amount of data that can be sent in simultaneously from a globally distributed network of phones capturing so many moments and associated media can be huge, both in total volume and in number of transactions.
It is not just the consumer segment that is producing masses of data. Industries such as Financial services, healthcare, media and entertainment and advertising are collecting, analysing and storing enormous amounts of data. Advertising tech companies are tracking usage patterns to make sure shoppers only see ads relevant to what they’re looking to buy. I made sure to include a cartoon illustration for you.
Traditional storage architectures and their on-premises administration models were not designed for the sheer volume and unpredictable nature of these new uses. Cloud object storage is. Let's quickly get the definition out of the way – an object is a file that also includes associated metadata. So as opposed to a file that resides in a file system and relies on it for structure and metadata, objects are files stored in a flat structure and contain their own metadata.
Amazon S3, or Simple Storage Service. Simple storage sounds a bit like an oxymoron, but it really takes away all the complexity of provisioning and managing storage. It was the first service AWS released, in 2006; it's mature, proven and used at large scale by companies such as Dropbox for content collaboration, Netflix for video streaming and Pinterest for their site images and big data analysis.
[9 years old]
One of the main reasons our customers love S3 is its scalability. S3 automatically scales to accommodate your capacity and performance needs. Customers no longer have to worry about the underlying storage infrastructure or designing for peaks, which is a very error-prone and expensive practice.
Our customers love the maturity and value proposition of the service, which is growing rapidly, with data transfer rates in and out of S3 growing 95% YoY in Q4 2014.
S3 is highly durable. Your data is stored across three separate facilities giving you geo-redundancy and we can sustain data loss in two facilities simultaneously and your data is still safe, providing a statistical measure of 11 9’s of durability. Consider what it would take to architect for such a level of durability in your own data centres
S3 is inexpensive but we also give you the flexibility of using less durable storage for a lower cost. This is a good choice for reproducible or temporary data sets
S3 is also very simple to manage. There’s no need to partition data into volumes to accommodate system limitations or performance considerations.
To store objects in S3, you create buckets. These are containers for your objects, unlimited in size or number of objects. These objects, with the right permissions, are addressable over the Internet.
This means S3 can be used for everything from storing images, to application data, to backups
S3 can be your web server for static content, allowing you to completely offload static web serving and run dynamic content on EC2.
So at this point, I've got you thinking that S3 is great – but how much does it cost? Let's first discuss what we're going to charge you for… with traditional storage you pay for raw capacity, but after accounting for protection schemes such as RAID, file-system overhead and the need to keep a free storage reserve, you're left with much less actual capacity used by data. With S3 you only pay for used capacity, when you use it. So in this example, for 400 TBs you're really paying for 400 TBs, and this is not accounting for DR copies. This drastic difference affects both CAPEX and OPEX costs.
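The arithmetic behind that example is simple enough to write out. The 3c/GB-month figure is the ballpark price the talk quotes for core regions, used here as an assumption, not an exact current price:

```python
# On-premises: you buy raw capacity up front. After RAID, file-system
# overhead and free-space reserves, 1 PB raw serves only ~400 TB of data.
raw_tb = 1000   # 1 PB purchased
data_tb = 400   # actual application data

# S3: you are billed only for the bytes actually stored.
price_per_gb_month = 0.03   # assumed ~3c/GB-month; varies by region
s3_monthly_usd = data_tb * 1000 * price_per_gb_month

# On-premises utilization of the purchased capacity:
utilization = data_tb / raw_tb   # 0.4 -> 60% of raw capacity carries no data
```

The point is not the exact dollar figure but the billing model: on-premises you pay for the 1 PB regardless, while S3 bills only the 400 TB.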
S3 is priced per GB-month, only for the data you actually store. We have different prices in different regions, and we charge less where our costs are lower. In our core regions prices are 3c and below. Compare that to the cost of purchasing storage systems, maintaining replicas and the associated capacity buffers required, admin effort and DC costs.
AWS is focused on continuously taking cost out of our operation and giving it back to our customers in the form of cost savings. S3 has a strong record of price reductions, the most recent on 4/1/2014 at an average of 51%. So as you can see, doing a multi-year TCO comparison against the current S3 cost is also not exactly apples-to-apples.
We understand that some of these calculations can be complicated so we set up a dedicated team of TCO experts that is ready to help model costs. Please contact your account manager if you need our help!
Many of our customers use S3 as a staging area and persistent store for big data analysis in Amazon services such as EMR and Redshift.
Yelp, a social network focused on local businesses, leverages S3 to develop features like autocomplete-as-you-type, review highlights and ad placement based on detailed log analysis, without worrying about the underlying heavy lifting of storage. These logs, along with user-uploaded photos, are stored on S3.
Back at re:Invent we announced S3 notifications on PUTs. So as soon as your application uploads an object to S3, for example a new hi-res photo, S3 will fire off a notification to either SNS (our Simple Notification Service), SQS (our Simple Queue Service) or Lambda. That allows your application to automatically process thumbnails and resized versions for different mobile devices.
Versioning augments S3’s extreme durability and high availability with logical data protection. Once turned on, S3 will keep a version of all changed objects to protect you from human error or data corruption. You can limit how many versions S3 keeps by leveraging an expiration policy, something I’ll discuss later in this presentation.
Even though S3 provides 11 9’s of durability out of a single AWS region, some of our customers were asking us to automate replication of objects between regions to help them achieve their compliance objectives, lower latency and enhance access security. S3 users can now configure automatic replication of newly created objects with a few clicks of their AWS console. This is supported between any 2 regions at bucket or key name prefix granularity. Many customers told us they plan to use this feature to enhance access security by replicating data between buckets with separate owners. That’s powerful stuff!
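Configuration-wise, replication is just an IAM role plus one or more rules. The sketch below shows the shape of the payload as it would be handed to boto3's `put_bucket_replication`; the bucket names and role ARN are made-up placeholders:

```python
# Shape of an S3 cross-region replication configuration, as accepted by
# boto3's s3.put_bucket_replication(...). All names and ARNs below are
# hypothetical placeholders, not real resources.
replication_config = {
    "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
    "Rules": [
        {
            "Prefix": "",        # empty prefix = replicate the entire bucket
            "Status": "Enabled",
            "Destination": {
                "Bucket": "arn:aws:s3:::my-replica-bucket-oregon",
            },
        }
    ],
}
# Note: versioning must already be enabled on both the source and the
# destination bucket before this configuration is accepted.
```

Setting a non-empty `Prefix` gives the key-name-prefix granularity mentioned above, and pointing the destination at a bucket in another account gives the separate-owner security pattern.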
To summarize, use S3 when you:
Build a web-scale application and don't want to worry about underlying storage capacity and performance. You can also offload static web pages.
Deliver media such as video, music and photos through a CDN with S3 providing the origin store
When you use Amazon's big data analytics services, such as Elastic MapReduce and Redshift. Here S3 provides durable and inexpensive storage for staging and persistence.
Leverage S3's economics, durability and ease of management for use as a backup and archive target.
We heard the feedback around S3, that it was a great product, but when you looked at infrequently accessed data that was to be archived for a long time, you wanted a cheaper solution
Glacier’s pricing per GB is unmatched in the industry, especially for this level of durability. When you mix it with S3’s pricing at 3c or less per GB/month, you can reach blended price points of less than 2c per GB/month
That equates to $120 per TB/Year.
Glacier maintains your new favourite metric of 11 9s of durability, so your data is safe. Compare that to the durability of tape, which is great for backup but not so much for restore operations.
In contrast to S3, retrieving data from Glacier takes 3–5 hours:
Send in a request to retrieve data,
we then notify you when it is ready,
then you can download.
It's important to manage the retrieval costs. If your data is too hot, it will be more economical to store it in S3.
You have 5% of your data per month (prorated to the day) available to retrieve.
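That 5%-per-month, prorated-daily allowance (the retrieval model described in this talk; Glacier's pricing model changed later) is easy to compute:

```python
def glacier_free_retrieval_gb_per_day(stored_gb, days_in_month=30):
    """Daily free retrieval allowance under the model described in the talk:
    5% of stored data per month, prorated to the day."""
    return stored_gb * 0.05 / days_in_month

# With 3 TB (3,000 GB) archived, you could retrieve ~5 GB per day for free:
daily_allowance_gb = glacier_free_retrieval_gb_per_day(3000)
```

Retrievals beyond the daily allowance incurred extra fees, which is exactly the "if your data is too hot, use S3" point above.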
Recycle bin in the cloud
But it gets better…
A customer that makes great use of lifecycle policies is SoundCloud. They are the leading social platform for artists, allowing them to upload songs and sounds and then deliver and share them across multiple devices and social networks. They transcode on S3 for real-time delivery and then move the originals to Glacier to benefit from reduced capacity fees.
Block storage is usually required for high-performance transactional applications such as ERP and relational and NoSQL databases.
Each server comes with instance storage; however, this cannot be transferred between instances and is lost when the instance is disposed of. Great for temp data, etc.
For more persistent data you need EBS. Elastic Block Store is a service which gives you block storage that is replicated within the same AZ. It allows you to provision the performance your applications need, gives you the ability to snapshot volumes to S3 for 11 9s of durability, and provides metrics to monitor the state and performance of the volumes.
You start by creating a volume and attaching to an EC2 instance
As opposed to instance volumes, you can reattach EBS volumes to a new instance. Your data is now persistent.
You can also group multiple volumes together. Maybe you need to segment your data – DB components (data files, logs) on different volumes.
You can also use OS and volume-management tools to create a RAID array, achieving a larger max volume size and increased performance.
Up until 3/2015 EBS allowed you to create a volume of up to 1 TB, but let's not dwell on the past. In March we announced support for what we call larger and faster volumes. The bottom line is we now support up to 16 TBs per EBS volume. So there's much less need now to span an application across multiple EBS volumes, which simplifies management as well as backup and recovery.
Along with the increased size, we've dramatically improved the performance of our SSD-based offerings. GP now supports up to 10,000 IOPS, up from 3,000, and PIOPS 20,000, up from 4,000.
Let's put it all together – when performance matters, use the SSD-based types. GP offers the best price-performance and should always be used for boot volumes. You'll find it's good enough for most applications, which is why it's the default volume type. You get 3 IOPS per GB and can burst up to 3,000 IOPS on smaller volumes. Magnetic is still around; it's inexpensive and should be used for infrequent data access. Finally, for the most demanding and I/O-intensive applications, use PIOPS, which provides consistently high IOPS. You set the specific IOPS you require and get a much higher IOPS-per-GB ratio.
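The GP (gp2) performance rules just described can be sketched as two small functions. The 100-IOPS floor and the caps reflect the 2015-era limits this talk describes; current limits differ:

```python
def gp2_baseline_iops(size_gb):
    # 3 IOPS per provisioned GB, with a floor of 100 IOPS and a
    # 10,000-IOPS cap (the 2015 limits described in the talk).
    return min(max(3 * size_gb, 100), 10_000)

def gp2_peak_iops(size_gb):
    # Volumes whose baseline is below 3,000 IOPS can burst up to 3,000;
    # larger volumes already sit at or above that baseline.
    return max(gp2_baseline_iops(size_gb), 3_000)
```

So a 100 GB volume has a 300-IOPS baseline but can burst to 3,000, while a 4 TB volume sits at the 10,000-IOPS ceiling with no burst behavior to speak of.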
What is really important is to take point-in-time snapshots – this is your backup.
You can take snapshots of your EBS volumes to allow you to roll back to an older version if needed. They are stored in S3 and benefit from its superior durability. They can also be used to increase the size of a volume, and even to migrate across AWS regions for DR scenarios or to duplicate your application in a new region.
You use EBS when you need a persistent block device for EC2 instances, when you have transactional workloads and when you want to deploy a file system on top of EC2, such as NTFS or a Linux file system. Which leads us nicely to our next topic...
Today we’re launching an exciting new AWS Service that allows multiple hosts to easily share, read and write against a common data set.
A highly consistent, highly available, extremely durable and region scale data service. We’re launching – Amazon Elastic File System
Call to action, review and sign up for preview at aws.amazon.com/efs
EFS allows you to deploy workloads that benefit from EC2’s elasticity and economics but also require access to a shared data store. In the past that required you to build your own file server on EC2. We built this service focusing on 3 core principles – simple, elastic and scalable.
It’s fully managed, provides the file system semantics and standard OS compatibility and APIs you expect, elastically grows to PB-scale without sacrificing performance and it’s highly available and durable.
EFS is designed for a broad range of use cases – let me walk you through a few examples
Content management systems store and serve information for a wide range of applications
E.g., online publications, reference libraries.
EFS: durable, high throughput backing store for these applications.
Development and build environments often have hundreds of nodes accessing a common set of source files, binaries, and other resources.
EFS: serve as the storage layer in these environments, providing a high level of IOPS to serve demanding development and test needs.
Many organizations provide storage for individual and team use.
E.g., research scientists commonly share data sets that each scientist may perform ad hoc analyses on, and these scientists also store files in their own home directories.
EFS: an administrator can create a file system where parts of it are accessible to groups in an organization and parts are accessible to individuals.
Big Data workloads often require file system operations and read-after-write consistency, as well as the ability to scale to large amounts of storage and throughput.
EFS: file storage that EC2 clusters can work with directly for Big Data workloads.
Fully managed service – simple to administer file systems at scale
No file layer or hardware to manage – eliminates ongoing maintenance and the constant upgrade/refresh cycle. No volumes, LUNs, RAID groups, provisioning management
Create a scalable file system in seconds. That's probably hard to believe for folks in the room who've managed shared file storage – but I'll prove it to you later in this session.
[GO TO SECOND BULLET]
Simple GB-month pricing makes it easy to forecast costs
[Add to third bullet]: Payment is in the form of a simple GB/month charge.
[Read each bullet point, then add to each]
No need to re-provision, adjust performance settings, or do anything in order for your file system to grow to petabyte-scale. Just does it automatically as you add files
With each GB stored you get a particular amount of throughput and IOPS
Lots of file workloads are latency-sensitive. EFS is SSD-based and low latencies are an important part of our design
Ability to use this as a common data source in large clusters or more broadly for use cases where many nodes are accessing a single source
You can create an unlimited number of PB-scale file systems and mount them through NFS on hosts running on EC2. The data in the file system is replicated across multiple AZs and is available to EC2 instances running in all AZs that are part of an allowed VPC. These writes are immediately consistent across the AZs. You can further use security groups and network ACLs to limit access from the VPC to the mount point, and then use permissions to limit access to specific users and groups.
So now coming back full circle, why does all of this matter…?
[Read slide]
So AWS has a great set of cloud storage services – but how do I move my backup and archive data to leverage the ease of use and cost savings? Here we have the Amazon SGW, which provides an IT-friendly interface to facilitate that.
The SGW is a downloadable virtual machine that supports ESX and Hyper-V. It sits in your data center and presents itself either as an iSCSI block target or a Virtual Tape Library that backup and archive applications can write to without changes to existing workflows. Once data is written into the SGW, it will cache and buffer it on local disk, compress it, and then send it over to S3 or Glacier for persistent storage. It has a very simple pricing model of $125 per month per gateway plus the backend storage costs. You can download a 60-day evaluation copy to easily test whether it fits your requirements.
So what's the conclusion here, besides the fact that I'm not a talented PowerPoint slide designer? Amazon provides a robust set of services to cover any storage requirement. S3 delivers highly scalable, inexpensive, durable, web-accessible object storage. Glacier lowers the price point for archival data sets and integrates with S3. EBS is where you go for high-performance block storage for EC2. EFS delivers highly available, high-performance file system access via NFS v4. And the SGW is a hook into your on-prem data center to simplify backup and archive into S3 and Glacier.