1TB/day 
Logging and counting billions of events. 
Scaling infrastructure using Amazon Web Services. 
Dirk Harms-Merbitz - grasswood@icloud.com
Amazon Web Services 
• Flexible toolkit for building Internet applications 
• Infrastructure as a service 
• Enables very fast growth 
• No commitments, capex replaced by opex
Example 
• Customer signs up on web form, specifies number of 
users, data retention policies, based on business needs. 
• Vendor programmatically spins up an instance from a 
custom AMI with EBS volumes or local storage RAIDed as 
needed to match performance, size, and cost parameters. 
• Whether there is one customer or one thousand, Amazon 
handles the infrastructure and the scaling of resources. 
• Vendor focuses on marketing, support and software 
development.
The AWS Toolkit 
• EC2 = Containers on Demand 
• EBS = Elastic Block Store 
• S3 = Object storage and static HTTP 
• Glacier = Long term storage
Elastic Compute Cloud (EC2) 
• Container for OS and application software 
• Storage is EBS or locally attached 
• / on EBS makes it easy to change instance size 
• Standard or custom AMI 
• An EC2 instance is not a server
Elastic Block Store (EBS) 
• More reliable than hard drives 
• Building blocks for application specific storage 
• Combine as needed using RAID and LVM 
• Different flavors: PIOPS, GP2, magnetic 
• 1TB max, 10 max per instance, 1TB = $50-$388/mo 
• An EBS volume is not a disk
Local storage 
• Directly attached to an instance 
• Lower cost compared to EBS, much faster 
• Survives reboots but disappears when instance 
is stopped or terminated 
• Best used with instance level redundancy: 
RAID0 with the same data on multiple instances 
allows for very fast processing in parallel
Simple Storage Service (S3) 
• Stores objects of up to 5TB 
• 99.99% availability, 99.999999999% durability 
• REST and SOAP interfaces - $5/1M requests 
• HTTP download, easy for customers to access 
• 1TB = $30/mo storage, $120/mo to transfer
Amazon Glacier 
• Long term storage 
• 99.99% availability, 99.999999999% durability 
• $10/mo to store 1TB 
• Cost for getting data out is based on speed 
• Getting data out quickly can become expensive
AWS Optimizations 
• EBS-optimized instances offer better performance. Otherwise 
storage and network traffic compete for the same bandwidth. 
• RAID and LVM are used to combine EBS volumes to 
match application storage size and throughput 
requirements. 
• Two local SSDs combined with RAID0 give double the size and 
speed. Data survives reboots, but snapshots are needed before 
stopping or terminating the instance. 
• Cloud is not just AWS: there are many alternatives, such as 
DigitalOcean and Linode. EBS, however, makes resizing easy.
AWS Pro and Con 
• Not hardware: Intuitions based on physical hardware won’t 
transfer. Everything is throttled. 
• Flexible: Used correctly, you don’t have to think about scaling 
your hardware to millions of users. Good for short term use and testing ideas. 
• Complex: Easy to use incorrectly, with very low performance and 
very high costs possible as a result. 
• Expensive Mistakes: Storing 6TB for three years can cost as 
much as $83,808 or as little as $4,818. 
• If you know what you need, co-location delivers more for less: A 
physical 6TB drive is faster, lasts 3-5 years and costs $299.
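
The upper figure on this slide matches the high end of the EBS price range quoted earlier: 6 TB at $388 per TB per month for 36 months. A minimal Python sketch of that arithmetic follows, using only per-TB prices that appear in the slides. The function name `storage_cost` is illustrative, and the slides do not say how the $4,818 figure was derived, so it is not reproduced here.

```python
# Prices in USD per TB per month, as quoted on the earlier slides.
PRICES = {
    "EBS (high end)": 388,
    "EBS (low end)": 50,
    "S3 storage": 30,
    "Glacier": 10,
}


def storage_cost(tb, months, usd_per_tb_month):
    """Total cost of keeping `tb` terabytes for `months` months."""
    return tb * months * usd_per_tb_month


if __name__ == "__main__":
    for name, price in PRICES.items():
        print(f"{name}: ${storage_cost(6, 36, price):,}")
```

These totals cover storage only; request and transfer charges are ignored.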
AWS 
• Not appropriate for all businesses: Complexity 
cost, rental cost, slow technology updates. 
• Not appropriate for all applications: nobody 
mines bitcoin in AWS. 
• Not appropriate as workaround when 
management is slow in approving hardware.
Tips & Tricks 
• avoid copying data 
• use parallel or pexec 
• speed up ssh, use mosh 
• use fixed length records 
• use raw block devices 
• use bitmaps
avoid copying data 
• write to EBS volume A until full 
• switch to volume B, continue writing 
• detach A and attach to processing instance 
• zero copy when a volume is passed around
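
The hand-over described above could be scripted roughly as follows. This is a sketch, not the author's tooling: the function name `hand_over_volume`, the default device name, and the IDs in the usage note are hypothetical. The `ec2` argument is assumed to be a boto3 EC2 client, and the writer is assumed to have unmounted the volume already.

```python
def hand_over_volume(ec2, volume_id, target_instance_id, device="/dev/sdf"):
    """Move an EBS volume to another instance without copying its contents."""
    # Detach the full volume from the instance that wrote it.
    ec2.detach_volume(VolumeId=volume_id)
    # Block until the volume is free to be attached elsewhere.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    # Attach it to the processing instance; only the attachment changes,
    # the data itself is not copied.
    ec2.attach_volume(
        VolumeId=volume_id,
        InstanceId=target_instance_id,
        Device=device,
    )
```

With boto3 installed, a call might look like `hand_over_volume(boto3.client("ec2"), "vol-aaaa", "i-bbbb")`, where both IDs are placeholders.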
parallel and pexec 
• grep, bzip2, wc, awk, sed use only a single CPU core 
• GNU parallel or pexec make use of all cores, local and even neighbors 
• pexec -o - -f instances -e x -c -- 'rsync -ae ssh /etc/hosts $x:/etc/hosts' 
• parallel ping -c1 ::: host1 host2 host3 host4 
• find -name "*.csv.gz" -print | parallel zgrep "string" 
• find -name "*.csv.gz" -print | parallel zcat >all.txt 
• cat all.txt | parallel --pipe grep 'api_key=xyz' 
• cat all.txt | parallel --pipe wc -l | awk '{s+=$1} END {print s}'
ssh and mosh 
• 30x faster when reusing ssh connections: 
• ControlMaster auto 
• ControlPersist yes 
• ControlPath ~/.ssh/socket-%r@%h:%p 
• mosh (mosh.mit.edu) works well over lossy connections, 
including when changing locations and IP addresses
fixed length records 
• Fixed length records on raw block devices 
• No compressing and uncompressing 
• No parsing of ASCII 
• No file system 
• No overflow possible, write pointer wraps
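
A minimal Python sketch of fixed-length records with a wrapping write pointer is shown below. The record layout (timestamp, user id, event code) and the class name `RecordRing` are assumptions made for illustration; the slides do not specify a format. Any seekable binary file object works, so the same code can be tried on an in-memory buffer before pointing it at a device.

```python
import struct

# Hypothetical layout: 8-byte timestamp, 8-byte user id, 4-byte event code.
# "<" selects little-endian with no padding, so every record is 20 bytes.
RECORD = struct.Struct("<QQI")


class RecordRing:
    """Fixed-length records in a fixed-size region; the write pointer wraps."""

    def __init__(self, dev, capacity):
        self.dev = dev            # seekable binary file object
        self.capacity = capacity  # number of record slots
        self.next_slot = 0

    def append(self, timestamp, user_id, event):
        # The slot offset is a plain multiplication: no parsing, no file system.
        self.dev.seek(self.next_slot * RECORD.size)
        self.dev.write(RECORD.pack(timestamp, user_id, event))
        # Wrap around instead of growing, so the region cannot overflow.
        self.next_slot = (self.next_slot + 1) % self.capacity

    def read(self, slot):
        self.dev.seek(slot * RECORD.size)
        return RECORD.unpack(self.dev.read(RECORD.size))
```

On a real system `dev` might be a block device opened in `r+b` mode; the oldest records are overwritten once the pointer wraps.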
raw block devices 
• Counters on raw block devices 
• By keeping just the lower byte of a counter in RAM, 
the device is accessed only once per 256 increments 
• RAID0 of SSDs can reach 1000-2000MB/s 
• EBS 100MB/s, RAID0 of multiple EBS 800MB/s
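
One way to read the lower-byte trick is sketched below in Python: the low 8 bits of a counter live in RAM, and the device is touched only when they carry over. The class name `SplitCounter`, the 8-byte on-device layout, and the `device_writes` bookkeeping are assumptions for illustration. Note that in this sketch the in-RAM byte would be lost on a crash.

```python
import struct

HIGH = struct.Struct("<Q")  # assumed on-device format for the upper bits


class SplitCounter:
    """Counter whose low byte stays in RAM; the device holds the rest."""

    def __init__(self, dev, offset=0):
        self.dev = dev          # seekable binary file object
        self.offset = offset    # byte offset of this counter on the device
        self.low = 0            # low 8 bits, RAM only
        self.device_writes = 0  # bookkeeping to show the saving

    def _read_high(self):
        self.dev.seek(self.offset)
        return HIGH.unpack(self.dev.read(HIGH.size))[0]

    def increment(self):
        self.low = (self.low + 1) & 0xFF
        if self.low == 0:
            # Carry out of the low byte: one device update per 256 increments.
            high = self._read_high() + 1
            self.dev.seek(self.offset)
            self.dev.write(HIGH.pack(high))
            self.device_writes += 1

    def value(self):
        return (self._read_high() << 8) | self.low
```

After 1,000 increments the device has been written three times rather than 1,000.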
bitmaps 
• Bitmaps for counting things and other uses 
• 100M unique users in 12.5MB of RAM 
• Hourly, Daily, Weekly, Quarterly… 
• 6TB SSD instance = 7000 bits / person on earth
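
The memory figure above follows from one bit per user: 100,000,000 bits / 8 = 12,500,000 bytes. A minimal Python sketch of such a bitmap follows. It assumes users are identified by dense integer IDs, and the class name `Bitmap` and the `union` helper (for example, combining hourly bitmaps into a daily one) are illustrative rather than taken from the slides.

```python
class Bitmap:
    """One bit per user id; setting the same id twice counts once."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)

    def set(self, i):
        self.bits[i >> 3] |= 1 << (i & 7)

    def test(self, i):
        return bool(self.bits[i >> 3] & (1 << (i & 7)))

    def count(self):
        # Number of set bits, i.e. number of unique ids seen.
        return sum(bin(b).count("1") for b in self.bits)

    def union(self, other):
        # Bitwise OR: unique ids across both periods.
        out = Bitmap(self.size)
        out.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return out


if __name__ == "__main__":
    hour1, hour2 = Bitmap(1000), Bitmap(1000)
    for user in (1, 2, 3, 3):
        hour1.set(user)
    for user in (3, 4):
        hour2.set(user)
    print(hour1.count(), hour1.union(hour2).count())
```

The last bullet can be checked the same way: 6 TB is 48 × 10^12 bits, which is a little under 7,000 bits per person for a population of about 7 billion.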

AWS Cloud experience concepts tips and tricks
