Harish Ganesan
CTO
8KMiles
2013
P1) This presentation is
P2) strongly inspired by “Guy Ritchie” movies
P3) Disclaimer: All images are downloaded from the internet. If you find any of the content or images violating copyright, please let me know and I will act upon it immediately.
AGENDA
• Case
• Challenge
• Solution
• Learnings
• About us
Case
Cigarette smoking is injurious to health
• Mobile Advertising company, USA
• Forbes 1000 clientele
• TBs of unstructured data -> Big Data problem
Lock
• Hourly ~1 TB
• CDN Logs
• Text Files
• XML Files
• Geo data files
• Server logs
• DB records
STOCK
• Reduce the cost leakage
• How to save $$$?
Challenges
• Daily analysis was OK, monthly was a pain, and historical analysis was almost dead
• How do we transfer, store, analyze and share?
• How do we optimize costs at this scale?
Solution
Cigarette smoking is injurious to health
• Use AWS Cloud for hosting the analytics module
• Amazon EMR for unstructured log analysis
• Automation using scripts, Java code and other tools
Stage 1: Data Transfer
[Diagram labels: social / 3rd-party feeds, cloud logs]
• Tsunami UDP
• ~1 TB of uncompressed logs every hour
• High-bandwidth EC2 instances for Tsunami UDP
• Other popular options:
  • Aspera
  • AWS Import/Export
  • WAN optimization
  • AWS Direct Connect
Stage 2: Storage
[Diagram labels: social / 3rd-party feeds, cloud logs, Amazon S3]
• Amazon Web Services building block: S3
• Scalable object store
• Inherently fault tolerant
• ~2 TB of compressed logs every day
• S3 Reduced Redundancy (RR) option for intermediate outputs
• Amazon Glacier for archival
Stage 3: Analyze
[Diagram labels: social / 3rd-party feeds, cloud logs, Amazon S3, Elastic MapReduce]
• Amazon's Elastic MapReduce service
• Minimal setup time
• Log analysis
• ~2000 mappers / 750 reducers at peak
• ~250 m1.xlarge task nodes (1000 cores, 3750 GB RAM) at peak
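The peak figures for Stage 3 are consistent with an m1.xlarge offering 4 virtual cores and 15 GB of RAM per instance. A minimal sketch of that arithmetic follows; the per-node figures are an assumption taken from the AWS instance specification, since the slides give only the totals, and the function name is illustrative.

```python
# Per-node shape of an m1.xlarge. Assumption: 4 virtual cores and 15 GB RAM,
# as listed by AWS for that instance type; the slides state only the totals.
M1_XLARGE_CORES = 4
M1_XLARGE_RAM_GB = 15


def cluster_capacity(node_count, cores_per_node=M1_XLARGE_CORES,
                     ram_gb_per_node=M1_XLARGE_RAM_GB):
    """Return (total_cores, total_ram_gb) for a homogeneous group of nodes."""
    if node_count < 0:
        raise ValueError("node_count must not be negative")
    return node_count * cores_per_node, node_count * ram_gb_per_node


if __name__ == "__main__":
    cores, ram_gb = cluster_capacity(250)
    print(f"250 task nodes -> {cores} cores, {ram_gb} GB RAM")
```

For 250 nodes this reproduces the 1000 cores and 3750 GB of RAM quoted on the slide.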
• Amazon EMR is great
• But adding Spot EC2 is super cool
Wait!!!
What is Amazon Spot?
• Suited to time-flexible, interruption-tolerant tasks
• Bid price and Spot price
• m1.xlarge price comparison:
  • $0.480 per hour - On-Demand
  • $0.052 per hour - Spot
• You will never pay more than your maximum bid price per hour
• A Spot Instance may be interrupted
• If interrupted, you will not be charged for any partial hour of usage (*free)
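The price comparison above can be turned into a quick savings estimate. This is a minimal sketch using only the two m1.xlarge prices quoted on the slide; the function names are illustrative, and real Spot prices fluctuate, so the result is a snapshot rather than a guarantee.

```python
def savings_percent(on_demand_hourly, spot_hourly):
    """Percentage saved by paying the Spot price instead of On-Demand."""
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return (on_demand_hourly - spot_hourly) / on_demand_hourly * 100.0


def hourly_cluster_cost(node_count, hourly_price):
    """Cost of running node_count instances for one hour at hourly_price."""
    return node_count * hourly_price


if __name__ == "__main__":
    on_demand, spot = 0.480, 0.052  # m1.xlarge prices quoted on the slide
    print(f"Saving per instance-hour: {savings_percent(on_demand, spot):.1f}%")
    print(f"250 nodes On-Demand: ${hourly_cluster_cost(250, on_demand):.2f}/hour")
    print(f"250 nodes Spot:      ${hourly_cluster_cost(250, spot):.2f}/hour")
```

At the quoted prices this works out to roughly 89% per instance-hour, or about $120 versus $13 per hour for 250 nodes, before accounting for interruptions.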
Spot Bidding Strategies
• Just above the Spot price
• Between the Spot price and the On-Demand price
• At the On-Demand price
• Above the On-Demand price
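The four strategies above differ only in where the bid sits relative to the current Spot price and the On-Demand price. The sketch below labels a bid accordingly; the 5% margin that defines "just above" is a hypothetical threshold chosen for illustration, not a figure from the slides.

```python
import math

# Hypothetical: a bid within 5% above the Spot price counts as "just above".
JUST_ABOVE_MARGIN = 0.05


def classify_bid(bid, spot_price, on_demand_price, margin=JUST_ABOVE_MARGIN):
    """Name which of the four listed strategies a bid corresponds to."""
    if bid < spot_price:
        return "below Spot price"
    if math.isclose(bid, on_demand_price):
        return "On-Demand price"
    if bid > on_demand_price:
        return "above On-Demand price"
    if bid <= spot_price * (1 + margin):
        return "just above Spot price"
    return "between Spot and On-Demand price"


if __name__ == "__main__":
    for bid in (0.054, 0.200, 0.480, 0.550):
        print(f"${bid:.3f} -> {classify_bid(bid, 0.052, 0.480)}")
```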
Spot Price Variations - AZ
Amazon EMR with Spot Instances

Project                    Master Instance Group    Core Instance Group    Task Instance Group
Long-running clusters      on-demand                on-demand              spot
Cost-driven workloads      spot                     spot                   spot
Data-critical workloads    on-demand                on-demand              spot
Application testing        spot                     spot                   spot
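The table above can be encoded as a lookup so that cluster-creation code picks the purchase option per instance group from the workload type. This is a sketch only: the dictionary keys and the function name are illustrative, while the values are copied from the table.

```python
ON_DEMAND = "on-demand"
SPOT = "spot"

# Purchase option per EMR instance group, keyed by workload type (from the table).
EMR_PURCHASE_PLAN = {
    "long-running clusters": {"master": ON_DEMAND, "core": ON_DEMAND, "task": SPOT},
    "cost-driven workloads": {"master": SPOT, "core": SPOT, "task": SPOT},
    "data-critical workloads": {"master": ON_DEMAND, "core": ON_DEMAND, "task": SPOT},
    "application testing": {"master": SPOT, "core": SPOT, "task": SPOT},
}


def purchase_options(workload):
    """Return the {group: market} mapping for a workload type."""
    key = workload.strip().lower()
    if key not in EMR_PURCHASE_PLAN:
        raise ValueError(f"unknown workload type: {workload!r}")
    return EMR_PURCHASE_PLAN[key]


if __name__ == "__main__":
    for name in EMR_PURCHASE_PLAN:
        print(name, "->", purchase_options(name))
```

In every row the task group runs on Spot; only the master and core groups vary with how much the workload can tolerate losing nodes.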
Stage 4: Custom EMR Manager
[Diagram labels: social / 3rd-party feeds, logs, Amazon S3, Elastic MapReduce, Custom EMR Manager]
• We created a Custom EMR Manager
• Choose Spot based on:
  • Past price trend intelligence
• Choose the AZ based on current market prices
• Choose between Large and Extra Large instances
• Spot pricing strategy:
  • Set the Spot bid price = On-Demand price
  • At times, bid above it by less than 20% of the On-Demand price
• Dynamic sizing of the Core / Task nodes
• Dynamic EMR cluster creation
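Two of the decisions listed for the Custom EMR Manager, picking the Availability Zone and setting the bid, can be sketched as below. This is not the manager's actual code, which the slides do not show: the zone names and prices are made up, and reading "<20%" as a cap on how far the bid may exceed the On-Demand price is an assumption.

```python
def cheapest_zone(spot_prices_by_zone):
    """Return (zone, price) for the lowest current Spot price."""
    if not spot_prices_by_zone:
        raise ValueError("no Spot prices supplied")
    zone = min(spot_prices_by_zone, key=spot_prices_by_zone.get)
    return zone, spot_prices_by_zone[zone]


def choose_bid(on_demand_price, overbid_fraction=0.0, max_overbid=0.20):
    """Bid at the On-Demand price, optionally overbidding by less than max_overbid.

    Assumption: the slide's "<20%" is treated as a strict upper bound on the
    fraction by which the bid may exceed the On-Demand price.
    """
    if not 0.0 <= overbid_fraction < max_overbid:
        raise ValueError("overbid_fraction must be in [0, max_overbid)")
    return round(on_demand_price * (1 + overbid_fraction), 4)


if __name__ == "__main__":
    # Hypothetical current Spot prices per Availability Zone.
    prices = {"us-east-1a": 0.052, "us-east-1b": 0.061, "us-east-1c": 0.058}
    zone, price = cheapest_zone(prices)
    print(f"Launch in {zone} (Spot ${price:.3f}), bid ${choose_bid(0.480):.3f}")
```

Bidding at the On-Demand price trades some of the possible saving for a lower chance of interruption, since the instance is only reclaimed when the Spot price rises above the bid.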
Some Spot Use Cases
• Analytics & Big Data
• Scientific computing
• Web crawling
• Financial model and Analysis
• Testing
• Image & Media Encoding
66% savings
50% savings
57% savings
Learning
• Spot + On-Demand EC2 is a deadly combination for cost savings
• Every millisecond matters in MapReduce: tune your code
• Merge files: bigger ones are better for processing
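The advice to merge files can be sketched as a planning step that groups many small log files into batches close to a target size before they are concatenated. The function name, the greedy in-order policy, and the sizes in the demo are illustrative assumptions; the slides do not say how the merging was done.

```python
def plan_merges(files, target_bytes):
    """Greedily group (name, size) pairs, in order, into batches.

    A batch is closed as soon as adding the next file would push it past
    target_bytes. A single file larger than the target gets its own batch.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    batches, current, current_size = [], [], 0
    for name, size in files:
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical file sizes in MB, with a 128 MB target per merged file.
    small_files = [("a.log", 40), ("b.log", 50), ("c.log", 60), ("d.log", 30)]
    for batch in plan_merges(small_files, 128):
        print(batch)
```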
More Learning…
• We designed a custom Job Manager
• One file per mapper worked better for our case on AWS
• Understand the performance constraints of AWS and work with them
• Compress data, both in storage and in transit (LZO and Snappy)
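To get a feel for why compressing log data pays off in both storage and transfer, the sketch below measures a compression ratio on repetitive log-like text. The slides name LZO and Snappy, which need third-party bindings in Python, so this example deliberately substitutes the standard-library zlib codec; the ratios and speeds of LZO and Snappy will differ, and the sample log lines are made up.

```python
import zlib


def compression_ratio(data, level=6):
    """Return original_size / compressed_size using zlib at the given level."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(data) / len(zlib.compress(data, level))


if __name__ == "__main__":
    # Made-up, repetitive log lines for illustration only.
    sample = b"".join(
        b"2013-01-01T00:00:%02d GET /ad?id=%d 200\n" % (i % 60, i)
        for i in range(10000)
    )
    print(f"zlib ratio on sample log text: {compression_ratio(sample):.1f}x")
```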
Continued…
• Keep configuration data in local memory or Amazon DynamoDB
• Have reducers split their output into files suitable for the next job's mappers
• Elasticity: increase/decrease Task nodes
• Elasticity: create new EMR clusters sized to match the logs (Core + Task)
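The elasticity points above amount to sizing the task group from the volume of logs waiting to be processed. A minimal sketch of such a sizing rule follows; the per-node throughput and deadline are hypothetical inputs, and the default cap of 250 nodes simply reuses the peak task-node count quoted for Stage 3.

```python
import math


def task_nodes_needed(log_gb, gb_per_node_hour, deadline_hours,
                      min_nodes=0, max_nodes=250):
    """Estimate how many task nodes are needed to process log_gb in time.

    gb_per_node_hour is a measured or assumed throughput per node; the result
    is clamped to the range [min_nodes, max_nodes].
    """
    if gb_per_node_hour <= 0 or deadline_hours <= 0:
        raise ValueError("throughput and deadline must be positive")
    needed = math.ceil(log_gb / (gb_per_node_hour * deadline_hours))
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    # Hypothetical: 1000 GB backlog, 5 GB per node-hour, one-hour deadline.
    print(task_nodes_needed(1000, 5, 1))
```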
Value
• ~56% cost savings compared with a pure On-Demand model for Core + Task nodes
• Automation vastly reduced labor cost (initial + ongoing)
• Customer CXOs were happy
About us
• AWS Premium Partner
• Solution experts in:
  • Cloud Computing
  • Big Data
  • Identity Management
Shoot your questions
Harish@8kmiles.com
http://harish11g.blogspot.com
@harish11g
harishganesan

Cloud Connect 2013- Lock Stock and x Smoking EC2's
