Leveraging big data and high performance computing (HPC) solutions enables your organization to make smarter and faster decisions that influence strategy, increase productivity, and ultimately grow your business. We kick off the Big Data and HPC track with the latest advancements in data analytics, databases, storage, and HPC at AWS. Hear customer success stories and discover how to put data to work in your own organization.
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
1. November 12, 2014 | Las Vegas, NV
Ben Butler, Sr. Solutions Marketing Mgr., Big Data and HPC
2. Big data on AWS
Big data customer success stories
HPC on AWS
HPC Customer Presentation: Honda
AWS resources to get started
Big data and HPC track review: where to go next
30. [Diagram: the corporate data center feeds an elastic data center on AWS, with the results of analysis pulled back into your systems]
32. Sets new large-scale sort record with AWS
●Databricks, founded by the creators of Apache Spark
●Why AWS?
●EC2: fast access to large-scale compute, SSDs, and 10 Gb/s networking
●Agility
33. [Industries represented: Mobile/Cable Telecom, Oil and Gas, Industrial Manufacturing, Retail/Consumer, Entertainment, Hospitality, Life Sciences, Scientific Exploration, Financial Services, Publishing, Media, Advertising, Online Media, Social Networks, Gaming]
34. Sling Uses AWS to Store and Analyze Terabytes of Data
“By using AWS, we can make decisions about new features and offers very quickly and very easily.”
— Dmitry Dimov, Director, Online Services, Sling Media
•Needed to leverage terabytes of usage data to generate user insights and innovate to capture market share
•Using AWS made it possible for Sling to offer a value-add product to its partners
•Stored terabytes of analytics data
•Enabled near real-time ad hoc analytics
•Gained the capacity to scale the database immediately
35. “By using Amazon Redshift, we can process petabytes of data from thousands of marketing campaigns simultaneously while reducing operating expenses by 75%.”
— Zhong Hong, VP, Infrastructure and Operations, VivaKi
36. NDN Uses AWS to Serve 600 Million Videos to Worldwide Users
“Using AWS has enabled us to build a solid platform that has scaled quickly while becoming a source of profit for our customers.”
— Eric Orme, COO and CTO, NDN
•NDN, a global media exchange for publishers and content creators, enables 146 million users a month to watch videos online
•Ingested and stored more than 100,000 video titles per month and served 600 million content plays a month
•Uses Amazon Kinesis to analyze over a billion user-generated events and page loads per day
37. Financial Times Uses AWS to Reduce Infrastructure Costs by 80%
“When our analysts first started to run queries on Amazon Redshift, they thought it was broken because it was working so fast.”
— John O’Donovan, CTO, Financial Times
•Needed a way to increase the speed, performance, and flexibility of data analysis at a low cost
•Using AWS enabled the FT to run queries 98% faster than previously, helping the FT make business decisions quickly
•Easier to track and analyze trends
•Reduced infrastructure costs by 80% compared with a traditional data center model
38. NTT DOCOMO Delivers Voice Recognition Services to Over 60 Million Customers by Using AWS
“I cannot imagine NTT DOCOMO without the AWS Cloud.”
— Minoru Etoh, Senior VP, NTT DOCOMO
•NTT DOCOMO, Inc. is the predominant mobile phone operator in Japan
•DOCOMO launched a popular voice recognition service and experienced large traffic spikes in its mobile network that impacted performance
•DOCOMO decided to migrate its whole environment to AWS last June
•The company built a voice recognition architecture able to scale easily to handle spikes in traffic and serve over 60 million customers
39. Kellogg Uses AWS to Save $900K Over 5 Years Versus On-Premises Infrastructure
“Using AWS saves us $900,000 in infrastructure costs alone, and lets us run dozens of simulations a day so we can reduce trade spend. It’s a win-win.”
— Stover McIlwain, Senior Director, IT Infrastructure Engineering
•Needed a better way to track and model promotional costs (“trade spend”) to improve the bottom line, and needed to run more than one trade-spend simulation per day
•By using SAP HANA on AWS, Kellogg estimates it will save $900,000 over 5 years versus traditional on-premises infrastructure alternatives
•The company can also run dozens of trade-spend simulations each day and has cut deployment time by 30x
40. Baylor College of Medicine Uses AWS to Accelerate Analysis and Discovery
“We are able to power ultra-large-scale clinical studies that require computational infrastructure in a secure and compliant environment at a scale not previously possible.”
— Omar Serang, Chief Cloud Officer, DNAnexus
•Stores more than 430 TB of genomic result data
•Analyzes the genome sequences of more than 14,000 individuals, 5 times faster than with the previous infrastructure
•Enables more than 200 scientists worldwide to share tools and data quickly
42. HG Data uses AWS to process billions of documents for BI monthly
“We used Amazon EMR to make running Hadoop clusters easy, and now we can de-dupe 10+ billion documents.”
— Victor Moreira, CTO, HG Data
43. [Architecture diagram: a Java document crawler on EC2 and a Hadoop document crawler pull documents from the Internet; packaging on EC2 writes them to Amazon S3; Hadoop ETL and analytics run alongside a MongoDB cluster on EC2; an ElasticSearch cluster on EC2, Hadoop analytics, and Java/Python analytics feed MySQL on RDS, which backs the HG API and HG WebApp serving direct clients, enterprise partners, and end users]
55. Why AWS for HPC?
•Low cost with flexible pricing
•Efficient clusters
•Unlimited infrastructure
•Faster time to results
•Concurrent clusters on demand
•Increased collaboration
56. Popular HPC workloads on AWS
•Transcoding and encoding
•Monte Carlo simulations
•Computational chemistry
•Government and educational research
•Modeling and simulation
•Genome processing
60. [Chart: “Scale Using Elastic Capacity”, illustrating scalability on AWS, with the cluster exceeding 600 cores at time +120h]
61. Schrodinger and Cycle Computing: computational chemistry
Simulation by Mark Thompson of the University of Southern California to see which of 205,000 organic compounds could be used for photovoltaic cells as solar panel material. An estimated 264 years of computation completed in 18 hours.
•156,314-core cluster across 8 regions
•1.21 petaflops (Rpeak)
•$33,000 total, or 16¢ per molecule
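The headline figures above are easy to sanity-check. A small sketch (all inputs are the numbers quoted on the slide; only the arithmetic is added):

```python
# Sanity-check the Schrodinger / Cycle Computing run figures.
total_cost_usd = 33_000      # quoted cost of the whole run
molecules = 205_000          # organic compounds screened

cost_per_molecule = total_cost_usd / molecules
print(f"{cost_per_molecule * 100:.0f} cents per molecule")  # -> 16 cents

# "264 years" of serial computation finished in 18 wall-clock hours:
serial_hours = 264 * 365 * 24
wall_clock_hours = 18
print(f"{serial_hours / wall_clock_hours:,.0f}x wall-clock compression")
```

The compression factor (~128,000x) is of the same order as the 156,314-core cluster size, which is what you would expect for a near-perfectly parallel screening workload.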
62. Cost Benefits of HPC in the Cloud
AWS: pay-as-you-go model
•Use only what you need
•Multiple pricing models
On-premises: capital expense model
•High upfront capital cost
•High cost of ongoing support
63. Many pricing models to support different workloads
•On-Demand: pay for compute capacity by the hour with no long-term commitments. For spiky workloads, or to define needs.
•Reserved: make a low, one-time payment and receive a significant discount on the hourly charge. For committed utilization.
•Spot: bid for unused capacity, charged at a Spot price that fluctuates based on supply and demand. For time-insensitive or transient workloads.
•Dedicated: launch instances within Amazon VPC that run on hardware dedicated to a single customer. For highly sensitive or compliance-related workloads.
•Free Tier: get started on AWS with free usage and no commitment. For POCs and getting started.
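The pricing models above differ mainly in how cost scales with usage. As a minimal sketch, a helper can pick the cheapest model for a given job; the hourly rates below are made-up placeholders, not real AWS prices, and real Spot prices fluctuate:

```python
# Hypothetical rates for illustration only (NOT actual AWS pricing).
ON_DEMAND_RATE = 1.00     # $/instance-hour
SPOT_RATE = 0.25          # $/instance-hour; assumes a steady Spot market
RESERVED_UPFRONT = 2000   # one-time payment
RESERVED_RATE = 0.40      # discounted hourly rate after the upfront payment

def cheapest_model(instance_hours: float, interruption_tolerant: bool) -> str:
    """Return the cheapest pricing model for a given total usage.

    Spot is only considered when the workload can tolerate interruption,
    since Spot capacity can be reclaimed when the Spot price rises.
    """
    costs = {
        "on-demand": ON_DEMAND_RATE * instance_hours,
        "reserved": RESERVED_UPFRONT + RESERVED_RATE * instance_hours,
    }
    if interruption_tolerant:
        costs["spot"] = SPOT_RATE * instance_hours
    return min(costs, key=costs.get)
```

With these rates, a short interruption-tolerant batch job lands on Spot, a short interactive job on On-Demand, and sustained round-the-clock usage crosses the break-even point into Reserved.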
64. When to consider running HPC workloads on AWS
New ideas:
•New HPC project
•Proof of concept
•New application features
•Training models
•Benchmarking algorithms
Improvement:
•Remove the queue
•Hardware refresh cycle
•Reduce costs
•Collaboration on results
•Increase innovation speed
•Reduce time to results
65. World’s Largest F500 Cloud Run: transforming drive design to store the world’s data
•Workload: new drive head designs; ran 1 million designs = 70.75 core-years of compute
•Cluster: 70,908 cores on Spot Instances (c3 and r3 with Intel E5-2670 v2), 729 TFLOPS
•Submit jobs and orchestrate HPC clusters over VPC with EBS; encrypt and route data to AWS, return results
•90x throughput: ran in 8 hours, not 30 days
•3 days from idea to running
•Cost: $5,594
67. [Honda product overview: motorcycles, automobiles, power products, ASIMO, HondaJet, UNI-CUB, MC-β, FCX, and the Honda Smart Home System (HSHS); figures as of March 31, 2014 and for April 2013 to March 2014]
“Dreams are the source of our courage and energy to meet every challenge without fear of failure.”
68. We had individual HPC resources at every R&D site.
[Map: North America, South America, Europe, China, Asia/Oceania, and Japan, with separate HPC resources for motorcycle, power products, fundamental research, aircraft engine, automobile, and other R&D]
69. We consolidated HPC resources.
[Map: Europe, Japan, North America, Asia/Oceania, China, and South America all connecting to the Honda DC]
Overall optimization. Globalization.
70. Honda’s HPC needs:
•Use for a certain period
•Parallel, transient clusters
•Trial use
•Need a lot of cores
•High memory
71. Why AWS worked for Honda:
Lead time
•No complicated procedures and screening
•Don’t have to worry about the availability of resources
Agility
•Use the AWS API and start EC2 instances quickly
•Stop anytime you want, with pay-as-you-go
Service
•Choose from several EC2 instance types (including the new types)
•EC2 Spot Instances
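The "use the AWS API and start EC2 instances quickly" point can be sketched with boto3, the AWS SDK for Python. The AMI ID and instance type below are placeholders, and `InstanceMarketOptions` is the later-style Spot request rather than whatever Honda used in 2014; building the request as a plain dict keeps the sketch testable without AWS credentials:

```python
def build_run_request(ami_id: str, instance_type: str, count: int,
                      use_spot: bool = False) -> dict:
    """Build keyword arguments for the EC2 RunInstances API call."""
    params = {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
    }
    if use_spot:
        # Request Spot capacity; with no max price set, you pay the
        # current Spot price (capped at the On-Demand rate).
        params["InstanceMarketOptions"] = {"MarketType": "spot"}
    return params

# Usage (requires AWS credentials; not run here):
# import boto3
# ec2 = boto3.client("ec2")
# resp = ec2.run_instances(
#     **build_run_request("ami-12345678", "c3.8xlarge", 4, use_spot=True))
# Terminating the instances when the job finishes is what makes
# pay-as-you-go transient clusters cheap.
```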