SlideShare uma empresa Scribd logo
1 de 6
Baixar para ler offline
Achieve higher data throughput for your Apache
Spark machine learning workloads with M5n
instances for Amazon Web Services featuring
2nd Generation Intel Xeon Scalable Processors –
Cascade Lake
M5n series instances that featured 2nd Generation Intel Xeon
Scalable processors handled more data per second compared to
older M4 instances that featured Broadwell processors
If your company has decided to run its large-scale machine learning workloads on Amazon Web Services, you
have a big decision to make: which instances will you purchase to power your work?
At Principled Technologies, we used two machine learning workloads from the HiBench benchmarking suite
to test two series of small, medium, and large Amazon Elastic Cloud Compute (Amazon EC2) instances
running Apache Spark. We compared newer M5n series instances that featured 2nd Generation Intel Xeon
Scalable processors—also known as Cascade Lake—to older M4 series instances that featured Intel Xeon E5
v4 processors—also known as Broadwell. We found that at all three instance sizes, the newer M5n instances
processed a higher volume of data compared to the older instances.
These advantages could help your business to organize its data and complete big data analysis work in less time,
enabling you to reach mission-critical conclusions faster. This could in turn lead to key decisions that could give
your business the edge over competitors.
Small instances:
Process data at up
to 1.57x the rate
Medium instances:
Process data at up
to 1.42x the rate
Large instances:
Process data at up
to 1.72x the rate
Achieve higher data throughput for your Apache Spark machine learning workloads with M5n instances for
Amazon Web Services featuring 2nd Generation Intel Xeon Scalable Processors – Cascade Lake
December 2020
A Principled Technologies report: Hands-on testing. Real-world results.
How we tested
We tested two series of general-purpose Amazon EC2 instances:
• Newer M5n series instances featuring Cascade Lake processors
• Older M4 series instances featuring Broadwell processors
We compared these instances across three sizes: small instances with 8 vCPUs, medium instances with 16 vCPUs,
and large instances with 64 vCPUs. For each environment, we created a five-node cluster of identical instances.
Figure 1: Specifications for the Amazon EC2 instances we used. We tested each instance in the us-east-1f region. Source:
Principled Technologies.
Testing with the HiBench benchmark suite
We assessed the Amazon EC2 instances with two machine leaning implementations from the HiBench
benchmark suite:
• The training process for Naive Bayesian classification
• The k-means clustering algorithm
Large-scale machine learning involves processing and analyzing unstructured data from many sources. Bayesian
classification assesses a system’s ability to mine data to make probabilistic predictions, while the k-means
algorithm uses data-point clustering to provide business insights, stressing the compute capabilities of a system.
With each workload representing different types of data manipulation, we can assess multiple aspects of the
instance cluster’s machine learning performance.
Small instances Medium instances Large instances
Memory: 64 GB
New instance: m5n.4xlarge
Old instance: m4.4xlarge
Memory: 256 GB
New instance: m5n.16xlarge
Old instance: m4.16xlarge
Memory: 32 GB
New instance: m5n.2xlarge
Old instance: m4.2xlarge
vCPUs: 8 vCPUs: 16 vCPUs: 64
December 2020 | 2
Achieve higher data throughput for your Apache Spark machine learning workloads with M5n instances for
Amazon Web Services featuring 2nd Generation Intel Xeon Scalable Processors – Cascade Lake
Our results
Small instances
Smaller organizations that don’t require a great deal of computing resources can still benefit from purchasing
newer instances for their machine learning work. Figure 2 shows that the m5n.2xlarge instance cluster that
featured Cascade Lake processors cluster processed a higher volume of data on average than the older
m4.2xlarge instance cluster that featured Broadwell processors. This was true for both the Bayesian classification
and k-means clustering workloads.
Medium instances
Businesses with more data to analyze may find that newer medium-sized M5n instances suit their needs. Figure 3
shows that, on average, the Cascade Lake processor-powered m5n.4xlarge instance cluster processed 1.22 times
the volume of data on the Bayesian classification benchmark compared to the older m4.4xlarge instance cluster
that featured Broadwell processors, and 1.42 times the data throughput on the k-means clustering workload.
Figure 2: Normalized comparison of the average throughput each small instance cluster achieved in the Naive Bayesian
classification and k-means clustering workloads. Higher throughput is better. Source: Principled Technologies.
Figure 3: Normalized comparison of average data throughput each medium instance cluster achieved in the Naive Bayesian
classification and k-means clustering workloads. Higher throughput is better. Source: Principled Technologies.
0 0.2 0.4 0.6 0.8 1.0 1.4 1.6 1.81.2
Bayes
k-means
Small instance comparison: relative throughput
Relative data throughput
Higher is better
up to
1.57x the
throughput
Bayes
m5n.2xlarge – Cascade Lake m4.2xlarge – Broadwell
0 0.2 0.4 0.6 0.8 1.0 1.4 1.6 1.81.2
Medium instance comparison: relative throughput
Higher is better
up to
1.42x the
throughput
Bayes
k-means
Relative data throughput
m5n.4xlarge – Cascade Lake m4.4xlarge – Broadwell
December 2020 | 3
Achieve higher data throughput for your Apache Spark machine learning workloads with M5n instances for
Amazon Web Services featuring 2nd Generation Intel Xeon Scalable Processors – Cascade Lake
Large instances
Large businesses with vast stores of critical data to analyze need all the processing power they can afford. Figure
4 shows that the m5n.16xlarge instance cluster, which featured Cascade Lake processors, handled a larger
volume of data than the m4.16xlarge instance cluster that featured Broadwell processors. On average, the cluster
of newer instances processed 1.56 times the data per second on the Bayesian classification benchmark, and 1.72
times the data per second on the k-means clustering workload.
Figure 4: Normalized comparison of the average data throughput each large instance cluster achieved in the Naive
Bayesian classification and k-means clustering workloads. Higher throughput is better. Source: Principled Technologies.
Large instance comparison: relative throughput
Higher is better
up to
1.72x the
throughput
0 0.2 0.4 0.6 0.8 1.0 1.4 1.6 1.81.2
Bayes
k-means
Relative data throughput
m5n.16xlarge – Cascade Lake m4.16xlarge – Broadwell
Better performance, better value for your investment
In our tests, the newer M5n series instance clusters handled a range of 1.22x to 1.72x higher volume of data
compared to the older M4 series instance clusters. Because M5n instances cost just 1.19x as much as the older
instances, they may be a better value for machine learning workloads such as the ones we tested.1
December 2020 | 4
Achieve higher data throughput for your Apache Spark machine learning workloads with M5n instances for
Amazon Web Services featuring 2nd Generation Intel Xeon Scalable Processors – Cascade Lake
Performance that scales at higher vCPU counts
In both the Bayesian classification and k-means clustering workloads, the performance margin between the
M5n and M4 series instances increased significantly between the medium (16vCPU) and large (64vCPU)
instance cluster tests.
M5n series instances for Amazon EC2
In 2019, Amazon introduced M5n instances to their EC2 offerings, extending 100Gbps networking and adding
improved packet-processing performance capabilities to its general-purpose M family.2
In addition to adding
2nd Generation Intel Xeon Scalable processors to the M-series options, M5n instances support the following:
• Higher CPU core frequency (3.1 GHz vs 2.3 GHz)
• Intel DL Boost Vector Neural Network Instructions, which Intel claims can “deliver a significant performance
improvement” for deep learning applications3
• 25 Gbps of peak bandwidth for small and medium instances, 75Gbps for large instances
• AWS Nitro System, which provides better security and performance compared to older AWS hypervisors4
December 2020 | 5
Achieve higher data throughput for your Apache Spark machine learning workloads with M5n instances for
Amazon Web Services featuring 2nd Generation Intel Xeon Scalable Processors – Cascade Lake
Conclusion
Your organization deserves powerful instances to run its cloud-based Apache Spark machine learning work. The
time you save on big data analysis can go toward informed planning and decision-making that can affect the
outcome of critical business goals.
In our Naive Bayesian classification and k-means clustering tests, newer M5n series instances for Amazon EC2
that featured Cascade Lake processors handled more data per second on average compared to older M4 series
instances that featured Broadwell processors. The performance advantages we saw during testing also mean that
organizations that invest in the newer instances would be getting better value for their money.
1	 “Amazon EC2 On-Demand Pricing,” accessed October 28, 2020, https://aws.amazon.com/ec2/pricing/on-demand/
2	 Julien Simon, “New M5n and R5n EC2 Instances, with up to 100Gbps networking,” accessed November 3, 2020,
https://aws.amazon.com/blogs/aws/new-m5n-and-r5n-instances-with-up-to-100-gbps-networking/.
3	 “Introduction to Intel Deep Learning Boost on Second Generation Intel Xeon Scalable Processors,” accessed November
3, 2020, https://software.intel.com/content/www/us/en/develop/articles/introduction-to-intel-deep-learning-boost-on-sec-
ond-generation-intel-xeon-scalable.html.
4	 “AWS Nitro System,” accessed November 16, 2020, https://aws.amazon.com/ec2/nitro/.
Principled Technologies is a registered trademark of Principled Technologies, Inc.
All other product names are the trademarks of their respective owners.
For additional information, review the science behind this report.
Principled
Technologies®
Facts matter.®Principled
Technologies®
Facts matter.®
This project was commissioned by Intel.
Read the science behind this report at http://facts.pt/AFf9GVO
December 2020 | 6
Achieve higher data throughput for your Apache Spark machine learning workloads with M5n instances for
Amazon Web Services featuring 2nd Generation Intel Xeon Scalable Processors – Cascade Lake

Mais conteúdo relacionado

Mais de Principled Technologies

Finding the path to AI success with the Dell AI portfolio - Summary
Finding the path to AI success with the Dell AI portfolio - SummaryFinding the path to AI success with the Dell AI portfolio - Summary
Finding the path to AI success with the Dell AI portfolio - SummaryPrincipled Technologies
 
Finding the path to AI success with the Dell AI portfolio
Finding the path to AI success with the Dell AI portfolioFinding the path to AI success with the Dell AI portfolio
Finding the path to AI success with the Dell AI portfolioPrincipled Technologies
 
Achieve strong performance and value on Azure SQL Database Hyperscale
Achieve strong performance and value on Azure SQL Database HyperscaleAchieve strong performance and value on Azure SQL Database Hyperscale
Achieve strong performance and value on Azure SQL Database HyperscalePrincipled Technologies
 
Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...
Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...
Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...Principled Technologies
 
Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...
Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...
Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...Principled Technologies
 
Utilizing Azure Cosmos DB for intelligent AI‑powered applications
Utilizing Azure Cosmos DB for intelligent AI‑powered applicationsUtilizing Azure Cosmos DB for intelligent AI‑powered applications
Utilizing Azure Cosmos DB for intelligent AI‑powered applicationsPrincipled Technologies
 
Build an Azure OpenAI application using your own enterprise data
Build an Azure OpenAI application using your own enterprise dataBuild an Azure OpenAI application using your own enterprise data
Build an Azure OpenAI application using your own enterprise dataPrincipled Technologies
 
Dell Chromebooks: Durable, easy to deploy, and easy to service
Dell Chromebooks: Durable, easy to deploy, and easy to serviceDell Chromebooks: Durable, easy to deploy, and easy to service
Dell Chromebooks: Durable, easy to deploy, and easy to servicePrincipled Technologies
 
Back up and restore data faster with a Dell PowerProtect Data Manager Applian...
Back up and restore data faster with a Dell PowerProtect Data Manager Applian...Back up and restore data faster with a Dell PowerProtect Data Manager Applian...
Back up and restore data faster with a Dell PowerProtect Data Manager Applian...Principled Technologies
 
Back up and restore data faster with a Dell PowerProtect Data Manager Appliance
Back up and restore data faster with a Dell PowerProtect Data Manager ApplianceBack up and restore data faster with a Dell PowerProtect Data Manager Appliance
Back up and restore data faster with a Dell PowerProtect Data Manager AppliancePrincipled Technologies
 
Meeting the challenges of AI workloads with the Dell AI portfolio - Summary
Meeting the challenges of AI workloads with the Dell AI portfolio - SummaryMeeting the challenges of AI workloads with the Dell AI portfolio - Summary
Meeting the challenges of AI workloads with the Dell AI portfolio - SummaryPrincipled Technologies
 
The Dell Latitude 5440 survived 30 drops and still functioned properly
The Dell Latitude 5440 survived 30 drops and still functioned properlyThe Dell Latitude 5440 survived 30 drops and still functioned properly
The Dell Latitude 5440 survived 30 drops and still functioned properlyPrincipled Technologies
 
Meeting the challenges of AI workloads with the Dell AI portfolio
Meeting the challenges of AI workloads with the Dell AI portfolioMeeting the challenges of AI workloads with the Dell AI portfolio
Meeting the challenges of AI workloads with the Dell AI portfolioPrincipled Technologies
 
A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...
A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...
A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...Principled Technologies
 
A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...
A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...
A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...Principled Technologies
 
Escalate productivity and output with the HP ZBook Firefly 14 G10 mobile work...
Escalate productivity and output with the HP ZBook Firefly 14 G10 mobile work...Escalate productivity and output with the HP ZBook Firefly 14 G10 mobile work...
Escalate productivity and output with the HP ZBook Firefly 14 G10 mobile work...Principled Technologies
 
Significant AI inference performance advances with the HPE ProLiant DL380 Gen...
Significant AI inference performance advances with the HPE ProLiant DL380 Gen...Significant AI inference performance advances with the HPE ProLiant DL380 Gen...
Significant AI inference performance advances with the HPE ProLiant DL380 Gen...Principled Technologies
 
Improve AI inference performance with HPE ProLiant DL380 Gen11 servers, power...
Improve AI inference performance with HPE ProLiant DL380 Gen11 servers, power...Improve AI inference performance with HPE ProLiant DL380 Gen11 servers, power...
Improve AI inference performance with HPE ProLiant DL380 Gen11 servers, power...Principled Technologies
 
HP ZBook G10 mobile workstations: Get the performance you need to take projec...
HP ZBook G10 mobile workstations: Get the performance you need to take projec...HP ZBook G10 mobile workstations: Get the performance you need to take projec...
HP ZBook G10 mobile workstations: Get the performance you need to take projec...Principled Technologies
 
How Dell and Broadcom can help you make the transition to IPv6
How Dell and Broadcom can help you make the transition to IPv6How Dell and Broadcom can help you make the transition to IPv6
How Dell and Broadcom can help you make the transition to IPv6Principled Technologies
 

Mais de Principled Technologies (20)

Finding the path to AI success with the Dell AI portfolio - Summary
Finding the path to AI success with the Dell AI portfolio - SummaryFinding the path to AI success with the Dell AI portfolio - Summary
Finding the path to AI success with the Dell AI portfolio - Summary
 
Finding the path to AI success with the Dell AI portfolio
Finding the path to AI success with the Dell AI portfolioFinding the path to AI success with the Dell AI portfolio
Finding the path to AI success with the Dell AI portfolio
 
Achieve strong performance and value on Azure SQL Database Hyperscale
Achieve strong performance and value on Azure SQL Database HyperscaleAchieve strong performance and value on Azure SQL Database Hyperscale
Achieve strong performance and value on Azure SQL Database Hyperscale
 
Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...
Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...
Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...
 
Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...
Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...
Improve backup and recovery outcomes by combining Dell APEX Data Storage Serv...
 
Utilizing Azure Cosmos DB for intelligent AI‑powered applications
Utilizing Azure Cosmos DB for intelligent AI‑powered applicationsUtilizing Azure Cosmos DB for intelligent AI‑powered applications
Utilizing Azure Cosmos DB for intelligent AI‑powered applications
 
Build an Azure OpenAI application using your own enterprise data
Build an Azure OpenAI application using your own enterprise dataBuild an Azure OpenAI application using your own enterprise data
Build an Azure OpenAI application using your own enterprise data
 
Dell Chromebooks: Durable, easy to deploy, and easy to service
Dell Chromebooks: Durable, easy to deploy, and easy to serviceDell Chromebooks: Durable, easy to deploy, and easy to service
Dell Chromebooks: Durable, easy to deploy, and easy to service
 
Back up and restore data faster with a Dell PowerProtect Data Manager Applian...
Back up and restore data faster with a Dell PowerProtect Data Manager Applian...Back up and restore data faster with a Dell PowerProtect Data Manager Applian...
Back up and restore data faster with a Dell PowerProtect Data Manager Applian...
 
Back up and restore data faster with a Dell PowerProtect Data Manager Appliance
Back up and restore data faster with a Dell PowerProtect Data Manager ApplianceBack up and restore data faster with a Dell PowerProtect Data Manager Appliance
Back up and restore data faster with a Dell PowerProtect Data Manager Appliance
 
Meeting the challenges of AI workloads with the Dell AI portfolio - Summary
Meeting the challenges of AI workloads with the Dell AI portfolio - SummaryMeeting the challenges of AI workloads with the Dell AI portfolio - Summary
Meeting the challenges of AI workloads with the Dell AI portfolio - Summary
 
The Dell Latitude 5440 survived 30 drops and still functioned properly
The Dell Latitude 5440 survived 30 drops and still functioned properlyThe Dell Latitude 5440 survived 30 drops and still functioned properly
The Dell Latitude 5440 survived 30 drops and still functioned properly
 
Meeting the challenges of AI workloads with the Dell AI portfolio
Meeting the challenges of AI workloads with the Dell AI portfolioMeeting the challenges of AI workloads with the Dell AI portfolio
Meeting the challenges of AI workloads with the Dell AI portfolio
 
A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...
A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...
A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...
 
A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...
A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...
A Dell PowerEdge MX environment using OpenManage Enterprise and OpenManage En...
 
Escalate productivity and output with the HP ZBook Firefly 14 G10 mobile work...
Escalate productivity and output with the HP ZBook Firefly 14 G10 mobile work...Escalate productivity and output with the HP ZBook Firefly 14 G10 mobile work...
Escalate productivity and output with the HP ZBook Firefly 14 G10 mobile work...
 
Significant AI inference performance advances with the HPE ProLiant DL380 Gen...
Significant AI inference performance advances with the HPE ProLiant DL380 Gen...Significant AI inference performance advances with the HPE ProLiant DL380 Gen...
Significant AI inference performance advances with the HPE ProLiant DL380 Gen...
 
Improve AI inference performance with HPE ProLiant DL380 Gen11 servers, power...
Improve AI inference performance with HPE ProLiant DL380 Gen11 servers, power...Improve AI inference performance with HPE ProLiant DL380 Gen11 servers, power...
Improve AI inference performance with HPE ProLiant DL380 Gen11 servers, power...
 
HP ZBook G10 mobile workstations: Get the performance you need to take projec...
HP ZBook G10 mobile workstations: Get the performance you need to take projec...HP ZBook G10 mobile workstations: Get the performance you need to take projec...
HP ZBook G10 mobile workstations: Get the performance you need to take projec...
 
How Dell and Broadcom can help you make the transition to IPv6
How Dell and Broadcom can help you make the transition to IPv6How Dell and Broadcom can help you make the transition to IPv6
How Dell and Broadcom can help you make the transition to IPv6
 

Último

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 

Último (20)

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 

Achieve higher data throughput for your Apache Spark machine learning workloads with M5n instances for Amazon Web Services featuring 2nd Generation Intel Xeon Scalable Processors – Cascade Lake

  • 1. Achieve higher data throughput for your Apache Spark machine learning workloads with M5n instances for Amazon Web Services featuring 2nd Generation Intel Xeon Scalable Processors – Cascade Lake M5n series instances that featured 2nd Generation Intel Xeon Scalable processors handled more data per second compared to older M4 instances that featured Broadwell processors If your company has decided to run its large-scale machine learning workloads on Amazon Web Services, you have a big decision to make: which instances will you purchase to power your work? At Principled Technologies, we used two machine learning workloads from the HiBench benchmarking suite to test two series of small, medium, and large Amazon Elastic Cloud Compute (Amazon EC2) instances running Apache Spark. We compared newer M5n series instances that featured 2nd Generation Intel Xeon Scalable processors—also known as Cascade Lake—to older M4 series instances that featured Intel Xeon E5 v4 processors—also known as Broadwell. We found that at all three instance sizes, the newer M5n instances processed a higher volume of data compared to the older instances. These advantages could help your business to organize its data and complete big data analysis work in less time, enabling you to reach mission-critical conclusions faster. This could in turn lead to key decisions that could give your business the edge over competitors. Small instances: Process data at up to 1.57x the rate Medium instances: Process data at up to 1.42x the rate Large instances: Process data at up to 1.72x the rate Achieve higher data throughput for your Apache Spark machine learning workloads with M5n instances for Amazon Web Services featuring 2nd Generation Intel Xeon Scalable Processors – Cascade Lake December 2020 A Principled Technologies report: Hands-on testing. Real-world results.
  • 2. How we tested We tested two series of general-purpose Amazon EC2 instances: • Newer M5n series instances featuring Cascade Lake processors • Older M4 series instances featuring Broadwell processors We compared these instances across three sizes: small instances with 8 vCPUs, medium instances with 16 vCPUs, and large instances with 64 vCPUs. For each environment, we created a five-node cluster of identical instances. Figure 1: Specifications for the Amazon EC2 instances we used. We tested each instance in the us-east-1f region. Source: Principled Technologies. Testing with the HiBench benchmark suite We assessed the Amazon EC2 instances with two machine leaning implementations from the HiBench benchmark suite: • The training process for Naive Bayesian classification • The k-means clustering algorithm Large-scale machine learning involves processing and analyzing unstructured data from many sources. Bayesian classification assesses a system’s ability to mine data to make probabilistic predictions, while the k-means algorithm uses data-point clustering to provide business insights, stressing the compute capabilities of a system. With each workload representing different types of data manipulation, we can assess multiple aspects of the instance cluster’s machine learning performance. Small instances Medium instances Large instances Memory: 64 GB New instance: m5n.4xlarge Old instance: m4.4xlarge Memory: 256 GB New instance: m5n.16xlarge Old instance: m4.16xlarge Memory: 32 GB New instance: m5n.2xlarge Old instance: m4.2xlarge vCPUs: 8 vCPUs: 16 vCPUs: 64 December 2020 | 2 Achieve higher data throughput for your Apache Spark machine learning workloads with M5n instances for Amazon Web Services featuring 2nd Generation Intel Xeon Scalable Processors – Cascade Lake
  • 3. Our results Small instances Smaller organizations that don’t require a great deal of computing resources can still benefit from purchasing newer instances for their machine learning work. Figure 2 shows that the m5n.2xlarge instance cluster that featured Cascade Lake processors cluster processed a higher volume of data on average than the older m4.2xlarge instance cluster that featured Broadwell processors. This was true for both the Bayesian classification and k-means clustering workloads. Medium instances Businesses with more data to analyze may find that newer medium-sized M5n instances suit their needs. Figure 3 shows that, on average, the Cascade Lake processor-powered m5n.4xlarge instance cluster processed 1.22 times the volume of data on the Bayesian classification benchmark compared to the older m4.4xlarge instance cluster that featured Broadwell processors, and 1.42 times the data throughput on the k-means clustering workload. Figure 2: Normalized comparison of the average throughput each small instance cluster achieved in the Naive Bayesian classification and k-means clustering workloads. Higher throughput is better. Source: Principled Technologies. Figure 3: Normalized comparison of average data throughput each medium instance cluster achieved in the Naive Bayesian classification and k-means clustering workloads. Higher throughput is better. Source: Principled Technologies. 0 0.2 0.4 0.6 0.8 1.0 1.4 1.6 1.81.2 Bayes k-means Small instance comparison: relative throughput Relative data throughput Higher is better up to 1.57x the throughput Bayes m5n.2xlarge – Cascade Lake m4.2xlarge – Broadwell 0 0.2 0.4 0.6 0.8 1.0 1.4 1.6 1.81.2 Medium instance comparison: relative throughput Higher is better up to 1.42x the throughput Bayes k-means Relative data throughput m5n.4xlarge – Cascade Lake m4.4xlarge – Broadwell December 2020 | 3 Achieve higher data throughput for your Apache Spark machine learning workloads with M5n instances for Amazon Web Services featuring 2nd Generation Intel Xeon Scalable Processors – Cascade Lake
  • 4. Large instances Large businesses with vast stores of critical data to analyze need all the processing power they can afford. Figure 4 shows that the m5n.16xlarge instance cluster, which featured Cascade Lake processors, handled a larger volume of data than the m4.16xlarge instance cluster that featured Broadwell processors. On average, the cluster of newer instances processed 1.56 times the data per second on the Bayesian classification benchmark, and 1.72 times the data per second on the k-means clustering workload. Figure 4: Normalized comparison of the average data throughput each large instance cluster achieved in the Naive Bayesian classification and k-means clustering workloads. Higher throughput is better. Source: Principled Technologies. Large instance comparison: relative throughput Higher is better up to 1.72x the throughput 0 0.2 0.4 0.6 0.8 1.0 1.4 1.6 1.81.2 Bayes k-means Relative data throughput m5n.16xlarge – Cascade Lake m4.16xlarge – Broadwell Better performance, better value for your investment In our tests, the newer M5n series instance clusters handled a range of 1.22x to 1.72x higher volume of data compared to the older M4 series instance clusters. Because M5n instances cost just 1.19x as much as the older instances, they may be a better value for machine learning workloads such as the ones we tested.1 December 2020 | 4 Achieve higher data throughput for your Apache Spark machine learning workloads with M5n instances for Amazon Web Services featuring 2nd Generation Intel Xeon Scalable Processors – Cascade Lake
  • 5. Performance that scales at higher vCPU counts In both the Bayesian classification and k-means clustering workloads, the performance margin between the M5n and M4 series instances increased significantly between the medium (16vCPU) and large (64vCPU) instance cluster tests. M5n series instances for Amazon EC2 In 2019, Amazon introduced M5n instances to their EC2 offerings, extending 100Gbps networking and adding improved packet-processing performance capabilities to its general-purpose M family.2 In addition to adding 2nd Generation Intel Xeon Scalable processors to the M-series options, M5n instances support the following: • Higher CPU core frequency (3.1 GHz vs 2.3 GHz) • Intel DL Boost Vector Neural Network Instructions, which Intel claims can “deliver a significant performance improvement” for deep learning applications3 • 25 Gbps of peak bandwidth for small and medium instances, 75Gbps for large instances • AWS Nitro System, which provides better security and performance compared to older AWS hypervisors4 December 2020 | 5 Achieve higher data throughput for your Apache Spark machine learning workloads with M5n instances for Amazon Web Services featuring 2nd Generation Intel Xeon Scalable Processors – Cascade Lake
  • 6. Conclusion Your organization deserves powerful instances to run its cloud-based Apache Spark machine learning work. The time you save on big data analysis can go toward informed planning and decision-making that can affect the outcome of critical business goals. In our Naive Bayesian classification and k-means clustering tests, newer M5n series instances for Amazon EC2 that featured Cascade Lake processors handled more data per second on average compared to older M4 series instances that featured Broadwell processors. The performance advantages we saw during testing also mean that organizations that invest in the newer instances would be getting better value for their money. 1 “Amazon EC2 On-Demand Pricing,” accessed October 28, 2020, https://aws.amazon.com/ec2/pricing/on-demand/ 2 Julien Simon, “New M5n and R5n EC2 Instances, with up to 100Gbps networking,” accessed November 3, 2020, https://aws.amazon.com/blogs/aws/new-m5n-and-r5n-instances-with-up-to-100-gbps-networking/. 3 “Introduction to Intel Deep Learning Boost on Second Generation Intel Xeon Scalable Processors,” accessed November 3, 2020, https://software.intel.com/content/www/us/en/develop/articles/introduction-to-intel-deep-learning-boost-on-sec- ond-generation-intel-xeon-scalable.html. 4 “AWS Nitro System,” accessed November 16, 2020, https://aws.amazon.com/ec2/nitro/. Principled Technologies is a registered trademark of Principled Technologies, Inc. All other product names are the trademarks of their respective owners. For additional information, review the science behind this report. Principled Technologies® Facts matter.®Principled Technologies® Facts matter.® This project was commissioned by Intel. Read the science behind this report at http://facts.pt/AFf9GVO December 2020 | 6 Achieve higher data throughput for your Apache Spark machine learning workloads with M5n instances for Amazon Web Services featuring 2nd Generation Intel Xeon Scalable Processors – Cascade Lake