SlideShare uma empresa Scribd logo
1 de 58
Baixar para ler offline
2
33
0% 20% 40% 60%10% 30% 50% 70%
Ryzen™ 7 1800X advantage vs Core i7 6850K
Ryzen™ 7 1700X advantage vs Core i7 6800K
Ryzen™ 7 1700 advantage vs Core i7 7700K
Ryzen™ 5 1600X advantage vs Core i5 7600K
Ryzen™ 5 1600 advantage vs Core i5 7600
Ryzen™ 5 1500X advantage vs Core i5 7500
Ryzen™ 5 1400 advantage vs Core i5 7400
Ryzen™ 3 1300X advantage vs Core i3 7300
Ryzen™ 3 1200 vs Core i3 7200
Ryzen™ Threadripper˜ 1950X advantage vs Core i9 7900X
A new era of PC
enthusiasts want to game
and stream and work and
watch
Photo, Video, and 3D
creation software continues
to push the limits of multiple
cores
Current Graphics APIs are
natively multi-threaded:
Microsoft® DirectX® 12 and
Vulkan®
Multi-threading processors
are required to drive smooth
VR experiences
4
4
AMD Model Cores / Threads Intel Model Cores/Threads
i7-9700K
i7-8700K 8/8
6/12i7-8700
i5-8600K
6/6
i5-8600
i5-8500
i5-8400
i3-8300
4/4
i3-8100
5
ENTHUSIAST/PROSUMER
Cutting-edge performance for those who demand
the highest tier of multi-threaded performance
possible
on a mainstream PC platform
Intel Core™ i7
HIGH PERFORMANCE
Powerful high-performance processors for
demanding games and productivity software
Intel Core™ i5
MAINSTREAM/ONLINE
Advanced computing hardware that can easily
handle productivity software, online/esports
games, and media playback, with room for
expansion
Intel Core™ i3
EVERYDAY COMPUTING
Basic, Affordable computing power for
responsive web surfing, media playback,
and casual gaming in HD
Pentium,
Celeron
WITH RADEON™ VEGA GRAPHICS
6
6
VR & GAMING, PROSUMERS, ESPORTS, AND ENTHUSIASTS
Built for
VR & Gaming
Whether you’re exploring new worlds
in VR, or just gaming with your friends,
Ryzen™ processors will take you there
with incredibly smooth, fast
performance
System images are for illustrative purposes only.
The new AMD Ryzen™
Processor delivers lightning fast
responsiveness whether you’re
creating 3D models in VR, editing UHD
video, or performing computational
modelling
Built for
Prosumers
You can be confident your Ryzen™
processor won’t choke when you
need it, enjoying professional-level
eSports performance to crush your
enemies today, on a system designed
to dominate the games of tomorrow
Built for
eSports Gaming
Built for
Enthusiasts
Enthusiasts place a premium
value on the most cores and threads
thanks to multi-threaded benchmarks
and Intel’s Core Extreme Edition
processors
7
THE ULTIMATE DESKTOP PROCESSOR
FOR GAMERS, CREATORS, AND ENTHUSIASTS
Smarter PlatformMore FeaturesMore Performance
8
THE “ZEN+”
ARCHITECTURE
MORE PERFORMANCE
9




MORE PERFORMANCE
3.7 GHz
3.8 GHz
3.9 GHz
4.0 GHz
4.1 GHz
4.2 GHz
4.3 GHz
1T 2T 3T 4T 5T 6T 7T 8T 9T 10T 11T 12T 13T 14T 15T 16T
Average Clockspeed 1T-16T
BASE
MAX
10



MORE PERFORMANCE
3.6 GHz
3.7 GHz
3.8 GHz
3.9 GHz
4.0 GHz
4.1 GHz
4.2 GHz
4.3 GHz
1T 2T 3T 4T 5T 6T 7T 8T 9T 10T 11T 12T 13T 14T 15T 16T
Average Clockspeed 1T-16T
2700X Uplift
11
X
⁰ ⁰ ⁰
Cinebench R15 nT
F
RANGE 2
MORE PERFORMANCE
12
RelativePerformanceMORE PERFORMANCE
VS. CORE i7
CPUs*
13
MORE PERFORMANCE
144Hz
60Hz
VS. CORE i7 CPUs*
14
14
chosen as the
IN THEIR PRICE SEGMENTS
by
15
Average over period 1/1/18 – 5/31/18 in int’l survey covering etailers in UK, USA, China, Ger., Can., and Fr.
Average over period 1 January 2018 to 31 May 2018. International survey covering UK, USA, Germany, China, Canada, and France. Etailers selling either or both
families of processors in a box amazon-de, amazon-uk, amazon-us, bestbuy-us, ebuyer-uk, jd-cn, ldlc-fr, materielnet-fr, microcenter-us, mindfactory-de, ncix-ca,
ncix-us, neweggbus-us, newegg-ca, newegg-us, pcworld-uk, staples-us, walmart-us. AMD Ryzen™ desktop processor family: Ryzen 3 1200, Ryzen 3 1300X, Ryzen 5
1400, Ryzen 5 1500X, Ryzen 5 1600, Ryzen 5 1600X, Ryzen 5 1700X, Ryzen 5 2600, Ryzen 5 2600X, Ryzen 7 1700, Ryzen 7 1700X, Ryzen 7 1800X, Ryzen 7 2600X,
Ryzen 7 2700, Ryzen 7 2700X, Ryzen Threadripper 1900X, Ryzen Threadripper 1920X, Ryzen Threadripper 1950X, Threadripper 1900X, Threadripper 1920X,
Threadripper 1950X. Comparable Intel processor family: i3-7100, i3-7100T, i3-7100U, i3-7300, i3-7320, i3-7350, i3-7350K, i3-8100, i3-8300, i3-8350, i3-8350K, i5-
7260, i5-7260U, i5-7400, i5-7400, i5-7400T, i5-7500, i5-7600, i5-7600K, i5-7600T, i5-7640X, i5-8400, i5-8500, i5-8600, i5-8600K, i7-6800K, i7-6850K, i7-6900K, i7-
6950X, i7-7500, i7-7567U, i7-7700, i7-7700K, i7-7700T, i7-7740X, i7-7800, i7-7800X, i7-7820X, i7-8211, i7-8550U, i7-8600K, i7-8700, i7-8700K, i9-7900X, I9-7920X, i9-
7940X, i9-7960X, i9-7980X, I9-7980Xe.
Does not include Ryzen Mobile processors with Radeon Vega Graphics or Ryzen Desktop processors with Radeon Vega Graphics.
THE AMD RYZEN DESKTOP PROCESSOR FAMILY RATES AN AVERAGE
4.76 STARS VERSUS COMPARABLE INTEL PROCESSOR FAMILY RATING
4.68 ACROSS TOP ETAILERS USING VERIFIED END USER REVIEWS.
16
17
18

19

‒
‒

‒
‒

‒
‒

‒
‒
20
21



Data
Fabric
8M L3
I+D Cache
16-way
32B/cycle32B/cycle
512K L2
I+D Cache
8-way
64K
I-Cache
4-way
32K
D-Cache
8-way
32B/cycle32B fetch
32B/cycle2*16B load
1*16B store
Unified
Memory
Controller
DRAM
Channel
16B/cycle32B/cycle
cclk
memclk
IO Hub
Controller
32B/cycle
lclk



22



Data
Fabric
8M L3
I+D Cache
16-way
32B/cycle32B/cycle
512K L2
I+D Cache
8-way
64K
I-Cache
4-way
32K
D-Cache
8-way
32B/cycle32B fetch
32B/cycle2*16B load
1*16B store
Unified
Memory
Controller
DRAM
Channel
16B/cycle32B/cycle
cclk
memclk
IO Hub
Controller
32B/cycle
lclk
32B/cycle
16B/cycle off die
GMI 16B/cycle off die



23





Data
Fabric
4M L3
I+D Cache
16-way
32B/cycle32B/cycle
512K L2
I+D Cache
8-way
64K
I-Cache
4-way
32K
D-Cache
8-way
32B/cycle32B fetch
32B/cycle2*16B load
1*16B store
Unified
Memory
Controller
DRAM
Channel
16B/cycle32B/cycle
cclk
IO Hub
Controller
32B/cycle
lclk
GFX9
Media
32B/cycle
32B/cycle
sclk

memclk
24




‒
‒
25
YEAR FAMILY PRODUCT FAMILY ARCHITECTURE EXAMPLE MODEL
ADX
CLFLUSHOPT
RDSEED
SHA
SMAP
XGETBV
XSAVEC
XSAVES
AVX2
BMI2
MOVBE
RDRND
SMEP
FSGSBASE
XSAVEOPT
BMI
FMA
F16C
AES
AVX
OSXSAVE
PCLMULQDQ
SSE4.1
SSE4.2
XSAVE
SSSE3
CLZERO
FMA4
TBM
XOP
2017 17h “Summit Ridge” “Zen” Ryzen 7 1800X 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
2015 15h “Carrizo”/”Bristol Ridge” “Excavator” A12-9800 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1
2014 15h “Kaveri”/”Godavari” “Steamroller” A10-7890K 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1
2012 15h “Vishera” “Piledriver” FX-8370 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1
2011 15h “Zambezi” “Bulldozer” FX-8150 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 0 1
2013 16h “Kabini” “Jaguar” A6-1450 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 1 1 0 0 0 0
2011 14h “Ontario” “Bobcat” E-450 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
2011 12h “Llano” “Husky” A8-3870 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2009 10h “Greyhound” “Greyhound” Phenom II X4 955 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26
27
V1.1 CHANGES
NEW GUI FOR PERFORMANCE AND POWER PROFILING ANALYSIS.
FILTER AVAILABLE EVENTS
GROUP BY PROCESS, MODULE, OR THREAD
UNIFIED CLI FOR PERFORMANCE AND POWER PROFILING
ENERGY CONSUMPTION ANALYSIS OF PROCESS, THREAD, MODULE, FUNCTION AND SOURCE
LINE.
L3 AND DF PMC COUNTER SUPPORT (WINDOWS CLI ONLY)
RECOMMEND USING 250,000 CPU CLOCK CYCLE INTERVAL WHILE ADDING CUSTOM COUNTERS
TO PROFILE
MAKES MATH EASY FOR PER THOUSAND INSTRUCTIONS (PTI) & PER THOUSAND CLOCKS (PTC)
SEE: HTTPS://DEVELOPER.AMD.COM/AMD-UPROF/
SEE HTTP://SUPPORT.AMD.COM/EN-US/SEARCH/TECH-DOCS
28
FP: floating point
LS: load/store
IC/BP: instruction cache and branch prediction
EX (SC): integer ALU &
AGU execution and
scheduling
L2
DE: instruction decode, dispatch, microcode
sequencer, & micro-op cache
L3
DF: Data
Fabric
UMC:
Unified
Memory
Controller
(NDA only)
IOHC: IO
Hub
Controller
(NDA only)
 rdpmc SMN in/out
29

‒
‒
‒
‒
‒
• Set core clock > base clock to disable boost on “Raven Ridge” processors
‒
‒
‒
‒
30
31
•
•
•
•
•
32

‒

‒
‒

‒
‒
‒
‒
‒
‒
33
MSVS 2015 MSVS 2017
34
• THIS ADVICE IS SPECIFIC TO AMD PROCESSORS AND IS NOT
GENERAL GUIDANCE FOR ALL PROCESSOR VENDORS.
• GENERALLY, APPLICATIONS SHOW SMT BENEFITS
AND USE OF ALL LOGICAL PROCESSORS IS
RECOMMENDED.
• BUT GAMES OFTEN SUFFER FROM SMT CONTENTION ON THE
MAIN THREAD.
• ONE STRATEGY TO REDUCE THIS CONTENTION IS TO
CREATE THREADS BASED ON PHYSICAL CORE COUNT
RATHER THAN LOGICAL PROCESSOR COUNT.
• ALWAYS PROFILE YOUR APPLICATION/GAME TO
DETERMINE THE IDEAL THREAD COUNT.
• AMD BULLDOZER IS NOT A SMT DESIGN
• AVOID CORE CLAMPING
• SEE HTTPS://GPUOPEN.COM/CPU-CORE-COUNT-DETECTION-WINDOWS/
// This advice is specific to AMD processors and is
// not general guidance for all processor vendors
DWORD getDefaultThreadCount() {
DWORD cores, logical;
getProcessorCount(cores, logical);
DWORD count = logical;
char vendor[13];
getCpuidVendor(vendor);
if (0 == strcmp(vendor, "AuthenticAMD")) {
if (0x15 == getCpuidFamily()) {
// AMD "Bulldozer" family microarchitecture
count = logical;
} else {
count = cores;
}
}
return count;
}
35
• GET
PSYSTEM_LOGICAL_PROCESSOR_INFORMATION
_EX BUFFER LENGTH AT RUNTIME.
• OTHERWISE, THE APPLICATION MAY CRASH
IF AN INSUFFICIENTLY SIZED BUFFER WAS
CREATED AT COMPILE TIME.
• SEE
HTTPS://GITHUB.COM/GPUOPEN-
LIBRARIESANDSDKS/CPU-CORE-
COUNTS/BLOB/MASTER/WINDOWS/THREADCO
UNT-WIN7.CPP
// char buffer[0x1000] /* bad assumption */
char* buffer = NULL;
DWORD len = 0;
if (FALSE == GetLogicalProcessorInformationEx(
RelationAll,
PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX)buffer,
len)) {
if (GetLastError() == ERROR_INSUFFICIENT_BUFFER) {
buffer = (char*)malloc(len);
if (GetLogicalProcessorInformationEx(
RelationAll,
(PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX)buffer,
&len)) {
// ...
}
}
free(buffer);
}
36
• AVOID SIGNED, NARROWING AFFINITY
MASKS.
• OTHERWISE, THE APPLICATION MAY
CRASH OR EXHIBIT UNEXPECTED
BEHAVIOR.
• BY DEFAULT, AN APPLICATION IS CONSTRAINED TO A SINGLE GROUP,
A STATIC SET OF UP TO 64 LOGICAL PROCESSORS.
• RIGHT SHIFTS: FOR SIGNED NUMBERS, THE SIGN BIT IS USED TO FILL
THE VACATED BIT POSITIONS.
• LEFT SHIFTS: IF YOU LEFT-SHIFT A SIGNED NUMBER SO THAT THE
SIGN BIT IS AFFECTED, THE RESULT IS UNDEFINED.
• SEE HTTPS://MSDN.MICROSOFT.COM/EN-
US/LIBRARY/336XBHCZ.ASPX
•
• BAD
• POPCNT = 0
• MASK B[32] = 0X0000000000000000
• GOOD
• POPCNT = 64
• MASK B[32] = 0X0000000100000000
DWORD lps = 64; // logical processors
DWORD_PTR p = 0xffffffffffffffff; // process mask
int b = 32; // bit index
#if 0
/* BAD */
int x = p; // signed, narrowing!
int count = 0;
//while (x != 0) { // never zero!
while (x > 0) { // false!
count += (x & 1);
x >>= 1; // fill bits with sign!
}
printf("popcnt = %in", count); // 0
printf("mask b[%i] = 0x%pn", b, (1 << b)); // undefined, 0?
#else
/* GOOD */
DWORD_PTR x = p;
int count = 0;
while (x != 0) {
count += (x & 1);
x >>= 1;
}
printf("popcnt = %in", count);
printf("mask b[%i] = 0x%pn", b, (1ULL << b));
#endif
37

‒
‒

‒
‒
‒
‒
‒
38

‒
‒

39
#include "stdafx.h"
#include <intrin.h>
#include <numeric>
#include <chrono>
#define LEN 64000
alignas(64) float a[LEN];
alignas(64) float b[LEN];
void step(float dt) {
for (size_t i = 0; i < LEN; i += 8) {
// x,y,z,w,vx,vy,vy,vw
__m128 p1 = _mm_load_ps(&a[(i + 0) % LEN]);
__m128 v1 = _mm_load_ps(&a[(i + 4) % LEN]);
p1 = _mm_add_ps(p1, _mm_mul_ps(v1, _mm_load_ps1(&dt)));
_mm_stream_ps(&a[(i + 0) % LEN], p1);
__m128 p2 = _mm_load_ps(&b[(i + 0) % LEN]);
__m128 v2 = _mm_load_ps(&b[(i + 4) % LEN]);
p2 = _mm_add_ps(p2, _mm_mul_ps(v2, _mm_load_ps1(&dt)));
#if 0
/* without workaround */
_mm_stream_ps(&b[(i + 0) % LEN], p2);
# else
/* with workaround */
_mm_store_ps(&b[(i + 0) % LEN], p2);
#endif
}
}
int main(int argc, char *argv[]) {
using namespace std::chrono;
int j = (argc > 1) ? atoi(argv[1]) : 0;
int seed = (argc > 2) ? atoi(argv[2]) : 3;
size_t steps = (argc > 3) ? atoll(argv[3]) : 2000000;
float dt = (argc > 4) ? (float)atof(argv[4]) : 0.001f;
srand(seed);
for (int i = 0; i < LEN; ++i) {
a[i] = (float)rand() / RAND_MAX;
b[i] = (float)rand() / RAND_MAX;
}
high_resolution_clock::time_point t0 = 
high_resolution_clock::now();
for (size_t i = 0; i < steps; i++) {
step(dt);
}
high_resolution_clock::time_point t1 = 
high_resolution_clock::now();
duration<double> time_span = 
duration_cast<duration<double>>(t1 - t0);
printf("time (milliseconds): %lfn", 
1000.0 * time_span.count());
printf("a[%i] = %fn", j, a[j]);
printf("b[%i] = %fn", j, b[j]);
return EXIT_SUCCESS;
}
40
60 WcbFull per
1000 instructions
41
6 WcbFull per
1000 instructions
42
Before Workaround After Workaround
43
•
• USE THE PAUSE INSTRUCTION
• TEST, THEN TEST-AND-SET
• ALIGNAS(64) LOCK VARIABLE
• OR _DECLSPEC( ALIGN(64) )
•
• CUSTOM CPU PROFILE > EVENTS BY HARDWARE
SOURCE >
• [076] CYCLES NOT IN HALT
• [0C0] INSTRUCTIONS RETIRED
• [0AF] DYNAMIC TOKENS DISPATCH STALL CYCLES 0 >
[4] ALU TOKENS TOTAL UNAVAILABLE
• 10K PER THOUSAND INSTRUCTIONS IS BAD
namespace MyLock {
typedef unsigned LOCK, *PLOCK;
enum { LOCK_IS_FREE = 0, LOCK_IS_TAKEN = 1 };
#if 0
/* BAD */
void Lock(PLOCK pl) {
while (LOCK_IS_TAKEN == _InterlockedCompareExchange( 
pl, LOCK_IS_TAKEN, LOCK_IS_FREE)) { // lock, xchg, cmp
}
}
#else
/* GOOD */
void Lock(PLOCK pl) {
while (1) {
while (LOCK_IS_TAKEN == *pl)
_mm_pause();
if (LOCK_IS_FREE==_InterlockedExchange(pl, LOCK_IS_TAKEN))
break;
}
}
#endif
void Unlock(PLOCK pl) {
_InterlockedExchange(pl, LOCK_IS_FREE);
}
}
alignas(64) MyLock::LOCK gLock;
44
#include "intrin.h"
#include "stdio.h"
#include "windows.h"
#include <chrono>
#include <numeric>
#include <thread>
#define LEN 512
alignas(64) float b[LEN][4][4];
alignas(64) float c[LEN][4][4];
DWORD WINAPI ThreadProcCallback(LPVOID data) {
MyLock::Lock(&gLock);
alignas(64) float a[LEN][4][4];
std::fill((float*)a, (float*)(a + LEN), 0.0f);
float r = 0.0;
for (size_t iter = 0; iter < 100000; iter++) {
for (int m = 0; m < LEN; m++)
for (int i = 0; i < 4; i++)
for (int j = 0; j < 4; j++)
for (int k = 0; k < 4; k++)
a[m][i][j] += b[m][i][k] * c[m][k][j];
r += std::accumulate((float*)a, 
(float*)(a + LEN), 0.0f);
}
printf("result: %fn", r);
MyLock::Unlock(&gLock);
return 0;
}
int main(int argc, char *argv[]) {
using namespace std::chrono;
float b0 = (argc > 1) ? strtof(argv[1], NULL) : 1.0f;
float c0 = (argc > 2) ? strtof(argv[2], NULL) : 2.0f;
std::fill((float*)b, (float*)(b + LEN), b0);
std::fill((float*)c, (float*)(c + LEN), c0);
int num_threads = std::thread::hardware_concurrency();
HANDLE* threads = new HANDLE[num_threads];
high_resolution_clock::time_point t0 = 
high_resolution_clock::now();
for (size_t i = 0; i < num_threads; ++i) {
threads[i] = CreateThread(NULL, 
0, ThreadProcCallback, NULL, 0, NULL);
}
WaitForMultipleObjects(num_threads, 
threads, TRUE, INFINITE);
high_resolution_clock::time_point t1 = 
high_resolution_clock::now();
duration<double> time_span = 
duration_cast<duration<double>>(t1 - t0);
printf("time (milliseconds): %lfn", 
1000.0 * time_span.count());
delete[] threads;
return EXIT_SUCCESS;
}
45

‒
‒

46
11K stalls per
1000 instructions
47
0 stall per 1000
instructions
48
1. xor eax,eax
2. lock cmpxchg dword ptr [?gLock@@3IA],ecx
3. cmp eax,ecx
4. je 00000001400010C0
1. cmp dword ptr [?gLock@@3IA],1
2. jne 00000001400010DB
3. nop dword ptr [rax]
4. pause
5. cmp dword ptr [?gLock@@3IA],1
6. je 00000001400010D0
7. mov eax,1
8. xchg eax,dword ptr [?gLock@@3IA]
9. test eax,eax
10. jne 00000001400010C0
Bad Good
49
•
• MINIMIZE ACCESSES TO THE SAME CACHE LINE FROM MULTIPLE THREADS.
• USE THREAD LOCAL STORAGE RATHER THAN PROCESS SHARED DATA
WHENEVER POSSIBLE.
•
• CUSTOM CPU PROFILE > EVENTS BY HARDWARE SOURCE >
• [076] CYCLES NOT IN HALT
• [0C0] INSTRUCTIONS RETIRED
• [043] DATA CACHE REFILLS FROM SYSTEM
• [1] LS_MABRESP_LCL_CACHE
• [4] LS_MABRESP_RMT_CACHE, HIT IN CACHE; REMOTE CCX AND HOME
NODE ON DIFFERENT DIE
50

‒
‒

51
/* #include <windows.h> <chrono> <numeric> <thread> */
using namespace std::chrono;
#define NUM_ITER 2000000000
struct Data {
unsigned long sum;
};
struct ThreadData {
int seed;
Data data;
};
DWORD WINAPI ThreadProcCallback(void* param) {
ThreadData *p = (ThreadData*)param;
srand(p->seed);
#if 0 /* process shared data */
p->data.sum = 0;
for (int i = 0; i < NUM_ITER; i++) {
p->data.sum += rand() % 2;
}
#else /* thread local data */
int local_sum = 0;
for (int i = 0; i < NUM_ITER; i++) {
local_sum += rand() % 2;
}
p->data.sum = local_sum;
#endif
return 0;
}
int main(int argc, char *argv[]) {
int seed = (argc > 1) ? atoi(argv[1]) : 3;
int numThreads = std::thread::hardware_concurrency();
HANDLE* threads = new HANDLE[numThreads];
ThreadData* a = new ThreadData[numThreads];
high_resolution_clock::time_point t0 = 
high_resolution_clock::now();
for (size_t i = 0; i < numThreads; ++i) {
a[i].seed = seed;
threads[i] = CreateThread(NULL, 0, ThreadProcCallback, 
(void*)&a[i], 0, NULL);
}
WaitForMultipleObjects(numThreads, threads, TRUE, INFINITE);
high_resolution_clock::time_point t1 = 
high_resolution_clock::now();
duration<double> time_span = 
duration_cast<duration<double>>(t1 - t0);
printf("time (ms): %lfn", 1000.0 * time_span.count());
for (size_t i = 0; i < numThreads; ++i) {
printf("sum[%llu] = %lun", i, a[i].data.sum);
}
delete[] a;
delete[] threads;
return EXIT_SUCCESS;
}
52
Many DC refills
53
Near 0 DC refills
54
1. call qword ptr [__imp_rand]
2. and eax,80000001h
3. jge 00000001400010A5
4. dec eax
5. or eax,0FFFFFFFEh
6. inc eax
7. add dword ptr [rbx+4],eax
8. sub rdi,1
9. jne 0000000140001091
1. call qword ptr [__imp_rand]
2. and eax,80000001h
3. jge 00000001400010A5
4. dec eax
5. or eax,0FFFFFFFEh
6. inc eax
7. add ebx,eax
8. sub rdi,1
9. jne 0000000140001091
10. mov dword ptr [rsi+4],ebx
Frequently accessed process shared data in
innermost loop
Minimized access to process shared data by using
thread local data
55
•
•
•
•
•
5656
5757 | GDC18 AMD RYZEN CPU OPTIMIZATION | 3/21/2018 | AMD
[IGC2018] AMD Don Woligroski - WHY Ryzen

Mais conteúdo relacionado

Mais procurados

Tipos de socket
Tipos de socketTipos de socket
Tipos de socketjoslis12
 
“Zen 3”: AMD 2nd Generation 7nm x86-64 Microprocessor Core
“Zen 3”: AMD 2nd Generation 7nm x86-64 Microprocessor Core“Zen 3”: AMD 2nd Generation 7nm x86-64 Microprocessor Core
“Zen 3”: AMD 2nd Generation 7nm x86-64 Microprocessor CoreAMD
 
AMD EPYC™ Microprocessor Architecture
AMD EPYC™ Microprocessor ArchitectureAMD EPYC™ Microprocessor Architecture
AMD EPYC™ Microprocessor ArchitectureAMD
 
AMD processors
AMD processorsAMD processors
AMD processorssanthu652
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdfAdrian Huang
 
Gpu presentation
Gpu presentationGpu presentation
Gpu presentationJosiah Lund
 
Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Jonathan Katz
 
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip ArchitecturesISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip ArchitecturesAMD
 
The Path to "Zen 2"
The Path to "Zen 2"The Path to "Zen 2"
The Path to "Zen 2"AMD
 
Getting The Most Out Of Your Flash/SSDs
Getting The Most Out Of Your Flash/SSDsGetting The Most Out Of Your Flash/SSDs
Getting The Most Out Of Your Flash/SSDsAerospike, Inc.
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologyTiago Sousa
 
Q1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXL
Q1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXLQ1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXL
Q1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXLMemory Fabric Forum
 
7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance 7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance AMD
 
Nvidia (History, GPU Architecture and New Pascal Architecture)
Nvidia (History, GPU Architecture and New Pascal Architecture)Nvidia (History, GPU Architecture and New Pascal Architecture)
Nvidia (History, GPU Architecture and New Pascal Architecture)Saksham Tanwar
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugginglibfetion
 

Mais procurados (20)

AMD Ryzen
AMD RyzenAMD Ryzen
AMD Ryzen
 
Amd processor
Amd processorAmd processor
Amd processor
 
Tipos de socket
Tipos de socketTipos de socket
Tipos de socket
 
“Zen 3”: AMD 2nd Generation 7nm x86-64 Microprocessor Core
“Zen 3”: AMD 2nd Generation 7nm x86-64 Microprocessor Core“Zen 3”: AMD 2nd Generation 7nm x86-64 Microprocessor Core
“Zen 3”: AMD 2nd Generation 7nm x86-64 Microprocessor Core
 
AMD EPYC™ Microprocessor Architecture
AMD EPYC™ Microprocessor ArchitectureAMD EPYC™ Microprocessor Architecture
AMD EPYC™ Microprocessor Architecture
 
AMD processors
AMD processorsAMD processors
AMD processors
 
It's Time to ROCm!
It's Time to ROCm!It's Time to ROCm!
It's Time to ROCm!
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdf
 
Gpu presentation
Gpu presentationGpu presentation
Gpu presentation
 
Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!
 
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip ArchitecturesISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
 
The Path to "Zen 2"
The Path to "Zen 2"The Path to "Zen 2"
The Path to "Zen 2"
 
Getting The Most Out Of Your Flash/SSDs
Getting The Most Out Of Your Flash/SSDsGetting The Most Out Of Your Flash/SSDs
Getting The Most Out Of Your Flash/SSDs
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
Q1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXL
Q1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXLQ1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXL
Q1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXL
 
7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance 7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance
 
AMD Processor
AMD ProcessorAMD Processor
AMD Processor
 
Graphics processing unit
Graphics processing unitGraphics processing unit
Graphics processing unit
 
Nvidia (History, GPU Architecture and New Pascal Architecture)
Nvidia (History, GPU Architecture and New Pascal Architecture)Nvidia (History, GPU Architecture and New Pascal Architecture)
Nvidia (History, GPU Architecture and New Pascal Architecture)
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugging
 

Semelhante a [IGC2018] AMD Don Woligroski - WHY Ryzen

Blue Line Supermicro Aplus
Blue Line Supermicro AplusBlue Line Supermicro Aplus
Blue Line Supermicro AplusBlue Line
 
HPE ProLiant DL380 Gen9 Server Data Sheet
HPE ProLiant DL380 Gen9 Server Data SheetHPE ProLiant DL380 Gen9 Server Data Sheet
HPE ProLiant DL380 Gen9 Server Data Sheet美兰 曾
 
Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例Kazuhito Ohkawa
 
計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?Shinnosuke Furuya
 
Hp dl 380 g9
Hp dl 380 g9Hp dl 380 g9
Hp dl 380 g9ganeshvm1
 
Cерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 fin
Cерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 finCерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 fin
Cерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 finDEPO Computers
 
MSI Intel H370/B360/H310 Motherboard Info Kit
MSI Intel H370/B360/H310 Motherboard Info KitMSI Intel H370/B360/H310 Motherboard Info Kit
MSI Intel H370/B360/H310 Motherboard Info KitMSI Gaming
 
MSI Z370 Motherboard Info Kit
MSI Z370 Motherboard Info KitMSI Z370 Motherboard Info Kit
MSI Z370 Motherboard Info KitMSI Gaming
 
Boost UDP Transaction Performance
Boost UDP Transaction PerformanceBoost UDP Transaction Performance
Boost UDP Transaction PerformanceLF Events
 
AMD 2014 Performance Mobile APUs
AMD 2014 Performance Mobile APUsAMD 2014 Performance Mobile APUs
AMD 2014 Performance Mobile APUsAMD
 
Schematic diagrams of GPUs' architecture and Time evolution of theoretical FL...
Schematic diagrams of GPUs' architecture and Time evolution of theoretical FL...Schematic diagrams of GPUs' architecture and Time evolution of theoretical FL...
Schematic diagrams of GPUs' architecture and Time evolution of theoretical FL...智啓 出川
 
A way to visual the best storage media for an application
A way to visual the best storage media for an applicationA way to visual the best storage media for an application
A way to visual the best storage media for an applicationTony Roug
 
Ceph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Community
 
Hpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server DatasheetHpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server Datasheet美兰 曾
 
Особенности архитектуры и траблшутинга маршрутизаторов серии ASR1000
Особенности архитектуры и траблшутинга маршрутизаторов серии ASR1000Особенности архитектуры и траблшутинга маршрутизаторов серии ASR1000
Особенности архитектуры и траблшутинга маршрутизаторов серии ASR1000Cisco Russia
 
HPE ProLiant DL360 Gen9 Server Data Sheet
HPE ProLiant DL360 Gen9 Server Data SheetHPE ProLiant DL360 Gen9 Server Data Sheet
HPE ProLiant DL360 Gen9 Server Data Sheet美兰 曾
 
Sample inventory report
Sample inventory reportSample inventory report
Sample inventory reportKamran Arshad
 

Semelhante a [IGC2018] AMD Don Woligroski - WHY Ryzen (20)

Blue Line Supermicro Aplus
Blue Line Supermicro AplusBlue Line Supermicro Aplus
Blue Line Supermicro Aplus
 
HPE ProLiant DL380 Gen9 Server Data Sheet
HPE ProLiant DL380 Gen9 Server Data SheetHPE ProLiant DL380 Gen9 Server Data Sheet
HPE ProLiant DL380 Gen9 Server Data Sheet
 
Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例
 
Fujitsu PRIMERGY RX200 S7
Fujitsu PRIMERGY RX200 S7Fujitsu PRIMERGY RX200 S7
Fujitsu PRIMERGY RX200 S7
 
計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?
 
Hp dl 380 g9
Hp dl 380 g9Hp dl 380 g9
Hp dl 380 g9
 
Cерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 fin
Cерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 finCерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 fin
Cерверы Depo storm 3400 на базе новейших процессоров intel xeon e5 2600v3 fin
 
MSI Intel H370/B360/H310 Motherboard Info Kit
MSI Intel H370/B360/H310 Motherboard Info KitMSI Intel H370/B360/H310 Motherboard Info Kit
MSI Intel H370/B360/H310 Motherboard Info Kit
 
MSI Z370 Motherboard Info Kit
MSI Z370 Motherboard Info KitMSI Z370 Motherboard Info Kit
MSI Z370 Motherboard Info Kit
 
Boost UDP Transaction Performance
Boost UDP Transaction PerformanceBoost UDP Transaction Performance
Boost UDP Transaction Performance
 
AMD 2014 Performance Mobile APUs
AMD 2014 Performance Mobile APUsAMD 2014 Performance Mobile APUs
AMD 2014 Performance Mobile APUs
 
Schematic diagrams of GPUs' architecture and Time evolution of theoretical FL...
Schematic diagrams of GPUs' architecture and Time evolution of theoretical FL...Schematic diagrams of GPUs' architecture and Time evolution of theoretical FL...
Schematic diagrams of GPUs' architecture and Time evolution of theoretical FL...
 
A way to visual the best storage media for an application
A way to visual the best storage media for an applicationA way to visual the best storage media for an application
A way to visual the best storage media for an application
 
Ceph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph clusterCeph Day Tokyo - Delivering cost effective, high performance Ceph cluster
Ceph Day Tokyo - Delivering cost effective, high performance Ceph cluster
 
Hpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server DatasheetHpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server Datasheet
 
Особенности архитектуры и траблшутинга маршрутизаторов серии ASR1000
Особенности архитектуры и траблшутинга маршрутизаторов серии ASR1000Особенности архитектуры и траблшутинга маршрутизаторов серии ASR1000
Особенности архитектуры и траблшутинга маршрутизаторов серии ASR1000
 
El Super NAS Server TDS
El Super NAS Server TDSEl Super NAS Server TDS
El Super NAS Server TDS
 
HPE ProLiant DL360 Gen9 Server Data Sheet
HPE ProLiant DL360 Gen9 Server Data SheetHPE ProLiant DL360 Gen9 Server Data Sheet
HPE ProLiant DL360 Gen9 Server Data Sheet
 
Sample inventory report
Sample inventory reportSample inventory report
Sample inventory report
 
Yeni Nesil Sunucular ile Veritabanınız
Yeni Nesil Sunucular ile VeritabanınızYeni Nesil Sunucular ile Veritabanınız
Yeni Nesil Sunucular ile Veritabanınız
 

Mais de 강 민우

[IGC2018] 엔씨소프트 이경종 - 심층강화학습을 활용한 프로게이머 수준의 AI 만들기
[IGC2018] 엔씨소프트 이경종 - 심층강화학습을 활용한 프로게이머 수준의 AI 만들기[IGC2018] 엔씨소프트 이경종 - 심층강화학습을 활용한 프로게이머 수준의 AI 만들기
[IGC2018] 엔씨소프트 이경종 - 심층강화학습을 활용한 프로게이머 수준의 AI 만들기강 민우
 
[IGC2018] 청강대 이득우 - 언리얼에디터확장을위해알아야할것들
[IGC2018] 청강대 이득우 - 언리얼에디터확장을위해알아야할것들[IGC2018] 청강대 이득우 - 언리얼에디터확장을위해알아야할것들
[IGC2018] 청강대 이득우 - 언리얼에디터확장을위해알아야할것들강 민우
 
[IGC2018] SUB 윤민 - 나만의 사운드—제작하고 연출하기
[IGC2018] SUB 윤민 - 나만의 사운드—제작하고 연출하기[IGC2018] SUB 윤민 - 나만의 사운드—제작하고 연출하기
[IGC2018] SUB 윤민 - 나만의 사운드—제작하고 연출하기강 민우
 
[IGC2018] 이락디지털문화연구소 남기덕 - 게임 디자인의 시작, 테마
[IGC2018] 이락디지털문화연구소 남기덕 - 게임 디자인의 시작, 테마[IGC2018] 이락디지털문화연구소 남기덕 - 게임 디자인의 시작, 테마
[IGC2018] 이락디지털문화연구소 남기덕 - 게임 디자인의 시작, 테마강 민우
 
[IGC2018] 왓스튜디오 방영훈 - 놀면서공부하기
[IGC2018] 왓스튜디오 방영훈 - 놀면서공부하기[IGC2018] 왓스튜디오 방영훈 - 놀면서공부하기
[IGC2018] 왓스튜디오 방영훈 - 놀면서공부하기강 민우
 
[IGC2018] 넷마블 이상철 - 모바일 게임 보안 AR(Android Republic) 변조앱 내부를 파헤치다
[IGC2018] 넷마블 이상철 - 모바일 게임 보안 AR(Android Republic) 변조앱 내부를 파헤치다[IGC2018] 넷마블 이상철 - 모바일 게임 보안 AR(Android Republic) 변조앱 내부를 파헤치다
[IGC2018] 넷마블 이상철 - 모바일 게임 보안 AR(Android Republic) 변조앱 내부를 파헤치다강 민우
 
[IGC2018] TeamHoray 문지환 - 던그리드, 이랬으면 더 좋았을 텐데
[IGC2018] TeamHoray 문지환 - 던그리드, 이랬으면 더 좋았을 텐데[IGC2018] TeamHoray 문지환 - 던그리드, 이랬으면 더 좋았을 텐데
[IGC2018] TeamHoray 문지환 - 던그리드, 이랬으면 더 좋았을 텐데강 민우
 
[IGC2018] 에픽게임즈 신광섭 - 언리얼엔진4 포트나이트 멀티플랫폼 개발 지원
[IGC2018] 에픽게임즈 신광섭 - 언리얼엔진4 포트나이트 멀티플랫폼 개발 지원[IGC2018] 에픽게임즈 신광섭 - 언리얼엔진4 포트나이트 멀티플랫폼 개발 지원
[IGC2018] 에픽게임즈 신광섭 - 언리얼엔진4 포트나이트 멀티플랫폼 개발 지원강 민우
 
[IGC2018] 잔디소프트 윤세민 - HTML5 게임 어디까지 가능한가
[IGC2018] 잔디소프트 윤세민 - HTML5 게임 어디까지 가능한가[IGC2018] 잔디소프트 윤세민 - HTML5 게임 어디까지 가능한가
[IGC2018] 잔디소프트 윤세민 - HTML5 게임 어디까지 가능한가강 민우
 
[IGC2018] 해피툭 김봉균 - 대만 게임 시장 진출시 유의해야 할 점
[IGC2018] 해피툭 김봉균 - 대만 게임 시장 진출시 유의해야 할 점[IGC2018] 해피툭 김봉균 - 대만 게임 시장 진출시 유의해야 할 점
[IGC2018] 해피툭 김봉균 - 대만 게임 시장 진출시 유의해야 할 점강 민우
 
[IGC2018] 캡콤 토쿠다 유야 - 몬스터헌터 월드의 게임 컨셉과 레벨 디자인
[IGC2018] 캡콤 토쿠다 유야 - 몬스터헌터 월드의 게임 컨셉과 레벨 디자인[IGC2018] 캡콤 토쿠다 유야 - 몬스터헌터 월드의 게임 컨셉과 레벨 디자인
[IGC2018] 캡콤 토쿠다 유야 - 몬스터헌터 월드의 게임 컨셉과 레벨 디자인강 민우
 
[IGC2018] 산타모니카스튜디오 에이브 타라키 - 게임의 컨셉 디자인과 세계를 만드는 법
[IGC2018] 산타모니카스튜디오 에이브 타라키 - 게임의 컨셉 디자인과  세계를 만드는 법[IGC2018] 산타모니카스튜디오 에이브 타라키 - 게임의 컨셉 디자인과  세계를 만드는 법
[IGC2018] 산타모니카스튜디오 에이브 타라키 - 게임의 컨셉 디자인과 세계를 만드는 법강 민우
 
[IGC2018] 펄어비스 강건우 - 펄어비스에서 기획자가 일하는 방법
[IGC2018] 펄어비스 강건우 - 펄어비스에서 기획자가 일하는 방법[IGC2018] 펄어비스 강건우 - 펄어비스에서 기획자가 일하는 방법
[IGC2018] 펄어비스 강건우 - 펄어비스에서 기획자가 일하는 방법강 민우
 
[IGC2018] 스튜디오EIM 정사인 - 좋은 소리는 무엇인가
[IGC2018] 스튜디오EIM 정사인 - 좋은 소리는 무엇인가[IGC2018] 스튜디오EIM 정사인 - 좋은 소리는 무엇인가
[IGC2018] 스튜디오EIM 정사인 - 좋은 소리는 무엇인가강 민우
 
[IGC2018] 유유자적라이프 김윤정 - SunShine 베를린을 밝게 비추다
[IGC2018] 유유자적라이프 김윤정 - SunShine 베를린을 밝게 비추다[IGC2018] 유유자적라이프 김윤정 - SunShine 베를린을 밝게 비추다
[IGC2018] 유유자적라이프 김윤정 - SunShine 베를린을 밝게 비추다강 민우
 
[IGC2018] 자라나는 씨앗 김효택 - MazM 시리즈로 바라본 스토리 게임의 가능성
[IGC2018] 자라나는 씨앗 김효택 - MazM 시리즈로 바라본 스토리 게임의 가능성[IGC2018] 자라나는 씨앗 김효택 - MazM 시리즈로 바라본 스토리 게임의 가능성
[IGC2018] 자라나는 씨앗 김효택 - MazM 시리즈로 바라본 스토리 게임의 가능성강 민우
 
[IGC2018] 인플루전 곽노진 - 인디게임이 망할 수 밖에 없는 현실과 이유
 [IGC2018] 인플루전 곽노진 -  인디게임이 망할 수 밖에 없는 현실과 이유 [IGC2018] 인플루전 곽노진 -  인디게임이 망할 수 밖에 없는 현실과 이유
[IGC2018] 인플루전 곽노진 - 인디게임이 망할 수 밖에 없는 현실과 이유강 민우
 
[IGC2018] 라운드8 박성준 - 블레스 언리쉬드 우리는 왜 모든것을 재설계했나
[IGC2018] 라운드8 박성준 - 블레스 언리쉬드  우리는 왜 모든것을 재설계했나[IGC2018] 라운드8 박성준 - 블레스 언리쉬드  우리는 왜 모든것을 재설계했나
[IGC2018] 라운드8 박성준 - 블레스 언리쉬드 우리는 왜 모든것을 재설계했나강 민우
 
[IGC2018] 아이봉 정봉재 - 아직 아이 망하니
[IGC2018] 아이봉 정봉재 - 아직 아이 망하니[IGC2018] 아이봉 정봉재 - 아직 아이 망하니
[IGC2018] 아이봉 정봉재 - 아직 아이 망하니강 민우
 
[IGC2018] 퍼니파우 최재영 - 감성을 위한 개발요소
[IGC2018] 퍼니파우 최재영 - 감성을 위한 개발요소[IGC2018] 퍼니파우 최재영 - 감성을 위한 개발요소
[IGC2018] 퍼니파우 최재영 - 감성을 위한 개발요소강 민우
 

Mais de 강 민우 (20)

[IGC2018] 엔씨소프트 이경종 - 심층강화학습을 활용한 프로게이머 수준의 AI 만들기
[IGC2018] 엔씨소프트 이경종 - 심층강화학습을 활용한 프로게이머 수준의 AI 만들기[IGC2018] 엔씨소프트 이경종 - 심층강화학습을 활용한 프로게이머 수준의 AI 만들기
[IGC2018] 엔씨소프트 이경종 - 심층강화학습을 활용한 프로게이머 수준의 AI 만들기
 
[IGC2018] 청강대 이득우 - 언리얼에디터확장을위해알아야할것들
[IGC2018] 청강대 이득우 - 언리얼에디터확장을위해알아야할것들[IGC2018] 청강대 이득우 - 언리얼에디터확장을위해알아야할것들
[IGC2018] 청강대 이득우 - 언리얼에디터확장을위해알아야할것들
 
[IGC2018] SUB 윤민 - 나만의 사운드—제작하고 연출하기
[IGC2018] SUB 윤민 - 나만의 사운드—제작하고 연출하기[IGC2018] SUB 윤민 - 나만의 사운드—제작하고 연출하기
[IGC2018] SUB 윤민 - 나만의 사운드—제작하고 연출하기
 
[IGC2018] 이락디지털문화연구소 남기덕 - 게임 디자인의 시작, 테마
[IGC2018] 이락디지털문화연구소 남기덕 - 게임 디자인의 시작, 테마[IGC2018] 이락디지털문화연구소 남기덕 - 게임 디자인의 시작, 테마
[IGC2018] 이락디지털문화연구소 남기덕 - 게임 디자인의 시작, 테마
 
[IGC2018] 왓스튜디오 방영훈 - 놀면서공부하기
[IGC2018] 왓스튜디오 방영훈 - 놀면서공부하기[IGC2018] 왓스튜디오 방영훈 - 놀면서공부하기
[IGC2018] 왓스튜디오 방영훈 - 놀면서공부하기
 
[IGC2018] 넷마블 이상철 - 모바일 게임 보안 AR(Android Republic) 변조앱 내부를 파헤치다
[IGC2018] 넷마블 이상철 - 모바일 게임 보안 AR(Android Republic) 변조앱 내부를 파헤치다[IGC2018] 넷마블 이상철 - 모바일 게임 보안 AR(Android Republic) 변조앱 내부를 파헤치다
[IGC2018] 넷마블 이상철 - 모바일 게임 보안 AR(Android Republic) 변조앱 내부를 파헤치다
 
[IGC2018] TeamHoray 문지환 - 던그리드, 이랬으면 더 좋았을 텐데
[IGC2018] TeamHoray 문지환 - 던그리드, 이랬으면 더 좋았을 텐데[IGC2018] TeamHoray 문지환 - 던그리드, 이랬으면 더 좋았을 텐데
[IGC2018] TeamHoray 문지환 - 던그리드, 이랬으면 더 좋았을 텐데
 
[IGC2018] 에픽게임즈 신광섭 - 언리얼엔진4 포트나이트 멀티플랫폼 개발 지원
[IGC2018] 에픽게임즈 신광섭 - 언리얼엔진4 포트나이트 멀티플랫폼 개발 지원[IGC2018] 에픽게임즈 신광섭 - 언리얼엔진4 포트나이트 멀티플랫폼 개발 지원
[IGC2018] 에픽게임즈 신광섭 - 언리얼엔진4 포트나이트 멀티플랫폼 개발 지원
 
[IGC2018] 잔디소프트 윤세민 - HTML5 게임 어디까지 가능한가
[IGC2018] 잔디소프트 윤세민 - HTML5 게임 어디까지 가능한가[IGC2018] 잔디소프트 윤세민 - HTML5 게임 어디까지 가능한가
[IGC2018] 잔디소프트 윤세민 - HTML5 게임 어디까지 가능한가
 
[IGC2018] 해피툭 김봉균 - 대만 게임 시장 진출시 유의해야 할 점
[IGC2018] 해피툭 김봉균 - 대만 게임 시장 진출시 유의해야 할 점[IGC2018] 해피툭 김봉균 - 대만 게임 시장 진출시 유의해야 할 점
[IGC2018] 해피툭 김봉균 - 대만 게임 시장 진출시 유의해야 할 점
 
[IGC2018] 캡콤 토쿠다 유야 - 몬스터헌터 월드의 게임 컨셉과 레벨 디자인
[IGC2018] 캡콤 토쿠다 유야 - 몬스터헌터 월드의 게임 컨셉과 레벨 디자인[IGC2018] 캡콤 토쿠다 유야 - 몬스터헌터 월드의 게임 컨셉과 레벨 디자인
[IGC2018] 캡콤 토쿠다 유야 - 몬스터헌터 월드의 게임 컨셉과 레벨 디자인
 
[IGC2018] 산타모니카스튜디오 에이브 타라키 - 게임의 컨셉 디자인과 세계를 만드는 법
[IGC2018] 산타모니카스튜디오 에이브 타라키 - 게임의 컨셉 디자인과  세계를 만드는 법[IGC2018] 산타모니카스튜디오 에이브 타라키 - 게임의 컨셉 디자인과  세계를 만드는 법
[IGC2018] 산타모니카스튜디오 에이브 타라키 - 게임의 컨셉 디자인과 세계를 만드는 법
 
[IGC2018] 펄어비스 강건우 - 펄어비스에서 기획자가 일하는 방법
[IGC2018] 펄어비스 강건우 - 펄어비스에서 기획자가 일하는 방법[IGC2018] 펄어비스 강건우 - 펄어비스에서 기획자가 일하는 방법
[IGC2018] 펄어비스 강건우 - 펄어비스에서 기획자가 일하는 방법
 
[IGC2018] 스튜디오EIM 정사인 - 좋은 소리는 무엇인가
[IGC2018] 스튜디오EIM 정사인 - 좋은 소리는 무엇인가[IGC2018] 스튜디오EIM 정사인 - 좋은 소리는 무엇인가
[IGC2018] 스튜디오EIM 정사인 - 좋은 소리는 무엇인가
 
[IGC2018] 유유자적라이프 김윤정 - SunShine 베를린을 밝게 비추다
[IGC2018] 유유자적라이프 김윤정 - SunShine 베를린을 밝게 비추다[IGC2018] 유유자적라이프 김윤정 - SunShine 베를린을 밝게 비추다
[IGC2018] 유유자적라이프 김윤정 - SunShine 베를린을 밝게 비추다
 
[IGC2018] 자라나는 씨앗 김효택 - MazM 시리즈로 바라본 스토리 게임의 가능성
[IGC2018] 자라나는 씨앗 김효택 - MazM 시리즈로 바라본 스토리 게임의 가능성[IGC2018] 자라나는 씨앗 김효택 - MazM 시리즈로 바라본 스토리 게임의 가능성
[IGC2018] 자라나는 씨앗 김효택 - MazM 시리즈로 바라본 스토리 게임의 가능성
 
[IGC2018] 인플루전 곽노진 - 인디게임이 망할 수 밖에 없는 현실과 이유
 [IGC2018] 인플루전 곽노진 -  인디게임이 망할 수 밖에 없는 현실과 이유 [IGC2018] 인플루전 곽노진 -  인디게임이 망할 수 밖에 없는 현실과 이유
[IGC2018] 인플루전 곽노진 - 인디게임이 망할 수 밖에 없는 현실과 이유
 
[IGC2018] 라운드8 박성준 - 블레스 언리쉬드 우리는 왜 모든것을 재설계했나
[IGC2018] 라운드8 박성준 - 블레스 언리쉬드  우리는 왜 모든것을 재설계했나[IGC2018] 라운드8 박성준 - 블레스 언리쉬드  우리는 왜 모든것을 재설계했나
[IGC2018] 라운드8 박성준 - 블레스 언리쉬드 우리는 왜 모든것을 재설계했나
 
[IGC2018] 아이봉 정봉재 - 아직 아이 망하니
[IGC2018] 아이봉 정봉재 - 아직 아이 망하니[IGC2018] 아이봉 정봉재 - 아직 아이 망하니
[IGC2018] 아이봉 정봉재 - 아직 아이 망하니
 
[IGC2018] 퍼니파우 최재영 - 감성을 위한 개발요소
[IGC2018] 퍼니파우 최재영 - 감성을 위한 개발요소[IGC2018] 퍼니파우 최재영 - 감성을 위한 개발요소
[IGC2018] 퍼니파우 최재영 - 감성을 위한 개발요소
 

Último

Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsBert Jan Schrijver
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationShrmpro
 

Último (20)

Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions Presentation
 

[IGC2018] AMD Don Woligroski - WHY Ryzen

  • 1.
  • 2. 2
  • 3. 33 0% 20% 40% 60%10% 30% 50% 70% Ryzen™ 7 1800X advantage vs Core i7 6850K Ryzen™ 7 1700X advantage vs Core i7 6800K Ryzen™ 7 1700 advantage vs Core i7 7700K Ryzen™ 5 1600X advantage vs Core i5 7600K Ryzen™ 5 1600 advantage vs Core i5 7600 Ryzen™ 5 1500X advantage vs Core i5 7500 Ryzen™ 5 1400 advantage vs Core i5 7400 Ryzen™ 3 1300X advantage vs Core i3 7300 Ryzen™ 3 1200 vs Core i3 7200 Ryzen™ Threadripper˜ 1950X advantage vs Core i9 7900X A new era of PC enthusiasts want to game and stream and work and watch Photo, Video, and 3D creation software continues to push the limits of multiple cores Current Graphics APIs are natively multi-threaded: Microsoft® DirectX® 12 and Vulkan® Multi-threading processors are required to drive smooth VR experiences
  • 4. 4 4 AMD Model Cores / Threads Intel Model Cores/Threads i7-9700K i7-8700K 8/8 6/12i7-8700 i5-8600K 6/6 i5-8600 i5-8500 i5-8400 i3-8300 4/4 i3-8100
  • 5. 5 ENTHUSIAST/PROSUMER Cutting-edge performance for those who demand the highest tier of multi-threaded performance possible on a mainstream PC platform Intel Core™ i7 HIGH PERFORMANCE Powerful high-performance processors for demanding games and productivity software Intel Core™ i5 MAINSTREAM/ONLINE Advanced computing hardware that can easily handle productivity software, online/esports games, and media playback, with room for expansion Intel Core™ i3 EVERYDAY COMPUTING Basic, Affordable computing power for responsive web surfing, media playback, and casual gaming in HD Pentium, Celeron WITH RADEON™ VEGA GRAPHICS
  • 6. 6 6 VR & GAMING, PROSUMERS, ESPORTS, AND ENTHUSIASTS Built for VR & Gaming Whether you’re exploring new worlds in VR, or just gaming with your friends, Ryzen™ processors will take you there with incredibly smooth, fast performance System images are for illustrative purposes only. The new AMD Ryzen™ Processor delivers lightning fast responsiveness whether you’re creating 3D models in VR, editing UHD video, or performing computational modelling Built for Prosumers You can be confident your Ryzen™ processor won’t choke when you need it, enjoying professional-level eSports performance to crush your enemies today, on a system designed to dominate the games of tomorrow Built for eSports Gaming Built for Enthusiasts Enthusiasts place a premium value on the most cores and threads thanks to multi-threaded benchmarks and Intel’s Core Extreme Edition processors
  • 7. 7 THE ULTIMATE DESKTOP PROCESSOR FOR GAMERS, CREATORS, AND ENTHUSIASTS Smarter PlatformMore FeaturesMore Performance
  • 9. 9     MORE PERFORMANCE 3.7 GHz 3.8 GHz 3.9 GHz 4.0 GHz 4.1 GHz 4.2 GHz 4.3 GHz 1T 2T 3T 4T 5T 6T 7T 8T 9T 10T 11T 12T 13T 14T 15T 16T Average Clockspeed 1T-16T BASE MAX
  • 10. 10    MORE PERFORMANCE 3.6 GHz 3.7 GHz 3.8 GHz 3.9 GHz 4.0 GHz 4.1 GHz 4.2 GHz 4.3 GHz 1T 2T 3T 4T 5T 6T 7T 8T 9T 10T 11T 12T 13T 14T 15T 16T Average Clockspeed 1T-16T 2700X Uplift
  • 11. 11 X ⁰ ⁰ ⁰ Cinebench R15 nT F RANGE 2 MORE PERFORMANCE
  • 14. 14 14 chosen as the IN THEIR PRICE SEGMENTS by
  • 15. 15 Average over period 1/1/18 – 5/31/18 in int’l survey covering etailers in UK, USA, China, Ger., Can., and Fr. Average over period 1 January 2018 to 31 May 2018. International survey covering UK, USA, Germany, China, Canada, and France. Etailers selling either or both families of processors in a box amazon-de, amazon-uk, amazon-us, bestbuy-us, ebuyer-uk, jd-cn, ldlc-fr, materielnet-fr, microcenter-us, mindfactory-de, ncix-ca, ncix-us, neweggbus-us, newegg-ca, newegg-us, pcworld-uk, staples-us, walmart-us. AMD Ryzen™ desktop processor family: Ryzen 3 1200, Ryzen 3 1300X, Ryzen 5 1400, Ryzen 5 1500X, Ryzen 5 1600, Ryzen 5 1600X, Ryzen 5 1700X, Ryzen 5 2600, Ryzen 5 2600X, Ryzen 7 1700, Ryzen 7 1700X, Ryzen 7 1800X, Ryzen 7 2600X, Ryzen 7 2700, Ryzen 7 2700X, Ryzen Threadripper 1900X, Ryzen Threadripper 1920X, Ryzen Threadripper 1950X, Threadripper 1900X, Threadripper 1920X, Threadripper 1950X. Comparable Intel processor family: i3-7100, i3-7100T, i3-7100U, i3-7300, i3-7320, i3-7350, i3-7350K, i3-8100, i3-8300, i3-8350, i3-8350K, i5- 7260, i5-7260U, i5-7400, i5-7400, i5-7400T, i5-7500, i5-7600, i5-7600K, i5-7600T, i5-7640X, i5-8400, i5-8500, i5-8600, i5-8600K, i7-6800K, i7-6850K, i7-6900K, i7- 6950X, i7-7500, i7-7567U, i7-7700, i7-7700K, i7-7700T, i7-7740X, i7-7800, i7-7800X, i7-7820X, i7-8211, i7-8550U, i7-8600K, i7-8700, i7-8700K, i9-7900X, I9-7920X, i9- 7940X, i9-7960X, i9-7980X, I9-7980Xe. Does not include Ryzen Mobile processors with Radeon Vega Graphics or Ryzen Desktop processors with Radeon Vega Graphics. THE AMD RYZEN DESKTOP PROCESSOR FAMILY RATES AN AVERAGE 4.76 STARS VERSUS COMPARABLE INTEL PROCESSOR FAMILY RATING 4.68 ACROSS TOP ETAILERS USING VERIFIED END USER REVIEWS.
  • 16. 16
  • 17. 17
  • 20. 20
  • 21. 21    Data Fabric 8M L3 I+D Cache 16-way 32B/cycle32B/cycle 512K L2 I+D Cache 8-way 64K I-Cache 4-way 32K D-Cache 8-way 32B/cycle32B fetch 32B/cycle2*16B load 1*16B store Unified Memory Controller DRAM Channel 16B/cycle32B/cycle cclk memclk IO Hub Controller 32B/cycle lclk   
  • 22. 22    Data Fabric 8M L3 I+D Cache 16-way 32B/cycle32B/cycle 512K L2 I+D Cache 8-way 64K I-Cache 4-way 32K D-Cache 8-way 32B/cycle32B fetch 32B/cycle2*16B load 1*16B store Unified Memory Controller DRAM Channel 16B/cycle32B/cycle cclk memclk IO Hub Controller 32B/cycle lclk 32B/cycle 16B/cycle off die GMI 16B/cycle off die   
  • 23. 23      Data Fabric 4M L3 I+D Cache 16-way 32B/cycle32B/cycle 512K L2 I+D Cache 8-way 64K I-Cache 4-way 32K D-Cache 8-way 32B/cycle32B fetch 32B/cycle2*16B load 1*16B store Unified Memory Controller DRAM Channel 16B/cycle32B/cycle cclk IO Hub Controller 32B/cycle lclk GFX9 Media 32B/cycle 32B/cycle sclk  memclk
  • 25. 25 YEAR FAMILY PRODUCT FAMILY ARCHITECTURE EXAMPLE MODEL ADX CLFLUSHOPT RDSEED SHA SMAP XGETBV XSAVEC XSAVES AVX2 BMI2 MOVBE RDRND SMEP FSGSBASE XSAVEOPT BMI FMA F16C AES AVX OSXSAVE PCLMULQDQ SSE4.1 SSE4.2 XSAVE SSSE3 CLZERO FMA4 TBM XOP 2017 17h “Summit Ridge” “Zen” Ryzen 7 1800X 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 2015 15h “Carrizo”/”Bristol Ridge” “Excavator” A12-9800 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 2014 15h “Kaveri”/”Godavari” “Steamroller” A10-7890K 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 2012 15h “Vishera” “Piledriver” FX-8370 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 2011 15h “Zambezi” “Bulldozer” FX-8150 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 0 1 2013 16h “Kabini” “Jaguar” A6-1450 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 1 1 0 0 0 0 2011 14h “Ontario” “Bobcat” E-450 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 2011 12h “Llano” “Husky” A8-3870 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2009 10h “Greyhound” “Greyhound” Phenom II X4 955 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  • 26. 26
  • 27. 27 V1.1 CHANGES NEW GUI FOR PERFORMANCE AND POWER PROFILING ANALYSIS. FILTER AVAILABLE EVENTS GROUP BY PROCESS, MODULE, OR THREAD UNIFIED CLI FOR PERFORMANCE AND POWER PROFILING ENERGY CONSUMPTION ANALYSIS OF PROCESS, THREAD, MODULE, FUNCTION AND SOURCE LINE. L3 AND DF PMC COUNTER SUPPORT (WINDOWS CLI ONLY) RECOMMEND USING 250,000 CPU CLOCK CYCLE INTERVAL WHILE ADDING CUSTOM COUNTERS TO PROFILE MAKES MATH EASY FOR PER THOUSAND INSTRUCTIONS (PTI) & PER THOUSAND CLOCKS (PTC) SEE: HTTPS://DEVELOPER.AMD.COM/AMD-UPROF/ SEE HTTP://SUPPORT.AMD.COM/EN-US/SEARCH/TECH-DOCS
  • 28. 28 FP: floating point LS: load/store IC/BP: instruction cache and branch prediction EX (SC): integer ALU & AGU execution and scheduling L2 DE: instruction decode, dispatch, microcode sequencer, & micro-op cache L3 DF: Data Fabric UMC: Unified Memory Controller (NDA only) IOHC: IO Hub Controller (NDA only)  rdpmc SMN in/out
  • 29. 29  ‒ ‒ ‒ ‒ ‒ • Set core clock > base clock to disable boost on “Raven Ridge” processors ‒ ‒ ‒ ‒
  • 30. 30
  • 34. 34 • THIS ADVICE IS SPECIFIC TO AMD PROCESSORS AND IS NOT GENERAL GUIDANCE FOR ALL PROCESSOR VENDORS. • GENERALLY, APPLICATIONS SHOW SMT BENEFITS AND USE OF ALL LOGICAL PROCESSORS IS RECOMMENDED. • BUT GAMES OFTEN SUFFER FROM SMT CONTENTION ON THE MAIN THREAD. • ONE STRATEGY TO REDUCE THIS CONTENTION IS TO CREATE THREADS BASED ON PHYSICAL CORE COUNT RATHER THAN LOGICAL PROCESSOR COUNT. • ALWAYS PROFILE YOUR APPLICATION/GAME TO DETERMINE THE IDEAL THREAD COUNT. • AMD BULLDOZER IS NOT A SMT DESIGN • AVOID CORE CLAMPING • SEE HTTPS://GPUOPEN.COM/CPU-CORE-COUNT-DETECTION-WINDOWS/ // This advice is specific to AMD processors and is // not general guidance for all processor vendors DWORD getDefaultThreadCount() { DWORD cores, logical; getProcessorCount(cores, logical); DWORD count = logical; char vendor[13]; getCpuidVendor(vendor); if (0 == strcmp(vendor, "AuthenticAMD")) { if (0x15 == getCpuidFamily()) { // AMD "Bulldozer" family microarchitecture count = logical; } else { count = cores; } } return count; }
  • 35. 35 • GET PSYSTEM_LOGICAL_PROCESSOR_INFORMATION _EX BUFFER LENGTH AT RUNTIME. • OTHERWISE, THE APPLICATION MAY CRASH IF AN INSUFFICIENTLY SIZED BUFFER WAS CREATED AT COMPILE TIME. • SEE HTTPS://GITHUB.COM/GPUOPEN- LIBRARIESANDSDKS/CPU-CORE- COUNTS/BLOB/MASTER/WINDOWS/THREADCO UNT-WIN7.CPP // char buffer[0x1000] /* bad assumption */ char* buffer = NULL; DWORD len = 0; if (FALSE == GetLogicalProcessorInformationEx( RelationAll, PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX)buffer, len)) { if (GetLastError() == ERROR_INSUFFICIENT_BUFFER) { buffer = (char*)malloc(len); if (GetLogicalProcessorInformationEx( RelationAll, (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX)buffer, &len)) { // ... } } free(buffer); }
  • 36. 36 • AVOID SIGNED, NARROWING AFFINITY MASKS. • OTHERWISE, THE APPLICATION MAY CRASH OR EXHIBIT UNEXPECTED BEHAVIOR. • BY DEFAULT, AN APPLICATION IS CONSTRAINED TO A SINGLE GROUP, A STATIC SET OF UP TO 64 LOGICAL PROCESSORS. • RIGHT SHIFTS: FOR SIGNED NUMBERS, THE SIGN BIT IS USED TO FILL THE VACATED BIT POSITIONS. • LEFT SHIFTS: IF YOU LEFT-SHIFT A SIGNED NUMBER SO THAT THE SIGN BIT IS AFFECTED, THE RESULT IS UNDEFINED. • SEE HTTPS://MSDN.MICROSOFT.COM/EN- US/LIBRARY/336XBHCZ.ASPX • • BAD • POPCNT = 0 • MASK B[32] = 0X0000000000000000 • GOOD • POPCNT = 64 • MASK B[32] = 0X0000000100000000 DWORD lps = 64; // logical processors DWORD_PTR p = 0xffffffffffffffff; // process mask int b = 32; // bit index #if 0 /* BAD */ int x = p; // signed, narrowing! int count = 0; //while (x != 0) { // never zero! while (x > 0) { // false! count += (x & 1); x >>= 1; // fill bits with sign! } printf("popcnt = %in", count); // 0 printf("mask b[%i] = 0x%pn", b, (1 << b)); // undefined, 0? #else /* GOOD */ DWORD_PTR x = p; int count = 0; while (x != 0) { count += (x & 1); x >>= 1; } printf("popcnt = %in", count); printf("mask b[%i] = 0x%pn", b, (1ULL << b)); #endif
  • 39. 39 #include "stdafx.h" #include <intrin.h> #include <numeric> #include <chrono> #define LEN 64000 alignas(64) float a[LEN]; alignas(64) float b[LEN]; void step(float dt) { for (size_t i = 0; i < LEN; i += 8) { // x,y,z,w,vx,vy,vy,vw __m128 p1 = _mm_load_ps(&a[(i + 0) % LEN]); __m128 v1 = _mm_load_ps(&a[(i + 4) % LEN]); p1 = _mm_add_ps(p1, _mm_mul_ps(v1, _mm_load_ps1(&dt))); _mm_stream_ps(&a[(i + 0) % LEN], p1); __m128 p2 = _mm_load_ps(&b[(i + 0) % LEN]); __m128 v2 = _mm_load_ps(&b[(i + 4) % LEN]); p2 = _mm_add_ps(p2, _mm_mul_ps(v2, _mm_load_ps1(&dt))); #if 0 /* without workaround */ _mm_stream_ps(&b[(i + 0) % LEN], p2); # else /* with workaround */ _mm_store_ps(&b[(i + 0) % LEN], p2); #endif } } int main(int argc, char *argv[]) { using namespace std::chrono; int j = (argc > 1) ? atoi(argv[1]) : 0; int seed = (argc > 2) ? atoi(argv[2]) : 3; size_t steps = (argc > 3) ? atoll(argv[3]) : 2000000; float dt = (argc > 4) ? (float)atof(argv[4]) : 0.001f; srand(seed); for (int i = 0; i < LEN; ++i) { a[i] = (float)rand() / RAND_MAX; b[i] = (float)rand() / RAND_MAX; } high_resolution_clock::time_point t0 = high_resolution_clock::now(); for (size_t i = 0; i < steps; i++) { step(dt); } high_resolution_clock::time_point t1 = high_resolution_clock::now(); duration<double> time_span = duration_cast<duration<double>>(t1 - t0); printf("time (milliseconds): %lfn", 1000.0 * time_span.count()); printf("a[%i] = %fn", j, a[j]); printf("b[%i] = %fn", j, b[j]); return EXIT_SUCCESS; }
  • 40. 40 60 WcbFull per 1000 instructions
  • 41. 41 6 WcbFull per 1000 instructions
  • 43. 43 • • USE THE PAUSE INSTRUCTION • TEST, THEN TEST-AND-SET • ALIGNAS(64) LOCK VARIABLE • OR _DECLSPEC( ALIGN(64) ) • • CUSTOM CPU PROFILE > EVENTS BY HARDWARE SOURCE > • [076] CYCLES NOT IN HALT • [0C0] INSTRUCTIONS RETIRED • [0AF] DYNAMIC TOKENS DISPATCH STALL CYCLES 0 > [4] ALU TOKENS TOTAL UNAVAILABLE • 10K PER THOUSAND INSTRUCTIONS IS BAD namespace MyLock { typedef unsigned LOCK, *PLOCK; enum { LOCK_IS_FREE = 0, LOCK_IS_TAKEN = 1 }; #if 0 /* BAD */ void Lock(PLOCK pl) { while (LOCK_IS_TAKEN == _InterlockedCompareExchange( pl, LOCK_IS_TAKEN, LOCK_IS_FREE)) { // lock, xchg, cmp } } #else /* GOOD */ void Lock(PLOCK pl) { while (1) { while (LOCK_IS_TAKEN == *pl) _mm_pause(); if (LOCK_IS_FREE==_InterlockedExchange(pl, LOCK_IS_TAKEN)) break; } } #endif void Unlock(PLOCK pl) { _InterlockedExchange(pl, LOCK_IS_FREE); } } alignas(64) MyLock::LOCK gLock;
  • 44. 44 #include "intrin.h" #include "stdio.h" #include "windows.h" #include <chrono> #include <numeric> #include <thread> #define LEN 512 alignas(64) float b[LEN][4][4]; alignas(64) float c[LEN][4][4]; DWORD WINAPI ThreadProcCallback(LPVOID data) { MyLock::Lock(&gLock); alignas(64) float a[LEN][4][4]; std::fill((float*)a, (float*)(a + LEN), 0.0f); float r = 0.0; for (size_t iter = 0; iter < 100000; iter++) { for (int m = 0; m < LEN; m++) for (int i = 0; i < 4; i++) for (int j = 0; j < 4; j++) for (int k = 0; k < 4; k++) a[m][i][j] += b[m][i][k] * c[m][k][j]; r += std::accumulate((float*)a, (float*)(a + LEN), 0.0f); } printf("result: %fn", r); MyLock::Unlock(&gLock); return 0; } int main(int argc, char *argv[]) { using namespace std::chrono; float b0 = (argc > 1) ? strtof(argv[1], NULL) : 1.0f; float c0 = (argc > 2) ? strtof(argv[2], NULL) : 2.0f; std::fill((float*)b, (float*)(b + LEN), b0); std::fill((float*)c, (float*)(c + LEN), c0); int num_threads = std::thread::hardware_concurrency(); HANDLE* threads = new HANDLE[num_threads]; high_resolution_clock::time_point t0 = high_resolution_clock::now(); for (size_t i = 0; i < num_threads; ++i) { threads[i] = CreateThread(NULL, 0, ThreadProcCallback, NULL, 0, NULL); } WaitForMultipleObjects(num_threads, threads, TRUE, INFINITE); high_resolution_clock::time_point t1 = high_resolution_clock::now(); duration<double> time_span = duration_cast<duration<double>>(t1 - t0); printf("time (milliseconds): %lfn", 1000.0 * time_span.count()); delete[] threads; return EXIT_SUCCESS; }
  • 46. 46 11K stalls per 1000 instructions
  • 47. 47 0 stall per 1000 instructions
  • 48. 48 1. xor eax,eax 2. lock cmpxchg dword ptr [?gLock@@3IA],ecx 3. cmp eax,ecx 4. je 00000001400010C0 1. cmp dword ptr [?gLock@@3IA],1 2. jne 00000001400010DB 3. nop dword ptr [rax] 4. pause 5. cmp dword ptr [?gLock@@3IA],1 6. je 00000001400010D0 7. mov eax,1 8. xchg eax,dword ptr [?gLock@@3IA] 9. test eax,eax 10. jne 00000001400010C0 Bad Good
  • 49. 49 • • MINIMIZE ACCESSES TO THE SAME CACHE LINE FROM MULTIPLE THREADS. • USE THREAD LOCAL STORAGE RATHER THAN PROCESS SHARED DATA WHENEVER POSSIBLE. • • CUSTOM CPU PROFILE > EVENTS BY HARDWARE SOURCE > • [076] CYCLES NOT IN HALT • [0C0] INSTRUCTIONS RETIRED • [043] DATA CACHE REFILLS FROM SYSTEM • [1] LS_MABRESP_LCL_CACHE • [4] LS_MABRESP_RMT_CACHE, HIT IN CACHE; REMOTE CCX AND HOME NODE ON DIFFERENT DIE
  • 51. 51 /* #include <windows.h> <chrono> <numeric> <thread> */ using namespace std::chrono; #define NUM_ITER 2000000000 struct Data { unsigned long sum; }; struct ThreadData { int seed; Data data; }; DWORD WINAPI ThreadProcCallback(void* param) { ThreadData *p = (ThreadData*)param; srand(p->seed); #if 0 /* process shared data */ p->data.sum = 0; for (int i = 0; i < NUM_ITER; i++) { p->data.sum += rand() % 2; } #else /* thread local data */ int local_sum = 0; for (int i = 0; i < NUM_ITER; i++) { local_sum += rand() % 2; } p->data.sum = local_sum; #endif return 0; } int main(int argc, char *argv[]) { int seed = (argc > 1) ? atoi(argv[1]) : 3; int numThreads = std::thread::hardware_concurrency(); HANDLE* threads = new HANDLE[numThreads]; ThreadData* a = new ThreadData[numThreads]; high_resolution_clock::time_point t0 = high_resolution_clock::now(); for (size_t i = 0; i < numThreads; ++i) { a[i].seed = seed; threads[i] = CreateThread(NULL, 0, ThreadProcCallback, (void*)&a[i], 0, NULL); } WaitForMultipleObjects(numThreads, threads, TRUE, INFINITE); high_resolution_clock::time_point t1 = high_resolution_clock::now(); duration<double> time_span = duration_cast<duration<double>>(t1 - t0); printf("time (ms): %lfn", 1000.0 * time_span.count()); for (size_t i = 0; i < numThreads; ++i) { printf("sum[%llu] = %lun", i, a[i].data.sum); } delete[] a; delete[] threads; return EXIT_SUCCESS; }
  • 53. 53 Near 0 DC refills
  • 54. 54 1. call qword ptr [__imp_rand] 2. and eax,80000001h 3. jge 00000001400010A5 4. dec eax 5. or eax,0FFFFFFFEh 6. inc eax 7. add dword ptr [rbx+4],eax 8. sub rdi,1 9. jne 0000000140001091 1. call qword ptr [__imp_rand] 2. and eax,80000001h 3. jge 00000001400010A5 4. dec eax 5. or eax,0FFFFFFFEh 6. inc eax 7. add ebx,eax 8. sub rdi,1 9. jne 0000000140001091 10. mov dword ptr [rsi+4],ebx Frequently accessed process shared data in innermost loop Minimized access to process shared data by using thread local data
  • 56. 5656
  • 57. 5757 | GDC18 AMD RYZEN CPU OPTIMIZATION | 3/21/2018 | AMD