MEMORY OPTIMIZATION Christer Ericson Sony Computer Entertainment, Santa Monica (christer_ericson@playstation.sony.com)
Talk contents 1/2
Talk contents 2/2
Problem statement
Need more justification? 1/3
Instruction parallelism: SIMD instructions consume data at 2-8 times the rate of normal instructions!

Need more justification? 2/3
Proebsting’s law: Improvements to compiler technology double program performance every ~18 years!
Corollary: Don’t expect the compiler to do it for you!

Need more justification? 3/3
On Moore’s law:

Brief cache review
The memory hierarchy
Roughly:
    CPU:          1 cycle
    L1 cache:     ~1-5 cycles
    L2 cache:     ~5-20 cycles
    Main memory:  ~40-100 cycles

Some cache specs

                L1 cache (I/D)        L2 cache
    PS2         16K/8K † 2-way        N/A
    GameCube    32K/32K ‡ 8-way       256K 2-way unified
    XBOX        16K/16K 4-way         128K 8-way unified
    PC          ~32-64K               ~128-512K

Foes: 3 C’s of cache misses
Friends: Introducing the 3 R’s

                 Compulsory   Capacity   Conflict
    Rearrange    X            (x)        X
    Reduce       X            X          (x)
    Reuse                     (x)        X

Measuring cache utilization
Code cache optimization 1/2
Code cache optimization 2/2
Data cache optimization
Prefetching and preloading
Software prefetching

    // Loop through and process all 4n elements
    for (int i = 0; i < 4 * n; i++)
        Process(elem[i]);

    const int kLookAhead = 4;  // Some elements ahead
    for (int i = 0; i < 4 * n; i += 4) {
        Prefetch(elem[i + kLookAhead]);
        Process(elem[i + 0]);
        Process(elem[i + 1]);
        Process(elem[i + 2]);
        Process(elem[i + 3]);
    }

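The loop above leaves Prefetch() and Process() undefined. As a minimal sketch, assuming a GCC- or Clang-style compiler, the same pattern can be written with the `__builtin_prefetch` builtin. The `PREFETCH` macro, the summing `Process()` stand-in, and the look-ahead distance of 16 are illustrative assumptions, not part of the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: GCC/Clang expose __builtin_prefetch; on other compilers
// this sketch falls back to doing nothing.
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr) __builtin_prefetch(addr)
#else
#define PREFETCH(addr) ((void)0)
#endif

// Hypothetical stand-in for the slide's Process(): accumulate a sum.
inline void Process(int value, long long &sum) { sum += value; }

// Processes elements four at a time, issuing one prefetch per group.
long long SumWithPrefetch(const std::vector<int> &elem) {
    const std::size_t kLookAhead = 16;  // assumed tuning value
    const std::size_t count = elem.size();
    long long sum = 0;
    std::size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        // Only form addresses that lie inside the array.
        if (i + kLookAhead < count)
            PREFETCH(&elem[i + kLookAhead]);
        Process(elem[i + 0], sum);
        Process(elem[i + 1], sum);
        Process(elem[i + 2], sum);
        Process(elem[i + 3], sum);
    }
    // Handle the 0-3 elements left over when count is not a multiple of 4.
    for (; i < count; i++)
        Process(elem[i], sum);
    assert(i == count);
    return sum;
}
```

A useful look-ahead distance depends on memory latency and on how long Process() takes, so it is best treated as something to measure rather than a constant to copy.
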
Greedy prefetching

    void PreorderTraversal(Node *pNode) {
        // Stop at empty subtrees
        if (pNode == NULL) return;
        // Greedily prefetch left traversal path
        Prefetch(pNode->left);
        // Process the current node
        Process(pNode);
        // Greedily prefetch right traversal path
        Prefetch(pNode->right);
        // Recursively visit left then right subtree
        PreorderTraversal(pNode->left);
        PreorderTraversal(pNode->right);
    }

Preloading (pseudo-prefetch)
(NB: This code reads one element beyond the end of the elem array.)

    Elem a = elem[0];
    for (int i = 0; i < 4 * n; i += 4) {
        Elem e = elem[i + 4];  // Cache miss, non-blocking
        Elem b = elem[i + 1];  // Cache hit
        Elem c = elem[i + 2];  // Cache hit
        Elem d = elem[i + 3];  // Cache hit
        Process(a);
        Process(b);
        Process(c);
        Process(d);
        a = e;
    }

Structures
Field reordering

    void Foo(S *p, void *key, int k) {
        while (p) {
            if (p->key == key) {
                p->count[k]++;
                break;
            }
            p = p->pNext;
        }
    }

Original layout:

    struct S {
        void *key;
        int count[20];
        S *pNext;
    };

Reordered so the fields touched by the search loop are adjacent:

    struct S {
        void *key;
        S *pNext;
        int count[20];
    };

Hot/cold splitting

Hot fields:

    struct S {
        void *key;
        S *pNext;
        S2 *pCold;
    };

Cold fields:

    struct S2 {
        int count[10];
    };

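As a rough sketch of how the search loop from the field reordering slide might look after the split, the version below walks only the hot struct and follows pCold once, on a match. The bool return value is an addition for illustration; S and S2 are the slide's structs.

```cpp
#include <cassert>
#include <cstddef>

// Cold data: only touched after a key has matched.
struct S2 {
    int count[10];
};

// Hot data: everything the search loop reads at every node.
struct S {
    void *key;
    S *pNext;
    S2 *pCold;
};

// Walks the list reading hot fields only; the cold counters are
// dereferenced a single time, for the matching node.
// Returns true if the key was found (return value added for this sketch).
bool Foo(S *p, void *key, int k) {
    assert(k >= 0 && k < 10);
    while (p) {
        if (p->key == key) {
            p->pCold->count[k]++;
            return true;
        }
        p = p->pNext;
    }
    return false;
}
```

Because each hot node is now three pointers wide, more nodes fit per cache line during the walk, at the price of one extra indirection when a counter is updated.
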
Hot/cold splitting
Beware compiler padding

As declared:

    struct X {
        int8 a;
        int64 b;
        int8 c;
        int16 d;
        int64 e;
        float f;
    };

With the padding made explicit:

    struct Y {
        int8 a, pad_a[7];
        int64 b;
        int8 c, pad_c[1];
        int16 d, pad_d[2];
        int64 e;
        float f, pad_f[1];
    };

Decreasing size!

    struct Z {
        int64 b;
        int64 e;
        float f;
        int16 d;
        int8 a;
        int8 c;
    };

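The effect of the reordering can be checked with sizeof. The sketch below restates X and Z using the `<cstdint>` fixed-width types in place of the slide's int8/int16/int64 typedefs; `kPayloadBytes` and `PaddingBytes` are helper names introduced here for illustration. On a typical 64-bit ABI where int64 is 8-byte aligned, X should come out at 40 bytes (matching Y above) and Z at 24, but the exact figures are ABI-dependent, so the code only asserts the relative result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Members in declaration order, as on the slide.
struct X {
    std::int8_t  a;
    std::int64_t b;
    std::int8_t  c;
    std::int16_t d;
    std::int64_t e;
    float        f;
};

// Same members sorted by decreasing size.
struct Z {
    std::int64_t b;
    std::int64_t e;
    float        f;
    std::int16_t d;
    std::int8_t  a;
    std::int8_t  c;
};

// Bytes actually occupied by the members themselves.
constexpr std::size_t kPayloadBytes =
    2 * sizeof(std::int8_t) + sizeof(std::int16_t) +
    2 * sizeof(std::int64_t) + sizeof(float);

// Padding is whatever the struct occupies beyond its members.
constexpr std::size_t PaddingBytes(std::size_t structSize) {
    return structSize - kPayloadBytes;
}

// Sorting by decreasing size should never make the struct larger.
static_assert(sizeof(Z) <= sizeof(X), "reordered struct grew");
```
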
Cache performance analysis
Tree data structures
Breadth-first order
Depth-first order
van Emde Boas layout
A compact static k-d tree
Linearization caching
Memory allocation policy
The curse of aliasing

Aliasing is multiple references to the same storage location

    int n;
    int *p1 = &n;
    int *p2 = &n;

Aliasing is also missed opportunities for optimization

    int Foo(int *a, int *b) {
        *a = 1;
        *b = 2;
        return *a;
    }

What value is returned here? Who knows!

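To make the "who knows" concrete, the sketch below calls the slide's Foo() once with distinct objects and once with both pointers aimed at the same int. CallDistinct and CallAliased are names invented for this illustration.

```cpp
#include <cassert>

// The slide's function, unchanged.
int Foo(int *a, int *b) {
    *a = 1;
    *b = 2;
    return *a;  // must be reloaded: the store through b may have hit *a
}

// a and b refer to different objects, so *a is still 1 at the return.
int CallDistinct() {
    int x = 0, y = 0;
    return Foo(&x, &y);
}

// a and b refer to the same object, so the second store overwrites the first.
int CallAliased() {
    int n = 0;
    int result = Foo(&n, &n);
    assert(n == 2);
    return result;
}
```

CallDistinct() returns 1 and CallAliased() returns 2. Since both calls are legal, the compiler cannot fold the return value to the constant 1 without proving the pointers differ.
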
The curse of aliasing
How do we do ‘anti-aliasing’?
† To be defined
Matrix multiplication 1/3
Consider optimizing a 2x2 matrix multiplication:

    void Mat22mul(float a[2][2], float b[2][2], float c[2][2]) {
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                a[i][j] = 0.0f;
                for (int k = 0; k < 2; k++)
                    a[i][j] += b[i][k] * c[k][j];
            }
        }
    }

How do we typically optimize it? Right, unrolling!

Matrix multiplication 2/3
Straightforward unrolling results in this:

    // 16 memory reads, 4 writes
    void Mat22mul(float a[2][2], float b[2][2], float c[2][2]) {
        a[0][0] = b[0][0]*c[0][0] + b[0][1]*c[1][0];
        a[0][1] = b[0][0]*c[0][1] + b[0][1]*c[1][1];  //(1)
        a[1][0] = b[1][0]*c[0][0] + b[1][1]*c[1][0];  //(2)
        a[1][1] = b[1][0]*c[0][1] + b[1][1]*c[1][1];  //(3)
    }

Matrix multiplication 3/3
A correct approach is instead writing it as:

    // 8 memory reads, 4 writes
    void Mat22mul(float a[2][2], float b[2][2], float c[2][2]) {
        // Consume inputs...
        float b00 = b[0][0], b01 = b[0][1];
        float b10 = b[1][0], b11 = b[1][1];
        float c00 = c[0][0], c01 = c[0][1];
        float c10 = c[1][0], c11 = c[1][1];
        // ...before producing outputs
        a[0][0] = b00*c00 + b01*c10;
        a[0][1] = b00*c01 + b01*c11;
        a[1][0] = b10*c00 + b11*c10;
        a[1][1] = b10*c01 + b11*c11;
    }

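One way to see why the compiler has to keep the extra loads in the unrolled version is to call both variants with the output aliasing the inputs. The sketch below squares a matrix in place; Mat22mulUnrolled and Mat22mulInputsFirst are names chosen here to tell the two slide versions apart.

```cpp
#include <cassert>

// Straightforward unrolling: each store to a[][] happens before the
// later statements read b[][] and c[][].
void Mat22mulUnrolled(float a[2][2], float b[2][2], float c[2][2]) {
    a[0][0] = b[0][0]*c[0][0] + b[0][1]*c[1][0];
    a[0][1] = b[0][0]*c[0][1] + b[0][1]*c[1][1];
    a[1][0] = b[1][0]*c[0][0] + b[1][1]*c[1][0];
    a[1][1] = b[1][0]*c[0][1] + b[1][1]*c[1][1];
}

// Inputs are copied to locals first, so stores to a[][] cannot
// disturb any value that is still needed.
void Mat22mulInputsFirst(float a[2][2], float b[2][2], float c[2][2]) {
    float b00 = b[0][0], b01 = b[0][1];
    float b10 = b[1][0], b11 = b[1][1];
    float c00 = c[0][0], c01 = c[0][1];
    float c10 = c[1][0], c11 = c[1][1];
    a[0][0] = b00*c00 + b01*c10;
    a[0][1] = b00*c01 + b01*c11;
    a[1][0] = b10*c00 + b11*c10;
    a[1][1] = b10*c01 + b11*c11;
}

// Squares {{1,2},{3,4}} in place with the chosen routine and reports
// whether the result equals the true product {{7,10},{15,22}}.
bool SquaresCorrectlyInPlace(void (*mul)(float[2][2], float[2][2], float[2][2])) {
    float m[2][2] = {{1.0f, 2.0f}, {3.0f, 4.0f}};
    mul(m, m, m);
    return m[0][0] == 7.0f && m[0][1] == 10.0f &&
           m[1][0] == 15.0f && m[1][1] == 22.0f;
}
```

With the inputs-first version the in-place square is correct; with the unrolled version it is not, because m[0][0] has already been overwritten when the second statement reads it. That observable difference is what prevents the compiler from keeping b and c in registers on its own.
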
Abstraction penalty problem
C++ abstraction penalty
C++ abstraction penalty
Pointer members in classes may alias other members:

    class Buf {
    public:
        void Clear() {
            for (int i = 0; i < numVals; i++)
                pBuf[i] = 0;
        }
    private:
        int numVals, *pBuf;
    };

numVals is not a local variable! It may be aliased by pBuf! The code is likely to refetch numVals each iteration!

C++ abstraction penalty
We know that aliasing won’t happen, and can manually solve the aliasing issue by writing code as:

    class Buf {
    public:
        void Clear() {
            for (int i = 0, n = numVals; i < n; i++)
                pBuf[i] = 0;
        }
    private:
        int numVals, *pBuf;
    };

C++ abstraction penalty
Since pBuf[i] can only alias numVals in the first iteration, a quality compiler can fix this problem by peeling the loop once, turning it into:

    void Clear() {
        if (numVals >= 1) {
            pBuf[0] = 0;
            for (int i = 1, n = numVals; i < n; i++)
                pBuf[i] = 0;
        }
    }

Q: Does your compiler do this optimization?!

Type-based alias analysis
Use language types to disambiguate memory references!
Type-based alias analysis
Compatibility of C/C++ types
What TBAA can do for you
It can turn this:

    // Possible aliasing between v[i] and *n
    void Foo(float *v, int *n) {
        for (int i = 0; i < *n; i++)
            v[i] += 1.0f;
    }

into this:

    // No aliasing possible so fetch *n once!
    void Foo(float *v, int *n) {
        int t = *n;
        for (int i = 0; i < t; i++)
            v[i] += 1.0f;
    }

What TBAA can also do

Illegal C/C++ code!

    uint32 i;
    float f;
    i = *((uint32 *)&f);

Required by standard:

    uint32 i;
    union {
        float f;
        uchar8 c[4];
    } u;
    u.f = f;
    i = (u.c[3]<<24L)+
        (u.c[2]<<16L)+
        ...;

Allowed by gcc:

    uint32 i;
    union {
        float f;
        uint32 i;
    } u;
    u.f = f;
    i = u.i;

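The slide offers two union-based alternatives to the illegal cast. A third option, not from the talk, is to copy the bytes with `std::memcpy`, which is well defined in both C and C++ and which optimizing compilers commonly reduce to a single register move. The sketch below assumes a 32-bit float; FloatBits is a name made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the object representation of f as a 32-bit unsigned integer
// without forming a uint32 pointer to a float object.
std::uint32_t FloatBits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "sketch assumes a 32-bit float");
    std::uint32_t i;
    std::memcpy(&i, &f, sizeof i);  // byte-wise copy, no type-based aliasing issue
    return i;
}

// Round-trips the bits back into a float the same way.
float BitsToFloat(std::uint32_t i) {
    float f;
    std::memcpy(&f, &i, sizeof f);
    assert(FloatBits(f) == i || f != f);  // NaN payloads may not round-trip
    return f;
}
```

On a platform using IEEE 754 single precision, FloatBits(1.0f) yields 0x3F800000.
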
Restrict-qualified pointers
Using the restrict keyword
Given this code:

    void Foo(float v[], float *c, int n) {
        for (int i = 0; i < n; i++)
            v[i] = *c + 1.0f;
    }

You really want the compiler to treat it as if written:

    void Foo(float v[], float *c, int n) {
        float tmp = *c + 1.0f;
        for (int i = 0; i < n; i++)
            v[i] = tmp;
    }

But because of possible aliasing it cannot!

Using the restrict keyword
For example, the code might be called as:

    float a[10];
    a[4] = 0.0f;
    Foo(a, &a[4], 10);

giving for the first version (which rereads *c after a[4] has been overwritten):

    v[] = 1, 1, 1, 1, 1, 2, 2, 2, 2, 2

and for the second version:

    v[] = 1, 1, 1, 1, 1, 1, 1, 1, 1, 1

The compiler must be conservative, and cannot perform the optimization!

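The two outcomes can be reproduced directly. In this sketch the two versions of Foo() are given distinct names (FooReload and FooHoisted, chosen here) so both can be called on the aliased input from the slide.

```cpp
#include <cassert>

// First version: *c is read on every iteration.
void FooReload(float v[], float *c, int n) {
    for (int i = 0; i < n; i++)
        v[i] = *c + 1.0f;
}

// Second version: *c is read once, before the loop.
void FooHoisted(float v[], float *c, int n) {
    float tmp = *c + 1.0f;
    for (int i = 0; i < n; i++)
        v[i] = tmp;
}

// Runs one of the versions on the slide's aliased call, Foo(a, &a[4], 10),
// and returns the value that ends up in the last element.
float LastElementAfterAliasedCall(void (*foo)(float[], float *, int)) {
    float a[10] = {};
    a[4] = 0.0f;
    foo(a, &a[4], 10);
    assert(a[0] == 1.0f);  // both versions agree before index 4 is written
    return a[9];
}
```

FooReload leaves 2 in the last element because the store to a[4] changes *c halfway through the loop, while FooHoisted leaves 1. Since the call is legal, a compiler that turned the first version into the second would change program behavior.
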
Solving the aliasing problem
The fix? Declaring the output as restrict:

    void Foo(float * restrict v, float *c, int n) {
        for (int i = 0; i < n; i++)
            v[i] = *c + 1.0f;
    }

‘const’ doesn’t help
Some might think this would work:

    void Foo(float v[], const float *c, int n) {
        for (int i = 0; i < n; i++)
            v[i] = *c + 1.0f;
    }

Since *c is const, v[i] cannot write to it, right?

SIMD + restrict = TRUE

Stores may alias loads. Must perform operations sequentially.

    void VecAdd(int *a, int *b, int *c) {
        for (int i = 0; i < 4; i++)
            a[i] = b[i] + c[i];
    }

Independent loads and stores. Operations can be performed in parallel!

    void VecAdd(int * restrict a, int *b, int *c) {
        for (int i = 0; i < 4; i++)
            a[i] = b[i] + c[i];
    }

Restrict-qualified pointers
Tips for avoiding aliasing
That’s it! – Resources 1/2
Resources 2/2
