Efficient Floating-Point Texture Decompression
Tomi Aarnio (NRC Tampere), Claudio Brunelli (NRC Tampere), Timo Viitanen (TUT)

Texturing pipeline in a GPU
  • Memory bandwidth is the worst bottleneck
  • Cache size is another
  • Texture compression can alleviate both!
  • Must be very fast: ~40 gigatexels/sec

The established solution
  • Nearly all existing schemes work the same way
      • Partition the image into blocks of 4 x 4 pixels
      • Compress each block independently
      • Use a fixed compression ratio (6:1)
  • Our focus is on high dynamic range (HDR) textures
      • RGB colors in 16-bit floating-point (FP16)
      • Compressed from 48 bits per pixel down to 8 bpp

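The figures on this slide can be checked with a few lines of arithmetic. The sketch below is an illustration only: the helper `block_offset` and its row-major block ordering are assumptions made here, not something the slides specify.

```python
# Fixed-rate block compression arithmetic for FP16 RGB textures.
BLOCK_W, BLOCK_H = 4, 4          # block size from the slide
CHANNELS = 3                     # R, G, B
BITS_PER_CHANNEL = 16            # FP16
COMPRESSED_BPP = 8               # target rate from the slide

RAW_BPP = CHANNELS * BITS_PER_CHANNEL                        # 48 bits per pixel
PIXELS_PER_BLOCK = BLOCK_W * BLOCK_H                         # 16 pixels
RAW_BLOCK_BITS = PIXELS_PER_BLOCK * RAW_BPP                  # 768 bits
COMPRESSED_BLOCK_BITS = PIXELS_PER_BLOCK * COMPRESSED_BPP    # 128 bits
RATIO = RAW_BLOCK_BITS / COMPRESSED_BLOCK_BITS               # 6.0


def block_offset(x, y, width_px):
    """Byte offset of the compressed block holding pixel (x, y).

    Assumes blocks are stored in row-major order (hypothetical layout).
    """
    blocks_per_row = (width_px + BLOCK_W - 1) // BLOCK_W
    index = (y // BLOCK_H) * blocks_per_row + (x // BLOCK_W)
    return index * (COMPRESSED_BLOCK_BITS // 8)
```

Because every block has the same compressed size, the address of any block follows from its coordinates alone, which is what allows a texture unit to fetch and decode blocks independently.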
FP16 texture compression
  1. Roimela et al. [SIGGRAPH 2006, I3D 2008]
  2. Munkberg et al. [SIGGRAPH 2006, CGF 2008]
  3. Sun et al. [Graphics Hardware 2008, IEEE TVCG 2010]
  4. BC6H/BPTC [DirectX 11, OpenGL 4]
  • Far too high complexity
  • Our contribution
      • Implemented and optimized #1 (a.k.a. "NXR")
      • Benchmarked against #4

Baseline decoder
[Block diagram. Blocks and signals that survive in the extraction: "Extract bitfields" producing R, B, L exponent and L mantissa; for each of the Red, Green and Blue outputs an int-to-fp16 converter and an fp16 multiplier; an fp16 normalizer fed by L exponent and L mantissa; a constant shown as "210" (likely 2^10) on the Green path. On the follow-up slide titled "Optimizations", two parts of the diagram are marked "Simplify this".]

Optimizations (Part 1)
  • Red and Blue are in 0.10-bit fixed point
      • Can be treated as fp16 denormals with no conversion logic
  • Simplify the multipliers (L*R and L*B)
      • Exponent can't increase: remove biasing and overflow logic
      • Mantissa will fit in 1.20 fixed point: remove overflow logic
      • At most 10 leading zeros: truncate post-normalizers
      • No need to deal with signs, infinities and NaNs

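The first point can be illustrated in software. In IEEE 754 half precision, a bit pattern whose sign and exponent fields are zero encodes fraction x 2^-24, so a 10-bit chroma value c dropped straight into the fraction field reads back as (c / 1024) x 2^-14. The sketch below checks this with Python's `struct` half-float format; the function names are made up for this example, and how the hardware accounts for the constant 2^-14 scale is not stated in the slides.

```python
import struct


def bits_to_fp16(bits):
    """Reinterpret a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def chroma_as_fraction(c):
    """Value of a 10-bit chroma code read as 0.10 fixed point."""
    return c / 1024.0


def chroma_as_denormal(c):
    """Same 10 bits placed in the fp16 fraction field, exponent field = 0."""
    return bits_to_fp16(c & 0x3FF)


# The two readings differ only by the constant factor 2**-14,
# so no int-to-float conversion logic is required.
SCALE = 2.0 ** -14
```

For example, `chroma_as_denormal(512)` and `chroma_as_fraction(512) * SCALE` are the same number.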
Optimized decoder
[Block diagram, built up over three slides. Legend: CLZ = Count Leading Zeros; << = Shift Left; 10 x 11-bit multiplier. "Extract bitfields" produces R, B, L exponent and L mantissa. Red path: multiplier, CLZ, <<, and "Clamp, Shift & Pack", with intermediate signals R exponent and R mantissa. Blue path: the same blocks, with B exponent and B mantissa. The Green path is not yet drawn.]

Optimizations (Part 2)
  • Eliminate the green channel multiplier
      • LG = L (1024 - (R + B)) = 1024L - (LR + LB)
      • Two 20-bit adders are much cheaper than a 10-bit multiplier
  • Round to zero instead of nearest
      • Introduces a maximum of 1-bit error
      • Compression error is much larger, 4-8 bits

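Both points on this slide are easy to check with integer arithmetic. The sketch below is a software illustration, not the authors' VHDL: it treats L as an integer significand and R, B as 10-bit integers, reuses the two products LR and LB to form the green term, and compares truncation against round-to-nearest. All function names are hypothetical.

```python
def green_product(l_sig, r, b):
    """L * (1024 - (r + b)) computed without a third multiplier.

    l_sig: integer luminance significand; r, b: 10-bit chroma codes.
    The products l_sig*r and l_sig*b are the ones the red and blue
    paths need anyway.
    """
    lr = l_sig * r
    lb = l_sig * b
    return (l_sig << 10) - (lr + lb)   # 1024*L is a shift, the rest is add/subtract


def truncate(p, drop):
    """Round toward zero by discarding the low `drop` bits."""
    return p >> drop


def round_nearest(p, drop):
    """Round to nearest (ties up) when discarding the low `drop` bits."""
    return (p + (1 << (drop - 1))) >> drop
```

Truncation never differs from round-to-nearest by more than one unit in the last kept bit, which matches the "maximum of 1-bit error" stated above.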
Optimized decoder
[Final block diagram. Legend: CLZ = Count Leading Zeros; << = Shift Left; 10 x 11-bit multiplier. "Extract bitfields" produces R, B, L exponent and L mantissa. The Red and Blue paths each contain a multiplier, CLZ, <<, and "Clamp, Shift & Pack", with exponent and mantissa signals per channel. The Green path has no multiplier: it shows a constant rendered as "220" (likely 2^20), L mantissa, then CLZ, <<, and "Clamp, Shift & Pack", producing G exponent and G mantissa.]

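The chain of blocks in the diagram (multiply, count leading zeros, shift left, clamp/shift/pack) can be modelled in software for a single channel. The following is a minimal sketch under stated assumptions, not the authors' design: it assumes luminance is a normal fp16 number given as a biased exponent and a 10-bit fraction, that chroma is a 10-bit 0.10 fixed-point code, and that results are truncated toward zero. The names `scale_channel` and `fp16_bits_to_float` are invented for this example, and the NXR bitfield layout is not modelled.

```python
import struct


def fp16_bits_to_float(bits):
    """Reinterpret a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def scale_channel(l_exp, l_frac, c):
    """Return fp16 bits of L * (c / 1024), rounded toward zero.

    l_exp:  biased exponent of L, 1..30 (normal numbers only, assumed)
    l_frac: 10-bit fraction of L
    c:      10-bit chroma code, 0..1023
    """
    sig = 1024 | l_frac              # 11-bit significand with the hidden one
    p = sig * c                      # 10 x 11-bit multiply, fits in 21 bits
    if p == 0:
        return 0
    z = 21 - p.bit_length()          # CLZ within a 21-bit field, 0..10
    n = p << z                       # shift left: bit 20 is now set
    e = l_exp - z                    # exponent can only decrease
    if e >= 1:
        # Normal result: drop the hidden one and the 10 low bits.
        return (e << 10) | ((n >> 10) & 0x3FF)
    # Result falls into the denormal range: shift right instead.
    return n >> (11 - e)
```

For instance, `scale_channel(15, 0, 512)` scales 1.0 by 0.5 and returns `0x3800`, the fp16 encoding of 0.5. Since the significand is at least 1024, any non-zero product has at least 11 significant bits, which is why the leading-zero count never exceeds 10.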
FPGA synthesis (Altera Stratix III)
[Results table not recovered in the extraction.]

ASIC synthesis @ 180 nm (Synopsys)
[Results table not recovered in the extraction.]
  • Only one of 14 modes. A complete decoder would be somewhat larger.
  • Relatively long critical path, due to leading-zero counters.

Summary
  • VHDL implementation of a floating-point texture decoder
  • Our optimizations reduced area by ~50%
  • Competing decoder turned out 75% larger
  • Main weakness: long critical path
  • Completely feasible to put on real hardware

Future work
  • Measure power consumption
      • More important than silicon area
  • Optimize the long latency
      • Can also help reduce area & power
  • Implement an encoder in ASIC
      • Textures are increasingly generated in real time


Editor's Notes

  1. The latest NVIDIA GeForce GTX 480 can fetch 42 billion texels per second, and the decoder must keep up with that.