
GPU Ray Tracing
with CUDA
BY TOM PITKIN

Bill Clark, PhD
Stu Steiner, MS, PhC
Objectives


Develop a sequential CPU and parallel GPU ray tracer



Illustrate the differences in design and rendering speed between a CPU and a GPU ray tracer
Outline


Introduction to Ray Tracing



CUDA



Parallelization with CUDA / Results



Future Work



Questions

What is Ray Tracing?


Rendering technique used in computer graphics



Simulates the behavior of light



Can produce advanced optical effects

Light in the Physical World

[Figure: a light source, an object with red reflectivity, a pinhole, and film]
The Virtual Camera Model


Eye Position – camera location in 3D space



Reference Point – point in 3D space where the camera is pointing



Orientation Vectors (u, v, n) – camera orientation in 3D space



Image Plane – projected plane of the camera’s field of view

[Figure: camera model showing the eye position, the reference point, and the orientation vectors u, v (up vector), and n]
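The orientation vectors can be derived from the eye position, the reference point, and an up vector. Below is a minimal C++ sketch. It assumes the common convention that n points from the reference point back toward the eye. The names `Vec3`, `CameraBasis`, and `makeCameraBasis` are hypothetical and are not taken from the thesis code.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) {
    double len = std::sqrt(dot(a, a));
    assert(len > 0.0);  // degenerate input, e.g. eye == reference point
    return {a.x / len, a.y / len, a.z / len};
}

struct CameraBasis { Vec3 u, v, n; };

// Assumed convention: n points from the reference point back toward the eye,
// u = up x n points to the camera's right, and v = n x u is the true up axis.
// The up vector must not be parallel to the viewing direction.
CameraBasis makeCameraBasis(Vec3 eye, Vec3 reference, Vec3 up) {
    Vec3 n = normalize(eye - reference);
    Vec3 u = normalize(cross(up, n));
    Vec3 v = cross(n, u);  // already unit length, since n and u are orthonormal
    return {u, v, n};
}
```

For example, an eye at (0, 0, 5) looking at the origin with up = (0, 1, 0) gives u = (1, 0, 0), v = (0, 1, 0), and n = (0, 0, 1).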
Ray Generation


Map the physical screen to the image plane



Divide the image plane into a uniform grid of pixel locations



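Send a ray through the center of each pixel location

$$\text{Pixel Width} = \frac{\text{Image Plane Width}}{\text{Screen Width}}, \qquad \text{Pixel Height} = \frac{\text{Image Plane Height}}{\text{Screen Height}}$$

[Figure: a ray from the eye position through one pixel of the image plane]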
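Combining these steps, the direction of the ray through pixel (col, row) follows from the pixel size on the image plane. The sketch below is illustrative only. It assumes three things: the image plane sits at a hypothetical distance `planeDist` along -n from the eye, row 0 is the top of the screen, and u, v, n form an orthonormal camera basis.

```cpp
#include <cassert>

struct Vec3 { double x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }

struct Ray { Vec3 origin, dir; };

// Hypothetical parameters: the image plane is planeWidth x planeHeight and is
// centered planeDist units in front of the eye along -n.
// Pixel (col, row) uses row 0 at the top of the screen. dir is not normalized.
Ray primaryRay(Vec3 eye, Vec3 u, Vec3 v, Vec3 n,
               double planeWidth, double planeHeight, double planeDist,
               int screenWidth, int screenHeight, int col, int row) {
    assert(col >= 0 && col < screenWidth && row >= 0 && row < screenHeight);
    double pixelW = planeWidth / screenWidth;    // Image Plane Width / Screen Width
    double pixelH = planeHeight / screenHeight;  // Image Plane Height / Screen Height
    // Offset of the pixel center from the middle of the image plane.
    double uc = -planeWidth / 2 + pixelW * (col + 0.5);
    double vc = planeHeight / 2 - pixelH * (row + 0.5);
    Vec3 dir = (-planeDist) * n + uc * u + vc * v;
    return {eye, dir};
}
```

Take a 2 x 2 screen mapped onto a 2 x 2 image plane one unit from the eye. For that setup, the top-left pixel's ray direction is (-0.5, 0.5, -1).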
Ray Intersection Testing


Ray – Sphere Intersection



Ray – Triangle Intersection
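For the sphere case, substitute the ray o + t·d into |p - c|² = r². This gives a quadratic in t. The minimal sketch below uses a hypothetical helper `intersectSphere`. It returns the nearest hit in front of the ray origin and reports a miss when the discriminant is negative.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

struct Vec3 { double x, y, z; };
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Nearest t > epsilon where origin + t*dir lies on the sphere, if any.
std::optional<double> intersectSphere(Vec3 origin, Vec3 dir, Vec3 center, double radius) {
    assert(radius > 0.0);
    const double kEpsilon = 1e-9;                 // ignore hits at or behind the origin
    Vec3 oc = origin - center;
    double a = dot(dir, dir);
    double b = 2.0 * dot(oc, dir);
    double c = dot(oc, oc) - radius * radius;
    double disc = b * b - 4.0 * a * c;
    if (disc < 0.0) return std::nullopt;          // the ray misses the sphere
    double root = std::sqrt(disc);
    double t = (-b - root) / (2.0 * a);           // nearer root first
    if (t < kEpsilon) t = (-b + root) / (2.0 * a); // origin is inside the sphere
    if (t < kEpsilon) return std::nullopt;        // sphere is entirely behind the ray
    return t;
}
```

A ray from (0, 0, 5) heading along -z hits a unit sphere at the origin at t = 4. The same ray shifted up to y = 5 misses.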

Phong Reflection Model

[Figure: Ambient + Diffuse + Specular = Phong Reflection]
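One common formulation computes each color channel as I = k_a·I_a + k_d·I_l·max(0, N·L) + k_s·I_l·max(0, R·V)^α, where R is L mirrored about N. The sketch below evaluates that sum for a single channel and a single light. The material fields and parameter names are illustrative assumptions, not the thesis's data layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical material coefficients for one color channel.
struct Material { double ka, kd, ks, shininess; };

// N: unit surface normal, L: unit vector toward the light, V: unit vector toward the eye.
double phong(const Material& m, double ambientLight, double lightIntensity,
             Vec3 N, Vec3 L, Vec3 V) {
    assert(m.shininess >= 0.0);
    double ambient = m.ka * ambientLight;            // indirect light, independent of geometry
    double nDotL = dot(N, L);
    if (nDotL <= 0.0) return ambient;                // light is behind the surface
    double diffuse = m.kd * lightIntensity * nDotL;  // scattered equally in all directions
    Vec3 R = (2.0 * nDotL) * N - L;                  // L mirrored about N
    double specular = m.ks * lightIntensity *
                      std::pow(std::max(0.0, dot(R, V)), m.shininess);
    return ambient + diffuse + specular;
}
```

With the light and the viewer both straight above the surface, all three terms reach their maximum. A light behind the surface leaves only the ambient term.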
Specular Reflection


Recursive Ray Tracing
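Specular reflection is what makes the tracer recursive. A hit on a reflective surface spawns a new ray in the mirror direction r = d - 2(d·n)n, and that ray's color is added back into the result. The sketch below rests on several assumptions: a hypothetical one-object scene (a reflective floor at y = 0), a fixed background value for rays that miss, and an arbitrary depth limit of 5.

```cpp
#include <cassert>
#include <optional>

struct Vec3 { double x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { Vec3 origin, dir; };
struct Hit { Vec3 point, normal; double localColor, reflectivity; };

// Mirror direction d about the unit normal n: r = d - 2 (d . n) n.
Vec3 reflect(Vec3 d, Vec3 n) { return d - (2.0 * dot(d, n)) * n; }

// Hypothetical scene: a single reflective floor plane at y = 0.
std::optional<Hit> intersectScene(const Ray& ray) {
    if (ray.dir.y >= 0.0) return std::nullopt;   // heading away from the floor
    double t = -ray.origin.y / ray.dir.y;
    if (t <= 1e-9) return std::nullopt;          // floor is behind the ray origin
    return Hit{ray.origin + t * ray.dir, {0.0, 1.0, 0.0}, 0.2, 0.5};
}

const double kBackground = 1.0;  // assumed value for rays that hit nothing
const int kMaxDepth = 5;         // assumed cap on the recursion depth

double trace(const Ray& ray, int depth) {
    std::optional<Hit> hit = intersectScene(ray);
    if (!hit) return kBackground;
    double color = hit->localColor;              // e.g. the Phong result at the hit point
    if (depth < kMaxDepth && hit->reflectivity > 0.0) {
        // Nudge the origin along the normal so the new ray does not re-hit the same surface.
        Ray bounced{hit->point + 1e-6 * hit->normal, reflect(ray.dir, hit->normal)};
        color += hit->reflectivity * trace(bounced, depth + 1);  // recursive call
    }
    return color;
}
```

A ray fired straight down hits the floor, bounces straight up, and escapes. Its color is therefore 0.2 + 0.5 × 1.0 = 0.7. Each reflective bounce adds one more level of recursion, which matters later on the GPU.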

What is CUDA?


Compute Unified Device Architecture (CUDA)



Parallel computing platform



Developed by Nvidia

Kernel Functions


Specifies the code to be executed in parallel



Single Program, Multiple Data (SPMD)
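The sketch below makes the SPMD idea concrete without a GPU by emulating a 1D kernel on the host in C++. One function runs for every (block, thread) pair, and only the computed global index changes which element it touches. The function names are hypothetical. The comments note which CUDA constructs (`__global__`, `blockIdx`, `blockDim`, `threadIdx`, `<<<...>>>`) each piece stands in for.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side emulation of a 1D CUDA kernel (hypothetical example, not the thesis
// code). In CUDA this function would be declared __global__, and the three index
// arguments would be the built-ins blockIdx.x, blockDim.x, and threadIdx.x.
void toByteKernel(int blockIdx, int blockDim, int threadIdx,
                  const std::vector<float>& in, std::vector<std::uint8_t>& out) {
    int i = blockIdx * blockDim + threadIdx;       // this thread's global index
    if (i >= static_cast<int>(in.size())) return;  // surplus threads in the last block exit
    float c = in[i] < 0.0f ? 0.0f : (in[i] > 1.0f ? 1.0f : in[i]);  // clamp to [0, 1]
    out[i] = static_cast<std::uint8_t>(c * 255.0f + 0.5f);        // round to 0..255
}

// Stand-in for a launch such as toByteKernel<<<numBlocks, blockDim>>>(...):
// every (block, thread) pair runs the same program on a different element.
void launch(int blockDim, const std::vector<float>& in, std::vector<std::uint8_t>& out) {
    assert(blockDim > 0 && out.size() == in.size());
    int n = static_cast<int>(in.size());
    int numBlocks = (n + blockDim - 1) / blockDim;  // round up so every element is covered
    for (int b = 0; b < numBlocks; ++b)
        for (int t = 0; t < blockDim; ++t)
            toByteKernel(b, blockDim, t, in, out);
}
```

On a real GPU the (block, thread) pairs run concurrently rather than in a loop. That only works because each thread writes to a different element.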

Kernel Execution


Grids



Blocks



Threads

Memory Model


Global Memory



Constant Memory



Texture Memory



Registers



Local Memory



Shared Memory

Thread Organization


2D array of blocks



2D array of threads



Each thread represents a ray

[Figure: the image plane divided into a grid of blocks, Block (0, 0) through Block (2, 1)]
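With one thread per ray, the grid size is the image size divided by the block size, rounded up. Each thread then recovers its pixel from its block and thread indices. Below is a host-side C++ sketch of that arithmetic. The 16 x 16 block shape in the usage example is an assumption, not a value given in the slides.

```cpp
#include <cassert>

struct Dim2 { int x, y; };

// Round-up division so a partial block still covers the image edge.
int divUp(int a, int b) { return (a + b - 1) / b; }

// Grid size for one thread (ray) per pixel, given a block shape.
// In CUDA these would be dim3 values passed as kernel<<<grid, block>>>(...).
Dim2 gridFor(int width, int height, Dim2 block) {
    assert(block.x > 0 && block.y > 0);
    return {divUp(width, block.x), divUp(height, block.y)};
}

// Pixel handled by one thread; mirrors blockIdx * blockDim + threadIdx in CUDA.
// Returns false for threads in edge blocks that fall outside the image.
bool pixelFor(Dim2 blockIdx, Dim2 threadIdx, Dim2 block, int width, int height,
              int& col, int& row) {
    col = blockIdx.x * block.x + threadIdx.x;
    row = blockIdx.y * block.y + threadIdx.y;
    return col < width && row < height;
}
```

For a 640 x 480 image with 16 x 16 blocks, this gives a 40 x 30 grid. When the width is not a multiple of the block width, the extra threads in the last column of blocks fail the bounds check and do nothing.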
Testing Environment

OS – Ubuntu GNOME Remix 13.04

CPU – Core i7-920
  Core Clock – 2.66 GHz

GPU – Nvidia GTX 570
  Core Clock – 742 MHz
  CUDA Cores – 480
  Memory Clock – 3800 MHz
  Video Memory – 1280 MB GDDR5
Test Objects

Model        Surfaces    Triangles
Teapot       1           992
Al           174         7,124
Crocodile    6           34,404
Single Kernel

[Chart: render time in milliseconds, logarithmic axis]

Model        Single Kernel       Single Thread
Teapot       160 (0.16 sec)      23,003 (23 sec)
Al           411 (0.41 sec)      55,260 (55.26 sec)
Crocodile    5,867 (5.87 sec)    1,617,160 (26.95 min)
Kernel Complexity and Size


Driver timeout



Register Spilling

Replacing Recursion


Iterative Loop



Layer-based stack




Layers store color values returned from rays

Final image from convex combination of layers
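Below is a rough sketch of the two ideas, under stated assumptions. A loop over depth replaces the recursive call, and each pass writes one layer of colors. The layers are then blended with non-negative weights that sum to 1. The `bounce` callback and the weights are hypothetical; the slides do not say how the thesis chose them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Layer = std::vector<float>;  // one color value per pixel

// Iterative replacement for recursion (sketch): pass d stores each pixel's color
// in layer d and updates that pixel's ray state in place for the next pass.
// `bounce` is a hypothetical callback with signature (State&) -> float.
template <typename State, typename BounceFn>
std::vector<Layer> traceLayers(std::vector<State> rays, int maxDepth, BounceFn bounce) {
    assert(maxDepth > 0);
    std::vector<Layer> layers(maxDepth, Layer(rays.size(), 0.0f));
    for (int depth = 0; depth < maxDepth; ++depth)      // loop replaces the recursive call
        for (std::size_t p = 0; p < rays.size(); ++p)
            layers[depth][p] = bounce(rays[p]);
    return layers;
}

// Final image as a convex combination of the layers: the weights are
// non-negative and sum to 1, so each pixel stays within its layers' range.
Layer combineLayers(const std::vector<Layer>& layers, const std::vector<float>& weights) {
    assert(!layers.empty() && layers.size() == weights.size());
    float sum = 0.0f;
    for (float w : weights) { assert(w >= 0.0f); sum += w; }
    assert(std::fabs(sum - 1.0f) < 1e-5f);
    Layer out(layers[0].size(), 0.0f);
    for (std::size_t k = 0; k < layers.size(); ++k)
        for (std::size_t p = 0; p < out.size(); ++p)
            out[p] += weights[k] * layers[k][p];
    return out;
}
```

The fixed-size layer array replaces the call stack, so no deep recursion is needed inside a kernel. In the usage example, a toy bounce that halves an energy value each pass yields layers 1, 0.5, and 0.25. Blending them with weights 0.5, 0.25, and 0.25 gives 0.6875.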

Multi-Kernel

[Chart: render time in milliseconds, logarithmic axis]

Model        Multi-Kernel          Single Kernel (Previous Kernel)
Teapot       381 (0.38 sec)        160 (0.16 sec)
Al           967 (0.97 sec)        411 (0.41 sec)
Crocodile    13,217 (13.22 sec)    5,867 (5.87 sec)
Multi-Kernel with Single-Precision Floating Points

[Chart: render time in milliseconds, logarithmic axis]

Model        Multi-Kernel with Single-Precision    Multi-Kernel (Previous Kernel)
Teapot       46 (0.05 sec)                         381 (0.38 sec)
Al           118 (0.12 sec)                        967 (0.97 sec)
Crocodile    1,556 (1.56 sec)                      13,217 (13.22 sec)
Caching Surface Data


Object’s surface data stored in shared memory



All threads in the same block have access to the cached surface data



Removes duplicate memory requests



Data reuse
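The benefit can be modeled on the host by counting global-memory reads for one block. In the sketch below, a local array stands in for a `__shared__` tile. The tile size of 64 triangles and all names are assumptions for illustration. On a real GPU the threads would load each tile cooperatively and synchronize with `__syncthreads()` before and after using it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Triangle { float v[9]; };  // three vertices, xyz each

constexpr std::size_t kTile = 64;  // assumed number of triangles per cached tile

struct BlockStats {
    long globalReads;                  // triangle fetches from "global memory"
    std::vector<long> testsPerThread;  // intersection tests each thread performed
};

// Uncached baseline: every thread fetches every triangle on its own.
long uncachedReads(int threadsPerBlock, std::size_t numTriangles) {
    return static_cast<long>(threadsPerBlock) * static_cast<long>(numTriangles);
}

// Cached version: the block copies each tile once, then all of its threads reuse it.
BlockStats runCachedBlock(int threadsPerBlock, const std::vector<Triangle>& mesh) {
    assert(threadsPerBlock > 0);
    BlockStats stats{0, std::vector<long>(static_cast<std::size_t>(threadsPerBlock), 0L)};
    Triangle tile[kTile];  // stands in for a __shared__ array
    for (std::size_t start = 0; start < mesh.size(); start += kTile) {
        std::size_t count = std::min(kTile, mesh.size() - start);
        for (std::size_t i = 0; i < count; ++i) {  // on a GPU: split across the threads
            tile[i] = mesh[start + i];
            ++stats.globalReads;
        }
        // CUDA: __syncthreads() here, so the tile is complete before it is read.
        for (int t = 0; t < threadsPerBlock; ++t)
            for (std::size_t i = 0; i < count; ++i)
                ++stats.testsPerThread[t];         // thread t would test its ray against tile[i]
        // CUDA: __syncthreads() again before the next tile overwrites this one.
    }
    return stats;
}
```

Take the teapot's 992 triangles and a block of 256 threads. Without the cache, that is 256 × 992 = 253,952 reads per block; with it, 992. Every thread still tests all 992 triangles.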

Multi-Kernel with Surface Caching

[Chart: render time in milliseconds, logarithmic axis]

Model        Multi-Kernel with Surface Caching    Previous Kernel (Single-Precision)
Teapot       30 (0.03 sec)                        46 (0.05 sec)
Al           133 (0.13 sec)                       118 (0.12 sec)
Crocodile    1,007 (1.01 sec)                     1,556 (1.56 sec)
Simplifying Mesh Data




Triangle data originally stored as three points (vertices)



Optimize data by storing triangles as one point (vertex) and two edges


Calculate edges on the host before the kernel call

[Figure: an example triangle with coordinates (0, 0), (1, 0), and (0.5, 1)]
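The stored edges are exactly what the Möller–Trumbore ray/triangle test consumes. Precomputing them on the host therefore removes two vector subtractions per ray-triangle pair. Below is a single-precision sketch using the hypothetical names `EdgeTriangle`, `toEdgeForm`, and `intersect`. The slides do not name the intersection algorithm the thesis used, so Möller–Trumbore is an assumption here.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

struct Vec3 { float x, y, z; };
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Triangle stored as one vertex and two edges, computed once on the host.
struct EdgeTriangle { Vec3 v0, e1, e2; };

EdgeTriangle toEdgeForm(Vec3 v0, Vec3 v1, Vec3 v2) { return {v0, v1 - v0, v2 - v0}; }

// Möller–Trumbore test using the precomputed edges.
// Returns the distance t along dir to the hit, or nothing on a miss.
std::optional<float> intersect(const EdgeTriangle& tri, Vec3 orig, Vec3 dir) {
    const float kEps = 1e-7f;
    Vec3 p = cross(dir, tri.e2);
    float det = dot(tri.e1, p);
    if (std::fabs(det) < kEps) return std::nullopt;  // ray parallel to the triangle plane
    float invDet = 1.0f / det;
    Vec3 s = orig - tri.v0;
    float u = dot(s, p) * invDet;                    // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return std::nullopt;
    Vec3 q = cross(s, tri.e1);
    float v = dot(dir, q) * invDet;                  // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return std::nullopt;
    float t = dot(tri.e2, q) * invDet;
    if (t <= kEps) return std::nullopt;              // triangle is behind the ray origin
    return t;
}
```

Place the example triangle (0, 0), (1, 0), (0.5, 1) at z = 0. A ray from (0.5, 0.5, 1) pointing along -z hits it at t = 1, with barycentric coordinates u = 0.25 and v = 0.5. Moving the origin to (0.9, 0.9, 1) misses.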
Multi-Kernel with Mesh Optimization

[Chart: render time in milliseconds, logarithmic axis]

Model        Multi-Kernel with Mesh Optimization    Previous Kernel (Surface Caching)
Teapot       27 (0.03 sec)                          30 (0.03 sec)
Al           127 (0.13 sec)                         133 (0.13 sec)
Crocodile    873 (0.87 sec)                         1,007 (1.01 sec)
Final Results

[Chart: render time in milliseconds, logarithmic axis]

Model        Multi-Kernel with Intersection Optimization    Single Thread
Teapot       27 (0.03 sec)                                  23,003 (23 sec)
Al           127 (0.13 sec)                                 55,260 (55.26 sec)
Crocodile    873 (0.87 sec)                                 1,617,160 (26.95 min)
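The overall speedups follow directly from the Final Results numbers above: divide the single-thread time by the final multi-kernel time. That gives roughly 852× for the teapot, 435× for Al, and 1852× for the crocodile. The last figure matches the 1852X in the speaker notes. Below is a small C++ check that uses only those numbers.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Render times in milliseconds, taken from the Final Results slide.
struct Timing {
    std::string model;
    double singleThreadMs;  // "Single Thread" series
    double multiKernelMs;   // "Multi-Kernel with Intersection Optimization" series
};

const std::vector<Timing> kFinalResults = {
    {"Teapot", 23003.0, 27.0},
    {"Al", 55260.0, 127.0},
    {"Crocodile", 1617160.0, 873.0},
};

// Speedup = single-thread time / multi-kernel time for the same model.
double speedup(const Timing& t) { return t.singleThreadMs / t.multiKernelMs; }
```

The largest model gains the most: 1,617,160 / 873 ≈ 1852.4.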
Future Work


Spatial partitioning



Multiple GPUs



Optimize code for different GPUs

Questions?


Computer Science Thesis Defense


Speaker Notes

  1. Used C++ and CUDA
  2. Forward Ray Tracing; Backward Ray Tracing
  3. Pixel – picture element that represents one point in an image; it consists of a single color.
  4. Don’t forget to mention what happens if a ray misses completely
  5. Ambient Light – indirect light reflected off other objects in the scene. Diffuse Light – direct light reflected off the surface in all directions. Specular Light – direct light reflected off the surface in a single direction.
  6. Each block and each thread has a unique identifier.
  7. Register Memory – 50x faster than Global Memory. L2 Cache – LRU (Least Recently Used). L1 Cache – spatial locality (quickly accesses memory near the current memory reference); caches the per-thread stack and other local data structures.
  8. Logarithmic scale; single pass, 640 x 480
  9. 1852× speedup! (Crocodile: 1,617,160 ms single thread vs. 873 ms)