Computer Science Thesis Defense

GPU Ray Tracing with CUDA
By Tom Pitkin

Bill Clark, PhD
Stu Steiner, MS, PhC
Objectives

Develop a sequential CPU ray tracer and a parallel GPU ray tracer
Illustrate the differences in design and rendering speed between the CPU and GPU ray tracers

Outline

Introduction to Ray Tracing
CUDA
Parallelization with CUDA / Results
Future Work
Questions

What is Ray Tracing?

A rendering technique used in computer graphics
Simulates the behavior of light by tracing rays through a scene
Can produce advanced optical effects such as reflections and shadows

Light in the Physical World

[Figure: pinhole camera diagram with a light source, an object with red reflectivity, a pinhole, and film]

The Virtual Camera Model

Eye Position – camera location in 3D space
Reference Point – point in 3D space where the camera is pointing
Orientation Vectors (u, v, n) – camera orientation in 3D space
Image Plane – the plane onto which the camera's field of view is projected
[Figure: eye position, reference point, and the orientation vectors u, v (up vector), and n]

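A minimal sketch of how the orientation vectors can be built from the eye position, the reference point, and an up vector. It assumes the common textbook u, v, n convention in which n points from the reference point back toward the eye, so the camera looks along -n; `Vec3`, `CameraBasis`, and `makeCameraBasis` are hypothetical names, not taken from the thesis code.

```cpp
#include <cassert>
#include <cmath>

// Minimal 3D vector type (hypothetical helper for this sketch).
struct Vec3 {
    double x, y, z;
};

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) {
    double len = std::sqrt(dot(a, a));
    assert(len > 0.0);  // a zero vector has no direction
    return {a.x / len, a.y / len, a.z / len};
}

// Orthonormal camera frame: u points right, v points up, n points backward.
struct CameraBasis {
    Vec3 u, v, n;
};

CameraBasis makeCameraBasis(Vec3 eye, Vec3 reference, Vec3 up) {
    CameraBasis b;
    b.n = normalize(eye - reference);  // from the reference point toward the eye
    b.u = normalize(cross(up, b.n));   // perpendicular to both up and n
    b.v = cross(b.n, b.u);             // corrected up; unit length already
    return b;
}
```

If the up vector is parallel to the viewing direction, `cross(up, n)` is the zero vector and no frame exists; the assertion in `normalize` catches that case.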
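Ray Generation

Map the physical screen to the image plane
Divide the image plane into a uniform grid of pixel locations
Send a ray from the eye position through the center of each pixel location
[Figure: rays from the eye position through pixel centers on the image plane]

$$\text{Pixel Width} = \frac{\text{Image Plane Width}}{\text{Screen Width}}, \qquad \text{Pixel Height} = \frac{\text{Image Plane Height}}{\text{Screen Height}}$$
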
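The pixel-to-image-plane mapping can be sketched as follows. The sketch assumes an orthonormal camera frame (u, v, n) with the camera looking along -n, an image plane at distance `nearDist` in front of the eye, and rows counted from the bottom; `View`, `Ray`, and `primaryRay` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }

struct Ray { Vec3 origin, dir; };

// Camera and screen description (hypothetical layout for this sketch).
struct View {
    Vec3 eye;              // eye position
    Vec3 u, v, n;          // orthonormal frame; the camera looks along -n
    double planeWidth;     // image plane size in world units
    double planeHeight;
    double nearDist;       // distance from the eye to the image plane
    int screenWidth;       // screen size in pixels
    int screenHeight;
};

// Primary ray through the center of pixel (col, row); row 0 is the bottom row.
Ray primaryRay(const View& view, int col, int row) {
    assert(col >= 0 && col < view.screenWidth);
    assert(row >= 0 && row < view.screenHeight);
    double pixelWidth = view.planeWidth / view.screenWidth;
    double pixelHeight = view.planeHeight / view.screenHeight;
    // Offsets of the pixel center from the middle of the image plane.
    double uc = -view.planeWidth / 2.0 + (col + 0.5) * pixelWidth;
    double vr = -view.planeHeight / 2.0 + (row + 0.5) * pixelHeight;
    Vec3 dir = (-view.nearDist) * view.n + uc * view.u + vr * view.v;  // not normalized
    return {view.eye, dir};
}
```

The `+ 0.5` term is what places each ray at the pixel center rather than at a pixel corner.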
Ray Intersection Testing

Ray–Sphere Intersection
Ray–Triangle Intersection

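A minimal ray–sphere test, assuming the standard quadratic formulation: substituting the ray P(t) = O + tD into |P − C|² = r² gives a quadratic in t, and the smallest positive root is the visible hit. The function name and the `eps` threshold are hypothetical; `std::optional` requires C++17.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

struct Vec3 { double x, y, z; };
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Smallest t > eps at which origin + t * dir hits the sphere, if any.
std::optional<double> intersectSphere(Vec3 origin, Vec3 dir, Vec3 center, double radius) {
    const double eps = 1e-9;              // ignore hits at the ray origin
    Vec3 oc = origin - center;
    double a = dot(dir, dir);
    assert(a > 0.0);                      // direction must be nonzero
    double b = 2.0 * dot(oc, dir);
    double c = dot(oc, oc) - radius * radius;
    double disc = b * b - 4.0 * a * c;
    if (disc < 0.0) return std::nullopt;  // ray misses the sphere
    double sq = std::sqrt(disc);
    double t0 = (-b - sq) / (2.0 * a);    // nearer root
    double t1 = (-b + sq) / (2.0 * a);    // farther root
    if (t0 > eps) return t0;
    if (t1 > eps) return t1;              // origin is inside the sphere
    return std::nullopt;                  // sphere is behind the ray
}
```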
Phong Reflection Model

Ambient + Diffuse + Specular = Phong Reflection
[Figure: renderings of the ambient, diffuse, and specular components and their sum]

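The sum on this slide can be written as one function per color channel. This is a sketch assuming the textbook Phong model for a single light, with hypothetical coefficient names `ka`, `kd`, `ks`, and `shininess`; the thesis's exact parameterization is not shown on the slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Material {
    double ka;         // ambient coefficient
    double kd;         // diffuse coefficient
    double ks;         // specular coefficient
    double shininess;  // specular exponent
};

// One color channel of the Phong model for a single light. normal, toLight,
// and toViewer must be unit vectors pointing away from the surface.
double phong(const Material& m, double ambientLight, double lightIntensity,
             Vec3 normal, Vec3 toLight, Vec3 toViewer) {
    assert(m.shininess > 0.0);
    double ambient = m.ka * ambientLight;
    double nDotL = dot(normal, toLight);
    if (nDotL <= 0.0) return ambient;            // light is behind the surface
    double diffuse = m.kd * lightIntensity * nDotL;
    Vec3 r = (2.0 * nDotL) * normal - toLight;   // light direction mirrored about the normal
    double rDotV = std::max(0.0, dot(r, toViewer));
    double specular = m.ks * lightIntensity * std::pow(rDotV, m.shininess);
    return ambient + diffuse + specular;
}
```

With several lights, the diffuse and specular terms are summed over the lights while the ambient term is added once.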
Specular Reflection

Recursive Ray Tracing

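Recursive ray tracing can be sketched as follows: at each hit the ray is mirrored about the surface normal, R = D − 2(D·N)N, and the color returned by the reflected ray is blended into the local color until a depth limit is reached. To keep the example self-contained, the scene is replaced by a hypothetical precomputed list of hits (`Bounce`), and the blend `(1 - k) * local + k * reflected` is an assumption, not necessarily the formula used in the thesis.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mirror direction d about the unit surface normal n: r = d - 2 (d . n) n.
Vec3 reflect(Vec3 d, Vec3 n) { return d - (2.0 * dot(d, n)) * n; }

// One hit along a ray path; a hypothetical stand-in for a real scene query.
struct Bounce {
    double localColor;    // shaded color at the hit (one channel)
    double reflectivity;  // weight of the reflected ray's color, in [0, 1]
};

// Blends each hit's local color with the color returned by its reflected ray.
// Rays past the end of the path, or beyond maxDepth, return a black background.
double traceRecursive(const std::vector<Bounce>& path, std::size_t depth, std::size_t maxDepth) {
    if (depth >= path.size() || depth >= maxDepth) return 0.0;
    const Bounce& b = path[depth];
    assert(b.reflectivity >= 0.0 && b.reflectivity <= 1.0);
    double reflected = traceRecursive(path, depth + 1, maxDepth);
    return (1.0 - b.reflectivity) * b.localColor + b.reflectivity * reflected;
}
```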
What is CUDA?

Compute Unified Device Architecture (CUDA)
Parallel computing platform and programming model
Developed by Nvidia

Kernel Functions

A kernel function specifies the code that each GPU thread executes in parallel
Single Program, Multiple Data (SPMD): every thread runs the same kernel on different data

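Kernel Execution

Grids – a kernel launch runs as a grid of thread blocks
Blocks – groups of threads that can share memory and synchronize
Threads – each thread executes the kernel once
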
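To make the grid/block/thread hierarchy concrete without a GPU, the sketch below emulates a one-dimensional kernel launch on the CPU. The same kernel body (SPMD) runs once per thread, and each invocation computes its global index as `blockIdx * blockDim + threadIdx`, mirroring CUDA's built-in variables. The `launch` and `scaleAll` helpers are hypothetical; on a real GPU the body would be a `__global__` function and the iterations would run in parallel.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// CPU stand-in for a 1D launch kernel<<<gridDim, blockDim>>>(...): the same body
// is invoked once per (block, thread) pair, sequentially instead of in parallel.
void launch(int gridDim, int blockDim, const std::function<void(int, int, int)>& kernel) {
    for (int blockIdx = 0; blockIdx < gridDim; ++blockIdx)
        for (int threadIdx = 0; threadIdx < blockDim; ++threadIdx)
            kernel(blockIdx, blockDim, threadIdx);
}

// SPMD example: every thread scales the one element selected by its global index.
void scaleAll(std::vector<float>& data, float factor, int blockDim) {
    assert(blockDim > 0);
    int n = static_cast<int>(data.size());
    int gridDim = (n + blockDim - 1) / blockDim;  // enough blocks to cover all n elements
    launch(gridDim, blockDim, [&](int blockIdx, int bDim, int threadIdx) {
        int i = blockIdx * bDim + threadIdx;       // global thread index
        if (i < n) data[i] *= factor;              // surplus threads in the last block idle
    });
}
```

The bounds check matters because the grid is rounded up to whole blocks, so the last block may contain threads with no element to process.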
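Memory Model

Global Memory – readable and writable by all threads; large, off-chip, high latency
Constant Memory – read-only during a kernel; cached
Texture Memory – read-only; cached and optimized for spatial locality
Registers – private to each thread; fastest
Local Memory – private to each thread but stored off-chip; holds values spilled from registers
Shared Memory – on-chip memory shared by the threads of one block
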
Thread Organization

2D array of blocks
2D array of threads within each block
Each thread represents one ray (one pixel)
[Figure: the image plane divided into a 3 × 2 grid of blocks, Block (0, 0) through Block (2, 1)]

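The 2D organization maps each thread to one pixel. Below is a sketch of the index arithmetic, assuming a grid sized with ceiling division so that partial blocks cover the image edges; `Dim2`, `gridFor`, and `pixelFor` are hypothetical names, and in CUDA the inputs would come from the built-in `blockIdx`, `blockDim`, and `threadIdx` variables.

```cpp
#include <cassert>

// Two-component size/index, mirroring the x and y fields of CUDA's dim3.
struct Dim2 { int x, y; };

// Blocks needed to cover a width x height image (ceiling division per axis).
Dim2 gridFor(int width, int height, Dim2 block) {
    assert(block.x > 0 && block.y > 0);
    return {(width + block.x - 1) / block.x, (height + block.y - 1) / block.y};
}

// Pixel (col, row) handled by a thread; returns false for threads beyond the
// image edge, which must skip their work.
bool pixelFor(Dim2 blockIdx, Dim2 blockDim, Dim2 threadIdx,
              int width, int height, int& col, int& row) {
    col = blockIdx.x * blockDim.x + threadIdx.x;
    row = blockIdx.y * blockDim.y + threadIdx.y;
    return col < width && row < height;
}
```

For example, a 100 × 50 image with 16 × 16 blocks needs a 7 × 4 grid, and the threads of the rightmost and topmost blocks that fall outside the image do nothing.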
Testing Environment

OS – Ubuntu Gnome Remix 13.04
CPU – Core i7-920
  Core Clock – 2.66 GHz
GPU – Nvidia GTX 570
  Core Clock – 742 MHz
  CUDA Cores – 480
  Memory Clock – 3800 MHz
  Video Memory – 1280 MB GDDR5

Test Objects

Teapot – 1 surface, 992 triangles
Al – 174 surfaces, 7,124 triangles
Crocodile – 6 surfaces, 34,404 triangles

Single Kernel

Render times in milliseconds (the original bar chart uses a logarithmic axis):

Model       Single Kernel                 Single Thread
Teapot      160 ms (0.16 s)               23,003 ms (23 s)
Al          411 ms (0.41 s)               55,260 ms (55.26 s)
Crocodile   5,867 ms (5.87 s)             1,617,160 ms (26.95 min)

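Kernel Complexity and Size

Driver timeout – long-running kernels can be terminated by the display driver's watchdog timer
Register spilling – a large kernel needs more registers than are available per thread, so values spill to slower local memory
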
Replacing Recursion

Iterative loop in place of recursive calls
Layer-based stack
  Layers store the color values returned from rays
  The final image is a convex combination of the layers

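A sketch of replacing the recursion with a loop and a fixed-size layer stack. It assumes a hypothetical precomputed list of hits standing in for scene queries, one layer per hit holding that hit's local color and reflectivity, and a back-to-front convex blend `(1 - k) * local + k * next`; the depth limit `kMaxDepth` is an illustrative value. A recursive reference version is included so the two can be compared.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Bounce { double localColor; double reflectivity; };

constexpr std::size_t kMaxDepth = 8;  // fixed stack size, chosen for illustration

// Recursive reference version.
double traceRecursive(const std::vector<Bounce>& path, std::size_t depth) {
    if (depth >= path.size() || depth >= kMaxDepth) return 0.0;
    const Bounce& b = path[depth];
    return (1.0 - b.reflectivity) * b.localColor + b.reflectivity * traceRecursive(path, depth + 1);
}

// Iterative version: the loop follows the ray and pushes one layer per hit,
// then the layers are combined from the deepest one back to the first.
double traceIterative(const std::vector<Bounce>& path) {
    std::array<Bounce, kMaxDepth> layers{};
    std::size_t count = 0;
    while (count < path.size() && count < kMaxDepth) {
        layers[count] = path[count];  // a real tracer would intersect, shade, and reflect here
        ++count;
    }
    double color = 0.0;               // black background beyond the last layer
    for (std::size_t i = count; i-- > 0;) {
        const Bounce& b = layers[i];
        assert(b.reflectivity >= 0.0 && b.reflectivity <= 1.0);
        color = (1.0 - b.reflectivity) * b.localColor + b.reflectivity * color;
    }
    return color;
}
```

Because the blend is linear, the same result could also be accumulated front to back with a running weight and no stack; the layered form is shown because it matches the slide.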
Multi-Kernel

Render times in milliseconds:

Model       Multi-Kernel                  Single Kernel (previous)
Teapot      381 ms (0.38 s)               160 ms (0.16 s)
Al          967 ms (0.97 s)               411 ms (0.41 s)
Crocodile   13,217 ms (13.22 s)           5,867 ms (5.87 s)

Multi-Kernel with Single-Precision Floating Point

Render times in milliseconds:

Model       Single Precision              Multi-Kernel (previous)
Teapot      46 ms (0.05 s)                381 ms (0.38 s)
Al          118 ms (0.12 s)               967 ms (0.97 s)
Crocodile   1,556 ms (1.56 s)             13,217 ms (13.22 s)

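Switching to single precision only helps if every operation actually stays in `float`. A common pitfall is an unsuffixed literal such as `0.5`, which is a `double` in C++ (and in CUDA C++) and silently promotes the whole expression. The sketch below checks the resulting types at compile time; the function names are illustrative, and `std::is_same_v` requires C++17.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// An unsuffixed literal such as 0.5 is a double, so a * 0.5 is computed in double.
inline auto halfPromoted(float a) { return a * 0.5; }
// The f suffix keeps the whole expression in single precision.
inline auto halfSingle(float a) { return a * 0.5f; }

// std::sqrt has a float overload in <cmath>, so this result stays a float.
inline auto rootSingle(float x) { return std::sqrt(x); }

// Compile-time checks of the resulting types.
static_assert(std::is_same_v<decltype(halfPromoted(1.0f)), double>, "promoted to double");
static_assert(std::is_same_v<decltype(halfSingle(1.0f)), float>, "stays in float");
static_assert(std::is_same_v<decltype(rootSingle(1.0f)), float>, "float overload");
```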
Caching Surface Data

An object's surface data is cached in shared memory
All threads in the same block can access the cached surface data
Removes duplicate memory requests
Enables data reuse

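The effect of caching can be illustrated by counting memory requests. The CPU sketch below models one thread block: instead of every thread reading every triangle from global memory, the block loads a tile of triangles into a shared buffer once and all threads reuse it. The `Triangle` placeholder, tile size, and counters are hypothetical; in CUDA the buffer would be a `__shared__` array and a `__syncthreads()` call would separate loading from use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder triangle record (three vertices, nine floats).
struct Triangle { float coords[9]; };

// Global-memory reads for one block when every thread fetches every triangle itself.
std::size_t uncachedReads(std::size_t triangles, std::size_t threadsPerBlock) {
    return triangles * threadsPerBlock;
}

// Models one block that caches tiles of triangles in a shared buffer.
// Returns the number of global-memory reads; `tests` receives the number of
// ray-triangle tests the block's threads would perform against the cache.
std::size_t cachedReads(const std::vector<Triangle>& global, std::size_t threadsPerBlock,
                        std::size_t tileSize, std::size_t& tests) {
    assert(tileSize > 0);
    std::vector<Triangle> shared(tileSize);  // stands in for a __shared__ array
    std::size_t reads = 0;
    tests = 0;
    for (std::size_t start = 0; start < global.size(); start += tileSize) {
        std::size_t count = std::min(tileSize, global.size() - start);
        // Cooperative load: each triangle in the tile is read from global memory once.
        for (std::size_t k = 0; k < count; ++k) {
            shared[k] = global[start + k];
            ++reads;
        }
        // On a GPU, __syncthreads() would go here; every thread then reuses the tile.
        tests += threadsPerBlock * count;
    }
    return reads;
}
```

With 100 triangles and 64 threads per block, the uncached version issues 6,400 global reads per block while the cached version issues 100, yet the same 6,400 intersection tests are performed.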
Multi-Kernel with Surface Caching

Render times in milliseconds:

Model       Surface Caching               Single Precision (previous)
Teapot      30 ms (0.03 s)                46 ms (0.05 s)
Al          133 ms (0.13 s)               118 ms (0.12 s)
Crocodile   1,007 ms (1.01 s)             1,556 ms (1.56 s)

Simplifying Mesh Data

Triangle data was originally stored as three points (vertices)
Optimized by storing each triangle as one point (vertex) and two edges
Edges are calculated on the host before the kernel call
[Figure: example triangle with points labeled (0, 0), (1, 0), and (0.5, 1)]

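Storing one vertex and two edges fits ray–triangle tests such as the Möller–Trumbore algorithm, which works directly with e1 = v1 − v0 and e2 = v2 − v0. The sketch below precomputes the edges once (as on the host) and uses them in the test. The slides do not name the intersection algorithm, so Möller–Trumbore is an assumption here, and `EdgeTriangle`, `toEdgeForm`, and `intersect` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

struct Vec3 { double x, y, z; };
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Compact triangle: one vertex plus two edges, computed once before rendering.
struct EdgeTriangle { Vec3 v0, e1, e2; };

EdgeTriangle toEdgeForm(Vec3 v0, Vec3 v1, Vec3 v2) { return {v0, v1 - v0, v2 - v0}; }

// Möller–Trumbore test using the precomputed edges; returns the hit distance t.
std::optional<double> intersect(const EdgeTriangle& tri, Vec3 orig, Vec3 dir) {
    const double eps = 1e-9;
    Vec3 p = cross(dir, tri.e2);
    double det = dot(tri.e1, p);
    if (std::abs(det) < eps) return std::nullopt;  // ray parallel to the triangle
    double invDet = 1.0 / det;
    Vec3 s = orig - tri.v0;
    double u = dot(s, p) * invDet;                 // first barycentric coordinate
    if (u < 0.0 || u > 1.0) return std::nullopt;
    Vec3 q = cross(s, tri.e1);
    double v = dot(dir, q) * invDet;               // second barycentric coordinate
    if (v < 0.0 || u + v > 1.0) return std::nullopt;
    double t = dot(tri.e2, q) * invDet;
    assert(std::isfinite(t));
    if (t <= eps) return std::nullopt;             // hit is behind the ray origin
    return t;
}
```

The compact form holds the same nine numbers as three vertices; the saving is that the two edge subtractions are no longer repeated for every ray.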
Multi-Kernel with Mesh Optimization

Render times in milliseconds:

Model       Mesh Optimization             Surface Caching (previous)
Teapot      27 ms (0.03 s)                30 ms (0.03 s)
Al          127 ms (0.13 s)               133 ms (0.13 s)
Crocodile   873 ms (0.87 s)               1,007 ms (1.01 s)

Final Results

Multi-Kernel with Intersection Optimization vs. Single Thread, render times in milliseconds:

Model       Multi-Kernel                  Single Thread
Teapot      27 ms (0.03 s)                23,003 ms (23 s)
Al          127 ms (0.13 s)               55,260 ms (55.26 s)
Crocodile   873 ms (0.87 s)               1,617,160 ms (26.95 min)

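The overall speedups implied by the final results can be computed directly from the reported times, as single-thread time divided by multi-kernel time. The helper below is only arithmetic on those measured numbers; `Result`, `kFinal`, and `speedup` are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <string>

struct Result {
    std::string model;
    double singleThreadMs;  // sequential CPU time
    double multiKernelMs;   // final GPU time
};

// Final render times from the results chart, in milliseconds.
const std::array<Result, 3> kFinal{{
    {"Teapot", 23003.0, 27.0},
    {"Al", 55260.0, 127.0},
    {"Crocodile", 1617160.0, 873.0},
}};

// Speedup of the GPU version over the single-threaded CPU version.
double speedup(const Result& r) {
    assert(r.multiKernelMs > 0.0);
    return r.singleThreadMs / r.multiKernelMs;
}
```

This gives roughly 852× for the teapot, 435× for Al, and 1,852× for the crocodile.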
Future Work

Spatial partitioning
Multiple GPUs
Optimize code for different GPUs

Questions?

Mais conteúdo relacionado

Mais procurados

introduction to data mining tutorial
introduction to data mining tutorial introduction to data mining tutorial
introduction to data mining tutorial Salah Amean
 
Introduction to Artificial Neural Networks (ANNs) - Step-by-Step Training & T...
Introduction to Artificial Neural Networks (ANNs) - Step-by-Step Training & T...Introduction to Artificial Neural Networks (ANNs) - Step-by-Step Training & T...
Introduction to Artificial Neural Networks (ANNs) - Step-by-Step Training & T...Ahmed Gad
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data VisualizationRaffael Marty
 
Introduction to Aneka, Aneka Model is explained
Introduction to Aneka, Aneka Model is explainedIntroduction to Aneka, Aneka Model is explained
Introduction to Aneka, Aneka Model is explainedDr Neelesh Jain
 
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Simplilearn
 
Knowledge representation and Predicate logic
Knowledge representation and Predicate logicKnowledge representation and Predicate logic
Knowledge representation and Predicate logicAmey Kerkar
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless DatabasesDan Gunter
 
Svm and kernel machines
Svm and kernel machinesSvm and kernel machines
Svm and kernel machinesNawal Sharma
 
First Order Logic resolution
First Order Logic resolutionFirst Order Logic resolution
First Order Logic resolutionAmar Jukuntla
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsDatamining Tools
 
Feature selection concepts and methods
Feature selection concepts and methodsFeature selection concepts and methods
Feature selection concepts and methodsReza Ramezani
 
Introduction to Distributed System
Introduction to Distributed SystemIntroduction to Distributed System
Introduction to Distributed SystemSunita Sahu
 
Information Retrieval Models
Information Retrieval ModelsInformation Retrieval Models
Information Retrieval ModelsNisha Arankandath
 
IRJET- Blockchain based Fake Product Identification in Supply Chain
IRJET- Blockchain based Fake Product Identification in Supply ChainIRJET- Blockchain based Fake Product Identification in Supply Chain
IRJET- Blockchain based Fake Product Identification in Supply ChainIRJET Journal
 

Mais procurados (20)

Human Action Recognition
Human Action RecognitionHuman Action Recognition
Human Action Recognition
 
Cloud security ppt
Cloud security pptCloud security ppt
Cloud security ppt
 
introduction to data mining tutorial
introduction to data mining tutorial introduction to data mining tutorial
introduction to data mining tutorial
 
Introduction to Artificial Neural Networks (ANNs) - Step-by-Step Training & T...
Introduction to Artificial Neural Networks (ANNs) - Step-by-Step Training & T...Introduction to Artificial Neural Networks (ANNs) - Step-by-Step Training & T...
Introduction to Artificial Neural Networks (ANNs) - Step-by-Step Training & T...
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data Visualization
 
Fuzzy c means manual work
Fuzzy c means manual workFuzzy c means manual work
Fuzzy c means manual work
 
18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
 
Introduction to Aneka, Aneka Model is explained
Introduction to Aneka, Aneka Model is explainedIntroduction to Aneka, Aneka Model is explained
Introduction to Aneka, Aneka Model is explained
 
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
 
Knowledge representation and Predicate logic
Knowledge representation and Predicate logicKnowledge representation and Predicate logic
Knowledge representation and Predicate logic
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
Svm and kernel machines
Svm and kernel machinesSvm and kernel machines
Svm and kernel machines
 
First Order Logic resolution
First Order Logic resolutionFirst Order Logic resolution
First Order Logic resolution
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Feature selection concepts and methods
Feature selection concepts and methodsFeature selection concepts and methods
Feature selection concepts and methods
 
Introduction to Distributed System
Introduction to Distributed SystemIntroduction to Distributed System
Introduction to Distributed System
 
Information Retrieval Models
Information Retrieval ModelsInformation Retrieval Models
Information Retrieval Models
 
Cs6703 grid and cloud computing unit 1
Cs6703 grid and cloud computing unit 1Cs6703 grid and cloud computing unit 1
Cs6703 grid and cloud computing unit 1
 
Session hijacking
Session hijackingSession hijacking
Session hijacking
 
IRJET- Blockchain based Fake Product Identification in Supply Chain
IRJET- Blockchain based Fake Product Identification in Supply ChainIRJET- Blockchain based Fake Product Identification in Supply Chain
IRJET- Blockchain based Fake Product Identification in Supply Chain
 

Destaque

Hardening Your Config Management - Security and Attack Vectors in Config Mana...
Hardening Your Config Management - Security and Attack Vectors in Config Mana...Hardening Your Config Management - Security and Attack Vectors in Config Mana...
Hardening Your Config Management - Security and Attack Vectors in Config Mana...Peter Souter
 
Cyber Security - IDS/IPS is not enough
Cyber Security - IDS/IPS is not enoughCyber Security - IDS/IPS is not enough
Cyber Security - IDS/IPS is not enoughSavvius, Inc
 
Computer Security and Intrusion Detection(IDS/IPS)
Computer Security and Intrusion Detection(IDS/IPS)Computer Security and Intrusion Detection(IDS/IPS)
Computer Security and Intrusion Detection(IDS/IPS)LJ PROJECTS
 
My Thesis Defense Presentation
My Thesis Defense PresentationMy Thesis Defense Presentation
My Thesis Defense PresentationOnur Taylan
 
Powerpoint presentation M.A. Thesis Defence
Powerpoint presentation M.A. Thesis DefencePowerpoint presentation M.A. Thesis Defence
Powerpoint presentation M.A. Thesis DefenceCatie Chase
 
How to Defend your Thesis Proposal like a Professional
How to Defend your Thesis Proposal like a ProfessionalHow to Defend your Thesis Proposal like a Professional
How to Defend your Thesis Proposal like a ProfessionalMiriam College
 

Destaque (8)

Thesis presentation
Thesis presentationThesis presentation
Thesis presentation
 
Hardening Your Config Management - Security and Attack Vectors in Config Mana...
Hardening Your Config Management - Security and Attack Vectors in Config Mana...Hardening Your Config Management - Security and Attack Vectors in Config Mana...
Hardening Your Config Management - Security and Attack Vectors in Config Mana...
 
Cyber Security - IDS/IPS is not enough
Cyber Security - IDS/IPS is not enoughCyber Security - IDS/IPS is not enough
Cyber Security - IDS/IPS is not enough
 
Computer Security and Intrusion Detection(IDS/IPS)
Computer Security and Intrusion Detection(IDS/IPS)Computer Security and Intrusion Detection(IDS/IPS)
Computer Security and Intrusion Detection(IDS/IPS)
 
IDS and IPS
IDS and IPSIDS and IPS
IDS and IPS
 
My Thesis Defense Presentation
My Thesis Defense PresentationMy Thesis Defense Presentation
My Thesis Defense Presentation
 
Powerpoint presentation M.A. Thesis Defence
Powerpoint presentation M.A. Thesis DefencePowerpoint presentation M.A. Thesis Defence
Powerpoint presentation M.A. Thesis Defence
 
How to Defend your Thesis Proposal like a Professional
How to Defend your Thesis Proposal like a ProfessionalHow to Defend your Thesis Proposal like a Professional
How to Defend your Thesis Proposal like a Professional
 

Semelhante a Computer Science Thesis Defense

The technology behind_the_elemental_demo_16x9-1248544805
The technology behind_the_elemental_demo_16x9-1248544805The technology behind_the_elemental_demo_16x9-1248544805
The technology behind_the_elemental_demo_16x9-1248544805mistercteam
 
Computer graphics
Computer graphicsComputer graphics
Computer graphicsMohsin Azam
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Graham Wihlidal
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...Johan Andersson
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologyTiago Sousa
 
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUs
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUsPerformance Evaluation of SAR Image Reconstruction on CPUs and GPUs
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUsFisnik Kraja
 
Octnews featured article
Octnews featured articleOctnews featured article
Octnews featured articleKangZhang
 
高解析度面板瑕疵檢測
高解析度面板瑕疵檢測高解析度面板瑕疵檢測
高解析度面板瑕疵檢測CHENHuiMei
 
Multi-core GPU – Fast parallel SAR image generation
Multi-core GPU – Fast parallel SAR image generationMulti-core GPU – Fast parallel SAR image generation
Multi-core GPU – Fast parallel SAR image generationMahesh Khadatare
 
Sparse coding Super-Resolution を用いた核医学画像処理
Sparse coding Super-Resolution を用いた核医学画像処理Sparse coding Super-Resolution を用いた核医学画像処理
Sparse coding Super-Resolution を用いた核医学画像処理Yutaka KATAYAMA
 
Build Your Own 3D Scanner: 3D Scanning with Swept-Planes
Build Your Own 3D Scanner: 3D Scanning with Swept-PlanesBuild Your Own 3D Scanner: 3D Scanning with Swept-Planes
Build Your Own 3D Scanner: 3D Scanning with Swept-PlanesDouglas Lanman
 
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...Fisnik Kraja
 
Rendering Techniques in Virtual Reality.pdf
Rendering Techniques in Virtual Reality.pdfRendering Techniques in Virtual Reality.pdf
Rendering Techniques in Virtual Reality.pdfaditya800563
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And EffectsThomas Goddard
 
CUDA by Example : Constant Memory and Events : Notes
CUDA by Example : Constant Memory and Events : NotesCUDA by Example : Constant Memory and Events : Notes
CUDA by Example : Constant Memory and Events : NotesSubhajit Sahu
 
Advanced Game Development with the Mobile 3D Graphics API
Advanced Game Development with the Mobile 3D Graphics APIAdvanced Game Development with the Mobile 3D Graphics API
Advanced Game Development with the Mobile 3D Graphics APITomi Aarnio
 

Semelhante a Computer Science Thesis Defense (20)

The technology behind_the_elemental_demo_16x9-1248544805
The technology behind_the_elemental_demo_16x9-1248544805The technology behind_the_elemental_demo_16x9-1248544805
The technology behind_the_elemental_demo_16x9-1248544805
 
Computer graphics
Computer graphicsComputer graphics
Computer graphics
 
Computer Graphics - Introduction and CRT Devices
Computer Graphics - Introduction and CRT DevicesComputer Graphics - Introduction and CRT Devices
Computer Graphics - Introduction and CRT Devices
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUs
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUsPerformance Evaluation of SAR Image Reconstruction on CPUs and GPUs
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUs
 
Octnews featured article
Octnews featured articleOctnews featured article
Octnews featured article
 
2.Hardware.ppt
2.Hardware.ppt2.Hardware.ppt
2.Hardware.ppt
 
高解析度面板瑕疵檢測
高解析度面板瑕疵檢測高解析度面板瑕疵檢測
高解析度面板瑕疵檢測
 
Multi-core GPU – Fast parallel SAR image generation
Multi-core GPU – Fast parallel SAR image generationMulti-core GPU – Fast parallel SAR image generation
Multi-core GPU – Fast parallel SAR image generation
 
Sparse coding Super-Resolution を用いた核医学画像処理
Sparse coding Super-Resolution を用いた核医学画像処理Sparse coding Super-Resolution を用いた核医学画像処理
Sparse coding Super-Resolution を用いた核医学画像処理
 
Build Your Own 3D Scanner: 3D Scanning with Swept-Planes
Build Your Own 3D Scanner: 3D Scanning with Swept-PlanesBuild Your Own 3D Scanner: 3D Scanning with Swept-Planes
Build Your Own 3D Scanner: 3D Scanning with Swept-Planes
 
thesis
thesisthesis
thesis
 
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
 
Rendering Techniques in Virtual Reality.pdf
Rendering Techniques in Virtual Reality.pdfRendering Techniques in Virtual Reality.pdf
Rendering Techniques in Virtual Reality.pdf
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And Effects
 
CUDA by Example : Constant Memory and Events : Notes
CUDA by Example : Constant Memory and Events : NotesCUDA by Example : Constant Memory and Events : Notes
CUDA by Example : Constant Memory and Events : Notes
 
Svr Raskar
Svr RaskarSvr Raskar
Svr Raskar
 
Advanced Game Development with the Mobile 3D Graphics API
Advanced Game Development with the Mobile 3D Graphics APIAdvanced Game Development with the Mobile 3D Graphics API
Advanced Game Development with the Mobile 3D Graphics API
 

Último

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 

Último (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

Computer Science Thesis Defense

  • 1. GPU Ray Tracing with CUDA, by Tom Pitkin. Bill Clark, PhD; Stu Steiner, MS, PhC
  • 2. Objectives: develop a sequential CPU ray tracer and a parallel GPU ray tracer; illustrate the differences in rendering speed and design between the two.
  • 3. Outline: Introduction to Ray Tracing; CUDA; Parallelization with CUDA / Results; Future Work; Questions
  • 4. What is Ray Tracing? A rendering technique used in computer graphics that simulates the behavior of light and can produce advanced optical effects.
  • 5. Light in the Physical World [Figure labels: light source, object with red reflectivity, pinhole, film]
  • 6. The Virtual Camera Model: eye position, the camera's location in 3D space; reference point, the point in 3D space the camera is aimed at; orientation vectors (u, v, n), the camera's orientation in 3D space; image plane, the projected plane of the camera's field of view. [Figure labels: eye position, reference point, u, v (up vector), n]
  • 7. Ray Generation: map the physical screen to the image plane; divide the image plane into a uniform grid of pixel locations; send a ray through the center of each pixel location. [Figure: rays from the eye position through the pixels; each pixel spans Image Plane Width / Screen Width horizontally and Image Plane Height / Screen Height vertically]
  • 8. Ray Intersection Testing: ray-sphere intersection; ray-triangle intersection.
  • 11. Outline: Introduction to Ray Tracing; CUDA; Parallelization with CUDA / Results; Future Work; Questions
  • 12. What is CUDA? Compute Unified Device Architecture (CUDA), a parallel computing platform developed by Nvidia.
  • 13. Kernel Functions: specify the code to be executed in parallel; Single Program, Multiple Data (SPMD).
  • 15. Memory Model: global memory; constant memory; texture memory; registers; local memory; shared memory.
  • 16. Outline: Introduction to Ray Tracing; CUDA; Parallelization with CUDA / Results; Future Work; Questions
  • 17. Thread Organization: a 2D array of blocks, each containing a 2D array of threads; each thread represents one ray. [Figure: the image plane divided into blocks (0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
  • 18. Testing Environment: OS: Ubuntu GNOME Remix 13.04. CPU: Core i7-920 (core clock 2.66 GHz). GPU: Nvidia GTX 570 (core clock 742 MHz; 480 CUDA cores; memory clock 3800 MHz; 1280 MB GDDR5 video memory).
  • 19. Test Objects: Teapot (1 surface, 992 triangles); Al (174 surfaces, 7,124 triangles); Crocodile (6 surfaces, 34,404 triangles).
  • 20. Single Kernel. Render time in milliseconds (logarithmic axis), single kernel vs. single thread:
      Teapot (1 surface, 992 triangles): 160 ms (0.16 s) vs. 23,003 ms (23 s)
      Al (174 surfaces, 7,124 triangles): 411 ms (0.41 s) vs. 55,260 ms (55.26 s)
      Crocodile (6 surfaces, 34,404 triangles): 5,867 ms (5.87 s) vs. 1,617,160 ms (26.95 min)
  • 21. Kernel Complexity and Size: driver timeout; register spilling.
  • 22. Replacing Recursion: an iterative loop with a layer-based stack; layers store the color values returned from rays, and the final image is a convex combination of the layers.
  • 23. Multi-Kernel. Render time in milliseconds (logarithmic axis), multi-kernel vs. single kernel (previous kernel):
      Teapot: 381 ms (0.38 s) vs. 160 ms (0.16 s)
      Al: 967 ms (0.97 s) vs. 411 ms (0.41 s)
      Crocodile: 13,217 ms (13.22 s) vs. 5,867 ms (5.87 s)
  • 24. Multi-Kernel with Single-Precision Floating Points. Render time in milliseconds (logarithmic axis), multi-kernel with single-precision floating point vs. multi-kernel (previous kernel):
      Teapot: 46 ms (0.05 s) vs. 381 ms (0.38 s)
      Al: 118 ms (0.12 s) vs. 967 ms (0.97 s)
      Crocodile: 1,556 ms (1.56 s) vs. 13,217 ms (13.22 s)
  • 25. Caching Surface Data: the object's surface data is stored in shared memory, so all threads in the same block can access the cached surface data; this removes duplicate memory requests and allows data reuse.
  • 26. Multi-Kernel with Surface Caching. Render time in milliseconds (logarithmic axis), surface caching vs. multi-kernel with single-precision floating point (previous kernel):
      Teapot: 30 ms (0.03 s) vs. 46 ms (0.05 s)
      Al: 133 ms (0.13 s) vs. 118 ms (0.12 s)
      Crocodile: 1,007 ms (1.01 s) vs. 1,556 ms (1.56 s)
  • 27. Simplifying Mesh Data: triangle data was originally stored as three points (vertices); the data is optimized by storing each triangle as one point (vertex) and two edges, with the edges calculated on the host before the kernel call. [Figure: example triangle labeled with coordinates (0, 0), (1, 0), and (0.5, 1)]
  • 28. Multi-Kernel with Mesh Optimization. Render time in milliseconds (logarithmic axis), mesh optimization vs. surface caching (previous kernel):
      Teapot: 27 ms (0.03 s) vs. 30 ms (0.03 s)
      Al: 127 ms (0.13 s) vs. 133 ms (0.13 s)
      Crocodile: 873 ms (0.87 s) vs. 1,007 ms (1.01 s)
  • 29. Final Results. Render time in milliseconds (logarithmic axis), multi-kernel with intersection optimization vs. single thread:
      Teapot: 27 ms (0.03 s) vs. 23,003 ms (23 s)
      Al: 127 ms (0.13 s) vs. 55,260 ms (55.26 s)
      Crocodile: 873 ms (0.87 s) vs. 1,617,160 ms (26.95 min)
  • 30. Outline: Introduction to Ray Tracing; CUDA; Parallelization with CUDA / Results; Future Work; Questions
  • 31. Future Work: spatial partitioning; multiple GPUs; optimizing the code for different GPUs.
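Slide 6 defines the camera by an eye position, a reference point, and orientation vectors (u, v, n). A minimal C++ sketch of building that orthonormal basis is below. It assumes the common convention n = normalize(eye - reference point), so the camera looks along -n, plus a world "up" direction used to derive u and v. The `Vec3` type and all function names are hypothetical, not the author's code.

```cpp
#include <cassert>
#include <cmath>

// Minimal 3D vector type (hypothetical; the slides do not show the author's types).
struct Vec3 {
    double x, y, z;
};

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) {
    double len = std::sqrt(dot(a, a));
    assert(len > 0.0);  // zero-length input, e.g. eye == reference point
    return {a.x / len, a.y / len, a.z / len};
}

// Orthonormal camera basis: u (right), v (up), n (backward, away from the scene).
struct CameraBasis {
    Vec3 u, v, n;
};

// Assumed convention: n points from the reference point back toward the eye,
// so the camera looks along -n. `up` may be any vector not parallel to n.
CameraBasis makeCameraBasis(Vec3 eye, Vec3 reference, Vec3 up) {
    Vec3 n = normalize(sub(eye, reference));
    Vec3 u = normalize(cross(up, n));  // perpendicular to both up and n
    Vec3 v = cross(n, u);              // unit length because n and u are orthonormal
    return {u, v, n};
}
```

With the eye at (0, 0, 5) looking at the origin and up = (0, 1, 0), this yields u = (1, 0, 0), v = (0, 1, 0), and n = (0, 0, 1). The up vector only needs to point roughly upward, since crossing it with n removes any tilt, but it must not be parallel to n.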
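Slide 7's ray generation can be sketched in C++ as follows, assuming the image plane sits a distance `planeDistance` in front of the eye along -n and that pixel row 0 is the top of the screen. Each pixel covers Image Plane Width / Screen Width by Image Plane Height / Screen Height, and the ray passes through the pixel's center. `ViewParams`, `primaryRay`, and the vector helpers are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    double x, y, z;
};

Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 scale(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
Vec3 normalize(Vec3 a) {
    double len = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
    return scale(a, 1.0 / len);
}

struct Ray {
    Vec3 origin;
    Vec3 direction;  // unit length
};

// Hypothetical view setup: screen size in pixels, image plane size in world
// units, and the eye-to-image-plane distance along -n.
struct ViewParams {
    int screenWidth, screenHeight;
    double planeWidth, planeHeight;
    double planeDistance;
};

// Primary ray through the center of pixel (col, row); row 0 is the top row.
// u, v, n are the camera's orthonormal basis vectors.
Ray primaryRay(Vec3 eye, Vec3 u, Vec3 v, Vec3 n, const ViewParams& p, int col, int row) {
    assert(col >= 0 && col < p.screenWidth && row >= 0 && row < p.screenHeight);
    double pixelW = p.planeWidth / p.screenWidth;    // Image Plane Width / Screen Width
    double pixelH = p.planeHeight / p.screenHeight;  // Image Plane Height / Screen Height
    // Pixel center relative to the center of the image plane.
    double x = -p.planeWidth / 2.0 + (col + 0.5) * pixelW;
    double y = p.planeHeight / 2.0 - (row + 0.5) * pixelH;
    Vec3 dir = add(add(scale(u, x), scale(v, y)), scale(n, -p.planeDistance));
    return {eye, normalize(dir)};
}
```

On the GPU, one call like this would run per thread, with (col, row) derived from the thread's block and thread indices.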
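For the ray-sphere test on slide 8, a standard approach (assumed here; the slides do not show the math) substitutes the ray p(t) = o + t*d into |p - c|^2 = r^2. With a unit-length d this gives t^2 + 2bt + c' = 0, where b = d.(o - c) and c' = |o - c|^2 - r^2, so t = -b +/- sqrt(b^2 - c'). A negative discriminant means the ray misses the sphere completely, the case the speaker notes flag; a caller would typically fall back to a background color there, which is an assumption rather than something the slides state. `intersectSphere` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    double x, y, z;
};

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Ray-sphere test. `dir` must be unit length. Returns true and writes the nearest
// hit distance to tHit when the sphere lies at least partly in front of the origin.
bool intersectSphere(Vec3 origin, Vec3 dir, Vec3 center, double radius, double& tHit) {
    assert(radius > 0.0);
    const double kEpsilon = 1e-9;          // ignore hits at the origin (self-intersection)
    Vec3 oc = sub(origin, center);
    double b = dot(oc, dir);               // half of the linear coefficient
    double c = dot(oc, oc) - radius * radius;
    double disc = b * b - c;               // quarter of the usual discriminant
    if (disc < 0.0) return false;          // ray misses the sphere completely
    double root = std::sqrt(disc);
    double t = -b - root;                  // nearer intersection
    if (t <= kEpsilon) t = -b + root;      // origin inside the sphere: use the far one
    if (t <= kEpsilon) return false;       // sphere lies entirely behind the ray
    tHit = t;
    return true;
}
```

A ray from the origin along (0, 0, -1) toward a unit sphere centered at (0, 0, -5) hits at t = 4, the sphere's near surface.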
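Slide 17 maps each CUDA thread to one ray, with a 2D grid of blocks covering the image plane. The index arithmetic can be checked on the host with the C++ sketch below, which mirrors the usual kernel expression `blockIdx.x * blockDim.x + threadIdx.x`. The 16 x 16 block size is an assumption (the slides do not give one); 640 x 480 is the resolution from the speaker notes. `Index2`, `blocksNeeded`, `pixelForThread`, and `traceKernel` are hypothetical names.

```cpp
#include <cassert>

// Host-side stand-in for CUDA's 2D index values (hypothetical; in a kernel
// they come from blockIdx, blockDim, and threadIdx).
struct Index2 {
    int x, y;
};

// Blocks needed to cover `pixels` with `blockSize` threads per block (ceiling division).
int blocksNeeded(int pixels, int blockSize) {
    assert(pixels > 0 && blockSize > 0);
    return (pixels + blockSize - 1) / blockSize;
}

// Pixel (col, row) traced by one thread; returns false for threads that fall
// outside the image because the grid was rounded up.
bool pixelForThread(Index2 block, Index2 blockSize, Index2 thread,
                    int width, int height, int& col, int& row) {
    col = block.x * blockSize.x + thread.x;  // blockIdx.x * blockDim.x + threadIdx.x
    row = block.y * blockSize.y + thread.y;  // blockIdx.y * blockDim.y + threadIdx.y
    return col < width && row < height;
}

// Corresponding CUDA launch, for reference only (kernel name is hypothetical):
//   dim3 block(16, 16);
//   dim3 grid(blocksNeeded(640, 16), blocksNeeded(480, 16));  // 40 x 30 blocks
//   traceKernel<<<grid, block>>>(/* scene and image buffers */);
```

Because the grid is rounded up, threads in the last row or column of blocks may fall outside the image; the bounds check lets those threads exit without writing a pixel.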
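Slide 22 replaces recursive tracing with an iterative loop and a layer-based stack whose final image is a convex combination of the layers. The slides do not give the weights, so the C++ sketch below assumes a common reflective-blend model: each bounce mixes its local color with the reflected color using the surface reflectivity r. Unrolling that recursion gives per-layer weights that are non-negative and sum to 1, which is exactly a convex combination. `Layer`, `shadeRecursive`, and `shadeIterative` are hypothetical names; treating each layer as a per-pixel buffer filled by a separate kernel pass is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Color {
    double r, g, b;
};

// One layer of a pixel's ray path (hypothetical layout): the locally shaded color
// at that bounce and the surface reflectivity in [0, 1].
struct Layer {
    Color local;
    double reflectivity;
};

// Linear blend: (1 - t) * a + t * b.
Color mix(Color a, Color b, double t) {
    return {(1 - t) * a.r + t * b.r, (1 - t) * a.g + t * b.g, (1 - t) * a.b + t * b.b};
}

// Recursive reference: each layer blends its local color with the color returned
// by the next (reflected) ray; the last layer returns its local color.
Color shadeRecursive(const std::vector<Layer>& layers, std::size_t k = 0) {
    assert(k < layers.size());
    if (k + 1 == layers.size()) return layers[k].local;
    return mix(layers[k].local, shadeRecursive(layers, k + 1), layers[k].reflectivity);
}

// Iterative equivalent: weight_k = (1 - r_k) * r_0 * ... * r_{k-1}, and the last layer
// takes the remaining product r_0 * ... * r_{n-2}. The weights sum to 1.
Color shadeIterative(const std::vector<Layer>& layers) {
    assert(!layers.empty());
    Color out{0, 0, 0};
    double carried = 1.0;  // product of the reflectivities of earlier layers
    for (std::size_t k = 0; k < layers.size(); ++k) {
        bool last = (k + 1 == layers.size());
        double w = last ? carried : carried * (1.0 - layers[k].reflectivity);
        out.r += w * layers[k].local.r;
        out.g += w * layers[k].local.g;
        out.b += w * layers[k].local.b;
        carried *= layers[k].reflectivity;
    }
    return out;
}
```

The iterative form needs no call stack, which matters on the GPU, where deep recursion adds per-thread stack usage and contributes to the register spilling mentioned on slide 21.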
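Slide 27 stores each triangle as one vertex and two edges computed on the host. That layout matches the inputs of the Moller-Trumbore ray-triangle test, which the C++ sketch below uses as an assumed stand-in; the slides do not name the intersection algorithm. The example triangle reuses the coordinates labeled on the slide, (0, 0), (1, 0), and (0.5, 1), placed in the z = 0 plane. `Triangle`, `fromVertices`, and `intersectTriangle` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    double x, y, z;
};

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Triangle stored as one vertex plus two edges.
struct Triangle {
    Vec3 v0;
    Vec3 e1;  // v1 - v0
    Vec3 e2;  // v2 - v0
};

// Done once on the host, before the kernel call, so each ray test skips two subtractions.
Triangle fromVertices(Vec3 v0, Vec3 v1, Vec3 v2) { return {v0, sub(v1, v0), sub(v2, v0)}; }

// Moller-Trumbore ray-triangle test (assumed algorithm). Returns true and writes
// the hit distance to tHit when the ray hits the triangle in front of its origin.
bool intersectTriangle(Vec3 origin, Vec3 dir, const Triangle& tri, double& tHit) {
    const double kEpsilon = 1e-9;
    Vec3 p = cross(dir, tri.e2);
    double det = dot(tri.e1, p);
    if (std::fabs(det) < kEpsilon) return false;  // ray parallel to the triangle's plane
    double invDet = 1.0 / det;
    Vec3 s = sub(origin, tri.v0);
    double u = dot(s, p) * invDet;                // barycentric coordinate along e1
    if (u < 0.0 || u > 1.0) return false;
    Vec3 q = cross(s, tri.e1);
    double v = dot(dir, q) * invDet;              // barycentric coordinate along e2
    if (v < 0.0 || u + v > 1.0) return false;
    double t = dot(tri.e2, q) * invDet;
    if (t <= kEpsilon) return false;              // triangle is behind the ray
    tHit = t;
    return true;
}
```

A ray from (0.5, 0.25, 1) pointing along (0, 0, -1) hits the example triangle at t = 1, the point (0.5, 0.25, 0).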

Editor's Notes

  1. Used C++ and CUDA
  2. Forward Ray Tracing; Backward Ray Tracing
  3. Pixel: a picture element that represents one point in an image; it consists of a single color.
  4. Don’t forget to mention what happens if a ray misses completely
  5. Ambient light: indirect light reflected off other objects in the scene. Diffuse light: direct light reflected off the surface in all directions. Specular light: direct light reflected off the surface in a single direction.
  6. Each block and each thread has a unique identifier.
  7. Register memory: 50x faster than global memory. L2 cache: LRU (least recently used) replacement. L1 cache: exploits spatial locality (quick access to memory near the current memory reference) and caches the per-thread stack and other local data structures.
  8. Logarithmic scale; single pass, 640 × 480.
  9. 1,852x speedup! (Crocodile: 1,617,160 ms single-threaded vs. 873 ms on the GPU)
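Note 5 defines the three terms of the Phong reflection model. A per-channel C++ sketch of summing them is below, assuming the textbook formulation I = ka*Ia + kd*(N.L)*Il + ks*(R.V)^alpha*Il, where R = 2(N.L)N - L is the light direction mirrored about the normal. `Material`, its coefficients, and `phong` are hypothetical names, not the author's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 {
    double x, y, z;
};

double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 scale(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

// Hypothetical per-channel material coefficients.
struct Material {
    double ka;         // ambient reflectivity
    double kd;         // diffuse reflectivity
    double ks;         // specular reflectivity
    double shininess;  // Phong exponent (alpha)
};

// Phong intensity for one color channel and one light.
// N: unit surface normal, L: unit direction to the light, V: unit direction to the viewer.
double phong(const Material& m, Vec3 N, Vec3 L, Vec3 V,
             double ambientLight, double lightIntensity) {
    assert(m.shininess >= 0.0);
    double ambient = m.ka * ambientLight;            // indirect light from the scene
    double nDotL = dot(N, L);
    if (nDotL <= 0.0) return ambient;                // light is behind the surface
    double diffuse = m.kd * nDotL * lightIntensity;  // scattered in all directions
    Vec3 R = sub(scale(N, 2.0 * nDotL), L);          // L mirrored about N
    double rDotV = std::max(0.0, dot(R, V));
    double specular = m.ks * std::pow(rDotV, m.shininess) * lightIntensity;  // one direction
    return ambient + diffuse + specular;
}
```

A renderer would evaluate this once per color channel and per light; the early return drops the diffuse and specular terms when the light is behind the surface.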
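The 1,852x figure in note 9 is the Crocodile model's single-thread time divided by its final GPU time from slide 29. Applying the same ratio to all three test objects gives:

```latex
\begin{aligned}
\text{Teapot:}\quad    & \frac{23{,}003\ \text{ms}}{27\ \text{ms}} \approx 852\times \\
\text{Al:}\quad        & \frac{55{,}260\ \text{ms}}{127\ \text{ms}} \approx 435\times \\
\text{Crocodile:}\quad & \frac{1{,}617{,}160\ \text{ms}}{873\ \text{ms}} \approx 1852\times
\end{aligned}
```

The largest model benefits most, consistent with the GPU's advantage growing with the amount of intersection work per ray.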