ISCA final presentation - Queuing Model
HSA Foundation
1.
HSA QUEUING MODEL
HAKAN PERSSON, SENIOR PRINCIPAL ENGINEER, ARM
2.
HSA QUEUEING, MOTIVATION
3.
MOTIVATION (TODAY’S PICTURE)
© Copyright 2014 HSA Foundation. All Rights Reserved
[Figure: job dispatch timeline across Application, OS and GPU: Transfer buffer to GPU, Copy/Map Memory, Queue Job, Schedule Job, Start Job, Finish Job, Schedule Application, Get Buffer, Copy/Map Memory]
4.
HSA QUEUEING: REQUIREMENTS
5.
REQUIREMENTS
Three key technologies are used to build the user mode queueing mechanism:
- Shared Virtual Memory
- System Coherency
- Signaling
AQL (Architected Queueing Language) enables any agent to enqueue tasks.
6.
SHARED VIRTUAL MEMORY
7.
SHARED VIRTUAL MEMORY (TODAY)
Multiple virtual memory address spaces
[Figure: CPU0 and GPU with separate address spaces VIRTUAL MEMORY1 and VIRTUAL MEMORY2, mapped to PHYSICAL MEMORY via VA1->PA1 and VA2->PA1]
8.
SHARED VIRTUAL MEMORY (HSA)
Common virtual memory for all HSA agents
[Figure: CPU0 and GPU sharing one VIRTUAL MEMORY address space, mapped to PHYSICAL MEMORY via the same VA->PA translation]
9.
SHARED VIRTUAL MEMORY
Advantages
- No mapping tricks, no copying back and forth between different physical addresses
- Send pointers (not data) back and forth between HSA agents
Implications
- Common page tables (and common interpretation of architectural semantics such as shareability, protection, etc.)
- Common mechanisms for address translation (and servicing address translation faults)
- Concept of a process address space (PASID) to allow multiple, per-process virtual address spaces within the system
10.
SHARED VIRTUAL MEMORY
Specifics
- Minimum supported VA width is 48 bits for 64-bit systems, and 32 bits for 32-bit systems.
- HSA agents may reserve VA ranges for internal use via system software.
- All HSA agents other than the host unit must use the lowest privilege level.
- If present, read/write access flags for page tables must be maintained by all agents.
- Read/write permissions apply to all HSA agents equally.
11.
GETTING THERE …
[Figure: same Application / OS / GPU job dispatch timeline as slide 3]
12.
CACHE COHERENCY
13.
CACHE COHERENCY DOMAINS (1/3)
Data accesses to the global memory segment from all HSA agents shall be coherent without the need for explicit cache maintenance.
14.
CACHE COHERENCY DOMAINS (2/3)
Advantages
- Composability
- Reduced SW complexity when communicating between agents
- Lower barrier to entry when porting software
Implications
- Hardware coherency support between all HSA agents
- Can take many forms:
  - Stand-alone snoop filters / directories
  - Combined L3/filters
  - Snoop-based systems (no filter)
  - Etc.
15.
CACHE COHERENCY DOMAINS (3/3)
Specifics
- No requirement for instruction memory accesses to be coherent
- Only applies to the Primary memory type
- No requirement for HSA agents to maintain coherency to any memory location where the HSA agents do not specify the same memory attributes
- Read-only image data is required to remain static during the execution of an HSA kernel
  - No double mapping (via different attributes) in order to modify; the data must remain static
16.
GETTING CLOSER …
[Figure: same Application / OS / GPU job dispatch timeline as slide 3]
17.
SIGNALING
18.
SIGNALING (1/3)
- HSA agents support the ability to use signaling objects
  - All creation/destruction of signaling objects occurs via HSA runtime APIs
  - From an HSA agent you can directly access signaling objects:
    - Signal a signal object (this will wake up HSA agents waiting upon the object)
    - Query the current object
    - Wait on the current object (various conditions supported)
19.
SIGNALING (2/3)
Advantages
- Enables asynchronous events between HSA agents, without involving the kernel
- Common idiom for work offload
- Low-power waiting
Implications
- Runtime support required
- Commonly implemented on top of cache coherency flows
20.
SIGNALING (3/3)
Specifics
- Only supported within a PASID
- Supported wait conditions are =, !=, < and >=
- Wait operations may return sporadically (no guarantee against false positives)
  - The programmer must re-test the condition after a wait returns
- Wait operations have a maximum duration before returning
- The HSAIL atomic operations are supported on signal objects
- Signal objects are opaque
  - Must use dedicated HSAIL/HSA runtime operations
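Because a wait may return before its condition holds (sporadic return or time-out), callers have to re-check the condition in a loop. The following is a minimal sketch of that idiom only. `MockSignal`, `mock_wait_once` and `wait_until_ge` are hypothetical stand-ins built on `std::atomic`; they are not HSA runtime types or calls.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an opaque HSA signal object (not the real runtime type).
struct MockSignal {
    std::atomic<int64_t> value{0};
};

// Hypothetical single wait. It models a wait that is allowed to return early:
// it just reports whatever value it observed, whether or not the condition holds.
inline int64_t mock_wait_once(const MockSignal& s) {
    return s.value.load(std::memory_order_acquire);
}

// Re-test the ">=" condition after every return from the wait.
// Returns true once value >= threshold, false after max_attempts waits.
inline bool wait_until_ge(const MockSignal& s, int64_t threshold, int max_attempts) {
    assert(max_attempts > 0);
    for (int i = 0; i < max_attempts; ++i) {
        if (mock_wait_once(s) >= threshold) {
            return true;   // condition verified by the caller, not assumed
        }
    }
    return false;          // gave up: condition never observed
}
```

The bounded attempt count is an illustration choice; a real caller might instead loop until the condition holds.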
21.
ALMOST THERE…
[Figure: same Application / OS / GPU job dispatch timeline as slide 3]
22.
USER MODE QUEUING
23.
ONE BLOCK LEFT
[Figure: same Application / OS / GPU job dispatch timeline as slide 3]
24.
USER MODE QUEUEING (1/3)
User mode queueing
- Enables user space applications to enqueue jobs (“Dispatch Packets”) for HSA agents directly, without OS intervention
- Queues are created/destroyed via calls to the HSA runtime
- One (or many) agents enqueue packets; a single agent dequeues packets
- Requires coherency and shared virtual memory
25.
USER MODE QUEUEING (2/3)
Advantages
- Avoid involving the kernel/driver when dispatching work for an agent
- Lower-latency job dispatch enables finer granularity of offload
- Standard memory protection mechanisms may be used to protect communication with the consuming agent
Implications
- Packet formats/fields are architected: standard across vendors!
  - Guaranteed backward compatibility
- Packets are enqueued/dequeued via an architected protocol (all via memory accesses and signaling)
  - More on this later
26.
SUCCESS!
[Figure: same Application / OS / GPU job dispatch timeline as slide 3]
27.
SUCCESS!
[Figure: reduced timeline across Application, OS and GPU: Queue Job, Start Job, Finish Job]
28.
ARCHITECTED QUEUEING LANGUAGE, QUEUES
29.
ARCHITECTED QUEUEING LANGUAGE
- HSA queues look just like standard shared memory queues, supporting multi-producer, single-consumer
  - Single-producer variant defined, with some optimizations possible
- Queues consist of storage, read/write indices, ID, etc.
- Queues are created/destroyed via calls to the HSA runtime
- “Packets” are placed in queues directly from user mode, via an architected protocol
- Packet format is architected
[Figure: two Producers and one Consumer around a ring of Packets, with Read Index and Write Index; storage in coherent, shared memory]
30.
ARCHITECTED QUEUING LANGUAGE
- Packets are read and dispatched for execution from the queue in order, but may complete in any order
  - There is no guarantee that more than one packet will be processed in parallel at a time
- There may be many queues; a single agent may also consume from several queues
- Any HSA agent may enqueue packets:
  - CPUs
  - GPUs
  - Other accelerators
31.
QUEUE STRUCTURE
Offset (bytes)  Size (bytes)  Field           Notes
0               4             queueType       Differentiate different queues
4               4             queueFeatures   Indicate supported features
8               8             baseAddress     Pointer to packet array
16              8             doorbellSignal  HSA signaling object handle (the slide prints size 16, but the next field starts at offset 24)
24              4             size            Packet array cardinality
28              4             queueId         Unique per process
32              8             serviceQueue    Queue for callback services
intrinsic       8             writeIndex      Packet array write index
intrinsic       8             readIndex       Packet array read index
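The offsets in the table can be checked mechanically. Below is a minimal sketch, not the runtime's actual declaration: `QueueLayout` is a hypothetical struct that mirrors the listed fields with fixed-width integers. It assumes `doorbellSignal` occupies 8 bytes, which is what the neighbouring offsets 16 and 24 imply, and it leaves out `writeIndex` and `readIndex` because the table marks them as intrinsic rather than as structure members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the queue structure table; field names follow the slide,
// the types are assumptions chosen to match the listed sizes.
struct QueueLayout {
    uint32_t queueType;       // offset 0:  differentiates queue kinds
    uint32_t queueFeatures;   // offset 4:  supported-feature bits
    uint64_t baseAddress;     // offset 8:  pointer to the packet array
    uint64_t doorbellSignal;  // offset 16: signal handle (8 bytes assumed)
    uint32_t size;            // offset 24: packet array cardinality
    uint32_t queueId;         // offset 28: unique per process
    uint64_t serviceQueue;    // offset 32: queue for callback services
};

// Compile-time checks against the offsets in the table.
static_assert(offsetof(QueueLayout, queueType) == 0, "queueType offset");
static_assert(offsetof(QueueLayout, queueFeatures) == 4, "queueFeatures offset");
static_assert(offsetof(QueueLayout, baseAddress) == 8, "baseAddress offset");
static_assert(offsetof(QueueLayout, doorbellSignal) == 16, "doorbellSignal offset");
static_assert(offsetof(QueueLayout, size) == 24, "size offset");
static_assert(offsetof(QueueLayout, queueId) == 28, "queueId offset");
static_assert(offsetof(QueueLayout, serviceQueue) == 32, "serviceQueue offset");
static_assert(sizeof(QueueLayout) == 40, "total size of the listed fields");
```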
32.
QUEUE VARIANTS
- queueType and queueFeatures together define queue semantics and capabilities
- Two queueType values defined, other values reserved:
  - MULTI: queue supports multiple producers
  - SINGLE: queue supports a single producer
- queueFeatures is a bitfield indicating capabilities:
  - DISPATCH (bit 0): if set, the queue supports DISPATCH packets
  - AGENT_DISPATCH (bit 1): if set, the queue supports AGENT_DISPATCH packets
  - All other bits are reserved and must be 0
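The queueFeatures rules translate directly into bit masks. The sketch below is illustrative only: the constant and function names are hypothetical, and only the bit positions (bit 0 and bit 1) and the "reserved bits must be 0" rule come from the slide.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions taken from the slide; the identifiers themselves are hypothetical.
constexpr uint32_t kFeatureDispatch      = 1u << 0;  // DISPATCH packets supported
constexpr uint32_t kFeatureAgentDispatch = 1u << 1;  // AGENT_DISPATCH packets supported
constexpr uint32_t kFeatureReservedMask  = ~(kFeatureDispatch | kFeatureAgentDispatch);

// All reserved bits must be 0.
inline bool features_valid(uint32_t features) {
    return (features & kFeatureReservedMask) == 0;
}

inline bool supports_dispatch(uint32_t features) {
    return (features & kFeatureDispatch) != 0;
}

inline bool supports_agent_dispatch(uint32_t features) {
    return (features & kFeatureAgentDispatch) != 0;
}
```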
33.
QUEUE STRUCTURE DETAILS
- Queue doorbells are HSA signaling objects with restrictions:
  - Created as part of the queue; lifetime tied to the queue object
  - Atomic read-modify-write not allowed
- size field value must be a power of 2
- serviceQueue can be used by an HSA kernel for callback services
  - Provided by the application when the queue is created
  - Can be mapped to an HSA runtime provided serviceQueue, an application serviced queue, or NULL if no serviceQueue is required
34.
READ/WRITE INDICES
- readIndex and writeIndex properties are part of the queue, but not visible in the queue structure
  - Accessed through HSA runtime API and HSAIL operations
- HSA runtime/HSAIL operations defined to:
  - Read readIndex or writeIndex property
  - Write readIndex or writeIndex property
  - Add constant to writeIndex property (returns previous writeIndex value)
  - CAS on writeIndex property
- readIndex and writeIndex operations are treated as atomic in the memory model
  - relaxed, acquire, release and acquire-release variants defined as applicable
- readIndex and writeIndex never wrap
- PacketID: the index of a particular packet
  - Uniquely identifies each packet of a queue
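Since the indices never wrap, a packetID is a monotonically increasing 64-bit counter, and the position in the packet array is obtained by masking with `size - 1` (valid when `size` is a power of 2, as in the multi-producer example later in the deck). A minimal sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// True when n is a non-zero power of two.
inline bool is_power_of_two(uint32_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Map a never-wrapping packetID to its slot in the packet array.
inline uint64_t slot_for(uint64_t packetID, uint32_t size) {
    assert(is_power_of_two(size));
    return packetID & (static_cast<uint64_t>(size) - 1);
}

// Number of packets reserved but not yet consumed.
// Valid because both indices only grow and readIndex <= writeIndex.
inline uint64_t packets_outstanding(uint64_t writeIndex, uint64_t readIndex) {
    assert(readIndex <= writeIndex);
    return writeIndex - readIndex;
}
```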
35.
PACKET ENQUEUE
Packet enqueue follows a few simple steps:
- Reserve space
  - Multiple packets can be reserved at a time
- Write packet to queue
- Mark packet as valid
  - Producer is no longer allowed to modify the packet
  - Consumer is allowed to start processing the packet
- Notify consumer of the packet through the queue doorbell
  - Multiple packets can be notified at a time
  - Doorbell signal should be signaled with the last packetID notified
  - On the small machine model, the lower 32 bits of the packetID are used
36.
PACKET RESERVATION
Two flows envisaged:
- Atomic add on writeIndex with the number of packets to reserve
  - Producer must wait until packetID < readIndex + size before writing to the packet
  - Queue can be sized so that the wait is unlikely (or impossible)
  - Suitable when many threads use one queue
- Check that the queue is not full first, then use atomic CAS to update writeIndex
  - Can be inefficient if many threads use the same queue
  - Allows a different failure model if the queue is congested
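The two reservation flows can be contrasted in a small host-side sketch. `Ring`, `reserve_with_add`, `slot_is_writable` and `try_reserve_with_cas` are hypothetical names, and `std::atomic` stands in for the queue index operations; the CAS flow here reserves one packet at a time for brevity.

```cpp
#include <atomic>
#include <cstdint>

// Minimal stand-in for a queue: two never-wrapping indices and a size.
struct Ring {
    std::atomic<uint64_t> readIndex{0};
    std::atomic<uint64_t> writeIndex{0};
    uint32_t size = 4;  // power of 2
};

// Flow 1: reserve unconditionally with an atomic add. The caller must
// then wait until slot_is_writable() holds before writing the packet.
inline uint64_t reserve_with_add(Ring& q, uint64_t count) {
    return q.writeIndex.fetch_add(count, std::memory_order_relaxed);
}

inline bool slot_is_writable(const Ring& q, uint64_t packetID) {
    return packetID < q.readIndex.load(std::memory_order_acquire) + q.size;
}

// Flow 2: check for space first, then claim one slot with CAS.
// Returns false (reserving nothing) when the queue is full, which lets
// the caller pick its own failure handling.
inline bool try_reserve_with_cas(Ring& q, uint64_t& packetID) {
    uint64_t wr = q.writeIndex.load(std::memory_order_relaxed);
    for (;;) {
        uint64_t rd = q.readIndex.load(std::memory_order_acquire);
        if (wr >= rd + q.size) return false;  // queue full
        if (q.writeIndex.compare_exchange_weak(
                wr, wr + 1,
                std::memory_order_relaxed, std::memory_order_relaxed)) {
            packetID = wr;
            return true;
        }
        // CAS failed: wr now holds the current writeIndex, so retry.
    }
}
```

The add-based flow always makes progress on the index but may leave a producer waiting for its slot; the CAS-based flow never over-reserves but can spin under contention.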
37.
QUEUE OPTIMIZATIONS
Queue behavior is loosely defined to allow optimizations
- Some potential producer behavior optimizations:
  - Keep a local copy of readIndex, update when required
  - For single producer queues:
    - Keep a local copy of writeIndex
    - Use a store operation rather than an add/CAS atomic to update writeIndex
- Some potential consumer behavior optimizations:
  - Use the packet format field, rather than the writeIndex property, to determine whether a packet has been submitted
  - Speculatively read multiple packets from the queue
  - Do not update readIndex for each packet processed
  - Rely on the value used for doorbellSignal to notify new packets
    - Especially useful for single producer queues
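A sketch of the single-producer optimizations, under the assumption that exactly one thread ever produces into the queue. `SharedQueue` and `SingleProducer` are hypothetical names; the producer keeps private copies of both indices, refreshes its cached readIndex only when the queue looks full, and publishes writeIndex with a plain atomic store instead of an add or CAS.

```cpp
#include <atomic>
#include <cstdint>

struct SharedQueue {
    std::atomic<uint64_t> readIndex{0};
    std::atomic<uint64_t> writeIndex{0};
    uint32_t size = 4;  // power of 2
};

class SingleProducer {
public:
    explicit SingleProducer(SharedQueue& q) : q_(q) {}

    // Reserves one slot; returns false if the queue is full.
    bool try_reserve(uint64_t& packetID) {
        if (localWrite_ >= cachedRead_ + q_.size) {
            // Looks full: only now re-read the shared readIndex.
            cachedRead_ = q_.readIndex.load(std::memory_order_acquire);
            ++refreshes_;
            if (localWrite_ >= cachedRead_ + q_.size) return false;
        }
        packetID = localWrite_++;
        // Sole producer, so a store suffices (no read-modify-write).
        q_.writeIndex.store(localWrite_, std::memory_order_relaxed);
        return true;
    }

    // Number of times the shared readIndex was actually loaded.
    unsigned refreshes() const { return refreshes_; }

private:
    SharedQueue& q_;
    uint64_t localWrite_ = 0;  // local copy of writeIndex
    uint64_t cachedRead_ = 0;  // local copy of readIndex
    unsigned refreshes_ = 0;
};
```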
38.
POTENTIAL MULTI-PRODUCER ALGORITHM

    // Allocate packet
    uint64_t packetID = hsa_queue_add_write_index_relaxed(q, 1);

    // Wait until the queue is no longer full.
    uint64_t rdIdx;
    do {
        rdIdx = hsa_queue_load_read_index_relaxed(q);
    } while (packetID >= (rdIdx + q->size));

    // calculate index
    uint32_t arrayIdx = packetID & (q->size-1);

    // copy over the packet, the format field is INVALID
    q->baseAddress[arrayIdx] = pkt;

    // Update format field with release semantics
    q->baseAddress[arrayIdx].hdr.format.store(DISPATCH, std::memory_order_release);

    // ring doorbell, with release semantics (could also amortize over multiple packets)
    hsa_signal_send_relaxed(q->doorbellSignal, packetID);
39.
POTENTIAL CONSUMER ALGORITHM

    // Get location of next packet
    uint64_t readIndex = hsa_queue_load_read_index_relaxed(q);

    // calculate the index
    uint32_t arrayIdx = readIndex & (q->size-1);

    // spin while empty (could also perform low-power wait on doorbell)
    while (INVALID == q->baseAddress[arrayIdx].hdr.format) { }

    // copy over the packet
    pkt = q->baseAddress[arrayIdx];

    // set the format field to invalid
    q->baseAddress[arrayIdx].hdr.format.store(INVALID, std::memory_order_relaxed);

    // Update the readIndex using HSA intrinsic
    hsa_queue_store_read_index_relaxed(q, readIndex+1);

    // Now process <pkt>!
40.
ARCHITECTED QUEUEING LANGUAGE, PACKETS
41.
PACKETS
Packets come in three main types with architected layouts, in addition to the Always reserved and Invalid formats
- Always reserved & Invalid
  - Do not contain any valid tasks and are not processed (the queue will not progress)
- Dispatch
  - Specifies kernel execution over a grid
- Agent Dispatch
  - Specifies a single function to perform with a set of parameters
- Barrier
  - Used for task dependencies
42.
COMMON PACKET HEADER

| Start Offset (Bytes) | Format | Field Name | Description |
|---|---|---|---|
| 0 | uint16_t | format:8 | Contains the packet type (Always reserved, Invalid, Dispatch, Agent Dispatch, and Barrier). Other values are reserved and should not be used. |
| | | barrier:1 | If set, processing of the packet will only begin when all preceding packets are complete. |
| | | acquireFenceScope:2 | Determines the scope and type of the memory fence operation applied before the packet enters the active phase. Must be 0 for Barrier packets. |
| | | releaseFenceScope:2 | Determines the scope and type of the memory fence operation applied after kernel completion but before the packet is completed. |
| | | reserved:3 | Must be 0. |
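To make the 16-bit header concrete, the sketch below packs and unpacks the fields with shifts and masks. It assumes the fields are laid out from the least-significant bit in the order the table lists them (format in bits 0 to 7, barrier in bit 8, and so on); the slide does not state bit positions or numeric format encodings, so both the layout and the function names are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (LSB first): format:8, barrier:1, acquireFenceScope:2,
// releaseFenceScope:2, reserved:3 (left as 0).
inline uint16_t make_header(uint8_t format, bool barrier,
                            unsigned acquireFenceScope,
                            unsigned releaseFenceScope) {
    assert(acquireFenceScope < 4 && releaseFenceScope < 4);
    return static_cast<uint16_t>(format
                                 | ((barrier ? 1u : 0u) << 8)
                                 | (acquireFenceScope << 9)
                                 | (releaseFenceScope << 11));
}

inline unsigned header_format(uint16_t h)        { return h & 0xFFu; }
inline bool     header_barrier(uint16_t h)       { return ((h >> 8) & 1u) != 0; }
inline unsigned header_acquire_scope(uint16_t h) { return (h >> 9) & 3u; }
inline unsigned header_release_scope(uint16_t h) { return (h >> 11) & 3u; }
inline unsigned header_reserved(uint16_t h)      { return (h >> 13) & 7u; }
```

A producer would typically build the rest of the packet first and store the header last, since the format field is what marks the packet as valid.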
43.
DISPATCH PACKET

| Start Offset (Bytes) | Format | Field Name | Description |
|---|---|---|---|
| 0 | uint16_t | header | Packet header |
| 2 | uint16_t | dimensions:2 | Number of dimensions specified in gridSize. Valid values are 1, 2, or 3. |
| | | reserved:14 | Must be 0. |
| 4 | uint16_t | workgroupSize.x | x dimension of work-group (measured in work-items). |
| 6 | uint16_t | workgroupSize.y | y dimension of work-group (measured in work-items). |
| 8 | uint16_t | workgroupSize.z | z dimension of work-group (measured in work-items). |
| 10 | uint16_t | reserved2 | Must be 0. |
| 12 | uint32_t | gridSize.x | x dimension of grid (measured in work-items). |
| 16 | uint32_t | gridSize.y | y dimension of grid (measured in work-items). |
| 20 | uint32_t | gridSize.z | z dimension of grid (measured in work-items). |
| 24 | uint32_t | privateSegmentSizeBytes | Total size in bytes of private memory allocation request (per work-item). |
| 28 | uint32_t | groupSegmentSizeBytes | Total size in bytes of group memory allocation request (per work-group). |
| 32 | uint64_t | kernelObjectAddress | Address of an object in memory that includes an implementation-defined executable ISA image for the kernel. |
| 40 | uint64_t | kernargAddress | Address of memory containing kernel arguments. |
| 48 | uint64_t | reserved3 | Must be 0. |
| 56 | uint64_t | completionSignal | Address of HSA signaling object used to indicate completion of the job. |
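The dispatch packet layout can be written down as a 64-byte struct and checked at compile time. This is a sketch: `DispatchPacket` and `total_work_items` are hypothetical names, and the `dimensions:2` / `reserved:14` pair is stored as one plain `uint16_t` (assuming dimensions occupies the low 2 bits) to avoid relying on compiler-specific bit-field layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct DispatchPacket {
    uint16_t header;                   // offset 0
    uint16_t dimensions;               // offset 2: low 2 bits used (assumed), rest 0
    uint16_t workgroupSizeX;           // offset 4
    uint16_t workgroupSizeY;           // offset 6
    uint16_t workgroupSizeZ;           // offset 8
    uint16_t reserved2;                // offset 10: must be 0
    uint32_t gridSizeX;                // offset 12
    uint32_t gridSizeY;                // offset 16
    uint32_t gridSizeZ;                // offset 20
    uint32_t privateSegmentSizeBytes;  // offset 24
    uint32_t groupSegmentSizeBytes;    // offset 28
    uint64_t kernelObjectAddress;      // offset 32
    uint64_t kernargAddress;           // offset 40
    uint64_t reserved3;                // offset 48: must be 0
    uint64_t completionSignal;         // offset 56
};

static_assert(offsetof(DispatchPacket, gridSizeX) == 12, "gridSize.x offset");
static_assert(offsetof(DispatchPacket, kernelObjectAddress) == 32, "kernelObjectAddress offset");
static_assert(offsetof(DispatchPacket, completionSignal) == 56, "completionSignal offset");
static_assert(sizeof(DispatchPacket) == 64, "packet size");

// Number of work-items in the grid, counting only the dimensions in use.
inline uint64_t total_work_items(const DispatchPacket& p) {
    unsigned dims = p.dimensions & 3u;
    assert(dims >= 1 && dims <= 3);
    uint64_t n = p.gridSizeX;
    if (dims >= 2) n *= p.gridSizeY;
    if (dims >= 3) n *= p.gridSizeZ;
    return n;
}
```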
44.
AGENT DISPATCH PACKET

| Start Offset (Bytes) | Format | Field Name | Description |
|---|---|---|---|
| 0 | uint16_t | header | Packet header |
| 2 | uint16_t | type | The function to be performed by the destination Agent. The type value is split into the following ranges: 0x0000:0x3FFF vendor specific; 0x4000:0x7FFF HSA runtime; 0x8000:0xFFFF user registered function. |
| 4 | uint32_t | reserved2 | Must be 0. |
| 8 | uint64_t | returnLocation | Pointer to location to store the function return value in. |
| 16 | uint64_t | arg[0] | 64-bit direct or indirect arguments. |
| 24 | uint64_t | arg[1] | |
| 32 | uint64_t | arg[2] | |
| 40 | uint64_t | arg[3] | |
| 48 | uint64_t | reserved3 | Must be 0. |
| 56 | uint64_t | completionSignal | Address of HSA signaling object used to indicate completion of the job. |
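The three ranges of the `type` field lend themselves to a small classifier. The range boundaries are taken from the table; the enum and function names are hypothetical.

```cpp
#include <cstdint>

enum class AgentDispatchOwner { Vendor, Runtime, User };

// Classify an Agent Dispatch `type` value by the range it falls in.
inline AgentDispatchOwner classify_agent_dispatch(uint16_t type) {
    if (type <= 0x3FFF) return AgentDispatchOwner::Vendor;   // 0x0000:0x3FFF
    if (type <= 0x7FFF) return AgentDispatchOwner::Runtime;  // 0x4000:0x7FFF
    return AgentDispatchOwner::User;                         // 0x8000:0xFFFF
}
```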
45.
BARRIER PACKET
- Used for specifying dependences between packets
- The HSA agent will not launch any further packets from this queue until the barrier packet signal conditions are met
- Used for specifying dependences on packets dispatched from any queue
- Execution phase completes only when all of the dependent signals (up to five) have been signaled with the value 0, or when an error has occurred in one of the packets on which it depends
46.
BARRIER PACKET

| Start Offset (Bytes) | Format | Field Name | Description |
|---|---|---|---|
| 0 | uint16_t | header | Packet header, see 2.8.1 Packet header (p. 16). |
| 2 | uint16_t | reserved2 | Must be 0. |
| 4 | uint32_t | reserved3 | Must be 0. |
| 8 | uint64_t | depSignal0 | Address of dependent signaling objects to be evaluated by the packet processor. |
| 16 | uint64_t | depSignal1 | |
| 24 | uint64_t | depSignal2 | |
| 32 | uint64_t | depSignal3 | |
| 40 | uint64_t | depSignal4 | |
| 48 | uint64_t | reserved4 | Must be 0. |
| 56 | uint64_t | completionSignal | Address of HSA signaling object used to indicate completion of the job. |
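A rough model of how a packet processor might evaluate the five dependent signals. Here `Signal` is just a `std::atomic<int64_t>` standing in for an HSA signaling object, a null pointer is assumed to mean "slot unused", and error handling is omitted; none of these names come from the HSA specification.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

using Signal = std::atomic<int64_t>;  // stand-in for an HSA signaling object

// Hypothetical view of the depSignal0..depSignal4 fields.
struct BarrierDeps {
    std::array<const Signal*, 5> depSignal{};  // nullptr = unused (assumption)
};

// The barrier's conditions are met once every signal in use reads 0.
inline bool barrier_satisfied(const BarrierDeps& b) {
    for (const Signal* s : b.depSignal) {
        if (s != nullptr && s->load(std::memory_order_acquire) != 0) {
            return false;
        }
    }
    return true;
}
```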
47.
DEPENDENCES
A user may never assume more than one packet is being executed by an HSA agent at a time.
Implications:
- Packets can't poll on shared memory values which will be set by packets issued from other queues, unless the user has ensured the proper ordering.
- To ensure all previous packets from a queue have been completed, use the Barrier bit.
- To ensure specific packets from any queue have completed, use the Barrier packet.
48.
HSA QUEUEING, PACKET EXECUTION
49.
PACKET EXECUTION
- Launch phase
  - Initiated when launch conditions are met
    - All preceding packets in the queue must have exited the launch phase
    - If the barrier bit in the packet header is set, then all preceding packets in the queue must have exited the completion phase
  - Includes memory acquire fence
- Active phase
  - Execute the packet
  - Barrier packets remain in the Active phase until conditions are met
- Completion phase
  - First step is memory release fence: make results visible
  - completionSignal field is then signaled with a decrementing atomic
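The launch conditions can be expressed as a predicate over the state of the preceding packets. This is a toy model for reasoning about ordering, not how a packet processor is implemented; `Phase`, `PacketState`, `can_launch` and `signal_completion` are all hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Phase { Queued, Launch, Active, Completion, Done };

struct PacketState {
    bool  barrierBit;  // barrier bit from the packet header
    Phase phase;       // where the packet currently is
};

// True if the packet at position i may enter its launch phase.
inline bool can_launch(const std::vector<PacketState>& queue, std::size_t i) {
    for (std::size_t j = 0; j < i; ++j) {
        const Phase p = queue[j].phase;
        // Every preceding packet must have exited its launch phase.
        if (p == Phase::Queued || p == Phase::Launch) return false;
        // With the barrier bit set, every preceding packet must also
        // have exited its completion phase.
        if (queue[i].barrierBit && p != Phase::Done) return false;
    }
    return true;
}

// Completion: after the release fence, the completion signal is
// decremented atomically (modelled here with a release fetch_sub).
inline int64_t signal_completion(std::atomic<int64_t>& completionSignal) {
    return completionSignal.fetch_sub(1, std::memory_order_release) - 1;
}
```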
50.
PACKET EXECUTION: BARRIER BIT
[Figure: timeline of Pkt1, Pkt2 and Pkt3 (barrier=1) moving through their Launch, Execute and Complete phases]
Pkt3 launches when all packets in the queue have completed.
51.
PUTTING IT ALL TOGETHER (FFT)
[Figure: FFT over X[0] to X[7] plotted against time, computed by Packets 1 to 6 with a Barrier between successive stages]
52.
PUTTING IT ALL TOGETHER
AQL Pseudo Code

    // Send the packets to do the first stage.
    aql_dispatch(pkt1);
    aql_dispatch(pkt2);

    // Send the next two packets, setting the barrier bit so we
    // know packets 1 & 2 will be complete before 3 and 4 are
    // launched.
    aql_dispatch_with_barrier_bit(pkt3);
    aql_dispatch(pkt4);

    // Same as above (make sure 3 & 4 are done before issuing 5
    // & 6)
    aql_dispatch_with_barrier_bit(pkt5);
    aql_dispatch(pkt6);

    // This packet will notify us when 5 & 6 are complete
    aql_dispatch_with_barrier_bit(finish_pkt);
53.
PACKET EXECUTION: BARRIER PACKET
[Figure: timeline across two queues. Q1 holds T1; Q2 holds a Barrier packet followed by T2. Signal X is initialized to 1 and appears as both a completionSignal and the Barrier's depSignal0. T1 launches, executes and completes, decrementing signal X.]
- Barrier completes when signal X is signalled with 0
- T2 launches once the barrier is complete
54.
DEPTH FIRST CHILD TASK EXECUTION
- Consider two generations of child tasks
  - Task T submits tasks T.1 & T.2
  - Task T.1 submits tasks T.1.1 & T.1.2
  - Task T.2 submits tasks T.2.1 & T.2.2
- Desired outcome
  - Depth first child task execution
  - I.e. T, T.1, T.1.1, T.1.2, T.2, T.2.1, T.2.2
  - T is passed a signal (allComplete) to decrement when all tasks are complete (T and its children, etc.)
[Figure: task tree with T at the root, children T.1 and T.2, and their children T.1.1, T.1.2, T.2.1 and T.2.2]
55.
HOW TO DO THIS WITH HSA QUEUES?
- Use a separate user mode queue for each recursion level
  - Task T submits to queue Q1
  - Tasks T.1 & T.2 submit tasks to queue Q2
  - Queues could be passed in as parameters to task T
- Depth first requires ordering of T.1, T.2 and their children
  - Use an additional signal object (childrenComplete) to track completion of the children of T.1 & T.2
  - childrenComplete is set to the number of children (i.e. 2) by each of T.1 & T.2
56.
A PICTURE SAYS MORE THAN 1000 WORDS
[Figure: the task tree mapped onto two queues. Q1 holds T.1, a Barrier, T.2 and a Barrier; Q2 holds T.1.1, T.1.2, T.2.1 and T.2.2. The figure is labelled "Wait on childrenComplete" and "Signal allComplete".]
57.
SUMMARY
58.
KEY HSA TECHNOLOGIES
HSA combines several mechanisms to enable low overhead task dispatch
- Shared Virtual Memory
- System Coherency
- Signaling
- AQL
- User mode queues, from any compatible agent
- Architected packet format
- Rich dependency mechanism
- Flexible and efficient signaling of completion
59.
QUESTIONS?