TI DSP Optimization on Jacinto
Hank
2015/06/09
Generation
● 2002 OMAP
● 2006 Jacinto 1
● 2008 Jacinto 3
● 2010 Jacinto 4/5
● 2016 Jacinto 6
OMAP
– Application:
● CD-DA and CD-ROM/DVD-ROM/USB/SD with MP3, WMA, and AAC audio decoder support
– Software platform:
● Developed in cooperation with QNX Software Systems
Jacinto 1-DDR
– Application:
● Compressed video playback, Bluetooth A2DP audio streaming
– Improvement:
● C64x+ fixed-point DSP for graphics acceleration, compressed audio decoding, and voice recognition
Jacinto 3
– Application:
● Compressed video playback, Bluetooth A2DP audio streaming
– Hardware improvement:
● ARM Cortex-A8
● PowerVR SGX GPU
Jacinto 4/5
– Application:
● Full-HD 1080p video decode/encode
● QNX CAR 2 platform
– Hardware improvement:
● Dual ARM Cortex-M3 cores, used for decoding the video stream
● C674x DSP
Jacinto 6
– Application:
● Advanced Driver Assistance Systems (ADAS)
– Hardware improvement:
● ARM Cortex-A15
● C66x DSP
● SGX544 GPU
DSP Generation
C6000 DSP Optimization
● Languages supported by the code generation tools:
– ANSI C (C89)
– ISO C++ (C++98)
– C6000 DSP assembly
– C6000 linear assembly
Optimization: five key concepts
● Core (architecture)
– Parallel processing
● Pipeline
– High throughput
● Software pipelining
– Instruction scheduling
● Compiler optimization
● Optimized software libraries
– C6000 intrinsic operations, inline functions
C6000 Core
● 8 parallel functional units
– D: data load/store (.D1, .D2)
– S: shift, branch (.S1, .S2)
– M: multiply (.M1, .M2)
– L: logic and arithmetic operations (.L1, .L2)
C6000 Core (cont.)
● 32 32-bit registers for each side of the functional units
– A0-A31 (.D1, .S1, .M1, .L1)
– B0-B31 (.D2, .S2, .M2, .L2)
● Separate program and data memory (L1P, L1D)
● 256-bit internal program bus: fetches eight 32-bit instructions from L1P every cycle
● Two 64-bit internal data buses that allow both .D1 and .D2 to fetch data from L1D every cycle
Core optimization
[Figure: C++ code, compiled parallel assembly, pseudo assembly]
Pipeline
F: fetch D: decode E: execute
C6000 pipeline
● Fetch, decode, and execute are divided into more substages: 4-stage fetch, 2-stage decode, and up to 10-stage execute
Delay slots
● The pipeline cannot issue useful work every cycle (delay slots appear) when:
– The current instruction depends on the result of a previous instruction that takes more than 1 cycle to complete
– A branch is performed
● Solutions
– Software scheduling (software pipelining)
– Hardware enhancement (SPLOOP buffer)
Software pipelining
● Enabling
– For code written in C, just add the compiler option -o3 (software pipelining is performed at -o2 and above)
● Drawback
– Assembly code size increases
● Solution
– Software pipelined loop (SPLOOP) buffer
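The compiler software-pipelines the generated instructions, not the C source, but the resulting prolog/kernel/epilog shape can be illustrated in plain C. In this sketch (the names `scale_plain` and `scale_pipelined` are hypothetical), the kernel issues the load for the next element while it finishes the current one; the extra prolog and epilog code also hints at why code size grows.

```c
#include <stddef.h>

/* Straightforward loop: load x[i], multiply, store y[i]. */
void scale_plain(const int *x, int *y, size_t n, int gain)
{
    size_t i;
    for (i = 0; i < n; i++) {
        y[i] = x[i] * gain;
    }
}

/* Same computation written in a software-pipelined shape
 * (illustration only): iteration i+1's load overlaps with
 * iteration i's multiply and store. */
void scale_pipelined(const int *x, int *y, size_t n, int gain)
{
    size_t i;
    int cur;

    if (n == 0) {
        return;
    }
    cur = x[0];                        /* prolog: first load only */
    for (i = 0; i + 1 < n; i++) {      /* kernel */
        int next = x[i + 1];           /* load for iteration i+1 */
        y[i] = cur * gain;             /* compute/store iteration i */
        cur = next;
    }
    y[n - 1] = cur * gain;             /* epilog: last compute/store */
}
```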
SPLOOP buffer
● Supported platforms
– C64x+, C674x, and C66x
● The SPLOOP buffer stores a single scheduled iteration of the loop in a specialized buffer
● The C compiler uses SPLOOP automatically
● SPLOOP cannot handle loops longer than 14 execute packets (at most 8 instructions per execute packet), nor:
– Nested loops, conditional branches inside loops, function calls inside loops
Compiler Optimization
● Use the C compiler to generate assembly code that utilizes the C6000 functional units and pipeline as fully as possible
– Add information and directives that help the compiler optimize your code:
● Compiler options, e.g. -o3
● Keywords (C or C6000-specific), e.g. restrict
● Pragma directives, e.g. MUST_ITERATE
– Understand the compiler feedback
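A minimal sketch of the restrict keyword, assuming a compiler that accepts C99 `restrict` (the function `vec_add` is hypothetical). Without the qualifiers, the compiler has to assume that a store to `out[i]` might modify a later `a[j]` or `b[j]`, which can stop it from overlapping iterations.

```c
#include <stddef.h>

/* restrict promises that, inside this function, a, b and out never
 * overlap, so loads for later iterations may be scheduled ahead of
 * earlier stores. Passing overlapping arrays is undefined behavior. */
void vec_add(const int *restrict a, const int *restrict b,
             int *restrict out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```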
Loop qualification (option -k -mw)
Compiler feedback (option -k -mw)
Dependency & resource information
● Minimize the iteration interval (ii), which is bounded by:
– The loop carried dependency bound
● Distance of the largest loop carry path
– The partitioned resource bound
● Maximum number of cycles any functional unit is used in a single iteration
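The two bounds combine into a lower limit on the iteration interval. Writing B_dep for the loop carried dependency bound and B_res for the partitioned resource bound (symbol names chosen here for illustration):

```latex
\mathit{ii} \;\geq\; \max\bigl(B_{\text{dep}},\; B_{\text{res}}\bigr)
```

For example, with B_dep = 2 cycles and B_res = 3 cycles, a new iteration can start at most once every 3 cycles; the scheduler may still settle on a larger ii.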
Loop carried path
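A loop carried path appears when one iteration produces a value that a later iteration reads. This minimal C sketch contrasts the two cases (function names hypothetical): in `iir1` each output needs the previous one, so the chain through `prev` sets the loop carried dependency bound; in `apply_gain` the iterations are independent and only the resource bound limits ii.

```c
#include <stddef.h>

/* First-order recursive filter: iteration i reads prev, which
 * iteration i-1 wrote, forming a loop carried path. */
void iir1(const int *x, int *y, size_t n, int a)
{
    size_t i;
    int prev = 0;
    for (i = 0; i < n; i++) {
        prev = a * prev + x[i];
        y[i] = prev;
    }
}

/* No data flows between iterations: y[i] depends only on x[i]. */
void apply_gain(const int *x, int *y, size_t n, int a)
{
    size_t i;
    for (i = 0; i < n; i++) {
        y[i] = a * x[i];
    }
}
```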
Explicit code optimization
● The previous approaches are not sufficient for:
– Function calls in a loop
– Complex, hard-to-implement operations
● Solution: explicit code optimization
– Intrinsic operations
– Optimized C6000 DSP libraries
– C inline functions
Intrinsic operations
● Sample
– Bit shuffle: _deal separates the even and odd bits of a 32-bit value into two 16-bit halves, and _shfl performs the inverse interleave
● Intrinsic operations
– Function-like statements
– Leading underscore, e.g. _shfl
– Not a function call; no branch needed
● Listed in "TMS320C6000 Optimizing Compiler v7.6 User's Guide"
● Availability depends on the device
● _abs can be used directly
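To show what a bit-shuffle intrinsic replaces, here is a portable C reference of the two bit permutations written as plain loops. It assumes the semantics described for `_shfl` (low 16 bits to the even bit positions, high 16 bits to the odd positions) and for `_deal` (the inverse); the names `shfl_ref` and `deal_ref` are hypothetical. On devices that provide these intrinsics, each 16-iteration loop corresponds to a single instruction.

```c
#include <stdint.h>

/* Interleave: bit k of the low half -> bit 2k,
 * bit k of the high half -> bit 2k+1. */
uint32_t shfl_ref(uint32_t src)
{
    uint32_t lo = src & 0xFFFFu;
    uint32_t hi = src >> 16;
    uint32_t dst = 0;
    int k;
    for (k = 0; k < 16; k++) {
        dst |= ((lo >> k) & 1u) << (2 * k);
        dst |= ((hi >> k) & 1u) << (2 * k + 1);
    }
    return dst;
}

/* De-interleave: even bits -> low half, odd bits -> high half. */
uint32_t deal_ref(uint32_t src)
{
    uint32_t lo = 0;
    uint32_t hi = 0;
    int k;
    for (k = 0; k < 16; k++) {
        lo |= ((src >> (2 * k)) & 1u) << k;
        hi |= ((src >> (2 * k + 1)) & 1u) << k;
    }
    return (hi << 16) | lo;
}
```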
Optimized DSP software libraries
● Foundational math & signal processing
– MathLIB
– IQMath
– FastRTS
– DSPLIB
● Adaptive filtering, matrix computations
● Image & video processing
– IMGLIB
– Video Analytics & Vision Library (VLIB)
– VICP Signal Processing Library
Inline functions
● Pros
– Reduces the overhead of a function call
– Lets the optimizer apply loop optimizations to loops that would otherwise contain a call
● Cons
– Code size increases
● To use
– Use -O2 or -O3 to let the compiler inline functions automatically
– Use the explicit inline keyword
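A minimal sketch of an inline helper called inside a loop, assuming C99 `static inline` (the names `sat16` and `add_sat16` are hypothetical). Once the helper is inlined, the loop body contains no function call, which is one of the cases listed earlier that blocks SPLOOP.

```c
#include <stddef.h>

/* Saturate a 32-bit value to the signed 16-bit range. */
static inline short sat16(int v)
{
    if (v > 32767) {
        return 32767;
    }
    if (v < -32768) {
        return -32768;
    }
    return (short)v;
}

/* Element-wise saturating add; sat16 is expanded in place,
 * so no call remains inside the loop. */
void add_sat16(const short *a, const short *b, short *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        out[i] = sat16(a[i] + b[i]);
    }
}
```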
Optimization flow
Profiling
Optimization practice
● Use -o3 and consider -mt for optimization; use -k and consider -mw for compiler feedback (-mt: assume all pointers in a loop are independent)
● Apply the restrict keyword to minimize the loop carried dependency bound (an alternative to -mt)
● Use the MUST_ITERATE and UNROLL pragmas to optimize pipeline usage
● Choose the smallest applicable data type and ensure proper data alignment to help the compiler invoke Single Instruction Multiple Data (SIMD) operations
● Use intrinsic operations and TI libraries when major code modification would otherwise be needed (avoid standard I/O functions)
Using pragma
● Without a minimum iteration count, the compiler must assume the loop may iterate only once
– Providing a multiple (factor) gives the compiler the freedom to unroll the loop
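A sketch of where MUST_ITERATE and UNROLL go: immediately before the loop. The bounds used here (8 to 256 iterations, always a multiple of 4) are an assumption invented for this example; the compiler trusts them, so they must hold for every call. The pragmas are guarded with `__TI_COMPILER_VERSION__` so other compilers see a plain loop; `dot` is a hypothetical name.

```c
/* Sum of products of two 16-bit arrays. */
int dot(const short *x, const short *h, int n)
{
    int i;
    int sum = 0;
#if defined(__TI_COMPILER_VERSION__)
    /* Assumed contract: 8 <= n <= 256 and n % 4 == 0. */
#pragma MUST_ITERATE(8, 256, 4)
#pragma UNROLL(2)
#endif
    for (i = 0; i < n; i++) {
        sum += x[i] * h[i];
    }
    return sum;
}
```

Because the trip count is known to be a multiple of 4, unrolling by 2 needs no leftover-iteration code.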
Unbalanced resource partition
Manual unroll
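Manual unrolling can also help rebalance resources. In this minimal sketch (the name `dot_unroll2` is hypothetical), two independent accumulators form two separate dependency chains per iteration, which the compiler is free to spread across both sides of the datapath instead of serializing every add on a single accumulator.

```c
/* Dot product unrolled by 2 with two accumulators. */
int dot_unroll2(const short *x, const short *h, int n)
{
    int i;
    int sum0 = 0;
    int sum1 = 0;
    for (i = 0; i + 1 < n; i += 2) {
        sum0 += x[i] * h[i];          /* chain 0: even elements */
        sum1 += x[i + 1] * h[i + 1];  /* chain 1: odd elements */
    }
    if (i < n) {                      /* leftover element when n is odd */
        sum0 += x[i] * h[i];
    }
    return sum0 + sum1;
}
```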
Compiler unroll
Reference
● Texas Instruments, "Introduction to TMS320C6000 DSP Optimization"
– Recommended to read first
● Texas Instruments, "In-Vehicle Connectivity is So Retro"
● Texas Instruments, "TMS320C6000 Programmer's Guide"
● Texas Instruments, "TMS320C6000 Optimizing Compiler v7.6 User's Guide"

Editor's Notes

  1. Distance of the largest loop carry path: the distance from the iteration that produces a value to the farthest iteration that uses that value