SPEC2K Investigations and Next Step Discussion
Kugan Vivekanandarajah, 30/10/03
www.linaro.org
Outline
● Objective and set-up
● What is not considered
● What we found - Areas for further study
○ Register allocation issues
○ Missed redundancy elimination in address calculations
○ Rematerialisation opportunity
○ Optimizations impacting register allocation
○ Opportunity to further conditionalize instructions
○ Others
● What next
Objective of this study
● Study GCC 32-bit ARM performance for optimisation opportunities
● Study how GCC optimization for 32-bit ARM fares compared to other architectures
○ SPEC2K INT benchmarks are run on a Chromebook (Cortex-A15) and analyzed
○ Numbers shown are the % difference in geomean relative to the base case compiled with -O3
○ GCC trunk version 20130922 is used
○ 252.eon (did not compile) and 254.gap (did not run correctly) are excluded from the results
What is not considered
● Tuning for a specific microarchitecture of 32-bit ARM is not considered, in particular:
○ Optimality of scheduling for the pipeline
○ Data layout for the cache architecture
○ Influence of branch predictor performance
○ etc.
● Absolute numbers are not of interest; all results are relative performance (%) compared to the program compiled with -O3
What we found - Areas for further study
● Register allocation issues
○ Tuning register allocation heuristics for ARM
● Making optimizations register pressure aware
● Transformations that also reduce register pressure
● Rematerialization
● Further conditionalizing instructions
Register allocation
● The GCC register allocator performs badly for 32-bit ARM compared to x86
○ The heuristics are generally tuned for x86-like instructions, where the number of live ranges is much smaller for the same code
● Tuning IRA and reload (LRA) heuristics
○ In some cases, the register allocator performs better with combinations of -fira-loop-pressure, -fsched-pressure, -fira-region=one and -fira-algorithm=priority than with the defaults at -O3
○ These heuristics may need tuning for 32-bit ARM
● New pass to reduce register pressure to help further
○ Scheduling for register pressure
○ Default -fsched-pressure helps NEON

            -fsched-pressure   -fira-algorithm=priority   -fira-region=one
SPEC INT    0.77%              3.22%                      -0.66%
Redundancy in address calculations
● Transformations that also reduce register pressure
● Redundancy elimination in address calculation
○ Redundancy in address calculations such as
■ (struct_array[i + 1]->cost < struct_array[i]->cost)
■ switch (struct_array_2[index].mem) {
  .....
  struct_array_2[index].mem = VAL;
  }

if (strArray[i+1].cost < strArray[i].cost)

add r3, r0, #1
movw r1, #:lower16:strArray
movt r1, #:upper16:strArray
add ip, r1, r0, asl #3
add r2, r1, r3, asl #3
flds s15, [ip]
flds s14, [r2]
fcmpes s14, s15
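The redundancy in the listing is that the addresses of element i and element i+1 are each built from the base with a separate scaled add, although they differ only by a constant. A source-level sketch of the shared form is shown below. The struct layout is an assumption chosen to match the 8-byte scaling (asl #3) and the float load in the listing; the function names are hypothetical, and nothing here claims what GCC emits for either version.

```c
#include <stddef.h>

/* Hypothetical element: a float cost at offset 0 plus one more
   4-byte field, giving the 8-byte stride seen in the listing. */
struct item {
    float cost;
    int   id;
};

/* Pattern from the slide: two indexed accesses off the same base. */
int next_is_cheaper(const struct item *arr, size_t i)
{
    return arr[i + 1].cost < arr[i].cost;
}

/* Hand-written shared form: compute &arr[i] once. The neighbour is
   then a constant offset away, so no second index computation and
   no extra register for i + 1 are needed. */
int next_is_cheaper_shared(const struct item *arr, size_t i)
{
    const struct item *p = &arr[i];
    return p[1].cost < p[0].cost;
}
```

Both functions return the same result; the second only makes explicit the address sharing that the redundancy elimination would be expected to find.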
Redundancy in address calculations
● Sharing labels for globals
○ Labels identify the start address of globals, and loading a label creates a live range
○ A label can be shared by multiple global static variables
○ Rearranging the variables will help even when the offset does not fit within 4095 bytes (for ARM)

static struct str globalStrArray2[100];
static struct str globalStrArray[1000];
static int globalInt;
int bar()
{
  globalInt = 22;
  set ();
  globalInt += globalStrArray[2].i;
  globalStrArray[77].i = globalInt - 1;
  globalStrArray2[77].i = globalInt;
  return 0;
}

...
movw r4, #:lower16:.LANCHOR0
movt r4, #:upper16:.LANCHOR0
...
movw r3, #:lower16:.LANCHOR1
ldr r2, [r4]
movt r3, #:upper16:.LANCHOR1
...
.bss
.align 2
.LANCHOR0 = . + 0
.LANCHOR1 = . + 8184
.type globalInt, %object
.size globalInt, 4
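One way to picture label sharing at source level is to place the globals in a single aggregate, so that every access is one base address plus a constant offset. The sketch below is only an illustration of that idea, not the compiler change the slide proposes. The layout of struct str, the name g, the stub set() and check_layout() are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

struct str { int i; int j; };   /* hypothetical layout; not shown in the slide */

/* All three objects share one symbol; the small, frequently used
   objects come first so their offsets stay small. */
struct globals {
    int        globalInt;
    struct str globalStrArray2[100];
    struct str globalStrArray[1000];
};

static struct globals g;

/* Stand-in for the external set() called in the slide. */
static void set(void)
{
    g.globalStrArray[2].i = 5;
}

int bar(void)
{
    g.globalInt = 22;
    set();
    g.globalInt += g.globalStrArray[2].i;
    g.globalStrArray[77].i = g.globalInt - 1;
    g.globalStrArray2[77].i = g.globalInt;
    return 0;
}

/* With this ordering, the largest offset bar() uses is within the
   4095-byte immediate range of an ARM ldr/str. */
void check_layout(void)
{
    size_t off = offsetof(struct globals, globalStrArray)
               + 77 * sizeof(struct str)
               + offsetof(struct str, i);
    assert(off <= 4095);
}
```

With the stub above, bar() leaves globalInt at 27, globalStrArray[77].i at 26 and globalStrArray2[77].i at 27.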
Making optimizations register pressure aware
● GCSE makes live ranges longer, which increases register pressure
○ For cases like ply + 1 and (ply + 1) * 4, GCSE keeps the value ply + 1 live and increases register pressure
○ In this case, -fno-gcse improves performance.
● Loop invariant code motion
○ Constant loads with movw/movt in particular are moved out of the loop and then spilled, resulting in memory loads

            -fno-gcse   -fno-move-loop-invariants
SPEC INT    2.33%       1.15%
Rematerialize instead of spilling to the stack
● Immediate loads of integer/float constants
● Loads from literal pools
● Address calculations
○ Computing a constant offset from the stack pointer
○ Computing a constant offset from the static data area pointer

position[ply+1]=position[ply];

add ip, r0, #1
str ip, [sp, #228]
movw fp, #:lower16:position
movt fp, #:upper16:position
mov ip, r0, asl #2
str ip, [sp, #284]
ldr r3, [fp, ip] @ unaligned
ldr ip, [sp, #228]
str r3, [fp, ip, asl #2] @ unaligned
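A reduced C form of the statement in the listing is sketched below, with comments marking which values are cheap to recompute. The array bound, the element type (int, chosen to match the 4-byte scaling in the listing) and the function name are assumptions. Compiling this small function alone is not expected to reproduce the spills, which come from the register pressure of the surrounding code.

```c
#include <assert.h>

#define MAX_PLY 64                   /* hypothetical bound */
static int position[MAX_PLY + 1];    /* hypothetical element type */

void copy_to_next_ply(int ply)
{
    assert(ply >= 0 && ply < MAX_PLY);
    /* Values needed for this statement:
       - the address of position: a link-time constant (one movw/movt pair)
       - ply + 1: a single add from ply
       - ply * 4: a single shift from ply
       Each can be recomputed from ply or from constants in one or two
       instructions, so storing it to the stack and reloading it, as in
       the listing, is avoidable. */
    position[ply + 1] = position[ply];
}
```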
Conditionalize Instructions
● Using conditionalized instructions
○ How much to conditionalize is very much microarchitecture dependent.
○ The performance of the branch predictor and its ability to predict the types of branches in the instruction sequence matter
● Simple if-then-else blocks are not conditionalized
○ Some intentionally, others not
● Conditional compares are also missed in many places
○ Currently being looked at
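The two source shapes mentioned above can be illustrated with small C functions. These are hypothetical examples, not taken from SPEC2K, and the instruction sequences named in the comments are what conditional execution on 32-bit ARM allows, not a statement of what GCC currently emits.

```c
/* Simple if-then-else: both arms are a single assignment, so the
   branch could be replaced by a compare followed by conditionally
   executed moves (e.g. cmp, then movlt / movge). */
int select_min(int a, int b)
{
    int r;
    if (a < b)
        r = a;
    else
        r = b;
    return r;
}

/* Chained condition: the second comparison only matters when the
   first one holds, so it could be a conditional compare
   (e.g. cmp r0, #0 followed by cmpeq r1, #1). */
int both_match(int a, int b)
{
    return a == 0 && b == 1;
}
```

Whether the branch-free form is faster depends on how well the branch would have been predicted, which is the microarchitecture dependence noted in the first bullet.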
What next?
More about Linaro: http://www.linaro.org/about/
More about Linaro engineering: http://www.linaro.org/engineering/
How to join: http://www.linaro.org/about/how-to-join
Linaro members: www.linaro.org/members
connect.linaro.org

LCU13: SPEC Investigations and Next Steps Discussion
