Microcontrollers are a highly resource-constrained environment. Very small microcontrollers typically have only a few kilobytes of program space and a few hundred bytes of memory, and they run at very low clock speeds. This talk looks at how to address these resource limitations. Many of the techniques examined also apply to larger, PC-class hardware and can improve performance on those systems. The techniques are also useful for reducing the power consumption of mobile devices and applications.
C for Microcontrollers
1. ‘C’ for Microcontrollers, Just Being Efficient
Lloyd Moore, President
Lloyd@CyberData-Robotics.com
www.CyberData-Robotics.com
Seattle Robotics Society 9/15/2012
3. Disclaimer
Some microcontroller techniques necessarily
need to trade one benefit for another –
typically lower resource usage for
maintainability
Point of this presentation is to point out various
techniques that can be used as needed
Use these suggestions when necessary
Feel free to suggest better solutions as we go
along
4. Microcontroller Resources
EVERYTHING resides on one die inside one
package: RAM, Flash, Processor, I/O
Cost is a MAJOR design consideration
Typical costs are $0.25 to $25 each (1000’s)
RAM: 16 BYTES to 256K Bytes typical
Flash/ROM: 384 BYTES to 1M Byte
Clock Speed: 4MHz to 175MHz typical
Much lower for battery saving modes (32 kHz)
Bus is 8, 16, or 32 bits wide
Have dedicated peripherals (MAC, PHYs, etc.)
5. Power Consumption
Microcontrollers
typically used in battery
operated devices
Power requirements can be
EXTREMELY tight
Energy harvesting applications
Long term battery installations (remote
controls, hard to reach devices, etc.)
EVERY instruction executed consumes
power, even if you have the time and
memory!
6. Know Your Environment
Traditionally we ignore hardware details
Need to tailor code to hardware
available
Specialized hardware MUCH more efficient
Compilers typically have extensions
Interrupt – specifies code as being ISR
Memory model – may handle banked
memory and/or simultaneous access banks
Multiple data pointers / address generators
Debugger may use some resources
7. Memory Usage
Put constant data into program memory (Flash/ROM)
Alignment / padding issues
Typically NOT an issue, non-aligned access ok
Avoid dynamic memory allocation, even if available
Take extra space and processing time
Memory fragmentation a big issue
Use and reuse static buffers
Reduces variable passing overhead
Allows for smaller / faster code due to reduced indirections
Does bring back overwrite bugs if not done carefully
More reliable for mission critical systems
Use the appropriate variable type
Don’t use int and double for everything!!
Affects processing time as well as storage
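A minimal sketch of the points on this slide, not code from the talk: a constant table declared const so the toolchain can keep it in Flash/ROM, one static buffer that is reused instead of allocated, and the narrowest types that fit. The names format_hex, g_hex_digits, g_msg_buf and MSG_BUF_SIZE are hypothetical, and some compilers need an extra vendor keyword to force the table into program memory.

```c
#include <stddef.h>
#include <stdint.h>

#define MSG_BUF_SIZE 32u  /* hypothetical size, chosen for illustration */

/* Constant data: 'const' lets many embedded toolchains keep this table in
   Flash/ROM instead of RAM (some need a vendor keyword as well). */
static const uint8_t g_hex_digits[16] = {
    '0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'
};

/* One static buffer, reused by every call: no malloc, no fragmentation. */
static char g_msg_buf[MSG_BUF_SIZE];

/* Formats 'len' bytes as hex text into the shared buffer.
   Returns the buffer, or NULL if the text would not fit.
   The caller must finish with the result before the next call,
   because the next call overwrites the same buffer. */
const char *format_hex(const uint8_t *data, uint8_t len)
{
    uint8_t i;
    char *out = g_msg_buf;

    if ((size_t)len * 2u + 1u > MSG_BUF_SIZE) {
        return NULL;
    }
    for (i = 0; i < len; i++) {
        *out++ = (char)g_hex_digits[data[i] >> 4];
        *out++ = (char)g_hex_digits[data[i] & 0x0Fu];
    }
    *out = '\0';
    return g_msg_buf;
}
```

The shared buffer is what makes the overwrite bugs mentioned above possible: two callers that both hold the returned pointer see the same memory.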
8. C99 Datatypes – inttypes.h
int8_t, int16_t, int32_t, int64_t
uint8_t, uint16_t, uint32_t, uint64_t
Avoids the ambiguity of int and unsigned int
when moving code between processors
of different native size
Makes code more portable and
upgradable over time
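As a small illustration (the sample_t record and next_sequence function are hypothetical), the exact-width types below keep the same size and wrap-around behaviour whether the native int is 8, 16 or 32 bits. The types themselves are declared in stdint.h, which inttypes.h includes.

```c
#include <stdint.h>

/* Hypothetical record, for illustration only. Each field uses the
   smallest exact-width type that holds its range. */
typedef struct {
    uint8_t  mode;     /* 0..255 is enough for a mode flag */
    int16_t  offset;   /* signed 16-bit reading            */
    uint32_t tick_ms;  /* millisecond counter              */
} sample_t;

/* Wraps from 255 to 0 on every target, because the type is exactly
   8 bits wide regardless of the processor's native int size. */
uint8_t next_sequence(uint8_t seq)
{
    return (uint8_t)(seq + 1u);
}
```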
9. Char vs. Int Increment on 8051
char cX;
cX++;
    000A 900000   MOV   DPTR,#cX
    000D E0       MOVX  A,@DPTR
    000E 04       INC   A
    000F F0       MOVX  @DPTR,A
6 Bytes of Flash
4 Instruction cycles
int iX;
iX++;
    0000 900000   MOV   DPTR,#iX
    0003 E4       CLR   A
    0004 75F001   MOV   B,#01H
    0007 120000   LCALL ?C?IILDX
10 Bytes of Flash + subroutine overhead
Many more than 4 instruction cycles with a LCALL
10. Code Structure
Count down instead of up
Saves a subtraction on all processors
Decrement-jump-not-zero style instruction on some
processors
Pointers vs. array notation
Generally better using pointers
Bit Shifting
May not always generate what you think
May or may not have barrel shifter hardware
May or may not have logical vs. arithmetic shifts
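A minimal sketch of the first two points, assuming a hypothetical sum_bytes helper: the loop counts down so the exit test is a compare against zero, and the data is walked with a pointer instead of an index. Whether this actually produces smaller code depends on the compiler, so the listing is still worth checking.

```c
#include <stdint.h>

/* Sums 'count' bytes. Counting down lets the loop test be "not zero",
   which maps onto a decrement-and-jump-if-not-zero instruction on
   processors that have one. The pointer walk avoids recomputing
   base + index on every pass. */
uint16_t sum_bytes(const uint8_t *p, uint8_t count)
{
    uint16_t total = 0;

    while (count != 0) {
        total = (uint16_t)(total + *p++);
        count--;
    }
    return total;
}
```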
11. Shifting Example on 8051
cX = cX << 3;
    0006 33       RLC   A
    0007 33       RLC   A
    0008 33       RLC   A
    0009 54F8     ANL   A,#0F8H
Constants turn into separate statements
cA = 3;
cX = cX << cA;
    000B 900000   MOV   DPTR,#cA
    000E E0       MOVX  A,@DPTR
    000F FE       MOV   R6,A
    0010 EF       MOV   A,R7
    0011 A806     MOV   R0,AR6
    0013 08       INC   R0
    0014 8002     SJMP  ?C0005
    0016          ?C0004:
    0016 C3       CLR   C
    0017 33       RLC   A
    0018          ?C0005:
    0018 D8FC     DJNZ  R0,?C0004
Variables turn into loops
Both of these can be one instruction with a barrel shifter
12. Indexed Array vs Pointer on M8C
ucMode = g_Channels[uc_Channel].ucMode;
    01DC 52FC     mov   A,[X-4]
    01DE 5300     mov   [__r1],A
    01E0 5000     mov   A,0
    01E2 08       push  A
    01E3 5100     mov   A,[__r1]
    01E5 08       push  A
    01E6 5000     mov   A,0
    01E8 08       push  A
    01E9 5007     mov   A,7
    01EB 08       push  A
    01EC 7C0000   xcall __mul16
    01EF 38FC     add   SP,-4
    01F1 5F0000   mov   [__r1],[__rX]
    01F4 5F0000   mov   [__r0],[__rY]
    01F7 060000   add   [__r1],<_g_Channels
    01FA 0E0000   adc   [__r0],>_g_Channels
    01FD 3E00     mvi   A,[__r1]
    01FF 5403     mov   [X+3],A
ucMode = pChannel->ucMode;
    01ED 5201     mov   A,[X+1]
    01EF 5300     mov   [__r1],A
    01F1 3E00     mvi   A,[__r1]
    01F3 5405     mov   [X+5],A
Does the same thing
Saves 29 bytes of memory AND a call to a 16 bit multiplication routine!
Pointer version will be at least 4x faster to execute as well, maybe 10x
Most compilers not this bad – but you do find some!
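One way to apply this in source code, sketched under assumptions: take the element's address once, then use the pointer for every field access, so the index is scaled a single time. The channel_t layout, NUM_CHANNELS and the two functions are hypothetical; only the names g_Channels, pChannel and ucMode come from the slide.

```c
#include <stdint.h>

#define NUM_CHANNELS 4u        /* hypothetical channel count */

/* Hypothetical layout; only ucMode appears on the slide. */
typedef struct {
    uint8_t ucMode;
    uint8_t ucLevel;
    uint8_t ucFlags;
} channel_t;

static channel_t g_Channels[NUM_CHANNELS];

/* The index is scaled to an address once, here. Every later access
   goes through pChannel, so a weak compiler has no reason to repeat
   the multiplication for each field. */
void channel_set(uint8_t ucChannel, uint8_t ucMode, uint8_t ucLevel)
{
    channel_t *pChannel = &g_Channels[ucChannel];

    pChannel->ucMode  = ucMode;
    pChannel->ucLevel = ucLevel;
    pChannel->ucFlags = 0;
}

/* Reads back the mode of one channel. */
uint8_t channel_mode(uint8_t ucChannel)
{
    return g_Channels[ucChannel].ucMode;
}
```

A compiler with pointer-conversion optimization may already do this on its own; the listing shows whether it does.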
13. More Code Structure
Actual parameters typically passed in registers if
available
Keep function parameters to fewer than 3
May also be passed on stack or special parameter area
May be more efficient to pass pointer to struct
Global variables
While generally frowned upon for most code, they can be very
helpful here
Typically ends up being a direct access
Read assembly code for critical areas
Know which optimizations are present
Small compilers do not always have common optimizations
Inline, loop unrolling, loop invariant, pointer conversion
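The struct-pointer suggestion can be sketched as follows. The pwm_config_t type and pwm_on_ticks function are hypothetical: the caller passes one pointer, which fits in registers on most targets, in place of four separate arguments.

```c
#include <stdint.h>

/* Hypothetical settings block, for illustration only. */
typedef struct {
    uint8_t  duty;     /* 0..255, where 255 means always on */
    uint8_t  channel;  /* output channel number             */
    uint16_t period;   /* period in timer ticks             */
    uint8_t  invert;   /* nonzero to invert the output      */
} pwm_config_t;

/* One pointer parameter replaces four value parameters.
   Returns the number of ticks the output is on per period. */
uint16_t pwm_on_ticks(const pwm_config_t *cfg)
{
    uint16_t on = (uint16_t)(((uint32_t)cfg->period * cfg->duty) / 255u);

    return cfg->invert ? (uint16_t)(cfg->period - on) : on;
}
```

The trade-off is an extra indirection on each field access, so this tends to pay off only when the parameters would otherwise spill to the stack.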
14. Switch Statement Implementation
Switch statements can be implemented in various
ways
Sequential compares
In line table look up for case block
Special function with look up table
Specific implementation can also vary based on the case
clauses
Clean sequence (1, 2, 3, 4, 5)
Gaps in sequence (1, 10, 30, 255)
Ordering of sequence (5, 4, 1, 2, 3)
Knowing which method gets implemented is critical to
optimizing!
15. Switch Statement Example
switch(cA)
{
case 0:
    cX = 4;
    break;
case 1:
    cX = 10;
    break;
case 2:
    cX = 30;
    break;
default:
    cX = 0;
    break;
}
    0006 900000   MOV   DPTR,#cA
    0009 E0       MOVX  A,@DPTR
    000A FF       MOV   R7,A
    000B EF       MOV   A,R7
    000C 120000   LCALL ?C?CCASE
    000F 0000     DW    ?C0003
    0011 00       DB    00H
    0012 0000     DW    ?C0002
    0014 01       DB    01H
    0015 0000     DW    ?C0004
    0017 02       DB    02H
    0018 0000     DW    00H
    001A 0000     DW    ?C0005
    001C          ?C0002:
    001C 900000   MOV   DPTR,#cX
    001F 7404     MOV   A,#04H
    0021 F0       MOVX  @DPTR,A
    0022 8015     SJMP  ?C0006
...More blocks follow for each case
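When every case only assigns a constant, as in this example, one possible manual alternative is a small const table indexed by the switch value. This is a sketch, not something shown in the talk; map_value and k_case_values are hypothetical names, and whether it beats the compiler's own table depends on the compiler.

```c
#include <stdint.h>

/* Values for cases 0, 1 and 2, in order. 'const' allows the table
   to live in Flash/ROM on most toolchains. */
static const uint8_t k_case_values[3] = { 4, 10, 30 };

/* Equivalent to the switch above: a range check stands in for the
   default clause, and a table read replaces the per-case blocks. */
uint8_t map_value(uint8_t cA)
{
    if (cA < 3u) {
        return k_case_values[cA];
    }
    return 0;
}
```

This only works for a clean, gap-free sequence of case values; with gaps such as (1, 10, 30, 255) the table would be mostly empty.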
16. Optimization Process
Step 0 – Before coding anything, think about
risk points and prototype unknowns!!!
Use available dedicated hardware
Step 1 – Get it working!!
Fast but wrong is of no use to anyone
Optimization will typically reduce readability
Step 2 – Profile to know where to optimize
Usually only one or two routines are critical
You need to have specific performance metrics to
target
17. Optimization Process
Step 3 – Let the tools do as much as
they can
Turn off debugging!
Select the correct memory model
Select the correct optimization level
Step 4 – Do it manually
Read the generated code! Might be able to
make a simple code or structure change.
Last – think about assembly coding
18. Summary
Microcontrollers are a resource
constrained environment
Be familiar with the hardware in your
microcontroller
Be familiar with your compiler options
and how it translates your code
For time or space critical code look at
the assembly listing from time to time