1. #ESCBOS
From Hw to Sw: Parallel Logic Applied to Event-Driven Firmware
Jonny Doin, GridVortex
2. #ESCBOS
From Hardware to Firmware
• Introduction
• Multitasking: the holy grail of computing
• Parallel computing and VHDL
• process() and sequential parallel logic
• Signals and sensitivity lists in VHDL
• Signals and sensitivity lists in firmware
• Bit-banding on Cortex-M
• Event-driven scheduling
• Hardware scheduling and multicore µC
• Final thoughts
3. #ESCBOS
Intro
In this talk we will see:
• Architectural aspects of multitasking
• Some techniques for implementing event-driven firmware
• Concepts of hardware design that can be applied to firmware development
4. #ESCBOS
Multitasking
Multitasking is one of the most important concepts of modern computing.
Efficient use of processing bandwidth affects energy consumption and real-time response.
Microcontrollers with over 200 MIPS are becoming very accessible to even the smallest applications.
Image: https://s-media-cache-ak0.pinimg.com/736x/d5/6e/06/d56e06a6441353a405456bbdc29df294.jpg
5. #ESCBOS
Multitasking (2)
Multitasking can be described as the simulation of a parallel processing system using a smaller number of sequential processors.
Several multitasking schemes evolved over time for traditional computing systems:
• Priority-based scheduling and multithreading
• Collaborative multitasking
• Interrupt-based real-time systems
• Event-driven multitasking
6. #ESCBOS
Multitasking (3)
Multitasking schemes are a compromise between:
• Cost of scheduling
• System blocking time
• Effective processing bandwidth
• System response time
[Figure: CPU time divided between user tasks and the scheduler]
7. #ESCBOS
Parallel processing and VHDL
Truly parallel systems can be implemented in digital hardware.
Languages used to describe and design such systems have specific features for describing parallel logic.
VHDL uses a state-based model to describe parallel processing.
8. #ESCBOS
process() and parallel logic
In VHDL, sections of sequential logic that run in parallel with the rest of the system are defined using the process() structure:
-- Register, sequential logic
counter: process (clk_i, cnt_clear) is
begin
    if cnt_clear = '1' then
        cnt_reg <= 0;
    else
        if clk_i'event and clk_i = '1' then
            if cnt_ce = '1' then
                cnt_reg <= cnt_next;
            end if;
        end if;
    end if;
end process counter;

-- Adder, combinational logic
cnt_next <= cnt_reg + 1 when cnt_top = '0' else cnt_reg;
9. #ESCBOS
Signals and sensitivity lists
The process() definition includes a list of signals:
process (clk_i, cnt_clear)
Logic in the process() is only “executed” when a signal declared in its sensitivity list changes state.
Any other logic in the circuit can alter the state of these signals, and when that happens, the process is executed.
VHDL signals have much more to them: each has a “transaction timeline”, and future transactions can be scheduled on the signal.
10. #ESCBOS
Signals and sensitivity lists (2)
VHDL sensitivity lists:
• Provide a simple state-based, event-driven paradigm
• Simulate parallel hardware logic
• Let simulators use processing bandwidth efficiently
The paradigm is based on the delta cycle, a concept similar to an execution pass of the logic.
All signals are assigned their new values only at the end of the current delta cycle.
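The deferred assignment of a delta cycle can be imitated in C by keeping two copies of a signal word: processes read the current copy and write the next one, and the next copy is committed only at the end of the pass. This is a minimal sketch under that assumption; the names signals_t, sig_read, sig_write and delta_commit are hypothetical and do not come from the talk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-phase signal store: 32 one-bit signals per word. */
typedef struct {
    uint32_t cur;   /* values visible to every process during this pass */
    uint32_t next;  /* values scheduled by assignments during this pass */
} signals_t;

/* Read a signal: always sees the value from the start of the pass. */
static inline uint32_t sig_read(const signals_t *s, unsigned bit)
{
    assert(bit < 32u);
    return (s->cur >> bit) & UINT32_C(1);
}

/* "Assign" a signal: only schedules the value, in the spirit of VHDL's <=. */
static inline void sig_write(signals_t *s, unsigned bit, uint32_t value)
{
    assert(bit < 32u);
    uint32_t mask = UINT32_C(1) << bit;
    s->next = (s->next & (uint32_t)~mask) | ((value & UINT32_C(1)) << bit);
}

/* End of the pass: commit the scheduled values.
 * Returns a mask of the signals that changed (the new events). */
static inline uint32_t delta_commit(signals_t *s)
{
    uint32_t changed = s->cur ^ s->next;
    s->cur = s->next;
    return changed;
}
```

A process that calls sig_write() and then sig_read() on the same signal within one pass still reads the old value; the new value becomes visible only after delta_commit(), whose return value can then drive the next pass.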
11. #ESCBOS
Signals and sensitivity lists (3)
The VHDL concepts of process() with sensitivity lists and delta cycles can be implemented in bare-metal firmware to achieve multitasking at a low processing cost.
The benefits of these elements of multitasking are:
• Fast event-driven scheduling
• Structural integrity of the logic
• Scalability for multicore systems
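One way to picture a process() with a sensitivity list in bare-metal C is a table of handlers, each paired with a bit mask of the events it reacts to. The sketch below is only an illustration under that assumption; proc_t, run_pass, proc_table and the two handlers are hypothetical names, not code from the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical process descriptor: a handler plus its sensitivity mask. */
typedef struct {
    uint32_t sensitivity;              /* events that wake this process     */
    void (*handler)(uint32_t events);  /* called with the events that fired */
} proc_t;

/* One scheduling pass: call each process whose sensitivity mask
 * intersects the pending events. Returns how many handlers ran. */
static unsigned run_pass(const proc_t *procs, size_t count, uint32_t pending)
{
    unsigned ran = 0;
    assert(procs != NULL);
    for (size_t i = 0; i < count; i++) {
        uint32_t hits = pending & procs[i].sensitivity;
        if (hits != 0u) {
            procs[i].handler(hits);
            ran++;
        }
    }
    return ran;
}

/* Example handlers: they only count their invocations. */
static unsigned counter_calls, uart_calls;
static void counter_proc(uint32_t events) { (void)events; counter_calls++; }
static void uart_proc(uint32_t events)    { (void)events; uart_calls++; }

static const proc_t proc_table[] = {
    { 0x03u, counter_proc },  /* sensitive to events 0 and 1 */
    { 0x04u, uart_proc    },  /* sensitive to event 2        */
};
```

Calling run_pass(proc_table, 2, pending) from the main loop skips every handler whose events did not fire, so idle processes cost only one mask test per pass.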
12. #ESCBOS
Bit-banding on Cortex-M
ARM Cortex-M cores that implement bit-banding (such as the Cortex-M3 and Cortex-M4) have dedicated memory addressing hardware that provides atomic bit access in memory without read-modify-write artifacts.
• Bit signals can be used as efficient inter-process communication (IPC)
• Fastest atomic operations on a Cortex-M (faster than LDREX/STREX)
• Mapped through a dedicated alias area of the SRAM address space
13. #ESCBOS
Bit-banding on Cortex-M (2)
[Figure 3-1: System address map. Code at 0x00000000 (0.5GB), SRAM at 0x20000000 (0.5GB), Peripheral at 0x40000000 (0.5GB), External RAM at 0x60000000 (1.0GB), External device at 0xA0000000 (1.0GB), Private peripheral bus at 0xE0000000, System up to 0xFFFFFFFF.]
SRAM: 1MB bit-band region at 0x20000000 to 0x20100000, 32MB bit-band alias at 0x22000000 to 0x24000000
Peripheral: 1MB bit-band region at 0x40000000 to 0x40100000, 32MB bit-band alias at 0x42000000 to 0x44000000
• Hardware remapping of accesses
• Known addresses on any Cortex-M that implements bit-banding
• Atomic writes on individual bits
• Simultaneous reads of all 32 bits
source: ARM DDI 0439C, page 3-20
14. #ESCBOS
Bit-banding on Cortex-M (3)
Bit-banding memory remap structure:
• Words (32-bit) in the alias region map to individual bits in the normal SRAM memory
• The remapped writes are guaranteed atomic
• The alias word at 0x2200001C maps to bit [7] of the bit-band byte at 0x20000000: 0x2200001C = 0x22000000 + (0*32) + 7*4.
[Figure 3-2: Bit-band mapping. Each 32-bit word of the 32MB alias region (0x22000000 to 0x23FFFFFC) maps to one bit of the 1MB SRAM bit-band region (0x20000000 to 0x200FFFFF).]
source: ARM DDI 0439C, page 3-20
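The mapping arithmetic quoted from the ARM manual (alias base + byte offset * 32 + bit number * 4) can be written as a small helper. This is a sketch for the SRAM bit-band region only; the macro and function names are hypothetical, while the addresses are the ones shown in the ARM figures.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BB_REGION_BASE 0x20000000u  /* start of the 1MB bit-band region */
#define SRAM_BB_REGION_END  0x20100000u  /* one past the end of that region  */
#define SRAM_BB_ALIAS_BASE  0x22000000u  /* start of the 32MB alias region   */

/* Address of the alias word that controls one bit (0..7) of the byte
 * at byte_addr, which must lie inside the SRAM bit-band region. */
static inline uint32_t bitband_alias_addr(uint32_t byte_addr, unsigned bit)
{
    assert(byte_addr >= SRAM_BB_REGION_BASE && byte_addr < SRAM_BB_REGION_END);
    assert(bit < 8u);
    uint32_t byte_offset = byte_addr - SRAM_BB_REGION_BASE;
    return SRAM_BB_ALIAS_BASE + byte_offset * 32u + bit * 4u;
}
```

For the manual's own example, bitband_alias_addr(0x20000000, 7) gives 0x2200001C, and the last bit of the region, bit 7 of the byte at 0x200FFFFF, lands on the last alias word, 0x23FFFFFC.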
15. #ESCBOS
Event-driven scheduling
Using the concepts from VHDL and the atomic bit-banding of Cortex-M, it is possible to:
• Implement event-driven multitasking
• Have process()-like handlers with light overhead
• Implement state machine logic efficiently
• Use bit signals as efficient IPC
16. #ESCBOS
Event-driven scheduling (2)
#include <stdint.h>

// The pointed-to words are volatile so that every access reaches the bit-band hardware
typedef volatile uint32_t * PFLAGS_T;
typedef volatile struct ipc_flags_t { // any object of this type is volatile qualified
    PFLAGS_T pflags_bits; // Ptr to the 'bit bandable' word with 32 ipc bits
    PFLAGS_T pflags_base; // Ptr to the base of the word alias array
} IPC_FLAGS_T;
// for the ipc macros, pass an IPC_FLAGS_T struct
#define get_bit(flags, bit) ((flags).pflags_base[(bit)])
#define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)
#define clr_bit(flags, bit) ((flags).pflags_base[(bit)] = 0)
#define toggle(flags, bit) ((flags).pflags_base[(bit)] ^= 1) // read then write: not a single atomic access
#define event(flags, bit) (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)
#define clr_bits(flags) (*((flags).pflags_bits) = 0)
#define get_bits(flags, bitmask) (*((flags).pflags_bits) & (bitmask))
extern void init_ipc(void);
extern uint32_t request_ipc_word(IPC_FLAGS_T *pflags);
17. #ESCBOS
Event-driven scheduling (3)
#define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)
so:
set_bit(myflags, 7);
translates to:
myflags.pflags_base[7] = 1;
where:
IPC_FLAGS_T myflags;
myflags.pflags_base = (PFLAGS_T) 0x22000000;
myflags.pflags_bits = (PFLAGS_T) 0x20000000;
...
[Figure: writing 0x00000001 to the alias word for bit 7, inside the bit-band alias area 0x22000000 to 0x22000080, sets the word at 0x20000000 in the bit-band region to 0x00000080.]
18. #ESCBOS
Event-driven scheduling (4)
#define event(flags, bit) (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)
so:
if(event(myflags, 7))
{
...
}
translates, when bit 7 is set, to:
if(((myflags.pflags_base[7] = 0), 1))
after evaluation of the side effect, becomes:
if((1))
The comma operator evaluates the side effect part (clearing the bit) first and then yields the result, 1.
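To try the test-and-clear behaviour of event() without Cortex-M hardware, the alias array can be imitated on a host machine by an ordinary array of 32 words. This is only a simulation under that assumption: sim_alias, sim_flags_t, sim_flags and poll_event are hypothetical names, and on a real target pflags_base would point into the bit-band alias area instead.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for 32 alias words (one word per flag bit). */
static volatile uint32_t sim_alias[32];

typedef struct {
    volatile uint32_t *pflags_base;  /* base of the simulated alias array */
} sim_flags_t;

static sim_flags_t sim_flags = { sim_alias };

/* Same shape as the macros on the slides. */
#define get_bit(flags, bit) ((flags).pflags_base[(bit)])
#define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)
#define clr_bit(flags, bit) ((flags).pflags_base[(bit)] = 0)
#define event(flags, bit) \
    (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)

/* Consume at most one pending event on `bit`; returns 1 if it fired. */
static int poll_event(unsigned bit)
{
    assert(bit < 32u);
    if (event(sim_flags, bit)) {
        return 1;  /* the flag was set and has now been cleared */
    }
    return 0;      /* nothing pending */
}
```

A producer sets the flag with set_bit(sim_flags, 7); the first poll_event(7) after that returns 1 and clears the flag, and later polls return 0 until the flag is set again.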
20. #ESCBOS
Event-driven scheduling (6)
This event-driven architecture:
• Is simple to implement
• Scales well even with multicore Cortex-M systems
• Improves processing granularity
• Can be implemented in hardware on ARM+FPGA systems
21. #ESCBOS
Hardware scheduling
The event-driven scheduling can be implemented directly in hardware on an ARM+FPGA system.
Instead of using a round-robin cycle in firmware, the underlying hardware can place a “call” to each process() according to its sensitivity list.
This approach can reduce overhead to a few instruction cycles, giving a very responsive real-time system.
22. #ESCBOS
Multicore Cortex-M devices
The event-driven paradigm can be effectively implemented in a multicore Cortex-M system with common memory.
[Figure: Cortex-M cores connected through a bus matrix to shared RAM and shared flash. Image: http://hothardware.com/newsimages/Item9563/cortex-m3-arm-cpu.png]
This approach simplifies system partitioning across the processor cores, and can decrease system response time for event-driven bare-metal logic.
Even when no bit-banding is available in the shared memory, atomic events can be used.
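Where the shared memory has no bit-band alias, one portable way to keep event flags atomic is the C11 <stdatomic.h> interface. The sketch below is an assumption about how such flags could look, not the implementation from the talk; shared_events, post_event and take_event are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* One shared word holding 32 event flags. */
static _Atomic uint32_t shared_events;

/* Producer side: atomically set one event bit. */
static void post_event(unsigned bit)
{
    assert(bit < 32u);
    atomic_fetch_or(&shared_events, UINT32_C(1) << bit);
}

/* Consumer side: atomically clear the bit and report whether it was set. */
static int take_event(unsigned bit)
{
    assert(bit < 32u);
    uint32_t mask = UINT32_C(1) << bit;
    uint32_t old = atomic_fetch_and(&shared_events, (uint32_t)~mask);
    return (old & mask) != 0u;
}
```

On cores with exclusive-access instructions, compilers typically lower these calls to LDREX/STREX loops. Whether that is sufficient between cores depends on the device supporting exclusive accesses to the shared memory, which should be checked per part.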
23. #ESCBOS
Final Thoughts
The event-driven paradigm is a powerful and scalable architectural structure.
It is being used in bare-metal embedded systems with more than 300 KLOC.
If coupled with hardware scheduling support, it can be used to implement very fast event-response systems that are very hard to implement with priority-based schedulers.