#ESCBOS
From Hw to Sw: Parallel Logic Applied to Event-Driven Firmware
Jonny Doin – GridVortex

From Hardware to Firmware
• Introduction
• Multitasking: the holy grail of computing
• Parallel computing and VHDL
• process() and sequential parallel logic
• Signals and sensitivity lists in VHDL
• Signals and sensitivity lists in firmware
• Bit-banding on Cortex-M
• Event-driven scheduling
• Hardware scheduling and multicore µC
• Final thoughts

Intro
In this talk we will see:
• Architectural aspects of multitasking
• Some techniques for implementing event-driven firmware
• Concepts of hardware design that can be applied to firmware development

Multitasking
Multitasking is one of the most important concepts of modern computing.
Efficient use of processing bandwidth affects energy and real-time response.
Microcontrollers with over 200 MIPS are becoming very accessible to even the smallest applications.
Image: https://s-media-cache-ak0.pinimg.com/736x/d5/6e/06/d56e06a6441353a405456bbdc29df294.jpg

Multitasking (2)
Multitasking can be described as the simulation of a parallel processing system using a smaller number of sequential processors.
Several multitasking schemes evolved over time for traditional computing systems:
• Priority-based scheduling and multithreading
• Cooperative multitasking
• Interrupt-based real-time systems
• Event-driven multitasking

Multitasking (3)
Multitasking schemes are a compromise among:
• Cost of scheduling
• System blocking time
• Effective processing bandwidth
• System response time
[Figure: CPU time divided between user tasks and the scheduler]

Parallel processing and VHDL
Truly parallel systems can be implemented in digital hardware.
The languages used to describe and design such systems have specific features for describing parallel logic.
VHDL uses a state-based model to describe parallel processing.

process() and parallel logic
In VHDL, sections of sequential logic that run in parallel with the rest of the system are defined using the process() structure:

-- Register, sequential logic
counter: process (clk_i, cnt_clear) is
begin
    if cnt_clear = '1' then
        cnt_reg <= 0;
    else
        if clk_i'event and clk_i = '1' then
            if cnt_ce = '1' then
                cnt_reg <= cnt_next;
            end if;
        end if;
    end if;
end process counter;

-- Adder, combinational logic
cnt_next <= cnt_reg + 1 when cnt_top = '0' else cnt_reg;

Signals and sensitivity lists
The process() definition includes a list of signals:
    process (clk_i, cnt_clear)
Logic in the process() is only "executed" when any signal declared in its sensitivity list changes state.
Any other logic in the circuit can alter the state of these signals, and when that happens, the process is executed.
VHDL signals have much more to them: each one has a "transaction timeline" and supports scheduling future transactions on the signal.

Signals and sensitivity lists (2)
VHDL sensitivity lists:
• Simple state-based, event-driven paradigm
• Simulate parallel hardware logic
• Simulators use processing bandwidth efficiently
The paradigm is based on the delta cycle, a concept similar to an execution pass of the logic. All signals are assigned their new values only at the end of the current delta cycle.
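
The delta-cycle rule can be modeled in C by keeping two copies of every signal: the value that processes read during the pass, and the value scheduled for the end of it. The sketch below is illustrative only; `signal_t`, `signal_assign`, `delta_commit`, and `swap_pass` are hypothetical names, not code from the talk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-phase signal: reads see `cur`, writes go to `next`. */
typedef struct {
    uint32_t cur;   /* value visible during the current delta cycle */
    uint32_t next;  /* value scheduled for the end of the cycle     */
} signal_t;

/* Schedule a new value; readers still see the old one. */
static void signal_assign(signal_t *s, uint32_t value)
{
    s->next = value;
}

/* End of delta cycle: publish the scheduled value.
 * Returns true if the signal changed (an "event" in VHDL terms). */
static bool delta_commit(signal_t *s)
{
    bool changed = (s->cur != s->next);
    s->cur = s->next;
    return changed;
}

/* Two "processes" that swap a and b. Both read `cur`, so the result
 * does not depend on the order in which the assignments run. */
static void swap_pass(signal_t *a, signal_t *b)
{
    signal_assign(a, b->cur);
    signal_assign(b, a->cur);
    delta_commit(a);
    delta_commit(b);
}
```

With plain variables the second assignment would read the value the first one had just overwritten. Deferring the update to the commit step is what makes the two assignments behave as if they ran in parallel.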
  	
  

Signals and sensitivity lists (3)
The VHDL concepts of process() with sensitivity lists and delta cycles can be implemented in bare-metal firmware to achieve multitasking with low processing cost.
The benefits of these elements of multitasking are:
• Fast event-driven scheduling
• Structural integrity of the logic
• Scalability for multicore systems
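
One way to carry the process() and sensitivity-list idea into a bare-metal main loop is sketched below. This is a table-driven variant written for illustration, not the talk's implementation: `process_t`, `run_pass`, and the two example handlers are assumed names, and the pending-event word is a plain variable rather than a bit-band word.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical process descriptor: a handler plus a sensitivity mask. */
typedef struct {
    uint32_t sensitivity;             /* events that wake this process */
    void (*handler)(uint32_t events); /* body of the process           */
} process_t;

static uint32_t pending;  /* events raised so far (bit n = event n) */
static int blink_runs, uart_runs;

static void blink_process(uint32_t events) { (void)events; blink_runs++; }
static void uart_process(uint32_t events)  { (void)events; uart_runs++;  }

static const process_t process_table[] = {
    { 0x01u,         blink_process },  /* sensitive to event 0       */
    { 0x02u | 0x04u, uart_process  },  /* sensitive to events 1 or 2 */
};

/* One pass over the table, comparable to one delta cycle: snapshot the
 * pending events, clear them, then run only the processes whose
 * sensitivity list matches the snapshot. Events raised by a handler
 * land in `pending` and are seen on the next pass. */
static void run_pass(void)
{
    uint32_t snapshot = pending;
    pending = 0;  /* on real hardware this read/clear pair must be atomic */
    for (size_t i = 0; i < sizeof process_table / sizeof process_table[0]; i++) {
        uint32_t hits = snapshot & process_table[i].sensitivity;
        if (hits != 0) {
            process_table[i].handler(hits);
        }
    }
}
```

A process whose events did not fire costs one AND and one compare per pass, which is where the low scheduling cost comes from.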
  

Bit-banding on Cortex-M
ARM Cortex-M3 and Cortex-M4 cores can include dedicated memory addressing hardware (bit-banding, an optional feature that depends on the device) to implement atomic bit access in memory without read-modify-write artifacts.
• Bit signals can be used for efficient inter-process communication (IPC)
• Fastest atomic operations on a Cortex-M (faster than an LDREX/STREX sequence)
• Mapped through a special alias area of the SRAM address range

Bit-banding on Cortex-M (2)
[Figure 3-1, System address map:
  Code                     from 0x00000000 (0.5GB)
  SRAM                     from 0x20000000 (0.5GB)
  Peripheral               from 0x40000000 (0.5GB)
  External RAM             from 0x60000000 (1.0GB)
  External device          from 0xA0000000 (1.0GB)
  Private peripheral bus   from 0xE0000000 (internal), from 0xE0040000 (external)
  System                   from 0xE0100000 to 0xFFFFFFFF
 SRAM bit-banding:       1MB bit-band region at 0x20000000, 32MB bit-band alias at 0x22000000
 Peripheral bit-banding: 1MB bit-band region at 0x40000000, 32MB bit-band alias at 0x42000000]
• Hardware remapping of accesses
• Known addresses on any Cortex-M that implements bit-banding
• Atomic writes on individual bits
• Simultaneous reads on all 32 bits
Source: ARM DDI 0439C, page 3-20

Bit-banding on Cortex-M (3)
Bit-banding memory remap structure:
• Words (32-bit) in the alias region map to individual bits in the normal SRAM memory
• The remapped writes are guaranteed atomic
From the Programmers Model chapter: the alias word at 0x2200001C maps to bit [7] of the bit-band byte at 0x20000000: 0x2200001C = 0x22000000 + (0*32) + 7*4.
[Figure 3-2, Bit-band mapping: each word of the 32MB alias region (0x22000000 to 0x23FFFFFC) maps to one bit of a byte in the 1MB SRAM bit-band region (0x20000000 to 0x200FFFFF)]
Source: ARM DDI 0439C, page 3-20
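
The mapping formula quoted above is plain arithmetic, so it can be checked on any host. The helper below is a minimal sketch; `bitband_alias_addr` and the two macro names are assumptions made for this example, while the base addresses are those shown in the address map.

```c
#include <stdint.h>

#define BITBAND_SRAM_BASE   0x20000000u  /* start of the 1MB SRAM bit-band region */
#define BITBAND_SRAM_ALIAS  0x22000000u  /* start of the 32MB alias region        */

/* Alias word address for bit `bit` (0..7) of the byte at `byte_addr`.
 * byte_addr must lie in 0x20000000..0x200FFFFF. */
static uint32_t bitband_alias_addr(uint32_t byte_addr, uint32_t bit)
{
    uint32_t byte_offset = byte_addr - BITBAND_SRAM_BASE;
    return BITBAND_SRAM_ALIAS + byte_offset * 32u + bit * 4u;
}
```

For the example in the text, `bitband_alias_addr(0x20000000u, 7)` yields 0x2200001C. On a device that implements bit-banding, a word write of 1 to that address sets bit 7 of the byte at 0x20000000.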
  

Event-driven scheduling
Using the concepts from VHDL and the atomic bit-banding of Cortex-M, it is possible to:
• Implement event-driven multitasking
• Have process()-like handlers with light overhead
• Implement state machine logic efficiently
• Use bit signals as efficient IPC

Event-driven scheduling (2)
#include <stdint.h>

// Pointer to volatile data, so every access through it reaches memory
typedef volatile uint32_t *PFLAGS_T;

typedef volatile struct ipc_flags_t {   // any object of this type is volatile qualified
    PFLAGS_T pflags_bits;               // Ptr to the 'bit bandable' word with 32 ipc bits
    PFLAGS_T pflags_base;               // Ptr to the base of the word alias array
} IPC_FLAGS_T;

// for the ipc macros, pass an IPC_FLAGS_T struct
#define get_bit(flags, bit)       ((flags).pflags_base[(bit)])
#define set_bit(flags, bit)       ((flags).pflags_base[(bit)] = 1)
#define clr_bit(flags, bit)       ((flags).pflags_base[(bit)] = 0)
// toggle and event read the alias word and then write it: two separate accesses
#define toggle(flags, bit)        ((flags).pflags_base[(bit)] ^= 1)
#define event(flags, bit)         (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)
// whole-word access to all 32 bits through the bit-band region
#define clr_bits(flags)           (*((flags).pflags_bits) = 0)
#define get_bits(flags, bitmask)  (*((flags).pflags_bits) & (bitmask))

extern void init_ipc(void);
extern uint32_t request_ipc_word(IPC_FLAGS_T *pflags);

Event-driven scheduling (3)
#define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)
so:
    set_bit(myflags, 7);
translates to:
    myflags.pflags_base[7] = 1;
where:
    IPC_FLAGS_T myflags;
    myflags.pflags_base = (PFLAGS_T) 0x22000000;
    myflags.pflags_bits = (PFLAGS_T) 0x20000000;
[Figure: writing 0x00000001 to alias word 7 in the bit-band alias area (0x22000000 to 0x22000080) sets bit 7 of the word at 0x20000000 in the bit-band region, which then reads 0x00000080]

Event-driven scheduling (4)
#define event(flags, bit) (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)
so:
    if (event(myflags, 7))
    {
        ...
    }
translates, when bit 7 is set, to:
    if (((myflags.pflags_base[7] = 0), 1))
and, after evaluation of the side effect, becomes:
    if ((1))
The comma operator evaluates the side-effect part (the clear) first and then yields the result, 1. When the bit is clear, the expression is simply 0 and nothing is written.
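
The test-and-clear behavior of event() can be tried on a host by letting an ordinary array stand in for the alias words. The sketch below copies the macros from the slides; `fake_alias`, `flags_t`, and `poll_bit7` are hypothetical names introduced for this example, and the array only emulates the one-word-per-bit layout, not the hardware atomicity.

```c
#include <stdint.h>

/* Host stand-in for the 32 alias words of one bit-band word. On a real
 * Cortex-M with bit-banding this would be the alias region itself. */
static volatile uint32_t fake_alias[32];

typedef struct {
    volatile uint32_t *pflags_base;
} flags_t;  /* hypothetical, reduced version of IPC_FLAGS_T */

#define get_bit(flags, bit) ((flags).pflags_base[(bit)])
#define set_bit(flags, bit) ((flags).pflags_base[(bit)] = 1)
#define clr_bit(flags, bit) ((flags).pflags_base[(bit)] = 0)
#define event(flags, bit) \
    (get_bit((flags), (bit)) ? ((clr_bit((flags), (bit))), 1) : 0)

static flags_t myflags = { fake_alias };

/* Counts how many times the event on bit 7 is consumed. */
static int poll_bit7(void)
{
    int handled = 0;
    if (event(myflags, 7)) {  /* taken once per set_bit() */
        handled++;
    }
    if (event(myflags, 7)) {  /* flag already cleared: not taken */
        handled++;
    }
    return handled;
}
```

Each set_bit() is consumed by exactly one successful event() test, so a handler runs once per raised event however often it is polled.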
  

Event-driven scheduling (5)
enum keypad_bits_t {
    bit_keypad_value_update = 0,
    bit_keypressed_wait,
    bit_refresh_debounce_tmr,
};

// Consumer: process()-like handler, acts only when its event has fired
void process_keypad(void)
{
    if (event_refresh_debounce_tmr())
    {
        keypad_data.debounce_tmr = KEYPAD_DEBOUNCE_TIME;
        keypad_data.state = KEYPAD_DEBOUNCE;
    }
    ...
}

// Producer: latches the keypad value and raises the event
static void trigger_keypad_update(void *object)
{
    keypad_data.latched = read_keypad_value();
    set_bit_refresh_debounce_tmr();
}

Event-driven scheduling (6)
This event-driven architecture:
• Is simple to implement
• Scales well even with multicore Cortex-M systems
• Improves processing granularity
• Can be implemented in hardware on ARM+FPGA systems

Hardware scheduling
The event-driven scheduling can be implemented directly in hardware on an ARM+FPGA system.
Instead of using a round-robin cycle in firmware, the underlying hardware can place a "call" to each process() according to its sensitivity list.
This approach can reduce scheduling overhead to a few instruction cycles, giving a very responsive real-time system.

Multicore Cortex-M devices
The event-driven paradigm can be effectively implemented in a multicore Cortex-M system with common memory.
[Figure: Cortex-M cores connected through a bus matrix to shared RAM and shared flash. Image: http://hothardware.com/newsimages/Item9563/cortex-m3-arm-cpu.png]
This approach simplifies system partitioning across the processor cores, and can decrease system response time for event-driven bare-metal logic.
Even when no bit-banding is available in the shared memory, atomic events can be used.
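
The talk does not say which mechanism provides atomic events when the shared memory has no bit-band alias. One possible substitute, offered here as an assumption, is a C11 atomic flag word. `atomic_fetch_or` and `atomic_fetch_and` are standard `<stdatomic.h>` functions; `shared_events`, `ev_set`, and `ev_take` are hypothetical names. Whether these operations are atomic across cores depends on the device's exclusive-access support, so the sketch is only a starting point.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static _Atomic uint32_t shared_events;  /* one bit per event */

/* Producer side: raise event `bit` (0..31). */
static void ev_set(unsigned bit)
{
    atomic_fetch_or(&shared_events, UINT32_C(1) << bit);
}

/* Consumer side: test and clear event `bit` in one atomic step.
 * Returns true if the event was pending. */
static bool ev_take(unsigned bit)
{
    uint32_t mask = UINT32_C(1) << bit;
    uint32_t old = atomic_fetch_and(&shared_events, ~mask);
    return (old & mask) != 0;
}
```

Unlike the bit-band event() macro, `ev_take` tests and clears in a single read-modify-write, at the cost of a slower instruction sequence.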
  

Final Thoughts
The event-driven paradigm is a powerful and scalable architectural structure.
It is being used in bare-metal embedded systems of more than 300 KLOC.
If coupled with hardware scheduling support, it can be used to implement very fast event-response systems that are very hard to implement with priority-based schedulers.

Thank you
Jonny Doin
jonnydoin@gridvortex.com