16. What is FPGA?
• Field Programmable Gate Array
− LSIs whose contents can be changed at any time
− We can design a unique digital circuit (HW) on it
− Two major vendors: Xilinx and Altera (now part of Intel)
[Figure: internal architecture of an FPGA, drawn as a grid of logic blocks (LB), connection blocks (CB), switching blocks (SB), and I/O blocks (IOB)]
[Figure: contents of a logic block: a LUT followed by a D-FF (input D, output Q)]
LUT
IN   OUT
0000 1
0001 0
0010 0
…    …
1110 1
1111 0
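To make the logic block concrete, here is a minimal software sketch of a 4-input LUT and a D flip-flop. The module name, the function names, and the example configuration `0x8000` (a 4-input AND) are assumptions for illustration only; they are not the truth table shown on the slide.

```elixir
defmodule LutSketch do
  import Bitwise

  # A 4-input LUT is a 16-entry truth table. Here it is stored as a 16-bit
  # integer: bit number `index` holds the output for that input pattern.
  def lut4(config, a, b, c, d) when config in 0..0xFFFF do
    index = (a <<< 3) ||| (b <<< 2) ||| (c <<< 1) ||| d
    (config >>> index) &&& 1
  end

  # A D flip-flop outputs, at each clock, the D value captured at the
  # previous clock. `initial` is the assumed reset value of Q.
  def dff_trace(d_values, initial \\ 0) do
    {q_values, _last} = Enum.map_reduce(d_values, initial, fn d, q -> {q, d} end)
    q_values
  end
end

# Illustrative configuration: only input 1111 produces 1 (a 4-input AND).
and4 = 0x8000
IO.inspect(LutSketch.lut4(and4, 1, 1, 1, 1))
IO.inspect(LutSketch.dff_trace([1, 0, 1, 1]))
```

Changing `config` changes the function the LUT computes without changing the circuit itself, which is the sense in which the FPGA is "programmable".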
17. How to Use an FPGA
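[Figure: a processor and an FPGA connected by a communication bus. SW on the processor reaches the bus through an IF driver; HW circuits on the FPGA sit behind an interface (IF) circuit.]
• Offloading heavy processing from SW to HW
• Performance improvement and low power consumption can be achieved
• Communication between SW/HW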
18. Advantages of FPGA
[Figure: a single FPGA containing a memory and many function (Func) blocks]
• Various systems can be designed onto one LSI
• High performance / low power consumption
• Parallel processing can be realized at task/data level
• Data streaming processing can be realized
19. Current Technology Trends
• Increase in circuit scale and number of LBs
− High performance systems can be realized
− Further increases will continue with new technologies (multi-die, 3D stacking, ...)
• Tight coupling with processors
− General-purpose: Connection via PCIe to processors
− Embedded: Integration with embedded processors
High-quality system design in a short time has become difficult...
20. High Level Synthesis (HLS)
• Solution to improve design productivity!
− Technology for synthesizing HDL from behavioral descriptions written in a programming language
C/C++ or an extension of it is commonly used
− Abstraction level of design becomes higher
int func (int x) {
  int a[N];
  int i;
  for (i = 0; i < N; i++) {
    a[i] = ・・・;
    :
    :
  }
  :
}
[Figure: the synthesized hardware block func, with input x and internal elements i and a]
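As a rough illustration of what an HLS tool produces from such a loop, the sketch below models it in Elixir as a state machine with a register for `i` and a memory for `a`, advancing one step per "clock". The loop body on the slide is elided, so the body used here (`a[i] = x + i`), the value of `N`, and all names are assumptions.

```elixir
defmodule FsmSketch do
  @n 4

  # One clock cycle: take the current state record, return the next one.
  def step(%{state: :init} = s), do: %{s | state: :loop, i: 0}

  def step(%{state: :loop, i: i} = s) when i < @n do
    # Assumed loop body: a[i] = x + i (the slide elides the real body).
    %{s | mem: Map.put(s.mem, i, s.x + i), i: i + 1}
  end

  def step(%{state: :loop} = s), do: %{s | state: :done}
  def step(%{state: :done} = s), do: s

  # Clock the machine until it reaches :done.
  def run(x) do
    %{state: :init, x: x, i: 0, mem: %{}}
    |> Stream.iterate(&step/1)
    |> Enum.find(&(&1.state == :done))
  end
end

IO.inspect(FsmSketch.run(10).mem)
```

Control flow becomes the `state` field, the scalar `i` becomes a register, and the array `a` becomes a small memory (`mem`).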
21. Commercial HLS Tools
• Xilinx Vivado HLS
− Synthesizes from C/C++
− #pragma directives are offered to indicate optimizations
• Intel SDK for OpenCL
− Synthesizes from OpenCL parallelized code
− Can be executed with the same description as on the host PC
It is essential to understand #pragma and libraries deeply for deriving optimized hardware
Ref: Xilinx Inc. White paper UG902
D. Neto, Optimizing OpenCL for Altera FPGAs, Int’l Workshop on OpenCL, 2014.
22. Not Only C/C++!!
• Chisel: Scala based
− Object Oriented / Functional styled DSL
• CλaSH: Haskell based
− Synthesize HDL from description of functional language
• Karuta: original scripting language
• Synthesijer: Java based
− HLS from the subset of Java specification
• PyCoRAM, Polyphony: Python based
− Veriloggen: Python library for HDL design
• Mulvery: Ruby based
− Synthesis from Reactive Programming
• Octopus🐙: OCaml based
Developed by hls-friends!!
24. OK, What We Want Is...
We want to design HW with Elixir!!
We want to operate HW from our Elixir code!!
25. Concept of Cockatrice
• Why would Elixir be suitable for HW design?
• HW synthesis flow from Elixir code
• SW/HW communication interface
26. What is Cockatrice?
• Summoned beast that appears in FF4 (^^;
− Its effect is to turn all enemies to stone
• Hardware design environment with Elixir!
• Features
− It synthesizes Elixir Zen Styled code into descriptions of HW circuits
− It provides a communication interface between Elixir code and HW circuits
Your Elixir code can be accelerated, and made low-power!!
NOTE: The current Cockatrice logo is from Wikipedia
27. Zen’s process model
input_list
|> Flow.from_enumerable(stages: 4)
|> Flow.map(& foo(&1))
|> Flow.map(fn a->-a end)
|> Enum.to_list
|> Enum.sort
[Figure: dataflow of the pipeline above, with blocks input_list, from_enumerable, four parallel foo stages, four parallel -a stages, arbitrator, to_list, and sort]
It’s similar to an efficient HW architecture!!
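The process model can be tried without the Flow library. The sketch below uses only the standard library (`Task.async_stream/3`) to run the two map stages on up to four concurrent workers and then sorts the merged results. The body of `foo` is a hypothetical stand-in, and this is only an approximation of what Flow does, not its implementation.

```elixir
defmodule ZenSketch do
  # Hypothetical stand-in for the `foo` on the slide.
  def foo(x), do: x * x

  # Plain sequential version of the pipeline.
  def sequential(input_list) do
    input_list
    |> Enum.map(&foo/1)
    |> Enum.map(fn a -> -a end)
    |> Enum.sort()
  end

  # Up to `stages` workers apply foo and the negation concurrently; results
  # arrive in any order and are merged, then sorted.
  def parallel(input_list, stages \\ 4) do
    input_list
    |> Task.async_stream(fn x -> -foo(x) end, max_concurrency: stages, ordered: false)
    |> Enum.map(fn {:ok, result} -> result end)
    |> Enum.sort()
  end
end

IO.inspect(ZenSketch.parallel([3, 1, 2]))
```

Because the final `Enum.sort` fixes the order, the parallel and sequential versions return the same list, which is why independent lanes followed by a merge map naturally onto replicated hardware units.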
28. Zen is suitable for HW design!
Cockatrice
input_list
|> Flow.from_enumerable(stages: 4)
|> Flow.map(& foo(&1))
|> Flow.map(fn a->-a end)
|> Enum.to_list
|> Enum.sort
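[Figure: the same dataflow as on the previous slide (from_enumerable, four foo stages, four -a stages, arbitrator, to_list, sort), now generated by Cockatrice from the code above]
We summon Cockatrice to lithify Elixir Zen Styled code into parallel HW stones!!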
29. Effect of Cockatrice
[Figure: the input list passes through from_enumerable into many parallel foo and -a units, then through the arbitrator, to_list, and sort]
30. HW Description by Elixir
• The defcockatrice part is treated as a HW description
− It is completely equivalent to native Elixir code
You do not need to consider HW design
It can be verified at the functional level
• A HW module can be called in the same way as a SW function
− We assume SW/HW cooperative systems
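The idea that a `defcockatrice` block stays equivalent to native Elixir can be sketched with a macro that simply expands to `def`. This is a hypothetical stand-in, not Cockatrice's actual macro: the real tool would also hand the quoted body to the synthesis flow, and the module and function names below are invented for illustration.

```elixir
defmodule CockatriceSketch do
  # Hypothetical: expand `defcockatrice` to an ordinary `def`, so the block
  # can be run and tested as plain Elixir before any hardware exists.
  defmacro defcockatrice(call, do: body) do
    quote do
      def unquote(call) do
        unquote(body)
      end
    end
  end
end

defmodule Demo do
  import CockatriceSketch

  defcockatrice negate_sorted(list) do
    list |> Enum.map(fn a -> -a end) |> Enum.sort()
  end
end

# The HW-designated function is called exactly like a SW function.
IO.inspect(Demo.negate_sorted([1, 2, 3]))
```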
31. Synthesis Flow
[Figure: synthesis flow]
• Inputs: design desc. (Elixir), templates for IP (DSL)
• Code analysis & AST optimization → info. of desc. (AST)
• Synthesis of HW modules from Elixir function → HW IP modules (HDL)
• Synthesis of data flow → data flow HW circuit (HDL)
• Generation of device driver of I/F circuit → I/F driver (C (NIF))
• Logic synthesis → HW circuits (bitstream)
• Compilation of SW → SW app (Elixir + C (NIF))
32. Synthesis Flow
Metaprogramming is employed: the AST of the Zen styled design description is obtained with quote
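A minimal sketch of this step, using only the standard `quote` and `Macro.unpipe/1`: the pipeline from the earlier slide is quoted (so neither Flow nor `foo` needs to exist) and split into its stages. The module name `AstSketch` and the `{module, function}` summary format are assumptions, not Cockatrice's internal representation.

```elixir
defmodule AstSketch do
  # Split a quoted pipeline into stages and summarize each one.
  def stages(ast) do
    ast
    |> Macro.unpipe()
    |> Enum.map(fn {stage, _position} -> stage_name(stage) end)
  end

  # Remote call such as Flow.map(...) or Enum.sort()
  defp stage_name({{:., _, [{:__aliases__, _, parts}, fun]}, _, args}) when is_list(args) do
    {Module.concat(parts), fun}
  end

  # Plain variable such as input_list
  defp stage_name({name, _, context}) when is_atom(name) and is_atom(context) do
    {:var, name}
  end

  defp stage_name(other), do: {:other, other}
end

# quote only builds the AST; nothing inside it is evaluated.
ast =
  quote do
    input_list
    |> Flow.from_enumerable(stages: 4)
    |> Flow.map(&foo(&1))
    |> Flow.map(fn a -> -a end)
    |> Enum.to_list()
    |> Enum.sort()
  end

stage_list = AstSketch.stages(ast)
IO.inspect(stage_list)
```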
33. Synthesis Flow
We provide templates of HDL code that are equivalent to Enum functions, as DSL files
HDL code is synthesized by pattern matching the AST against the DSL templates
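The template lookup can be pictured as a table keyed by the function found in the AST. Everything in the sketch below is hypothetical: the HDL module names, the port names, and the table itself are invented to show the matching idea and do not reflect the actual DSL file format.

```elixir
defmodule TemplateSketch do
  # Hypothetical template table: {module, function} => HDL module name.
  @templates %{
    {Enum, :sort} => "sort_unit",
    {Enum, :to_list} => "collector",
    {Flow, :from_enumerable} => "distributor"
  }

  # Emit one HDL instantiation line for a stage, or report a missing template.
  def instantiate(key, index) do
    case Map.fetch(@templates, key) do
      {:ok, hdl_module} ->
        {:ok, "#{hdl_module} u#{index} (.clk(clk), .din(w#{index}), .dout(w#{index + 1}));"}

      :error ->
        {:error, {:no_template, key}}
    end
  end
end

IO.inspect(TemplateSketch.instantiate({Enum, :sort}, 2))
```

Returning an error tuple for unknown functions reflects the limitation that only functions with a template can be synthesized.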
34. Synthesis Flow
Each module is connected as a data flow, derived from the AST representation of |> and Flow
The data flow and parallel processing HW circuit is finally synthesized!!
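A small sketch of how pipeline order could turn into connections: consecutive stages become edges, and a stage marked for parallel execution is replicated into lanes. The data shapes and names are assumptions chosen for illustration.

```elixir
defmodule DataflowSketch do
  # Connect each stage to the next one, in pipeline order.
  def edges(stages) do
    stages
    |> Enum.chunk_every(2, 1, :discard)
    |> Enum.map(fn [from, to] -> {from, to} end)
  end

  # Replicate one stage into `lanes` parallel copies, tagged by lane number.
  def replicate(stage, lanes) do
    for lane <- 0..(lanes - 1) do
      {stage, lane}
    end
  end
end

IO.inspect(DataflowSketch.edges([:from_enumerable, :foo, :negate, :to_list, :sort]))
IO.inspect(DataflowSketch.replicate(:foo, 4))
```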
35. Synthesis Flow
The communication interface and its driver are generated as NIF functions
36. Synthesis Flow
The SW binary and HW bit files are compiled by their respective tools
37. SW/HW Comm. Interface
• Activation/operation of HW from Elixir code
• Data communication between SW and HW
− AXI4 bus on Zynq is used
• We implement the device driver as a C/NIF module
− ikwzm/udmabuf is used for DMA transfer
− Elixir/Erlang lists must be converted to C arrays
[Figure: on the processor, the Elixir app runs on the Erlang VM and calls the device driver (NIF module); data passes through a DMA buffer to the interface circuit and the HW circuits on the FPGA]
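The list-to-array conversion can be illustrated on the Elixir side with binaries. The sketch assumes 32-bit signed little-endian elements, which is the layout a C `int32_t` array would have on a little-endian ARM core; the actual driver does this conversion in C inside the NIF, so this only shows the memory layout, not the real code.

```elixir
defmodule DmaPackSketch do
  # Pack a list of integers into a contiguous binary (assumed int32_t layout).
  def pack(list) do
    for x <- list, into: <<>>, do: <<x::little-signed-integer-size(32)>>
  end

  # Unpack such a binary back into an Elixir list.
  def unpack(bin) do
    for <<x::little-signed-integer-size(32) <- bin>>, do: x
  end
end

packed = DmaPackSketch.pack([1, -1, 42])
IO.inspect(byte_size(packed))
IO.inspect(DmaPackSketch.unpack(packed))
```

An Erlang list is a linked structure, so some copy of this kind is unavoidable before a DMA transfer, which is one source of the SW/HW communication overhead.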
44. Discussion
• Currently, we have only implemented prototypes
− We will publish them as Hex packages very soon...
− Currently supported features are limited
In other words, we only synthesize Zen styled code
Are other Elixir/Erlang process models suitable for efficient HW architecture?
− Quantitative evaluation of our proposal will also be important (to verify the academic contribution ^^;)
45. Discussion
• Applicable range of Cockatrice?
− Not only embedded, but also the HPC domain!?
Bigger data would suit Cockatrice, since there is some overhead in SW/HW comm.
− AI/ML would be a killer application
Big data stream processing for IoT
Cloud processing that allows users to change functions flexibly
− We are planning to support large-scale FPGA boards with a comm. interface for the PCIe bus
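The remark about communication overhead can be made concrete with a simple break-even model: offloading pays off only once the per-item saving outweighs a fixed transfer cost. The linear model and every number below are assumptions for illustration, not measurements of Cockatrice.

```elixir
defmodule OffloadSketch do
  # All times are in arbitrary units; the linear cost model is an assumption.
  def sw_time(n, per_item_sw), do: n * per_item_sw

  def hw_time(n, per_item_hw, fixed_overhead), do: fixed_overhead + n * per_item_hw

  # True when processing n items on HW (including overhead) beats SW.
  def worth_offloading?(n, per_item_sw, per_item_hw, fixed_overhead) do
    hw_time(n, per_item_hw, fixed_overhead) < sw_time(n, per_item_sw)
  end
end

# With made-up costs: SW 10 per item, HW 1 per item, 1000 fixed overhead.
IO.inspect(OffloadSketch.worth_offloading?(10, 10, 1, 1000))
IO.inspect(OffloadSketch.worth_offloading?(1000, 10, 1, 1000))
```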
46. BTW, I love Nerves!!
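• My experience at Lonestar 2019 was great!
• I gave a presentation at Erlang & Elixir Fest 2019 to promote the innovation of Nerves in Japan!!
[Slides from that presentation, translated from Japanese:]
The new world of "IoT with Elixir" pioneered by Nerves
Hideki Takase
(Kyoto University / JST PRESTO)
takase@i.kyoto-u.ac.jp
Live demo menu
1. Preparing and building a Nerves project
2. Burning to microSD, booting, and running IEx
3. Editing the source and uploading over local ssh
4. Uploading from NervesHub
5. Scenic integration & controlling GPIO devices
Raspberry Pi Zero WH, Adafruit 128x64 OLED Bonnet
https://github.com/takasehideki/eefest19demo
NervesKey
The new world of "IoT with Elixir"!
device
edge server, cloud
Networking every thing, event, and person!
The mixed martial arts of information science!
Creating new social value!!
Let's build IoT together!
NervesHub
• Remotely deploy Nerves apps by OTA (Over The Air) via a server!
- A secure connection path is realized with X.509 signed certificates and the NervesKey circuit
- Update targets and firmware can be specified freely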
47. Future Direction
What will happen
when Nerves meets Cockatrice?
Please help us evolve the new era of
"IoT development with Elixir"
48. Future of “Elixir for IoT”
[Figure: device, edge server, cloud]
This is my Extreme Computing!!
49. Thanks to... with Wabi-Sabi
• My students in lab.
− Kentaro Matsui
− Yasuhiro Nitta
• My research partners at fukuoka.ex
− @zacky1972
− @hisawayex
− @piacere_ex
− @enpedasi
• My friends at hls-friends
− Tech comm. for self-made high-level synthesis tools
Editor's Notes
I’m Hideki Takase from Kyoto, Japan.
This is my second time attending ElixirConf. The first time was Lonestar earlier this year.
So, nice to meet you, or long time no see!
It’s a great pleasure to present our work at ElixirConf.
Thank you so much for accepting my talk proposal.
Pipelining and the like may be more effective than coarse-grained parallelization. It would be good to be able to specify that.
Since high-level synthesis has to happen somewhere anyway, wouldn't the approach of emitting HLS C be quicker?
There is probably research on compilers that translate Elixir/Erlang to C.
Communication gets heavy because the data is streamed. It would be better to improve this with something like shared-memory access.
Should it be possible to specify an optimization goal?
My presentation consists of 3 parts.
How many of you know about FPGAs?
The first part is just like a university lecture. I will introduce hardware design and FPGAs.
OK, let’s go to the first part.
by providing the training materials from Frank and Justin.
He agreed to hold it in Japan.
Thank you so much, Justin & Frank,
There are pros and cons to both SW and HW.
For HW, performance and power efficiency are much better than on a processor, because HW can be operated on demand. In addition, high-performance parallel processing can be realized easily if we design the HW carefully.
On the other hand, one of the advantages of a processor is design flexibility. We can realize various applications depending on how we program it.
So, an FPGA takes advantage of both. This means that an FPGA offers better performance and power efficiency than a processor, with better design flexibility than fixed HW.
Let me introduce what an FPGA is.
FPGA stands for field programmable gate array; it is an LSI whose contents can be changed at any time as you want. So, we can realize unique digital circuits on an FPGA.
There are 2 major vendors, Xilinx and Intel Altera.
As shown in this figure, the internal architecture of an FPGA is expressed as a two-dimensional array of logic blocks, connection blocks, switching blocks, and I/O blocks. A logic block consists of lookup tables and a data flip-flop.
So, we can change the HW behavior by deciding the values of the LUTs and their connections.
I will show the general usage of an FPGA.
An FPGA is typically used together with a processor, as an accelerator.
We can offload the heavy part of the processing from the processor to the FPGA. So, we expect performance improvement and power savings by utilizing the FPGA.
To communicate between processors and the FPGA efficiently, we need a suitable communication interface.
A higher-quality HW/SW cooperative system can then be realized.
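Control such as sequencing, branching, and loops is generated as state machines; variables are generated as registers and arrays as memories.
sequential
Open source!
LegUp is from the University of Toronto; Chisel is from UCB.
I want to reach this point at the 18-minute mark.
The effect of Cockatrice is to turn all enemies to stone. So, I chose it as the codename for a tool that turns your Elixir code into HW.
Flow, GenStage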
Metaprogramming
udmabuf is a Linux device driver that allocates contiguous memory blocks in the kernel space as DMA buffers and makes them available from the user space. It is intended that these memory blocks are used as DMA buffers when a user application implements device driver in user space using UIO (User space I/O).
Zynq-7000 devices are equipped with dual-core ARM Cortex-A9 processors integrated with 28nm Artix-7 or Kintex®-7 based programmable logic for excellent performance-per-watt and maximum design flexibility. With up to 6.6M logic cells and offered with transceivers ranging from 6.25Gb/s to 12.5Gb/s, Zynq-7000 devices enable highly differentiated designs for a wide range of embedded applications including multi-camera drivers assistance systems and 4K2K Ultra-HDTV.
EG devices feature a quad-core ARM® Cortex-A53 platform running up to 1.5GHz. Combined with dual-core Cortex-R5 real-time processors, a Mali-400 MP2 graphics processing unit, and 16nm FinFET+ programmable logic, EG devices have the specialized processing elements needed to excel in next-generation wired and 5G wireless infrastructure, cloud computing, and Aerospace and Defense applications.
Task, GenServer, and so on
Bigger data would be suitable
The new era of "IoT development with Elixir" pioneered by Nerves technology
This is the last slide.
I believe if Nerves can control the FPGA directly.
19:30
First of all, let me introduce my research collaborators, since Wabi-Sabi is important for Japanese people.
I would like to say a big thank-you to Kentaro and Yasuhiro. They are students in my laboratory and have put great effort into this project.
I appreciate the members of fukuoka.ex. It is an Elixir community in Fukuoka, Japan. They always give me technical support. As you may know, Zacky and Hisaway will present their work on a novel technology for GPUs with Elixir.
I also appreciate the members of hls-friends. It is a Japanese community for self-made high-level synthesis tools in various programming languages. These members give useful comments and motivation to my project.