SlideShare a Scribd company logo
1 of 21
Download to read offline
FPGA design with CλaSH
Conrad Parker <conrad@metadecks.org>
fp-syd, February 2016
FPGA
• Essentially a bunch of logic gates and memory blocks
that can be told to connect up to be whatever digital
circuit you want it to be.
• You load a binary file onto it that describes that
configuration.
• You generate that binary file using an application like
Vivado, from source code that you wrote in a language
like VHDL.
• At this point we're back in software land – we have a
programming language and some tool that can compile
that down to a binary.
• Flash it, reboot, and you’re executing on hardware!
FPGA design with CλaSH Conrad Parker <conrad@metadecks.org>
VHDL
Unfortunately, VHDL is fairly tedious to develop:
• It is more low-level than assembly, in that you are
describing circuits of logic gates like AND and OR
and NOT and XOR, and worrying about clock timings and
which values need to be latched between stages of some
pipeline you're inventing.
• You don't necessarily have luxuries like floating point
numbers, and the tools can be flaky.
On the flip-side it is possible to create very efficient circuits:
you can optimize for space by using fewer bits, and you can
optimize for time by doing more things in parallel.
Clash
Clash is based on Haskell, with features of suitable for
hardware design such as length-typed bit vectors (the length
is part of the type declaration), primitive data flow strategies
like Mealy and Moore machines, and parallel reducers.
For example, the expression:
fold ( + ) vec
will construct a depth log n tree of adders, to sum the
contents of vec.
Clash
• There is "poor support" for recursive functions,
meaning that the recursion depth is explicitly
limited.
• Other than that, most commonly used Haskell
features seem to be supported.
• Haskell seems well-suited to designing parallel
systems, as the language simply describes
dependencies between expressions without
specifying an explicit order of evaluation.
CRC32
In this evaluation, I implemented and tested CRC32.
• I wrote 32 lines of Haskell
• ... which included 2 one-liners to describe some input
and expected output for testing.
• From that, Clash generated 742 lines (11 files) of VHDL
• ... including 394 lines (5 files) for the test bench (which
you would normally write by hand ...)
• Vivado uses the VHDL to make a fairly simple circuit
(the elaborated design)
• ... and synthesis creates an optimal circuit layout.
• The test bench result, from Vivado's hardware
simulator, includes a boolean "done" signal
which switches to TRUE when the test passes.
Test data
First we generate some test data, using a known-good implementation
(the crc32() function in zlib).
We use C to print the CRC32 for each successive letter of the input string.
Running this, we get the expected output for our test data:
$ gcc -std=c11 CRC32.c -o CRC32 -lz
$ ./CRC32 Optiver
3523407757, 2814004990, 3634756580, 584440114, 781824552,
1593772748, 2102791115,
#include <stdio.h>
#include <zlib.h>
unsigned long generate_crc32(const unsigned char * buf,
unsigned int len)
{
unsigned long crc_init = crc32(0L, Z_NULL, 0);
unsigned long crc_0 = crc32(crc_init, "0", 1);
return crc32(crc_0, buf, len);
}
int main (int argc, char *argv[])
{
unsigned int len = 0;
if (argc < 2) return 1;
for (len=0; argv[1][len] != '0'; len++)
printf ("%ld, ",
generate_crc32((const unsigned char*)argv[1], len));
printf ("n");
return 0;
}
Mealy machine
crc32T :: Unsigned 32 -> Unsigned 8 -> (Unsigned 32, Unsigned 32)
crc32T prevCRC c = (prevCRC', o)
where
prevCRC' = crc32Step prevCRC c
o = prevCRC
crc32 :: Signal (Unsigned 8) -> Signal (Unsigned 32)
crc32 = mealy crc32T 0
An abstraction that hardware designers use to describe a component that
has an input and output and keeps some state.
• Our crc32 function is a mealy machine with an initial state of 0.
• The transfer function crc32T updates the state (an Unsigned 32)
using an input Unsigned 8, producing a tuple of the new state and
the output (also an Unsigned 32, being the CRC up to the previous
character).
crc32Step :: Unsigned 32 -> Unsigned 8 -> Unsigned 32
crc32Step prevCRC c =
flipAll (tblEntry `xor` (crc `shiftR` 8))
where
tblEntry = crc32Table (truncateB crc `xor` c)
crc = flipAll prevCRC
flipAll = xor 0xffffffff
The crc32Step function does the actual CRC32 calculation, being a
bunch of XORs and bit shifts.
This is directly adapted from the pure Haskell implementation
in Data.Digest.Pure.CRC32, using Clash functions like truncateB.
The target bit width of truncateB does not need to be specified, it is
inferred by the context: Here, the result of truncation has to be xor'd
with the input character c, which is an Unsigned 8, so crc gets
truncated to 8 bits.
Async ROM
crc32Table :: Unsigned 8 -> Unsigned 32
crc32Table = unpack . asyncRomFile d256 "crc32_table.bin"
That 8 bit value is used to index into crc32Table, which we implement as
an async ROM. Async means that you can read it without waiting a clock
cycle; ROM of course means that it's read-only, but that doesn't force the
hardware tools to actually use a dedicated ROM (or RAM) component.
The values for the ROM are read from a file, which we generate by
dumping an array from zlib as binary.
Top entity
crc32 :: Signal (Unsigned 8) -> Signal (Unsigned 32)
crc32 = mealy crc32T 0
topEntity :: Signal (Unsigned 8) -> Signal (Unsigned 32)
topEntity = crc32
The topEntity is like a main function. The types of this are Signals,
which is a wrapper that Clash uses to describe synchronous (clocked)
circuits. You might notice that the crc32T function that we use to
construct a mealy machine does not use Signals, it is just a pure
function. The compiler can safely compose pure functions without
worrying about clocking. Clash uses this to allow things like
composing mealy machines.
Testbench
testInput :: Signal (Unsigned 8)
testInput = stimuliGenerator $(v [0 :: Unsigned 8, 79,
112,116,105,118,101,114])
expectedOutput :: Signal (Unsigned 32) -> Signal Bool
expectedOutput = outputVerifier $(v [0 :: Unsigned 32,
3523407757, 2814004990, 3634756580, 584440114, 781824552,
1593772748, 2102791115])
Writing a test is as simple as providing some testInput (here, the word
"Optiver" in ASCII), and the expectedOutput.
This will eventually be used to generate a circuit that we can use in hardware
simulation, but first …
Testing in the Clash REPL
*CRC32> sampleN 9 $ expectedOutput (topEntity testInput)
[False,False,False,False,False,False,False,False,
cycle(system1000): 8, outputVerifier
expected value: 2102791115, not equal to actual value:
1623372462
True]
Regarding the "expected value not equal to actual value" error, the Clash tutorial says:
“We can see that for the first N samples, everything is working as expected, after which
warnings are being reported. The reason is that stimuliGenerator will keep on
producing the last sample, ... while the outputVerifier will keep on expecting the last
sample, .... In the VHDL testbench these errors won't show, as the the global clock will be
stopped after N ticks.”
We can do something that we can't normally do with VHDL: we can
test it purely in software! Loading it into clash --interactive:
Generating VHDL
*CRC32> :vhdl
[1 of 1] Compiling CRC32 ( CRC32.hs, CRC32.o )
Loading dependencies took 0.053376s
Applied 143 transformations
Normalisation took 0.145971s
Netlist generation took 0.013298s
Applied 65 transformations
Applied 111 transformations
Testbench generation took 0.505447s
Total compilation took 0.752854s
Ok, finally it's time to generate some VHDL:
Simulation
• The done signal indicates the test passed, finished that it
finished.
• system1000 is a simulated 1000ns clock,
and system1000_rstn is a reset line.
• eta_i1[7:0] is the 8 bit input, which we display as ASCII
("Optiver"), and topLet_0[31:0] is the CRC32 output.
Elaborated design
• The elaborated design is the logical design
that Vivado generates after interpreting the
VHDL. We can see that it includes a ROM in
the middle, a few XORs and a right shift.
Synthesis
Finally, we can synthesize an actual design
using gates.
• 8 bits of input on the left, and 32 bits of
output on the right
• In the middle is a whole heap of lookup
tables and flip-flops.
• What happened to the ROM?
Although the FPGA does contain some "block RAM" which could be used to
store the crc table, for this little circuit Vivado actually found it more
efficient/better to implement the lookup as a collection of one-bit lookup
tables of different sizes (variously 3, 4, 5, or 6 inputs).
What happened to the ROM?
Good luck working backwards from the synthesized design to the logical
circuit!
Conclusion
The generated VHDL is quite readable, and the source
Haskell is far easier to reason about than VHDL.
Agile hardware development:
• The automated testbench generation is useful, as that
part of VHDL design is often quite tedious
• You can run tests in the software interpreter, even
before generating VHDL
The world’s finest imperative programming language is
also useful for implementing in hardware.
FPGA design with CλaSH Conrad Parker <conrad@metadecks.org>

More Related Content

What's hot

Vlsi lab manual exp:1
Vlsi lab manual exp:1Vlsi lab manual exp:1
Vlsi lab manual exp:1
komala vani
 
Lecture 2 verilog
Lecture 2   verilogLecture 2   verilog
Lecture 2 verilog
venravi10
 
Vlsi lab manual exp:2
Vlsi lab manual exp:2Vlsi lab manual exp:2
Vlsi lab manual exp:2
komala vani
 

What's hot (20)

Programs of VHDL
Programs of VHDLPrograms of VHDL
Programs of VHDL
 
Vectorization in ATLAS
Vectorization in ATLASVectorization in ATLAS
Vectorization in ATLAS
 
HKG15-211: Advanced Toolchain Usage Part 4
HKG15-211: Advanced Toolchain Usage Part 4HKG15-211: Advanced Toolchain Usage Part 4
HKG15-211: Advanced Toolchain Usage Part 4
 
Vhdl programming
Vhdl programmingVhdl programming
Vhdl programming
 
Klee and angr
Klee and angrKlee and angr
Klee and angr
 
VLSI Lab manual PDF
VLSI Lab manual PDFVLSI Lab manual PDF
VLSI Lab manual PDF
 
VHDL Packages, Coding Styles for Arithmetic Operations and VHDL-200x Additions
VHDL Packages, Coding Styles for Arithmetic Operations and VHDL-200x AdditionsVHDL Packages, Coding Styles for Arithmetic Operations and VHDL-200x Additions
VHDL Packages, Coding Styles for Arithmetic Operations and VHDL-200x Additions
 
VHDL CODES
VHDL CODES VHDL CODES
VHDL CODES
 
VLSI lab manual
VLSI lab manualVLSI lab manual
VLSI lab manual
 
Verilog lab manual (ECAD and VLSI Lab)
Verilog lab manual (ECAD and VLSI Lab)Verilog lab manual (ECAD and VLSI Lab)
Verilog lab manual (ECAD and VLSI Lab)
 
Vlsi lab manual exp:1
Vlsi lab manual exp:1Vlsi lab manual exp:1
Vlsi lab manual exp:1
 
Verilog overview
Verilog overviewVerilog overview
Verilog overview
 
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
 
EC6612 VLSI Design Lab Manual
EC6612 VLSI Design Lab ManualEC6612 VLSI Design Lab Manual
EC6612 VLSI Design Lab Manual
 
Good news, everybody! Guile 2.2 performance notes (FOSDEM 2016)
Good news, everybody! Guile 2.2 performance notes (FOSDEM 2016)Good news, everybody! Guile 2.2 performance notes (FOSDEM 2016)
Good news, everybody! Guile 2.2 performance notes (FOSDEM 2016)
 
Lecture 2 verilog
Lecture 2   verilogLecture 2   verilog
Lecture 2 verilog
 
Jumping into heaven’s gate
Jumping into heaven’s gateJumping into heaven’s gate
Jumping into heaven’s gate
 
Vlsi lab manual exp:2
Vlsi lab manual exp:2Vlsi lab manual exp:2
Vlsi lab manual exp:2
 
Appsec obfuscator reloaded
Appsec obfuscator reloadedAppsec obfuscator reloaded
Appsec obfuscator reloaded
 
Pythonによるカスタム可能な高位設計技術 (Design Solution Forum 2016@新横浜)
Pythonによるカスタム可能な高位設計技術 (Design Solution Forum 2016@新横浜)Pythonによるカスタム可能な高位設計技術 (Design Solution Forum 2016@新横浜)
Pythonによるカスタム可能な高位設計技術 (Design Solution Forum 2016@新横浜)
 

Similar to FPGA design with CλaSH

EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5
PRADEEP
 
Finding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated DisassemblyFinding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated Disassembly
Priyanka Aash
 
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
Positive Hack Days
 
Track c-High speed transaction-based hw-sw coverification -eve
Track c-High speed transaction-based hw-sw coverification -eveTrack c-High speed transaction-based hw-sw coverification -eve
Track c-High speed transaction-based hw-sw coverification -eve
chiportal
 
Shellcoding in linux
Shellcoding in linuxShellcoding in linux
Shellcoding in linux
Ajin Abraham
 

Similar to FPGA design with CλaSH (20)

SPARKNaCl: A verified, fast cryptographic library
SPARKNaCl: A verified, fast cryptographic librarySPARKNaCl: A verified, fast cryptographic library
SPARKNaCl: A verified, fast cryptographic library
 
EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5
 
24-02-18 Rejender pratap.pdf
24-02-18 Rejender pratap.pdf24-02-18 Rejender pratap.pdf
24-02-18 Rejender pratap.pdf
 
Verilog
VerilogVerilog
Verilog
 
Picmico
PicmicoPicmico
Picmico
 
embedded C.pptx
embedded C.pptxembedded C.pptx
embedded C.pptx
 
The CppCat Analyzer Checks TortoiseGit
The CppCat Analyzer Checks TortoiseGitThe CppCat Analyzer Checks TortoiseGit
The CppCat Analyzer Checks TortoiseGit
 
Verilog
VerilogVerilog
Verilog
 
from Binary to Binary: How Qemu Works
from Binary to Binary: How Qemu Worksfrom Binary to Binary: How Qemu Works
from Binary to Binary: How Qemu Works
 
Finding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated DisassemblyFinding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated Disassembly
 
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
 
Dot matrix display design using fpga
Dot matrix display design using fpgaDot matrix display design using fpga
Dot matrix display design using fpga
 
Track c-High speed transaction-based hw-sw coverification -eve
Track c-High speed transaction-based hw-sw coverification -eveTrack c-High speed transaction-based hw-sw coverification -eve
Track c-High speed transaction-based hw-sw coverification -eve
 
Shellcoding in linux
Shellcoding in linuxShellcoding in linux
Shellcoding in linux
 
Microcontroller part 4
Microcontroller part 4Microcontroller part 4
Microcontroller part 4
 
How Triton can help to reverse virtual machine based software protections
How Triton can help to reverse virtual machine based software protectionsHow Triton can help to reverse virtual machine based software protections
How Triton can help to reverse virtual machine based software protections
 
DEF CON 23 - Rodringo Almeida - embedded system design from electronics
DEF CON 23 - Rodringo Almeida - embedded system design from electronics DEF CON 23 - Rodringo Almeida - embedded system design from electronics
DEF CON 23 - Rodringo Almeida - embedded system design from electronics
 
Performance Evaluation & Design Methodologies for Automated 32 Bit CRC Checki...
Performance Evaluation & Design Methodologies for Automated 32 Bit CRC Checki...Performance Evaluation & Design Methodologies for Automated 32 Bit CRC Checki...
Performance Evaluation & Design Methodologies for Automated 32 Bit CRC Checki...
 
Arduino by yogesh t s'
Arduino by yogesh t s'Arduino by yogesh t s'
Arduino by yogesh t s'
 
LDPC - Low Density Parity Check Matrix
LDPC - Low Density Parity Check MatrixLDPC - Low Density Parity Check Matrix
LDPC - Low Density Parity Check Matrix
 

FPGA design with CλaSH

  • 1. FPGA design with CλaSH Conrad Parker <conrad@metadecks.org> fp-syd, February 2016
  • 2. FPGA • Essentially a bunch of logic gates and memory blocks that can be told to connect up to be whatever digital circuit you want it to be. • You load a binary file onto it that describes that configuration. • You generate that binary file using an application like Vivado, from source code that you wrote in a language like VHDL. • At this point we're back in software land – we have a programming language and some tool that can compile that down to a binary. • Flash it, reboot, and you’re executing on hardware! FPGA design with CλaSH Conrad Parker <conrad@metadecks.org>
  • 3. VHDL Unfortunately, VHDL is fairly tedious to develop: • It is more low-level than assembly, in that you are describing circuits of logic gates like AND and OR and NOT and XOR, and worrying about clock timings and which values need to be latched between stages of some pipeline you're inventing. • You don't necessarily have luxuries like floating point numbers, and the tools can be flaky. On the flip-side it is possible to create very efficient circuits: you can optimize for space by using fewer bits, and you can optimize for time by doing more things in parallel.
  • 4. Clash Clash is based on Haskell, with features of suitable for hardware design such as length-typed bit vectors (the length is part of the type declaration), primitive data flow strategies like Mealy and Moore machines, and parallel reducers. For example, the expression: fold ( + ) vec will construct a depth log n tree of adders, to sum the contents of vec.
  • 5. Clash • There is "poor support" for recursive functions, meaning that the recursion depth is explicitly limited. • Other than that, most commonly used Haskell features seem to be supported. • Haskell seems well-suited to designing parallel systems, as the language simply describes dependencies between expressions without specifying an explicit order of evaluation.
  • 6. CRC32 In this evaluation, I implemented and tested CRC32. • I wrote 32 lines of Haskell • ... which included 2 one-liners to describe some input and expected output for testing. • From that, Clash generated 742 lines (11 files) of VHDL • ... including 394 lines (5 files) for the test bench (which you would normally write by hand ...) • Vivado uses the VHDL to make a fairly simple circuit (the elaborated design) • ... and synthesis creates an optimal circuit layout.
  • 7. • The test bench result, from Vivado's hardware simulator, includes a boolean "done" signal which switches to TRUE when the test passes.
  • 8. Test data First we generate some test data, using a known-good implementation (the crc32() function in zlib). We use C to print the CRC32 for each successive letter of the input string. Running this, we get the expected output for our test data: $ gcc -std=c11 CRC32.c -o CRC32 -lz $ ./CRC32 Optiver 3523407757, 2814004990, 3634756580, 584440114, 781824552, 1593772748, 2102791115,
  • 9. #include <stdio.h> #include <zlib.h> unsigned long generate_crc32(const unsigned char * buf, unsigned int len) { unsigned long crc_init = crc32(0L, Z_NULL, 0); unsigned long crc_0 = crc32(crc_init, "0", 1); return crc32(crc_0, buf, len); } int main (int argc, char *argv[]) { unsigned int len = 0; if (argc < 2) return 1; for (len=0; argv[1][len] != '0'; len++) printf ("%ld, ", generate_crc32((const unsigned char*)argv[1], len)); printf ("n"); return 0; }
  • 10. Mealy machine crc32T :: Unsigned 32 -> Unsigned 8 -> (Unsigned 32, Unsigned 32) crc32T prevCRC c = (prevCRC', o) where prevCRC' = crc32Step prevCRC c o = prevCRC crc32 :: Signal (Unsigned 8) -> Signal (Unsigned 32) crc32 = mealy crc32T 0 An abstraction that hardware designers use to describe a component that has an input and output and keeps some state. • Our crc32 function is a mealy machine with an initial state of 0. • The transfer function crc32T updates the state (an Unsigned 32) using an input Unsigned 8, producing a tuple of the new state and the output (also an Unsigned 32, being the CRC up to the previous character).
  • 11. crc32Step :: Unsigned 32 -> Unsigned 8 -> Unsigned 32 crc32Step prevCRC c = flipAll (tblEntry `xor` (crc `shiftR` 8)) where tblEntry = crc32Table (truncateB crc `xor` c) crc = flipAll prevCRC flipAll = xor 0xffffffff The crc32Step function does the actual CRC32 calculation, being a bunch of XORs and bit shifts. This is directly adapted from the pure Haskell implementation in Data.Digest.Pure.CRC32, using Clash functions like truncateB. The target bit width of truncateB does not need to be specified, it is inferred by the context: Here, the result of truncation has to be xor'd with the input character c, which is an Unsigned 8, so crc gets truncated to 8 bits.
  • 12. Async ROM crc32Table :: Unsigned 8 -> Unsigned 32 crc32Table = unpack . asyncRomFile d256 "crc32_table.bin" That 8 bit value is used to index into crc32Table, which we implement as an async ROM. Async means that you can read it without waiting a clock cycle; ROM of course means that it's read-only, but that doesn't force the hardware tools to actually use a dedicated ROM (or RAM) component. The values for the ROM are read from a file, which we generate by dumping an array from zlib as binary.
  • 13. Top entity crc32 :: Signal (Unsigned 8) -> Signal (Unsigned 32) crc32 = mealy crc32T 0 topEntity :: Signal (Unsigned 8) -> Signal (Unsigned 32) topEntity = crc32 The topEntity is like a main function. The types of this are Signals, which is a wrapper that Clash uses to describe synchronous (clocked) circuits. You might notice that the crc32T function that we use to construct a mealy machine does not use Signals, it is just a pure function. The compiler can safely compose pure functions without worrying about clocking. Clash uses this to allow things like composing mealy machines.
  • 14. Testbench testInput :: Signal (Unsigned 8) testInput = stimuliGenerator $(v [0 :: Unsigned 8, 79, 112,116,105,118,101,114]) expectedOutput :: Signal (Unsigned 32) -> Signal Bool expectedOutput = outputVerifier $(v [0 :: Unsigned 32, 3523407757, 2814004990, 3634756580, 584440114, 781824552, 1593772748, 2102791115]) Writing a test is as simple as providing some testInput (here, the word "Optiver" in ASCII), and the expectedOutput. This will eventually be used to generate a circuit that we can use in hardware simulation, but first …
  • 15. Testing in the Clash REPL *CRC32> sampleN 9 $ expectedOutput (topEntity testInput) [False,False,False,False,False,False,False,False, cycle(system1000): 8, outputVerifier expected value: 2102791115, not equal to actual value: 1623372462 True] Regarding the "expected value not equal to actual value" error, the Clash tutorial says: “We can see that for the first N samples, everything is working as expected, after which warnings are being reported. The reason is that stimuliGenerator will keep on producing the last sample, ... while the outputVerifier will keep on expecting the last sample, .... In the VHDL testbench these errors won't show, as the the global clock will be stopped after N ticks.” We can do something that we can't normally do with VHDL: we can test it purely in software! Loading it into clash --interactive:
  • 16. Generating VHDL *CRC32> :vhdl [1 of 1] Compiling CRC32 ( CRC32.hs, CRC32.o ) Loading dependencies took 0.053376s Applied 143 transformations Normalisation took 0.145971s Netlist generation took 0.013298s Applied 65 transformations Applied 111 transformations Testbench generation took 0.505447s Total compilation took 0.752854s Ok, finally it's time to generate some VHDL:
  • 17. Simulation • The done signal indicates the test passed, finished that it finished. • system1000 is a simulated 1000ns clock, and system1000_rstn is a reset line. • eta_i1[7:0] is the 8 bit input, which we display as ASCII ("Optiver"), and topLet_0[31:0] is the CRC32 output.
  • 18. Elaborated design • The elaborated design is the logical design that Vivado generates after interpreting the VHDL. We can see that it includes a ROM in the middle, a few XORs and a right shift.
  • 19. Synthesis Finally, we can synthesize an actual design using gates. • 8 bits of input on the left, and 32 bits of output on the right • In the middle is a whole heap of lookup tables and flip-flops. • What happened to the ROM?
  • 20. Although the FPGA does contain some "block RAM" which could be used to store the crc table, for this little circuit Vivado actually found it more efficient/better to implement the lookup as a collection of one-bit lookup tables of different sizes (variously 3, 4, 5, or 6 inputs). What happened to the ROM? Good luck working backwards from the synthesized design to the logical circuit!
  • 21. Conclusion The generated VHDL is quite readable, and the source Haskell is far easier to reason about than VHDL. Agile hardware development: • The automated testbench generation is useful, as that part of VHDL design is often quite tedious • You can run tests in the software interpreter, even before generating VHDL The world’s finest imperative programming language is also useful for implementing in hardware. FPGA design with CλaSH Conrad Parker <conrad@metadecks.org>