Design and implementation of a neural-network-based image compression engine, developed as a Final Year Project by Jesu Joseph and Shibu Menon at Nanyang Technological University. The project received the highest possible grade and strong commendation from the research center.
17. STEPS
STEP 1: Find the closest (winning) neuron c:
||X(t) - W_c(t)|| = min_i {||X(t) - W_i(t)||}
STEP 2: Update the weight of the winning neuron and of the neurons in its topological neighborhood N_c(t):
W_i(t+1) = W_i(t) + α(t)·{X(t) - W_i(t)} for i ∈ N_c(t)
Iterate STEP 1 and STEP 2.
Parallel architecture for image compression
Introduction | Algorithm | Architecture | Results | Conclusion | Q&A
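The two steps above can be sketched in software. This is a minimal illustration of the self-organizing-map iteration, not the project's parallel hardware implementation; the 1-D neighborhood and the function name are assumptions for the example.

```python
import numpy as np

def som_step(X, W, alpha, radius):
    """One iteration of STEP 1 and STEP 2 (illustrative sketch only).
    X: input vector; W: (num_neurons, dim) codebook, updated in place."""
    # STEP 1: the winning neuron c minimizes ||X - W_i||
    dists = np.linalg.norm(W - X, axis=1)
    c = int(np.argmin(dists))
    # STEP 2: update the winner and its topological neighborhood N_c
    for i in range(len(W)):
        if abs(i - c) <= radius:   # 1-D neighborhood (an assumption)
            W[i] += alpha * (X - W[i])
    return c
```

In training, this step is iterated over all input vectors while alpha and the neighborhood radius shrink over time.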
25. ARCHITECTURAL NOVELTIES
Implementation of 7-bit learning: 7-bit learning, the mapping of pixels to an octal space, and the encoding of the MSB plane along with the image are new theoretical ideas that we implemented in hardware. This mode should theoretically produce better-quality images than 8-bit mode, because the neurons are packed more closely in a smaller space, so each pixel elicits a stronger response from the network structure.
Implementation of the 8-bit and the 7-bit learning algorithms: The same hardware can process an image in both 7-bit and 8-bit modes; a single push-button switch on the FPGA board selects the mode for a cycle. This is useful because some images give better output in 7-bit mode than in 8-bit mode, or vice versa, and the two can be compared in later studies. The design also anticipates future upgrades: a module could be added that computes the mean-square error of both the 7-bit and 8-bit images and selects the better one.
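The 7-bit idea can be illustrated with a small sketch. The assumption here, inferred from the description above, is that the MSB of each 8-bit pixel is stored as a separate bit plane and the network learns on the remaining 7 bits; the function names are hypothetical.

```python
def split_msb_plane(pixels):
    """Split 8-bit pixels into an MSB plane and 7-bit values (sketch of
    the assumed scheme, not the project's hardware encoder)."""
    msb_plane = [(p >> 7) & 1 for p in pixels]   # one bit per pixel
    low7 = [p & 0x7F for p in pixels]            # values in 0..127
    return msb_plane, low7

def merge_msb_plane(msb_plane, low7):
    # Reconstruct the original 8-bit pixels from the two parts
    return [(m << 7) | p for m, p in zip(msb_plane, low7)]
```

The 7-bit values occupy half the range of the 8-bit originals, which is why the codebook neurons sit closer together in that space.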
26. ARCHITECTURAL NOVELTIES
Integration of the encoding hardware with the learning hardware: Integrating encoding and learning ensures faster compression and reduces the hardware overhead. This was done with the future practical application of the hardware in mind: real-time video compression rather than stand-alone images only.
Implementation of a variable learning rate: A variable learning rate (using rates of 1/2, 1/4, ...) is a novel feature of this architecture. It ensures that neighbors are updated according to their distance from the winning neuron rather than by a fixed amount: five ranges of distance from the winner are distinguished, with the update factors derived through theoretical calculations.
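A sketch of the variable-rate idea: rates that are powers of two (1/2, 1/4, ...) reduce the hardware multiply to a right shift. The five distance thresholds below are illustrative placeholders, not the project's theoretically calculated values, and the function names are hypothetical.

```python
def neighbor_alpha(dist):
    """Learning rate for a neighbor at the given distance from the winner
    (illustrative sketch; thresholds are assumptions)."""
    ranges = [(1, 1/2), (2, 1/4), (4, 1/8), (8, 1/16), (16, 1/32)]
    for limit, alpha in ranges:
        if dist <= limit:
            return alpha
    return 0.0  # outside the neighborhood: no update

def update_weight(w, x, dist):
    # In hardware, multiplying by 1/2**k is a right shift by k bits
    return w + neighbor_alpha(dist) * (x - w)
```

Restricting the rates to powers of two is what makes the scheme cheap on an FPGA: each distance range maps to a fixed shift amount instead of a multiplier.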
27. ARCHITECTURAL NOVELTIES
Implementation of a learning rate that depends on the frequency count: The frequency-count value is calculated so that all neurons get an equal chance of being the winner. At the same time, the algorithm ensures that a neuron that has won most often is not updated as much as the less lucky ones. This is not seen in other similar algorithms.
Implementation of neighbor updating: Updating the neighbors together with the winner is another novel feature of our algorithm. It makes the design more complicated, but the output quality is considerably improved compared with other architectures.
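The frequency-count idea can be sketched as follows. The scaling rule used here (rate inversely proportional to the win count) is a guess chosen for illustration; the slide does not state the project's actual formula, and the class and function names are hypothetical.

```python
class Neuron:
    def __init__(self, weight):
        self.weight = weight
        self.wins = 0  # frequency-count register

def winner_update(neuron, x, base_alpha=0.5):
    """Update a winning neuron with a rate that shrinks as its win count
    grows (assumed scaling rule, for illustration only)."""
    neuron.wins += 1
    alpha = base_alpha / neuron.wins   # hypothetical: frequent winners move less
    neuron.weight += alpha * (x - neuron.weight)
```

Whatever the exact rule, the effect described above is the same: frequent winners receive smaller updates, so rarely winning neurons catch up and every neuron gets a fair share of the input space.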
37. Synthesis
Synthesis flow (diagram): technology libraries, Verilog code, and constraints are fed to the synthesis tool (Xilinx ISE Series 4.1i), which produces the prototype model, the schematic, and the optimized net-list; in-signal and out-signal files are used for simulation.
38. Synthesis
==================
Chip top-optimized
==================
Summary Information:
--------------------
Type: Optimized implementation
Source: top, up to date
Status: 0 errors, 0 warnings, 0 messages
Export: exported after last optimization
Chip create time: 0.000000 s
Chip optimize time: 598.734000 s
FSM synthesis: ONEHOT
Target Information:
-------------------
Vendor: Xilinx
Family: VIRTEX
Device: V800HQ240
Speed: -4
Chip Parameters:
----------------
Optimize for: Speed
Optimization effort: Low
Frequency: 50 MHz
Is module: No
Keep io pads: No
Number of flip-flops: 3129
Number of latches: 0
40. FPGA Implementation
Upload the configuration files and the image to the on-board memory.
Upload the FPGA bit file to the CPLD.
BAR LED 1 glows: the FPGA is configured.
Press Push Button 1 (START) to start the learning process.
BAR LED 2 glows: 2 loops completed.
BAR LED 3 glows: 4 loops completed.
BAR LED 4 glows: 6 loops completed.
BAR LED 5 glows: 10 loops completed.
BAR LED 6 glows: encoding completed.
Download the image and convert it to TIFF format.
43. Conclusion
1. The 7-bit process performs better than the 8-bit process.
2. Suitable for real-time encoding and streaming of video images (about 12 seconds at 5 MHz).
3. Use of the frequency-count register gives better images.
4. The more loops, the better the image (for 8-bit, beyond 5 loops), similar to human learning.
44. Recommendations
1. The algorithm can be modified to improve the learning time.
2. Real-time video compression with 2 parallel learning chips.
3. Both 7-bit and 8-bit modes in the same hardware.
4. MSB-plane compression.
45. Q & A