The document discusses different computational architectures including scalar, SIMD, CGRA, and neuromorphic systems. It focuses on IMAX, a dataflow-centric coarse-grained reconfigurable array (CGRA), and its scalability: with four 64-unit IMAX modules per lane and 30 HBM2 ports, 307,200 operations can be mapped at once. The document also discusses micro, medium, and macro pipelining strategies for combining IMAX with HBM2 memory.
The Computing Architecture Laboratory at Nara Institute of Science and Technology is now targeting power-efficient computers to help suppress global warming. I present this video to all hungry engineers who are tired of CPUs, GPUs, FPGAs, tensor cores, and AI cores, who want a challenge with no black box inside, and who want to improve things by themselves. This video follows episode 11 and focuses on the scalability of IMAX.
Let's scale it up. A scalar processor also has SIMD instructions for about 32 elements. Increasing the number of cores increases performance; however, if you don't program it well, you will get many cache misses and poor performance. SIMD units with 256 or more elements are called vector units, and there are two types. Vector type 1 is connected to cache memory; since the cache memory is small, the number of elements per vector operation is only about 256. Vector type 2 is directly connected to the main memory, and the number of elements can be increased up to about 2,048. The CGRA has various configurations. This diagram shows a sandwich structure of ALUs and 64 kilobytes of local memory. The number of elements that can be handled at once is now 16,000. By absorbing irregular memory references in local memory, the main memory can keep running at high speed with only regular accesses. You can also concatenate multiple memory spaces into the pipeline to build a longer pipeline.
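The idea of absorbing irregular references in local memory can be sketched in software. This is a minimal model, not the actual IMAX hardware: a hypothetical gather stage handles the irregular index pattern once, into a small local buffer, and the compute stage then sees only unit-stride access, so the main-memory side is touched only with regular, streaming reads.

```python
def gather(main_mem, indices):
    """Stage 1: absorb the irregular reference pattern into local memory."""
    return [main_mem[i] for i in indices]

def compute(local_mem):
    """Stage 2: regular, unit-stride compute over local memory (the ALU row)."""
    return [2 * x + 1 for x in local_mem]

main_mem = list(range(16))               # data in main memory
lmem = gather(main_mem, [7, 0, 3, 1])    # irregular accesses land here only
out = compute(lmem)                      # downstream access is fully regular
```

The point of the split is that only `gather` ever sees the scattered index pattern; everything after it streams sequentially, which is what keeps the main-memory channel fast.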
As in episode 1, HBM2 can provide multiple AXI buses for scaling up IMAX. This is the case of simply increasing the number of lanes.
In addition to micro pipelining, medium pipelining is available within each lane. Double buffering in local memory can isolate the stages of FFT, merge sort, and so on.
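The double-buffering scheme can be sketched as follows. This is a minimal software model under my own assumptions, not IMAX code: two local-memory buffers alternate roles each step, so while one buffer is being filled, the compute stage drains the other, and the two stages never touch the same buffer in the same step.

```python
def double_buffered(blocks, stage_fn):
    """Run stage_fn over a stream of blocks using two alternating buffers."""
    bufs = [None, None]
    out = []
    for step, block in enumerate(blocks):
        fill = step % 2          # buffer being loaded this step
        drain = 1 - fill         # buffer being computed this step
        bufs[fill] = block       # load stage fills one buffer...
        if step > 0:
            out.append(stage_fn(bufs[drain]))  # ...while compute drains the other
    # drain the final buffer after the input stream ends
    out.append(stage_fn(bufs[(len(blocks) - 1) % 2]))
    return out
```

In hardware the two roles run concurrently rather than in sequence, which is exactly how the buffering isolates one pipeline stage from the next.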
Furthermore, multiple lanes can be concatenated through HBM2 in this way. This configuration combines micro, medium, and macro pipelining all together.
One lane can support four IMAX modules, each with 64 units, so 10,240 operations can be mapped onto one lane. If 30 ports are available in HBM2, 307,200 operations can be mapped at once. One IMAX module will occupy 1.2 square millimeters. If we can fabricate with 8-nanometer technology, 120 IMAX modules will occupy 144 square millimeters, a quarter of a high-end GPGPU.
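The scaling arithmetic from the narration can be written out as a quick check. The per-unit figure of 40 operations is my own inference from the stated numbers (10,240 operations over 4 × 64 = 256 units per lane), not something the narration states directly.

```python
units_per_module = 64
modules_per_lane = 4
ops_per_lane = 10240                      # figure stated in the narration
hbm2_ports = 30

units_per_lane = units_per_module * modules_per_lane   # 256 units
ops_per_unit = ops_per_lane // units_per_lane          # inferred: 40 ops/unit
total_ops = ops_per_lane * hbm2_ports                  # 307,200 ops at once

module_area_mm2 = 1.2
total_area_mm2 = 120 * module_area_mm2                 # 144 mm^2 for 120 modules
```

So the 307,200-operation figure follows directly from 30 lanes of 10,240 operations, and 120 modules at 1.2 mm² each give the quoted 144 mm².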
This is a top-down approach from the view of the application. Suppose that various data are located in the main memory and processed in a pipelined manner. To avoid interference among multiple data flows in the main memory, the intermediate data should be stored outside the main memory. The ring structure and the local memory of IMAX can be employed to build pipelines outside the main memory.
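The idea of keeping intermediate data out of main memory can be modeled with a chained pipeline. In this sketch (my own illustration, not IMAX code), generator stages pass values directly stage-to-stage, playing the role of the ring and local memory, so only the input and the final result ever reside in the main-memory array.

```python
def stage_scale(src, k):
    """First pipeline stage: scale each element."""
    for x in src:
        yield x * k          # intermediate value flows onward, never stored back

def stage_offset(src, b):
    """Second pipeline stage: add an offset."""
    for x in src:
        yield x + b

main_memory = list(range(8))                            # input in main memory
pipe = stage_offset(stage_scale(iter(main_memory), 2), 1)
result = list(pipe)                                     # only the output lands back
```

Because the stages are chained, there is no shared buffer between them that a second data flow could interfere with, which is the property the ring structure provides in hardware.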
This is an example. A CPU can also be employed to cover complicated functions with many conditional branches.
IMAX is ready on some FPGA boards. The bin files and SD-card images on our web site provide you with real CGRAs. The HBM2 version and the VMK version will appear soon.
Our web site has links to the documents and tools. Note that only the Verilog code is not included. I hope IMAX can contribute to stopping global warming. Thank you for your attention.