4. Implementation Choices
Digital Circuit Implementation Approaches
Full Custom
Semicustom
– Cell-based
» Standard Cells
» Compiled Cells
» Macro Cells
– Array-based
» Pre-diffused (Gate Arrays)
» Pre-wired (FPGAs)
5. Full-Custom Design Methodology
Design a chip from scratch.
Custom mask layers are created in order to fabricate
a full-custom IC
Cost can be amortized over a large volume
Ex: Microprocessors and memory chips
Cost is not the primary concern
Ex: Supercomputers and defense applications
Engineers design some or all of the logic cells, circuits,
and the chip layout specifically for a full-custom IC.
Advantages
complete flexibility, high degree of
optimization in performance and area.
Reused many times (Ex: library cell)
Disadvantages
large amount of design effort, expensive.
Solution
Semi-custom design approaches
6. Full-custom design methodology
Early days of digital microelectronics
High performance and design density
– Handcrafting circuit topology and physical design
Advances in the design automation
Share of custom design shrinks from year to year
– Even in high-performance processors
Almost all modules are designed using semi-custom approaches
Only performance-critical modules use full custom
» Phase-locked loops and clock buffers
7. Semi-Custom Design Methodology
Variety of semi-custom design
approaches
To reduce design time
To automate design process
Penalty
Reduced integration density and/or
performance
9. Transition to Automation and Regular Structures
Intel 4004 ('71)
Intel 8080, Intel 8085
Intel 80286, Intel 80486
Courtesy Intel
10. Standard Cell-based Design
Routing channel requirements are reduced by the presence of more interconnect layers
[Figure: Standard-cell layout methodology. Labeled parts: rows of cells, logic cell, feedthrough cell, routing channel, functional module (RAM, multiplier, …)]
11. Standard Cell-based Design
Standardizes the design entry at the gate
level
Design capture
– using schematic entry
– Generated automatically using HDL
Library containing a wide selection of logic gates
over a wide range of fan-in and fan-out
– INV, AND/NAND, XOR/XNOR, and flip-flops (FFs)
Complex functions
– Full adders, comparators, Mux, counters, decoders
Library cells
– Same height
– Width varies depending on the functionality
12. Standard Cell-based Design
Layout is generated automatically
Cells are placed in rows separated by routing channels
All the cells should have the same height
Cell width varies depending on the complexity
Cells are interconnected using routing channels
Reducing routing overhead is a challenging goal
Feed-through cells
Adding more interconnect layers
Library cells
Small library
– Cells with limited fan-in
Large library
– Many versions of each cell
Sized for different driving strength
Performance
Power consumption levels
Generation and detailed documentation of cell library
Functionality
Characterization of delay and power as a function of load
capacitance and input rise and fall times
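The characterization described above is commonly stored as a two-dimensional table indexed by load capacitance and input rise/fall time. The following is a minimal sketch of how such a table might be queried by bilinear interpolation. The grid values, the units, and the names `LOADS_PF`, `SLEWS_NS`, `DELAY_NS`, and `lookup_delay` are illustrative assumptions, not data from any real cell library.

```python
from bisect import bisect_right

# Hypothetical characterization grid (illustrative numbers only).
LOADS_PF = [0.01, 0.05, 0.10]   # load capacitance C, in pF
SLEWS_NS = [0.1, 0.5, 1.0]      # input rise/fall time T, in ns
# DELAY_NS[i][j] is the delay at LOADS_PF[i] and SLEWS_NS[j].
DELAY_NS = [
    [0.05, 0.09, 0.14],
    [0.12, 0.16, 0.21],
    [0.20, 0.24, 0.29],
]

def _bracket(axis, x):
    """Return k such that [axis[k], axis[k+1]] is the grid interval used for x.

    The index is clamped, so points outside the grid are extrapolated
    linearly from the nearest interval.
    """
    k = bisect_right(axis, x) - 1
    return max(0, min(k, len(axis) - 2))

def lookup_delay(c_pf, t_ns):
    """Bilinear interpolation of the delay table at (c_pf, t_ns)."""
    i = _bracket(LOADS_PF, c_pf)
    j = _bracket(SLEWS_NS, t_ns)
    u = (c_pf - LOADS_PF[i]) / (LOADS_PF[i + 1] - LOADS_PF[i])
    v = (t_ns - SLEWS_NS[j]) / (SLEWS_NS[j + 1] - SLEWS_NS[j])
    d00, d01 = DELAY_NS[i][j], DELAY_NS[i][j + 1]
    d10, d11 = DELAY_NS[i + 1][j], DELAY_NS[i + 1][j + 1]
    return (d00 * (1 - u) * (1 - v) + d01 * (1 - u) * v
            + d10 * u * (1 - v) + d11 * u * v)
```

A call such as `lookup_delay(0.03, 0.3)` returns a value between the four surrounding grid entries. A finer grid in the library generally means less interpolation error but more characterization effort.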
13. Standard Cell - Example
[Brodersen92]
14. Standard Cell – The New Generation
Cell-structure
hidden under
interconnect layers
15. Standard Cell - Example
3-input NAND cell
(from ST Microelectronics):
C = Load capacitance
T = input rise/fall time
16. Compiled cells
Implementing and characterizing a library
of cells
Expensive
Technology changes
Need to prepare layout for complete library
Need for characterization of complete library
Automated approaches
Generation of cell layouts on the fly for the given
transistor netlist
17. A Historical Perspective: the PLA
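[Figure: PLA structure. Inputs x0, x1, x2 feed the AND plane, which forms the product terms (labels x0x1 and x2 visible); the OR plane combines the product terms into outputs f0 and f1.]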
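The two-plane structure can be sketched in a few lines: the AND plane forms product terms from the inputs, and the OR plane combines selected product terms into each output. The particular product terms and output wiring below are hypothetical and are not taken from the figure; `AND_PLANE`, `OR_PLANE`, and `evaluate_pla` are illustrative names.

```python
# AND plane: each product term maps an input index to the value it requires
# (1 = input used in true form, 0 = input used in complemented form).
# Inputs that are not listed are don't-cares for that term.
AND_PLANE = [
    {0: 1, 1: 1},   # p0 = x0 AND x1
    {2: 1},         # p1 = x2
    {0: 0, 2: 0},   # p2 = (NOT x0) AND (NOT x2)
]

# OR plane: each output lists the product terms it ORs together.
OR_PLANE = [
    [0, 1],         # f0 = p0 OR p1
    [2],            # f1 = p2
]

def evaluate_pla(inputs):
    """Evaluate the PLA for a list of input bits [x0, x1, x2]."""
    products = [
        all(inputs[i] == want for i, want in term.items())
        for term in AND_PLANE
    ]
    return [int(any(products[p] for p in terms)) for terms in OR_PLANE]
```

Reprogramming the function means changing only the two plane descriptions, which mirrors how a PLA keeps a fixed, regular layout while the logic varies.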
20. Breathing Some New Life in PLAs
River PLAs
A cascade of multiple-output PLAs.
Adjacent PLAs are connected via river routing.
[Figure: River PLA layout, a cascade of PLA stages with PRE-CHARGE and BUFFER blocks around each stage]
• No placement and routing needed.
• Output buffers and the input buffers
of the next stage are shared.
Courtesy B. Brayton
21. Experimental Results
Layout of C2670
[Figure: four layouts of C2670: network of PLAs (4 layers, OTC); River PLA (2 layers, no additional routing); standard cell (2 layers, channel routing); standard cell (3 layers, OTC). A plot of delay versus area compares SC, NPLA, and RPLA.]
Area:
RPLAs (2 layers) 1.23
SCs (3 layers) 1.00
NPLAs (4 layers) 1.31
Delay:
RPLAs 1.04
SCs 1.00
NPLAs 1.09
Synthesis time: for RPLAs, synthesis time equals design time;
SCs and NPLAs still need P&R.
Also: RPLAs are regular and predictable
23. Macrocells
Complex cells
Multipliers, data paths, memories and embedded microprocessors,
DSPs
Hard macro
– Module with given functionality and a predetermined physical
implementation
– Represents a custom design of the requested function
– Can be parameterized
– Predictable performance and power dissipation
Soft macro
– Module with given functionality without a specific physical
implementation
– Timing can be determined after the final synthesis and routing
– Soft-macro cell generators
– Gives a netlist for the given function and the requested parameter values
27. SEMICUSTOM DESIGN FLOW
Design capture
Schematic, HDLs like VHDL, Verilog, SystemC
Logic Synthesis
Translates modules into a netlist
Prelayout Simulation and Verification
Design is checked for correctness
Performance analysis
– Estimated parasitic and layout parameters
Floor planning
Schematic representation of tentative placement of its major
functional blocks
– Based on estimated module sizes
– Global power and clock distribution network is also conceived at
this time
28. SEMICUSTOM DESIGN FLOW
Placement
Precise positioning of the cells is decided
Routing
The interconnections between cells and blocks are wired
Extraction
Model of chip is generated from the physical layout
– precise device sizes, device parasitics, and interconnect parasitics
Post layout simulation and verification
Verification of functionality and performance in the presence of
layout parasitics
– If the required functionality or performance is not met
Repeat the design steps
» Floorplanning, Placement, Routing
Tape out
Binary file needed for mask generation after verifying the design for
the given specification
– To ASIC vendor or foundry
32. More Complex PAL
From Smith97
[Figure: programmable AND array (2i × jk) feeding k macrocells; each macrocell contains a j-wide OR array over product terms 1 to j and a D flip-flop (D, Q, CLK) driving OUT; callouts A, B, C mark parts of the figure.]
i inputs, j minterms/macrocell, k macrocells
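The array dimensions quoted on this slide lend themselves to a quick size estimate: 2i rows (each input is available in true and complemented form) by jk columns (j product terms for each of k macrocells). A small sketch follows; the function name `pal_and_array_size` and the example parameter values are illustrative assumptions, not the parameters of any specific device.

```python
def pal_and_array_size(i, j, k):
    """Return (rows, columns, programmable points) of the AND array.

    i: number of inputs
    j: product terms (minterms) per macrocell
    k: number of macrocells
    """
    rows = 2 * i   # each input in true and complemented form
    cols = j * k   # j product terms for each of k macrocells
    return rows, cols, rows * cols

# Hypothetical example: 16 inputs, 8 product terms per macrocell, 8 macrocells.
rows, cols, points = pal_and_array_size(16, 8, 8)
```

The product `rows * cols` grows with both the input count and the total number of product terms, which is one reason large PALs are partitioned into smaller blocks.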
33. 2-input mux as programmable logic block
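[Figure: 2-input multiplexer with data input A (selected when S = 0), data input B (selected when S = 1), select input S, and output F]
Configuration (X' denotes the complement of X)
A   B   S   F =
0   0   0   0
0   X   1   X
0   Y   1   Y
0   Y   X   XY
X   0   Y   XY'
Y   0   X   X'Y
Y   1   X   X + Y
1   0   X   X'
1   0   Y   Y'
1   1   1   1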
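The configuration table can be checked mechanically by modeling the multiplexer as F = A when S = 0 and F = B when S = 1, then wiring A, B, and S to constants or to X and Y. The sketch below does this for all ten configurations; the names `mux`, `CONFIGS`, and `check_all` are illustrative, and the complemented entries follow from the multiplexer equation rather than from any additional source.

```python
from itertools import product

def mux(a, b, s):
    """2-input multiplexer: output A when S = 0, output B when S = 1."""
    return b if s else a

# Each entry: (function name, wiring of (A, B, S) in terms of X and Y,
#              expected output as a function of X and Y).
CONFIGS = [
    ("0",           lambda x, y: (0, 0, 0), lambda x, y: 0),
    ("X",           lambda x, y: (0, x, 1), lambda x, y: x),
    ("Y",           lambda x, y: (0, y, 1), lambda x, y: y),
    ("X AND Y",     lambda x, y: (0, y, x), lambda x, y: x & y),
    ("X AND NOT Y", lambda x, y: (x, 0, y), lambda x, y: x & (1 - y)),
    ("NOT X AND Y", lambda x, y: (y, 0, x), lambda x, y: (1 - x) & y),
    ("X OR Y",      lambda x, y: (y, 1, x), lambda x, y: x | y),
    ("NOT X",       lambda x, y: (1, 0, x), lambda x, y: 1 - x),
    ("NOT Y",       lambda x, y: (1, 0, y), lambda x, y: 1 - y),
    ("1",           lambda x, y: (1, 1, 1), lambda x, y: 1),
]

def check_all():
    """Return True if every configuration matches its expected function."""
    for _name, wiring, expected in CONFIGS:
        for x, y in product((0, 1), repeat=2):
            a, b, s = wiring(x, y)
            if mux(a, b, s) != expected(x, y):
                return False
    return True
```

Because a single multiplexer covers AND, OR, and inversion, a small number of such blocks is enough to build arbitrary combinational logic.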
34. Logic Cell of Actel Fuse-Based FPGA
[Figure: Actel logic cell with data inputs A, B, C, D, select inputs SA, SB, S0, S1, and output Y]
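The figure itself did not survive extraction, so the structure below is an assumption based on the commonly described Actel ACT 1 logic module: one multiplexer chooses between A and B under SA, a second chooses between C and D under SB, and a third chooses between those two results under S0 OR S1. The input polarity (which data input is selected at 0) is also an assumption, and `actel_cell` is a hypothetical name.

```python
def mux(a, b, s):
    """2-input multiplexer: output A when S = 0, output B when S = 1."""
    return b if s else a

def actel_cell(a, b, sa, c, d, sb, s0, s1):
    """Assumed three-multiplexer logic cell with an OR gate on the final select."""
    upper = mux(a, b, sa)            # first-level mux on A, B
    lower = mux(c, d, sb)            # first-level mux on C, D
    return mux(upper, lower, s0 | s1)

def and2(x, y):
    """Example configuration: a 2-input AND built from one cell."""
    # With S0 = x: output is 0 (upper path) when x = 0, and y (lower path) when x = 1.
    return actel_cell(a=0, b=0, sa=0, c=0, d=y, sb=1, s0=x, s1=0)
```

Tying inputs to constants or to signals in different ways yields other functions from the same cell, in the same spirit as the single-multiplexer configurations on the previous slide.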
35. Look-up Table Based Logic Cell
[Figure: look-up table logic cell; inputs In1 and In2 address a memory whose stored bit drives Out]
Memory
In   Out
00   0
01   1
10   1
11   0
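A look-up table cell can be modeled as a small memory addressed by the input bits: programming the cell means writing the memory contents. The sketch below assumes a 2-input cell and reads the first table row as output 0, which makes the stored function an XOR; `make_lut` and `xor_cell` are illustrative names.

```python
def make_lut(memory_bits):
    """Return a 2-input logic cell whose behavior is set by 4 stored bits.

    memory_bits[n] is the output for the input pair whose binary value is n,
    with In1 as the most significant bit.
    """
    if len(memory_bits) != 4:
        raise ValueError("a 2-input LUT needs exactly 4 memory bits")

    def cell(in1, in2):
        return memory_bits[(in1 << 1) | in2]

    return cell

# Memory contents for inputs 00, 01, 10, 11 (an XOR).
xor_cell = make_lut([0, 1, 1, 0])

# The same hardware becomes an AND gate with different memory contents.
and_cell = make_lut([0, 0, 0, 1])
```

Any of the 16 possible two-input functions can be obtained this way, at the cost of storing 2^n bits for an n-input cell.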
36. LUT-Based Logic Cell
Courtesy Xilinx
[Figure: LUT-based logic cell. Visible labels: C1....C4, D1 to D4, F1 to F4, H, P, three "Logic function of ..." blocks, "Bits control", and "Multiplexer Controlled by Configuration Program"; the remaining labels were lost in extraction.]
Xilinx 4000 Series
Figure must be
updated
38. Altera MAX Interconnect Architecture
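[Figure: two interconnect styles. Array-based (MAX 3000-7000): LABs (LAB1, LAB2, LAB6 shown) connect through a central PIA with delay tPIA. Mesh-based (MAX 9000): LABs connect through row channels and column channels.]
Courtesy Altera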
39. Field-Programmable Gate Arrays
Fuse-based
[Figure: fuse-based FPGA floorplan with rows of logic modules separated by routing channels, vertical routes, I/O buffers around the periphery, and a Program/Test/Diagnostics block]
Standard-cell like floorplan