4. We forget Silicon Valley = Silicon + Valley
Silicon engineering is one of the most complex, tightly coordinated processes
humankind has ever practiced.
▪ Enormous challenges ahead as design complexity explodes.
▪ Nvidia Volta GPUs packed with 20 billion transistors.
Silicon Valley’s engineering culture has been heavily shaped by the discipline of
silicon engineering.
▪ Jeff Dean, Sanjay Ghemawat and Urs Hölzle all came from HW companies
before joining Google.
▪ Our DEVIEW keynote speaker Song also worked for DEC ☺ It proves my point.
5. SW is eating the world. But,
“People who are really serious about software should make their own hardware” –
Alan Kay / Steve Jobs
▪ There is not much distinction between HW and SW if we are serious about it.
Google, Amazon, Facebook, Microsoft, Alibaba, Baidu, Apple: everyone is trying to
build a strong silicon team because it is strategically important to go vertical,
customizing their architectures and controlling the entire stack.
▪ EX: Google TPU
6. What is our opportunity?
We are riding the big wave of the global semiconductor super-cycle.
▪ Just think about cloud datacenter, autonomous car, IoT and AR/VR, all the
electronic gadgets that will be powered by semiconductors.
It is simply the biggest driving engine of our economy, now and in the future.
▪ Global dominance in memory: 25% of the entire national exports
▪ We all know that we are relatively weak with non-memory products.
▪ SSD is in-between memory and non-memory.
▪ How about AI chips?
7. Yes We Can.
We have one of the most advanced semiconductor manufacturing facilities in the
world.
▪ TSMC vs. Samsung
We have a new generation of engineers with great potential.
▪ Global Hit Semiconductor product experiences: Mobile Application Processor
(AP), Solid State Drive (SSD)
We also have AI application and service industries of good enough size.
▪ Good testbed before launching into global products.
8. A hell of a challenge
We don’t have much experience or many success stories of enterprise-level B2B
solutions initiated by startups.
▪ Domestic market is too small. Weak ecosystem in terms of market size.
Semiconductors are a fundamentally tough business. It is not easy even for the
big players. It is very capital- and human-resource-intensive because
▪ It’s the timing business. You should be very fast.
▪ It requires extreme precision engineering. It shouldn’t fail.
9. To pull off a successful design and
sell it to the masses,
it has to be a very strategic,
orchestrated long-term effort.
Let’s go back to the fundamentals.
10. AI chip engineering
There are many aspects of AI chip design. We will mainly focus on
microarchitecture.
▪ Application
▪ Algorithm
▪ Software
▪ Microarchitecture
▪ Physical Design
▪ …
11. Rendering of GDS2 file illustrating physical structure of silicon chips
Zoom into a microchip
12. Microarchitecture = micro + architecture
Chip design companies (Ex: Qualcomm, Nvidia, FuriosaAI) pass the architecture
blueprint to the fab companies (Ex: TSMC, Samsung, GlobalFoundries).
13. Great architecture needs great architects
A great building serves people by enabling the best human activities in the most
humane manner possible, given the building materials.
A great microarchitecture serves the computation process by enabling the best
applications in the most efficient manner possible, given the silicon/power/budget.
▪ Real estate in the micro world
▪ A great architect should know the ins and outs of everything and be able to
implement the chip on schedule and within the given budget.
18. Build the performance modeling simulator
It’s a so-called cycle-accurate simulator, which simulates both the behavior and
the performance of the machine we’re building at a very fine granularity, usually
at the level of a clock cycle. This enforces the discipline of
▪ Concrete and precise thinking
▪ Data-Driven evaluation for important trade-off of design choices
The architect should have strong (or at least reasonable) SW skills to build this
simulator. An OOP language and the event-driven programming paradigm are a
natural fit for this job. C++ is the standard choice.
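A minimal sketch of the event-driven style described above. It is written in Python only to keep the example short (the slide names C++ as the standard choice). The Simulator kernel, the MacUnit model and its fixed latency are all hypothetical, made up for illustration and not taken from any real chip.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(order=True)
class Event:
    cycle: int                       # clock cycle at which the event fires
    seq: int                         # tie-breaker: same-cycle events fire in FIFO order
    action: Callable[[], None] = field(compare=False)


class Simulator:
    """Tiny event-driven kernel: pops events in cycle order and runs them."""

    def __init__(self) -> None:
        self.now = 0
        self._queue: List[Event] = []
        self._seq = 0

    def schedule(self, delay: int, action: Callable[[], None]) -> None:
        heapq.heappush(self._queue, Event(self.now + delay, self._seq, action))
        self._seq += 1

    def run(self) -> None:
        while self._queue:
            event = heapq.heappop(self._queue)
            self.now = event.cycle
            event.action()


class MacUnit:
    """Hypothetical, non-pipelined multiply-accumulate unit with a fixed latency."""

    def __init__(self, sim: Simulator, latency: int) -> None:
        self.sim = sim
        self.latency = latency
        self.acc = 0
        self.busy_until = 0

    def issue(self, a: int, b: int) -> None:
        # A new operation can only start once the previous one has finished.
        start = max(self.sim.now, self.busy_until)
        self.busy_until = start + self.latency
        self.sim.schedule(self.busy_until - self.sim.now,
                          lambda: self._complete(a, b))

    def _complete(self, a: int, b: int) -> None:
        self.acc += a * b


def dot_product_cycles(xs: List[int], ys: List[int], latency: int) -> Tuple[int, int]:
    """Return (functional result, total cycles) for a dot product on one MacUnit."""
    sim = Simulator()
    mac = MacUnit(sim, latency)
    for a, b in zip(xs, ys):
        mac.issue(a, b)
    sim.run()
    return mac.acc, sim.now
```

The same run yields both the behavior (the dot product) and the performance (the cycle count), so a design choice such as the MAC latency can be evaluated with data: change the latency argument and compare the cycle counts.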
19. Arch exploration takes time and experience.
Korean industry has neglected this part because we didn’t (or couldn’t afford
to) allocate enough time to define and explore the design space and come up
with a solid architecture specification. It takes time because
▪ Workload characterization and prediction take time.
▪ Simulation needs supercomputer-scale computation.
▪ Understanding very detailed design trade-offs just takes time.
In other words, cultivating intuition by iteratively refining it against careful,
methodical measurements takes time.
20. Time Schedule
So let’s say it takes 1.5~2 years to build a commercial AI chip from concept to
production. We need to allocate at least 6~8 months for performance modeling,
which runs in parallel with the implementation.
Phases on the schedule:
▪ Performance Modeling / Architecting
▪ RTL Implementation
▪ Software Architecting / Implementation
▪ Verification
▪ Physical Design / Manufacturing
21. Arch Examples: Quantization (suggested by Google)
▪ Aggressive operator fusion: Performing as many operations as possible in a
single pass can lower the cost of memory accesses and provide significant
improvements in run-time and power consumption
▪ Compressed memory access: One can optimize memory bandwidth by
supporting on-the-fly decompression of weights (and activations). A simple way
to do that is to support lower-precision storage of weights and possibly
activations.
▪ Lower-precision 4/8/16-bit arithmetic processing
▪ Per-layer selection of bitwidths
▪ Per-channel quantization
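To make the last bullets concrete, here is a small sketch of symmetric linear quantization that compares one shared scale for the whole weight matrix with one scale per output channel (row). The function names and the toy weights are hypothetical, and the scheme shown is just one common formulation, not necessarily the exact one Google proposes.

```python
from typing import List


def scale_for(values: List[float], bits: int) -> float:
    """Scale so that the largest magnitude maps to the largest signed integer."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values: List[float], scale: float, bits: int) -> List[int]:
    """Round to the nearest integer step and clamp to the signed range."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q: List[int], scale: float) -> List[float]:
    return [x * scale for x in q]


def row_errors(weights: List[List[float]], bits: int, per_channel: bool) -> List[float]:
    """Worst absolute round-trip error for each row (output channel)."""
    shared = scale_for([v for row in weights for v in row], bits)
    errors = []
    for row in weights:
        scale = scale_for(row, bits) if per_channel else shared
        restored = dequantize(quantize(row, scale, bits), scale)
        errors.append(max(abs(a - b) for a, b in zip(row, restored)))
    return errors


# Toy weights: channel 0 is about 100x smaller than channel 1.
weights = [[0.01, -0.02, 0.015], [1.0, -2.0, 1.5]]
shared_err = row_errors(weights, bits=8, per_channel=False)
per_channel_err = row_errors(weights, bits=8, per_channel=True)
```

With a shared scale, the small channel is squeezed into one or two integer steps, so its error is large relative to its values. A per-channel scale gives each row the full integer range, at the cost of storing one scale per channel.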
24. Have you heard of Verilog, VHDL?
▪ HDL is notoriously hard to write correctly.
▪ It’s partly due to the syntax, but the main reason is that you need to specify
every step of the computation process at a very precise level, using logic gates
and finite-state machines.
▪ The state machine is the fundamental concept. Please read Leslie Lamport on
TLA+.
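As a small illustration of the state-machine idea, the sketch below models a detector for the bit pattern 1, 0, 1 as an explicit transition table: a current state, an input, and a next-state rule applied once per clock. It is written in Python rather than an HDL, and the state names and the pattern are hypothetical choices for the example.

```python
from enum import Enum
from typing import Dict, List, Tuple


class State(Enum):
    IDLE = 0      # nothing useful seen yet
    GOT_1 = 1     # last bit was 1
    GOT_10 = 2    # last two bits were 1, 0


# (current state, input bit) -> (next state, output bit)
TRANSITIONS: Dict[Tuple[State, int], Tuple[State, int]] = {
    (State.IDLE, 0): (State.IDLE, 0),
    (State.IDLE, 1): (State.GOT_1, 0),
    (State.GOT_1, 0): (State.GOT_10, 0),
    (State.GOT_1, 1): (State.GOT_1, 0),
    (State.GOT_10, 0): (State.IDLE, 0),
    # Pattern complete; the final 1 can also start the next match.
    (State.GOT_10, 1): (State.GOT_1, 1),
}


def is_complete(table: Dict[Tuple[State, int], Tuple[State, int]]) -> bool:
    """Every state must define what happens for every possible input."""
    return all((s, bit) in table for s in State for bit in (0, 1))


def detect_101(bits: List[int]) -> List[int]:
    """Feed one bit per clock; output 1 on the cycle the pattern completes."""
    state = State.IDLE
    outputs = []
    for bit in bits:
        state, out = TRANSITIONS[(state, bit)]
        outputs.append(out)
    return outputs
```

The completeness check is the kind of property you want to hold for every state machine in a design: hardware cannot leave a case undefined, so every (state, input) pair needs an explicit next state.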
25. The best introduction to HW computation
Amazingly, SICP Ch5 "Computing with Register Machines" has one of the best
explanations of the HW computation process.
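In the spirit of that chapter, here is a rough Python sketch of a register machine that computes a GCD: a few registers, a controller that steps through instructions, and nothing else. The instruction names and the tuple encoding are made up for this example and are not SICP's own notation.

```python
from typing import Dict, List, Tuple

# Controller for GCD with three registers: a, b and a temporary t.
GCD_PROGRAM: List[Tuple[str, ...]] = [
    ("label", "test_b"),
    ("branch_if_zero", "b", "done"),   # if b == 0, the answer is in a
    ("rem", "t", "a", "b"),            # t <- a mod b
    ("move", "a", "b"),                # a <- b
    ("move", "b", "t"),                # b <- t
    ("goto", "test_b"),
    ("label", "done"),
]


def run(program: List[Tuple[str, ...]], registers: Dict[str, int]) -> Dict[str, int]:
    """Execute the controller until it runs off the end of the program."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    regs = dict(registers)
    pc = 0
    while pc < len(program):
        ins = program[pc]
        op = ins[0]
        if op == "branch_if_zero":
            if regs[ins[1]] == 0:
                pc = labels[ins[2]]
                continue
        elif op == "rem":
            regs[ins[1]] = regs[ins[2]] % regs[ins[3]]
        elif op == "move":
            regs[ins[1]] = regs[ins[2]]
        elif op == "goto":
            pc = labels[ins[1]]
            continue
        elif op != "label":
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return regs


final = run(GCD_PROGRAM, {"a": 206, "b": 40, "t": 0})
```

Each instruction corresponds to something physical: a register, a data path between registers, or a controller state. That is what makes this model a good bridge from software thinking to hardware thinking.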
30. Real Production HDL Source Code
▪ Rocket-V Core source code: it is written in Chisel, a Scala-based
language.
HDL source code is the most important, golden part of the hardware IP, and the part
our engineers spend most of their time on. It should be developed and maintained to
the highest standard:
▪ Very strong testing discipline: Unit, Random, Formal, Top Level, Emulation,
System Level Test. It requires 100% test coverage. Once shipped, you can’t
change the hardware.
▪ But there are still many bugs. Observability such as performance and status
registers should be baked into hardware at every level.
31. You have learned the major concepts.
Can you describe the matrix computation in an HDL? Give it a try.
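One possible way to start, sketched in Python rather than a real HDL: think of the design as registers plus a rule that computes their next values on every clock edge. The MatVecUnit below is a hypothetical sequential matrix-vector multiplier with a single multiply-accumulate per cycle, and it assumes a non-empty matrix whose rows match the vector length. A real design would add memories, pipelining and many MACs in parallel.

```python
from typing import List, Tuple


class MatVecUnit:
    """Hypothetical sequential matrix-vector unit: one MAC per clock cycle."""

    def __init__(self, matrix: List[List[int]], vector: List[int]) -> None:
        self.matrix = matrix
        self.vector = vector
        self.rows = len(matrix)
        self.cols = len(vector)
        # Registers: state that survives from one clock edge to the next.
        self.row = 0          # row counter
        self.col = 0          # column counter
        self.acc = 0          # accumulator for the current row
        self.done = False
        self.result = [0] * self.rows
        self.cycles = 0

    def tick(self) -> None:
        """One clock edge: combinational logic first, then register updates."""
        if self.done:
            return
        self.cycles += 1
        # Combinational part: multiplier and adder.
        product = self.matrix[self.row][self.col] * self.vector[self.col]
        next_acc = self.acc + product
        if self.col == self.cols - 1:
            # End of row: write the result, clear the accumulator, advance.
            self.result[self.row] = next_acc
            self.acc = 0
            self.col = 0
            if self.row == self.rows - 1:
                self.done = True
            else:
                self.row += 1
        else:
            self.acc = next_acc
            self.col += 1


def matvec(matrix: List[List[int]], vector: List[int]) -> Tuple[List[int], int]:
    """Clock the unit until it raises done; return (result, cycles used)."""
    unit = MatVecUnit(matrix, vector)
    while not unit.done:
        unit.tick()
    return unit.result, unit.cycles
```

Translating this into Verilog or Chisel is mostly a matter of turning each attribute into a register and the body of tick() into the next-state logic, which makes it a reasonable first exercise.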
34. HDL to the physical realities
It’s the Physical Compiler = physical + compiler that does the job.
Caution: it’s a very capital-intensive, expensive translation.
35. Let’s wrap up here.
▪ We mainly focused on microarchitecture and HDL aspect of AI chip engineering.
▪ AI-chip-focused design is the true interplay and co-design of Algorithm + SW +
HW.
▪ SW and Algorithm might matter more. It’s also really exciting technology. We
have SW and Algorithm team as big as HW.
▪ We hope to discuss this at the next DEVIEW event, after we have our chip out
next year.
▪ Thank you! Good Luck!