How can we build super-great AI
chips?
For software engineers who are new to HW.
Paik June
FuriosaAI
Contents
▪ Silicon Engineering
▪ Architecture Exploration
▪ HDL describing HW computation
▪ Conclusion
Silicon engineering:
The foundation of computing
We forget Silicon Valley = Silicon + Valley
Silicon engineering is one of the most complex coordinated processes that
humankind has ever practiced.
▪ Enormous challenges ahead as design complexity explodes.
▪ Nvidia’s Volta GPU (GV100) packs about 21 billion transistors.
Silicon Valley engineering culture has been strongly shaped by disciplined
silicon engineering.
▪ Jeff Dean, Sanjay Ghemawat and Urs Hölzle all came from HW companies
before joining Google.
▪ Our DEVIEW keynote speaker Song also worked for DEC ☺ It proves my point.
SW is eating the world. But,
“People who are really serious about software should make their own hardware” –
Alan Kay / Steve Jobs
▪ There is not much distinction between HW and SW if we are serious about it.
Google, Amazon, Facebook, Microsoft, Alibaba, Baidu, Apple: everyone is trying to
build a strong silicon team, because it is strategically important to integrate
vertically, customizing their own architectures and controlling the entire stack.
▪ EX: Google TPU
What is our opportunity?
We are riding the big wave of a global semiconductor super-cycle.
▪ Just think about cloud datacenter, autonomous car, IoT and AR/VR, all the
electronic gadgets that will be powered by semiconductors.
It is simply the biggest driving engine of our economy, now and in the future.
▪ Global dominance in memory: 25% of the entire national exports
▪ We all know that we are relatively weak in non-memory products.
▪ SSD is in-between memory and non-memory.
▪ How about AI chips?
Yes We Can.
We have one of the most advanced semiconductor manufacturing facilities in the
world.
▪ TSMC vs. Samsung
We have a new generation of engineers with great potential.
▪ Experience with globally successful semiconductor products: Mobile Application
Processor (AP), Solid State Drive (SSD)
We also have AI application and service industries of sufficient size.
▪ Good testbed before launching into global products.
A hell of a challenge
We don’t have much experience, or many success stories, of enterprise-level B2B
solutions initiated by startups.
▪ Domestic market is too small. Weak ecosystem in terms of market size.
Semiconductors are fundamentally a very tough business. It’s not easy at all, even
for the big players. It has always been very capital and human-resource intensive because
▪ It’s a timing business. You have to be very fast.
▪ It requires extremely precise engineering. It must not fail.
To pull off a successful design and
sell it to the masses,
it has to be a very strategic and
well-orchestrated long-term effort.
Let’s go back to the fundamentals.
AI chip engineering
There are many aspects of AI chip design. We will mainly focus on
microarchitecture.
▪ Application
▪ Algorithm
▪ Software
▪ Microarchitecture
▪ Physical Design
▪ …
Rendering of GDS2 file illustrating physical structure of silicon chips
Zoom into a microchip
Microarchitecture = micro + architecture
Chip design companies (Ex: Qualcomm, Nvidia, FuriosaAI) pass the architecture
blueprint to the fab companies (Ex: TSMC, Samsung, GlobalFoundries).
Great architecture needs great architects
A great building serves people by enabling the best human activities in the most
humane manner possible, given the building materials.
A great microarchitecture serves the computation process by enabling the best
applications in the most efficient manner possible, given the silicon/power/budget.
▪ Real estate in the micro world
▪ A great architect should know the ins and outs of everything and be able to
implement the chip on schedule and within the given budget.
Microarchitect’s toolkit
▪ Instruction Set Architecture
▪ VLIW, SIMD, Vector, Systolic Array
▪ Superscalar / Multithreading / Dataflow
▪ Pipelining
▪ Virtualization
▪ Prefetching/Caching
▪ IO/Memory subsystem
▪ Finite state machine
▪ …
Key Question:
What is the winning architecture for
AI computation?
A more important question
How can we explore and find the best
architecture and build it?
Let’s explore architecture.
Build the performance modeling simulator
It’s a so-called cycle-accurate simulator, which simulates both the behavior and
the performance of the machine we’re building at a very fine level of granularity
and abstraction, usually the clock cycle. This enforces the discipline of
▪ Concrete and precise thinking
▪ Data-Driven evaluation for important trade-off of design choices
The architect should have strong (or at least reasonable) SW skills to build this simulator.
An OOP language and the event-driven programming paradigm are a natural fit for this
job. C++ is the standard choice.
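To make the idea concrete, here is a minimal sketch of an event-driven, cycle-level model. It is written in Python for brevity, although the slide names C++ as the standard choice. Everything in it is a hypothetical toy, not a real simulator: the `Simulator` class, the `simulate_tiles` function, the latencies, and the assumption of an unbounded on-chip buffer are all made up for illustration. It evaluates one design choice with data: does prefetching the next tile while computing the current one pay off?

```python
import heapq


class Simulator:
    """Minimal event-driven core: events are (cycle, sequence, callback)."""

    def __init__(self):
        self.now = 0          # current clock cycle
        self._queue = []      # min-heap ordered by cycle
        self._seq = 0         # tie-breaker so callbacks are never compared

    def schedule(self, delay, callback):
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()
        return self.now       # cycle at which the last event fired


def simulate_tiles(num_tiles, load_latency, compute_latency, prefetch):
    """Total cycles to load and compute `num_tiles` tiles (toy model)."""
    sim = Simulator()
    state = {"issued": 0, "ready": 0, "busy": False}

    def issue_load():
        if state["issued"] < num_tiles:
            state["issued"] += 1
            sim.schedule(load_latency, load_done)

    def load_done():
        state["ready"] += 1
        if prefetch:
            issue_load()      # fetch the next tile while computing this one
        try_compute()

    def try_compute():
        if state["ready"] > 0 and not state["busy"]:
            state["ready"] -= 1
            state["busy"] = True
            sim.schedule(compute_latency, compute_done)

    def compute_done():
        state["busy"] = False
        if not prefetch:
            issue_load()      # serial design: fetch only after compute ends
        try_compute()

    issue_load()
    return sim.run()


serial = simulate_tiles(4, load_latency=10, compute_latency=6, prefetch=False)
overlapped = simulate_tiles(4, load_latency=10, compute_latency=6, prefetch=True)
print(serial, overlapped)  # 64 46
```

With these made-up latencies the overlapped design is bound by the loads, so speeding up the compute unit alone would gain nothing. This is the kind of trade-off a performance model is built to expose.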
Architecture exploration takes time and experience.
Korean industry has neglected this part because we didn’t (or couldn’t afford
to) allocate enough time for defining and exploring the design space to come up
with a solid architecture specification. It takes time because
▪ Workload characterization and prediction take time.
▪ Simulation needs supercomputer-scale computation.
▪ Understanding very detailed design trade-offs just takes time.
In other words, it takes time to cultivate intuition by iteratively refining it
against careful, methodical measurements.
Time Schedule
So let’s say it takes 1.5 to 2 years to build a commercial AI chip from concept to
production. We need to allocate at least 6 to 8 months for performance modeling,
which runs in parallel with the implementation.
Performance Modeling / Architecting
RTL Implementation
Software Architecting / Implementation
Verification
Physical Design / Manufacturing
Arch Examples: Quantization (suggested by Google)
▪ Aggressive operator fusion: Performing as many operations as possible in a
single pass can lower the cost of memory accesses and provide significant
improvements in run-time and power consumption
▪ Compressed memory access: One can optimize memory bandwidth by
supporting on-the-fly decompression of weights (and activations). A simple way
to do that is to support lower-precision storage of weights and possibly
activations.
▪ Lower-precision 4/8/16-bit arithmetic processing
▪ Per-layer selection of bitwidths
▪ Per-channel quantization
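To see why per-channel quantization and selectable bit widths are worth hardware support, here is a small sketch in plain Python. It assumes symmetric linear quantization, which is one common scheme and not necessarily the one the slide has in mind. The helper names (`compute_scale`, `fake_quantize`, `channel_errors`) and the weight values are hypothetical, chosen so that the two output channels have very different dynamic ranges.

```python
def compute_scale(values, num_bits=8):
    """Scale that maps the largest magnitude onto the largest signed integer."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def fake_quantize(values, scale, num_bits=8):
    """Quantize to signed integers, then map back to floats."""
    qmax = 2 ** (num_bits - 1) - 1
    restored = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))   # integer code kept in storage
        restored.append(q * scale)
    return restored


def channel_errors(weights, per_channel, num_bits=8):
    """Worst-case reconstruction error of each output channel."""
    flat = [w for channel in weights for w in channel]
    shared_scale = compute_scale(flat, num_bits)      # one scale for the whole tensor
    errors = []
    for channel in weights:
        scale = compute_scale(channel, num_bits) if per_channel else shared_scale
        restored = fake_quantize(channel, scale, num_bits)
        errors.append(max(abs(a - b) for a, b in zip(channel, restored)))
    return errors


# Hypothetical weights: channel 0 is about 100x smaller than channel 1.
WEIGHTS = [
    [0.010, -0.020, 0.015, 0.005],
    [1.500, -2.000, 0.750, 1.250],
]

per_tensor = channel_errors(WEIGHTS, per_channel=False)
per_channel = channel_errors(WEIGHTS, per_channel=True)
print("channel 0 error, per-tensor :", per_tensor[0])
print("channel 0 error, per-channel:", per_channel[0])
```

With one shared scale, the small channel is crushed into just a few integer codes. With its own scale, its error drops by roughly two orders of magnitude, while the large channel is unaffected. Passing `num_bits=4` shows how much faster the error grows at lower precision.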
Implementation:
the dirty game
starts with
Hardware Description Language (HDL).
Have you heard of Verilog, VHDL?
▪ HDL is notoriously hard to write correctly.
▪ It’s partly due to the syntax, but the main reason is that you need to specify
every step of the computation process very precisely, in terms of logic gates
and finite-state machines.
▪ The state machine is the fundamental concept. Please read Leslie Lamport’s work
on TLA+.
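For readers who have not written a state machine explicitly before, here is a minimal sketch in Python. The example, a detector for the bit pattern 1, 0, 1 on a serial input, and the state names are hypothetical and chosen only for illustration. The point is that the whole behavior is a table from (current state, input) to (next state, output), evaluated once per clock cycle, which is also how an HDL description is organized.

```python
# Transition table: (state, input_bit) -> (next_state, output_bit).
# Each state records how much of the pattern 1,0,1 has been seen so far.
TRANSITIONS = {
    ("IDLE", 0): ("IDLE", 0),
    ("IDLE", 1): ("GOT_1", 0),
    ("GOT_1", 0): ("GOT_10", 0),
    ("GOT_1", 1): ("GOT_1", 0),
    ("GOT_10", 0): ("IDLE", 0),
    ("GOT_10", 1): ("GOT_1", 1),   # match; this 1 may also start the next match
}


def run_fsm(bits, state="IDLE"):
    """Feed one input bit per clock cycle; return the outputs and final state."""
    outputs = []
    for bit in bits:                               # one iteration = one clock cycle
        state, out = TRANSITIONS[(state, bit)]     # combinational next-state logic
        outputs.append(out)                        # state register updates here
    return outputs, state


outputs, final_state = run_fsm([1, 0, 1, 0, 1, 1, 0, 1])
print(outputs)       # [0, 0, 1, 0, 1, 0, 0, 1]
print(final_state)   # GOT_1
```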
The best introduction to HW computation
Amazingly, SICP Chapter 5, "Computing with Register Machines", has one of the best
explanations of the HW computation process.
Euclid algorithm: SW implementation
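The slide’s figure is not reproduced in this transcript. As a stand-in, here is a sketch of Euclid’s algorithm in Python. SICP itself uses Scheme, so the language and the function name `gcd` are choices made here.

```python
def gcd(a, b):
    """Euclid's algorithm: replace (a, b) with (b, a mod b) until b is 0."""
    while b != 0:
        a, b = b, a % b
    return a


print(gcd(206, 40))  # 2
```

In software the loop, the swap, and the remainder are taken for granted. The hardware version has to spell each of them out.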
Euclid Algorithm: HW implementation
Datapath Controller
Describing HW datapath and controller
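The split between datapath and controller can be sketched in Python, loosely following the GCD register machine of SICP Chapter 5. This is an illustrative model, not HDL. The class and state names are hypothetical, and treating the remainder as a single-cycle operation is a simplifying assumption that real hardware would not usually meet.

```python
class GcdDatapath:
    """Registers a, b, t plus the operations wired between them."""

    def __init__(self, a, b):
        self.a, self.b, self.t = a, b, 0

    def b_is_zero(self):      # status signal sent to the controller
        return self.b == 0

    def t_gets_rem(self):     # t <- a mod b
        self.t = self.a % self.b

    def a_gets_b(self):       # a <- b
        self.a = self.b

    def b_gets_t(self):       # b <- t
        self.b = self.t


def run_gcd_machine(a, b):
    """Controller: a finite-state machine that pulses one datapath action per cycle."""
    dp = GcdDatapath(a, b)
    state = "TEST_B"
    cycles = 0
    while state != "DONE":
        cycles += 1
        if state == "TEST_B":
            state = "DONE" if dp.b_is_zero() else "REM"
        elif state == "REM":
            dp.t_gets_rem()
            state = "A_GETS_B"
        elif state == "A_GETS_B":
            dp.a_gets_b()
            state = "B_GETS_T"
        elif state == "B_GETS_T":
            dp.b_gets_t()
            state = "TEST_B"
    return dp.a, cycles


print(run_gcd_machine(206, 40))  # (2, 17)
```

The datapath knows nothing about the algorithm, and the controller never touches a value. It only sees the `b_is_zero` signal and decides which register transfer fires next.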
Where is the programmability of HW?
Real Production HDL Source Code
▪ Rocket (RISC-V) core source code: it is written in Chisel, a Scala-based
language.
HDL source code is the most important, golden part of the hardware IP, and it is
what our engineers spend most of their time on. It should be developed and
maintained to the highest standard:
▪ A very strong testing discipline: unit, random, formal, top-level, emulation, and
system-level tests. It requires 100% test coverage. Once shipped, hardware
can’t be changed.
▪ But there are still many bugs. Observability, such as performance and status
registers, should be baked into the hardware at every level.
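As a small illustration of the random item in the test list above, here is a sketch of a differential random test in Python: a gate-level model of an adder is checked against a trusted reference on many random inputs. The adder, the function names, and the trial count are hypothetical. Real verification flows run this kind of check against the RTL in a simulator, not against a Python model.

```python
import random


def full_adder(a, b, carry_in):
    """One-bit full adder built only from AND, OR, and XOR gates."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def ripple_adder(x, y, width=8):
    """Design under test: chain `width` full adders, LSB first."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry


def random_test(num_trials=1000, width=8, seed=0):
    """Compare the gate-level model with Python's integer addition."""
    rng = random.Random(seed)          # fixed seed keeps failures reproducible
    limit = 1 << width
    for _ in range(num_trials):
        x, y = rng.randrange(limit), rng.randrange(limit)
        total, carry = ripple_adder(x, y, width)
        expected = x + y               # golden reference
        assert total == expected % limit, (x, y, total)
        assert carry == expected // limit, (x, y, carry)
    return num_trials


print(random_test(), "random trials passed")
```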
You have learned the major concepts.
Can you describe matrix computation in an HDL? Give it a try.
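One way to warm up for this exercise before touching an HDL is to model the hardware cycle by cycle in software. The sketch below simulates an output-stationary systolic array in Python. Each processing element (PE) keeps one accumulator, rows of A stream in from the left, and columns of B stream in from the top. This is only one of several possible dataflows and is not a description of the TPU or of any FuriosaAI design. The function name and the register layout are hypothetical.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle model of an n x m grid of multiply-accumulate PEs."""
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]      # one accumulator per PE
    a_reg = [[0] * m for _ in range(n)]    # value each PE forwards to its right
    b_reg = [[0] * m for _ in range(n)]    # value each PE forwards downward
    cycles = n + m + k - 2                 # last product lands in the bottom-right PE

    for t in range(cycles):
        next_a = [[0] * m for _ in range(n)]
        next_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Left edge: row i of A enters skewed by i cycles.
                if j > 0:
                    a_in = a_reg[i][j - 1]
                else:
                    a_in = A[i][t - i] if 0 <= t - i < k else 0
                # Top edge: column j of B enters skewed by j cycles.
                if i > 0:
                    b_in = b_reg[i - 1][j]
                else:
                    b_in = B[t - j][j] if 0 <= t - j < k else 0
                acc[i][j] += a_in * b_in   # multiply-accumulate
                next_a[i][j] = a_in        # registered, visible next cycle
                next_b[i][j] = b_in
        a_reg, b_reg = next_a, next_b      # clock edge: all registers update together

    return acc, cycles


result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result, cycles)  # [[19, 22], [43, 50]] 4
```

Every PE only talks to its neighbors and does the same thing every cycle, which is what makes the structure straightforward to express as replicated modules in an HDL.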
Example of AI chips: Google TPU
Example of AI Chips: Furiosa Madrun
HDL to the physical realities
The physical compiler (physical + compiler) does the job.
Caution: it is a very capital-intensive, expensive translation.
Let’s wrap up here.
▪ We mainly focused on the microarchitecture and HDL aspects of AI chip engineering.
▪ AI chip focused design is a true interplay and co-design of Algorithm + SW +
HW.
▪ SW and Algorithm might matter even more. They are also really exciting
technologies. We have SW and Algorithm teams as big as the HW team.
▪ We hope to discuss this at the next DEVIEW event, after our chip comes out
next year.
▪ Thank you! Good Luck!
Q & A
Please leave your questions on Slido.
sli.do
#deview
TRACK 4

Mais conteúdo relacionado

Mais procurados

Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
게임 서버 성능 분석하기
게임 서버 성능 분석하기게임 서버 성능 분석하기
게임 서버 성능 분석하기iFunFactory Inc.
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughThomas Graf
 
Spark & Zeppelin을 활용한 머신러닝 실전 적용기
Spark & Zeppelin을 활용한 머신러닝 실전 적용기Spark & Zeppelin을 활용한 머신러닝 실전 적용기
Spark & Zeppelin을 활용한 머신러닝 실전 적용기Taejun Kim
 
How to build massive service for advance
How to build massive service for advanceHow to build massive service for advance
How to build massive service for advanceDaeMyung Kang
 
Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with EsperTed Won
 
Kotlin coroutine - behind the scenes
Kotlin coroutine - behind the scenesKotlin coroutine - behind the scenes
Kotlin coroutine - behind the scenesAnh Vu
 
Starring sakila my sql university 2009
Starring sakila my sql university 2009Starring sakila my sql university 2009
Starring sakila my sql university 2009David Paz
 
golang과 websocket을 활용한 서버프로그래밍 - 장애없는 서버 런칭 도전기
golang과 websocket을 활용한 서버프로그래밍 - 장애없는 서버 런칭 도전기golang과 websocket을 활용한 서버프로그래밍 - 장애없는 서버 런칭 도전기
golang과 websocket을 활용한 서버프로그래밍 - 장애없는 서버 런칭 도전기Sangik Bae
 
인프런 - 스타트업 인프랩 시작 사례
인프런 - 스타트업 인프랩 시작 사례인프런 - 스타트업 인프랩 시작 사례
인프런 - 스타트업 인프랩 시작 사례Hyung Lee
 
Window manager활용하기 곽근봉
Window manager활용하기 곽근봉Window manager활용하기 곽근봉
Window manager활용하기 곽근봉keunbong kwak
 
Media Handling in FreeSWITCH
Media Handling in FreeSWITCHMedia Handling in FreeSWITCH
Media Handling in FreeSWITCHMoises Silva
 
Capturing NIC and Kernel TX and RX Timestamps for Packets in Go
Capturing NIC and Kernel TX and RX Timestamps for Packets in GoCapturing NIC and Kernel TX and RX Timestamps for Packets in Go
Capturing NIC and Kernel TX and RX Timestamps for Packets in GoScyllaDB
 
[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버
[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버
[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버준철 박
 
C++ 코드 품질 관리 비법
C++ 코드 품질 관리 비법C++ 코드 품질 관리 비법
C++ 코드 품질 관리 비법선협 이
 
Iocp 기본 구조 이해
Iocp 기본 구조 이해Iocp 기본 구조 이해
Iocp 기본 구조 이해Nam Hyeonuk
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
닷넷프레임워크에서 Redis 사용하기
닷넷프레임워크에서 Redis 사용하기닷넷프레임워크에서 Redis 사용하기
닷넷프레임워크에서 Redis 사용하기흥배 최
 

Mais procurados (20)

Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
게임 서버 성능 분석하기
게임 서버 성능 분석하기게임 서버 성능 분석하기
게임 서버 성능 분석하기
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking Walkthrough
 
Spark & Zeppelin을 활용한 머신러닝 실전 적용기
Spark & Zeppelin을 활용한 머신러닝 실전 적용기Spark & Zeppelin을 활용한 머신러닝 실전 적용기
Spark & Zeppelin을 활용한 머신러닝 실전 적용기
 
How to build massive service for advance
How to build massive service for advanceHow to build massive service for advance
How to build massive service for advance
 
Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with Esper
 
Kotlin coroutine - behind the scenes
Kotlin coroutine - behind the scenesKotlin coroutine - behind the scenes
Kotlin coroutine - behind the scenes
 
Starring sakila my sql university 2009
Starring sakila my sql university 2009Starring sakila my sql university 2009
Starring sakila my sql university 2009
 
golang과 websocket을 활용한 서버프로그래밍 - 장애없는 서버 런칭 도전기
golang과 websocket을 활용한 서버프로그래밍 - 장애없는 서버 런칭 도전기golang과 websocket을 활용한 서버프로그래밍 - 장애없는 서버 런칭 도전기
golang과 websocket을 활용한 서버프로그래밍 - 장애없는 서버 런칭 도전기
 
인프런 - 스타트업 인프랩 시작 사례
인프런 - 스타트업 인프랩 시작 사례인프런 - 스타트업 인프랩 시작 사례
인프런 - 스타트업 인프랩 시작 사례
 
Window manager활용하기 곽근봉
Window manager활용하기 곽근봉Window manager활용하기 곽근봉
Window manager활용하기 곽근봉
 
HBase at LINE 2017
HBase at LINE 2017HBase at LINE 2017
HBase at LINE 2017
 
Media Handling in FreeSWITCH
Media Handling in FreeSWITCHMedia Handling in FreeSWITCH
Media Handling in FreeSWITCH
 
Capturing NIC and Kernel TX and RX Timestamps for Packets in Go
Capturing NIC and Kernel TX and RX Timestamps for Packets in GoCapturing NIC and Kernel TX and RX Timestamps for Packets in Go
Capturing NIC and Kernel TX and RX Timestamps for Packets in Go
 
[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버
[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버
[NDC2017 : 박준철] Python 게임 서버 안녕하십니까 - 몬스터 슈퍼리그 게임 서버
 
C++ 코드 품질 관리 비법
C++ 코드 품질 관리 비법C++ 코드 품질 관리 비법
C++ 코드 품질 관리 비법
 
Iocp 기본 구조 이해
Iocp 기본 구조 이해Iocp 기본 구조 이해
Iocp 기본 구조 이해
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
닷넷프레임워크에서 Redis 사용하기
닷넷프레임워크에서 Redis 사용하기닷넷프레임워크에서 Redis 사용하기
닷넷프레임워크에서 Redis 사용하기
 
Git flow
Git flowGit flow
Git flow
 

Semelhante a [241] AI 칩 개발에 사용되는 엔지니어링

[TMS 2018] 기술개발 / FuriosaAI 백준호 CEO, 글로벌 격전지에서 발견한 기회
[TMS 2018] 기술개발 / FuriosaAI 백준호 CEO, 글로벌 격전지에서 발견한 기회 [TMS 2018] 기술개발 / FuriosaAI 백준호 CEO, 글로벌 격전지에서 발견한 기회
[TMS 2018] 기술개발 / FuriosaAI 백준호 CEO, 글로벌 격전지에서 발견한 기회 NAVER D2 STARTUP FACTORY
 
Maximize Big Data ROI via Best of Breed Patterns and Practices
Maximize Big Data ROI via Best of Breed Patterns and PracticesMaximize Big Data ROI via Best of Breed Patterns and Practices
Maximize Big Data ROI via Best of Breed Patterns and PracticesJeff Bertman
 
CHIPS Alliance_Object Automation Inc_workshop
CHIPS Alliance_Object Automation Inc_workshopCHIPS Alliance_Object Automation Inc_workshop
CHIPS Alliance_Object Automation Inc_workshopObject Automation
 
Computing Without Computers - Oct08
Computing Without Computers - Oct08Computing Without Computers - Oct08
Computing Without Computers - Oct08Ian Page
 
Deterministic capacity planning for OpenStack as elastic cloud infrastructure
Deterministic capacity planning for OpenStack as elastic cloud infrastructureDeterministic capacity planning for OpenStack as elastic cloud infrastructure
Deterministic capacity planning for OpenStack as elastic cloud infrastructureSean Cohen
 
From prototype to production - The journey of re-designing SmartUp.io
From prototype to production - The journey of re-designing SmartUp.ioFrom prototype to production - The journey of re-designing SmartUp.io
From prototype to production - The journey of re-designing SmartUp.ioMáté Lang
 
Enterprise-Grade DevOps Solutions for a Start Up Budget
Enterprise-Grade DevOps Solutions for a Start Up BudgetEnterprise-Grade DevOps Solutions for a Start Up Budget
Enterprise-Grade DevOps Solutions for a Start Up BudgetDevOps.com
 
Introduction to Agile Hardware
Introduction to Agile Hardware Introduction to Agile Hardware
Introduction to Agile Hardware Cprime
 
Functional verification techniques EW16 session
Functional verification techniques  EW16 sessionFunctional verification techniques  EW16 session
Functional verification techniques EW16 sessionSameh El-Ashry
 
Agile software architecture
Agile software architectureAgile software architecture
Agile software architectureScott Hsieh
 
The NRB Group mainframe day 2021 - Containerisation on Z - Paul Pilotto - Seb...
The NRB Group mainframe day 2021 - Containerisation on Z - Paul Pilotto - Seb...The NRB Group mainframe day 2021 - Containerisation on Z - Paul Pilotto - Seb...
The NRB Group mainframe day 2021 - Containerisation on Z - Paul Pilotto - Seb...NRB
 
Accelerating Digital Transformation: It's About Digital Enablement
Accelerating Digital Transformation:  It's About Digital EnablementAccelerating Digital Transformation:  It's About Digital Enablement
Accelerating Digital Transformation: It's About Digital EnablementJoshua Gossett
 
Cincom Smalltalk: Present, Future & Smalltalk Advocacy
Cincom Smalltalk: Present, Future & Smalltalk AdvocacyCincom Smalltalk: Present, Future & Smalltalk Advocacy
Cincom Smalltalk: Present, Future & Smalltalk AdvocacyESUG
 
Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Giridhar Addepalli
 
Adtech x Scala x Performance tuning
Adtech x Scala x Performance tuningAdtech x Scala x Performance tuning
Adtech x Scala x Performance tuningYosuke Mizutani
 
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...Tyrone Systems
 
Measure and increase developer productivity with help of Severless by Kazulki...
Measure and increase developer productivity with help of Severless by Kazulki...Measure and increase developer productivity with help of Severless by Kazulki...
Measure and increase developer productivity with help of Severless by Kazulki...Vadym Kazulkin
 
Software Factories in the Real World: How an IBM WebSphere Integration Factor...
Software Factories in the Real World: How an IBM WebSphere Integration Factor...Software Factories in the Real World: How an IBM WebSphere Integration Factor...
Software Factories in the Real World: How an IBM WebSphere Integration Factor...ghodgkinson
 
Technology and Digital Platform | 2019 partner summit
Technology and Digital Platform | 2019 partner summitTechnology and Digital Platform | 2019 partner summit
Technology and Digital Platform | 2019 partner summitAndrew Kumar
 

Semelhante a [241] AI 칩 개발에 사용되는 엔지니어링 (20)

[TMS 2018] 기술개발 / FuriosaAI 백준호 CEO, 글로벌 격전지에서 발견한 기회
[TMS 2018] 기술개발 / FuriosaAI 백준호 CEO, 글로벌 격전지에서 발견한 기회 [TMS 2018] 기술개발 / FuriosaAI 백준호 CEO, 글로벌 격전지에서 발견한 기회
[TMS 2018] 기술개발 / FuriosaAI 백준호 CEO, 글로벌 격전지에서 발견한 기회
 
Maximize Big Data ROI via Best of Breed Patterns and Practices
Maximize Big Data ROI via Best of Breed Patterns and PracticesMaximize Big Data ROI via Best of Breed Patterns and Practices
Maximize Big Data ROI via Best of Breed Patterns and Practices
 
CHIPS Alliance_Object Automation Inc_workshop
CHIPS Alliance_Object Automation Inc_workshopCHIPS Alliance_Object Automation Inc_workshop
CHIPS Alliance_Object Automation Inc_workshop
 
Computing Without Computers - Oct08
Computing Without Computers - Oct08Computing Without Computers - Oct08
Computing Without Computers - Oct08
 
Deterministic capacity planning for OpenStack as elastic cloud infrastructure
Deterministic capacity planning for OpenStack as elastic cloud infrastructureDeterministic capacity planning for OpenStack as elastic cloud infrastructure
Deterministic capacity planning for OpenStack as elastic cloud infrastructure
 
From prototype to production - The journey of re-designing SmartUp.io
From prototype to production - The journey of re-designing SmartUp.ioFrom prototype to production - The journey of re-designing SmartUp.io
From prototype to production - The journey of re-designing SmartUp.io
 
Enterprise-Grade DevOps Solutions for a Start Up Budget
Enterprise-Grade DevOps Solutions for a Start Up BudgetEnterprise-Grade DevOps Solutions for a Start Up Budget
Enterprise-Grade DevOps Solutions for a Start Up Budget
 
Introduction to Agile Hardware
Introduction to Agile Hardware Introduction to Agile Hardware
Introduction to Agile Hardware
 
Functional verification techniques EW16 session
Functional verification techniques  EW16 sessionFunctional verification techniques  EW16 session
Functional verification techniques EW16 session
 
Agile software architecture
Agile software architectureAgile software architecture
Agile software architecture
 
The NRB Group mainframe day 2021 - Containerisation on Z - Paul Pilotto - Seb...
The NRB Group mainframe day 2021 - Containerisation on Z - Paul Pilotto - Seb...The NRB Group mainframe day 2021 - Containerisation on Z - Paul Pilotto - Seb...
The NRB Group mainframe day 2021 - Containerisation on Z - Paul Pilotto - Seb...
 
Accelerating Digital Transformation: It's About Digital Enablement
Accelerating Digital Transformation:  It's About Digital EnablementAccelerating Digital Transformation:  It's About Digital Enablement
Accelerating Digital Transformation: It's About Digital Enablement
 
OA centre of excellence
OA centre of excellenceOA centre of excellence
OA centre of excellence
 
Cincom Smalltalk: Present, Future & Smalltalk Advocacy
Cincom Smalltalk: Present, Future & Smalltalk AdvocacyCincom Smalltalk: Present, Future & Smalltalk Advocacy
Cincom Smalltalk: Present, Future & Smalltalk Advocacy
 
Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01
 
Adtech x Scala x Performance tuning
Adtech x Scala x Performance tuningAdtech x Scala x Performance tuning
Adtech x Scala x Performance tuning
 
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
 
Measure and increase developer productivity with help of Severless by Kazulki...
Measure and increase developer productivity with help of Severless by Kazulki...Measure and increase developer productivity with help of Severless by Kazulki...
Measure and increase developer productivity with help of Severless by Kazulki...
 
Software Factories in the Real World: How an IBM WebSphere Integration Factor...
Software Factories in the Real World: How an IBM WebSphere Integration Factor...Software Factories in the Real World: How an IBM WebSphere Integration Factor...
Software Factories in the Real World: How an IBM WebSphere Integration Factor...
 
Technology and Digital Platform | 2019 partner summit
Technology and Digital Platform | 2019 partner summitTechnology and Digital Platform | 2019 partner summit
Technology and Digital Platform | 2019 partner summit
 

Mais de NAVER D2

[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다NAVER D2
 
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...NAVER D2
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기NAVER D2
 
[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발NAVER D2
 
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈NAVER D2
 
[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&ANAVER D2
 
[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기NAVER D2
 
[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep LearningNAVER D2
 
[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applicationsNAVER D2
 
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load BalancingOld version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load BalancingNAVER D2
 
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지NAVER D2
 
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기NAVER D2
 
[224]네이버 검색과 개인화
[224]네이버 검색과 개인화[224]네이버 검색과 개인화
[224]네이버 검색과 개인화NAVER D2
 
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)NAVER D2
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기NAVER D2
 
[213] Fashion Visual Search
[213] Fashion Visual Search[213] Fashion Visual Search
[213] Fashion Visual SearchNAVER D2
 
[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화NAVER D2
 
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지NAVER D2
 
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터NAVER D2
 
[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?NAVER D2
 

Mais de NAVER D2 (20)

[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다
 
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기
 
[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발
 
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
 
[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A
 
[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기
 
[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning
 
[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications
 
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load BalancingOld version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
 
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
 
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
 
[224]네이버 검색과 개인화
[224]네이버 검색과 개인화[224]네이버 검색과 개인화
[224]네이버 검색과 개인화
 
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
 
[213] Fashion Visual Search
[213] Fashion Visual Search[213] Fashion Visual Search
[213] Fashion Visual Search
 
[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화
 
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
 
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
 
[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?
 

[241] Engineering Used in AI Chip Development

  • 1. How can we build super-great AI chips? For software engineers who are new to HW. Paik June FuriosaAI
  • 2. Contents ▪ Silicon Engineering ▪ Architecture Exploration ▪ HDL describing HW computation ▪ Conclusion
  • 4. We forget Silicon Valley = Silicon + Valley. Silicon engineering is one of the most complex coordinated processes that humankind has ever practiced. ▪ Enormous challenges lie ahead as design complexity explodes. ▪ Nvidia Volta GPUs pack more than 20 billion transistors. Silicon Valley engineering culture has been strongly shaped by highly disciplined silicon engineering. ▪ Jeff Dean, Sanjay Ghemawat and Urs Hölzle all came from HW companies before joining Google. ▪ Our DEVIEW keynote speaker Song also worked for DEC ☺ It proves my point.
  • 5. SW is eating the world. But, “People who are really serious about software should make their own hardware” – Alan Kay / Steve Jobs ▪ There is not much distinction between HW and SW if we are serious about it. Google, Amazon, Facebook, Microsoft, Alibaba, Baidu, Apple: everyone is trying to build a strong silicon team, because it is strategically important to go vertical, customizing their architectures and controlling the entire stack. ▪ EX: Google TPU
  • 6. What is our opportunity? We are riding the big wave of the global semiconductor super-cycle. ▪ Just think about cloud datacenters, autonomous cars, IoT, AR/VR, and all the electronic gadgets that will be powered by semiconductors. It is simply the biggest engine driving our economy, now and in the future. ▪ Global dominance in memory: 25% of the entire national exports ▪ We all know that we are relatively weak in non-memory products. ▪ SSD sits in between memory and non-memory. ▪ How about AI chips?
  • 7. Yes We Can. We have some of the most advanced semiconductor manufacturing facilities in the world. ▪ TSMC vs. Samsung We have a new generation of engineers with great potential. ▪ Experience with globally successful semiconductor products: Mobile Application Processor (AP), Solid State Drive (SSD) We also have AI application and service industries of sufficient size. ▪ A good testbed before launching global products.
  • 8. Hell of challenges. We don’t have much experience with, or many success stories of, enterprise-level B2B solutions initiated by startups. ▪ The domestic market is too small, so the ecosystem is weak in terms of market size. Semiconductors are fundamentally a very tough business. It’s not easy even for the big players. It is very capital- and human-resource-intensive because ▪ It’s a timing business. You have to be very fast. ▪ It requires extreme precision engineering. It must not fail.
  • 9. To pull off a successful design and sell it to the masses, it takes a very strategic and orchestrated long-term effort. Let’s go back to the fundamentals.
  • 10. AI chip engineering There are many aspects of AI chip design. We will mainly focus on microarchitecture. ▪ Application ▪ Algorithm ▪ Software ▪ Microarchitecture ▪ Physical Design ▪ …
  • 11. Rendering of a GDSII file illustrating the physical structure of silicon chips. Zoom into a microchip.
  • 12. Microarchitecture = micro + architecture. Chip design companies (e.g., Qualcomm, Nvidia, FuriosaAI) pass the architecture blueprint to the fab companies (e.g., TSMC, Samsung, GlobalFoundries).
  • 13. Great architecture needs great architects. A great building serves people, enabling the best human activities in the most humane manner possible given the building material. A great microarchitecture serves the computation process, enabling the best applications in the most efficient manner possible given the silicon/power/budget. ▪ Real estate in the micro world ▪ A great architect should know the ins and outs of everything and be able to implement the chip on schedule within the given budgets.
  • 14. Microarchitect’s toolkit ▪ Instruction Set Architecture ▪ VLIW, SIMD, Vector, Systolic Array ▪ SuperScalar/ Multithreading / DataFlow ▪ Pipelining ▪ Virtualization ▪ Prefetching/Caching ▪ IO/Memory subsystem ▪ Finite state machine ▪ …
  • 15. Key Question: What is the winning architecture for AI computation?
  • 16. More important questions How can we explore and find the best architecture and build it?
  • 18. Build the performance modeling simulator. It’s a so-called cycle-accurate simulator, which simulates both the behavior and the performance of the machine we’re building at a very fine granularity, usually at the level of a clock cycle. This enforces the discipline of ▪ Concrete and precise thinking ▪ Data-driven evaluation of important design trade-offs. The architect should have strong (or at least reasonable) SW skills to build this simulator. An OOP language and the event-driven programming paradigm are a natural fit for this job. C++ is the standard choice.
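
To make the idea of an event-driven, cycle-level performance model concrete, here is a minimal sketch. It is written in Python for brevity rather than the C++ the slide recommends, and everything in it is an illustrative assumption rather than anything from the talk: the `Simulator` class, the `simulate_loads` function, and the toy machine it models (a unit that issues at most one memory load per cycle, with a fixed memory latency and a cap on loads in flight).

```python
import heapq


class Simulator:
    """Tiny event-driven simulator whose time unit is one clock cycle."""

    def __init__(self):
        self.cycle = 0
        self._events = []  # min-heap of (cycle, sequence, callback)
        self._seq = 0      # tie-breaker: same-cycle events run in FIFO order

    def schedule(self, delay, callback):
        heapq.heappush(self._events, (self.cycle + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        # Pop events in time order; return the cycle of the last event.
        while self._events:
            self.cycle, _, callback = heapq.heappop(self._events)
            callback()
        return self.cycle


def simulate_loads(num_loads, mem_latency, max_outstanding):
    """Cycles to finish `num_loads` loads (hypothetical toy machine).

    At most one load is issued per cycle and at most `max_outstanding`
    loads may be in flight at once.
    """
    sim = Simulator()
    state = {"issued": 0, "in_flight": 0, "last_issue": -1}

    def try_issue():
        can_issue = (
            state["issued"] < num_loads
            and state["in_flight"] < max_outstanding
            and sim.cycle > state["last_issue"]  # one issue per cycle
        )
        if can_issue:
            state["issued"] += 1
            state["in_flight"] += 1
            state["last_issue"] = sim.cycle
            sim.schedule(mem_latency, complete)
            sim.schedule(1, try_issue)  # try again on the next cycle

    def complete():
        state["in_flight"] -= 1
        try_issue()  # a freed slot may unblock a stalled issue

    sim.schedule(0, try_issue)
    return sim.run()


if __name__ == "__main__":
    for outstanding in (1, 2, 4):
        cycles = simulate_loads(8, mem_latency=4, max_outstanding=outstanding)
        print(f"max_outstanding={outstanding}: {cycles} cycles")
```

Sweeping `max_outstanding` is a small instance of the data-driven trade-off evaluation the slide describes: the model gives a cycle count for each design choice instead of a guess.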
  • 19. Arch exploration takes time and experience. Korean industry has neglected this part because we didn’t (or couldn’t afford to) allocate enough time for defining and exploring the design space to come up with a solid architecture specification. It takes time because ▪ Workload characterization and prediction take time. ▪ Simulation needs supercomputer-scale computation. ▪ Understanding very detailed design trade-offs just takes time. In other words, cultivating intuition by iteratively refining it with methodical, careful measurements takes time.
  • 20. Time Schedule. Let’s say it takes 1.5~2 years to build a commercial AI chip from concept to production. We need to allocate at least 6~8 months for performance modeling, which runs in parallel with the implementation. Phases: Performance Modeling / Architecting; RTL Implementation; Software Architecting / Implementation; Verification; Physical Design / Manufacturing
  • 21. Arch Examples: Quantization (suggested by Google) ▪ Aggressive operator fusion: Performing as many operations as possible in a single pass can lower the cost of memory accesses and provide significant improvements in run-time and power consumption ▪ Compressed memory access: One can optimize memory bandwidth by supporting on-the-fly decompression of weights (and activations). A simple way to do that is to support lower-precision storage of weights and possibly activations. ▪ Lower-precision 4/8/16-bit arithmetic processing ▪ Per-layer selection of bitwidths ▪ Per-channel quantization
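
As a small illustration of the per-channel quantization bullet, the sketch below quantizes a weight matrix with one scale per output channel (row). It assumes a symmetric scheme with signed integers. The function names `quantize_per_channel` and `dequantize` are made up for this example and do not come from the slides or from any particular framework.

```python
def quantize_per_channel(weights, num_bits=8):
    """Symmetric per-channel quantization: one scale per row of `weights`.

    Assumption for this sketch: each row is one output channel, and values
    map to signed integers in [-qmax, qmax].
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    quantized, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        # Round to the nearest integer step and clamp to the valid range.
        quantized.append([max(-qmax, min(qmax, round(w / scale))) for w in row])
        scales.append(scale)
    return quantized, scales


def dequantize(quantized, scales):
    """Recover approximate float weights from integers and per-row scales."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]


if __name__ == "__main__":
    # Two channels with very different ranges; each gets its own scale.
    w = [[0.5, -1.0, 0.25], [0.01, 0.02, -0.04]]
    q, s = quantize_per_channel(w, num_bits=8)
    print(q)
    print(dequantize(q, s))
```

Because the second row gets its own, much smaller scale, its small weights keep their resolution. With a single scale for the whole tensor they would be squeezed into a handful of integer levels.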
  • 23. Implementation: the dirty game starts with Hardware Description Language (HDL).
  • 24. Have you heard of Verilog or VHDL? ▪ HDL is notoriously hard to write correctly. ▪ That is partly due to the syntax, but the main reason is that you need to specify every step of the computation at a very precise level using logic gates and finite-state machines. ▪ The state machine is the fundamental concept. Please read Leslie Lamport and TLA+.
  • 25. The best introduction to HW computation. Amazingly, SICP Ch. 5, "Computing with Register Machines", has one of the best explanations of the HW computation process.
  • 26. Euclid algorithm: SW implementation
  • 27. Euclid Algorithm: HW implementation Datapath Controller
  • 28. Describing HW datapath and controller
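
The slide images for the Euclid example are not in this transcript, so the following is only a sketch of the idea, in the spirit of the GCD register machine from SICP Chapter 5. `gcd_software` is the usual software loop. `gcd_machine` models the same algorithm as a datapath (registers `a`, `b`, `t`) driven by a controller (a finite-state machine that makes one transition per clock cycle). The state names, and the assumption that the remainder is a single-cycle primitive, are illustrative choices.

```python
def gcd_software(a, b):
    """Euclid's algorithm as ordinary software."""
    while b != 0:
        a, b = b, a % b
    return a


def gcd_machine(a, b):
    """Euclid's algorithm as controller (FSM) + datapath (registers).

    Returns (result, cycles). Each loop iteration models one clock cycle.
    Assumption: the remainder operation is a primitive datapath unit that
    finishes in one cycle.
    """
    regs = {"a": a, "b": b, "t": 0}  # datapath registers
    state = "TEST_B"                 # controller state
    cycles = 0
    while state != "DONE":
        cycles += 1
        if state == "TEST_B":
            # Controller branches on a datapath condition (b == 0).
            state = "DONE" if regs["b"] == 0 else "COMPUTE_T"
        elif state == "COMPUTE_T":
            regs["t"] = regs["a"] % regs["b"]  # t <- rem(a, b)
            state = "COPY_B"
        elif state == "COPY_B":
            regs["a"] = regs["b"]              # a <- b
            state = "COPY_T"
        elif state == "COPY_T":
            regs["b"] = regs["t"]              # b <- t
            state = "TEST_B"
    return regs["a"], cycles


if __name__ == "__main__":
    print(gcd_software(206, 40))
    print(gcd_machine(206, 40))
```

The software version hides the temporary register and the sequencing inside `a, b = b, a % b`. The machine version has to name every register transfer and every state, which is the level of detail an HDL description requires.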
  • 29. Where is the programmability of HW?
  • 30. Real Production HDL Source Code ▪ Rocket core (RISC-V) source code: it is written in Chisel, a Scala-based language. HDL source code is the golden, most important part of the hardware IP, and it is what our engineers spend most of their time on. It should be developed and maintained to the highest standard: ▪ Very strong test discipline: Unit, Random, Formal, Top Level, Emulation, System Level Test. It requires 100% test coverage. Once shipped, you can’t change hardware. ▪ But there will still be many bugs. Observability, such as performance and status registers, should be baked into the hardware at every level.
  • 31. You learned the major concepts. Can you describe matrix computation in HDL? Give it a try.
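
One way to approach this exercise before writing any HDL is to model the hardware cycle by cycle in software. The sketch below is such a model for an output-stationary systolic array, one of the structures named in the microarchitect's toolkit slide. It is an illustration under stated assumptions, not HDL and not the design of any particular chip. Processing element (i, j) accumulates C[i][j]. Rows of A enter from the left and columns of B enter from the top, each skewed by one cycle per row or column. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle model of an n x m output-stationary systolic array.

    A is n x k and B is k x m (lists of lists). Returns (C, cycles).
    Every PE holds one A value, one B value and one accumulator. On each
    clock edge, A values move one PE to the right and B values move one PE
    down, then each PE multiplies its two values and accumulates.
    """
    n, k, m = len(A), len(B), len(B[0])
    a_reg = [[0] * m for _ in range(n)]
    b_reg = [[0] * m for _ in range(n)]
    acc = [[0] * m for _ in range(n)]
    total_cycles = n + m + k - 2  # last operand pair reaches PE (n-1, m-1)
    for t in range(total_cycles):
        # Next-state values are computed from the current registers only,
        # so all PEs update together, as they would on one clock edge.
        next_a = [[0] * m for _ in range(n)]
        next_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    s = t - i  # row i is delayed by i cycles
                    next_a[i][j] = A[i][s] if 0 <= s < k else 0
                else:
                    next_a[i][j] = a_reg[i][j - 1]
                if i == 0:
                    s = t - j  # column j is delayed by j cycles
                    next_b[i][j] = B[s][j] if 0 <= s < k else 0
                else:
                    next_b[i][j] = b_reg[i - 1][j]
                acc[i][j] += next_a[i][j] * next_b[i][j]
        a_reg, b_reg = next_a, next_b
    return acc, total_cycles


if __name__ == "__main__":
    C, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(C, cycles)
```

The separate `next_a`/`next_b` buffers stand in for what an HDL expresses with clocked registers: every element reads the old state and writes the new state at the same instant.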
  • 32. Example of AI chips: Google TPU
  • 33. Example of AI Chips: Furiosa Madrun
  • 34. HDL to physical reality. It’s the physical compiler (= physical + compiler) that does the job. Caution: it’s a very capital-intensive, expensive translation.
  • 35. Let’s wrap up here. ▪ We mainly focused on the microarchitecture and HDL aspects of AI chip engineering. ▪ AI-chip-focused design is a true interplay and codesign of Algorithm + SW + HW. ▪ SW and Algorithm might matter more. It’s also really exciting technology. We have SW and Algorithm teams as big as the HW team. ▪ We hope to discuss this at the next DEVIEW event, after we have our chip out next year. ▪ Thank you! Good Luck!
  • 36. Q & A