LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS
2023.3.9
유용상 (YongSang Yoo)
NLP Tea Time
Introduction
• We take inspiration from Li et al. (2018a); Aghajanyan et al. (2020) which
show that the learned over-parametrized models in fact reside on a low
intrinsic dimension.
• We hypothesize that the change in weights during model adaptation also
has a low “intrinsic rank”, leading to our proposed Low-Rank Adaptation
(LoRA) approach.
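To make "low intrinsic rank" concrete, here is a minimal NumPy sketch. The sizes and random data are illustrative assumptions, not values from the paper: an update built as the product of a d×r matrix and an r×k matrix has the full d×k shape, yet its rank is at most r, whereas an unconstrained random update of the same shape is full rank.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 48, 4  # illustrative sizes (assumed), with r much smaller than min(d, k)

# An unconstrained d x k update, as full fine-tuning would allow.
dense_update = rng.standard_normal((d, k))

# A low-rank update: same d x k shape, but built from a d x r and an r x k factor.
B = rng.standard_normal((d, r))
A = rng.standard_normal((r, k))
low_rank_update = B @ A

print(dense_update.shape == low_rank_update.shape)  # True
print(np.linalg.matrix_rank(dense_update))          # 48 = min(d, k) for a random Gaussian matrix
print(np.linalg.matrix_rank(low_rank_update))       # at most r = 4
```

The hypothesis is that the useful part of the fine-tuning update looks like the second matrix rather than the first, so little is lost by restricting the update to this factored form.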
Introduction: Advantages
• A pre-trained model can be shared and used to build many small LoRA modules for different tasks. We can freeze the shared model and switch tasks efficiently by replacing only the low-rank matrices A and B, significantly reducing the storage requirement and task-switching overhead.
• LoRA makes training more efficient and lowers the hardware barrier to entry by up to 3 times when using adaptive optimizers, because gradients and optimizer states are not needed for the frozen weights; only the injected, much smaller low-rank matrices are optimized.
• Our simple linear design allows us to merge the trainable matrices with the frozen weights when deployed, introducing no inference latency compared to a fully fine-tuned model, by construction.
• LoRA is orthogonal to many prior methods and can be combined with many of them, such as prefix-tuning.
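The first two advantages can be sketched with NumPy. Everything here is an illustrative assumption: random matrices stand in for a pre-trained weight W0 and for two hypothetical task adapters, each stored as a low-rank pair (B, A), and the names `adapters`, `merged_weight`, `switch_task`, and `lora_params` are made up for this example. Switching follows the recipe described in the LoRA paper (subtract one task's BA, add another's). The parameter count covers only the trainable weights of a single matrix; it does not by itself reproduce the "up to 3 times" hardware figure, which also reflects optimizer state (adaptive optimizers such as Adam keep extra per-parameter state) and other memory costs.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 16, 16, 2  # illustrative sizes (assumed)

# Shared pre-trained weight: frozen and stored only once.
W0 = rng.standard_normal((d, k))

# Hypothetical per-task LoRA modules: each task stores only its own (B, A) pair.
adapters = {
    "task_a": (rng.standard_normal((d, r)), rng.standard_normal((r, k))),
    "task_b": (rng.standard_normal((d, r)), rng.standard_normal((r, k))),
}

def merged_weight(task):
    """Deployable weight for one task: W0 + B A."""
    B, A = adapters[task]
    return W0 + B @ A

def switch_task(W, old_task, new_task):
    """Turn a merged weight for old_task into one for new_task: subtract B A, add B' A'."""
    B_old, A_old = adapters[old_task]
    B_new, A_new = adapters[new_task]
    return W - B_old @ A_old + B_new @ A_new

W = switch_task(merged_weight("task_a"), "task_a", "task_b")
print(np.allclose(W, merged_weight("task_b")))  # True

def lora_params(d, k, r):
    """Trainable (and per-task stored) parameters: B is d x r, A is r x k; W0 is frozen."""
    return d * r + r * k

print(lora_params(d, k, r), d * k)  # 64 256
# At an illustrative 4096 x 4096 projection with r = 8:
print(lora_params(4096, 4096, 8), 4096 * 4096)  # 65536 16777216
```

In this toy setting each task adds 64 numbers on top of the shared 256, and at the larger illustrative size the trainable count per matrix drops by a factor of 256.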
Problem Statement
[Figure: sizes of large language models (175B, 82B, 178B, 530B, 280B parameters)]
Aren’t Existing Solutions Good Enough?
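Low-Rank Parametrized Update Matrices
• Forward pass: for h = W0x, the modified forward pass is h = W0x + ∆Wx = W0x + BAx
• Update: W0 + ∆W = W0 + BA, where W0 ∈ ℝ^(d×k) is frozen, B ∈ ℝ^(d×r), A ∈ ℝ^(r×k), and the rank r ≪ min(d, k)
• Hypothesis: the updates to the weights during adaptation also have a low intrinsic rank.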
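The forward pass can be sketched as a small NumPy layer, with a frozen weight W0 of shape d×k and trainable factors B (d×r) and A (r×k). This is a minimal illustration rather than a reference implementation: the class name `LoRALinear` and the 0.01 standard deviation used for A are assumptions. The Gaussian initialization of A with zero initialization of B (so that BA = 0 at the start of training) and the α/r scaling of the update follow the LoRA paper.

```python
import numpy as np

class LoRALinear:
    """Hypothetical minimal LoRA-adapted dense layer (forward pass only).

    W0 (d x k) is frozen; only A (r x k) and B (d x r) would be trained.
    """

    def __init__(self, W0, r, alpha, rng):
        d, k = W0.shape
        self.W0 = W0
        # A: random Gaussian (std 0.01 is an assumption); B: zeros, so B A = 0 at init.
        self.A = rng.standard_normal((r, k)) * 0.01
        self.B = np.zeros((d, r))
        self.scale = alpha / r  # the update B A x is scaled by alpha / r

    def __call__(self, x):
        # h = W0 x + (alpha / r) * B (A x); computing A x first keeps the branch cheap.
        return self.W0 @ x + self.scale * (self.B @ (self.A @ x))

rng = np.random.default_rng(0)
W0 = rng.standard_normal((8, 6))
layer = LoRALinear(W0, r=2, alpha=4, rng=rng)
x = rng.standard_normal(6)

# With B = 0 the adapted layer starts out identical to the pre-trained layer.
starts_identical = np.allclose(layer(x), W0 @ x)
print(starts_identical)  # True

# Once B moves away from zero (simulated here with random values),
# the low-rank branch changes the output while W0 itself is never modified.
layer.B = rng.standard_normal((8, 2))
print(np.allclose(layer(x), W0 @ x))  # False
```

Starting from BA = 0 means fine-tuning begins exactly at the pre-trained model, and the α/r factor keeps the size of the update from depending strongly on the chosen rank r.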
Empirical Experiments
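Conclusion
• Proposes LoRA, a method for efficiently fine-tuning large-scale models
• Unlike adapter-style methods, introduces no additional inference latency
• Unlike prefix-tuning, does not need to reduce the usable sequence length
• Assumes that the weight-update matrix has a low intrinsic rank
• The paper focuses on language models, but in principle LoRA can be applied to any dense layer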
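The "no inference latency" point can be checked numerically. In this NumPy sketch, random matrices stand in for trained weights and the sizes are arbitrary assumptions. Computing W0x plus a separate low-rank branch BAx gives the same result as a single multiplication by the pre-merged matrix W0 + BA, and that merged matrix has the original shape, so the deployed layer costs the same as the pre-trained one. Adapter layers, by contrast, add extra computation at inference time.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, r = 32, 24, 4  # illustrative sizes (assumed)

W0 = rng.standard_normal((d, k))  # frozen pre-trained weight
B = rng.standard_normal((d, r))   # stand-ins for trained LoRA factors
A = rng.standard_normal((r, k))
X = rng.standard_normal((k, 5))   # a batch of 5 input columns

# Training-time view: frozen path plus a separate low-rank branch (extra matmuls).
H_branch = W0 @ X + B @ (A @ X)

# Deployment view: fold B A into the weight once; inference is a single matmul.
W_merged = W0 + B @ A
H_merged = W_merged @ X

print(np.allclose(H_branch, H_merged))  # True
print(W_merged.shape)                   # (32, 24), same as W0
```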
PEFT
Blog: https://huggingface.co/blog/peft
Blog: https://4n3mone.tistory.com/7
Code: https://github.com/huggingface/peft
