Seminar on ChatGPT Large Language Model by Abhilash Majumder(Intel)
This presentation is intended solely for reading purposes and covers the technical fundamentals of ChatGPT.
2. ChatGPT
• ChatGPT is trained using Reinforcement Learning from
Human Feedback (RLHF).
• ChatGPT uses the same methods as InstructGPT, but with
slight differences in the data collection setup.
• An initial model is trained using supervised fine-tuning:
human AI trainers provided conversations in which they
played both sides, the user and an AI assistant.
• After supervised fine-tuning, ChatGPT leverages a reward
model and the PPO on-policy algorithm to achieve SOTA
generative sequences.
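The three stages above (supervised fine-tuning, reward modeling, PPO optimization) can be sketched end to end with toy stand-ins. Everything here is hypothetical plain Python for illustration, not OpenAI's actual training code.

```python
# Minimal sketch of the RLHF pipeline described above, using toy
# stand-ins (dicts, lambdas) in place of real neural models.

def supervised_fine_tune(base_model, demonstrations):
    """Stage 1: imitate human-written (prompt, response) demonstrations."""
    return {**base_model, "sft_data": list(demonstrations)}

def train_reward_model(sft_model, ranked_comparisons):
    """Stage 2: fit a scalar reward from human preference rankings."""
    # Toy reward: 1.0 if the response was ever the human-preferred one.
    preferred = {better for better, worse in ranked_comparisons}
    return lambda response: 1.0 if response in preferred else 0.0

def ppo_fine_tune(sft_model, reward_fn, candidates):
    """Stage 3: pick the highest-reward response (stand-in for a PPO update)."""
    return max(candidates, key=reward_fn)

base = {"name": "gpt-3"}
sft = supervised_fine_tune(base, [("Hi", "Hello! How can I help?")])
reward = train_reward_model(sft, [("Hello! How can I help?", "go away")])
best = ppo_fine_tune(sft, reward, ["go away", "Hello! How can I help?"])
print(best)  # the human-preferred response wins
```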
4. ChatGPT- GPT3
• GPT-3 is an autoregressive transformer model with 175
billion parameters. It uses the same architecture/model
as GPT-2, including the modified initialization,
pre-normalization, and reversible tokenization, with the
exception that GPT-3 uses alternating dense and locally
banded sparse attention patterns in the layers of the
transformer, similar to the Sparse Transformer.
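The difference between the two attention patterns can be made concrete by building their masks. The window size and the causal restriction below are illustrative assumptions, not GPT-3's actual configuration.

```python
# Sketch of a dense causal mask vs. a locally banded ("sliding window")
# causal mask of the kind GPT-3 alternates between its layers.

def banded_causal_mask(seq_len, window):
    """mask[i][j] = 1 where query i may attend to key j:
    j <= i (causal) and i - j < window (local band)."""
    return [[1 if j <= i and i - j < window else 0 for j in range(seq_len)]
            for i in range(seq_len)]

def dense_causal_mask(seq_len):
    """Standard causal mask: every query attends to all earlier positions."""
    return [[1 if j <= i else 0 for j in range(seq_len)]
            for i in range(seq_len)]

for row in banded_causal_mask(5, 2):
    print(row)  # each position sees only itself and its predecessor
```

The banded variant keeps memory and compute per layer proportional to `seq_len * window` instead of `seq_len ** 2`, which is what makes the sparse layers cheaper.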
5. ChatGPT- PPO(A2C)
• There are two primary variants of PPO:
PPO-Penalty and PPO-Clip.
• PPO-Penalty approximately solves a KL-constrained
update like TRPO, but penalizes the KL-divergence
in the objective function instead of making it a
hard constraint, and automatically adjusts the
penalty coefficient over the course of training so
that it is scaled appropriately.
• PPO-Clip has neither a KL-divergence term in the
objective nor a constraint at all. Instead, it
relies on specialized clipping in the objective
function to remove the incentive for the new
policy to move far from the old policy.
• PPO is an on-policy algorithm.
• PPO can be used for environments with either
discrete or continuous action spaces.
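The clipping mechanism of PPO-Clip can be shown for a single (state, action) sample. The epsilon value and the toy numbers are illustrative; in practice this objective is averaged over a batch and maximized by gradient ascent.

```python
# The PPO-Clip surrogate objective:
#   L = min(r * A, clip(r, 1 - eps, 1 + eps) * A),  r = pi_new / pi_old
import math

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Clipped surrogate objective for one sample."""
    ratio = math.exp(logp_new - logp_old)          # probability ratio r
    clipped = max(min(ratio, 1 + eps), 1 - eps)    # clip(r, 1-eps, 1+eps)
    return min(ratio * advantage, clipped * advantage)

# With a positive advantage, the objective stops growing once the
# ratio exceeds 1 + eps, removing the incentive to move too far.
print(ppo_clip_objective(math.log(2.0), math.log(1.0), advantage=1.0))  # 1.2
```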
6. ChatGPT
• In the case of GPT, the PPO stage is
semi-supervised: the reward function is
moderated by human supervision based on
previous results. The generated sequences
of the initial LLM (GPT) are ranked, and
those rankings determine the cumulative
rewards used in the human-supervised PPO
updates.
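Human rankings are commonly converted into a reward signal with a pairwise log-sigmoid loss over scored response pairs. The scores below are toy numbers standing in for reward-model outputs; the loss form is a standard choice for RLHF reward models, sketched here for illustration.

```python
# Pairwise ranking loss for training a reward model from human rankings.
import math

def pairwise_ranking_loss(r_better, r_worse):
    """-log sigmoid(r_better - r_worse): small when the model already
    scores the human-preferred response higher than the dispreferred one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_better - r_worse))))

# Correct ordering gives a small loss; reversed ordering a large one.
print(pairwise_ranking_loss(2.0, 0.0))  # low loss
print(pairwise_ranking_loss(0.0, 2.0))  # high loss
```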
7. ChatGPT
• Both models are given a prompt and
generate a response. The tuned LLM's
responses are scored with the reward
function, and the score is then used
to update the parameters of the
fine-tuned LLM to maximize the
reward-function score (PPO rewards).
8. ChatGPT
• But we also don't want the tuned
model to deviate too much from the
initial model's response, which is
what the KL penalty is used for.
Otherwise the optimization might
result in an LLM that produces
gibberish but still maximizes the
reward-model score.
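The KL-shaped reward above can be sketched per sample: the reward-model score minus a penalty proportional to how far the tuned policy's log-probability has drifted from the frozen initial model's. The coefficient `beta` and the log-prob values are illustrative assumptions.

```python
# KL-penalized reward used during PPO fine-tuning (a sketch):
#   r = r_RM - beta * (log pi_tuned(y|x) - log pi_init(y|x))

def kl_shaped_reward(rm_score, logp_tuned, logp_initial, beta=0.1):
    """Reward-model score minus a per-sample KL-style drift penalty."""
    return rm_score - beta * (logp_tuned - logp_initial)

# Large drift (tuned model far more confident than the initial model)
# eats into the reward-model score, discouraging reward hacking.
print(kl_shaped_reward(1.0, logp_tuned=-2.0, logp_initial=-5.0))  # 0.7
print(kl_shaped_reward(1.0, logp_tuned=-5.0, logp_initial=-5.0))  # 1.0
```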