The Back Propagation Learning Algorithm




  For networks with hidden units.
  Error Correcting algorithm.
  Solves the credit (blame) assignment problem.




What is supervised learning?


Can we teach a network to learn to associate a pattern of
inputs with corresponding outputs?
i.e. given an initial set of weights, how can they be adapted
to produce the desired output? Use a training set:

[Figure: persons a to f plotted by workload (horizontal axis) against payment (vertical axis); e and f are marked "?". Beside it, a unit with inputs w (workload) and p (pay) and output y.]

             person      workload      pay   P(happy)
             a           0.1           0.9   0.95
             b           0.3           0.7   0.8
             c           0.07          0.2   0.2
             d           0.9           0.9   0.3
             e           0.7           0.5   ??
             f           0.4           0.8   ??

After training, how does the network generalise to patterns
unseen during learning?
Learning by Error Correction


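In the perceptron there was a binary-valued output $y$ and
a target $t$.

[Figure: a perceptron with inputs x1, x2, ..., xN, weights w1, w2, ..., wN, output y and target t. Beside it, the step function: y jumps from 0 to 1 as a function of Σ_i w_i x_i.]

$$y = \mathrm{step}\Big(\sum_{i=0}^{N} w_i x_i\Big)$$

Define this error measure:

$$E = \frac{1}{2} \sum_p (t^p - y^p)^2$$

It counts the number of incorrect outputs (each one contributes 1/2 to $E$).

We want to design a weight-changing procedure that
minimises $E$.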
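The error measure above can be tried out on a toy problem. The sketch below is only an illustration: the function names, the two weight vectors and the OR-style training patterns are assumptions, not taken from the slides, and the threshold convention (output 1 only for a strictly positive sum) is also assumed.

```python
def step(a):
    """Threshold activation: 1 if the weighted sum is positive, else 0 (assumed convention)."""
    return 1 if a > 0 else 0

def perceptron_output(weights, inputs):
    """weights[0] is the bias weight w0; its input x0 is fixed at 1."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return step(total)

def error(weights, patterns):
    """E = 1/2 * sum over patterns of (t - y)^2."""
    return 0.5 * sum((t - perceptron_output(weights, x)) ** 2 for x, t in patterns)

# Hypothetical training set: logical OR of two binary inputs.
patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

good = [-0.5, 1.0, 1.0]   # classifies all four patterns correctly
bad = [-1.5, 1.0, 1.0]    # computes AND, so (0,1) and (1,0) come out wrong

E_good = error(good, patterns)
E_bad = error(bad, patterns)
print(E_good, E_bad)      # two wrong outputs give E = 2 * 1/2
```

With binary outputs, E can only change in jumps of 1/2, which is why a slope-based procedure needs something smoother.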
Learning by Error Correction



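How do we change the weights $w_0, w_1, \ldots, w_N$ so that
the error $E$ decreases?

Suppose the error $E$ varies with a weight $w_i$ like this.

[Figure: E plotted against w_i, a curve with a single minimum. The slope is negative (-ve) to the left of the minimum and positive (+ve) to the right.]

If we could measure the slope $\partial E / \partial w_i$,
then changing weights by the negative of the slope will
minimise $E$.

  slope +ve  =>  $\Delta w_i$ -ve
  slope -ve  =>  $\Delta w_i$ +ve
  (in both cases, move towards the minimum of $E$)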
More Perceptron Problems


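For the perceptron, $E$ can't be differentiated with respect
to the weights $w_0, w_1, \ldots, w_N$ because it involves the output $y$,
which is not differentiable.

$$E = \frac{1}{2} \sum_p (t^p - y^p)^2, \qquad y = \mathrm{step}\Big(\sum_{i=0}^{N} w_i x_i\Big)$$

Threshold Unit:

$$y = \begin{cases} 1 & \text{if } \sum_{i=0}^{N} w_i x_i > 0 \\ 0 & \text{if } \sum_{i=0}^{N} w_i x_i \le 0 \end{cases}$$

[Figure: y against Σ_i w_i x_i, a step from 0 to 1.]

Sigmoid Unit:

$$y = \frac{1}{1 + \exp\big(-\sum_{i=0}^{N} w_i x_i\big)}$$

[Figure: y against Σ_i w_i x_i, a smooth S-shaped curve rising from 0 to 1.]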
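To see why the sigmoid helps, the two units can be compared numerically. This is a minimal sketch; the helper `slope` (a central finite difference) and the sample points are assumptions made for illustration.

```python
import math

def threshold(a):
    """Threshold unit: 1 for a positive total input, otherwise 0 (assumed convention)."""
    return 1.0 if a > 0 else 0.0

def sigmoid(a):
    """Sigmoid unit: 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def slope(f, a, h=1e-4):
    """Central finite-difference estimate of df/da at a."""
    return (f(a + h) - f(a - h)) / (2 * h)

for a in (-2.0, 0.5, 2.0):
    print(a, threshold(a), round(sigmoid(a), 4),
          slope(threshold, a), round(slope(sigmoid, a), 4))
```

Away from zero the threshold unit has slope 0, so it gives no hint about which way to move a weight; the sigmoid has a non-zero slope everywhere.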
Gradient Descent


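The error $E$ is now a differentiable function of the weights.

[Figure: E plotted against w_i, a curve with a single minimum. The slope is negative (-ve) to the left of the minimum and positive (+ve) to the right.]

Change weights using negative slope:

$$\Delta w_i \propto -\frac{\partial E}{\partial w_i}$$

  $\partial E / \partial w_i$ +ve  =>  $\Delta w_i$ -ve
  $\partial E / \partial w_i$ -ve  =>  $\Delta w_i$ +ve
  (in both cases, move towards the minimum of $E$)

This approach is called Gradient Descent.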
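Gradient descent on a single weight can be sketched in a few lines. The quadratic error curve, the starting weight and the step size `eta` below are hypothetical choices made only to show the sign rule in action.

```python
def E(w):
    """Hypothetical error curve with its minimum at w = 3."""
    return (w - 3.0) ** 2

def dE_dw(w):
    """Slope of the error curve."""
    return 2.0 * (w - 3.0)

eta = 0.1          # assumed step size
w = 0.0            # start to the left of the minimum, where the slope is negative
history = [E(w)]
for _ in range(50):
    w += -eta * dE_dw(w)   # negative slope gives a positive change, and vice versa
    history.append(E(w))

print(w, history[-1])
```

On this curve every step lowers the error, and the weight settles at the minimum.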
Derivation of Back Propagation


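[Figure: a three-layer network. Inputs x_1, x_2, ..., x_k, ..., x_N feed hidden units v_1, v_2, ..., v_j, ..., v_N through weights u_jk; the hidden units feed outputs y_1, y_2, ..., y_i, ..., y_N through weights w_ij.]

output:
$$y_i = \mathrm{sig}\Big(\sum_j w_{ij} v_j\Big)$$

hidden:
$$v_j = \mathrm{sig}\Big(\sum_k u_{jk} x_k\Big)$$

error:
$$E = \frac{1}{2} \sum_p \sum_i (t_i^p - y_i^p)^2$$

We need to find the derivatives of $E$ with respect to the weights
$w_{ij}$ and $u_{jk}$.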
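The three definitions above translate directly into a forward pass. This is a minimal sketch: the function names and the all-zero test network are assumptions, and bias weights are left out (they can be added as a weight from a constant input of 1).

```python
import math

def sig(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, u, w):
    """x: inputs x_k; u[j][k]: input-to-hidden weights; w[i][j]: hidden-to-output weights."""
    v = [sig(sum(u_jk * x_k for u_jk, x_k in zip(u_j, x))) for u_j in u]
    y = [sig(sum(w_ij * v_j for w_ij, v_j in zip(w_i, v))) for w_i in w]
    return v, y

def pattern_error(t, y):
    """Error on one pattern: 1/2 * sum_i (t_i - y_i)^2."""
    return 0.5 * sum((t_i - y_i) ** 2 for t_i, y_i in zip(t, y))

# Hypothetical 2-input, 2-hidden, 1-output network with all weights zero.
x = [1.0, 0.0]
u = [[0.0, 0.0], [0.0, 0.0]]
w = [[0.0, 0.0]]
v, y = forward(x, u, w)
E = pattern_error([1.0], y)
print(v, y, E)
```

With zero weights every unit receives total input 0, so each sigmoid outputs 0.5.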
Preliminaries


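[Figure: the chain x_k --u_jk--> v_j --w_ij--> y_i.]

On a single pattern (drop $p$):

$$E = \frac{1}{2} \sum_i (t_i - y_i)^2$$

and

$$y_i = \frac{1}{1 + \exp\big(-\sum_j w_{ij} v_j\big)}$$

Note that:

$$\frac{\partial y_i}{\partial v_j} = y_i (1 - y_i)\, w_{ij}$$

$$\frac{\partial y_i}{\partial w_{ij}} = y_i (1 - y_i)\, v_j$$

since if $y = \dfrac{1}{1 + \exp(-x)}$ then $\dfrac{dy}{dx} = y (1 - y)$.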
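The identity $dy/dx = y(1 - y)$ is stated without proof; one way to fill in the step is the short derivation below, which uses only the quotient rule.

```latex
\begin{align*}
y &= \frac{1}{1 + e^{-x}} \\
\frac{dy}{dx} &= \frac{e^{-x}}{(1 + e^{-x})^{2}}
  = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} \\
 &= y \left( 1 - \frac{1}{1 + e^{-x}} \right)
  = y\,(1 - y)
\end{align*}
```

Setting $x = \sum_j w_{ij} v_j$ and applying the chain rule multiplies this by $\partial x / \partial w_{ij} = v_j$ or by $\partial x / \partial v_j = w_{ij}$, which gives the two derivatives noted above.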
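Between Hidden and Output: $\partial E / \partial w_{ij}$

[Figure: the chain x_k --u_jk--> v_j --w_ij--> y_i.]

For weights between hidden units
and output units.

$$E = \frac{1}{2} \sum_i (t_i - y_i)^2$$

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial y_i} \frac{\partial y_i}{\partial w_{ij}}$$

$$\frac{\partial E}{\partial y_i} = (y_i - t_i)$$

$$\frac{\partial y_i}{\partial w_{ij}} = y_i (1 - y_i)\, v_j$$

$$\frac{\partial E}{\partial w_{ij}} = \underbrace{(y_i - t_i)\, y_i (1 - y_i)}_{\text{call this } \delta_i}\, v_j$$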
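A quick way to gain confidence in the formula $\partial E / \partial w_{ij} = \delta_i v_j$ is to compare it with a finite-difference estimate. The sketch below does this for a single output unit; the hidden activities, weights and target are made-up values.

```python
import math

def sig(a):
    return 1.0 / (1.0 + math.exp(-a))

def output_and_error(w, v, t):
    """Single output unit: y = sig(sum_j w_j v_j), E = 1/2 (t - y)^2."""
    y = sig(sum(w_j * v_j for w_j, v_j in zip(w, v)))
    return y, 0.5 * (t - y) ** 2

# Hypothetical hidden activities, weights and target.
v = [0.9, 0.2, 1.0]
w = [0.5, -1.0, 0.3]
t = 1.0

y, _ = output_and_error(w, v, t)
delta = (y - t) * y * (1.0 - y)
analytic = [delta * v_j for v_j in v]          # dE/dw_j = delta * v_j

h = 1e-6
numeric = []
for j in range(len(w)):
    w_plus = list(w)
    w_minus = list(w)
    w_plus[j] += h
    w_minus[j] -= h
    e_plus = output_and_error(w_plus, v, t)[1]
    e_minus = output_and_error(w_minus, v, t)[1]
    numeric.append((e_plus - e_minus) / (2 * h))

print(analytic)
print(numeric)
```

The two lists should agree to many decimal places.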
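Between Input and Hidden: $\partial E / \partial u_{jk}$

[Figure: the chain x_k --u_jk--> v_j --w_ij--> y_i.]

For weights between input units
and hidden units.

$$E = \frac{1}{2} \sum_i (t_i - y_i)^2$$

$$\frac{\partial E}{\partial u_{jk}} = \sum_i \frac{\partial E}{\partial y_i} \frac{\partial y_i}{\partial v_j} \frac{\partial v_j}{\partial u_{jk}}$$

$$\frac{\partial E}{\partial y_i} = (y_i - t_i)$$

$$\frac{\partial y_i}{\partial v_j} = y_i (1 - y_i)\, w_{ij}$$

$$\frac{\partial v_j}{\partial u_{jk}} = v_j (1 - v_j)\, x_k$$

$$\frac{\partial E}{\partial u_{jk}} = \sum_i (y_i - t_i)\, y_i (1 - y_i)\, w_{ij}\, v_j (1 - v_j)\, x_k$$

$$\frac{\partial E}{\partial u_{jk}} = \sum_i \delta_i w_{ij}\, v_j (1 - v_j)\, x_k$$

The sum runs over the output units, since every output depends on $v_j$; with a single output unit it has one term.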
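The hidden-layer gradient can be checked the same way, this time on a small net with two outputs so that the sum over output units matters. The network size and all numbers below are hypothetical.

```python
import math

def sig(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, u, w):
    v = [sig(sum(a * b for a, b in zip(u_j, x))) for u_j in u]
    y = [sig(sum(a * b for a, b in zip(w_i, v))) for w_i in w]
    return v, y

def error(x, u, w, t):
    _, y = forward(x, u, w)
    return 0.5 * sum((t_i - y_i) ** 2 for t_i, y_i in zip(t, y))

# Hypothetical 2-input, 2-hidden, 2-output network.
x = [0.5, -1.0]
u = [[0.1, 0.4], [-0.3, 0.2]]
w = [[0.7, -0.5], [0.2, 0.9]]
t = [1.0, 0.0]

v, y = forward(x, u, w)
delta = [(y_i - t_i) * y_i * (1.0 - y_i) for y_i, t_i in zip(y, t)]

def analytic_grad(j, k):
    """dE/du_jk = (sum_i delta_i w_ij) * v_j (1 - v_j) * x_k."""
    back = sum(delta[i] * w[i][j] for i in range(len(y)))   # error passed back to hidden unit j
    return back * v[j] * (1.0 - v[j]) * x[k]

def numeric_grad(j, k, h=1e-6):
    """Central finite-difference estimate of dE/du_jk."""
    u_plus = [row[:] for row in u]
    u_minus = [row[:] for row in u]
    u_plus[j][k] += h
    u_minus[j][k] -= h
    return (error(x, u_plus, w, t) - error(x, u_minus, w, t)) / (2 * h)

max_gap = max(abs(analytic_grad(j, k) - numeric_grad(j, k))
              for j in range(2) for k in range(2))
print(max_gap)
```

A tiny `max_gap` indicates that the chain-rule expression and the numerical slope agree for every input-to-hidden weight.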
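Between Hidden and Output: $\Delta w_{ij}$

[Figure: the chain x_k --u_jk--> v_j --w_ij--> y_i.]

Modifying weights between hidden
units and output units using
gradient descent.

$$\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}} = -\eta\, \underbrace{(y_i - t_i)\, y_i (1 - y_i)}_{\delta_i}\, v_j$$

where
  $\eta$ is the learning constant,
  $(y_i - t_i)$ is the error,
  $y_i (1 - y_i)$ is small for $y_i$ close to 0 or 1,
  $v_j$ is the "input" arriving on the weight.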
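The remark that $y_i (1 - y_i)$ is small near 0 or 1 has a practical consequence: a saturated output unit changes its weights slowly even when its error is large. A small sketch, with an assumed target of 1, learning constant 1 and input $v_j = 1$:

```python
eta = 1.0      # assumed learning constant
t = 1.0        # target
v_j = 1.0      # "input" carried by the weight

def delta_w(y):
    """Delta w = -eta * (y - t) * y * (1 - y) * v_j."""
    return -eta * (y - t) * y * (1.0 - y) * v_j

for y in (0.01, 0.5, 0.99):
    print(y, delta_w(y))
```

The output 0.01 is almost as wrong as it can be, yet its weight change is more than ten times smaller than the one at 0.5.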
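Between Input and Hidden: $\Delta u_{jk}$

[Figure: the chain x_k --u_jk--> v_j --w_ij--> y_i.]

Modifying weights between input
units and hidden units using
gradient descent.

$$\Delta u_{jk} = -\eta \frac{\partial E}{\partial u_{jk}} = -\eta \sum_i \delta_i w_{ij}\, v_j (1 - v_j)\, x_k$$

The factor $\sum_i \delta_i w_{ij}$ is the back propagation of error: the output errors are passed back through the weights $w_{ij}$.

The same procedure is applicable to a net with many
hidden layers.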
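For a net with several hidden layers, the same rule is applied layer by layer from the output backwards. The following is a sketch under stated assumptions, not a definitive implementation: the list-of-matrices representation, the function names and the example network are all hypothetical, and bias weights are omitted.

```python
import math

def sig(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, layers):
    """layers[l][i][j] is the weight from unit j of layer l to unit i of layer l+1.
    Returns the activities of every layer, inputs included."""
    acts = [x]
    for W in layers:
        acts.append([sig(sum(w_ij * a_j for w_ij, a_j in zip(row, acts[-1]))) for row in W])
    return acts

def backward(acts, layers, t):
    """Returns dE/dW for every layer, for E = 1/2 sum_i (t_i - y_i)^2."""
    y = acts[-1]
    delta = [(y_i - t_i) * y_i * (1.0 - y_i) for y_i, t_i in zip(y, t)]
    grads = [None] * len(layers)
    for l in range(len(layers) - 1, -1, -1):
        below = acts[l]
        grads[l] = [[d_i * a_j for a_j in below] for d_i in delta]
        if l > 0:
            # Pass the deltas back through this layer's weights.
            delta = [below[j] * (1.0 - below[j])
                     * sum(delta[i] * layers[l][i][j] for i in range(len(delta)))
                     for j in range(len(below))]
    return grads

def error(x, layers, t):
    y = forward(x, layers)[-1]
    return 0.5 * sum((t_i - y_i) ** 2 for t_i, y_i in zip(t, y))

# Hypothetical net: 2 inputs -> 3 hidden -> 2 hidden -> 1 output.
layers = [
    [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]],
    [[0.6, -0.1, 0.8], [-0.3, 0.5, 0.2]],
    [[1.0, -1.0]],
]
x, t = [1.0, 0.5], [1.0]
grads = backward(forward(x, layers), layers, t)

# Finite-difference check on one first-layer weight.
h = 1e-6
layers[0][1][0] += h
e_plus = error(x, layers, t)
layers[0][1][0] -= 2 * h
e_minus = error(x, layers, t)
layers[0][1][0] += h          # restore the weight
numeric = (e_plus - e_minus) / (2 * h)
print(grads[0][1][0], numeric)
```

Each pass through the loop reuses the deltas of the layer above, so the cost of the backward pass is comparable to that of the forward pass.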
An Example



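[Figure: a network with inputs x1, x2, hidden units v1, v2 and output y. Weights into v1: u11 = 2.0, u12 = 2.0, u10 = -1.0. Weights into v2: u21 = 0.8, u22 = 0.8, u20 = -1.0. Weights into y: w1 = 2.0, w2 = -1.0, w0 = -1.0. The weights u10, u20 and w0 come from a constant input of 1.]

  x1   x2   target t
  0    0    0
  0    1    1
  1    0    1
  1    1    0

The values below are for the pattern $x_1 = 1$, $x_2 = 1$, $t = 0$.

hidden:
$$v_1 = \mathrm{sig}(u_{11} x_1 + u_{12} x_2 + u_{10}) = 0.9526$$
$$v_2 = \mathrm{sig}(u_{21} x_1 + u_{22} x_2 + u_{20}) = 0.6457$$

output:
$$y = \mathrm{sig}(w_1 v_1 + w_2 v_2 + w_0) = 0.5645$$

error:
$$E = \frac{1}{2} (t - y)^2 = 0.1593$$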
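The numbers on this slide can be reproduced with a few lines of Python. The weights are the ones in the figure; the choice of pattern (x1 = 1, x2 = 1, t = 0) is an inference from the quoted values rather than something the slide states.

```python
import math

def sig(a):
    return 1.0 / (1.0 + math.exp(-a))

# Weights from the figure; the ...0 weights come from a constant input of 1.
u10, u11, u12 = -1.0, 2.0, 2.0
u20, u21, u22 = -1.0, 0.8, 0.8
w0, w1, w2 = -1.0, 2.0, -1.0

x1, x2, t = 1.0, 1.0, 0.0   # assumed pattern: the one that matches the slide's numbers

v1 = sig(u11 * x1 + u12 * x2 + u10)
v2 = sig(u21 * x1 + u22 * x2 + u20)
y = sig(w1 * v1 + w2 * v2 + w0)
E = 0.5 * (t - y) ** 2

print(f"v1={v1:.4f} v2={v2:.4f} y={y:.4f} E={E:.4f}")
# prints: v1=0.9526 v2=0.6457 y=0.5645 E=0.1593
```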
An Example: updating the weights


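Learning constant $\eta = 1.0$

output:
$$\delta = (y - t)\, y (1 - y) = 0.1388$$
$$\Delta w_0 = -\eta\, \delta \cdot 1.0 = -0.1388$$
$$\Delta w_1 = -\eta\, \delta\, v_1 = -0.1322$$
$$\Delta w_2 = -\eta\, \delta\, v_2 = -0.0896$$

hidden (to $v_1$):
$$\Delta u_{10} = -\eta\, \delta\, w_1 v_1 (1 - v_1) \cdot 1.0 = -0.0125$$
$$\Delta u_{11} = -\eta\, \delta\, w_1 v_1 (1 - v_1)\, x_1 = -0.0125$$
$$\Delta u_{12} = -\eta\, \delta\, w_1 v_1 (1 - v_1)\, x_2 = -0.0125$$

hidden (to $v_2$):
$$\Delta u_{20} = -\eta\, \delta\, w_2 v_2 (1 - v_2) \cdot 1.0 = 0.0318$$
$$\Delta u_{21} = -\eta\, \delta\, w_2 v_2 (1 - v_2)\, x_1 = 0.0318$$
$$\Delta u_{22} = -\eta\, \delta\, w_2 v_2 (1 - v_2)\, x_2 = 0.0318$$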
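The weight changes above follow mechanically from the update rules. The sketch below recomputes them; the list layout (bias weight first) and the pattern x1 = x2 = 1, t = 0 are assumptions made for the illustration.

```python
import math

def sig(a):
    return 1.0 / (1.0 + math.exp(-a))

eta = 1.0
u = [[-1.0, 2.0, 2.0],    # u10, u11, u12
     [-1.0, 0.8, 0.8]]    # u20, u21, u22
w = [-1.0, 2.0, -1.0]     # w0, w1, w2
x = [1.0, 1.0, 1.0]       # constant input 1, then x1, x2 (assumed pattern)
t = 0.0

# Forward pass; v[0] is the constant 1 that feeds w0.
v = [1.0] + [sig(sum(a * b for a, b in zip(row, x))) for row in u]
y = sig(sum(a * b for a, b in zip(w, v)))

delta = (y - t) * y * (1.0 - y)
dw = [-eta * delta * v_j for v_j in v]                        # dw0, dw1, dw2
du = [[-eta * delta * w[j] * v[j] * (1.0 - v[j]) * x_k for x_k in x]
      for j in (1, 2)]                                        # rows: to v1, to v2

print(round(delta, 4), [round(d, 4) for d in dw])
print([[round(d, 4) for d in row] for row in du])
```

Because both inputs and the constant input equal 1 for this pattern, the three changes into each hidden unit are identical.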
An Example: a New Error



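[Figure: the same network after the update. Weights into v1: u11 = 1.98, u12 = 1.98, u10 = -1.01. Weights into v2: u21 = 0.83, u22 = 0.83, u20 = -0.96. Weights into y: w1 = 1.86, w2 = -1.08, w0 = -1.13.]

  x1   x2   target t
  0    0    0
  0    1    1
  1    0    1
  1    1    0

hidden:
$$v_1 = \mathrm{sig}(u_{11} x_1 + u_{12} x_2 + u_{10}) = 0.9509$$
$$v_2 = \mathrm{sig}(u_{21} x_1 + u_{22} x_2 + u_{20}) = 0.6672$$

output:
$$y = \mathrm{sig}(w_1 v_1 + w_2 v_2 + w_0) = 0.4776$$

error:
$$E = \frac{1}{2} (t - y)^2 = 0.1140$$

The error has reduced for this pattern.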
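Repeating the update on the same pattern keeps lowering the error. The sketch below applies five updates in a row; the function name `train_step`, the number of repetitions and the pattern x1 = x2 = 1, t = 0 are assumptions for illustration.

```python
import math

def sig(a):
    return 1.0 / (1.0 + math.exp(-a))

def train_step(u, w, x, t, eta):
    """One gradient-descent update on a single pattern.
    Updates u and w in place and returns the error measured before the update."""
    v = [1.0] + [sig(sum(a * b for a, b in zip(row, x))) for row in u]
    y = sig(sum(a * b for a, b in zip(w, v)))
    delta = (y - t) * y * (1.0 - y)
    # Compute every change from the old weights before applying any of them.
    du = [[-eta * delta * w[j + 1] * v[j + 1] * (1.0 - v[j + 1]) * x_k for x_k in x]
          for j in range(len(u))]
    dw = [-eta * delta * v_j for v_j in v]
    for j, row in enumerate(u):
        for k in range(len(row)):
            row[k] += du[j][k]
    for j in range(len(w)):
        w[j] += dw[j]
    return 0.5 * (t - y) ** 2

u = [[-1.0, 2.0, 2.0], [-1.0, 0.8, 0.8]]   # bias weight first in each row
w = [-1.0, 2.0, -1.0]
x, t = [1.0, 1.0, 1.0], 0.0                # constant input 1, then x1, x2

errors = [train_step(u, w, x, t, eta=1.0) for _ in range(5)]
print([round(e, 4) for e in errors])
```

The first two entries correspond to the errors before and after the update shown on the slides. Training on one pattern alone says nothing about the other three rows of the table, which a full training run would also have to present.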
Summary




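  Credit-assignment problem solved for hidden units:

[Figure: a unit with error δ_j on the input side, connected by weights w_1, w_2, w_3 to units with errors δ_1, δ_2, δ_3 on the output side; the errors flow backwards, from output to input.]

$$\delta_j = f'(a_j) \sum_i w_{ij}\, \delta_i$$

  $a_j$: total input to unit $j$; $f'$: 1st derivative of the activation function (sigmoid)

  Outstanding issues:
  1. Number of layers; number and type of units in
     layer
  2. Learning rates
  3. Local or distributed representations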
