Slide 1: Training on Errors Experiment to Detect Fault-Prone Software Modules by Spam Filter

  Osamu Mizuno, Tohru Kikuno
  Graduate School of Information Science and Technology
  Osaka University, JAPAN

  ESEC/FSE 2007 presentation
  (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved
Slide 2: What We Tried

  Idea: Fault-prone filtering
    Detection of fault-prone modules using a generic text discriminator such as a spam filter.

  Experiment
    Spam filter: CRM114 (a generic text discriminator)
    Fault-proneness data from an OSS project (Eclipse)
    Training Only Errors (TOE) procedure

  Result
    Achieved high recall (despite low precision)
Slide 3: Overview

  Preliminary
  Fault-Prone Filtering
  Experiments
    Training Only Errors (TOE) procedure
    Results
  Conclusions
Slide 4: Preliminary: Fault-Prone Modules

  Fault-prone modules are:
    Software modules (a certain unit of source code) that may include faults.

  In this study:
    Source code of Java methods that appears to include faults, based on information from a bug tracking system.
Slide 5: Preliminary: Spam E-mail Filtering (1)

  Spam e-mail increases year by year.
    About 94% of all e-mail messages are spam.

  Various spam filters have been developed.
    Pattern-matching-based approaches lead to an arms race between spammers and filter developers.
    Bayesian-classification-based approaches have been recognized as effective [1].

    [1] P. Graham, Hackers and Painters: Big Ideas from the Computer Age, chapter 8, pp. 121-129, 2004.
Slide 6: Preliminary: Spam E-mail Filtering (2)

  All e-mail messages can be classified into
    Spam: undesired e-mail
    Ham: desired e-mail
  Tokenize and learn both spam and ham e-mail messages as text data, and construct corpora.

  [Figure: existing e-mail messages are tokenized and trained into a SPAM corpus and a HAM corpus;
   incoming e-mail messages are then classified into spam or ham by the spam filter.]
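  To make the learning/classification flow above concrete, here is a minimal naive-Bayes-style sketch in Java.
  It is only an illustration of the idea: CRM114 uses its own (Markovian/OSB) discriminators, and the class name,
  token counting, and smoothing below are assumptions chosen for brevity.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Minimal naive-Bayes-style text classifier sketch (not CRM114's algorithm).
    // Two corpora are kept as token-frequency maps; a message is classified by
    // comparing smoothed log-likelihoods under each corpus.
    public class TinySpamFilter {
        private final Map<String, Integer> spamCounts = new HashMap<>();
        private final Map<String, Integer> hamCounts = new HashMap<>();
        private int spamTotal = 0, hamTotal = 0;

        // Training: add the message's tokens to the corpus of its known class.
        public void train(List<String> tokens, boolean isSpam) {
            Map<String, Integer> corpus = isSpam ? spamCounts : hamCounts;
            for (String t : tokens) corpus.merge(t, 1, Integer::sum);
            if (isSpam) spamTotal += tokens.size(); else hamTotal += tokens.size();
        }

        // Classification: true if the spam corpus explains the tokens better.
        public boolean isSpam(List<String> tokens) {
            double spamScore = 0.0, hamScore = 0.0;
            int vocab = spamCounts.size() + hamCounts.size() + 1; // crude smoothing constant
            for (String t : tokens) {
                spamScore += Math.log((spamCounts.getOrDefault(t, 0) + 1.0) / (spamTotal + vocab));
                hamScore  += Math.log((hamCounts.getOrDefault(t, 0) + 1.0) / (hamTotal + vocab));
            }
            return spamScore > hamScore;
        }
    }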
Slide 8: Fault-Prone Filtering

  All software modules can be classified into
    bug-detected (fault-prone: FP)
    not-bug-detected (not-fault-prone: NFP)
  Tokenize and learn both FP and NFP modules as text data, and construct corpora.

  [Figure: existing code modules are tokenized and trained into an FP corpus and an NFP corpus;
   newly developed modules are classified into FP or NFP by the FP filter.]
Slide 9: Fault-Prone Filtering: Spam Filter CRM114

  Spam filter: CRM114 (http://crm114.sourceforge.net/)
    Generic text discriminator for various purposes
    Implements several classifiers: Markov, OSB, kNN, ...
    Characteristic: token generation
      Tokens are generated as combinations of words (not single words).

  Example: "return (x + 2 * y);" with the OSB tokenizer yields the word-pair tokens
    return x, return +, return 2, return *,
    x +, x 2, x *, x y,
    + 2, + *, + y,
    2 *, 2 y,
    * y
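  A sketch of the pairing step, assuming (as in the slide's example) that the code has already been split into
  lexical tokens with brackets and semicolons dropped, and that each token is paired with up to the next four
  tokens. CRM114's real OSB features also encode the distance between the two words, which is omitted here.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Sketch of OSB-style token pairing: each token is combined with up to the
    // next WINDOW tokens, yielding two-word features instead of single words.
    public class OsbPairs {
        private static final int WINDOW = 4; // pairs reach up to 4 tokens ahead

        public static List<String> pairs(List<String> tokens) {
            List<String> result = new ArrayList<>();
            for (int i = 0; i < tokens.size(); i++) {
                for (int j = i + 1; j <= Math.min(i + WINDOW, tokens.size() - 1); j++) {
                    result.add(tokens.get(i) + " " + tokens.get(j));
                }
            }
            return result;
        }

        public static void main(String[] args) {
            // Lexical tokens of "return (x + 2 * y);" with brackets/semicolon dropped,
            // as in the slide's illustration.
            List<String> tokens = Arrays.asList("return", "x", "+", "2", "*", "y");
            System.out.println(pairs(tokens)); // [return x, return +, ..., * y] -- 14 pairs
        }
    }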
Slide 10: Fault-Prone Filtering: Example of Fault-Prone Filtering (CRM114, OSB) -- Training

  Source code mFP (fault-prone):
    public int fact(int x) {
       return (x<=1?1:x*fact(++x));
    }

  Source code mNFP (not fault-prone):
    public int fact(int x) {
       return (x<=1?1:x*fact(--x));
    }

  Both modules are tokenized into OSB word pairs:
    Tokens (TFP):  public int, public fact, public x, int fact, int int, ...,
                   x ++, x x, * fact, * ++, * x, fact ++, fact x, ++ x
    Tokens (TNFP): public int, public fact, public x, int fact, int int, ...,
                   x --, x x, * fact, * --, * x, fact --, fact x, -- x

  Starting from empty corpora, TFP is trained into the FP corpus and TNFP into the NFP corpus.
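  The training step shown above amounts to adding each token pair to the corpus of the module's known class.
  A minimal sketch, with plain token-count maps standing in for CRM114's corpus files (an assumption for
  illustration only):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch of the training step: token pairs from a known-FP module go into the
    // FP corpus, pairs from a known-NFP module into the NFP corpus.
    public class FaultProneCorpora {
        final Map<String, Integer> fpCorpus = new HashMap<>();
        final Map<String, Integer> nfpCorpus = new HashMap<>();

        void train(List<String> tokenPairs, boolean faultProne) {
            Map<String, Integer> corpus = faultProne ? fpCorpus : nfpCorpus;
            for (String pair : tokenPairs) corpus.merge(pair, 1, Integer::sum);
        }
    }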
Slide 11: Fault-Prone Filtering: Example of Fault-Prone Filtering (CRM114, OSB) -- Prediction

  New source code mnew:
    public int sigma(int x) {
       return (x<=0?0:x+sigma(++x));
    }

  Tokens (Tnew): public int, public sigma, public x, int sigma, int int, ...,
                 x ++, x x, + sigma, + ++, + x, sigma ++, sigma x, ++ x

  The FP filter compares Tnew with both corpora.
  mnew is predicted as FP because Tnew has more similarity to TFP than to TNFP
  (probability: 0.52; predicted: FP).
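  A sketch of how the prediction on mnew could be made with the corpora built above: count how many of the new
  module's pairs occur in each corpus and turn the counts into a probability-like score. The 0.52 on the slide
  comes from CRM114's own scoring, which is more elaborate than this illustration.

    import java.util.List;
    import java.util.Map;

    // Sketch of the prediction step: compare the new module's token pairs against
    // the FP and NFP corpora and report a fault-proneness score in [0, 1].
    public class FaultPronePredictor {
        static double faultProneScore(List<String> newPairs,
                                      Map<String, Integer> fpCorpus,
                                      Map<String, Integer> nfpCorpus) {
            int fpHits = 0, nfpHits = 0;
            for (String pair : newPairs) {
                if (fpCorpus.containsKey(pair)) fpHits++;
                if (nfpCorpus.containsKey(pair)) nfpHits++;
            }
            if (fpHits + nfpHits == 0) return 0.5; // no evidence either way
            return (double) fpHits / (fpHits + nfpHits);
        }

        static boolean isFaultProne(List<String> newPairs,
                                    Map<String, Integer> fpCorpus,
                                    Map<String, Integer> nfpCorpus) {
            return faultProneScore(newPairs, fpCorpus, nfpCorpus) > 0.5;
        }
    }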
Slide 13: Experiments

  Target: Eclipse project
    Written in Java
      "Methods" in Java classes are considered as modules.
    Date of the snapshots of the CVS repository and Bugzilla database: January 30, 2007.
    Large CVS repository (about 14 GB)
    Faults are recorded precisely.
Slide 14: Experiment: Collecting FP & NFP Modules

  Track FP modules from the CVS log based on the algorithm by Sliwerski et al. [2]
  (a sketch of the keyword-matching step is given below).
  [2] J. Sliwerski et al., "When do changes induce fixes? (On Fridays.)", in Proc. of MSR 2005, pp. 24-28, 2005.

    Search the CVS log for terms such as "issue", "problem", "#", and bug ids, as well as
    "fixed", "resolved", or "removed", and identify the revision in which the bug was removed.
    Take the difference from the previous revision and identify the modified modules.
    Track back through the repository and identify the modules that had not been
    modified since the bug was reported. These are the FP modules.
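  A sketch of the keyword/bug-id matching step referenced above. The regular expressions and class name are
  illustrative assumptions, not the exact patterns used by Sliwerski et al. or in this experiment.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Sketch of scanning CVS commit messages for bug-fix evidence: keywords such
    // as "fixed"/"resolved"/"removed" plus a bug-id reference mark a fix revision.
    public class FixRevisionFinder {
        private static final Pattern FIX_KEYWORDS =
            Pattern.compile("\\b(fixed|resolved|removed|issue|problem)\\b", Pattern.CASE_INSENSITIVE);
        private static final Pattern BUG_ID = Pattern.compile("(?:bug\\s*|#)(\\d+)");

        // Returns the bug ids mentioned in a log message that also looks like a fix.
        static List<Integer> bugIdsIfFix(String logMessage) {
            List<Integer> ids = new ArrayList<>();
            if (!FIX_KEYWORDS.matcher(logMessage).find()) return ids;
            Matcher m = BUG_ID.matcher(logMessage);
            while (m.find()) ids.add(Integer.parseInt(m.group(1)));
            return ids;
        }
    }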
Slide 15: Experiment: Result of Module Collection

  Extracted bugs from the Bugzilla database of Eclipse
    Conditions:
      Type of faults: Bugs
      Status of faults: Resolved, Verified, or Closed
      Resolution of faults: Fixed
      Severity: Blocker, Critical, Major, or Normal
    Total # of faults: 40,627

  Result of collection
    # of faults found in the CVS log: 21,761 (52% of total)
      # of fault-prone (FP) modules: 65,782
      # of not-fault-prone (NFP) modules: 1,113,063
Slide 17: Training Only Errors (TOE) Procedure

  In spam filtering:
    Apply e-mail messages to the spam filter in order of arrival.
    Only misclassified e-mail messages are trained into the corpora.
    You may already follow this procedure when sorting your daily e-mail.

  In fault-prone filtering:
    Apply software modules to the fault-prone filter in order of construction and modification.
    Only misclassified modules are trained into the corpora.
    (A sketch of this loop is given below.)
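  A sketch of the TOE loop, reusing the hypothetical FaultProneCorpora and FaultPronePredictor classes from the
  earlier sketches: modules are replayed in chronological order, and a module is trained only when the filter's
  prediction disagrees with its actual status.

    import java.util.List;

    // Sketch of the Training Only Errors (TOE) procedure: classify each module in
    // order of construction/modification and train only on misclassifications.
    public class ToeProcedure {
        // A module revision: its OSB token pairs plus its actual FP/NFP status.
        record ModuleRevision(List<String> tokenPairs, boolean actuallyFaultProne) {}

        static void run(List<ModuleRevision> revisionsInOrder, FaultProneCorpora corpora) {
            for (ModuleRevision m : revisionsInOrder) {
                boolean predictedFp = FaultPronePredictor.isFaultProne(
                        m.tokenPairs(), corpora.fpCorpus, corpora.nfpCorpus);
                if (predictedFp != m.actuallyFaultProne()) {
                    // Only misclassified modules are added to the corpora.
                    corpora.train(m.tokenPairs(), m.actuallyFaultProne());
                }
            }
        }
    }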
Slide 19: Evaluation Measurements

  Confusion matrix of the prediction result:

                   Predicted NFP    Predicted FP
    Actual NFP         N1               N2
    Actual FP          N3               N4

  Accuracy
    Overall accuracy of prediction: (N1 + N4) / (N1 + N2 + N3 + N4)
  Recall
    How many of the actual FP modules are predicted as FP: N4 / (N3 + N4)
  Precision
    How many of the predicted FP modules are actually FP: N4 / (N2 + N4)

  (A sketch of these measures is given below.)
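  A small sketch of the three measures computed from the confusion matrix above, with N1..N4 named as in the
  table; plugging in the numbers from the final-result slide (Slide 21) reproduces accuracy 0.908, recall 0.728,
  and precision 0.347.

    // Sketch of the evaluation measures, with N1..N4 as in the confusion matrix:
    // N1 = actual NFP predicted NFP, N2 = actual NFP predicted FP,
    // N3 = actual FP predicted NFP,  N4 = actual FP predicted FP.
    public class Measures {
        static double accuracy(long n1, long n2, long n3, long n4) {
            return (double) (n1 + n4) / (n1 + n2 + n3 + n4);
        }
        static double recall(long n3, long n4) {
            return (double) n4 / (n3 + n4);   // share of actual FP modules predicted FP
        }
        static double precision(long n2, long n4) {
            return (double) n4 / (n2 + n4);   // share of predicted FP modules that are actual FP
        }

        public static void main(String[] args) {
            // Numbers from the final TOE result: accuracy 0.908, recall 0.728, precision 0.347.
            System.out.printf("%.3f %.3f %.3f%n",
                    accuracy(1_022_895, 90_168, 17_890, 47_892),
                    recall(17_890, 47_892),
                    precision(90_168, 47_892));
        }
    }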
Slide 20: Result of Experiment (Transition of Rates)

  All extracted modules are sorted by date and applied to the FP filter one by one,
  from the oldest to the newest.

  [Figure: transition of accuracy, recall, precision, false-negative rate, and
   false-positive rate over the methods sorted by date (old to new).]

  Observation
    The prediction results become stable after about 50,000 modules have been classified.
Slide 21: Result of Experiment (Final Accuracy)

  Cumulative prediction result at the end of TOE (OSB classifier):

    TOE - final      Predicted NFP    Predicted FP
    Actual NFP         1,022,895          90,168
    Actual FP             17,890          47,892

    Precision: 0.347    Recall: 0.728    Accuracy: 0.908

  In other words,
    72% of actual FP modules are predicted as FP.
    34% of predicted FP modules include faults.
Slide 23: Threats to Validity

  Threats to construct validity
    Collection of fault-prone modules from an OSS project.
    We could not cover all faults in the Bugzilla database.
    We have to collect more reliable data in future work.

  Threats to external validity
    Generalizability of the results.
    We have to apply fault-prone filtering to many projects, including industrial ones.
Slide 24: Related Work

  Much research has been done so far:
    Logistic regression
    CART
    Bayesian classification
    and more.

  Most of these use software metrics:
    McCabe, Halstead, object-oriented metrics, and so on.

  Intuitively speaking, our approach uses a new metric: the frequency of tokens.
Slide 25: Conclusions

  Summary
    We proposed a new approach to detecting fault-prone modules using a spam filter.
    The case study showed that our approach can predict fault-prone modules with high accuracy.

  Future work
    Use semantic parsing information instead of raw code.
    Use differences between revisions as the input to fault-prone filtering
      (this seems more reasonable).
Slide 26: Q&A

  Thank you! Any questions?
Slide 27: Result of Cross Validation (from the MSR 2007 slides)

  Result for the Eclipse BIRT plugin, 10-fold cross validation (OSB classifier):

    Cross validation   Predicted NFP    Predicted FP
    Actual NFP             70,369           16,011
    Actual FP               2,039            7,501

    Precision: 0.319    Recall: 0.786    Accuracy: 0.811

  Recall is important for quality assurance.
  Precision implies the cost of finding FP modules.
  Recall is rather high, and precision is rather low.
Slide 28: Training Only Errors Procedure -- Case 2: Prediction matches the actual status

  [Figure: a queue of .java modules is applied to the FP filter one by one. The filter
   predicts a module (e.g., probability: 0.9, predicted: FP); since the prediction matches
   the module's actual status, the module is not trained and the corpora stay unchanged.]
29
                         Training Only Errors Procedure
  Osaka Univ.            Case 1: Prediction does not match to actual status


     [Diagram, shown as an animation in the original slides]
     A newly developed module (.java) is applied to the FP filter, which
     classifies it against the FP and NFP corpuses. In this example the filter
     returns probability 0.2 and predicts NFP, but the module is actually FP.
     Because the prediction does not match the actual status, the misclassified
     module is trained into the FP corpus, and filtering then proceeds to the
     next module.

                            (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved   ESEC/FSE2007
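
   To make the two cases concrete, the following sketch shows one possible shape of the TOE loop.
   It is our own illustration: TinyFilter is a toy token-counting stand-in, not CRM114's OSB
   classifier, and the 0.5 threshold, +1 smoothing, and whitespace tokenization are assumptions;
   only the control flow (classify first, then train only on misclassifications) follows the slides.

      import java.util.ArrayList;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;

      /** Toy sketch of the Training Only Errors loop (assumptions noted above). */
      public class ToeSketch {

          static class TinyFilter {
              private final Map<String, Map<String, Integer>> corpus = new HashMap<>();

              TinyFilter() {
                  corpus.put("FP", new HashMap<>());
                  corpus.put("NFP", new HashMap<>());
              }

              void train(String text, String label) {
                  for (String token : text.split("\\s+")) {
                      corpus.get(label).merge(token, 1, Integer::sum);
                  }
              }

              /** Crude fault-proneness probability: how familiar the tokens look to each corpus. */
              double predict(String text) {
                  double fp = 1, nfp = 1;
                  for (String token : text.split("\\s+")) {
                      fp  += corpus.get("FP").getOrDefault(token, 0);
                      nfp += corpus.get("NFP").getOrDefault(token, 0);
                  }
                  return fp / (fp + nfp);
              }
          }

          /** Apply modules in order of construction; train only when misclassified. */
          static List<String[]> trainingOnlyErrors(List<String[]> modules, TinyFilter filter, double tFp) {
              List<String[]> results = new ArrayList<>();
              for (String[] module : modules) {                  // module = {sourceText, actualLabel}
                  String text = module[0], actual = module[1];
                  String predicted = filter.predict(text) >= tFp ? "FP" : "NFP";
                  results.add(new String[] {predicted, actual});
                  if (!predicted.equals(actual)) {               // Case 1: mismatch -> train
                      filter.train(text, actual);
                  }
                  // Case 2: prediction matches the actual status -> no training
              }
              return results;
          }

          public static void main(String[] args) {
              TinyFilter filter = new TinyFilter();
              // Example modules reuse the fact() snippets from the earlier filtering example.
              List<String[]> modules = List.of(
                      new String[] {"return ( x <= 1 ? 1 : x * fact ( ++ x ) ) ;", "FP"},
                      new String[] {"return ( x <= 1 ? 1 : x * fact ( -- x ) ) ;", "NFP"});
              for (String[] r : trainingOnlyErrors(modules, filter, 0.5)) {
                  System.out.println("predicted=" + r[0] + " actual=" + r[1]);
              }
          }
      }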
30



Osaka Univ.                                            Procedure of Experiment

     Two experiments with different thresholds of
     probability (tFP) used to determine FP and NFP.
         Lowering tFP may achieve higher recall.


     Experiment 1:
         TOE with the OSB classifier, tFP = 0.5


     Experiment 2:
         TOE with the OSB classifier, tFP = 0.25
         Predicts more modules as FP than Experiment 1 (see the sketch below).

                          (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved   ESEC/FSE2007
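
   As a small illustration (our own naming, not the tool's interface), the same filter output can
   fall on different sides of the two thresholds:

      // Lowering tFP flags more modules as fault-prone for the same filter output.
      public class ThresholdSketch {
          static String classify(double prob, double tFp) {
              return prob >= tFp ? "FP" : "NFP";   // predicted FP once the probability reaches the threshold
          }

          public static void main(String[] args) {
              double prob = 0.3;                          // hypothetical probability returned by the filter
              System.out.println(classify(prob, 0.5));    // Experiment 1 setting -> NFP
              System.out.println(classify(prob, 0.25));   // Experiment 2 setting -> FP (higher recall, lower precision)
          }
      }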
31



Osaka Univ.             Result of Experiment (OSB, tFP = 0.25)

 Comparison with threshold = 0.50:

     Precision becomes lower: only 1/4 of the predicted FP modules hit
     actual faulty modules.

     Recall becomes much higher: 83% of the actual faulty modules can be
     detected.

 [Chart: transition of Recall, Accuracy, Precision, False-positive, and
 False-negative rates (vertical axis: Rate) over methods sorted by date.]

 TOE - final (OSB)             Predicted NFP      Predicted FP
     Actual NFP                    930,218            182,845       Precision: 0.232
     Actual FP                      10,592             55,190       Recall:    0.839
                                                                    Accuracy:  0.835
                          (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved                                                                                                                                                                                                                                                                                                                                                   ESEC/FSE2007
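
   The "1/4" and "83%" observations follow directly from the confusion matrix above; a short
   arithmetic check (illustrative only):

      public class FinalResultCheck {
          public static void main(String[] args) {
              double tp = 55190, fn = 10592;     // actual FP modules predicted FP / NFP
              double fpPredicted = 182845;       // actual NFP modules predicted FP

              double precision = tp / (tp + fpPredicted);   // 55190 / 238035 ~ 0.232 -> about 1/4 of flagged modules are faulty
              double recall    = tp / (tp + fn);            // 55190 /  65782 ~ 0.839 -> about 83% of faulty modules detected
              System.out.printf("precision=%.3f recall=%.3f%n", precision, recall);
          }
      }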

ESEC/FSE 2007 presentation slide by Osamu Mizuno

  • 1. Training on Errors Experiment to Detect Fault-Prone Software Modules by Spam Filter Osamu Mizuno, Tohru Kikuno Graduate School of Information Science and Technology Osaka University, JAPAN Osaka Univ. ESEC/FSE2007 presentation 1 (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved
  • 2. Training on Errors Experiment to Detect Fault-Prone Software Modules by Spam Filter Osamu Mizuno, Tohru Kikuno Graduate School of Information Science and Technology Osaka University, JAPAN Osaka Univ. ESEC/FSE2007 presentation 1 (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved
  • 3. 2 Osaka Univ. What We Tried… Idea: Fault-prone filtering Detection of fault-prone modules using a generic text discriminator such as a spam filter Experiment SPAM filter: CRM114 (generic text discriminator) Data of fault-proneness from an OSS project (Eclipse) Training Only Errors(TOE) procedure Result Achieved high recall (despite of low precision) (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 4. 3 Osaka Univ. Overview Preliminary Fault-Prone Filtering Experiments Training Only Errors (TOE) procedure Results Conclusions (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 5. 4 Preliminary: Osaka Univ. Fault-Prone Modules Fault-prone modules are: Software modules (a certain unit of source code) which may include faults. In this study: Source code of Java methods which seems to include faults from the information of a bug tracking system. (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 6. 5 Preliminary: Osaka Univ. Spam E-mail Filtering (1) Spam e-mail increases year by year. About 94% of entire e-mail messages are Spam. Various spam filters have been developed. Pattern matching based approach causes a rat race between spammers and developers. Bayesian classification based approach has been recognized effective[1]. [1] P. Graham, Hackers and Painters: Big Ideas from the Computer Age, chapter 8, pp. 121-129, 2004. (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 7. 6 Preliminary: Osaka Univ. Spam E-mail Filtering (2) All e-mail messages can be classified into Spam: undesired e-mail Ham: desired e-mail Tokenize and learn both spam and ham e-mail messages as text data and construct corpuses. Existing e-mail Learning (Training) HAM HAM SPAM SPAM HAM corpus corpus SPAM Filter (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 8. 6 Preliminary: Osaka Univ. Spam E-mail Filtering (2) All e-mail messages can be classified into Spam: undesired e-mail Ham: desired e-mail Tokenize and learn both spam and ham e-mail messages as text data and construct corpuses. Existing e-mail Learning (Training) HAM HAM SPAM SPAM HAM Incoming e-mail messages ? Classify corpus SPAM Filter corpus SPAM are classified into spam or ham by spam filter. HAM Incoming e-mail (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 9. 7 Osaka Univ. Overview Preliminary Fault-Prone Filtering Experiments Training Only Errors (TOE) procedure Results Conclusions (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 10. 8 Osaka Univ. Fault-Prone Filtering All software modules can be classified into bug-detected (fault-prone: FP) not-bug-detected (not-fault-prone: NFP) Tokenize and learn both FP and NFP modules as text data and construct corpuses Existing code modules FP FP Learning (Training) NFP FP NFP corpus corpus FP Filter (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 11. 8 Osaka Univ. Fault-Prone Filtering All software modules can be classified into bug-detected (fault-prone: FP) not-bug-detected (not-fault-prone: NFP) Tokenize and learn both FP and NFP modules as text data and construct corpuses Existing code modules FP FP Learning (Training) NFP ? FP NFP Newly developed modules Classify corpus corpus FP are classified into FP or FP NFP by the FP filter. Filter NFP Newly created code (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 12. 9 Fault-Prone Filtering: Osaka Univ. Spam Filter: CRM114 Spam filter: CRM114 (http://crm114.sourceforge.net/) Generic text discriminator for various purpose Implements several classifiers: Markov, OSB, kNN, ... Characteristic: generation of tokens. Tokens are generated by combination of words (not a single word) return (x + 2 * y ); with the OSB tokenizer token return x x + + 2 2 y return + x 2 + * * y return 2 x * + y return * x y 2 * (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 13. 10 Fault-Prone Filtering: Osaka Univ. Example of Fault-Prone Filtering (CRM114, OSB) Source code (mFP) Source code (mNFP) public int fact(int x) { public int fact(int x) { return (x<=1?1:x*fact(++x)); return (x<=1?1:x*fact(--x)); } } (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 14. 10 Fault-Prone Filtering: Osaka Univ. Example of Fault-Prone Filtering (CRM114, OSB) Source code (mFP) Source code (mNFP) public int fact(int x) { public int fact(int x) { return (x<=1?1:x*fact(++x)); return (x<=1?1:x*fact(--x)); } } Empty Empty FP NFP corpus corpus FP Filter (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 15. 10 Fault-Prone Filtering: Osaka Univ. Example of Fault-Prone Filtering (CRM114, OSB) Source code (mFP) Source code (mNFP) public int fact(int x) { public int fact(int x) { return (x<=1?1:x*fact(++x)); return (x<=1?1:x*fact(--x)); } } Tokens (TFP) Tokens (TNFP) public int public int public fact public fact public x public x int fact int fact int int int int ... ... ... ... x ++ FP NFP x -- x x corpus corpus x x * fact * fact * ++ FP * -- * x Filter * x fact ++ fact -- fact x fact x ++ x -- x (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 16. 10 Fault-Prone Filtering: Osaka Univ. Example of Fault-Prone Filtering (CRM114, OSB) Source code (mFP) Source code (mNFP) public int fact(int x) { public int fact(int x) { return (x<=1?1:x*fact(++x)); return (x<=1?1:x*fact(--x)); } } Tokens (TFP) Tokens (TNFP) public int public int public fact public fact public x public x int fact Training Training int fact int int int int ... ... ... ... x ++ FP NFP x -- x x corpus corpus x x * fact * fact * ++ FP * -- * x Filter * x fact ++ fact -- fact x fact x ++ x -- x (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 17. 11 Fault-Prone Filtering: Osaka Univ. Example of Fault-Prone Filtering (CRM114, OSB) Source code (mnew) public int sigma(int x) { return (x<=0?0:x+sigma(++x)); } (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 18. 11 Fault-Prone Filtering: Osaka Univ. Example of Fault-Prone Filtering (CRM114, OSB) Source code (mnew) public int sigma(int x) { return (x<=0?0:x+sigma(++x)); } Tokens (Tnew) public int public sigma public x int sigma int int ... ... x ++ x x + sigma + ++ + x sigma ++ sigma x ++ x (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 19. 11 Fault-Prone Filtering: Osaka Univ. Example of Fault-Prone Filtering (CRM114, OSB) Source code (mnew) public int sigma(int x) { return (x<=0?0:x+sigma(++x)); } Tokens (Tnew) public int public sigma public x int sigma int int ... FP NFP ... corpus corpus x ++ x x Prediction FP + sigma Filter + ++ + x sigma ++ sigma x ++ x (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 20. 11 Fault-Prone Filtering: Osaka Univ. Example of Fault-Prone Filtering (CRM114, OSB) Source code (mnew) public int sigma(int x) { return (x<=0?0:x+sigma(++x)); } Tokens (TFP) Tokens (Tnew) Tokens (TNFP) public int public int public int public fact public sigma public fact public x public x public x int fact int sigma int fact int int int int int int ... ... ... ... ... ... x ++ x ++ x -- x x x x x x * fact + sigma * fact * ++ + ++ * -- * x + x * x fact ++ sigma ++ fact -- fact x sigma x fact x ++ x ++ x -- x (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 21. 11 Fault-Prone Filtering: Osaka Univ. Example of Fault-Prone Filtering (CRM114, OSB) Source code (mnew) public int sigma(int x) { return (x<=0?0:x+sigma(++x)); } Tokens (TFP) Tokens (Tnew) Tokens (TNFP) public int public int public int public fact public sigma public fact public x public x public x int fact int sigma int fact int int int int int int ... ... ... ... ... ... x ++ x ++ x -- x x x x x x * fact + sigma * fact * ++ + ++ * -- * x + x * x fact ++ sigma ++ fact -- fact x sigma x fact x ++ x ++ x -- x (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 22. 11 Fault-Prone Filtering: Osaka Univ. Example of Fault-Prone Filtering (CRM114, OSB) Source code (mnew) public int sigma(int x) { return (x<=0?0:x+sigma(++x)); } Tokens (Tnew) mnew is predicted as FP public int public sigma because TFP has more public x similarity than TNFP. int sigma int int ... FP NFP ... corpus corpus x ++ Probability: x x Prediction FP 0.52 + sigma Filter + ++ Predicted: + x FP sigma ++ sigma x ++ x (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 23. 12 Osaka Univ. Overview Preliminary Fault-Prone Filtering Experiments Training Only Errors (TOE) procedure Results Conclusions (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 24. 13 Osaka Univ. Experiments Target: Eclipse project Written in Java “Methods” in Java classes are considered as modules Date of snapshots of cvs repository and bugzilla database January 30, 2007. Large CVS repository (about 14GB) Faults are recorded precisely (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 25. 14 Experiment: Osaka Univ. Collecting FP & NFP Modules Track FP modules from CVS log based on an algorithm by Sliwerski, et. al[2]. [2] J. Sliwerski, et. al., When do changes induce fixes? (on fridays.). In Proc. of MSR2005, pp. 24-28, 2005. Search terms such as “issue”, “problem”, “#”, and bug id as well as “fixed”, “resolved”, or “removed” from CVS log, then identify a revision the bug is removed. Get difference from the previous revision and identify modified modules. Track back repository and identify modules that have not been modified since the bug is reported. They are FP modules. (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 26. 15 Experiment: Osaka Univ. Result of Module Collection Extracted bugs from bugzilla database of Eclipse Conditions: Type of faults: Bugs Status of faults: Resolved,Verified, or Closed Resolution of faults: Fixed Severity: Blocker, Critical, Major, or Normal Total # of faults: 40,627 Result of collection # of faults found in CVS log: 21,761 (52% of total) # of fault-prone(FP) modules: 65,782 # of not-fault-prone(NFP) modules: 1,113,063 (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 27. 16 Osaka Univ. Overview Preliminary Fault-Prone Filtering Experiments Training Only Errors (TOE) procedure Results Conclusions (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 28. 17 Osaka Univ. Training Only Errors Procedure In Spam filtering: Apply e-mail messages to spam filter in order of arrival. Only misclassified e-mail messages are trained in corpuses. You may do this procedure in daily e-mail assorting. In Fault-prone filtering: Apply software modules to fault-prone filter in order of construction and modification. Only misclassified modules are trained in corpuses. (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 29. 18 Osaka Univ. Overview Preliminary Fault-Prone Filtering Experiments Training Only Errors (TOE) procedure Results Conclusions (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 30. 19 Osaka Univ. Evaluation Measurements Accuracy Result of Predicted Overall accuracy of prediction prediction NFP FP (N1+N4) / (N1+N2+N3+N4) NFP N1 N2 Actual FP N3 N4 Recall How much actual FP modules are predicted as FP. N3 / (N3+N4) Precision How much predicted FP modules include actual FP modules N2 / (N2+N4) (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 31. 20 Osaka Univ. Result of Experiment (Transition of Rates) All extracted 1 modules are sorted 0.9 AA A A A A AAAA AA AAAA AAAA AAA Accuracy AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA A AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA A AAAAAAAA AAA AAA AAA AAAA A by date, and applied 0.8 A A A FFFF F A A FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF FFFFFFFFFF FFF F FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF F FF FFFFFFF FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF F FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF F FFFFFFFFFFFFFFFFFFFFFF 0.7 F F FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFF FFFFFFFFFFFFFFFFF F F Recall FP filter one by one F FF FF FFF F F FFF F 0.6 F F F F Rate from the oldest one. 0.5 I IIII Precision 0.4 I IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIII I I IIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII II IIIIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII IIII I IIIIIIIIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII I IIII 0.3 II IIII I I II DI II I D D D D 0.2 DDD DDD D DDD DDD DDDDD D DDDDDD D DDDDDDD False-negative D DDDDDDDDDDDDD Observation D DDDDDDDDDDD DDD DDDDDDD DDDDDD DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD DDD DD DDDD DDD DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD D DDDD DDD 0.1 DD DDDDD DDDDDDDDDDDDDDDDDDDDDDDDDDDDD DDDD DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD D DDDD DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD D D DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD DDD False-positive 0 The prediction result 0 0 00 00 00 00 00 00 00 00 00 0 00 00 00 00 00 00 00 00 00 00 00 00 50 50 50 become stable after 15 25 35 45 55 65 75 85 95 10 11 old Methods sorted by date new 50,000 modules classification. (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
• 32. 21 Osaka Univ. Result of Experiment (Final Accuracy). Cumulative prediction result at the end of TOE (OSB classifier): actual NFP modules: 1,022,895 predicted NFP, 90,168 predicted FP; actual FP modules: 17,890 predicted NFP, 47,892 predicted FP. Precision: 0.347, Recall: 0.728, Accuracy: 0.908. In other words, 72% of actual FP modules are predicted as FP, and 34% of predicted FP modules include faults. (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 33. 22 Osaka Univ. Overview Preliminary Fault-Prone Filtering Experiments Training Only Errors (TOE) procedure Results Conclusions (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
• 34. 23 Osaka Univ. Threats to Validity. Threats to construct validity: the collection of fault-prone modules from OSS projects; we could not cover all faults in the Bugzilla database, so we have to collect more reliable data in future work. Threats to external validity: generalizability of the results; we have to apply fault-prone filtering to many more projects, including industrial ones. (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
• 35. 24 Osaka Univ. Related Works. Much research has been done so far: logistic regression, CART, Bayesian classification, and more. Most of it uses software metrics such as McCabe, Halstead, and object-oriented metrics. Intuitively speaking, our approach uses a new metric: the frequency of tokens. (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
• 36. 25 Osaka Univ. Conclusions. Summary: we proposed a new approach to detecting fault-prone modules using a spam filter; the case study showed that our approach can predict fault-prone modules with high accuracy. Future work: using semantic parsing information instead of raw code, and using differences between revisions as the input of fault-prone filtering, which seems more reasonable. (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
  • 37. 26 Osaka Univ. Q&A Thank you! Any questions? (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
• 38. 27 Result of Cross Validation (from a slide of MSR2007). Result for the Eclipse BIRT plugin, 10-fold cross validation: actual NFP modules: 70,369 predicted NFP, 16,011 predicted FP; actual FP modules: 2,039 predicted NFP, 7,501 predicted FP. Precision: 0.319, Recall: 0.786, Accuracy: 0.811. Recall is important for quality assurance; precision implies the cost of finding FP modules. Recall is rather high, and precision is rather low. (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved MSR2007
• 39–44. 28 Osaka Univ. Training Only Errors Procedure, Case 2: the prediction matches the actual status. [Animation: a new .java module is passed to the filter, which uses the FP and NFP corpuses to compute a fault-prone probability (here 0.9) and predicts FP; the module's actual status later turns out to be FP as well, so the module is not trained into either corpus.] (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
• 45–50. 29 Osaka Univ. Training Only Errors Procedure, Case 1: the prediction does not match the actual status. [Animation: a new .java module is passed to the filter, which computes a fault-prone probability (here 0.2) and predicts NFP; a bug is later found, so the module is actually FP and is trained into the FP corpus.] (A sketch of this loop is given after the last slide below.) (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
• 51. 30 Osaka Univ. Procedure of Experiment. Two experiments with different thresholds of probability (tFP) to determine FP and NFP; changing tFP may achieve higher recall. Experiment 1: TOE with the OSB classifier, tFP = 0.5. Experiment 2: TOE with the OSB classifier, tFP = 0.25, which predicts more modules as FP than Experiment 1. (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
• 52. 31 Osaka Univ. Result of Experiment (OSB, tFP = 0.25). [Plot: accuracy, recall, precision, false-positive rate, and false-negative rate over the methods sorted by date.] Comparison with threshold 0.50: precision becomes lower (only about 1/4 of the modules predicted as FP hit actual faulty modules), while recall becomes much higher (83% of actual faulty modules can be detected). TOE final result: actual NFP modules: 930,218 predicted NFP, 182,845 predicted FP; actual FP modules: 10,592 predicted NFP, 55,190 predicted FP. Precision: 0.232, Recall: 0.839, Accuracy: 0.835. (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved ESEC/FSE2007
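The three measures from the Evaluation Measurements slide can be reproduced directly from the confusion-matrix counts. Below is a minimal Python sketch (ours, not part of the original slides); the check at the end uses the counts from the final TOE table for the OSB classifier with tFP = 0.5 and recovers the reported 0.908 / 0.728 / 0.347 figures.

```python
def evaluation_measures(n1: int, n2: int, n3: int, n4: int):
    """Measures from the slide's confusion matrix.

    n1: actual NFP, predicted NFP    n2: actual NFP, predicted FP
    n3: actual FP,  predicted NFP    n4: actual FP,  predicted FP
    """
    accuracy = (n1 + n4) / (n1 + n2 + n3 + n4)  # correctly predicted modules
    recall = n4 / (n3 + n4)                     # actual FP modules predicted as FP
    precision = n4 / (n2 + n4)                  # predicted FP modules that are actually FP
    return accuracy, recall, precision


# Final TOE result (OSB, tFP = 0.5) from the slide above.
acc, rec, prec = evaluation_measures(1_022_895, 90_168, 17_890, 47_892)
print(f"accuracy={acc:.3f} recall={rec:.3f} precision={prec:.3f}")
# accuracy=0.908 recall=0.728 precision=0.347
```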
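The Training Only Errors procedure illustrated in the Case 1 and Case 2 slides can likewise be sketched as a short loop. This is a schematic reconstruction under stated assumptions: the classifier object standing in for CRM114's OSB classifier, the Module container, and the attribute names are all hypothetical, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Module:                  # hypothetical container for one Java method
    date: str                  # commit date, used only for ordering
    tokens: List[str]          # tokens extracted from the source code
    is_fp: bool                # actual status, revealed once a fix is recorded


def run_toe(modules: List[Module], classifier, t_fp: float = 0.5) -> List[Tuple[Module, bool]]:
    """Training Only Errors: classify modules in date order and train the
    classifier only when its prediction turns out to be wrong.
    `classifier` is assumed to expose classify(tokens) -> P(FP) and
    train(tokens, label)."""
    results = []
    for module in sorted(modules, key=lambda m: m.date):    # oldest first
        p_fp = classifier.classify(module.tokens)            # probability of being fault-prone
        predicted_fp = p_fp >= t_fp                          # lowering t_fp (e.g. 0.25) predicts more FP
        results.append((module, predicted_fp))
        if predicted_fp != module.is_fp:                     # Case 1: prediction error, train on the true label
            classifier.train(module.tokens, "FP" if module.is_fp else "NFP")
        # Case 2: prediction matches, nothing is trained
    return results
```

Calling run_toe with t_fp = 0.25 corresponds to Experiment 2, which trades precision for recall as shown in the last slide above.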

Editor's Notes

1. My name is Osamu Mizuno, from Osaka University, JAPAN. The title of my talk is ....
2. This slide briefly shows what I tried. The main idea is a new approach to detecting fault-prone modules using a generic text discriminator such as a spam filter. In order to show its effectiveness, we performed an experiment: we used the CRM114 text discriminator as the spam filter, collected data on fault-prone modules from the Eclipse project, and then performed the training-only-errors procedure. The result of the experiment showed that we achieved high recall, but low precision.
3. Overview of this presentation.
5. In order to break through this situation ... Nowadays, most spam filters implement Bayesian filtering.
6–7. It is assumed that all e-mail messages can be classified into SPAM and HAM: SPAM is undesired and HAM is desired e-mail. We then tokenize both spam and ham messages and learn them into corpuses. Incoming messages are then classified into spam or ham by the spam filter.
9–10. For fault-prone filtering, almost the same approach as spam filtering is adopted. First, we assume that all software modules can be classified into bug-injected and not-bug-injected; we call bug-injected modules fault-prone (FP) and not-bug-injected modules not fault-prone (NFP). Modules are then tokenized and learned into the FP and NFP corpuses. Finally, newly developed modules are classified into FP or NFP by the spam filter. Note that the input of our approach is source code only.
12–19. Assume that two modules mFP and mNFP are created, and that the FP and NFP corpuses are empty. In fact, mFP is a fault-injected module, and mNFP is its revised version. The module is intended to calculate the factorial of a given integer x, but mFP has a bug: the ++ should be --. mNFP fixes the bug. Using CRM114, we obtain the tokens to be trained; the differences between the token sets TFP and TNFP are shown in red.
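To make this training step concrete, a minimal Python sketch is given below. It is only an approximation under stated assumptions: the plain word/operator split stands in for CRM114's tokenization (the real OSB feature generation is more elaborate), and the two factorial snippets are reconstructed from the note's description of mFP and mNFP rather than copied from the slides.

```python
import re
from collections import Counter

FP_CORPUS: Counter = Counter()    # token counts learned from fault-prone modules
NFP_CORPUS: Counter = Counter()   # token counts learned from not-fault-prone modules


def tokenize(source: str) -> list:
    """Rough stand-in for CRM114 tokenization: identifiers, numbers and
    single operator/punctuation characters of a Java method body."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_0-9]", source)


def train(source: str, fault_prone: bool) -> None:
    """Learn the module's tokens into the corresponding corpus."""
    corpus = FP_CORPUS if fault_prone else NFP_CORPUS
    corpus.update(tokenize(source))


# Reconstructed illustration: the faulty factorial (the ++ should be --)
# and its fixed revision are learned into the FP and NFP corpuses.
train("int fact(int x) { int r = 1; while (x > 1) { r *= x; x++; } return r; }", fault_prone=True)
train("int fact(int x) { int r = 1; while (x > 1) { r *= x; x--; } return r; }", fault_prone=False)
```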
20–33. Assume that a new module mnew is created; this module is intended to calculate the summation of a given integer x, and it also includes a bug in which the operator ++ should be --. The tokens for mnew are generated and applied to the FP filter, which calculates the probability that the module is fault-prone. As mentioned before, the corpuses FP and NFP contain the tokens TFP and TNFP. Intuitively, by matching the similarity between these token sets, the FP filter determines which corpus is more similar to Tnew; in this example, Tnew has more in common with TFP than with TNFP. By the Bayesian formula, we calculate the probability, and thus mnew is predicted as FP.
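The classification step can then be sketched as a plain naive-Bayes score over the two corpuses. Again, this is an illustrative approximation of the Bayesian formula mentioned in the note, not CRM114's actual OSB weighting, and it reuses the Counter corpuses and tokenize() from the previous sketch.

```python
import math
from collections import Counter


def p_fault_prone(tokens, fp_corpus: Counter, nfp_corpus: Counter, prior_fp: float = 0.5) -> float:
    """Naive-Bayes style probability that the token sequence belongs to the
    FP class; Laplace smoothing keeps unseen tokens from zeroing a class."""
    fp_total = sum(fp_corpus.values())
    nfp_total = sum(nfp_corpus.values())
    vocab = len(set(fp_corpus) | set(nfp_corpus)) or 1

    log_fp = math.log(prior_fp)
    log_nfp = math.log(1.0 - prior_fp)
    for t in tokens:
        log_fp += math.log((fp_corpus[t] + 1) / (fp_total + vocab))
        log_nfp += math.log((nfp_corpus[t] + 1) / (nfp_total + vocab))

    # Turn the two log-likelihoods back into P(FP | tokens).
    return 1.0 / (1.0 + math.exp(log_nfp - log_fp))


# A newly written module m_new is predicted FP when the probability crosses tFP:
#   p = p_fault_prone(tokenize(new_source), FP_CORPUS, NFP_CORPUS)
#   predicted_fp = p >= 0.5
```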
35. In order to show the usefulness of our approach, we conducted an experiment. The target system is the Eclipse project.
36. For the case study, we had to collect FP modules from the target CVS repository. To do so, we adopted the algorithm presented by Sliwerski et al. at MSR 2005. Briefly speaking, we first search the commit logs for terms such as "issue", "problem", or "#", along with terms such as "fixed", "resolved", or "removed"; this identifies the revision in which a bug was removed. Next, we take the differences from the previous revision and identify the modified modules. Finally, we track back the CVS history and identify the modules that had not been modified since the bug was reported; we consider these to be FP modules. The result of the collection: 1,973 bugs were found in the CVS log (42% of the total), yielding 9,547 FP modules and 86,770 NFP modules.
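A short sketch of the commit-log heuristic described in this note follows. The keyword patterns and the commit object shape are illustrative assumptions in the spirit of Sliwerski et al.'s algorithm, not the authors' actual implementation, and the track-back step that checks whether a module was modified after the bug report is left out for brevity.

```python
import re

# Illustrative keyword patterns taken from the terms mentioned in the note.
BUG_REFERENCE = re.compile(r"\b(issue|problem)\b|#\d+", re.IGNORECASE)
RESOLUTION = re.compile(r"\b(fixed|resolved|removed)\b", re.IGNORECASE)


def is_fix_commit(log_message: str) -> bool:
    """Heuristic: a commit log describes a bug fix when it contains both a
    bug reference and a resolution term."""
    return bool(BUG_REFERENCE.search(log_message) and RESOLUTION.search(log_message))


def collect_fp_modules(commits) -> set:
    """Collect candidate FP modules: the modules changed by each fix commit.
    `commits` is assumed to be an iterable with .message and .changed_modules."""
    fp_modules = set()
    for commit in commits:
        if is_fix_commit(commit.message):
            fp_modules.update(commit.changed_modules)
    return fp_modules
```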
39. If an e-mail message from my boss is marked as spam, I have to click the "not junk" button unwillingly. This is exactly "training in case of errors".
42. This slide shows the result of the training-only-errors experiment. All extracted modules are sorted by date and applied to the FP filter one by one, starting from the oldest. The graph shows the transitions of accuracy, recall, and precision: the X axis is the cumulative number of modules applied to the FP filter, and the Y axis is the value of each rate. The prediction results become stable after about 50,000 modules have been classified, because there is little training data at the beginning of TOE.
43. The table shows the cumulative classification result at the final point of the experiment.
45. We have proposed a new approach to detecting FP modules using a spam filter. The threat to construct validity is that it is not clear whether the collected FP modules really include faults; we need to investigate this threat as future work.
49. The result of cross validation is shown in this table. The columns show the number of modules by predicted status, and the rows show the number by actual status. To evaluate the result, we introduce three measures. Precision denotes the ratio of actual FP modules among the predicted FP modules. Recall denotes the ratio of predicted FP modules among the actual FP modules. Accuracy is the ratio of correctly predicted modules to all modules. Recall is an important measure for assuring quality; precision is important for the cost of testing. We can see that recall and accuracy are relatively high, but precision is low.
71–96. At this point, assume that the probability is calculated as 0.2 and the module is classified as NFP. As time passes, the fault-proneness of the module can be revealed; here, assume that a bug is detected in this module.