SlideShare a Scribd company logo
1 of 24
Detecting Bug Duplicate Reports Through Locality of Reference Tomi Prifti, Sean Banerjee, Bojan Cukic Lane Department of CSEE West Virginia University Morgantown, WV, USA September 2011
Presentation Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Introduction ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
A typical bug report
Goals ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Related Work ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Related Work ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Related Work ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Firefox Repository ,[object Object],[object Object],[object Object]
Time Interval Between Reports ,[object Object]
Experimental Setup ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Experimental Procedure ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Measuring Performance ,[object Object],[object Object],[object Object]
Approach methodology ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Sliding Window Defined ,[object Object],[object Object],[object Object],[object Object],[object Object]
Experimental Results ,[object Object]
Performance and Runtime ,[object Object],[object Object]
Result Comparisons Group Approach Results Hiew, et-al Text analysis Recall rate ~50% Cubranic, et-al Bayesian learning Text categorization Correctly predicted ~30% duplicates Jalbert, et-al Text Similarity Clustering  Recall rate ~51%  List size 20 Wang, et-al NLP Execution Information 67-93% detection rate (43-72% with NLP) Wang, et-al Enhanced version of prior algorithm 17-31% improvement over state of art Our approach Time Window/Centroids ~53% recall rate
Threats to Validity ,[object Object],[object Object],[object Object]
Summary and Future Work ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
TF/IDF  ,[object Object],[object Object],[object Object],[object Object],[object Object]
Sliding Window - TF/IDF  ,[object Object],[object Object],[object Object],[object Object]
Sliding Window – Centroid ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Sliding Window – Centroid – TD/IDF ,[object Object],[object Object]

More Related Content

Viewers also liked

GALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software EngineeringGALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software EngineeringCS, NcState
 
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...CS, NcState
 
Lexisnexis june9
Lexisnexis june9Lexisnexis june9
Lexisnexis june9CS, NcState
 
Welcome to ICSE NIER’15 (new ideas and emerging results).
Welcome to ICSE NIER’15 (new ideas and emerging results).Welcome to ICSE NIER’15 (new ideas and emerging results).
Welcome to ICSE NIER’15 (new ideas and emerging results).CS, NcState
 
Talks2015 novdec
Talks2015 novdecTalks2015 novdec
Talks2015 novdecCS, NcState
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest linkCS, NcState
 
Superscalar & superpipeline processor
Superscalar & superpipeline processorSuperscalar & superpipeline processor
Superscalar & superpipeline processorMuhammad Ishaq
 

Viewers also liked (10)

Pipelining Cache
Pipelining CachePipelining Cache
Pipelining Cache
 
GALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software EngineeringGALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software Engineering
 
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...
 
Lexisnexis june9
Lexisnexis june9Lexisnexis june9
Lexisnexis june9
 
Welcome to ICSE NIER’15 (new ideas and emerging results).
Welcome to ICSE NIER’15 (new ideas and emerging results).Welcome to ICSE NIER’15 (new ideas and emerging results).
Welcome to ICSE NIER’15 (new ideas and emerging results).
 
Future se oct15
Future se oct15Future se oct15
Future se oct15
 
Talks2015 novdec
Talks2015 novdecTalks2015 novdec
Talks2015 novdec
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest link
 
Superscalar & superpipeline processor
Superscalar & superpipeline processorSuperscalar & superpipeline processor
Superscalar & superpipeline processor
 
pipelining
pipeliningpipelining
pipelining
 

Similar to Detecting Bug Duplicate Reports Through Locality of Reference

178 - A replicated study on duplicate detection: Using Apache Lucene to searc...
178 - A replicated study on duplicate detection: Using Apache Lucene to searc...178 - A replicated study on duplicate detection: Using Apache Lucene to searc...
178 - A replicated study on duplicate detection: Using Apache Lucene to searc...ESEM 2014
 
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTSUSING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTSijseajournal
 
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningSentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningIRJET Journal
 
Software Engineering Domain Knowledge to Identify Duplicate Bug Reports
Software Engineering Domain Knowledge to Identify Duplicate Bug ReportsSoftware Engineering Domain Knowledge to Identify Duplicate Bug Reports
Software Engineering Domain Knowledge to Identify Duplicate Bug ReportsIJCERT
 
IRJET - Neural Network based Leaf Disease Detection and Remedy Recommenda...
IRJET -  	  Neural Network based Leaf Disease Detection and Remedy Recommenda...IRJET -  	  Neural Network based Leaf Disease Detection and Remedy Recommenda...
IRJET - Neural Network based Leaf Disease Detection and Remedy Recommenda...IRJET Journal
 
Bug Triage: An Automated Process
Bug Triage: An Automated ProcessBug Triage: An Automated Process
Bug Triage: An Automated ProcessIRJET Journal
 
When do software issues get reported in large open source software - Rakesh Rana
When do software issues get reported in large open source software - Rakesh RanaWhen do software issues get reported in large open source software - Rakesh Rana
When do software issues get reported in large open source software - Rakesh RanaIWSM Mensura
 
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)yguarata
 
Collaborative Filtering Survey
Collaborative Filtering SurveyCollaborative Filtering Survey
Collaborative Filtering Surveymobilizer1000
 
Sound Empirical Evidence in Software Testing
Sound Empirical Evidence in Software TestingSound Empirical Evidence in Software Testing
Sound Empirical Evidence in Software TestingJaguaraci Silva
 
Towards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniquesTowards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniquesredpel dot com
 
Progressive Duplicate Detection
Progressive Duplicate DetectionProgressive Duplicate Detection
Progressive Duplicate Detection1crore projects
 
Human Evaluation: Why do we need it? - Dr. Sheila Castilho
Human Evaluation: Why do we need it? - Dr. Sheila CastilhoHuman Evaluation: Why do we need it? - Dr. Sheila Castilho
Human Evaluation: Why do we need it? - Dr. Sheila CastilhoSebastian Ruder
 
Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...
Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...
Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...Shakas Technologies
 
When do software issues get reported in large open source software
When do software issues get reported in large open source softwareWhen do software issues get reported in large open source software
When do software issues get reported in large open source softwareRAKESH RANA
 
Siena's Clinical Decision Assistant
Siena's Clinical Decision AssistantSiena's Clinical Decision Assistant
Siena's Clinical Decision AssistantMichael Ippolito
 

Similar to Detecting Bug Duplicate Reports Through Locality of Reference (20)

178 - A replicated study on duplicate detection: Using Apache Lucene to searc...
178 - A replicated study on duplicate detection: Using Apache Lucene to searc...178 - A replicated study on duplicate detection: Using Apache Lucene to searc...
178 - A replicated study on duplicate detection: Using Apache Lucene to searc...
 
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTSUSING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS
 
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningSentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
 
Software Engineering Domain Knowledge to Identify Duplicate Bug Reports
Software Engineering Domain Knowledge to Identify Duplicate Bug ReportsSoftware Engineering Domain Knowledge to Identify Duplicate Bug Reports
Software Engineering Domain Knowledge to Identify Duplicate Bug Reports
 
IRJET - Neural Network based Leaf Disease Detection and Remedy Recommenda...
IRJET -  	  Neural Network based Leaf Disease Detection and Remedy Recommenda...IRJET -  	  Neural Network based Leaf Disease Detection and Remedy Recommenda...
IRJET - Neural Network based Leaf Disease Detection and Remedy Recommenda...
 
Bug Triage: An Automated Process
Bug Triage: An Automated ProcessBug Triage: An Automated Process
Bug Triage: An Automated Process
 
M018147883
M018147883M018147883
M018147883
 
When do software issues get reported in large open source software - Rakesh Rana
When do software issues get reported in large open source software - Rakesh RanaWhen do software issues get reported in large open source software - Rakesh Rana
When do software issues get reported in large open source software - Rakesh Rana
 
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
 
IJET-V2I6P28
IJET-V2I6P28IJET-V2I6P28
IJET-V2I6P28
 
Collaborative Filtering Survey
Collaborative Filtering SurveyCollaborative Filtering Survey
Collaborative Filtering Survey
 
Sound Empirical Evidence in Software Testing
Sound Empirical Evidence in Software TestingSound Empirical Evidence in Software Testing
Sound Empirical Evidence in Software Testing
 
The Chemtools LaBLog
The Chemtools LaBLogThe Chemtools LaBLog
The Chemtools LaBLog
 
Towards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniquesTowards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniques
 
Progressive Duplicate Detection
Progressive Duplicate DetectionProgressive Duplicate Detection
Progressive Duplicate Detection
 
Pedersen acl2011-business-meeting
Pedersen acl2011-business-meetingPedersen acl2011-business-meeting
Pedersen acl2011-business-meeting
 
Human Evaluation: Why do we need it? - Dr. Sheila Castilho
Human Evaluation: Why do we need it? - Dr. Sheila CastilhoHuman Evaluation: Why do we need it? - Dr. Sheila Castilho
Human Evaluation: Why do we need it? - Dr. Sheila Castilho
 
Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...
Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...
Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...
 
When do software issues get reported in large open source software
When do software issues get reported in large open source softwareWhen do software issues get reported in large open source software
When do software issues get reported in large open source software
 
Siena's Clinical Decision Assistant
Siena's Clinical Decision AssistantSiena's Clinical Decision Assistant
Siena's Clinical Decision Assistant
 

More from CS, NcState

Icse15 Tech-briefing Data Science
Icse15 Tech-briefing Data ScienceIcse15 Tech-briefing Data Science
Icse15 Tech-briefing Data ScienceCS, NcState
 
Kits to Find the Bits that Fits
Kits to Find  the Bits that Fits Kits to Find  the Bits that Fits
Kits to Find the Bits that Fits CS, NcState
 
Ai4se lab template
Ai4se lab templateAi4se lab template
Ai4se lab templateCS, NcState
 
Automated Software Enging, Fall 2015, NCSU
Automated Software Enging, Fall 2015, NCSUAutomated Software Enging, Fall 2015, NCSU
Automated Software Enging, Fall 2015, NCSUCS, NcState
 
Requirements Engineering
Requirements EngineeringRequirements Engineering
Requirements EngineeringCS, NcState
 
172529main ken and_tim_software_assurance_research_at_west_virginia
172529main ken and_tim_software_assurance_research_at_west_virginia172529main ken and_tim_software_assurance_research_at_west_virginia
172529main ken and_tim_software_assurance_research_at_west_virginiaCS, NcState
 
Automated Software Engineering
Automated Software EngineeringAutomated Software Engineering
Automated Software EngineeringCS, NcState
 
Next Generation “Treatment Learning” (finding the diamonds in the dust)
Next Generation “Treatment Learning” (finding the diamonds in the dust)Next Generation “Treatment Learning” (finding the diamonds in the dust)
Next Generation “Treatment Learning” (finding the diamonds in the dust)CS, NcState
 
Tim Menzies, directions in Data Science
Tim Menzies, directions in Data ScienceTim Menzies, directions in Data Science
Tim Menzies, directions in Data ScienceCS, NcState
 
Dagstuhl14 intro-v1
Dagstuhl14 intro-v1Dagstuhl14 intro-v1
Dagstuhl14 intro-v1CS, NcState
 
The Art and Science of Analyzing Software Data
The Art and Science of Analyzing Software DataThe Art and Science of Analyzing Software Data
The Art and Science of Analyzing Software DataCS, NcState
 
What Metrics Matter?
What Metrics Matter? What Metrics Matter?
What Metrics Matter? CS, NcState
 
In the age of Big Data, what role for Software Engineers?
In the age of Big Data, what role for Software Engineers?In the age of Big Data, what role for Software Engineers?
In the age of Big Data, what role for Software Engineers?CS, NcState
 
Sayyad slides ase13_v4
Sayyad slides ase13_v4Sayyad slides ase13_v4
Sayyad slides ase13_v4CS, NcState
 
Warning: don't do CS
Warning: don't do CSWarning: don't do CS
Warning: don't do CSCS, NcState
 
How to do better experiments in SE
How to do better experiments in SEHow to do better experiments in SE
How to do better experiments in SECS, NcState
 
Idea Engineering
Idea EngineeringIdea Engineering
Idea EngineeringCS, NcState
 

More from CS, NcState (20)

Icse15 Tech-briefing Data Science
Icse15 Tech-briefing Data ScienceIcse15 Tech-briefing Data Science
Icse15 Tech-briefing Data Science
 
Kits to Find the Bits that Fits
Kits to Find  the Bits that Fits Kits to Find  the Bits that Fits
Kits to Find the Bits that Fits
 
Ai4se lab template
Ai4se lab templateAi4se lab template
Ai4se lab template
 
Automated Software Enging, Fall 2015, NCSU
Automated Software Enging, Fall 2015, NCSUAutomated Software Enging, Fall 2015, NCSU
Automated Software Enging, Fall 2015, NCSU
 
Requirements Engineering
Requirements EngineeringRequirements Engineering
Requirements Engineering
 
172529main ken and_tim_software_assurance_research_at_west_virginia
172529main ken and_tim_software_assurance_research_at_west_virginia172529main ken and_tim_software_assurance_research_at_west_virginia
172529main ken and_tim_software_assurance_research_at_west_virginia
 
Automated Software Engineering
Automated Software EngineeringAutomated Software Engineering
Automated Software Engineering
 
Next Generation “Treatment Learning” (finding the diamonds in the dust)
Next Generation “Treatment Learning” (finding the diamonds in the dust)Next Generation “Treatment Learning” (finding the diamonds in the dust)
Next Generation “Treatment Learning” (finding the diamonds in the dust)
 
Tim Menzies, directions in Data Science
Tim Menzies, directions in Data ScienceTim Menzies, directions in Data Science
Tim Menzies, directions in Data Science
 
Goldrush
GoldrushGoldrush
Goldrush
 
Dagstuhl14 intro-v1
Dagstuhl14 intro-v1Dagstuhl14 intro-v1
Dagstuhl14 intro-v1
 
Know thy tools
Know thy toolsKnow thy tools
Know thy tools
 
The Art and Science of Analyzing Software Data
The Art and Science of Analyzing Software DataThe Art and Science of Analyzing Software Data
The Art and Science of Analyzing Software Data
 
What Metrics Matter?
What Metrics Matter? What Metrics Matter?
What Metrics Matter?
 
In the age of Big Data, what role for Software Engineers?
In the age of Big Data, what role for Software Engineers?In the age of Big Data, what role for Software Engineers?
In the age of Big Data, what role for Software Engineers?
 
Sayyad slides ase13_v4
Sayyad slides ase13_v4Sayyad slides ase13_v4
Sayyad slides ase13_v4
 
Ase2013
Ase2013Ase2013
Ase2013
 
Warning: don't do CS
Warning: don't do CSWarning: don't do CS
Warning: don't do CS
 
How to do better experiments in SE
How to do better experiments in SEHow to do better experiments in SE
How to do better experiments in SE
 
Idea Engineering
Idea EngineeringIdea Engineering
Idea Engineering
 

Recently uploaded

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Recently uploaded (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Detecting Bug Duplicate Reports Through Locality of Reference

  • 1. Detecting Bug Duplicate Reports Through Locality of Reference Tomi Prifti, Sean Banerjee, Bojan Cukic Lane Department of CSEE West Virginia University Morgantown, WV, USA September 2011
  • 2.
  • 3.
  • 4. A typical bug report
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18. Result Comparisons Group Approach Results Hiew, et-al Text analysis Recall rate ~50% Cubranic, et-al Bayesian learning Text categorization Correctly predicted ~30% duplicates Jalbert, et-al Text Similarity Clustering Recall rate ~51% List size 20 Wang, et-al NLP Execution Information 67-93% detection rate (43-72% with NLP) Wang, et-al Enhanced version of prior algorithm 17-31% improvement over state of art Our approach Time Window/Centroids ~53% recall rate
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.