Inferring Resource Specifications
from
Natural Language API Documentation
Presented By:
Hemankita Perabathini
Outline
• Introduction
• Related Work
• Our Approach
• Evaluations
• Benefits
• Future Work
• Conclusion
Why infer resource specifications from natural language API documentation?
• Software libraries typically provide API documentation, from which developers can learn how to use the libraries correctly.
• However, developers may still write code that is inconsistent with the API documentation and thus introduce bugs; existing research shows that many developers are reluctant to read API documentation carefully.
• To find those bugs, researchers have proposed various detection approaches based on known specifications.
• Many approaches have been proposed to mine such specifications, but most of them rely on existing client code. Consequently, they fail to mine specifications when client code is not available.
Introduction
• In this paper, we propose an approach, called Doc2Spec, that infers resource specifications from API documentation.
• We implemented a tool for our approach and evaluated it on the Javadocs of five libraries. The results show that our approach infers various specifications with relatively high precision, recall, and F-score.
• We further evaluated the usefulness of the inferred specifications by using them to detect bugs in open source projects.
• The results show that specifications inferred by Doc2Spec are useful for detecting real bugs in existing projects.
Example
How are specifications inferred and bugs detected?
Let us use a J2EE resource named CCIConnection to illustrate both.
Inferring specifications
Step 1: Extract method descriptions and class/interface hierarchies from API documentation.
In this example, from J2EE’s Javadoc, our approach extracts three method descriptions of the interface
javax.resource.cci.Connection:
• createInteraction(): “Creates an interaction associated with this connection.”
• getMetaData(): “Gets the information on the underlying EIS instance represented through an active connection.”
• close(): “Initiates close of the connection handle at the application level.”
Step 2: Build an action-resource pair from each method
• createInteraction(): ⟨create, connection⟩
• getMetaData(): ⟨get, connection⟩
• close(): ⟨close, connection⟩
For a method, its action-resource pair denotes what action the method takes on what resource.
Step 3: Infer automata for resources based on action-resource pairs and class/interface hierarchies
createInteraction() → creation method.
getMetaData() → manipulation method.
close() → closure method.
Fig: Resource specifications
Fig: Specification template
Detecting bugs
In this example, we can check whether close() is eventually called on every possible execution path of a code snippet.
A path on which close() is never called violates the specification.
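The check described above can be sketched as follows. This is only an illustration under stated assumptions: the control flow graph is a small hand-written acyclic graph, and the node names, the `cfg` and `calls` tables, and the function names are hypothetical, not part of Doc2Spec.

```python
# Hypothetical acyclic control flow graph: node -> successor nodes.
cfg = {
    "n1": ["n2", "n4"],  # branch after the resource is created
    "n2": ["n3"],
    "n3": ["n4"],
    "n4": [],            # exit node
}

# API call made at each node (None means no call on the resource).
calls = {
    "n1": "createInteraction",
    "n2": "getMetaData",
    "n3": "close",
    "n4": None,
}


def all_paths(cfg, node, exit_node):
    """Enumerate every path from node to exit_node (the graph must be acyclic)."""
    if node == exit_node:
        return [[node]]
    result = []
    for succ in cfg[node]:
        for rest in all_paths(cfg, succ, exit_node):
            result.append([node] + rest)
    return result


def paths_missing_close(cfg, calls, entry, exit_node,
                        create="createInteraction", close="close"):
    """Return the paths on which the resource is created but never closed afterwards."""
    violations = []
    for path in all_paths(cfg, entry, exit_node):
        trace = [calls[n] for n in path if calls[n] is not None]
        if create in trace and close not in trace[trace.index(create) + 1:]:
            violations.append(path)
    return violations


print(paths_missing_close(cfg, calls, "n1", "n4"))
```

Here the path n1 → n4 skips close(), so it is reported, while n1 → n2 → n3 → n4 is not.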
Related Work
As client projects contain many valuable usages of libraries, many approaches have been proposed to mine
specifications from client code statically or dynamically.
These mining approaches can be divided into two types.
• Automata-based approaches
• Sequence-based approaches
Most of these previous approaches take existing client code as input.
• When a library is new or not popular, its clients are difficult to find, and randomly generated test cases may
not reflect real usages of the library.
• Our previous work shows that even in a popular library, some methods or classes are rarely used.
As our approach does not need client code, it is able to infer specifications when these methods have
descriptions, complementing these previous approaches.
These inferred specifications can be used to detect bugs when developers leverage libraries to develop client
code.
Approach
Overview of our approach
Javadoc Analysis – Extract method descriptions and class/interface hierarchies from Javadocs
Method Summary
• Searches for the anchor named “method_summary”
Class/interface hierarchies
• Searches for the file named “package-tree.html” in the directory of each package and then locates the
topics “Class Hierarchy” and “Interface Hierarchy”
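A minimal sketch of the method-summary step, assuming a heavily simplified Javadoc page in which the “method_summary” anchor is followed by one table whose rows hold a return type cell and a "name() description" cell. Real Javadoc markup differs between versions; the `SAMPLE` page and the `MethodSummaryParser` class are hypothetical and only illustrate the idea.

```python
from html.parser import HTMLParser

# Hypothetical, simplified stand-in for a Javadoc class page.
SAMPLE = """
<a name="method_summary"></a>
<table>
<tr><td>void</td><td>close() Initiates close of the connection handle at the application level.</td></tr>
<tr><td>Interaction</td><td>createInteraction() Creates an interaction associated with this connection.</td></tr>
</table>
<table><tr><td>unrelated</td></tr></table>
"""


class MethodSummaryParser(HTMLParser):
    """Collect the text of table cells in the first table after the anchor."""

    def __init__(self):
        super().__init__()
        self.in_summary = False
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("name") == "method_summary":
            self.in_summary = True
        elif tag == "td" and self.in_summary:
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "table" and self.in_summary:
            self.in_summary = False  # stop after the summary table

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data


parser = MethodSummaryParser()
parser.feed(SAMPLE)

# Every second cell holds "name() description"; split it at the first space.
methods = dict(cell.split(" ", 1) for cell in parser.cells[1::2])
print(methods)
```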
NLP analysis
• Build an action – resource pair from each method’s description through NLP analysis.
• In NLP, the problem of identifying words belonging to a predefined category in a document is known as Named Entity
Recognition (NER).
• In the literature, researchers have proposed the following approaches to recognize those entities:
i. Rule-based approaches
ii. Dictionary-based approaches
iii. Machine-learning-based approaches
• Rule-based approaches use hand-crafted rules.
• Dictionary-based approaches use a large collection of names as a dictionary for entities.
• As it is difficult to build hand-crafted rules or dictionaries for actions and resources, we choose a
machine-learning-based approach to recognize them.
• In particular, our approach uses NER based on the Hidden Markov Model (HMM), since it is reported to perform better
than other machine-learning-based approaches.
In NER, an HMM is a five-tuple $(\Omega_s, \Omega_o, \pi, A, B)$, where
• $\Omega_s = \{s_1, \ldots, s_n\}$ is the finite set of states.
In our approach, these states include action, resource, and other.
• $\Omega_o = \{o_1, \ldots, o_m\}$ is the set of observations.
In our approach, $o_i = \langle w_i, f_i \rangle$, where $w_i$ is a word and $f_i = \langle F^{W}_i, F^{M}_i, F^{POS}_i \rangle$. Here, $F^{W}$ denotes the word feature such
as capitalization and digitalization; $F^{M}$ denotes the morphological feature such as prefix and suffix; $F^{POS}$ denotes the
part-of-speech feature such as nouns, verbs, prepositions, adverbs, and adjectives.
• $\pi \in \Omega_s$ is the initial state.
In our approach, $\pi$ denotes the state of the first word of each method description.
• $A: \Omega_s \times \Omega_s \to [0,1]$ is the probability distribution on state transitions.
For example, $A(\text{action}, \text{resource})$ denotes the probability of a transition from action to resource.
• $B: \Omega_s \times \Omega_o \to [0,1]$ is the probability distribution on state symbol emissions.
For example, $B(\text{action}, \langle \text{close}, f \rangle)$ denotes the probability of observing $\langle \text{close}, f \rangle$ in the state action.
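To make the model concrete, here is a small sketch of tagging one method description with an HMM using Viterbi decoding, the standard way to find the most likely state sequence. The slides do not say which decoding procedure Doc2Spec uses, so treat that as an assumption. Observations are reduced to bare words (the feature vector f is dropped), the start distribution `pi` stands in for the initial state, and all probabilities are hand-picked toy values listing only a few entries of each distribution, not trained parameters.

```python
STATES = ["action", "resource", "other"]

# Toy parameters (assumptions for illustration only).
pi = {"action": 0.8, "resource": 0.1, "other": 0.1}
A = {
    "action":   {"action": 0.1, "resource": 0.3, "other": 0.6},
    "resource": {"action": 0.1, "resource": 0.2, "other": 0.7},
    "other":    {"action": 0.1, "resource": 0.6, "other": 0.3},
}
B = {
    "action":   {"closes": 0.5},
    "resource": {"channel": 0.6},
    "other":    {"this": 0.6, "channel": 0.05},
}


def viterbi(words, states, pi, A, B, unseen=0.01):
    """Return the most likely state sequence for the given words."""
    # best[t][s] = (probability of the best sequence ending in s at step t, previous state)
    best = [{s: (pi[s] * B[s].get(words[0], unseen), None) for s in states}]
    for word in words[1:]:
        column = {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p][0] * A[p][s])
            prob = best[-1][prev][0] * A[prev][s] * B[s].get(word, unseen)
            column[s] = (prob, prev)
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    tags = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        tags.append(state)
    return list(reversed(tags))


words = ["closes", "this", "channel"]
tags = viterbi(words, STATES, pi, A, B)
pair = (words[tags.index("action")], words[tags.index("resource")])
print(tags, pair)
```

With these toy values the description "Closes this channel." is tagged action, other, resource. The sketch does not lemmatize, so the extracted pair is ("closes", "channel") rather than ⟨close, channel⟩.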
Automata Inference
The final step of our approach is to infer automata for resources from action-resource pairs and class/interface hierarchies.
Action-resource pairs
java.nio.channels.Channel.close()
⟨close, channel⟩ ← “Closes this channel.”
java.nio.channels.GatheringByteChannel.write()
⟨write, channel⟩ ← “Writes a sequence of bytes to this channel from the given buffers.”
java.nio.channels.ReadableByteChannel.read()
⟨read, channel⟩ ← “Reads a sequence of bytes from this channel into the given buffer.”
Grouping
java.nio.channels.GatheringByteChannel
{write(), close()}
java.nio.channels.ReadableByteChannel
{read(), close()}
java.nio.channels.Channel
{close()}
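The grouping above can be reproduced with a short sketch: each type gets its own methods on the resource plus those inherited from its supertypes. The slides do not spell out Doc2Spec's grouping algorithm, so this is one plausible reading. The `parents` table is a simplified hierarchy that mirrors the slide (in the JDK, GatheringByteChannel reaches Channel via WritableByteChannel), and `group_methods` is a hypothetical helper.

```python
# Action-resource pairs from the slide, keyed by fully qualified method.
pairs = {
    "java.nio.channels.Channel.close()": ("close", "channel"),
    "java.nio.channels.GatheringByteChannel.write()": ("write", "channel"),
    "java.nio.channels.ReadableByteChannel.read()": ("read", "channel"),
}

# Simplified interface hierarchy (assumption): type -> direct supertypes.
parents = {
    "java.nio.channels.Channel": [],
    "java.nio.channels.GatheringByteChannel": ["java.nio.channels.Channel"],
    "java.nio.channels.ReadableByteChannel": ["java.nio.channels.Channel"],
}


def group_methods(pairs, parents, resource):
    """Map each type to the methods acting on `resource`, including inherited ones."""
    own = {type_name: set() for type_name in parents}
    for method, (_action, res) in pairs.items():
        type_name, method_name = method.rsplit(".", 1)
        if res == resource:
            own[type_name].add(method_name)

    def collect(type_name):
        methods = set(own[type_name])
        for parent in parents[type_name]:
            methods |= collect(parent)
        return methods

    return {type_name: collect(type_name) for type_name in parents}


groups = group_methods(pairs, parents, "channel")
for type_name, methods in groups.items():
    print(type_name, sorted(methods))
```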
Mapping
• Creation methods
• Lock methods
• Manipulation methods
• Unlock methods
• Closure methods
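As a rough illustration of the mapping step, the action of each pair can be looked up in per-category verb lists. The verb lists below are invented for this sketch; the slides do not say which verbs Doc2Spec assigns to each category, and `categorize` is a hypothetical helper.

```python
# Hypothetical verb lists for the five method categories (assumption).
CATEGORY_VERBS = {
    "creation": {"create", "open", "connect"},
    "lock": {"lock", "acquire"},
    "manipulation": {"get", "set", "read", "write"},
    "unlock": {"unlock", "release"},
    "closure": {"close", "disconnect"},
}


def categorize(action):
    """Return the category whose verb list contains `action`, or None if unknown."""
    for category, verbs in CATEGORY_VERBS.items():
        if action in verbs:
            return category
    return None


# Action-resource pairs from the CCIConnection example.
example_pairs = {
    "createInteraction()": ("create", "connection"),
    "getMetaData()": ("get", "connection"),
    "close()": ("close", "connection"),
}
mapping = {method: categorize(action) for method, (action, _res) in example_pairs.items()}
print(mapping)
```

With these lists, the three CCIConnection methods land in the creation, manipulation, and closure categories, matching Step 3 of the example.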
Evaluations
Our evaluations focus on two aspects of our approach:
1. The performance of documentation analysis and specification inference
2. The quality of inferred specifications
In our evaluations,
We used 687 manually tagged methods.
We inferred specifications for five libraries only: J2SE, J2EE, JBoss, iText, and the Oracle JDBC driver.
Performance:
Quality of Inferred Specifications
1. Statistics of inferred resource specifications
Gold standard
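The slides do not show how the scores are computed, so the following sketch uses the usual definitions of precision, recall, and F-score for a set of inferred items compared with a gold standard. The example sets are made up for illustration and are not evaluation data from the paper.

```python
def precision_recall_f(inferred, gold):
    """Compute precision, recall, and F-score of `inferred` against `gold` (both sets)."""
    true_positives = len(inferred & gold)
    precision = true_positives / len(inferred) if inferred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Made-up example: four inferred items, four gold items, three in common.
inferred = {"close", "read", "write", "flush"}
gold = {"close", "read", "write", "open"}
precision, recall, f_score = precision_recall_f(inferred, gold)
print(precision, recall, f_score)
```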
Usefulness of inferred resource specifications:
The specifications inferred by Doc2Spec are useful for detecting bugs.
How are bugs detected using these specifications?
1. Search for code snippets using the Google Code Search engine.
2. Use a partial parser to build control flow graphs.
3. Finally, our infrastructure checks whether the downloaded code snippets violate the inferred specifications, using the
resolved types and control flow graphs.
For example:
If a resource is declared as a local variable of a method, our infrastructure checks the method's control flow graph against
the following criteria.
O/M: A manipulated resource should be already created if a resource has manipulation methods and creation methods.
O/C: A created resource should eventually be closed if a resource has creation methods and closure methods.
L/U: A locked resource should eventually be unlocked if a resource has lock methods and unlock methods.
M/C: A manipulated resource should be closed eventually if a resource has manipulation methods and closure methods.
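A sketch of how the four criteria might be checked on a single execution path, given the sequence of method categories called on one resource. It simplifies heavily: one resource, one path, and each criterion applies only when the resource's API offers both categories involved. The `violations` function and its arguments are hypothetical names.

```python
def violations(trace, available):
    """Return the criteria violated by `trace`.

    trace:     categories of the calls made on one resource, in call order
    available: set of categories that the resource's API provides
    """
    def first(category):
        return trace.index(category) if category in trace else None

    def last(category):
        if category not in trace:
            return None
        return len(trace) - 1 - trace[::-1].index(category)

    found = []
    # O/M: a manipulated resource should already be created.
    if {"manipulation", "creation"} <= available:
        m, c = first("manipulation"), first("creation")
        if m is not None and (c is None or c > m):
            found.append("O/M")
    # O/C: a created resource should eventually be closed.
    if {"creation", "closure"} <= available:
        c, z = last("creation"), last("closure")
        if c is not None and (z is None or z < c):
            found.append("O/C")
    # L/U: a locked resource should eventually be unlocked.
    if {"lock", "unlock"} <= available:
        l, u = last("lock"), last("unlock")
        if l is not None and (u is None or u < l):
            found.append("L/U")
    # M/C: a manipulated resource should eventually be closed.
    if {"manipulation", "closure"} <= available:
        m, z = last("manipulation"), last("closure")
        if m is not None and (z is None or z < m):
            found.append("M/C")
    return found


api = {"creation", "manipulation", "closure"}
print(violations(["creation", "manipulation"], api))
print(violations(["creation", "manipulation", "closure"], api))
print(violations(["manipulation", "closure"], api))
```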
To validate a suspected bug:
1. Check whether it still exists in the latest version of the software. If it does, it is treated as a real bug.
2. Submit a bug report to the developers.
Threats to validity:
External validity:
We applied our approach only to Javadocs, and only to a
limited number of libraries. We need to extend our approach to more
subjects in future work.
Internal validity:
Determining whether a suspected bug is real depends on human judgment. The bugs need to be inspected
carefully, with more experienced developers involved.
Benefits
1. It complements mining specifications from client code.
2. Similarly, it complements inferring specifications from comments.
3. It helps detect bugs in API documentation.
Future work
A. Resource Analysis
• Group similar methods with different names.
• Resolve noun phrases.
B. False positive rate
• Reduce the high false positive rate.
C. Extension
• Detect bugs in local code bases.
• Mine specification templates.
• Support other API documentation and descriptions.
Conclusion
In this paper, we propose a novel approach that
infers resource specifications from existing API
documentation.
We conducted an evaluation on Javadocs of five
widely used libraries.
We further conducted an evaluation to use inferred
specifications to detect bugs.
The results show that resource specifications
inferred by our approach are useful for detecting real
bugs in practice.