Topological methods




            Presented by:
            Sukhpal Singh
            Thapar University
Topological methods
Topological methods are based on the simple
premise that, given a query that describes
some required features, we are interested in
identifying library assets that come closest to
providing these features. Such methods are
critically dependent on what it means to come
closest, which in turn depends on some
definition of distance between the query and
candidate assets [1].
Categories of Topological methods
• Exclusive approximate retrieval: Methods that
  fall into this category make a distinction
  between two retrieval goals: exact retrieval,
  whereby we seek to identify library assets that
  completely satisfy all the requirements of the
  query, and approximate retrieval, whereby we
  settle for assets that come closest to
  satisfying them.
• Inclusive approximate retrieval: Methods that
  fall into this category make no distinction
  between exact retrieval and approximate
  retrieval. Rather, they focus on identifying
  library assets that minimize some measure of
  distance to the query.
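A minimal Python sketch of the two categories, under loudly stated assumptions: queries and assets are represented as hypothetical keyword sets, distance is Jaccard distance (a stand-in chosen here, not a measure named in [1]), and the exclusive variant is assumed to fall back to approximate retrieval only when no asset satisfies every requirement.

```python
from typing import Dict, List, Set

# Hypothetical library: asset name -> set of features (keywords) it provides.
library: Dict[str, Set[str]] = {
    "quicksort": {"sort", "array", "in-place"},
    "mergesort": {"sort", "array", "stable"},
    "binary_search": {"search", "array", "sorted"},
}

def keyword_distance(query: Set[str], asset: Set[str]) -> float:
    """Jaccard distance (an assumed measure): 0.0 for identical sets, 1.0 for disjoint ones."""
    union = query | asset
    if not union:
        return 0.0
    return 1.0 - len(query & asset) / len(union)

def inclusive_retrieval(query: Set[str], lib: Dict[str, Set[str]], k: int = 3) -> List[str]:
    """No exact/approximate distinction: return the k assets closest to the query."""
    ranked = sorted(lib, key=lambda name: (keyword_distance(query, lib[name]), name))
    return ranked[:k]

def exclusive_retrieval(query: Set[str], lib: Dict[str, Set[str]], k: int = 3) -> List[str]:
    """Exact retrieval first (assets providing every required feature);
    approximate retrieval is assumed to be the fallback when nothing matches exactly."""
    exact = sorted(name for name, features in lib.items() if query <= features)
    return exact if exact else inclusive_retrieval(query, lib, k)

print(exclusive_retrieval({"sort", "array", "stable"}, library))
print(inclusive_retrieval({"sort", "array", "stable"}, library, k=2))
print(exclusive_retrieval({"sort", "array", "parallel"}, library))
```

For the query {"sort", "array", "stable"}, the exclusive variant returns only mergesort, while the inclusive variant also ranks quicksort next. A query that asks for "parallel" has no exact match, so the exclusive variant falls back to ranking every asset by distance.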
Measures of distance can be divided into two broad classes
• Measures of functional (semantic) distance,
  which reflect the extent of similarity between
  the functional properties of the query and those
  of candidate components.

• Measures of structural (syntactic) distance,
  which reflect the extent of similarity between
  the structure of (solutions to) the query and
  the structure of candidate components.
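Functional distance usually means comparing specifications, which is hard to show briefly, but a structural (syntactic) distance can be sketched. The Python below assumes, purely for illustration, that the query and each candidate component are described by a hypothetical type signature written as a token list, and it uses token-level edit distance as the structural measure; neither the signatures nor the choice of edit distance come from the slides.

```python
from typing import Dict, List, Sequence

def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (each edit costs 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute, free if tokens match
        prev = curr
    return prev[-1]

# Hypothetical signatures as token lists: parameter types, "->", result type.
query_sig = ["int[]", "int", "->", "int"]
candidates: Dict[str, List[str]] = {
    "indexOf": ["int[]", "int", "->", "int"],
    "contains": ["int[]", "int", "->", "bool"],
    "max": ["int[]", "->", "int"],
    "sum": ["float[]", "->", "float"],
}

# Smallest structural distance first; ties broken by name.
ranking = sorted(candidates, key=lambda n: (edit_distance(query_sig, candidates[n]), n))
print(ranking)
```

Here indexOf matches the query structure exactly, contains and max each differ by one token, and sum differs by three, so the ranking is indexOf, contains, max, sum.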
Characterizing topological methods
The Google PageRank algorithm can be
  used in topological methods to
  retrieve software assets from a
        software repository.
What is PageRank?
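• In short, PageRank is a “vote” by all the other
  pages on the Web about how important a
  page is [3].
• A link to a page counts as a vote of support.
• $PR(A) = (1-d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right)$,
  where $T_1, \dots, T_n$ are the pages that link to page A.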
Breaking Down the Equation
• PR(Tn) - Each page has a notion of its own self-importance. “PR(T1)” is the
  PageRank of the first page that links to A, and so on up to “PR(Tn)” for the last one.

• C(Tn) - Each page spreads its vote evenly amongst all of its outgoing links.
  The number of outgoing links of page T1 is “C(T1)”, of page Tn is “C(Tn)”,
  and so on for all pages.

• PR(Tn)/C(Tn) - So if page A has a backlink from page Tn, the share of the
  vote that A gets from it is “PR(Tn)/C(Tn)”.

• d(…) - All these fractions of votes are added together but, to stop the other
  pages from having too much influence, this total vote is “damped down” by
  multiplying it by the damping factor d, usually set to 0.85.

• (1 - d) - This term adds back the share of the vote removed by d(…). With this
  form of the equation, the PageRanks of all pages average 1 (they sum to the
  number of pages, provided every page has at least one outgoing link); they sum
  to one only in the variant that uses (1 - d)/N, where N is the number of pages.
  It also means that even a page with no links to it (no backlinks) still gets a
  small PR of 0.15 (i.e. 1 - 0.85).
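The equation for a single page translates almost term by term into code. This is a minimal Python sketch; the function name `pagerank_of` and its dictionary arguments are hypothetical choices made here for illustration.

```python
from typing import Dict, List

def pagerank_of(page: str,
                backlinks: Dict[str, List[str]],
                pr: Dict[str, float],
                out_count: Dict[str, int],
                d: float = 0.85) -> float:
    """PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    backlinks[page] lists T1..Tn, the pages linking to `page`;
    pr[t] is the current PR(t) and out_count[t] is C(t), t's number of outgoing links.
    """
    votes = sum(pr[t] / out_count[t] for t in backlinks[page])
    return (1 - d) + d * votes

# No backlinks: only the (1 - d) term remains, i.e. about 0.15.
print(pagerank_of("Z", {"Z": []}, {}, {}))
# Two hypothetical backlinks: T1 (PR 1.0, 2 outgoing links) and T2 (PR 0.5, 1 outgoing link).
print(pagerank_of("A", {"A": ["T1", "T2"]}, {"T1": 1.0, "T2": 0.5}, {"T1": 2, "T2": 1}))
```

In the second call each backlink contributes a share of 0.5, so PR(A) = 0.15 + 0.85 * 1.0, which is 1 up to floating-point rounding.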
How is it Calculated?
• The PR of each page depends on the PR of the
  pages pointing to it.
• But we won’t know what PR those pages have
  until the pages pointing to them have their PR
  calculated and so on.
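• So we start by guessing a PR for every page, then
  recompute all the PRs repeatedly, feeding each
  round's results into the next, until the values
  settle down.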
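The guess-and-repeat procedure can be sketched as a short Python function. It is a minimal sketch, assuming a small graph given as a dict of outgoing links in which every page has at least one outgoing link; the tolerance `tol` and the `max_iter` cap are illustrative choices. It updates all pages at once from the previous round's values (a Jacobi-style update), whereas the worked example on the following slides reuses each new value immediately; for this equation both settle to the same answer.

```python
from typing import Dict, List

def pagerank(links: Dict[str, List[str]], d: float = 0.85, guess: float = 1.0,
             tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Iterate PR(A) = (1 - d) + d * sum(PR(T)/C(T)) from an initial guess until it settles.

    links maps each page to the pages it links to (its outgoing links).
    Assumes every page has at least one outgoing link.
    """
    pages = list(links)
    # T1..Tn for each page: every page whose outgoing links include it.
    backlinks = {p: [q for q in pages if p in links[q]] for p in pages}
    pr = {p: guess for p in pages}
    for _ in range(max_iter):
        new = {p: (1 - d) + d * sum(pr[t] / len(links[t]) for t in backlinks[p])
               for p in pages}
        if max(abs(new[p] - pr[p]) for p in pages) < tol:
            return new  # the values have settled down
        pr = new
    return pr

ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print({page: round(value, 4) for page, value in ranks.items()})
```

In this made-up three-page graph, C ranks highest because it collects votes from both A and B, and the three values sum to 3 (an average of 1).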
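Simple Example

• Pages A and B link only to each other, so each
  page has exactly one outgoing link, which is the
  other page's only backlink. So that means [2]:

• C(B) = 1, so for page A: C(T1) = 1 (T1 = B)
      and
• C(A) = 1, so for page B: C(T1) = 1 (T1 = A)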
We don’t know what their PR should be to begin with, so we
         will just start with a guess of 1 for each page.


• d (damping factor) = 0.85
• PR(A) = (1 – d) + d(PR(T1)/C(T1)) = (1 – d) + d(1/1)

  i.e.

• PR(A) = 0.15 + 0.85 * 1
  = 1
• PR(B) = 0.15 + 0.85 * 1
  = 1
• Neither value changed, so a guess of 1 is already the settled answer.
Let’s Do It Again with Another Number. Let’s try 0 and recalculate…
(each new value is used as soon as it is computed, so PR(B) uses the new PR(A))
• PR(A) = 0.15 + 0.85 * 0
      = 0.15
• PR(B) = 0.15 + 0.85 * 0.15
      = 0.2775
• Now that we have a “next best guess”, we plug it into the
  equation again…

• PR(A)= 0.15 + 0.85 * 0.2775
  = 0.385875
• PR(B)= 0.15 + 0.85 * 0.385875
  = 0.47799375

And again…
• PR(A)= 0.15 + 0.85 * 0.47799375
  = 0.5562946875
• PR(B)= 0.15 + 0.85 * 0.5562946875
  = 0.622850484375
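The hand calculation above can be replayed in a few lines of Python. This is a sketch of this two-page example only (the function name `two_page_iterations` is hypothetical); as on the slides, PR(B) is recomputed from the PR(A) value just calculated. Its first three rounds from a guess of 0 reproduce 0.15/0.2775, 0.385875/0.47799375 and 0.5562946875/0.622850484375, up to floating-point rounding.

```python
from typing import List, Tuple

def two_page_iterations(start: float, rounds: int, d: float = 0.85) -> List[Tuple[float, float]]:
    """Replay the slides' example: pages A and B link only to each other.

    Returns one (PR(A), PR(B)) pair per round.
    """
    pr_a = pr_b = start
    history = []
    for _ in range(rounds):
        pr_a = (1 - d) + d * pr_b / 1   # A's only backlink is B, and C(B) = 1
        pr_b = (1 - d) + d * pr_a / 1   # B's only backlink is A, and C(A) = 1
        history.append((pr_a, pr_b))
    return history

for pr_a, pr_b in two_page_iterations(0.0, 3):
    print(round(pr_a, 12), round(pr_b, 12))

# Continuing for many more rounds, from 0 or from a much larger guess:
print(two_page_iterations(0.0, 200)[-1])
print(two_page_iterations(5.0, 200)[-1])
```

The last two lines show both values ending up at 1 (to within rounding) regardless of the starting guess.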
Principle
• It doesn’t matter where you start your guess:
  once the PageRank calculations have settled
  down, the “normalized probability
  distribution” (the average PageRank over all
  pages) will be 1.0. In this example, both PR(A)
  and PR(B) converge to 1.
• In a software repository, software assets take
  the place of pages, and relationships among
  software assets based on their keywords take
  the place of links.
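The slides do not specify how keyword relationships become links, so the following Python is only a sketch under explicit assumptions: the assets and keywords are made up, asset X is assumed to "link to" asset Y whenever they share at least one keyword, and retrieval is assumed to return the assets matching any query keyword ordered by PageRank. Any of these choices could differ in a real repository.

```python
from typing import Dict, List, Set

# Hypothetical repository: asset name -> keywords describing it.
assets: Dict[str, Set[str]] = {
    "csv_reader": {"csv", "parse", "file"},
    "json_parser": {"json", "parse"},
    "file_utils": {"file", "io"},
    "xml_parser": {"xml", "parse"},
}

def keyword_links(repo: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Assumed relationship: asset X 'links to' asset Y when they share a keyword."""
    return {x: [y for y in repo if y != x and repo[x] & repo[y]] for x in repo}

def pagerank(links: Dict[str, List[str]], d: float = 0.85, rounds: int = 100) -> Dict[str, float]:
    """Repeatedly apply PR(A) = (1 - d) + d * sum(PR(T)/C(T)) over the keyword links."""
    pr = {p: 1.0 for p in links}
    for _ in range(rounds):
        pr = {p: (1 - d) + d * sum(pr[q] / len(links[q]) for q in links if p in links[q])
              for p in links}
    return pr

def retrieve(query: Set[str], repo: Dict[str, Set[str]]) -> List[str]:
    """Assets matching any query keyword, highest PageRank first (ties broken by name)."""
    pr = pagerank(keyword_links(repo))
    hits = [a for a in repo if query & repo[a]]
    return sorted(hits, key=lambda a: (-pr[a], a))

print(retrieve({"parse"}, assets))
```

With these made-up assets, csv_reader shares keywords with every other asset and ranks first for the query "parse", followed by json_parser and xml_parser, which tie. Whether keyword overlap is a good stand-in for hyperlinks is a design decision a real repository would need to evaluate.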
Summary
References:
[1]   A. Mili, R. Mili and R. T. Mittermeir, “A survey of
      software reuse libraries,” Annals of Software
      Engineering 5 (1998) 349–414.

[2]   http://www-db.stanford.edu/~backrub/google.html

[3]   Semantic Component Retrieval in Software
      Engineering, inaugural dissertation for the degree of
      Doctor of Natural Sciences, Universität Mannheim,
      Mannheim, 2008.
