Kenyon: A Software Stratigraphy Platform



Jennifer Bevan, Sunghun Kim, E. James Whitehead Jr.
University of California, Santa Cruz
{jbevan, hunkim, ejw}@cs.ucsc.edu

Lijie Zou, Mike Godfrey
University of Waterloo
{lzou, migod}@uwaterloo.edu
Motivation

 Static analysis-based software evolution
  research has several common technical
  issues to manage.
     Extracting meaningful configurations from an
      SCM repository.
     Calculating static relations, metrics.
         Augments data from commit log messages.
     Saving the extracted facts.
         For later time-based analysis, data mining,
          incremental data addition.
Ongoing Static Evolution Research

 Instability Analysis (J. Bevan)
      Refines Zimmermann/Ying/Murphy using static
       dependence to remove temporal dependencies
 Entity Mapping/Origin Analysis (L. Zou, M.
  Godfrey)
      Uses static metrics to identify moved/split/merged
       procedures, files.
 Code clone evolution (M. Kim)
      Identifies clones and follows their evolution.
More Static Evolution Research

 Association rule mining
      For predicting changes [Ying et al., IEEE TSE, v30 n9, Sept. 2004]
      For architectural justification [Zimmermann, Diehl, and Zeller,
       Proc. IWPSE 2003]
 Identifying code “chunks” for future
  modularization [Mockus and Weiss, IEEE Software, v18 n2, 2001]
 “Feature” identification [Fischer, Pinzger, and Gall, Proc. WCRE
  2003]

 …and the ongoing research related to these.
Problem

 Despite similarity of approach, systems make
  several choices that limit sharing of technology and
  results:
      Usually choosing a single SCM system (CVS) for data.
      Usually creating a proprietary database schema.
      Usually not easily integratable with other research
       projects for result sharing.
 The cost of computationally expensive analysis
  techniques is not amortized across multiple
  research directions.
Solution: Kenyon

 Kenyon is designed to facilitate static software
  evolution research by providing common solutions
  to these common problems:
      Phase 1: Automatic configuration extraction from SCM
      Phase 2: Invoking static analysis tool(s)
      Phase 3: Storing the results from these preprocessing
       steps.
      Asynchronously provides access to previously
       processed and stored data.
Kenyon Processing

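[Figure: Kenyon processing pipeline. Phase 1 (Configuration Extraction)
 takes data from the SCM Repository and places configurations on the
 Filesystem. Phases 2 & 3 (Fact Extraction via Static Analysis, then
 Persist Gathered Facts) store results in the Kenyon Repository
 (RDBMS/Hibernate). Client Tools (Client Software, e.g., IVA) perform
 queries and add new facts.]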
Phase 1: Extract Configurations

 Kenyon provides transaction recovery and logical
  configuration extraction for multiple SCM systems.
      Configurations specified by time + branch identifier.
      Sliding window algorithm for transaction recovery.
      Only changes from completed transactions are extracted
       for a “logical configuration”.
      Only changes from transactions that completed between
       two specifications are considered for a “configuration
       delta”.
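
The transaction recovery step above can be sketched as follows. This is a minimal Python illustration, not Kenyon's implementation: the slides do not give the grouping rule or the window size, so the rule used here (same author and same log message, each commit within `window` seconds of the previous one) is an assumption borrowed from common practice for CVS, and `FileCommit`, `recover_transactions`, and `completed_by` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileCommit:
    """One file-level revision as an SCM such as CVS records it (hypothetical shape)."""
    path: str
    author: str
    message: str
    timestamp: int  # seconds since an arbitrary epoch


def recover_transactions(commits: List[FileCommit], window: int = 200) -> List[List[FileCommit]]:
    """Group file-level commits into transactions with a sliding time window.

    A commit joins the open transaction of the same author and log message
    if it falls within `window` seconds of that transaction's latest commit,
    so the window slides forward as the transaction grows.
    """
    open_txns: Dict[Tuple[str, str], List[FileCommit]] = {}
    transactions: List[List[FileCommit]] = []
    for commit in sorted(commits, key=lambda c: c.timestamp):
        key = (commit.author, commit.message)
        txn = open_txns.get(key)
        if txn is not None and commit.timestamp - txn[-1].timestamp <= window:
            txn.append(commit)
        else:
            txn = [commit]
            open_txns[key] = txn
            transactions.append(txn)
    return transactions


def completed_by(transactions: List[List[FileCommit]], t: int) -> List[List[FileCommit]]:
    """Transactions whose last commit is at or before t (the logical configuration's inputs)."""
    return [txn for txn in transactions if txn[-1].timestamp <= t]


history = [
    FileCommit("a.c", "alice", "fix parser", 100),
    FileCommit("b.c", "alice", "fix parser", 250),
    FileCommit("d.c", "bob", "add feature", 260),
    FileCommit("c.c", "alice", "fix parser", 400),
    FileCommit("e.c", "alice", "fix parser", 1000),
]
txns = recover_transactions(history)
at_300 = completed_by(txns, 300)
```

In this sample, alice's first transaction spans timestamps 100 to 400, so at time 300 it is still ongoing and none of its changes belong to the logical configuration; only bob's completed transaction does.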
Configuration Specification

 Kenyon’s logical configuration extraction and delta
  calculations allow researchers to consider software
  “as it existed at time T on branch B”.
      Most SCM systems archive data along a timeline with
       varying support for parallel development.
      Kenyon uses this commonality as the basis for its SCM
       interface and configuration specification.
      Nothing so far indicates that Kenyon cannot support
       change-set based SCM systems.
Logical Configuration

• At any given point in time, one or more transactions may
  have just completed, and one or more may be ongoing.
• Ongoing transactions are shown in red.
• Completed transactions are shown in green.

[Figure: timeline of changes to files F1, F2, F3, and F4, with a
 marker at time T1.]
Configuration Deltas

• Configuration deltas are calculated as C(T2) - C(T1).
• Only changes from transactions completing between T1
  (exclusive) and T2 (inclusive) are considered.

[Figure: timeline of changes to files F1, F2, F3, and F4, with
 markers at times T1 and T2.]
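
The half-open interval rule for deltas can be made concrete with a short sketch. It is written in Python for illustration only; `Transaction` and `configuration_delta` are hypothetical names, and a transaction is reduced here to its completion time plus the set of files it changed.

```python
from typing import FrozenSet, List, NamedTuple, Set


class Transaction(NamedTuple):
    completed_at: int        # timestamp of the transaction's last file commit
    files: FrozenSet[str]    # paths changed by the transaction


def configuration_delta(transactions: List[Transaction], t1: int, t2: int) -> Set[str]:
    """Files changed by transactions completing in (t1, t2], i.e. C(t2) - C(t1)."""
    if t2 < t1:
        raise ValueError("t2 must not precede t1")
    changed: Set[str] = set()
    for txn in transactions:
        if t1 < txn.completed_at <= t2:  # t1 exclusive, t2 inclusive
            changed |= txn.files
    return changed


log = [
    Transaction(10, frozenset({"F1"})),
    Transaction(20, frozenset({"F2"})),
    Transaction(30, frozenset({"F3"})),
    Transaction(45, frozenset({"F4"})),
]
delta = configuration_delta(log, 20, 30)
```

Making T1 exclusive and T2 inclusive means consecutive deltas never count a transaction twice: the deltas for (0, 20] and (20, 45] together cover exactly what the delta for (0, 45] covers.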
Data from Phase 1

 Valid configuration specifications for extraction are
  created by Kenyon, one per timestamp where a
  transaction completed.
 For each configuration extracted:
      Author and log message of each transaction completing
       at that specification.
      The configuration is placed on the filesystem.
 A configuration delta for each consecutive pair of
  configurations processed can also be stored.
Phase 2: Invoke Fact Extractors

 Kenyon provides an abstract class that is used to
  invoke third-party fact extractors on the
  configuration extracted to the filesystem.
      Kenyon users would subclass this class to invoke their
       own fact extractor.
      Support for CodeSurfer (line-level analysis) and
       SWAGKIT (procedure-level analysis) is provided with
       Kenyon. [www.grammatech.com, swag.uwaterloo.ca]
      FactExtractor subclasses have a tri-modal return status:
       “failure”, “new data to store”, or “no new data to store”.
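
The subclassing pattern and the tri-modal status might look like the sketch below. Kenyon itself is a Java system and the slides do not show its class signatures, so this Python version only mirrors the idea: the `extract` method name, the `ExtractionStatus` enum, and the toy `LineCountExtractor` are all assumptions made for illustration.

```python
import os
from abc import ABC, abstractmethod
from enum import Enum
from typing import Dict


class ExtractionStatus(Enum):
    """The three outcomes named on the slide."""
    FAILURE = "failure"
    NEW_DATA = "new data to store"
    NO_NEW_DATA = "no new data to store"


class FactExtractor(ABC):
    """Base class; subclasses wrap one third-party fact extractor (hypothetical API)."""

    @abstractmethod
    def extract(self, config_dir: str) -> ExtractionStatus:
        """Run the extractor on a configuration already placed on the filesystem."""


class LineCountExtractor(FactExtractor):
    """Toy extractor standing in for a real tool: counts lines per file."""

    def __init__(self) -> None:
        self.facts: Dict[str, int] = {}

    def extract(self, config_dir: str) -> ExtractionStatus:
        if not os.path.isdir(config_dir):
            return ExtractionStatus.FAILURE
        counts: Dict[str, int] = {}
        for root, _dirs, files in os.walk(config_dir):
            for name in files:
                path = os.path.join(root, name)
                with open(path, errors="replace") as fh:
                    counts[os.path.relpath(path, config_dir)] = sum(1 for _ in fh)
        if counts == self.facts:
            return ExtractionStatus.NO_NEW_DATA  # nothing for Phase 3 to persist
        self.facts = counts
        return ExtractionStatus.NEW_DATA
```

A caller would persist results only on `NEW_DATA`, skip storage on `NO_NEW_DATA`, and log or retry on `FAILURE`.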
Data from Phase 2

 FactExtractor subclasses provide:
      A ConfigGraph that maps software elements to nodes
       and static relationships to edges.
      The graph, any node, and any edge may be attributed
       with static metrics.
 Multiple fact extractors may be invoked on a single
  configuration: each created ConfigGraph is saved
  with a reference to the fact extractor that created it.
 If a configuration has already been processed by a
  given fact extractor, it will not be processed again
  unless new metrics are to be calculated.
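
As a rough picture of this data, the sketch below models an attributed graph and the "skip unless new metrics are needed" check. It is a Python illustration under stated assumptions: the field names, the `computed_metrics` bookkeeping, and `needs_processing` are hypothetical and are not taken from Kenyon's Java classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, relation name, target node)


@dataclass
class ConfigGraph:
    """Software elements as nodes, static relationships as edges, metrics as attributes."""
    extractor: str                                    # which fact extractor created the graph
    nodes: Dict[str, Dict[str, float]] = field(default_factory=dict)
    edges: Dict[Edge, Dict[str, float]] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)   # graph-level metrics
    computed_metrics: Set[str] = field(default_factory=set)   # metric names already calculated

    def add_node(self, name: str, **metrics: float) -> None:
        self.nodes.setdefault(name, {}).update(metrics)

    def add_edge(self, src: str, relation: str, dst: str, **metrics: float) -> None:
        self.add_node(src)
        self.add_node(dst)
        self.edges.setdefault((src, relation, dst), {}).update(metrics)


def needs_processing(stored: Dict[Tuple[str, str], ConfigGraph],
                     config_id: str, extractor: str, wanted: Set[str]) -> bool:
    """True if this extractor has not seen the configuration, or if new metrics are wanted."""
    graph = stored.get((config_id, extractor))
    if graph is None:
        return True
    return not wanted <= graph.computed_metrics


graph = ConfigGraph(extractor="swagkit")
graph.add_edge("main", "calls", "parse", weight=2.0)
graph.computed_metrics.add("fan_in")
stored = {("cfg-1", "swagkit"): graph}
```

Keying stored graphs by both configuration and extractor is what lets several extractors process the same configuration without overwriting each other's results.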
Phase 3: Data Storage

 Kenyon uses Hibernate to persist data
  classes.
     Hibernate is an “object/relational persistence and
      query service for Java” [www.hibernate.org].
     Allows reuse of Kenyon classes by research
      tools implemented in Java.
     Each configuration processed by Kenyon is
      assigned to a Project, the top-level data class
      persisted by Kenyon.
Persisted Kenyon Data

• Projects contain one set of data for each configuration
  specification processed.
• Each such data set contains one or more ConfigGraphs, each
  produced by a different FactExtractor.
• FactExtractors specify what GraphSchema subclass they use
  (not restrictive).

[Figure: class diagram with multiplicities as labeled on the slide.
 Project 1:N ConfigData; ConfigData 1:N ConfigGraph;
 ConfigGraph 1:1 FactExtractor; FactExtractor 1:1 GraphSchema;
 ConfigData 1:N ConfigSpec; ConfigSpec 2:1 ConfigDelta.]
Data Access

 Hibernate allows access to preprocessed data using
  SQL or the Hibernate query methods (HQL, QBE/
  QBC), which support class/field-based queries.
      A Hibernate query returns a List of Objects, each of
       which is of the type originally persisted.
      Data fields in the returned class are populated unless
       specified as lazily loaded.
 Kenyon provides several convenience queries for
  common anticipated queries, such as “what
  configurations are available for this project”.
Kenyon Usage

 Kenyon processes data based on specifications in a
  configuration file
      Start time, stop time, how often to process
      Fact extractors and their assigned metric calculators.
      SCM parameters, filesystem parameters, some control
       over what Hibernate persists.
 A “processing run” will reuse any previously
  processed data if available
      For example, if a ConfigGraph has already been created,
       if new metrics are necessary they are calculated and
       added to the existing ConfigGraph.
Iterative Refinement Example

 When looking for “interesting” timeframes of
  evolution, a multiple-pass process is recommended.
      A user can configure Kenyon to process the changes in a
       system once per day.
      Days with high activity or other metrics exceeding a
       threshold can be flagged as “interesting”.
      The user can then configure Kenyon to process those
       days (via multiple processing runs) at the frequency of
       “every 20 minutes”.
      This process can repeat down to the “every second”
       level.
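
The multiple-pass process can be sketched as two small helpers: one flags days whose activity exceeds a threshold, the other generates the finer-grained timestamps for a flagged day. This is a Python illustration only; the function names and the change counts are made up, and how these timestamps would be fed into a Kenyon configuration file is not shown in the slides.

```python
from datetime import date, datetime, time, timedelta
from typing import Dict, List


def interesting_days(changes_per_day: Dict[date, int], threshold: int) -> List[date]:
    """First pass: days whose activity metric exceeds the threshold."""
    return sorted(day for day, count in changes_per_day.items() if count > threshold)


def refine(day: date, step: timedelta) -> List[datetime]:
    """Next pass: timestamps covering one flagged day at a finer interval."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    start = datetime.combine(day, time.min)
    end = start + timedelta(days=1)
    stamps: List[datetime] = []
    current = start
    while current < end:
        stamps.append(current)
        current += step
    return stamps


daily_counts = {                     # hypothetical output of a once-per-day run
    date(2005, 3, 1): 4,
    date(2005, 3, 2): 57,
    date(2005, 3, 3): 9,
}
flagged = interesting_days(daily_counts, threshold=20)
second_pass = {day: refine(day, timedelta(minutes=20)) for day in flagged}
```

Repeating `refine` with a smaller `step` on whichever 20-minute slots look interesting gives the next level of the drill-down.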
Parallel Preprocessing

 Kenyon is a single-threaded process, but Hibernate
  supports multiple connections to a single Kenyon
  database.
 A 10-year history can be processed in chunks by
  any number of computers, even if the processing
  configurations have overlapping times or different
  intervals.
 Kenyon does not integrate the deltas between
  different processing runs, so a small overlap in
  processing chunks is suggested.
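
One way to plan such chunks is sketched below in Python. The helper name, the even split, and the one-day overlap are assumptions for illustration; the slides only say that a small overlap is suggested, not how large it should be.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def split_with_overlap(start: datetime, stop: datetime, workers: int,
                       overlap: timedelta) -> List[Tuple[datetime, datetime]]:
    """Split [start, stop] into one time range per worker.

    Every range after the first begins `overlap` earlier than the previous
    range ends, so the delta spanning each chunk boundary is computed by
    at least one processing run.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    if stop <= start:
        raise ValueError("stop must be after start")
    span = (stop - start) / workers
    chunks: List[Tuple[datetime, datetime]] = []
    for i in range(workers):
        lo = start + span * i
        hi = stop if i == workers - 1 else start + span * (i + 1)
        if i > 0:
            lo -= overlap
        chunks.append((lo, hi))
    return chunks


# A 10-year history divided among four machines, with one day of overlap.
chunks = split_with_overlap(datetime(1995, 1, 1), datetime(2005, 1, 1),
                            workers=4, overlap=timedelta(days=1))
```

Each tuple would become the start and stop time of one machine's processing run against the shared database.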
Kenyon Architecture


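[Figure: Kenyon architecture. Components shown: DataManager,
 SCMInterface, Fact Extractor, MetricLoader, and the data classes
 Project, ConfigData, and ConfigGraph, together with Hibernate/DBMS,
 the Filesystem, and the SCM Repository. Components are connected
 by <<calls>> relationships.]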
Current Status

 Kenyon 1.2 available at
  http://kenyon.dforge.cse.ucsc.edu
 Supports CVS, Subversion, and ClearCase
 Students in 290G are performing projects
  using Kenyon this quarter
 Actively working with Samsung to analyze
  some of their source code.
Future Work (1/3)

 Continue working with M. Kim
      Evaluate usefulness of SCM-only module.
      If she decides to use Kenyon, assist with full integration.
 Finish integration of Beagle/Kenyon and
  IVA/Kenyon.
 Work with G. Murphy on using Kenyon at UBC.
 Evaluate Kenyon’s ability to reduce the time-to-
  results for static software evolution research by
  analyzing the seminar class projects.
Future Work (2/3)

 Support branch path traversal
      Allow users to see the branch points in a system and
       specify a path for processing instead of a single branch.
      Will reuse existing visualizations, must add specification
       mechanism.
 Incorporate full language-specific containment
  models for better inter-language graph traversal and
  mapping.
      Use M. Godfrey’s Java fact extractor and containment
       model.
Future Work (3/3)

 Support more of the Standard Exchange
  Formats for ConfigGraph export.
     TA is already supported, but only the Fact
      sections. Schema sections should be improved
      to use the language-specific containment models.
 Encourage other researchers to use Kenyon,
  and improve results-sharing, capabilities, etc.
  based on their feedback.
Open Issues (1/3)

 The exact mechanism for allowing data
  sharing between researchers is not entirely
  controllable by Kenyon
     Database setup and administration can
      effectively override much of Kenyon’s
      preferences.
     By default, Kenyon-created tables are not
      mutable by processes other than Kenyon.
Open Issues (2/3)

 Kenyon provides a public class, EvolutionPath, that
  links a subgraph in one ConfigGraph to one in
  another ConfigGraph.
      Directed and attributable.
      Basic building block for evolution data.
 Currently persisted by Kenyon, but likely will not be
  after 1.1, due to database mutability issues.
      Other research projects can subclass and, if they want to
       share their results easily, persist them to a Hibernate
       database using the provided Hibernate mapping
       examples.
Open Issues (3/3)

 Kenyon can be invoked automatically via a
  post-commit script or a cron job.
 Should Kenyon also be invocable automatically
  from an IDE?
 What sort of support should Kenyon provide
  for better integration with, for example,
  Eclipse?
Conclusions (1/2)

 Kenyon is an engineering solution, designed to
  amortize the cost of the computationally expensive
  preprocessing steps that can benefit static software
  evolution research.
 Research projects using Kenyon will not have to
  independently create solutions for these common
  problems.
      18% code reduction in Beagle without really trying.
      Is expected to reduce the lag between beginning system
       implementation and producing research results.
Conclusions (2/2)

 Kenyon is not intended to be a lightweight data
  mining system for software evolution research.
      Tradeoff of speed vs. precision is still controllable via
       the choice of fact extractors.
      The configuration extraction time and associated
       network lag already put the per-configuration time at
       O(seconds)
 Instead, it allows the cost of time-consuming,
  computationally expensive preprocessing to be
  amortized among researchers.
Questions?

 Kenyon was created primarily from code that existed in
  IVA, which is being funded by NSF grant CCR-01234603.
  Kenyon also contains code from Beagle, the origin analysis
  project overseen by Mike Godfrey.


 Email jbevan@cs.ucsc.edu with future questions.

   http://www.cse.ucsc.edu/research/labs/grase/kenyon/

 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Último (20)

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

Kenyon: A Software Stratigraphy Platform (ESEC/FSE 2005)

  • 1. Kenyon: A Software Stratigraphy Platform
      • Jennifer Bevan, Sunghun Kim, E. James Whitehead Jr., University of California, Santa Cruz, {jbevan, hunkim, ejw}@cs.ucsc.edu
      • Lijie Zou, Mike Godfrey, University of Waterloo, {lzou, migod}@uwaterloo.edu
  • 2. Motivation
      • Static analysis-based software evolution research has several common technical issues to manage:
          • Extracting meaningful configurations from an SCM repository.
          • Calculating static relations and metrics, which augment the data from commit log messages.
          • Saving the extracted facts for later time-based analysis, data mining, and incremental data addition.
  • 3. Ongoing Static Evolution Research
      • Instability Analysis (J. Bevan)
          • Refines Zimmermann/Ying/Murphy using static dependence to remove temporal dependencies.
      • Entity Mapping/Origin Analysis (L. Zou, M. Godfrey)
          • Uses static metrics to identify moved/split/merged procedures and files.
      • Code clone evolution (M. Kim)
          • Identifies clones and follows their evolution.
  • 4. More Static Evolution Research
      • Association rule mining
          • For predicting changes [Ying et al., IEEE TSE, v30 n9, Sept. 2004]
          • For architectural justification [Zimmermann, Diehl, and Zeller, Proc. IWPSE 2003]
      • Identifying code “chunks” for future modularization [Mockus and Weiss, IEEE Software, v18 n2, 2001]
      • “Feature” identification [Fischer, Pinzger, and Gall, Proc. WCRE 2003]
      • …and the ongoing research related to these.
  • 5. Problem
      • Despite similarity of approach, systems make several choices that limit sharing of technology and results:
          • Usually choosing a single SCM system (CVS) for data.
          • Usually creating a proprietary database schema.
          • Usually not easy to integrate with other research projects for result sharing.
      • The cost of computationally expensive analysis techniques is not amortized across multiple research directions.
  • 6. Solution: Kenyon
      • Kenyon is designed to facilitate static software evolution research by providing common solutions to these common problems:
          • Phase 1: Automatic configuration extraction from SCM.
          • Phase 2: Invoking static analysis tool(s).
          • Phase 3: Storing the results from these preprocessing steps.
      • Asynchronously provides access to previously processed and stored data.
  • 7. Kenyon Processing
      • [Diagram: Phase 1 (Configuration Extraction) reads from the SCM Repository and places configurations on the Filesystem. Phases 2 & 3 (Fact Extraction (Static Analysis) and Persist Gathered Facts) store results in the Kenyon Repository (RDBMS/Hibernate). Client Software (e.g., IVA): client tools perform queries and add new facts.]
  • 8. Phase 1: Extract Configurations
      • Kenyon provides transaction recovery and logical configuration extraction for multiple SCM systems.
          • Configurations are specified by time + branch identifier.
          • A sliding window algorithm is used for transaction recovery.
      • Only changes from completed transactions are extracted for a “logical configuration”.
      • Only changes from transactions that completed between two specifications are considered for a “configuration delta”.
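The slide names a sliding-window algorithm for transaction recovery but gives no details. The following Python sketch shows one common form of the technique for file-based SCM systems such as CVS: revisions by the same author with the same log message are grouped for as long as each one falls within a fixed window of the previous one. The `FileRevision` type, the `recover_transactions` function, and the 200-second default window are illustrative assumptions, not Kenyon's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileRevision:
    """One per-file commit record (hypothetical shape, for illustration)."""
    path: str
    author: str
    message: str
    timestamp: int  # seconds since some epoch


def recover_transactions(
    revisions: List[FileRevision], window_seconds: int = 200
) -> List[List[FileRevision]]:
    """Group per-file revisions into transactions with a sliding window.

    A revision joins the most recent transaction of the same author and
    log message if it was committed within `window_seconds` of that
    transaction's latest revision. The window "slides" because it is
    measured from the latest member, not from the first one.
    """
    transactions: List[List[FileRevision]] = []
    latest: Dict[Tuple[str, str], List[FileRevision]] = {}

    for rev in sorted(revisions, key=lambda r: r.timestamp):
        key = (rev.author, rev.message)
        current = latest.get(key)
        fits = (
            current is not None
            and rev.timestamp - current[-1].timestamp <= window_seconds
            # Assumption: a file appears at most once per transaction.
            and all(r.path != rev.path for r in current)
        )
        if fits:
            current.append(rev)
        else:
            current = [rev]
            transactions.append(current)
            latest[key] = current
    return transactions
```

A transaction recovered this way can be treated as completed once no further matching revision arrives within the window, which is what makes "completed" versus "ongoing" decidable at a given timestamp.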
  • 9. Configuration Specification
      • Kenyon’s logical configuration extraction and delta calculations allow researchers to consider software “as it existed at time T on branch B”.
      • Most SCM systems archive data along a timeline with varying support for parallel development.
      • Kenyon uses this commonality as the basis for its SCM interface and configuration specification.
      • Nothing so far indicates that change-set based SCM systems cannot be supported by Kenyon.
  • 10. Logical Configuration
      • At any given point in time, one or more transactions may have just completed, and one or more may be ongoing.
      • Ongoing transactions are shown in red.
      • Completed transactions are shown in green.
      • [Figure: transactions on files F1 to F4 relative to time T1.]
  • 11. Configuration Deltas
      • Configuration deltas are calculated as C(T2) - C(T1).
      • Only changes from transactions completing between T1 (exclusive) and T2 (inclusive) are considered.
      • [Figure: transactions on files F1 to F4 relative to times T1 and T2.]
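The definitions of a logical configuration and a configuration delta on the two slides above can be made concrete with a small Python sketch. It assumes a hypothetical `Transaction` record holding a completion time and a map from file path to new revision identifier. It ignores file deletions and branches, which a real implementation would have to handle.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Transaction:
    """A recovered transaction (hypothetical shape, for illustration)."""
    completed_at: int        # completion timestamp
    changes: Dict[str, str]  # file path -> new revision id


def logical_configuration(transactions: List[Transaction], t: int) -> Dict[str, str]:
    """C(t): the state produced by every transaction completed at or before t."""
    config: Dict[str, str] = {}
    for tx in sorted(transactions, key=lambda x: x.completed_at):
        if tx.completed_at <= t:
            config.update(tx.changes)
    return config


def configuration_delta(transactions: List[Transaction], t1: int, t2: int) -> Dict[str, str]:
    """C(t2) - C(t1): changes from transactions completing in (t1, t2]."""
    delta: Dict[str, str] = {}
    for tx in sorted(transactions, key=lambda x: x.completed_at):
        if t1 < tx.completed_at <= t2:
            delta.update(tx.changes)
    return delta
```

With these definitions, applying the delta for (T1, T2] on top of C(T1) yields C(T2), which is why T1 is exclusive and T2 inclusive.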
  • 12. Data from Phase 1
      • Valid configuration specifications for extraction are created by Kenyon, one per timestamp where a transaction completed.
      • For each configuration extracted:
          • The author and log message of each transaction completing at that specification are recorded.
          • The configuration is placed on the filesystem.
      • A configuration delta for each consecutive pair of configurations processed can also be stored.
  • 13. Phase 2: Invoke Fact Extractors
      • Kenyon provides an abstract class that is used to invoke third-party fact extractors on the configuration extracted to the filesystem.
      • Kenyon users subclass this class to invoke their own fact extractor.
      • Support for CodeSurfer (line-level analysis) and SWAGKIT (procedure-level analysis) is provided with Kenyon. [www.grammatech.com, swag.uwaterloo.ca]
      • FactExtractor subclasses have a tri-modal return status: “failure”, “new data to store”, or “no new data to store”.
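The subclassing pattern and tri-modal return status described above might look roughly like the sketch below. Kenyon itself is a Java system, so this Python version is only an illustration of the design: the `ExtractionStatus` enum, the `extract` method name, and the `LineCountExtractor` subclass are assumptions, and only the `FactExtractor` name and the three status values come from the slide.

```python
import os
from abc import ABC, abstractmethod
from enum import Enum
from typing import Dict


class ExtractionStatus(Enum):
    """The three outcomes named on the slide."""
    FAILURE = "failure"
    NEW_DATA = "new data to store"
    NO_NEW_DATA = "no new data to store"


class FactExtractor(ABC):
    """Base class: subclasses wrap one analysis tool (hypothetical API)."""

    @abstractmethod
    def extract(self, config_dir: str) -> ExtractionStatus:
        """Run the tool on a configuration extracted to `config_dir`."""


class LineCountExtractor(FactExtractor):
    """Toy extractor: records the number of lines in each file."""

    def __init__(self) -> None:
        self.facts: Dict[str, int] = {}

    def extract(self, config_dir: str) -> ExtractionStatus:
        if not os.path.isdir(config_dir):
            return ExtractionStatus.FAILURE
        counts: Dict[str, int] = {}
        for root, _dirs, files in os.walk(config_dir):
            for name in files:
                path = os.path.join(root, name)
                with open(path, "rb") as handle:
                    counts[os.path.relpath(path, config_dir)] = sum(1 for _ in handle)
        if counts == self.facts:
            # Nothing changed since the last run, so nothing needs storing.
            return ExtractionStatus.NO_NEW_DATA
        self.facts = counts
        return ExtractionStatus.NEW_DATA
```

Separating “failure” from “no new data to store” lets the caller skip the storage phase without treating an unchanged configuration as an error.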
  • 14. Data from Phase 2
      • FactExtractor subclasses provide:
          • A ConfigGraph that maps software elements to nodes and static relationships to edges.
          • The graph, any node, and any edge may be attributed with static metrics.
      • Multiple fact extractors may be invoked on a single configuration: each created ConfigGraph is saved with a reference to the fact extractor that created it.
      • If a configuration has already been processed by a given fact extractor, it will not be processed again unless new metrics are to be calculated.
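As a rough illustration of the data described above, the Python sketch below models an attributed graph keyed by the extractor that produced it, plus the "skip unless new metrics are needed" check. The field layout, the `ConfigDataSketch` class, and the `needs_processing` helper are assumptions made for this example. Only the `ConfigGraph` name and the behaviour in the bullets come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str, str]  # (source node, relation, target node)


@dataclass
class ConfigGraph:
    """Nodes are software elements, edges are static relationships."""
    extractor: str
    metrics: Dict[str, float] = field(default_factory=dict)           # graph-level
    nodes: Dict[str, Dict[str, float]] = field(default_factory=dict)  # per-node
    edges: Dict[Edge, Dict[str, float]] = field(default_factory=dict) # per-edge

    def add_node(self, node_id: str, **metrics: float) -> None:
        self.nodes.setdefault(node_id, {}).update(metrics)

    def add_edge(self, src: str, relation: str, dst: str, **metrics: float) -> None:
        self.add_node(src)
        self.add_node(dst)
        self.edges.setdefault((src, relation, dst), {}).update(metrics)


class ConfigDataSketch:
    """Holds one graph per fact extractor for a single configuration."""

    def __init__(self) -> None:
        self.graphs: Dict[str, ConfigGraph] = {}

    def store(self, graph: ConfigGraph) -> None:
        self.graphs[graph.extractor] = graph

    def needs_processing(self, extractor: str, wanted_metrics: Iterable[str]) -> bool:
        """True if no graph exists yet or a wanted graph metric is missing."""
        graph = self.graphs.get(extractor)
        if graph is None:
            return True
        return not set(wanted_metrics) <= set(graph.metrics)
```

For example, after storing a graph for an extractor, `needs_processing` returns `False` for that extractor until a metric name that the stored graph lacks is requested.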
  • 15. Phase 3: Data Storage
      • Kenyon uses Hibernate to persist data classes.
          • Hibernate is an “object/relational persistence and query service for Java” [www.hibernate.org].
          • Allows reuse of Kenyon classes by research tools implemented in Java.
      • Each configuration processed by Kenyon is assigned to a Project, the top-level data class persisted by Kenyon.
  • 16. Persisted Kenyon Data
      • Projects contain one set of data for each configuration specification processed.
      • Each such data set contains one or more ConfigGraphs, each produced by a different FactExtractor.
      • FactExtractors specify what GraphSchema subclass they use (not restrictive).
      • [Diagram: class diagram relating Project, ConfigData, ConfigGraph, ConfigSpec, FactExtractor, GraphSchema, and ConfigDelta, with multiplicities.]
  • 17. Data Access
      • Hibernate allows access to preprocessed data using SQL or the Hibernate query methods (HQL, QBE/QBC), which support class/field-based queries.
      • A Hibernate query returns a List of Objects, each of which is of the type originally persisted.
      • Data fields in the returned class are populated unless specified as lazily loaded.
      • Kenyon provides several convenience queries for commonly anticipated needs, such as “what configurations are available for this project”.
  • 18. Kenyon Usage
      • Kenyon processes data based on specifications in a configuration file:
          • Start time, stop time, how often to process.
          • Fact extractors and their assigned metric calculators.
          • SCM parameters, filesystem parameters, some control over what Hibernate persists.
      • A “processing run” will reuse any previously processed data if available.
          • For example, if a ConfigGraph has already been created and new metrics are needed, they are calculated and added to the existing ConfigGraph.
  • 19. Iterative Refinement Example
      • When looking for “interesting” timeframes of evolution, a multiple-pass process is recommended.
          • A user can configure Kenyon to process the changes in a system once per day.
          • Days with high activity or other metrics exceeding a threshold can be flagged as “interesting”.
          • The user can then configure Kenyon to process those days (via multiple processing runs) at the frequency of “every 20 minutes”.
          • This process can repeat down to the “every second” level.
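The multiple-pass process above can be sketched as a small scheduling helper in Python. It does not call Kenyon; it only computes which timestamps a coarse pass and a finer follow-up pass would process. The function names, the activity counts, and the threshold are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def sample_times(start: datetime, stop: datetime, step: timedelta) -> List[datetime]:
    """Timestamps from start to stop (inclusive) at a fixed interval."""
    times = []
    current = start
    while current <= stop:
        times.append(current)
        current += step
    return times


def interesting_days(activity_by_day: Dict[datetime, int], threshold: int) -> List[datetime]:
    """Days whose activity metric exceeds the threshold, in date order."""
    return sorted(day for day, count in activity_by_day.items() if count > threshold)


def refine(days: List[datetime], step: timedelta) -> Dict[datetime, List[datetime]]:
    """For each flagged day, the finer-grained timestamps for a second pass."""
    return {day: sample_times(day, day + timedelta(days=1), step) for day in days}


# First pass: one configuration per day; the counts here are made-up metrics.
daily_activity = {
    datetime(2005, 1, 1): 3,
    datetime(2005, 1, 2): 40,
    datetime(2005, 1, 3): 5,
}
flagged = interesting_days(daily_activity, threshold=10)

# Second pass: revisit only the flagged days every 20 minutes.
second_pass = refine(flagged, timedelta(minutes=20))
```

The same `refine` call can be repeated with a smaller step on whichever 20-minute windows look interesting, down to one-second intervals.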
  • 20. Parallel Preprocessing
      • Kenyon is a single-threaded process, but Hibernate supports multiple connections to a single Kenyon database.
      • A 10-year history can be processed in chunks by any number of computers, even if the processing configurations have overlapping times or different intervals.
      • Kenyon does not integrate the deltas between different processing runs, so a small overlap in processing chunks is suggested.
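One way to plan such chunked runs is sketched below in Python: the history is divided into equal time ranges, and every range after the first starts slightly early so that adjacent runs overlap. The `split_history` function and the one-day overlap are assumptions for illustration. How large the overlap should be is not stated on the slide.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def split_history(
    start: datetime, stop: datetime, chunks: int, overlap: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Divide [start, stop] into `chunks` ranges that overlap their predecessor."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    span = (stop - start) / chunks
    ranges = []
    for i in range(chunks):
        chunk_start = start + span * i
        # Pin the final end to `stop` to avoid rounding drift.
        chunk_end = stop if i == chunks - 1 else start + span * (i + 1)
        if i > 0:
            # Begin early so that deltas across the boundary are covered.
            chunk_start -= overlap
        ranges.append((chunk_start, chunk_end))
    return ranges


# A ten-year history split across four machines with a one-day overlap.
plan = split_history(datetime(1995, 1, 1), datetime(2005, 1, 1), 4, timedelta(days=1))
```

Each tuple in `plan` would become the start and stop time of one processing run's configuration file.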
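  • 21. Kenyon Architecture
      • [Diagram: the components DataManager, MetricLoader, Fact Extractor, and SCMInterface are connected by <<calls>> relationships; the data classes Project, ConfigData, and ConfigGraph are persisted through Hibernate/DBMS; the SCM Repository and the Filesystem are external resources.]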
  • 22. Current Status
      • Kenyon 1.2 is available at http://kenyon.dforge.cse.ucsc.edu
      • Supports CVS, Subversion, and ClearCase.
      • Students in 290G are performing projects using Kenyon this quarter.
      • Actively working with Samsung to analyze some of their source code.
  • 23. Future Work (1/3)
      • Continue working with M. Kim.
          • Evaluate usefulness of SCM-only module.
          • If she decides to use Kenyon, assist with full integration.
      • Finish integration of Beagle/Kenyon and IVA/Kenyon.
      • Work with G. Murphy on using Kenyon at UBC.
      • Evaluate Kenyon’s ability to reduce the time-to-results for static software evolution research by analyzing the seminar class projects.
  • 24. Future Work (2/3)
      • Support branch path traversal.
          • Allow users to see the branch points in a system and specify a path for processing instead of a single branch.
          • Will reuse existing visualizations; must add a specification mechanism.
      • Incorporate full language-specific containment models for better inter-language graph traversal and mapping.
          • Use M. Godfrey’s Java fact extractor and containment model.
  • 25. Future Work (3/3)
      • Support more of the Standard Exchange Formats for ConfigGraph export.
          • TA is already supported, but only the Fact sections. Schema sections should be improved to use the language-specific containment models.
      • Encourage other researchers to use Kenyon, and improve results-sharing, capabilities, etc. based on their feedback.
  • 26. Open Issues (1/3)
      • The exact mechanism for allowing data sharing between researchers is not entirely controllable by Kenyon.
          • Database setup and administration can effectively override much of Kenyon’s preferences.
          • By default, Kenyon-created tables are not mutable by processes other than Kenyon.
  • 27. Open Issues (2/3)
      • Kenyon provides a public class, EvolutionPath, that links a subgraph in one ConfigGraph to one in another ConfigGraph.
          • Directed and attributable.
          • Basic building block for evolution data.
          • Is currently persisted by Kenyon, but will likely not be after 1.1, due to database mutability issues.
      • Other research projects can subclass it and, if they want to share their results easily, persist them to a Hibernate database using the provided Hibernate mapping examples.
  • 28. Open Issues (3/3)
      • Kenyon can be invoked automatically via a post-commit script or a cron job.
          • Should Kenyon also be invocable automatically from an IDE?
          • What sort of support should Kenyon provide for better integration with, for example, Eclipse?
  • 29. Conclusions (1/2)
      • Kenyon is an engineering solution, designed to amortize the cost of the computationally expensive preprocessing steps that can benefit static software evolution research.
      • Research projects using Kenyon will not have to independently create solutions for these common problems.
          • 18% code reduction in Beagle without really trying.
          • Is expected to reduce the lag between beginning system implementation and producing research results.
  • 30. Conclusions (2/2)
      • Kenyon is not intended to be a lightweight data mining system for software evolution research.
          • Tradeoff of speed vs. precision is still controllable via the choice of fact extractors.
          • The configuration extraction time and associated network lag already put the per-configuration time at O(seconds).
      • Instead, it allows the cost of time-consuming, computationally expensive preprocessing to be amortized among researchers.
  • 31. Questions?
      • Kenyon was created primarily from code that existed in IVA, which is being funded by NSF grant CCR-01234603. Kenyon also contains code from Beagle, the origin analysis project overseen by Mike Godfrey.
      • Email jbevan@cs.ucsc.edu with future questions.
      • http://www.cse.ucsc.edu/research/labs/grase/kenyon/