An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis
1. An Adaptive Filter-Framework for the
Quality Improvement of Open-Source
Software Analysis
Advanced Community Information Systems (ACIS)
RWTH Aachen University, Germany
Anna Hannemann, Michael Hackstein, Ralf
Klamma, Matthias Jarke
Lehrstuhl Informatik 5
(Information Systems)
Prof. Dr. M. Jarke
This slide deck is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
2. Open Source Software Projects
Community-driven Development
Voluntary participation
Communication, project management and
development via Web tools
Some successful and famous examples
Smaller niche projects
A long-tail of unsuccessful projects
3. Open Source Software Analysis for
Software Engineering
Understand, model, simulate and organize
community-driven development
Agile development practices
Distributed and intercultural practices
New success factors
Long-term freely available datasets
Low cost empirical studies
4. Open Source Software Analysis
Research Results
Scacchi, “The Future Research in Free/Open Source Software Development”, 2010
5. Techniques for Knowledge Mining in
Development Repositories
Results are only as good as the data!
Remember DNA Phantom?
“A hypothesized unknown female serial killer as a result of
contaminated cotton swabs used for collecting DNA”
Mine data, not noise!
Cleaning of artifacts from communication and development repositories needed
6. Data Cleaning for Knowledge Mining
in Development Repositories
Data-structure independence: variable artifact types
Additive filtering: filter only new data
Filter nesting: sequence of arbitrary order
Consistent data format: cross-medium analysis
Consistent and easy-to-use interface
Extensibility: continuous evolution
Adaptive database insertion
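The "additive filtering" requirement above (filter only new data) could be sketched as follows. This is a minimal illustration, not the framework's actual implementation; the class name `AdditiveFilter`, its methods, and the integer artifact IDs are assumptions made for the example.

```python
from typing import Callable, Dict, List


class AdditiveFilter:
    """Hypothetical filter that remembers which artifact IDs it has processed."""

    def __init__(self, keep: Callable[[str], bool]):
        self.keep = keep                    # predicate deciding whether an artifact survives
        self.seen = set()                   # IDs that were already filtered
        self.results: Dict[int, str] = {}   # artifacts that passed the filter

    def update(self, artifacts: Dict[int, str]) -> List[int]:
        """Filter only artifacts not seen before; return the newly processed IDs."""
        new_ids = [i for i in artifacts if i not in self.seen]
        for i in new_ids:
            self.seen.add(i)
            if self.keep(artifacts[i]):
                self.results[i] = artifacts[i]
        return new_ids


non_empty = AdditiveFilter(keep=lambda text: bool(text.strip()))
first = non_empty.update({1: "hello", 2: ""})
second = non_empty.update({1: "hello", 2: "", 3: "patch"})  # only ID 3 is new
```

On the second call, artifacts 1 and 2 are skipped because they were already processed, so the cost of a repeated run depends on the amount of new data rather than on the size of the repository.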
7. Adaptive-Filtering Approach
Cross-Media Mapping
Artifact types
Mail
Comment
Post
...
Cross-media mapping
Assignment of semantic meaning to artifact elements
Extensibility to new data sources
Same filters for different data
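One way to picture cross-media mapping is a per-source table that assigns a semantic role (author, timestamp, text) to each raw field, so that every artifact ends up in the same format. The sketch below assumes this reading; the `Artifact` fields, the `MAPPINGS` table and the raw field names are hypothetical and not taken from the framework.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    """Common format shared by all artifact types (hypothetical fields)."""
    source: str
    author: str
    timestamp: str
    text: str


# Hypothetical mappings: semantic role -> field name in the raw record.
MAPPINGS = {
    "mail": {"author": "From", "timestamp": "Date", "text": "Body"},
    "comment": {"author": "user", "timestamp": "created", "text": "comment"},
}


def to_artifact(source: str, raw: dict) -> Artifact:
    """Translate a raw record into the common format using its source mapping."""
    mapping = MAPPINGS[source]
    fields = {role: raw[field] for role, field in mapping.items()}
    return Artifact(source=source, **fields)


mail = to_artifact("mail", {"From": "alice@example.org", "Date": "2004-03-01", "Body": "Hello"})
comment = to_artifact("comment", {"user": "bob", "created": "2004-03-02", "comment": "Fixed"})
```

Under this assumption, supporting a new data source only requires a new entry in the mapping table; filters that read `author` or `text` work unchanged.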
8. Adaptive-Filtering Approach
Filter Nesting
Sequence of filters F1, F2, …, FN
Results in same predefined format
One filter – one cleaning (analysis) task
Each filter triggers its predecessor
Complex filter as a combination of several filters
Filtering triggered on demand
Filtering of a subset possible
Simple filters first, then analysis of the reduced data set with more filters of higher complexity
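The nesting idea (each filter triggers its predecessor, and all filters return the same format) can be sketched as a simple chain. The `Filter` class and the two example steps are illustrative assumptions, and artifacts are represented as plain dictionaries for brevity.

```python
from typing import Callable, Dict, List, Optional

Artifacts = List[Dict[str, str]]


class Filter:
    """Hypothetical filter: one cleaning task, optionally chained to a predecessor."""

    def __init__(self, name: str, step: Callable[[Artifacts], Artifacts],
                 predecessor: Optional["Filter"] = None):
        self.name = name
        self.step = step
        self.predecessor = predecessor

    def run(self, artifacts: Artifacts) -> Artifacts:
        # Triggering a filter first triggers its predecessor (on-demand evaluation).
        if self.predecessor is not None:
            artifacts = self.predecessor.run(artifacts)
        return self.step(artifacts)


def drop_empty(artifacts: Artifacts) -> Artifacts:
    """Cheap reduction step: discard artifacts without text."""
    return [a for a in artifacts if a["text"].strip()]


def lowercase(artifacts: Artifacts) -> Artifacts:
    """Content step that runs on the already reduced data set."""
    return [{**a, "text": a["text"].lower()} for a in artifacts]


f1 = Filter("drop_empty", drop_empty)
f2 = Filter("lowercase", lowercase, predecessor=f1)
result = f2.run([{"text": "Hello"}, {"text": "  "}, {"text": "BioJava"}])
```

Because every step consumes and produces the same list-of-dictionaries format, the filters can be combined in any order, and the cheap reduction runs before the more expensive steps see the data.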
9. Adaptive-Filtering Approach
Multi-Threading
Only new data is filtered
Asynchronous processing: filtered data subset is
provided directly to the next analysis task
Synchronous processing: wait till the complete data set is filtered
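The difference between the two processing modes can be illustrated with a producer thread and a queue. This is a sketch under assumptions: the keyword-based `is_clean` check, the function names and the length-based "analysis" are placeholders, not the framework's code.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the filtered stream


def is_clean(artifact: str) -> bool:
    """Placeholder filter criterion (assumption): drop anything mentioning 'spam'."""
    return "spam" not in artifact.lower()


def filter_worker(artifacts, out: queue.Queue) -> None:
    """Filter artifacts one by one and hand each survivor on immediately."""
    for artifact in artifacts:
        if is_clean(artifact):
            out.put(artifact)
    out.put(SENTINEL)


def analyze_async(artifacts):
    """Asynchronous mode: analysis consumes filtered artifacts as they arrive."""
    out = queue.Queue()
    worker = threading.Thread(target=filter_worker, args=(artifacts, out))
    worker.start()
    lengths = []
    while True:
        item = out.get()
        if item is SENTINEL:
            break
        lengths.append(len(item))  # stand-in for a real analysis task
    worker.join()
    return lengths


def analyze_sync(artifacts):
    """Synchronous mode: wait until the complete data set is filtered."""
    filtered = [a for a in artifacts if is_clean(a)]
    return [len(a) for a in filtered]


mails = ["Release plan", "Buy SPAM now", "Bug report"]
async_result = analyze_async(mails)
sync_result = analyze_sync(mails)
```

Both modes produce the same result here; the asynchronous variant simply lets the analysis start before filtering has finished, which matters for analysis tasks that do not need the whole data set at once.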
10. Dataset Reduction and Content
Cleaning Filters
Dataset Reduction Filter (DRF)
– Reduces amount of artifacts
– Selects artifacts that fulfill certain criteria
– Example
– Spam detection
– Artifact classification based on Bayes Decision Rule
Content Cleaning Filter (CRF)
– Modifies content of artifacts
– Example
– Quotation Filter
– Detection of predefined patterns in content
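The two filter kinds above can be sketched side by side. The slides only name the Bayes decision rule; the word-count naive Bayes model with add-one smoothing used here is an assumption chosen for brevity, the training sentences are made-up toy data, and the quotation pattern (lines starting with `>`) is one plausible predefined pattern rather than the one actually used.

```python
import math
import re
from collections import Counter


def tokens(text: str):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Toy classifier: pick the class with the highest posterior (Bayes decision rule)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens(text))

    def classify(self, text: str) -> str:
        # Assumes both classes were trained at least once.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            n = sum(counts.values())
            for word in tokens(text):
                # Add-one smoothing so unseen words do not zero out the posterior.
                score += math.log((counts[word] + 1) / (n + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


def strip_quotes(text: str) -> str:
    """Content cleaning: drop quoted lines (assumed pattern: leading '>')."""
    kept = [line for line in text.splitlines() if not line.lstrip().startswith(">")]
    return "\n".join(kept)


clf = NaiveBayes()
clf.train("cheap pills buy now", "spam")
clf.train("buy cheap watches now", "spam")
clf.train("patch for the sequence parser", "ham")
clf.train("bug in the alignment parser", "ham")

mails = ["buy cheap pills", "> old text\nparser bug in the patch"]
# Dataset reduction first, then content cleaning on the remaining artifacts.
kept = [m for m in mails if clf.classify(m) == "ham"]
cleaned = [strip_quotes(m) for m in kept]
```

The reduction filter changes how many artifacts remain, while the cleaning filter changes only their content, which mirrors the distinction drawn on the slide.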
11. Artifact Transformation Filters
Filter as analysis task
Modifies artifact attributes
Example:
– Core-Periphery Filter: Separates
core of community from periphery
– Hierarchical clustering based on
power law distribution
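The slides do not spell out the clustering procedure, so the following is only a rough stand-in for a core-periphery filter. It assumes per-member activity counts and splits the ranked members at the largest gap in log-activity, which is what a single-linkage hierarchical clustering of these one-dimensional values yields when cut into two clusters. The function name and the sample counts are hypothetical.

```python
import math
from typing import Dict, Set, Tuple


def core_periphery(activity: Dict[str, int]) -> Tuple[Set[str], Set[str]]:
    """Split members into core and periphery at the largest gap in log-activity.

    Assumes at least two members and strictly positive activity counts.
    """
    ranked = sorted(activity.items(), key=lambda kv: kv[1], reverse=True)
    logs = [math.log(count) for _, count in ranked]
    # Gaps between neighbours in the ranking; heavy-tailed activity tends to
    # leave one dominant gap between a few very active members and the rest.
    gaps = [logs[i] - logs[i + 1] for i in range(len(logs) - 1)]
    cut = gaps.index(max(gaps)) + 1
    core = {name for name, _ in ranked[:cut]}
    periphery = {name for name, _ in ranked[cut:]}
    return core, periphery


# Made-up posting counts for illustration only.
posts = {"alice": 400, "bob": 250, "carol": 12, "dave": 5, "erin": 2, "frank": 1}
core, periphery = core_periphery(posts)
```

The output is a new attribute per artifact author (core or periphery) rather than a reduced or cleaned data set, which is why such a filter doubles as an analysis task.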
12. Validation in BioJava, Biopython and
BioPerl OSS: Spam Detection
BioJava
Spam and spammer level in mailing lists of OSS
Significant amount (up to 60%)
Non-monotonic
Distortion of dynamics
13. Validation in BioJava, Biopython and
BioPerl OSS: Results Distortion
Year 2004, BioJava
Mood within project community
Summarized sentiment of project mails per month
Positive sentiment of spam advertisement
Incorrect sentiment assignment due to quotation
14. Adaptive Filter-Framework and OSS
Analysis
OSS Analysis for SE
– Methods/metrics for knowledge mining in company
communication and development repositories
– Understanding of community-oriented development:
principles, obstacles and advantages
! Data Cleaning: Results are only as good as the data!
Adaptive Filter-Framework
– Significant noise level in data
– Adaptable for any Web artifact format
– Filter nesting
– Filter as analysis method