Tao Xie
North Carolina State University
with Dongmei Zhang (Microsoft Research Asia)
Xusheng Xiao (North Carolina State University)
Chunhua Weng (Columbia University)
RAISE 2013
Software analytics is to enable software
practitioners to perform data exploration and
analysis in order to obtain insightful and
actionable information for data-driven tasks
around software and services.
Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie.
Software Analytics as a Learning Case in Practice: Approaches and Experiences.
In Proc. MALETS 2011.
MSRA Software Analytics group founded in May 2009
Term coined/defined, expanding the scope of previous work
[Buse and Zimmermann, FoSER 10][Hassan and Xie, FoSER 10]
http://research.microsoft.com/en-us/groups/sa/
http://research.microsoft.com/en-us/news/features/softwareanalytics-052013.aspx
ICSE 2013
Research Topics
Technology Pillars
Target Audience
Connection to Practice
Output
Software Users
Software Development Process
Software System
• Covering different areas of
software domain
• Throughout entire development
cycle
• Enabling practitioners to obtain
insights
Runtime traces
Program logs
System events
Perf counters
…
Usage log
User surveys
Online forum posts
Blog & Twitter
…
Source code
Bug history
Check-in history
Test cases
…
Developer
Tester
Program Manager
Usability engineer
Designer
Support engineer
Management personnel
Operation engineer
• Conveys meaningful and useful understanding or knowledge towards completing the target task
• Not easily attainable via directly investigating raw data without aid of analytics technologies
• Going from correlation to causality
• Examples
  • It is easy to count the number of re-opened bugs, but how to find out the primary reasons for these re-opened bugs?
  • When the availability of an online service drops below a threshold, how to localize the problem?
• Enables software practitioners to come up with concrete solutions towards completing the target task
• Examples
  • Why were bugs re-opened?
    ▪ A list of bug groups, each with the same reason for re-opening
  • Why did the availability of online services drop?
    ▪ A list of problematic areas with associated confidence values
  • Which part of my code should be refactored?
    ▪ A list of cloned code snippets easily explored from different perspectives
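
To make the first example concrete, here is a minimal sketch of turning a raw bug list into "bug groups, each with the same reason for re-opening". The record fields (`id`, `reopened`, `reason`) and the sample reasons are hypothetical; a real bug database would need its own extraction step to obtain the reason.

```python
from collections import defaultdict

# Hypothetical bug records; field names and reasons are illustrative only.
bugs = [
    {"id": 101, "reopened": True, "reason": "incomplete fix"},
    {"id": 102, "reopened": False, "reason": None},
    {"id": 103, "reopened": True, "reason": "could not reproduce"},
    {"id": 104, "reopened": True, "reason": "incomplete fix"},
]


def group_reopened_by_reason(records):
    """Return {reason: [bug ids]} for re-opened bugs, largest group first."""
    groups = defaultdict(list)
    for bug in records:
        if bug["reopened"]:
            groups[bug["reason"]].append(bug["id"])
    # Sort so the most common re-opening reason is listed first.
    return dict(sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True))


reopened_groups = group_reopened_by_reason(bugs)
print(reopened_groups)
```

Counting re-opened bugs is the easy part; the grouped view is what a practitioner could act on, for example by reviewing every "incomplete fix" case together.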
Vertical
Horizontal
Information Visualization
Data Analysis Algorithms
Large-scale Computing
Software Users
Software Development Process
Software System
• Software Analytics is naturally tied with software development practice
• Getting real
  • Real Data
  • Real Problems
  • Real Users
  • Real Tools
StackMine [Han et al. ICSE 12]
[Figure: StackMine workflow diagram. Labels: Trace collection, Internet, Trace Storage, Trace analysis, Pattern Matching, Problematic Pattern Repository, Bug filing, Bug update, Bug Database]
Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large
via Mining Millions of Stack Traces. In Proc. ICSE 2012
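
As a rough illustration of mining stack traces for recurring call patterns, the sketch below counts how many traces contain each contiguous run of frames and ranks the runs by that count. This is not the StackMine algorithm from the cited paper; the function names, the n-gram simplification, and the sample traces are assumptions made for illustration.

```python
from collections import Counter


def frame_ngrams(trace, n):
    """Yield every contiguous run of n frames in one stack trace."""
    for i in range(len(trace) - n + 1):
        yield tuple(trace[i:i + n])


def rank_patterns(traces, n=2):
    """Count, for each n-frame pattern, how many traces contain it."""
    support = Counter()
    for trace in traces:
        # set() so a pattern repeated within one trace is counted once.
        support.update(set(frame_ngrams(trace, n)))
    return support.most_common()


# Hypothetical traces, outermost frame first.
traces = [
    ["main", "render", "acquire_lock", "wait"],
    ["main", "load", "acquire_lock", "wait"],
    ["main", "render", "draw"],
]
ranked = rank_patterns(traces, n=2)
support = dict(ranked)
print(support[("acquire_lock", "wait")])
```

Ranking patterns by how many traces they cover is one simple way to suggest which trace files to investigate first; a production tool would also weigh measured cost such as wait time.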
How many issues are still unknown?
Which trace file should I investigate first?
Key to issue discovery
Bottleneck of scalability
“We believe that the MSRA tool is highly valuable and
much more efficient for mass trace (100+ traces) analysis.
For 1000 traces, we believe the tool saves us 4-6 weeks of
time to create new signatures, which is quite a significant
productivity boost.”
- from Development Manager in Windows
Highly effective new issue discovery on
Windows mini-hang
Continuous impact on future Windows versions
Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large
via Mining Millions of Stack Traces. In Proc. ICSE 2012
Dual Ends of the Road
Foundation: Science of Software Analytics?
From correlation to causality
Practice: Software Analytics
From pieces to a whole
Bring human in the loop
Make real impact in practice
Foundation Practice
Choose random system component
Find vulnerability
Suggest defense
Analyze security or test performance
Are we making progress?
Positive aspect: most security research addresses real problems
@J. Mitchell
Systematization of Knowledge: An organized body of
knowledge gained through research
Ad hoc point solutions vs. general understanding
Repeating failures of the past with each new platform, type of
vulnerability
Scientific Method: System of acquiring knowledge based
on the scientific method
Process of hypothesis testing and experiments
Building abstractions and models, theorems
Universal Laws: Laws or theories that are predictive
Widely applicable
Make strong, quantitative predictions
@D. Evans, J. Mitchell
Percentage of bug-introducing changes for Eclipse
Don’t program on Fridays ;-)
[Zimmermann et al. 05]
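
A per-weekday percentage like the one this slide refers to can be computed as in the following sketch. The change records are made up for illustration and are not data from the cited study; the function name and record layout are assumptions.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical records: (commit date, later identified as bug-introducing?)
changes = [
    (date(2005, 3, 4), True),    # Friday
    (date(2005, 3, 4), False),   # Friday
    (date(2005, 3, 7), False),   # Monday
    (date(2005, 3, 8), True),    # Tuesday
    (date(2005, 3, 8), False),   # Tuesday
]


def bug_introducing_rate_by_weekday(records):
    """Return {weekday name: percentage of changes that introduced a bug}."""
    total = Counter()
    buggy = Counter()
    for day, is_bug_introducing in records:
        name = WEEKDAYS[day.weekday()]
        total[name] += 1
        if is_bug_introducing:
            buggy[name] += 1
    return {name: 100.0 * buggy[name] / total[name] for name in total}


rates = bug_introducing_rate_by_weekday(changes)
print(rates)
```

Such a table reports a correlation only; it does not by itself explain why one weekday fares worse than another.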
Failure is a 4-letter Word
[PROMISE’11 Zeller et al.]
From Correlation to Causality
Analytic techniques are often used for applications
that emphasize results over causation of the
findings
Users may choose to act on the behavior without
focus on understanding it (or its causation)
provided that the pattern has a high empirical
probability of correctly identifying an issue
E.g., smuggling, traveling with false documents,
or predicting winning stock
@L. Williams, M. Rappa
From Correlation to Causality cont.
Analytic techniques are often not used to
support the identification and advancement
of fundamental scientific principles based
upon an analysis of causation
Emphasize the use of analytics to advance
science (e.g., producing insights) besides the
use of analytics in providing just observations
@L. Williams, M. Rappa
Open Questions
How much science is there in a field (e.g., software analytics)?
A field may be a means/solution, in contrast
to a problem domain like “security” or
“design”
How can analytics/AI be used to help build
science of “X”?
How to move a field to a foundational level?
How to balance foundation and practice?
Dual Ends of the Road
Foundation Practice
Foundation: Science of Software Analytics?
From correlation to causality
Practice: Software Analytics
From pieces to a whole
Bring human in the loop
Make real impact in practice
Download counts
initial 20 months of release
Academic: 17,366
Industrial: 13,022
Total: 30,388
Released since 2008
Interesting results vs. actionable results
Problem hunting vs. problem driven
Open Questions
Who should bring software analytics research
results to the hands of practitioners?
How to do so?
Dual Ends of the Road
Foundation Practice
Foundation: Science of Software Analytics?
From correlation to causality
Practice: Software Analytics
From pieces to a whole
Bring human in the loop
Make real impact in practice
Questions?
https://sites.google.com/site/asergrp/
taoxie@gmail.com
NSF grants CCF-0845272, CCF-0915400, CNS-0958235, ARO grant W911NF-08-1-0443, an
NSA Science of Security Lablet grant, a NIST grant, a 2011 Microsoft Research SEIF Award

Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 

Recently uploaded (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 

Advancing Foundation and Practice of Software Analytics

  • 1. Tao Xie (North Carolina State University), with Dongmei Zhang (Microsoft Research Asia), Xusheng Xiao (North Carolina State University), and Chunhua Weng (Columbia University). RAISE 2013
  • 2. Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services. Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie. Software Analytics as a Learning Case in Practice: Approaches and Experiences. In Proc. MALETS 2011. MSRA Software Analytics group founded in May 2009. Term coined/defined, expanding scope of previous work [Buse and Zimmermann, FoSER 10][Hassan and Xie, FoSER 10]. http://research.microsoft.com/en-us/groups/sa/ http://research.microsoft.com/en-us/news/features/softwareanalytics-052013.aspx
  • 4. Software Users; Software Development Process; Software System. Covering different areas of software domain. Throughout entire development cycle. Enabling practitioners to obtain insights.
  • 5. Runtime traces, program logs, system events, perf counters, …; usage log, user surveys, online forum posts, blog & Twitter, …; source code, bug history, check-in history, test cases, …
  • 6. Developer, tester, program manager, usability engineer, designer, support engineer, management personnel, operation engineer
  • 7. ICSE 2013. Conveys meaningful and useful understanding or knowledge towards completing the target task. Not easily attainable via directly investigating raw data without aid of analytics technologies. Going from correlation to causality. Examples: It is easy to count the number of re-opened bugs, but how to find out the primary reasons for these re-opened bugs? When the availability of an online service drops below a threshold, how to localize the problem?
  • 8. Enables software practitioners to come up with concrete solutions towards completing the target task. Examples: Why were bugs re-opened? A list of bug groups, each with the same reason for re-opening. Why did availability of online services drop? A list of problematic areas with associated confidence values. Which part of my code should be refactored? A list of cloned code snippets easily explored from different perspectives.
  • 10. Software Analytics is naturally tied with software development practice. Getting real: Real Data, Real Problems, Real Users, Real Tools.
  • 11. Pattern Matching; Bug update; Problematic Pattern Repository; Bug Database; Trace analysis; Bug filing; StackMine [Han et al. ICSE 12]; Trace Storage; Trace collection; Internet. Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012. How many issues are still unknown? Which trace file should I investigate first? Key to issue discovery. Bottleneck of scalability.
  • 12. “We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost.” - from Development Manager in Windows. Highly effective new issue discovery on Windows mini-hang. Continuous impact on future Windows versions. Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012
  • 13. Dual Ends of the Road. Foundation: Science of Software Analytics? From correlation to causality. Practice: Software Analytics. From pieces to a whole. Bring human in the loop. Make real impact in practice.
  • 14. Choose random system component. Find vulnerability. Suggest defense. Analyze security or test performance. Are we making progress? Positive aspect: most security research addresses real problems. @J. Mitchell
  • 15. Systematization of Knowledge: An organized body of knowledge gained through research. Ad hoc point solutions vs. general understanding. Repeating failures of the past with each new platform, type of vulnerability. Scientific Method: System of acquiring knowledge based on the scientific method. Process of hypothesis testing and experiments. Building abstractions and models, theorems. Universal Laws: Laws or theories that are predictive. Widely applicable. Make strong, quantitative predictions. @D. Evans, J. Mitchell
  • 16. Percentage of bug-introducing changes for Eclipse. Don’t program on Fridays ;-) [Zimmermann et al. 05]
  • 17. Failure is a 4-letter Word [PROMISE’11 Zeller et al.]
  • 18. From Correlation to Causality Analytic techniques are often used for applications that emphasize results over causation of the findings Users may choose to act on the behavior without focus on understanding it (or its causation) provided that the pattern has a high empirical probability of correctly identifying an issue E.g., smuggling, traveling with false documents, or predicting winning stock @L.Williams, M. Rappa
  • 19. From Correlation to Causality (cont.). Analytic techniques are often not used to support the identification and advancement of fundamental scientific principles based upon an analysis of causation. Emphasize the use of analytics to advance science (e.g., producing insights) besides the use of analytics in providing just observations. @L. Williams, M. Rappa
  • 20. Open Questions How much science of a field (e.g., soft analytics)? A field may be a means/solution in contrast to a problem domain like “security”, “design” How can analytics/AI be used to help build science of “X”? How to move a field to a foundational level? How to balance foundation and practice?
  • 21. Dual Ends of the Road. Foundation: Science of Software Analytics? From correlation to causality. Practice: Software Analytics. From pieces to a whole. Bring human in the loop. Make real impact in practice.
  • 22. Download counts (initial 20 months of release): Academic: 17,366; Industrial: 13,022; Total: 30,388. Released since 2008.
  • 24. Open Questions. Who should bring software analytics research results to the hands of practitioners? How to do so?
  • 25. Dual Ends of the Road. Foundation: Science of Software Analytics? From correlation to causality. Practice: Software Analytics. From pieces to a whole. Bring human in the loop. Make real impact in practice.
  • 26. Questions? https://sites.google.com/site/asergrp/ taoxie@gmail.com Supported by NSF grants CCF-0845272, CCF-0915400, CNS-0958235, ARO grant W911NF-08-1-0443, an NSA Science of Security Lablet grant, a NIST grant, and a 2011 Microsoft Research SEIF Award.

Editor's Notes

  1. http://stackoverflow.com/questions/3165483/why-pex-is-not-massive