Vision Statement Presentation on "Advancing Foundation & Practice of Software Analytics" at the 2nd International NSF sponsored Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE 2013) http://promisedata.org/raise/2013/
Advancing Foundation and Practice of Software Analytics
1. Tao Xie
North Carolina State University
with Dongmei Zhang (Microsoft Research Asia)
Xusheng Xiao (North Carolina State University)
Chunhua Weng (Columbia University)
RAISE 2013
2. Software analytics is to enable software
practitioners to perform data exploration and
analysis in order to obtain insightful and
actionable information for data-driven tasks
around software and services.
Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao
Xie. Software Analytics as a Learning Case in Practice: Approaches and Experiences.
In Proc. MALETS 2011.
MSRA Software Analytics group founded in May 2009
Term coined and defined, expanding the scope of previous work
[Buse and Zimmermann, FoSER 10][Hassan and Xie, FoSER 10]
http://research.microsoft.com/en-us/groups/sa/
http://research.microsoft.com/en-us/news/features/softwareanalytics-052013.aspx
5. Runtime traces
Program logs
System events
Perf counters
…
Usage log
User surveys
Online forum posts
Blog & Twitter
…
Source code
Bug history
Check-in history
Test cases
…
7. ICSE 2013
Conveys meaningful and useful understanding or
knowledge towards completing the target task
Not easily attainable via directly investigating raw
data without aid of analytics technologies
Going from correlation to causality
Examples
It is easy to count the number of re-opened bugs, but how to
find out the primary reasons for these re-opened bugs?
When the availability of an online service drops below a
threshold, how to localize the problem?
8. ICSE 2013
Enables software practitioners to come up with
concrete solutions towards completing the target task
Examples
Why were bugs re-opened?
▪ A list of bug groups, each with the same reason for re-opening
Why did the availability of online services drop?
▪ A list of problematic areas with associated confidence
values
Which part of my code should be refactored?
▪ A list of cloned code snippets easily explored from
different perspectives
10. ICSE 2013
Software Analytics is naturally tied with
software development practice
Getting real
Real Data
Real Problems
Real Users
Real Tools
11. StackMine [Han et al. ICSE 12]
[Diagram: Trace collection (Internet), Trace Storage, Trace analysis, Pattern Matching, Problematic Pattern Repository, Bug filing, Bug update, Bug Database]
Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large
via Mining Millions of Stack Traces. In Proc. ICSE 2012
How many issues are still
unknown?
Which trace file should I
investigate first?
Key to issue
discovery
Bottleneck of
scalability
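The two questions on this slide ("How many issues are still unknown?" and "Which trace file should I investigate first?") can be illustrated with a small sketch. This is a minimal Python illustration, not StackMine's actual algorithm: it assumes a hypothetical pattern repository in which each known problematic pattern is an ordered list of function names, and it treats a trace as matched when a pattern appears as a contiguous run of frames in its call stack. All function names, pattern names, and sample traces are invented for illustration.

```python
from collections import Counter


def contains_pattern(stack, pattern):
    """Return True if `pattern` occurs as a contiguous run of frames in `stack`."""
    n, m = len(stack), len(pattern)
    return any(stack[i:i + m] == pattern for i in range(n - m + 1))


def triage(traces, repository):
    """Split traces into known issues (counted per pattern) and unknown traces."""
    known = Counter()
    unknown = []
    for trace_id, stack in traces.items():
        matched = [name for name, pattern in repository.items()
                   if contains_pattern(stack, pattern)]
        if matched:
            known.update(matched)
        else:
            unknown.append(trace_id)
    return known, unknown


# Hypothetical pattern repository and traces (assumed data, not from the talk).
repository = {
    "lock_wait_on_io": ["AcquireLock", "WaitForIo"],
}
traces = {
    "t1": ["Main", "Render", "AcquireLock", "WaitForIo"],
    "t2": ["Main", "LoadConfig", "ParseXml"],
    "t3": ["Main", "AcquireLock", "WaitForIo", "Sleep"],
}

known, unknown = triage(traces, repository)
print("Known issue counts:", dict(known))
print("Unmatched traces to investigate first:", unknown)
```

Under these assumptions, the unmatched traces are the ones that may hold undiscovered issues, so they are natural candidates to inspect first, while the per-pattern counts give a rough sense of how much of the trace collection is already explained.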
12. “We believe that the MSRA tool is highly valuable and
much more efficient for mass trace (100+ traces) analysis.
For 1000 traces, we believe the tool saves us 4-6 weeks of
time to create new signatures, which is quite a significant
productivity boost.”
- from a Development Manager in Windows
Highly effective new issue discovery on
Windows mini-hang
Continuous impact on future Windows versions
Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large
via Mining Millions of Stack Traces. In Proc. ICSE 2012
13. Dual Ends of the Road
Foundation: Science of Software Analytics?
From correlation to causality
Practice: Software Analytics
From pieces to a whole
Bring human in the loop
Make real impact in practice
Foundation Practice
14. Choose random system
component
Find vulnerability
Suggest defense
Analyze security or
test performance
Are we making
progress?
Positive aspect: most security research addresses real problems
@J. Mitchell
15. Systematization of Knowledge: An organized body of
knowledge gained through research
Ad hoc point solutions vs. general understanding
Repeating failures of the past with each new platform, type of
vulnerability
Scientific Method: System of acquiring knowledge based
on the scientific method
Process of hypothesis testing and experiments
Building abstractions and models, theorems
Universal Laws: Laws or theories that are predictive
Widely applicable
Make strong, quantitative predictions
@D. Evans, J. Mitchell
17. Failure is a 4-letter Word
[PROMISE’11 Zeller et al.]
18. From Correlation to Causality
Analytic techniques are often used for applications
that emphasize results over causation of the
findings
Users may choose to act on the behavior without
focus on understanding it (or its causation)
provided that the pattern has a high empirical
probability of correctly identifying an issue
E.g., smuggling, traveling with false documents,
or predicting winning stocks
@L. Williams, M. Rappa
19. From Correlation to Causality cont.
Analytic techniques are often not used to
support the identification and advancement
of fundamental scientific principles based
upon an analysis of causation
Emphasize the use of analytics to advance
science (e.g., producing insights) besides the
use of analytics in providing just observations
@L. Williams, M. Rappa
20. Open Questions
How much science is there in a field (e.g., software analytics)?
A field may be a means/solution in contrast
to a problem domain like “security”,
“design”
How can analytics/AI be used to help build
science of “X”?
How to move a field to a foundational level?
How to balance foundation and practice?
21. Dual Ends of the Road
Foundation Practice
Foundation: Science of Software Analytics?
From correlation to causality
Practice: Software Analytics
From pieces to a whole
Bring human in the loop
Make real impact in practice
22. Download counts
initial 20 months of release
Academic: 17,366
Industrial: 13,022
Total: 30,388
Released since 2008
24. Open Questions
Who should bring software analytics research
results to the hands of practitioners?
How to do so?
25. Dual Ends of the Road
Foundation Practice
Foundation: Science of Software Analytics?
From correlation to causality
Practice: Software Analytics
From pieces to a whole
Bring human in the loop
Make real impact in practice