SlideShare uma empresa Scribd logo
1 de 85
Baixar para ler offline
Software Analytics:
Data Analytics for Software Engineering
Achievements and Opportunities
Tao Xie
Department of Computer Science
University of Illinois at Urbana-Champaign, USA
taoxie@illinois.edu
In Collaboration with Microsoft Research; Most Slides Adapted from Slides Co-Presented with
Dongmei Zhang (Microsoft Research, http://research.microsoft.com/en-us/groups/sa/)
New Era
Software itself is changing...
Software Services
How people use software is changing

Individual Isolated
Not much data/content
generation
How people use software is changing

How people use software is changing

Individual Isolated
Not much data/content
generation
How people use software is changing

Individual
Social
Isolated
Not much data/content
generation
Collaborative
Huge amount of data/artifacts
generated anywhere anytime
How software is built & operated is changing

How software is built & operated is changing

Data pervasive
Long product cycle
Experience & gut-feeling
In-lab testing
Informed decision making
Centralized development
Code centric
Debugging in the large
Distributed development
Continuous release

 

How software is built & operated is changing

Data pervasive
Long product cycle
Experience & gut-feeling
In-lab testing
Informed decision making
Centralized development
Code centric
Debugging in the large
Distributed development
Continuous release

 

Software Analytics
Software analytics is to enable software
practitioners to perform data exploration and
analysis in order to obtain insightful and
actionable information for data-driven tasks
around software and services.
Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie. Software
Analytics as a Learning Case in Practice: Approaches and Experiences. In MALETS 2011
http://research.microsoft.com/en-us/groups/sa/malets11-analytics.pdf
Software Analytics
Software analytics is to enable software
practitioners to perform data exploration and
analysis in order to obtain insightful and
actionable information for data-driven tasks
around software and services.
http://research.microsoft.com/en-us/groups/sa/
http://research.microsoft.com/en-us/news/features/softwareanalytics-052013.aspx
Research topics – the trinity view
Software
Users
Software
Development
Process
Software
System
‱ Covering different areas of
software domain
‱ Throughout entire development
cycle
‱ Enabling practitioners to obtain
insights
Data sources
Runtime traces
Program logs
System events
Perf counters


Usage log
User surveys
Online forum posts
Blog & Twitter


Source code
Bug history
Check-in history
Test cases
Eye tracking
MRI/EMG


Target audience – software practitioners
Target audience – software practitioners
Developer
Tester
Target audience – software practitioners
Developer
Tester
Program Manager
Usability engineer
Designer
Support engineer
Management personnel
Operation engineer
Output – insightful information
‱ Conveys meaningful and useful understanding or
knowledge towards completing the target task
‱ Not easily attainable via directly investigating raw data
without aid of analytics technologies
‱ Examples
– It is easy to count the number of re-opened bugs, but how to
find out the primary reasons for these re-opened bugs?
– When the availability of an online service drops below a
threshold, how to localize the problem?
Output – actionable information
‱ Enables software practitioners to come up with concrete
solutions towards completing the target task
‱ Examples
– Why bugs were re-opened?
‱ A list of bug groups each with the same reason of re-
opening
– Why availability of online services dropped?
‱ A list of problematic areas with associated confidence values
Research topics & technology pillars
Software
Users
Software
Development
Process
Software
System
Vertical
Horizontal
Information Visualization
Data Analysis Algorithms
Large-scale Computing
Connection to practice
‱ Software Analytics is naturally tied with software
development practice
‱ Getting real
Connection to practice
‱ Software Analytics is naturally tied with software
development practice
‱ Getting real
Real
Data
Real
Problems
Real
Users
Real
Tools
Outline
‱ Software Engineering for Big Data
‱ Overview of Software Analytics
‱ Selected Projects
– StackMine: Performance debugging in the large
– XIAO: Scalable code clone analysis
– SAS: Incident management of online services
– Code Hunt/Pex4Fun: Teaching & learning via interactive gaming
StackMine
Performance debugging in the large
via mining millions of stack traces
http://research.microsoft.com/en-us/groups/sa/stackmine_icse2012.pdf
Performance issues in the real world
‱ One of top user complaints
‱ Impacting large number of users every day
‱ High impact on usability and productivity
High Disk I/O High CPU consumption
Performance issues in the real world
‱ One of top user complaints
‱ Impacting large number of users every day
‱ High impact on usability and productivity
High Disk I/O High CPU consumption
As modern software systems tend to get more and more complex,
given limited time and resource before software release, development-
site testing and debugging become more and more insufficient to
ensure satisfactory software performance.
Performance debugging in the large
Performance debugging in the large
Trace Storage
Trace collection
Network
Performance debugging in the large
Trace Storage
Trace collection
Network
Trace analysis
Performance debugging in the large
Trace Storage
Trace collection
Bug Database
Network
Trace analysis
Bug filing
Performance debugging in the large
Trace Storage
Trace collection
Problematic Pattern
Repository Bug Database
Network
Trace analysis
Bug filing
Performance debugging in the large
Pattern Matching
Trace Storage
Trace collection
Bug update
Problematic Pattern
Repository Bug Database
Network
Trace analysis
Bug filing
Performance debugging in the large
Pattern Matching
Trace Storage
Trace collection
Bug update
Problematic Pattern
Repository Bug Database
Network
Trace analysis
Bug filing
Key to issue
discovery
Performance debugging in the large
Pattern Matching
Trace Storage
Trace collection
Bug update
Problematic Pattern
Repository Bug Database
Network
Trace analysis
Bug filing
Key to issue
discovery
Bottleneck of
scalability
Performance debugging in the large
Pattern Matching
Trace Storage
Trace collection
Bug update
Problematic Pattern
Repository Bug Database
Network
Trace analysis
How many issues are
still unknown?
Bug filing
Key to issue
discovery
Bottleneck of
scalability
Performance debugging in the large
Pattern Matching
Trace Storage
Trace collection
Bug update
Problematic Pattern
Repository Bug Database
Network
Trace analysis
How many issues are
still unknown?
Which trace file should I
investigate first?
Bug filing
Key to issue
discovery
Bottleneck of
scalability
Problem definition
Given OS traces collected from tens of thousands
(potentially millions) of users,
help domain experts identify impactful program execution
patterns
(that cause the most impactful underlying performance problems)
with limited time and resource.
Challenges
Combination of expertise
‱ Generic machine learning tools without domain
knowledge guidance do not work well
Highly complex analysis
‱ Numerous program runtime combinations triggering
performance problems
‱ Multi-layer runtime components from application to
kernel being intertwined
Large-scale trace data
‱ TBs of trace files and increasing
‱ Millions of events in single trace streamInternet
Technical highlights
‱ Machine learning for system domain
– Formulate the discovery of problematic execution patterns as
callstack mining & clustering
– Systematic mechanism to incorporate domain knowledge
‱ Interactive performance analysis system
– Parallel mining infrastructure based on HPC + MPI
– Visualization aided interactive exploration
Impact
“We believe that the MSRA tool is highly valuable and much more
efficient for mass trace (100+ traces) analysis. For 1000 traces, we
believe the tool saves us 4-6 weeks of time to create new signatures,
which is quite a significant productivity boost.”
Highly effective new issue discovery on Windows
mini-hang
Continuous impact on future Windows
versions
XIAO
Scalable code clone analysis
2012
http://research.microsoft.com/jump/175199
XIAO: Code Clone Analysis
‱ Motivation
– Copy-and-paste is a common developer behavior
– A real tool widely adopted internally and externally
‱ XIAO enables code clone analysis in the following way
– High tunability
– High scalability
– High compatibility
– High explorability
High tunability – what you tune is what you get
‱ Intuitive similarity metric: effective control of the
degree of syntactical differences between two code
snippets
for (i = 0; i < n; i ++) {
a ++;
b ++;
c = foo(a, b);
d = bar(a, b, c);
e = a + c; }
for (i = 0; i < n; i ++) {
c = foo(a, b);
a ++;
b ++;
d = bar(a, b, c);
e = a + d;
e ++; }
High scalability
‱ Four-step analysis process
‱ Easily parallelizable based on source code partition
Pre-processing
Pruning Fine Matching
Coarse Matching
High compatibility
‱ Compiler independent
‱ Light-weight built-in parsers for C/C++ and C#
‱ Open architecture for plug-in parsers to support
different languages
‱ Easy adoption by product teams
– Different build environment
– Almost zero cost for trial
High explorability
1. Clone navigation based on source tree hierarchy
2. Pivoting of folder level statistics
3. Folder level statistics
4. Clone function list in selected folder
5. Clone function filters
6. Sorting by bug or refactoring potential
7. Tagging
1 2 3 4 5 6
7
1. Block correspondence
2. Block types
3. Block navigation
4. Copying
5. Bug filing
6. Tagging
1
2
3
4
1
6
5
Scenarios & Solutions
Quality gates at milestones
‱ Architecture refactoring
‱ Code clone clean up
‱ Bug fixing
Post-release maintenance
‱ Security bug investigation
‱ Bug investigation for sustained engineering
Development and testing
‱ Checking for similar issues before check-in
‱ Reference info for code review
‱ Supporting tool for bug triage
Online code clone search
Offline code clone analysis
Benefiting developer community
Available in Visual Studio 2012 RC
Searching similar snippets
for fixing bug once
Finding refactoring
opportunity
More secure Microsoft products
Code Clone Search service integrated into
workflow of Microsoft Security Response Center
Over 590 million lines of code indexed across
multiple products
Real security issues proactively identified and
addressed
Example – MS Security Bulletin MS12-034
Combined Security Update for Microsoft Office, Windows, .NET Framework, and
Silverlight, published: Tuesday, May 08, 2012
3 publicly disclosed vulnerabilities and seven privately reported involved. Specifically,
one is exploited by the Duqu malware to execute arbitrary code when a user opened
a malicious Office document
Insufficient bounds check within the font parsing subsystem of win32k.sys
Cloned copy in gdiplus.dll, ogl.dll (office), Silver Light, Windows Journal viewer
Microsoft Technet Blog about this bulletin
However, we wanted to be sure to address the vulnerable code wherever it appeared
across the Microsoft code base. To that end, we have been working with Microsoft
Research to develop a “Cloned Code Detection” system that we can run for every
MSRC case to find any instance of the vulnerable code in any shipping product. This
system is the one that found several of the copies of CVE-2011-3402 that we are
now addressing with MS12-034.
SAS
Incident management of online services
http://research.microsoft.com/apps/pubs/?id=202451
Motivation
Incident Management (IcM) is a critical task to
assure service quality
‱ Online services are increasingly popular & important
‱ High service quality is the key
Incident Management: Workflow
Detect a
service
issue
Alert On-
Call
Engineers
(OCEs)
Investigate
the problem
Restore
the
service
Fix root cause
via
postmortem
analysis
Incident Management: Characteristics
Shrink-Wrapped
Software Debugging
Root Cause and Fix
Debugger
Controlled
Environment
Online Service
Incident
Management
Workaround
No Debugger
Live Data
Incident Management: Challenges
Large volume and noisy data
Highly complex problem space
No knowledge of entire system
Knowledge not well organized
SAS: Incident management of online services
SAS, developed and deployed to effectively reduce MTTR
(Mean Time To Restore) via automatically analyzing
monitoring data
5
5
 Design Principle of SAS
 Automating Analysis
 Handling Heterogeneity
 Accumulating Knowledge
 Supporting human-in-the-loop (HITL)
Techniques Overview
‱ System metrics
– Identifying Incident Beacons
‱ Transaction logs
– Mining Suspicious Execution Patterns
‱ Historical incidents
– Mining Historical Workaround Solutions
Industry Impact of SAS
Deployment
‱ SAS deployed to
worldwide datacenters for
Service X (serving
hundreds of millions of
users) since June 2011
‱ OCEs now heavily depend
on SAS
Usage
‱ SAS helped successfully
diagnose ~76% of the
service incidents assisted
with SAS
Code Hunt/Pex4Fun
Teaching and learning via interactive gaming
http://research.microsoft.com/pubs/189240/icse13see-pex4fun.pdf
http://pex4fun.com/https://www.codehunt.com/
Coding Duels
1,841,880 clicked 'Ask Pex!'
http://pex4fun.com/
Code Hunt: Resigned As Game
https://www.codehunt.com/
Behind the Scene
Secret Implementation
class Secret {
public static int Puzzle(int
x) {
if (x <= 0) return 1;
return x * Puzzle(x-1);
}
}
Player Implementation
class Player {
public static int Puzzle(int x) {
return x;
}
}
class Test {
public static void Driver(int x) {
if (Secret.Puzzle(x) != Player.Puzzle(x))
throw new Exception(“Mismatch”);
}
}
behavior
Secret Impl == Player Impl
62
Coding Duels: Fun and Engaging
Iterative gameplay
Adaptive
Personalized
No cheating
Clear winning criterion
Teaching and Learning
Coding Duels for Course Assignments
@Grad Software Engineering Course
http://pexforfun.com/gradsofteng
Observed Benefits
‱ Automatic Grading
‱ Real-time Feedback (for Both Students and Teachers)
‱ Fun Learning Experiences
Coding Duel Competition
@ICSE 2011
http://pexforfun.com/icse2011
Example User Feedback
“It really got me *excited*. The part that got me most
is about spreading interest in teaching CS: I do think
that it’s REALLY great for teaching | learning!”
“I used to love the first person shooters and
the satisfaction of blowing away a whole team
of Noobies playing Rainbow Six, but this is far
more fun.”
“I’m afraid I’ll have to constrain myself to spend just an
hour or so a day on this really exciting stuff, as I’m
really stuffed with work.”
Released since 2010
X
Usage Scenarios
Students: proceed through a sequence on puzzles to
learn and practice.
Educators: exercise different parts of a curriculum,
and track students’ progress
Recruiters: use contests to inspire communities and
results to guide hiring
Researchers: mine extensive data in Azure to evaluate
how people code and learn
http://taoxie.cs.illinois.edu/publications/icse15jseet-codehunt.pdf
ICSE 2015 JSEET:
http://taoxie.cs.illinois.edu/publications/icse13see-pex4fun.pdf
ICSE 2013 SEE:
Conclusion
Software
Users
Software
Development
Process
Software
System
Vertical
Horizontal
Information Visualization
Data Analysis Algorithms
Large-scale Computing
Q & A
http://taoxie.cs.illinois.edu/
Contact: taoxie@illinois.edu
Conclusion
Software
Users
Software
Development
Process
Software
System
Vertical
Horizontal
Information Visualization
Data Analysis Algorithms
Large-scale Computing
Code Hunt Programming Game
Code Hunt Programming Game
Code Hunt Programming Game
Code Hunt Programming Game
Code Hunt Programming Game
Code Hunt Programming Game
Code Hunt Programming Game
Code Hunt Programming Game
Code Hunt Programming Game
Code Hunt Programming Game
Code Hunt Programming Game
Code Hunt Programming Game
Code Hunt Programming Game
Leaderboard and Dashboard
Publically visible, updated during the contest
Visible only to the organizer
It’s a game!
iterative gameplay
adaptive
personalized
no cheating
clear winning criterion
code
test cases

Mais conteĂșdo relacionado

Mais procurados

Text mining
Text miningText mining
Text miningKoshy Geoji
 
Data Reduction Stratergies
Data Reduction StratergiesData Reduction Stratergies
Data Reduction StratergiesAnjaliSoorej
 
Student Management System
Student Management System Student Management System
Student Management System Vinay Yadav
 
Edusec college management software
Edusec college management softwareEdusec college management software
Edusec college management softwareRudra Softech
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data ScienceJason Geng
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.Institute of Technology Telkom
 
Online Admission System
Online Admission System  Online Admission System
Online Admission System Laukesh Jaishwal
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree LearningMilind Gokhale
 
Data mining Measuring similarity and desimilarity
Data mining Measuring similarity and desimilarityData mining Measuring similarity and desimilarity
Data mining Measuring similarity and desimilarityRushali Deshmukh
 
Applications of Machine Learning
Applications of Machine LearningApplications of Machine Learning
Applications of Machine LearningHayim Makabee
 
Web Mining & Text Mining
Web Mining & Text MiningWeb Mining & Text Mining
Web Mining & Text MiningHemant Sharma
 
1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reductionKrish_ver2
 
Student Management System best PPT
Student Management System best PPTStudent Management System best PPT
Student Management System best PPTDheeraj Kumar tiwari
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learningbutest
 
Web scraping in python
Web scraping in python Web scraping in python
Web scraping in python Viren Rajput
 
3 Data Mining Tasks
3  Data Mining Tasks3  Data Mining Tasks
3 Data Mining TasksMahmoud Alfarra
 

Mais procurados (20)

Text mining
Text miningText mining
Text mining
 
Data Reduction Stratergies
Data Reduction StratergiesData Reduction Stratergies
Data Reduction Stratergies
 
Student Management System
Student Management System Student Management System
Student Management System
 
What is Machine Learning
What is Machine LearningWhat is Machine Learning
What is Machine Learning
 
Edusec college management software
Edusec college management softwareEdusec college management software
Edusec college management software
 
Big Data
Big DataBig Data
Big Data
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data Science
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 
Online Admission System
Online Admission System  Online Admission System
Online Admission System
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
Data mining Measuring similarity and desimilarity
Data mining Measuring similarity and desimilarityData mining Measuring similarity and desimilarity
Data mining Measuring similarity and desimilarity
 
Applications of Machine Learning
Applications of Machine LearningApplications of Machine Learning
Applications of Machine Learning
 
Web Mining & Text Mining
Web Mining & Text MiningWeb Mining & Text Mining
Web Mining & Text Mining
 
1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reduction
 
Student Management System best PPT
Student Management System best PPTStudent Management System best PPT
Student Management System best PPT
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
 
Kdd process
Kdd processKdd process
Kdd process
 
Web scraping in python
Web scraping in python Web scraping in python
Web scraping in python
 
3 Data Mining Tasks
3  Data Mining Tasks3  Data Mining Tasks
3 Data Mining Tasks
 

Semelhante a Software Analytics: Data Analytics for Software Engineering

Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesTao Xie
 
Software Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecuritySoftware Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecurityTao Xie
 
Software Analytics: Towards Software Mining that Matters (2014)
Software Analytics:Towards Software Mining that Matters (2014)Software Analytics:Towards Software Mining that Matters (2014)
Software Analytics: Towards Software Mining that Matters (2014)Tao Xie
 
Software Mining and Software Datasets
Software Mining and Software DatasetsSoftware Mining and Software Datasets
Software Mining and Software DatasetsTao Xie
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...confluent
 
2014 01-ticosa
2014 01-ticosa2014 01-ticosa
2014 01-ticosaPharo
 
Software Ecosystems = Big Data
Software Ecosystems = Big DataSoftware Ecosystems = Big Data
Software Ecosystems = Big DataTom Mens
 
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...Denim Group
 
Transferring Software Testing Tools to Practice
Transferring Software Testing Tools to PracticeTransferring Software Testing Tools to Practice
Transferring Software Testing Tools to PracticeTao Xie
 
Continuous Software Engineering - A tutorial
Continuous Software Engineering - A tutorialContinuous Software Engineering - A tutorial
Continuous Software Engineering - A tutorialBreno de França
 
Case study
Case studyCase study
Case studykaran saini
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPaige_Roberts
 
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...Memoori
 
Innovate Better Through Machine data Analytics
Innovate Better Through Machine data AnalyticsInnovate Better Through Machine data Analytics
Innovate Better Through Machine data AnalyticsHal Rottenberg
 
Software systems engineering PRINCIPLES
Software systems engineering PRINCIPLESSoftware systems engineering PRINCIPLES
Software systems engineering PRINCIPLESIvano Malavolta
 
Keys to Continuous Delivery Success - Mark Warren, Product Director, Perforc...
Keys to Continuous  Delivery Success - Mark Warren, Product Director, Perforc...Keys to Continuous  Delivery Success - Mark Warren, Product Director, Perforc...
Keys to Continuous Delivery Success - Mark Warren, Product Director, Perforc...Perforce
 
Agile and continuous delivery – How IBM Watson Workspace is built
Agile and continuous delivery – How IBM Watson Workspace is builtAgile and continuous delivery – How IBM Watson Workspace is built
Agile and continuous delivery – How IBM Watson Workspace is builtVincent Burckhardt
 
2006 bio it web services
2006 bio it web services2006 bio it web services
2006 bio it web servicesChris Dwan
 

Semelhante a Software Analytics: Data Analytics for Software Engineering (20)

Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
 
Software Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecuritySoftware Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and Security
 
Software Analytics: Towards Software Mining that Matters (2014)
Software Analytics:Towards Software Mining that Matters (2014)Software Analytics:Towards Software Mining that Matters (2014)
Software Analytics: Towards Software Mining that Matters (2014)
 
Software Mining and Software Datasets
Software Mining and Software DatasetsSoftware Mining and Software Datasets
Software Mining and Software Datasets
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
2014 01-ticosa
2014 01-ticosa2014 01-ticosa
2014 01-ticosa
 
Software Ecosystems = Big Data
Software Ecosystems = Big DataSoftware Ecosystems = Big Data
Software Ecosystems = Big Data
 
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
 
Transferring Software Testing Tools to Practice
Transferring Software Testing Tools to PracticeTransferring Software Testing Tools to Practice
Transferring Software Testing Tools to Practice
 
Continuous Software Engineering - A tutorial
Continuous Software Engineering - A tutorialContinuous Software Engineering - A tutorial
Continuous Software Engineering - A tutorial
 
Case study
Case studyCase study
Case study
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
 
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
 
Innovate Better Through Machine data Analytics
Innovate Better Through Machine data AnalyticsInnovate Better Through Machine data Analytics
Innovate Better Through Machine data Analytics
 
Software systems engineering PRINCIPLES
Software systems engineering PRINCIPLESSoftware systems engineering PRINCIPLES
Software systems engineering PRINCIPLES
 
Keys to Continuous Delivery Success - Mark Warren, Product Director, Perforc...
Keys to Continuous  Delivery Success - Mark Warren, Product Director, Perforc...Keys to Continuous  Delivery Success - Mark Warren, Product Director, Perforc...
Keys to Continuous Delivery Success - Mark Warren, Product Director, Perforc...
 
Resume_Apoorva
Resume_ApoorvaResume_Apoorva
Resume_Apoorva
 
Agile and continuous delivery – How IBM Watson Workspace is built
Agile and continuous delivery – How IBM Watson Workspace is builtAgile and continuous delivery – How IBM Watson Workspace is built
Agile and continuous delivery – How IBM Watson Workspace is built
 
2006 bio it web services
2006 bio it web services2006 bio it web services
2006 bio it web services
 

Mais de Tao Xie

MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...Tao Xie
 
DSML 2021 Keynote: Intelligent Software Engineering: Working at the Intersect...
DSML 2021 Keynote: Intelligent Software Engineering: Working at the Intersect...DSML 2021 Keynote: Intelligent Software Engineering: Working at the Intersect...
DSML 2021 Keynote: Intelligent Software Engineering: Working at the Intersect...Tao Xie
 
Intelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software EngineeringIntelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software EngineeringTao Xie
 
Diversity and Computing/Engineering: Perspectives from Allies
Diversity and Computing/Engineering: Perspectives from AlliesDiversity and Computing/Engineering: Perspectives from Allies
Diversity and Computing/Engineering: Perspectives from AlliesTao Xie
 
Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...Tao Xie
 
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...Tao Xie
 
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...Tao Xie
 
ISEC'18 Tutorial: Research Methodology on Pursuing Impact-Driven Research
ISEC'18 Tutorial: Research Methodology on Pursuing Impact-Driven ResearchISEC'18 Tutorial: Research Methodology on Pursuing Impact-Driven Research
ISEC'18 Tutorial: Research Methodology on Pursuing Impact-Driven ResearchTao Xie
 
ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...
ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...
ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...Tao Xie
 
Intelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software EngineeringIntelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software EngineeringTao Xie
 
Planning and Executing Practice-Impactful Research
Planning and Executing Practice-Impactful ResearchPlanning and Executing Practice-Impactful Research
Planning and Executing Practice-Impactful ResearchTao Xie
 
Transferring Software Testing Tools to Practice (AST 2017 Keynote)
Transferring Software Testing Tools to Practice (AST 2017 Keynote)Transferring Software Testing Tools to Practice (AST 2017 Keynote)
Transferring Software Testing Tools to Practice (AST 2017 Keynote)Tao Xie
 
Advances in Unit Testing: Theory and Practice
Advances in Unit Testing: Theory and PracticeAdvances in Unit Testing: Theory and Practice
Advances in Unit Testing: Theory and PracticeTao Xie
 
Common Technical Writing Issues
Common Technical Writing IssuesCommon Technical Writing Issues
Common Technical Writing IssuesTao Xie
 
HotSoS16 Tutorial "Text Analytics for Security" by Tao Xie and William Enck
HotSoS16 Tutorial "Text Analytics for Security" by Tao Xie and William EnckHotSoS16 Tutorial "Text Analytics for Security" by Tao Xie and William Enck
HotSoS16 Tutorial "Text Analytics for Security" by Tao Xie and William EnckTao Xie
 
Transferring Software Testing and Analytics Tools to Practice
Transferring Software Testing and Analytics Tools to PracticeTransferring Software Testing and Analytics Tools to Practice
Transferring Software Testing and Analytics Tools to PracticeTao Xie
 
User Expectations in Mobile App Security
User Expectations in Mobile App SecurityUser Expectations in Mobile App Security
User Expectations in Mobile App SecurityTao Xie
 
Impact-Driven Research on Software Engineering Tooling
Impact-Driven Research on Software Engineering ToolingImpact-Driven Research on Software Engineering Tooling
Impact-Driven Research on Software Engineering ToolingTao Xie
 
Next Generation Developer Testing: Parameterized Testing
Next Generation Developer Testing: Parameterized TestingNext Generation Developer Testing: Parameterized Testing
Next Generation Developer Testing: Parameterized TestingTao Xie
 
Csise15 codehunt
Csise15 codehuntCsise15 codehunt
Csise15 codehuntTao Xie
 

Mais de Tao Xie (20)

MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
 
DSML 2021 Keynote: Intelligent Software Engineering: Working at the Intersect...
DSML 2021 Keynote: Intelligent Software Engineering: Working at the Intersect...DSML 2021 Keynote: Intelligent Software Engineering: Working at the Intersect...
DSML 2021 Keynote: Intelligent Software Engineering: Working at the Intersect...
 
Intelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software EngineeringIntelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software Engineering
 
Diversity and Computing/Engineering: Perspectives from Allies
Diversity and Computing/Engineering: Perspectives from AlliesDiversity and Computing/Engineering: Perspectives from Allies
Diversity and Computing/Engineering: Perspectives from Allies
 
Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...
 
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
 
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
 
ISEC'18 Tutorial: Research Methodology on Pursuing Impact-Driven Research
ISEC'18 Tutorial: Research Methodology on Pursuing Impact-Driven ResearchISEC'18 Tutorial: Research Methodology on Pursuing Impact-Driven Research
ISEC'18 Tutorial: Research Methodology on Pursuing Impact-Driven Research
 
ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...
ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...
ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...
 
Intelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software EngineeringIntelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software Engineering
 
Planning and Executing Practice-Impactful Research
Planning and Executing Practice-Impactful ResearchPlanning and Executing Practice-Impactful Research
Planning and Executing Practice-Impactful Research
 
Transferring Software Testing Tools to Practice (AST 2017 Keynote)
Transferring Software Testing Tools to Practice (AST 2017 Keynote)Transferring Software Testing Tools to Practice (AST 2017 Keynote)
Transferring Software Testing Tools to Practice (AST 2017 Keynote)
 
Advances in Unit Testing: Theory and Practice
Advances in Unit Testing: Theory and PracticeAdvances in Unit Testing: Theory and Practice
Advances in Unit Testing: Theory and Practice
 
Common Technical Writing Issues
Common Technical Writing IssuesCommon Technical Writing Issues
Common Technical Writing Issues
 
HotSoS16 Tutorial "Text Analytics for Security" by Tao Xie and William Enck
HotSoS16 Tutorial "Text Analytics for Security" by Tao Xie and William EnckHotSoS16 Tutorial "Text Analytics for Security" by Tao Xie and William Enck
HotSoS16 Tutorial "Text Analytics for Security" by Tao Xie and William Enck
 
Transferring Software Testing and Analytics Tools to Practice
Transferring Software Testing and Analytics Tools to PracticeTransferring Software Testing and Analytics Tools to Practice
Transferring Software Testing and Analytics Tools to Practice
 
User Expectations in Mobile App Security
User Expectations in Mobile App SecurityUser Expectations in Mobile App Security
User Expectations in Mobile App Security
 
Impact-Driven Research on Software Engineering Tooling
Impact-Driven Research on Software Engineering ToolingImpact-Driven Research on Software Engineering Tooling
Impact-Driven Research on Software Engineering Tooling
 
Next Generation Developer Testing: Parameterized Testing
Next Generation Developer Testing: Parameterized TestingNext Generation Developer Testing: Parameterized Testing
Next Generation Developer Testing: Parameterized Testing
 
Csise15 codehunt
Csise15 codehuntCsise15 codehunt
Csise15 codehunt
 

Último

Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfWilly Marroquin (WillyDevNET)
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto GonzĂĄlez Trastoy
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 

Último (20)

Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Call Girls In Mukherjee Nagar đŸ“± 9999965857 đŸ€© Delhi đŸ«Š HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar đŸ“±  9999965857  đŸ€© Delhi đŸ«Š HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar đŸ“±  9999965857  đŸ€© Delhi đŸ«Š HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar đŸ“± 9999965857 đŸ€© Delhi đŸ«Š HOT AND SEXY VVIP 🍎 SE...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Vip Call Girls Noida âžĄïž Delhi âžĄïž 9999965857 No Advance 24HRS Live
Vip Call Girls Noida âžĄïž Delhi âžĄïž 9999965857 No Advance 24HRS LiveVip Call Girls Noida âžĄïž Delhi âžĄïž 9999965857 No Advance 24HRS Live
Vip Call Girls Noida âžĄïž Delhi âžĄïž 9999965857 No Advance 24HRS Live
 

Software Analytics: Data Analytics for Software Engineering

  • 1. Software Analytics: Data Analytics for Software Engineering Achievements and Opportunities Tao Xie Department of Computer Science University of Illinois at Urbana-Champaign, USA taoxie@illinois.edu In Collaboration with Microsoft Research; Most Slides Adapted from Slides Co-Presented with Dongmei Zhang (Microsoft Research, http://research.microsoft.com/en-us/groups/sa/)
  • 2. New Era
Software itself is changing... Software Services
  • 3. How people use software is changing

  • 4. Individual Isolated Not much data/content generation How people use software is changing

  • 5. How people use software is changing
 Individual Isolated Not much data/content generation
  • 6. How people use software is changing
 Individual Social Isolated Not much data/content generation Collaborative Huge amount of data/artifacts generated anywhere anytime
  • 7. How software is built & operated is changing

  • 8. How software is built & operated is changing
 Data pervasive Long product cycle Experience & gut-feeling In-lab testing Informed decision making Centralized development Code centric Debugging in the large Distributed development Continuous release 
 

  • 9. How software is built & operated is changing
 Data pervasive Long product cycle Experience & gut-feeling In-lab testing Informed decision making Centralized development Code centric Debugging in the large Distributed development Continuous release 
 

  • 10. Software Analytics Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services. Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie. Software Analytics as a Learning Case in Practice: Approaches and Experiences. In MALETS 2011 http://research.microsoft.com/en-us/groups/sa/malets11-analytics.pdf
  • 11. Software Analytics Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services. http://research.microsoft.com/en-us/groups/sa/ http://research.microsoft.com/en-us/news/features/softwareanalytics-052013.aspx
  • 12. Research topics – the trinity view Software Users Software Development Process Software System ‱ Covering different areas of software domain ‱ Throughout entire development cycle ‱ Enabling practitioners to obtain insights
  • 13. Data sources Runtime traces Program logs System events Perf counters 
 Usage log User surveys Online forum posts Blog & Twitter 
 Source code Bug history Check-in history Test cases Eye tracking MRI/EMG 

  • 14. Target audience – software practitioners
  • 15. Target audience – software practitioners Developer Tester
  • 16. Target audience – software practitioners Developer Tester Program Manager Usability engineer Designer Support engineer Management personnel Operation engineer
  • 17. Output – insightful information ‱ Conveys meaningful and useful understanding or knowledge towards completing the target task ‱ Not easily attainable via directly investigating raw data without aid of analytics technologies ‱ Examples – It is easy to count the number of re-opened bugs, but how to find out the primary reasons for these re-opened bugs? – When the availability of an online service drops below a threshold, how to localize the problem?
  • 18. Output – actionable information ‱ Enables software practitioners to come up with concrete solutions towards completing the target task ‱ Examples – Why bugs were re-opened? ‱ A list of bug groups each with the same reason of re- opening – Why availability of online services dropped? ‱ A list of problematic areas with associated confidence values
  • 19. Research topics & technology pillars Software Users Software Development Process Software System Vertical Horizontal Information Visualization Data Analysis Algorithms Large-scale Computing
  • 20. Connection to practice ‱ Software Analytics is naturally tied with software development practice ‱ Getting real
  • 21. Connection to practice ‱ Software Analytics is naturally tied with software development practice ‱ Getting real Real Data Real Problems Real Users Real Tools
  • 22. Outline ‱ Software Engineering for Big Data ‱ Overview of Software Analytics ‱ Selected Projects – StackMine: Performance debugging in the large – XIAO: Scalable code clone analysis – SAS: Incident management of online services – Code Hunt/Pex4Fun: Teaching & learning via interactive gaming
  • 23. StackMine Performance debugging in the large via mining millions of stack traces http://research.microsoft.com/en-us/groups/sa/stackmine_icse2012.pdf
  • 24. Performance issues in the real world ‱ One of top user complaints ‱ Impacting large number of users every day ‱ High impact on usability and productivity High Disk I/O High CPU consumption
  • 25. Performance issues in the real world ‱ One of top user complaints ‱ Impacting large number of users every day ‱ High impact on usability and productivity High Disk I/O High CPU consumption As modern software systems tend to get more and more complex, given limited time and resource before software release, development- site testing and debugging become more and more insufficient to ensure satisfactory software performance.
  • 27. Performance debugging in the large Trace Storage Trace collection Network
  • 28. Performance debugging in the large Trace Storage Trace collection Network Trace analysis
  • 29. Performance debugging in the large Trace Storage Trace collection Bug Database Network Trace analysis Bug filing
  • 30. Performance debugging in the large Trace Storage Trace collection Problematic Pattern Repository Bug Database Network Trace analysis Bug filing
  • 31. Performance debugging in the large Pattern Matching Trace Storage Trace collection Bug update Problematic Pattern Repository Bug Database Network Trace analysis Bug filing
  • 32. Performance debugging in the large Pattern Matching Trace Storage Trace collection Bug update Problematic Pattern Repository Bug Database Network Trace analysis Bug filing Key to issue discovery
  • 33. Performance debugging in the large Pattern Matching Trace Storage Trace collection Bug update Problematic Pattern Repository Bug Database Network Trace analysis Bug filing Key to issue discovery Bottleneck of scalability
  • 34. Performance debugging in the large Pattern Matching Trace Storage Trace collection Bug update Problematic Pattern Repository Bug Database Network Trace analysis How many issues are still unknown? Bug filing Key to issue discovery Bottleneck of scalability
  • 35. Performance debugging in the large Pattern Matching Trace Storage Trace collection Bug update Problematic Pattern Repository Bug Database Network Trace analysis How many issues are still unknown? Which trace file should I investigate first? Bug filing Key to issue discovery Bottleneck of scalability
  • 36. Problem definition Given OS traces collected from tens of thousands (potentially millions) of users, help domain experts identify impactful program execution patterns (that cause the most impactful underlying performance problems) with limited time and resource.
  • 37. Challenges Combination of expertise ‱ Generic machine learning tools without domain knowledge guidance do not work well Highly complex analysis ‱ Numerous program runtime combinations triggering performance problems ‱ Multi-layer runtime components from application to kernel being intertwined Large-scale trace data ‱ TBs of trace files and increasing ‱ Millions of events in single trace streamInternet
  • 38. Technical highlights ‱ Machine learning for system domain – Formulate the discovery of problematic execution patterns as callstack mining & clustering – Systematic mechanism to incorporate domain knowledge ‱ Interactive performance analysis system – Parallel mining infrastructure based on HPC + MPI – Visualization aided interactive exploration
  • 39. Impact “We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost.” Highly effective new issue discovery on Windows mini-hang Continuous impact on future Windows versions
  • 40. XIAO Scalable code clone analysis 2012 http://research.microsoft.com/jump/175199
  • 41. XIAO: Code Clone Analysis ‱ Motivation – Copy-and-paste is a common developer behavior – A real tool widely adopted internally and externally ‱ XIAO enables code clone analysis in the following way – High tunability – High scalability – High compatibility – High explorability
  • 42. High tunability – what you tune is what you get ‱ Intuitive similarity metric: effective control of the degree of syntactical differences between two code snippets for (i = 0; i < n; i ++) { a ++; b ++; c = foo(a, b); d = bar(a, b, c); e = a + c; } for (i = 0; i < n; i ++) { c = foo(a, b); a ++; b ++; d = bar(a, b, c); e = a + d; e ++; }
  • 43. High scalability ‱ Four-step analysis process ‱ Easily parallelizable based on source code partition Pre-processing Pruning Fine Matching Coarse Matching
  • 44. High compatibility ‱ Compiler independent ‱ Light-weight built-in parsers for C/C++ and C# ‱ Open architecture for plug-in parsers to support different languages ‱ Easy adoption by product teams – Different build environment – Almost zero cost for trial
  • 45. High explorability 1. Clone navigation based on source tree hierarchy 2. Pivoting of folder level statistics 3. Folder level statistics 4. Clone function list in selected folder 5. Clone function filters 6. Sorting by bug or refactoring potential 7. Tagging 1 2 3 4 5 6 7 1. Block correspondence 2. Block types 3. Block navigation 4. Copying 5. Bug filing 6. Tagging 1 2 3 4 1 6 5
  • 46. Scenarios & Solutions Quality gates at milestones ‱ Architecture refactoring ‱ Code clone clean up ‱ Bug fixing Post-release maintenance ‱ Security bug investigation ‱ Bug investigation for sustained engineering Development and testing ‱ Checking for similar issues before check-in ‱ Reference info for code review ‱ Supporting tool for bug triage Online code clone search Offline code clone analysis
  • 47. Benefiting developer community Available in Visual Studio 2012 RC Searching similar snippets for fixing bug once Finding refactoring opportunity
  • 48. More secure Microsoft products Code Clone Search service integrated into workflow of Microsoft Security Response Center Over 590 million lines of code indexed across multiple products Real security issues proactively identified and addressed
  • 49. Example – MS Security Bulletin MS12-034 Combined Security Update for Microsoft Office, Windows, .NET Framework, and Silverlight, published: Tuesday, May 08, 2012 3 publicly disclosed vulnerabilities and seven privately reported involved. Specifically, one is exploited by the Duqu malware to execute arbitrary code when a user opened a malicious Office document Insufficient bounds check within the font parsing subsystem of win32k.sys Cloned copy in gdiplus.dll, ogl.dll (office), Silver Light, Windows Journal viewer Microsoft Technet Blog about this bulletin However, we wanted to be sure to address the vulnerable code wherever it appeared across the Microsoft code base. To that end, we have been working with Microsoft Research to develop a “Cloned Code Detection” system that we can run for every MSRC case to find any instance of the vulnerable code in any shipping product. This system is the one that found several of the copies of CVE-2011-3402 that we are now addressing with MS12-034.
  • 50. SAS Incident management of online services http://research.microsoft.com/apps/pubs/?id=202451
  • 51. Motivation Incident Management (IcM) is a critical task to assure service quality ‱ Online services are increasingly popular & important ‱ High service quality is the key
  • 52. Incident Management: Workflow Detect a service issue Alert On- Call Engineers (OCEs) Investigate the problem Restore the service Fix root cause via postmortem analysis
  • 53. Incident Management: Characteristics Shrink-Wrapped Software Debugging Root Cause and Fix Debugger Controlled Environment Online Service Incident Management Workaround No Debugger Live Data
  • 54. Incident Management: Challenges Large volume and noisy data Highly complex problem space No knowledge of entire system Knowledge not well organized
  • 55. SAS: Incident management of online services SAS, developed and deployed to effectively reduce MTTR (Mean Time To Restore) via automatically analyzing monitoring data 5 5  Design Principle of SAS  Automating Analysis  Handling Heterogeneity  Accumulating Knowledge  Supporting human-in-the-loop (HITL)
  • 56. Techniques Overview ‱ System metrics – Identifying Incident Beacons ‱ Transaction logs – Mining Suspicious Execution Patterns ‱ Historical incidents – Mining Historical Workaround Solutions
  • 57. Industry Impact of SAS Deployment ‱ SAS deployed to worldwide datacenters for Service X (serving hundreds of millions of users) since June 2011 ‱ OCEs now heavily depend on SAS Usage ‱ SAS helped successfully diagnose ~76% of the service incidents assisted with SAS
  • 58. Code Hunt/Pex4Fun Teaching and learning via interactive gaming http://research.microsoft.com/pubs/189240/icse13see-pex4fun.pdf http://pex4fun.com/https://www.codehunt.com/
  • 59. Coding Duels 1,841,880 clicked 'Ask Pex!' http://pex4fun.com/
  • 60. Code Hunt: Resigned As Game https://www.codehunt.com/
  • 61. Behind the Scene Secret Implementation class Secret { public static int Puzzle(int x) { if (x <= 0) return 1; return x * Puzzle(x-1); } } Player Implementation class Player { public static int Puzzle(int x) { return x; } } class Test { public static void Driver(int x) { if (Secret.Puzzle(x) != Player.Puzzle(x)) throw new Exception(“Mismatch”); } } behavior Secret Impl == Player Impl 62
  • 62. Coding Duels: Fun and Engaging Iterative gameplay Adaptive Personalized No cheating Clear winning criterion
  • 64. Coding Duels for Course Assignments @Grad Software Engineering Course http://pexforfun.com/gradsofteng Observed Benefits ‱ Automatic Grading ‱ Real-time Feedback (for Both Students and Teachers) ‱ Fun Learning Experiences
  • 65. Coding Duel Competition @ICSE 2011 http://pexforfun.com/icse2011
  • 66. Example User Feedback “It really got me *excited*. The part that got me most is about spreading interest in teaching CS: I do think that it’s REALLY great for teaching | learning!” “I used to love the first person shooters and the satisfaction of blowing away a whole team of Noobies playing Rainbow Six, but this is far more fun.” “I’m afraid I’ll have to constrain myself to spend just an hour or so a day on this really exciting stuff, as I’m really stuffed with work.” Released since 2010 X
  • 67. Usage Scenarios Students: proceed through a sequence on puzzles to learn and practice. Educators: exercise different parts of a curriculum, and track students’ progress Recruiters: use contests to inspire communities and results to guide hiring Researchers: mine extensive data in Azure to evaluate how people code and learn http://taoxie.cs.illinois.edu/publications/icse15jseet-codehunt.pdf ICSE 2015 JSEET: http://taoxie.cs.illinois.edu/publications/icse13see-pex4fun.pdf ICSE 2013 SEE:
  • 84. Leaderboard and Dashboard Publically visible, updated during the contest Visible only to the organizer
  • 85. It’s a game! iterative gameplay adaptive personalized no cheating clear winning criterion code test cases