This document discusses the synergy between artificial intelligence and software engineering. It begins with an overview of intelligent software engineering and how AI techniques can be applied to software engineering problems. Specific examples discussed include using dynamic symbolic execution for automated test generation for binary code, .NET code, and mobile app code. The document also discusses using machine learning for software analytics, testing, and natural language interfacing for IDEs. Open challenges in the field of intelligent software engineering are mentioned at the end.
Intelligent Software Engineering: Synergy between AI and Software Engineering
1. Intelligent Software Engineering: Synergy between AI and Software Engineering
Tao Xie
University of Illinois at Urbana-Champaign
taoxie@illinois.edu
http://taoxie.cs.illinois.edu/
4. Intelligent Test Generation for Binary Code
• 2016 DARPA’s Cyber Grand Challenge (all-machine hacking tournament)
• 7 teams/systems conducted defense & attack tasks on a server constantly being fed with
new code patches filled with bugs/vulnerabilities
• Top winner: Mayhem, developed by ForAllSecure (co-founded by CMU Prof. David Brumley), with dynamic symbolic execution as its key technique [Godefroid et al. PLDI’05]
• Microsoft Security Risk Detection [MS]
• Key technique: dynamic symbolic execution [Godefroid et al. PLDI’05]
• Evolved from SAGE @Microsoft Research [Godefroid et al. ACM Queue‘12]
Coldewey. Carnegie Mellon’s Mayhem AI takes home $2 million from DARPA’s Cyber Grand Challenge. techcrunch.com. Aug 5, 2016.
5. Intelligent Test Generation for .NET Code
• Pex @Microsoft Research [ASE’14 Ex]
• Automated test generation tool for .NET code
• Key technique: dynamic symbolic execution [Godefroid et al. PLDI’05]
• Intelligent techniques to tackle challenges of
• Path explosion via fitness function [DSN’09]
• Method sequence explosion via program synthesis [OOPSLA’11]
• …
• Shipped in Visual Studio 2015/2017 Enterprise Edition
• As IntelliTest
[ASE’14 Ex] Tillmann, de Halleux, Xie. Transferring an Automated Test Generation Tool to Practice: From Pex to Fakes
and Code Digger. ASE’14 Experience Paper
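The tools above all build on dynamic symbolic ("concolic") execution: run the program on a concrete input, record the branch conditions along the path, then negate one condition and solve for an input that takes the other branch. A minimal sketch of that loop, where hand-written Python predicates stand in for symbolic constraints and a brute-force search stands in for the SMT solver (the program under test is made up):

```python
# Toy sketch of dynamic symbolic execution, the key technique behind
# Mayhem, SAGE, and Pex. A real engine tracks symbolic path constraints
# and asks an SMT solver for inputs that flip a branch; here branch
# conditions are recorded as strings and "solved" by brute force.

def program(x, trace):
    """Program under test; records each branch condition it evaluates."""
    trace.append(("x > 10", x > 10))
    if x > 10:
        trace.append(("x * 2 == 42", x * 2 == 42))
        if x * 2 == 42:
            return "bug"        # hard-to-hit branch
        return "big"
    return "small"

def solve(predicate, domain=range(0, 50)):
    """Stand-in for an SMT solver: find any input satisfying `predicate`."""
    return next((x for x in domain if predicate(x)), None)

def concolic_explore(seed):
    """Run concretely, then negate recorded branch outcomes to find new paths."""
    inputs, results = [seed], {}
    while inputs:
        x = inputs.pop()
        trace = []
        results[program(x, trace)] = x
        for cond, taken in trace:
            flipped = solve(lambda v: eval(cond, {"x": v}) != taken)
            if flipped is not None and flipped not in results.values():
                inputs.append(flipped)
    return results

paths = concolic_explore(seed=0)
print(sorted(paths))  # ['big', 'bug', 'small'] -- the "bug" path (x == 21) is found
```

Starting from the uninteresting seed 0, negating the recorded conditions systematically drives execution into all three paths, including the one guarded by `x * 2 == 42` that random testing would rarely hit.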
6. Intelligent Test Generation for Mobile App Code
• Tencent’s automatic test generation tool for WeChat [FSE’16IN][ICSE’17SEIP]
• Guided random test generation tool improved over Google Monkey
• Deployed in daily WeChat testing practice
• WeChat = WhatsApp + Facebook + Instagram + PayPal + Uber …
• 1 billion monthly active users as of March 2018
• Facebook’s automatic test generation tool for Facebook mobile apps
• Originated from the search-based testing tool Sapienz [Mao et al. ISSTA’16] at UCL
• Commercialized by UCL spin-off MaJiCKe (co-founded by Prof. Mark Harman), later acquired by Facebook
• Deployed in daily testing practice of various Facebook apps
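The "guided random" idea that distinguishes these tools from plain Google Monkey can be sketched in a few lines: fire random UI events, but bias the choice toward events that have previously uncovered new app states. The app model and event names below are entirely hypothetical:

```python
import random

# Illustrative sketch of guided random UI testing: event selection starts
# uniform (like Monkey) but is re-weighted toward events that discover
# new states. ToyApp and its events are made-up stand-ins for a real app.

class ToyApp:
    """Tiny state machine standing in for a mobile app under test."""
    TRANSITIONS = {
        ("home", "tap_menu"): "menu",
        ("menu", "tap_pay"): "payment",
        ("payment", "tap_back"): "menu",
    }
    def __init__(self):
        self.state = "home"
    def fire(self, event):
        self.state = self.TRANSITIONS.get((self.state, event), self.state)
        return self.state

def guided_random_test(app, events, steps=500, seed=0):
    rng = random.Random(seed)
    weights = {e: 1.0 for e in events}   # uniform to start, like Monkey
    seen = {app.state}
    for _ in range(steps):
        event = rng.choices(events, [weights[e] for e in events])[0]
        state = app.fire(event)
        if state not in seen:            # reward events that found new states
            seen.add(state)
            weights[event] += 5.0
    return seen

covered = guided_random_test(ToyApp(), ["tap_menu", "tap_pay", "tap_back", "noop"])
print(sorted(covered))  # reaches "menu" and "payment" beyond the initial "home"
```

Real tools add much more guidance (GUI models, coverage feedback, search-based optimization), but the core loop — random events plus learned bias — is the same.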
7. Intelligent Software Testing: Continuous Learning
• Learning from others working on the same things
• Our MSeqGen work on mining API usage call sequences to test the API [ESEC/FSE’09]
• Green: Reducing, reusing & recycling constraints [Visser et al. FSE’12]
• Learning from others working on similar things
• Enhancing reuse of constraint solutions [Jia et al. ISSTA’15]
• Heuristically matching solution spaces of arithmetic formulas [Aquino et al. ICSE’17]
8. Software Analytics
Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services.
Zhang, Dang, Lou, Han, Zhang, Xie. Software Analytics as a Learning Case in Practice: Approaches and Experiences.
MALETS 2011
10. Data Sources in Software/Services
Runtime traces
Program logs
System events
Perf counters
…
Usage log
User surveys
Online forum posts
Blog & Twitter
…
Source code
Bug history
Check-in history
Test cases
Eye tracking
MRI/EMG
…
Zhang, Han, Dang, Lou, Zhang, Xie. Software Analytics in Practice. IEEE Software 2013
11. Research Topics & Technology Pillars
Zhang, Han, Dang, Lou, Zhang, Xie. Software Analytics in Practice. IEEE Software 2013
12. Example Research on Software Analytics
• StackMine [ICSE’12, IEEESoft’13]: performance debugging in the large
• Data Source: Performance call stack traces from Windows end users
• Analytics Output: Ranked clusters of call stack traces based on shared patterns
• Impact: Deployed/used in daily practice of Windows Performance Analysis team
• Service Analysis Studio [ASE’13-EX]: service incident management
• Data Source: Transaction logs, system metrics, past incident reports
• Analytics Output: Healing suggestions/likely root causes of the given incident
• Impact: Deployed and used by an important Microsoft service (hundreds of
millions of users) for incident management
@Microsoft Research Asia
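StackMine's analytics output — ranked clusters of call-stack traces sharing common patterns — can be illustrated with a toy clustering pass. The similarity metric below (sequence-matching ratio over frames) is a simplification of the paper's costly-pattern mining, and the frame names are merely illustrative:

```python
from difflib import SequenceMatcher

# Illustrative sketch of the StackMine idea: group call-stack traces that
# share long common frame subsequences, so one cluster corresponds to one
# likely performance issue that an analyst triages once.

def stack_similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

def cluster_stacks(stacks, threshold=0.6):
    """Greedy single-pass clustering by similarity to each cluster's seed."""
    clusters = []
    for stack in stacks:
        for cluster in clusters:
            if stack_similarity(stack, cluster[0]) >= threshold:
                cluster.append(stack)
                break
        else:
            clusters.append([stack])
    return clusters

traces = [
    ["ntoskrnl!KiPageFault", "win32k!EngPaint", "app!Render"],
    ["ntoskrnl!KiPageFault", "win32k!EngPaint", "app!RenderFrame"],
    ["ntdll!RtlAllocateHeap", "app!LoadConfig"],
]
clusters = cluster_stacks(traces)
print(len(clusters))  # 2: the two painting stacks group together
```

At Windows scale the clustering and the ranking of clusters are the hard parts; this sketch only shows why clustering turns millions of traces into a short, actionable worklist.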
13. Open Source Microservice Benchmark System
70+ microservices, including 40+ business ones and 30+ infrastructure ones (message middleware service, distributed cache services, database services), totaling 300K LOC
Git Repo: https://github.com/FudanSELab/train-ticket
Fudan, UIUC, SUTD Collaborative Research
14. Open Source Microservice Benchmark System
Git Repo: https://github.com/FudanSELab/train-ticket
• Includes services in Java, Python, Go, and Node.js
• Uses asynchronous communication and message queues
• Uses a distributed cache
• Visualization tools for runtime monitoring and management
• Cluster orchestration support: Docker Swarm and Kubernetes
• Service mesh support: Istio
• Substantial test cases including 100+ unit and integration tests
• Multiple client support: Web, Android
Zhou, Peng, Xie, Sun, Ji, Li, Ding. Fault Analysis and Debugging of Microservice Systems: Industrial Survey, Benchmark System, and Empirical
Study. IEEE Transactions on Software Engineering (TSE), To appear
Fudan, UIUC, SUTD Collaborative Research
16. Intelligent IDE: Natural Language Interfacing
Inspired by AnnaTalk (Software Analytics Group @Microsoft Research Asia):
Conversational Interface for Business Analytics
Source: https://www.hksilicon.com/articles/1213020
17. Translation of NL to Programs
● Program Aliasing: a semantically equivalent program may have
many syntactically different forms
18. NL Regex: Sequence-to-Sequence Model
● Encoder/Decoder: 2-layer stacked LSTM architecture [Locascio et al. EMNLP’16]
19. Training Objective: From Maximum Likelihood Estimation (MLE) to Maximizing Semantic Correctness
● Standard seq-to-seq training maximizes the likelihood of mapping NL to the ground-truth regex
● MLE penalizes syntactically different but semantically equivalent regexes
● Alternative objective: maximize the expected reward, where the reward is semantic correctness
● Leverages reinforcement learning to maximize expected semantic correctness
Zhong, Guo, Yang, Peng, Xie, Lou, Liu, Zhang. SemRegex: A Semantics-Based Approach for Generating Regular Expressions
from Natural Language Specifications. EMNLP’18.
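The two training objectives can be written out explicitly. This is a reconstruction from the slide text, not copied from the paper; the notation ($\theta$ for model parameters, $x$ for the NL description, $y$ for a generated regex, $y^*$ for the ground truth, $R$ for the reward) is introduced here:

```latex
% MLE: maximize the likelihood of the ground-truth regex given the NL input;
% this penalizes any y that differs syntactically from y*, even if equivalent.
\mathcal{L}_{\mathrm{MLE}}(\theta) = \sum_{(x,\, y^*)} \log p_\theta(y^* \mid x)

% Alternative: maximize the expected reward over sampled regexes, where
% R(y, x) measures semantic correctness (e.g., 1 if y is semantically
% equivalent to the ground truth, 0 otherwise); optimized with
% reinforcement learning (policy gradient), since R is non-differentiable.
J(\theta) = \sum_{x} \mathbb{E}_{y \sim p_\theta(\cdot \mid x)} \big[ R(y, x) \big]
```

The switch from $\mathcal{L}_{\mathrm{MLE}}$ to $J$ is exactly what lets semantically equivalent but syntactically different regexes receive full credit.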
20. Measurements of Semantic Correctness
● Minimal DFAs (e.g., for the regex ([ABab]&[A-Z]).*X)
● Test Cases (pos/neg string examples)
Zhong, Guo, Yang, Peng, Xie, Lou, Liu, Zhang. SemRegex: A Semantics-Based Approach for Generating Regular Expressions
from Natural Language Specifications. EMNLP’18.
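The slide's second measurement — judging two regexes equivalent when they agree on positive/negative string examples — is easy to sketch with the standard library (the first measurement, minimal-DFA comparison, is exact but needs an automata library; the regexes and examples below are illustrative):

```python
import re

# Sketch of test-case-based semantic correctness: two regexes are treated
# as equivalent if they accept/reject the same strings from a set of
# positive and negative examples.

def agree_on_examples(regex_a, regex_b, examples):
    return all(
        bool(re.fullmatch(regex_a, s)) == bool(re.fullmatch(regex_b, s))
        for s in examples
    )

examples = ["a", "abc", "aXbc", "", "zzz"]

# Syntactically different but semantically equivalent forms:
print(agree_on_examples(r"a.*", r"a(.)*", examples))   # True
# Genuinely different semantics (they disagree on the string "a"):
print(agree_on_examples(r"a.*", r"a.+", examples))     # False
```

The caveat, of course, is that agreement on examples is only as strong as the example set — which is why the DFA-based check is listed alongside it.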
21. Intelligent Software Engineering: Tech Transfer
• http://www.diffblue.com/ (Oxford University spin-off, Daniel Kroening et al.)
• http://www.aixcoder.com/ (Peking University spin-off, Ge Li et al.)
• https://www.codota.com/ (Technion spin-off, Eran Yahav et al.)
• https://www.qualicen.de/en/ (Technical University of Munich spin-off, Benedikt Hauptmann et al.)
• MaJiCKe (UCL spin-off, Mark Harman et al.; acquired by Facebook)
• https://www.deepcode.ai/ (ETH Zurich spin-off, Martin Vechev et al.)
22. Many Recent Papers in AI/ML for SE
https://ml4code.github.io/
• 2018 (35)
• 2017 (34)
• 2016 (25)
• 2015 (25)
• 2014 (14)
• 2013 (9)
• 2012 (1)
• 2009 (1)
• 2007 (1)
https://arxiv.org/abs/1709.06182
23. Open Topics in Intelligent Software Engineering (ISE)
• How to define or determine levels of intelligence in software engineering
solutions?
• Automation vs. Intelligence
• How to bring high levels of intelligence in software engineering solutions?
• How to attain high-quantity and high-quality labelled software data?
• How to transfer ISE research results into industrial/open source practice?
• …
25. AI Software is Complex
• March 18, 2018 (Tempe, Arizona): self-driving Uber car struck and
killed Elaine Herzberg, aged 49
• “Was this algorithmic tragedy inevitable? And how used to such
incidents would we, should we, be prepared to get?”
• “Software is released into a code universe which no one can fully
understand.”
Smith. Franken-algorithms: the deadly consequences of unpredictable code. theguardian.com, Aug 9, 2018
26. Requirements and Testing of AI Software - Chatbot
• March 23/24, 2016: Microsoft's Teen Chatbot Tay Turned into Genocidal
Racist
• "There are a number of precautionary steps they [Microsoft] could have
taken. It wouldn't have been too hard to create a blacklist of terms; or
narrow the scope of replies. They could also have simply manually
moderated Tay for the first few days, even if that had meant slower
responses."
• “businesses and other AI developers will need to give more thought to the
protocols they design for testing and training AIs like Tay.”
Shead. Here's why Microsoft's teen chatbot turned into a genocidal racist, according to an AI expert. businessinsider.com, March 24, 2016
27. Requirements and Testing of AI Software – AI Hiring
• In 2014, Amazon tried building an AI tool to help with recruiting
but the tool was discriminating against women; in 2017, Amazon
abandoned the tool.
• Data source: predominantly male resumes submitted to Amazon over a
10-year period as candidates for recruitment
• Fairness testing: testing software for discrimination
[Galhotra et al. ESEC/FSE’17]
Hamilton. Amazon built an AI tool to hire people but had to shut it down because it was discriminating against women. businessinsider.com,
Oct. 10, 2018
28. Adversarial Machine Learning/Testing
● Generate adversarial examples derived from legitimate examples
with slight modification (imperceptible to human) to induce
misclassification
[Figure: a slight modification, imperceptible to a human, changes the model's output from "Go straight" to "Turn right"]
Tian et al. DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars. ICSE 2018
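The "slight modification" is typically computed from the model's gradient, as in FGSM-style attacks. A toy version with a fixed linear classifier, whose gradient with respect to the input is just its weight vector (model, weights, and data are all made up for illustration):

```python
import math

# Toy sketch of crafting an adversarial example (FGSM-style): nudge each
# input feature by epsilon in the direction that pushes the score across
# the decision boundary, so a small change flips the classification.

w = [2.0, -3.0, 1.5]                      # fixed linear classifier weights

def predict(x):
    """P(class 1) under a logistic model; the gradient of z w.r.t. x is w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1 / (1 + math.exp(-z))

def fgsm(x, epsilon):
    """Perturb along sign(gradient); for a linear model, sign(w)."""
    return [xi + epsilon * math.copysign(1.0, wi) for xi, wi in zip(x, w)]

x = [0.1, 0.2, 0.1]                       # legitimate input near the boundary
x_adv = fgsm(x, epsilon=0.3)
print(predict(x) > 0.5, predict(x_adv) > 0.5)  # False True: decision flipped
```

For deep networks the gradient comes from backpropagation rather than a closed form, and the perturbation is spread across thousands of pixels — which is what makes it imperceptible.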
29. Adversarial Machine Learning/Testing
Lu et al. NO Need to Worry about Adversarial Examples in
Object Detection in Autonomous Vehicles. CVPR’17.
https://arxiv.org/abs/1707.03501
“Our work raises an important question: can one
construct examples that are adversarial for many or
most viewing conditions? If so, the construction
should offer very significant insights into the internal
representation of patterns by deep networks. If not,
there is a good prospect that adversarial examples
can be reduced to a curiosity with little practical
impact.”
30. Example Defenses Against Adversarial Examples
[Diagrams: one defense hardens the target model t into a new target model t'; another places a detector & reformer in front of the target model t, turning input x into x' before classification; each pipeline outputs a class label]
A single model is still vulnerable to an adversarial example white-box attack
31. MulDef: Multi-Model-based Defense Framework
[Diagram: input x → Runtime Model Selector over the model family produced by the Model Generator from target model t → class label]
Model Generator: generates a family of models t, M1, M2, ...
● Legitimate-behavior preservation
● Robustness diversity
● As many models as possible
Runtime Model Selector: randomly selects a model from the
family at runtime
Srisakaokul, Zhang, Zhong, Yang, Xie. MULDEF: Multi-model-based Defense Against Adversarial Examples for Neural
Networks. arXiv:1809.00065, August 2018.
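The MulDef defense can be sketched in a few lines: maintain a family of models with diverse robustness and pick one at random per query, so a white-box adversarial example crafted against any single member only succeeds some of the time. The "models" below are toy 1-D threshold classifiers, not the paper's neural networks:

```python
import random

# Sketch of the MulDef framework: a model family with legitimate-behavior
# preservation (all agree on normal inputs) but robustness diversity
# (slightly different decision boundaries), sampled at runtime.

def make_model(threshold):
    return lambda x: int(x > threshold)

# Family t, M1, M2 with diverse boundaries.
family = [make_model(t) for t in (0.4, 0.5, 0.6)]

def muldef_predict(x, rng=random.Random(0)):
    return rng.choice(family)(x)         # runtime model selector

# An adversarial input crafted against the t=0.5 model (just above its
# boundary) fools only the family members whose boundary it crosses.
adv = 0.55
print([m(adv) for m in family])  # [1, 1, 0]: the family disagrees on it
```

Because the attacker cannot predict which member answers a given query, the attack's success rate drops from certain to probabilistic — the core argument of the framework.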
32. Neural Machine Translation
Screen snapshot captured on April 5, 2018
• Overall better than statistical machine translation
• Worse controllability
• Existing translation quality assurance needs a reference translation, so it is not applicable online
33. Translation Quality Assurance
● Key idea: black-box algorithms specialized for common problems
○ No need for reference translation; need only the original sentence and generated
translation
○ Precise problem localization
● Common problems
○ Under-translation
○ Over-translation
Tencent, UIUC Collaborative Work
Zheng, Wang, Liu, Zhang, Zeng, Deng, Yang, He, Xie. Testing Untestable Neural Machine Translation: An Industrial Case. ICSE 2019 Poster
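Both problem classes can be flagged with reference-free heuristics of the kind the slide describes: over-translation by abnormally repeated n-grams in the output, under-translation by a source/target length ratio far below the expected range. The thresholds and examples below are illustrative, not the deployed system's:

```python
from collections import Counter

# Sketch of reference-free translation QA: only the source sentence and
# the generated translation are needed, so the checks can run online.

def over_translation(output, n=2, max_repeats=2):
    """Flag n-grams repeated suspiciously often in the output."""
    tokens = output.split()
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [ng for ng, count in ngrams.items() if count > max_repeats]

def under_translation(source, output, min_ratio=0.4):
    """Flag outputs far shorter than the source suggests."""
    return len(output.split()) < min_ratio * len(source.split())

print(over_translation("the cat sat the cat sat the cat sat on the mat"))
# [('the', 'cat'), ('cat', 'sat')]
print(under_translation("a long source sentence with many content words here",
                        "short output"))
# True
```

Because neither check consults a reference translation, both can run as online monitors over live traffic — the property that makes them deployable at WeChat scale.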
34. Deployment of Translation Quality Assurance
● Adopted to improve the WeChat translation service (over 1 billion users, serving 12 million translation tasks online)
○ Offline monitoring (regression testing)
○ Online monitoring (real time selection of best model)
● Large scale test data for translation
○ ~130K English/180K Chinese words/phrases
○ Detect numerous problems in other vendors as well
[Charts: BLEU score improvement and % problems reduction; example problem cases in other translation services]
Tencent, UIUC Collaborative Work
35. Many Recent Papers in SE for AI/ML
• Ma et al. MODE: Automated Neural Network Model Debugging via State Differential Analysis and Input
Selection. ESEC/FSE’18
• Sun et al. Concolic Testing for Deep Neural Networks. ASE’18
• Udeshi et al. Automated Directed Fairness Testing. ASE’18
• Ma et al. DeepGauge: Multi-Granularity Testing Criteria for Deep Learning Systems. ASE’18
• Zhang et al. DeepRoad: GAN-based Metamorphic Testing and Input Validation Framework for Autonomous
Driving Systems. ASE’18
• Dwarakanath et al. Identifying Implementation Bugs in Machine Learning based Image Classifiers using
Metamorphic Testing. ISSTA’18
• Zhang et al. An Empirical Study on TensorFlow Program Bugs. ISSTA’18
• Tian et al. DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars. ICSE’18
• Abdessalem et al. Testing Vision-Based Control Systems Using Learnable Evolutionary Algorithms. ICSE’18
• Odena, Goodfellow. TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing.
arXiv:1807.10875. 2018.
• …
36. Open Topics in Intelligence Software Engineering (ISE)
• How to solicit and specify requirements for intelligence software?
• How to tackle the complexity of integrating intelligence software with the rest of the
software system?
• How to define test oracles (or properties) for intelligence software?
• How to design high-quality library/framework APIs for developing intelligence
software?
• How to transfer ISE research results into industrial/open source practice?
• …
37. (SE AI) Practice Impact
[Diagram: Intelligent Software Engineering (AI for SE) and Intelligence Software Engineering (SE for AI) connect the problem domain, the solution domain, and practice]
38. Thank You!
Q & A
This work was supported in part by NSF under grants no. CNS-1513939, CNS-1564274, CCF-1816615.
Tao Xie
University of Illinois at Urbana-Champaign
taoxie@illinois.edu
http://taoxie.cs.illinois.edu/