Martin Chapman: Research Overview, 2017

Research Overview
Provenance Themes
Dr. Martin Chapman
March 17, 2017
King’s College London
martin.chapman@kcl.ac.uk

Overview
Learning the language of error
Playing Hide-And-Seek
Storing and processing MT300s

Learning the language of error

Problem
Problem: An assertion in a program fails for a given input, and
you want to know how the sequence of method calls in the
program contributed to the failure.
/* Excerpt: type definitions and list macros are omitted. */
/* Link `new` between two known consecutive entries. */
static inline void __list_add(struct list_head *new,
                              struct list_head *prev,
                              struct list_head *next) {
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
}
/* Insert `new` directly after `head`. */
static inline void list_add(struct list_head *new,
                            struct list_head *head) {
    __list_add(new, head, head->next);
}
/* Unlink whatever lies between `prev` and `next`. */
static inline void __list_del(struct list_head *prev,
                              struct list_head *next) {
    next->prev = prev;
    prev->next = next;
}
/* Allocate a node and add it to the global list. */
static void gl_insert(int value) {
    struct node *node = malloc(sizeof *node);
    node->value = value;
    list_add(&node->linkage, &gl_list);
    INIT_LIST_HEAD(&node->nested);
}
/* Insert a nondeterministic number of nondeterministic values. */
static void gl_read(void) {
    do {
        gl_insert(nondet_int());
    } while (nondet_int());
}
/* Check the last node, then walk the list back round to it. */
static void inspect(const struct list_head *head) {
    head = head->prev;
    const struct node *node =
        list_entry(head, struct node, linkage);
    assert(node->nested.prev == &node->nested);
    for (head = head->next; &node->linkage != head;
         head = head->next);
}
int main(void) {
    gl_read();
    inspect(&gl_list);
}
Analysing large amounts of code to understand this can be difficult.

Solution: Learning the language of error
Proposed solution: Summarise all the paths that lead to a failing
program assertion as a DFA [Chapman et al., 2015].
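A minimal sketch of the idea, not the automaton produced by the tool: the C fragment below hard-codes a small DFA whose alphabet is a set of method calls and checks whether a given call sequence ends in the accepting (assertion-failing) state. The state names, the transition table and the accepted language (gl_read, then one or more gl_insert/list_add pairs, then inspect) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical alphabet: one symbol per method call of interest. */
enum call { GL_READ, GL_INSERT, LIST_ADD, INSPECT, NUM_CALLS };

/* Hypothetical states. ST_FAIL is the only accepting state (the failing
 * assertion has been reached); ST_DEAD is a sink for every call sequence
 * that is not a prefix of a word in the language. */
enum state { ST_START, ST_READ, ST_INSERT, ST_ADDED, ST_FAIL, ST_DEAD,
             NUM_STATES };

/* Complete transition table: delta[state][call] gives the next state. */
static const enum state delta[NUM_STATES][NUM_CALLS] = {
    /*               GL_READ   GL_INSERT  LIST_ADD  INSPECT */
    [ST_START]  = { ST_READ,  ST_DEAD,   ST_DEAD,  ST_DEAD },
    [ST_READ]   = { ST_DEAD,  ST_INSERT, ST_DEAD,  ST_DEAD },
    [ST_INSERT] = { ST_DEAD,  ST_DEAD,   ST_ADDED, ST_DEAD },
    [ST_ADDED]  = { ST_DEAD,  ST_INSERT, ST_DEAD,  ST_FAIL },
    [ST_FAIL]   = { ST_DEAD,  ST_DEAD,   ST_DEAD,  ST_DEAD },
    [ST_DEAD]   = { ST_DEAD,  ST_DEAD,   ST_DEAD,  ST_DEAD },
};

/* Returns 1 if the call sequence drives the DFA into the accepting state. */
int dfa_accepts(const enum call *trace, size_t len) {
    enum state s = ST_START;
    for (size_t i = 0; i < len; i++) {
        assert(trace[i] < NUM_CALLS);
        s = delta[s][trace[i]];
    }
    return s == ST_FAIL;
}
```

With this table, the trace gl_read, gl_insert, list_add, inspect is accepted, while gl_read, inspect is rejected. The gl_insert/list_add cycle between ST_INSERT and ST_ADDED is the kind of loop that is easy to spot in an automaton and hard to spot in source code.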

[Figure: DFA summarising the call sequences that lead to the failing assertion; transitions are labelled gl_read, gl_insert, list_add and inspect.]
Much easier to analyse. Provides an overview of program
behaviours, some of which may be unexpected.
Did we really want this method call loop in our program?

Implementation
Paths are formed from software counterexamples (method calls
that lead to a failing assertion in a program).
Our software learns these counterexamples via the L* algorithm
[Angluin, 1987] (where the oracle is a model checker).
Membership queries pertain to individual counterexamples, while
conjecture queries pertain to full automata.
Supported by a Google faculty research award.

Case study: Automatic merging
Unexpected behaviours are particularly prevalent when code is
automatically merged:
(a) Source
main {
    ...
    functionA();
    functionB();
    ...
}
(b) Branch A
main {
    ...
    functionA();
    ...
    functionZ();
}
functionZ() {
    functionB();
}
(c) Branch B
main {
    ...
    functionA();
    functionB();
    functionC();
    ...
}
(d) Merged
main {
    ...
    functionA();
    ...
    functionZ();
}
functionZ() {
    functionB();
    functionC();
    ...
}

Capitalising on automata representation (1)
Our software uses an automaton representation to draw the
developer’s attention to the changes introduced by the merge.
First we generate three automata:
• Branch A (B1) → Automaton A1
• Merged code (P) → Automaton AMerged
• Branch B (B2) → Automaton A2
We then compute the differences AMerged \ A1 and AMerged \ A2
in order to show the new behaviours.
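One way to compute such a difference, sketched under the assumption that both automata are complete DFAs over the same alphabet: build the product automaton and accept exactly those state pairs where the first automaton accepts and the second does not. The struct layout, the size limits and the function names are hypothetical and are not taken from the tool.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STATES 8   /* assumed upper bound on states per input automaton */
#define NUM_SYMS   4   /* assumed size of the shared alphabet */

/* A complete DFA: every state has a successor for every symbol. */
struct dfa {
    int num_states;
    int start;
    int accepting[MAX_STATES];
    int delta[MAX_STATES][NUM_SYMS];
};

/* Product automaton; the pair (p, q) is encoded as p * b->num_states + q. */
struct product {
    int num_states;
    int start;
    int accepting[MAX_STATES * MAX_STATES];
    int delta[MAX_STATES * MAX_STATES][NUM_SYMS];
};

/* Builds an automaton for L(a) \ L(b): words accepted by a but not by b. */
void dfa_difference(const struct dfa *a, const struct dfa *b,
                    struct product *out) {
    assert(a->num_states <= MAX_STATES && b->num_states <= MAX_STATES);
    out->num_states = a->num_states * b->num_states;
    out->start = a->start * b->num_states + b->start;
    for (int p = 0; p < a->num_states; p++) {
        for (int q = 0; q < b->num_states; q++) {
            int s = p * b->num_states + q;
            /* Accept only where a accepts and b rejects. */
            out->accepting[s] = a->accepting[p] && !b->accepting[q];
            for (int c = 0; c < NUM_SYMS; c++)
                out->delta[s][c] =
                    a->delta[p][c] * b->num_states + b->delta[q][c];
        }
    }
}

/* Runs a word (an array of symbol indices) through the product automaton. */
int product_accepts(const struct product *d, const int *word, size_t len) {
    int s = d->start;
    for (size_t i = 0; i < len; i++) {
        assert(word[i] >= 0 && word[i] < NUM_SYMS);
        s = d->delta[s][word[i]];
    }
    return d->accepting[s];
}
```

Calling dfa_difference(&merged, &a1, &diff) yields an automaton that accepts only the behaviour present in the merged code and absent from Branch A. The construction is quadratic in the number of states, which is one reason for working on automata instead of on source code.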

[Figure 1: AMerged \ A1, or behaviour not in Branch A (transition labels Z, C).]
[Figure 2: AMerged \ A2, or behaviour not in Branch B (transition labels Z, B, C).]
Note: Subtracting the union of A1 and A2 (common behaviour) would also allow us
to summarise all the new behaviour introduced by the merge.

Why an automaton?
1. Processing the source code directly in order to achieve a
similar representation is likely to be inefficient (operations on
automata are well established).
2. The automata representation is highly intelligible.

Overview (1)
Problem: Network attacks are becoming more frequent.
Potential solution: Construct formal decision-making models
(e.g. game-theoretic frameworks) that capture network security
scenarios in order to aid automated response (formal models can
be processed and solved automatically).
An interesting class of network security models: network security
games (NSGs).
• Typically consider the interactions between an attacker and a
defender.
A common approach to deriving an NSG model is to apply
existing types of games to unexplored network security problems.

Overview (2)
Unexplored network security problem: Multiple node attacks
(e.g. botnets and attack pivots).
How do we link multiple node attacks to an existing type of game?
The link: Multiple node attacks exhibit the two-sided search
problem (looking for something that does not want to be found)
with multiple hidden entities: the bots in a botnet (from the
defender's perspective), or hidden, sensitive resources (from the
attacker's perspective).

Overview (3)
Search games are designed to model and investigate the two-sided
search problem, as interactions between a hider and a seeker.
Hide-and-seek games, a subset of search games, are designed to do
this for multiple hidden objects.
Initial proposal: It is logical to study hide-and-seek games in
order to study multiple node attacks [Chapman et al., 2014].
• The hider is the defender, and the seeker is the attacker, or
vice-versa.

Hide-And-Seek Games
Different permutations of the same basic model. The permutation of
interest to us:
• Two competing players: the hider and the seeker
• A search space: for our purposes, a network graph
• Hidden objects to be concealed on the network
• Some cost to the seeker for undertaking a search; the hider is
rewarded in an inverse amount.
Different strategies are explored for both the hider and the seeker.
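The ingredients listed above can be sketched as a small C data structure plus a function that scores one round. This is an illustrative toy, not the model from the paper: the weighted adjacency matrix, the representation of a seeker strategy as a walk over nodes, and the zero-sum reading of "inverse amount" (the hider gains exactly what the seeker pays) are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_NODES 5   /* assumed size of the toy network */

/* Hypothetical game instance. edge_cost[u][v] > 0 means u and v are linked
 * and traversing the link costs that amount; 0 means there is no link. */
struct game {
    int edge_cost[NUM_NODES][NUM_NODES];
    int hidden[NUM_NODES];   /* 1 if the hider placed an object on the node */
    int num_hidden;
};

/* Plays the seeker's walk and returns the total cost paid until every
 * hidden object has been found. Returns -1 if the walk leaves the network,
 * uses a missing link, or ends before all objects are found. */
int seeker_cost(const struct game *g, const int *walk, size_t len) {
    int found[NUM_NODES] = {0};
    int remaining = g->num_hidden;
    int cost = 0;
    for (size_t i = 0; i < len && remaining > 0; i++) {
        int v = walk[i];
        if (v < 0 || v >= NUM_NODES)
            return -1;
        if (i > 0) {
            int step = g->edge_cost[walk[i - 1]][v];
            if (step <= 0)
                return -1;   /* not a link in the network */
            cost += step;
        }
        if (g->hidden[v] && !found[v]) {
            found[v] = 1;
            remaining--;
        }
    }
    return remaining == 0 ? cost : -1;
}

/* Zero-sum assumption: the hider gains exactly what the seeker pays. */
int hider_payoff(int cost)  { return cost; }
int seeker_payoff(int cost) { return -cost; }
```

On a five-node path with unit edge costs and objects hidden on nodes 1 and 3, a seeker that starts at node 0 and sweeps along the path pays a cost of 3, so the hider's payoff is 3 and the seeker's is -3.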
This model is simple, but already promising in what it can capture
from a multiple node attack.
Richer variants of the model are natural; why aren’t they explored?
‘Complexity’.

Methodology
We increase the richness of the model, and thus what it can
capture of the security domain (e.g. timesteps, repeated
interactions).
We compensate for any increase in complexity by using an
Empirical Game-Theoretic Analysis (EGTA) approach: we estimate
the payoff values associated with different strategies by realising
computational representations of them and evaluating their
performance in simulation.
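As a hedged illustration of estimating payoffs by simulation, the C sketch below averages the search cost of a seeker strategy against a hider strategy over many simulated rounds. The strategies, the one-object game on a line of nodes, the fixed random number generator and all names are assumptions made for this example; they are not the strategies or the platform used in the work.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_NODES 10   /* assumed number of hiding places */

/* Small linear congruential generator so that runs are reproducible. */
static uint32_t next_random(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

/* A hider strategy picks a node; a seeker strategy reports how many nodes
 * it has to search before reaching that node. Both are hypothetical. */
typedef int (*hider_fn)(uint32_t *rng);
typedef int (*seeker_fn)(int hidden_node);

int hide_uniform(uint32_t *rng) { return (int)(next_random(rng) % NUM_NODES); }
int hide_last(uint32_t *rng)    { (void)rng; return NUM_NODES - 1; }
int seek_sweep(int hidden_node)   { return hidden_node + 1; }
int seek_reverse(int hidden_node) { return NUM_NODES - hidden_node; }

/* Empirical payoff to the hider for one strategy pair: the mean search
 * cost over `rounds` simulated games. */
double estimate_hider_payoff(hider_fn hide, seeker_fn seek,
                             int rounds, uint32_t seed) {
    assert(rounds > 0);
    uint32_t rng = seed;
    long total = 0;
    for (int i = 0; i < rounds; i++)
        total += seek(hide(&rng));
    return (double)total / rounds;
}
```

Calling estimate_hider_payoff for every hider and seeker strategy pair fills an empirical payoff matrix, which can then be analysed like an ordinary normal-form game. A hider that always uses the last node does best against a forward sweep (payoff 10) and worst against a reverse sweep (payoff 1), which is the kind of contrast such a matrix makes visible.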
• This approach also allowed us to contribute a computational
platform, which can be used as the basis for Distributed
Research Games (more at cyberhands.co.uk)
The performance of different strategies provides the basis for
heuristics that can be applied to real security applications.

Results (1)
Multiple interaction game: the same attacker (hider) and
defender (seeker) meet each other multiple times.
Natural for the defender to keep data on the actions of an
attacker, to help plan future strategies, by observing how the
attacker interacts with the environment (e.g. where objects are
hidden).
Therefore, natural for attacker to attempt to manipulate this data
(i.e. switch the source of the data from the environment to
themselves).
Finding: deceptive strategies are not effective if the defender is
sophisticated in determining the source of data (i.e.
determining when manipulation is being attempted).

Results (2)
[Figure: plot of Payoff (axis range 0.45 to 1) against Strategy, with entries hDeceptive and hRandomSet, series sHighProbability, and two *** annotations.]

Storing and processing MT300s (1)
Brief: “To design and build a distributed ledger POC system to
process and store proprietary messages for inter-subsidiary forex
transactions (MT300s) internal to a major Fortune 500 financial
institution.”
• Taking pairs of messages about forex transactions (e.g. £ →
$; $ → £), and storing them on the ‘blockchain’.
Distributed ledger: A generalised term for a blockchain,
emphasising that it’s not only currency exchanges that can be
stored in this sequential, secure and replicated way.
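To make "sequential and secure" concrete, here is a toy, single-process sketch of an append-only ledger in C in which each block stores the hash of its predecessor, so that altering an earlier record is detectable. It is not the POC platform: the payload strings are placeholders instead of real MT300 content, FNV-1a is a non-cryptographic hash used only to keep the example short, and replication and consensus are left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS  16   /* assumed ledger capacity */
#define MAX_PAYLOAD 64   /* assumed maximum record size, including NUL */

struct block {
    uint64_t prev_hash;          /* hash of the previous block, 0 for the first */
    char payload[MAX_PAYLOAD];   /* placeholder record, zero padded */
    uint64_t hash;               /* hash over prev_hash and payload */
};

struct ledger {
    struct block blocks[MAX_BLOCKS];
    int count;
};

/* FNV-1a, 64 bit. NOT cryptographically secure; illustration only. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len) {
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

static uint64_t block_hash(const struct block *b) {
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    h = fnv1a(h, &b->prev_hash, sizeof b->prev_hash);
    return fnv1a(h, b->payload, sizeof b->payload);
}

/* Appends a record; returns 0 on success, -1 if it does not fit. */
int ledger_append(struct ledger *l, const char *payload) {
    if (l->count >= MAX_BLOCKS || strlen(payload) >= MAX_PAYLOAD)
        return -1;
    struct block *b = &l->blocks[l->count];
    memset(b, 0, sizeof *b);
    b->prev_hash = l->count ? l->blocks[l->count - 1].hash : 0;
    strcpy(b->payload, payload);
    b->hash = block_hash(b);
    l->count++;
    return 0;
}

/* Returns 1 if every block links to its predecessor and matches its hash. */
int ledger_verify(const struct ledger *l) {
    uint64_t prev = 0;
    for (int i = 0; i < l->count; i++) {
        const struct block *b = &l->blocks[i];
        if (b->prev_hash != prev || b->hash != block_hash(b))
            return 0;
        prev = b->hash;
    }
    return 1;
}
```

After two appends, ledger_verify returns 1; overwriting a byte of the first payload makes it return 0, because the stored hash no longer matches the block's contents.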

Storing and processing MT300s (2)
Why? Intermediate message processors add time and money.
Cynically: become familiar with the technology that may one day
supplant them.
Output. A platform that:
• Integrates a wider range of different technologies to achieve
its aim (e.g. BigchainDB, ErisDB (permissioned chains based
on Ethereum / EVM) + Tendermint).
• Focuses on scalability and throughput.
Research into inter-chain interaction for processing and storing
data (e.g. using a separate chain to store filtered transactions).

Provenance Themes
A recurring theme of both conceptual provenance, and data
provenance, in my work:
• Learning the language of error: Understanding the functions
that have had an impact on input data, using a graph-based
representation, and how this has led to an error.
• Playing Hide-And-Seek: Understanding the origin of data in
order to make strategic decisions.
• MT300 Processing: Using a distributed ledger (blockchain) to
provide a secure historic record of all the actions involving an
entity.
Note: Lots of research to be done at the intersection here!

Summary (1)
Experience of, and achievements as a part of, projects that require
not only good development skills, but also research capabilities.
Strong programming ability.
Wide range of experience working with different systems, some
large in scale, or designed to be scalable. In particular systems
that have required me to consider how to facilitate
communication between heterogeneous entities (e.g. learn
tool, HANDS platform, distributed ledger projects).
Ph.D. with a focus on game theory, artificial intelligence, and
elements of learning.

Summary (2)
Themes of provenance, and graph-based representation,
throughout work.
Additional experience as a teaching academic staff member at
King’s: significant teaching responsibilities, in addition to
administrative and pastoral responsibilities. Some system
development as a part of this role.

References
Angluin, D. (1987).
Learning regular sets from queries and counterexamples.
Information and Computation, 75(2):87–106.
Chapman, M., Chockler, H., Kesseli, P., Kroening, D.,
Strichman, O., and Tautschnig, M. (2015).
Learning the language of error.
In International Symposium on Automated Technology for
Verification and Analysis, pages 114–130. Springer.
Chapman, M., Tyson, G., McBurney, P., Luck, M., and
Parsons, S. (2014).
Playing hide-and-seek: an abstract game for cyber
security.
In Proceedings of the 1st International Workshop on Agents
and CyberSecurity, page 3. ACM.
