Applicability of Interactive Genetic Algorithms to
Multi-agent Systems: Experiments on Games Used
in Smart Grid Simulations.
by

Yomna Mahmoud Ibrahim Hassan

A Thesis Presented to the
Masdar Institute of Science and Technology
in Partial Fulfillment of the Requirements for the Degree of
Master of Science
in
Computing and Information Science

© 2011 Masdar Institute of Science and Technology
All rights reserved
AUTHOR’S DECLARATION
I understand that copyright in my thesis is transferred to Masdar Institute of Science
and Technology.

ACCEPTANCE DECLARATION
This thesis has been accepted and approved by Masdar Institute of Science and
Technology on August 01, 2011.

EXAMINATION COMMITTEE MEMBERS
Jacob Crandall, Advisor, Masdar Institute of Science and Technology
Davor Svetinovic, Masdar Institute of Science and Technology
Iyad Rahwan, Masdar Institute of Science and Technology

Abstract
A common goal of many organizations over the next decades is to enhance the
efficiency of electrical power grids. This entails: (1) modifying the power grid
structure to be able to utilize the available resources in the best way possible, and
(2) introducing new energy sources that are able to benefit from the surrounding
circumstances. The trend toward the use of renewable energy sources requires the
development of power systems that are able to accommodate variability and intermittency in electricity generation. Therefore, these power grids, usually called
“smart grids,” must be dynamic enough to adapt smoothly to changes in the environment and human preferences.
In a smart grid, each decision maker can be represented as an intelligent agent
that consumes or produces electricity. Each agent interacts with other agents and
the surrounding environment. The goals of these agents range from maintaining the stability of electricity in the grid, on the generation side, to increasing users' satisfaction with the electricity service, on the consumers' side (which is our focus). This is done through interaction between different agents to schedule and divide the tasks of consumption and generation among each other, depending on the need and the type of each agent.
In this thesis, we investigate the use of interactive genetic algorithms to derive
intelligent behavior that enables an agent on the consumer’s side to consume the
proper amount of electricity to satisfy human preferences. This behavior must take
into account the existence of other agents within the system, which increases the
dynamicity of the system. In order to evaluate the effectiveness of the suggested algorithms within a multi-agent setting, we test our algorithms in repeated matrix games, both against copies of themselves and against other known multi-agent learning algorithms. We run different variations of the genetic algorithm, with and without human input, in order to determine the factors that affect the performance of the algorithm within a dynamic multi-agent system.
Our results show reasonable potential for using genetic algorithms in such circumstances, particularly when they utilize effective human input.

This research was supported by the Government of Abu Dhabi to help fulfill the
vision of the late President Sheikh Zayed Bin Sultan Al Nahyan for sustainable
development and empowerment of the UAE and humankind.

Acknowledgments
I would first like to thank Masdar for giving us the opportunity to conduct our research and for providing a suitable environment in which to achieve successful research.
I would also like to express my gratitude towards my committee members, starting
with my advisor Professor Jacob Crandall, for his support, encouragement and continuous enthusiasm about our research for the past 2 years. Furthermore, I would
like to thank Dr. Davor Svetinovic and Dr. Iyad Rahwan for their feedback.

Last, but never least, I would like to thank my family and friends, for their
continuous support and their belief in me.
Yomna Mahmoud Ibrahim Hassan,

Masdar City, August 1, 2011.

Contents

1 Introduction
  1.1 Problem Definition
  1.2 Motivation and Relevance to the Masdar Initiative
  1.3 Thesis statement
  1.4 Thesis Overview

2 Literature Review
  2.1 Electrical Power grids
      2.1.1 Smart grids
  2.2 Multi-agent systems
  2.3 Matrix games
      2.3.1 Types of matrix games
      2.3.2 Solution concepts
      2.3.3 Repeated matrix games
      2.3.4 Stochastic games
  2.4 Learning in repeated matrix games
      2.4.1 Belief-based learning
      2.4.2 No-regret learning
      2.4.3 Reinforcement learning
  2.5 Evolutionary algorithms
      2.5.1 Genetic algorithms
      2.5.2 Genetic algorithm structure
  2.6 Genetic algorithms in repeated matrix games
      2.6.1 Genetic algorithms in distributed systems
      2.6.2 Genetic algorithms in dynamic systems
  2.7 Interactive learning
      2.7.1 Interactive learning in repeated matrix games
      2.7.2 Interactive genetic algorithms
  2.8 Summary

3 Experimental Setup
  3.1 Games' structure
      3.1.1 Prisoner's dilemma
      3.1.2 Chicken
      3.1.3 Shapley's game
      3.1.4 Cooperative games
  3.2 Knowledge and Information
  3.3 Opponents
      3.3.1 GIGA-WoLF
      3.3.2 Q-learning
  3.4 Evaluation criteria
  3.5 Performance of GIGA-WoLF and Q-learning
      3.5.1 Prisoner's dilemma
      3.5.2 Chicken
      3.5.3 Shapley's game
      3.5.4 Cooperative games
  3.6 Summary

4 Learning using Genetic Algorithms
  4.1 Algorithm structure
      4.1.1 Basic genetic algorithm
      4.1.2 Genetic algorithm with history propagation
      4.1.3 Genetic algorithm with stopping condition
      4.1.4 Genetic algorithm with dynamic parameters' setting
      4.1.5 Genetic algorithm with dynamic parameters' setting and stopping condition
  4.2 Results and analysis
      4.2.1 Genetic algorithms vs. GIGA-WoLF
      4.2.2 Genetic algorithms vs. Q-learning
      4.2.3 Genetic algorithms in self play
  4.3 Conclusions

5 Interactive genetic algorithms
  5.1 Human input framework
      5.1.1 Evaluate the population
      5.1.2 Select set of histories
      5.1.3 Generate statistics for selected histories
      5.1.4 Generating a new population from human input
  5.2 Interactive genetic algorithms: Six variations
      5.2.1 Effect of input quality on the performance of GA
  5.3 Results and analysis
      5.3.1 Interactive genetic algorithms
      5.3.2 Modifications on interactive genetic algorithms
      5.3.3 The effect of human input quality on interactive genetic algorithms
  5.4 Conclusions

6 IGA in N-player matrix games
  6.1 N-player prisoner's dilemma
  6.2 Strategy representation
  6.3 Human input
  6.4 Results and analysis
  6.5 Conclusions

7 Conclusions and Future work
  7.1 Conclusions
  7.2 Future work
List of Tables

2.1 Payoff matrix for the Prisoner's dilemma.
2.2 Payoff matrix for the Prisoner's dilemma.
3.1 Payoff matrix for the Prisoner's dilemma.
3.2 Payoff matrix for chicken game.
3.3 Payoff matrix of Shapley's game.
3.4 Payoff matrix of a fully cooperative matrix game.
4.1 Variables used within the algorithms.
4.2 Payoff matrix for the Prisoner's dilemma.
4.3 Payoff of a fully cooperative matrix game.
4.4 Payoff matrix for chicken game.
4.5 Payoff matrix of Shapley's game.
5.1 Properties of the different variations of IGA algorithms.
5.2 Acceptable and unacceptable human inputs for the selected 2-agent matrix games.
List of Figures

2.1 A traditional Electrical Grid [8].
2.2 A smart electrical grid [8].
2.3 Payoff space for the prisoner's dilemma game [25].
2.4 Roulette wheel selection mechanism [30].
2.5 Crossover in genetic algorithms.
2.6 Mutation in genetic algorithms.
2.7 Basic structure of genetic algorithms.
2.8 The interactive artificial learning process [27].
3.1 Payoffs of GIGA-WoLF and Q-learning within selected games.
4.1 Chromosome structure.
4.2 Effect of variations on GA on final payoffs against GIGA-WoLF (all games).
4.3 Effect of variations on GA on final payoffs against GIGA-WoLF in prisoner's dilemma.
4.4 Effect of variations on GA on final payoffs against GIGA-WoLF in cooperation game.
4.5 Effect of variations on GA on final payoffs against GIGA-WoLF in Chicken game.
4.6 Effect of variations on GA on final payoffs against GIGA-WoLF in Shapley's game.
4.7 Effect of variations on GA on final payoffs against Q-learning (all games).
4.8 Effect of variations on GA on final payoffs against Q-learning in prisoner's dilemma.
4.9 Effect of variations on GA on final payoffs against Q-learning in a cooperation game.
4.10 Effect of variations on GA on final payoffs against Q-learning in Chicken game.
4.11 Effect of variations on GA on final payoffs against Q-learning in Shapley's game.
4.12 Sample of the chromosomes generated vs. Q-learning in prisoner's dilemma.
4.13 Effect of history propagation on GA against Q-learning in prisoner's dilemma (values shown are the average payoff per generation).
4.14 Effect of variations on GA against Q-learning in cooperation game (average payoff per generation).
4.15 Effect of variations on GA on final payoffs in self play (all games).
4.16 Effect of variations on GA on final payoffs in prisoner's dilemma.
4.17 Effect of variations on GA on final payoffs in a cooperation game.
4.18 Effect of variations on GA on final payoffs in Chicken game.
4.19 Effect of variations on GA on final payoffs in Shapley's game.
5.1 Human input framework.
5.2 Designed graphical user interface.
5.3 Evaluation metrics example.
5.4 Effect of human input on basic IGA against GIGA-WoLF (all games).
5.5 Effect of basic human input on the performance (final payoff) of GA vs. GIGA-WoLF in prisoner's dilemma.
5.6 Effect of basic human input on the performance (final payoff) of GA vs. GIGA-WoLF in a cooperation game.
5.7 Effect of basic human input on the performance (final payoff) of GA vs. GIGA-WoLF in chicken game.
5.8 Effect of basic human input on the performance (final payoff) of GA vs. GIGA-WoLF in Shapley's game.
5.9 Effect of basic human input on GA against GIGA-WoLF in Shapley's game (average payoff per generation).
5.10 Effect of basic human input on GA on final payoffs against Q-learning (all games).
5.11 A sample of the chromosomes generated from basic IGA vs. Q-learning in prisoner's dilemma.
5.12 Effect of basic human input on the performance (final payoff) of GA vs. Q-learning in prisoner's dilemma.
5.13 Effect of basic human input on the performance (final payoff) of GA vs. Q-learning in a cooperation game.
5.14 Effect of basic human input on the performance (final payoff) of GA vs. Q-learning in chicken game.
5.15 Effect of basic human input on the performance (final payoff) of GA vs. Q-learning in Shapley's game.
5.16 Effect of basic human input on GA on final payoffs in self-play (all games).
5.17 Effect of basic human input on GA in self-play in chicken game (average payoff per generation).
5.18 Effect of basic human input on the performance (final payoff) of GA in self-play in prisoner's dilemma.
5.19 Effect of basic human input on the performance (final payoff) of GA in self-play in a cooperation game.
5.20 Effect of basic human input on the performance (final payoff) of GA in self-play in chicken game.
5.21 Effect of basic human input on the performance (final payoff) of GA in self-play in Shapley's game.
5.22 Effect of variations on human input on GA against GIGA-WoLF (all games).
5.23 Effect of variations on IGA against GIGA-WoLF in prisoner's dilemma (average payoff per generation).
5.24 Effect of variations on IGA on final payoffs vs. GIGA-WoLF in prisoner's dilemma.
5.25 Effect of variations on IGA on final payoffs vs. GIGA-WoLF in a cooperation game.
5.26 Effect of variations on IGA on final payoffs vs. GIGA-WoLF in Chicken game.
5.27 Effect of variations on IGA on final payoffs vs. GIGA-WoLF in Shapley's game.
5.28 Effect of variations on IGA on final payoffs vs. Q-learning (all games).
5.29 Effect of variations on IGA on final payoffs vs. Q-learning in prisoner's dilemma.
5.30 Effect of variations on IGA on final payoffs vs. Q-learning in a cooperation game.
5.31 Effect of variations on IGA on final payoffs vs. Q-learning in chicken game.
5.32 Effect of variations on IGA on final payoffs vs. Q-learning in Shapley's game.
5.33 Effect of variations on IGA on final payoffs in self-play (all games).
5.34 Effect of variations on IGA on final payoffs in self-play in prisoner's dilemma.
5.35 Effect of variations on IGA on final payoffs in self-play in a cooperation game.
5.36 Effect of variations on IGA on final payoffs in self-play in chicken game.
5.37 Effect of variations on IGA on final payoffs in self-play in Shapley's game.
5.38 Human input quality and its effect on final payoffs of IGA against GIGA-WoLF (all games).
5.39 Human input quality and its effect on final payoffs of IGA vs. GIGA-WoLF in prisoner's dilemma.
5.40 Human input quality and its effect on final payoffs of IGA vs. GIGA-WoLF in a cooperation game.
5.41 Human input quality and its effect on final payoffs of IGA vs. GIGA-WoLF in chicken game.
5.42 Human input quality and its effect on final payoffs of IGA vs. GIGA-WoLF in Shapley's game.
5.43 Human input quality and its effect on IGA vs. GIGA-WoLF in prisoner's dilemma (average payoff per generation).
5.44 Human input quality and its effect on IGA against GIGA-WoLF in cooperation (average payoff per generation).
5.45 Human input quality and its effect on final payoffs of IGA vs. Q-learning (all games).
5.46 Human input quality and its effect on final payoffs of IGA vs. Q-learning in prisoner's dilemma (per generation).
5.47 Human input quality and its effect on final payoffs of IGA vs. Q-learning in prisoner's dilemma.
5.48 Human input quality and its effect on final payoffs of IGA vs. Q-learning in a cooperation game.
5.49 Human input quality and its effect on final payoffs of IGA vs. Q-learning in chicken game.
5.50 Human input quality and its effect on final payoffs of IGA vs. Q-learning in Shapley's game.
5.51 Human input quality and its effect on final payoffs of IGA in self-play (all games).
5.52 Human input quality and its effect on final payoffs of IGA in self-play in prisoner's dilemma.
5.53 Human input quality and its effect on final payoffs of IGA in self-play in a cooperation game.
5.54 Human input quality and its effect on final payoffs of IGA in self-play in chicken game.
5.55 Human input quality and its effect on final payoffs of IGA in self-play in Shapley's game.
6.1 Payoff matrix of the 3-player prisoner's dilemma.
6.2 Relationship between the fraction of cooperators and the utility received by a game participant.
6.3 Final payoffs of selected opponents in 3-player prisoner's dilemma.
6.4 Effect of human input on the performance of GA in the 3-player prisoner's dilemma in self-play.
6.5 Effect of human input on the performance of GA in the 3-player prisoner's dilemma with 1 player as GIGA-WoLF and 1 player in self-play.
6.6 Effect of human input on the performance of GA in 3-player prisoner's dilemma with 2 players as GIGA-WoLF.
6.7 Effect of human input on the performance of GA in 3-player prisoner's dilemma with 1 player as Q-learning and 1 player in self-play.
6.8 Effect of human input on the performance of GA in 3-player prisoner's dilemma with 2 players as Q-learning.
6.9 Effect of human input on the performance of GA in 3-player prisoner's dilemma with 1 player as Q-learning and 1 player as GIGA-WoLF.
List of Algorithms

3.1 GIGA-WoLF
3.2 Q-learning
4.1 Basic genetic algorithm
4.2 Genetic algorithm with history propagation
4.3 Genetic algorithm with stopping condition
4.4 Genetic algorithm with dynamic parameters' setting
4.5 Genetic algorithm with dynamic parameters' setting and stopping condition
CHAPTER 1

Introduction

1.1 Problem Definition

In multi-agent systems (MAS), intelligent agents interact with each other, each seeking to maximize its own welfare. In many instances, these agents need to learn over time in order to become more successful. One of the main issues within MAS is the ability of each agent to learn effectively and co-exist with the heterogeneous agents within the system. MAS is a prominent field of research because its structure describes many real-life problems.
Extensive research has been performed in an effort to design learning algorithms that work well in MAS [10, 53, 91]. However, proposed solutions typically suffer from at least one of the following problems:
1. Inability to adapt as situations within the system become more dynamic.
2. Settlement into myopic, non-evolving solutions.

3. Requirement of extensive learning time in order to reach an acceptable solution.
Power systems are widely used as an example of MAS [51, 106]. In such systems, each consumer can be considered as an agent. Each agent in this multi-agent
system must learn an intelligent behavior in order to maximize its own personal
gain. In this case, the gain from the consumer's perspective is the ability to satisfy the user's consumption needs and preferences.

1.2 Motivation and Relevance to the Masdar Initiative

Electricity consumption has increased drastically in the past decade as a result of
an enormous increase in population and technology. In the UAE, for instance, there has been a sudden increase in the usage of high-tech appliances and in the ability to add more electrical devices than ever before [70]. This increase in consumption, while still relying on the old electrical grid as a means of distributing electricity, leads to high electricity losses. This is partially due to the variation in consumption
among different sectors, where each sector consumes electricity based on different schedules. Another recent development in modern power systems is the entry
of renewable energy sources on a large scale. The use of these energy sources,
which generate an intermittent and less predictable supply, is expected to continue
to increase over the next few decades to reduce the consumption of less environmentally friendly energy resources. To effectively handle these forms of electricity
generation and increased electricity usage, a more intelligent distribution structure
must be implemented. This intelligent solution is referred to in the literature as the
“smart grid” [16].
Electricity dispatch, in which electricity supply must be matched with electricity demand, is a common problem in electrical grids. Research and industrial
work has been dedicated to designing systems that are capable of taking information
from generators and consumers to determine how to effectively distribute electricity. These types of systems are called “electricity management systems” [105].
Traditional management systems typically use a centralized structure, and rely
on a market operator to manage electricity distribution [99]. This central agent
determines the distribution of electricity either by applying a constant scheduling
mechanism or by running an off-line predication mechanism to model and predict
supply and demand [59]. However, management systems relying on a single central operator do not match the structure of how an electricity market operates. Normally, electricity markets rely on different entities (including the generators, distributors and consumers) to make decisions. As such, electricity markets are better modeled by distributed interactions between consumers and distributors. Furthermore,
traditional management systems have other drawbacks, such as:
1. The extensive computational power required at the decision center, which
results in slow response times and an inability to keep up with real-time
updates [37].
2. The system’s inability to respond adequately to an event not covered by the
system occurs. This is the result of the static structure of the scheduling
mechanism running in the system, and its inability to adapt to regulatory
changes in the environment in a real-time manner [37].
3. The algorithm running at the center of the system must be completely redesigned if the configuration has to be changed (addition or removal of an
element or an agent) [37].
4. The centralized model focuses solely on the overall response of
the power grid, making it difficult to model real-time interactions between
different entities [59].
However, the realization of a distributed power system requires the development of additional capabilities to manage electricity transmission efficiently without relying on a central agent. By moving into a multi-agent perspective, where
we rely on different decision makers, we must determine how charges can be distributed amongst generation companies, how to take transmission constraints into
account, and which regulation mechanisms are suitable for such a system [99].
Additionally, informational and motivational aspects of the agents, such as beliefs,
desires, intentions and commitments, must be investigated [13]. Furthermore, with
variations in workloads and energy sources, it is hard to define a single policy that
will perform well in all cases [32]. This leads to the requirement of making
the agents more intelligent and adaptable. To achieve this, agents need to evolve
their own behavior with respect to scheduling their consumption according to these
changes, without the need to redesign their decision making structure. This can be
potentially achieved by applying game theory and machine learning concepts.
Previous work has been done in order to investigate the use of evolutionary
algorithms (EA) within the area of electricity management [67]. These algorithms
have been shown to be successful under specific conditions [81, 20, 83]. Unfortunately, under other conditions, EAs tend to perform poorly. One such situation occurs when there is a very large solution space defined by two or more interacting subspaces. One solution to this challenge is to run EAs on a multi-agent, distributed structure of the system, using the interaction between agents as a means of decreasing the search space: the space is divided among different agents, and each agent benefits from the others' experience [102]. This introduces the concept of a "multi-objective fitness function", where EAs work with multiple fitness functions, one for each
agent. The ideal solution, in most cases, does not exist because of the contradictory
nature of objective functions, so compromises have to be made.
A second challenge in the use of EAs for distributed power systems, one that exists in most existing learning algorithms, is the inability to adapt quickly to changes in the environment and in user preferences. EAs are expected to work best if the market is not volatile.
Research has been done to try to overcome these problems in evolutionary algorithms, mainly in single-agent systems. One suggested solution is to utilize human input through "interactive evolutionary algorithms" [33, 6]. In these algorithms, human input is used to decrease the amount of time needed to reach a stable and efficient policy for the system, to ensure that the policy follows human preferences, and to make it robust enough to cope with variations in the system. To the best of our knowledge, the idea of using interactive evolutionary algorithms in multi-agent systems has not been studied to date.
Therefore, our objective is to find a learning algorithm that can be used by
individual entities in power systems to effectively acquire and manage energy resources to satisfy user preferences. This learning algorithm should be able to adapt
to the behavior of other learning entities, changes in the environment and changes
in user preferences.

1.3 Thesis statement

In this research, we study the performance of genetic algorithms (GAs) as a learning methodology for an agent within a multi-agent system. We discuss the effect of
integrating human input into GAs (known as interactive genetic algorithms) within
a multi-agent system. By conducting different experiments, we try to identify how
human input can be integrated into GAs, and test the applicability of interactive
genetic algorithms (IGAs) in repeated matrix games. The matrix games we use in
our experiments cover different possibilities and variations of the agents’ payoffs
in order to examine the effect of this variation on the algorithms’ performance.
We run different variations of our algorithms against different learning opponents, including copies of themselves, Q-learning [100] and GIGA-WoLF [10].

1.4 Thesis Overview

In this thesis, we give a detailed analysis of related work done on the aforementioned problem. We begin in chapter 2 with a literature review of related topics such as multi-agent systems, genetic algorithms, and
interactive genetic algorithms. In chapter 3, we move forward to an overview of
the problem and our experimental setup. We show different variations of GAs
implemented through our experiments in chapter 4, and we study the effect of each
of these variations on the final performance of the system.
The next part of the thesis discusses and evaluates a potential framework for integrating human input into GAs. In chapter 5, we test the suggested framework in repeated matrix games and evaluate its performance against the
previously selected learning opponents. In order to evaluate the scalability of our
algorithms, in chapter 6, we apply the GA with and without human input in a 3-player environment. We show how the GA performs in such an environment and whether certain features fail to carry over to larger-scale environments.
This study will help us gain an understanding of how our algorithms will perform
within more complex systems.
Finally, we will give a detailed discussion of the results (Chapter ??). This discussion helps in deriving the conclusions presented in chapter 7. We then suggest
potential future research to be based on this thesis.
CHAPTER 2

Literature Review

In this chapter, we present an overview of different fields related to our research. We also discuss related work and explain how it informs our experiments. We start by giving an overview of electrical power grids in section 2.1. Then, we connect this to multi-agent systems and explain the structure of a standard MAS problem. We then move to the specific branch of MAS problems that we study in this thesis: repeated matrix games.
After giving the required background about the problem structure, we give background about the solution methodology we are using. We give an overview of evolutionary algorithms in general (which include genetic algorithms), after which we explain the history and structure of genetic algorithms. We then examine the work done in the field of learning in multi-agent systems, and relate the work on genetic algorithms to learning in multi-agent systems. This relation is materialized through different topics, including genetic algorithms within
dynamic systems, genetic algorithms for developing strategies in matrix games and
finally, our main topic, interactive genetic algorithms.

2.1 Electrical Power grids

An electrical power grid is a network of interconnected entities, in which electricity generators are connected to consumers through a set of transmission lines and
distributors (Figure 2.1). Existing electrical grids mainly rely on classic control
systems. The structure of these systems is based on the following: the generator generates a certain amount of electricity (depending on its capacity), which is then distributed to the consumers through the distributors. This structure faces many difficulties, mainly when it has to deal with variable types of generators, distributors and consumers.
In real life systems, this variety is to be expected. In the generators’ case,
renewable energy sources have become extensively used for power generation, especially within the past decade. This leads to intermittency in the supply that is
usually hard to model [48]. On the consumers' side, although much research targets modeling electricity demand and consumption patterns [43, 24], demand is not always predictable. This unpredictability places many constraints on distribution (the demand-supply problem) and calls for additional research into ways of efficiently enhancing the generation, distribution and consumption cycles through more intelligent means.

2.1.1 Smart grids

In order to enhance the efficiency of electricity generation, distribution and consumption within the power grid, a logical solution is the merger of intelligent automation systems into the electrical power grid to form the “smart grid” [37].
Figure 2.1: A traditional Electrical Grid [8].

A smart grid (see Figure 2.2) delivers electricity from suppliers to consumers using two-way digital communications to control appliances at consumers' homes.
This could save energy, reduce costs and increase reliability and transparency if the
risks inherent in executing massive information technology projects are avoided.
Smart grids are being promoted by many governments as a way of addressing energy independence, global warming and emergency resilience issues [37].
In our research, we consider the management of electricity from the consumer side, where an "agent" represents a consumer that tries to satisfy its needs in the presence of external circumstances.
Figure 2.2: A smart electrical grid [8].

2.2 Multi-agent systems

A multi-agent system (MAS) is a system composed of multiple interacting intelligent agents. Many real-world problems, including electricity grids' demand-response problems, can easily be modeled as a MAS [78, 106]. In power systems, each entity (distributors, consumers, and generators) is represented as an agent.
Each agent may have an individual goal to reach while interacting with other entities (agents).
Various research has been done in the field to solve the electrical supply-demand problem using multi-agent simulations [97, 80]. As we are testing a new technique in this thesis, we wanted to base our work on research trends that have been used before in the field of simulating electrical power grids. One of these trends is modeling the MAS as a simple matrix game (more details in the following section). This decision was based on the fact that various electrical grid, electricity scheduling and electricity market simulators use matrix games [61, 73, 62, 5]. The supplier's goal in these games is to supply power to
the consumers with the best price possible, while maintaining stability in the grid.
On the other hand, the consumer agents must satisfy their needs while minimizing
their costs (depending on the consumers’ preferences). All of these goals should
be satisfied keeping in mind the existence of other agents and external influences.
In order to achieve its goal, each agent follows a “strategy,” which is either fixed or
adaptable with time.
As a result of this extensive usage as a representation of an electricity market
problem, we choose to evaluate our algorithms within repeated matrix games. In
the following section we give more details about matrix games and their structure.

2.3 Matrix games

Matrix games are a subset of what are called "stochastic games." In these games, each player takes an action, which produces a reward. In a matrix game (also called a "normal form game"), the payoffs (rewards) over the players' joint action space are defined in the form of a matrix. This action space represents the set of possible actions that each player can perform within the game. Depending on the rewards they get, the players decide the "strategy" they are going to follow, where the strategy represents the decision of which action to play over time [42].
For clarification, consider the game represented in Table 2.1. In this matrix game, we have two players, player 1 and player 2. Player 1 (the row player) can play either action A or B (so its strategy space is {A, B}). Likewise, player 2 (the column player) can play either action a or b. From this we can conclude that the set of possible joint actions is {(A, a), (A, b), (B, a), (B, b)}. Each cell of the matrix represents the reward to each player when that joint action occurs. The payoff to the row player (player 1) is listed first, followed by the payoff to the column player (player 2). For example, if the row player plays B and the column player plays b (the joint action (B, b)), then the row player receives a payoff of 2, while the column player receives a payoff of -1.

          a        b
   A    -1, 2     3, 2
   B     0, 0     2, -1

Table 2.1: Payoff matrix for the Prisoner's dilemma.
A strategy for an agent (player) i is a distribution πi over its action set Ai. Similarly, it can be defined as the probability that a player will move from one state to another. A strategy can be either a pure strategy (where the probability of playing one of the actions is 1, while the probability of playing any of the other actions is 0), or a mixed strategy, where each action is played with a certain probability over time. The joint strategy played by the n agents is π = (π1, π2, ..., πn) and, thus, ri(π) is the expected payoff for agent i when the joint strategy π is played.
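To make the notation concrete, the short Python sketch below (our own illustration; the function and variable names are hypothetical, and the payoff values are those of the example matrix above) evaluates ri(π) for a pair of mixed strategies.

    # Our own illustration: expected payoff of the row player in the 2x2 example game.
    # Rows are player 1's actions A, B; columns are player 2's actions a, b.
    row_payoff = [[-1, 3],
                  [0, 2]]

    def expected_payoff(payoff, pi_row, pi_col):
        # r_i(pi): expected payoff when both players use mixed strategies.
        return sum(pi_row[i] * pi_col[j] * payoff[i][j]
                   for i in range(len(payoff))
                   for j in range(len(payoff[0])))

    # A pure strategy is a degenerate mixed strategy, e.g. always play B:
    print(expected_payoff(row_payoff, [0.0, 1.0], [0.5, 0.5]))  # -> 1.0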
Depending on the actions taken, a player's situation changes over time. The situation of the player can be represented by what is called a "state" [11]. In this example, the situation in which each player receives a certain payoff under a certain joint-action pair represents the state.

2.3.1 Types of matrix games

Matrix games can be divided into different types using different criteria. In this
section, we discuss the main criteria that affected how we selected the games used
for the experiments. Details about the exact games used in the experiments are
mentioned in the next chapter.
Cooperative vs. non-cooperative games
A cooperative game represents a situation in which cooperation amongst the players is the most beneficial to them. Therefore, these games mainly require an efficient arrangement between players to reach the cooperative behavior. An example
of these games is a coordination game [17].
On the other hand, non-cooperative games are not defined as games in which players do not cooperate, but as games in which any cooperation must be self-enforcing. Most realistic problems fall under non-cooperative games.

Symmetric vs. asymmetric games
A symmetric game is a game where the payoffs for playing a particular strategy
depend only on the other strategies employed, not on who is playing them. If the
order of the players can be changed without changing the payoff to the strategies,
then a game is symmetric. On the other hand, in asymmetric games, the action set of each player differs from the others'. For simplicity, we focus in this research
on symmetric games.

2.3.2 Solution concepts

In matrix games, there are concepts through which we can identify strategies for
the players, which, if played, can lead to a state known as equilibrium. These
concepts can be useful in evaluating the performance of the strategy played. We
will give an overview of some of these concepts, which are directly related to
our research.

Best response
A best response is a strategy that produces the most favorable outcome for a player,
given the other players’ strategies [36]. Therefore the strategy πi ∗ is a best response
CHAPTER 2. LITERATURE REVIEW

14

for agent i if ri (πi ∗, π−i ) ≥ ri (πi , π−i ) for all possible πi .
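Since the action sets are finite, a pure-strategy best response can be found by simple enumeration. The following sketch (our own illustration, reusing the hypothetical representation above) returns the row action that maximizes the expected payoff against a fixed opponent mixed strategy.

    def best_response(payoff, pi_opponent):
        # Index of the row action with the highest expected payoff against pi_opponent.
        values = [sum(p * payoff[i][j] for j, p in enumerate(pi_opponent))
                  for i in range(len(payoff))]
        return max(range(len(values)), key=lambda i: values[i])

    # Best response of the row player when the column player plays a 20/80 mix over (a, b).
    print(best_response([[-1, 3], [0, 2]], [0.2, 0.8]))  # -> 0 (action A, worth 2.2 vs. 1.6)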
Nash equilibrium
A Nash equilibrium is a set of strategies, one for each player, with the property that no player can unilaterally change his strategy and get a better payoff. The Nash equilibrium (NE) has had the most impact on the design and evaluation of multi-agent learning algorithms to date. The concept of a Nash equilibrium is based on
the best response. When all agents play best responses to the strategies of other
agents, the result is an NE. Nash showed that every game has at least one NE [74].
However, there is no known algorithm for calculating NEs in polynomial time [77].
If we consider that all players are self-interested, each of them would tend to
play the best response to the strategies of other agents, if they know them, therefore
resulting in an NE. Many games have an infinite number of NEs. In the case of
repeated games, these NEs are called NEs of the repeated game (rNEs), which we
will discuss shortly. Therefore, the main goal of an intelligent learning agent is
not just to play best-response to the surrounding agents, but also to influence other
agents to play according to what is profitable to the agent as much as possible.
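For two-player games with small action sets, whether a pure joint action is an NE can be checked directly from the definition. A minimal sketch (our own illustration, using the prisoner's dilemma payoffs of Table 2.2; the helper name is hypothetical):

    def is_pure_nash(payoff1, payoff2, i, j):
        # Joint action (i, j) is an NE if neither player gains by deviating unilaterally.
        row_ok = all(payoff1[k][j] <= payoff1[i][j] for k in range(len(payoff1)))
        col_ok = all(payoff2[i][k] <= payoff2[i][j] for k in range(len(payoff2[0])))
        return row_ok and col_ok

    # Prisoner's dilemma of Table 2.2 (0 = Cooperate, 1 = Defect): only (Defect, Defect) is an NE.
    pd1 = [[3, 0], [5, 1]]   # row player's payoffs
    pd2 = [[3, 5], [0, 1]]   # column player's payoffs
    print([(i, j) for i in range(2) for j in range(2) if is_pure_nash(pd1, pd2, i, j)])  # [(1, 1)]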

Maximin
Maximin is a decision rule used in different fields for minimizing the worst possible
loss while maximizing the potential gain. Alternatively, it can be thought of as
maximizing the minimum gain. The maximin theorem states [75]:
For every two-person, zero-sum game with finite strategies, there exists a value V and a mixed strategy for each player, such that (a) Given
player 2’s strategy, the best payoff possible for player 1 is V, and (b)
Given player 1’s strategy, the best payoff possible for player 2 is V [75].
CHAPTER 2. LITERATURE REVIEW

15

Equivalently, Player 1’s strategy guarantees him a payoff of V regardless of Player
2’s strategy, and similarly Player 2 can guarantee himself a payoff of -V. The name
minimax arises because each player minimizes the maximum payoff possible for
the other since the game is zero-sum, he also maximizes his own minimum payoff.
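Over pure strategies, the maximin value of the row player is simply the largest row minimum. A small illustrative sketch (the payoff matrix shown is an assumed example, not one of the games used in our experiments):

    def pure_maximin(payoff):
        # Best payoff the row player can guarantee with a pure strategy.
        return max(min(row) for row in payoff)

    # Assumed zero-sum example: entries are the row player's payoffs,
    # and the column player receives their negatives.
    print(pure_maximin([[2, -1],
                        [0,  1]]))  # -> 0: the second row guarantees at least 0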

Pareto efficiency
Named after Vilfredo Pareto, Pareto efficiency (optimality) is a measure of efficiency. An outcome of a game is Pareto efficient if there is no other outcome that
makes every player at least as well off and at least one player strictly better off.
That is, if an outcome is Pareto optimal, any other outcome that makes some player strictly better off must give at least one other player a lower payoff.
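This definition can be checked mechanically by comparing joint payoff vectors. A small sketch (our own illustration, using the prisoner's dilemma payoffs of Table 2.2):

    def is_pareto_efficient(outcome, outcomes):
        # True if no other joint payoff vector is at least as good for everyone and better for someone.
        for other in outcomes:
            if all(o >= s for o, s in zip(other, outcome)) and \
               any(o > s for o, s in zip(other, outcome)):
                return False
        return True

    # Joint payoffs of the prisoner's dilemma in Table 2.2.
    pd_outcomes = [(3, 3), (0, 5), (5, 0), (1, 1)]
    print([o for o in pd_outcomes if is_pareto_efficient(o, pd_outcomes)])
    # -> [(3, 3), (0, 5), (5, 0)]: mutual defection (1, 1) is not Pareto efficient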

2.3.3 Repeated matrix games

A repeated matrix game, as the name suggests, is a matrix game that is played repeatedly. The joint action taken by the agents determines the payoff (reward) in each round (or stage) of the game, which can help the player decide, through learning, which action to take next. The task of a learning agent in a repeated
action can be taken next through learning. The task of a learning agent in a repeated
matrix game is to learn to play a strategy πi such that its average payoff over time
(denoted ri for agent i) is maximized. Let ri be given by:
¯
¯

ri =
¯

1 T
∑ ri (πit , π−it )
T t=1

(2.1)

where πi is the strategy played by agent i at time t, π−i is the joint strategy played
by all the agents except agent i, and 1 ≤ T ≤ ∞ is the number of episodes in the
game. In our work, we consider simultaneous action games, where in each round,
both agents play an action (without knowing the action of the other agents within
the same round).
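As an illustration of equation (2.1), the following sketch (hypothetical strategy functions and helper names; payoffs taken from Table 2.2) plays a repeated prisoner's dilemma for T episodes and returns the row player's average payoff.

    import random

    pd_row = [[3, 0], [5, 1]]   # row player's payoffs, 0 = Cooperate, 1 = Defect

    def average_payoff(strategy_row, strategy_col, T=1000):
        # Average payoff per episode for the row player, as in equation (2.1).
        total = 0
        history = []                                # past joint actions, visible to both strategies
        for t in range(T):
            a_row = strategy_row(history)           # actions are chosen simultaneously,
            a_col = strategy_col(history)           # based only on previous rounds
            total += pd_row[a_row][a_col]
            history.append((a_row, a_col))
        return total / T

    # Example: always defect against a uniformly random opponent.
    always_defect = lambda history: 1
    random_player = lambda history: random.randint(0, 1)
    print(average_payoff(always_defect, random_player))  # roughly 3.0 on average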
Figure 2.3: Payoff space for the prisoner’s dilemma game [25].
When evaluating the strategies played and examining different equilibria, the concept of the one-shot Nash equilibrium does not fully represent equilibrium in repeated games. Therefore, in order to define the concept of NE within a repeated game, we discuss what is called the folk theorem [90].
Consider the case of the prisoner's dilemma, shown in Figure 2.3, which shows the joint payoffs of the two players. The x-axis shows the payoffs of the row player and the y-axis shows the payoffs of the column player. The combination of the shaded regions (light and dark) in the figure represents what is called the "convex hull", which contains all the joint payoffs possible within the game. As
can be noticed, on average, the player guarantees itself a higher stable payoff by
playing defect; neither of the players has incentive to receive an average payoff
(over time) less than 0. Therefore, the darkly shaded region in the figure shows the
set of expected joint payoffs that the agents may possibly accept as average payoffs
within each step of the game.
                Cooperate   Defect
   Cooperate      3, 3       0, 5
   Defect         5, 0       1, 1

Table 2.2: Payoff matrix for the Prisoner's dilemma.

The folk theorem states that any joint payoff in the convex hull can be sustained by an rNE, provided that the discount rates of the players are close to unity (i.e., players believe that play will continue with high probability after each episode). This theorem helps us understand the fact that, in repeated games, it is possible to have an infinite number of NEs.

2.3.4 Stochastic games

In real-life situations, the more detailed stochastic games (where different possible games are reached as play transits from one action-state pair to another) can be considered more informative and suitable for modeling. However, research has been done [26] on whether learning algorithms that work within repeated matrix games can be extended to repeated stochastic games. This extension was found to give suitable results in the prisoner's dilemma and its stochastic version (for 2-agent games), which gives us the motivation to pursue the current experimentation within matrix games.

2.4 Learning in repeated matrix games

In this section, we give an overview of the related work in multi-agent learning,
especially algorithms that were used within matrix games. Because many multi-agent learning algorithms exist in the literature, we restrict our attention to those that have had the most impact on the multi-agent learning community as well as to those that seem to be particularly connected to the work presented in this thesis. We divide the learning algorithms that we review into
three different (although related) categories: belief-based learning, reinforcement
learning, and no-regret learning [26].

2.4.1 Belief-based learning

Belief-based learning is based on the idea of constructing a model of the opponent’s
behavior. These models usually rely on previous interactions with the opponent.
Using this model, we try to find the best response with respect to this model. One
of the best-known belief-based learning algorithms is fictitious play [15].
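A minimal sketch of the idea behind fictitious play (our own simplified illustration, not the exact formulation of [15]): keep empirical counts of the opponent's past actions and best-respond to the resulting empirical mixed strategy.

    def fictitious_play_action(payoff, opponent_history, num_opponent_actions):
        # Best response to the empirical distribution of the opponent's past actions.
        counts = [1] * num_opponent_actions          # start from uniform pseudo-counts
        for a in opponent_history:
            counts[a] += 1
        total = sum(counts)
        belief = [c / total for c in counts]         # estimated opponent mixed strategy
        values = [sum(p * payoff[i][j] for j, p in enumerate(belief))
                  for i in range(len(payoff))]
        return max(range(len(values)), key=lambda i: values[i])

    # Row player's belief-based choice in the prisoner's dilemma after seeing two defections.
    print(fictitious_play_action([[3, 0], [5, 1]], [1, 1], 2))  # -> 1 (Defect)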

2.4.2 No-regret learning

A no-regret algorithm compares its performance with the “best action” available
within its set. Regret in this case is defined as the difference between the rewards
obtained by the agent and the rewards the agent might have obtained had it followed a certain fixed alternative over its history of play. In the long run, an algorithm with
no-regret plays such that it has little or no regret for not having played any other
strategy. GIGA-WoLF [10] is one example of a no-regret algorithm. We describe
GIGA-WoLF in greater detail within the next chapter.
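The regret bookkeeping itself is straightforward; the sketch below (our own illustration, not the GIGA-WoLF update) computes the row player's regret against the best fixed action in hindsight.

    def external_regret(payoff, history):
        # Regret of the row player: best fixed action's payoff minus the payoff actually obtained.
        obtained = sum(payoff[a_row][a_col] for a_row, a_col in history)
        best_fixed = max(sum(payoff[i][a_col] for _, a_col in history)
                         for i in range(len(payoff)))
        return best_fixed - obtained

    # Regret after cooperating twice against a defecting opponent in the prisoner's dilemma.
    print(external_regret([[3, 0], [5, 1]], [(0, 1), (0, 1)]))  # -> 2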

2.4.3 Reinforcement learning

Reinforcement learning (RL) methods involve learning what to do so as to maximize (future) payoffs. RL agents use trial and error to learn which strategies produce the highest payoffs. The main idea of reinforcement learning is that through
time, the learner tries to take an action that maximizes a certain reward. Reinforcement learning is widely used within matrix game environments [91, 52]. There are
several known learning algorithms that can be identified as reinforcement learning,
including: Q-learning [100] and evolutionary algorithms [70].
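For reference, the standard tabular Q-learning update can be sketched as follows (α is the learning rate and γ the discount factor; treating a repeated matrix game as a single-state problem is an assumption of this illustration, and the function name is hypothetical):

    def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
        # One tabular Q-learning step: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
        best_next = max(Q[next_state])
        Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

    # Single-state repeated game with two actions:
    Q = {0: [0.0, 0.0]}
    q_update(Q, state=0, action=1, reward=5, next_state=0)
    print(Q)  # {0: [0.0, 0.5]}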
2.5 Evolutionary algorithms

Evolutionary algorithms are a popular form of reinforcement learning. Evolutionary algorithms (EAs) are population-based metaheuristic optimization algorithms
that use biology-inspired mechanisms like mutation, crossover, natural selection,
and survival of the fittest in order to refine a set of solution candidates iteratively.
Each iteration of an EA involves a competitive selection designed to remove poor
solutions from the population. The solutions with high "fitness" are recombined
with other solutions by swapping parts of a solution with another.
Solutions are also mutated by making a small change to a single element of
the solution. Recombination and mutation are used to generate new solutions that
are biased towards solutions that are most fit [58]. This process is repeated until
the solution population converges to a solution with a high fitness value. In general, evolutionary algorithms are considered an effective optimization method [2]. The survival-of-the-fittest concept, together with the evolutionary process, guarantees better adaptation of the population [58].

2.5.1 Genetic algorithms

A Genetic Algorithm (GA) [45] is a type of evolutionary algorithm. GAs are based
on a biological metaphor, in which learning is a competition among a population of
evolving candidate problem solutions. A fitness function evaluates each solution to
decide whether it will contribute to the next generation of solutions. Then, through
operations analogous to gene transfer in sexual reproduction, the algorithm creates
a new population of candidate solutions [65, 44].
The main feature of GAs is that they typically encode the problem within binary string individuals, although other encoding techniques exist. Another feature of GAs is
their simplicity as a concept, and their parallel search nature, which makes it possible to easily modify GAs so they can be adapted to a distributed environment [18].
2.5.2 Genetic algorithm structure

In this section we give a description of how GAs work. In order to get a better understanding of the algorithms, we define certain terminology that we use
throughout subsequent sections.

Fitness
Fitness is the value of the objective function for a certain solution. The goal of the algorithm is either to minimize or maximize this value, depending on the objective function.

Genome ("chromosome")
A genome, frequently called a chromosome, is the representation of a solution (a strategy, in the case of matrix games) that is to be taken at a certain point in time. The GA generates various chromosomes, each of which is assigned a certain fitness according to its performance. Using this fitness, the known evolutionary functions are applied in order to create a new population (generation) of chromosomes.

Gene
Genes are the units that form a certain genome (chromosome). The evolutionary functions such as mutation and crossover are mainly performed on the genes within the chromosomes.

Solution space
The solution space is the set of all possible chromosomes (solutions) within a certain system. Through the evolutionary functions, we try to cover as much of the solution space as possible for proper evaluation, without trying all possible solutions one by one (as in brute-force search), in order to save time.
After giving the proper definitions, we are going to describe how GAs work. In
the following sections, we will discuss the main components that vary among different GAs according to the application. These components include: chromosome
representation, selection process (fitness calculation and representation), mutation
process, and crossover process.

Chromosome structure
As previously mentioned, one important feature of GAs is their focus on fixed-length character strings, although variable-length strings and other structures have been used. Within matrix games, these character strings represent the binary encoding of a certain strategy [3, 31], though others have used non-binary encodings, depending on their application [86].
It should be noted, however, that there are special cases in which we consider
the use of a binary encoding perfectly acceptable. In a prisoner’s dilemma, for example, agents have to make decisions that are intrinsically binary, namely decisions
between cooperation and defection. The use of a binary encoding of strategies then
seems like a very natural choice that is unlikely to cause undesirable artifacts [4].
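As an illustration of such a binary encoding (a hypothetical scheme for this example only; the chromosome actually used in our experiments is described in Chapter 4), a reactive prisoner's dilemma strategy can be stored as one bit per possible previous joint action:

    # One bit per possible previous joint action (CC, CD, DC, DD): 0 = Cooperate, 1 = Defect.
    # Example: the chromosome [0, 1, 1, 1] cooperates only after mutual cooperation.
    chromosome = [0, 1, 1, 1]

    def next_move(chromosome, prev_row, prev_col):
        # Decode the bit that corresponds to the previous joint action.
        return chromosome[2 * prev_row + prev_col]

    print(next_move(chromosome, 0, 0))  # after (C, C) -> 0, i.e. Cooperate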

Fitness functions and selection
The fitness function is a representation of the quality of each solution (chromosome). This representation varies from one application to another. According to the fitness value, we select the fittest chromosomes and then perform crossover and mutation on these chromosomes to generate the new chromosomes. Selection techniques include: roulette wheel selection, rank-based selection, elitism, and tournament-based selection.
In the following sections, we discuss each method. We exclude tournament-based selection, as it requires experimenting with and testing the solutions during the selection process, which is not suitable for our application since it is an off-line training technique for the algorithm.
Figure 2.4: Roulette wheel selection mechanism [30].
Roulette wheel selection Parents are selected according to their fitness. The better the chromosomes, the more chances they have to be selected. To get a better understanding, imagine a roulette wheel on which all chromosomes in the population are distributed. Each chromosome gains its share of the wheel according to its fitness (Figure 2.4). A marble is thrown to select a chromosome on this wheel, and the fittest chromosomes have a higher chance of being selected.
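To make the mechanism concrete, a minimal Python sketch of roulette-wheel selection might look as follows. The function name and data layout are illustrative, not taken from the thesis implementation, and non-negative fitness values are assumed.

    import random

    def roulette_wheel_select(population, fitnesses):
        """Pick one chromosome with probability proportional to its fitness."""
        total = sum(fitnesses)
        if total == 0:                      # degenerate case: fall back to a uniform choice
            return random.choice(population)
        pick = random.uniform(0, total)     # "throw the marble" onto the wheel
        running = 0.0
        for chromosome, fitness in zip(population, fitnesses):
            running += fitness
            if running >= pick:
                return chromosome
        return population[-1]               # guard against floating-point round-off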

Rank-based selection Roulette wheel selection has problems when the fitnesses vary widely. For example, if the best chromosome's fitness covers 90 percent of the roulette wheel, then the other chromosomes have very few chances to be selected. Rank-based selection first ranks the population, and then every chromosome receives a fitness based on this ranking. The worst will have a fitness of 1, the second worst a fitness of 2, and so on up to the best chromosome, which will have a fitness of N (where N is the number of chromosomes in the population).

Figure 2.5: Crossover in genetic algorithms.

Elitism The idea of elitism has already been introduced. When creating a new population by crossover and mutation, there is a significant chance of losing the best chromosome. Elitist selection starts by copying a certain percentage of the best chromosomes to the new population. The rest of the population is then created by applying mutation and crossover to the selected elite. Elitism can very rapidly increase the performance of a GA because it prevents the algorithm from forgetting the best solution found so far.
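A minimal sketch of elitist survival, assuming a population stored as a list of (chromosome, fitness) pairs and an elitism rate as described above (the helper name is ours):

    def select_elite(scored_population, elitism_rate):
        """Return the top fraction of chromosomes, ranked by fitness (highest first)."""
        ranked = sorted(scored_population, key=lambda pair: pair[1], reverse=True)
        n_elite = max(1, int(len(ranked) * elitism_rate))   # always keep at least the single best
        return [chromosome for chromosome, _ in ranked[:n_elite]]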

Crossover and Mutation
Crossover and mutation are the main functions of any genetic algorithm after selection. They are the functions responsible for the creation of new chromosomes
out of the existing chromosomes.
In the crossover phase, all of the selected chromosomes are paired up, and
with a probability called “crossover probability,” they are mixed together so that a
certain part of one of the parents is replaced by a part of the same length from the
other parent chromosome (Figure 2.5). The crossover is accomplished by randomly choosing a site along the length of the chromosome, and exchanging the genes of the two chromosomes for each gene past this crossover site.

Figure 2.6: Mutation in genetic algorithms.
After the crossover, each gene of a chromosome (except for the elite chromosomes) is mutated to another allowed value with a probability defined as the “mutation probability” (Figure 2.6). With the crossover and mutations completed, the chromosomes are once again evaluated for another round of selection and reproduction. Setting the parameters for crossover and mutation depends mainly on the application at hand and the chromosome structure [45].
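For concreteness, single-point crossover and per-gene mutation on binary-string chromosomes could be sketched as below. This is an illustrative reading of the description above, not the exact implementation used later in the thesis.

    import random

    def crossover(parent_a, parent_b, crossover_prob):
        """Single-point crossover of two equal-length binary strings."""
        if random.random() > crossover_prob or len(parent_a) < 2:
            return parent_a, parent_b                   # no crossover this time
        site = random.randint(1, len(parent_a) - 1)     # random crossover site
        child_a = parent_a[:site] + parent_b[site:]
        child_b = parent_b[:site] + parent_a[site:]
        return child_a, child_b

    def mutate(chromosome, mutation_prob):
        """Flip each gene independently with the given mutation probability."""
        genes = []
        for g in chromosome:
            if random.random() < mutation_prob:
                genes.append('1' if g == '0' else '0')  # flip the bit
            else:
                genes.append(g)
        return ''.join(genes)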

Algorithm summary
Genetic algorithms are based on the fundamental algorithm structure as shown in
Figure 2.7. First, an initial population of N individuals, which evolves at each generation, is created. Generally, we can say that a generation of solutions is obtained
from the previous generation through the following procedure: solutions are randomly selected from the current population. Pairs of selected individuals are then
submitted to the crossover operation with a given crossover probability Pc . Each
descendant is then submitted to a mutation operation with a mutation probability
Pm , which is usually very small. The chromosome’s ability to solve the problem
is determined by its fitness function; the final step in the generation process is the
substitution of individuals of the current population with low performance by the new descendants. The algorithm stops after a predefined number, Gen, of generations has been created. An alternative stopping mechanism is a limit on computing time [101].

Figure 2.7: Basic structure of genetic algorithms.
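Putting the pieces together, the basic generational loop described above can be summarized in the following sketch. It assumes an external evaluate function that plays a chromosome against the opponent and returns its average payoff, and it reuses the roulette_wheel_select, select_elite, crossover, and mutate helpers sketched earlier; all names are illustrative.

    def run_ga(init_population, evaluate, n_generations,
               crossover_prob, mutation_prob, elitism_rate):
        """Basic generational GA loop: evaluate, keep elites, breed the rest."""
        population = list(init_population)
        for _ in range(n_generations):
            fitnesses = [evaluate(c) for c in population]
            scored = list(zip(population, fitnesses))
            next_pop = select_elite(scored, elitism_rate)        # elites survive unchanged
            while len(next_pop) < len(population):
                p1 = roulette_wheel_select(population, fitnesses)
                p2 = roulette_wheel_select(population, fitnesses)
                c1, c2 = crossover(p1, p2, crossover_prob)
                next_pop.append(mutate(c1, mutation_prob))
                if len(next_pop) < len(population):
                    next_pop.append(mutate(c2, mutation_prob))
            population = next_pop
        return population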

Advantages and Applications of genetic algorithms
GAs are among the most renowned optimization search techniques, especially in the presence of complex, non-linear search spaces. GAs have been used to solve various single-agent problems [70]. As computational requirements increase, it becomes more practical to distribute different parts of the GA across different agents, where all agents in this case share the same goal. In summary, GAs are most efficient and appropriate in situations such as the following:
• The search space is large, complex, or not easily understood
• There is no programmatic method that can be used to narrow the search space
• Traditional optimization methods, such as dynamic programming, are not
sufficient
Genetic algorithms may be utilized in solving a wide range of problems across
multiple fields such as science, business, engineering, and medicine. The following
provides a few examples:
• Optimization: production scheduling, call routing for call centers, routing
for transportation, determining electrical circuit layouts
• Machine learning: designing neural networks, designing and controlling
robots
• Business applications: utilized in financial trading, credit evaluation, budget
allocation, fraud detection
Genetic algorithms are important in machine learning for various reasons:
1. They can work on discrete spaces, where generally gradient methods cannot
be applied.
2. They can be used to search parameters for other machine learning models
such as fuzzy sets and neural networks.
3. They can be used in situations where the only available information is a measurement of performance; here they compete with temporal-difference techniques such as Q-learning [86].
4. They converge to a near optimal solution after exploring only a small fraction
of the search space [98, 49].
5. They can be easily hybridized and customized depending on the application.
6. They may also be advantageous in situations when one needs to find a near-optimal solution [87].
While the great advantage of GAs is that they find solutions through evolution, this is also their biggest disadvantage. Evolution is inductive in nature: it does not evolve towards a good solution, but rather away from bad circumstances [84]. This can cause a species to evolve into an evolutionary dead end. This disadvantage is particularly visible in more dynamic systems, where stabilizing in a dead end can prevent the algorithm from sustaining a dynamic learning process.

2.6 Genetic algorithms in repeated matrix games

GAs have also been used within matrix games. Mainly, they have been used in
this context as either a method for computing the Nash equilibrium of a certain
game [22, 2, 54], or for generating the “optimal” chromosomes (strategies) to be
played within a certain game [4]. These problems have been addressed either by a single genetic algorithm searching over the game parameters to reach an optimal solution [34], or by using what is called a “co-evolutionary” technique, which involves two genetic algorithms (both having the same goal) working to reach the optimal solution in a more reliable and efficient manner [52].
Note here that co-evolution is considered an “offline” learning technique, as it requires testing all the current chromosomes within the population against all other chromosomes. This is not the case when playing online, where a chromosome is tested only against the current associate it was set up to play against (not the whole population), which gives it less opportunity for exploration against different kinds of behavior.

GAs have been used before in the formation of strategies in dynamic Prisoner's
Dilemma games. For example, Axelrod has used genetic algorithms in order to find
the most desirable strategies to be used in the prisoner’s dilemma [4]. Axelrod’s
stimulus-response players were modeled as strings, where each character within
the string corresponds to a possible state (one possible history) and decodes to the
player’s action in the next period. The longer the steps in memory to be taken into
consideration, the longer the string representing the chromosome will be. This is
as a result of the increase in the possible number of states. In addition, moving to a game with more than two possible moves will lengthen the string, and increasing the number of players will also increase the number of states. Formally, the number of states is given by a^(m×p), where there are a actions and p players, and each player keeps m periods of time in its history [44].
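To make the growth concrete: with a = 2 actions, p = 2 players, and a memory of m = 3 periods, a stimulus-response chromosome must specify an action for a^(m×p) = 2^6 = 64 possible histories. A quick sanity check of this count (the function name and keyword arguments are ours):

    def num_states(actions, players, memory):
        """Number of distinct joint-action histories a chromosome must cover."""
        return actions ** (memory * players)

    print(num_states(actions=2, players=2, memory=3))   # 64 histories, hence a 64-gene chromosome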
Another example is the use of a GA within a simple formulation of a buyer-seller dilemma [92]. The GA represents a mixed strategy for the seller as an individual member of the population; each population member is therefore a vector of probabilities, one for each action, that sum to 1.0. Within this experiment, the authors compared the performance of the GA against other RL techniques. The difference in performance between the GA agents and the RL agents arises primarily because the GA agents are population based. Since RL agents deal with only one strategy, they are faster at adapting it in response to the feedback received from the environment, whereas it takes the GA population as a whole longer to respond to that feedback. For the same reason, the GA agents are expected to exhibit less variance, and hence better convergence properties. This was a good start in using genetic algorithms as a learning technique instead of an optimization technique. However, it still needed more work on playing against simple learning agents, for example without full state representation and without more specific domain knowledge. The authors of this work also raised the
question of how human input can contribute as potential future work [92].
To better understand how GAs may perform in more complicated situations, the following sections discuss the performance of GAs in related settings, such as distributed and dynamic systems.

2.6.1 Genetic algorithms in distributed systems

In distributed systems, the primary form of GAs that has been used is co-evolutionary algorithms [56, 52, 107]. In this case, each GA agent represents one possible solution, and with the existence of other GA agents, they try to verify which solution is likely to be optimal. In another context, where each GA agent evolves its own set of solutions, all the agents are centralized around the same objective function (all the agents cooperate and communicate in order to reach the same goal) [47, 69]. As we can see, in all of these situations the GA agent is not completely independent of the other agents' objectives.

2.6.2 Genetic algorithms in dynamic systems

The goal of a GA within a dynamic system changes from finding an “optimal”
answer to tracking a certain goal (and enhancing the overall performance). Most
real-world artificial systems and societies change due to a number of external factors in the environment, agents learning new knowledge, or changes
in the make up of the population. When the environment changes over time, resulting in modifications of the fitness function from one cycle to another, we say that
we are in the presence of a dynamic environment [93]. Several researchers have
addressed related issues in previous work. Examples include evolutionary models
and co-evolutionary models where the population changes over time, and studies of the viscosity of populations [68, 103]. Unlike a classical GA, the goal of such a system is to maximize the average result rather than to determine a single optimal solution: the performance of different solutions is tracked over time instead of converging to a fixed optimal target.
Branke [12] surveys strategies for making evolutionary algorithms, which include GAs, suitable for dynamic problems. The author grouped the different techniques into three categories:
• React to changes, where as soon as a change in the environment has been
detected explicit actions are taken
• Maintain diversity throughout the run, where convergence is avoided all the
time and it is hoped that a spread-out population can adapt to modifications
more easily [55]
• Maintain an additional memory through generations (memory-based approaches),
where the evolutionary algorithm is supplied with memory to be able to recall useful information from past generations
Many methods have been presented to make genetic algorithms applicable in dynamic environments. First, researchers have modeled change in the environment by introducing noise into the system, whereby agents' actions are mis-implemented or mis-interpreted by other agents [29]. Another idea has been to localize the search within a certain part of the search space. This can be done either through intelligent initialization of the population [85], or, as in the “memetic algorithm” [79], by evaluating close and similar neighbors of the chromosomes on trial in addition to the chromosomes already tested. This is where we draw part of our motivation for interactive learning: evaluating close neighbors of existing chromosomes helps to evaluate the existing chromosomes themselves, which motivated our approach of generating populations based on feedback from users.
The aforementioned methods either do not consider the existence of other heterogeneous learning entities in the system, or learn only under certain identified
constraints [1]. However, experimental results are promising and show interesting
properties of the adaptive behavior of GA techniques.

2.7 Interactive learning

Another factor to consider to potentially enhance the performance of genetic algorithms is to gather human input in real time to teach the algorithm. Within
any learning algorithm, this can be done by merging the learning algorithm with
human-machine interaction, resulting in what is called in the literature “interactive artificial learning.” Using human input as a part of the learning procedure can
provide a more concrete reward mechanism, which can increase the convergence
speed [96, 27]. These learning methods occur in either the “act,” “observe” or “update” step of an interactive artificial learning mechanism [27]. Experiments have
been performed to evaluate potential effects of human input on the learning curve
in multi-agent environments. Results show a significant improvement in learning,
depending on the quality of the human input [27, 28].

Figure 2.8: The interactive artificial learning process [27].
2.7.1 Interactive learning in repeated matrix games

Within the repeated-games environment, experiments have been performed to analyze the effect of human input on the performance of learning algorithms [27]. The algorithms used in these experiments use “learning by
demonstration” (LbD). Results showed that LbD does help learning agents to learn
non-myopic equilibrium in repeated stochastic games when human demonstrations
are well-informed. On the other hand, when human demonstrations are less informed, these agents do not always learn behavior that produces (more successful)
non-myopic equilibria. However, it appears that well-formed variations of LbD algorithms that distinguish between informed and uninformed demonstrations could
learn non-myopic equilibrium.
When humans play iterated prisoner's dilemma games, their performance depends on many factors [32, 41, 59]. Thus, it can be concluded that a similar trend applies to LbD algorithms, and that LbD algorithms could potentially supply information about the game and one's associates that provides a context facilitating better demonstrations.

2.7.2 Interactive genetic algorithms

An interactive genetic algorithm (IGA) is defined as a genetic algorithm that uses human evaluation. These algorithms belong to the more general category of interactive evolutionary computation. The main applications of these techniques are domains where it is hard or impossible to design a computational fitness function, such as evolving images, music, and various artistic designs and forms to fit a user's aesthetic preferences.
In an interactive genetic algorithm (IGA), the algorithm interacts with the human in an attempt to quickly learn effective behavior and to better consider human
preferences. Previous work on IGA in distributed tasks has shown that human
input can allow genetic algorithms to learn more effectively [33, 38]. However,
such successes required heavy user interaction, which causes human fatigue [33].
Previous work in interactive evolutionary learning in single-agent systems has
analyzed methods for decreasing the amount of necessary human interaction in
interactive genetic learning. These methods either apply bootstrapping techniques,
which rely on estimations of the reward in between iterations instead of a direct
reward from the user [60, 66], or they divide the set of policies to be evaluated into
clusters, where the user only evaluates the center of the cluster and not all policies
[82].
Another suggestion for reducing human fatigue, applicable only in multi-agent systems, is to use input from other agents (and potentially other agents' experiences) as one's own experience [39].
Interaction between a human and the algorithm may occur in different stages
of GA and in different ways. The most common way is to make a human part of
the fitness evaluation for the population. This can be done by either ranking available solutions [94], or directly assigning the fitness function value to the available
policies in the population. Other work, which targets reducing human fatigue as
mentioned above, had the human evaluate only selected representatives of the population [82, 88, 60]. Human input has also been investigated in the mutation stage,
where the human first selects the best policy from his point of view, and suggests a
mutation operation to enhance its performance [33].
Babbar-Sebens et al. [28] identified a problem with IGAs: how an algorithm can cope with temporal changes in human input. This situation not only leads the genetic algorithm to converge prematurely, but can also reduce diversity in the genetic algorithm's population. This occurs when solutions that initially receive poor human rankings are not able to survive the GA's selection process. Loss of these solutions early in the search process could be detrimental to the performance
of the genetic algorithm, if these solutions have the potential to perform better
when preferences change later. That is why they suggested the use of case-based
memory per population of policies. This memory acts as a continuous memory of
the population and its fitnesses to give a more continuous and non-myopic view of
evaluating the performance of the chromosomes through the generations (instead
of having the evaluations be based on a single generation's biases) [6].
IGAs have also been used to define robot behavior in known environments. A
child (representing the human factor) trains the genetic algorithm through feedback on the evolved population. This training happens by selecting the top three
preferred routes to be taken by the robot [66]. IGAs are usually used in more visual problems, where the user can easily rank or evaluate chromosomes, in domains such as music and design [40, 14, 89]. They have also been used in resource allocation problems (which are usually more static) [94, 7].
The last related example was the interaction of a human with a GA within a
board game, where the GA plays against another GA or against a human. The GA
here works on a limited set of (easy to reach) solutions that describe a behavior for
the whole game (not move by move). The human in this case sets the parameters in
the beginning of the game, including the number of generations and the mutation
rate [21].

Interactive genetic algorithms in multi-agent systems
Research that studied interactive genetic algorithms in multi-agent systems was
mainly focused on dividing the IGA functions (including human interaction, mutation, crossover) into separate agents [57, 51]. In this case, the IGA is not fully
independent of the other modules, where all modules interact with each other to
reach a common objective.
2.8 Summary

Past work has derived intelligent behavior in repeated games using various methodologies. However, existing solutions have various problems, including:
• They require too much input from the user.
• They force certain constraints on users which decrease their comfort level.
• They do not learn fast enough for real time systems, and are not able to deal
with environments or goals that change over time.
• They cannot be used in distributed systems.
• They do not consider how human input can be incorporated in the system.
From this, we conclude that there is a need for a solution that addresses these drawbacks.
CHAPTER 3

Experimental Setup

In this chapter, we will present the experimental setup used to test our hypothesis.
Since our goal is to test the performance of interactive genetic algorithms (IGAs)
within a multi-agent system setting, we designed an experiment that will allow us
to do that. In order to test the efficiency of the algorithm, we run it against itself and
other learning algorithms in a variety of matrix games. The two renowned learning
algorithms we use are GIGA-WoLF [10] and Q-learning [100].

3.1 Games' structure

In this section, we give an overview of the matrix games we use to evaluate our
algorithms. The expectations from the players within the games differ from one
game to another. Therefore, we expect a different response from each learning
algorithm.

             Cooperate   Defect
Cooperate      3,3        0,5
Defect         5,0        1,1

Table 3.1: Payoff matrix for the Prisoner's dilemma.

3.1.1 Prisoner's dilemma

The prisoner’s dilemma is perhaps the most studied social dilemma [4], [3], as it
appears to model many real life situations. In the prisoner’s dilemma (Table 3.1),
defection is each agent’s dominant action. However, both agents can increase their
payoffs simultaneously by influencing the other agent to cooperate. To do so, an
agent must (usually) be willing to cooperate (at least to some degree) in the long run. An n-agent, m-action version of this game has also been studied [95].
As we can see from the matrix in Table 3.1, if we consider the reward of each
player for each joint-action pair, the following rule should apply within a prisoner’s
dilemma matrix: rdc ≥ rcc ≥ rdd ≥ rcd. Here, i represents my action, j represents the opponent's action, and rij represents my reward at the joint action pair (i, j). However, it is more desirable than mutual defection for both players to choose the first actions (C, C) and obtain rcc.
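As a quick check that the payoffs of Table 3.1 satisfy this ordering, one can encode the row player's rewards and compare them directly (a throwaway sketch, with C and D as action labels):

    # Row player's reward r[(my_action, opponent_action)], taken from Table 3.1.
    r = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

    # rdc >= rcc >= rdd >= rcd
    assert r[('D', 'C')] >= r[('C', 'C')] >= r[('D', 'D')] >= r[('C', 'D')]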

3.1.2 Chicken

The game of Chicken is a game of conflicting interests. Chicken models the Cuban
Missile Crisis [19], among other real-life situations. The game has two one-shot
NEs ((C, d) and (D, c)). However, in the case of a repeated game, agents may
be unwilling to receive a payoff of 2 continuously when much more profitable
solutions are available. Thus, in such cases, compromises can be reached, such as
the Nash bargaining solution (Swerve, Swerve) (Table 3.2). Therefore, the game is
similar to the prisoner’s dilemma game (Table 3.1) in that an “agreeable” mutual
solution is available. This solution, however, is unstable since both players are
individually tempted to stray from it.

            Swerve   Straight
Swerve       6,6       4,7
Straight     7,4       2,2

Table 3.2: Payoff matrix for the Chicken game.

       a      b      c
A     0,0    0,1    1,0
B     1,0    0,0    0,1
C     0,1    1,0    0,0

Table 3.3: Payoff matrix of Shapley's game.

3.1.3 Shapley's game

Shapley’s game [36] is a 3-action game. It is a variation from the rock-paperscissors game. Shapley’s game has often been used to show that various learning
algorithms do not converge. The game has a unique one-shot NE in which all
agents play randomly. The NE of this game gives a payoff of 1/3 to each agent
(in a 2-agent formulation). However, the players can reach a compromise in which
both receive an average payoff of 1/2. This situation can be reached if both players
alternate between receiving a payoff of 1 and receiving a payoffs of 0. The payoff
matrix for this game is shown in Table 3.3.

3.1.4 Cooperative games

As mentioned in the previous chapter, cooperative games are exactly opposite to
competitive games (which are part of non-cooperative games). In these games, all
agents share common goals, some of which may be more profitable than others.
Table 3.4 shows the payoff matrix of a fully cooperative game.
       a      b
A     4,4    0,0
B     0,0    2,2

Table 3.4: Payoff matrix of a fully cooperative matrix game.

3.2 Knowledge and Information

In matrix games, the more information any learning algorithm has about the game
and its associates, the more efficiently it can learn. Some of the information is
usually hidden within the learning process, and the algorithm has to deal with only
the information available. The following list shows the possible variations in the
level of knowledge of the agent. This will help us have a better understanding of
how our algorithm and its opponents view the surrounding world.
• The agent’s own action. The agent has to basically know its actions in order
to know how to act in the first place.
• The agent’s own payoffs. The agent may know the reward it has taken at a
certain point of time, or how the actions are rewarded over time.
• Associates’ actions. The agent can either know directly which action was
taken by an associate, or be able to predict it over time.
• Associates’ payoffs. The agent can know what outcomes can be used to
motivate or threaten the other associates to be able to act accordingly.
• Associates’ internal structure. The agent may have a knowledge of how the
opponent reacts to certain situations. This knowledge usually is gained by
attempts to model the associates over time.
In our experiment, we assume that the algorithm has a complete knowledge
about its own payoffs and actions as well as the opponent’s history of actions (from
previous plays). No knowledge of the opponent's internal structure is assumed.

Algorithm 3.1 GIGA-WoLF
x_t is the strategy according to which I play my action
z_t is the “baseline” strategy
loop
    x̂_{t+1} ← x_t + η_t · r_t
    z_{t+1} ← z_t + η_t · r_t / 3
    δ_{t+1} ← min(1, ‖z_{t+1} − z_t‖ / ‖z_{t+1} − x̂_{t+1}‖)
    x_{t+1} ← x̂_{t+1} + δ_{t+1} · (z_{t+1} − x̂_{t+1})
end loop
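The update can also be written compactly in Python. The sketch below is our own reading of Algorithm 3.1: it assumes the reward vector r holds the payoff each action would have received this round, and it projects the updated strategies back onto the probability simplex, a detail the pseudocode leaves implicit.

    import numpy as np

    def project_to_simplex(v):
        """Euclidean projection of a vector onto the probability simplex."""
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
        theta = (1 - css[rho]) / (rho + 1)
        return np.maximum(v + theta, 0)

    def giga_wolf_update(x, z, r, eta):
        """One GIGA-WoLF step: move both strategies along the reward gradient,
        then pull the played strategy x toward the slower baseline z."""
        x_hat = project_to_simplex(x + eta * r)
        z_new = project_to_simplex(z + eta * r / 3.0)
        denom = np.linalg.norm(z_new - x_hat)
        delta = 1.0 if denom == 0 else min(1.0, np.linalg.norm(z_new - z) / denom)
        x_new = x_hat + delta * (z_new - x_hat)
        return x_new, z_new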

3.3 Opponents

The following section overviews the learning algorithms that we selected as opponents in our experiments.

3.3.1 GIGA-WoLF

GIGA-WoLF [10] (Generalized Infinitesimal Gradient Ascent-Win or Learn Fast) is a gradient ascent algorithm. It is also a model-free algorithm, like Q-learning. The
a gradient ascent algorithm. It is also a model-free algorithm like Q-learning. The
idea of the algorithm is that it compares its strategy to a baseline strategy. It learns
quickly if the strategy is performing worse than the baseline strategy. On the other
hand, if the strategy is performing better than the baseline strategy, it learns at a
slower rate. Algorithm 3.1 shows the basic update structure of the algorithm.
As we can see, this algorithm consists of two main components. The first
component is the “GIGA” component. The idea of “GIGA” is that after each play
the agent updates its strategy in the direction of the gradient of its value function.
The “WoLF” component was introduced later [11]. The idea is to use two different
strategy update steps, one which is updated with a faster learning rate than the other.
To distinguish between those situations, the player keeps track of two policies.
Each policy is concerned with assigning the probabilities of taking a certain action
under this specific situation.
GIGA-WoLF is a no-regret algorithm. No-regret learning converges to NEs in dominance-solvable, constant-sum, and 2-action general-sum games, but does not necessarily converge in Shapley's game [50].

Algorithm 3.2 Q-learning
for each state-action pair (s, a) do
    Q(s, a) ← 0
end for
loop
    Depending on exploration rate ε, select an action a and execute it
    Receive immediate reward r
    Observe the new state s′
    Update the table entry for Q(s, a) as follows:
        Q(s, a) ← (1 − α) · Q(s, a) + α · (r + γ · max_{a′} Q(s′, a′))
    s ← s′
end loop

3.3.2 Q-learning

Q-learning [100] is a reinforcement learning technique that is widely used in artificial intelligence research [35], [72], [9]. It can also be viewed as a dynamic programming technique, in which the agent iteratively tries to learn its “to-go” payoff (called a Q-value) over time. Its main idea can be summarized as follows: an agent tries
an action at a particular state, and evaluates its consequences in terms of the immediate reward or penalty it receives and its estimate of the value of the next state. By
trying all actions in all states repeatedly, it learns which are best overall, judged by
the long-term discounted reward. Algorithm 3.2 shows the main structure of the
algorithm.
For all states and action pairs, Q(s, a) converges to the true value under the
optimal policy when (i) the environment has the Markov property, (ii) the agent
visits all states and takes all actions infinitely often, and (iii) the learning rate α
is decreased properly. However, if the agent always chooses the actions greedily
during learning, Q-values may converge to a local optimum because the agent may
not visit all states sufficiently. To avoid this, the agent usually uses a stochastic method (like ε-greedy) to choose actions. The ε-greedy method chooses the action that has the maximum Q-value with probability (1 − ε), or a random action with
probability ε [72].
The Q-learning algorithm we are using in our experiments has the following
settings:
• Discount factor γ = 0.95.
• State is represented by the previous joint action of the agent and its associates.
• Exploration rate ε = 1.0/(10.0 + (t/1000.0)), where t represents the number
of rounds played.
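A minimal tabular Q-learning agent with these settings might be sketched as follows; the class layout and the learning rate α = 0.1 are our own assumptions, while γ and the exploration schedule are taken from the list above.

    import random
    from collections import defaultdict

    class QLearner:
        def __init__(self, actions, gamma=0.95, alpha=0.1):
            self.actions = actions
            self.gamma = gamma
            self.alpha = alpha                      # learning rate (assumed value)
            self.q = defaultdict(float)             # Q(s, a), initialized to 0
            self.t = 0                              # number of rounds played

        def epsilon(self):
            return 1.0 / (10.0 + self.t / 1000.0)   # exploration schedule from the settings above

        def choose(self, state):
            self.t += 1
            if random.random() < self.epsilon():    # explore
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])   # exploit

        def update(self, state, action, reward, next_state):
            best_next = max(self.q[(next_state, a)] for a in self.actions)
            self.q[(state, action)] = ((1 - self.alpha) * self.q[(state, action)]
                                       + self.alpha * (reward + self.gamma * best_next))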

3.4 Evaluation criteria

In order to evaluate the performance of our algorithms, we focus on two main points:
1. The average fitness of the population per generation: this helps us evaluate the performance of the algorithm regarding convergence and the ability
of the algorithm to learn over time.
2. The final payoff achieved: By knowing and studying the final payoff, we
know the final performance of the algorithm in comparison to other algorithms.
These evaluation criteria are averaged over 10 runs of each algorithm against
the selected opponent in order to eliminate the effect of randomness. We show the
variations within the final payoffs over all the conducted runs in order to verify
the stability of the performance of the algorithm within a specific game against a
specific opponent.
3.5 Performance of GIGA-WoLF and Q-learning

In order to get a better understanding of the behavior of the algorithms in the selected games, we compared the performance of the algorithms against each other
and in self-play with available reports from previous work [10], [104], [71]. Figure 3.1 shows our results. These results represent the average, over 10 runs, of the final payoffs of both learners within each game. Each run consists of
running both learning algorithms against each other for 100,000 steps. In addition,
the figure shows the standard deviation of the payoffs from the average (to show if
it converges consistently to the same rewards or not).

3.5.1 Prisoner's dilemma

In the prisoner’s dilemma, we expected the agents to learn mutual defection when
GIGA-WoLF plays against Q-learning. This is due to the fact that we are working
with a no-regret algorithm, which learns within few iterations to defect. On the
other hand, the Q-learner learns slower than the GIGA-Wolf, so it takes it a larger
number of iterations for the Q-learner to learn to defect against GIGA-WoLF. The
Q-learner we used uses a high value for γ as it was shown from previous work
that it increases the probability of cooperation if the other learners are willing to
cooperate [71].

3.5.2 Chicken

When GIGA-WoLF and Q-learning interact in Chicken, they did not stabilize to a fixed outcome across the 10 simulations we ran. GIGA-WoLF acts according to the state-action pairs the Q-learner experiments with at the beginning of a simulation. Thus, if the Q-learner started by attempting to swerve, GIGA-WoLF would go straight. But in most cases (when the Q-learner tried to go straight in the beginning), GIGA-WoLF would go with the safe option and swerve for the rest of the game.
Neither Q-learning nor GIGA-WoLF is able to reach the compromise outcome of (Swerve, Swerve) in self-play. One agent always “bullies” the other into swerving, while it goes straight.

3.5.3 Shapley's game

Previous work [9], [50] shows that GIGA-WoLF’s policy does not converge in
Shapley’s game. This fact is apparent in both self-play and against Q-learning. As
a result, players receive an average payoff near the NE value in this game. The
best performance for Q-learning is in self-play, as it is often able to learn over time to reach the solution of alternating between “winning” and “losing”. But again, in some cases it is still unable to reach this satisfactory solution.

3.5.4 Cooperative games

Within cooperative games, GIGA-WoLF will find it easy to maintain one of the
actions over time, giving it the ability to reach cooperation quickly in self-play.
This property helps the Q-learner to easily discover the state-action pair that has
the maximum Q-value (as both of the agents in this case have the same goal). As
a result, our Q-learners learn mutual cooperation. On the other hand, Q-learning is
not able to maintain the highest possible payoff from cooperation in self-play. The reason is that, although each agent tries to stabilize at one of the actions (to reach its steady state), the exploration mechanism within the algorithm sometimes makes it hard for both agents to maintain a certain action pair.
3.6 Summary

From the results presented within this chapter, we find that, although both Q-learning and GIGA-WoLF perform well in certain situations, there are situations in which the algorithms do not learn effectively. Furthermore, these algorithms sometimes take a long time to converge. This motivates us to work on devising new algorithms that are able to adapt within such dynamic systems. In
the following chapter, we start discussing the structure of the suggested algorithm
and potential variations that could enhance it.
Figure 3.1: Payoffs of GIGA-WoLF and Q-learning within the selected games. The four panels show the final payoffs in the Prisoner's dilemma, the Cooperation game, the Chicken game, and Shapley's game; each panel plots the average payoff of each algorithm (over 10 runs, each run 100,000 steps) against each opponent (vs. GIGA-WoLF and vs. Q-learning).
CHAPTER 4

Learning using Genetic Algorithms

In this chapter, we discuss the performance of a basic genetic algorithm (GA) in repeated matrix games. In addition, we present several suggested modifications to this basic algorithm and show how they may affect the performance of
the GA. We first describe the basic GA structure, and the modifications we apply to
it. We then demonstrate the performance of these algorithms against GIGA-WoLF,
Q-Learning and in self-play.
We initially define a set of parameters that are used within our algorithms; we tried to maximize the number of steps taken in order to get a better understanding of the learning trends. These parameters include the total number of steps (Ns) and the number of generations (NG), which together determine the time range through which the agent can learn by playing against other agents. We set NG = 100 generations and Ns = 100,000 (for easier manipulation of the calculations required).
Once we know NG and Ns, we get a trade-off between the number of chromosomes within a population (Nc) and the number of steps that each chromosome plays against the opponent (Nsc). Equation 4.1 shows the resulting trade-off:

    Nc = Ns / (Nsc × NG)        (4.1)

Therefore, by setting the total number of steps (Ns) to 100,000, and by fixing the
number of generations to 100 (in order to have an acceptable number of generations
through which we compare our results), we get that Nsc × Nc =1000.
Through initial experimentation, we set Nsc to different values, including 50, 100, and 200, which caused Nc to be 20, 10, and 5, respectively. This analysis shows that decreasing the number of chromosomes within a population reduces randomization, which can cause the population to lean towards a local optimum in certain situations. At the same time, increasing the number of members in the population allows more exploration, but evaluating the population consumes more time. That is why we settled on a population of 20 chromosomes, which appears to be reasonable in our settings.
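The trade-off is easy to enumerate. With Ns = 100,000 and NG = 100 fixed, the candidate settings we tried correspond to the following (Nsc, Nc) pairs; the snippet is a throwaway calculation:

    Ns, NG = 100_000, 100           # total steps and number of generations (fixed)
    for Nsc in (50, 100, 200):      # steps each chromosome plays per generation
        Nc = Ns // (Nsc * NG)       # chromosomes per population: 20, 10, 5
        print(Nsc, Nc)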
In the following section, we describe the main structure of the algorithm to give a better understanding of how we incorporate these modifications, as well as to help interpret the results and analysis.

4.1 Algorithm structure

We analyze a GA typically used in similar problems [3], with slight modifications
to the selection function. Before we start, we introduce some of the variables
used within our work. Table 4.1 shows the common parameters used within the
algorithms.
As mentioned in the literature review, a GA starts with the initialization of a new population of chromosomes (Pop). Each chromosome C represents the strategy followed by the player in response to the current history (his) of both the player
and its opponent. An example of the structure of the chromosome within the prisoner’s dilemma game can be seen in Figure 4.1.
In this structure, each bit in the chromosome represents the action to be taken for a particular history of joint actions (specified by the position). In our case, we
used the last three actions taken by both the agent and its opponent, as has been
done in some past work [3]. In order to determine which action to take according to
a set of history steps (his), we convert the number his from the base Na to decimal
base in order to identify the bit location of the step to be taken. This conversion is
made as follows:

    A_p = Σ_{i=1}^{3} (N_a)^i × his_i        (4.2)

Representation   Variable
Pm               Mutation rate
Pc               Crossover rate
Pe               Elitism rate
C                Chromosome (strategy)
f                Fitness
P                Parent
Ch               Child
Ap               Position of the gene that determines the action to be taken
Bp               Best chromosome in the previous generation
Avp              Average fitness of chromosomes in the current generation
Bavp             Best average payoff over generations
Nc               Number of chromosomes per generation
NG               Number of generations
Na               Number of actions available for each player
Ns               Total number of steps
Nsc              Number of steps per chromosome
Pop              Current population
his              Current history of actions
Ovf              Overall fitness of a chromosome
g                Gene within a chromosome (bit)
Entrop           Entropy of a gene

Table 4.1: Variables used within the algorithms

Figure 4.1: Chromosome structure
Using this equation, we can identify the position of the action to be taken in response to a certain history within the chromosome. For example, suppose my history was CCCCDC (meaning that my moves in the past three stages were Cooperate, except that the last stage was Defect, while the opponent cooperated all the time). This history can be encoded as the binary number 000010, which we convert to decimal base. This means that the bit g at position Ap = 2 within the chromosome gives the action to be taken for this history. Note that the encoded history can be expressed in other bases depending on the number of actions available; for example, in a 3-action game we work with a history encoded in the ternary numeric base.
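A small sketch of this lookup for the two-action case, treating C as 0 and D as 1 and reading the joint history as a base-Na number, is shown below; the helper name is ours, and this is one consistent reading of Equation 4.2. For the history CCCCDC of the running example it returns position 2.

    def history_to_position(history, n_actions=2, symbols="CD"):
        """Convert a joint-action history string (e.g. 'CCCCDC') into the index
        of the gene that stores the response to that history."""
        digits = [symbols.index(move) for move in history]   # C -> 0, D -> 1
        position = 0
        for d in digits:
            position = position * n_actions + d              # read the history as a base-Na number
        return position

    print(history_to_position("CCCCDC"))   # -> 2, matching the worked example above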
After the initialization of the population (in this experiment we have 20 chromosomes within the population), we start running these random chromosomes
against the opponent player to “evaluate” them. Our evaluation here is based on
averaging the reward taken by each chromosome within each step against the opponent. Each chromosome plays for 50 steps against the opponent. The reward is
then averaged to compute the fitness of the chromosome.
Following the evaluation, we sort the population according to fitness f, with the highest fitness at the top; here our selection process starts. We keep the elite (top) chromosomes, whose number is defined by the elitism rate Pe, for the following generation, and apply mutation and crossover to the best two chromosomes. The same sequence is then repeated with the new population Ch until a stopping condition is reached.

Blending Event-Based and Multi-Agent Systems around Coordination Abstractions
 
Federal Mutil-Agent System (FEDMAS)
Federal Mutil-Agent System (FEDMAS)Federal Mutil-Agent System (FEDMAS)
Federal Mutil-Agent System (FEDMAS)
 
Interactions in Multi Agent Systems
Interactions in Multi Agent SystemsInteractions in Multi Agent Systems
Interactions in Multi Agent Systems
 
Event-Based vs. Multi-Agent Systems: Towards a Unified Conceptual Framework. ...
Event-Based vs. Multi-Agent Systems: Towards a Unified Conceptual Framework. ...Event-Based vs. Multi-Agent Systems: Towards a Unified Conceptual Framework. ...
Event-Based vs. Multi-Agent Systems: Towards a Unified Conceptual Framework. ...
 
I 7
I 7I 7
I 7
 
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
 
Multi-agent Control of Thermal Systems in Buildings
Multi-agent Control of Thermal Systems in BuildingsMulti-agent Control of Thermal Systems in Buildings
Multi-agent Control of Thermal Systems in Buildings
 
Event-Based vs. Multi-Agent Systems: Towards a Unified Conceptual Framework
Event-Based vs. Multi-Agent Systems: Towards a Unified Conceptual FrameworkEvent-Based vs. Multi-Agent Systems: Towards a Unified Conceptual Framework
Event-Based vs. Multi-Agent Systems: Towards a Unified Conceptual Framework
 
Evaluation of recommender technology using multi agent simulation
Evaluation of recommender technology using multi agent simulationEvaluation of recommender technology using multi agent simulation
Evaluation of recommender technology using multi agent simulation
 
BBL multi agent systems
BBL multi agent systemsBBL multi agent systems
BBL multi agent systems
 
Multi-agent systems
Multi-agent systemsMulti-agent systems
Multi-agent systems
 
Introduction to Agents and Multi-agent Systems (lecture slides)
Introduction to Agents and Multi-agent Systems (lecture slides)Introduction to Agents and Multi-agent Systems (lecture slides)
Introduction to Agents and Multi-agent Systems (lecture slides)
 

Semelhante a Applicability of Interactive Genetic Algorithms to Multi-agent Systems: Experiments on Games Used in Smart Grid Simulations.

Integrating IoT Sensory Inputs For Cloud Manufacturing Based Paradigm
Integrating IoT Sensory Inputs For Cloud Manufacturing Based ParadigmIntegrating IoT Sensory Inputs For Cloud Manufacturing Based Paradigm
Integrating IoT Sensory Inputs For Cloud Manufacturing Based ParadigmKavita Pillai
 
A Comparative Study Of Generalized Arc-Consistency Algorithms
A Comparative Study Of Generalized Arc-Consistency AlgorithmsA Comparative Study Of Generalized Arc-Consistency Algorithms
A Comparative Study Of Generalized Arc-Consistency AlgorithmsSandra Long
 
Big Data and the Web: Algorithms for Data Intensive Scalable Computing
Big Data and the Web: Algorithms for Data Intensive Scalable ComputingBig Data and the Web: Algorithms for Data Intensive Scalable Computing
Big Data and the Web: Algorithms for Data Intensive Scalable ComputingGabriela Agustini
 
(Springer optimization and its applications 37) eligius m.t. hendrix, boglárk...
(Springer optimization and its applications 37) eligius m.t. hendrix, boglárk...(Springer optimization and its applications 37) eligius m.t. hendrix, boglárk...
(Springer optimization and its applications 37) eligius m.t. hendrix, boglárk...ssuserfa7e73
 
Stochastic Processes and Simulations – A Machine Learning Perspective
Stochastic Processes and Simulations – A Machine Learning PerspectiveStochastic Processes and Simulations – A Machine Learning Perspective
Stochastic Processes and Simulations – A Machine Learning Perspectivee2wi67sy4816pahn
 
Neural Networks on Steroids
Neural Networks on SteroidsNeural Networks on Steroids
Neural Networks on SteroidsAdam Blevins
 
Stock_Market_Prediction_using_Social_Media_Analysis
Stock_Market_Prediction_using_Social_Media_AnalysisStock_Market_Prediction_using_Social_Media_Analysis
Stock_Market_Prediction_using_Social_Media_AnalysisOktay Bahceci
 
Distributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsDistributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsArinto Murdopo
 
Efficient algorithms for sorting and synchronization
Efficient algorithms for sorting and synchronizationEfficient algorithms for sorting and synchronization
Efficient algorithms for sorting and synchronizationrmvvr143
 
Efficient algorithms for sorting and synchronization
Efficient algorithms for sorting and synchronizationEfficient algorithms for sorting and synchronization
Efficient algorithms for sorting and synchronizationrmvvr143
 
Interactive Filtering Algorithm - George Jenkins 2014
Interactive Filtering Algorithm - George Jenkins 2014Interactive Filtering Algorithm - George Jenkins 2014
Interactive Filtering Algorithm - George Jenkins 2014George Jenkins
 
High Performance Traffic Sign Detection
High Performance Traffic Sign DetectionHigh Performance Traffic Sign Detection
High Performance Traffic Sign DetectionCraig Ferguson
 

Semelhante a Applicability of Interactive Genetic Algorithms to Multi-agent Systems: Experiments on Games Used in Smart Grid Simulations. (20)

Integrating IoT Sensory Inputs For Cloud Manufacturing Based Paradigm
Integrating IoT Sensory Inputs For Cloud Manufacturing Based ParadigmIntegrating IoT Sensory Inputs For Cloud Manufacturing Based Paradigm
Integrating IoT Sensory Inputs For Cloud Manufacturing Based Paradigm
 
Master_Thesis
Master_ThesisMaster_Thesis
Master_Thesis
 
Thesis_Nazarova_Final(1)
Thesis_Nazarova_Final(1)Thesis_Nazarova_Final(1)
Thesis_Nazarova_Final(1)
 
A Comparative Study Of Generalized Arc-Consistency Algorithms
A Comparative Study Of Generalized Arc-Consistency AlgorithmsA Comparative Study Of Generalized Arc-Consistency Algorithms
A Comparative Study Of Generalized Arc-Consistency Algorithms
 
Big Data and the Web: Algorithms for Data Intensive Scalable Computing
Big Data and the Web: Algorithms for Data Intensive Scalable ComputingBig Data and the Web: Algorithms for Data Intensive Scalable Computing
Big Data and the Web: Algorithms for Data Intensive Scalable Computing
 
Big data-and-the-web
Big data-and-the-webBig data-and-the-web
Big data-and-the-web
 
2013McGinnissPhD
2013McGinnissPhD2013McGinnissPhD
2013McGinnissPhD
 
(Springer optimization and its applications 37) eligius m.t. hendrix, boglárk...
(Springer optimization and its applications 37) eligius m.t. hendrix, boglárk...(Springer optimization and its applications 37) eligius m.t. hendrix, boglárk...
(Springer optimization and its applications 37) eligius m.t. hendrix, boglárk...
 
Stochastic Processes and Simulations – A Machine Learning Perspective
Stochastic Processes and Simulations – A Machine Learning PerspectiveStochastic Processes and Simulations – A Machine Learning Perspective
Stochastic Processes and Simulations – A Machine Learning Perspective
 
Neural Networks on Steroids
Neural Networks on SteroidsNeural Networks on Steroids
Neural Networks on Steroids
 
Stock_Market_Prediction_using_Social_Media_Analysis
Stock_Market_Prediction_using_Social_Media_AnalysisStock_Market_Prediction_using_Social_Media_Analysis
Stock_Market_Prediction_using_Social_Media_Analysis
 
Distributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsDistributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data Streams
 
dissertation
dissertationdissertation
dissertation
 
Master_Thesis
Master_ThesisMaster_Thesis
Master_Thesis
 
main
mainmain
main
 
Efficient algorithms for sorting and synchronization
Efficient algorithms for sorting and synchronizationEfficient algorithms for sorting and synchronization
Efficient algorithms for sorting and synchronization
 
Efficient algorithms for sorting and synchronization
Efficient algorithms for sorting and synchronizationEfficient algorithms for sorting and synchronization
Efficient algorithms for sorting and synchronization
 
Interactive Filtering Algorithm - George Jenkins 2014
Interactive Filtering Algorithm - George Jenkins 2014Interactive Filtering Algorithm - George Jenkins 2014
Interactive Filtering Algorithm - George Jenkins 2014
 
High Performance Traffic Sign Detection
High Performance Traffic Sign DetectionHigh Performance Traffic Sign Detection
High Performance Traffic Sign Detection
 
Mak ms
Mak msMak ms
Mak ms
 

Mais de Yomna Mahmoud Ibrahim Hassan

1Computer Graphics new-L1-Introduction to Computer Graphics.pdf
1Computer Graphics new-L1-Introduction to Computer Graphics.pdf1Computer Graphics new-L1-Introduction to Computer Graphics.pdf
1Computer Graphics new-L1-Introduction to Computer Graphics.pdfYomna Mahmoud Ibrahim Hassan
 
Human Computer Interaction-fall2021 - CSC341-L1.pptx.pdf
Human Computer Interaction-fall2021 - CSC341-L1.pptx.pdfHuman Computer Interaction-fall2021 - CSC341-L1.pptx.pdf
Human Computer Interaction-fall2021 - CSC341-L1.pptx.pdfYomna Mahmoud Ibrahim Hassan
 
Word Tagging using Max Entropy Model and Feature selection
Word Tagging using Max Entropy Model and Feature selection Word Tagging using Max Entropy Model and Feature selection
Word Tagging using Max Entropy Model and Feature selection Yomna Mahmoud Ibrahim Hassan
 
Report on Knowledge Modeling in Various applications in Traffic Systems
Report on Knowledge Modeling in Various applications in Traffic SystemsReport on Knowledge Modeling in Various applications in Traffic Systems
Report on Knowledge Modeling in Various applications in Traffic SystemsYomna Mahmoud Ibrahim Hassan
 
Knowledge Modeling in Various applications in Traffic Systems
Knowledge Modeling in Various applications in Traffic SystemsKnowledge Modeling in Various applications in Traffic Systems
Knowledge Modeling in Various applications in Traffic SystemsYomna Mahmoud Ibrahim Hassan
 
Genetic Algorithms in Repeated Matrix Games: The Effects of Algorithmic Modif...
Genetic Algorithms in Repeated Matrix Games: The Effects of Algorithmic Modif...Genetic Algorithms in Repeated Matrix Games: The Effects of Algorithmic Modif...
Genetic Algorithms in Repeated Matrix Games: The Effects of Algorithmic Modif...Yomna Mahmoud Ibrahim Hassan
 
How a company may expand its share in the student/university market segment f...
How a company may expand its share in the student/university market segment f...How a company may expand its share in the student/university market segment f...
How a company may expand its share in the student/university market segment f...Yomna Mahmoud Ibrahim Hassan
 
Using Information Systems to Improve Businesses: The present and the future
Using Information Systems to Improve Businesses: The present and the futureUsing Information Systems to Improve Businesses: The present and the future
Using Information Systems to Improve Businesses: The present and the futureYomna Mahmoud Ibrahim Hassan
 
ECG beats classification using multiclass SVMs with ECOC
ECG beats classification using multiclass SVMs with ECOCECG beats classification using multiclass SVMs with ECOC
ECG beats classification using multiclass SVMs with ECOCYomna Mahmoud Ibrahim Hassan
 

Mais de Yomna Mahmoud Ibrahim Hassan (20)

W1_CourseIntroduction.pptx advancedgraphics
W1_CourseIntroduction.pptx advancedgraphicsW1_CourseIntroduction.pptx advancedgraphics
W1_CourseIntroduction.pptx advancedgraphics
 
First Umrah Application Details - A proposal
First Umrah Application Details - A  proposalFirst Umrah Application Details - A  proposal
First Umrah Application Details - A proposal
 
1Computer Graphics new-L1-Introduction to Computer Graphics.pdf
1Computer Graphics new-L1-Introduction to Computer Graphics.pdf1Computer Graphics new-L1-Introduction to Computer Graphics.pdf
1Computer Graphics new-L1-Introduction to Computer Graphics.pdf
 
Introduction to Google Colaboratory.pdf
Introduction to Google Colaboratory.pdfIntroduction to Google Colaboratory.pdf
Introduction to Google Colaboratory.pdf
 
Human Computer Interaction-fall2021 - CSC341-L1.pptx.pdf
Human Computer Interaction-fall2021 - CSC341-L1.pptx.pdfHuman Computer Interaction-fall2021 - CSC341-L1.pptx.pdf
Human Computer Interaction-fall2021 - CSC341-L1.pptx.pdf
 
Word Tagging using Max Entropy Model and Feature selection
Word Tagging using Max Entropy Model and Feature selection Word Tagging using Max Entropy Model and Feature selection
Word Tagging using Max Entropy Model and Feature selection
 
Social Learning
Social LearningSocial Learning
Social Learning
 
Planning Innovation
Planning InnovationPlanning Innovation
Planning Innovation
 
3alem soora : Submission to ITU competition
3alem soora : Submission to ITU competition3alem soora : Submission to ITU competition
3alem soora : Submission to ITU competition
 
Report on Knowledge Modeling in Various applications in Traffic Systems
Report on Knowledge Modeling in Various applications in Traffic SystemsReport on Knowledge Modeling in Various applications in Traffic Systems
Report on Knowledge Modeling in Various applications in Traffic Systems
 
Knowledge Modeling in Various applications in Traffic Systems
Knowledge Modeling in Various applications in Traffic SystemsKnowledge Modeling in Various applications in Traffic Systems
Knowledge Modeling in Various applications in Traffic Systems
 
Yomna Hassan CV 2014
Yomna Hassan CV 2014Yomna Hassan CV 2014
Yomna Hassan CV 2014
 
Image Annotation
Image AnnotationImage Annotation
Image Annotation
 
Heterogeneous data annotation
Heterogeneous data annotationHeterogeneous data annotation
Heterogeneous data annotation
 
Genetic Algorithms in Repeated Matrix Games: The Effects of Algorithmic Modif...
Genetic Algorithms in Repeated Matrix Games: The Effects of Algorithmic Modif...Genetic Algorithms in Repeated Matrix Games: The Effects of Algorithmic Modif...
Genetic Algorithms in Repeated Matrix Games: The Effects of Algorithmic Modif...
 
Sparks RSS Reader
Sparks RSS ReaderSparks RSS Reader
Sparks RSS Reader
 
How a company may expand its share in the student/university market segment f...
How a company may expand its share in the student/university market segment f...How a company may expand its share in the student/university market segment f...
How a company may expand its share in the student/university market segment f...
 
Using Information Systems to Improve Businesses: The present and the future
Using Information Systems to Improve Businesses: The present and the futureUsing Information Systems to Improve Businesses: The present and the future
Using Information Systems to Improve Businesses: The present and the future
 
ECG beats classification using multiclass SVMs with ECOC
ECG beats classification using multiclass SVMs with ECOCECG beats classification using multiclass SVMs with ECOC
ECG beats classification using multiclass SVMs with ECOC
 
Beginners XNA
Beginners XNABeginners XNA
Beginners XNA
 

Último

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 

Último (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 

Applicability of Interactive Genetic Algorithms to Multi-agent Systems: Experiments on Games Used in Smart Grid Simulations.

• 7. Contents

1 Introduction
  1.1 Problem Definition
  1.2 Motivation and Relevance to the Masdar Initiative
  1.3 Thesis statement
  1.4 Thesis Overview
2 Literature Review
  2.1 Electrical Power grids
    2.1.1 Smart grids
  2.2 Multi-agent systems
  2.3 Matrix games
    2.3.1 Types of matrix games
    2.3.2 Solution concepts
    2.3.3 Repeated matrix games
    2.3.4 Stochastic games
  2.4 Learning in repeated matrix games
    2.4.1 Belief-based learning
    2.4.2 Reinforcement learning
    2.4.3 No-regret learning
  2.5 Evolutionary algorithms
    2.5.1 Genetic algorithms
    2.5.2 Genetic algorithm structure
  2.6 Genetic algorithms in repeated matrix games
    2.6.1 Genetic algorithms in distributed systems
    2.6.2 Genetic algorithms in dynamic systems
  2.7 Interactive learning
    2.7.1 Interactive learning in repeated matrix games
    2.7.2 Interactive genetic algorithms
  2.8 Summary
3 Experimental Setup
  3.1 Games' structure
    3.1.1 Prisoner's dilemma
    3.1.2 Chicken
    3.1.3 Shapley's game
    3.1.4 Cooperative games
  3.2 Knowledge and Information
  3.3 Opponents
    3.3.1 GIGA-WoLF
    3.3.2 Q-learning
  3.4 Evaluation criteria
  3.5 Performance of GIGA-WoLF and Q-learning
    3.5.1 Prisoner's dilemma
    3.5.2 Chicken
    3.5.3 Shapley's game
    3.5.4 Cooperative games
  3.6 Summary
4 Learning using Genetic Algorithms
  4.1 Algorithm structure
    4.1.1 Basic genetic algorithm
    4.1.2 Genetic algorithm with history propagation
    4.1.3 Genetic algorithm with stopping condition
    4.1.4 Genetic algorithm with dynamic parameters' setting
    4.1.5 Genetic algorithm with dynamic parameters' setting and stopping condition
  4.2 Results and analysis
    4.2.1 Genetic algorithms vs. GIGA-WoLF
    4.2.2 Genetic algorithms vs. Q-learning
    4.2.3 Genetic algorithms in self play
  4.3 Conclusions
5 Interactive genetic algorithms
  5.1 Human input framework
    5.1.1 Evaluate the population
    5.1.2 Select set of histories
    5.1.3 Generate statistics for selected histories
    5.1.4 Generating a new population from human input
  5.2 Interactive genetic algorithms: Six variations
    5.2.1 Effect of input quality on the performance of GA
  5.3 Results and analysis
    5.3.1 Interactive genetic algorithms
    5.3.2 Modifications on interactive genetic algorithms
    5.3.3 The effect of human input quality on interactive genetic algorithms
  5.4 Conclusions
6 IGA in N-player matrix games
  6.1 N-player prisoner's dilemma
  6.2 Strategy representation
  6.3 Human input
  6.4 Results and analysis
  6.5 Conclusions
7 Conclusions and Future work
  7.1 Conclusions
  7.2 Future work
• 11. List of Tables

2.1 Payoff matrix for the Prisoner's dilemma.
2.2 Payoff matrix for the Prisoner's dilemma.
3.1 Payoff matrix for the Prisoner's dilemma.
3.2 Payoff matrix for the chicken game.
3.3 Payoff matrix of Shapley's game.
3.4 Payoff matrix of a fully cooperative matrix game.
4.1 Variables used within the algorithms.
4.2 Payoff matrix for the Prisoner's dilemma.
4.3 Payoff of a fully cooperative matrix game.
4.4 Payoff matrix for the chicken game.
4.5 Payoff matrix of Shapley's game.
5.1 Properties of the different variations of IGA algorithms.
5.2 Acceptable and unacceptable human inputs for the selected 2-agent matrix games.
• 12. List of Figures

2.1 A traditional electrical grid [8].
2.2 A smart electrical grid [8].
2.3 Payoff space for the prisoner's dilemma game [25].
2.4 Roulette wheel selection mechanism [30].
2.5 Crossover in genetic algorithms.
2.6 Mutation in genetic algorithms.
2.7 Basic structure of genetic algorithms.
2.8 The interactive artificial learning process [27].
3.1 Payoffs of GIGA-WoLF and Q-learning within selected games.
4.1 Chromosome structure.
4.2 Effect of variations on GA on final payoffs against GIGA-WoLF (all games).
4.3 Effect of variations on GA on final payoffs against GIGA-WoLF in prisoner's dilemma.
4.4 Effect of variations on GA on final payoffs against GIGA-WoLF in a cooperation game.
4.5 Effect of variations on GA on final payoffs against GIGA-WoLF in the chicken game.
4.6 Effect of variations on GA on final payoffs against GIGA-WoLF in Shapley's game.
4.7 Effect of variations on GA on final payoffs against Q-learning (all games).
4.8 Effect of variations on GA on final payoffs against Q-learning in prisoner's dilemma.
4.9 Effect of variations on GA on final payoffs against Q-learning in a cooperation game.
4.10 Effect of variations on GA on final payoffs against Q-learning in the chicken game.
4.11 Effect of variations on GA on final payoffs against Q-learning in Shapley's game.
4.12 Sample of the chromosomes generated vs. Q-learning in prisoner's dilemma.
4.13 Effect of history propagation on GA against Q-learning in prisoner's dilemma. Values shown are the average payoff per generation.
4.14 Effect of variations on GA against Q-learning in a cooperation game (average payoff per generation).
4.15 Effect of variations on GA on final payoffs in self-play (all games).
4.16 Effect of variations on GA on final payoffs in prisoner's dilemma.
4.17 Effect of variations on GA on final payoffs in a cooperation game.
4.18 Effect of variations on GA on final payoffs in the chicken game.
4.19 Effect of variations on GA on final payoffs in Shapley's game.
5.1 Human input framework.
5.2 Designed graphical user interface.
5.3 Evaluation metrics example.
5.4 Effect of human input on basic IGA against GIGA-WoLF (all games).
5.5 Effect of basic human input on the performance (final payoff) of GA vs. GIGA-WoLF in prisoner's dilemma.
5.6 Effect of basic human input on the performance (final payoff) of GA vs. GIGA-WoLF in a cooperation game.
5.7 Effect of basic human input on the performance (final payoff) of GA vs. GIGA-WoLF in the chicken game.
5.8 Effect of basic human input on the performance (final payoff) of GA vs. GIGA-WoLF in Shapley's game.
5.9 Effect of basic human input on GA against GIGA-WoLF in Shapley's game (average payoff per generation).
5.10 Effect of basic human input on GA on final payoffs against Q-learning (all games).
5.11 A sample of the chromosomes generated from basic IGA vs. Q-learning in prisoner's dilemma.
5.12 Effect of basic human input on the performance (final payoff) of GA vs. Q-learning in prisoner's dilemma.
5.13 Effect of basic human input on the performance (final payoff) of GA vs. Q-learning in a cooperation game.
5.14 Effect of basic human input on the performance (final payoff) of GA vs. Q-learning in the chicken game.
5.15 Effect of basic human input on the performance (final payoff) of GA vs. Q-learning in Shapley's game.
5.16 Effect of basic human input on GA on final payoffs in self-play (all games).
5.17 Effect of basic human input on GA in self-play in the chicken game (average payoff per generation).
5.18 Effect of basic human input on the performance (final payoff) of GA in self-play in prisoner's dilemma.
5.19 Effect of basic human input on the performance (final payoff) of GA in self-play in a cooperation game.
5.20 Effect of basic human input on the performance (final payoff) of GA in self-play in the chicken game.
5.21 Effect of basic human input on the performance (final payoff) of GA in self-play in Shapley's game.
5.22 Effect of variations on human input on GA against GIGA-WoLF (all games).
5.23 Effect of variations on IGA against GIGA-WoLF in prisoner's dilemma (average payoff per generation).
5.24 Effect of variations on IGA on final payoffs vs. GIGA-WoLF in prisoner's dilemma.
5.25 Effect of variations on IGA on final payoffs vs. GIGA-WoLF in a cooperation game.
5.26 Effect of variations on IGA on final payoffs vs. GIGA-WoLF in the chicken game.
5.27 Effect of variations on IGA on final payoffs vs. GIGA-WoLF in Shapley's game.
5.28 Effect of variations on IGA on final payoffs vs. Q-learning (all games).
5.29 Effect of variations on IGA on final payoffs vs. Q-learning in prisoner's dilemma.
5.30 Effect of variations on IGA on final payoffs vs. Q-learning in a cooperation game.
5.31 Effect of variations on IGA on final payoffs vs. Q-learning in the chicken game.
5.32 Effect of variations on IGA on final payoffs vs. Q-learning in Shapley's game.
5.33 Effect of variations on IGA on final payoffs in self-play (all games).
5.34 Effect of variations on IGA on final payoffs in self-play in prisoner's dilemma.
5.35 Effect of variations on IGA on final payoffs in self-play in a cooperation game.
5.36 Effect of variations on IGA on final payoffs in self-play in the chicken game.
5.37 Effect of variations on IGA on final payoffs in self-play in Shapley's game.
5.38 Human input quality and its effect on final payoffs of IGA against GIGA-WoLF (all games).
5.39 Human input quality and its effect on final payoffs of IGA vs. GIGA-WoLF in prisoner's dilemma.
5.40 Human input quality and its effect on final payoffs of IGA vs. GIGA-WoLF in a cooperation game.
5.41 Human input quality and its effect on final payoffs of IGA vs. GIGA-WoLF in the chicken game.
5.42 Human input quality and its effect on final payoffs of IGA vs. GIGA-WoLF in Shapley's game.
5.43 Human input quality and its effect on IGA vs. GIGA-WoLF in prisoner's dilemma (average payoff per generation).
5.44 Human input quality and its effect on IGA against GIGA-WoLF in a cooperation game (average payoff per generation).
5.45 Human input quality and its effect on final payoffs of IGA vs. Q-learning (all games).
5.46 Human input quality and its effect on final payoffs of IGA vs. Q-learning in prisoner's dilemma (per generation).
5.47 Human input quality and its effect on final payoffs of IGA vs. Q-learning in prisoner's dilemma.
5.48 Human input quality and its effect on final payoffs of IGA vs. Q-learning in a cooperation game.
5.49 Human input quality and its effect on final payoffs of IGA vs. Q-learning in the chicken game.
5.50 Human input quality and its effect on final payoffs of IGA vs. Q-learning in Shapley's game.
5.51 Human input quality and its effect on final payoffs of IGA in self-play (all games).
5.52 Human input quality and its effect on final payoffs of IGA in self-play in prisoner's dilemma.
5.53 Human input quality and its effect on final payoffs of IGA in self-play in a cooperation game.
5.54 Human input quality and its effect on final payoffs of IGA in self-play in the chicken game.
5.55 Human input quality and its effect on final payoffs of IGA in self-play in Shapley's game.
6.1 Payoff matrix of the 3-player prisoner's dilemma.
6.2 Relationship between the fraction of cooperators and the utility received by a game participant.
6.3 Final payoffs of selected opponents in 3-player prisoner's dilemma.
6.4 Effect of human input on the performance of GA in the 3-player prisoner's dilemma in self-play.
6.5 Effect of human input on the performance of GA in the 3-player prisoner's dilemma with 1-player as GIGA-WoLF and 1-player in self-play.
6.6 Effect of human input on the performance of GA in 3-player prisoner's dilemma with 2-players as GIGA-WoLF.
6.7 Effect of human input on the performance of GA in 3-player prisoner's dilemma with 1-player as Q-learning and 1-player in self-play.
6.8 Effect of human input on the performance of GA in 3-player prisoner's dilemma with 2-players as Q-learning.
6.9 Effect of human input on the performance of GA in 3-player prisoner's dilemma with 1-player as Q-learning and 1-player as GIGA-WoLF.
• 19. List of Algorithms

3.1 GIGA-WoLF
3.2 Q-learning
4.1 Basic genetic algorithm
4.2 Genetic algorithm with history propagation
4.3 Genetic algorithm with stopping condition
4.4 Genetic algorithm with dynamic parameters' setting
4.5 Genetic algorithm with dynamic parameters' setting and stopping condition
• 20. CHAPTER 1 Introduction

1.1 Problem Definition

In multi-agent systems (MAS), intelligent agents interact with each other, each seeking to maximize its own welfare. In many instances, these agents need to learn over time in order to become more successful. One of the main issues within MAS is the ability of each agent in the system to learn effectively and co-exist with the heterogeneous agents around it. MAS is considered one of the most prominent fields of research because its structure describes many real-life problems. Extensive research has been performed in an effort to design a learning algorithm applicable to MAS [10, 53, 91]. However, proposed solutions typically suffer from at least one of the following problems:
1. Inability to adapt as situations within the system become increasingly dynamic.
2. Settlement into myopic, non-evolving solutions.
• 21. 3. Requirement of extensive learning time in order to reach an acceptable solution.

Power systems are widely used as an example of MAS [51, 106]. In such systems, each consumer can be considered an agent. Each agent in this multi-agent system must learn an intelligent behavior in order to maximize its own gain. In this case, the gain from the consumer's perspective is the ability to satisfy the user's consumption needs and preferences.

1.2 Motivation and Relevance to the Masdar Initiative

Electricity consumption has increased drastically in the past decade as a result of an enormous increase in population and technology. In the UAE, for instance, there has been a sudden increase in the usage of high-tech appliances and the ability to add more electrical devices than ever before [70]. This increase in consumption, while still relying on the old electrical grid as a means of distributing electricity, leads to high losses in electricity. This is partially due to the variation in consumption among different sectors, where each sector consumes electricity based on different schedules. Another recent development in modern power systems is the entry of renewable energy sources on a large scale. The use of these energy sources, which generate an intermittent and less predictable supply, is expected to continue to increase over the next few decades to reduce the consumption of less environmentally friendly energy resources. To effectively handle these forms of electricity generation and increased electricity usage, a more intelligent distribution structure must be implemented. This intelligent solution is referred to in the literature as the "smart grid" [16]. Electricity dispatch, in which electricity supply must be matched with electricity demand, is a common problem in electrical grids. Research and industrial
• 22. work has been dedicated to designing systems that are capable of taking information from generators and consumers to determine how to distribute electricity effectively. These types of systems are called "electricity management systems" [105]. Traditional management systems typically use a centralized structure and rely on a market operator to manage electricity distribution [99]. This central agent determines the distribution of electricity either by applying a constant scheduling mechanism or by running an off-line prediction mechanism to model and forecast supply and demand [59]. However, management systems relying on a single central operator do not match the way an electricity market operates. Normally, electricity markets rely on different entities (including generators, distributors and consumers) to make decisions. As such, electricity markets are better modeled by distributed interactions between consumers and distributors. Furthermore, traditional management systems have other drawbacks, such as:
1. The extensive computational power required at the decision center, which results in slow response times and an inability to keep up with real-time updates [37].
2. The system's inability to respond adequately when an event not covered by the system occurs. This is the result of the static structure of the scheduling mechanism running in the system, and its inability to adapt to regulatory changes in the environment in real time [37].
3. The algorithm running at the center of the system must be completely redesigned if the configuration changes (addition or removal of an element or an agent) [37].
4. The centralized model focuses solely on the overall response of the power grid, making it difficult to model real-time interactions between different entities [59].
• 23. However, the realization of a distributed power system requires the development of additional capabilities to manage electricity transmission efficiently without relying on a central agent. By moving to a multi-agent perspective, where we rely on different decision makers, we must determine how charges can be distributed amongst generation companies, how to take transmission constraints into account, and which regulation mechanisms are suitable for such a system [99]. Additionally, informational and motivational aspects of the agents, such as beliefs, desires, intentions and commitments, must be investigated [13]. Furthermore, with variations in workloads and energy sources, it is hard to define a single policy that will perform well in all cases [32]. This leads to a requirement for making the agents more intelligent and adaptable. To achieve this, agents need to evolve their own behavior with respect to scheduling their consumption according to these changes, without the need to redesign their decision-making structure. This can potentially be achieved by applying game theory and machine learning concepts.

Previous work has investigated the use of evolutionary algorithms (EAs) within the area of electricity management [67]. These algorithms have been shown to be successful under specific conditions [81, 20, 83]. Unfortunately, under other conditions, EAs tend to perform poorly. One such situation occurs when there is a very large solution space defined by two or more interacting subspaces. One solution to this challenge is to run EAs on a distributed, multi-agent structure of the system, using the interaction between agents as a means of decreasing the search space: the space is divided among the different agents, and each agent benefits from the others' experience [102]. This introduces the concept of a "multi-objective fitness function", where the EA works with multiple fitness functions for the different agents. An ideal solution, in most cases, does not exist because of the contradictory nature of the objective functions, so compromises have to be made.
• 24. CHAPTER 1. INTRODUCTION 5 that exists in most existing learning algorithms: the inability to adapt quickly to changes in the environment and user preferences. EAs are expected to work best if the market is not volatile. Research has been done to try to overcome these problems in evolutionary algorithms, mainly in single-agent systems. One suggested solution is to utilize human input through “interactive evolutionary algorithms” [33, 6]. In these algorithms, human input is used to decrease the amount of time needed to reach a stable and efficient policy for the system, to ensure that the policy follows human preferences, and to make it robust enough to cope with variations in the system. To the best of our knowledge, the idea of using interactive evolutionary algorithms in multi-agent systems has not been studied to date. Therefore, our objective is to find a learning algorithm that can be used by individual entities in power systems to effectively acquire and manage energy resources to satisfy user preferences. This learning algorithm should be able to adapt to the behavior of other learning entities, to changes in the environment and to changes in user preferences. 1.3 Thesis statement In this research, we study the performance of genetic algorithms (GAs) as a learning methodology for an agent within a multi-agent system. We discuss the effect of integrating human input into GAs (known as interactive genetic algorithms) within a multi-agent system. By conducting different experiments, we try to identify how human input can be integrated into GAs, and we test the applicability of interactive genetic algorithms (IGAs) in repeated matrix games. The matrix games we use in our experiments cover different possibilities and variations of the agents’ payoffs in order to examine the effect of this variation on the algorithms’ performance.
• 25. CHAPTER 1. INTRODUCTION 6 We run different variations of our algorithms against different learning opponents, including themselves, Q-learning [100] and GIGA-WoLF [10]. 1.4 Thesis Overview Throughout this thesis, we give a detailed analysis of related work on the aforementioned problem. We begin with a literature review of related topics in chapter 2, such as multi-agent systems, genetic algorithms, and interactive genetic algorithms. In chapter 3, we move to an overview of the problem and our experimental setup. We show the different variations of GAs implemented in our experiments in chapter 4, and we study the effect of each of these variations on the final performance of the system. The next part of the thesis discusses and evaluates a potential framework for integrating human input into GAs. In chapter 5, we test the suggested framework in repeated matrix games and evaluate its performance against the previously selected learning opponents. In order to evaluate the scalability of our algorithms, in chapter 6 we apply the GA with and without human input in a 3-player environment. We show how GAs perform in such an environment and whether there are certain features that do not carry over to larger-scale environments. This study helps us gain an understanding of how our algorithms will perform within more complex systems. Finally, we give a detailed discussion of the results (Chapter ??). This discussion helps in deriving the conclusions presented in chapter 7. We then suggest potential future research based on this thesis.
• 26. CHAPTER 2 Literature Review In this chapter, we present an overview of different fields related to our research. We also discuss related work and explain how it informs our experiments. We start by giving an overview of electrical power grids in section 2.1. Then, we connect them to multi-agent systems, and explain the structure of a standard MAS problem. We then move to the specific branch of MAS problems that we study in this thesis, namely repeated matrix games. After giving the required background about the problem structure, we give background about the solution methodology we use. We give an overview of evolutionary algorithms in general (which include genetic algorithms), after which we explain the history and structure of genetic algorithms. We then examine the work done in the field of learning in multi-agent systems, and relate the work done on genetic algorithms to learning in multi-agent systems. This relation is materialized through different topics including genetic algorithms within dynamic systems, genetic algorithms for developing strategies in matrix games and
• 27. CHAPTER 2. LITERATURE REVIEW 8 finally, our main topic, interactive genetic algorithms. 2.1 Electrical Power grids An electrical power grid is a network of interconnected entities, in which electricity generators are connected to consumers through a set of transmission lines and distributors (Figure 2.1). Existing electrical grids mainly rely on classic control systems. The structure of these systems is based on the following: the generator generates a certain amount of electricity (depending on its capacity), which is then distributed to the consumers through the distributors. This structure faces many difficulties, mainly when it has to deal with variable types of generators, distributors and consumers. In real-life systems, this variety is to be expected. On the generators’ side, renewable energy sources have become extensively used for power generation, especially within the past decade. This leads to intermittency in the supply that is usually hard to model [48]. On the consumers’ side, although much research has targeted modeling electricity demand and consumption patterns [43, 24], demand is not always predictable. This unpredictability leads to many constraints regarding distribution (also called a demand-supply problem), and requires additional research into methods that can efficiently enhance the generation, distribution and consumption cycles through more intelligent means. 2.1.1 Smart grids In order to enhance the efficiency of electricity generation, distribution and consumption within the power grid, a logical solution is the integration of intelligent automation systems into the electrical power grid to form the “smart grid” [37]. A smart grid (see Figure 2.2) delivers electricity from suppliers to consumers
  • 28. CHAPTER 2. LITERATURE REVIEW 9 Figure 2.1: A traditional Electrical Grid [8]. using two-way digital communications to control appliances at consumers’ homes. This could save energy, reduce costs and increase reliability and transparency if the risks inherent in executing massive information technology projects are avoided. Smart grids are being promoted by many governments as a way of addressing energy independence, global warming and emergency resilience issues [37]. In our research, we consider the management of electricity from the consumer side, where an “agent” here represents a consumer, which tries to satisfy its needs in the presence of external circumstances.
• 29. CHAPTER 2. LITERATURE REVIEW 10 Figure 2.2: A smart electrical grid [8]. 2.2 Multi-agent systems A multi-agent system (MAS) is a system composed of multiple interacting intelligent agents. Many real-world problems, including electricity grids’ demand-response problems, can be naturally modeled as a MAS [78, 106]. In power systems, each entity (distributors, consumers, and generators) is represented as an agent. Each agent may have an individual goal to reach while interacting with other entities (agents). Much research has been done in the field to solve the electrical supply-demand problem using multi-agent simulations [97, 80]. As we are testing a new technique in this thesis, we wanted to base our work on research trends that have been used before in the field of simulating electrical power grids. One of these trends is considering our MAS as a simple matrix game (more details in the following section). This decision was based on the fact that various electrical grid, electricity scheduling and electricity market simulators use matrix games [61, 73, 62, 5]. The supplier’s goal in these games is to supply power to
• 30. CHAPTER 2. LITERATURE REVIEW 11 the consumers at the best price possible, while maintaining stability in the grid. On the other hand, the consumer agents must satisfy their needs while minimizing their costs (depending on the consumers’ preferences). All of these goals should be satisfied keeping in mind the existence of other agents and external influences. In order to achieve its goal, each agent follows a “strategy,” which is either fixed or adapts over time. As a result of this extensive usage as a representation of the electricity market problem, we choose to evaluate our algorithms within repeated matrix games. In the following section we give more details about matrix games and their structure. 2.3 Matrix games Matrix games are a subset of what are called “stochastic games.” In these games, each player takes an action, which produces a reward. In a matrix game (also called a “normal form game”), the payoffs (rewards) over the players’ joint action space are defined in the form of a matrix. This action space represents the set of possible actions that each player can perform within the game. Depending on the rewards they get, the players decide the “strategy” they are going to follow, where the strategy represents the decision of which action to play over time [42]. For clarification, consider the game represented in Table 2.1. In this matrix game, we have two players, player 1 and player 2. Player 1 (row player) can play either action A or B (so its strategy space is {A, B}). Likewise, Player 2 (column player) can play either action a or b. From this we can conclude that the set of possible joint actions is {(A, a), (A, b), (B, a), (B, b)}. Each cell within the matrix represents the reward for each player when that joint action occurs. In the matrix given as an example, the payoff to the row player (player 1) is listed first, followed by the payoff to the column player (player 2). For example, if
• 31. CHAPTER 2. LITERATURE REVIEW 12
         a        b
A      -1,2      3,2
B       0,0      2,-1
Table 2.1: Payoff matrix for an example two-player matrix game.
the row player plays B and the column player plays b (which is the joint action (B, b)), then the row player receives a payoff of 2, while the column player receives a payoff of -1. A strategy for an agent (player) i is a distribution πi over its action set Ai. Equivalently, it can be viewed as the probability with which the player selects each of its actions. A strategy can be either a pure strategy (where the probability of playing one of the actions is 1, while the probability of playing any of the other actions is 0), or a mixed strategy, where each action is played with a certain probability over time. The joint strategy played by the n agents is π = (π1, π2, ..., πn) and, thus, ri(π) is the expected payoff for agent i when the joint strategy π is played. Depending on the actions taken, a player’s situation changes over time. The situation of the player can be represented in what is called a “state” [11]. In this example, the situation where each player receives a certain payoff in the presence of a certain joint-action pair represents the state. 2.3.1 Types of matrix games Matrix games can be divided into different types using different criteria. In this section, we discuss the main criteria that affected how we selected the games used for the experiments. Details about the exact games used in the experiments are given in the next chapter.
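To make this notation concrete, the following short Python sketch (our own illustrative code, not material from the cited sources; the payoff values are those of Table 2.1) represents a two-player matrix game as an array of joint payoffs and computes the expected payoff ri(π) of each player under a joint mixed strategy.

# Sketch: a two-player matrix game and expected payoffs under mixed strategies.
# payoffs[i][j] = (row player's payoff, column player's payoff) for joint action (i, j).
payoffs = [
    [(-1, 2), (3, 2)],   # row action A against column actions a, b
    [(0, 0), (2, -1)],   # row action B against column actions a, b
]

def expected_payoffs(pi_row, pi_col):
    """Expected payoff r_i(pi) of each player for the joint mixed strategy (pi_row, pi_col)."""
    r_row = r_col = 0.0
    for i, p_i in enumerate(pi_row):
        for j, p_j in enumerate(pi_col):
            prob = p_i * p_j
            r_row += prob * payoffs[i][j][0]
            r_col += prob * payoffs[i][j][1]
    return r_row, r_col

# A pure strategy puts probability 1 on a single action ...
print(expected_payoffs([0.0, 1.0], [1.0, 0.0]))   # joint action (B, a)
# ... while a mixed strategy plays each action with some probability over time.
print(expected_payoffs([0.5, 0.5], [0.5, 0.5]))

The same structure extends directly to games with more actions or more players by adding dimensions to the payoff array.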
• 32. CHAPTER 2. LITERATURE REVIEW 13 Cooperative vs. non-cooperative games A cooperative game represents a situation in which cooperation amongst the players is the most beneficial to them. Therefore, these games mainly require an efficient arrangement between players to reach the cooperative behavior. An example of these games is a coordination game [17]. On the other hand, non-cooperative games are not defined as games in which players do not cooperate, but as games in which any cooperation must be self-enforcing. Most realistic problems fall under non-cooperative games. Symmetric vs. asymmetric games A symmetric game is a game where the payoffs for playing a particular strategy depend only on the other strategies employed, not on who is playing them. If the order of the players can be changed without changing the payoff to the strategies, then the game is symmetric. On the other hand, in asymmetric games, the action sets of the players differ from one another. For simplicity, we focus in this research on symmetric games. 2.3.2 Solution concepts In matrix games, there are concepts through which we can identify strategies for the players, which, if played, can lead to a state known as equilibrium. These concepts can be useful in evaluating the performance of the strategy played. We give an overview of some of these concepts, which are directly related to our research. Best response A best response is a strategy that produces the most favorable outcome for a player, given the other players’ strategies [36]. Therefore, the strategy πi* is a best response
• 33. CHAPTER 2. LITERATURE REVIEW 14 for agent i if ri(πi*, π−i) ≥ ri(πi, π−i) for all possible πi. Nash equilibrium A Nash equilibrium is a set of strategies, one for each player, with the property that no player can unilaterally change his strategy and get a better payoff. The Nash equilibrium (NE) has had the most impact on the design and evaluation of multi-agent learning algorithms to date. The concept of a Nash equilibrium is based on the best response: when all agents play best responses to the strategies of the other agents, the result is an NE. Nash showed that every game has at least one NE [74]. However, there is no known algorithm for calculating NEs in polynomial time [77]. If we consider that all players are self-interested, each of them would tend to play the best response to the strategies of the other agents, if they know them, therefore resulting in an NE. Many games have an infinite number of NEs. In the case of repeated games, these NEs are called NEs of the repeated game (rNEs), which we will discuss shortly. Therefore, the main goal of an intelligent learning agent is not just to play a best response to the surrounding agents, but also to influence other agents to play according to what is profitable to the agent as much as possible. Maximin Maximin is a decision rule used in different fields for minimizing the worst possible loss while maximizing the potential gain. Alternatively, it can be thought of as maximizing the minimum gain. The maximin theorem states [75]: For every two-person, zero-sum game with finite strategies, there exists a value V and a mixed strategy for each player, such that (a) given player 2’s strategy, the best payoff possible for player 1 is V, and (b) given player 1’s strategy, the best payoff possible for player 2 is −V [75].
• 34. CHAPTER 2. LITERATURE REVIEW 15 Equivalently, Player 1’s strategy guarantees him a payoff of V regardless of Player 2’s strategy, and similarly Player 2 can guarantee himself a payoff of −V. The name minimax arises because each player minimizes the maximum payoff possible for the other; since the game is zero-sum, he thereby also maximizes his own minimum payoff. Pareto efficiency Named after Vilfredo Pareto, Pareto efficiency (optimality) is a measure of efficiency. An outcome of a game is Pareto efficient if there is no other outcome that makes every player at least as well off and at least one player strictly better off. That is, if an outcome is Pareto optimal, any other outcome that makes some player better off must give at least one other player a lower payoff. 2.3.3 Repeated matrix games A repeated matrix game, as the name suggests, is a matrix game that is played repeatedly. The joint action taken by the agents identifies the payoff (reward) in each round (or stage) of the game, which can help the player decide, through learning, which action to take next. The task of a learning agent in a repeated matrix game is to learn to play a strategy πi such that its average payoff over time (denoted r̄i for agent i) is maximized. Let r̄i be given by:
r̄i = (1/T) Σ_{t=1}^{T} ri(πi^t, π−i^t)    (2.1)
where πi^t is the strategy played by agent i at time t, π−i^t is the joint strategy played by all the agents except agent i at time t, and 1 ≤ T ≤ ∞ is the number of episodes in the game. In our work, we consider simultaneous-action games, where in each round both agents play an action (without knowing the actions of the other agents within the same round).
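The following minimal simulation sketch illustrates how the average payoff r̄i of Equation (2.1) accumulates over the episodes of a repeated game. The two hand-written strategies (tit-for-tat and always-defect) and the episode count are illustrative assumptions only, not the learners used in our experiments.

# Sketch: average payoff over T episodes of a repeated matrix game (Equation 2.1).
# Prisoner's dilemma payoffs (row player, column player) as in Table 2.2.
payoffs = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def always_defect(opponent_actions):
    return "D"

def tit_for_tat(opponent_actions):
    # Cooperate first, then repeat the opponent's previous action.
    return "C" if not opponent_actions else opponent_actions[-1]

T = 1000
row_hist, col_hist = [], []
total_row = total_col = 0.0
for t in range(T):
    # Simultaneous moves: each player only sees the other's past actions.
    a_row = tit_for_tat(col_hist)
    a_col = always_defect(row_hist)
    r_row, r_col = payoffs[(a_row, a_col)]
    total_row += r_row
    total_col += r_col
    row_hist.append(a_row)
    col_hist.append(a_col)

print(total_row / T, total_col / T)   # the average payoffs of Equation (2.1)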
• 35. CHAPTER 2. LITERATURE REVIEW 16 Figure 2.3: Payoff space for the prisoner’s dilemma game [25]. In order to evaluate the strategies played and to see different equilibria, we note that the concept of the one-shot Nash equilibrium does not fully represent equilibrium in repeated games. Therefore, in order to define the concept of NE within a repeated game, we discuss what is called the folk theorem [90]. Consider the case of the prisoners’ dilemma, shown in Figure 2.3, which shows the joint payoffs of the two players. The x-axis shows the payoffs of the row player and the y-axis shows the payoffs of the column player. The combination of the shaded regions (light and dark) in the figure represents what is called the “convex hull,” which contains all the joint payoffs achievable within the game. As can be noticed, on average, a player guarantees itself a higher stable payoff by playing defect; neither of the players has an incentive to receive an average payoff (over time) less than 0. Therefore, the darkly shaded region in the figure shows the set of expected joint payoffs that the agents may possibly accept as average payoffs within each step of the game. The folk theorem states that any joint payoffs in the convex hull can be sustained
• 36. CHAPTER 2. LITERATURE REVIEW 17
             Cooperate   Defect
Cooperate       3,3        0,5
Defect          5,0        1,1
Table 2.2: Payoff matrix for the Prisoner’s dilemma.
by an rNE, provided that the discount rates of the players are close to unity (i.e., players believe that play will continue with high probability after each episode). This theorem helps us understand that, in repeated games, it is possible to have an infinite number of NEs. 2.3.4 Stochastic games In real-life situations, the more detailed stochastic games (in which the joint action taken can move the players from one state, and hence one payoff matrix, to another) can be considered more informative and suitable for modeling. However, research has examined whether learning algorithms that work within repeated matrix games can be extended to repeated stochastic games [26]. This extension was found to give suitable results for the prisoner’s dilemma and its stochastic version (for 2-agent games), which gives us the motivation to pursue our current experimentation within matrix games. 2.4 Learning in repeated matrix games In this section, we give an overview of related work in multi-agent learning, especially algorithms that have been used within matrix games. Because of the large number of multi-agent learning algorithms found in the literature, we restrict our attention to those that have had the most impact on the multi-agent learning community as well as to those that seem to be particularly connected to the work presented in this thesis. We divide the learning algorithms that we review into three different (although related) categories: belief-based learning, reinforcement
• 37. CHAPTER 2. LITERATURE REVIEW 18 learning, and no-regret learning [26]. 2.4.1 Belief-based learning Belief-based learning is based on the idea of constructing a model of the opponent’s behavior. These models usually rely on previous interactions with the opponent. Using this model, the agent tries to find a best response to the opponent’s predicted behavior. One of the best-known belief-based learning algorithms is fictitious play [15]. 2.4.2 No-regret learning A no-regret algorithm compares its performance with the “best action” available within its set. Regret in this case is defined as the difference between the rewards obtained by the agent and the rewards it might have obtained had it followed some other combination of its history of actions. In the long run, a no-regret algorithm plays such that it has little or no regret for not having played any other strategy. GIGA-WoLF [10] is one example of a no-regret algorithm. We describe GIGA-WoLF in greater detail within the next chapter. 2.4.3 Reinforcement learning Reinforcement learning (RL) methods involve learning what to do so as to maximize (future) payoffs. RL agents use trial and error to learn which strategies produce the highest payoffs. The main idea of reinforcement learning is that, through time, the learner tries to take the action that maximizes a certain reward. Reinforcement learning is widely used within matrix game environments [91, 52]. There are several known learning algorithms that can be identified as reinforcement learning, including Q-learning [100] and evolutionary algorithms [70].
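As a concrete instance of the belief-based category above, the sketch below implements a minimal fictitious-play learner: it keeps empirical counts of the opponent's past actions and plays a best response to that empirical model. The class name, the uniform prior, and the random stand-in opponent are our own illustrative choices, not a reimplementation of any specific cited algorithm.

import random

# Sketch: fictitious play, a belief-based learner for the row player of the
# prisoner's dilemma (payoffs as in Table 2.2).
ROW_PAYOFF = {"C": {"C": 3, "D": 0}, "D": {"C": 5, "D": 1}}

class FictitiousPlay:
    def __init__(self, actions=("C", "D")):
        self.actions = actions
        self.opponent_counts = {a: 1 for a in actions}   # uniform prior over opponent actions

    def act(self):
        total = sum(self.opponent_counts.values())
        beliefs = {a: c / total for a, c in self.opponent_counts.items()}
        # Best response to the empirical model of the opponent.
        return max(self.actions,
                   key=lambda mine: sum(p * ROW_PAYOFF[mine][theirs]
                                        for theirs, p in beliefs.items()))

    def observe(self, opponent_action):
        self.opponent_counts[opponent_action] += 1

learner = FictitiousPlay()
for _ in range(50):
    my_action = learner.act()
    opponent_action = random.choice(["C", "D"])   # stand-in opponent
    learner.observe(opponent_action)
print(learner.act())   # defection is a best response in the prisoner's dilemma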
• 38. CHAPTER 2. LITERATURE REVIEW 19 2.5 Evolutionary algorithms Evolutionary algorithms are a popular form of reinforcement learning. Evolutionary algorithms (EAs) are population-based metaheuristic optimization algorithms that use biology-inspired mechanisms like mutation, crossover, natural selection, and survival of the fittest in order to refine a set of solution candidates iteratively. Each iteration of an EA involves a competitive selection designed to remove poor solutions from the population. The solutions with high “fitness” are recombined with other solutions by swapping parts of one solution with another. Solutions are also mutated by making a small change to a single element of the solution. Recombination and mutation are used to generate new solutions that are biased towards the solutions that are most fit [58]. This process is repeated until the solution population converges to solutions with high fitness values. In general, evolutionary algorithms are considered an effective optimization method [2]. The survival-of-the-fittest concept, together with the evolutionary process, promotes better adaptation of the population [58]. 2.5.1 Genetic algorithms A Genetic Algorithm (GA) [45] is a type of evolutionary algorithm. GAs are based on a biological metaphor, in which learning is a competition among a population of evolving candidate problem solutions. A fitness function evaluates each solution to decide whether it will contribute to the next generation of solutions. Then, through operations analogous to gene transfer in sexual reproduction, the algorithm creates a new population of candidate solutions [65, 44]. The main feature of GAs is that they typically encode the problem within binary-string individuals, although other encoding techniques also exist. Another feature of GAs is their simplicity as a concept, and their parallel search nature, which makes it possible to easily modify GAs so they can be adapted to a distributed environment [18].
• 39. CHAPTER 2. LITERATURE REVIEW 20 2.5.2 Genetic algorithm structure In this section we give a description of how GAs work. In order to get a better understanding of the algorithms, we define certain terminology that we use throughout subsequent sections. Fitness Fitness is the value of the objective function for a certain solution. The goal of the algorithm is either to minimize or maximize this value, depending on the objective function. Genome (“chromosome”) A genome, frequently called a chromosome, is the representation of a solution (a strategy in the case of matrix games) that is to be played at a certain point in time. The GA generates various chromosomes, each of which is assigned a certain fitness according to its performance. Using this fitness, the known evolutionary functions are applied in order to create a new population (generation) of chromosomes. Gene Genes are the units that form a certain genome (chromosome). The evolutionary functions such as mutation and crossover are mainly performed on the genes within the chromosomes. Solution space The solution space defines the set of all possible chromosomes (solutions) within a certain system. Through the evolutionary functions, we try to cover as much of the solution space as possible for proper evaluation, without trying all possible solutions one by one as in a brute-force search, in order to save time. Having given these definitions, we now describe how GAs work. In the following sections, we discuss the main components that vary among different GAs according to the application. These components include: chromosome
• 40. CHAPTER 2. LITERATURE REVIEW 21 representation, selection process (fitness calculation and representation), mutation process, and crossover process. Chromosome structure As previously mentioned, one important feature of GAs is their focus on fixed-length character strings, although variable-length strings and other structures have been used. Within matrix games, these character strings represent the binary encoding of a certain strategy [3, 31]. But others have used non-binary encodings, depending on their application [86]. It should be noted, however, that there are special cases in which we consider the use of a binary encoding perfectly acceptable. In a prisoner’s dilemma, for example, agents have to make decisions that are intrinsically binary, namely decisions between cooperation and defection. The use of a binary encoding of strategies then seems like a very natural choice that is unlikely to cause undesirable artifacts [4]. Fitness functions and selection The fitness function is a representation of the quality of each solution (chromosome). This representation varies from one application to another. According to the fitness value, we select the fittest chromosomes and then perform crossover and mutation on these chromosomes to generate the new chromosomes. Several selection techniques exist, including roulette wheel selection, rank-based selection, elitism and tournament-based selection. In the following sections, we discuss each method. We exclude tournament-based selection, as it requires experimenting with and testing solutions during the selection process, which is not suitable for our application since it is an off-line training technique.
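Before turning to the individual selection methods, the following sketch illustrates the kind of binary strategy encoding discussed above: a deterministic memory-one prisoner's dilemma strategy stored as a 4-bit chromosome, one bit per possible previous joint action. This is our own illustrative construction in the spirit of the cited encodings, not the exact representation used later in this thesis.

# Sketch: a memory-one prisoner's dilemma strategy as a 4-bit chromosome.
# Bit k gives the action to play (0 = cooperate, 1 = defect) after the
# previous joint action indexed by k.
PREVIOUS_JOINT_ACTIONS = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]

def decode(chromosome, last_joint_action):
    """Map the previous joint action (my action, opponent's action) to my next move."""
    k = PREVIOUS_JOINT_ACTIONS.index(last_joint_action)
    return "D" if chromosome[k] == 1 else "C"

# Example chromosomes: tit-for-tat only reacts to the opponent's previous action.
tit_for_tat = [0, 1, 0, 1]       # cooperate after (., C), defect after (., D)
always_defect = [1, 1, 1, 1]

print(decode(tit_for_tat, ("D", "C")))    # -> "C"
print(decode(always_defect, ("C", "C")))  # -> "D"

Keeping a longer history, more actions, or more players enlarges the set of indexable histories and therefore the chromosome length, a point discussed further in Section 2.6.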
• 41. CHAPTER 2. LITERATURE REVIEW 22 Figure 2.4: Roulette wheel selection mechanism [30]. Roulette wheel selection Parents are selected according to their fitness. The better the chromosomes are, the more chances they have to be selected. In order to get a better understanding, imagine a roulette wheel on which all chromosomes in the population are placed. Each chromosome gains a share of the wheel proportional to its fitness (Figure 2.4). A marble is thrown onto the wheel, and the chromosome on which it lands is selected. The fittest chromosomes thus have a higher chance of being selected. Rank-based selection Roulette wheel selection has problems when the fitnesses vary widely. For example, if the best chromosome’s fitness covers 90 percent of the roulette wheel, then the other chromosomes will have very few chances to be selected. Rank selection first ranks the population, and then every chromosome receives a fitness from this ranking. The worst will have a fitness of 1, the second worst will have a fitness of 2, and this continues until we reach the best chromosome. The best chromosome will have a fitness of N (where N is the number of
• 42. CHAPTER 2. LITERATURE REVIEW 23 Figure 2.5: Crossover in genetic algorithms. chromosomes in the population). Elitism The idea of elitism has already been introduced. When creating a new population by crossover and mutation, there is a large chance that we will lose the best chromosome. Elitism starts by copying a certain percentage of the best chromosomes to the new population. The rest of the population is then created by applying mutation and crossover to the selected elite. Elitism can very rapidly increase the performance of a GA because it prevents the algorithm from forgetting the best solution found so far. Crossover and Mutation Crossover and mutation are the main functions of any genetic algorithm after selection. They are the functions responsible for the creation of new chromosomes out of the existing chromosomes. In the crossover phase, all of the selected chromosomes are paired up, and with a probability called the “crossover probability,” they are mixed together so that a certain part of one of the parents is replaced by a part of the same length from the other parent chromosome (Figure 2.5). The crossover is accomplished by randomly
• 43. CHAPTER 2. LITERATURE REVIEW 24 Figure 2.6: Mutation in genetic algorithms. choosing a site along the length of the chromosome, and exchanging the genes of the two chromosomes for each gene past this crossover site. After the crossover, each of the genes of the chromosomes (except for the elite chromosome) is mutated to another value with a probability defined as the “mutation probability” (Figure 2.6). With the crossover and mutation completed, the chromosomes are once again evaluated for another round of selection and reproduction. Setting the parameters concerned with crossover and mutation is mainly dependent on the application at hand and the chromosome structure [45]. Algorithm summary Genetic algorithms are based on the fundamental algorithm structure shown in Figure 2.7. First, an initial population of N individuals, which evolves at each generation, is created. Generally, we can say that a generation of solutions is obtained from the previous generation through the following procedure: solutions are randomly selected from the current population. Pairs of selected individuals are then submitted to the crossover operation with a given crossover probability Pc. Each descendant is then submitted to a mutation operation with a mutation probability Pm, which is usually very small. The chromosome’s ability to solve the problem is determined by its fitness function; the final step in the generation process is the substitution of low-performing individuals of the current population with the new descendants.
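Putting the operators described above together, a single generation step might look like the following sketch (roulette-wheel selection, one elite chromosome carried over unchanged, single-point crossover, and per-gene mutation). The parameter values and the toy fitness used in the usage example are placeholders, not the settings used in our experiments.

import random

# Sketch: one GA generation using the operators described above.
P_CROSSOVER, P_MUTATION, N_ELITE = 0.9, 0.02, 1

def roulette_select(population, fitnesses):
    """Fitness-proportionate ("roulette wheel") selection of one parent."""
    total = sum(fitnesses)
    pick = random.uniform(0, total)
    running = 0.0
    for chrom, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return chrom
    return population[-1]

def crossover(parent1, parent2):
    """Single-point crossover applied with probability P_CROSSOVER."""
    if random.random() < P_CROSSOVER:
        point = random.randrange(1, len(parent1))
        return parent1[:point] + parent2[point:]
    return parent1[:]

def mutate(chromosome):
    """Flip each gene independently with probability P_MUTATION."""
    return [1 - g if random.random() < P_MUTATION else g for g in chromosome]

def next_generation(population, fitnesses):
    # Elitism: carry the best chromosome(s) over unchanged.
    ranked = sorted(zip(fitnesses, population), key=lambda pair: pair[0], reverse=True)
    new_pop = [chrom[:] for _, chrom in ranked[:N_ELITE]]
    # Fill the rest of the population with selected, recombined, mutated offspring.
    while len(new_pop) < len(population):
        p1 = roulette_select(population, fitnesses)
        p2 = roulette_select(population, fitnesses)
        new_pop.append(mutate(crossover(p1, p2)))
    return new_pop

# Toy usage: 8 random 6-bit chromosomes scored by the number of 1s.
pop = [[random.randint(0, 1) for _ in range(6)] for _ in range(8)]
pop = next_generation(pop, [sum(c) for c in pop])
print(pop)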
• 44. CHAPTER 2. LITERATURE REVIEW 25 Figure 2.7: Basic structure of genetic algorithms. The algorithm stops after a predefined number, Gen, of generations has been created. An alternative stopping mechanism is a limit on computing time [101]. Advantages and Applications of genetic algorithms GAs represent one of the most renowned optimization search techniques, especially in the presence of large, non-linear search spaces. GAs have been used to solve different single-agent problems [70]. As computational requirements increase, it has become more practical to distribute different parts of the GA across different agents, where all agents in this case share the same goal. In summary, we can say that GAs are most efficient and appropriate for situations such as the following:
• 45. CHAPTER 2. LITERATURE REVIEW 26
• The search space is large, complex, or not easily understood
• There is no programmatic method that can be used to narrow the search space
• Traditional optimization methods, such as dynamic programming, are not sufficient
Genetic algorithms may be utilized in solving a wide range of problems across multiple fields such as science, business, engineering, and medicine. The following provides a few examples:
• Optimization: production scheduling, call routing for call centers, routing for transportation, determining electrical circuit layouts
• Machine learning: designing neural networks, designing and controlling robots
• Business applications: financial trading, credit evaluation, budget allocation, fraud detection
Genetic algorithms are important in machine learning for various reasons:
1. They can work on discrete spaces, where gradient methods generally cannot be applied.
2. They can be used to search parameters for other machine learning models such as fuzzy sets and neural networks.
3. They can be used in situations where the only information we have is a measurement of performance, and here they compete with temporal difference techniques, such as Q-learning [86].
4. They converge to a near-optimal solution after exploring only a small fraction of the search space [98, 49].
• 46. CHAPTER 2. LITERATURE REVIEW 27
5. They can be easily hybridized and customized depending on the application.
6. They may also be advantageous in situations where one needs to find a near-optimal solution [87].
While the great advantage of GAs is the fact that they find a solution through evolution, this is also their biggest disadvantage. Evolution is inductive in nature, so it does not evolve towards a good solution, but it evolves away from bad circumstances [84]. This can cause a species to evolve into an evolutionary dead end. This disadvantage can be clearly seen within more dynamic systems, where stabilizing within a dead end can stop the algorithm from keeping up with a dynamic learning process. 2.6 Genetic algorithms in repeated matrix games GAs have also been used within matrix games. Mainly, they have been used in this context either as a method for computing the Nash equilibrium of a certain game [22, 2, 54], or for generating the “optimal” chromosomes (strategies) to be played within a certain game [4]. These applications have been solved either by a single genetic algorithm running through the game parameters to reach an optimal solution [34], or by using what is called a “co-evolutionary” technique, which basically involves the use of two genetic algorithms (both having the same goal), in order to reach the optimal solution in a more trusted and efficient manner [52]. Note here that co-evolution is considered an “offline” learning technique, as it requires testing all the current chromosomes within the population against all other chromosomes. This is not the case while playing online, where a chromosome is tested only against the current associate it was set up to play against (not the whole population), which gives it less opportunity for exploration against different criteria.
• 47. CHAPTER 2. LITERATURE REVIEW 28 GAs have been used before in the formation of strategies in dynamic Prisoner’s Dilemma games. For example, Axelrod used genetic algorithms in order to find the most desirable strategies to be used in the prisoner’s dilemma [4]. Axelrod’s stimulus-response players were modeled as strings, where each character within the string corresponds to a possible state (one possible history) and decodes to the player’s action in the next period. The longer the history to be taken into consideration, the longer the string representing the chromosome will be. This is a result of the increase in the possible number of states. In addition, moving to a game with more than two possible moves will lengthen the string. Increasing the number of players will also increase the number of states. Formally, the number of states is given by a^(m×p), where there are a actions and p players, and each player keeps m periods of time in its history [44]. Another example was the use of a GA within a simple formulation of a buyer-seller dilemma [92]. The GA implements a mixed strategy for the seller as an individual member of the population. Each population member is therefore a vector of probabilities for each action that all add up to 1.0. Within this experiment, the authors discussed the performance of the GA in contrast to other RL techniques. The difference in performance between the GA agents and RL agents arises primarily because the GA agents are population based: since RL agents deal with only one strategy, they are faster in adapting it in response to the feedback received from the environment, whereas it takes the GA population as a whole longer to respond to the feedback received. For the same reason, the GA agents are expected to exhibit less variance, and hence better convergence properties. This was a good start in using genetic algorithms as a learning technique instead of an optimization technique. However, it still needed more work regarding play against simple learning agents, for example without full state representation and without more specific domain knowledge. The authors of this work also raised the
• 48. CHAPTER 2. LITERATURE REVIEW 29 question of how human input can contribute as potential future work [92]. In order to get a better understanding of how GAs may perform in more complicated situations, we now discuss in the following sections the performance of GAs within related settings, such as distributed and dynamic systems. 2.6.1 Genetic algorithms in distributed systems In distributed systems, the primary form of GAs that has been used is co-evolutionary algorithms [56, 52, 107]. In this case, each GA agent represents one possible solution, and with the existence of other GA agents, they try to verify which solution is likely to be optimal. In another context, where each GA agent evolves its own set of solutions, all the agents are centralized with the same objective function (all the agents cooperate and communicate in order to reach the same goal) [47, 69]. As we can see, in all of these situations the GA agent is not completely independent of the other existing agents’ objective. 2.6.2 Genetic algorithms in dynamic systems The goal of a GA within a dynamic system changes from finding an “optimal” answer to tracking a certain goal (and enhancing the overall performance). Most real-world artificial systems and societies change due to changes in a number of external factors in the environment, agents learning new knowledge, or changes in the makeup of the population. When the environment changes over time, resulting in modifications of the fitness function from one cycle to another, we say that we are in the presence of a dynamic environment [93]. Several researchers have addressed related issues in previous work. Examples include evolutionary models and co-evolutionary models where the population is changing over time, and studies in the viscosity of populations [68, 103]. Different from classical GAs, the goal of such a system is to maximize the average result instead of determining a single
• 49. CHAPTER 2. LITERATURE REVIEW 30 optimal solution; one tracks the performance of different solutions over time instead of aiming at a fixed optimal target. Branke [12] surveys strategies for making evolutionary algorithms, including GAs, suitable for dynamic problems. The author grouped the different techniques into three categories:
• React to changes, where explicit actions are taken as soon as a change in the environment has been detected
• Maintain diversity throughout the run, where convergence is avoided at all times in the hope that a spread-out population can adapt to modifications more easily [55]
• Maintain an additional memory through generations (memory-based approaches), where the evolutionary algorithm is supplied with memory so it can recall useful information from past generations
Many methods have been presented to make genetic algorithms applicable in dynamic environments. First, researchers have modeled change in the environment by introducing noise into the system, whereby agents’ actions are mis-implemented or mis-interpreted by other agents [29]. Another idea has been to localize the search within a certain part of the search space. This can be done either through intelligent initialization of the population [85], or, as in the “memetic algorithm” [79], by evaluating close and similar neighbors of the chromosomes on trial in addition to the chromosomes already tested. This is where we get part of our motivation for interactive learning: it motivated our approach of generating populations based on feedback from users (evaluating close neighbors of a solution helps in evaluating the existing ones). The aforementioned methods either do not consider the existence of other heterogeneous learning entities in the system, or learn only under certain identified
• 50. CHAPTER 2. LITERATURE REVIEW 31 constraints [1]. However, experimental results are promising and show interesting properties of the adaptive behavior of GA techniques. 2.7 Interactive learning Another way to potentially enhance the performance of genetic algorithms is to gather human input in real time to teach the algorithm. For any learning algorithm, this can be done by merging the learning algorithm with human-machine interaction, resulting in what the literature calls “interactive artificial learning.” Using human input as a part of the learning procedure can provide a more concrete reward mechanism, which can increase the convergence speed [96, 27]. Human input can enter at either the “act,” “observe” or “update” step of an interactive artificial learning mechanism [27]. Experiments have been performed to evaluate the potential effects of human input on the learning curve in multi-agent environments. Results show a significant improvement in learning, depending on the quality of the human input [27, 28]. Figure 2.8: The interactive artificial learning process [27].
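As a deliberately simplified illustration of how human input can enter the “update” step of such a loop, the sketch below blends an occasional human rating into the reward used by a generic incremental value update. The blending weight, rating scale, and function names are our own assumptions, not a method prescribed by the cited work.

# Sketch: folding occasional human feedback into the "update" step of a learning loop.
HUMAN_WEIGHT = 0.5   # assumed blending weight between environment and human reward

def shaped_reward(env_reward, human_rating=None):
    """Combine the environment reward with an optional human rating in [-1, 1]."""
    if human_rating is None:          # no human input this step
        return env_reward
    return (1 - HUMAN_WEIGHT) * env_reward + HUMAN_WEIGHT * human_rating

def update_value(value, reward, learning_rate=0.1):
    """Generic incremental value update driven by the (possibly shaped) reward."""
    return value + learning_rate * (reward - value)

value = 0.0
trace = [(1.0, None), (0.0, -1.0), (1.0, 1.0)]   # (environment reward, human rating) per step
for env_r, rating in trace:
    value = update_value(value, shaped_reward(env_r, rating))
print(value)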
• 51. CHAPTER 2. LITERATURE REVIEW 32 2.7.1 Interactive learning in repeated matrix games Within the repeated games environment, experiments have been performed to analyze the effect of human input on the performance of learning algorithms [27]. The algorithms used in these experiments use “learning by demonstration” (LbD). Results showed that LbD does help learning agents to learn non-myopic equilibria in repeated stochastic games when human demonstrations are well-informed. On the other hand, when human demonstrations are less informed, these agents do not always learn behavior that produces (more successful) non-myopic equilibria. However, it appears that well-formed variations of LbD algorithms that distinguish between informed and uninformed demonstrations could learn non-myopic equilibria. When humans play iterated prisoners’ dilemma games, their performance depends on many factors [32, 41, 59]. Thus, it can be concluded that a similar trend applies to LbD algorithms, and that LbD algorithms could potentially supply information about the game and associates that gives a context facilitating better demonstrations. 2.7.2 Interactive genetic algorithms An interactive genetic algorithm (IGA) is defined as a genetic algorithm that uses human evaluation. These algorithms belong to a more general category of interactive evolutionary computation. The main applications of these techniques are domains where it is hard or impossible to design a computational fitness function, such as evolving images, music, and various artistic designs and forms to fit a user’s aesthetic preferences. In an interactive genetic algorithm, the algorithm interacts with the human in an attempt to quickly learn effective behavior and to better consider human preferences. Previous work on IGAs in distributed tasks has shown that human
• 52. CHAPTER 2. LITERATURE REVIEW 33 input can allow genetic algorithms to learn more effectively [33, 38]. However, such successes required heavy user interaction, which causes human fatigue [33]. Previous work on interactive evolutionary learning in single-agent systems has analyzed methods for decreasing the amount of necessary human interaction in interactive genetic learning. These methods either apply bootstrapping techniques, which rely on estimations of the reward in between iterations instead of a direct reward from the user [60, 66], or they divide the set of policies to be evaluated into clusters, where the user only evaluates the center of each cluster and not all policies [82]. Another suggestion for reducing human fatigue, applicable only in multi-agent systems, is to use input from other agents (and potentially other agents’ experiences) as one’s own experience [39]. Interaction between a human and the algorithm may occur at different stages of the GA and in different ways. The most common way is to make the human part of the fitness evaluation for the population. This can be done either by ranking the available solutions [94], or by directly assigning fitness values to the available policies in the population. Other work, which targets reducing human fatigue as mentioned above, had the human evaluate only selected representatives of the population [82, 88, 60]. Human input has also been investigated at the mutation stage, where the human first selects the best policy from his point of view and suggests a mutation operation to enhance its performance [33]. Babbar-Sebens et al. [28] identified a problem with IGAs: how an algorithm can cope with temporal changes in human input. This situation not only leads the genetic algorithm to converge prematurely, but it can also reduce diversity in the genetic algorithm population. This occurs when solutions that initially have poor human rankings are not able to survive the GA’s selection process. Loss of these solutions early in the search process could be detrimental to the performance
• 53. CHAPTER 2. LITERATURE REVIEW 34 of the genetic algorithm, if these solutions have the potential to perform better when preferences change later. That is why they suggested the use of a case-based memory per population of policies. This memory acts as a continuous record of the population and its fitnesses, giving a more continuous and non-myopic view for evaluating the performance of the chromosomes across generations (instead of basing evaluations on a single generation’s biases) [6]. IGAs have also been used to define robot behavior in known environments. A child (representing the human factor) trains the genetic algorithm through feedback on the evolved population. This training happens by selecting the top three preferred routes to be taken by the robot [66]. Usually, IGAs are used in more visual problems, where it is easy to ask the user to rank or evaluate chromosomes, in domains such as music and design [40, 14, 89]. They have also been used in resource allocation problems (which are usually more static) [94, 7]. The last related example is the interaction of a human with a GA within a board game, where the GA plays against another GA or against a human. The GA here works on a limited set of (easy to reach) solutions that describe a behavior for the whole game (not move by move). The human in this case sets the parameters at the beginning of the game, including the number of generations and the mutation rate [21]. Interactive genetic algorithms in multi-agent systems Research studying interactive genetic algorithms in multi-agent systems has mainly focused on dividing the IGA functions (including human interaction, mutation, and crossover) among separate agents [57, 51]. In this case, the IGA is not fully independent of the other modules, as all modules interact with each other to reach a common objective.
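The following sketch illustrates the common IGA pattern described above: the human ranks only a few representative chromosomes, the ranks are converted to fitness values (worst = 1, best = N, as in rank-based selection), and every other chromosome inherits the fitness of its nearest ranked representative. The crude similarity measure and the way representatives are picked are our own simplifications for illustration, not the clustering methods of the cited works.

# Sketch: human-in-the-loop fitness assignment for an IGA.
# The human ranks only a handful of representatives to limit fatigue.

def pick_representatives(population, n_reps=3):
    """Crude stand-in for clustering: spread picks evenly across the population."""
    step = max(1, len(population) // n_reps)
    return population[::step][:n_reps]

def fitness_from_ranks(human_ranking):
    """human_ranking lists representatives from best to worst; the best gets
    fitness N and the worst gets fitness 1, as in rank-based selection."""
    n = len(human_ranking)
    return {tuple(chrom): n - position for position, chrom in enumerate(human_ranking)}

def assign_fitness(population, rep_fitness):
    """Give every chromosome the fitness of its nearest ranked representative."""
    def distance(a, b):
        return sum(x != y for x, y in zip(a, b))
    fitnesses = []
    for chrom in population:
        nearest = min(rep_fitness, key=lambda rep: distance(chrom, rep))
        fitnesses.append(rep_fitness[nearest])
    return fitnesses

population = [[0, 0, 1], [0, 1, 1], [1, 1, 1], [1, 0, 0], [0, 0, 0], [1, 1, 0]]
reps = pick_representatives(population)
# Pretend the human ranked the representatives best-to-worst in this order:
human_ranking = [reps[2], reps[0], reps[1]]
rep_fitness = fitness_from_ranks(human_ranking)
print(assign_fitness(population, rep_fitness))

The resulting fitness values can then drive the standard selection, crossover, and mutation operators described in Section 2.5.2 without asking the human to evaluate every chromosome.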
• 54. CHAPTER 2. LITERATURE REVIEW 35 2.8 Summary Past work has derived intelligent behavior in repeated games using various methodologies. However, existing solutions have various problems, including:
• They require too much input from the user.
• They force certain constraints on users that decrease their comfort level.
• They do not learn fast enough for real-time systems, and are not able to deal with environments or goals that change over time.
• They cannot be used in distributed systems.
• They do not consider how human input can be incorporated in the system.
From this, we conclude that there is a need for a solution that addresses these drawbacks.
• 55. CHAPTER 3 Experimental Setup In this chapter, we present the experimental setup used to test our hypothesis. Since our goal is to test the performance of interactive genetic algorithms (IGAs) within a multi-agent system setting, we designed an experiment that allows us to do so. In order to test the efficiency of the algorithm, we run it against itself and other learning algorithms in a variety of matrix games. The two renowned learning algorithms we use are GIGA-WoLF [10] and Q-learning [100]. 3.1 Games’ structure In this section, we give an overview of the matrix games we use to evaluate our algorithms. The expectations of the players differ from one game to another. Therefore, we expect a different response from each learning algorithm.
• 56. CHAPTER 3. EXPERIMENTAL SETUP 37
             Cooperate   Defect
Cooperate       3,3        0,5
Defect          5,0        1,1
Table 3.1: Payoff matrix for the Prisoner’s dilemma.
3.1.1 Prisoner’s dilemma The prisoner’s dilemma is perhaps the most studied social dilemma [4], [3], as it appears to model many real-life situations. In the prisoner’s dilemma (Table 3.1), defection is each agent’s dominant action. However, both agents can increase their payoffs simultaneously by influencing the other agent to cooperate. To do so, an agent must (usually) be willing to cooperate (at least to some degree) in the long run. An n-agent, m-action version of this game has also been studied [95]. As we can see from the matrix in Table 3.1, if we consider the reward of each player for each joint-action pair, the following rule should apply within a prisoner’s dilemma matrix: r_dc ≥ r_cc ≥ r_dd ≥ r_cd. Here, i represents my action, j represents the opponent’s action, and r_ij represents my reward at the joint-action pair (i, j). However, it is more desirable than mutual defection for both players to choose the first actions (C, C) and obtain r_cc. 3.1.2 Chicken The game of Chicken is a game of conflicting interests. Chicken models the Cuban Missile Crisis [19], among other real-life situations. The game has two one-shot NEs ((C, d) and (D, c), i.e., one player swerves while the other goes straight). However, in the case of a repeated game, agents may be unwilling to receive a payoff of 2 continuously when much more profitable solutions are available. Thus, in such cases, compromises can be reached, such as the Nash bargaining solution (Swerve, Swerve) (Table 3.2). Therefore, the game is similar to the prisoner’s dilemma game (Table 3.1) in that an “agreeable” mutual solution is available. This solution, however, is unstable since both players are
• 57. CHAPTER 3. EXPERIMENTAL SETUP 38
            Swerve   Straight
Swerve       6,6       4,7
Straight     7,4       2,2
Table 3.2: Payoff matrix for the Chicken game.
        a      b      c
A      0,0    0,1    1,0
B      1,0    0,0    0,1
C      0,1    1,0    0,0
Table 3.3: Payoff matrix of Shapley’s game.
individually tempted to stray from it. 3.1.3 Shapley’s game Shapley’s game [36] is a 3-action game. It is a variation of the rock-paper-scissors game. Shapley’s game has often been used to show that various learning algorithms do not converge. The game has a unique one-shot NE in which all agents play randomly. The NE of this game gives a payoff of 1/3 to each agent (in a 2-agent formulation). However, the players can reach a compromise in which both receive an average payoff of 1/2. This situation can be reached if both players alternate between receiving a payoff of 1 and receiving a payoff of 0. The payoff matrix for this game is shown in Table 3.3. 3.1.4 Cooperative games As mentioned in the previous chapter, cooperative games are the exact opposite of competitive games (which are part of non-cooperative games). In these games, all agents share common goals, some of which may be more profitable than others. Table 3.4 shows the payoff matrix of a fully cooperative game.
• 58. CHAPTER 3. EXPERIMENTAL SETUP 39
        a      b
A      4,4    0,0
B      0,0    2,2
Table 3.4: Payoff matrix of a fully cooperative matrix game.
3.2 Knowledge and Information In matrix games, the more information a learning algorithm has about the game and its associates, the more efficiently it can learn. Some of the information is usually hidden during the learning process, and the algorithm has to deal with only the information available. The following list shows the possible variations in the agent’s level of knowledge. This will help us better understand how our algorithm and its opponents view the surrounding world.
• The agent’s own actions. The agent has to know its own actions in order to know how to act in the first place.
• The agent’s own payoffs. The agent may know the reward it received at a certain point in time, or how its actions are rewarded over time.
• Associates’ actions. The agent can either directly observe which action was taken by an associate, or be able to predict it over time.
• Associates’ payoffs. The agent can know which outcomes can be used to motivate or threaten the other associates, and act accordingly.
• Associates’ internal structure. The agent may have knowledge of how the opponent reacts to certain situations. This knowledge is usually gained by attempts to model the associates over time.
In our experiments, we assume that the algorithm has complete knowledge of its own payoffs and actions as well as the opponent’s history of actions (from previous plays). No knowledge of the opponent’s internal structure is assumed.
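In code, the information assumed to be available to our agent after each round can be captured in a small record such as the sketch below. The field and class names are our own; they simply mirror the assumptions listed above (own actions and payoffs plus the associates' observed actions, with no model of their internals).

from dataclasses import dataclass, field
from typing import List, Tuple

# Sketch: the per-round information an agent is assumed to observe (Section 3.2).
@dataclass
class RoundObservation:
    own_action: str                        # the action the agent just played
    own_payoff: float                      # the reward it received for the joint action
    associate_actions: Tuple[str, ...]     # actions the other agents were seen to play

@dataclass
class AgentKnowledge:
    history: List[RoundObservation] = field(default_factory=list)

    def record(self, obs: RoundObservation) -> None:
        self.history.append(obs)

    def opponent_action_history(self, associate_index: int = 0) -> List[str]:
        """Observed action history of one associate; no internal model is assumed."""
        return [obs.associate_actions[associate_index] for obs in self.history]

knowledge = AgentKnowledge()
knowledge.record(RoundObservation("C", 3.0, ("C",)))
knowledge.record(RoundObservation("C", 0.0, ("D",)))
print(knowledge.opponent_action_history())   # -> ['C', 'D']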
• 59. CHAPTER 3. EXPERIMENTAL SETUP 40
Algorithm 3.1 GIGA-WoLF
  x_t is the strategy according to which I play my action
  z_t is the “baseline” strategy
  loop
    x̂_{t+1} ← x_t + η_t r_t
    z_{t+1} ← z_t + η_t r_t / 3
    δ_{t+1} ← min(1, ||z_{t+1} − z_t|| / ||z_{t+1} − x̂_{t+1}||)
    x_{t+1} ← x̂_{t+1} + δ_{t+1} (z_{t+1} − x̂_{t+1})
  end loop
3.3 Opponents This section overviews the learning algorithms that we selected as opponents in our experiments. 3.3.1 GIGA-WoLF GIGA-WoLF [10] (Generalized Infinitesimal Gradient Ascent-Win or Learn Fast) is a gradient ascent algorithm. It is also a model-free algorithm, like Q-learning. The idea of the algorithm is that it compares its strategy to a baseline strategy. It learns quickly if its strategy is performing worse than the baseline strategy; on the other hand, if its strategy is performing better than the baseline strategy, it learns at a slower rate. Algorithm 3.1 shows the basic update structure of the algorithm. As we can see, this algorithm consists of two main components. The first component is the “GIGA” component. The idea of “GIGA” is that after each play the agent updates its strategy in the direction of the gradient of its value function. The “WoLF” component was introduced later [11]. The idea is to use two different strategy update steps, one of which is updated with a faster learning rate than the other. To distinguish between these situations, the player keeps track of two policies. Each policy assigns the probabilities of taking each action in the given situation. GIGA-WoLF is a no-regret algorithm. No-regret learning converges to NEs in dominance-solvable, constant-sum, and 2-action general-sum games, but does not necessarily converge in Shapley’s game [50].
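A rough Python sketch of one GIGA-WoLF strategy update for a two-action game is given below. The full algorithm projects strategies back onto the probability simplex after each gradient step; here that projection is approximated by clipping and renormalizing, so the code is illustrative only and not a faithful reimplementation of [10].

# Sketch of one GIGA-WoLF update (two actions); projection onto the simplex
# is approximated by clipping to non-negative values and renormalizing.

def normalize(strategy):
    clipped = [max(0.0, p) for p in strategy]
    total = sum(clipped)
    return [p / total for p in clipped] if total > 0 else [1.0 / len(strategy)] * len(strategy)

def norm(vec):
    return sum(v * v for v in vec) ** 0.5

def giga_wolf_update(x, z, reward_vector, eta):
    """x: current strategy, z: baseline strategy,
    reward_vector: payoff of each of my actions against the opponent's last action."""
    x_hat = normalize([xi + eta * r for xi, r in zip(x, reward_vector)])
    z_new = normalize([zi + eta * r / 3.0 for zi, r in zip(z, reward_vector)])
    denom = norm([a - b for a, b in zip(z_new, x_hat)])
    delta = 1.0 if denom == 0 else min(1.0, norm([a - b for a, b in zip(z_new, z)]) / denom)
    x_new = [xh + delta * (zn - xh) for xh, zn in zip(x_hat, z_new)]
    return normalize(x_new), z_new

x = [0.5, 0.5]
z = [0.5, 0.5]
x, z = giga_wolf_update(x, z, reward_vector=[3.0, 5.0], eta=0.01)
print(x, z)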
• 60. CHAPTER 3. EXPERIMENTAL SETUP 41
Algorithm 3.2 Q-learning
  for each state-action pair (s, a) do
    Q(s, a) ← 0
  end for
  loop
    Depending on the exploration rate ε, select an action a and execute it
    Receive the immediate reward r
    Observe the new state s'
    Update the table entry for Q(s, a) as follows: Q(s, a) ← (1 − α) · Q(s, a) + α · (r + γ · max_{a'} Q(s', a'))
    s ← s'
  end loop
3.3.2 Q-learning Q-learning [100] is a reinforcement learning technique that is widely used in artificial intelligence research [35], [72], [9]. It can also be viewed as a dynamic programming technique, in which the agent iteratively tries to learn its “to-go” payoff (called a Q-value) over time. Its main idea can be summarized as follows: an agent tries an action in a particular state, and evaluates its consequences in terms of the immediate reward or penalty it receives and its estimate of the value of the next state. By trying all actions in all states repeatedly, it learns which are best overall, judged by the long-term discounted reward. Algorithm 3.2 shows the main structure of the algorithm. For all state-action pairs, Q(s, a) converges to the true value under the optimal policy when (i) the environment has the Markov property, (ii) the agent visits all states and takes all actions infinitely often, and (iii) the learning rate α is decreased properly. However, if the agent always chooses its actions greedily during learning, the Q-values may converge to a local optimum because the agent may not visit all states sufficiently. To avoid this, the agent usually uses a stochastic method (like ε-greedy) to choose actions.
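A compact tabular Q-learning sketch with ε-greedy action selection (described next) is shown below. The toy environment, in which the state is the previous joint action and the opponent always defects, and the parameter values are illustrative assumptions, not our experimental configuration.

import random
from collections import defaultdict

# Sketch: tabular Q-learning with epsilon-greedy exploration (cf. Algorithm 3.2).
ACTIONS = ["C", "D"]
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

Q = defaultdict(float)          # Q[(state, action)], initialized to 0

def choose_action(state):
    """Epsilon-greedy: explore with probability EPSILON, otherwise exploit."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] = (1 - ALPHA) * Q[(state, action)] + ALPHA * (reward + GAMMA * best_next)

# Toy environment: the "state" is simply the previous joint action, and the
# opponent always defects, so the learner should come to prefer defection too.
state = ("C", "C")
for t in range(5000):
    action = choose_action(state)
    opponent_action = "D"
    reward = {("C", "D"): 0, ("D", "D"): 1}[(action, opponent_action)]
    next_state = (action, opponent_action)
    q_update(state, action, reward, next_state)
    state = next_state

print(max(ACTIONS, key=lambda a: Q[(state, a)]))   # expected: "D"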
• 61. CHAPTER 3. EXPERIMENTAL SETUP 42 The ε-greedy method chooses the action that has the maximum Q-value with probability (1 − ε), or a random action with probability ε [72]. The Q-learning algorithm we use in our experiments has the following settings:
• Discount factor γ = 0.95.
• The state is represented by the previous joint action of the agent and its associates.
• Exploration rate ε = 1.0/(10.0 + (t/1000.0)), where t represents the number of rounds played.
3.4 Evaluation criteria In order to evaluate the performance of our algorithms, we focus on two main points:
1. The average fitness of the population per generation: this helps us evaluate the algorithm’s convergence and its ability to learn over time.
2. The final payoff achieved: by studying the final payoff, we know the final performance of the algorithm in comparison to other algorithms.
These evaluation criteria are averaged over 10 runs of each algorithm against the selected opponent in order to eliminate the effect of randomness. We show the variation in the final payoffs over all the conducted runs in order to verify the stability of the algorithm’s performance within a specific game against a specific opponent.
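The evaluation procedure can be summarized by a small harness like the following sketch: it records the final average payoff of each run and reports the mean and standard deviation over 10 runs. The stand-in random players, the per-run step count, and the function names are placeholders, not our actual implementation.

import random
import statistics

# Sketch: averaging final payoffs over 10 independent runs (Section 3.4).
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def random_player(history):
    return random.choice(["C", "D"])

def run_once(player_a, player_b, steps=1000):
    history, total_a = [], 0.0
    for _ in range(steps):
        a, b = player_a(history), player_b(history)
        total_a += PD[(a, b)][0]
        history.append((a, b))
    return total_a / steps            # final average payoff of player A for this run

final_payoffs = [run_once(random_player, random_player) for _ in range(10)]
print(statistics.mean(final_payoffs), statistics.stdev(final_payoffs))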
• 62. CHAPTER 3. EXPERIMENTAL SETUP 43 3.5 Performance of GIGA-WoLF and Q-learning In order to get a better understanding of the behavior of the algorithms in the selected games, we compared the performance of the algorithms against each other and in self-play with available reports from previous work [10], [104], [71]. Figure 3.1 shows our results. These results represent the final payoffs of both learners, averaged over 10 runs within each game. Each run consists of running both learning algorithms against each other for 100,000 steps. In addition, the figure shows the standard deviation of the payoffs from the average (to show whether the algorithms converge consistently to the same rewards). 3.5.1 Prisoner's dilemma In the prisoner's dilemma, we expected the agents to learn mutual defection when GIGA-WoLF plays against Q-learning. This is because we are working with a no-regret algorithm, which learns within a few iterations to defect. On the other hand, the Q-learner learns more slowly than GIGA-WoLF, so it takes the Q-learner a larger number of iterations to learn to defect against GIGA-WoLF. The Q-learner we used has a high value of γ, as previous work showed that this increases the probability of cooperation if the other learners are willing to cooperate [71]. 3.5.2 Chicken When GIGA-WoLF and Q-learning interact in Chicken, they did not stabilize to a fixed outcome across the 10 simulations we ran. GIGA-WoLF acts according to the state-action pairs tried by the Q-learner at the beginning of a simulation. Thus, if the Q-learner starts by attempting to swerve, GIGA-WoLF will go straight. But in most cases, GIGA-WoLF
• 63. CHAPTER 3. EXPERIMENTAL SETUP 44 goes with the safe option (if the Q-learner tried to go straight in the beginning) and swerves for the rest of the game. Neither Q-learning nor GIGA-WoLF is able to reach the compromise outcome of (swerve, swerve) in self-play: one agent always “bullies” the other into swerving, while the other goes straight. 3.5.3 Shapley's game Previous work [9], [50] shows that GIGA-WoLF's policy does not converge in Shapley's game. This is apparent both in self-play and against Q-learning. As a result, players receive an average payoff near the NE value in this game. The best performance for Q-learning is in self-play, as it is often able to learn over time to reach the solution of alternating between “winning” and “losing”. But again, in some cases it is still unable to reach this satisfactory solution. 3.5.4 Cooperative games Within cooperative games, GIGA-WoLF finds it easy to maintain one of its actions over time, which allows it to reach cooperation quickly in self-play. This property also helps the Q-learner to easily discover the state-action pair that has the maximum Q-value (as both agents in this case have the same goal). As a result, our Q-learners learn mutual cooperation. On the other hand, Q-learning is not able to maintain the highest possible payoff from cooperation in self-play. The reason is that, although each agent tries to stabilize at one of the actions (to reach its steady state), the exploration mechanism within the algorithm sometimes makes it hard for both agents to maintain a certain action pair.
• 64. CHAPTER 3. EXPERIMENTAL SETUP 45 3.6 Summary From the results presented within this chapter, we find that, although both Q-learning and GIGA-WoLF perform well in certain situations, there are situations in which these algorithms do not learn effectively. Furthermore, these algorithms sometimes take a long time to converge. This motivates us to develop new algorithms that are able to adapt within such dynamic systems. In the following chapter, we discuss the structure of the suggested algorithm and potential variations that could enhance it.
• 65. CHAPTER 3. EXPERIMENTAL SETUP 46 [Figure 3.1 consists of four bar plots — prisoner's dilemma, cooperation game, Chicken, and Shapley's game — each showing the average payoff of GIGA-WoLF and Q-learning (10 runs, each run 100,000 steps) when playing vs. GIGA-WoLF and vs. Q-learning.] Figure 3.1: Payoffs of GIGA-WoLF and Q-learning within selected games.
• 66. CHAPTER 4 Learning using Genetic Algorithms In this chapter, we discuss the performance of a basic genetic algorithm (GA) in repeated matrix games. In addition, we present several suggested modifications to this basic algorithm and show how they may affect the performance of the GA. We first describe the basic GA structure and the modifications we apply to it. We then demonstrate the performance of these algorithms against GIGA-WoLF and Q-learning, and in self-play. We initially define a set of parameters that are used within our algorithms; we tried to maximize the number of steps taken in order to get a better understanding of the learning trends. These parameters include the total number of steps (Ns) and the number of generations (NG), which together determine the time range over which the agent can learn by playing against other agents. We set NG = 100 generations and Ns = 100,000 (for easier manipulation of the calculations required). Once we know NG and Ns, we get a trade-off between the number of chromosomes within a population (Nc) and the number of steps that each chromosome 47
• 67. CHAPTER 4. LEARNING USING GENETIC ALGORITHMS 48 plays against the opponent (Nsc). Equation 4.1 shows the resulting trade-off:

Nc = Ns / (Nsc × NG)    (4.1)

Therefore, by setting the total number of steps (Ns) to 100,000, and by fixing the number of generations to 100 (in order to have an acceptable number of generations through which we compare our results), we get that Nsc × Nc = 1000. Through initial experiments, we set Nsc to different values including 50, 100, and 200, which caused Nc to be 20, 10, and 5 respectively. This analysis shows that decreasing the number of chromosomes within a population reduces randomization, which can cause the population to lean towards a local optimum in certain situations. At the same time, increasing the number of members in the population allows more exploration, but evaluating the population consumes more time. That is why we settled for a population of 20 chromosomes, which appears to be reasonable in our settings. In the following section, we describe the main structure of the algorithm to get a better understanding of how we incorporate these modifications, as well as to comprehend the results and analysis. 4.1 Algorithm structure We analyze a GA typically used in similar problems [3], with slight modifications to the selection function. Before we start, we introduce some of the variables used within our work; Table 4.1 shows the common parameters used within the algorithms. As mentioned in the literature review, a GA starts with the initialization of a new population of chromosomes (Pop). Each chromosome C represents the strategy followed by the player in response to the current history (his) of both the player and its opponent.
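To make this representation concrete, the following is a minimal Python sketch of how such a population of bit-string strategies could be initialized for the prisoner's dilemma. The chromosome length of 64 genes (one per possible history of the last three joint actions of two 2-action players) and the helper names are assumptions based on the description that follows, not code from the thesis.

    import random

    N_ACTIONS = 2        # prisoner's dilemma: cooperate / defect
    HISTORY_LEN = 3      # last three joint actions are remembered
    CHROM_LEN = (N_ACTIONS * N_ACTIONS) ** HISTORY_LEN   # 64 genes, one per possible history
    POP_SIZE = 20        # Nc used in our settings

    def random_chromosome():
        # Each gene is the action (0 or 1) played in response to one possible history.
        return [random.randrange(N_ACTIONS) for _ in range(CHROM_LEN)]

    def init_population(size=POP_SIZE):
        return [random_chromosome() for _ in range(size)]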
• 68. CHAPTER 4. LEARNING USING GENETIC ALGORITHMS 49 An example of the structure of the chromosome within the prisoner's dilemma game can be seen in Figure 4.1. In this structure, each bit in the chromosome represents the action to be taken for a particular history of joint actions (specified by the bit's position). In our case, we used the last three actions taken by both the agent and its opponent, as has been done in some past work [3]. In order to determine which action to take for a given set of history steps (his), we convert the number his from base Na to decimal in order to identify the location of the bit that encodes the action to be taken. This conversion is made as follows:

A_p = ∑_{i=1}^{3} (Na)^i × his_i    (4.2)

Representation   Variable
Pm               Mutation Rate
Pc               Crossover Rate
Pe               Elitism Rate
C                Chromosome (Strategy)
f                Fitness
P                Parent
Ch               Child
Ap               Position of the gene that determines the action to be taken
Bp               Best Chromosome in Previous Generation
Avp              Average fitness of chromosomes in current generation
Bavp             Best average payoff over generations
Nc               Number of chromosomes per generation
NG               Number of generations
Na               Number of actions available for each player
Ns               Total number of steps
Nsc              Number of steps per chromosome
Pop              Current population
his              Current history of actions
Ovf              Overall fitness of chromosome
g                Gene within a chromosome (bit)
Entrop           Entropy of a gene

Table 4.1: Variables used within the algorithms
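As an illustration of this conversion, the following is a minimal Python sketch that maps a history of the last three joint actions to a chromosome index and looks up the corresponding action. Encoding C as 0 and D as 1, and listing each stage as the agent's move followed by the opponent's move (oldest stage first), are assumptions made to match the worked example below.

    def history_to_index(history, n_actions=2):
        """history is a string such as 'CCCCDC' listing the last three joint
        actions. Treat it as a number in base n_actions and convert it to a
        decimal index into the chromosome."""
        digits = [0 if move == 'C' else 1 for move in history]
        index = 0
        for d in digits:                  # standard base-N to decimal conversion
            index = index * n_actions + d
        return index

    def action_for_history(chromosome, history):
        return chromosome[history_to_index(history)]

    # Example: 'CCCCDC' encodes to binary 000010, i.e. index 2, matching the text.
    assert history_to_index('CCCCDC') == 2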
• 69. CHAPTER 4. LEARNING USING GENETIC ALGORITHMS 50 Figure 4.1: Chromosome structure Using this equation, we can identify the position within the chromosome of the action to be taken in response to a certain history. For example, consider the history CCCCDC, which represents that the agent cooperated in the first two of the past three stages and defected in the last stage, while the opponent cooperated the whole time. This history can be encoded as the binary number 000010, which we convert to decimal. This means that the bit (gene g) at position Ap = 2 within the chromosome gives the action to be taken for this history. Note that the encoded history can use other bases depending on the number of actions available; for example, in a 3-action game the history is encoded in ternary. After the initialization of the population (in this experiment we have 20 chromosomes within the population), we start running these random chromosomes against the opponent player to “evaluate” them. Our evaluation here is based on averaging the reward received by each chromosome per step against the opponent. Each chromosome plays for 50 steps against the opponent, and the reward is then averaged to compute the fitness of the chromosome. Following the evaluation, we sort the population according to fitness f, with the higher fitness at the top; here our selection process starts. We keep the elite (top) chromosomes, whose number is defined by the elitism rate Pe, for the following generation, and apply mutation and crossover to the best two chromosomes. The same sequence is repeated again with the new population Ch until a stopping