Simulating Evolution and Behaviour
with monkey.py, redqueen.py, relationship.py
Abhranil Das
IISER Kolkata
1 monkey.py
Abstract
The Infinite Monkey Theorem imagines a monkey at a typewriter that
would take more than the age of the universe to produce a
Shakespearean quote by pressing keys at random. Creationists equate
this monkey with the force of evolution and thus denounce the theory.
This report discusses a program emulating a modified monkey which
implements natural selection and thus generates the desired quote
within a few thousand generations.
1.1 Theory
The Infinite Monkey Theorem states that if a monkey sat at a type-
writer and started pressing keys at random, it would take him more
than the age of the universe to come up with something like a quote
from Shakespeare. Creationists immediately grabbed this idea and
used it against the theory of evolution. They said that evolution also
takes place through random mutations, and so is blind and without
foresight, just as the monkey does not know which phrase he is
supposed to generate. Thus, the random mutations may be equated to
the random jabs of the monkey at the typewriter. How, then, could
evolutionists claim that complex organs like eyes and nerves and
brains could evolve within a tiny fraction of the age of the universe?
If a single phrase from Shakespeare cannot be generated by the
monkey within the span of the age of the universe, how can we see
'products of evolution' roaming the earth that are far more complex
in design than a Shakespearean quote? The creationists said, therefore,
that these complex biological objects are really the manifestation of
intelligent design. In fact, a program called The Monkey Shakespeare
Simulator was written that emulates a large population of monkeys. It
took 2,737,850 million billion billion billion monkey-years to come up
with:
RUMOUR. Open your ears; 9r"5j5&?OWTY Z0d...
Does that mean that evolution as a theory is disproved?
Of course not. The key point missing in the analogy is that there is
no natural selection in the case of the monkey. Although in evolution
we have purely random mutations, these mutations are preferentially
chosen via natural selection, which is not a random process at all.
Every mutation changes an individual's fitness, which determines how
likely it is to survive. Gradually, therefore, the individuals comprising
the population become fitter and fitter. In such a situation, it is not
uncommon for eyes and brains to evolve within a tiny fraction of the
age of the universe.
Therefore, suppose we modify the program emulating the monkey so
that it looks at what it has randomly typed and judges its fitness,
then produces mutated copies of the fittest phrase, then again picks
the fittest, and so on. We might very well be able to generate a quote
by Shakespeare, provided we define that quote beforehand as the
fittest phrase possible.
1.2 Aim
i.) To write a program to emulate the monkey sitting at the type-
writer and generating random phrases, but also effecting nat-
ural selection.
ii.) To check whether we can generate desired phrases starting
from a random or a user-defined phrase.
iii.) To generate a fitness v/s generation plot.
1.3 The Program
The program was written in Python. It is the following:
#!/usr/bin/python
import random, commands, Gnuplot
while True:
    src = raw_input('Source phrase of same length as destination phrase (enter blank for random source phrase): ')
    dst = raw_input('Destination phrase: ')
    d = len(dst)
    pop = input('Population size: ')
    N = input('Generation count (enter 0 to stop at first successful generation): ')
    chrset = ''.join([chr(i) for i in range(32, 127)])
    l = [''] * d
    title = ''
    if len(src) <> 0:
        l = [i for i in src]
        title = src
    else:
        for i in range(d):
            l[i] = random.sample([j for j in chrset], 1)[0]
        title = ''.join(l) + ' (Random)'
    title = (chr(39) * 2).join(title.split(chr(39)))  # escape single quotes for gnuplot
    lpop = [[''] * d] * pop
    ppop = lpop[:]
    prob = [''] * d
    for i in range(d):
        prob[i] = float(min([abs(chrset.find(dst[i]) - chrset.find(l[i])),
                             len(chrset) - abs(chrset.find(dst[i]) - chrset.find(l[i]))])) / (len(chrset) / 2)
    psum = [sum(prob)] * pop
    lplot = []
    n = 1
    inst = ''
    f = open('monkeydata.txt', 'w')
    print >> f, 'Source:', title
    print >> f, 'Destination:', dst
    print >> f, 'Population:', pop
    while True:
        inst = ''.join(l)
        print >> f, '\n', n, '\t', inst, '\t', 1 - min(psum) / d, '\n'
        lplot.append(str(n))
        lplot.append('\t')
        lplot.append(str(1 - min(psum) / d))
        lplot.append('\n')
        # probability of survival: natural selection
        for i in range(d):
            prob[i] = float(min([abs(chrset.find(dst[i]) - chrset.find(l[i])),
                                 len(chrset) - abs(chrset.find(dst[i]) - chrset.find(l[i]))])) / (len(chrset) / 2)
        # randomization of the string: mutations
        for i in range(pop):
            lpop[i] = l[:]
        for i in range(pop):
            for j in range(d):
                if random.uniform(0, 1) <= prob[j]:
                    lpop[i][j] = random.sample([k for k in chrset], 1)[0]
                ppop[i][j] = float(min([abs(chrset.find(dst[j]) - chrset.find(lpop[i][j])),
                                        len(chrset) - abs(chrset.find(dst[j]) - chrset.find(lpop[i][j]))])) / (len(chrset) / 2)
            psum[i] = sum(ppop[i])
            print >> f, '\t', ''.join(lpop[i]), '\t', 1 - psum[i] / d
        l = lpop[psum.index(min(psum))]
        if N <> 0:
            if n == N:
                break
        else:
            if inst == dst:
                break
        n += 1
    f.close()
    g = open('monkeyplot.txt', 'w')
    print >> g, ''.join(lplot)
    g.close()
    dst = (chr(39) * 2).join(dst.split(chr(39)))
    h = Gnuplot.Gnuplot()
    h("""set xlabel 'Number of tries'""")
    h("""set ylabel 'Fitness'""")
    h("""p [:][0:1.2] 1 title 'Maximum Fitness', 'monkeyplot.txt' w l title 'Fitness'""")
    commands.getoutput('gedit monkeydata.txt')
The program does the following:
It takes the source and destination phrases from the user; they must
be of the same length. If a blank is entered for the source, a random
source of the same length as the destination is generated. The
population size is entered next: this is the number of mutated phrases
produced from the selected phrase in each generation. Then the
number of generations one wishes to see is entered; if 0 is entered,
the program stops at the first generation that produces the destination
phrase. A character set is defined. This is the alphabet that the monkey
will use: letters of either case, digits, punctuation and symbols.
A probability of change is defined for each position of the phrase.
This is related to the difference between the current character at that
position and the character that lies in the destination phrase at that
position. This difference is evaluated in the following way:
The character set is arranged in a ring. The character at a particular
position of the current phrase and that at the same position of the desti-
nation phrase are located on that ring. Now the shortest distance from
one to the other is calculated. For example, if they are neighbours, the
distance would be 1.
The probability of change is proportional to this distance, normalized
by the maximum possible distance.
The characters of the phrase are then changed with this probabil-
ity. The new characters are picked randomly from the character set. A
number of mutated phrases are produced, equal to the specified pop-
ulation number.
The probability of change averaged over all the characters of a phrase,
and subtracted from 1, gives its fitness. Thus, fitness here is a number
from 0 to 1. The destination phrase has a fitness of 1.
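The ring distance and the fitness just described can be sketched on their own (a minimal re-derivation using the same 95-character set as the program; the function names `ring_distance` and `fitness` are mine):

```python
# Character set: printable ASCII, codes 32 to 126 (95 characters), as in the program.
chrset = ''.join(chr(i) for i in range(32, 127))

def ring_distance(a, b):
    """Shortest distance between two characters when the set is arranged in a ring."""
    d = abs(chrset.index(a) - chrset.index(b))
    return min(d, len(chrset) - d)

def fitness(phrase, target):
    """1 minus the mean normalized ring distance; the target itself scores 1."""
    half = len(chrset) // 2   # maximum possible ring distance (47)
    total = sum(ring_distance(p, t) / float(half) for p, t in zip(phrase, target))
    return 1 - total / len(target)

print(ring_distance('A', 'B'))            # neighbours on the ring: distance 1
print(fitness('Evolution', 'Evolution'))  # 1.0
```

Note that because the set is a ring, the first character (space) and the last (`~`) are also neighbours, at distance 1.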
The fittest individual in a generation is selected, and mutated copies
are again generated from it.
This goes on until the specified generation count is reached, or the
destination phrase is achieved, whichever is specified by the user.
A data file (text) is generated that contains all the initial settings and
the resulting data. Also, a plot of the fitness of the fittest individual in
the population v/s generation is produced. This plot is plotted using
gnuplot.
1.4 Results
Let us first get a feel of the output data. Let us take a look at the phrase
in a certain generation and the mutated copies that are produced from
it:
Generation Phrase Fitness
1 Evolution 0.858156028369
Evolution 0.858156028369
EvokutiwY 0.862884160757
Evo?ution 0.777777777778
EvfluVion 0.832151300236
EvoluJion 0.796690307329
2 EvokutiwY 0.862884160757
This is the parent phrase (’Evolution’) of generation 1 with fitness 0.85
producing mutated copies. The destination phrase is ’Noitulove’. The
second mutated daughter is fittest in the population, and is chosen in
generation 2. This example should make things clear.
Let us try to generate the following quote from Shakespeare’s Ham-
let: ’To be or not to be, that is the question.’ We take a random source
phrase, a population of 5, and choose to stop at the first successful
generation.
The target is reached in 3167 generations. So this did not quite take
the age of the universe. Some excerpts from the data file generated are
shown. A few representative generations are picked:
Source: j<H5<)P%FU5D~tm%UJA=IZRw5Tvbt/}kR6woE5,<X (Random)
Destination: To be or not to be, that is the question.
Population: 5
Generation Phrase Fitness
1 j<H5<)P%FU5D~tm%UJA=IZRw5Tvbt/}kR6woE5,<X 0.47275557862
...
100 Qq bf}nr"ouv~tr&ba.Tt[^^!fs tgi}quduubqn- 0.926310326933
...
300 Uo bf}kr~oqt tq bf,!tgar ls tgg$quknujrn/ 0.976128697457
...
500 So bd}or~qkt tm bf,!tgar ls the!qudsqjpn/ 0.984431759211
...
2000 So be or~nnt tn be, that is the questjon/ 0.984431759211
...
3167 To be or not to be, that is the question. 1.0
We can see how the phrase converges to the destination phrase, and
the fitness goes to 1. The following is the fitness v/s generation curve:
[Plot: fitness v/s generation]
It is noticeable that the fitness curve goes asymptotically to 1. The
reason is that the fitter the phrase becomes, i.e. the closer it gets to the
target phrase, the smaller is its chance of mutating again. So when it
is very close to the target phrase, it requires a longer time to make a
change.
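This slowdown can be quantified: since the per-character mutation probability is the normalized distance, and its average over the phrase is 1 minus the fitness, the expected number of characters mutated per daughter shrinks towards zero as fitness rises. A small illustrative sketch (the fitness values below are hypothetical, not program output):

```python
def expected_mutations(fitness, length):
    """Expected number of characters mutated in one daughter phrase:
    the average per-character mutation probability is 1 - fitness."""
    return (1 - fitness) * length

# Phrase length 41, as for the Hamlet quote; the fitness values are illustrative.
for fit in (0.5, 0.9, 0.99, 0.999):
    print(fit, expected_mutations(fit, 41))
```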
This is, in fact, a point at which the analogy with evolution breaks
down. The asymptotic nature comes about because we fixed before-
hand the target phrase. However, evolution is not teleological, in that
there is no perfect organism towards which animals evolve. In a real
fitness v/s generation plot, the trend will not be asymptotic.
Also notice that in some regions the fitness actually drops. This is
not surprising; it has an analogue in real life: retrograde evolution.
I have zoomed in on a region of the plot that shows this.
[Plot: zoomed region of the fitness curve showing a drop]
This also shows up in the data file in generations 10 and 11:
10 KmZXvy}p{tyv~td%_p-#$j^w))ojtZov~UzinvR- 0.807991696938
Km50vylp{tyv~td%_p-#$jnw)eojtZoL~Uzinv;- 0.787752983913
Cm@Xvy}p{tyv~td%_p-/8j^wf)ojtZo!~3ainvW- 0.758173326414
cm0X8y}p{tyv~t5%_p-#$"^w)youtZov~Uz(nvR- 0.785158277115
KN,Xvy}p{tyv~td%_p-#$j^wp>ojtZo9~qzinvR- 0.779968863518
Pm7Xvy_p{tyv~td%_p-#$j^w)/ojtZovuU7ivR- 0.799688635184
11 Pm7Xvy_p{tyv~td%_p-#$j^w)/ojtZovuU7ivR- 0.799688635184
We see that the fitness in generation 10 was 0.808, whereas in
generation 11 it came down to 0.800. Given the working principle of
the program (the fittest daughter is selected even when every daughter
is less fit than the parent), such a drop is clearly possible.
Now we run the program again with the same parameters, except
with a population size of 50.
[Plot: fitness v/s generation, population 50]
The destination phrase is achieved in 743 tries. Then we try the
same thing with a population of 500:
[Plot: fitness v/s generation, population 500]
The destination phrase arrives in 82 tries.
We see that with increasing population size, the target phrase is
achieved quicker, and there is less retrograde evolution. Both of these
are easily understood. With a greater population size, there is a greater
chance of getting a mutated daughter that is closer to the target phrase.
Thus, we reach the target quicker. And since the progeny is more in
number, it’s less probable that all of them will be less fit than the par-
ent. Thus the chances of retrograde evolution are reduced.
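The population-size effect can be made slightly more quantitative. Suppose each mutated daughter independently improves on the parent with some probability p (an illustrative assumption; the program does not compute this number). Then the chance that at least one of the pop daughters improves is 1 - (1 - p)^pop, which grows rapidly with pop:

```python
def chance_some_daughter_improves(p, pop):
    """Probability that at least one of `pop` independent daughters is fitter
    than the parent, if each improves with probability p (illustrative)."""
    return 1 - (1 - p) ** pop

for pop in (5, 50, 500):
    print(pop, round(chance_some_daughter_improves(0.05, pop), 4))
```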
1.5 Conclusions
i.) Thus, if the monkey is not all that dumb, but applies natural
selection, it can arrive at desired results pretty quickly.
ii.) Retrograde evolution can be observed. It reduces with larger
population size.
iii.) The number of tries required for success drops with increas-
ing population.
iv.) Although teleology is a drawback of this simulation, with
some effort the concept of a destination phrase that introduces
this teleology can be removed.
2 redqueen.py
Abstract
The Red Queen Hypothesis is associated with coevolution. This pro-
gram simulates such coevolution using two random variables depen-
dent upon each other. The dependence of the coevolution pattern on
various parameters is observed.
2.1 Theory
Coevolution occurs when two biologically evolving entities are coupled
to each other. They may be as small as amino acids or as large as
phenotypes of two species in an ecosystem. The Red Queen Hypothesis,
originally proposed by Leigh Van Valen (1973), is a model describing
this evolutionary arms race.
In ’Through the Looking Glass’ by Lewis Carroll, the Red Queen
says to Alice, ’It takes all the running you can do, to keep in the same
place.’ In evolutionary biology, this quote can be translated to:
’An evolutionary system that co-evolves with other systems needs
to continually develop to maintain its fitness relative to these other
systems.’
For example, the speed of a predator and that of its prey are coe-
volving entities. Similarly, the colouration on a plant and a caterpillar
that camouflages on it are also coevolving. Another example is host-
parasite interaction.
2.2 Aim
i.) To write a program for coevolution using a simplified model
ii.) To check whether coevolution occurs, and to observe its nature
iii.) To see the effect of various parameters on the pattern of
coevolution
2.3 The Program
The program was written in Python. It is the following:
#!/usr/bin/python
while True:
    l1 = input('Start value of species 1: ')
    l2 = input('Start value of species 2: ')
    pop1 = input('Population size of species 1: ')
    pop2 = input('Population size of species 2: ')
    N = input('Generation count: ')
    lpop1 = [''] * pop1
    lpop2 = [''] * pop2
    fpop1 = lpop1[:]
    fpop2 = lpop2[:]
    n = 1
    import random, math, Gnuplot, commands
    def fitness(i, u):
        return math.e ** (-abs(i - u))
    f = open('redqueenplot.txt', 'w')
    g = open('redqueendata.txt', 'w')
    h = open('fitnessplot.txt', 'w')
    print >> g, 'Start value of species 1:', l1, '\t', 'Start value of species 2:', l2
    print >> g, 'Population size of species 1:', pop1, '\t', 'Population size of species 2:', pop2
    print >> g, 'Generation count:', N
    print >> g, 'Generation:', '\t', 'Species', '\t', 'Value', '\t' * 2, 'Fitness'
    while n <= N:
        print >> f, n, '\t', l1, '\t', l2
        fit = fitness(l1, l2)
        print >> g, n, '\t' * 2, '1', '\t', '%17f' % l1, '\t', '%17f' % fit, '\n'
        for i in range(pop1):
            lpop1[i] = l1
            # random mutations
            lpop1[i] = random.normalvariate(l1, 10)
            # fitness
            fpop1[i] = fitness(lpop1[i], l2)
            print >> g, '\t' * 3, '%17f' % lpop1[i], '\t', fpop1[i]
        print >> g, '\n'
        print >> g, '\t' * 2, '2', '\t', '%17f' % l2, '\t', '%17f' % fit, '\n'
        for i in range(pop2):
            lpop2[i] = l2
            # random mutations
            lpop2[i] = random.normalvariate(l2, 10)
            # fitness
            fpop2[i] = fitness(lpop2[i], l1)
            print >> g, '\t' * 3, '%17f' % lpop2[i], '\t', fpop2[i]
        print >> g, '\n'
        print >> h, n, '\t', fit
        # natural selection
        def fittest(l, lpop):
            lpoptemp = lpop[:]
            f = max(lpoptemp)
            while f > l:
                lpoptemp.remove(max(lpoptemp))
                if len(lpoptemp) <> 0:
                    if max(lpoptemp) <= l:
                        break
                else:
                    break
                f = max(lpoptemp)
            return f
        l1, l2 = fittest(l2, lpop1), fittest(l1, lpop2)
        n += 1
    f.close()
    g.close()
    h.close()
    commands.getoutput("""gedit 'redqueendata.txt'""")
    i = Gnuplot.Gnuplot()
    i("""set title 'Red Queen Hypothesis: Phenotype Values with Generation'""")
    i("""set xlabel 'Generation'""")
    i("""set ylabel 'Phenotype Values'""")
    i("""p 'redqueenplot.txt' u 1:2 w l title 'Species 1', 'redqueenplot.txt' u 1:3 w l title 'Species 2'""")
    raw_input()
    i("""set title 'Red Queen Hypothesis: Fitness with Generation'""")
    i("""set xlabel 'Generation'""")
    i("""set ylabel 'Fitness'""")
    i("""p [:][0:1] 'fitnessplot.txt' w l title 'Fitness'""")
What does this program do? It takes two variables, which take
numerical values. Each variable produces mutated copies of itself in
every generation. These copies have values that generally differ from
the parent value from which they are spawned; the mutated numbers
are drawn from a Gaussian distribution centred on the parent number
with a standard deviation of 10 units. One of these mutated copies is
then chosen for each variable according to a fitness criterion. Up to
this point it is similar to monkey.py, but it is the fitness criterion
that produces the coevolution. Let's say the two variables are X and Y.
The fitness criterion for X is that its value must be greater than that
of Y, and that for Y is that its value must be greater than the value
of X.
Think of the two variables as signifying the phenotypes of two closely
dependent species, say the running speeds of a predator and its prey.
The predator hunts its prey by running it down. So it is easily seen
that prey individuals who run faster than the predator survive.
Similarly, predator individuals that run faster than their prey also
survive.
One might think from this that the fitness criteria have no upper
bound, meaning that any value greater than the current value of the
other variable is fit. Moreover, one might be tempted to think that
the larger the value, the greater the fitness. However, this is wrong.
To see why, we need to go back to our example.
Let's say that the average speed of the prey in one generation is 10
m/s. Now, obviously, a particular predator individual benefits from
having a running speed greater than 10 m/s, but the higher its speed,
the more resources it spends on running. It is able to hunt down prey
as long as it is faster than 10 m/s; it is not really necessary for it
to run at 50 m/s or 100 m/s. These are no better than 13 m/s. In fact,
the predator is worse off at these higher speeds because they drain
its energy resources. Therefore, the fittest value is the lowest speed
that is greater than the prey speed. This argument applies to any other
scenario of coevolution, not just to speeds. Making a forward move in
terms of evolution is costly, so only the smallest necessary move is
made, so that the gain reaped is highest for the smallest cost.
Therefore, our program first takes the values of the two variables
in generation i, say x_i and y_i. Then it generates two sets of mutated
values from these two. From the first set, it selects the value that is
just greater than y_i. From the second set, it selects the value that is
just greater than x_i. These two values then become x_(i+1) and y_(i+1).
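This selection rule, stripped of the bookkeeping in the program's fittest() function, amounts to the following (a sketch; pick_fittest is my name for it):

```python
def pick_fittest(other, mutants):
    """Return the smallest mutant value greater than `other` (the 'just
    greater' value). If no mutant exceeds `other`, return the largest
    mutant, the least-bad choice, as the program's fittest() also does."""
    above = [v for v in mutants if v > other]
    return min(above) if above else max(mutants)

print(pick_fittest(10.0, [8.2, 11.5, 10.3, 14.9]))  # 10.3
print(pick_fittest(10.0, [7.1, 9.4]))               # 9.4
```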
The program takes five inputs: the starting values of the two species,
the population sizes of the two species (this is the number of mutated
progeny produced), and the number of generations to simulate. It pro-
duces three outputs: a data file with all relevant information, a simul-
taneous plot of the two values of the two species against generation
number, and a plot of the fitness against generation.
But there’s a catch here. We have defined what fittest is. But what is
the fitness value in general? We see that we don’t really need a number
associated with fitness if we are anyway able to select the fittest. But
we need a fitness against generation plot. For this reason alone we
artificially define a fitness. To understand how we define it, we first
need to assume that the values of the two variables are integral. Let’s
say that in one generation the value of Y is 14. Now, the fittest value of
X possible is 15 (with a fitness of 1), and as we go towards either side
of 15, the fitness should drop asymptotically to 0. A good candidate
for such a function is f_d(x, y) = e^(-|x - (y+1)|), which gives the
fitness of x (a value of X) when the value of the other phenotype, Y,
is y. The d in the subscript stands for discrete. Why this function was
chosen is clear from a plot of it, with Y = 14:
We can see that the fitness of X = 15 is 1, whereas for values on either
side, the fitness drops exponentially.
Similarly, the fitness of y when the value of X is x is given by:
f_d(y, x) = e^(-|y - (x+1)|).
Let’s find the fitnesses of X and Y when their values are 15 and 14
respectively:
[Plot: discrete fitness curves of X and Y for X = 15, Y = 14]
The Fitness of X curve intersects X = 15 at the fitness value 1 (be-
cause X = 15 is fittest for Y = 14), whereas the value of the Fitness of Y
curve at Y = 14 is around 0.13 (because for X = 15, Y = 16 is fittest).
But when X and Y can take continuous values, the fittest value of Y
is just above x: we take the fittest value to be x + ε and then let ε
go to 0. This changes the fitness formula to the continuous version:
f_c(x, y) = e^(-|x - y|) = f_c(y, x). The symmetry in x and y basically
means the following: if X = 15 and Y = 14, say, their fitnesses are the
same. This is shown by the following graph:
[Plot: continuous fitness curves of X and Y]
We see that the value of the Fitness of X curve at X = 15 is the same
as that of the Fitness of Y curve at Y = 14, both equal to e^(-1),
around 0.37. Thus, for any two values of X and Y, their fitnesses will
be the same. This is the continuous fitness formula which is used in
the program to plot the fitness against generation. Since the fitnesses
of X and Y at any time are the same, we have only one curve.
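Both versions of the fitness function can be written down directly (a sketch; the program itself computes the exponential as math.e**(-abs(i - u))):

```python
import math

def fitness_discrete(x, y):
    """Discrete fitness of x against y: peaks at exactly x = y + 1."""
    return math.exp(-abs(x - (y + 1)))

def fitness_continuous(x, y):
    """Continuous fitness used by the program; symmetric in x and y."""
    return math.exp(-abs(x - y))

print(fitness_discrete(15, 14))   # 1.0: x = 15 is fittest against y = 14
print(fitness_discrete(17, 14))   # e**-2, about 0.135
print(fitness_continuous(15, 14) == fitness_continuous(14, 15))  # True
```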
2.4 Results
Let's run the program once. The starting values are 100 and 10 for the
two phenotypes, 10 each for their populations, and the generation
count is 100.
A sample of the data file is the following, from generation 50 to 51:
Start value of species 1: 100 Start value of species 2: 10
Population size of species 1: 10 Population size of species 2: 10
Generation count: 100
Generation Species Value Fitness
50 1 171.045513 0.000015
2 182.149629 0.000015
51 1 183.156338 0.000008
2 171.443637 0.000008
We can see that the values 171.04 and 182.14 become 183.15 and
171.44, which is almost a flip with an upward shift.
The following is the plot of the phenotype values with generation:
We see the following:
• The phenotypes steadily climb to higher values. After generation
100, the values are at 276.23 and 274.18 respectively.
• They chase each other, never going too far from each other.
These two are the hallmarks of coevolution.
• The difference in the initial values is quickly bridged, as the phe-
notypes converge to a central value.
• Opposing zig-zags are seen in certain regions:
[Plot: zoomed region showing opposing zig-zags]
These zig-zags are easily understood. If there is a gap between the
X and Y values in a generation, then X jumps to a value just above
Y, and Y jumps to a value just above X in the next generation. This
almost reverses their values from the previous generation, with a small
upward shift in both values.
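One zig-zag cycle can be traced with an idealized step in which each value jumps to just above the other's (eps stands in for the small margin by which the selected mutant exceeds the other value; this is a simplification, since in the program that margin is random):

```python
def zigzag_step(x, y, eps=1.0):
    """Idealized generation: each species jumps to just above the other's
    value, which swaps the two values and adds a small upward shift."""
    return y + eps, x + eps

x, y = 171.0, 182.0
print(zigzag_step(x, y))  # (183.0, 172.0): roughly a flip plus a shift
```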
This is a drawback of this program. It cannot be removed by in-
creasing the population size. In a real scenario, generations overlap,
and this situation does not arise.
The following is the fitness v/s generation plot:
[Plot: fitness v/s generation]
We see that the fitness value is not really increasing on the aver-
age over time, although the species are continually evolving. This is
the crux of the Red Queen Hypothesis. There is a need for continual
evolution just to maintain the fitness.
Now we run the program again. This time we take both initial val-
ues to be 10, in order to minimize zig-zags, and a larger population
size of 50. We plot the phenotype values in the same y range as the
previous plot in order to compare:
[Plot: phenotype values v/s generation, population 50]
• The phenotype values rise slower.
This is expected because now from a larger population, an individual
is chosen which is really close to the other phenotype value. So the
upward shift is smaller. In the real world, the population sizes are
much larger, so that it takes many generations for a considerable bit of
evolution to take place.
• The values follow each other more closely.
This is also expected because now it is more likely that a mutant from
the X population will be very close to the current Y value.
• There are smaller zig-zags if we start from the same starting val-
ues.
Zig-zags are not initiated because the X and Y values do not become
considerably different. If somehow zig-zags are initiated, say from
two different starting values, they shall continue, no matter how large
the population.
This is the fitness plot:
[Plot: fitness v/s generation, population 50]
The fitness values are higher than in the previous case.
Now we plot with population size of 100:
[Plot: phenotype values v/s generation, population 100]
And then with population size of 1000:
[Plot: phenotype values v/s generation, population 1000]
We can see that the increase of phenotype values with generation is
very slow, but it is not zero.
Here is a plot of the fitness against generation for the different pop-
ulation sizes:
[Plot: fitness v/s generation for different population sizes]
We can see that the generation-average of fitness increases with in-
creasing population. This is easily understood. A larger population
means that the fittest individual will be fitter than in a smaller popu-
lation.
2.5 Conclusions
i.) Coevolution is demonstrated using this program.
ii.) The Red Queen Hypothesis is verified: the fitness is main-
tained even with continual evolution.
iii.) The coevolution becomes closer with increasing population size.
iv.) The generation-average of the fitness increases with increas-
ing population size.
3 relationship.py
Abstract
A simple program to analyse the formation and breaking of heterosexual
relationships in a population of singles and couples, and the effect
of the values of various parameters on the result.
of the values of various parameters on the result.
3.1 Theory
The theory involved with this program is very simple. We know that
a heterosexual human population consists of single males, single fe-
males and couples. These elements may be supposed to roam freely
through their world, often chancing upon encounters. There may be
different types of encounters: a single male with another single male,
a single female with another single female, a single male with a single
female, and encounters between committed persons. When a single
male meets (read encounters, interacts, and familiarizes with) a single
female, there is a certain chance that they shall commit themselves to
a relationship. For purposes of simplification, we assume that other
sorts of encounters do not amount to the formation of any relation-
ship. In particular, it may be assumed that couples are isolated from
the rest of the world, in that they have already committed themselves
to a relationship, and in the duration of it, will not be involved in any
encounters that break or form a relationship. This is surely a draw-
back of the program, because in many cases committed persons are
tempted by single persons, which results in the breaking and making
of relationships. But here we neglect that.
Once in a relationship, there is the chance of a break-up. If this
happens, the persons forming that couple are released into the world,
where they can again chance upon encounters.
3.2 Aim
i.) To write a program to simulate a population of heterosexuals
making and breaking relationships
ii.) To observe the results for different initial settings
iii.) To attempt to get a balanced condition where the number of
committed persons equals the number of single persons.
3.3 The Program
The program was written in Python. It is the following:
#!/usr/bin/python
while True:
    f = input('Number of single females: ')
    m = input('Number of single males: ')
    c = input('Number of couples: ')
    pc = input('Probability of commitment: ')
    pb = input('Probability of break-up: ')
    import random, Gnuplot
    h = Gnuplot.Gnuplot()
    h("""set title 'Relationships: M %d F %d C %d PC %f PB %f'""" % (m, f, c, pc, pb))
    g = open('relationship.txt', 'w')
    print >> g, 1, '\t', m, '\t', f, '\t', c
    for i in range((f + m) * (f + m - 1) / 2):
        if f * m > 0:
            x = random.sample(range(f + m), 2)
            # indices 0 to f-1 are female, f to f+m-1 male; the product is
            # negative only when one index falls on each side of f - 0.5
            if (x[0] - (f - 0.5)) * (x[1] - (f - 0.5)) < 0:
                if random.uniform(0, 1) <= pc:
                    f -= 1
                    m -= 1
                    c += 1
        if c > 0:
            if random.uniform(0, 1) <= pb:
                f += 1
                m += 1
                c -= 1
        print >> g, i + 1, '\t', m, '\t', f, '\t', c
    g.close()
    h("""set xlabel 'Iterations'""")
    h("""set ylabel 'Number'""")
    h("""p 'relationship.txt' u 1:2 w l title 'Males', 'relationship.txt' u 1:3 w l title 'Females', 'relationship.txt' u 1:4 w l title 'Couples'""")
What the program does is the following: it takes the starting numbers
of single females, single males and couples, and the probabilities of
commitment and break-up. Then it starts taking random samples
(SRSWOR: simple random sampling without replacement) of size two from
the population of singles. If a male and a female come up, they may
get committed with the preset probability value. If they commit, they
are removed from the population. Simultaneously, couples break up with
the preset probability value, upon which they enter the population
again. The number of times this operation is done equals the total
number of possible pairs among the singles, (f+m)(f+m-1)/2. This is
because when the population size is larger, there are also more
collisions per unit time. So, if we are to see the time evolution of
the system, each time step should contain more collisions for a larger
population. Thus, instead of defining a time step that contains more
iterations for a greater population, we simply take the total number
of possible collisions as the range of iterations over which to plot
the graph. This way, we can say that all the graphs generated span
more or less the same duration.
The principle is somewhat like Brownian motion. The single persons may be assumed to execute this random motion, and sometimes there is a collision between a male and a female. If they enter into a relationship, they are precipitated from the system. The precipitate at the bottom, consisting of couples, also releases molecules back into the system with a finite chance. A dynamic equilibrium may thus be established between committed and single persons.
The program then plots the number of single males, single females
and couples against iteration number, using gnuplot.
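The listing above is Python 2 (numeric `input`, `print >>` redirection) and depends on the external Gnuplot module. As a rough, self-contained sketch of the same dynamics in modern Python, with the plotting dropped and a function name (`simulate`) of my own choosing:

```python
import random

def simulate(f, m, c, pc, pb):
    """Run the relationship model and return the trajectory.

    f, m, c: initial single females, single males, couples.
    pc, pb: probabilities of commitment and break-up.
    """
    history = [(m, f, c)]
    # One iteration per possible pairwise collision in the
    # initial single population.
    for _ in range((f + m) * (f + m - 1) // 2):
        if f * m > 0:
            # Draw two distinct singles; indices below f are females.
            x = random.sample(range(f + m), 2)
            one_of_each = (x[0] < f) != (x[1] < f)
            if one_of_each and random.random() <= pc:
                f -= 1
                m -= 1
                c += 1
        if c > 0 and random.random() <= pb:
            # A couple breaks up and rejoins the single pool.
            f += 1
            m += 1
            c -= 1
        history.append((m, f, c))
    return history

traj = simulate(f=50, m=100, c=50, pc=0.0, pb=1.0)
# With pc = 0 and pb = 1, every couple must break up, so the run
# ends with zero couples and all persons single.
print(traj[-1])
```

The index test `(x[0] < f) != (x[1] < f)` is equivalent to the sign check `(x[0] - (f - 0.5)) * (x[1] - (f - 0.5)) < 0` in the original: it is true exactly when one sampled index falls in the female block and the other in the male block.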
3.4 Results
This is a sample plot for number of single females (F) = 50, number
of single males (M) = 100, number of couples (C) = 50, probability of
commitment (PC) = 0 and probability of break-up (PB) = 1:
We see that starting from the initial values, the scenario changes steadily and monotonically to zero couples, with all persons single.
Now we keep the population the same, but change PC to 0.3 and PB to 0.7:
The situation is more or less the same, but now there are some fluctuations about the steady state, from which the system quickly recovers.
Now we change to PC = 0.5, PB = 0.5:
Now the fluctuations are larger, but surprisingly, the system still settles to more or less a steady state with a very low number of couples and almost everyone single, even though the probabilities of commitment and break-up are equal.
Now we change to PC = 0.65 and PB = 0.35:
The balance is turning towards commitment. Note that the male and female curves are mirrors of each other, only displaced vertically. This is obvious, because any change in the number of single males during the making or breaking of couples is matched by the same change in the number of single females.
The couple curve is the inversion of the other two curves. This is because when the two single curves together increase or decrease by some amount, the couple curve decreases or increases by half that amount.
Now we change to PC = 0.7 and PB = 0.3:
This cannot really be called a steady state. Moreover, the balance is now roughly equal between single and committed people. The couple curve lies more or less midway between the single curves, which means that double the couple curve (the number of committed people) equals the sum of the single males and single females (the total number of single people).
Now we change to PC = 0.9, PB = 0.1:
The balance is now turning towards couples.
And finally, for PC = 1 and PB = 0:
All females are committed, which is the limiting condition given the larger number of males, leaving 50 of them single.
Why are we getting a balance for PC = 0.7 and PB = 0.3, and not
when the two are equal? Well, there are two chance factors that go into
forming a couple. First: they must meet, second: they must commit.
On the other hand, a formed couple needs only to break up, which is
just a single chance factor. Thus, the situation is rigged towards more
single people than couples. To balance this rigging, the probability of
commitment should be fixed higher.
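The rigging can be made concrete with a toy compound-probability calculation (the numbers here are illustrative, not taken from any run of the program):

```python
# Forming a couple needs two chance events in a row: the sampled pair
# must be mixed, and then the pair must commit. Breaking up needs only
# one event. Assuming independence, at equal pc and pb the compound
# formation chance is strictly smaller than the dissolution chance.
p_meet = 0.5   # illustrative chance that a sampled pair is male-female
pc = 0.5       # probability of commitment
pb = 0.5       # probability of break-up

p_form = p_meet * pc   # compound chance of forming a couple
print(p_form, pb)      # 0.25 0.5 -- break-up wins at equal probabilities
```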
However, this 0.7:0.3 ratio does not provide balance in all popula-
tions. If we increase M to 200 and decrease F to 20, then these same
probabilities give us almost all singles:
This is because the chance factor of meeting depends upon the population composition. Here, the smaller number of females reduced the chance of a male meeting a female, so the earlier rigged ratio was not enough to produce a balance.
For a population where it’s easier for a male and female to meet, the
earlier ratio would now produce more committed people than single
people:
What, then, is the criterion for producing a balance? For this, we must equate the probabilities of the breaking and making of a relationship. The chance of meeting is given by fm / ^{f+m}C_2, where f and m are the numbers of single females and males. The chance of commitment is pc. Therefore, if we consider meeting and commitment to be independent events, which is true for our program, the probability for the making of a relationship is pc · fm / ^{f+m}C_2. The probability of break-up is pb.
So we have our criterion as:
pc · fm / ^{f+m}C_2 = pb ⟹ pb / pc = fm / ^{f+m}C_2.
This is a positive proper fraction, so that pb ≤ pc for any population composition. This criterion only decides the ratio of the two probabilities. If, as before, we impose the additional condition that the sum of these probabilities should be 1 (there isn't any logical basis for this, but applying it does not affect the evolution of the system), then we can evaluate pb and pc separately:
pc · fm / ^{f+m}C_2 = 1 − pc ⟹ pc = 1 / (1 + fm / ^{f+m}C_2) = ^{f+m}C_2 / (^{f+m}C_2 + fm),
so that pb = fm / (^{f+m}C_2 + fm).
We see that the balance condition probabilities do not depend on
the number of couples.
For f = 50 and m = 100, our first example, we have pc = 0.69, pb = 0.31. This is very close to our empirical values of 0.7 and 0.3.
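These balance probabilities follow directly from the derived formula; a small Python 3 helper (the function name is mine) reproduces them:

```python
def balance_probabilities(f, m):
    """Return (pc, pb) satisfying pc * fm / C(f+m, 2) = pb with pc + pb = 1."""
    pairs = (f + m) * (f + m - 1) // 2   # C(f+m, 2): all possible pairs
    pc = pairs / (pairs + f * m)
    pb = (f * m) / (pairs + f * m)
    return pc, pb

pc, pb = balance_probabilities(50, 100)
print(round(pc, 2), round(pb, 2))  # 0.69 0.31
```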
For the second example (m = 200, f = 20, c = 50), we have pc = 0.85, pb = 0.15. The following plot shows this:
Now we are in for a surprise. The couple curve is not midway between the other two curves, so the system is not balanced. Why did this happen? Close observation shows that with this initial population composition, it is impossible to equate the numbers of committed and single people: to raise the couple curve we would have to lower the other two curves by the same distance, but the female curve would hit the baseline first.
There is a criterion on the initial values for the system to be able to
reach a balance.
Let's take our initial values to be f, m, and c. Let's say x females (or males) need to get committed from the initial values to reach a balance; x may be positive or negative (negative meaning couples need to break up and join the single population). So in the balance condition we have:
f − x + m − x = 2(c + x) ⟹ x = (f + m − 2c) / 4.
Now, the upper limit of x is the number of single females if they are fewer than males, because once all females are committed, there cannot be any more commitments. Similarly, if males are fewer than females, the upper limit of x is the number of males.
The lower limit of x is simply −c, because that many males (or females) from the couple population can join the single male (or female) population.
∴ x ∈ [−c, min{f, m}]
⟹ −c ≤ (f + m − 2c) / 4 ≤ min{f, m}.
The lower limit can be written as −4c ≤ f + m − 2c ⟹ −2c ≤ f + m, which is always true since f + m is positive. So we may ignore the lower limit.
Therefore, the condition on the initial values for the system to allow a balance is: f + m − 2c ≤ 4 min{f, m}.
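This threshold can be checked directly for any starting population (the helper name is mine):

```python
def balance_possible(f, m, c):
    """True if x = (f + m - 2c)/4 fits in the interval [-c, min(f, m)]."""
    return f + m - 2 * c <= 4 * min(f, m)

print(balance_possible(50, 100, 50))   # True:  the first example can balance
print(balance_possible(20, 200, 50))   # False: too few single females
```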
So even though the balance condition does not depend on the num-
ber of couples, the balance criterion depends on it. For a wrong value
of c, we can never have balance.
In the last case that we considered, we have the L.H.S. = 200 + 20 − 2 × 50 = 120, and the R.H.S. = 4 × 20 = 80. Since 120 > 80, we could not reach a balance even with optimum probability values.
Now let’s consider our first example, with f = 50, m = 100, c = 50.
We have the L.H.S. = 50 + 100 − 100 = 50. The R.H.S. = 4 × 50 = 200.
So the balance threshold was satisfied, and we got a balance.
For the third example (f = 80, m = 100, c = 50), we have the L.H.S. = 80 + 100 − 100 = 80 and the R.H.S. = 4 × 80 = 320. Since 80 ≤ 320, a balance can be achieved.
The balance condition is given by pc = 0.668187474, pb = 0.331812525. If you are wondering why I have put down so many decimal places, it will become clear shortly. First let us plot the balanced state:
The end bit does not look very balanced, so it’s difficult to believe
that this is the balanced condition. However, to convince you, I shall
take slightly different values of the probabilities.
Let's take pc = 0.68, pb = 0.32:
Commitment is greater here. Now we take pc = 0.64, pb = 0.36:
Almost all are single. So we see that a change of just 0.02 in the probabilities causes a drastic swing; that is why the balance condition probabilities had to be stated so accurately.
A look at the graphs also tells us another important thing: the balance condition is an unstable equilibrium. This is why, even with the balance condition probabilities, the system tends to deviate from balance, whereas with slightly different probabilities, it stabilizes far away from the balance.
3.5 Conclusions
i.) The relationship model is successfully simulated.
ii.) For a balance condition, the probability of commitment needs to be greater than the probability of break-up.
iii.) There is a balance threshold, dependent on the population composition, which has to be satisfied for a balance to be allowed.
iv.) If the balance threshold is satisfied, there is a balance condition, depending upon the population composition, which will give us a balance.
v.) The balance condition is an unstable equilibrium.
Acknowledgements
• The Python programming language
• LyX document processor (LaTeX)
• Dr. Anindita Bhadra