2. Abstract
• An algorithm for computer Shogi (Japanese
chess)
• Contents
– Exhibition of Dobutsu Shogi
– Min-max method (conventional)
– Monte-Carlo method (conventional)
– Win rate first search (presented)
3. Dobutsu shogi
• This slide explains a computer game
algorithm using Dobutsu Shogi
• Dobutsu Shogi: a miniature shogi
• Shogi: Japanese chess
• Dobutsu: animal
• Normal shogi is too large a game for
examining new methods
4. Rule of Dobutsu Shogi 1
Five kinds of pieces
The initial position is as in the figure
You win if you capture the opponent's lion
You win if your lion reaches the opposite end
A chick promotes to a chicken
5. Rule of Dobutsu Shogi 2
All pieces move by one step
(lion: the surrounding 8 squares; giraffe: vertical and horizontal;
elephant: diagonal; chick: forward;
chicken: forward, forward-diagonal, vertical, and horizontal)
You can reuse (drop) the pieces that you took
6. Copyright of Dobutsu Shogi
• I do not know who holds the copyright
– FUJITA Maiko (illustration)
– KITAO Madoka (rule design)
– LPSA (the organization the two designers belonged to)
– GENTOSHA Education (toy seller)
7. Illustration on this slide
• Because of that complicated copyright, I use
illustrations from the website below in this
slide, instead of FUJITA's ones
• “SOZAIYA JUN”
• (http://park18.wakwak.com/~osyare/)
8. Exhibition initial position
Black: win rate first search (presented)
White: min-max method, search depth 9,
with an evaluation function composed of
only piece values (conventional)
39. Exhibition 31st move
Black took the chick with the lion, and White
resigned
After that: White drops the giraffe beside the
lion, Black's giraffe takes the elephant with
check, White's lion takes it, Black's chick
advances, White's lion moves backward,
Black drops a chick: checkmate
40. Min-max method
• A conventional method
• Today it is the most successful method for shogi
• Explained using a tree structure from the next
page
41. Min-max example: depth 3
[Figure: a 3-depth game tree with the present board
position at the root and board positions after
1, 2, and 3 moves below it]
43. Scores after 2 moves are the maximum of
their children's scores
[Figure: min-max tree; the leaf scores after 3 moves
(-8, 23), (5, -9), (3, 10), (-3, -4)
back up to the 2-move scores 23, 5, 10, -3]
44. Scores after 1 move are the minimum of
their children's scores
[Figure: the 2-move scores (23, 5) and (10, -3)
back up to the 1-move scores 5 and -3]
45. Select the move having the maximum score
[Figure: the root takes the maximum of the
1-move scores 5 and -3, so the move with
score 5 is selected]
46. Min-max method
• Theoretically you can select the move that
has the maximum score after N moves
• Theoretically, if we could obtain the scores at
the end of the game, we would always win
the game
• Practically, the computational cost is too
large, so we cannot calculate all moves
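The recursion described above can be sketched as follows; the values come from the example tree on the previous slides, and the nested-list representation (with integer leaves) is only an assumption for illustration.

```python
def minimax(node, maximizing):
    """Return the min-max score of a node.

    A node is either a leaf score (int) or a list of child nodes;
    levels alternate between the maximizing and minimizing player.
    """
    if isinstance(node, int):
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# The 3-depth example tree from the slides: leaves are scores after 3 moves.
tree = [
    [[-8, 23], [5, -9]],   # first candidate move
    [[3, 10], [-3, -4]],   # second candidate move
]

# Scores after 1 move (opponent minimizes) are 5 and -3;
# the root selects the move with the maximum score, 5.
best = max(range(len(tree)), key=lambda i: minimax(tree[i], False))
```

Running this reproduces the backed-up values of the example: the first move scores 5, the second -3, so the first move is chosen.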
47. Min-max method
• Many methods for reducing the
computational cost have been presented, but
they are not covered in this slide (reducing
the number of searched nodes is called
pruning)
48. Conclusion of min-max method
• It uses a tree structure
• Scores after N moves are needed
• Pruning is needed
49. Monte-Carlo method
• While I do not know the history of the Monte-
Carlo method, it has been successful for
computer "go" (more precisely, successful
as Monte-Carlo tree search)
• It is said to be difficult to apply to
computer shogi (or chess-like games) yet
50. Outline of Monte-Carlo
• Repeat random moves from the first move
• Then the game finishes and the winner is
revealed
• Making the game end by random moves is
called a playout
[Figure: first move, then random moves,
down to the end of the game: one playout]
51. Outline of Monte-Carlo
• Repeat the playout
• Obtain the win rate of each first move:
(number of wins) / (number of playouts)
• Finally, select the move having the
highest win rate
52. Outline of Monte-Carlo
• That is the whole outline
• For "Go", this method has become
stronger by combining it with a tree structure,
making Monte-Carlo tree search (this slide
does not cover it)
• Another improvement is a playout that uses
moves based on knowledge of "Go" instead of
simple random moves
53. Example of knowledge of “Go”
• Observe 3x3 squares
• Set a low probability of dropping a
black stone at the center of the
upper figure
• Set a high probability of dropping a
black stone at the center of the
lower figure
54. Monte-Carlo for shogi
• The simple Monte-Carlo method does not work
for shogi (too many bad moves appear)
• A cause must be that only a few of all legal
moves are good in shogi
• I do not want to use knowledge of shogi,
whether from machine learning or manual setting
55. Why Monte-Carlo for shogi
• It determines the move by the result at the
end of the game, which seems beautiful
• No evaluation function is needed, and no
preset knowledge is needed
56. Discussion: Monte-Carlo using a tree
Simple random moves lead to equal win
rates between green and red
The truth is that green wins and red loses
This tells the importance of the tree structure
57. Discussion: Monte-Carlo using a tree
Suppose you obtain win rates after 3 moves
by playout:
0.1 0.3 0.7 0.8 0.2 0.6 0.9 0.4
Obtain the win rates of green and red from
these 3-move rates
58. Discussion: Monte-Carlo using a tree
Ideally the rates are equal to the
ones of the min-max method
[Figure: leaf rates 0.1 0.3, 0.7 0.8, 0.2 0.6, 0.9 0.4
back up to 0.3, 0.8, 0.6, 0.9,
and then to 0.3 and 0.6 at the top]
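The claim that the rates ideally back up like min-max scores can be checked with a small sketch; the nested-list tree encoding is an assumption, and the rates are the ones from the figure.

```python
def backup(node, my_turn):
    """Back up win rates as min-max does: each player ignores the
    children that are bad for them (max on my turn, min on the opponent's)."""
    if isinstance(node, float):
        return node
    rates = [backup(child, not my_turn) for child in node]
    return max(rates) if my_turn else min(rates)

# Leaf win rates after 3 moves, as in the figure.
tree = [[[0.1, 0.3], [0.7, 0.8]],
        [[0.2, 0.6], [0.9, 0.4]]]

# The two moves at the top back up to 0.3 and 0.6, matching the figure.
top = [backup(child, my_turn=False) for child in tree]
```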
59. Discussion: Monte-Carlo using a tree
• Q: How do you calculate the parent
node's rate 0.6 from its children's
rates 0.2 and 0.6?
• A: Ignore 0.2
[Figure: a parent node 0.6 with children 0.2 and 0.6]
60. Discussion: Monte-Carlo using a tree
• Q: How do you ignore 0.2?
• A1: Always search the node with the
maximum win rate
• A2: Sometimes search through a
node randomly
[Figure: the same parent with children 0.2 and 0.6]
61. Discussion: Monte-Carlo using a tree
Search the node that has the
maximum win rate
[Figure: descending the tree with leaf rates
0.1 0.3 0.7 0.8 0.2 0.6 0.9 0.4]
This tactic finds the best path
62. Win rate first search
• Remember the win rate of every searched node
• Almost always search the node that has the
maximum win rate
• Sometimes search randomly (ideally this is
not needed)
• Then this algorithm finds the best move
63. Additional explanation
• Update win rates at every playout
• Keep the win rate as a numerator and a
denominator
• Add a constant to both the numerator and the
denominator when the playout is won
• Add the constant to only the denominator
when the playout is lost
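The bookkeeping and selection rule above can be sketched as follows; the `Node` class, the constant `C`, and the `epsilon` for occasional random search are assumptions not fixed by the slides, and the optimistic default rate of 1 for an unlearned node follows the later slide on unlearned win rates.

```python
import random

C = 1.0  # assumed constant added at each playout update

class Node:
    """Win rate kept as a numerator/denominator pair, updated per playout."""
    def __init__(self):
        self.num = 0.0
        self.den = 0.0

    def win_rate(self):
        # An unlearned node gets the optimistic default rate of 1.
        return self.num / self.den if self.den > 0 else 1.0

    def update(self, won):
        # Win: add C to both numerator and denominator.
        # Loss: add C to the denominator only.
        if won:
            self.num += C
        self.den += C

def select(children, epsilon=0.05):
    """Almost always search the child with the maximum win rate;
    sometimes search randomly instead."""
    if random.random() < epsilon:
        return random.choice(children)
    return max(children, key=lambda n: n.win_rate())
```

After one won and one lost playout, a node's rate is 1.0 / 2.0 = 0.5; with `epsilon=0` the selection is purely greedy.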
64. Problems of the presented method
• Win rates of nodes that have not been
searched yet are discussed on the next pages
• Many other issues must be hiding, though I
have not identified them
65. Unreached node
• What to do with a node that has not
been searched and has no win rate?
[Figure: sibling nodes with win rates
0.4, 0.6, 0.3 and one unreached node]
66. Another win rate
• Before this page, no knowledge of shogi
appeared; only the graph was used
• This win rate uses knowledge of shogi
• The win rate is calculated from the kind of move
• For example, capturing a piece, promotion,
etc.
67. Another win rate
• Calculate the win rate from these factors:
– Piece positions before and after the move
– Kinds of the moving piece and the taken piece
– Whether the destination square is controlled or not
• A win-rate table for all combinations of these
factors is prepared
• These win rates are learned by playouts;
the values are not preset
68. Another smaller win rate
• Another, smaller win-rate table is prepared
with only these factors:
– Kinds of the moving piece and the taken piece
– Whether the destination square is controlled or not
• Since it is small, it learns fast
• It is used when the larger win-rate table has
not been learned yet
• If none of the three kinds of win rate has
been learned, let the win rate be 1
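The fallback order above can be sketched as follows; the function name and table keys are illustrative assumptions, with `None` standing for "not learned yet".

```python
def estimated_win_rate(node_rate, large_table, small_table,
                       large_key, small_key):
    """Use the node's own learned rate first, then the large move-feature
    table, then the smaller table; if none is learned, return 1."""
    for rate in (node_rate,
                 large_table.get(large_key),
                 small_table.get(small_key)):
        if rate is not None:
            return rate
    return 1.0
```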
69. Conclusion of presented method
• Win rates of all searched nodes are
remembered and learned by playouts
• In a playout, select the node that has the
highest win rate ("win rate first search")
• Sometimes select a node randomly
• If a win rate has not been learned yet, the
other win rates are used
70. Condition of simulation game
• Win rate first search vs. a simple min-max
method (evaluation function composed of
only piece values)
• If the game continues to 80 moves, it is
counted as a draw (a special rule for this
simulation)
71. Result of simulation 1
Number of playouts         10000   30000   100000
Presented method: black    22-76   44-52   48-49
Presented method: white    16-81   30-68   61-35

Win-lose record for the presented method in 100 games
(some drawn games exist)
The search depth of the min-max method is 6
The more playouts there are, the stronger the method is
72. Result of simulation 2
Depth of min-max           4      5      6      7      8      9
Presented method: black    94-6   77-20  48-49  37-61  24-73  14-85
Presented method: white    78-21  78-20  61-35  38-57  40-52  20-74

Win-lose record for the presented method in 100 games
(some drawn games exist)
100000 playouts for the presented method
Almost the same strength as the 6-depth min-max method
73. Impressions of a human viewer
• The presented method frequently takes bad
moves
• Although it is a variation of the Monte-Carlo
method, it can find mating sequences
• It is good at finding narrow routes
• The difference in the number of playouts shows
clearly as a difference in strength
74. Conclusion and future issues
• Conclusion
– Playouts guided by win rate first
– Selects moves without preset knowledge
– Selects moves by the results of playouts
• Future
– Someone can apply it to "Go" or other
chess-like games
– I will return to research on speech signal
processing