Automatic Selection of Predicates for Common Sense Knowledge Expression

Ai Makabi, Kazuhide Yamamoto, Hiroshi Matsumoto
Nagaoka University of Technology


Background
• To develop an intelligent computer
  – Grammatical knowledge
  – Large amount of common sense knowledge
• Focus on methods:
  – Building a common sense knowledge base (CSKB)
  – Providing an accessible representation for use in natural language processing tasks
• e.g. Conversational system
  – "I am suffering from Pochi's biting."
  – "Do you browse web sites for dog discipline?"
  – Overcome biting behavior → To train a dog

Related Works 1/2
• Existing upper ontologies (SUMO, Cyc, etc.)
  – Contain many general concepts
  – e.g. Collection: book
    • A type of: information-bearing object in the form of paper
    • Instance of: kind of artifact not distinguished by brand or model
• Merits:
  – Exploit rigorously defined CSK
• Demerits:
  – The knowledge representation cannot be fully matched with actual expressions

Related Works 2/2
• Defining CSK as relations added to sentences/words (ConceptNet)
  – e.g. 犬 (dog)
    • CapableOf: 散歩 (walk), 寝る (sleep)
    • SymbolOf: 忠誠 (loyalty)
• Merits:
  – The definition is better suited to natural language processing tasks
• Demerits:
  – For the Japanese ConceptNet, most concepts are collected manually
    • Coverage of CSK is exceptionally low

Goal of the Study
• Automatically construct a Japanese CSKB that can be utilized for semantic analysis in natural language processing
  – Set of predicates that co-occur with a noun → CSK
  – e.g. CSK of “dog”: verbs (bark, run), adjectives (pretty, cute), verbal nouns (to train, to breed)

Final goal – Overview of the CSKB
• Compute the similarity between nouns by comparing them at the predicate level
  – e.g. dog (bark, to breed, pretty), cat (mew, meow, to breed, pretty), puppy (yelp, ...)
• Aggregate concepts: concepts (nouns) are grouped under an upper concept such as “animal”, with CSK (predicates) based on the similarity
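The slides do not say how the similarity between nouns is computed. As an illustration only, the sketch below assumes each noun is represented by a vector of predicate co-occurrence counts and uses cosine similarity; the measure, the function name `cosine_similarity` and the counts are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(preds_a: Counter, preds_b: Counter) -> float:
    """Cosine similarity between two nouns' predicate-count vectors (assumed measure)."""
    # Predicates missing from preds_b contribute 0 because Counter returns 0 for unknown keys.
    dot = sum(count * preds_b[p] for p, count in preds_a.items())
    norm_a = math.sqrt(sum(c * c for c in preds_a.values()))
    norm_b = math.sqrt(sum(c * c for c in preds_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical counts loosely modelled on the slide's "dog" / "cat" example.
dog = Counter({"bark": 5, "to breed": 3, "pretty": 2})
cat = Counter({"meow": 4, "to breed": 3, "pretty": 2})
print(round(cosine_similarity(dog, cat), 3))
```

Only the shared predicates ("to breed", "pretty") contribute to the numerator, so nouns sharing more predicate-level CSK end up closer, which is the intuition behind grouping them under an upper concept.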

Specific Property of CSK
• We make three hypotheses:
1) A predicate a is CSK of a noun n when a and n frequently co-occur in sentences.
2) A predicate a that co-occurs with any noun is not appropriate CSK.
3) Whether a predicate a is correct CSK or not depends on the number of unique nouns that co-occur with a.
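The three hypotheses rely on two statistics: how often a (noun, predicate) pair co-occurs, and how many unique nouns each predicate appears with. A minimal sketch, assuming the corpus has already been reduced to (noun, predicate) co-occurrence tuples; `cooccurrence_stats` and the toy pairs are hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_stats(pairs):
    """Collect the statistics used by the three hypotheses.

    pairs: iterable of (noun, predicate) tuples, one per co-occurrence in a sentence.
    Returns (pair_freq, nouns_per_predicate):
      pair_freq[(n, a)]      -> how often a and n co-occur (hypothesis 1)
      nouns_per_predicate[a] -> number of unique nouns seen with a (hypotheses 2 and 3)
    """
    pair_freq = Counter()
    nouns_of = defaultdict(set)
    for noun, pred in pairs:
        pair_freq[(noun, pred)] += 1
        nouns_of[pred].add(noun)
    nouns_per_predicate = {pred: len(nouns) for pred, nouns in nouns_of.items()}
    return pair_freq, nouns_per_predicate


pairs = [("dog", "bark"), ("dog", "bark"), ("dog", "be"),
         ("cat", "be"), ("school", "be"), ("school", "attend")]
freq, spread = cooccurrence_stats(pairs)
print(freq[("dog", "bark")], spread["be"], spread["bark"])  # 2 3 1
```

Here "be" co-occurs with every noun in the toy data, so by hypothesis (2) it is a poor CSK candidate even though it is frequent.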


Automatic Selection of Predicates
• The top 10 predicates for the noun “elementary school”:
  (to enroll in school), (to educate), (be), (become), (to finish school), (to give lessons), (to take an exam), (attend), (to learn), (to coach)
• The predicates placed higher in the list are considered more appropriate as CSK (high → low).
• Predicates that have a high co-occurrence frequency with a noun but cannot characterize the noun are incorrect CSK:
  – Versatile words
  – Co-occur with many nouns
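The ranking above and the notion of a "versatile" predicate can be sketched as follows. This assumes the (noun, predicate) counts from the previous step and a hypothetical cut-off `max_nouns` on the number of unique nouns; the slides do not give such a threshold, and the function names are illustrative.

```python
from collections import Counter


def top_predicates(pair_freq, noun, k=10):
    """Return the k predicates that co-occur most often with `noun`.

    pair_freq maps (noun, predicate) -> co-occurrence count.
    """
    scores = Counter({pred: c for (n, pred), c in pair_freq.items() if n == noun})
    return [pred for pred, _ in scores.most_common(k)]


def flag_versatile(preds, nouns_per_predicate, max_nouns):
    """Split predicates into (kept, versatile).

    A predicate is treated as versatile (likely incorrect CSK) when it co-occurs
    with more than `max_nouns` unique nouns; max_nouns is a hypothetical threshold.
    """
    kept = [p for p in preds if nouns_per_predicate.get(p, 0) <= max_nouns]
    versatile = [p for p in preds if nouns_per_predicate.get(p, 0) > max_nouns]
    return kept, versatile


# Hypothetical counts for the noun "elementary school".
pair_freq = Counter({("elementary school", "enroll"): 5,
                     ("elementary school", "be"): 9,
                     ("elementary school", "attend"): 4,
                     ("dog", "be"): 7})
ranked = top_predicates(pair_freq, "elementary school", k=3)
print(ranked)  # ['be', 'enroll', 'attend']
print(flag_versatile(ranked, {"enroll": 1, "be": 950, "attend": 3}, max_nouns=500))
# (['enroll', 'attend'], ['be'])
```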

[Figure: Emergence distribution of predicates in the top 1,000 nouns. x-axis: number of unique nouns co-occurring with a predicate; y-axis: number of unique predicates. Annotations distinguish predicates co-occurring with many nouns from those co-occurring with few nouns; the sharply increasing part contains the incorrect predicates according to hypothesis (2). A log-log version shows a power-approximated curve, a logarithmic curve and an inflection point.]
• The predicates that fall within a certain range co-occur with many nouns → Delete these predicates as incorrect CSK
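The emergence distribution in the figure can be tabulated from the unique-noun counts, and a power curve can be fitted on log-log axes. A minimal sketch, assuming ordinary least squares on the logged values (the slides do not say how their curves were fitted, and the logarithmic curve and inflection-point detection are not reproduced); function names are hypothetical.

```python
import math
from collections import Counter


def emergence_distribution(nouns_per_predicate):
    """Map x (unique nouns a predicate co-occurs with) -> number of predicates with that x."""
    return Counter(nouns_per_predicate.values())


def fit_power_law(dist):
    """Least-squares fit of y = c * x**b on log-log axes; returns (c, b).

    Needs at least two distinct x values, with all x and y positive.
    """
    xs = [math.log(x) for x in dist]
    ys = [math.log(y) for y in dist.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    c = math.exp(mean_y - b * mean_x)
    return c, b


dist = emergence_distribution({"bark": 1, "meow": 1, "attend": 1, "be": 3})
print(dist)
c, b = fit_power_law(Counter({1: 64, 2: 16, 4: 4, 8: 1}))
print(round(c, 3), round(b, 3))  # 64.0 -2.0
```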

Specific Property of CSK
• Sort the nouns by the number of co-occurring predicates, e.g. information, person, product, ..., runner, database, piano
  – The predicate “run” could not characterize the noun “person”
  – The predicate “run” could characterize the noun “runner”
• A noun that co-occurs with many predicates cannot be characterized by generic predicates; hence, more predicates are deleted for such nouns than for nouns co-occurring with few predicates (hypothesis 3).
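Sorting nouns by the number of unique predicates they co-occur with can be sketched as below, assuming the same (noun, predicate) tuples as before; the function name and data are hypothetical.

```python
from collections import defaultdict


def rank_nouns_by_predicate_count(pairs):
    """Sort nouns by how many unique predicates they co-occur with (most first).

    pairs: iterable of (noun, predicate) co-occurrences. Returns [(noun, count), ...].
    """
    preds_of = defaultdict(set)
    for noun, pred in pairs:
        preds_of[noun].add(pred)
    counts = [(noun, len(preds)) for noun, preds in preds_of.items()]
    # sorted() is stable, so nouns with equal counts keep their first-seen order.
    return sorted(counts, key=lambda item: item[1], reverse=True)


pairs = [("person", "run"), ("person", "eat"), ("person", "see"),
         ("runner", "run"), ("piano", "play"), ("piano", "tune")]
print(rank_nouns_by_predicate_count(pairs))
# [('person', 3), ('piano', 2), ('runner', 1)]
```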

[Figure: Emergence distribution of the top N predicates co-occurring with nouns, for N = 100 and N = 1,000. When N is 100, only the top 100 nouns are taken into account.]

Deleting predicates
• Higher-ranked nouns have more deleting predicates when the nouns are sorted in order of predicate co-occurrence.
• Over the top N nouns co-occurring with many predicates, the number of deleting predicates decreases in a staircase pattern.
• The number of deleting predicates for each noun is decided based on hypothesis (3).
• Singular points at N = 700, 1,100, 1,600, 2,500 or 3,600.

TABLE I. NUMBER OF DELETING PREDICATES FOR EACH NOUN (N = THE UNIQUE NUMBER OF CO-OCCURRED PREDICATES)
Scope of the nouns      Deletion
N ≤ 700                 427
700 < N ≤ 1,100         267
1,100 < N ≤ 1,600       143
1,600 < N ≤ 2,500       73
others                  33
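Table I gives how many predicates are deleted for a noun but not which ones. The sketch below encodes the table and assumes the deleted predicates are those co-occurring with the most unique nouns (following the "co-occur with many nouns" criterion); with that assumption, the 33 predicates deleted in every scope are a subset of each larger deletion set. N is taken as an input as defined in Table I; the function names are hypothetical.

```python
def deletions_for(n_scope: int) -> int:
    """Number of predicates to delete for a noun, following the scopes in Table I."""
    if n_scope <= 700:
        return 427
    if n_scope <= 1100:
        return 267
    if n_scope <= 1600:
        return 143
    if n_scope <= 2500:
        return 73
    return 33  # "others"


def prune_predicates(noun_preds, nouns_per_predicate, n_scope):
    """Drop the d most widespread predicates from one noun's predicate list.

    Assumption: "most widespread" = largest number of unique co-occurring nouns.
    """
    d = deletions_for(n_scope)
    by_spread = sorted(nouns_per_predicate, key=nouns_per_predicate.get, reverse=True)
    banned = set(by_spread[:d])
    return [p for p in noun_preds if p not in banned]


# Hypothetical spread values: p0 is the most widespread predicate, p99 the least.
spread = {f"p{i}": 1000 - i for i in range(100)}
print(prune_predicates(["p0", "p40", "p32", "p33"], spread, n_scope=3000))
# ['p40', 'p33']
```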
However, the 33 predicates that are deleted in the “others” scope can be used with nearly all nouns, so we consider that they are not common sense knowledge and delete them from all nouns as incorrect predicates. Figure 6 shows a list of the deleting predicates for all nouns.
Example of deleting predicates: (understand), (have), (see, look), (become), (nothing), (take, adopt, prefer), (can), (know), (come), (think), (many), (be, need, shoot)
Figure 6. The deleting predicates for all nouns

We use the selected predicates as common sense knowledge and add them to each noun. In particular, we calculate weighted scores for the predicates co-occurring with a noun using Harman normalized frequency. A predicate is regarded as correct CSK for a noun when its score is high. The equation of Harman normalized frequency is as follows (n: noun, a: predicate, n_{a,n}: appearance frequency of predicate a with noun n; the sum in the denominator runs over all predicates k co-occurring with n):

$$\mathrm{TF}(a, n) = \frac{\log_2(n_{a,n} + 1)}{\log_2\left(\sum_{k} n_{k,n}\right)} \tag{1}$$
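Equation (1) translates directly into code. A minimal sketch, assuming a noun's predicate counts n_{k,n} are held in a `Counter`; returning 0 when the total count is at most 1 (where log2 of the total is 0 or undefined) is an added guard, not something the slides specify.

```python
import math
from collections import Counter


def harman_tf(pred_counts: Counter, predicate: str) -> float:
    """Eq. (1): TF(a, n) = log2(n_{a,n} + 1) / log2(sum_k n_{k,n})."""
    total = sum(pred_counts.values())
    if total <= 1:
        # log2(total) would be 0 (or undefined): no meaningful normalisation.
        return 0.0
    return math.log2(pred_counts[predicate] + 1) / math.log2(total)


def rank_by_harman(pred_counts: Counter):
    """Predicates of one noun, highest Harman-normalised score first."""
    return sorted(pred_counts, key=lambda a: harman_tf(pred_counts, a), reverse=True)


# Hypothetical counts for one noun; the total is 16, so the denominator is log2(16) = 4.
dog = Counter({"bark": 7, "walk": 3, "train": 6})
print(harman_tf(dog, "bark"))  # 0.75
print(rank_by_harman(dog))     # ['bark', 'train', 'walk']
```

The log in the numerator dampens very frequent pairs, and dividing by log2 of the noun's total count keeps scores comparable between nouns with many and few co-occurrences.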

Baselines
1) Do not delete any predicates; just use the predicates weighted by Harman normalized frequency (baseline 1)
2) Do not delete any predicates; just use the predicates weighted by TF-IDF score (baseline 2)
3) Remove the 427 deleting predicates in N ≤ 700, and use the predicates weighted by Harman normalized frequency (baseline 3)
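Baseline 2 names TF-IDF without giving its formula. The sketch assumes the raw count n_{a,n} as TF and ln(number of nouns / number of nouns co-occurring with a) as IDF, treating each noun as a "document" of predicates; both choices and the function name are assumptions.

```python
import math
from collections import Counter


def tfidf_scores(noun_pred_counts):
    """Score predicates per noun with an assumed TF-IDF variant.

    noun_pred_counts: dict noun -> Counter(predicate -> n_{a,n}).
    TF is the raw count; IDF is ln(number of nouns / number of nouns with the predicate).
    """
    n_nouns = len(noun_pred_counts)
    df = Counter()
    for counts in noun_pred_counts.values():
        df.update({p for p, c in counts.items() if c > 0})
    return {
        noun: {p: c * math.log(n_nouns / df[p]) for p, c in counts.items() if c > 0}
        for noun, counts in noun_pred_counts.items()
    }


scores = tfidf_scores({"dog": Counter(bark=3, be=5), "cat": Counter(meow=2, be=4)})
print(scores["dog"])
```

With this variant a predicate that occurs with every noun, such as "be" here, scores exactly 0, so IDF down-weights versatile predicates without any explicit deletion step.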

[Table: Predicates selected by Baseline 1, Baseline 2, Baseline 3 and the proposed approach. Predicates in the table include (English glosses): (do not eat), (to take out for a walk), (do not breed), (bring up), (to live), (bite to death), (be sick), (do not bark), (to give a lethal injection), (live), (keep), (to tether), (to train), (understand), (bark), (to register), (cute).]
