2. Featured research
“From Large Scale Image Categorization to Entry-Level Categories”
Vicente Ordonez, Jia Deng, Yejin Choi, Alexander C. Berg, Tamara L. Berg
12. So, how do we go about it?
Linguistic resources (lots of text): WordNet, Google Web 1T
Computer vision (labeled images): ImageNet
SBU Captioned Dataset (lots of images with text), with captions such as:
“The Egyptian cat statue by the floor clock and perpetual motion”
“Interior design of modern white and brown living room furniture hanging.”
“Man sits in a rusted car buried in the sand on Waitarere beach”
“Little girl and her dog in northern Thailand. They both seemed.”
“Our dog Zoe in her bed”
“Emma in her hat looking super cute”
14. 1. Goal: Category Translation
Detailed Category (𝑑): Grampus griseus
What should I call it? (Entry-Level Category, 𝑒): dolphin
2. Goal: Content Naming
Input Image
What should I call it? (Entry-Level Category, 𝑒): dolphin
16. 1. Goal: Category Translation
Detailed Category (𝑑): Grampus griseus
What should I call it? (Entry-Level Category, 𝑒): dolphin
1.1 Text Based
Estimated using only a Web corpus and the hierarchical structure of WordNet
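As a rough illustration of the text-based idea, the sketch below walks up a hypernym chain and picks the word that best balances corpus frequency against distance from the detailed category. The hierarchy, the counts, the scoring rule (log frequency minus a penalty per hierarchy step) and the parameter `lam` are all assumptions made for this example; the slide only states that a Web corpus and the WordNet hierarchy are used.

```python
import math

# Hypothetical toy data: child -> parent links standing in for WordNet hypernyms.
HYPERNYM = {
    "Grampus griseus": "dolphin",
    "dolphin": "toothed whale",
    "toothed whale": "whale",
    "whale": "aquatic mammal",
    "aquatic mammal": "mammal",
    "mammal": "animal",
}

# Hypothetical counts standing in for Web corpus (n-gram) frequencies.
NGRAM_COUNT = {
    "Grampus griseus": 1_000,
    "dolphin": 5_000_000,
    "toothed whale": 20_000,
    "whale": 6_000_000,
    "aquatic mammal": 15_000,
    "mammal": 4_000_000,
    "animal": 90_000_000,
}


def ancestors(node):
    """Return [(word, distance)] for the node itself and every hypernym above it."""
    out, dist = [], 0
    while node is not None:
        out.append((node, dist))
        node = HYPERNYM.get(node)
        dist += 1
    return out


def translate(detailed, lam=1.0):
    """Pick the candidate maximizing log-frequency minus lam * hierarchy distance.

    This scoring rule is an assumption for illustration, not a confirmed formula.
    """
    def score(item):
        word, dist = item
        return math.log(NGRAM_COUNT[word]) - lam * dist

    return max(ancestors(detailed), key=score)[0]


print(translate("Grampus griseus"))
```

With these toy numbers, a penalty of zero drifts all the way up to the most frequent word ("animal"), while a very large penalty keeps the detailed name itself, so `lam` controls how general the chosen name is allowed to be.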
1.2 Image Based
Estimated by voting from image features onto WordNet words
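One way to picture the voting step is a nearest-neighbor scheme: images of the detailed category are matched against captioned images, and the nouns in the neighbors' captions cast votes for WordNet words. The sketch below assumes exactly that; the 2-D feature vectors, the captions, the candidate word set, and the choice of Euclidean nearest neighbors are all hypothetical, since the slide does not specify how votes are cast.

```python
from collections import Counter
import math

# Hypothetical captioned images: (feature vector, nouns found in the caption).
CAPTIONED = [
    ((0.90, 0.10), ["dolphin", "water"]),
    ((0.80, 0.20), ["dolphin", "sea"]),
    ((0.85, 0.15), ["whale", "water"]),
    ((0.10, 0.90), ["dog", "bed"]),
    ((0.20, 0.80), ["dog", "girl"]),
]

# Hypothetical candidate words (e.g. hypernyms of the detailed category).
# Only these words may receive votes.
CANDIDATES = {"dolphin", "whale", "mammal", "animal"}


def vote(query_features, k=3):
    """Let the k nearest captioned images vote for candidate words."""
    nearest = sorted(
        CAPTIONED, key=lambda item: math.dist(item[0], query_features)
    )[:k]
    votes = Counter()
    for _, nouns in nearest:
        for noun in nouns:
            if noun in CANDIDATES:
                votes[noun] += 1
    return votes


def translate_by_image(example_features, k=3):
    """Sum the votes over several example images and return the top word."""
    total = Counter()
    for features in example_features:
        total.update(vote(features, k))
    if not total:
        return None  # no candidate word received any vote
    return total.most_common(1)[0][0]


# Two hypothetical example images of the detailed category.
print(translate_by_image([(0.88, 0.12), (0.82, 0.18)]))
```

Restricting votes to a candidate set keeps frequent but unrelated caption nouns such as "water" from winning the vote.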
35. Extracting Meaning from Data
Result of training on “water”
water dog
surfing, surfboarding, surfriding
manatee, Trichechus manatus
punt
dip, plunge
cliff diving
fly-fishing
sockeye, sockeye salmon, red salmon, blueback salmon, Oncorhynchus nerka
sea otter, Enhydra lutris
American coot, marsh hen, mud hen, water hen, Fulica americana
booby
canal boat, narrow boat, narrowboat
Mammals, Birds, Instruments, Structures, Plants, Other
37. Results: Content Naming
Human Labels: farm, fence, field, horse, mule, kite, dirt, people, tree, zoo
Flat Classifier: gelding, yearling, shire, yearling, draft
Deng et al. CVPR’12: horse, equine, perissodactyl, ungulate, male
Propagated Visual Estimates: horse, tree, equine, male, gelding
Supervised Learning: horse, pasture, field, cow, fence
Joint: horse, pasture, field, cow, fence
38. Results: Content Naming
Columns: Human Labels, Flat Classifier, Deng et al. CVPR’12, Propagated Visual Estimates, Supervised Learning, Joint
Human Labels: fence, junk, sign, stop sign, street sign, trash can, tree
woody
tree
structure
plant
vascular
tree
structure
building
plant
area
logo
street
neighborhood
building
office building
logo
street
neighborhood
building
office
feeder
Hyla
cleaner
box
large
39. Evaluation: Content Naming
[Figure: two plots, “Test Set A – Random Images” and “Test Set B – High Confidence Prediction Scores”. Each reports Precision and Recall (vertical axis 0% to 26%) for Flat Classifier, Deng et al. CVPR'12, Propagated Visual Estimates, Supervised Learning, and Combined.]
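To make the two reported metrics concrete, here is a minimal sketch that scores one set of predicted nouns against human labels. It assumes precision is the fraction of predicted nouns that appear among the human labels and recall is the fraction of human labels that were predicted; the slide does not state the exact definition used. The inputs are the human labels and the final five predicted nouns from the horse example on slide 37.

```python
def precision_recall(predicted, human):
    """Set-based precision and recall of predicted nouns against human labels.

    Assumed definition for illustration; duplicates are ignored.
    """
    predicted, human = set(predicted), set(human)
    hits = predicted & human
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(human) if human else 0.0
    return precision, recall


human_labels = ["farm", "fence", "field", "horse", "mule",
                "kite", "dirt", "people", "tree", "zoo"]
predicted_nouns = ["horse", "pasture", "field", "cow", "fence"]

precision, recall = precision_recall(predicted_nouns, human_labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Three of the five predictions (horse, field, fence) match the ten human labels, which gives a precision of 0.60 and a recall of 0.30 for this single image under the assumed definition.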