Mask R-CNN and Its Ancestors
(which I wanted to understand)
[Image Processing & Machine Learning] Paper LT Meetup! #7
2019.9.13 @ LPIXEL
Tawara (@tawatawara)
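Self-introduction
Background
◦ Bachelor's: Department of Informatics (computer science)
◦ Master's: Graduate School of Informatics (social informatics)
Current job
◦ R&D division of a certain JTC (traditional Japanese company). Image work is a hobby, not my job
Hobby: Kaggle
◦ Most of the competitions I have seriously worked on are image competitions
◦ Human Protein Atlas Classification: 21st
◦ iMet Collection 2019: lost place ...
◦ So far I have actually only done classification
→ It is about time I tried detection and segmentation too
→ Let's read the paper that is (probably) the baseline → Mask R-CNN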
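Disclaimer
About this talk
◦ For image-processing specialists this is truly preaching to the choir
◦ Plenty of articles already summarize this topic, and I drew on many of them.
◦ Please go easy on me. Corrections are welcome
◦ This reflects a loose understanding, so it lacks detail (sorry)
About the figures and tables in the slides
◦ Unless otherwise noted, they are taken from the Mask R-CNN paper [1]
◦ Some in this talk are taken from other sources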
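Today's roadmap
Mask R-CNN [1]: an extension of Faster R-CNN
Faster R-CNN [2]: an extension of Fast R-CNN
Fast R-CNN [3]: an extension of R-CNN
R-CNN [4][5]: (reportedly) dealt the finishing blow to object detection with good old hand-crafted image features
There are apparently many other related works, but please let me skip them...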
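Paper overview
Background
◦ Instance segmentation is a challenging task that combines the properties of both object detection and semantic segmentation
◦ One would expect high accuracy to require a complex model, but...
Proposed method
◦ A simple extension that adds a mask prediction branch to an existing object detection method (Faster R-CNN).
◦ Proposes RoI Align, which properly aligns each bounding box with the corresponding region on the feature map.
Results
◦ Strong performance on both instance segmentation and object detection.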
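One compact way to state the "simple extension": the Mask R-CNN paper [1] defines the training loss on each sampled RoI as the two Faster R-CNN terms plus one mask term. The notation below follows the paper, not the slides.

```latex
L = L_{cls} + L_{box} + L_{mask}
```

Here $L_{cls}$ and $L_{box}$ are the classification and bounding-box regression losses inherited from Fast/Faster R-CNN. $L_{mask}$ is the average per-pixel binary cross-entropy (with a per-pixel sigmoid), computed only on the mask predicted for the ground-truth class, so masks of different classes do not compete with each other.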
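R-CNN [4][5]
◦ Extract region proposals from the input image: Selective Search (see [6])
◦ Extract features from each region proposal: CNN
◦ Classify each region proposal using those features: per-class SVMs
◦ (+ more precise localization: per-class bounding box regressor)
* Figure taken from [4].
Each part
is independent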
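To make the "bounding box regressor" step a little more concrete, here is a minimal sketch of how predicted offsets refine a proposal, using the center/size parameterization described in the R-CNN paper [4]. The function name `apply_box_deltas` and the example numbers are made up for illustration; the regressor itself (how the offsets are predicted) is not shown.

```python
import math


def apply_box_deltas(proposal, deltas):
    """Refine one proposal box with regression outputs.

    proposal: (cx, cy, w, h), center and size of the region proposal.
    deltas:   (dx, dy, dw, dh), offsets predicted by the per-class regressor.
    """
    cx, cy, w, h = proposal
    dx, dy, dw, dh = deltas
    # Center shifts are expressed relative to the proposal size.
    new_cx = cx + w * dx
    new_cy = cy + h * dy
    # Size changes are predicted in log space, so they stay positive.
    new_w = w * math.exp(dw)
    new_h = h * math.exp(dh)
    return (new_cx, new_cy, new_w, new_h)


# Hypothetical example: shift the box right and up, keep its size.
refined = apply_box_deltas((50.0, 40.0, 20.0, 10.0), (0.5, -0.5, 0.0, 0.0))
print(refined)  # (60.0, 35.0, 20.0, 10.0)
```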
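Fast R-CNN [3]
◦ Extract region proposals from the input image: Selective Search (see [6])
◦ Extract a feature map from the input image: CNN
◦ Obtain features for each region proposal from the feature map: RoI Pooling
◦ Classification & BB regression per region proposal: multi-task
* Figure taken from [3].
Only this part is independent
Trained
end-to-end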
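A rough sketch of what RoI Pooling does, in plain Python on a single-channel feature map. This is a simplified illustration under my own assumptions (one channel, a fixed `stride` of 16, made-up function names), not the reference implementation. The point to notice is that coordinates get rounded to integer cells twice: once for the RoI corners and once for the bin edges.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_pool(feature_map, roi, output_size=2, stride=16):
    """Max-pool one RoI into an output_size x output_size grid.

    feature_map: list of rows (single channel).
    roi: (x1, y1, x2, y2) in input-image pixels.
    """
    height, width = len(feature_map), len(feature_map[0])
    # Quantization 1: image coordinates -> integer feature-map cells.
    x1, y1, x2, y2 = (int(round(v / stride)) for v in roi)
    roi_w = max(x2 - x1 + 1, 1)
    roi_h = max(y2 - y1 + 1, 1)
    pooled = []
    for by in range(output_size):
        # Quantization 2: bin edges are snapped with floor / ceil.
        ys = y1 + (by * roi_h) // output_size
        ye = y1 + ceil_div((by + 1) * roi_h, output_size)
        row = []
        for bx in range(output_size):
            xs = x1 + (bx * roi_w) // output_size
            xe = x1 + ceil_div((bx + 1) * roi_w, output_size)
            cells = [feature_map[y][x]
                     for y in range(max(ys, 0), min(ye, height))
                     for x in range(max(xs, 0), min(xe, width))]
            row.append(max(cells) if cells else 0.0)
        pooled.append(row)
    return pooled


# Toy 4x4 feature map with values 0..15.
fmap = [[4 * y + x for x in range(4)] for y in range(4)]
print(roi_pool(fmap, (10, 10, 42, 42)))  # [[10, 11], [14, 15]]
```

In the toy call, the RoI corner 10 px maps to 0.625 on the feature map but is rounded to cell 1, which is exactly the kind of misalignment that matters for pixel-level masks.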
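Faster R-CNN [2]
◦ Extract a feature map from the input image: CNN
◦ Predict region proposals from the feature map: RPN (anchor boxes omitted here)
◦ Obtain features for each region proposal from the feature map: RoI Pooling
◦ Classification & BB regression: multi-task
* Figure taken from [2].
* Training looks complicated.
Trained
end-to-end (?)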
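The anchor box discussion is skipped in this talk, but as a very rough sketch: the Faster R-CNN paper [2] places 9 reference boxes (3 scales x 3 aspect ratios) at every feature-map location, and the RPN predicts an objectness score and box offsets relative to each of them. The function name `make_anchors` and the convention ratio = height / width are my own choices for illustration.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2), all centered at (cx, cy).

    Each anchor has area scale**2 and aspect ratio height / width = ratio.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# Anchors for one location (in image coordinates).
anchors = make_anchors(400.0, 300.0)
print(len(anchors))  # 9
```

The RPN slides over the feature map, so in practice this set is repeated at every location with the center shifted by the feature stride.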
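Mask R-CNN [1]
◦ Extract a feature map from the input image: CNN
◦ Predict region proposals from the feature map: RPN
◦ Obtain features for each region proposal from the feature map: RoI Align
◦ Classification (FC) & BB regression (FC) & segmentation (FCN)
Trained
end-to-end (?)
* The paper states that the RPN
is trained separately.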
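A rough sketch of the RoI Align idea in plain Python: no coordinate is rounded, each bin is sampled at a few regularly spaced fractional points, each point is read off by bilinear interpolation, and the samples are averaged. This is a simplified single-channel illustration under my own assumptions (made-up function names, `stride` of 16, feature value `feature_map[y][x]` treated as located at continuous coordinate (y, x)); real implementations differ in details such as the half-pixel offset.

```python
def bilinear(feature_map, y, x):
    """Bilinearly interpolate a 2-D feature map at a fractional (y, x)."""
    height, width = len(feature_map), len(feature_map[0])
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature_map, roi, output_size=2, stride=16, samples=2):
    """Average samples x samples bilinear reads in each output bin."""
    # Keep the RoI in fractional feature-map coordinates (no rounding).
    x1, y1, x2, y2 = (v / stride for v in roi)
    bin_w = (x2 - x1) / output_size
    bin_h = (y2 - y1) / output_size
    pooled = []
    for by in range(output_size):
        row = []
        for bx in range(output_size):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced sampling points inside the bin.
                    y = y1 + (by + (sy + 0.5) / samples) * bin_h
                    x = x1 + (bx + (sx + 0.5) / samples) * bin_w
                    total += bilinear(feature_map, y, x)
            row.append(total / (samples * samples))
        pooled.append(row)
    return pooled


# Toy 4x4 feature map with values 0..15.
fmap = [[4 * y + x for x in range(4)] for y in range(4)]
print(roi_align(fmap, (10, 10, 42, 42)))
# [[5.625, 6.625], [9.625, 10.625]]
```

With `samples=2` each bin uses four sampling points, which matches the setting described in the paper [1]; the paper aggregates them with max or average, and the sketch uses average.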
Experimental results
Qualitative evaluation (instance segmentation)
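Experimental results
Quantitative evaluation (instance segmentation)
◦ The paper also examines in detail how implementation choices affect performance (see the paper).
◦ Backbone, RoIPool vs. RoIAlign, mask branch ...
◦ I felt I should look at this part properly, but ran out of time...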
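Experimental results
Quantitative evaluation (object detection)
◦ Good performance on object detection as well
◦ Looking at the second row and the bottom three rows:
◦ RoIAlign, multi-task training, and the improved backbone all help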
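Summary
A simple approach that extends Faster R-CNN
◦ Adds a mask prediction branch
◦ Introduces RoI Align to correct the misalignment between bounding boxes and RoI features
Results
◦ Good results on both instance segmentation and object detection (at the time).
◦ The difference in the qualitative evaluation is very clear (though there are surely bad examples too...).
Impressions
◦ Tracing the whole lineage is a good learning exercise (* not finished yet)
◦ It is easy to see why each component was added
◦ Sorry that I could not properly cover anchor boxes and the like...
◦ I want to follow the implementation and become able to train it properly. Then use it on Kaggle.
◦ I called it a simple approach, but training it looked difficult. I would welcome any practical insights.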
References
[1] K. He, G. Gkioxari, P. Dollar, and R. Girshick. Mask R-CNN. In ICCV, 2017.
[2] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards Real-Time Object
Detection with Region Proposal Networks. In NIPS, 2015.
[3] R. Girshick. Fast R-CNN. In ICCV, 2015.
[4] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate
object detection and semantic segmentation. In CVPR, 2014.
[5] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate
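object detection and semantic segmentation. Tech report (v5). arXiv:1311.2524v5.
[6] KantoCV/Selective Search for Object Recognition
[7] 物体検出についての歴史まとめ (A Summary of the History of Object Detection)
[8] Mask R-CNN:ディープラーニングによる一般物体検出・Instance Segmentation手法 (Mask R-CNN: A Deep Learning Method for Generic Object Detection and Instance Segmentation)
[9] 最新の物体検出手法Mask R-CNNのRoI AlignとFast(er) R-CNNのRoI Poolingの違いを正しく理解する (Correctly Understanding the Difference Between RoI Align in Mask R-CNN, the Latest Object Detection Method, and RoI Pooling in Fast(er) R-CNN)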
End
