
manpu2022(Sakurai_SpeakerLineDataset)


  1. A Method to Annotate Who Speaks a Text Line in Manga and Speaker-Line Dataset for Manga109 Tsubasa Sakurai, Risa Ito, Kazuki Abe and Satoshi Nakamura School of Interdisciplinary Mathematical Sciences, Meiji University
  2. Increased research and services utilizing e-comics Automatic translation, content-based recommendation and search, spoiler prevention ➔ Various studies on the contents of comics are required Background Automatic translation (Mantra) © Akamatsu Ken, LoveHina
  3. Recognition of the components of comics The area of comic frames, the area of text lines and the face of a character Background Face Line Frame © Akamatsu Ken, LoveHina Line
  4. Recognition of the components of comics The area of comic frames, the area of lines and the face of a character Other studies on the components of comics The content of lines, facial expressions, the speaker of the lines and relationships between characters, etc. Background
  5. Focus on the relationship between lines and characters Who speaks these text lines in the frame? Face Line Line Required dataset © Akamatsu Ken, LoveHina
  6. Methods for automatic estimation of the speaker Estimation by distance from the tail of the speech balloon ➔ Speech balloon and speaker association for comics and manga understanding [Rigaud et al. 2015] Related work © Shindou Uni, NichijouSoup
  7. Related work Tail of the speech balloon © Shindou Uni, NichijouSoup Methods for automatic estimation of the speaker Estimation by distance from the tail of the speech balloon ➔ Speech balloon and speaker association for comics and manga understanding [Rigaud et al. 2015]
  8. Examples where existing methods cannot be used Difficulty of speaker estimation © Sorata Akizuki, Snow White with the Red Hair © Taira Masami, KuroidoGanka No speech balloon Distant character is the speaker
  9. Examples where existing methods cannot be used No speech balloon Distant character is the speaker Clarify the factors to consider in machine learning and what the difficulties are Difficulty of speaker estimation © Sorata Akizuki, Snow White with the Red Hair © Taira Masami, KuroidoGanka
  10. The eBDtheque dataset Speaker information is available, but the number of data is small ➔ eBDtheque: A Representative Database of Comics [Rigaud et al. 2013] The Manga109 dataset Large number of data, but no speaker information ➔ Sketch-based manga retrieval using manga109 dataset [Matsui et al. 2017] Related work
  11. The Manga109 dataset 109 comics drawn by professional cartoonists, with annotations ➔ Sketch-based manga retrieval using manga109 dataset [Matsui et al. 2017] 4 types of annotations • Position of the frames • Body position and character name • Face position and character name • Text line position and text string Related work Frame Body Face Line Line © Akamatsu Ken, LoveHina
  12. Propose and develop systems to easily construct datasets Analyze Speaker-Line Dataset for Manga109 Identifying the characteristics of comics for speaker estimation Research purpose Face Line Line © Akamatsu Ken, LoveHina
  13. How to assign annotations Manual selection of speakers is very difficult Conventional method
  14. Speakers and lines are often close together Enables quick annotation Informational Design © Akamatsu Ken, LoveHina
  15. Fitts's law [Fitts 1954] Which tasks are difficult? Fitts's law
  16. Fitts's law [Fitts 1954] The larger the target and the shorter the distance to the target, the quicker the movement T = a + b log2(D/W + 1) a: time taken for start and end operations b: effect of mouse speed on time taken D W Fitts's law
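The law above can be sketched numerically. The constants a and b below are hypothetical placeholders, since the slides do not report fitted values:

```python
import math

# Fitts's law: movement time grows with the index of difficulty log2(D/W + 1).
# a and b are device- and user-specific constants; the defaults here are
# illustrative only, not fitted from the paper's data.
def fitts_time(d: float, w: float, a: float = 0.1, b: float = 0.15) -> float:
    """Predicted movement time (seconds) to hit a target of width w at distance d."""
    return a + b * math.log2(d / w + 1)

# A large, nearby target (a speaker drawn right next to the line) is
# predicted to be quicker to reach than a small, distant one.
t_near = fitts_time(d=50, w=100)   # close, large target
t_far = fitts_time(d=800, w=40)    # distant, small target
```

This is why drag-and-drop annotation is fast when speakers and lines are close together: the index of difficulty stays small.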
  17. How to assign annotations Drag and drop lines to the speaker Proposed method
  18. Dataset construction system Building a dataset of mapping between lines and speakers Dataset Construction
  19. Dataset construction system Building a dataset of mapping between lines and speakers Number of the annotations • A total of 749,856 annotations assigned by 56 people • Average of about 5 persons evaluating per line Dataset Construction
  20. Result of dataset construction Number of annotations assigned per person Speaker-Line Dataset for Manga109 Manga109: 147,918 speaking lines in total Fewer annotations → lower validity
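The slide's totals can be cross-checked with a line of arithmetic, assuming the reported 749,856 annotations are spread over the 147,918 lines:

```python
# Figures taken from the slides: total annotations and total annotated lines.
total_annotations = 749_856
total_lines = 147_918

# Average number of evaluators per line; the slides round this to "about 5".
avg_per_line = total_annotations / total_lines  # ≈ 5.07
```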
  21. Number of evaluators and rating agreement © Shindou Uni, NichijouSoup Appropriate number of evaluators © Shindou Uni, NichijouSoup
  22. Number of evaluators and rating agreement © Shindou Uni, NichijouSoup © Shindou Uni, NichijouSoup Appropriate number of evaluators
  23. Number of evaluators and rating agreement © Shindou Uni, NichijouSoup © Shindou Uni, NichijouSoup Possibility to change the speaker candidate Appropriate number of evaluators
  24. Result of dataset construction Agreement rate of the annotations Analysis of our dataset
  25. Percentage of lines that were in perfect agreement with the evaluation Perfect Match Rate = 100% By one annotator Appropriate number of evaluators
  26. Percentage of lines that were in perfect agreement with the evaluation Perfect Match Rate = 100% By one annotator Perfect Match Rate = 60% By two annotators Perfect Match Rate = 40% By three annotators Appropriate number of evaluators
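The perfect-match rate on these slides can be expressed as a small function. The data layout and character labels below are hypothetical toy examples, not the paper's actual format:

```python
def perfect_match_rate(annotations_per_line):
    """Fraction of lines on which every annotator chose the same speaker.

    annotations_per_line: one list of speaker labels per text line
    (a toy structure assumed for illustration, not the paper's format).
    """
    perfect = sum(1 for labels in annotations_per_line if len(set(labels)) == 1)
    return perfect / len(annotations_per_line)

# With a single annotator every line trivially agrees with itself, so the
# rate starts at 100% and can only stay level or drop as annotators are added.
lines = [
    ["Naru", "Naru"],     # both annotators agree
    ["Keitaro", "Naru"],  # disagreement
    ["Naru", "Naru"],     # agreement
]
rate = perfect_match_rate(lines)  # 2 of 3 lines in perfect agreement
```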
  27. Changes in the rate of perfect matches in ratings An appropriate number of evaluators is needed above a certain level (10 points) Appropriate number of evaluators
  28. Result of dataset construction Presence or absence of speaker in the frame Analysis of our dataset
  29. Result of dataset construction Presence or absence of speaker in the frame © Akamatsu Ken, LoveHina Analysis of our dataset
  30. Result of dataset construction Presence or absence of speaker in the frame Analysis of our dataset © Akamatsu Ken, LoveHina
  31. Specific situations (on battleships) © Kato Masaki, ARMS Scenes with difficulty to map
  32. Specific situations (darkness) Scenes with difficulty to map © Kato Masaki, ARMS
  33. Specific situations (battle scenes) © Oi Masakazu, Joouari Scenes with difficulty to map
  34. Specific situations (internal speech) Scenes with difficulty to map © Oi Masakazu, Joouari
  35. Unusual cases (Case Closed series) Scenes with difficulty to map © Gosho Aoyama, Detective Conan (Case Closed) Which?
  36. Speaker-Line Dataset construction system System for annotating even a large number of lines Blurred evaluations in certain genres and scenes Genre: science fiction, battle Scene: difficult-to-grasp frames (e.g. battle scenes, darkness) Difficulty of speaker estimation Blurring exists even in human evaluation Consideration of the best number of people to annotate Discussion and prospects
  37. Efficiency of annotation assignment • Fewer annotators for easy scenes • More annotators for difficult scenes Discussion and prospects © Yagami Ken, HisokaReturns © Kato Masaki, ARMS Who?
  38. Efficiency of annotation assignment • Fewer annotators for easy scenes • More annotators for difficult scenes Discussion and prospects
  39. Efficiency of annotation assignment • Fewer annotators for easy scenes • More annotators for difficult scenes Clarify the level of difficulty in annotation & reconsider the required number of annotators Discussion and prospects
  40. Summary Background The need to recognize the components of comics Focus on the relationship between lines and characters (speaker estimation) Research purpose Propose and develop systems to easily construct datasets Analysis of Speaker-Line Dataset for Manga109 Proposed method The larger the target and the shorter the distance to the target, the quicker the movement Speakers and lines are often close together ➔ Drag and drop lines to the speaker Dataset construction A total of 749,856 annotations assigned by 56 people Average of about 5 persons evaluating per line Analysis of datasets Agreement rate of the annotations Changes in the rate of perfect matches in ratings Scenes with difficulty to map Discussion and prospects Blurred evaluations in certain genres and scenes Difficulty of speaker estimation Efficiency of annotation assignment
