1. Using Fan-Made Content, Subtitles and
Face Recognition for Character-Centric
Video Summarization
Ismail Harrando, Alison Reboud, Jorma Laaksonen, Pasquale
Lisena, Anja Virkkunen, Mikko Kurimo, Raphaël Troncy
2. MeMAD Team - Takeaways
■ Contribution: Showing that matching fan-written synopses
with episode transcripts is an efficient starting point for
character-centric video summarization
■ Results: 32% on average (scoring over 3 variables + 5 qualitative
questions)
■ Observation: Except for the redundancy score, runs with
more shots rarely yield better results
3. Approach: Leveraging fan synopses
Pipeline: Corpus creation → Alignment and preprocessing → Matching and runs generation
4. Corpus Creation
■ Scraping synopses from the Fandom EastEnders Wiki [1]
Hypothesis: every sentence represents an equally important event to be added to the
final video summary
■ Shot selection with face recognition for the query characters (sketched after the
references below):
● Crawl the web with the actor's name + “EastEnders” (e.g. Janine = Charlie Brooks)
● Perform face detection and recognition [2] with one model for each of the 3
characters (Janine, Ryan, Stacey)
[1] https://eastenders.fandom.com/wiki/EastEndersWiki
[2] https://github.com/D2KLab/Face-Celebrity-Recognition
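The actual detection and recognition pipeline is the one in [2]. As a rough, hypothetical
illustration of the per-character matching idea, here is a minimal sketch using the
open-source face_recognition package (an assumption, not the tool in [2]); the paths and
tolerance value are purely illustrative.

    import face_recognition

    def build_reference_encodings(image_paths):
        # One set of reference encodings per character, built from the crawled
        # actor photos (e.g. results for "Charlie Brooks EastEnders" for Janine).
        encodings = []
        for path in image_paths:
            image = face_recognition.load_image_file(path)
            faces = face_recognition.face_encodings(image)
            if faces:  # keep only photos where a face was detected
                encodings.append(faces[0])
        return encodings

    def shot_contains_character(frame_path, reference_encodings, tolerance=0.6):
        # A shot is kept if any face in one of its frames matches the character.
        frame = face_recognition.load_image_file(frame_path)
        for face in face_recognition.face_encodings(frame):
            if any(face_recognition.compare_faces(reference_encodings, face,
                                                  tolerance=tolerance)):
                return True
        return False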
5. Alignment and preprocessing
■ Synopses:
● Concatenate the text of the weekday episodes to match the longer “EastEnders
Omnibus” episodes
● Split into sentences and perform coreference resolution [3] to make character
mentions explicit
■ Transcript: map each segment of the XML file to its corresponding shot using timestamps
■ Synopses and transcript: lowercasing, stop-word removal and lemmatization
(sketched after the reference below)
[3] https://github.com/huggingface/neuralcoref
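A minimal sketch of these two steps, assuming spaCy with neuralcoref [3] (which requires
spaCy 2.x); the model name and the exact normalization details are assumptions, not the
team's documented setup.

    import spacy
    import neuralcoref

    nlp = spacy.load("en_core_web_sm")
    neuralcoref.add_to_pipe(nlp)  # adds the coreference resolver to the pipeline

    def resolve_and_split(synopsis):
        # Replace pronouns with the characters they refer to, then split into
        # sentences, e.g. "She tells him ..." -> "Janine tells Ryan ...".
        doc = nlp(synopsis)
        resolved = nlp(doc._.coref_resolved)
        return [sent.text for sent in resolved.sents]

    def normalize(text):
        # Lowercase, drop stop words and punctuation, lemmatize.
        return [tok.lemma_ for tok in nlp(text.lower())
                if not tok.is_stop and tok.is_alpha]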
6. Matching and runs generation
■ Pairwise comparison of each synopsis sentence with each shot transcript,
producing a similarity score
■ Length penalty to avoid matches with unrelated scenes
■ TF-IDF weighting to avoid having only short sentences match with the highest
scores
■ Order shots by similarity; keep only the best match for each shot
■ Select the N best-matching shots and present them in chronological order
(see the sketch below)
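A minimal sketch of this matching step, assuming scikit-learn's TF-IDF vectorizer and
cosine similarity. The exact form of the length penalty is not given in the slides, so the
one below (and its parameter alpha) is purely illustrative.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def select_shots(sentences, shot_transcripts, n_shots, alpha=0.1):
        # Fit TF-IDF on both sides so sentences and transcripts share a vocabulary.
        vectorizer = TfidfVectorizer()
        tfidf = vectorizer.fit_transform(sentences + shot_transcripts)
        sent_vecs = tfidf[:len(sentences)]
        shot_vecs = tfidf[len(sentences):]

        # Pairwise similarity: rows = synopsis sentences, columns = shots.
        sim = cosine_similarity(sent_vecs, shot_vecs)

        # Hypothetical length penalty: slightly down-weight very short
        # transcripts, which otherwise match unrelated scenes spuriously.
        for j, transcript in enumerate(shot_transcripts):
            sim[:, j] *= 1.0 - alpha / (1.0 + len(transcript.split()))

        # Keep only the best synopsis match per shot, then take the N best
        # shots and return them in chronological (index) order.
        best_per_shot = sim.max(axis=0)
        top = sorted(range(len(shot_transcripts)),
                     key=lambda j: best_per_shot[j], reverse=True)[:n_shots]
        return sorted(top)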
7. Average score for each run
Team run | Max # shots | Max summary length | Score
MeMAD1   | 5 shots     | 150 sec            | 31%
MeMAD2   | 10 shots    | 300 sec            | 31%
MeMAD3   | 15 shots    | 450 sec            | 35%
MeMAD4   | 20 shots    | 600 sec            | 32%
8. Individual scores

Team_Run_Query | Tempo | Contextuality | Redundancy | Q1 | Q2  | Q3 | Q4  | Q5  | Final score
MeMAD_1_Janine |   6   |       4       |     5      | No | No  | No | No  | Yes | 29
MeMAD_2_Janine |   5   |       5       |     6      | No | No  | No | No  | Yes | 28
MeMAD_3_Janine |   5   |       5       |     6      | No | No  | No | No  | Yes | 28
MeMAD_4_Janine |   5   |       5       |     7      | No | No  | No | No  | Yes | 27
MeMAD_1_Ryan   |   4   |       5       |     3      | No | No  | No | No  | Yes | 30
MeMAD_2_Ryan   |   5   |       5       |     3      | No | No  | No | No  | Yes | 31
MeMAD_3_Ryan   |   3   |       4       |     5      | No | No  | No | Yes | Yes | 42
MeMAD_4_Ryan   |   2   |       3       |     5      | No | No  | No | Yes | Yes | 40
MeMAD_1_Stacey |   6   |       5       |     2      | No | Yes | No | No  | No  | 33
MeMAD_2_Stacey |   6   |       5       |     2      | No | Yes | No | No  | No  | 33
MeMAD_3_Stacey |   6   |       6       |     2      | No | Yes | No | No  | No  | 44
MeMAD_4_Stacey |   4   |       5       |     4      | No | Yes | No | No  | No  | 29
9. Discussion

■ For Tempo, we decided to include the next shot whenever a sentence was cut. Results
were rather good despite no additional attempt to specifically optimise for Tempo,
Contextuality and Redundancy
■ Qualitative questions: at best, no more than two out of five were answered positively
■ When generating the runs, we had read the synopses but not watched the videos:
1. We had the impression that the shots chosen were important
2. We found that it is very challenging for humans too to choose between two
potentially interesting moments without knowing beforehand the questions
included in the evaluation set