Multimodal Features
for Linking Television Content
Petra Galuščáková
galuscakova@ufal.mff.cuni.cz
Institute of Formal and Applied Linguistics
NLP Applications 11. 5. 2017
2
Introduction
● Video: 80% of internet traffic in 2019
● It would take a single person 5 million years to view all the
video content crossing the internet during one month
● Mainly generated by streaming services such as Netflix,
Hulu Plus and Amazon Prime
● Followed by YouTube, Vimeo and Vine
● Each minute
● 77,160 hours of videos streamed by Netflix
● More than a million videos played by Vine users
● 300 hours of videos uploaded to YouTube
– rose from 6 hours in 2007, to 24-35 hours in 2010
3
Motivation
● Increasing number of audio-visual documents
● Wide variety of types of videos
● Progress in ASR and visual processing
systems
● Lack of effective systems for retrieving
information stored in these documents
4
Multimedia Search Engines
5
Multimedia Retrieval
● A set of methods for understanding information
stored in the various media in a manner comparable
with human understanding
● The semantic content of the documents must first be
mined using automatic processing
● ASR, acoustic processing, image processing, face recognition,
signal processing, video content analysis, ...
● Videos
● Different modalities
● No structure
6
Search in Audio-Visual
Documents
● Input:
● Data collection (video recordings)
● Query
– Given as text
● Output:
● Relevant segments (passages) of documents
7
Search Examples
● “Medieival history of why castles were first built”
● “FA cup final, old & current, comparison. History of
football”
● “animal park, kenya marathon, wildlife reserve”
8
Speech Retrieval vs.
Spoken Term Detection
● Spoken Term Detection (Keyword Spotting) is often
not sufficient
● Documents must contain the exact word (or
sometimes different word forms)
E.g. “Rover finds bulletproof evidence of water on
early Mars” vs. “A bulletproof vest is an item of
personal armor that helps absorb the impact from
firearm-fired projectiles”
● Retrieval techniques can also exploit other evidence, e.g. visual
content
9
Hyperlinking
● Input:
● Data collection (video
recordings)
● Query segment
● Output:
● Segments similar to the query
segment
10
Hyperlinking Definition
● Hyperlink: an electronic link providing
direct access from one distinctively marked
place in a hypertext or hypermedia
document to another in the same or a
different document.
● The marked source place of the link is called the anchor
● “’give me more information about this
anchor’ instead of ’give me more based on
this anchor or entity’”
11
Recommender Systems
● Focused on entertainment
● YouTube
● Recommendations are generated from a user’s personal activity
(watched, favourited, liked videos) [*]
● TED Talks
● Related talks manually selected by the
editors
[*] James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet, Ullas
Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston, and Dasarathi Sampath. 2010. The
YouTube video recommendation system. In Proc. of RecSys '10. ACM, New York, NY, USA,
293-296.
12
Our Approach
● Content-based search
● Combining speech, acoustic and visual
information
● Retrieval instead of word-spotting
● Passage Retrieval
● Retrieve relevant segments instead of
relevant documents
13
Outline
● System Description
● Audio Features
● Visual Features
● Demo User Interface
14
System Description
15
Multimedia Benchmarks
● Search and Hyperlinking (2012-2014)
● Video Hyperlinking (2015-)
16
Search and Hyperlinking
Task
● The main goal of the Search Subtask
● To find passages relevant to a user’s interest given by a
textual query in a large set of audio-visual recordings
● And of the Hyperlinking Subtask:
● To find more passages similar to the retrieved ones
● Scenario:
● A user wants to find a piece of information relevant to a
given query in a collection of TV programmes (Search
subtask)
● And then navigate through a large archive using
hyperlinks to the retrieved segments (Hyperlinking
subtask)
17
BBC Broadcast
Data
● Broadcast between 1 April 2008 and
31 July 2008
● News, documentaries, series,
entertainment programmes, quiz
shows, cookery shows, sport
programmes, ...
● Subtitles
● Three ASR transcripts
● LIMSI, LIUM, NST-Sheffield
● Metadata
● Prosodic features
● Shots, keyframes, visual concepts
18
Evaluation
● Using crowdsourcing
● MAP-bin and MAP-tol measures
● Adaptations of the MAP measure
● Proposed for evaluation of video content retrieval to
allow a segment retrieved near the relevant segment
(but not necessarily overlapping it) to also be
marked as relevant.
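The relaxation described above can be illustrated with a simplified, tolerance-based average precision. This is only a sketch of the general idea and not the official MAP-bin or MAP-tol definition: the function name, the use of segment start times within a single recording, and the 30-second tolerance are all assumptions made for illustration.

```python
from typing import List


def tolerant_average_precision(retrieved: List[float],
                               relevant: List[float],
                               tolerance: float = 30.0) -> float:
    """Average precision over ranked start times (in seconds).

    A retrieved start time counts as a hit when it lies within
    `tolerance` seconds of a relevant start time that has not been
    matched yet, so overlap with the relevant segment is not required.
    """
    unmatched = list(relevant)
    hits = 0
    precision_sum = 0.0
    for rank, start in enumerate(retrieved, start=1):
        match = next((r for r in unmatched if abs(r - start) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)  # each relevant segment is rewarded once
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0
```

Averaging this value over all queries would give a MAP-style score.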
19
System Description
● Retrieve relevant segments
● Divide documents into 50- and 60-second-long segments
● A new segment is created every 10 seconds
● Index textual segments
● A query segment is transformed into a textual query
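The overlapping segmentation described on this slide can be sketched as a sliding window over a time-aligned transcript. The `(start time, token)` input format and the function name are assumptions; the 60-second length and 10-second step follow the slide.

```python
from typing import List, Tuple

Word = Tuple[float, str]  # (start time in seconds, token)


def sliding_segments(words: List[Word], length: float = 60.0,
                     step: float = 10.0) -> List[Tuple[float, float, str]]:
    """Cut a time-aligned transcript into overlapping fixed-length passages.

    A new window of `length` seconds starts every `step` seconds.
    """
    if not words:
        return []
    last_start = max(t for t, _ in words)
    segments = []
    start = 0.0
    while start <= last_start:
        end = start + length
        text = " ".join(tok for t, tok in words if start <= t < end)
        if text:  # skip windows that contain no speech
            segments.append((start, end, text))
        start += step
    return segments
```

Each returned passage would then be indexed as a separate document.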
20
System Description
● Terrier IR Framework
● Hiemstra Language Model
● Porter Stemmer, Stopwords
● Post-filter retrieved segments
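For intuition, a Hiemstra-style language model scores a segment by linearly mixing its own term distribution with that of the whole collection. The sketch below uses the usual rank-equivalent log form of that mixture. It is not Terrier's code, and the function name, argument layout and the value of `lam` are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def hiemstra_score(query: List[str], doc: List[str],
                   coll_freq: Dict[str, int], coll_len: int,
                   lam: float = 0.15) -> float:
    """Rank-equivalent log score of a linearly smoothed unigram model.

    Per term: log(1 + lam * tf * |C| / ((1 - lam) * cf * |d|)), where tf is
    the term frequency in the segment, cf its frequency in the collection,
    |d| the segment length and |C| the collection length (in tokens).
    """
    tf = Counter(doc)
    score = 0.0
    for term in query:
        cf = coll_freq.get(term, 0)
        if cf == 0 or tf[term] == 0:
            continue  # terms missing from the segment contribute log(1) = 0
        score += math.log(1.0 + (lam * tf[term] * coll_len)
                          / ((1.0 - lam) * cf * len(doc)))
    return score
```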
21
Passage Retrieval
● Documents are automatically divided into shorter
segments
● Segments serve as documents in the traditional IR
setup
● The segmentation is crucial for the quality of
the retrieval
– Especially the segment length
23
Hyperlinking Baseline
24
Acoustic Information in
Hyperlinking
25
Speech Retrieval Problems
1. Restricted vocabulary
● Data and query segment expansion
● Combination of transcripts
2. Lack of reliability
● Utilizing only the most confident words of the
transcripts
● Using confidence score
3. Lack of content
● Audio music information
● Acoustic similarity
26
1. Restricted Vocabulary
● Number of unique words in transcripts is
almost three times smaller than in subtitles.
● Low frequency words are expected to be the
most informative for information retrieval.
● Expand data and query segments
● Metadata
● Content surrounding the query segment
● Combine different transcripts
27
Data and Query Segment
Expansion
● Metadata
● Concatenate each data and query segment with
metadata of the corresponding file
● Title, episode title, description, short episode
synopsis, service name and program variant
● Content surrounding the query segment
● Use 200 seconds before and after the query
segment
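A minimal sketch of this expansion, assuming a time-aligned transcript and a metadata dictionary. The dictionary keys and the function name are hypothetical; the 200-second context follows the slide.

```python
from typing import Dict, List, Tuple

Word = Tuple[float, str]  # (start time in seconds, token)

# Hypothetical key names for the metadata fields listed on the slide.
METADATA_FIELDS = ["title", "episode_title", "description",
                   "synopsis", "service_name", "variant"]


def expand_query_segment(words: List[Word], start: float, end: float,
                         metadata: Dict[str, str],
                         context: float = 200.0) -> str:
    """Build query text from the segment, its surrounding context and metadata."""
    lo, hi = start - context, end + context
    spoken = [tok for t, tok in words if lo <= t < hi]
    meta = [metadata[f] for f in METADATA_FIELDS if metadata.get(f)]
    return " ".join(spoken + meta)
```

Data segments can be expanded the same way with `context=0.0`, so that only the metadata is appended.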
28
Data and Query Segment
Expansion Results
29
MAP-bin vs. WER
30
Data and Query Segment
Expansion Results
● The improvement is significant in terms of both
measures
● Expansion using metadata and context can
substantially reduce the query expansion problem.
● The highest MAP-tol score was achieved on the LIUM
transcripts.
● Despite their relatively high WER
● Metadata and context produce a much higher relative
improvement for the automatic transcripts than for
subtitles.
● MAP-bin score corresponds with the WER
31
Transcripts Combination
[Figure: MAP-bin and MAP-tol results for transcript combinations]
32
Transcripts Combination
● The combination is generally helpful.
● Despite the high score achieved by the
LIUM transcripts
● The overall highest MAP-bin score was
achieved using the union of LIMSI and NST
transcripts.
● Outperforms results achieved with subtitles
33
2. Transcripts Reliability
● WER
● LIMSI: 57.5%
● TED-LIUM: 65.1%
● NST-Sheffield: 58.6%
● Word variants
● Word confidence
34
Word Variants
● Compare using only the first, most reliable word with
using all word variants in the LIMSI transcripts.
35
Word Confidence
● Only use words with high confidence scores
● Only words from LIMSI and LIUM transcripts with a
confidence score higher than a given threshold
● Increased both scores on the development set
● Did not outperform the full transcripts on the test data
● We also experimented with voting
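The confidence filtering itself amounts to a single pass over the recognized words. In this sketch the `(word, confidence)` pair format and the function name are assumptions, and the threshold is a tuning parameter, not a value taken from the slides.

```python
from typing import List, Tuple


def filter_by_confidence(words: List[Tuple[str, float]],
                         threshold: float) -> List[str]:
    """Keep only words whose ASR confidence is strictly above the threshold."""
    return [word for word, confidence in words if confidence > threshold]
```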
36
3. Lack of Content
● We only use content of subtitles/transcripts
● A wide range of acoustic attributes can also
be utilized: applause, music, shouts,
explosions, whispers, background noise, …
● Acoustic fingerprinting
● Acoustic similarity
37
Acoustic Fingerprinting
Motivation
● Obtain additional information about music in
the query segment
● Especially helpful for hyperlinking music
programmes
38
Acoustic Fingerprinting
1) Minimize noise in each query segment
● Query segments were divided into 10-second long
passages; a new passage was created each second
2) Submit sub-segments to Doreso API
3) Retrieve song title, artists and album
● Development set: 4 queries out of 30
● Test set: 10 queries out of 30
4) Concatenate title, artist and album name with
query segment text
● Both retrieval scores drop
39
Acoustic Similarity
Motivation
● Retrieve identical acoustic
segments
● E.g. signature tunes and jingles
● Detect semantically related
segments
● E.g. segments containing action
scenes and music
40
Acoustic Similarity
● Calculate similarity between data and query vector
sequences of prosodic features
● Find the most similar sequences near the beginning
● Linearly combine the highest acoustic similarity with
text-based similarity score
● MAP-bin: 0.2689 → 0.2687, MAP-tol: 0.2465 → 0.2473
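One way to picture these steps is a sketch that slides the first frames of the query over the beginning of a data segment, keeps the best match, and adds it to the text score with a small weight. The distance-to-similarity mapping, the window and offset sizes, the weight and all function names are assumptions made for illustration; the slides do not specify them.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]  # one frame of prosodic features


def sequence_similarity(a: List[Vector], b: List[Vector]) -> float:
    """Similarity in (0, 1] from the mean Euclidean distance of aligned frames."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    mean_dist = sum(math.dist(x, y) for x, y in zip(a, b)) / n
    return 1.0 / (1.0 + mean_dist)


def best_acoustic_similarity(query: List[Vector], data: List[Vector],
                             window: int = 3, max_offset: int = 5) -> float:
    """Best match of the query's first frames near the start of the data segment."""
    probe = query[:window]
    if not probe or len(data) < len(probe):
        return 0.0
    last = min(max_offset, len(data) - len(probe))
    return max(sequence_similarity(probe, data[off:off + len(probe)])
               for off in range(last + 1))


def combine_scores(text_score: float, acoustic_score: float,
                   weight: float = 0.1) -> float:
    """Linear combination of the text score and the acoustic similarity."""
    return text_score + weight * acoustic_score
```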
41
Acoustic Information in
Hyperlinking: Overview
42
Visual Information in
Hyperlinking
43
Visual Features
● Similar setting: Feature Signatures
● Object recognition: CNN descriptors
● Concept detection: CNN descriptors
● Same faces: SIMILE descriptors
44
Feature Signatures
● Approximate distribution of color and texture in the image
● Work especially well for recognition of similar background
and setting
● Calculated distance between each keyframe in query
segment and each keyframe in data segment.
45
Visual Similarity
46
Feature Signature Results
47
Feature Signatures
Positive Examples
48
Feature Signatures
Negative Examples
49
Hyperlinking Task Results
50
Object Recognition
● Similarity between keyframes was also calculated using a
deep convolutional network (AlexNet)
● Last layer features used for calculating similarity
● Improved results but worse than Feature
Signatures
51
Concept Detection
● System provided by MUNI:
● Retrieve images similar to the keyframe
● Use descriptions of similar images
– Text analysis and word semantic relationships
● e.g. people, indoors, young, two, canadian, plant, ...
● Concepts with higher confidence scores and
concepts with a restricted number of occurrences
● MAP-bin: 0.2333 → 0.2368, MAP-tol: 0.1375 → 0.1638
52
Face Recognition
● Created by Eyedea Recognition Framework
● Faces were first detected and geometrically aligned
with a canonical pose
● SIMILE descriptors were calculated
● A set of face descriptors representing person
identities was available for each face
● Faces were then compared by L2 distance on
calculated descriptors.
● MAP-bin: 0.2051 → 0.2088, MAP-tol: 0.1162 → 0.1281
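The L2 comparison can be sketched as follows, treating each face descriptor as a plain numeric vector. Taking the minimum over all face pairs as the segment-level distance is an assumption, as is the function name; the slides only state that descriptors were compared by L2 distance.

```python
import math
from typing import List, Sequence


def segment_face_distance(query_faces: List[Sequence[float]],
                          data_faces: List[Sequence[float]]) -> float:
    """Smallest L2 distance between any query face and any data face.

    Returns infinity when either segment contains no detected face.
    """
    if not query_faces or not data_faces:
        return math.inf
    return min(math.dist(q, d) for q in query_faces for d in data_faces)
```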
53
Visual Information in
Hyperlinking: Overview
54
Demo
55
SHAMUS
● Open source tool for
● text-based search (Search),
● retrieval of topically related segments (Hyperlinks) and
● determination of the most important segments in videos (Anchoring).
● Demo running on 1219 TED talks
http://ufal.mff.cuni.cz/shamus
56
SHAMUS
57
SHAMUS
● Based on subtitles/transcripts
● Uses Terrier framework
● Aimed at media professionals
● Uses video segments
● 1-minute long, overlapping
● Methods used at MediaEval and TRECVid
58
Search
● Textual query
59
Anchoring
● Find most interesting and important
segments of videos
● Further use in hyperlinking
● Convert metadata to query
● Marked as chapters
60
Hyperlinking
● Retrieve segments similar to each
anchoring segment on the fly.
● Convert segment to a textual query.
● 20 most frequent words (stopwords are
filtered out)
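Converting a segment into a textual query can be sketched with a frequency count. The tokenizer and the tiny stopword list below are illustrative assumptions; only the limit of 20 words and the stopword filtering come from the slide.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "that", "on", "for", "with", "as", "was", "were"}


def segment_to_query(text: str, n: int = 20) -> str:
    """Return the n most frequent non-stopword tokens as a query string."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return " ".join(word for word, _ in counts.most_common(n))
```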
61
http://ufal.mff.cuni.cz/shamus
62
Conclusion
63
Conclusion
● Hyperlinking
● Content-based retrieval
● Text
● Audio Features
● Visual Features
● Demo

Mais conteúdo relacionado

Semelhante a Multimodal Features for Linking Television Content

Mpeg 7 video signature tools for content recognition
Mpeg 7 video signature tools for content recognitionMpeg 7 video signature tools for content recognition
Mpeg 7 video signature tools for content recognition
Parag Tamhane
 

Semelhante a Multimodal Features for Linking Television Content (20)

Deep Recommender Systems - PAPIs.io LATAM 2018
Deep Recommender Systems - PAPIs.io LATAM 2018Deep Recommender Systems - PAPIs.io LATAM 2018
Deep Recommender Systems - PAPIs.io LATAM 2018
 
Network-Assisted Delivery of Adaptive Video Streaming Services through CDN, S...
Network-Assisted Delivery of Adaptive Video Streaming Services through CDN, S...Network-Assisted Delivery of Adaptive Video Streaming Services through CDN, S...
Network-Assisted Delivery of Adaptive Video Streaming Services through CDN, S...
 
London IR Meetup - Players in Vector Search_ algorithms, software and use cases
London IR Meetup - Players in Vector Search_ algorithms, software and use casesLondon IR Meetup - Players in Vector Search_ algorithms, software and use cases
London IR Meetup - Players in Vector Search_ algorithms, software and use cases
 
Research Proposal Presentation Pitch
Research Proposal Presentation PitchResearch Proposal Presentation Pitch
Research Proposal Presentation Pitch
 
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
 
Video Thumbnail Selector
Video Thumbnail SelectorVideo Thumbnail Selector
Video Thumbnail Selector
 
IEEEGlobecom'22-OL-RICHTER.pdf
IEEEGlobecom'22-OL-RICHTER.pdfIEEEGlobecom'22-OL-RICHTER.pdf
IEEEGlobecom'22-OL-RICHTER.pdf
 
(Slides) P2P video broadcast based on per-peer transcoding and its evaluatio...
(Slides) P2P video broadcast based on per-peer transcoding and its evaluatio...(Slides) P2P video broadcast based on per-peer transcoding and its evaluatio...
(Slides) P2P video broadcast based on per-peer transcoding and its evaluatio...
 
Multimedia Mining
Multimedia Mining Multimedia Mining
Multimedia Mining
 
MediaEval 2018: The MediaEval 2018 Movie Recommendation Task: Recommending Mo...
MediaEval 2018: The MediaEval 2018 Movie Recommendation Task: Recommending Mo...MediaEval 2018: The MediaEval 2018 Movie Recommendation Task: Recommending Mo...
MediaEval 2018: The MediaEval 2018 Movie Recommendation Task: Recommending Mo...
 
Software Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringSoftware Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software Engineering
 
Dinesh ppt
Dinesh pptDinesh ppt
Dinesh ppt
 
Mpeg 7 video signature tools for content recognition
Mpeg 7 video signature tools for content recognitionMpeg 7 video signature tools for content recognition
Mpeg 7 video signature tools for content recognition
 
An Introduction to Natural Language Processing
An Introduction to Natural Language ProcessingAn Introduction to Natural Language Processing
An Introduction to Natural Language Processing
 
SoftNews-lowres
SoftNews-lowresSoftNews-lowres
SoftNews-lowres
 
Netflix Playback Data Systems Team and Job Overview
Netflix Playback Data Systems Team and Job OverviewNetflix Playback Data Systems Team and Job Overview
Netflix Playback Data Systems Team and Job Overview
 
IRJET- Multimedia Summarization and Retrieval of News Broadcast
IRJET- Multimedia Summarization and Retrieval of News BroadcastIRJET- Multimedia Summarization and Retrieval of News Broadcast
IRJET- Multimedia Summarization and Retrieval of News Broadcast
 
OpenMinteD Project - building a TDM infrastructure
OpenMinteD Project - building a TDM infrastructureOpenMinteD Project - building a TDM infrastructure
OpenMinteD Project - building a TDM infrastructure
 
Linked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so farLinked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so far
 
Video Coding Enhancements for HTTP Adaptive Streaming
Video Coding Enhancements for HTTP Adaptive StreamingVideo Coding Enhancements for HTTP Adaptive Streaming
Video Coding Enhancements for HTTP Adaptive Streaming
 

Mais de Petra Galuscakova

CUNI at MediaEval 2013 Similar Segments in Social Speech Task
CUNI at MediaEval 2013 Similar Segments in Social Speech TaskCUNI at MediaEval 2013 Similar Segments in Social Speech Task
CUNI at MediaEval 2013 Similar Segments in Social Speech Task
Petra Galuscakova
 
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Petra Galuscakova
 
Česko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmi
Česko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmiČesko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmi
Česko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmi
Petra Galuscakova
 
Penalty Functions for Evaluation Measures of Unsegmented Speech Retrieval
Penalty Functions for Evaluation Measures of Unsegmented Speech RetrievalPenalty Functions for Evaluation Measures of Unsegmented Speech Retrieval
Penalty Functions for Evaluation Measures of Unsegmented Speech Retrieval
Petra Galuscakova
 

Mais de Petra Galuscakova (8)

Combining Evidence for Cross-language Information Retrieval
Combining Evidence for Cross-language Information RetrievalCombining Evidence for Cross-language Information Retrieval
Combining Evidence for Cross-language Information Retrieval
 
Czech Malach Cross-lingual Speech Retrieval Test Collection
Czech Malach Cross-lingual Speech Retrieval Test CollectionCzech Malach Cross-lingual Speech Retrieval Test Collection
Czech Malach Cross-lingual Speech Retrieval Test Collection
 
Evaluácia tematického vyhľadávania v audiovizuálnych nahrávkach
Evaluácia tematického vyhľadávania v audiovizuálnych nahrávkachEvaluácia tematického vyhľadávania v audiovizuálnych nahrávkach
Evaluácia tematického vyhľadávania v audiovizuálnych nahrávkach
 
CUNI at MediaEval 2013 Similar Segments in Social Speech Task
CUNI at MediaEval 2013 Similar Segments in Social Speech TaskCUNI at MediaEval 2013 Similar Segments in Social Speech Task
CUNI at MediaEval 2013 Similar Segments in Social Speech Task
 
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
 
Česko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmi
Česko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmiČesko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmi
Česko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmi
 
Application of Topic Segmentation in Audiovisual Information Retrieval
Application of Topic Segmentation in Audiovisual Information RetrievalApplication of Topic Segmentation in Audiovisual Information Retrieval
Application of Topic Segmentation in Audiovisual Information Retrieval
 
Penalty Functions for Evaluation Measures of Unsegmented Speech Retrieval
Penalty Functions for Evaluation Measures of Unsegmented Speech RetrievalPenalty Functions for Evaluation Measures of Unsegmented Speech Retrieval
Penalty Functions for Evaluation Measures of Unsegmented Speech Retrieval
 

Último

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Último (20)

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 

Multimodal Features for Linking Television Content

  • 1. Multimodal Features for Linking Television Content Petra Galuščáková galuscakova@ufal.mff.cuni.cz Institute of Formal and Applied Linguistics NLP Applications 11. 5. 2017
  • 2. 2 Introduction ● Video: 80% of internet traffic in 2019 ● It would take a single person 5 million years to view all the video content crossing the internet during one month ● Mainly generated by streaming services such as Netflix, Hulu Plus and Amazon Prime ● Followed by YouTube, Vimeo and Vine ● Each minute ● 77,160 hours of videos streamed by Netflix ● More than a million videos played by Vine users ● 300 hours of videos uploaded to YouTube – rose from 6 hours in 2007, to 24-35 hours in 2010
  • 3. 3 Motivation ● Increasing number of audio-visual document ● Wide variety of typed of videos ● Progress in ASR and visual processing systems ● Lack of effective systems for retrieving information stored in these documents
  • 5. 5 Mutlimedia Retrieval ● A set of methods for understanding information stored in the various media in a manner comparable with human understanding ● Semantic content of the documents needs to be at first mined using automatic processing ● ASR, acoustic processing, image processing, face recognition, signal processing, video content analysis, ... ● Videos ● Different modalities ● No structure
  • 6. 6 Search in Audio-Visual Documents ● Input: ● Data collection (video recordings) ● Query – Given as text ● Output: ● Relevant segments (passages) of documents
  • 7. 7 Search Examples ● “Medieival history of why castles were first built” ● “FA cup final, old & current, comparison. History of football” ● “animal park, kenya marathon, wildlife reserve”
  • 8. 8 Speech Retrieval vs. Spoken Term Detection ● Spoken Term Detection (Keyword Spotting) is often not sufficient ● Documents must contain the exact word (or sometimes different word forms) E.g. “Rover finds bulletproof evidence of water on early Mars” vs. “A bulletproof vest is an item of personal armor that helps absorb the impact from firearm-fired projectiles” ● Retrieval techniques allow to exploit – e.g. visual content
  • 9. 9 Hyperlinking ● Input: ● Data collection (video recordings) ● Query segment ● Output: ● Segments similar to the query segment
  • 10. 10 Hyperlinking Definition ● Hyperlink: an electronic link providing direct access from one distinctively marked place in a hypertext or hypermedia document to another in the same or a different document. ● The source marked place of the link: anchor ● “’give me more information about this anchor’ instead of ’give me more based on this anchor or entity’”
  • 11. 11 Recommender Systems ● Focused on entertainment ● YouTube ● Generated by using a user’s personal activity (watched, favourited, liked videos) [*] ● TED Talks ● Related talks manually selected by the editors [*] James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet, Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston, and Dasarathi Sampath. 2010. The YouTube video recommendation system. In Proc. of RecSys '10. ACM, New York, NY, USA, 293-296.
  • 12. 12 Our Approach ● Content-based search ● Combining speech, acoustic and visual information ● Retrieval instead of word-spotting ● Passage Retrieval ● Retrieve relevant segments instead of relevant documents
  • 13. 13 Outline ● System Description ● Audio Features ● Visual Features ● Demo User Interface
  • 15. 15 ● Search and Hyperlinking (2012-2014) ● Video Hyperlinking (2015-) Multimedia Benchmarks
  • 16. 16 Search and Hyperlinking Task ● The main goal of the Search Subtask ● To find passages relevant to a user’s interest given by a textual query in a large set of audio-visual recordings ● And of the Hyperlinking Subtask: ● To find more passages similar to the retrieved ones ● Scenario: ● A user wants to find a piece of information relevant to a given query in a collection of TV programmes (Search subtask) ● And then navigate through a large archive using hyperlinks to the retrieved segments (Hyperlinking subtask)
  • 17. 17 BBC Broadcast Data ● Broadcast between 1. 4. 2008 and 31. 7. 2008 ● News, documentaries, series, entertainment programmes, quiz shows, cookery shows, sport programmes, ... ● Subtitles ● Three ASR transcripts ● LIMSI, LIUM, NST-Sheffield ● Metadata ● Prosodic features ● Shots, keyframes, visual concepts
  • 18. 18 Evaluation ● Using crowdsourcing ● MAP-bin and MAP-tol measures ● Adaptations of the MAP measure ● Proposed for evaluation of video content retrieval to allow a segment retrieved near the relevant segment (but not necessarily overlapping it) to also be marked as relevant.
  • 19. 19 ● Retrieve relevant segments ● Divide documents into 50 and 60-second long segments ● A new segment is created each 10 seconds ● Index textual segments ● A query segment is transformed to textual query System Description
  • 20. 20 ● Terrier IR Framework ● Hiemstra Language Model ● Porter Stemmer, Stopwords ● Post-filter retrieved segments System Description
  • 21. 21 Passage Retrieval ● Documents are automatically divided into shorter segments ● Segments serve as documents in the traditional IR setup ● The segmentation is crucial for the quality of the retrieval – Especially the segment length
  • 24. 25 Speech Retrieval Problems 1. Restricted vocabulary ● Data and query segment expansion ● Combination of transcripts 2. Lack if reliability ● Utilizing only the most confident words of the transcripts ● Using confidence score 3. Lack of content ● Audio music information ● Acoustic similarity
  • 25. 26 1. Restricted Vocabulary ● Number of unique words in transcripts is almost three times smaller than in subtitles. ● Low frequency words are expected to be the most informative for information retrieval. ● Expand data and query segments ● Metadata ● Content surrounding the query segment ● Combine different transcripts
  • 26. 27 Data and Query Segment Expansion ● Metadata ● Concatenate each data and query segment with metadata of the corresponding file ● Title, episode title, description, short episode synopsis, service name and program variant ● Content surrounding the query segment ● Use 200 seconds before and after the query segment
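The two expansion steps on this slide can be sketched as follows. This is a hedged illustration: the function name, the `(time, word)` transcript format, and the metadata dictionary are assumptions, and the 200-second context is taken from the slide.

```python
def expand_query_segment(timed_words, seg_start, seg_end, metadata, context=200.0):
    """Build query text from metadata, the segment, and its surroundings.

    `timed_words` is a list of (time_in_seconds, word) pairs for the whole
    recording; `metadata` maps field names (title, description, ...) to
    strings. Empty metadata fields are skipped.
    """
    window_start = seg_start - context
    window_end = seg_end + context
    spoken = [word for time, word in timed_words if window_start <= time < window_end]
    fields = [value for value in metadata.values() if value]
    return " ".join(fields + spoken)


if __name__ == "__main__":
    words = [(0.0, "intro"), (250.0, "castle"), (300.0, "walls"), (600.0, "credits")]
    meta = {"title": "Medieval History", "description": ""}
    print(expand_query_segment(words, 290.0, 310.0, meta))
```

Data segments can be expanded with metadata in the same way by setting `context=0.0`.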
  • 27. 28 Data and Query Segment Expansion Results
  • 29. 30 Data and Query Segment Expansion Results ● The improvement is significant in terms of both measures ● Expansion using metadata and context can substantially reduce the query expansion problem. ● The highest MAP-tol score was achieved on the LIUM transcripts. ● Despite their relatively high WER ● Metadata and context produce a much higher relative improvement for the automatic transcripts than for subtitles. ● MAP-bin score corresponds with the WER
  • 31. 32 Transcripts Combination ● The combination is generally helpful. ● Despite the high score achieved by the LIUM transcripts alone ● The overall highest MAP-bin score was achieved using the union of LIMSI and NST transcripts. ● Outperforms results achieved with subtitles
  • 32. 33 2. Transcripts Reliability ● WER ● LIMSI: 57.5% ● TED-LIUM: 65.1% ● NST-Sheffield: 58.6% ● Word variants ● Word confidence
  • 33. 34 Word Variants ● Compare utilization of the first, most reliable word and all variants in LIMSI transcripts.
  • 34. 35 Word Confidence ● Only use words with high confidence scores ● Only words from LIMSI and LIUM transcripts with a confidence score higher than a given threshold ● Increased both scores on the development set ● It did not outperform fully transcribed test data ● We also experimented with voting
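Confidence-based filtering can be sketched in a few lines. The `(word, confidence)` input format and the function name are assumptions for illustration; the actual transcript formats differ between ASR systems.

```python
def filter_by_confidence(scored_words, threshold):
    """Keep only words whose ASR confidence is strictly above `threshold`."""
    return [word for word, confidence in scored_words if confidence > threshold]


if __name__ == "__main__":
    transcript = [("rover", 0.95), ("fines", 0.40), ("water", 0.88)]
    print(filter_by_confidence(transcript, 0.5))
```

A higher threshold trades recall for precision: fewer misrecognized words are indexed, but correctly recognized low-confidence words are lost as well.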
  • 35. 36 3. Lack of Content ● We only use content of subtitles/transcripts ● A wide range of acoustic attributes can also be utilized: applause, music, shouts, explosions, whispers, background noise, … ● Acoustic fingerprinting ● Acoustic similarity
  • 36. 37 Acoustic Fingerprinting Motivation ● Obtain additional information about music in the query segment ● Especially helpful for hyperlinking music programmes
  • 37. 38 Acoustic Fingerprinting 1) Minimize noise in each query segment ● Query segments were divided into 10-second long passages; a new passage was created each second 2) Submit sub-segments to Doreso API 3) Retrieve song title, artists and album ● Development set: 4 queries out of 30 ● Test set: 10 queries out of 30 4) Concatenate title, artist and album name with query segment text ● Both retrieval scores drop
  • 38. 39 Acoustic Similarity Motivation ● Retrieve identical acoustic segments ● E.g. signature tunes and jingles ● Detect semantically related segments ● E.g. segments containing action scenes and music
  • 39. 40 Acoustic Similarity ● Calculate similarity between data and query vector sequences of prosodic features ● Find the most similar sequences near the beginning ● Linearly combine the highest acoustic similarity with the text-based similarity score ● MAP-bin: 0.2689 → 0.2687, MAP-tol: 0.2465 → 0.2473
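The linear combination step might look like the sketch below. The additive form `text + weight * max(acoustic)` and the `weight` parameter are assumptions; the slide does not give the exact formula or the weight that was used.

```python
def combined_score(text_score, acoustic_similarities, weight):
    """Add the weighted best acoustic similarity to the text-based score.

    `acoustic_similarities` holds the similarities between the query and
    the candidate sequences of one data segment; only the highest is used.
    """
    if not acoustic_similarities:
        return text_score
    return text_score + weight * max(acoustic_similarities)


if __name__ == "__main__":
    print(combined_score(2.0, [0.1, 0.5], 0.2))
```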
  • 42. 43 Visual Features ● Similar setting: Feature Signatures ● Object recognition: CNN descriptors ● Concept detection: CNN descriptors ● Same faces: SIMILE descriptors
  • 43. 44 Feature Signatures ● Approximate the distribution of color and texture in the image ● Work especially well for recognition of similar background and setting ● Distance is calculated between each keyframe in the query segment and each keyframe in the data segment.
  • 49. 50 Object Recognition ● Similarity between keyframes calculated also using deep convolutional network (AlexNet) ● Last layer features used for calculating similarity ● Improved results but worse than Feature Signatures
  • 50. 51 Concept Detection ● System provided by MUNI: ● Retrieve images similar to the keyframe ● Use descriptions of similar images – Text analysis and word semantic relationships ● e.g. people, indoors, young, two, canadian, plant, ... ● Use concepts with higher confidence scores and concepts with a restricted number of occurrences ● MAP-bin: 0.2333 → 0.2368, MAP-tol: 0.1375 → 0.1638
  • 51. 52 Face Recognition ● Created by Eyedea Recognition Framework ● Faces were first detected and geometrically aligned with a canonical pose ● SIMILE descriptors were calculated ● A set of face descriptors representing person identities was available for each face ● Faces were then compared by L2 distance on the calculated descriptors. ● MAP-bin: 0.2051 → 0.2088, MAP-tol: 0.1162 → 0.1281
  • 54. 55 SHAMUS ● Open source tool for ● Search: text-based search, ● Hyperlinks: retrieval of the topically related segments and ● Anchoring: determination of the most important segments in videos. ● Demo running on 1219 TED talks http://ufal.mff.cuni.cz/shamus
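Comparing faces by L2 distance can be sketched as below. Descriptors are plain lists of floats here, and taking the minimum distance over all face pairs of two segments is an assumption of this sketch, not something stated on the slide.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two descriptors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_face_distance(query_faces, data_faces):
    """Smallest L2 distance between any query face and any data face."""
    return min(l2_distance(q, d) for q in query_faces for d in data_faces)


if __name__ == "__main__":
    print(closest_face_distance([[0.0, 0.0]], [[3.0, 4.0], [1.0, 0.0]]))
```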
  • 56. 57 SHAMUS ● Based on subtitles/transcripts ● Uses Terrier framework ● Aimed at media professionals ● Uses video segments ● 1-minute long, overlapping ● Methods used at MediaEval and TRECVid
  • 58. 59 Anchoring ● Find most interesting and important segments of videos ● Further use in hyperlinking ● Convert metadata to query ● Marked as chapters
  • 59. 60 Hyperlinking ● Retrieve segments similar to each anchoring segment on the fly. ● Convert segment to a textual query. ● 20 most frequent words (stopwords are filtered out)
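Converting a segment to a textual query from its most frequent words can be sketched as follows. The tokenizer and the tiny stopword list are placeholders chosen for the example; a real system would use a full stopword list.

```python
import re
from collections import Counter

# Illustrative stopword list only, not the one used by SHAMUS.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "was", "were"}


def segment_to_query(text, n=20, stopwords=STOPWORDS):
    """Return the `n` most frequent non-stopword tokens of a segment."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(token for token in tokens if token not in stopwords)
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    print(" ".join(segment_to_query("The castle and the castle wall", n=2)))
```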
  • 62. 63 Conclusion ● Hyperlinking ● Content-based retrieval ● Text ● Audio Features ● Visual Features ● Demo