2. The problem
The massive increase in digital audio-visual information
places heavy demands on storage and search
engines for both consumers and professional archives.
Video is now a natural form of communication
for the Internet and mobile devices.
Video search engines are the product of progress in many
technologies: visual and audio analysis, machine learning
techniques, as well as visualization and interaction.
Thursday, 24 June 2010
3. Two solutions
www.vidivideo.info www.im3i.eu
4. VidiVideo: project overview
The VidiVideo project addressed the
challenge of creating a substantially
enhanced semantic access to video,
implemented in a search engine.
The outcome of the project is an audio-visual search
engine composed of two parts: an automatic annotation
part that runs off-line, in which detectors for more
than 1000 semantic concepts, collected in a thesaurus,
process and automatically annotate the video; and an
interactive part that provides a video search engine
for both technical and non-technical users.
5. VidiVideo: project results
The automatic annotation part of the system performs audio
and video segmentation, speech recognition,
speaker clustering and semantic concept detection.
The VidiVideo system has achieved the highest
performance in the most important object and concept
recognition international contests (PASCAL VOC and
TRECVID).
The interactive part provides desktop-based and
web-based search engines. The system permits different
query modalities (free text, natural language, graphical
composition of concepts using boolean and temporal relations
and query by visual example) and visualizations for video
retrieval and browsing.
6. Call Identifier FP7-SME-2010-1
Submitted 03 December 2009
VidiVideo: project partners
Name of the co-ordinating person Dr.-Ing. Georgios Ioannidis
E-Mail gi@in-two.com
Fax +49-179-33-2286677
No.  Participant Name                                    Type  Short Name  Country
1    IN2 search interfaces development Ltd               SME   IN2         UK
2    spring techno GmbH                                  SME   SPRING      DE
3    VISup Srl                                           SME   VISUP       IT
4    Hogeschool voor de Kunsten Utrecht                  RTDP  HKU         NL
5    University Firenze                                  RTDP  UNIFI       IT
6    Instituto de Engenharia de Sistemas e Computadores  RTDP  INESC-ID    PT
7. IM3I: project overview
IM3I aims to provide the creative media sector with new
ways of searching, summarising and visualising large
multimedia archives.
IM3I will provide a service-oriented architecture
that allows multiple viewpoints upon multimedia data
available in a repository, and provides better ways to
interact and share rich media. This paves the way for a
multimedia information management
platform which is more flexible, adaptable and
customisable than current repository software.
This in turn enables new opportunities for content
owners to exploit their digital assets.
8. IM3I: project results
Developed a set of tools for automatic audio-visual
annotation and search
Developed a set of web services to manage, create and
orchestrate the indexing services
Developed a set of specialized search and
management interfaces
IM3I authoring platform: allows professional users to
import and publish repositories of digital media, authoring of
web-based environments for the end-users, creation of
elaborate workflow patterns and search & retrieval interfaces
to allow a diversity of end-user interactions and scenarios
11. Video and scene segmentation
• Developed a new gradual transition detection algorithm
  • Uses novel individual criteria that exhibit less sensitivity to local or global motion:
    • Color Coherence Change
    • Macbeth Color Histogram Change
    • Luminance Center of Gravity Change
  • Combines these criteria (and their multi-scale extensions) using a machine learning technique
  • Advantages:
    • Significantly improved performance
    • No need for any threshold selection
• Scene or story unit: a collection of temporally consecutive shots that are about the same topic or event
• Developed a multimodal scene segmentation based on the Scene Transition Graph (STG)
  • Significantly improved performance over visual-only STG
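The slide names "Luminance Center of Gravity Change" as one criterion but does not define it. The sketch below is one plausible reading, not the project's algorithm: it takes the luminance-weighted centroid of each frame and measures how far it moves between two frames. The function names and the frame representation (a list of rows of luminance values) are assumptions made for illustration.

```python
import math


def luminance_center_of_gravity(frame):
    """Return the (row, col) centroid of a frame, weighted by luminance.

    `frame` is assumed to be a non-empty list of rows of non-negative
    luminance values (a hypothetical representation).
    """
    total = 0.0
    row_sum = 0.0
    col_sum = 0.0
    for r, row in enumerate(frame):
        for c, lum in enumerate(row):
            total += lum
            row_sum += r * lum
            col_sum += c * lum
    if total == 0:
        # All-black frame: fall back to the geometric centre.
        return ((len(frame) - 1) / 2, (len(frame[0]) - 1) / 2)
    return (row_sum / total, col_sum / total)


def cog_change(frame_a, frame_b):
    """Euclidean distance between the luminance centroids of two frames."""
    ra, ca = luminance_center_of_gravity(frame_a)
    rb, cb = luminance_center_of_gravity(frame_b)
    return math.hypot(rb - ra, cb - ca)


# Toy 3x3 frames: one bright pixel moves from the top-left to the bottom-right.
frame_1 = [[255, 0, 0], [0, 0, 0], [0, 0, 0]]
frame_2 = [[0, 0, 0], [0, 0, 0], [0, 0, 255]]
change = cog_change(frame_1, frame_2)
```

In a detector of the kind described above, a value like `change` would be one input feature among several fed to the learned classifier, which is what removes the need for a hand-picked threshold.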
12. Audio analysis in VidiVideo
• Audio segmentation / audio diarization
• Audio events detection (AED)
• Automatic speech recognition (ASR)
• Language identification (LID)
13. Block diagram of audio processing
[Figure: block diagram of audio processing. Audio data passes through audio segmentation into non-speech, speech and music branches; audio processing and video processing are then combined (audio + video).]
Current concept detectors:
• Non-speech: audio event detection framework (feature extraction, feature reduction, SVM classification): 61 AE + 10 Sports (testing)
• Speech: 6 Speech
• Speaker ID and reasoning (Narrator, Anchor …): 3 concepts, including Monologue and Dialogue
• Music detector: 3 classes (base), 4 new (testing)
• Telephone detector (low frequency detector): 1 Telephone
• Total: 74 + 10 (testing) + (-3 + 4) (change of music detectors)
14. Audio events corpora
• Sound effect corpus: 18,700 short files (290 hrs.),
provided by B&G. Intrinsically labelled corpus.
• Selected a subset for training: the 61 semantic concepts
with the most examples.
• Extended feature set: MFCCs, ZCR, Brightness / Audio
spectrum centroid, Bandwidth / Audio spectrum
spread, Audio spectrum envelope, Audio spectrum
flatness, Pitch, Harmonicity
• Tested on Movies, Documentaries, Broadcast News,
and Talk Shows (TS).
• Mean Average Precision=0.459 (6 test concepts)
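For readers unfamiliar with the metric, here is a minimal sketch of how Mean Average Precision is commonly computed from ranked detection results. It is not the project's evaluation code; the function names and the toy rankings are made up, and the sketch assumes every relevant item appears somewhere in the ranked list.

```python
def average_precision(ranked_relevance):
    """Average precision for one concept.

    `ranked_relevance` is a list of booleans, best-scored result first,
    where True marks a relevant result. Assumes all relevant items
    appear in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-concept average precisions."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Two hypothetical test concepts with toy rankings.
toy_rankings = [
    [True, False, True],  # AP = (1/1 + 2/3) / 2
    [False, True],        # AP = (1/2) / 1
]
toy_map = mean_average_precision(toy_rankings)
```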
15. Machine learning
• Learning of many independent binary classification
tasks is computationally expensive
• KDA using Spectral Regression to solve this problem:
• The time complexity scales linearly with respect to
number of labels (i.e. concepts)
• Training in just 1.3 hours compared to 30.2 hours
using SVM, over 20 times faster! (MAP ~ the same)
• Tested on Pascal VOC 2008 (20 Concepts)
• Best Method in Pascal VOC 2008
• Ranked First in 9 out of 20 concepts
16. Color Features
Point sampling
• Harris-Laplace
• Dense sampling
Color Descriptor
• SIFT
• OpponentSIFT
• WSIFT
• rgSIFT
• Transformed color SIFT
Spatial Pyramid
• 1x1
• 2x2
• 1x3
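To illustrate the spatial pyramid listed above, the following sketch assigns sampled points to the cells of the 1x1, 2x2 and 1x3 grids and counts the points per cell. It is a sketch under assumptions: "AxB" is read here as A columns by B rows, point coordinates are normalised to the range 0 to 1, and all names are hypothetical. In a full pipeline each cell would hold a visual-word histogram rather than a plain count.

```python
# Pyramid levels from the slide, read here as (columns, rows): an assumption.
PYRAMID = [(1, 1), (2, 2), (1, 3)]


def cell_index(x, y, cols, rows):
    """Index of the grid cell containing the normalised point (x, y)."""
    col = min(int(x * cols), cols - 1)  # clamp x == 1.0 into the last column
    row = min(int(y * rows), rows - 1)  # clamp y == 1.0 into the last row
    return row * cols + col


def pyramid_point_counts(points):
    """Concatenate per-cell point counts over all pyramid levels."""
    counts = []
    for cols, rows in PYRAMID:
        level = [0] * (cols * rows)
        for x, y in points:
            level[cell_index(x, y, cols, rows)] += 1
        counts.extend(level)
    return counts


# Two toy points: one near the top-left corner, one near the bottom-right.
toy_counts = pyramid_point_counts([(0.1, 0.1), (0.9, 0.9)])
```

The three levels together give 1 + 4 + 3 = 8 cells, so the concatenated vector always has eight entries per descriptor type.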
17. Results
[Figure: MediaMill Semantic Video Search Engine at TRECVID 2009. x-axis: Concept Detection Task Submissions (0 to 220); y-axis: 0 to 0.25. Legend: 216 other concept detection methods; MediaMill concept detection method ("Our results").]
[Figure: TRECVID 2009. x-axis: Interactive Search Task Submissions (0 to 25); y-axis: 0 to 0.25. Legend: 22 users of other video retrieval systems; 2 users of MediaMill video search engine.]
• Good local descriptors: SIFT, OpponentSIFT, rgSIFT/WSIFT, Transformed color SIFT
• Combining these color features gives state-of-the-art performance
• Drawback: computational cost, reduced by adopting GPU implementations (codebook creation is 80% of CPU time!) for a 17x speed-up
19. Visual annotation
• Split a video by detecting shots and large content changes
with a very fast algorithm
• Use different annotation strategies and types of
detectors:
• low level (color, B/W, motion)
• Haar-based boosted classifiers
• HOG + SVMs
• Bag-of-words
• k-NN + voting
• simple MPEG-7 XML format (full and fragment)
20. Baseline: typical BoW
[Figure: typical bag-of-words pipeline: feature extraction, hierarchical clustering into visual words, histogram of visual words, learning]
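The histogram step of the bag-of-words baseline can be sketched as follows. This is a generic illustration rather than the presenters' implementation: the codebook is given as a fixed toy list instead of being produced by hierarchical clustering, and the function names are hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[i])),
    )


def bow_histogram(descriptors, codebook):
    """Normalised histogram of visual-word occurrences for one image."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    n = len(descriptors)
    return [count / n for count in counts] if n else [0.0] * len(codebook)


# Toy 2-D "descriptors" and a two-word codebook (real ones would be
# e.g. 64- or 128-dimensional SURF/SIFT vectors).
toy_codebook = [[0, 0], [10, 10]]
toy_descriptors = [[1, 0], [9, 9], [0, 1], [11, 10]]
toy_histogram = bow_histogram(toy_descriptors, toy_codebook)
```

The resulting fixed-length histogram is what the learning stage consumes, regardless of how many points were sampled from the image.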
21. Fusion schemes
• Early fusion: integrates unimodal features before learning concepts.
• Late fusion: first reduces unimodal features to separately learned concept
scores, then integrates these scores to learn concepts.
23. Early fusion approach
[Figure: early fusion pipeline with hierarchical clustering]
• Hypothesis: MSERs (maximally stable extremal regions) isolate semantically relevant information.
• Idea: represent points by their spatial relation to the regions: inside, outside, or just on the border
• Sampling: SIFT-SURF, dense.
24. Late fusion approach
[Figure: late fusion pipeline with two hierarchical-clustering branches whose outputs are combined]
• Use SURF/SIFT + MSER
• Use geometric descriptors for MSERs
25. Test: baseline
[Table (rows not available): Method | Sampling | # points | Time | Time | Avg. accuracy | Max accuracy]
• Best: SURF 64 Grid 10 (accuracy, computational cost)
• SURF 64 Grid 5: +7-8% accuracy, +300% time
• the number of points influences accuracy
26. Test: early fusion
[Table (rows not available): Method | Sampling | # points | Time | Time | Avg. accuracy | Max accuracy]
• Best: EF SURF 64 Grid 10 (accuracy, computational cost)
• EF SURF 64 Borders: many points, accuracy ~ that of Grid 10 but higher
computational costs
• EF SURF 64 Grid 10 is worse than SURF 64 Grid 10, but much faster (50% of
execution time)
27. Test: late fusion
[Table (rows not available): Method 1 | Method 2 | Accuracy]
• weights of 0.6 (better method) and 0.4 (worse method) lead to good results
• best performance: dense sampling + sparse sampling
• best combination: SURF 64 + EF SURF 64 Grid 10 (improved accuracy, modest
computational cost increase)
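The 0.6 / 0.4 weighting mentioned above can be sketched as a weighted sum of per-concept scores from two classifiers. The slide does not say exactly what is being weighted, so treating the inputs as per-concept confidence scores is an assumption, and the function name, concept names and scores are invented for illustration.

```python
def late_fusion(scores_better, scores_worse, w_better=0.6, w_worse=0.4):
    """Weighted sum of per-concept scores from two methods.

    A concept missing from one method's output contributes 0.0 for
    that method (an assumption of this sketch).
    """
    concepts = set(scores_better) | set(scores_worse)
    return {
        concept: w_better * scores_better.get(concept, 0.0)
        + w_worse * scores_worse.get(concept, 0.0)
        for concept in concepts
    }


# Hypothetical scores from a dense-sampling and a sparse-sampling method.
dense_scores = {"car": 0.9, "person": 0.2}
sparse_scores = {"car": 0.4, "person": 0.7}
fused = late_fusion(dense_scores, sparse_scores)
predicted = max(fused, key=fused.get)
```

Because each method errs on different samples, the weighted sum lets the stronger method dominate while the weaker one can still tip close decisions.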
28. Conclusions
• Early fusion strategies:
• ~ baseline accuracy
• faster
• Late fusion strategies:
• better accuracy than baseline
• each method corrects some errors made by the other
• fuse keypoints/regions (SURF, fusion of SURF and
MSER)
• IM3I users will be able to choose what’s best for them
30. Video search engine
Our goal is to provide a search engine for videos
for both technical and non-technical users.
Provide different interfaces that permit different query
modalities: free-text, natural language,
graphical composition of concepts using boolean and
temporal relations and query by visual example.
In addition, exploit ontologies and their structure
to encode semantic relations between
concepts, making it possible, for example, to expand queries to
synonyms and concept specializations.
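Query expansion over an ontology can be sketched with two plain dictionaries standing in for the ontology's synonym and specialization relations. This is only an illustration of the idea: the real system uses ontologies with reasoning support, and the dictionaries, concept names and function name below are hypothetical.

```python
def expand_query(concept, synonyms, specializations):
    """Return the concept plus everything reachable through synonym
    and specialization links (followed transitively)."""
    expanded = set()
    pending = [concept]
    while pending:
        current = pending.pop()
        if current in expanded:
            continue  # already visited; also protects against cycles
        expanded.add(current)
        pending.extend(synonyms.get(current, []))
        pending.extend(specializations.get(current, []))
    return expanded


# Toy ontology fragments (hypothetical).
toy_synonyms = {"car": ["automobile"]}
toy_specializations = {"vehicle": ["car", "truck"], "car": ["taxi"]}
expanded_terms = expand_query("vehicle", toy_synonyms, toy_specializations)
```

A search for "vehicle" would then also match shots annotated with any of the expanded terms.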
31. Sirio and Orione
• Design goals/assumptions:
  • semantic content-based retrieval
  • efficient web-based interface
• System features:
  • Sirio is a Rich Internet Application (in Adobe Flex) front end.
  • Orione is a web service search engine
  • Support for multiple ontologies and ontology reasoning
  • Results are in Media RSS format (queries treated as RSS feeds)
  • New search engine able to scale to a large number of instances of ontology concepts
• System interface query options:
  • ontology exploration using a graph-based view
  • compact keyframe-based results presentation / streaming videos
  • concept drag&drop facility (to build complex queries)
  • natural language query (with Boolean/temporal ops.)
  • free text query (for Google-like search)
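As an illustration of returning results in Media RSS format, the sketch below builds a small feed with Python's standard `xml.etree.ElementTree`. It is not Orione's actual output: the result fields (`title`, `video_url`, `keyframe_url`), the example URLs and the function name are assumptions, and a production feed would carry more channel metadata.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"  # Media RSS namespace URI
ET.register_namespace("media", MEDIA_NS)


def results_to_media_rss(query, results):
    """Serialise search results as an RSS 2.0 feed with Media RSS elements."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    # The query itself becomes the feed, mirroring "queries treated as RSS feeds".
    ET.SubElement(channel, "title").text = f"Results for: {query}"
    ET.SubElement(channel, "description").text = "Video search results"
    for result in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = result["title"]
        ET.SubElement(
            item,
            f"{{{MEDIA_NS}}}content",
            {"url": result["video_url"], "medium": "video"},
        )
        ET.SubElement(
            item, f"{{{MEDIA_NS}}}thumbnail", {"url": result["keyframe_url"]}
        )
    return ET.tostring(rss, encoding="unicode")


# Hypothetical single result for the query "car".
toy_feed = results_to_media_rss(
    "car",
    [
        {
            "title": "shot 1",
            "video_url": "http://example.org/v1.mp4",
            "keyframe_url": "http://example.org/k1.jpg",
        }
    ],
)
```

Exposing results this way means any feed reader or front end that understands Media RSS can consume a query without a custom client.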
40. Andromeda
• Design goals/assumptions:
  • semantic content-based browsing
  • efficient web-based interface using RIA
• System features:
  • Query manager as a Rich Internet Application (in Adobe Flex). Connects to web service (search engine)
  • Support for multiple ontologies and ontology reasoning
  • Access to social content related to ontology concepts (Flickr, YouTube, and real time tweets from Twitter)
• System interface query options:
  • Shows the concepts with more instances in a concept cloud view
  • Graph representation of semantic data structure
  • Multiple automatic layout algorithms for spatial positioning and manual drag & drop
  • Thumbnails view of the instances of each concept
  • Access to video metadata and video streaming
47. Pan
• Design goals/assumptions:
  • complete/correct automatic annotations
  • help in training new automatic concept detectors
• System features:
  • Rich Internet Application (in Adobe Flex).
  • video streaming using the same system of Sirio and Andromeda
  • new backend
  • geotagging using Google Maps
• System interface options
  • Integrated with web-based search engine and automatic video annotation
  • Multiple user profiles: a simple user may change his own annotations, while a super user can import the annotations of other users, e.g. to supervise the annotation process within an organization.
54. Daphnis
• Design goals/assumptions:
  • build on image tagging made popular by Flickr and tag clouds
  • connect to social web sites
  • allow CBIR
• System features:
  • Rich Internet Application (in Adobe Flex).
  • Connects to Flickr (and also Facebook, if needed)
  • Approximate nearest neighbour search using MPEG-7 descriptors, to scale to large number of images
• System interface options
  • users can tag images and retrieve images based on tags, or use tags to filter the results of similarity based retrieval.
• Ongoing work:
  • merging with automatic video annotation for automatic tagging
  • adoption of mechanisms for tag suggestion, based on recent research work in this field (use content, tags and geolocalization)
59. IM3I: authoring platform
A CMS approach to repository
analysis, authoring and publication
60. IM3I: authoring platform
Authoring IM3I end-user functionality typically covers five
distinct stages:
• Importing an existing repository from RSS and various
XML streams
• Extending the associated datamodel
• Editing layout and editing features
• Editing Search and Retrieval interfaces
• Embedding the IM3I end-user interfaces in a (corporate)
website
61. Editing workflow demo
•Step 1: Importing a video-repository
•Step 2: Enhancing the datamodel
•Step 3: Authoring layouts
•Step 4: Publishing the repository
62. I: Importing a repository
•Importing an existing repository to an internal and
flexible datamodel
•Aggregating and harmonizing multiple repositories
•Visualisation of markup and preview of contents
•Flexibly mapping by drag-and-drop
63. I: Importing a repository
Mapping the
contents of video
RSS to an IM3I
Datamodel
64. II: Enhancing the Datamodel
•Datamodels contain the descriptions of your
repository and in this way stipulate what can be
shown to or retrieved by an end-user.
•Datamodels can reference each other
•Datamodels can be extended over time by adding
elements
•Elements are based on types: media files, URIs, date,
string, etc.
•Elements can be shared across datamodels to allow
search & retrieval across multiple collections
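The datamodel concepts above (typed elements, extension over time, elements shared across datamodels) can be sketched with two small dataclasses. This is an illustrative model only, not the IM3I platform's API; the class names, element names and type labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    """A typed field of a datamodel, e.g. a string, date, URI or media file."""
    name: str
    type: str


@dataclass
class Datamodel:
    """A named collection of elements describing one repository."""
    name: str
    elements: list = field(default_factory=list)

    def add(self, element):
        """Extend the datamodel with a new element (no duplicates)."""
        if element not in self.elements:
            self.elements.append(element)


def shared_elements(model_a, model_b):
    """Elements present in both datamodels, usable for cross-collection search."""
    return [e for e in model_a.elements if e in model_b.elements]


# Two hypothetical collections that share a "title" element.
title = Element("title", "string")
videos = Datamodel("videos", [title, Element("file", "media file")])
photos = Datamodel("photos", [title])
videos.add(Element("translation", "string"))  # extending the datamodel later
common = shared_elements(videos, photos)
```

A query on `title` could then run over both collections, while `translation` remains specific to the video repository.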
65. II: Enhancing the Datamodel
Adding a ‘translation’ element to the datamodel
67. III: Layout and Functionality
Easy manipulation of a repository's layout by:
•Table metaphor (easy editing of table
characteristics)
•Drag and drop graphical elements
•Drag and drop contents of repository in cells
•Easy manipulation of look and feel
•Easy addition of editing functionalities to a layout
•Easy preview and markup functionalities
68. III: Layout and Functionality
Defining a layout table
69. III: Layout and Functionality
Dragging repository contents to layout
70. III: Layout and Functionality
Previewing layout
71. IV: Embedding in website
Easy blend-in of layouts in corporate websites
•By means of plugins for CMSs (e.g. Webmanager,
WordPress, Typo3)
•By <embed> </embed>
•Allowing for elaborate workflow patterns in
combining multiple layouts
72. IV: Embedding in website
[Figure: embedded page showing the original contents alongside the added translation functionality]
74. Atlante - process manager
• Web application used for the creation, technical administration and monitoring of the IM3I processing pipeline (e.g. automatic annotation process, media transcoding, etc.)
• This web application has multiple user profiles:
  • managers
  • administrators
• Main functions of this application are:
  • creation of a new type of (distributed) process
  • parameter setting for a new type of process
  • creation of a “Multiprocess” composed of sets of single (distributed) processes
  • starting/pausing/stopping a process
  • monitoring running processes
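The process life cycle implied by these functions (start, pause, stop, monitor, and grouping into a "Multiprocess") can be sketched as a small state machine. This is not Atlante's code: the state names, the transition table and the class names are assumptions chosen for illustration.

```python
# Allowed (state, action) -> new state transitions; an assumed life cycle.
TRANSITIONS = {
    ("created", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "start"): "running",
    ("running", "stop"): "stopped",
    ("paused", "stop"): "stopped",
}


class Process:
    """A single (possibly distributed) processing step with parameters."""

    def __init__(self, name, params=None):
        self.name = name
        self.params = dict(params or {})
        self.state = "created"

    def apply(self, action):
        """Apply start/pause/stop, rejecting transitions that make no sense."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} a {self.state} process")
        self.state = TRANSITIONS[key]
        return self.state


class Multiprocess:
    """A set of single processes controlled and monitored together."""

    def __init__(self, name, processes):
        self.name = name
        self.processes = list(processes)

    def apply(self, action):
        return [process.apply(action) for process in self.processes]

    def monitor(self):
        """Snapshot of every member process and its current state."""
        return {process.name: process.state for process in self.processes}


# Hypothetical pipeline: annotate and transcode, started together.
pipeline = Multiprocess(
    "ingest",
    [Process("annotate", {"detectors": "all"}), Process("transcode")],
)
pipeline.apply("start")
pipeline.processes[0].apply("pause")
status = pipeline.monitor()
```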
78. Gaia - media manager
• Web application that will be used for technical
administration and monitoring of the database
• Main functions of this application are:
• media management
• configuration of metadata, broadcasters,
annotation types, concept types and media types
• media annotations monitoring by technical backend
84. ACM MM 2010 Workshop
3rd International Workshop on Automated Information Extraction in Media Production
AIEMPro'10
Organizers:
Dr. Robbie De Sutter
Vlaamse Radio- en Televisieomroep - Medialab
Jean-Pierre Evain
European Broadcasting Union . Union Européenne de Radiotélévision
Dr. Gerald Friedland
ICSI (International Computer Science Institute)
Dr. Alberto Messina
RAI Radiotelevisione Italiana, Centre for Research and Technological Innovation
Dr. Masanori Sano
NHK (Japan Broadcasting Corporation) Science and Technology Research Laboratories