In the research project PISA we have investigated how powerful search engines can be built, given a library of audiovisual material that has been analysed objectively and intelligently.
Fiat 20080921 results PISA
1. medialab
PISA – Proof of Concept
Production, Indexing and Search of Audiovisual Material
2. PISA - Positioning
• VRT-Medialab (medialab.vrt.be) - technical R&D
• IBBT (www.ibbt.be) – Interdisciplinary Research Institute
• PISA – Research Project on Production and Indexing of Audiovisual Media
• 21 man-years
• Computer Assisted Manufacturing
• Unsupervised Feature Extraction
• Search Engine Technology
3. Context - Digital Media Production
[Diagram: Production Platform, layers from top to bottom]
• Suprastructure – Metadata Management
• Production and distribution: Ingest, Editing, Media Asset Management, Mastering, Playout
• Infrastructure - Networks and Storage
4. Digital Asset Management, Content Management…
[Diagram: Production Platform, layers from top to bottom]
• Suprastructure – Metadata Management
• Production and distribution
• Infrastructure - Networks and Storage
5. User Expectations
[Diagram: Communication (Information) resting on General Data and Meta Data, drawn over the Suprastructure (Metadata Management) and Production and distribution layers]
Assumptions:
• An item is relevant or it is not
• A “scene” is the logical unit of search
The ideal search engine
• retrieves all relevant items (recall 100%)
• without false positives (precision 100%)
• enables instant access to digital media
• while respecting intellectual property.
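The two quality measures named above can be made concrete with a small calculation. The following is a minimal sketch, assuming the slide's binary notion of relevance and using hypothetical scene identifiers; treating an empty result list as precision 1.0 is a convention chosen here, not something the slides state.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: identifiers returned by the search engine
    relevant:  identifiers that are actually relevant (binary relevance)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Convention (assumption): an empty set counts as a perfect score.
    precision = true_positives / len(retrieved) if retrieved else 1.0
    recall = true_positives / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical scene identifiers, for illustration only.
relevant = {"scene-01", "scene-04", "scene-07", "scene-09"}
retrieved = ["scene-01", "scene-02", "scene-04", "scene-07"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this example one of the four returned scenes is a false positive and one relevant scene is missed, so both measures fall short of the ideal 100%.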
6. Archiving – Disclosure, Annotation,…
[Screen dump of the legacy archive system, translated from Dutch]
Search screen FILM. Set: 16, hits: 1, page 1 of 3
Query: keywords = ibm and vrt
Other search fields (left empty): archive number, broadcast year, month, day, fragment number, fragment duration, series, format, tape number, episode, episode number, programme, broadcast date, fragment title, text, category, recording date, recording number, journalist, rights holder
SETS: The strings required for the operation are not defined
Function keys: F11 End, F12 Sets, F13 Refset, F14 Show, F17 Previous, F18 Next/Clear, F19 Thesaurus, F20 Command, Ent Search
Matching record:
archive number : ALG 20010813 1
fragment number : 1
series : 1000 ZONNEN EN GARNALEN
tape number : E03024404
format : DBCM
fragment title : 1000 ZONNEN & GARNALEN
picture : KL/PALPLUS
fragment duration : 18 20
text :
0'00" Tourist reportage magazine, overview of topics; opening titles of the tourist reportage magazine, overview of topics
0'50" Today: artist Luc Hofkens designed an oasis on his roof terrace in Borgerhout that is reminiscent of the Grand Canyon; interview with Luc and his wife Marilou; exterior shots of the roof and its surroundings, exterior of the worker's house, pan over rock walls, crates with water, planting, photo album showing the progress of the works
4'00" Junior: Klaartje Alaerts, 13 years old, wants to become an astronaut; she visits the Euro Space Center with space shuttles and rockets, simulation in a space shuttle, interview; has seen a UFO; builds a small rocket herself and launches it
7'50" De Scheurkalender: archive IBM advertising film; interview with Maurice De Wilde; first personal computer
keywords : BELGIUM; BORGERHOUT; ARTIST; OASIS; ART; GRAND CANYON (NATURE AREA); DAK (ROOF); TERRACE; INTERVIEW; EURO SPACE CENTER; SPACE TRAVEL; PC; BOAT TRIP; WEALTH; PASSENGER; GASTRONOMY; RESTAURANT; STAFF; HOLIDAY; INTERIOR SHOT; SHIP; BECKERS LEEN; VRT; LOTTO; RADIO ANNOUNCER; SOUND STUDIO; INVENTION; BARBECUE; CONCRETE MIXER; IBM; ADVERTISING SPOT
rights holder : VRT
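The free-text field of the record above embeds time markers in the running annotation. As an illustration of how such a field could be turned into structured, time-coded segments, here is a minimal sketch. It assumes the markers have the form minutes'seconds" and that each marker opens a new segment; the function name and the shortened sample text are hypothetical.

```python
import re

# Matches markers such as 0'50" (assumed to mean minutes'seconds").
TIMECODE = re.compile(r"(\d+)'(\d{2})\"")


def split_timecoded_text(text):
    """Split a free-text annotation into (offset_seconds, description) pairs."""
    segments = []
    matches = list(TIMECODE.finditer(text))
    for i, match in enumerate(matches):
        start = match.end()
        # A segment runs until the next marker, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        offset = int(match.group(1)) * 60 + int(match.group(2))
        segments.append((offset, text[start:end].strip()))
    return segments


# Shortened, hypothetical version of the annotation text.
annotation = (
    "0'00\" overview of topics "
    "0'50\" artist designs an oasis on his roof terrace "
    "4'00\" visit to the Euro Space Center "
    "7'50\" archive IBM advertising film"
)

for offset, description in split_timecoded_text(annotation):
    print(offset, description)
```

Each pair could then be stored as a separate searchable unit, so that a hit points to an offset within the tape rather than to the tape as a whole.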
8. Issues – Catch-22
-> Automated processing of information is a key discriminator, but it requires correct and structured metadata
-> “Annotation” of rich media requires semantic awareness and interpretation, and thus it is at best an approximation
-> Product Engineering is the source of structured and meaningful information, but creative staff are not receptive to technology
9. Objectives - Proof of Concept
• One Set of Numbers(!)
• Model Driven Development
• Computer Assisted Manufacturing
• Unsupervised Feature Extraction
• Efficient Search and Retrieval
Develop an extensible data model and a consistent application framework, accessible via an intuitive user interface
(digitizing analogue and disintegrated information flows)
11. Milestone 1 – Search Engine
• Search federation by system integration
• Faceted search
• Integrated application of keywords
• Intuitive and structured presentation of results
• Direct access to audiovisual material
[Diagram components: sources Legacy Video Library (Basisplus), Raw Material (EBU Superpop), Media Asset Management System (Ardome) and current news items (Ardome); <NewsML-G2>; Search Engine (Lucene/SOLR); Search Client (Custom Development)]
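Search federation and faceted search can be illustrated without any search server. The sketch below is a toy, in-memory stand-in and does not call Lucene/SOLR; the record fields, identifiers and values are hypothetical, and only the library names are taken from the slide.

```python
from collections import Counter

# Hypothetical records per library; fields and values are illustrative only.
LIBRARIES = {
    "Basisplus": [
        {"id": "bp-1", "title": "IBM advertising film",
         "keywords": {"ibm", "commercial"}, "year": 2001},
        {"id": "bp-2", "title": "Roof terrace oasis",
         "keywords": {"art", "interview"}, "year": 2001},
    ],
    "Ardome": [
        {"id": "ar-1", "title": "News item on IBM",
         "keywords": {"ibm", "news"}, "year": 2008},
    ],
}


def federated_search(keyword):
    """Query every library and tag each hit with the library it came from."""
    hits = []
    for library, records in LIBRARIES.items():
        for record in records:
            if keyword in record["keywords"]:
                hits.append({**record, "library": library})
    return hits


def facet_counts(hits, field):
    """Count the hits per value of one facet field."""
    return Counter(hit[field] for hit in hits)


hits = federated_search("ibm")
print([hit["id"] for hit in hits])
print(facet_counts(hits, "library"))
print(facet_counts(hits, "year"))
```

The facet counts are what a client would show next to the result list, so that a user can narrow the results by library or by year with one click.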
15. Milestone 2 – Feature Extraction
• Time-coded properties and indexing allow random access to material fragments:
• Shot segmentation and Keyframe extraction
• Subtitle processing and Speech recognition
• Taxonomy-driven topic detection
• Face recognition
• Scene recognition
• Copy detection
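As an illustration of the first extraction step, shot segmentation and keyframe extraction, the following is a minimal sketch of one common textbook approach: compare grey-level histograms of successive frames and declare a cut when they differ strongly. It is not necessarily the method used in PISA. Frames are assumed to be non-empty flat lists of 8-bit grey values, and the bin count and threshold are arbitrary illustrative choices.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalised grey-level histogram of a frame (flat list of pixel values)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [count / len(frame) for count in counts]


def shot_boundaries(frames, threshold=0.5):
    """Return the indices of the frames that start a new shot."""
    boundaries = [0]
    previous = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        # L1 distance between successive histograms, between 0 and 2.
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            boundaries.append(index)
        previous = current
    return boundaries


def keyframes(boundaries, frame_count):
    """Pick the middle frame of each shot as its keyframe."""
    ends = boundaries[1:] + [frame_count]
    return [(start + end) // 2 for start, end in zip(boundaries, ends)]


# Tiny synthetic "video": three dark frames, two bright ones, one dark one.
dark = [10, 20, 30, 40]
bright = [200, 210, 220, 230]
frames = [dark, dark, dark, bright, bright, dark]

cuts = shot_boundaries(frames)
print(cuts)
print(keyframes(cuts, len(frames)))
```

Real material would need colour histograms, handling of gradual transitions and a tuned threshold; the sketch only shows where the time-coded shot boundaries come from.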
[Diagram components: sources Legacy Video Library (Basisplus), Raw Material (EBU Superpop), Media Asset Management System (Ardome) and current news items (Ardome); <NewsML-G2>; Search Engine (Lucene/SOLR); feature extraction modules Shot Segmentation, Face Detection, Topic Detection and Speech Recognition; Media Production]
16. Work in Progress (due Q4 2008)
• Multilingual
• Access control and Intellectual Property Protection
• Audio segmentation and classification
• Music transcription
• Fractal-based visual indexing
• …
17. Conclusion
• Enterprise search – structured metadata, limited number of libraries, limited number of records per library, dependencies between objects
• Intelligent search federation is aware of the media production process - scripts, webpages, subtitles and formal annotation may represent the same editorial object
• Random access to audiovisual material requires an index that is based on timecode and not on « word position in a document »
• Ontology-driven application logic is essential to create semantic awareness, i.e. resolving synonyms and disambiguating homonyms
• The perfect search engine is not for sale yet; it requires design and development from the ground up.
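Two of the conclusions above, indexing on timecode instead of word position and resolving synonyms, can be combined in a small sketch. The class and function names, the synonym table and the second asset identifier are hypothetical; the tape number and offsets echo the archive record shown earlier and serve only as sample data. Disambiguation of homonyms is not covered.

```python
from collections import defaultdict

# Hypothetical synonym table: surface terms mapped to one canonical concept.
SYNONYMS = {"pc": "personal computer", "computer": "personal computer"}


def canonical(term):
    """Lower-case a term and replace it by its canonical concept, if any."""
    term = term.lower()
    return SYNONYMS.get(term, term)


class TimecodedIndex:
    """Inverted index whose postings are (asset_id, start_seconds) pairs."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, asset_id, start_seconds, terms):
        """Index the terms of one fragment under its start offset."""
        for term in terms:
            self.postings[canonical(term)].append((asset_id, start_seconds))

    def search(self, term):
        """Return sorted (asset_id, start_seconds) pairs for a term."""
        return sorted(self.postings.get(canonical(term), []))


index = TimecodedIndex()
index.add("E03024404", 50, ["oasis", "interview"])
index.add("E03024404", 470, ["IBM", "PC"])
index.add("hypothetical-asset-2", 12, ["computer"])

print(index.search("personal computer"))
print(index.search("pc"))
```

Because every posting carries a start offset, a hit can be opened directly at the matching fragment, and the queries "pc", "computer" and "personal computer" all return the same fragments.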
18. Future Work - From « Metadata » to CAD/CAM
?
19. Future Work - From « Metadata » to CAD/CAM
?