Arjan van Hessen, Speech & Language Technology, A.J.vanHessen@ewi.utwente.nl. Searching in spoken words: Disclosure of recorded content in MediaMosa. SURFnet Relatiedagen 2010, Noordwijkerhout, December 9, 2010
Content: Introduction; Why speech is so important; What is HLT?; Working applications: self-service (Internet & telephony) and searching in audiovisual recordings; Demonstrations
Humans as speaking creatures: Human speech began some 100,000 years ago. Before that, the shape of the vocal tract was not “ready” for modern speech: the larynx was situated too high, as can still be seen in chimpanzees.
Humans as writing creatures: Sumerian (c. 3300 BC, Mesopotamia) is probably the oldest written language. Timeline: now; -3300: text; -10,000: farming; -100,000: speech
What is HLT? Human Language Technology is technology that mimics the human language capacity. Language understanding: speech, text, sign
Redundancy (Dutch text with the inner letters of each word deliberately scrambled): Vlgones een oznrdeeok op een Eglnese uvinretsiet mkaat het neit uit in wlkee vloogdre de ltteers in een wrood saatn, het einge wat blegnaijrk is is dat de eretse en de ltaatse ltteer op de jiutse patals saatn. De rset van de ltteers mgoen wllikueirg gpletaast wdoren en je knut vrelvogens gwoeon lzeen wat er saatt. Dit kmot odmat we neit ekle ltteer op zcih lzeen maar het wrood als gheeel. (Translation: According to research at an English university, it does not matter in which order the letters of a word appear; all that matters is that the first and last letters are in the right place. The remaining letters can be placed at random and you can still simply read what it says. This is because we do not read each letter by itself but the word as a whole.)
Pensez a cequevsavezfnit et demandezvs i<<est coque 3 ai
Working applications: dialogue systems (telephony, real time, limited complexity); disclosure systems (high-quality audio, offline, complex)
ContactCenter Voice HLT Natural Language Search Web Mobile
Companies using speech technology
How may I help you? Why are they calling? Classification based on recognition of the question (“How may I help you?”). Who is calling? Identification via ZIP code and house number.
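
The two steps on this slide (classifying why someone calls and identifying who calls) can be illustrated with a minimal sketch. It assumes the speech recogniser has already produced text; the keyword table, the customer table, and the function names `classify_call` and `identify_caller` are hypothetical and not taken from the presentation. A production system would more likely use a trained classifier than keyword matching.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keywords per call reason (illustration only).
INTENT_KEYWORDS = {
    "billing": {"invoice", "bill", "payment"},
    "outage": {"outage", "broken", "down"},
    "moving": {"moving", "address", "relocate"},
}


def classify_call(recognised_text: str) -> str:
    """Pick the call reason whose keywords overlap most with the recognised words."""
    words = set(re.findall(r"[a-z]+", recognised_text.lower()))
    scores = {intent: len(words & kws) for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to a human agent when no keyword was recognised.
    return best if scores[best] > 0 else "agent"


@dataclass(frozen=True)
class Customer:
    name: str


# Hypothetical customer table keyed by (ZIP code, house number).
CUSTOMERS = {("7522NB", 5): Customer("Example Customer")}


def identify_caller(zip_code: str, house_number: int) -> Optional[Customer]:
    """Look up the caller by normalised ZIP code and house number."""
    key = (zip_code.replace(" ", "").upper(), house_number)
    return CUSTOMERS.get(key)
```

For example, `classify_call("I have a question about my invoice")` routes to the billing queue, while an utterance without any known keyword is handed to an agent.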
Organisations using speech technology
Disclosure of audiovisual archives: The number of AV archives on the Internet is increasing rapidly. Archiving is not enough: disclosure and reuse are required! The use of HLT is needed (humans cost too much).
Digitized (historic) collections and digitally recorded collections. Examples: WFH, H.M. Koningin Wilhelmina (Queen Wilhelmina), second feminist wave, Buchenwald, Memories of Indonesia, LVSR
Searching in historic radio recordings: Radio Oranje
Oral History: Buchenwald
Oral History: Brandgrens, Rotterdam. Ten witnesses of the bombing of Rotterdam (May 1940) tell their story. Speech and language technology (TST) is used to search the testimonies.
Searching in the radio interviews of WFH
Searching in 46 interview collections: Getuigenverhalen (600 hours)
Searching in 500 interviews in Croatia
CroMe - Audio Search. Searching for: commandant. Phrase boundaries. 5 fragments found. Found word (5x commandant).
CroMe - Audio Search. Search word: traumas. Language found.
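
The kind of audio search shown on the CroMe slides (a query word, phrase boundaries, and a list of found fragments) can be sketched as a lookup in a time-aligned transcript. This is an assumption-laden illustration, not the CroMe implementation: the `Word` and `Phrase` structures, the `search` function, and the idea that the recogniser delivers per-word timestamps grouped into phrases are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


@dataclass
class Phrase:
    words: List[Word]  # one recognised phrase, bounded by pauses


def search(phrases: List[Phrase], query: str) -> List[Tuple[float, float, str]]:
    """Return (phrase start, phrase end, phrase text) for each phrase containing the query."""
    q = query.lower()
    fragments = []
    for phrase in phrases:
        if any(w.text.lower() == q for w in phrase.words):
            text = " ".join(w.text for w in phrase.words)
            fragments.append((phrase.words[0].start, phrase.words[-1].end, text))
    return fragments
```

Returning the phrase boundaries rather than only the word position lets a player start playback at the beginning of the surrounding phrase, which is usually easier to follow than jumping straight to the found word.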
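Political meetings
Parliament transcriptions. Example: “Gisteren was er een bespreking i.v.m. de betrekkingen tussen Nederland en Vlaanderen” (“Yesterday there was a meeting about relations between the Netherlands and Flanders”).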

Transcription verhaal2010

  • 25. Recognition of lectures: record the speech; record the PPT; recognise the speech; use the display time of each slide as the time unit; use the recognised speech as keywords for each slide.
  • 26. Searching in news broadcasts
  • 27. Metadata -> Language model: the text in the slide(s), lecture handouts, and environmental texts feed the language model.
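
The lecture-recognition steps on slide 25 (use each slide's display time as the time unit and attach the recognised speech to it as keywords) could look roughly like the sketch below. The input shapes, the stop-word list, and the function name `keywords_per_slide` are assumptions made for illustration; the presentation does not specify how keywords are selected, so simple word frequency is used here.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical, deliberately tiny stop-word list.
STOPWORDS: FrozenSet[str] = frozenset({"the", "a", "an", "of", "and", "in", "is"})


def keywords_per_slide(
    slide_times: List[Tuple[int, float, float]],  # (slide number, shown from, shown until), seconds
    words: List[Tuple[str, float]],               # (recognised word, time it was spoken), seconds
    top_n: int = 3,
) -> Dict[int, List[str]]:
    """Assign each recognised word to the slide on display and keep the most frequent ones."""
    result: Dict[int, List[str]] = {}
    for slide_no, start, end in slide_times:
        counts = Counter(
            text.lower()
            for text, t in words
            if start <= t < end and text.lower() not in STOPWORDS
        )
        result[slide_no] = [word for word, _ in counts.most_common(top_n)]
    return result
```

A search for a keyword can then return the slide number and its display time, so that playback of the recorded lecture starts where that slide was shown.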

Editor's notes

  1. Éénéén
  2. Q-go images