Nick Campbell is a speech scientist who has held research positions at AT&T Bell Labs, IBM UK Scientific Centre, and ATR basic telecom research. He has also served on boards and as a professor. Campbell discusses the growth of speech and multimedia data collection over time, challenges around data management and privacy, and the need for standardized tools and resources to support research using large corpora.
Speech Technology and Big Data
1. *
Nick Campbell
Speech Communication Lab
Trinity College Dublin, Ireland
2. *
* TCD – Stokes Professor (Dublin)
* CNGL – PI – Delivery & Interaction
* ELRA – board member / VP – speech
* ISCA – board member – workshops
* IEEE – Sig Proc Soc - SLTC member
* ATR/NiCT – research director (Japan)
* Speech Prosody 2014 (Dublin) host
* Speech scientist/researcher/corpus analyst
3. * AT&T Bell Labs
* The ideas people – think ‘BIG’
* IBM UK Scientific Centre
* The corpus people – ‘collect it all’
* ATR basic telecom research
* The fundamentals - learn how to ‘infer’ from it
4. * we used to be considered BIG – speech data (and now multimedia) gobbled up memory
* I collected 1500 hours of everyday chat/daily conversations in 2000 (@1GB per minute) – it took 5 years to process!
* now Apple, Google, Microsoft, ... get that each minute (but the secret is in the metadata)
* we need accessible data & tools for everybody!
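The scale quoted on this slide can be checked with a rough back-of-envelope calculation. The sketch below takes the slide's 1500 hours and 1 GB per minute at face value and assumes decimal units (1 TB = 1000 GB); the function and variable names are illustrative only.

```python
# Back-of-envelope size of the 2000 corpus, using the figures on the slide.
# Assumption: decimal units (1 TB = 1000 GB); all names are illustrative.
HOURS_RECORDED = 1500
GB_PER_MINUTE = 1


def corpus_size_gb(hours, gb_per_minute):
    """Total storage in GB for a recording of the given length and data rate."""
    return hours * 60 * gb_per_minute


total_gb = corpus_size_gb(HOURS_RECORDED, GB_PER_MINUTE)
total_tb = total_gb / 1000
print(f"{total_gb} GB, about {total_tb:.0f} TB")
```

On these figures the collection comes to roughly 90 TB.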
5. * but we need to manage privacy issues first!
6. * and we need a way to protect IP as well
* written publications have the ISBN standard
* work is now underway (cf. ELRA & COCOSDA) to institute the ISLRN (International Standard Language Resource Number) for Language Resources
* researchers need to get credit for corpora as well as for publishing research results
* the community needs a way to identify, acknowledge, attribute, and reference data
7. * tools for processing speech & multimodal data
* HTK, HTS, R, etc. ... not simple to use
* little consensus on what features to encode
* manual bootstrap – much too time-consuming!
8. * social interaction
* personal idiosyncrasies
* group dynamics – multimodal data (TB/hr)
* issues of robustness / domain specificity /
privacy / storage & archiving / redistribution
9. context analytics:
* cultural and language-specific needs
* multimodal – multimedia – multilingual
* tools for ‘less-well-supported’ languages
* e.g., U-STAR consortium for speech research –
sharing tools & data & knowledge for research
10. * European Language Resources Association
* COCOSDA – int’l coordinating committee
* IEEE SLTC, ISCA SIGs: there are places to go
* but are they ready for really BIG data?
perhaps not yet . . .
11. * curricula prepare people
* what standards to rely on?
* what resources available?
* what features to extract?
* what tools to work with?
* what use to put it to?
* what info to hide?
* what to do next?