Speech Technology and Big Data
*
    Nick Campbell
    Speech Communication Lab
    Trinity College Dublin, Ireland
*
    * TCD – Stokes Professor (Dublin)
    * CNGL – PI – Delivery & Interaction
    * ELRA – board member / VP – speech
    * ISCA – board member – workshops
    * IEEE – Signal Processing Society – SLTC member
    * ATR/NiCT – research director (Japan)
    * Speech Prosody 2014 (Dublin) host

        * Speech scientist/researcher/corpus analyst
* AT&T Bell Labs
    * The ideas people – think ‘BIG’

* IBM UK Scientific Centre
    * The corpus people – ‘collect it all’

* ATR basic telecom research
    * The fundamentals – learn how to ‘infer’ from it


*
* we used to be considered BIG – speech data
  (and now multimedia) gobbled up memory
* I collected 1500 hours of everyday chat/daily
  conversation in 2000 (@1GB per minute); it
  took 5 years to process!

* now Apple, Google, Microsoft, etc. get that much every minute
  (but the secret is in the metadata)

* we need accessible data & tools for everybody!

   *
* but we need to manage privacy issues first!




  *
* and we need a way to protect IP as well

* written publications have the ISBN standard
* work is now underway (cf. ELRA & COCOSDA) to
  institute the ISLRN (International Standard Language
  Resource Number) for language resources
* researchers need to get credit for corpora as
  well as for publishing research results
* the community needs a way to identify,
  acknowledge, attribute, and reference data



 *
* tools for processing speech & multimodal data

* HTK, HTS, R, etc. are not simple to use


* little consensus on what features to encode

* manual bootstrapping – much too time-consuming!


*
* social interaction

* personal idiosyncrasies

* group dynamics – multimodal data (TB/hr)

* issues of robustness / domain specificity /
 privacy / storage & archiving / redistribution


     *
context analytics:


* cultural and language-specific needs
* multimodal – multimedia – multilingual
* tools for ‘less-well-supported’ languages

* e.g., the U-STAR consortium for speech research –
  sharing tools, data & knowledge for research



     *
* European Language Resources Association
* COCOSDA – int’l coordinating committee
* IEEE SLTC, ISCA SIGs: there are places to go

    * but are they ready for really BIG data?
      perhaps not yet . . .




                          *
* curricula prepare people

* what standards to rely on?
* what resources are available?
* what features to extract?
* what tools to work with?
* what use to put it to?
* what info to hide?
* what to do next?

                               *
*

