Introduction to Text Mining and Semantics


Introduction to Text Mining and Semantics, presented by Seth Grimes at Nstein seminars in London, June 2009, and New York, September 2009.



  1. Introduction to Text Mining and Semantics
     Seth Grimes
     President, Alta Plana
  2. New York Times
     October 9, 1958
  3. From text to information
     “Text expresses a vast, rich range of information, but encodes this information in a form that is difficult to decipher automatically.”
     -- Marti A. Hearst, “Untangling Text Data Mining,” 1999
  4. Document input and processing
     Information extraction
     Knowledge management
     Hans Peter Luhn, “A Business Intelligence System,” IBM Journal, October 1958
  5. Statistical analysis of content
     “Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance.”
     -- Hans Peter Luhn, “The Automatic Creation of Literature Abstracts,” IBM Journal, April 1958
  6. Statistical analysis limitations
     “This rather unsophisticated argument on ‘significance’ avoids such linguistic implications as grammar and syntax... No attention is paid to the logical and semantic relationships the author has established.”
     -- Hans Peter Luhn, 1958
  7. Semantic links
     New York Times, September 8, 1957
     Anaphora / coreference: “They”
  8. Digital content universe
     “The broadcast, media, and entertainment industries garner about 4% of the world’s revenues but already generate, manage, or otherwise oversee 50% of the digital universe.”
     -- “The Diverse and Exploding Digital Universe” (IDC, 2008)
     Approximately 70% of the digital universe is created by individuals.
  9. The “unstructured data” challenge
     The digital universe:
     • Web sites, news & journal articles, images, video.
     • Blogs, forum postings, and social media.
     • E-mail, contact-center notes and transcripts; recorded conversation.
     • Surveys, feedback forms, warranty & insurance claims.
     • Office documents, regulatory filings, reports, scientific papers.
     • And every other sort of document imaginable.
     Is Search up to the job?
  10. Search results
     How are the quality, value & authority of search results?
     Hotel’s opinion
     Who profits from search?
     Guest’s opinion - about Priceline
  11. From Web 1.0 to Web 2.0
     How can we do better?
     “We have many of the tools in place -- from Web 2.0 technologies…”
     -- “The Diverse and Exploding Digital Universe” (IDC, 2008)
  12. Web 2.0
     “Web 2.0 is the business revolution in the computer industry caused by the move to the Internet as a platform.”
     -- Tim O’Reilly, 2004
     “[A] move from personal websites to blogs and blog site aggregation, from publishing to participation,… an ongoing and interactive process... to links based on tagging.”
     -- Terry Flew, “New Media: An Introduction,” 2008
     Web 2.0 is dynamic, personalized, interactive, collaborative.
  13. From information growth to economic growth
     “We have many of the tools in place -- from Web 2.0 technologies… to unstructured data search software and the Semantic Web -- to tame the digital universe. Done right, we can turn information growth into economic growth.”
     -- “The Diverse and Exploding Digital Universe” (IDC, 2008)
  14. Text mining: from information to intelligence
     Text mining enables smarter search that better responds to user goals, e.g., answers –
  15. From Web 2.0 to Web 3.0
     For even better findability:
     “The Semantic Web is a web of data, in some ways like a global database.”
     -- Tim Berners-Lee, 1998
     Web 3.0 is Web 2.0 + the Semantic Web + semantic tools.
     Recurring themes:
     • Semantically enriched content & search.
     • Linked Data.
     • Context sensitive.
     • Location aware.
  16. The Semantic Web vision
     An open-standards architecture, coordinated by the W3C (World Wide Web Consortium)
     Linked Data: “exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web.”
  17. Steps in the right direction…
  18. … and missteps
     “Kind” = type, variety, not a sentiment.
     Complete misclassification
     External reference
     Unfiltered duplicates
  19. Getting to Web 3.0
     Text mining / analytics enables Web 3.0 and the Semantic Web.
     • Automated content categorization and classification.
     • Text augmentation: metadata generation, content tagging.
     • Information extraction to databases.
     • Exploratory analysis and visualization.
     Technical concepts:
     • Linked Data
     • RDF, SPARQL, OWL
     • RDFa, Microformats, eRDF
  20. Text mining: users’ perspective
     I recently published a study report, “Text Analytics 2009: User Perspectives on Solutions and Providers.”
     I estimated a $350 million global market in 2008, up 40% from 2007.
     I relayed findings from a survey that asked…
  21. Primary applications
     What are your primary applications where text comes into play?
  22. Analyzed textual information
     What textual information are you analyzing or do you plan to analyze?
     Current users responded:
  23. Extracted information
     Do you need (or expect to need) to extract or analyze:
  24. Overall satisfaction
     Please rate your overall experience – your satisfaction – with text mining.
  25. Moving ahead
     Apply text mining to discover value in content.
     Develop / improve metadata and taxonomies.
     Adopt semantic technologies -- for content publishing and for user interactions -- to boost flexibility, findability, and profitability.
     And understand your audience:
     “By focusing on the fundamental aspects of the consumers’ online behavior -- not just current best practices -- companies will be better prepared when Web 2.0+ morphs into Web 3.0 and beyond.”
     -- Donna L. Hoffman, UC Riverside, in the McKinsey Quarterly
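
Slide 5 quotes Luhn's idea that word frequency and distribution can yield a relative measure of significance. The following is a minimal Python sketch of that idea, not Luhn's exact procedure: it treats a word as significant when it is not on a small stop list and occurs at least twice, then scores each sentence as the square of its significant-word count divided by its length in words. The stop list, the `min_count` threshold, the scoring formula, the function names, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration; a real system would use a larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
              "that", "this", "on", "for"}


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def significant_words(text, min_count=2):
    """Return non-stop words that occur at least min_count times."""
    counts = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    return {w for w, c in counts.items() if c >= min_count}


def score_sentences(text, min_count=2):
    """Score each sentence as (significant words)^2 / (words in sentence)."""
    keywords = significant_words(text, min_count)
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for sentence in sentences:
        words = tokenize(sentence)
        hits = sum(1 for w in words if w in keywords)
        score = hits * hits / len(words) if words else 0.0
        scored.append((score, sentence))
    return scored


# Made-up sample text, used only to exercise the functions.
sample = (
    "Text mining turns text into data. "
    "The weather was pleasant. "
    "Mining text at scale requires statistics about text."
)
ranked = sorted(score_sentences(sample), reverse=True)
top_sentence = ranked[0][1]
```

Here `top_sentence` holds the highest-scoring sentence, which could serve as a one-line extract of the sample. As slide 6 notes, a purely statistical score of this kind ignores grammar, syntax, and the semantic relationships in the text.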