Experience in Data Analysis Using
Text Mining



                                               Dr. Alisa Kongthon

                  Researcher, Human Language Technology Laboratory
           National Electronics and Computer Technology Center



                                                                  1
Text Mining is about…



 “Sifting through vast collections of unstructured or
 semistructured data beyond the reach of data mining
 tools, text mining tracks information sources, links isolated
 concepts in distant documents, maps relationships
 between activities, and helps answer questions.”


                                   Tapping the Power of Text Mining
                             Communications of the ACM, Sept. 2006



                                                                      2
Humans VS. Computers
• Humans: Ability to distinguish and apply linguistic patterns to text

   – Can overcome language difficulties such as slang, spelling
     variations, and contextual meaning


• Computers: Ability to process text in large volumes at high speed
   – Can sift through a large collection of texts to find simple statistics
     and relationships among terms almost instantly


• Text mining requires a combination of both
   Humans' linguistic capability + computers' speed and accuracy


               NLP                                   Data Mining
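
The "simple statistics and relationships among terms" that a computer can compute quickly might look like the following minimal sketch. The toy corpus and the whitespace tokenizer are assumptions made for illustration; they are not from the original slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; the documents are illustrative only.
docs = [
    "text mining links concepts in documents",
    "data mining finds patterns in data",
    "text mining combines nlp and data mining",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


# Simple statistic: how often each term occurs across the whole corpus.
term_counts = Counter(tok for doc in docs for tok in tokenize(doc))

# Simple relationship: in how many documents two terms appear together.
pair_counts = Counter()
for doc in docs:
    unique_terms = sorted(set(tokenize(doc)))
    pair_counts.update(combinations(unique_terms, 2))

print(term_counts.most_common(3))
print(pair_counts[("mining", "text")])
```

Counting is the easy part; deciding that "mining" here means text analysis rather than excavation is where the human linguistic capability comes in.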
Text Mining Tasks

• Information extraction:
  – Analyze unstructured text and identify key words or
    phrases and relationships within text
• Topic detection and tracking:
  – Filter and present only documents relevant to the user
    profile
• Summarization:
  – Text summarization reduces the content by retaining
    only its main points and overall meaning
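
As a rough illustration of summarization, the sketch below keeps the sentences whose words are most frequent in the text overall. This frequency-based extractive approach, the stopword list, and the function name `summarize` are assumptions chosen for brevity; the slides do not say which summarization method is used.

```python
import re
from collections import Counter

# Hypothetical, very small stopword list for the example.
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to", "it", "that", "by"}


def content_words(text):
    """Return lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Keep the sentences with the highest average word frequency."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


sample = ("Text mining finds patterns in text. The weather was nice. "
          "Text mining tools sift through text collections.")
print(summarize(sample, max_sentences=1))
```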



                                                             4
Text Mining Tasks

• Categorization:
  – Automatically classify documents into predefined
    categories
• Clustering:
  – Group documents based on their similarity to one another
• Concept Linkage
  – Connect related documents by identifying their shared
    concepts, helping users find information they perhaps
    wouldn't have found through traditional search methods
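
Categorization and clustering both depend on a measure of document similarity. A minimal sketch, assuming bag-of-words vectors and cosine similarity (a common choice, not one named in the slides), assigns each document to the most similar predefined category. The category descriptions are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: term -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Hypothetical predefined categories, each described by a few seed terms.
categories = {
    "hotel": vectorize("hotel room staff breakfast pool"),
    "mobile": vectorize("mobile network signal call internet"),
}


def categorize(text):
    """Return the name of the category most similar to the text."""
    vec = vectorize(text)
    return max(categories, key=lambda name: cosine(vec, categories[name]))


print(categorize("the room was clean and the staff friendly"))
print(categorize("network signal drops during every call"))
```

Clustering would use the same `cosine` function without the predefined categories, grouping documents whose pairwise similarity is high.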



                                                             5
Text Mining Tasks

• Information Visualization
  – Represent documents or information in graphical
    formats for easy browsing, viewing, or searching
• Question answering (Q&A)
  – Search and extract the best answer to a given question




                                                             6
Applications: Tech Mining

• Tech Mining is the application of text mining
  tools to science and technology (S&T)
  information, particularly bibliographic abstracts

• It exploits S&T databases to see patterns,
  detect associations, and foresee opportunities




                                                     7
Tech Mining Process




                      8
Technical Intelligence:
Who, What, When, Where?
• Digest multiple S&T information resources
• Profile Research Domains:
  –   Who?
  –   What?
  –   When?
  –   Where?
• Map Relationships: Topics & Teams
• Analyze Trends: What’s Hot & What’s Coming
• And do so -- Quickly
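
Profiling a research domain along the Who / What / When / Where questions amounts to counting fields of bibliographic records. The sketch below assumes each record is a dictionary with `authors`, `keywords`, `year`, and `country` fields; the field names and the records themselves are invented for illustration and do not reflect any particular S&T database or tech mining tool.

```python
from collections import Counter

# Hypothetical bibliographic records (all values are made up).
records = [
    {"authors": ["A. Author", "B. Author"], "keywords": ["text mining", "nlp"],
     "year": 2005, "country": "Thailand"},
    {"authors": ["A. Author"], "keywords": ["text mining", "opinion mining"],
     "year": 2006, "country": "Thailand"},
    {"authors": ["C. Author"], "keywords": ["opinion mining"],
     "year": 2006, "country": "USA"},
]


def profile(records):
    """Count who publishes, on what, when, and where."""
    return {
        "who": Counter(a for r in records for a in r["authors"]),
        "what": Counter(k for r in records for k in r["keywords"]),
        "when": Counter(r["year"] for r in records),
        "where": Counter(r["country"] for r in records),
    }


summary = profile(records)
for question, counts in summary.items():
    print(question, counts.most_common(2))
```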

                                               9
What if I don’t have Tech
Mining Software?




                            10
What if I don’t have Tech
Mining Software?




                            11
Output example from Tech
Mining Software




Source: A.L. Porter, QTIP: quick technology intelligence processes, Technol. Forecast. Soc. Change 72 (2005)
                                                                 12
Applications: Expert Finder




                              13
Applications: Expert Finder




                              14
Applications: Expert Finder




                              15
Applications: ABDUL
(Artificial BudDy U Love)

• An online information service that currently provides
  access to Thai linguistic resources (e.g., dictionary and
  sentence translation) and information resources (e.g.,
  weather conditions, stock prices, gas prices, traffic conditions)


• Users can interact with ABDUL in natural language
  via instant messaging (IM) protocols, Web
  browsers, and mobile devices




                                                                16
Applications: ABDUL
(Artificial BudDy U Love)




                            17
Applications: ABDUL
(Artificial BudDy U Love)




                            18
Web 1.0 VS. Web 2.0




                      19
User-Generated Content

• With Web 2.0 and social networking websites, the
  amount of user-generated content has increased
  exponentially


• User-generated content often contains opinions and/or
  sentiments


• An in-depth analysis of these opinionated texts could
  reveal potentially useful information, e.g.,
  – People's preferences on many different topics, including news
    events, social issues, and commercial products



                                                                         20
Online Opinion Resources
Characteristics of Online
Reviews
• Natural language and unstructured text format

• Some reviews are long and contain only a few
  sentences expressing opinions on the product

• It can be difficult for a potential reader to
  understand and analyze each review that
  may be relevant to his or her decision making


                                                  22
Opinion Mining

• Opinion mining (sentiment analysis) is the task of
  analyzing and summarizing what people think about a
  certain topic


• Opinion mining has gained a lot of interest in the text mining
  and NLP communities


• Three granularities of opinion mining:
  – Document level
  – Sentence level
  – Feature level

                                                               23
Feature-Based Opinion Mining

• This approach typically consists of the following
  two steps:
      1. Identify and extract features of an object,
         topic, or event from each sentence
      2. Determine whether the opinions regarding those
         features are positive or negative
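
The two steps above can be sketched as follows. This is a minimal lexicon-based illustration, assuming small hand-made word lists for features and opinion words. The lists, the function name `mine_opinions`, and the sample review are hypothetical, and the slides do not describe how the actual system performs either step.

```python
import re

# Hypothetical lexicons for a hotel-review example.
FEATURES = {"room", "staff", "location", "breakfast"}
POSITIVE = {"clean", "friendly", "great", "good"}
NEGATIVE = {"dirty", "rude", "noisy", "bad"}


def mine_opinions(review):
    """Return (feature, polarity) pairs found sentence by sentence."""
    results = []
    for sentence in re.split(r"[.!?]+", review):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Step 1: identify the features mentioned in this sentence.
        features = [t for t in tokens if t in FEATURES]
        # Step 2: decide polarity from the opinion words in the same sentence.
        score = (sum(t in POSITIVE for t in tokens)
                 - sum(t in NEGATIVE for t in tokens))
        if score == 0:
            continue  # no clear opinion expressed about the features
        polarity = "positive" if score > 0 else "negative"
        results.extend((feature, polarity) for feature in features)
    return results


print(mine_opinions("The room was clean. The staff were rude! Breakfast at eight."))
```

A sentence that mentions a feature without any opinion word, such as the last one in the sample, contributes nothing, which matches the observation that long reviews often contain only a few opinion-bearing sentences.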




                                                             24
Opinion Mining on Hotel Reviews in
Thailand (Graphical Display)




                                     25
Opinion Mining on Hotel Reviews in
Thailand (Textual Display)




                                     26
Comparison among Hotels




                          27
Opinion Mining on Mobile
Network Operators in Thailand




                                28
Opinion Mining on Mobile
Network Operators in Thailand




                                29
Challenges in Text Mining

• Text Mining = NLP + Data Mining
• Statistical NLP
  –   Ambiguity
  –   Context
  –   Tokenization & Sentence Detection
  –   POS tagging
• Data Mining
  – Ability to process the data
  – Massive amounts of data
  – Determining and extracting information of interest
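
To see why sentence detection is listed as a challenge, compare a naive splitter with one that knows a few abbreviations. Both functions and the abbreviation list are illustrative assumptions for English text; they are not taken from the slides and would not carry over to languages such as Thai that do not mark word boundaries with spaces.

```python
import re

# Hypothetical abbreviation list; a real system would need far more.
ABBREVIATIONS = {"dr.", "e.g.", "etc.", "vs."}


def naive_sentences(text):
    """Split after every '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentences(text):
    """Split on end punctuation unless the token is a known abbreviation."""
    out, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            out.append(" ".join(current))
            current = []
    if current:
        out.append(" ".join(current))
    return out


sample = "Dr. Smith studies text mining. It is useful."
print(naive_sentences(sample))  # splits wrongly after "Dr."
print(sentences(sample))
```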

                                                         30
Conclusions

• As the amount of data increases, text-mining
  tools that sift through it will be increasingly
  valuable

• Applications span both academic and industrial
  uses




                                                    31
Thank you for your attention


           Q&A



                               32
