SlideShare uma empresa Scribd logo
1 de 18
RapidMiner5 2.9 - Word vector tool and RapidMiner
Word Vector tool The Word & Web Vector Tool is a flexible Java library for statistical language modeling and integration of Web and Webservice based data sources.  It supports the creation of word vector representations of text documents in the vector space model that is the point of departure for many text processing applications .
Installation 1.	Download the archive form wvtoolsourceforge website.
Installation 2. Putting it into lib/plugins directory of your RapidMiner installation, example: D:rogram Filesapid-IapidMiner5iblugins
Word Vector tool The aim of the WVTool is to provide a simple to use, simple to extend pure Java library for text and webmining. It can easily be invoked from any Java application.
Word Vector tool WVTool bridges a gap between highly sophisticated linguistic packages as the GATE system on the one side and many partial solutions that are part of diverse text and information retrieval applications on the other side.
Functions
Word List A word list contains all terms used for vectorization together with some statistics  (e.g. in how many documents a term appears). The word list is needed for vectorization to define which terms are considered as dimensions of the vector space and for weighting purposes.
WVtool functions Input list that tells the system which text documents to process WVTool Function  Inputs A configuration object, that tells the system which methods to use in the individual steps.
Defining the input The input list tells the WVTool which texts should be processed. Every item in the list contains the following information: A URI  The language the document is written in (optional) ˆ The type of the document (optional) ˆ The character encoding of the document, e.g. UTF-8 (optional) ˆ A class label
Using Predefined Word Lists In some cases it is necessary to exactly define the dimensions of the vector space, yet leaving the counting of terms and documents to the WVTool . This can be achieved by calling the word list creation function with a list of String values.
Text Input The TextInput operator creates an ExampleSet from a collection of texts. The output ExampleSet contains one row for each text document and one column of each term.
Text Classification, Clustering and Visualization For text classification, the class labels (e.g. positive, negative) are defined in the TextInput operator, as described above. Using clustering or dimensionality reduction, there is a possibility to directly visualize text documents from the RapidMiner Visualization panel.
Creating and Maintaining Word Lists Creating an Initial Word List: An initial word list can be created by using the following chain of operators:
Creating and Maintaining Word Lists Applying a Word List:  You can apply a word list in two ways:  To use the actual weights, first create word vectors using the TextInput Operator and then use the AttributeWeightsLoader and AttributesWeightsApplier on the resulting ExampleSet.
Creating and Maintaining Word Lists Applying a Word List:  You can apply a word list in two ways:  2.  To use the word list only as a selection of relevant terms and leave it to the TextInput to actually weight them, use the AttributeWeightsLoader before. The TextInput will create vectors that contain as dimensions only terms in the word list, that have a weight larger than zero.
Creating and Maintaining Word Lists Updating a Word List :  If you add new documents to your corpus, usually additional terms will be relevant and should be added to the word list. After the InteractiveAttributeWeighting operator pops up, use the load function to load your original word list.
More Questions? Reach us at support@dataminingtools.net Visit: www.dataminingtools.net

Mais conteúdo relacionado

Destaque (20)

Clustering
ClusteringClustering
Clustering
 
Introduction to Text Classification with RapidMiner Studio 7
Introduction to Text Classification with RapidMiner Studio 7Introduction to Text Classification with RapidMiner Studio 7
Introduction to Text Classification with RapidMiner Studio 7
 
RapidMiner: Data Mining And Rapid Miner
RapidMiner: Data Mining And Rapid MinerRapidMiner: Data Mining And Rapid Miner
RapidMiner: Data Mining And Rapid Miner
 
RapidMiner: Introduction To Rapid Miner
RapidMiner: Introduction To Rapid MinerRapidMiner: Introduction To Rapid Miner
RapidMiner: Introduction To Rapid Miner
 
Data Applied:Forecast
Data Applied:ForecastData Applied:Forecast
Data Applied:Forecast
 
Welcome
WelcomeWelcome
Welcome
 
2008 IEDM presentation
2008 IEDM presentation2008 IEDM presentation
2008 IEDM presentation
 
LISP:Predicates in lisp
LISP:Predicates in lispLISP:Predicates in lisp
LISP:Predicates in lisp
 
Test
TestTest
Test
 
Quick Look At Classification
Quick Look At ClassificationQuick Look At Classification
Quick Look At Classification
 
Data Applied: Similarity
Data Applied: SimilarityData Applied: Similarity
Data Applied: Similarity
 
Anime
AnimeAnime
Anime
 
SPSS: Data Editor
SPSS: Data EditorSPSS: Data Editor
SPSS: Data Editor
 
Control Statements in Matlab
Control Statements in  MatlabControl Statements in  Matlab
Control Statements in Matlab
 
Ireland Apo University Fy 10 Tibbs Slideshare
Ireland Apo University Fy 10 Tibbs SlideshareIreland Apo University Fy 10 Tibbs Slideshare
Ireland Apo University Fy 10 Tibbs Slideshare
 
Data-Applied: Technology Insights
Data-Applied: Technology InsightsData-Applied: Technology Insights
Data-Applied: Technology Insights
 
Traffic Skills, Parent & Kids Intro
Traffic Skills, Parent & Kids IntroTraffic Skills, Parent & Kids Intro
Traffic Skills, Parent & Kids Intro
 
XL-Miner: Timeseries
XL-Miner: TimeseriesXL-Miner: Timeseries
XL-Miner: Timeseries
 
Apresentação Red Advisers
Apresentação Red AdvisersApresentação Red Advisers
Apresentação Red Advisers
 
Txomin Hartz Txikia
Txomin Hartz TxikiaTxomin Hartz Txikia
Txomin Hartz Txikia
 

Semelhante a RapidMiner: Word Vector Tool And Rapid Miner

List of values Best Practices
List of values Best PracticesList of values Best Practices
List of values Best PracticesAd Ghauri
 
Team G
Team GTeam G
Team Gbutest
 
MS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data miningMS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data miningDataminingTools Inc
 
MS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data miningMS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data miningsqlserver content
 
Ant conc ~design & development of a freeware
Ant conc ~design & development of a freewareAnt conc ~design & development of a freeware
Ant conc ~design & development of a freewaresarahannelazarus
 
20131112 Introduction to LaTeX for EndNote Users.docx
20131112 Introduction to LaTeX for EndNote Users.docx20131112 Introduction to LaTeX for EndNote Users.docx
20131112 Introduction to LaTeX for EndNote Users.docxNTUSubjectRooms
 
Programming Without Coding Technology (PWCT) Environment
Programming Without Coding Technology (PWCT) EnvironmentProgramming Without Coding Technology (PWCT) Environment
Programming Without Coding Technology (PWCT) EnvironmentMahmoud Samir Fayed
 
Improving writing aids, the community way
Improving writing aids, the community wayImproving writing aids, the community way
Improving writing aids, the community wayAlexandro Colorado
 
Bootcamp - Web Development Session 2
Bootcamp - Web Development Session 2Bootcamp - Web Development Session 2
Bootcamp - Web Development Session 2GDSCUniversitasMatan
 
Intro To Flex Typography 360|Flex
Intro To Flex Typography 360|FlexIntro To Flex Typography 360|Flex
Intro To Flex Typography 360|FlexMatt Guest
 
Presentation kaushal
Presentation kaushalPresentation kaushal
Presentation kaushalAjay Yadav
 
InstructionYou’ll probably want to import FileReader, PrintWriter,.pdf
InstructionYou’ll probably want to import FileReader, PrintWriter,.pdfInstructionYou’ll probably want to import FileReader, PrintWriter,.pdf
InstructionYou’ll probably want to import FileReader, PrintWriter,.pdfarsmobiles
 
What is html xml and xhtml
What is html xml and xhtmlWhat is html xml and xhtml
What is html xml and xhtmlFkdiMl
 

Semelhante a RapidMiner: Word Vector Tool And Rapid Miner (20)

List of values Best Practices
List of values Best PracticesList of values Best Practices
List of values Best Practices
 
Team G
Team GTeam G
Team G
 
MS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data miningMS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data mining
 
MS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data miningMS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data mining
 
Ant conc notes
Ant conc notesAnt conc notes
Ant conc notes
 
I x scripting
I x scriptingI x scripting
I x scripting
 
 
Project seminar
Project seminarProject seminar
Project seminar
 
Ant conc ~design & development of a freeware
Ant conc ~design & development of a freewareAnt conc ~design & development of a freeware
Ant conc ~design & development of a freeware
 
20131112 Introduction to LaTeX for EndNote Users.docx
20131112 Introduction to LaTeX for EndNote Users.docx20131112 Introduction to LaTeX for EndNote Users.docx
20131112 Introduction to LaTeX for EndNote Users.docx
 
Olap
OlapOlap
Olap
 
Programming Without Coding Technology (PWCT) Environment
Programming Without Coding Technology (PWCT) EnvironmentProgramming Without Coding Technology (PWCT) Environment
Programming Without Coding Technology (PWCT) Environment
 
Improving writing aids, the community way
Improving writing aids, the community wayImproving writing aids, the community way
Improving writing aids, the community way
 
Bootcamp - Web Development Session 2
Bootcamp - Web Development Session 2Bootcamp - Web Development Session 2
Bootcamp - Web Development Session 2
 
Intro To Flex Typography 360|Flex
Intro To Flex Typography 360|FlexIntro To Flex Typography 360|Flex
Intro To Flex Typography 360|Flex
 
Presentation kaushal
Presentation kaushalPresentation kaushal
Presentation kaushal
 
InstructionYou’ll probably want to import FileReader, PrintWriter,.pdf
InstructionYou’ll probably want to import FileReader, PrintWriter,.pdfInstructionYou’ll probably want to import FileReader, PrintWriter,.pdf
InstructionYou’ll probably want to import FileReader, PrintWriter,.pdf
 
What is html xml and xhtml
What is html xml and xhtmlWhat is html xml and xhtml
What is html xml and xhtml
 
PDFArticle
PDFArticlePDFArticle
PDFArticle
 
Robot framework
Robot frameworkRobot framework
Robot framework
 

Mais de DataminingTools Inc

AI: Introduction to artificial intelligence
AI: Introduction to artificial intelligenceAI: Introduction to artificial intelligence
AI: Introduction to artificial intelligenceDataminingTools Inc
 
Data Mining: Text and web mining
Data Mining: Text and web miningData Mining: Text and web mining
Data Mining: Text and web miningDataminingTools Inc
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataDataminingTools Inc
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsDataminingTools Inc
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisDataminingTools Inc
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technologyDataminingTools Inc
 

Mais de DataminingTools Inc (20)

Terminology Machine Learning
Terminology Machine LearningTerminology Machine Learning
Terminology Machine Learning
 
Techniques Machine Learning
Techniques Machine LearningTechniques Machine Learning
Techniques Machine Learning
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning Introduction
 
Areas of machine leanring
Areas of machine leanringAreas of machine leanring
Areas of machine leanring
 
AI: Planning and AI
AI: Planning and AIAI: Planning and AI
AI: Planning and AI
 
AI: Logic in AI 2
AI: Logic in AI 2AI: Logic in AI 2
AI: Logic in AI 2
 
AI: Logic in AI
AI: Logic in AIAI: Logic in AI
AI: Logic in AI
 
AI: Learning in AI 2
AI: Learning in AI 2AI: Learning in AI 2
AI: Learning in AI 2
 
AI: Learning in AI
AI: Learning in AI AI: Learning in AI
AI: Learning in AI
 
AI: Introduction to artificial intelligence
AI: Introduction to artificial intelligenceAI: Introduction to artificial intelligence
AI: Introduction to artificial intelligence
 
AI: Belief Networks
AI: Belief NetworksAI: Belief Networks
AI: Belief Networks
 
AI: AI & Searching
AI: AI & SearchingAI: AI & Searching
AI: AI & Searching
 
AI: AI & Problem Solving
AI: AI & Problem SolvingAI: AI & Problem Solving
AI: AI & Problem Solving
 
Data Mining: Text and web mining
Data Mining: Text and web miningData Mining: Text and web mining
Data Mining: Text and web mining
 
Data Mining: Outlier analysis
Data Mining: Outlier analysisData Mining: Outlier analysis
Data Mining: Outlier analysis
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence data
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysis
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technology
 
Data Mining: Data processing
Data Mining: Data processingData Mining: Data processing
Data Mining: Data processing
 

Último

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Último (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

RapidMiner: Word Vector Tool And Rapid Miner

  • 1. RapidMiner5 2.9 - Word vector tool and RapidMiner
  • 2. Word Vector tool The Word & Web Vector Tool is a flexible Java library for statistical language modeling and integration of Web and Webservice based data sources. It supports the creation of word vector representations of text documents in the vector space model that is the point of departure for many text processing applications .
  • 3. Installation 1. Download the archive form wvtoolsourceforge website.
  • 4. Installation 2. Putting it into lib/plugins directory of your RapidMiner installation, example: D:rogram Filesapid-IapidMiner5iblugins
  • 5. Word Vector tool The aim of the WVTool is to provide a simple to use, simple to extend pure Java library for text and webmining. It can easily be invoked from any Java application.
  • 6. Word Vector tool WVTool bridges a gap between highly sophisticated linguistic packages as the GATE system on the one side and many partial solutions that are part of diverse text and information retrieval applications on the other side.
  • 8. Word List A word list contains all terms used for vectorization together with some statistics (e.g. in how many documents a term appears). The word list is needed for vectorization to define which terms are considered as dimensions of the vector space and for weighting purposes.
  • 9. WVtool functions Input list that tells the system which text documents to process WVTool Function Inputs A configuration object, that tells the system which methods to use in the individual steps.
  • 10. Defining the input The input list tells the WVTool which texts should be processed. Every item in the list contains the following information: A URI The language the document is written in (optional) ˆ The type of the document (optional) ˆ The character encoding of the document, e.g. UTF-8 (optional) ˆ A class label
  • 11. Using Predefined Word Lists In some cases it is necessary to exactly define the dimensions of the vector space, yet leaving the counting of terms and documents to the WVTool . This can be achieved by calling the word list creation function with a list of String values.
  • 12. Text Input The TextInput operator creates an ExampleSet from a collection of texts. The output ExampleSet contains one row for each text document and one column of each term.
  • 13. Text Classification, Clustering and Visualization For text classification, the class labels (e.g. positive, negative) are defined in the TextInput operator, as described above. Using clustering or dimensionality reduction, there is a possibility to directly visualize text documents from the RapidMiner Visualization panel.
  • 14. Creating and Maintaining Word Lists Creating an Initial Word List: An initial word list can be created by using the following chain of operators:
  • 15. Creating and Maintaining Word Lists Applying a Word List: You can apply a word list in two ways: To use the actual weights, first create word vectors using the TextInput Operator and then use the AttributeWeightsLoader and AttributesWeightsApplier on the resulting ExampleSet.
  • 16. Creating and Maintaining Word Lists Applying a Word List: You can apply a word list in two ways: 2. To use the word list only as a selection of relevant terms and leave it to the TextInput to actually weight them, use the AttributeWeightsLoader before. The TextInput will create vectors that contain as dimensions only terms in the word list, that have a weight larger than zero.
  • 17. Creating and Maintaining Word Lists Updating a Word List : If you add new documents to your corpus, usually additional terms will be relevant and should be added to the word list. After the InteractiveAttributeWeighting operator pops up, use the load function to load your original word list.
  • 18. More Questions? Reach us at support@dataminingtools.net Visit: www.dataminingtools.net