Project IDI PPT

  1. Project IDI
     David I Widjaja
  2. Steps
     • Data Extraction
     • Tagging
     • Correlation
     • Web Scraping
     • Comparison
     • Documentation
  3. Data Extraction
     • How to get the data?
        • Input from a database
        • Manual input
     • Data type: topics made of strings
  4. Tagging
     • Prerequisites:
        • Topic sentences (subject)
        • Dictionary (tags)
  5. Dictionary
     • How to create tags:
        1. Get all topic sentences and split them on white space.
        2. Convert all words to lower case.
        3. Delete all numeric and duplicate values.
        4. Sort the words alphabetically.
        5. Delete unnecessary words (e.g. is, the, and).
        6. Search for synonyms and cluster them into a single tag.
        7. Translate words if necessary.
        8. Insert the tags into the main spreadsheet.
  6. Correlation
     • A weighted graph map is used:
        • The more words associated with a tag, the bigger its bubble.
        • Lines get thicker with the number of relationships between topics.
  7. Web Scraping
     • Scrape other, similar websites.
     • Collect topic sentences from them for the subject column. Examples:
        • Article titles
        • Comments
        • Etc.
     • Copy them into the previous spreadsheet (the one with the previous tags).
  8. Correlation
     • Repeat the same weighted graph map process as before.
  9. Comparison
     • Compare the two weighted graph maps.
  10. Word Cloud
      • Generate a word cloud using Python or online tools, e.g.
  11. Tools
      • Microsoft Excel 2013 (spreadsheet)
      • Mozilla Firefox (browser)
      • Inspect Element (finding search patterns)
      • DownThemAll (downloading HTML pages)
      • Total Commander (merging HTML files)
      • Notepad++ (cleansing data)
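
The Data Extraction slide lists a database as one input source. A minimal sketch of that route, assuming an SQLite database with a hypothetical `topics` table whose `subject` column holds the topic sentences (the slides name neither the database nor its schema):

```python
import sqlite3


def load_topics(conn: sqlite3.Connection) -> list[str]:
    """Return all topic sentences stored in the hypothetical topics.subject column."""
    rows = conn.execute("SELECT subject FROM topics ORDER BY id").fetchall()
    # Each row is a 1-tuple; skip NULLs so only strings reach the tagging step.
    return [row[0] for row in rows if row[0] is not None]


if __name__ == "__main__":
    # An in-memory database stands in for the real data source (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE topics (id INTEGER PRIMARY KEY, subject TEXT)")
    conn.executemany(
        "INSERT INTO topics (subject) VALUES (?)",
        [("Cheap hotels in Bali",), (None,), ("Bali is hot",)],
    )
    print(load_topics(conn))  # ['Cheap hotels in Bali', 'Bali is hot']
```

Manual input would simply produce the same kind of list of strings by hand.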
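
Steps 1 to 6 of the Dictionary slide can be sketched in Python roughly as follows. The stop-word list and the `SYNONYMS` map are hypothetical placeholders for the manual choices in steps 5 and 6; translation (step 7) and the spreadsheet insert (step 8) are left out. Because duplicates are removed with a set, the alphabetical sort (step 4) is done last so the merged tags come out sorted.

```python
# Illustrative stop words for step 5; the project's real list is not given.
STOP_WORDS = {"is", "the", "and", "a", "an", "of", "to", "in", "for"}

# Hypothetical synonym clusters for step 6: each word maps to the tag it merges into.
SYNONYMS = {"cheap": "price", "cheapest": "price", "hotels": "hotel"}


def build_tags(topics, stop_words=STOP_WORDS, synonyms=SYNONYMS):
    """Turn topic sentences into a sorted list of unique tags (steps 1-6)."""
    # Steps 1-2: split each sentence on white space and lower-case every word.
    # Punctuation is not stripped, matching the slide's "split on white space".
    words = [word.lower() for topic in topics for word in topic.split()]
    # Step 3: drop purely numeric tokens such as "2013"; the set removes duplicates.
    unique = {word for word in words if not word.isdigit()}
    # Step 5: delete unnecessary (stop) words.
    kept = unique - set(stop_words)
    # Step 6: merge synonyms into one tag; words without a synonym keep their own tag.
    tags = {synonyms.get(word, word) for word in kept}
    # Step 4: sort alphabetically, done last so merged tags are sorted as well.
    return sorted(tags)


print(build_tags(["Cheap hotels in Bali", "The cheapest price 2013", "Bali is hot"]))
# ['bali', 'hot', 'hotel', 'price']
```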
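
With the topic sentences and the dictionary in hand (the Tagging prerequisites), each sentence can be tagged by looking up its words. This sketch assumes the dictionary is a plain word-to-tag mapping, which is an illustrative choice and not necessarily how the project's spreadsheet stores it:

```python
def tag_topics(topics, dictionary):
    """Pair each topic sentence with the sorted tags whose words it contains.

    `dictionary` maps a lower-case word to its tag (an assumed layout).
    """
    tagged = []
    for topic in topics:
        words = {word.lower() for word in topic.split()}
        # Words missing from the dictionary (e.g. stop words) contribute no tag.
        tags = sorted({dictionary[word] for word in words if word in dictionary})
        tagged.append((topic, tags))
    return tagged


# Hypothetical dictionary produced by the Dictionary step.
DICTIONARY = {"cheap": "price", "cheapest": "price", "hotels": "hotel", "bali": "bali"}

for topic, tags in tag_topics(["Cheap hotels in Bali", "Bali is hot"], DICTIONARY):
    print(topic, "->", tags)
```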
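
The weighted graph map on the Correlation slides needs two numbers: a size per bubble and a thickness per line. One plausible reading, assumed here, is that bubble size counts the topic sentences carrying a tag and line thickness counts the topic sentences in which two tags appear together. The sketch only computes these weights; drawing the map is left to whatever plotting tool is used.

```python
from collections import Counter
from itertools import combinations


def weighted_graph(tag_lists):
    """Compute bubble sizes (node weights) and line thicknesses (edge weights).

    Assumed reading of the slide: a node's weight is the number of topics with
    that tag; an edge's weight is the number of topics sharing both tags.
    """
    node_weight = Counter()
    edge_weight = Counter()
    for tags in tag_lists:
        unique = sorted(set(tags))  # sorting makes ("a", "b") and ("b", "a") one edge
        node_weight.update(unique)
        edge_weight.update(combinations(unique, 2))
    return node_weight, edge_weight


nodes, edges = weighted_graph([["bali", "hotel", "price"], ["bali"], ["hotel", "bali"]])
print(nodes)
print(edges)
```

Running the same function on the tags of the scraped sentences gives the second graph map.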
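
For the Web Scraping step, the slides use Inspect Element to find the HTML pattern around titles or comments and DownThemAll to download the pages. Once a pattern is known, Python's standard `html.parser` can pull the matching text out. The `h2` tag and `article-title` class below are hypothetical stand-ins for whatever pattern Inspect Element reveals, and the parser assumes matching elements are not nested inside each other:

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text of every <tag class="css_class"> element in a page.

    The default tag and class are hypothetical; replace them with the pattern
    found via Inspect Element. Matching elements are assumed not to nest.
    """

    def __init__(self, tag="h2", css_class="article-title"):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.inside = False  # True while reading a matching element
        self.current = []    # text fragments of the element being read
        self.titles = []     # finished texts with white space collapsed

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.inside = True
            self.current = []

    def handle_data(self, data):
        if self.inside:
            self.current.append(data)

    def handle_endtag(self, tag):
        if self.inside and tag == self.tag:
            self.inside = False
            text = " ".join("".join(self.current).split())  # collapse white space
            if text:
                self.titles.append(text)


if __name__ == "__main__":
    page = '<h2 class="article-title"><a href="/1">Cheap hotels in Bali</a></h2><h2>Other</h2>'
    parser = TitleExtractor()
    parser.feed(page)
    parser.close()
    print(parser.titles)  # ['Cheap hotels in Bali']
```

The collected strings can then be appended to the subject column and tagged exactly like the original topic sentences.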
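
The slides do not say how the two weighted graph maps are compared. One simple option, sketched here as an assumption, is to list the tags that appear in only one graph and, for shared tags, the change in each tag's share of the total weight, so that a small scraped site and a large original dataset remain comparable. The same function works on edge weights, since those are Counters keyed by tag pairs.

```python
from collections import Counter


def compare_graphs(first, second):
    """Compare two weight Counters (node or edge weights of two graph maps).

    Returns keys found only in the first graph, keys found only in the second,
    and, for shared keys, the change in each key's share of the total weight.
    """
    only_first = sorted(set(first) - set(second))
    only_second = sorted(set(second) - set(first))
    total_first = sum(first.values()) or 1   # avoid dividing by zero for empty graphs
    total_second = sum(second.values()) or 1
    shifts = {
        key: second[key] / total_second - first[key] / total_first
        for key in sorted(set(first) & set(second))
    }
    return only_first, only_second, shifts


original = Counter({"bali": 3, "hotel": 2, "price": 1})
scraped = Counter({"bali": 1, "hotel": 1, "food": 2})
print(compare_graphs(original, scraped))
```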
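
A word cloud is driven by word frequencies. The sketch below computes them with the standard library only; the stop-word set is illustrative. The resulting dictionary could then be handed to a rendering tool, for example the third-party `wordcloud` package's `WordCloud.generate_from_frequencies`, or pasted into an online generator.

```python
from collections import Counter

# Illustrative stop words; not the project's actual list.
STOP_WORDS = {"is", "the", "and", "a", "of", "in", "to"}


def word_frequencies(sentences, stop_words=STOP_WORDS, top=None):
    """Count lower-cased words across all sentences, skipping stop words and numbers.

    Returns the `top` most frequent words (all of them when top is None).
    """
    counts = Counter(
        word.lower()
        for sentence in sentences
        for word in sentence.split()
        if word.lower() not in stop_words and not word.isdigit()
    )
    return dict(counts.most_common(top))


print(word_frequencies(["Bali is hot", "Hot deals in Bali", "Bali food"]))
```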
