Duke University Libraries, Digital Scholarship
Text > Data, October 25




HIGH-LEVEL TEXT ANALYSIS
AND TECHNIQUES
Angela Zoss
Data Visualization Coordinator
226 Perkins Library
angela.zoss@duke.edu
DOCUMENTS AS CONTEXT
But first,

ANGELA AS CONTEXT
How I learned to love the
document.
B.A. courses:    Linguistics, Communication

M.S. courses:    Communication, Human-Computer Interaction

Employment:      arXiv.org Administrator

Ph.D. courses:   • Bibliometrics/Scientometrics
                 • Computer-Mediated Discourse Analysis
                 • Latent Structure Analysis
                 • Natural Language Processing
Now,

DOCUMENTS AS CONTEXT
Text analysis from…
• documents down to words (“low-level”)
• words up to documents (“high-level”)
Using documents to learn about
language
(or other social phenomena)
Analyzing documents as records/proxies of
language, social structures, events, etc.

Linguistic studies:
morphology, word counts, syntax, etc. …
      over time (e.g., Google Ngram Viewer)
language across corpora (e.g., political
speeches)

Underwood, T. (2012). Where to start with text mining.
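
The word-count comparison across corpora mentioned above can be sketched in a few lines. This is a minimal illustration, not a method from the talk: the two toy corpora, the tokenizer, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(documents):
    """Return each word's share of all tokens in a list of documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Hypothetical toy corpora standing in for, e.g., texts from two periods.
corpus_a = ["We shall walk to the river.", "We shall not return."]
corpus_b = ["I will walk to the store.", "I will call you later."]

freq_a = relative_frequencies(corpus_a)
freq_b = relative_frequencies(corpus_b)

# Compare how often selected words occur in each corpus.
for word in ("we", "i", "shall", "will"):
    print(word, round(freq_a.get(word, 0.0), 3), round(freq_b.get(word, 0.0), 3))
```

Relative rather than raw counts are used so that corpora of different sizes stay comparable.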
Using documents to learn about
language
  Historical culturomics of pronoun frequencies
Using documents to learn about
language
 Universal properties of mythological networks
Using language to learn about
documents
Analyzing documents as artifacts themselves, with
their own properties and dynamics

Literary, documentary studies:
Structural/rhetorical/stylistic analysis
Document categorization, classification
Detecting clusters of document features (topic
modeling)


Underwood, T. (2012). Where to start with text mining.
Using language to learn about
documents
   Literary Empires, Mapping Temporal and
         Spatial Settings in Swinburne
Using language to learn about
documents
 Using Word Clouds for Topic Modeling Results
What are documents?
For this discussion,
     digital versions of works of
     spoken or written language
Examples:
     books, articles, transcripts, emails, tweets…
Documents as context
Documents have:
• form(at)
• style
• provenance
• entities
• intentions
STUDIES OF DOCUMENTS
Why study documents?
• Describe a corpus
• Compare/organize documents
• Locate relevant information/filter out
  irrelevant information
Describing a corpus
• Finding regularities/differences across
  groups of documents
• Developing theories of structure, style, etc.
  that can then be tested or applied
• May be manual (content analysis) or
  computer-assisted (statistical)
Example: Storylines




            http://xkcd.com/657/
Differences of
format, genre, participants…
• Articles may have sections, but these will
  vary by discipline and type of article
• Books may be fiction or non-fiction (or
  both)
• Transcripts may refer to multiple speakers,
  non-text content
• …ad infinitum
Example: Literature
Fingerprinting




 Keim, D. A., & Oelke, D. (2007). Literature fingerprinting: A new method for visual literary analysis. In IEEE
 Symposium on Visual Analytics Science and Technology, VAST 2007 (pp. 115-122).
 doi:10.1109/VAST.2007.4389004
Organizing documents
Detect similarity between documents and a
known category (or simply among
themselves)

Supports browsing, sentiment
analysis, authorship detection
Example: Bohemian Bookshelf




Thudt, A., Hinrichs, U., & Carpendale, S. (2012). The Bohemian Bookshelf: Supporting Serendipitous Book
Discoveries through Information Visualization. In CHI '12: Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems, to appear.
Similarity based on…
• common document attributes
    authorship, genre
• common language patterns
    topics, phrases
• common entity references
    characters, citations
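
One common way to measure similarity on "common language patterns" is cosine similarity between word-count vectors. The sketch below assumes a simple bag-of-words representation; the toy documents and function names are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphabetic tokens in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy documents.
docs = {
    "whale_1": "the whale swam under the ship",
    "whale_2": "the ship chased the whale",
    "garden": "roses bloom in the garden",
}
vectors = {name: bag_of_words(text) for name, text in docs.items()}

print(cosine_similarity(vectors["whale_1"], vectors["whale_2"]))
print(cosine_similarity(vectors["whale_1"], vectors["garden"]))
```

The same function could compare other feature counts (phrases, named entities, citations) if those were extracted instead of single words.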
Example: Quantitative
Formalism




Allison, S., Heuser, R., Jockers, M., Moretti, F., & Witmore, M. (2011). Quantitative formalism: An
experiment. Pamphlets of the Stanford Literary Lab (vol. 1).
Example: Clinton’s DNC Speech




                http://b.globe.com/TogUqq
Example: View DHQ




      http://digitalliterature.net/viewDHQ/vis3.html
Classification
• assigning an object to a single class
• often supervised, using an existing
  classification scheme and a tagged corpus
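
A minimal sketch of supervised classification from a tagged corpus follows. It uses a multinomial naive Bayes classifier with add-one smoothing, which is one possible technique and not necessarily the one used in the examples from the talk; the tiny training set and the labels are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(tagged_corpus):
    """Count words and documents per class from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in tagged_corpus:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the single class with the highest naive Bayes log score."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / total_docs)  # class prior
        class_total = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the score.
            score += math.log(
                (word_counts[label][word] + 1) / (class_total + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical tagged corpus: each document carries exactly one class label.
tagged = [
    ("the detective found the killer", "mystery"),
    ("a murder in the old house", "mystery"),
    ("the lovers kissed at the wedding", "romance"),
    ("her heart longed for his love", "romance"),
]
model = train(tagged)
print(classify("the detective solved the murder", model))
print(classify("a wedding full of love", model))
```

Each new document is assigned to exactly one class, matching the "single class" definition above.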
Example: Relative signatures




Jankowska, M., Keselj, V., & Milios, E. (2012). Relative n-gram signatures: Document visualization at the level
of character n-grams. In Proceedings of IEEE Conference on Visual Analytics Science and Technology 2012
(pp. 103-112).
Categorization
• assigning documents to one or more
  categories
• suggestive of unsupervised clustering
  techniques
• design choices made to fit particular tasks
  or goals
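
The unsupervised flavor of categorization can be sketched with a very simple greedy grouping: a document joins every existing group whose seed document it resembles, and otherwise starts a new group. The Jaccard measure, the 0.3 threshold, and the toy documents are assumptions for this example; the threshold in particular is the kind of design choice that would be tuned to a task.

```python
import re


def word_set(text):
    """Return the set of lowercase alphabetic tokens in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of words two sets have in common (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def categorize(documents, threshold=0.3):
    """Greedy single-pass grouping; a document may land in more than one group."""
    groups = []  # each entry: (seed word set, list of document indices)
    for index, text in enumerate(documents):
        words = word_set(text)
        joined = False
        for seed, members in groups:
            if jaccard(words, seed) >= threshold:
                members.append(index)
                joined = True
        if not joined:
            groups.append((words, [index]))  # this document seeds a new group
    return [members for _, members in groups]


# Hypothetical toy documents with no labels attached.
documents = [
    "stars and galaxies in the night sky",
    "galaxies and stars fill the sky",
    "bread and butter for breakfast",
    "butter the bread before breakfast",
]
print(categorize(documents))
```

No tagged corpus is needed here; the groups emerge from the documents themselves, and their meaning has to be interpreted afterward.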
Example: UCSD Map of
Science




Börner, K., Klavans, R., Patek, M., Zoss, A. M., Biberstine, J. R., Light, R. P., Larivière, V., &
Boyack, K. W. (2012). Design and update of a classification system: The UCSD Map of Science. PLoS
ONE, 7(7), e39464.
Example: NIH Map Viewer




        https://app.nihmaps.org/nih/browser/
Reference
systems, infrastructure
What do we gain by adding structure?

What do we lose?
SUMMARIZING DOCUMENTS
Text is only one component of a document.

Research questions often push us to be
creative with how we operationalize
constructs.

The richness of language and documents is
best preserved by using
multiple, complementary approaches.
QUESTIONS?
angela.zoss@duke.edu

Editor's Notes

  1. why categorize/organize?