Lecture/Studio:
  Culturomics

  Prof. Alvarado
MDST 3703/7703
1 November 2012
Business

• Everyone’s families and friends OK?
Review

• The New Epistemology
  – Rise of Big Data: massive, available, social
  – Shifts our relationship to primary sources
  – From reading to quantitative methods and
    visualizations
  – Example of media determinism
• Manovich
  – Consistent with database logic
  – Applies spirit of Big Data methods to art
Review

• Rationalization Effects
  – What are we looking at?
  – What is theory?
  – What are models?
  – What is culture?
  – What are the humanities?
Overview

• Combined Studio and Lecture
• Lecture
  – Google’s NGram Viewer
  – Culturomics
• Studio
  – Collaborative Topic Index
Google Does the Humanities
Google NGrams

• Google Books comprises 11% of the corpus of
  published books, about 2 trillion words
• The NGram dataset uses 5.2 million of those books
  (about 4% of all published books)
• About 500 billion words
• Published between 1500 and 2008
• In English, French, Spanish, German, Chinese
  and Russian (Hebrew too)
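The Viewer does not plot raw counts. As Google describes it, each point is the n-gram's count in a given year divided by the total number of n-grams of the same length in that year's books. This keeps the growth of the corpus over time from looking like growth in usage. Below is a minimal Python sketch of that normalization. It assumes the yearly counts are already in dictionaries; the function name and input format are hypothetical, and the numbers are invented toy values, not real NGram data.

```python
def relative_frequency(ngram_counts, total_counts):
    """Return {year: share}, where share = n-gram count / all n-grams that year.

    ngram_counts: {year: occurrences of one n-gram}   (hypothetical format)
    total_counts: {year: total n-grams of the same length in that year}
    """
    return {
        year: ngram_counts.get(year, 0) / total
        for year, total in sorted(total_counts.items())
        if total > 0  # skip empty years rather than divide by zero
    }


# Invented toy numbers: the raw count doubles, but so does the corpus,
# so the relative frequency stays flat.
counts = {1900: 50, 1950: 100}
totals = {1900: 1_000_000, 1950: 2_000_000}
shares = relative_frequency(counts, totals)
for year, share in shares.items():
    print(year, f"{share:.6%}")
```

Both years print the same share, even though the raw count doubled.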
Erez Lieberman Aiden and Jean-Baptiste Michel
What’s an NGram?
A sequence of N space-delimited strings (tokens)

  N = number of strings in the sequence

  Case sensitive
  Purely syntactic

  Very hard to index
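Here is a minimal Python sketch that makes the definition concrete. It splits text on whitespace and slides a window of N tokens across it, so it is case sensitive and purely syntactic in the sense listed above. Google's actual tokenizer treats punctuation and other details differently, so this is an illustration, not a reproduction of the dataset. The function name `ngrams` is hypothetical.

```python
from collections import Counter


def ngrams(text, n):
    """Split text on whitespace and return every run of n consecutive tokens.

    Tokens are compared exactly as written, so "Culture" and "culture"
    are different n-grams, and punctuation stays attached to words.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "the Culture of the book and the culture of the screen"
bigrams = Counter(ngrams(sentence, 2))
print(bigrams.most_common(3))
```

"of the" is counted twice. "the Culture" and "the culture" are counted separately, because nothing in the method knows they are the same word.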
Culturomics

• A method more than a model (as Anderson
  argues)
• The analogy is to genomics
  – Does this make sense?
  – What is the analog to the gene?
[Figure panels: Parallel, Crossing, Convergent/Divergent]

[Figure series: American, British]
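The panel labels name ways that two curves for competing terms can relate over time. Of the three, "crossing" is the easiest to state precisely: the curves cross where the difference between them changes sign. Below is a minimal Python sketch under that assumption. The function name is hypothetical, and the series are invented toy numbers, not real NGram data.

```python
def crossings(years, series_a, series_b):
    """Return the years at which series_a and series_b swap order.

    A crossing is recorded at the first year where the sign of (a - b)
    differs from the last nonzero sign; years where the two values are
    exactly equal are skipped, so a mere touch is not counted.
    """
    result = []
    prev_sign = 0
    for year, a, b in zip(years, series_a, series_b):
        diff = a - b
        sign = (diff > 0) - (diff < 0)  # -1, 0, or 1
        if sign == 0:
            continue
        if prev_sign and sign != prev_sign:
            result.append(year)
        prev_sign = sign
    return result


# Invented toy shares (not real NGram data) for two competing terms.
years = [1800, 1825, 1850, 1875, 1900]
term_a = [0.2, 0.4, 0.6, 0.7, 0.8]
term_b = [0.8, 0.6, 0.4, 0.3, 0.2]
print(crossings(years, term_a, term_b))
```

Telling "parallel" apart from "convergent/divergent" would need a threshold on how much the gap between the curves changes. That is a judgment call the slide leaves open, so it is not sketched here.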
“There’s not even a historian of the book connected to the project,” Mr. Menand noted.
Anthony Grafton, History, Princeton
Studio

• We are now at the point where we have all the
  pieces in place
  – HTML markup, CSS, JavaScript
  – Structured data (table in Google Docs)
  – Visualization tools
• Create Character Index
  – We will use everything we have done so far – notes,
    network visualizations, etc.
  – Today we begin to collaboratively create the Character
    Index (a subset of a full topic index)
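One way the structured-data piece could feed the Character Index is sketched below in Python. It assumes the shared Google Docs table is exported as CSV with one row per mention. The column names `character` and `location`, the sample rows, and the function `build_index` are all hypothetical. The sketch groups the rows into an alphabetical index of unique locations per character.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export of the shared table: one row per mention,
# with assumed column names "character" and "location".
SAMPLE_CSV = """character,location
Bruno,ch3
Anna,ch1
Anna,ch3
Anna,ch1
"""


def build_index(rows):
    """Map each character name to a sorted list of unique locations."""
    index = defaultdict(set)
    for row in rows:
        name = (row.get("character") or "").strip()
        location = (row.get("location") or "").strip()
        if name and location:  # ignore blank or incomplete rows
            index[name].add(location)
    # Alphabetize names, and sort each character's locations.
    return {name: sorted(locs) for name, locs in sorted(index.items())}


rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
character_index = build_index(rows)
for name, locations in character_index.items():
    print(f"{name}: {', '.join(locations)}")
```

Locations are sorted as plain strings, so "ch10" would come before "ch2". A real index would need a numeric sort key for chapter or paragraph numbers.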

Editor's Notes

  1. Is this correct?
  2. Was he schooled?
  3. Dürer, Melencolia I, 1514, engraving