How and why study big cultural data

Lev Manovich.
How and why study big cultural data.

Presentation at Data Mining and Visualization for the Humanities symposium, NYU, March 19, 2012.

softwarestudies.com

How and why study big cultural data

  1. How and why study big cultural data. Lev Manovich, manovich@ucsd.edu, softwarestudies.com
  2. New York Times (November 16, 2010): “The next big idea in language, history and the arts? Data.” NEH/NSF Digging into Data competition (2009): “How does the notion of scale affect humanities and social science research? Now that scholars have access to huge repositories of digitized data - far more than they could read in a lifetime - what does that mean for research?”
  3. Why study big cultural data?
  4. (1) Study societies through their social media traces: social computing (but do we study society or only social media itself?). (2) A more inclusive understanding of history and the present (using much larger samples). (3) Detect large-scale cultural patterns. (4) The best way to follow global professionally produced digital culture; understand newly developed cultural fields (“X” design). (5) Map cultural variability and diversity.
  5. Data: 3,724 18th-century volumes, using the 10,000 most frequent words (excluding proper nouns). Ted Underwood. The Differentiation of Literary and Nonliterary Diction, 1700-1900.
  6. Growth of a global culture space after 1990: cumulative number of new art biennales, 1895-2008.
  7. Modern (19th-20th centuries) social and cultural theory: describe what is similar (classes, structures, types) / statistics (reduction). Computational humanities and social science should focus on describing what is different / variability / diversity. Not “from data to knowledge” but from (incomplete) knowledge to actual cultural data.
  8. “We are no longer interested in the conformity of an individual to an ideal type; we are now interested in the relation of an individual to the other individuals with which it interacts... Relations will be more important than categories; functions, which are variable, will be more important than purposes; transitions will be more important than boundaries; sequences will be more important than hierarchies.” Louis Menand on Darwin, 2001.
  9. Visualization: thinking without “large” categories. “The ontological status of assemblages, large and small, is always that of unique, singular individuals.” “Unlike taxonomic essentialism in which genus, species and individuals are separate ontological categories, the ontology of assemblages is flat since it contains nothing but differently scaled individual singularities.” Manuel DeLanda. A New Philosophy of Society.
  10. Bruno Latour: “The ‘whole’ is now nothing more than a provisional visualization which can be modified and reversed at will, by moving back to the individual components, and then looking for yet other tools to regroup the same elements into alternative assemblages.”
  11. How to study big cultural data? How to explore massive visual collections (exploratory media analysis)? Which data analysis and visualization techniques are appropriate for non-technical users? How to democratize data analysis?
  12. Our approach: media visualization (visualizing media directly rather than only using abstract infovis language)
  13. Visualizing large non-visual data using abstraction
  14. Media visualization: showing visual data directly. Every cover of Time magazine, 1923-2009 (4535 images). X-axis = publication date. Y-axis = saturation mean.
  15. Our media visualization software on 287 megapixel display (image: 1 million manga pages)
  16. Our software on new display wall with thin bezels (data: 4535 Time magazine covers)
  17. Our methods: (1) media visualization using existing metadata: show the complete collection; (2) media visualization using existing metadata: use samples to better reveal patterns; (3) digital image processing + media visualization (use simple image features which have direct perceptual meaning, and gradually introduce humanists to image processing)
  18. 1. Media visualization / existing metadata: montage
  19. 2. Media visualization / existing metadata / sample
  20. 3. Digital image processing + media visualization. Image plots of selected paintings by six impressionist artists. X-axis = mean saturation. Y-axis = median hue. Megan O’Rourke, 2012.
  21. Advantages: replacing discrete categories with continuous attributes
  22. (1) From timelines to curves. (2) Better represent analog cultural attributes. (3) Understand cultural landscapes (fuzzy / overlapping / hard clusters?). (4) Visualize cultural variability. (5) Discover new groupings.
  23. 1. From timelines to curves
  24. 2. Better represent analog attributes
  25. 3. Our maps of cultural landscapes reveal fuzzy/overlapping clusters rather than discrete categories with hard boundaries
  26. 4. Visualize cultural variability
  27. 5. Discover new groupings
  28. Studying large cultural data challenges our existing theoretical concepts and assumptions. Example: what is “style”?
  29. One million manga pages
  30. Single short manga series (>1000 pages)
  31. 776 Vincent van Gogh paintings
  32. Selected current projects: 7,000-year-old stone arrowheads (with a UCSD anthropologist and a CS postdoc at University of Washington); comparing Art Now & Graphic design Flickr groups (340,000 images) (with a CS collaborator from Lawrence Berkeley National Laboratory); one million images (+ metadata) from deviantArt (with an art historian / DH collaborator from Netherlands Academy of Arts and Sciences)
  33. 4.7 million newspaper pages from Library of Congress (UCSD undergraduate students); virtual world / game analytics (NSF Eager, with UCSD Experimental Games Lab); SEASR tools and workflows for working with image and video data (with NCSA at University of Illinois, Urbana-Champaign)
  34. Conclusion: computational humanities vs. digital humanities
  35. “The capacity to collect and analyze massive amounts of data has transformed such fields as biology and physics. But the emergence of a data-driven 'computational social science' has been much slower. Leading journals in economics, sociology, and political science show little evidence of this field. But computational social science is occurring in Internet companies such as Google and Yahoo, and in government agencies such as the U.S. National Security Agency.” “Computational Social Science.” Science, vol. 323, no. 6, February 2009. Digital humanities: scholars are mostly working with digitized historical cultural archives which were created by libraries and universities with funding from NEH and other institutions.
  36. Computational humanities: analyzing massive amounts of cultural content and people's conversations, opinions, and cultural activities online: personal and professional web sites, general and specialized social media networks and sites. This data offers us unprecedented opportunities to understand cultural processes and their dynamics and to develop new concepts and models which can also be used to better understand the past. Current players in computational humanities: (a) Google, Facebook, YouTube, Bluefin Labs, Echo Nest, and many other companies which analyze social media signals (blogs, Twitter, etc.) and the content of media on social networks; (b) computer scientists who are working with this data.
  37. manovich@ucsd.edu, www.softwarestudies.com
  38. Appendix: visualizing video collections. Use media visualization with a set of keyframes; automatic selection of keyframes (for example, using free shot detection software).
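
The image plots described in the transcript place each image at coordinates derived from simple, perceptually meaningful features (for example, X-axis = mean saturation, Y-axis = median hue). The following is a minimal sketch of that feature step only, not the lab's actual software. It assumes each image is already available as a list of 8-bit RGB tuples; the names `image_features`, `plot_coordinates`, and the toy `collection` are hypothetical. Because hue is a circular quantity, a plain median is only a rough summary.

```python
import colorsys
import statistics


def image_features(pixels):
    """Return (mean saturation, median hue) for an iterable of 8-bit RGB tuples.

    Hue and saturation are on the 0..1 scale used by colorsys.
    """
    hues, sats = [], []
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hues.append(h)
        sats.append(s)
    return statistics.mean(sats), statistics.median(hues)


def plot_coordinates(collection):
    """Map {image name: pixels} to {image name: (x, y)} for an image plot.

    x = mean saturation, y = median hue, following the axes named on the slides.
    """
    return {name: image_features(pixels) for name, pixels in collection.items()}


# Hypothetical toy data: two "images" of four pixels each.
collection = {
    "grey_cover": [(128, 128, 128)] * 4,
    "red_cover": [(255, 0, 0)] * 4,
}
coords = plot_coordinates(collection)
```

In a real collection the pixel lists would come from an image library, and the resulting coordinates would be used to position thumbnails of the images themselves rather than abstract points.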

Editor's Notes

  • (1) The exponential growth in the number of both non-professional and professional media producers over the last decade has created a fundamentally new cultural situation and a challenge to our normal ways of tracking and studying culture. Hundreds of millions of people are routinely creating and sharing cultural content: blogs, photos, videos, online comments and discussions, and so on. (2) The rapid growth of professional educational and cultural institutions in many newly globalized countries, along with the instant availability of cultural news over the web and the ubiquity of media and design software, has also dramatically increased the number of culture professionals who participate in global cultural production and discussions.
  • In summary, the availability of large digitized collections of humanities data certainly makes the case for humanists to use computational tools. However, the rise of social media and the globalization of professional cultures leave us no other choice. But how can we explore patterns and relations in sets of photographs, designs, or video, which may number in hundreds of thousands, millions, or billions? (FB: 7 billion photos uploaded per month.)
  • We are situated inside Calit2, which is working on creating next-generation cyberinfrastructure: grid computing, super-high-resolution displays, and optical networks which support real-time uncompressed streaming of 4K cinema and 4K teleconferencing.
  • (Can instead show my Time covers animations video.) Images are unique for big data analytics. Other media types such as music and text take time to process. If we display many of them in a single visualization, it does not work and we have to resort to information visualization. However, with images, a user can see many patterns instantly. At this point, show the true scale of the image in Photoshop.
  • illustration: use of simple perceptually meaningful image features
  • 3. Instead of starting with labels (genres, styles, authors), as in supervised machine learning, map cultural landscapes (and their evolution) using content properties. We may or may not find clusters, and we may discover many new groupings we did not think of before.
  • 1. From timelines to curves: normally a book or an exhibition divides an artist's work into discrete periods. Visualization allows us to study the gradual changes, and it may reveal that there are no discrete categories.
  • 2. Better represent analog dimensions: film scholars describe motion in a shot using only half a dozen categories; image analysis + visualization allows us to map “amount of visual change” as a continuous value and discover patterns which were hidden by the use of categories.
  • fuzzy overlapping clusters
  • map space of variations
  • discover new groupings
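
The notes above describe mapping “amount of visual change” as a continuous value, and the appendix slide mentions automatic keyframe selection. The notes do not say how the value is computed, so the sketch below is an assumption: it uses the mean absolute difference between consecutive grayscale frames, scaled to 0..1, and a naive threshold as a stand-in for dedicated shot detection software. The names `visual_change`, `change_curve`, `keyframe_indices`, and the default `threshold` are hypothetical.

```python
def visual_change(frame_a, frame_b):
    """Mean absolute per-pixel difference of two grayscale frames, scaled to 0..1.

    Each frame is a flat list of 8-bit grayscale values of equal, non-zero length.
    """
    if not frame_a or len(frame_a) != len(frame_b):
        raise ValueError("frames must be non-empty and of equal size")
    total = sum(abs(a - b) for a, b in zip(frame_a, frame_b))
    return total / (255.0 * len(frame_a))


def change_curve(frames):
    """Continuous 'amount of visual change' between each pair of consecutive frames."""
    return [visual_change(prev, cur) for prev, cur in zip(frames, frames[1:])]


def keyframe_indices(frames, threshold=0.5):
    """Naive keyframe picker: frame 0 plus every frame following a large change.

    The threshold is an illustrative assumption, not a recommended value.
    """
    curve = change_curve(frames)
    return [0] + [i + 1 for i, change in enumerate(curve) if change > threshold]


# Hypothetical toy video: four frames of four pixels each, with one hard cut.
frames = [
    [0, 0, 0, 0],
    [0, 0, 0, 51],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
]
curve = change_curve(frames)
keys = keyframe_indices(frames)
```

Plotting `curve` over time gives the continuous view that a handful of motion categories would hide, while `keys` gives a small set of frames that could be fed into a media visualization of a video collection.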
