Lev Manovich.
How and why study big cultural data.
Presentation at Data Mining and Visualization for the Humanities symposium, NYU, March 19, 2012.
softwarestudies.com
1. How and why
study big cultural data
Lev Manovich
manovich@ucsd.edu
softwarestudies.com
2. New York Times (November 16, 2010):
“The next big idea in language, history and
the arts? Data.”
NEH/NSF Digging into Data competition
(2009): “How does the notion of scale
affect humanities and social science
research?
Now that scholars have access to huge
repositories of digitized data—far more
than they could read in a lifetime—what
does that mean for research?”
4. 1 study societies through the social media
traces - social computing (but do we study
society or only social media itself?)
2 more inclusive understanding of history
and present (using much larger samples)
3 detect large scale cultural patterns
4 the best way to follow global
professionally produced digital culture;
understand newly developed cultural fields
(“X” design)
5 map cultural variability and diversity
5. Data: 3,724 18th century volumes, using 10,000 most frequent words
(excluding proper nouns). Ted Underwood.
The Differentiation of Literary and nonliterary diction, 1700-1900.
6. Growth of a global culture space after 1990:
Cumulative number of new art biennales, 1895-2008.
7. modern (19th-20th centuries) social and
cultural theory: describe what is similar
(classes, structures, types) / statistics
(reduction)
computational humanities and social
science should focus on describing what
is different / variability / diversity
not “from data to knowledge” but from
(incomplete) knowledge to actual cultural
data
8. We are no longer interested in the
conformity of an individual to an ideal type;
we are now interested in the relation of an
individual to the other individuals with
which it interacts... Relations will be more
important than categories; functions, which
are variable, will be more important than
purposes; transitions will be more
important than boundaries; sequences will
be more important than hierarchies.
Louis Menand on Darwin, 2001.
9. Visualization: Thinking
without “large”
categories
“The ontological status of assemblages,
large and small, is always that of unique,
singular individuals.”
“Unlike taxonomic essentialism in which
genus, species and individuals are
separate ontological categories, the
ontology of assemblages is flat since it
contains nothing but differently scaled
individual singularities.”
Manuel DeLanda. A New Philosophy of
Society.
10. Bruno Latour:
The “whole” is now nothing more than a
provisional visualization which can be
modified and reversed at will, by moving
back to the individual components, and
then looking for yet other tools to
regroup the same elements into
alternative assemblages.
11. How to study
big cultural data?
how to explore massive visual collections
(exploratory media analysis)?
which data analysis and visualization
techniques are appropriate for non-
technical users? How to democratize data
analysis?
14. media visualization: showing visual data
directly
Every cover of Time magazine, 1923-2009 (4535 images).
X-axis = publication date. Y-axis = saturation mean.
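The plot described on this slide can be approximated with a minimal sketch like the one below. It assumes that "saturation" means the S channel of the HSV color model (the slide does not say which color model was used) and that each cover is already available as a list of 8-bit RGB pixels. The names `mean_saturation` and `plot_coordinates` and the toy data are hypothetical, for illustration only.

```python
import colorsys
from datetime import date


def mean_saturation(pixels):
    """Average HSV saturation (0.0-1.0) over 8-bit (r, g, b) tuples."""
    total = 0.0
    count = 0
    for r, g, b in pixels:
        # colorsys expects channel values in the range 0.0-1.0
        _, s, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        total += s
        count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    return total / count


def plot_coordinates(covers):
    """Turn (publication_date, pixels) pairs into (x, y) points sorted by date.

    x is the date as an ordinal day number, y is the mean saturation.
    """
    return sorted((d.toordinal(), mean_saturation(px)) for d, px in covers)


# Hypothetical toy data: two tiny "covers" of two pixels each.
covers = [
    (date(1965, 1, 1), [(255, 0, 0), (255, 0, 0)]),          # pure red
    (date(1930, 1, 1), [(128, 128, 128), (255, 255, 255)]),  # grays
]
points = plot_coordinates(covers)
```

In a media visualization, each cover image itself would then be drawn at its (x, y) position rather than a dot.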
16. our software on new display wall
with thin bezels (data: 4535 Time
magazine covers)
17. Our methods:
1. media visualization using existing
metadata - show complete collection
2. media visualization using existing
metadata - use samples to better reveal
patterns
3. digital image processing + media
visualization (use simple image features
which have direct perceptual meaning -
and gradually introduce humanists to
image processing)
21. 3. digital image processing + media
visualization
Image plots of selected paintings by six impressionist artists.
X-axis = mean saturation. Y-axis = median hue.
Megan O’Rourke, 2012.
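As a rough illustration of the two features named on this slide, the sketch below computes mean saturation and median hue for one image. It assumes HSV values in the range 0.0-1.0 and an image given as a list of 8-bit RGB pixels; the function name `image_features` is hypothetical, and the original plots may have been computed differently.

```python
import colorsys
from statistics import median


def image_features(pixels):
    """Return (mean saturation, median hue) for 8-bit (r, g, b) tuples.

    Both values are in the range 0.0-1.0. Hue is circular (red sits at
    both ends of the scale), so a plain median is only a simple
    approximation for images dominated by reds.
    """
    hues = []
    saturations = []
    for r, g, b in pixels:
        h, s, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hues.append(h)
        saturations.append(s)
    if not hues:
        raise ValueError("image has no pixels")
    return sum(saturations) / len(saturations), median(hues)


# Hypothetical toy image: one red, one green, one blue pixel.
x, y = image_features([(255, 0, 0), (0, 255, 0), (0, 0, 255)])
```

Here `x` and `y` would give the position of the painting in the image plot: fully saturated (x = 1.0), with the middle hue being green (y is about 1/3).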
24. 1. from timelines to curves
2. better represent analog
cultural attributes
3. understand cultural landscapes
(fuzzy / overlapping / hard
clusters?)
4. visualize cultural variability
5. discover new groupings
36. Selected current
projects:
7000 year old stone arrowheads
(with UCSD anthropologist and CS postdoc
at University of Washington)
comparing Art Now & Graphic design Flickr
groups (340,000 images)
(with CS collaborator from Lawrence
Berkeley National Laboratory)
One million images (+ metadata) from
deviantArt (with an art historian / DH
collaborator from Netherlands Academy of
Arts and Sciences)
37. 4.7 million newspaper pages from Library
of Congress (UCSD undergraduate
students)
virtual world / game analytics (NSF Eager,
with UCSD Experimental Games Lab)
SEASR tools and workflows for working
with image and video data (with NCSA at
University of Illinois, Urbana-Champaign)
39. “The capacity to collect and analyze massive amounts of data has
transformed such fields as biology and physics. But the emergence of a
data-driven 'computational social science' has been much slower. Leading
journals in economics, sociology, and political science show little evidence
of this field. But computational social science is occurring in Internet
companies such as Google and Yahoo, and in government agencies such as
the U.S. National Security Agency.”
“Computational Social Science.” Science, vol. 323, no. 6, February 2009.
Digital humanities:
scholars are mostly working with digitized historical cultural archives
which were created by libraries and universities with funding
from the NEH and other institutions.
40. Computational
humanities:
Analyzing massive amounts of cultural content and people's
conversations, opinions, and cultural activities online - personal and
professional web sites, general and specialized social media networks and
sites. This data offers us unprecedented opportunities to understand cultural
processes and their dynamics and develop new concepts and models which
can be also used to better understand the past.
Current players in computational humanities:
- Google, Facebook, YouTube, Bluefin Labs, The Echo Nest, and many other
companies which analyze social media signals (blogs, Twitter, etc.) and the
content of media on social networks.
- Computer scientists who are working with this data.
1 The exponential growth in the number of both non-professional and professional media producers over the last decade has created a fundamentally new cultural situation and a challenge to our normal ways of tracking and studying culture. Hundreds of millions of people are routinely creating and sharing cultural content - blogs, photos, videos, online comments and discussions, and so on.
2 The rapid growth of professional educational and cultural institutions in many newly globalized countries, along with the instant availability of cultural news over the web and the ubiquity of media and design software, has also dramatically increased the number of culture professionals who participate in global cultural production and discussions.
In summary, the availability of large digitized collections of humanities data certainly creates the case for humanists to use computational tools. However, the rise of social media and globalization of professional cultures leave us no other choice. But how can we explore patterns and relations in sets of photographs, designs, or video, which may number in hundreds of thousands, millions, or billions? (FB: 7 billion photos uploaded per month.)
We are situated inside Calit2, which is working on creating next generation cyberinfrastructure: grid computing, super high resolution displays, and optical networks which support real-time uncompressed streaming of 4K cinema and 4K teleconferencing.
(Can instead show my Time covers animation video.) Images are unique for big data analytics. Other media types, such as music and text, take time to process. If we display many of them at once, direct display does not work, and we have to resort to information visualization. With images, however, a user can see many patterns instantly. (At this point, show the true scale of the image in Photoshop.)
illustration: use of simple perceptually meaningful image features
3 Instead of starting with labels (genres, styles, authors), as in supervised machine learning, map cultural landscapes (and their evolution) using content properties. We may or may not find clusters, and we may discover many new groupings we did not think of before.
1. from timelines to curves: normally a book or an exhibition divides an artist's work into discrete periods. Visualization allows us to study the gradual changes, and it may reveal that there are no discrete categories.
2. better represent analog dimensions: film scholars describe motion in a shot using only half a dozen categories; image analysis + visualization allows us to map “amount of visual change” as a continuous value and discover patterns which were hidden by the use of categories.
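One simple way to turn "amount of visual change" into a continuous value is the mean absolute difference between consecutive frames. The sketch below assumes that each frame is a flat list of 8-bit grayscale values of equal length; the measure itself and the names `visual_change` and `change_curve` are illustrative assumptions, not necessarily the method used in the projects described here.

```python
def visual_change(frame_a, frame_b):
    """Mean absolute pixel difference between two frames, scaled to 0.0-1.0.

    Each frame is a flat list of 8-bit grayscale values (0-255).
    """
    if not frame_a or len(frame_a) != len(frame_b):
        raise ValueError("frames must be non-empty and the same size")
    total = sum(abs(a - b) for a, b in zip(frame_a, frame_b))
    return total / (255.0 * len(frame_a))


def change_curve(frames):
    """Visual change between each pair of consecutive frames in a shot."""
    return [visual_change(a, b) for a, b in zip(frames, frames[1:])]


# Hypothetical toy shot: four frames of four pixels each.
frames = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],          # nothing changes
    [255, 255, 0, 0],      # half of the pixels flip
    [255, 255, 255, 255],  # the other half flips
]
curve = change_curve(frames)
```

Plotting `curve` against frame number gives a continuous profile of a shot, in place of a label chosen from a handful of motion categories.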