Talk given at the Social Media and the Transformation of Public Space Conference on June 19 at the University of Amsterdam. References and comments are in the notes section.
1. Engines of Order
Social Media and the Rise of Algorithmic Knowing
Bernhard Rieder
Universiteit van Amsterdam
Mediastudies Department
2. Starting point
"Algorithms play an increasingly important role in selecting what information is considered
most relevant to us, a crucial feature of our participation in public life." (Gillespie 2015)
From search engines to social media and beyond, the impression is that
socially and culturally relevant tasks are delegated to and performed by
algorithms.
Because algorithms draw together many different things, there are many
ways of beginning to address them.
New forms of "knowing" that have quite different means of producing
knowledge and of making it performative. Can we think of it as a "style of
reasoning" (Hacking 1992)?
3. My approach to the question
As researcher and software developer with the Digital Methods Initiative, I
build and apply tools that contribute to "knowing" what is happening on
social media, most recently:
☉ Netvizz (Facebook data extraction), Rieder 2013
https://apps.facebook.com/netvizz/
☉ DMI-TCAT (DMI Twitter Capture and Analysis Toolkit), Borra & Rieder 2014
https://github.com/digitalmethodsinitiative/dmi-tcat/
This talk, however, is more closely aligned with a book project that investigates
the conceptual content and history of algorithmic information processing.
A critical approach is necessary both for reflecting on my own role in
algorithmic knowledge production and for understanding how social media
make use of algorithms on various levels.
Algorithms used by computational researchers and platforms are similar.
4. Algorithmic configurations
[Diagram: a system in use (interface elements, contents, users and uses) yields loads of data (capture, formalization, semantics) that serve as input to an algorithm (techniques, parameters, internal states); its output (display, interactivity, performativity) flows back into the system in use as results with possible effects, turning a latent order into a revealed order. Example: users tweeting, clicking, navigating, reading, etc. → some math → 10 trending phrases.]
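To make the "some math → 10 trending phrases" step in the diagram concrete, here is a minimal sketch (hypothetical phrase counts, not any platform's actual trending algorithm) that ranks phrases by how much their current frequency exceeds a baseline:

```python
from collections import Counter

# Hypothetical activity logs: phrases extracted from recent and older tweets.
baseline_phrases = ["election", "football", "weather", "election", "music"] * 20
current_phrases  = ["election", "worldcup", "worldcup", "strike", "weather"] * 20

baseline = Counter(baseline_phrases)
current = Counter(current_phrases)

def trending_score(phrase):
    # Ratio of current frequency to (smoothed) baseline frequency:
    # phrases that are suddenly frequent score high.
    return current[phrase] / (baseline[phrase] + 1)

# "10 trending phrases": the ten phrases with the highest burst score.
trending = sorted(current, key=trending_score, reverse=True)[:10]
print(trending)
```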
5. Very large numbers and variety in users,
contents, purposes, arrangements, etc.
"[Commensuration] standardizes
relations between disparate things and
reduces the relevance of context."
(Espeland & Stevens 1998)
6. Platforms like Twitter
provide opportunities for
creating connections
between defined types of
entities (users, messages,
hashtags, resources, etc.).
They formalize and channel
expression, exchange, and
coordination.
"You cannot reply to a
hashtag."
"Simply put, a system can only
track what it can capture, and it
can only capture information that
can be expressed within a
grammar of action that has been
imposed upon the activity." (Agre 1994)
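A hedged sketch of what such a grammar of action looks like from inside the software, with hypothetical, simplified entity types: replying is defined on messages only, so "replying to a hashtag" is simply not expressible.

```python
from dataclasses import dataclass

# Hypothetical, simplified entity types of a Twitter-like platform.
@dataclass
class User:
    handle: str

@dataclass
class Hashtag:
    text: str

@dataclass
class Message:
    author: User
    text: str
    reply_to: "Message | None" = None  # only messages can be reply targets

def reply(author: User, target: Message, text: str) -> Message:
    # The grammar of action: replying is defined on messages only.
    # reply(alice, Hashtag("#news"), "hi") is simply not provided for.
    return Message(author=author, text=text, reply_to=target)

alice, bob = User("alice"), User("bob")
original = Message(author=bob, text="Strike announced #news")
answer = reply(alice, original, "Where did you read this?")
print(answer.reply_to.text)
```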
7. Using social media and the Web
is like living in a survey.
Or rather, in an experiment,
since so many parameters are
controlled.
Grammars need to become
more pervasive or more explicit
("deeper") so that more
semantic data can be captured.
8. Data pools in social media are
centralized and searchable.
Data is used by social media
platforms at various instances
for various goals.
Data is made accessible at
varying degrees to various actors
for various reasons.
9. Taxonomy of the Encyclopédie
(Diderot and d'Alembert ca. 1783)
11. Knowing the many
Similar experience of "too many" in different fields:
☉ Maxwell (1859): even if atoms are fully deterministic, we could never model the
behavior of a gas by observing individual atoms; => statistical mechanics
☉ Foucault (2004): epidemics, economic dynamics, etc. cast doubt on the family as a
model for understanding and governing society; => "population" and social sciences
☉ Bush (1945): "There is a growing mountain of research." => information retrieval
Between 1850 and 1940 many techniques to think and analyze "the many"
are introduced, looking at the structure and dynamics of interacting
ensembles.
The "erosion of determinism" (Hacking 1981) means that modes of
description are increasingly probabilistic and oriented towards "acting in
an uncertain world" (Callon, Lascoumes, Barthe 2001) that can be "tamed"
(Hacking 1990) through statistical techniques.
12. Social media deal with various kinds of "the many" (users,
messages, products, ideas, etc.) and strive to provide
answers to questions like who to talk to, what to read,
where to go, what to buy, etc. in the form of decisions.
They make use of various techniques to algorithmically
reduce complexity to allow continuous activity.
13. From classification to calculation
Classifications as information infrastructures (cf. Bowker and Star 1999) that
orient practice through normalization, standardization, selective
discarding, reformulation, positioning, navigational structuring, etc.
are still relevant.
But various forms of process and calculation are making things much,
much more complicated.
We are currently seeing a race toward understanding the semantics of
expression, behavior, and cultural artifacts.
14.
15. There are different ways of
producing "semantic" data.
Users are not only filling in the fields; they are
increasingly participating in shaping formalizations.
From classifications to classification procedures.
16. "One of the simplest ways to
derive information about a user
is to look at the way he uses the
system." (Rich 1983)
Let's not forget that some of
the valuable data are simply a
byproduct of people using the
system.
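As a minimal illustration of the Rich quote, with made-up log data: a user "profile" derived purely from what the user happens to click on while using the system.

```python
from collections import Counter

# Hypothetical click log: (user, category of the item clicked).
clicks = [
    ("u1", "politics"), ("u1", "politics"), ("u1", "football"),
    ("u1", "politics"), ("u1", "music"),
]

# The "profile" is just a byproduct of use: relative attention per category.
counts = Counter(category for _, category in clicks)
total = sum(counts.values())
profile = {category: n / total for category, n in counts.items()}
print(profile)  # {'politics': 0.6, 'football': 0.2, 'music': 0.2}
```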
17. What are "personal data"?
"Facebook Likes can be used to automatically
and accurately predict a range of highly
sensitive personal attributes including:
sexual orientation, ethnicity, religious and
political views, personality traits,
intelligence, happiness, use of addictive
substances, parental separation, age, and
gender." (Kosinskia, Stillwell, Graepel 2013)
The data used in this study does not
even include friends' likes.
Prediction is determination of
likelihood based on knowledge of
previous events.
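The study's own pipeline combined dimensionality reduction of the user-Like matrix with regression models; the sketch below is only a toy version of that logic (invented Likes and labels, plain logistic regression), showing how binary Likes become weighted predictors of an attribute:

```python
import math

# Toy user-by-Like matrix (1 = user liked the page) and an invented
# binary attribute to predict; none of this is the study's actual data.
likes = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
]
attribute = [1, 1, 0, 0, 1, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Plain logistic regression fitted by gradient descent:
# each Like column gets a weight, i.e. becomes a predictor.
weights = [0.0] * len(likes[0])
bias = 0.0
for _ in range(2000):
    for x, y in zip(likes, attribute):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        error = p - y
        weights = [w - 0.1 * error * xi for w, xi in zip(weights, x)]
        bias -= 0.1 * error

new_user = [1, 0, 1, 0]  # Likes of a new, hypothetical user
p = sigmoid(sum(w * xi for w, xi in zip(weights, new_user)) + bias)
print(f"predicted probability of attribute: {p:.2f}")
```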
Data is analyzed and made
performative immediately inside
the system.
New categories can be derived
from other data and are instantly
made actionable.
19.
20. Recapitulation
By providing functionality through ever more fine-grained grammars of
action (and other data capturing techniques), social media platforms
accumulate loads of structured and unstructured data.
The semantization of data in relation to operational contexts (through
formalization, derivation, etc.) begins early on.
Classification is deeply caught up with calculation and process.
21. Algorithmic configurations
[Diagram from slide 4 repeated: system in use → data → algorithm → results; latent order → revealed order.]
Algorithmic configurations imply "distributed calculative agencies" (Callon
and Muniesa 2005) that run through the system and its users.
The data arriving at the algorithm has both latent meaning and order: it is
related to actual practices and not random noise.
30. Techniques
There are many different algorithmic techniques that have complex
histories. Each technique reveals the data from a specific angle, but they
are highly plastic and can be easily combined.
They may be reductionist (e.g. graph theory: everything is a point or line),
but also very generative (unlimited number of "views").
Many techniques focus on the relationship between populations and
individuals. In social media, units can be qualified in terms of other units.
All of these techniques are "revealing" (in the sense of Heidegger) the data:
they show certain aspects of the latent order in certain ways; they make truth
that is caught up in a position towards the world, a finality.
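A hedged sketch of the "everything is a point or line" reduction, with invented tweets: the same activity data projected into one possible "view", a weighted user-hashtag graph.

```python
from collections import defaultdict

# Hypothetical tweets: (user, hashtags used).
tweets = [
    ("alice", ["#strike", "#news"]),
    ("bob",   ["#news"]),
    ("carol", ["#strike", "#football"]),
    ("alice", ["#football"]),
]

# One possible "view": a bipartite user-hashtag graph,
# everything reduced to points (nodes) and lines (weighted edges).
edges = defaultdict(int)
for user, hashtags in tweets:
    for tag in hashtags:
        edges[(user, tag)] += 1

for (user, tag), weight in sorted(edges.items()):
    print(f"{user} --{weight}--> {tag}")
```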
37. Parameters
Any somewhat complex technique reacts (strongly) to variation in
parameters and data. This means that without knowledge of parameters
and data, it is hard to understand/critique an algorithm.
A single parameter can encode a commitment to a specific theory of power
(PageRank at low α is "one person, one vote", at high α "patronage of the
powerful").
Parameters are now often set through continuous testing. They are one of
the places where empirical practices and operational goals can be brought
to converge, automatically.
38. We move from "what should the formula be according to our ideas about
relevance?" to "what has our testing engine identified as the optimal
parameters given our operational goal of more user interaction?".
Whenever you read about "thousands of factors", machine learning techniques are at work.
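A hedged sketch of that shift, with simulated users: a hypothetical "testing engine" does not argue about the formula, it simply keeps the parameter value that produced the most interaction.

```python
import random

random.seed(1)

def simulate_interaction_rate(parameter, n_users=10_000):
    # Stand-in for exposing a bucket of users to one variant and measuring
    # clicks; the "true" response curve is unknown to the engine and is
    # invented here purely for the sake of the sketch.
    true_rate = 0.05 + 0.02 * parameter - 0.015 * parameter ** 2
    clicks = sum(random.random() < true_rate for _ in range(n_users))
    return clicks / n_users

candidates = [0.0, 0.25, 0.5, 0.75, 1.0]
measured = {p: simulate_interaction_rate(p) for p in candidates}
best = max(measured, key=measured.get)
print(measured, "-> chosen parameter:", best)
```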
39. Machine learning techniques (e.g. Bayesian filters,
maximum entropy classifiers, etc.) can learn to
"interpret" any input signal in relation to
categories, based on feedback ("supervision").
In these techniques, the state of the machine (i.e.
the statistical model) becomes the algorithm.
These self-optimizing, empirical machines are
becoming increasingly common.
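A minimal naive Bayes sketch with invented training messages: after "supervision", the classifier is essentially nothing but the stored counts, i.e. the state of the machine is the algorithm.

```python
from collections import Counter, defaultdict
import math

# Invented supervised training data: (message, category).
training = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting tomorrow at noon", "ham"),
    ("lunch tomorrow at noon", "ham"),
]

# "Training" just accumulates counts; this stored state IS the classifier.
word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in training:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocabulary = {w for counts in word_counts.values() for w in counts}

def classify(text):
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        # log prior + log likelihoods with add-one smoothing
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("buy cheap pills"))          # -> spam
print(classify("see you at the meeting"))   # -> ham
```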
40. The "risk technology" is trained by associating "thousands of
pieces of data" with a probability of defaulting or not defaulting.
Every signal receives meaning as predictor for defaulting.
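A hedged sketch of that logic, with made-up signals and hand-set weights standing in for a model that would in practice be trained on thousands of past cases: every signal contributes to a probability of defaulting.

```python
import math

# Hypothetical signals for one applicant; in a real system the weights
# would be learned from past outcomes, not set by hand as they are here.
signals = {"late_payments": 2, "years_at_address": 1, "income_k": 28}
weights = {"late_payments": 0.8, "years_at_address": -0.3, "income_k": -0.02}
bias = -0.5

z = bias + sum(weights[k] * v for k, v in signals.items())
probability_of_default = 1 / (1 + math.exp(-z))
print(f"{probability_of_default:.2f}")
```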
41. States
In digital media, we often need to do precious little to "make things
calculable", since everything has already been made so.
Algorithms are increasingly empirical knowledge machines that tie the
"real world" to operational modes of optimization and validation.
The epistemological commitment, then, is no longer to a theory or model,
but to a method for generating models.
The difference is thus not just between the "editorial" and the
"algorithmic" (Gillespie 2012), but also between "editorial algorithms" and
"generated algorithms".
42. "To date, the complexity of mobile and the disparate, closed
platforms that dominate it have caused most people to ignore the
possibility and benefits of A/B testing. […] To us at Taplytics this is
crazy. If you are developing on the web everything is calculated
and optimized and viewed in terms of hypotheses, significance
levels and confidence intervals. On mobile, however, for the past
6 years we have been living in the era of the 'artform' of mobile
apps, where things are viewed in terms of gut feel and shooting
from the hip." (Druxerman 2014)
43. Since the digital operational environment is fully
integrated, data collection, analysis, decision-
making, and execution are all folded into one.
These are engines of order.
44. Conclusions
Moving from classification to calculation implies a move from "thing
concepts" (Dingbegriffe) to "relational concepts" (Relationsbegriffe), of
from substance notions of knowledge to functional ones (cf. Cassirer
1910).
A good analogue to algorithmic configurations on social media platforms
is the market, and in particular the multi-sided market (Rochet and Tirole 2004).
Just like markets, algorithmic configurations are "places of truth" (Foucault
2004), not in that they show "the truth" but in that truth is produced as a
byproduct of their optimal functioning: the right price, the right
trending topics, the right number and type of stories shown, etc.
The right algorithm is the one that produces an optimal equilibrium
between user satisfaction and value extraction through advertising.
45. Conclusions
"The current mythology of big data is that with more data comes greater accuracy and truth. This
epistemological position is so seductive that many industries, from advertising to automobile
manufacturing, are repositioning themselves for massive data gathering." (Crawford 2014)
Criticizing this position is problematic, and potentially dangerous, if it frames
its proponents as either naïve ("they don't know what they are saying") or
cynical ("they don't believe what they are saying").
The danger is not that "big data" acolytes are wrong, but that they are
right. We should consider this as a real possibility.
46. Conclusions
If they are right, we face a series of really big problems:
☉ If better data + algorithms means better truth, we can expect further
concentration and concentric diversification of large Internet companies through
tipping markets;
☉ Operational concepts of knowledge and truth would become even more pervasive;
☉ Privacy issues pale compared to the threat of knowledge monopolization and the
reconfiguration of publicness according to operational goals that are geared
toward profit maximization;
☉ Political institutions and critical forces are direly unprepared for dealing with
algorithmic engines of order, both technically and normatively;
"I will argue that democratic talk is not essentially spontaneous but essentially role-
governed, essentially civil, and unlike the kinds of conversation often held in highest
esteem for their freedom and their wit, it is essentially oriented to problem-solving."
(Schudson 1997)
The question of classification is not new, obviously, and conflicts around classification have a long history.
Parameters: a little bit shorter
Image from Techcrunch: http://techcrunch.com/2014/04/03/the-filtered-feed-problem/
The lists can be seen as vectors as well and then treated with the full arsenal of geometry (e.g. to calculate a similarity coefficient between two such vectors).
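For example, a minimal cosine-similarity sketch over two such lists treated as vectors (invented counts):

```python
import math

# Two lists (e.g. hashtag counts for two users) treated as vectors.
u = [3, 0, 1, 2]
v = [1, 1, 0, 2]

dot = sum(a * b for a, b in zip(u, v))
norm_u = math.sqrt(sum(a * a for a in u))
norm_v = math.sqrt(sum(b * b for b in v))
cosine = dot / (norm_u * norm_v)
print(round(cosine, 3))  # similarity coefficient between the two vectors
```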