The document discusses the need for web science to study the social web. It provides three key points:
1) The social web has transformed global communication but remains surprisingly unstudied as an entity. Understanding its complex emergent phenomena requires new analytical methods.
2) Simple micro-level rules of the web infrastructure give rise to complex macro-level behaviors that must be analyzed differently than traditional computer science. Properties like trust and reliability need new models.
3) Understanding socio-cultural factors is important for engineering future social machines and for resolving conflicts between the openness, security and privacy requirements of different parts of society. New theoretical and computational methods are needed to study the web effectively at large scales.
Lecture 6: How do we study the Social Web (2013)
1. Social Web
Lecture VI
How can we STUDY the Social Web?: The Web Science
Lora Aroyo
The Network Institute
VU University Amsterdam
(based on slides from Les Carr, Nigel Shadbolt)
Monday, March 11, 13
2. The Web
the most used and one of the most transformative
applications in the history of computing, e.g. how the
Social Web has transformed the world's
communication
approximately 10^10 people
more than 10^11 web documents
3. Web is NOT a Thing
• it’s not a verb, or a
noun
• it’s a performance, not
an object
• co-constructed with
society
• activity of individuals
who create interlinked
content that both
reflect and reinforce
the interlinkedness of
society and social
interaction ... and a record of
that performance
4. The Web
Great success as a technology,
it’s built on significant computing infrastructure,
but
as an entity surprisingly unstudied
5. Science & Engineering
• physical science: analytic discipline to find laws
that generate or explain observed phenomena
• CS is mainly synthetic: formalisms & algorithms
are created to support specific desired
behaviors
• Web Science: web needs to be studied &
understood as a phenomenon but also to be
engineered for future growth and capabilities
6. Simple micro rules give
rise to complex macro
phenomena
• at microscale an infrastructure of artificial languages and
protocols: a piece of engineering
• however, interaction of people creating, linking and
consuming information generates web's behavior as
emergent properties at macroscale
• properties require new analytic methods to be
understood
• some properties are desirable and are to be engineered
in, others are undesirable and if possible engineered out
7. A new way of software
development
• software applications designed based on appropriate
technology (algorithm, design) and with envisioned
'social' construct
• usually tested in the small, testing microscale properties
• a macrosystem evolving from people using the
microsystem and interacting in often unpredicted ways, is
far more interesting and must be analyzed in different
ways
• also the macrosystems exhibit challenges that do not
exist at microscale
8. Evolution of Search
Engines
1: techniques designed to rank documents
2: people were gaming to influence algorithms &
improve their search rank
3: adapt search technologies to defeat this influence
9. The Web Graph
• to understand the web, in good
CS tradition, we look at the graph
• nodes are web pages (HTML)
• edges are hypertext links
between nodes
• first analysis shows that in-degree
and out-degree follow power law
distribution => shown to hold for
large samples
• this gave insight into the growth of
the web
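As a minimal sketch of the degree analysis described on this slide, the following Python counts in-degree and out-degree for a tiny link graph and tallies how many pages share each in-degree. The page names and edges are made up for illustration and are not data from the lecture. On real crawls it is this tally that is reported to follow a power law.

```python
from collections import Counter

# Hypothetical toy web graph: each pair is (source page, target page).
EDGES = [
    ("a.html", "hub.html"),
    ("b.html", "hub.html"),
    ("c.html", "hub.html"),
    ("d.html", "hub.html"),
    ("a.html", "b.html"),
    ("hub.html", "a.html"),
]


def degree_counts(edges):
    """Return (in_degree, out_degree) Counters keyed by page."""
    in_degree = Counter()
    out_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree


def degree_histogram(degrees, pages):
    """Map each degree value k to the number of pages with that degree."""
    return Counter(degrees.get(page, 0) for page in pages)


pages = sorted({page for edge in EDGES for page in edge})
in_degree, out_degree = degree_counts(EDGES)
in_histogram = degree_histogram(in_degree, pages)
```

Even in this toy graph most pages have few incoming links while one page collects most of them, which is the shape the power-law observation refers to.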
10. Search Algorithms
• the Web graph also as
basis of algorithms for
search engines:
• HITS or PageRank
assume that inserting
a hyperlink symbolizes
an endorsement of
authority of the page
linked to
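The endorsement assumption can be illustrated with a simplified power-iteration version of PageRank. This is a sketch for intuition only, not the algorithm as run by any search engine. The damping value of 0.85, the iteration count and the toy graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: page -> list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a baseline share from the random-jump term.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each hyperlink passes on an equal share of the linker's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: three pages endorse "hub", which links back to "a".
LINKS = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(LINKS)
```

Here "hub" ends up with the highest score because three pages link to it, and "a" outranks "b" and "c" because it is endorsed by the highly ranked "hub". This is also why gaming links can influence such rankings.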
11. User State is Important
• the original Web graph is too simple, starts from quasi static
HTML
• for personalization or customization different representations
(of sources) may be served to different requesters, e.g. cookies
• graph-based models often do not account for this sort of
user-dependent state, and are not fit for all the information
behind the servers, in the Deep Web
• it’s not a simple HTTP-GET anymore (but HTTP-POST or
HTTP-GET with complex URI) that is the basis for defining
nodes in the graph
• URIs that carry user state are heavily used in Web applications,
but are not in the model and largely unanalyzed
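To make the modeling problem concrete, here is one possible way a graph model might decide which URIs count as the same node: strip query parameters that carry user state and sort the rest. The parameter names in `STATE_PARAMS`, the function name `graph_node` and the example URIs are all hypothetical. The sketch only uses Python's standard `urllib.parse` module.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of query parameters that carry per-user state.
STATE_PARAMS = {"sessionid", "sid", "user"}


def graph_node(uri):
    """Return the URI with user-state parameters removed and the rest sorted."""
    parts = urlsplit(uri)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in STATE_PARAMS
    )
    # Drop the fragment too, since it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


first = graph_node("http://example.org/item?id=7&sessionid=abc")
second = graph_node("http://example.org/item?sessionid=xyz&id=7")
```

Both requests collapse to the same node, which keeps the graph small but discards exactly the user-dependent state that the slide says is missing from the model.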
12. According to Google
each day 20-25% of searches have not been seen before, i.e.
generate a new identifier
thus a new node in the graph
more than 20 million new links per day, 200 per second
do they follow the same power laws & growth models?
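A quick check of the two rates quoted above, as a small Python calculation. The only input is the slide's figure of 20 million new links per day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

new_links_per_day = 20_000_000  # figure quoted on the slide
links_per_second = new_links_per_day / SECONDS_PER_DAY
```

The result is roughly 231 links per second, so "200 per second" is a rounded figure of the same order as "more than 20 million per day".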
13. validating such models is hard
exponential growth of content
changes in number & power of servers
increasing diversity in users
14. Social Web Sites
• modern websites (on the social web)
• have large script systems running in browser
• store personal information
many Social Web sites are not part of the (open) graph model
do these systems show a similar behavior? (macro)
are they stable? are they fair?
do they need to be regulated?
are the access restrictions, for personal
information, assured?
there is a need for understanding and intervening/engineering
15. Wikipedia
• purely mathematical (technology-based) models do not capture the
whole story
• the Wikipedia structure (link labels) shows a Zipf-like distribution
just like other tag-based systems
• Wikipedia is built on MediaWiki software
• but other MediaWiki-based applications did not generate such
significant use
• the pure 'technological' explanation cannot explain it
• must be related to the 'social model' of how Wikipedia is
organized
this is referred to as the dynamics of a 'social machine' (already in TBL’s original vision of WWW)
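A Zipf-like distribution means that frequency falls off roughly as a power of rank, so the points lie near a straight line on a log-log plot. The sketch below estimates that slope with an ordinary least-squares fit. The counts are synthetic, generated to follow frequency = 1200 / rank exactly, and are not Wikipedia data. A log-log fit like this is a rough diagnostic rather than a rigorous statistical test.

```python
import math


def log_log_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    ordered = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(freq) for freq in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic counts that follow frequency = 1200 / rank exactly (not real data).
synthetic_counts = [1200 / rank for rank in range(1, 51)]
slope = log_log_slope(synthetic_counts)
```

For these synthetic counts the slope comes out at about -1, the classic Zipf exponent. Real tag or link-label counts would only approximate a straight line.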
16. Collective Intelligence
• why do people contribute?
• how to maintain the connected
content?
• how are trust & provenance
represented, maintained and
repaired on the Web?
17. Collective Intelligence
Motivation | Example | Mean
Fun | “Writing in Wikipedia is fun” | 6.10
Ideology | “I think information should be free” | 5.59
Values | “I feel it’s important to help others” | 3.96
Understanding | “Writing in Wikipedia allows me to gain a new perspective on things” | 3.92
Enhancement | “Writing in Wikipedia makes me feel needed” | 2.97
Protective | “By writing in Wikipedia I feel less lonely” | 1.97
Career | “I can make new contacts that might help my career” | 1.67
Social | “People I am close to want me to write in Wikipedia” | 1.51
18. Social Machines
• today's interactive applications are very early
social machines limited by being largely isolated from
one another
• more effective social machines can be expected
• social processes in society interlink, so they
should also interlink on the web
• technology needed to allow user communities to
construct, share & adapt social machines to get
success through trial, use & refinement
19. Next Generation
Social Machines
• what are fundamental theoretical properties of social
machines, what algorithms are needed to create them?
• what underlying architectural principles are needed to
effectively engineer new web components for this social
software?
• how can we extend current web infrastructure with
mechanisms that make the social properties of information
sharing explicit and conform to relevant social-policy
expectations?
• how do cultural differences affect development and use of
social mechanisms?
20. Modeling the Social
Machines
• trustworthiness, reliability or silent expectations about
use of information
• privacy, copyright, legal rules
• we lack structures for formally representing &
reasoning over such properties
• thus, without scalable models for these issues it is
hard to help the web go in the best possible
direction
23. Web Science is about
additionality
not the union of
disciplines, but
intersection
24. Society is Diverse
different parts of society have different objectives and hence incompatible
Web requirements, e.g. openness, security, transparency, privacy
25. Understanding the
Socio-Cultural
• POWER DISTANCE: The extent to which
power is distributed equally within a society
and the degree that society accepts this
distribution.
• UNCERTAINTY AVOIDANCE: The degree to
which individuals require set boundaries and
clear structures
• INDIVIDUALISM vs COLLECTIVISM: The degree
to which individuals base their actions on self-
interest versus the interests of the group.
• MASCULINITY vs FEMININITY: A measure of a
society's goal orientation
• TIME ORIENTATION: The degree to which a
society does or does not value long-term
commitments and respect for tradition.
26. Understanding the
variation
• Ecology of the Web - structure
of the environment, producers
and consumers
• Populations (individuals and
species), traits/characteristics,
heredity, genotypes and
phenotypes
• Mechanisms - variation
(mutation, migration, genetic
drift), selection
• Outcomes - adaption, co-
evolution, competition, co-
operation, speciation,
extinction
27. Understanding the
variation
• Ecology of the Web - structure
of the environment, producers
and consumers
• Populations (individuals and
species), traits/characteristics,
heredity, genotypes and
phenotypes
• Mechanisms - variation
(mutation, migration, HGT,
genetic drift), selection
• Outcomes - adaption, co-
evolution, competition, co-
operation, speciation,
extinction
30. but
How to do the Science?
31. it’s relationships, stupid!
not attributes
All the world's a net
by David Cohen
April, 2002 May, 2007
32. • Leveraging recent advances in:
• Theories: about the social motivations for
creating, maintaining, dissolving and re-creating
links in multidimensional networks and about
emergence of macro-structures
• Data: Semantic Web/Web 2.0 provide the
technological capability to capture, store, merge,
and query relational metadata needed to more
effectively understand and enable communities
• Methods: qualitative and quantitative
methods to enable theoretically grounded
network predictions
• Computational infrastructure: Cloud
computing and petascale applications are
critical to face the computational challenges in
analyzing the data
33. Network
Analysis
• is about linking social actors,
e.g. systematically
understanding and identifying
connections
• by using empirical data
• draws on graphic imagery
• relies on mathematical/
computational models
• Jacob Moreno - one of the
founders of social network
analysis; some of the earliest
graphical depictions of social
networks (1933)
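As a small illustration of what "linking social actors" with a computational model can look like, the sketch below computes degree centrality: the fraction of the other actors each actor is directly tied to. The names and ties are invented for the example and do not come from the lecture or from Moreno's data.

```python
from collections import defaultdict

# Hypothetical friendship ties between four actors (undirected).
TIES = [("Ann", "Bob"), ("Ann", "Cara"), ("Ann", "Dev"), ("Bob", "Cara")]


def degree_centrality(ties):
    """Fraction of the other actors that each actor is directly tied to."""
    neighbours = defaultdict(set)
    for left, right in ties:
        neighbours[left].add(right)
        neighbours[right].add(left)
    others = len(neighbours) - 1
    return {actor: len(linked) / others for actor, linked in neighbours.items()}


centrality = degree_centrality(TIES)
```

In this toy network Ann is tied to everyone else and scores 1.0, while Dev, who is connected only through Ann, scores lowest. The measure depends only on relationships, not on any attribute of the actors.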
34. Think Networks!
Albert-László Barabási: Linked: The New Science of Networks
• everything is connected to everything else
• networks are pervasive - from the human brain
to the Internet to the economy to our group of
friends
• networks follow an underlying order and simple laws
• "new cartographers" are mapping networks in a
wide range of scientific disciplines
• social networks, corporations, and cells are more
similar than they are different
• new insights into the interconnected world
• new insights on robustness of the Internet, spread
of fads and viruses, even the future of democracy.
April, 2002
36. Networks:
another perspective :-)
• Social Networks: It’s not what you
know, it’s who you know
• Cognitive Social Networks: It’s not
who you know, it’s who they think you know.
• Knowledge Networks: It’s not what
you know, it’s what they think you know
37. Big Data Owners
Who can do macro analysis?
• Google, Bing, Yahoo!, Baidu
• Large scale, comprehensive data
• New forms of research alliance
How Billions of Trivial Data Points can Lead to
Understanding
39. Open Data
• common standards for release of
public data
• common terms for data where
necessary
• licenses - CC variants
• exploitation & publication of
distributed and decentralized
information assets
43. Web Science
Reflections
Is the Web changing faster than our ability to observe it?
How to measure or instrument the Web?
How to identify behaviors and patterns?
How to analyze the changing structure of the Web?
44. Big Bang:
Web Information
• assumption of the open exchange of
information is being imposed on the society
• do the Web, open access, open data and
scientific and creative commons offer a
beneficial opportunity or a dangerous
cul-de-sac?
45. Open Questions
• How is the world changing as other parts of society
impose their requirements on the Web? E.g. current
examples with SOPA/PIPA and ACTA: requirements for
security and policing taking over the free exchange of
information and unrestricted transfer of knowledge
• Are the public and open aspects of the Web a
fundamental change in society’s information
processes, or just a temporary glitch? E.g. are open
source, open access, open science & creative commons
efficient alternatives to fee-based knowledge transfer?
46. Open Questions
• do we take the Web for granted as provider of a free
and unrestricted information exchange?
• is Web Science the response to the pressure for the
Web to change - to respond to the issues of
security, commerce, criminality and privacy?
• What are the challenges for Web science?
• to explain how the Web impacts society?
• to predict the outcomes of proposed changes
to Web infrastructure on business & society?
47. What can you do as a
Computer Scientist?
specifically for the Social Web
48. Hands-on Teaser
• Q&A on Assignments
• Pitch of the Social Web Apps
image source: http://www.flickr.com/photos/bionicteaching/1375254387/