1. Informatik 5 (DBIS),
RWTH Aachen University
TeLLNet
GALA
The MediaBase
Ralf Klamma
Webinar
December 16, 2010
Lehrstuhl Informatik 5
(Informationssysteme)
Prof. Dr. M. Jarke
I5-KL-111010-1
2. The Overall Approach
3. What is unique about the MediaBase?
Interdisciplinary multidimensional model of digital networks
– Social network analysis (SNA) defines measures for social relations
– Actor network theory (ANT) connects human and media agents
– The i* framework defines strategic goals and dependencies
– The theory of media transcriptions studies cross-media knowledge
[Diagram: Community and Media Networks, linked by i*-Dependencies (Structural, Cross-media)]
– social software: Wiki, Blog, Podcast, IM, Chat, Email, Newsgroup, …
– network of artifacts: Microcontent, Blog entry, Message, Burst, Thread, Comment, Conversation, Feedback (Rating)
– network of members: Members (Social Network Analysis: Centrality, Efficiency), Communities of practice
4. Modeling Dependencies Using the i* Framework
[Diagram: i* model of roles in a digital network. Labels: Coordination, Itinerant Broker, Coordinator, Member, Gatekeeper, Artifact, URL, Hub, connected by isA links. Legend: Agent, Goal, Communication, Network, Resource, Task]
Eric S. K. Yu, Towards Modeling and Reasoning Support for Early-Phase Requirements Engineering, RE 1997
5. What can you do with the MediaBase?
Community interface (Firefox plug-in)
– Adding media for crawling, searching & viewing
– Observing social networks over time
– Retrieving structural patterns of media
– Applying Web 2.0 operations (tagging, etc.) on media
Writing your own crawlers
Applying all kinds of social network measures
– Centrality measures – Finding influential & powerful persons
– Network statistics – Understanding networks at large
Advanced queries in the RDF store on concepts and relations
– Who is the owner of company x?
– Structured input for conceptual mapping tools
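The example question "Who is the owner of company x?" can be read as a triple pattern over concepts and relations. The following is a minimal sketch only, assuming a toy in-memory list of triples; the predicate names `ownerOf` and `employeeOf`, the entities, and the `match` helper are hypothetical and are not the API of the MediaBase RDF store.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy data; not taken from the MediaBase RDF store.
TRIPLES: List[Triple] = [
    ("alice", "ownerOf", "company_x"),
    ("bob", "employeeOf", "company_x"),
    ("carol", "ownerOf", "company_y"),
]


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "Who is the owner of company x?" becomes the pattern (?who, ownerOf, company_x)
owners = [subj for subj, _, _ in match(TRIPLES, p="ownerOf", o="company_x")]
print(owners)
```

A real RDF store would answer the same pattern through its query language; the sketch only shows the shape of the question.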
6. What is the MediaBase?
Collection of Social Software artifacts:
Mailing lists (>200 k), Blogs (>300 k), Websites, Newsletters, Wikipedias, RSS Feeds, Forums, …
The MediaBase
• IBM DB2 data store
• 24/7 Perl crawlers for media artifacts
• Community-oriented Commander interface
• Social network analysis & visualization tools
• PALADIN: A pattern language for automatic behavior detection
• Automatic extraction of concepts and relations in RDF
7. The Data Model
8. MediaBase Model
A MediaBase is a six-tuple graph
M = (A, R, µ, ν, η, L)
R ⊆ A × A
µ : A → L
ν : R → L
η : R → {0, 1}
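The six-tuple can be written down directly as a small data structure. This is a sketch under assumptions: the slide does not say what the elements of A are or what η marks, so the code treats A as a set of node names, L as a set of label strings, and η as an uninterpreted 0/1 flag per relation. The class name `MediaBaseGraph`, the `validate` method, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

Node = str
Edge = Tuple[Node, Node]


@dataclass
class MediaBaseGraph:
    """M = (A, R, mu, nu, eta, L) held in plain Python containers."""
    A: Set[Node]           # nodes
    R: Set[Edge]           # relations, R ⊆ A × A
    mu: Dict[Node, str]    # mu: A -> L, node labels
    nu: Dict[Edge, str]    # nu: R -> L, relation labels
    eta: Dict[Edge, int]   # eta: R -> {0, 1}, meaning not given on the slide
    L: Set[str]            # label set

    def validate(self) -> None:
        """Raise ValueError if the containers violate the definitions."""
        for (u, v) in self.R:
            if u not in self.A or v not in self.A:
                raise ValueError(f"relation {(u, v)} is not in A x A")
        if set(self.mu) != self.A or not set(self.mu.values()) <= self.L:
            raise ValueError("mu must map every node to a label in L")
        if set(self.nu) != self.R or not set(self.nu.values()) <= self.L:
            raise ValueError("nu must map every relation to a label in L")
        if set(self.eta) != self.R or not set(self.eta.values()) <= {0, 1}:
            raise ValueError("eta must map every relation to 0 or 1")


# Hypothetical two-node instance: one agent creating one artifact.
m = MediaBaseGraph(
    A={"alice", "thread42"},
    R={("alice", "thread42")},
    mu={"alice": "Agent", "thread42": "Artifact"},
    nu={("alice", "thread42"): "creates"},
    eta={("alice", "thread42"): 1},
    L={"Agent", "Artifact", "creates"},
)
m.validate()
```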
9. Simplified Meta Model
[Diagram: Actor (has Attribute) with subtypes (isA) Medium, Artifact, Process, Agent, Community. Relation labels: stores, creates, is affected by, belongs to, represents, consumes, performs, ranks. Further labels: Browse, Address, Transcribe, …, Localize]
Latour: On Recalling ANT, 1999
11. Medium – Artifact Compatibility
Artifact      Email  Mailing List  Blog  Transaction-based Website  Wiki  Chat Room  URL  Forum
Message       +      +             -     -                          -     -          -    +
Thread        -      +             -     -                          +     -          -    +
Burst         +      +             +     +                          +     -          -    +
Conversation  -      -             -     -                          -     +          -    +
Blog Entry    -      -             +     -                          -     -          -    -
Comment       -      -             +     +                          +     -          -    +
Web Page      -      -             -     -                          +     -          +    -
Transaction   -      -             -     +                          -     -          -    -
Feedback      -      -             -     +                          -     -          -    +
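The compatibility matrix lends itself to a simple lookup structure. The sketch below encodes each row as a string of `+`/`-` flags; the column order (Email, Mailing List, Blog, Transaction-based Website, Wiki, Chat Room, URL, Forum) is reconstructed from the slide header and should be treated as an assumption, and the helper names `compatible` and `media_for` are hypothetical.

```python
from typing import List

# Column order reconstructed from the slide header (assumption).
MEDIA: List[str] = [
    "Email", "Mailing List", "Blog", "Transaction-based Website",
    "Wiki", "Chat Room", "URL", "Forum",
]

# One flag per medium, in the order of MEDIA.
ROWS = {
    "Message":      "++-----+",
    "Thread":       "-+--+--+",
    "Burst":        "+++++--+",
    "Conversation": "-----+-+",
    "Blog Entry":   "--+-----",
    "Comment":      "--+++--+",
    "Web Page":     "----+-+-",
    "Transaction":  "---+----",
    "Feedback":     "---+---+",
}


def compatible(artifact: str, medium: str) -> bool:
    """True if the matrix marks the artifact type as occurring in the medium."""
    return ROWS[artifact][MEDIA.index(medium)] == "+"


def media_for(artifact: str) -> List[str]:
    """All media in which the artifact type can occur, in column order."""
    return [m for m, flag in zip(MEDIA, ROWS[artifact]) if flag == "+"]


print(media_for("Conversation"))
print(compatible("Message", "Wiki"))
```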
12. The Crawlers
13. Crawling Technologies
Mix of dumps (Wikis) and special purpose crawlers:
W = Media ∪ Artifact
I = Media ∪ Artifact ∪ Process ∪ Agent
G = Media ∪ Artifact ∪ Process ∪ Agent ∪ Network
MW = Mailing list ∪ Message ∪ Thread ∪ Index
BW = Blog ∪ Blogroll ∪ Blogentry ∪ Comment ∪ Index
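The definitions above are nested unions, so W is contained in I and I in G. A small sketch makes that relationship checkable; it treats each class as a plain name in a set, which is a simplification, and it does not interpret what W, I, and G stand for, since the slide does not say.

```python
# Each class is represented only by its name (a simplification).
W = frozenset({"Media", "Artifact"})
I = W | {"Process", "Agent"}
G = I | {"Network"}

MW = frozenset({"Mailing list", "Message", "Thread", "Index"})
BW = frozenset({"Blog", "Blogroll", "Blogentry", "Comment", "Index"})

# The three general definitions form a strict chain of inclusions.
assert W < I < G

# What G adds over W, and what the mailing-list and blog definitions share.
print(sorted(G - W))
print(sorted(MW & BW))
```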
14. Crawler Overview
15. Website Crawler
16. Feed Crawler
17. Mailinglist Crawler
18. News Crawler
19. Podcast Crawler
20. The MediaBase Commander
21. MediaBase Web 2.0 Commander
Personalization (users annotate resources with tags and have their own page)
Community-awareness (resources and annotations of others are open)
User-friendly interface (Firefox plug-in, easy insertion of resources, tags, tracking of recent changes)
22. Application Programmer Interfaces
Under Development
– GraphService – Visualization and PALADIN
– http://dbis.rwth-aachen.de/~atlas/module_build/JavaDoc//atlas_las_services_graph-service/HEAD/javadoc/index.html
– TargETLy Service – RDF Data Generator
– http://dbis.rwth-aachen.de/~atlas/module_build/JavaDoc/atlas_theses_da_krenge_TargETLy2/HEAD/javadoc/index.html
23. GraphService
AbstractDigitalNetwork – Representation of MetaModel
Classes for Networks – Blogs, Mailinglists, etc.
Classes for Basic SNA
Classes for Pattern Analysis
Classes for GraphLayout
24. TargETLy Service
Connection to RDF Store
OpenCalais Service – RDF Generator
Pattern Analysis
IntentAnalysis
Collection of predefined RDF Queries
– e.g. companyCompetitor, companyEmployeeNumber
– e.g. patentFiling, patentIssuance
– e.g. personEmailAddress, creditRating
25. PALADIN – Pattern Analysis
26. PALADIN: Disturbances in Cross-media Social Networks
What is a disturbance?
– Sensing an incompatibility between espoused theories and theories-in-use
Disturbances are starting points of learning processes
– Disturbances disturb, prevent … but they create reflection
Disturbances are hard to detect or to forecast
27. Pattern Language for PALADIN: Example Troll
Troll Pattern: This pattern tries to discover the cases when a troll exists in a digital social network. A troll in the network is considered a disturbance.
Disturbance:
(EXISTS [medium | medium.affordance = threadArtefact]) &
(EXISTS [troll |
  (EXISTS [thread | (thread.author = troll) &
    (COUNT [message | (message.author = troll) &
      (message.posted = thread)]) > minPosts]) &
  (~EXISTS [thread1, message1 | (thread1.author != troll) &
    (message1.author = troll) & (message1.posted = thread1)])])
Forces: medium; troll; network; member; thread; message; url
Force Relations: neighbour(troll, member); own thread(troll, thread)
Solution: No attention must be paid to the discussions started by the troll.
Rationale: The troll needs attention to continue its activities. If no attention is paid, he/she will stop participating in the discussions.
Pattern Relations: Associates Spammer pattern.
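Read literally, the disturbance expression says: a troll has started a thread in which it posted more than minPosts messages, and it has posted no message in any thread started by someone else. The following sketch evaluates that reading on in-memory data. It is not PALADIN itself; the function name `find_trolls`, the input shapes, and the sample data are hypothetical, and the first clause (the medium affords threads) is assumed to hold for the data passed in.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def find_trolls(thread_author: Dict[str, str],
                messages: Iterable[Tuple[str, str]],
                min_posts: int) -> Set[str]:
    """Return authors matching the troll pattern.

    thread_author maps a thread id to the author who started it.
    messages is an iterable of (author, thread id) pairs.
    Every thread id in messages must appear in thread_author.
    """
    posts = Counter(messages)  # (author, thread id) -> number of messages

    # Authors who posted in at least one thread started by someone else.
    posted_elsewhere = {a for (a, t) in posts if thread_author[t] != a}

    # Authors with more than min_posts messages in a thread they started.
    heavy_in_own = {a for (a, t), n in posts.items()
                    if thread_author[t] == a and n > min_posts}

    return heavy_in_own - posted_elsewhere


# Hypothetical sample: "trollo" floods its own thread and posts nowhere else.
threads = {"t1": "trollo", "t2": "ann"}
msgs = [("trollo", "t1")] * 4 + [("ann", "t1"), ("ann", "t2"), ("bob", "t2")]
print(find_trolls(threads, msgs, min_posts=3))
```

Raising `min_posts` makes the pattern stricter, which corresponds to the parameter-setting step of the pattern discovery process.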
28. Pattern Discovery Process
[Diagram: pattern discovery cycle over a Digital Social Network. Steps: 1. Set pattern parameters; 2. Instantiate disturbances; 3. Evaluate disturbances; 4a. Change; 4b. Apply. Elements: Pattern Template (Disturbance, Variables, Pattern Parameters, Forces, Force Relations, Description, Solution, Rationale, Dependencies, Pattern Relations), Pattern Instance, Disturbance Instances, Pattern Solution]
29. PALADIN Case Study
10 patterns of disturbance over 119 social network instances, 17,359 individuals, 215,345 mails
Pattern                  Occurrences  Remarks
Burst                    22           The pattern finds topics that were very important for a certain period of time. Scalability is necessary.
No Conversationalist     76           The existence implies little communication in the network.
No Questioner            67           The existence implies that the network is not popular.
No Answering Person      61           Occurs in small networks. The effects of the lack of an answering person must be further checked with content analysis.
Troll                    2            Troll occurs very rarely in cultural communities. True negatives exist.
Spammer                  86           Spammers can be found often in discussion groups. False positives exist.
Leader                   37           The pattern occurs in networks centered around a member.
No Leader                40           Occurs in big networks where the members are distributed in different clusters.
Structural Hole          67           Occurs for members having neighbors with only one contact.
Independent Discussions  13           Occurs in large networks where disconnected subnetworks exist. Scalability is necessary.
30. Visualization & Analysis
31. Social Network Analysis of Open Source Communities
[Figure: Eclipse components network based on analysis of source code repository (Software Architecture)]
[Figure: Eclipse components network based on analysis of mailing list communication (Social Structure)]
32. Community Reflection about Development Process
Social platform: Eclipse forum eclipsezone
Forum: Eclipse communication framework (ECF)
Measure: degree centrality
Statistics: 225 nodes, 283 edges
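Degree centrality, the measure named above, counts how many distinct neighbours a node has, usually normalized by the largest possible number of neighbours. A minimal sketch for undirected edge lists follows; the function name and the star-shaped sample network are hypothetical and are not the Eclipse forum data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized degree centrality for an undirected graph.

    Each node's score is (number of distinct neighbours) / (n - 1),
    where n is the number of nodes that appear in some edge.
    Self-loops are ignored.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical star network: one hub answering three other members.
scores = degree_centrality([("hub", "a"), ("hub", "b"), ("hub", "c")])
print(max(scores, key=scores.get))
```

For the network on the slide, the mean degree follows directly from the statistics: 2 · 283 / 225 ≈ 2.5 contacts per member.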
33. Conversationalist Pattern
Social platform: Eclipse mailing list
Forum: Device debugging developer discussion
34. Questioner Pattern
Social platform: Eclipse mailing list
Forum: Device debugging developer discussion
35. Identification of End-Users and Developers in OSS Communities
[Figure: Community Clustering]
36. Textual Analysis of Postings from Community Experts
[Figure: Postings from experts of one of the identified communities]
37. Computer Science Knowledge Network: the Visualization
38. Computer Science Knowledge Network: Clustering
39. Interdisciplinary Venues: Top Betweenness Centrality
40. High Prestige Series: Top PageRank
41. Data Sets
DBLP (http://www.informatik.uni-trier.de/~ley/db/)
- 788,259 author names
- 1,226,412 publications
- 3,490 venues (conferences, workshops, journals)
CiteSeerX (http://citeseerx.ist.psu.edu/)
- 7,385,652 publications (including publications in reference lists)
- 22,735,240 citations
- Over 4 million author names
Combination
- Canopy clustering [McCallum 2000]
- Result: 864,097 matched pairs
- On average: venues cite 2306 and are cited 2037 times
42. WikiWatcher – System Design
Stage 1: SAX-based Parser in Perl
– Generating XML dump/export files
– Parsing wiki data / database transfer into an RDB
Wiki Network Data: authors (e.g. Joe, Liz, Tim, 123.45.67.89), article pages, URLs, revisions; article links such as [[Article]], [[requested]], [[Article2]], [[never exists]], [http://…]
Stage 2: Dynamic Analysis and Visualization
– Generating networks, measurement, metadata
– Visualization, network analysis
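Stage 1 can be illustrated with a small SAX handler. The original parser is written in Perl; the sketch below uses Python's standard `xml.sax` module as a stand-in and is not the WikiWatcher code. It assumes a simplified MediaWiki-style export (`page`, `title`, `revision`, `contributor` with `username` or `ip`, and `text`); the class name `WikiDumpHandler`, the sample dump, and the URL in it are hypothetical.

```python
import re
import xml.sax

# Captures the target of [[Target]] or [[Target|label]]; ignores [http://...].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


class WikiDumpHandler(xml.sax.ContentHandler):
    """Collects (article, author, linked articles) per revision."""

    def __init__(self):
        super().__init__()
        self.revisions = []
        self._buf = []
        self._title = None
        self._author = None

    def startElement(self, name, attrs):
        self._buf = []

    def characters(self, content):
        self._buf.append(content)

    def endElement(self, name):
        text = "".join(self._buf).strip()
        if name == "title":
            self._title = text
        elif name in ("username", "ip"):  # registered or anonymous author
            self._author = text
        elif name == "text":
            links = [m.strip() for m in WIKILINK.findall(text)]
            self.revisions.append((self._title, self._author, links))
        self._buf = []


SAMPLE = b"""<mediawiki>
  <page>
    <title>Article</title>
    <revision>
      <contributor><username>Joe</username></contributor>
      <text>See [[Article2]] and [[never exists]] or [http://example.org].</text>
    </revision>
    <revision>
      <contributor><ip>123.45.67.89</ip></contributor>
      <text>Now only [[Article2|the second article]].</text>
    </revision>
  </page>
</mediawiki>"""

handler = WikiDumpHandler()
xml.sax.parseString(SAMPLE, handler)

# Edges for the two network types: author-article and article-article.
author_edges = {(author, article) for article, author, _ in handler.revisions}
link_edges = {(article, target)
              for article, _, links in handler.revisions for target in links}
print(sorted(author_edges))
print(sorted(link_edges))
```

A streaming parser suits this stage because wiki dumps can be too large to load as a single document tree.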
43. Network Heterogeneity
Author Networks
– Author nodes (anonymous/registered users)
– Edges represent collaboration between authors during a period t
Article Networks
– Article nodes (incl. wiki namespaces)
– Directed edges (links) between articles
As expected, both kinds of networks stay heterogeneous
44. Importance of Network Actors
Articles: High betweenness centrality controls the flow of information within a Wiki
Betweenness values grow or stay nearly constant during the evolution process
Determines
– Important actors
– Important articles
– Vandalism
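Betweenness centrality counts, for each node, the share of shortest paths between other node pairs that pass through it. The sketch below uses Brandes' algorithm for unweighted, undirected graphs; the slides do not say which implementation was used, so this is an illustration only. It assumes a symmetric adjacency dictionary in which every neighbour also appears as a key; the sample network is hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Set


def betweenness(adj: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, float]:
    """Unnormalized betweenness for an unweighted, undirected graph."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in cb.items()}


# Hypothetical path a - b - c: only b lies between two other nodes.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(betweenness(path))
```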
45. Evolution of Shortest Paths
Densification Power Law: Complex networks may become denser during their growth
Generally, this could not be verified for wiki author networks!
The average distances stagnate at nearly 2 for all considered author networks
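The average distance referred to above is the mean shortest-path length over connected node pairs. A minimal sketch using breadth-first search follows; it assumes an unweighted graph given as an adjacency dictionary and averages only over pairs that can reach each other. The function name and the sample network are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Set


def average_distance(adj: Dict[Hashable, Set[Hashable]]) -> float:
    """Mean shortest-path length over all reachable ordered node pairs."""
    total = 0
    pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


# Hypothetical star: hub-leaf pairs are 1 apart, leaf-leaf pairs 2 apart.
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
print(average_distance(star))
```

Computing this value for successive snapshots of a network gives the kind of time series the slide describes.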
46. Evolution of Author Networks
Strongly connected components merged by collaboration of two wiki authors
[Figures: Author Network of German Wikia in July 2007; Author Network of German Wikia in August 2007]
47. Visualization & Analysis
48. What you cannot do with the MediaBase (at the moment)
Creating a new MediaBase in a new environment
– Maintenance with databases, scripts and interfaces is tedious
– Interfaces integrated into Zope/Plone
Not all media are equally supported
– Very good support for mailing lists, forums, web sites and blogs
– Less support for wikis, podcasts, social bookmarks
Lacking support for
– Conceptual navigation interface (Conzilla!)
– Discourse management tools
– Weak signal analysis tools
– Topic & sentiment & opinion mining tools
– Automatic generation of recommendations
49. The Future of the MediaBase: CommunityBase
[Diagram: Activity Theory [Enge87], Actor Network Theory [Lato05] and Community of Practice [Weng98] around the phases Self-modeling, Community experience repository and Self-reflection; disturbance labels +, +/- and - [PeKl08]]
Self-modeling phase contributes to self-reflection phase and vice versa