Discourse on the Web currently cannot be represented appropriately, which hampers searching and querying. Based on insights from Web Science, DERI Galway has developed three different approaches for representing and mining discourse.
Representing discourse and argumentation as an application of Web Science
1. Digital Enterprise Research Institute www.deri.ie
Benjamin Heitmann, Dr. Conor Hayes
Digital Resources for the Humanities and Arts Conference 2009
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
2. Introduction
The Web mirrors most areas of today’s society (e.g. entertainment, science and humanities)
The current Web does not capture the structure of critique, argumentation and interpretation
Representing the types and granularity of discourse and links is necessary
DERI has three approaches to discourse representation
Foundation: Web Science as an interdisciplinary approach to understanding and engineering the Web (started by Tim Berners-Lee)
Benjamin.Heitmann
slide 2 of 18
@deri.org
3. Outline
Motivation: knowledge representation techniques to enable more sophisticated searching and querying of discourse on the Web
Introducing Web Science: an interdisciplinary approach to understanding the Web and its evolution
Applying the Web Science method: three approaches for discourse representation
4. Discourse and argumentation on the Web
The Web doesn't properly capture the dynamic argumentation structures in discourse
Current search only captures:
– Plain text
– General links
– Citations
No search for:
– Relations between concepts
– Negative relations
Semantics of argumentation:
– Argument, counter-argument
– Condition, evidence, solution
[Figure labels: Primary research, research paper, weblog; Text, Argument, Counter-argument, Evaluation, Motivation, Conclusion; reference; publication frequency increases]
5. Representing the structure of discourse
Knowledge on the Web is not sufficiently connected
No standard vocabularies for representation of discourse structure and link granularity
Queries are unintuitive and imprecise, no negative queries
Links are untyped, and only on document level
No semantics of relationships
Source: “Clickstream Data Yields High-Resolution Maps of Science,” Bollen, Van de Sompel, et al., PLoS ONE (2009)
6. Insights from Web Science
The “Web Science” idea was started by Tim Berners-Lee and researchers from Southampton (see sources)
1. Understanding the current Web requires an interdisciplinary and holistic view of the Web as a whole
2. On the Web, engineering and social factors will influence each other and create a feedback loop
3. Properties of the Web are based on emergent behaviour, which can be empirically measured
10. Approaches for discourse representation
The Web Science method and discourse representation:
Interdisciplinary: the theoretical foundation is based on Speech Act theory and Language Game theory
Expect a feedback loop between Semantic Web solutions and usage patterns of the community
Empirical approach: CORAAL uses knowledge extraction and integration on large data collections
Normative (engineering) approaches:
– SIOC Argumentation vocabulary: light-weight and community-driven
– SALT: annotation of argumentation semantics
11. CORAAL: empirical discourse analysis
Knowledge extraction and integration
Pattern discovery
Use emergent patterns in large document collections
Go beyond text-based search:
– Answer negative queries
– Detect relations between concepts
Uses Natural Language Processing
No mark-up required
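A negative query, as mentioned on this slide, can be illustrated with a small sketch. It is not CORAAL's implementation: the `Statement` record, the `subjects` helper and the toy data are all hypothetical, and the statements are written by hand here, whereas an empirical system would extract them from text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str
    relation: str
    obj: str
    positive: bool  # False: the source states that the relation does NOT hold
    source: str     # hypothetical document identifier


# Hand-written toy statements (assumption, not extracted data).
STATEMENTS = [
    Statement("method-A", "improves", "recall", True, "doc-1"),
    Statement("method-B", "improves", "recall", False, "doc-2"),
    Statement("method-C", "improves", "recall", False, "doc-3"),
    Statement("method-C", "improves", "precision", True, "doc-3"),
]


def subjects(statements, relation, obj, positive=True):
    """Subjects stated to have (positive=True) or stated not to have
    (positive=False) the given relation to obj, in sorted order."""
    return sorted(
        {s.subject for s in statements
         if s.relation == relation and s.obj == obj and s.positive == positive}
    )


# Negative query: which methods are reported NOT to improve recall?
print(subjects(STATEMENTS, "improves", "recall", positive=False))
```

Storing the polarity with each statement is what makes the negative query possible; plain keyword search over the same documents could not distinguish the two cases.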
13. SIOC argumentation vocabulary
Light-weight and informal
Express structure of argumentation:
– Who is participating?
– Where are the elements of the discourse distributed?
– How are the elements connected?
Extensibility enables community involvement
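The three questions above (who, where, how connected) can be sketched as a small data model. This is only an illustration under stated assumptions: the `Post` and `Link` classes, the link labels "supports" and "counters", and the example sites are hypothetical and are not terms of the SIOC argumentation vocabulary itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: str
    author: str  # who is participating
    site: str    # where this element of the discourse lives


@dataclass(frozen=True)
class Link:
    source: str     # post_id of the responding post
    target: str     # post_id of the post being responded to
    link_type: str  # illustrative labels: "supports" or "counters"


# Toy discussion spread over two hypothetical sites.
POSTS = [
    Post("p1", "alice", "blog.example.org"),
    Post("p2", "bob", "forum.example.org"),
    Post("p3", "carol", "blog.example.org"),
]
LINKS = [
    Link("p2", "p1", "counters"),
    Link("p3", "p1", "supports"),
]


def participants(posts):
    """Who is participating?"""
    return sorted({p.author for p in posts})


def sites(posts):
    """Where are the elements of the discourse distributed?"""
    return sorted({p.site for p in posts})


def responses(links, target, link_type):
    """How are the elements connected? Posts linked to target by link_type."""
    return [link.source for link in links
            if link.target == target and link.link_type == link_type]


print(participants(POSTS))
print(sites(POSTS))
print(responses(LINKS, "p1", "counters"))
```

Because each link carries a type, a query such as "all posts that counter p1" becomes a simple filter rather than a text search.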
15. SALT: Semantically Annotated LaTeX
Enables mark-up of documents for claim identification
Exposes the semantics of the argumentation.
Examples:
– Claims, explanations
– Rhetorical structure (abstract, contribution, evaluation)
– Argument, counter-argument
Creates PDF with content and structure
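The idea of marking up claims inside a document can be sketched as span annotations over plain text. This does not reproduce SALT's LaTeX mark-up, which is not shown on the slides; the `Annotation` class, the `annotate` helper and the sample sentences are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # offset of the first character of the span
    end: int    # offset one past the last character of the span
    label: str  # illustrative labels: "claim" or "explanation"


# Made-up sample text for illustration only.
TEXT = "Typed links improve search. Untyped links hide disagreement."


def annotate(text, phrase, label):
    """Label the first occurrence of phrase in text (raises ValueError if absent)."""
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), label)


ANNOTATIONS = [
    annotate(TEXT, "Typed links improve search.", "claim"),
    annotate(TEXT, "Untyped links hide disagreement.", "explanation"),
]


def spans_with_label(text, annotations, label):
    """Return the text of every span carrying the given label."""
    return [text[a.start:a.end] for a in annotations if a.label == label]


print(spans_with_label(TEXT, ANNOTATIONS, "claim"))
```

Keeping the annotations separate from the text means the same content can be rendered normally while the claim structure stays queryable.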
17. Summary
Representing discourse allows intuitive querying and searching of the argumentation semantics
The Web Science method provides insights for representing discourse:
– Use an interdisciplinary approach
– Expect a feedback loop between technical and social factors
– Detect emergent properties and patterns
Three approaches at DERI for representing discourse:
– CORAAL: empirical, knowledge extraction + integration
– SIOC argumentation vocabulary: light-weight, bottom-up
– SALT: annotate argumentation semantics in publications
18. Questions? and Sources!
These slides: http://www.slideshare.net/metaman
Web Science: “Web science: an interdisciplinary approach to understanding the web,” Hendler, Shadbolt, Hall, Berners-Lee, Weitzner, Communications of the ACM (2008)
CORAAL: demo at http://coraal.deri.ie:8080/coraal
“CORAAL - Dive into publications, Bathe in the Knowledge,” Novacek, Groza, et al., Journal of Web Semantics, Elsevier (2009)
SIOC argumentation vocabulary: “Expressing Argumentative Discussions in Social Media Sites,” Lange, Bojars, et al., Workshop on Social Data on the Web at the International Semantic Web Conference (2008)
SALT: “SALT - Semantically Annotated LaTeX for Scientific Publications,” Groza, Handschuh, et al., European Semantic Web Conference (2007)