1. Thinking through networks: generating, visualizing and analysing complex re-use graphs in the Humanities
Section 1: Introduction
Marco Büchler, Natural Language Processing Group, Department of Computer Science, University of Leipzig
Tom Brughmans, Archaeological Computing Research Group, Department of Archaeology, University of Southampton
Marco Büchler – Tom Brughmans
11. A visualisation of the contrastive semantics
Source: F. Baumgardt: Visualisierung von Kookkurrenzgraphen [Visualisation of co-occurrence graphs]. Bachelor's thesis, Abteilung Automatische Sprachverarbeitung [Natural Language Processing Group], Universität Leipzig, 2010.
13. Section 2: What is a graph/network?
14. Graph vs. network
“A graph is a set of vertices and a set of lines between pairs of vertices. A network consists of a graph and additional information on the vertices or the lines of the graph.” (de Nooy et al. 2005, 6-7)
Figure: simple undirected graph
15. Graph vs. network
Figures: simple directed graph; weighted directed graph
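The distinction between graphs and networks can be made concrete with plain data structures. This is a minimal sketch; the vertex names, weights and attributes are hypothetical and only illustrate the three kinds of graph shown on the slides plus the extra information that turns a graph into a network.

```python
# Simple undirected graph: a set of vertices and a set of unordered pairs (lines)
vertices = {"A", "B", "C", "D"}
undirected_lines = {frozenset(pair) for pair in [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]}

# Simple directed graph: lines are ordered pairs (arcs), so A->B is not B->A
directed_lines = {("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")}

# Weighted directed graph: every arc carries a number (hypothetical weights)
weighted_lines = {("A", "B"): 3, ("B", "C"): 1, ("C", "A"): 2, ("C", "D"): 5}

# Network: the graph plus additional information on its vertices (hypothetical attributes)
vertex_attributes = {
    "A": {"kind": "text"},
    "B": {"kind": "text"},
    "C": {"kind": "author"},
    "D": {"kind": "author"},
}

print(frozenset(("B", "A")) in undirected_lines)  # True: undirected lines have no orientation
print(("B", "A") in directed_lines)                # False: only A->B exists
```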
17. Graph visualization: geographical visualization
18. Graph visualization: topological visualization (LinkedIn Maps)
19. Graph visualization: circular visualization
20. Graph visualization: grouped visualization
Figure labels: sites ITS, ESB, ESA, ESC, ESD
21. Analytical techniques
Degree ($k_i$), outdegree ($k_j^{out}$), indegree ($k_i^{in}$)
Average degree = average of all degree scores in a single network: $\langle k \rangle = \frac{1}{n}\sum_{i=1}^{n} k_i$
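Degree, outdegree, indegree and average degree can be computed directly from a list of arcs. A minimal sketch, assuming a directed graph where the degree of a vertex is its outdegree plus its indegree; the arcs are hypothetical, and vertices without any arc are not counted.

```python
from collections import Counter

def degrees(arcs):
    """Return (outdegree, indegree, degree) for a list of directed arcs (u, v)."""
    out_deg, in_deg = Counter(), Counter()
    for u, v in arcs:
        out_deg[u] += 1   # u is the tail of the arc
        in_deg[v] += 1    # v is the head of the arc
    nodes = set(out_deg) | set(in_deg)
    degree = {n: out_deg[n] + in_deg[n] for n in nodes}
    return out_deg, in_deg, degree

def average_degree(degree):
    """Average of all degree scores in a single network."""
    return sum(degree.values()) / len(degree)

arcs = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
out_deg, in_deg, degree = degrees(arcs)
print(degree["C"], out_deg["C"], in_deg["C"])  # 3 2 1
print(average_degree(degree))                  # 2.0
```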
22. Analytical techniques
Shortest path
Average shortest path = average of the shortest path lengths between all possible pairs of vertices in the network
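In an unweighted graph the shortest path lengths from one vertex can be found with breadth-first search; averaging them over all pairs gives the average shortest path. A minimal sketch, assuming an undirected, connected graph stored as a hypothetical adjacency dictionary.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path length (number of lines) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_shortest_path(adj):
    """Mean shortest-path length over all unordered pairs of vertices.
    Assumes a connected graph (an unreachable pair raises KeyError)."""
    nodes = list(adj)
    total, pairs = 0, 0
    for idx, u in enumerate(nodes):
        dist = bfs_distances(adj, u)   # one BFS per source vertex
        for v in nodes[idx + 1:]:
            total += dist[v]
            pairs += 1
    return total / pairs

# hypothetical undirected graph: A-B, B-C, C-A, C-D
adj = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
print(bfs_distances(adj, "A"))               # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
print(round(average_shortest_path(adj), 3))  # 1.333
```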
23. Analytical techniques
Clustering coefficient ($C$) = the average, over all nodes, of the local clustering coefficient $c_i$: the fraction of all possible relationships between a node's direct neighbours that actually exist.
Example (figure): $k_i = 3$, $c_i = 3/6 = 0.5$
$C = \frac{c_1 + c_2 + \dots + c_n}{n}$
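The local and average clustering coefficients can be sketched as follows. With $k_i$ neighbours an undirected graph allows $k_i(k_i-1)/2$ lines among them and a directed graph $k_i(k_i-1)$ arcs; reading the slide's example ($k_i = 3$, 3 of 6 possible relationships) as a directed graph is an assumption. The arcs below are hypothetical.

```python
from itertools import permutations

def local_clustering(arcs, i, directed=False):
    """c_i: share of the possible relationships among i's direct neighbours that exist.

    arcs is a set of (u, v) pairs; for an undirected graph each line may be listed
    once, in either orientation.
    """
    if not directed:
        # store every undirected line in both orientations
        arcs = set(arcs) | {(v, u) for u, v in arcs}
    neighbours = ({v for u, v in arcs if u == i} | {u for u, v in arcs if v == i}) - {i}
    k = len(neighbours)
    if k < 2:
        return 0.0  # c_i is undefined for k_i < 2; treated as 0 here
    # ordered pairs of distinct neighbours that are joined by an arc
    present = sum(1 for u, v in permutations(neighbours, 2) if (u, v) in arcs)
    # k(k-1) ordered pairs are possible; in the undirected case every line is
    # counted twice, so the ratio equals lines / (k(k-1)/2)
    return present / (k * (k - 1))

def average_clustering(arcs, directed=False):
    """C = (c_1 + c_2 + ... + c_n) / n over all vertices that appear in arcs."""
    nodes = {u for u, _ in arcs} | {v for _, v in arcs}
    return sum(local_clustering(arcs, i, directed) for i in nodes) / len(nodes)

arcs = {("i", "a"), ("i", "b"), ("c", "i"), ("a", "b"), ("b", "c"), ("c", "a")}
print(local_clustering(arcs, "i", directed=True))   # 0.5: 3 of 6 possible arcs
print(local_clustering(arcs, "i", directed=False))  # 1.0: 3 of 3 possible lines
```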
24. Analytical techniques
Small-world networks (Watts and Strogatz 1998)
$L$ = average shortest path length; $C$ = clustering coefficient; $p$ = probability of randomly rewiring a line
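The small-world effect can be reproduced by starting from a regular ring lattice and rewiring each line with probability $p$, then comparing $L$ and $C$ with the lattice values. A minimal sketch in the spirit of Watts and Strogatz; the network size, neighbourhood size, seed and the exact rewiring details are assumptions chosen for illustration.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Ring of n vertices, each joined to its k nearest neighbours (k even, n > k)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj

def rewire(adj, p, rng):
    """With probability p, move one end of each original line to a random vertex,
    avoiding self-loops and duplicate lines. Modifies adj in place."""
    n = len(adj)
    lines = [(u, v) for u in adj for v in adj[u] if u < v]
    for u, v in lines:
        if rng.random() < p:
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if candidates:
                w = rng.choice(candidates)
                adj[u].discard(v)
                adj[v].discard(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj

def clustering(adj):
    """Average local clustering coefficient (vertices with k < 2 count as 0)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)

def avg_path_length(adj):
    """Average shortest path length over all reachable ordered pairs (BFS)."""
    total = pairs = 0
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

if __name__ == "__main__":
    rng = random.Random(42)
    n, k = 200, 6
    base = ring_lattice(n, k)
    L0, C0 = avg_path_length(base), clustering(base)
    for p in (0.0, 0.01, 0.1, 1.0):
        g = rewire(ring_lattice(n, k), p, rng)
        print(f"p={p:<5} L/L0={avg_path_length(g) / L0:.2f} C/C0={clustering(g) / C0:.2f}")
```

For small $p$, $L$ typically drops sharply while $C$ stays close to the lattice value, which is the small-world regime.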
26. Analytical techniques
Degree distribution (figure axes: degree vs. number of nodes)
Scale-free networks (Barabási and Albert 1999)
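A degree distribution counts how many nodes have each degree. To see the heavy tail typical of scale-free networks, the sketch below grows a graph by preferential attachment in the style of Barabási and Albert: each new node links to m existing nodes chosen with probability proportional to their degree. The seed graph, parameters and seed are assumptions for illustration.

```python
import random
from collections import Counter

def preferential_attachment(n, m, seed=None):
    """Grow an undirected graph of n nodes; each new node adds m lines to
    existing nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # start from a complete graph on m + 1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # each node occurs here once per incident line, so a uniform choice
    # from this list is a degree-proportional choice
    pool = [v for e in edges for v in e]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for t in chosen:
            edges.append((new, t))
            pool.extend((new, t))
    return edges

def degree_distribution(edges):
    """Map degree -> number of nodes with that degree, sorted by degree."""
    degree = Counter(v for e in edges for v in e)
    return dict(sorted(Counter(degree.values()).items()))

edges = preferential_attachment(500, 2, seed=7)
print(degree_distribution(edges))  # many low-degree nodes, a few highly connected hubs
```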
27. Section 3: Introduction to text re-use graphs
28. What do you want to measure? The similarity of branches of the “same” knowledge; knowledge changes over time.
29. What is of interest? The functionality of objects vs. the object of interest; the critical amount of re-use
32. Seven different versions of the Holy Bible
The data:
* American Standard Version (ASV)
* Bible in Basic English (Basic)
* Darby Bible (Darby)
* King James Version (KJV)
* World English Bible (WEB)
* Webster Bible (Webster)
* Young's Literal Translation (YLT)
28,632 verses that occur in all versions were selected.

Bible version   Word tokens   Word types   Token/type ratio
ASV                  741267        13485              54.97
Basic                791367         7350             100.85
Darby                732928        14971              48.96
KJV                  746746        13466              55.45
WEB                  722817        13556              54.68
Webster              744137        13655              54.50
YLT                  745422        13973              53.34
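The token/type ratio is the number of word tokens divided by the number of distinct word types (for ASV, 741267 / 13485 ≈ 54.97). A minimal sketch; the lowercase letters-and-apostrophes tokenizer is an assumption, since the tokenization behind the table is not stated, so this will not reproduce the table's figures exactly.

```python
import re
from collections import Counter

def token_type_stats(text):
    """Return (tokens, types, token/type ratio) using a simple word tokenizer."""
    # assumption: lowercase the text and keep runs of letters and apostrophes
    tokens = re.findall(r"[a-z']+", text.lower())
    types = Counter(tokens)
    ratio = len(tokens) / len(types) if types else 0.0
    return len(tokens), len(types), ratio

verse = "In the beginning God created the heaven and the earth."
print(token_type_stats(verse))  # (10, 8, 1.25)
```

A high ratio, as for the Bible in Basic English with its restricted vocabulary, means each word type is repeated more often on average.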
35. Level 2: Syntactical training – details
Syntactical features, classified by two properties: overlapping vs. non-overlapping features, and constant vs. variable n-gram length.
Features shown: n-gram features, shingling, local hash breaking, global hash breaking, longest common consecutive words.
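The feature types differ in whether windows overlap and whether their length is fixed. A minimal sketch of the two properties: overlapping and non-overlapping n-grams have constant length (overlapping word windows are what shingling usually compares), while hash breaking yields variable-length chunks. The hashed-breakpoint rule used here (end a chunk after any word whose hash is divisible by p) is one common form of hash breaking; how the slide's local and global variants differ is not stated, so this choice is an assumption.

```python
import zlib

def overlapping_ngrams(words, n):
    """Sliding window of n words with step 1 (each word appears in up to n n-grams)."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def non_overlapping_ngrams(words, n):
    """Consecutive blocks of n words with step n (a shorter remainder is dropped)."""
    return [tuple(words[i:i + n]) for i in range(0, len(words) - n + 1, n)]

def hash_breaking(words, p):
    """Variable-length chunks: a chunk ends after any word whose hash is divisible by p.
    crc32 is used because Python's built-in str hash is randomised per process."""
    chunks, current = [], []
    for w in words:
        current.append(w)
        if zlib.crc32(w.encode("utf-8")) % p == 0:
            chunks.append(tuple(current))
            current = []
    if current:
        chunks.append(tuple(current))
    return chunks

def longest_common_consecutive_words(a, b):
    """Longest run of consecutive words shared by word lists a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1   # extend the run ending at a[i-1], b[j-1]
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]

a = "in the beginning god created the heaven".split()
b = "and god created the heaven and the earth".split()
print(overlapping_ngrams(a, 2)[:3])
print(non_overlapping_ngrams(a, 2))
print(hash_breaking(a, 3))
print(longest_common_consecutive_words(a, b))  # ['god', 'created', 'the', 'heaven']
```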
37. Level 4: Linking – types of comparison between re-use units
Intra-corpus detection (text re-use)
Inter-corpus detection (modern: plagiarism; ancient: e.g. the Bible)
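Linking compares re-use units and records a link when they share enough features; whether the link is intra- or inter-corpus depends only on where the two units come from. A minimal sketch, assuming word trigrams as features and a threshold of one shared trigram; the unit ids, corpus labels and threshold are hypothetical.

```python
from itertools import combinations

def ngram_set(text, n=3):
    """Set of word n-grams of a text (lowercased, whitespace-tokenized)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def link_reuse_units(units, n=3, min_shared=1):
    """units: dict id -> (corpus, text). Returns (id1, id2, shared, kind) tuples,
    kind being 'intra' for units from the same corpus and 'inter' otherwise."""
    features = {uid: ngram_set(text, n) for uid, (_, text) in units.items()}
    links = []
    for a, b in combinations(sorted(units), 2):
        shared = len(features[a] & features[b])
        if shared >= min_shared:
            kind = "intra" if units[a][0] == units[b][0] else "inter"
            links.append((a, b, shared, kind))
    return links

units = {
    "kjv_gen1": ("KJV", "in the beginning god created the heaven and the earth"),
    "asv_gen1": ("ASV", "in the beginning god created the heavens and the earth"),
    "kjv_jn1": ("KJV", "in the beginning was the word"),
}
for link in link_reuse_units(units):
    print(link)
```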
38. Level 5: Usage of text re-use by similarity
40. Section 4: Working with text re-use graphs
41. Accessing text re-use graphs I
42. Accessing text re-use graphs II
43. Macro view (very distant reading) I
44. Macro view (very distant reading) II
Figure labels: Middle Platonism; Neoplatonism
49. Temperature view (distant reading)
A text re-use from a document with a high text re-use coverage is more trustworthy than one from a less frequently re-used text. A text re-use from a section of a document with a high text re-use temperature is more trustworthy than one from a less frequently re-used part of the document.
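Coverage and temperature can be computed from the word spans that re-use detection reports for a document. A minimal sketch under assumptions: the "temperature" of a word is taken here to be the number of detected re-uses covering it, coverage is the share of words covered at least once, and a section's temperature is the mean over its words. These definitions and the example spans are illustrative, not the authors' exact measures.

```python
def reuse_temperature(doc_length, reuse_spans):
    """Per-word temperature: number of detected re-uses covering each word position.
    reuse_spans: list of (start, end) word offsets, end exclusive."""
    temp = [0] * doc_length
    for start, end in reuse_spans:
        for pos in range(max(start, 0), min(end, doc_length)):
            temp[pos] += 1
    return temp

def reuse_coverage(temp):
    """Fraction of word positions covered by at least one re-use."""
    return sum(1 for t in temp if t > 0) / len(temp) if temp else 0.0

def section_temperature(temp, start, end):
    """Mean temperature of the section [start, end)."""
    section = temp[start:end]
    return sum(section) / len(section) if section else 0.0

temp = reuse_temperature(10, [(0, 4), (2, 6), (2, 4)])
print(temp)                             # [1, 1, 3, 3, 1, 1, 0, 0, 0, 0]
print(reuse_coverage(temp))             # 0.6
print(section_temperature(temp, 0, 5))  # 1.8
```

Under the slide's heuristic, a re-use drawn from the first five words here (temperature 1.8) would be ranked as more trustworthy than one from the last five (temperature 0.2).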
50. Dotplot view (mid-distant reading)
Source (plot): John Lee: A Computational Model of Text Reuse in Ancient Literary Texts, 2009.
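A dotplot marks every position where a word of one text equals a word of another; shared passages appear as diagonal runs. A minimal word-level sketch, assuming exact word matches; the plot cited on the slide may use a different unit or similarity.

```python
def dotplot(a, b):
    """Boolean matrix: cell (i, j) is True where word i of text a equals word j of text b."""
    return [[wa == wb for wb in b] for wa in a]

def render(matrix):
    """ASCII rendering: '#' for a match, '.' otherwise (one row per word of text a)."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in matrix)

a = "in the beginning god created the heaven".split()
b = "in the beginning was the word".split()
print(render(dotplot(a, b)))
```

The diagonal of three `#` in the top-left corner corresponds to the shared phrase "in the beginning"; isolated dots are chance matches of frequent words such as "the".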
51. Section 5: Discussion
Join the Networks Network Google group! http://groups.google.com/group/the-networks-network?hl=en-GB
Tom Brughmans [email_address] http://archaeologicalnetworks.wordpress.com/
Marco Büchler [email_address] http://www.asv.informatik.uni-leipzig.de/
Speaker notes
Funded by the German government
IMPORTANT: Difference of text types: Plato vs Atthidographers
SWITCH: ALOOS --> MBUECHLER. Talking about “literal” citations: not literal in the modern sense, but with deviations. My personal focus is on these deviations, in order to verify the Platonic text. How can this be made easier for me?
SWITCH: MBUECHLER --> ALOOS. Link to visualisation. SWITCH: ALOOS --> MBUECHLER. No words needed.