Visualising Dante: The data behind the Divine Comedy
1. Data Stories, “Visualising Dante - The data behind the Divine Comedy”, Ginestra Ferraro, King’s Digital Lab, London, 27 November 2020
Ginestra Ferraro
Senior Research Software
UI/UX Designer
@ginez_17
ginestra.ferraro@kcl.ac.uk
26/27 November 2020
Data Stories Symposium
King’s Digital Lab (KDL)
● Research Software Engineering Lab
● Sits in the Faculty of Arts & Humanities
● 13+ permanent staff members
● Established in 2015
● Solid SDLC and Agile processes
Visualising Dante - The data behind the Divine Comedy
Context (and disclaimers)
This work is the result of a final MSc Computer Science project, built* using the 10% personal
development time offered by KDL, plus evenings and weekends, over two months in the summer of 2018.
The project receives irregular updates. It is a proof of concept, with the vision of creating a reusable
tool that generates semi-automated data visualisations based on text analysis.
The code is available in a GitHub repository for anyone to play with (under the MIT license).
https://github.com/ginestra/dante-visualised
And the project is published at https://ginestra.github.io/dante-visualised/
* Built, but by no means completed.
Who is Dante Alighieri and why the Divine Comedy?
Dante Alighieri (c. 1265–1321) was an Italian poet, credited with clearing the path for using and
innovating vernacular language instead of Latin in Italian poetry, making it accessible to a larger
audience.
The Divine Comedy is a long allegorical narrative poem written circa 1308–20. It is widely
considered to be one of the greatest works of world literature. [1]
The narrative traces Dante’s journey from darkness and error to the revelation of the divine
light, culminating in the Beatific Vision of God. [2]
The poem is divided into three sections: Inferno, Purgatorio, Paradiso.
Dante’s Divine Comedy makes an interesting case study because its structural (spatial and
temporal) textual components lend themselves to graphical representation and offer insights into
its original linguistic content.
[1] https://en.wikipedia.org/wiki/Divine_Comedy
[2] https://www.britannica.com/topic/The-Divine-Comedy
Objectives
● Visually represent the sentiment analysis and allow comparisons between the three sections.
● Specific to the Italian version (Petrocchi 1966-67): render a schematic representation of the
poem’s structure and rhythm. The work is written in terza rima, a sequence of tercets in which every
line is formed of a fixed number of syllables (11) with alternating rhymes. The three sections contain
34, 33 and 33 Cantos respectively (a Canto is a form of division in medieval long poetry).
● Show the particularly interesting and uneven distribution of keywords.
The story, the data, the sentiment
Inferno: Bad, very bad.
“Violent death and painful wounds may be” [3] (score: -0.89)
Purgatorio: It was bad, it will be okay.
“You will not reach the peak before you see” [4] (score: +0.02)
Paradiso: Good. Actually, great!
“That he loves well and hopes well and has faith” [5] (score: +0.91)
Range (-1, +1)
[3] Inferno, Canto XI, line 34
[4] Purgatorio, Canto VI, line 55
[5] Paradiso, Canto XXIV, line 40
The story, the data, the sentiment
Sentiment analysis visualisation of the three cantiche.
Red is negative, blue is positive, and the opacity indicates how close the sentiment is to either
pole (-1 or +1).
Each square represents one line of the poem.
ginestra.github.io/dante-visualised/sentiment-pattern/
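The colour encoding described on this slide could be sketched roughly as follows. This is an illustration only, not the project's code: the function name `sentiment_to_rgba` and the exact red and blue values are assumptions.

```python
def sentiment_to_rgba(score: float) -> str:
    """Map a sentiment score in [-1, 1] to a CSS rgba() colour string.

    Hypothetical helper: red for negative scores, blue otherwise,
    with opacity equal to the distance from neutral (0).
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    # The hue depends only on the sign of the score.
    red, green, blue = (255, 0, 0) if score < 0 else (0, 0, 255)
    # The opacity grows as the score approaches either pole.
    alpha = round(abs(score), 2)
    return f"rgba({red}, {green}, {blue}, {alpha})"


# The three example scores shown on the previous slide.
for score in (-0.89, 0.02, 0.91):
    print(score, sentiment_to_rgba(score))
```

With this mapping a nearly neutral line (+0.02) is almost transparent, while strongly polarised lines appear as saturated red or blue squares.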
The story, the data, the keywords
Distribution of keywords
Fun fact: Dante closes each Cantica with the word stelle (stars), and never uses the word Cristo
(Christ) in the Inferno, whilst it is often present in the Paradiso.
ginestra.github.io/dante-visualised/repetitions-pattern/
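A keyword distribution of this kind can be approximated by recording where a word occurs in each Cantica. The sketch below is an assumption about how one might do it, not the project's implementation: `keyword_positions` is a hypothetical name, and the toy input holds only the closing line of each Cantica, so the positions are indexes into that toy list rather than real line numbers.

```python
import re


def keyword_positions(lines_by_cantica, keyword):
    """Return {cantica: [1-based positions]} of lines containing the keyword.

    Hypothetical helper: matches whole words only, ignoring case.
    """
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    # Start every Cantica with an empty list so that absences stay visible.
    positions = {cantica: [] for cantica in lines_by_cantica}
    for cantica, lines in lines_by_cantica.items():
        for number, line in enumerate(lines, start=1):
            if pattern.search(line):
                positions[cantica].append(number)
    return positions


# Toy input: the closing line of each Cantica.
closing_lines = {
    "Inferno": ["E quindi uscimmo a riveder le stelle."],
    "Purgatorio": ["puro e disposto a salire a le stelle."],
    "Paradiso": ["l'amor che move il sole e l'altre stelle."],
}

print(keyword_positions(closing_lines, "stelle"))
print(keyword_positions(closing_lines, "Cristo"))
```

Keeping an empty list for a Cantica with no matches is a deliberate choice: the absence of a word is part of the uneven distribution the visualisation sets out to show.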
Rhyme prediction *
The story, the data, the textual structure
Nel mezzo del cammin di nostra vita
mi ritrovai per una selva oscura
ché la diritta via era smarrita.
Ahi quanto a dir qual era è cosa dura
esta selva selvaggia e aspra e forte
che nel pensier rinova la paura!
Tant’ è amara che poco è più morte;
ma per trattar del ben ch’i’ vi trovai,
dirò de l’altre cose ch’i’ v’ho scorte.
1st, 2nd and 3rd tercets, Inferno, Canto I
* The accuracy of the rhyme prediction depends on the textual structure analysed. In the case of
terza rima, it works except for the first and last lines of each Canto.
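For terza rima the expected rhyme pattern (ABA BCB CDC ... closed by a single line) can be generated from line positions alone. The following is a minimal sketch under that assumption, not the project's prediction algorithm; `terza_rima_scheme` and `as_letters` are hypothetical names, and the Canto is assumed to have 3k + 1 lines.

```python
from collections import Counter


def terza_rima_scheme(n_lines: int) -> list[int]:
    """Return a rhyme-group id for each line of a Canto in terza rima."""
    scheme = []
    for i in range(n_lines):
        tercet, pos = divmod(i, 3)
        # Outer lines (pos 0 and 2) share the tercet's own rhyme;
        # the middle line (pos 1) introduces the next tercet's rhyme.
        scheme.append(tercet + 1 if pos == 1 else tercet)
    return scheme


def as_letters(scheme: list[int]) -> str:
    """Render group ids as letters, one space between tercets (up to 26 groups)."""
    letters = [chr(ord("A") + group) for group in scheme]
    return " ".join("".join(letters[i:i + 3]) for i in range(0, len(letters), 3))


scheme = terza_rima_scheme(10)
print(as_letters(scheme))
print(Counter(scheme))
```

In this pattern every rhyme occurs three times except the first and the last, which occur only twice. That is one plausible reading of why the opening and closing lines of a Canto need special handling.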
Rhyme rhythm **
The story, the data, the textual structure
** Line lengths, rhyme lengths and matching rhymes are all counted by number of characters.
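Counting a rhyme by matching characters can be sketched as below. This is a simplified illustration, not the project's code: `rhyme_length` is a hypothetical name, and the sketch ignores punctuation, spaces and case, which the original tool may treat differently.

```python
def rhyme_length(line_a: str, line_b: str) -> int:
    """Count the matching trailing letters of two lines."""
    # Keep letters only, so trailing punctuation does not break the match.
    a = [c.lower() for c in line_a if c.isalpha()]
    b = [c.lower() for c in line_b if c.isalpha()]
    count = 0
    # Walk both lines backwards until the letters differ.
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        count += 1
    return count


first = "Nel mezzo del cammin di nostra vita"
third = "ché la diritta via era smarrita."

print(len(first), len(third))      # line lengths in characters
print(rhyme_length(first, third))  # the shared ending is "ita"
```

Comparing characters rather than sounds is a crude proxy for rhyme, but it needs no phonetic dictionary, which keeps it easy to reuse on other texts.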
Rhyme rhythm
The story, the data, the textual structure
ginestra.github.io/dante-visualised/rhymes/inferno/
Data model and future plans
The main success lies in its modular development, making it amenable to further development.
More languages and different text structures will be integrated and a wider range of output
visualisations offered, while making use of the same core functionalities for ingesting and
processing data.
The data model of the application, illustrating the separation of concerns and the potential for
extensibility.
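The data model figure is not reproduced in this text, so the sketch below is only a guess at what such a separation of concerns might look like. Every name in it (`Line`, `run_pipeline`, `char_count`, `as_text`) is hypothetical and none comes from the project.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Line:
    """One ingested line of text with its location (hypothetical model)."""
    section: str  # e.g. "Inferno"
    canto: int
    number: int
    text: str


# Each stage is a plain function, so analysis and output can be swapped independently.
Analyser = Callable[[Line], float]
Renderer = Callable[[Line, float], str]


def run_pipeline(lines: Iterable[Line], analyse: Analyser, render: Renderer) -> list[str]:
    """Apply one analysis and one renderer to every ingested line."""
    return [render(line, analyse(line)) for line in lines]


def char_count(line: Line) -> float:
    """Example analysis: the length of the line in characters."""
    return float(len(line.text))


def as_text(line: Line, value: float) -> str:
    """Example output: a plain-text label instead of a graphic."""
    return f"{line.section} {line.canto}.{line.number}: {value:g}"


opening = Line("Inferno", 1, 1, "Nel mezzo del cammin di nostra vita")
print(run_pipeline([opening], char_count, as_text))
```

Under this assumption, supporting a new language or visualisation would mean adding another analyser or renderer while the ingestion step stays the same.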
Thank you.
Editor's Notes
Who I am and why I’m here
UI/UX Designer, working at King’s since 2013, previously in the Department of Digital Humanities, at KDL since 2015.
My main interests are data visualizations, immersive experience and accessibility. My work in KDL ranges from user research to applied user interface design and development.
KDL was established in 2015 by staff previously embedded within the Department of Digital Humanities. It became an RSE lab, with solid research processes aligned with Research & Development units in both academia and the commercial world.
Collaborative projects range from History and Classics to Augmented Reality and Immersive Experience.
hendecasyllable
Example verse for each Cantica.
The sentiment analysis was performed with VADER in NLTK. Returned values range from -1 to +1.
Although the analysis is by line rather than by tercet, the sentiment carried by the individual words adds up to the expected sentiment for each Cantica.
One obvious improvement would have been to output the mean value per Cantica, rather than relying on the output by colour only. This is on the list of new features.
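The suggested improvement could be sketched as follows. The scores below are made-up placeholder values, not results from the project, and `mean_sentiment` is a hypothetical name.

```python
from statistics import mean


def mean_sentiment(scores_by_cantica):
    """Return the mean per-line sentiment score for each Cantica."""
    return {
        cantica: mean(scores)
        for cantica, scores in scores_by_cantica.items()
        if scores  # skip a Cantica with no scored lines
    }


# Placeholder per-line scores, for illustration only.
placeholder_scores = {
    "Inferno": [-0.5, 0.0, -0.25],
    "Purgatorio": [-0.25, 0.25],
    "Paradiso": [0.5, 0.75, 0.25],
}

print(mean_sentiment(placeholder_scores))
```

A single mean per Cantica would give a number to read alongside the colours, although it hides how the sentiment varies within each section.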
The interaction reveals contextual information such as the Cantica, the Canto number, the line number and the text of the line.
It can be improved by displaying the occurrence in context, making its location clearer and showing the uneven distribution.
The rhyme prediction is accurate depending on the textual structure analysed. In the case of terza rima, it works except for the first and last lines of each Canto.
The hendecasyllable gives a particular rhythm to the reading.
Each line is rendered by counting its number of characters, and the length of the rhyme is the number of matching letters, counting from the end. The number of syllables is the same for every line: 11. In Italian there are exceptions to how syllables are counted, depending on how a word relates to the word that follows it.
The main success lies in its modular development, making it amenable to further development (algorithm refinements, visualisation workflows, stylometric analysis).
More languages and different text structures will be integrated and a wider range of output visualisations offered, while making use of the same core functionalities for ingesting and processing data.