Data Analysis for Ancient Corpora
Cody Kingham and Dirk Roorda
FAMES, Cambridge, 2019-01-31
[Figure: bar chart, "Parts of Speech after Atnach in ETCBC Phrase"; categories: conj, nmpr, subs, adjv, prep, art; count axis from 0 to 250]
background
description
mini-study
new horizons
• Put researchers in control of their data.
• Empower researchers to fully harness the data available to them.
• Encourage a new paradigm in the humanities.
🤔
data
💰
what’s important
limits
researchers
they decide
Text-Fabric and Hebrew Data
• Free, accessible corpus annotation and analysis tool.
• Published the Amsterdam Hebrew data on GitHub under a free, open-source license.
• Encouraged researchers to step out of their technological comfort zones.
A Different Vision
• Researchers are in charge of their data and set the agenda for its use.
• Researchers are empowered with the tools needed for powerful data analysis.
• Data is made open-source and freely available.
Text-Fabric
• Graph model: words, phrases, etc. are nodes; the relationships between them are edges.
• We can model complex data structures better than with other methods (e.g. XML).
• Everything is stored in easy-to-understand, plain-text files. No messy XML, SQL, etc.
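The graph model from the first bullet can be sketched in a few lines of plain Python. This is only an illustration, not Text-Fabric code: the node numbers, the feature names `otype`, `text` and `contains`, and the sample words are all hypothetical.

```python
# Minimal sketch of a text as a graph: every textual object is a numbered node,
# and relationships between objects are edges. All names and data are illustrative.

# Node features: one dict per feature, keyed by node number.
otype = {1: "word", 2: "word", 3: "word", 4: "phrase", 5: "phrase"}
text = {1: "in", 2: "the", 3: "beginning"}

# Edge feature: which word nodes each phrase node contains.
contains = {4: [1], 5: [2, 3]}


def words_of(node):
    """Return the text of a node by following its 'contains' edges to words."""
    if otype[node] == "word":
        return text[node]
    return " ".join(text[w] for w in contains[node])


phrases = [n for n, t in otype.items() if t == "phrase"]
print([words_of(p) for p in phrases])
```

Because features live in separate mappings keyed by node number, a new annotation layer is just one more mapping; nothing existing has to be rewritten.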
&P005381 = MSVO 3, 70
#atf: lang qpc
@tablet
@obverse
@column 1
1.a. 2(N14) , SZE~a SAL TUR3~a NUN~a
1.b. 3(N19) , |GISZ.TE|
2. 1(N14) , NAR NUN~a SIG7
3. 2(N04)# , PIRIG~b1 SIG7 URI3~a NUN~a
@column 2
1. 3(N04) , |GISZ.TE| GAR |SZU2.((HI+1(N57))+(HI+1(N57)))| GI4~a
2. , GU7 AZ SI4~f
@reverse
@column 1
1. 3(N14) , SZE~a
2. 3(N19) 5(N04) ,
3. , GU7
@column 2
1. , AZ SI4~f
Cuneiform Uruk (CDLI)
CTBA|CTBA#CTBA#CTB###0#0#0#3#1#0#2#0#0#2#0#0#2#0#0#0#0#0 D;L;DOTH|;L;DOT#;L;DOTA#;LD#D#H#0#0#0#3#1#0#3#0#0#2#0#0#2#1#1#3#0#0
D;WOE|;WOE#;WOE#;WOE#D##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 MW;KA|MW;KA#MW;KA#MWK###0#1#0#3#1#0#2#0#0#0#0#2#0#0#0#0#0#0 BRH|
BR#BRA#BR##H#0#0#0#3#1#0#2#0#0#2#0#0#2#1#1#3#0#0 DDO;D|DO;D#DO;D#DO;D#D##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 BRH|
BR#BRA#BR##H#0#0#0#3#1#0#2#0#0#2#0#0#2#1#1#3#0#0 DABRHM|ABRHM#ABRHM#ABRHM#D##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0
ABRHM|ABRHM#ABRHM#ABRHM###0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 AOLD|AOLD#;LD#;LD###0#5#1#0#1#3#2#0#0#0#0#0#0#0#0#0#0#0 LA;SKX|
A;SKX#A;SKX#A;SKX#L##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 A;SKX|A;SKX#A;SKX#A;SKX###0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 AOLD|
Syriac NT (Sedra database)
DEUT33,02 >C- >;71C 1.000 >;71C- >C-
DEUT33,02 DT D.@73T 1.000 D.@73T DT
DEUT33,09 BNW B.@N@73JW 1.000 B.@N@73W BNW
EST 01,16 MWMKN M:MW.K@81N 1.000 M:WM.K@81N MWMKN
EST 03,04 B- K.:- 1.000 B.:- B-
EST 03,04 >MRM >@M:R@70M 1.000 >@M:R@70M >MRM
Hebrew Ketiv-Qere (ETCBC)
(1:1:1:1) bi P PREFIX|bi+
(1:1:1:2) somi N STEM|POS:N|LEM:{som|ROOT:smw|M|GEN
(1:1:2:1) {ll~ahi PN STEM|POS:PN|LEM:{ll~ah|ROOT:Alh|GEN
(1:1:3:1) {l DET PREFIX|Al+
(1:1:3:2) r~aHoma`ni ADJ STEM|POS:ADJ|LEM:r~aHoma`n|ROOT:rHm|MS|GEN
(1:1:4:1) {l DET PREFIX|Al+
(1:1:4:2) r~aHiymi ADJ STEM|POS:ADJ|LEM:r~aHiym|ROOT:rHm|MS|GEN
(1:2:1:1) {lo DET PREFIX|Al+
(1:2:1:2) Hamodu N STEM|POS:N|LEM:Hamod|ROOT:Hmd|M|NOM
Arabic Quran (Tanzil)
Source data of a corpus
TEI, Markdown, ASCII, Database
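Each of these source formats needs its own small reader before conversion. As an example, the Quran morphology lines could be taken apart as below. This is a sketch under assumptions: the columns are taken to be whitespace-separated, the four numbers are read as chapter, verse, word and segment, and `Segment` and `parse_segment` are hypothetical names, not part of any existing tool.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    location: tuple  # assumed to be (chapter, verse, word, segment)
    form: str
    tag: str
    features: dict


def parse_segment(line: str) -> Segment:
    """Split one morphology line into location, form, tag and features."""
    loc, form, tag, feats = line.split()
    location = tuple(int(part) for part in loc.strip("()").split(":"))
    features = {}
    for item in feats.split("|"):
        # "KEY:value" items become entries; bare flags such as "GEN" map to True.
        key, sep, value = item.partition(":")
        features[key] = value if sep else True
    return Segment(location, form, tag, features)


seg = parse_segment("(1:1:1:2) somi N STEM|POS:N|LEM:{som|ROOT:smw|M|GEN")
print(seg.location, seg.features["ROOT"])
```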
Data structure of TF - the IKEA spirit
node
order! order!
stacks of components
uniquely identified
words
phrases
chapters
verses
Conversion to TF
TF does more than half of the work
# Consider Phlebas
$ author=Iain M. Banks
## 1
Everything about us,
everything around us,
everything we know [and can know of] is composed ultimately of patterns of nothing;
that’s the bottom line, the final truth.
So where we find we have any control over those patterns,
why not make the most elegant ones, the most enjoyable and good ones,
in our own terms?
## 2
Besides,
it left the humans in the Culture free to take care of the things that really mattered in life,
such as [sports, games, romance,] studying dead languages,
barbarian societies and impossible problems,
and climbing high mountains without the aid of a safety harness.
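The first step of a conversion is to cut this source into numbered words and to separate each word from its trailing punctuation. The sketch below does that for the first four lines. It is not Text-Fabric's own converter: `tokenize` is a hypothetical helper, and treating the square brackets as markup rather than text is an inference from the slides (compare `gap.tf`).

```python
PUNCTUATION = ",;.?"


def tokenize(lines):
    """Turn source lines into 'letters' and 'punc' values keyed by word number."""
    letters, punc = {}, {}
    slot = 0
    for line in lines:
        for raw in line.split():
            slot += 1
            token = raw.strip("[]")           # assumed: brackets are markup, not text
            word = token.rstrip(PUNCTUATION)  # what remains is the word itself
            letters[slot] = word
            if len(word) < len(token):
                punc[slot] = token[len(word):]
    return letters, punc


source = [
    "Everything about us,",
    "everything around us,",
    "everything we know [and can know of] is composed ultimately of patterns of nothing;",
    "that’s the bottom line, the final truth.",
]
letters, punc = tokenize(source)
print(len(letters), punc)
```

Only words that are followed by punctuation get a `punc` value, which is why the punctuation feature is sparse while the letters feature has a value for every word.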
@node
@compiler=Dirk Roorda
@description=the letters of a word
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
Everything
about
us
everything
around
us
everything
we
know
and
can
know
of
is
composed
ultimately
of
patterns
of
nothing
that’s
the
bottom
line
the
final
truth
So
letters
@node
@compiler=Dirk Roorda
@description=the punctuation after a word
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
3 ,
6 ,
20 ;
24 ,
27 .
38 ,
45 ,
51 ,
55 ?
,
75 ,
78 ,
,
,
83 ,
88 ,
99 .
punc
banks/tf/
author.tf
gap.tf
letters.tf
number.tf
oslots.tf
otext.tf
otype.tf
punc.tf
terminator.tf
title.tf
TF dataset
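In the punc listing some lines carry a node number and some carry only a value; a value-only line belongs to the node that follows the previous one, so the bare "," after "55 ?" is word 56. A simplified reader for such a file might look like this. It is a sketch, not the Text-Fabric loader: it assumes a tab between node number and value (the tabs appear as spaces on the slide), and it ignores node ranges and escaped characters.

```python
def parse_node_feature(tf_text):
    """Read metadata and node values from the text of a simplified .tf node feature."""
    metadata, values = {}, {}
    node = 0
    for line in tf_text.splitlines():
        if line.startswith("@"):
            key, _, value = line[1:].partition("=")
            metadata[key] = value
        elif line:
            if "\t" in line:
                # Explicit form: node number, tab, value.
                number, value = line.split("\t", 1)
                node = int(number)
            else:
                # Implicit form: the value belongs to the node after the previous one.
                node += 1
                value = line
            values[node] = value
    return metadata, values


sample = "@node\n@valueType=str\n\n51\t,\n55\t?\n,\n75\t,\n78\t,\n,\n,\n83\t,\n"
meta, punc = parse_node_feature(sample)
print(sorted(punc))
```

The same rule explains the letters file, where no line has a node number at all: the values simply belong to words 1, 2, 3 and so on.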
otype
@node
@compiler=Dirk Roorda
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
1-99 word
100 book
101-102 chapter
103-114 line
115-117 sentence
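The five data lines of otype assign a type to every node through ranges. Expanding them shows how many objects of each type the dataset has. This is a minimal sketch with a hypothetical helper `expand`; the data lines are copied from the slide.

```python
def expand(spec):
    """Expand a node spec such as '101-102' or '100' into a list of node numbers."""
    start, _, end = spec.partition("-")
    return list(range(int(start), int(end or start) + 1))


otype_lines = ["1-99 word", "100 book", "101-102 chapter", "103-114 line", "115-117 sentence"]

otype = {}
for entry in otype_lines:
    spec, node_type = entry.split()
    for node in expand(spec):
        otype[node] = node_type

counts = {}
for node_type in otype.values():
    counts[node_type] = counts.get(node_type, 0) + 1
print(counts)  # {'word': 99, 'book': 1, 'chapter': 2, 'line': 12, 'sentence': 3}
```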
oslots
@edge
@compiler=Dirk Roorda
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
100 1-99
1-55
56-99
1-3
4-6
7-9,14-20
21-27
28-38
39-51
52-55
56
57-75
76-77,81-83
84-88
89-99
1-27
28-55
56-99
1-99 word
100 book
101-102 chapter
103-114 line
115-117 sentence
## 1
Everything about us,
everything around us,
everything we know [and can know of] is composed ultimately of patterns of nothing;
that’s the bottom line, the final truth.
So where we find we have any control over those patterns,
why not make the most elegant ones, the most enjoyable and good ones,
in our own terms?
## 2
Besides,
it left the humans in the Culture free to take care of the things that really mattered in life,
such as [sports, games, romance,] studying dead languages,
barbarian societies and impossible problems,
and climbing high mountains without the aid of a safety harness.
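The oslots lines say which words each higher node occupies. A spec such as "7-9,14-20" describes a line that is interrupted: words 10 to 13 ("and can know of") sit between its two parts. The sketch below expands the first six specs and detects that gap. `parse_slots` and `has_gap` are hypothetical helpers, and the node numbering follows the implicit-successor reading of the file, starting at node 100.

```python
def parse_slots(spec):
    """Expand a slot spec such as '7-9,14-20' into a list of slot numbers."""
    slots = []
    for part in spec.split(","):
        start, _, end = part.partition("-")
        slots.extend(range(int(start), int(end or start) + 1))
    return slots


def has_gap(slots):
    """True when the slots do not form one unbroken run."""
    return slots[-1] - slots[0] + 1 != len(slots)


# The first data line names node 100 explicitly; the following lines continue from it.
specs = ["1-99", "1-55", "56-99", "1-3", "4-6", "7-9,14-20"]
oslots = {100 + i: parse_slots(spec) for i, spec in enumerate(specs)}

print(oslots[105], has_gap(oslots[105]))
```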
otext
@config
@compiler=Dirk Roorda
@fmt:text-orig-full={letters}{punc}
@name=Culture quotes from Iain Banks
@sectionFeatures=title,number
@sectionTypes=book,chapter
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
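The `@fmt:text-orig-full` line is a template that says how to turn feature values back into text. A rough sketch of applying it is shown below. It uses Python's `str.format` rather than Text-Fabric's own rendering, and the space between words is an assumption added here, since the template on the slide shows no separator.

```python
letters = {1: "Everything", 2: "about", 3: "us", 4: "everything", 5: "around", 6: "us"}
punc = {3: ",", 6: ","}

text_format = "{letters}{punc}"  # value of @fmt:text-orig-full on the slide


def render(slots, fmt=text_format, separator=" "):
    """Fill the template for each slot; a missing feature value counts as empty."""
    pieces = [fmt.format(letters=letters.get(s, ""), punc=punc.get(s, "")) for s in slots]
    return separator.join(pieces)


print(render(range(1, 7)))  # Everything about us, everything around us,
```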
Computing - Python - Jupyter notebooks
BHSA: https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/start.ipynb
Quran: https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/quran/start.ipynb
Syriac NT: https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/syrnt/start.ipynb
Old Babylon'
SHEBANQ: https://shebanq.ancient-data.org/hebrew/query?version=4b&id=1050
Computing - more power!
BHSA: https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/searchFromMQL.ipynb
Quran: https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/quran/search.ipynb
Syriac NT: https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/syrnt/search.ipynb
Old Babylon'
Uruk: https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/uruk/search.ipynb
Power to you! (without the programming)
Uruk
Mini-Study:
Atnachs and Phrase Divisions
• How often do atnach accents disagree with the ETCBC phrase
divisions?
• Why?
Sharing and re-using data
Text-Fabric was developed by a DANS employee. As a consequence:
Data export is built in ✅
Provenance tracking is built in ✅
Redistribution of newly created data is built in ✅
sharing #1: GitHub & NBviewer
Work done in a Jupyter notebook inside a GitHub repository is easy to share.
https://github.com/Nino-cunei/primers/blob/master/oldbabylonian/OB-primer1.ipynb
sharing #2: Export from TF-browser
sharing #3: Zenodo
sharing #4: Create new features
https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/share.ipynb
• etcbc/valence/tf : the results of the verbal valence work of Janet Dyk in the
SYNVAR project;

• etcbc/lingo/heads/tf : head words for phrases, work done by Cody Kingham;

• ch-jensen/Semantic-mapping-of-participants/actor/tf : participant analysis in
progress by Christian Høygaard-Jensen;

• cmerwich/bh-reference-system/tf: participant analysis in progress by
Christiaan Erwich;

• or whatever you have in the making!

• HINT: semantic/fuzzy/plurality for collective nouns (Chip Hardy?)
https://github.com/ETCBC/lingo/tree/master/easter/tf/c
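A new feature is, in the end, another plain-text file that sits next to the existing ones. The sketch below writes such a file by hand to show the idea. It is not Text-Fabric's own save routine: `serialize_node_feature` is a hypothetical helper, the feature values are made up, and the layout (metadata lines, a blank line, then data with implicit node numbers where possible) follows the files shown earlier, with a tab assumed as separator.

```python
def serialize_node_feature(values, metadata):
    """Render node values and metadata as the text of a simplified .tf node feature."""
    lines = ["@node"]
    lines += [f"@{key}={value}" for key, value in sorted(metadata.items())]
    lines.append("")  # blank line between metadata and data
    previous = 0
    for node in sorted(values):
        if node == previous + 1:
            lines.append(str(values[node]))          # implicit node number
        else:
            lines.append(f"{node}\t{values[node]}")  # explicit node number
        previous = node
    return "\n".join(lines) + "\n"


# Hypothetical feature: made-up labels attached to a few word nodes.
text = serialize_node_feature(
    {1: "start", 2: "mid", 5: "end"},
    {"valueType": "str", "description": "hypothetical demo feature"},
)
print(text)
```

Keeping the metadata in the file itself is what lets a shared feature carry its provenance along with the data.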
Open Science Rocks
thank you
Cody Kingham codykingham@icloud.com
Dirk Roorda dirk.roorda@dans.knaw.nl
