Closing the Gap:
       Data Models for
Documentary Linguistics
                                     Baden Hughes
 Department of Computer Science and Software Engineering
                              The University of Melbourne
                                   badenh@cs.mu.oz.au
Overview

 Overall Context
 The Electronic Data Format Challenge
 Common Problems
 Data Encoding Models
    Lexicons, interlinear texts, paradigms, syntactic trees, annotation
    standards, query languages
 Linguistic Motivations vs Computational Interests
 New Types of Data Exploration
 Effects on Linguistic Analysis
 New Tools
 Conclusions
                        Latrobe Uni - Linguistics Seminar - 20050505      2
Overall Context

 Large amounts of human language data
 continue to be managed in electronic form
 and analysed in fieldwork-driven linguistic
 documentation
 Increasing focus on acquisition-centric
 methodologies which have vastly increased
 the rate of growth of linguistic data
 Reasonably static basic linguistic data
 structures largely grounded in print domain
The Electronic Data Format
Challenge
 The methods used for the digital encoding of
 linguistic data are often disparate
   Often at best reduced to native formats supported by
   widely-used tools such as Shoebox
 Conversion is typically complex and lossy
   Sometimes this can’t be predicted in advance
 Many utility manipulation functions required to move
 data between analytical applications and outputs
   These functions are largely external to analytical
   environments, with some notable exceptions (eg regular
   expression manipulation)
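
A minimal sketch of the kind of regular expression manipulation mentioned above, assuming a Shoebox-style record in which each field is a backslash marker followed by its value. The record content and the choice of the markers \lx, \ps and \ge are invented for illustration. It also shows one way a conversion can be silently lossy: turning the fields into a simple key/value mapping drops a repeated field.

```python
import re

# Hypothetical Shoebox-style record (invented content): one field per line,
# each introduced by a backslash marker. Note the repeated \ge field.
RECORD = r"""\lx example
\ps n
\ge sample
\ge instance
"""

# A marker is a backslash plus word characters at the start of a line.
FIELD = re.compile(r"^\\(\w+)[ \t]*(.*)$", re.MULTILINE)


def parse_record(text):
    """Return the (marker, value) pairs of one record, in file order."""
    return [(m.group(1), m.group(2).strip()) for m in FIELD.finditer(text)]


fields = parse_record(RECORD)

# Lossy step: a plain dict keeps only the last value for each marker,
# so the first \ge gloss disappears without any warning.
as_dict = dict(fields)

print(len(fields), "fields parsed;", len(as_dict), "survive as a dict")
```

Keeping the ordered list of pairs preserves both field order and repeated fields; whether that matters depends on the target format.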

Common Problems

 Despite diversity of language and analytical
 approach, many documentary and descriptive
 linguists face a common challenge: the
 interoperability and longevity of electronic data
 generated in fieldwork settings.
 Repurposing data
   Publishing data on the web
   Publishing in papers
   New analysis tools
   New generation formats
The Emergence of Abstract
Language Data Encoding Models
 Recently, a number of formal data encoding models for
 linguistic data types have emerged from projects
 investigating "best practice" methods for preserving
 linguistic data.
 We will briefly consider models for
   lexicons
   interlinear texts
   paradigms
   syntactic trees
   annotation standards
   query languages

Data Models (1)
 Lexicons
   Bell & Bird (2001)
 Interlinear Text
   Bow, Hughes & Bird (2003)
   Hughes, Bird & Bow (2003)
 Linguistic Paradigms
   Penton, Bow, Bird & Hughes (2004)
   Penton & Bird (2004)

Data Models (2)
 Syntactic Trees
   Lai & Bird (2004)
 Annotation Standards
   Farrar, Lewis & Langendoen (2002)
   Farrar & Langendoen (2003)
 Query Languages
   Bird, Chen, Davidson, Lee & Zheng (2005)
   Cassidy & Bird (2000)
   Taylor (2004)
Linguistic Motivations

 Data models – so what?
 It is the combined utility of these models that makes
 them attractive to documentary linguists
 The challenge is to lower the barrier to use of these
 technologies in fieldwork and analytical contexts
 Linguists (mostly) don’t care about the technology,
 they just want to do linguistics!
 Computer scientists are generally not interested in
 linguistics …

Computational Interests

 The development of such models may be inherently
 interesting to computationally inclined researchers
   Human language data encoding and annotation is
   genuinely interesting in computer science terms;
   unfortunately basic data modelling isn’t
   Technologists have a bad habit of providing advice which
   is intended well but lacks traction for non-technical
   communities (eg “use XML”)
   Many of the solutions are XML-based, but contain many
   more components than just XML encoded data
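
To illustrate the point that the XML-encoded data is only one component, here is a small sketch that pairs an XML document with a query over it. The element and attribute names (text, phrase, word, morpheme, form, gloss) are assumptions made for this example, not the schema of any of the cited models, and the English sentence is a stand-in for real field data.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of one glossed phrase (names are illustrative only).
DOC = """
<text>
  <phrase translation="the dogs sleep">
    <word form="dogs">
      <morpheme form="dog" gloss="dog"/>
      <morpheme form="-s" gloss="PL"/>
    </word>
    <word form="sleep">
      <morpheme form="sleep" gloss="sleep"/>
    </word>
  </phrase>
</text>
"""

root = ET.fromstring(DOC.strip())


def words_with_gloss(tree, gloss):
    """Return the forms of all words containing a morpheme with this gloss."""
    return [
        word.get("form")
        for word in tree.iter("word")
        if word.find(f"morpheme[@gloss='{gloss}']") is not None
    ]


print(words_with_gloss(root, "PL"))
```

The encoded data alone answers no questions; the query function (and, in a fuller system, validation and presentation layers) is what makes it usable.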


New Types of Data Exploration (1)
 Open implemented solutions for a range of
 manipulations are available
   Lexicons
    Generation of different types of lexicons
   Interlinear Text (see following examples …)
    Generation of different types of interlinear text
    Induction of morphosyntactic glossing from lexicons
    Generation of lexicons from interlinear text
    Enrichment of lexicons from interlinear text
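
As a rough sketch of "generation of lexicons from interlinear text", the following assumes a simple hierarchy of phrases, words and morphemes. The class names, fields and the English sample data are hypothetical and chosen for readability; they are not taken from the cited interlinear text model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Morpheme:
    form: str
    gloss: str


@dataclass
class Word:
    form: str
    morphemes: list


@dataclass
class Phrase:
    words: list
    translation: str


def build_lexicon(phrases):
    """Collect every gloss observed for each morpheme form."""
    lexicon = defaultdict(set)
    for phrase in phrases:
        for word in phrase.words:
            for morpheme in word.morphemes:
                lexicon[morpheme.form].add(morpheme.gloss)
    # Sort forms and glosses so the output is stable.
    return {form: sorted(glosses) for form, glosses in sorted(lexicon.items())}


phrases = [
    Phrase(
        [
            Word("dogs", [Morpheme("dog", "dog"), Morpheme("-s", "PL")]),
            Word("sleep", [Morpheme("sleep", "sleep")]),
        ],
        "the dogs sleep",
    ),
    Phrase(
        [Word("sleeps", [Morpheme("sleep", "sleep"), Morpheme("-s", "3SG")])],
        "it sleeps",
    ),
]

lexicon = build_lexicon(phrases)
print(lexicon)
```

Running the mapping in the other direction (looking up each morpheme form to propose a gloss) would be one simple way to approach the induction of glossing from a lexicon.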

Nenets Interlinear (1)




Nenets Interlinear (2)




New Types of Data Exploration (2)
 Open implemented solutions for a range of
 manipulations are available
   Syntactic Trees
     Induction of trees from interlinear text
     Creation of interlinear text from syntactic tree drawing
     Creation of lexicons from syntactic trees
   Paradigms (see following examples …)
     Generation of different types of paradigms
     Induction of paradigms from interlinear text
     Annotation of interlinear text from paradigms
     Enrichment of lexicons from paradigms
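
One way to picture "generation of different types of paradigms" is to store each form once with its attribute values and derive the table layout on demand. The sketch below makes that assumption; the function name, the attribute names and the English sample data are illustrative and are not drawn from the cited paradigm model.

```python
# Each cell is one inflected form plus its attribute values (sample data:
# present tense of English "be").
CELLS = [
    {"form": "am", "person": "1", "number": "sg"},
    {"form": "are", "person": "2", "number": "sg"},
    {"form": "is", "person": "3", "number": "sg"},
    {"form": "are", "person": "1", "number": "pl"},
    {"form": "are", "person": "2", "number": "pl"},
    {"form": "are", "person": "3", "number": "pl"},
]


def paradigm_table(cells, row_attr, col_attr):
    """Arrange cells as table[row_value][col_value] -> form."""
    rows = sorted({cell[row_attr] for cell in cells})
    cols = sorted({cell[col_attr] for cell in cells})
    lookup = {(cell[row_attr], cell[col_attr]): cell["form"] for cell in cells}
    # Missing combinations are shown as "-" rather than raising an error.
    return {r: {c: lookup.get((r, c), "-") for c in cols} for r in rows}


by_person = paradigm_table(CELLS, "person", "number")
by_number = paradigm_table(CELLS, "number", "person")

for person, row in by_person.items():
    print(person, row)
```

The same six cells yield a person-by-number table or its transpose without re-entering any data, which is the practical benefit of separating the stored content from its presentation.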

Kanarese Paradigm (1)




Kanarese Paradigm (2)




Effects on Linguistic Analysis

 Integrated encoding standards for linguistic
 data affect the practice of linguistic analysis
   Some analysis types are now easier
   New possibilities emerge
   New analytical challenges are discovered
   Data linkage/integration is certainly one of the
   improvements



New Tools

 The next generation of tools which support these
 data models natively is emerging, eg FIELD, ELAN,
 Toolbox (almost)
 “Middleware” which allows the translation of legacy
 formats to and from these models is reasonably
 widely available
 Analytical tools are increasingly being implemented
 with web-grounded technologies and using web-
 derived models
 Open source/open data approaches are becoming
 pervasive
Conclusion

 Reducing the gap between computationally tractable
 representations on which a high degree of
 functionality can be built and simple underlying
 formats driven by fieldwork-oriented tools
 is advantageous to all parties
 It reduces the intermediate data-munging steps which
 require technical knowledge rather than linguistic
 knowledge
 While we are not quite “there yet”, the light at the
 end of the tunnel is definitely there
 Growing community of philosophically aligned
 computer scientists and linguists
References

  Bell & Bird, 2001. A Preliminary Study of the Structure of Lexicon Entries. Proceedings of
  the Workshop on Web-Based Language Documentation and Description.
  Bow, Hughes & Bird, 2003. Towards a General Model for Interlinear Text. Proceedings of
  EMELD 2003.
  Farrar, Lewis & Langendoen, 2002. A Common Ontology for Linguistic Concepts.
  Proceedings of the Knowledge Technologies Conference.
  Farrar & Langendoen, 2003. A linguistic ontology for the Semantic Web. GLOT
  International 7(3).
  Hughes, Bird & Bow, 2003. Encoding and Presenting Interlinear Text Using XML
  Technologies. Proceedings of ALTW 2003.
  Lai & Bird, 2004. Querying and Updating Treebanks: A Critical Survey and Requirements
  Analysis. Proceedings of ALTW 2004.
  Penton, Bow, Bird & Hughes, 2004. Towards a General Model for Linguistic Paradigms.
  Proceedings of EMELD 2004.
  Penton & Bird, 2004. Representing and Rendering Linguistic Paradigms. Proceedings of
  ALTW 2004.
  Bird, Chen, Davidson, Lee & Zheng, 2005. Extending XPath to Support Linguistic Queries.
  Proceedings of PLANX 2005.
  Cassidy & Bird, 2000. Querying databases of annotated speech. Proceedings of the
  Eleventh Australasian Database Conference.
  Taylor, 2004. XSLT as a Linguistic Query Language. BSc(Hons) Thesis, University of
  Melbourne.
Questions? Comments?




           Latrobe Uni - Linguistics Seminar - 20050505   21

Mais conteúdo relacionado

Mais procurados

An Abridged Version of My Statement of Research Interests
An Abridged Version of My Statement of Research InterestsAn Abridged Version of My Statement of Research Interests
An Abridged Version of My Statement of Research Interestsadil raja
 
WP3 Further specification of Functionality and Interoperability - Gradmann / ...
WP3 Further specification of Functionality and Interoperability - Gradmann / ...WP3 Further specification of Functionality and Interoperability - Gradmann / ...
WP3 Further specification of Functionality and Interoperability - Gradmann / ...Europeana
 
Terminology turbocharges your translation: From my archive before TaaS ;-)
Terminology turbocharges your translation: From my archive before TaaS ;-)Terminology turbocharges your translation: From my archive before TaaS ;-)
Terminology turbocharges your translation: From my archive before TaaS ;-)Tatjana Gornostaja
 
Assistive Technology
Assistive TechnologyAssistive Technology
Assistive Technologyjpuglia
 
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLDODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLTakeshi Morita
 
Detecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large CorporaDetecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large CorporaTraian Rebedea
 
Improvement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A ReviewImprovement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A Reviewinscit2006
 
Taking into account communities of practice’s specific vocabularies in inform...
Taking into account communities of practice’s specific vocabularies in inform...Taking into account communities of practice’s specific vocabularies in inform...
Taking into account communities of practice’s specific vocabularies in inform...inscit2006
 
Pedagogical applications of corpus data for English for General and Specific ...
Pedagogical applications of corpus data for English for General and Specific ...Pedagogical applications of corpus data for English for General and Specific ...
Pedagogical applications of corpus data for English for General and Specific ...Pascual Pérez-Paredes
 
Language Grid
Language GridLanguage Grid
Language Gridlindh
 
Learning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology EngineeringLearning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology Engineeringbutest
 
Topic Extraction on Domain Ontology
Topic Extraction on Domain OntologyTopic Extraction on Domain Ontology
Topic Extraction on Domain OntologyKeerti Bhogaraju
 
Novelty detection via topic modeling in research articles
Novelty detection via topic modeling in research articlesNovelty detection via topic modeling in research articles
Novelty detection via topic modeling in research articlescsandit
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information RetrievalNik Spirin
 
Opinion mining for social media and news items in Romanian
Opinion mining for social media and news items in RomanianOpinion mining for social media and news items in Romanian
Opinion mining for social media and news items in RomanianTraian Rebedea
 
A Semantic-enhanced Inference Framework for Heterogeneous Resources Management
A Semantic-enhanced Inference Framework for Heterogeneous Resources ManagementA Semantic-enhanced Inference Framework for Heterogeneous Resources Management
A Semantic-enhanced Inference Framework for Heterogeneous Resources ManagementSilvia Giannini
 

Mais procurados (20)

An Abridged Version of My Statement of Research Interests
An Abridged Version of My Statement of Research InterestsAn Abridged Version of My Statement of Research Interests
An Abridged Version of My Statement of Research Interests
 
WP3 Further specification of Functionality and Interoperability - Gradmann / ...
WP3 Further specification of Functionality and Interoperability - Gradmann / ...WP3 Further specification of Functionality and Interoperability - Gradmann / ...
WP3 Further specification of Functionality and Interoperability - Gradmann / ...
 
Terminology turbocharges your translation: From my archive before TaaS ;-)
Terminology turbocharges your translation: From my archive before TaaS ;-)Terminology turbocharges your translation: From my archive before TaaS ;-)
Terminology turbocharges your translation: From my archive before TaaS ;-)
 
Assistive Technology
Assistive TechnologyAssistive Technology
Assistive Technology
 
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLDODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
 
Research Statement
Research StatementResearch Statement
Research Statement
 
Detecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large CorporaDetecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large Corpora
 
Improvement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A ReviewImprovement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A Review
 
Taking into account communities of practice’s specific vocabularies in inform...
Taking into account communities of practice’s specific vocabularies in inform...Taking into account communities of practice’s specific vocabularies in inform...
Taking into account communities of practice’s specific vocabularies in inform...
 
Pedagogical applications of corpus data for English for General and Specific ...
Pedagogical applications of corpus data for English for General and Specific ...Pedagogical applications of corpus data for English for General and Specific ...
Pedagogical applications of corpus data for English for General and Specific ...
 
AINL 2016: Eyecioglu
AINL 2016: EyeciogluAINL 2016: Eyecioglu
AINL 2016: Eyecioglu
 
Language Grid
Language GridLanguage Grid
Language Grid
 
Learning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology EngineeringLearning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology Engineering
 
Topic Extraction on Domain Ontology
Topic Extraction on Domain OntologyTopic Extraction on Domain Ontology
Topic Extraction on Domain Ontology
 
Extended WordNet
Extended WordNetExtended WordNet
Extended WordNet
 
Novelty detection via topic modeling in research articles
Novelty detection via topic modeling in research articlesNovelty detection via topic modeling in research articles
Novelty detection via topic modeling in research articles
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 
Opinion mining for social media and news items in Romanian
Opinion mining for social media and news items in RomanianOpinion mining for social media and news items in Romanian
Opinion mining for social media and news items in Romanian
 
M.cheraghi shehni CALL
M.cheraghi shehni CALLM.cheraghi shehni CALL
M.cheraghi shehni CALL
 
A Semantic-enhanced Inference Framework for Heterogeneous Resources Management
A Semantic-enhanced Inference Framework for Heterogeneous Resources ManagementA Semantic-enhanced Inference Framework for Heterogeneous Resources Management
A Semantic-enhanced Inference Framework for Heterogeneous Resources Management
 

Destaque

Temporal Databases: Data Models
Temporal Databases: Data ModelsTemporal Databases: Data Models
Temporal Databases: Data Modelstorp42
 
All data models in dbms
All data models in dbmsAll data models in dbms
All data models in dbmsNaresh Kumar
 
Paper2 olympics copy
Paper2 olympics copyPaper2 olympics copy
Paper2 olympics copyCorey Topf
 
Ouderavond Groningen Maart 2011
Ouderavond Groningen Maart 2011Ouderavond Groningen Maart 2011
Ouderavond Groningen Maart 2011inespee
 
Techo Club Freshmen
Techo Club FreshmenTecho Club Freshmen
Techo Club FreshmenCorey Topf
 
Bds Blind Spot Mirror Systems Uk
Bds Blind Spot Mirror Systems UkBds Blind Spot Mirror Systems Uk
Bds Blind Spot Mirror Systems UkGuido Weijerman
 
0708 Usability Test Methodes
0708 Usability Test Methodes0708 Usability Test Methodes
0708 Usability Test MethodesHans Kemp
 
Week2 S Ponges
Week2 S PongesWeek2 S Ponges
Week2 S PongesCorey Topf
 
For Sale 7914 Skyview St Slideshow
For Sale 7914 Skyview St SlideshowFor Sale 7914 Skyview St Slideshow
For Sale 7914 Skyview St Slideshowrteam
 
Encoding and Presenting Interlinear Text Using XML Technologies
Encoding and Presenting Interlinear Text Using XML TechnologiesEncoding and Presenting Interlinear Text Using XML Technologies
Encoding and Presenting Interlinear Text Using XML TechnologiesBaden Hughes
 
Decentralized reception grethe
Decentralized reception   gretheDecentralized reception   grethe
Decentralized reception gretheGeorge Bekiaridis
 
If We're Not There Yet, How Far Do We Have To Go ? Web Metadata at The Univer...
If We're Not There Yet, How Far Do We Have To Go ? Web Metadata at The Univer...If We're Not There Yet, How Far Do We Have To Go ? Web Metadata at The Univer...
If We're Not There Yet, How Far Do We Have To Go ? Web Metadata at The Univer...Baden Hughes
 
Zappos - General Mills - 8-5-09
Zappos - General Mills - 8-5-09Zappos - General Mills - 8-5-09
Zappos - General Mills - 8-5-09zappos
 
Gardeners Not Gate Keepers
Gardeners Not Gate KeepersGardeners Not Gate Keepers
Gardeners Not Gate Keeperscalevans
 
Week 14 Sponges
Week 14 SpongesWeek 14 Sponges
Week 14 SpongesCorey Topf
 
0708 Introductie user experience design minor
0708 Introductie user experience design minor0708 Introductie user experience design minor
0708 Introductie user experience design minorHans Kemp
 
Dutch PHP Conference
Dutch PHP ConferenceDutch PHP Conference
Dutch PHP Conferencecalevans
 

Destaque (20)

Data models
Data modelsData models
Data models
 
Temporal Databases: Data Models
Temporal Databases: Data ModelsTemporal Databases: Data Models
Temporal Databases: Data Models
 
All data models in dbms
All data models in dbmsAll data models in dbms
All data models in dbms
 
Paper2 olympics copy
Paper2 olympics copyPaper2 olympics copy
Paper2 olympics copy
 
Ouderavond Groningen Maart 2011
Ouderavond Groningen Maart 2011Ouderavond Groningen Maart 2011
Ouderavond Groningen Maart 2011
 
Techo Club Freshmen
Techo Club FreshmenTecho Club Freshmen
Techo Club Freshmen
 
Bds Blind Spot Mirror Systems Uk
Bds Blind Spot Mirror Systems UkBds Blind Spot Mirror Systems Uk
Bds Blind Spot Mirror Systems Uk
 
0708 Usability Test Methodes
0708 Usability Test Methodes0708 Usability Test Methodes
0708 Usability Test Methodes
 
Copyleft
CopyleftCopyleft
Copyleft
 
Week2 S Ponges
Week2 S PongesWeek2 S Ponges
Week2 S Ponges
 
For Sale 7914 Skyview St Slideshow
For Sale 7914 Skyview St SlideshowFor Sale 7914 Skyview St Slideshow
For Sale 7914 Skyview St Slideshow
 
Encoding and Presenting Interlinear Text Using XML Technologies
Encoding and Presenting Interlinear Text Using XML TechnologiesEncoding and Presenting Interlinear Text Using XML Technologies
Encoding and Presenting Interlinear Text Using XML Technologies
 
Decentralized reception grethe
Decentralized reception   gretheDecentralized reception   grethe
Decentralized reception grethe
 
If We're Not There Yet, How Far Do We Have To Go ? Web Metadata at The Univer...
If We're Not There Yet, How Far Do We Have To Go ? Web Metadata at The Univer...If We're Not There Yet, How Far Do We Have To Go ? Web Metadata at The Univer...
If We're Not There Yet, How Far Do We Have To Go ? Web Metadata at The Univer...
 
Kick Off
Kick OffKick Off
Kick Off
 
Zappos - General Mills - 8-5-09
Zappos - General Mills - 8-5-09Zappos - General Mills - 8-5-09
Zappos - General Mills - 8-5-09
 
Gardeners Not Gate Keepers
Gardeners Not Gate KeepersGardeners Not Gate Keepers
Gardeners Not Gate Keepers
 
Week 14 Sponges
Week 14 SpongesWeek 14 Sponges
Week 14 Sponges
 
0708 Introductie user experience design minor
0708 Introductie user experience design minor0708 Introductie user experience design minor
0708 Introductie user experience design minor
 
Dutch PHP Conference
Dutch PHP ConferenceDutch PHP Conference
Dutch PHP Conference
 

Semelhante a Closing the Gap: Data Models for Documentary Linguistics

Dimensions of Media Object Comprehensibility
Dimensions of Media Object ComprehensibilityDimensions of Media Object Comprehensibility
Dimensions of Media Object ComprehensibilityLawrie Hunter
 
Gellish A Standard Data And Knowledge Representation Language And Ontology
Gellish   A Standard Data And Knowledge Representation Language And OntologyGellish   A Standard Data And Knowledge Representation Language And Ontology
Gellish A Standard Data And Knowledge Representation Language And OntologyAndries_vanRenssen
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to HindiRajat Jain
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information RetrievalDustin Smith
 
The Exploitation of OpenAPI Documents for the Generation of Web Frontends
The Exploitation of OpenAPI Documents for the Generation of Web FrontendsThe Exploitation of OpenAPI Documents for the Generation of Web Frontends
The Exploitation of OpenAPI Documents for the Generation of Web FrontendsIstvanKoren
 
A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...Scott Bou
 
Cross language information retrieval in indian
Cross language information retrieval in indianCross language information retrieval in indian
Cross language information retrieval in indianeSAT Publishing House
 
NAACL2015 presentation
NAACL2015 presentationNAACL2015 presentation
NAACL2015 presentationHan Xu, PhD
 
Error Detection and Feedback with OT-LFG for Computer-assisted Language Learning
Error Detection and Feedback with OT-LFG for Computer-assisted Language LearningError Detection and Feedback with OT-LFG for Computer-assisted Language Learning
Error Detection and Feedback with OT-LFG for Computer-assisted Language LearningCITE
 
IRJET- An Efficient Way to Querying XML Database using Natural Language
IRJET-  	  An Efficient Way to Querying XML Database using Natural LanguageIRJET-  	  An Efficient Way to Querying XML Database using Natural Language
IRJET- An Efficient Way to Querying XML Database using Natural LanguageIRJET Journal
 
Linked Open (Geo)Data and the Distributed Ontology Language – a perfect match
Linked Open (Geo)Data and the Distributed Ontology Language – a perfect matchLinked Open (Geo)Data and the Distributed Ontology Language – a perfect match
Linked Open (Geo)Data and the Distributed Ontology Language – a perfect matchChristoph Lange
 
Meaning Extraction - IJCTE 2(1)
Meaning Extraction - IJCTE 2(1)Meaning Extraction - IJCTE 2(1)
Meaning Extraction - IJCTE 2(1)IT Industry
 
Basics of Paper Writing and Publishing in TEL (JTEL 2013)
Basics of Paper Writing and Publishing in TEL (JTEL 2013)Basics of Paper Writing and Publishing in TEL (JTEL 2013)
Basics of Paper Writing and Publishing in TEL (JTEL 2013)Michael Derntl
 
ACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationStuart Chalk
 
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...cseij
 
Possibility of interdisciplinary research software engineering andnatural lan...
Possibility of interdisciplinary research software engineering andnatural lan...Possibility of interdisciplinary research software engineering andnatural lan...
Possibility of interdisciplinary research software engineering andnatural lan...Nakul Sharma
 
Research Developments and Directions in Speech Recognition and ...
Research Developments and Directions in Speech Recognition and ...Research Developments and Directions in Speech Recognition and ...
Research Developments and Directions in Speech Recognition and ...butest
 

Semelhante a Closing the Gap: Data Models for Documentary Linguistics (20)

Dimensions of Media Object Comprehensibility
Dimensions of Media Object ComprehensibilityDimensions of Media Object Comprehensibility
Dimensions of Media Object Comprehensibility
 
Gellish A Standard Data And Knowledge Representation Language And Ontology
Gellish   A Standard Data And Knowledge Representation Language And OntologyGellish   A Standard Data And Knowledge Representation Language And Ontology
Gellish A Standard Data And Knowledge Representation Language And Ontology
 
ppt
pptppt
ppt
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to Hindi
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 
The Exploitation of OpenAPI Documents for the Generation of Web Frontends
The Exploitation of OpenAPI Documents for the Generation of Web FrontendsThe Exploitation of OpenAPI Documents for the Generation of Web Frontends
The Exploitation of OpenAPI Documents for the Generation of Web Frontends
 
A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...
 
Cross language information retrieval in indian
Cross language information retrieval in indianCross language information retrieval in indian
Cross language information retrieval in indian
 
NAACL2015 presentation
NAACL2015 presentationNAACL2015 presentation
NAACL2015 presentation
 
Error Detection and Feedback with OT-LFG for Computer-assisted Language Learning
Error Detection and Feedback with OT-LFG for Computer-assisted Language LearningError Detection and Feedback with OT-LFG for Computer-assisted Language Learning
Error Detection and Feedback with OT-LFG for Computer-assisted Language Learning
 
Arabic MT Project
Arabic MT ProjectArabic MT Project
Arabic MT Project
 
IRJET- An Efficient Way to Querying XML Database using Natural Language
IRJET-  	  An Efficient Way to Querying XML Database using Natural LanguageIRJET-  	  An Efficient Way to Querying XML Database using Natural Language
IRJET- An Efficient Way to Querying XML Database using Natural Language
 
Lit mtap
Lit mtapLit mtap
Lit mtap
 
Linked Open (Geo)Data and the Distributed Ontology Language – a perfect match
Linked Open (Geo)Data and the Distributed Ontology Language – a perfect matchLinked Open (Geo)Data and the Distributed Ontology Language – a perfect match
Linked Open (Geo)Data and the Distributed Ontology Language – a perfect match
 
Meaning Extraction - IJCTE 2(1)
Meaning Extraction - IJCTE 2(1)Meaning Extraction - IJCTE 2(1)
Meaning Extraction - IJCTE 2(1)
 
Basics of Paper Writing and Publishing in TEL (JTEL 2013)
Basics of Paper Writing and Publishing in TEL (JTEL 2013)Basics of Paper Writing and Publishing in TEL (JTEL 2013)
Basics of Paper Writing and Publishing in TEL (JTEL 2013)
 
ACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka Collaboration
 
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
 
Possibility of interdisciplinary research software engineering andnatural lan...
Possibility of interdisciplinary research software engineering andnatural lan...Possibility of interdisciplinary research software engineering andnatural lan...
Possibility of interdisciplinary research software engineering andnatural lan...
 
Research Developments and Directions in Speech Recognition and ...
Research Developments and Directions in Speech Recognition and ...Research Developments and Directions in Speech Recognition and ...
Research Developments and Directions in Speech Recognition and ...
 

Closing the Gap: Data Models for Documentary Linguistics

  • 1. Closing the Gap: Data Models for Documentary Linguistics
      Baden Hughes
      Department of Computer Science and Software Engineering
      The University of Melbourne
      badenh@cs.mu.oz.au
  • 2. Overview
      • Overall Context
      • The Electronic Data Format Challenge
      • Common Problems
      • Data Encoding Models
          • Lexicons, interlinear texts, paradigms, syntactic trees, annotation standards, query languages
      • Linguistic Motivations vs Computational Interests
      • New Types of Data Exploration
      • Effects on Linguistic Analysis
      • New Tools
      • Conclusions
      Latrobe Uni - Linguistics Seminar - 20050505
  • 3. Overall Context
      • Large amounts of human language data continue to be managed in electronic form and analysed in fieldwork-driven linguistic documentation
      • Increasing focus on acquisition-centric methodologies, which have vastly increased the rate of growth of linguistic data
      • Reasonably static basic linguistic data structures, largely grounded in the print domain
  • 4. The Electronic Data Format Challenge
      • The methods used for the digital encoding of linguistic data are often disparate
          • Often at best reduced to native formats supported by widely-used tools such as Shoebox
      • Conversion is typically complex and lossy
          • Sometimes this can’t be predicted in advance
      • Many utility manipulation functions are required to move data between analytical applications and outputs
          • These functions are largely external to analytical environments, with some notable exceptions (e.g. regular expression manipulation)
  • 5. Common Problems
      • Despite diversity of language and analytical approach, many documentary and descriptive linguists face a common challenge: the interoperability and longevity of electronic data generated in fieldwork settings.
          • Repurposing data
          • Publishing data on the web
          • Publishing in papers
          • New analysis tools
          • New generation formats
  • 6. The Emergence of Abstract Language Data Encoding Models
      • Recently, a number of formal data encoding models for linguistic data types have emerged from projects investigating "best practice" methods for preserving linguistic data.
      • We will briefly consider models for:
          • lexicons
          • interlinear texts
          • paradigms
          • syntactic trees
          • annotation standards
          • query languages
  • 7. Data Models (1)
      • Lexicons
          • Bell & Bird (2001)
      • Interlinear Text
          • Bow, Hughes & Bird (2003)
          • Hughes, Bird & Bow (2003)
      • Linguistic Paradigms
          • Penton, Bow, Bird & Hughes (2004)
          • Penton & Bird (2004)
  • 8. Data Models (2)
      • Syntactic Trees
          • Lai & Bird (2004)
      • Annotation Standards
          • Farrar, Lewis & Langendoen (2002)
          • Farrar & Langendoen (2003)
      • Query Languages
          • Bird, Chen, Davidson, Lee & Zheng (2005)
          • Cassidy & Bird (2000)
          • Taylor (2004)
  • 9. Linguistic Motivations
      • Data models: so what?
      • It is the combined utility of these models that makes them attractive to documentary linguists
      • The challenge is to lower the barrier to use of these technologies in fieldwork and analytical contexts
      • Linguists (mostly) don’t care about the technology, they just want to do linguistics!
      • Computer scientists are generally not interested in linguistics …
  • 10. Computational Interests
      • The development of such models may be inherently interesting to computationally inclined researchers
      • Human language data encoding and annotation is genuinely interesting in computer science terms; unfortunately, basic data modelling isn't
      • Technologists have a bad habit of providing advice which is well intended but lacks traction for non-technical communities (e.g. “use XML”)
      • Many of the solutions are XML-based, but contain many more components than just XML-encoded data
  • 11. New Types of Data Exploration (1)
      • Open implemented solutions for a range of manipulations are available
      • Lexicons
          • Generation of different types of lexicons
      • Interlinear Text (see following examples …)
          • Generation of different types of interlinear text
          • Induction of morphosyntactic glossing from lexicons
          • Generation of lexicons from interlinear text
          • Enrichment of lexicons from interlinear text
  • 12. Nenets Interlinear (1)
  • 13. Nenets Interlinear (2)
  • 14. New Types of Data Exploration (2)
      • Open implemented solutions for a range of manipulations are available
      • Syntactic Trees
          • Induction of trees from interlinear text
          • Creation of interlinear text from syntactic tree drawing
          • Creation of lexicons from syntactic trees
      • Paradigms (see following examples …)
          • Generation of different types of paradigms
          • Induction of paradigms from interlinear text
          • Annotation of interlinear text from paradigms
          • Enrichment of lexicons from paradigms
  • 15. Kanarese Paradigm (1)
  • 16. Kanarese Paradigm (2)
  • 17. Effects on Linguistic Analysis
      • Integrated encoding standards for linguistic data affect the practice of linguistic analysis
          • Some analysis types are now easier
          • New possibilities emerge
          • New analytical challenges are discovered
      • Data linkage/integration is certainly one of the improvements
  • 18. New Tools
      • The next generation of tools which support these data models natively is emerging, e.g. FIELD, ELAN, Toolbox (almost)
      • “Middleware” which allows the translation of legacy formats to and from these models is reasonably widely available
      • Analytical tools are increasingly being implemented with web-grounded technologies and using web-derived models
      • Open source/open data approaches are becoming pervasive
  • 19. Conclusion
      • Reducing the gap between computationally tractable representations, on which a high degree of functionality can be built, and the simple underlying formats driven by fieldwork-oriented tools is advantageous to all parties
          • It reduces the intermediate data-munging steps, which require technical knowledge rather than linguistic knowledge
      • While we are not quite “there yet”, the light at the end of the tunnel is definitely there
      • Growing community of philosophically aligned computer scientists and linguists
  • 20. References
      • Bell & Bird, 2001. A Preliminary Study of the Structure of Lexicon Entries. Proceedings of the Workshop on Web-Based Language Documentation and Description.
      • Bow, Hughes & Bird, 2003. Towards a General Model for Interlinear Text. Proceedings of EMELD 2003.
      • Farrar, Lewis & Langendoen, 2002. A Common Ontology for Linguistic Concepts. Proceedings of the Knowledge Technologies Conference.
      • Farrar & Langendoen, 2003. A Linguistic Ontology for the Semantic Web. GLOT International 7(3).
      • Hughes, Bird & Bow, 2003. Encoding and Presenting Interlinear Text Using XML Technologies. Proceedings of ALTW 2003.
      • Lai & Bird, 2004. Querying and Updating Treebanks: A Critical Survey and Requirements Analysis. Proceedings of ALTW 2004.
      • Penton, Bow, Bird & Hughes, 2004. Towards a General Model for Linguistic Paradigms. Proceedings of EMELD 2004.
      • Penton & Bird, 2004. Representing and Rendering Linguistic Paradigms. Proceedings of ALTW 2004.
      • Bird, Chen, Davidson, Lee & Zheng, 2005. Extending XPath to Support Linguistic Queries. Proceedings of PLANX 2005.
      • Cassidy & Bird, 2000. Querying Databases of Annotated Speech. Proceedings of the Eleventh Australasian Database Conference.
      • Taylor, 2004. XSLT as a Linguistic Query Language. BSc(Hons) Thesis, University of Melbourne.
  • 21. Questions? Comments?
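
One of the manipulations named in the "New Types of Data Exploration" slides is the generation of lexicons from interlinear text. The following is a rough Python sketch of that idea only, not the implementation or the data model from the cited papers. The nested phrase/word/morpheme layout, the `build_lexicon` function, and all forms and glosses are invented placeholders for illustration.

```python
from collections import defaultdict

# Toy interlinear text: each phrase holds words, and each word is a list of
# (morpheme form, gloss) pairs. All forms and glosses are invented
# placeholders, not data from any real language.
phrases = [
    {
        "translation": "the dogs sleep",
        "words": [
            [("kanu", "dog"), ("-ti", "PL")],
            [("mole", "sleep")],
        ],
    },
    {
        "translation": "the dog eats",
        "words": [
            [("kanu", "dog")],
            [("saba", "eat")],
        ],
    },
]


def build_lexicon(phrases):
    """Collect each morpheme form with its glosses and the phrases it occurs in."""
    lexicon = defaultdict(lambda: {"glosses": set(), "occurrences": []})
    for index, phrase in enumerate(phrases):
        for word in phrase["words"]:
            for form, gloss in word:
                entry = lexicon[form]
                entry["glosses"].add(gloss)
                # Record each phrase index once, in order of first appearance.
                if index not in entry["occurrences"]:
                    entry["occurrences"].append(index)
    return dict(lexicon)


lexicon = build_lexicon(phrases)
for form in sorted(lexicon):
    entry = lexicon[form]
    print(form, sorted(entry["glosses"]), entry["occurrences"])
```

Keeping the phrase indices alongside each entry is one way to preserve the link from a lexicon entry back to its source text, which is the kind of data linkage the "Effects on Linguistic Analysis" slide points to.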