KNOWLEDGE GRAPH
USE CASES IN
NATURAL LANGUAGE
GENERATION
Elena Simperl, @esimperl
INLG/SIGDial 2023
KNOWLEDGE
ENGINEERING
“the technical, scientific and social
aspects involved in building,
maintaining and using knowledge-
based systems.”
[Source: Wikipedia]
ORGANISING THE WORLD’S
INFORMATION
Find the right thing; get the best summary; go deeper and broader
[Source: https://blog.google/products/search/introducing-knowledge-graph-things-not/]
KNOWLEDGE GRAPHS
STORE INTERLINKED
DESCRIPTIONS OF ENTITIES
OF INTEREST IN A DOMAIN
Name and identifier
Labels and descriptions
Relationships
Links to other sources
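As a minimal sketch, the four ingredients listed above could be held in a record like the following. The `Entity` class, its field names, and the `ex:` identifiers are hypothetical, chosen for illustration; they are not the data model of Wikidata or of any other knowledge graph.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Entity:
    """One node of a knowledge graph (illustrative structure only)."""
    identifier: str                                                       # name and identifier
    labels: Dict[str, str] = field(default_factory=dict)                  # language code -> label
    descriptions: Dict[str, str] = field(default_factory=dict)            # language code -> description
    relationships: List[Tuple[str, str]] = field(default_factory=list)    # (predicate, object identifier)
    external_links: List[str] = field(default_factory=list)               # links to other sources

    def triples(self) -> Iterator[Tuple[str, str, str]]:
        """Yield each relationship as a (subject, predicate, object) triple."""
        for predicate, obj in self.relationships:
            yield (self.identifier, predicate, obj)


# Hypothetical identifiers; only the label and the relationship appear in the talk.
decius = Entity(
    identifier="ex:Decius",
    labels={"en": "Decius"},
    relationships=[("child", "ex:Hostilian")],
)
print(list(decius.triples()))
```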
KNOWLEDGE-GRAPH-
LIKE ARTIFACTS HAVE
BEEN AROUND FOR
DECADES
NOT YOUR USUAL GOFAI
• Orders of magnitude higher scales than
GOFAI knowledge bases
• Simple knowledge representation, no
formal semantics
• Vocabulary reuse, networks of small
modular vocabularies
• Incomplete, inconsistent, always changing
• Built via human-AI pipelines (w/ ETL,
information extraction etc.)
• Many large open-source projects with
strong communities
• Knowledge graph services for developers
[Source: Noy et al., 2019]
NOT YOUR USUAL
LLM EITHER
Content-wise some overlap, but different
paradigm: knowledge engineering,
decentralized data publishing i.e., identifiers,
reusable schemas, interlinking
More related to semantic networks, frames,
rule-based NLP than LLM
Knowledge graphs are a common source of
embeddings in AI systems
KNOWLEDGE GRAPHS AND NLP
[Source: Schneider et al., 2019]
CLOSING THE
DATA DIVIDE
Natural language generation helps users with
diverse levels of digital literacy to share their
knowledge
BACKGROUND: WIKIDATA
Collaborative knowledge graph,
Wikimedia project (2012)
23k active users, 106m items, 1.9b
edits
Open license
RDF support, links to LOD cloud
BACKGROUND:
ARTICLEPLACEHOLDER
mediawiki.org/wiki/Extension:ArticlePlaceholder
Wikipedia is available in 300+
languages, but content is unevenly
distributed
Wikidata is cross-lingual, but less
accessible to edit than Wikipedia
ArticlePlaceholders display Wikidata
triples as stubs for articles in
underserved Wikipedias
Currently deployed in 14 Wikipedias
NEURAL NETWORK
TRAINED ON
WIKIDATA/WIKIPEDIA
Feed-forward architecture encodes
triples from the ArticlePlaceholder
into vector of fixed dimensionality
RNN-based decoder generates text
summaries, one token at a time
Optimisations for different entity
verbalisations, rare entities etc.
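The encoder-decoder flow described above can be sketched in a few lines. This is a toy under stated assumptions, not the system from the talk: the weights are random and untrained, the per-triple vectors are mean-pooled into the fixed-size vector, the decoder is a plain tanh RNN with greedy decoding, and the vocabulary and all names (`encode`, `decode`, `DIM`, `VOCAB`) are hypothetical.

```python
import math
import random

random.seed(0)
DIM = 8  # size of the fixed-dimensional vector (assumed)
VOCAB = ["<start>", "<end>", "<rare>", "[[S]]", "is", "a", "city", "in", "[[O]]"]


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


EMBED = {}


def embed(symbol):
    """Look up (or lazily create) a random embedding for an entity, predicate or token."""
    if symbol not in EMBED:
        EMBED[symbol] = [random.uniform(-0.5, 0.5) for _ in range(DIM)]
    return EMBED[symbol]


W_ENC = rand_matrix(DIM, 3 * DIM)    # feed-forward encoder layer
W_HH = rand_matrix(DIM, DIM)         # decoder: hidden-to-hidden
W_XH = rand_matrix(DIM, DIM)         # decoder: input-to-hidden
W_OUT = rand_matrix(len(VOCAB), DIM)  # decoder: hidden-to-vocabulary


def encode(triples):
    """Map a non-empty list of (s, p, o) triples to one vector of length DIM."""
    per_triple = []
    for s, p, o in triples:
        x = embed(s) + embed(p) + embed(o)  # list concatenation, length 3 * DIM
        per_triple.append([math.tanh(z) for z in matvec(W_ENC, x)])
    return [sum(column) / len(per_triple) for column in zip(*per_triple)]


def decode(context, max_len=10):
    """Generate tokens one at a time, greedily, until <end> or max_len."""
    hidden, token, output = context, "<start>", []
    for _ in range(max_len):
        pre = [a + b for a, b in zip(matvec(W_HH, hidden), matvec(W_XH, embed(token)))]
        hidden = [math.tanh(z) for z in pre]
        logits = matvec(W_OUT, hidden)
        token = VOCAB[max(range(len(VOCAB)), key=logits.__getitem__)]
        if token == "<end>":
            break
        output.append(token)
    return output


context = encode([("ex:S", "instance of", "ex:City"), ("ex:S", "country", "ex:O")])
print(len(context), decode(context))
```

Because the weights are untrained, the generated tokens are meaningless; the sketch only shows that any number of triples is reduced to one vector of fixed size before token-by-token decoding starts.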
RESEARCH
QUESTIONS
RQ1 Can we train a neural
network to generate text from
triples in a multilingual setting?
RQ2 How do editors perceive the
generated text on the
ArticlePlaceholder page?
RQ3 How do editors use the
generated sentence in their
work?
RQ1
Data: Metrics and survey answers
Methods: Metrics-based evaluation (BLEU 2, BLEU 3, BLEU 4, METEOR and ROUGE-L), scores (perceived fluency, appropriateness)
Participants: Readers of Arabic and Esperanto Wikipedias
RQ2
Data: Interviews
Methods: Task-based evaluation, thematic analysis
Participants: Arabic, Persian, Indonesian, Hebrew, Swedish Wikipedia editors
RQ3
Data: Interviews and text reuse metrics
Methods: Task-based evaluation, thematic analysis, metrics-based evaluation
Participants: Arabic, Persian, Indonesian, Hebrew, Swedish Wikipedia editors
RQ1 METRICS-BASED EVALUATION
Trained on corpus of Wikipedia sentences
and corresponding Wikidata triples (205k
Arabic; 102k Esperanto)
Tested against three baselines: machine
translation (MT), template retrieval
(TR, TRext) and a 5-gram Kneser-Ney language
model (KN)
Using standard metrics: BLEU, METEOR,
ROUGE-L
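To make the BLEU n scores concrete, here is a minimal sketch of sentence-level BLEU: clipped n-gram precision, geometric mean, and brevity penalty. It assumes a single reference, pre-tokenised input and no smoothing; the function names are hypothetical, and published evaluations normally rely on an established corpus-level implementation rather than code like this.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU-max_n against one reference, without smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1 - r / c)
    return brevity * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat down".split()
print(bleu(reference, reference, max_n=2))
print(bleu("the cat sat".split(), reference, max_n=2))
```

In the second call every unigram and bigram of the candidate occurs in the reference, so the score is determined by the brevity penalty alone.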
RQ1: METRICS-
BASED
EVALUATION
Approach outperforms
baselines and generalises
across domains
Property placeholders
improve performance
RQ1: PERCEIVED
FLUENCY AND
APPROPRIATENESS OF
TEXT
54 participants, frequent readers of
Arabic (n=27) and Esperanto (n=27)
Wikipedia
60 summaries (30 automatically
generated, 15 news, 15 Wikipedia
sentences from the training corpus)
Esperanto news from Le Monde
Diplomatique, Arabic news from BBC
Arabic
Arabic: ~400 annotations (each),
Esperanto: ~230 annotations (each)
RQ1: PERCEIVED FLUENCY AND
APPROPRIATENESS OF TEXT
Participants could tell the
difference between news and
Wikipedia content.
AI can produce Wikipedia-like
text
Higher standard deviation in
automatically generated text
than genuine content
RQ2/3: TASK-
BASED
EVALUATION
10 experienced editors from 6 Wikipedias
(average tenure ~9 years) who have also edited
English Wikipedia; semi-structured interviews
Arabic sentences produced by the neural
network, synthetic sentences for the other
languages
Removed up to 2 words per sentence,
emulating the behaviour of the neural network
(concept in native language, related entity not
connected in Wikidata)
Editors were asked to write 2-3 sentences
RQ2: TASK-BASED EVALUATION
Under-resourced Wikipedia editing: Useful summaries, particularly for
non-native speakers
Provenance, transparency: Readers assumed generated text was from a
Wikipedia in another language rather than AI generated
Length of text: A one-sentence article signalled that the article needs work;
longer snippets should match reading practice
Importance of text: People looked at the text first rather than the triples.
Text added context to triples, seeing the text reassured people that they
landed on the right page
RQ3 TASK-BASED
AND METRICS-
BASED EVALUATION
The snippets were heavily used
All participants reused them at least
partially.
8 of the 10 edited texts were wholly derived
from the automatically generated text and
the other 2 were partially derived
<rare> tokens led participants to
discard the whole sentence
Hallucinations remained undetected
even by participants with domain
knowledge
gstscore: based on the disjoint longest sequences of tokens in the
edited text that also exist in the source text.
Informed by vandalism detection metrics in
Wikipedia
Wholly Derived (WD): gstscore >= 0.66
Partially Derived (PD): 0.66 > gstscore >= 0.33
Non-Derived (ND): 0.33 > gstscore
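The tiling idea and the thresholds above can be sketched as follows. This is a simplified greedy string tiling, assuming lower-cased whitespace tokenisation, a minimum tile length of 2 tokens, and normalisation by the length of the generated (source) text; all three are assumptions, and the names `tiled_length`, `gst_score` and `classify` are hypothetical.

```python
def tiled_length(source, edited, min_match=2):
    """Total length of disjoint longest common token runs (tiles)."""
    src_used = [False] * len(source)
    edt_used = [False] * len(edited)
    total = 0
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i in range(len(source)):
            for j in range(len(edited)):
                k = 0
                # Extend the match while tokens agree and are not yet tiled.
                while (i + k < len(source) and j + k < len(edited)
                       and not src_used[i + k] and not edt_used[j + k]
                       and source[i + k] == edited[j + k]):
                    k += 1
                if k > best_len:
                    best_len, best_i, best_j = k, i, j
        if best_len < min_match:
            break
        for k in range(best_len):
            src_used[best_i + k] = True
            edt_used[best_j + k] = True
        total += best_len
    return total


def gst_score(source_text, edited_text, min_match=2):
    """Share of source tokens covered by tiles (normalisation is an assumption)."""
    source = source_text.lower().split()
    edited = edited_text.lower().split()
    return tiled_length(source, edited, min_match) / len(source) if source else 0.0


def classify(score):
    """Apply the WD / PD / ND thresholds from the slide."""
    if score >= 0.66:
        return "WD"
    if score >= 0.33:
        return "PD"
    return "ND"


score = gst_score("a b c d e f", "x a b c y e f z")
print(round(score, 3), classify(score))
```

In the usage line the tiles are "a b c" and "e f", so five of the six source tokens are covered.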
ASSURING THE
INTEGRITY OF
KNOWLEDGE
Verbalisation as a machine format
to verify triples against references
PROVENANCE
MATTERS
Wikidata is widely used
beyond WikiProjects
It is a secondary source of
knowledge
References should be
accessible, relevant,
authoritative
ARE KNOWLEDGE CLAIMS SUPPORTED BY
THEIR REFERENCE?
CLAIM VERBALISATION
Train a model to convert Wikidata triples into
natural language phrases
• Contextualises predicates
• Formats entity labels as they would appear in sources
Wikidata provides entity labels and multiple
aliases
• Multiple verbalisations can be supported
• Preferred predicate aliases can be set according to entity types
Measure quality of verbalisation
• Crowdsourcing rather than algorithmic metrics (e.g., BLEU, METEOR,
ROUGE)
• Fluency (0-5 scale) and adequacy (Yes or No)
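The idea of setting preferred predicate aliases according to entity type can be illustrated with a small template-based stand-in. It is not the trained verbalisation model described above: the alias table, the entity type names and the `verbalise` function are hypothetical, and a learned model would also handle agreement, dates and label formatting.

```python
# Hypothetical alias table: predicate label -> {entity type -> preferred phrasing}.
PREDICATE_ALIASES = {
    "time of discovery or invention": {
        "asteroid": "was discovered on",
        "default": "was discovered or invented on",
    },
    "child": {"default": "is the parent of"},
}


def verbalise(subject, predicate, obj, subject_type="default"):
    """Turn one (subject, predicate, object) triple into a short sentence."""
    aliases = PREDICATE_ALIASES.get(predicate, {})
    # Prefer the type-specific alias, then the default, then a generic pattern.
    phrase = aliases.get(subject_type) or aliases.get("default") or f"has {predicate}"
    return f"{subject} {phrase} {obj}."


print(verbalise("(15976) 1998 FY119", "time of discovery or invention",
                "20/03/1998", subject_type="asteroid"))
print(verbalise("Decius", "child", "Hostilian"))
```

Choosing the alias by entity type is what keeps an asteroid from being described as "invented", and phrasing "child" from the parent's side avoids reversing the direction of the relationship.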
(Chandler Fashion Center, directions, Highway 101 & Highway 202)
[Screenshot: Wikivoyage page for Chandler (Arizona), revision of 13 October 2019, with the listing "Chandler Fashion Center (Chandler Mall), 3111 W Chandler Blvd (Highway 101 & Highway 202)"]
DATASETS
WebNLG (DBpedia classes)
• Training and validation: Airport, Astronaut,
Building, City, ComicsCharacter, Food,
Monument, SportsTeam, University,
WrittenWork
• Testing: Athlete, Artist, CelestialBody,
MeanOfTransportation, Politician
WDV (Wikidata classes)
• Testing: WebNLG classes mapped to Wikidata
• Plus: ChemicalCompound, Mountain, Painting,
Street, Taxon
CROWDSOURCING RESULTS
Fluency: resembles text written by humans
Adequacy: text keeps meaning of triples
590 workers
• Most did 1 task; some did up to 168 tasks
Inter-annotator reliability (Krippendorff’s Alpha)
• Fluency: 0.427 (Moderate)
• Adequacy: 0.458 (Moderate)
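For readers unfamiliar with the reliability figure, here is a minimal sketch of Krippendorff's Alpha built from a coincidence matrix. It assumes nominal disagreement (two ratings either match or they do not); the slides do not say which distance function was used, and an ordinal one may be more natural for the 0-5 fluency scale. The function name and the toy ratings are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """units: one list of ratings per item; None marks a missing rating."""
    coincidences = Counter()
    for ratings in units:
        ratings = [r for r in ratings if r is not None]
        m = len(ratings)
        if m < 2:
            continue  # items with fewer than two ratings carry no information
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1.0 - (n - 1) * observed / expected


print(krippendorff_alpha_nominal([["yes", "yes"], ["no", "no"]]))
print(krippendorff_alpha_nominal([["yes", "no"], ["yes", "no"]]))
```

The first call shows perfect agreement and the second shows systematic disagreement, which yields a negative value.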
[Table columns: Mean Fluency, Median Fluency, Adequacy]
Fluency scores
0. Incomprehensible text
1. Barely understandable text with significant grammatical errors
2. Understandable text with moderate grammatical errors
3. Comprehensible text with minor grammatical errors
4. Comprehensible and grammatically correct text that still reads as artificial
5. Comprehensible and grammatically correct text that feels natural
BAD
FLUENCY
Information syntactically hard to understand
1-[(3S,9S,10S)-12-[(2R)-1-hydroxypr... → stereoisomer of → 1-
[(3R,9R,10S)-12-[(2R)-1-hydroxypr…
1-[(3R,9R,10S)-12-[(2R)-1-hydroxypr... is
Redundant information
Bydgoszcz → flag → flag of Bydgoszcz
The flag of Bydgoszcz is the flag of Bydgoszcz
Loosely defined predicates
(15976) 1998 FY119 → time of discovery or invention →
20/03/1998 | 1.6
(15976) 1998 FY119 was invented on 20/03/1998
Predicates reliant on qualifiers
Pseudochaete → different from → Pseudochaete
Pseudochaete is different from Pseudochaete
BAD
ADEQUACY
Information syntactically hard to
understand
(182176) 2000 SM250 → time of discovery or invention
→ 24/09/2000
The invention of the SM250 was made on 24/09/2000
and was discovered on 182176.
Redundant information
Gru → discography → Gru discography
The discography of Gru is extensive.
Predicate labels too broad
barrel wine → facet of → barrel
The facet of barrel wine is the same.
Predicates lacking specificity
Decius → child → Hostilian
Decius is a child of Hostilian.
figshare.com/articles/dataset/WDV/17159045/1
github.com/gabrielmaia7/WDV
github.com/mlcommons/croissant/
SUMMARY
Wikidata is the data backbone of
Wikipedia
Verbalising it can help bootstrap
articles in under-resourced languages
More empirical research is needed to
understand the level of assistance and
the level of transparency required
when writing with AI
SUMMARY
Knowledge graphs are curated, trusted
sources of knowledge, which can augment
LLMs to reduce hallucinations, facilitate
answer attribution, support ethical alignment
Their knowledge integrity must be
guaranteed
Natural language generation can help verify
knowledge claims against diverse sources
WHAT’S
NEXT
Conversational generative AI
as a tool to create, curate,
access knowledge graphs’
content… responsibly?
Thanks to: Gabriel Amaral, Jonathan Hare, Lucie
Kaffee, Odinaldo Rodrigues, Pavlos Vougiouklis
Kaffee, L. A., Vougiouklis, P., & Simperl, E. (2022). Using
natural language generation to bootstrap missing
Wikipedia articles: A human-centric perspective. Semantic
Web, 13(2), 163-194.
Amaral, G., Rodrigues, O., & Simperl, E. (2022,
October). WDV: A Broad Data Verbalisation Dataset
Built from Wikidata. In International Semantic Web
Conference (pp. 556-574). Cham: Springer
International Publishing.
Amaral, G., Rodrigues, O., & Simperl, E. ProVe: A
Pipeline for Automated Provenance Verification of
Knowledge Graphs Against Textual Sources. To appear
in Semantic Web,

 
High-value datasets: from publication to impact
High-value datasets: from publication to impactHigh-value datasets: from publication to impact
High-value datasets: from publication to impact
 
The story of Data Stories
The story of Data StoriesThe story of Data Stories
The story of Data Stories
 
The human face of AI: how collective and augmented intelligence can help sol...
The human face of AI:  how collective and augmented intelligence can help sol...The human face of AI:  how collective and augmented intelligence can help sol...
The human face of AI: how collective and augmented intelligence can help sol...
 
Qrowd and the city: designing people-centric smart cities
Qrowd and the city: designing people-centric smart citiesQrowd and the city: designing people-centric smart cities
Qrowd and the city: designing people-centric smart cities
 
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...
 
Qrowd and the city
Qrowd and the cityQrowd and the city
Qrowd and the city
 
Inclusive cities: a crowdsourcing approach
Inclusive cities: a crowdsourcing approachInclusive cities: a crowdsourcing approach
Inclusive cities: a crowdsourcing approach
 
Building better knowledge graphs through social computing
Building better knowledge graphs through social computingBuilding better knowledge graphs through social computing
Building better knowledge graphs through social computing
 

Último

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Último (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Knowledge graph use cases in natural language generation

  • 1. KNOWLEDGE GRAPH USE CASES IN NATURAL LANGUAGE GENERATION Elena Simperl, @esimperl INLG/SIGDial 2023
  • 2. KNOWLEDGE ENGINEERING “the technical, scientific and social aspects involved in building, maintaining and using knowledge-based systems.” [Source: Wikipedia]
  • 4. ORGANISING THE WORLD’S INFORMATION Find the right thing Get the best summary Go deeper and broader [Source: https://blog.google/products/search/introducing-knowledge-graph-things-not/]
  • 5. KNOWLEDGE GRAPHS STORE INTERLINKED DESCRIPTIONS OF ENTITIES OF INTEREST IN A DOMAIN Name and identifier; Labels and descriptions; Relationships; Links to other sources
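The four ingredients listed on slide 5 (identifier, labels and descriptions, relationships, links to other sources) can be pictured as a small record type. The following is a minimal sketch only, not how Wikidata or any other knowledge graph actually stores its data: the class and field names (`Entity`, `Statement`, `links`) are illustrative assumptions, and the description string is made up for the example. The identifiers are real Wikidata ones: Q42 (Douglas Adams), P31 (instance of), Q5 (human).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    """One relationship, subject --predicate--> object, expressed with identifiers."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Entity:
    """Hypothetical in-memory view of a single knowledge graph entity."""
    identifier: str
    labels: dict[str, str] = field(default_factory=dict)        # language code -> name
    descriptions: dict[str, str] = field(default_factory=dict)  # language code -> short gloss
    statements: list[Statement] = field(default_factory=list)   # relationships to other entities
    links: list[str] = field(default_factory=list)              # links to other sources

    def label(self, lang: str, fallback: str = "en") -> str:
        """Return the label in `lang`, else the fallback language, else the bare identifier."""
        return self.labels.get(lang) or self.labels.get(fallback) or self.identifier


adams = Entity(
    identifier="Q42",
    labels={"en": "Douglas Adams"},
    descriptions={"en": "English writer and humorist"},  # illustrative wording
    statements=[Statement("Q42", "P31", "Q5")],          # Q42 is an instance of human
    links=["https://en.wikipedia.org/wiki/Douglas_Adams"],
)
```

The `label` fallback mirrors the multilingual situation described later in the talk: an entity may have a label in one language and none in another, in which case only the identifier is left to show.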
  • 7. NOT YOUR USUAL GOFAI • Orders of magnitude higher scales than GOFAI knowledge bases • Simple knowledge representation, no formal semantics • Vocabulary reuse, networks of small modular vocabularies • Incomplete, inconsistent, always changing • Built via human-AI pipelines (w/ ETL, information extraction etc.) • Many large open-source projects with strong communities • Knowledge graph services for developers [Source: Noy et al., 2019]
  • 8. NOT YOUR USUAL LLM EITHER Content-wise some overlap, but different paradigm: knowledge engineering, decentralized data publishing i.e., identifiers, reusable schemas, interlinking More related to semantic networks, frames, rule-based NLP than LLM Knowledge graphs are a common source of embeddings in AI systems
  • 9. KNOWLEDGE GRAPHS AND NLP [Source: Schneider et al., 2019]
  • 10. CLOSING THE DATA DIVIDE Natural language generation helps users with diverse levels of digital literacy to share their knowledge
  • 11. BACKGROUND: WIKIDATA Collaborative knowledge graph, Wikimedia project (2012) 23k active users, 106m items, 1.9b edits Open license RDF support, links to LOD cloud
  • 13. BACKGROUND: ARTICLEPLACEHOLDER MEDIAWIKI.ORG/WIKI/EXTENSION:ARTICLEPLACEHOLDER Wikipedia is available in 300+ languages, but content is unevenly distributed Wikidata is cross-lingual, but less accessible to edit than Wikipedia ArticlePlaceholders display Wikidata triples as stubs for articles in underserved Wikipedias Currently deployed in 14 Wikipedias
  • 14. NEURAL NETWORK TRAINED ON WIKIDATA/WIKIPEDIA A feed-forward architecture encodes triples from the ArticlePlaceholder into a vector of fixed dimensionality. An RNN-based decoder generates text summaries, one token at a time. Optimisations for different entity verbalisations, rare entities etc.
  • 15. RESEARCH QUESTIONS RQ1: Can we train a neural network to generate text from triples in a multilingual setting? RQ2: How do editors perceive the generated text on the ArticlePlaceholder page? RQ3: How do editors use the generated sentence in their work?
      RQ1. Data: metrics and survey answers. Methods: metrics-based evaluation (BLEU 2, BLEU 3, BLEU 4, METEOR and ROUGE-L), scores (perceived fluency, appropriateness). Participants: readers of Arabic and Esperanto Wikipedias.
      RQ2. Data: interviews. Methods: task-based evaluation, thematic analysis. Participants: Arabic, Persian, Indonesian, Hebrew, Swedish Wikipedia editors.
      RQ3. Data: interviews and text reuse metrics. Methods: task-based evaluation, thematic analysis, metrics-based evaluation. Participants: Arabic, Persian, Indonesian, Hebrew, Swedish Wikipedia editors.
  • 16. RQ1: METRICS-BASED EVALUATION Trained on a corpus of Wikipedia sentences and corresponding Wikidata triples (205k Arabic; 102k Esperanto). Tested against three kinds of baseline: machine translation (MT), template retrieval (TR, TRext), and a 5-gram Kneser-Ney language model (KN). Using standard metrics: BLEU, METEOR, ROUGE-L
  • 17. RQ1: METRICS-BASED EVALUATION Approach outperforms baselines and generalises across domains Property placeholders improve performance
  • 18. RQ1: PERCEIVED FLUENCY AND APPROPRIATENESS OF TEXT 54 participants, frequent readers of Arabic (n=27) and Esperanto (n=27) Wikipedia 60 summaries (30 automatically generated, 15 news, 15 Wikipedia sentences from the training corpus) Esperanto news from Le Monde Diplomatique, Arabic news from BBC Arabic Arabic: ~400 annotations (each), Esperanto: ~230 annotations (each)
  • 19. RQ1: PERCEIVED FLUENCY AND APPROPRIATENESS OF TEXT Participants could tell the difference between news and Wikipedia content. AI can produce Wikipedia-like text Higher standard deviation in automatically generated text than genuine content
  • 20. RQ2/3: TASK-BASED EVALUATION 10 experienced editors from 6 Wikipedias (average tenure ~9 years), semi-structured interviews, have edited English Wikipedia. Arabic sentences produced by the neural network, synthetic sentences for the other languages. Removed up to 2 words per sentence, emulating the behaviour of the neural network (concept in native language, related entity not connected in Wikidata). Editors were asked to write 2-3 sentences
  • 21. RQ2: TASK-BASED EVALUATION Under-resourced Wikipedia editing: useful summaries, particularly for non-native speakers. Provenance, transparency: readers assumed the generated text came from a Wikipedia in another language rather than being AI-generated. Length of text: a one-sentence article signalled that the article needs work; longer snippets should match reading practice. Importance of text: people looked at the text first rather than the triples. Text added context to triples; seeing the text reassured people that they had landed on the right page
  • 22. RQ3: TASK-BASED AND METRICS-BASED EVALUATION The snippets were heavily used. All participants reused them at least partially: 8 of their texts were wholly derived and the other 2 were partially derived from the automatically generated text. <rare> tokens led participants to discard the whole sentence. Hallucinations remained undetected even by participants with domain knowledge. Text reuse metric (gstscore): disjoint longest sequences of tokens in the edited text that exist in the source text, informed by vandalism detection metrics in Wikipedia. Wholly Derived (WD): gstscore >= 0.66; Partially Derived (PD): 0.66 > gstscore >= 0.33; Non-Derived (ND): 0.33 > gstscore
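To make the WD/PD/ND thresholds on slide 22 concrete, here is a rough sketch of a tiling-based reuse score. It is not the authors' implementation. It assumes whitespace-separated tokens, a minimum tile length of one token, and a score normalised by the length of the generated (source) sentence; the slide specifies only the thresholds, so the function names (`tiled_tokens`, `gst_score`, `derivation_class`) and those choices are assumptions.

```python
def tiled_tokens(source, edited, min_match=1):
    """Count source tokens covered by disjoint, greedily chosen longest common runs."""
    used_s = [False] * len(source)
    used_e = [False] * len(edited)
    covered = 0
    while True:
        best_len, matches = 0, []
        # Find the longest runs of still-unused tokens shared by both texts.
        for i in range(len(source)):
            for j in range(len(edited)):
                k = 0
                while (i + k < len(source) and j + k < len(edited)
                       and source[i + k] == edited[j + k]
                       and not used_s[i + k] and not used_e[j + k]):
                    k += 1
                if k > best_len:
                    best_len, matches = k, [(i, j)]
                elif k == best_len and k > 0:
                    matches.append((i, j))
        if best_len == 0 or best_len < min_match:
            return covered
        for i, j in matches:
            # Skip candidates that overlap a tile placed earlier in this round.
            if any(used_s[i:i + best_len]) or any(used_e[j:j + best_len]):
                continue
            for k in range(best_len):
                used_s[i + k] = used_e[j + k] = True
            covered += best_len


def gst_score(source_text, edited_text):
    """Share of the generated sentence that survives in the editor's text (assumed normalisation)."""
    source, edited = source_text.split(), edited_text.split()
    return tiled_tokens(source, edited) / len(source) if source else 0.0


def derivation_class(score):
    """Map a score to the classes on the slide: WD >= 0.66 > PD >= 0.33 > ND."""
    if score >= 0.66:
        return "WD"
    if score >= 0.33:
        return "PD"
    return "ND"
```

For example, `derivation_class(gst_score(generated, edited))` labels one editor's text against the sentence they were shown. Raising `min_match` would stop isolated common words from counting as reuse, at the cost of a lower score for lightly reworded text.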
  • 23. ASSURING THE INTEGRITY OF KNOWLEDGE Verbalisation as a machine format to verify triples against references
  • 25. PROVENANCE MATTERS Wikidata is widely used beyond WikiProjects It is a secondary source of knowledge References should be accessible, relevant, authoritative
  • 26. ARE KNOWLEDGE CLAIMS SUPPORTED BY THEIR REFERENCE?
  • 28. CLAIM VERBALISATION Train a model to convert Wikidata triples into natural language phrases • Contextualises predicates • Formats entity labels as they would appear in sources Wikidata provides entity labels and multiple aliases • Multiple verbalisations can be supported • Preferred predicate aliases can be set according to entity types Measure quality of verbalisation • Crowdsourcing rather than algorithmic metrics (e.g., BLEU, METEOR, ROUGE) • Fluency (0-5 scale) and adequacy (Yes or No)
  • 29. (Chandler Fashion Center, directions, Highway 101 & Highway 202) [Screenshot: Wikivoyage article on Chandler (Arizona), revision of 13 October 2019, whose listing for Chandler Fashion Center (3111 W Chandler Blvd) gives the directions "Highway 101 & Highway 202"]
  • 35. DATASETS WebNLG (DBpedia classes) • Training and validation: Airport, Astronaut, Building, City, ComicsCharacter, Food, Monument, SportsTeam, University, WrittenWork • Testing: Athlete, Artist, CelestialBody, MeanOfTransportation, Politician WDV (Wikidata classes) • Testing: WebNLG classes mapped to Wikidata • Plus: ChemicalCompound, Mountain, Painting, Street, Taxon
  • 37. CROWDSOURCING RESULTS Fluency: resembles text written by humans Adequacy: text keeps meaning of triples 590 workers • Most did 1 task; some did up to 168 tasks Inter-annotator reliability (Krippendorff’s Alpha) • Fluency: 0.427 (Moderate) • Adequacy: 0.458 (Moderate)
  • 38. [Chart: mean fluency, median fluency, adequacy] Fluency scores: 0. Incomprehensible text 1. Barely understandable text with significant grammatical errors 2. Understandable text with moderate grammatical errors 3. Comprehensible text with minor grammatical errors 4. Comprehensible and grammatically correct text that still reads artificial 5. Comprehensible and grammatically correct text that feels natural
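The three quantities named on slide 38 (mean fluency, median fluency, adequacy) can be derived from raw crowd judgements roughly as follows. This is a sketch under assumptions: the function name `aggregate_ratings` is hypothetical, fluency is taken as a number on the 0-5 scale above, and adequacy is reported as the share of "Yes" votes, which the slides do not spell out.

```python
from statistics import mean, median


def aggregate_ratings(fluency_scores, adequacy_votes):
    """Summarise crowd judgements for one verbalisation.

    fluency_scores: numbers on the 0-5 scale, one per worker.
    adequacy_votes: booleans, True meaning the text keeps the meaning of the triple.
    """
    if not fluency_scores or not adequacy_votes:
        raise ValueError("need at least one fluency score and one adequacy vote")
    if any(not 0 <= score <= 5 for score in fluency_scores):
        raise ValueError("fluency scores must lie on the 0-5 scale")
    yes_votes = sum(1 for vote in adequacy_votes if vote)
    return {
        "mean_fluency": mean(fluency_scores),
        "median_fluency": median(fluency_scores),
        "adequacy": yes_votes / len(adequacy_votes),  # assumed: share of "Yes" answers
    }


summary = aggregate_ratings([5, 4, 4, 2, 5], [True, True, False, True, True])
```

Reporting the median next to the mean is useful here because a single very low rating, as in the example input, pulls the mean down without moving the median.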
  • 39. BAD FLUENCY Information syntactically hard to understand: 1-[(3S,9S,10S)-12-[(2R)-1-hydroxypr... → stereoisomer of → 1-[(3R,9R,10S)-12-[(2R)-1-hydroxypr…, verbalised as "1-[(3R,9R,10S)-12-[(2R)-1-hydroxypr... is". Redundant information: Bydgoszcz → flag → flag of Bydgoszcz, verbalised as "The flag of Bydgoszcz is the flag of Bydgoszcz". Loosely defined predicates: (15976) 1998 FY119 → time of discovery or invention → 20/03/1998 | 1.6, verbalised as "(15976) 1998 FY119 was invented on 20/03/1998". Predicates reliant on qualifiers: Pseudochaete → different from → Pseudochaete, verbalised as "Pseudochaete is different from Pseudochaete".
  • 40. BAD ADEQUACY Information syntactically hard to understand: (182176) 2000 SM250 → time of discovery or invention → 24/09/2000, verbalised as "The invention of the SM250 was made on 24/09/2000 and was discovered on 182176." Redundant information: Gru → discography → Gru discography, verbalised as "The discography of Gru is extensive." Predicate labels too broad: barrel wine → facet of → barrel, verbalised as "The facet of barrel wine is the same." Predicates lacking specificity: Decius → child → Hostilian, verbalised as "Decius is a child of Hostilian."
  • 43. SUMMARY Wikidata is the data backbone of Wikipedia. Verbalising it can help bootstrap articles in under-resourced languages. More empirical research is needed to understand the level of assistance needed when writing with AI, the level of transparency required, etc.
  • 44. SUMMARY Knowledge graphs are curated, trusted sources of knowledge, which can augment LLMs to reduce hallucinations, facilitate answer attribution, support ethical alignment Their knowledge integrity must be guaranteed Natural language generation can help verify knowledge claims against diverse sources
  • 45. WHAT’S NEXT Conversational generative AI as a tool to create, curate, access knowledge graphs’ content… responsibly? Thanks to: Gabriel Amaral, Jonathan Hare, Lucie Kaffee, Odinaldo Rodrigues, Pavlos Vougiouklis
  • 46. Kaffee, L. A., Vougiouklis, P., & Simperl, E. (2022). Using natural language generation to bootstrap missing Wikipedia articles: A human-centric perspective. Semantic Web, 13(2), 163-194.
  • 47. Amaral, G., Rodrigues, O., & Simperl, E. (2022, October). WDV: A Broad Data Verbalisation Dataset Built from Wikidata. In International Semantic Web Conference (pp. 556-574). Cham: Springer International Publishing. Amaral, G., Rodrigues, O., & Simperl, E. ProVe: A Pipeline for Automated Provenance Verification of Knowledge Graphs Against Textual Sources. To appear in Semantic Web,