Jake Lever - University of Glasgow
Will artificial intelligence change how readers use the research literature?
Huge advances in machine learning and natural language processing are set to upend how researchers search for and consume research articles, as well as change how articles are written. These new approaches are becoming adept at summarising and rewriting text, answering questions about it, and extracting key information. These abilities will enable humans to search for information in new ways, for example through systems like ChatGPT. They are valuable tools for researchers who curate the research literature to build knowledge bases, particularly in biomedicine. Nevertheless, these approaches suffer from significant problems, including their computational cost and their tendency to confidently output incorrect information. This session will provide background on how these new methods work and discuss their benefits, challenges and potential impact.
UKSG 2023 - Will artificial intelligence change how readers use the research literature?
1. Will artificial intelligence change how readers use the research literature?
jake.lever@glasgow.ac.uk
Jake Lever, University of Glasgow
2. This talk will cover:
• Why text mining is going to get bigger and bigger in science
• An application in precision medicine
• What are these new large language models and how do they work?
• How will they fit into text mining?
• What’s next?
Overview of the talk
3. Text mining will become essential in science
• Too many papers to read! (for some areas of science)
• Artificial intelligence methods may help:
• Extract knowledge from papers
• Summarise them
• More?
image generated with Midjourney
4. • We can now gather huge amounts of data on each person
• But what does it mean?
Biomedicine is becoming a “big data” science
https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost
https://nanoporetech.com/about-us/news/oxford-nanopore-announces-ps100-million-140m-fundraising-global-investors
5. Interpreting biological experiments is getting more challenging
Gene  | Event
AP2B1 | Overexpressed
CDB10 | Deletion
PDDC1 | Point Mutation
TPSD1 | Underexpressed
WFDC5 | Promoter Methylated
BCL6B | Alternative Splicing
7. Search tools & knowledge bases are valuable
Search Tools
✅ Flexible to different queries
✅ Easier to maintain
✅ Deals well with new literature
❌ May require users to read (many) papers
❌ Cannot easily be used by automated analyses
8. Search tools & knowledge bases are valuable
Search Tools
✅ Flexible to different queries
✅ Easier to maintain
✅ Deals well with new literature
❌ May require users to read (many) papers
❌ Cannot easily be used by automated analyses
Knowledge Bases
✅ Little paper reading
✅ Structured & searchable
✅ Usable by automated analyses
❌ Only good if a KB exists for your problem
❌ Huge cost burden to create and maintain
9. Could text mining be a solution?
Can we use natural language processing (NLP) to create knowledge bases for our specific information need?
http://www.anthropologyofknowledge.nu/2015/11/10/the-challenge-of-digital-reading
10. Knowledge provenance is essential in biomedicine
Where did you get that information to make that clinical/experimental decision?
• Clinicians and scientists need to see the underlying data
• There are often legal ramifications for clinical decision making
11. An application in precision medicine
What are large language models?
Where do large language models fit?
What’s coming next?
12. An application in precision medicine
What are large language models?
Where do large language models fit?
What’s coming next?
13. Primary application: Precision medicine
“The right drug to the right
patient at the right time”
• Relies heavily on the latest research
• Groups around the world are manually reviewing literature constantly
14. Interpretation is the bottleneck of precision medicine
Good, Benjamin M., et al. "Organizing knowledge to enable personalization of medicine in cancer." Genome biology 15.8 (2014): 438.
15. • Creating a knowledge base requires expert knowledge
• Biocurators need text mining tools to help them triage the literature
Biocuration
17. Motivation
Can text-mining find relevant knowledge in the literature to assist curation?
“Our results show that the V433M mutation of the
CYP4F2 gene affects metabolism of warfarin.”
18. Step 1: Get available biomedical publications
Takeuchi, Fumihiko, et al. "A genome-wide association study confirms VKORC1, CYP2C9, and CYP4F2 as principal genetic determinants of warfarin dose." PLoS genetics 5.3 (2009): e1000433.
PubMed
PMC Open Access subset
PMC Author Manuscript
Example Paper
… After these quality control steps, a total of 1053 warfarin
patients and 325,997 GWAS SNPs were retained for analysis.
The GWAS SNPs included two SNPs not on the
HumanCNV370 array but which are highly predictive of
warfarin dose [rs9923231 (VKORC1) and rs1799853
(CYP2C9*2)] which we genotyped by TaqMan assay (Applied
Biosystems).
Defining CNV Regions
Although we retained 325,997 GWAS SNPs for association
testing of ...
19. Step 2: Find chemicals, mutations and genes
Takeuchi, Fumihiko, et al. "A genome-wide association study confirms VKORC1, CYP2C9, and CYP4F2 as principal genetic determinants of warfarin dose." PLoS genetics 5.3 (2009): e1000433.
Annotations for:
• Genes
• Mutations
• Chemicals
• Diseases
• Species
• Cell lines
Example Paper
… After these quality control steps, a total of 1053 warfarin
patients and 325,997 GWAS SNPs were retained for analysis.
The GWAS SNPs included two SNPs not on the
HumanCNV370 array but which are highly predictive of
warfarin dose [rs9923231 (VKORC1) and rs1799853
(CYP2C9*2)] which we genotyped by TaqMan assay (Applied
Biosystems).
Defining CNV Regions
Although we retained 325,997 GWAS SNPs for association
testing of ...
20. Step 3: Find sentences that mention chemical, mutation
and specific keywords
Takeuchi, Fumihiko, et al. "A genome-wide association study confirms VKORC1, CYP2C9, and CYP4F2 as principal genetic determinants of warfarin dose." PLoS genetics 5.3 (2009): e1000433.
Example Paper
… After these quality control steps, a total of 1053 warfarin
patients and 325,997 GWAS SNPs were retained for analysis.
The GWAS SNPs included two SNPs not on the
HumanCNV370 array but which are highly predictive of
warfarin dose [rs9923231 (VKORC1) and rs1799853
(CYP2C9*2)] which we genotyped by TaqMan assay (Applied
Biosystems).
Defining CNV Regions
Although we retained 325,997 GWAS SNPs for association
testing of ...
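This filtering step can be sketched with standard-library tools. The entity list, keyword list and regular expression below are illustrative stand-ins for real named-entity recognition output, not the actual PGxMine implementation:

```python
import re

# Illustrative stand-ins for NER output and curated keyword lists
chemicals = {"warfarin"}
mutation_pattern = re.compile(r"\brs\d+\b")  # dbSNP identifiers like rs9923231
keywords = {"dose", "predictive", "response"}

def is_candidate(sentence: str) -> bool:
    """Keep a sentence only if it mentions a chemical, a mutation
    and at least one pharmacogenomics keyword."""
    words = set(sentence.lower().split())
    return (bool(words & chemicals)
            and bool(mutation_pattern.search(sentence))
            and bool(words & keywords))

s = ("The GWAS SNPs included two SNPs not on the HumanCNV370 array "
     "but which are highly predictive of warfarin dose [rs9923231 (VKORC1)].")
print(is_candidate(s))  # -> True
```

A real pipeline would use trained entity recognizers rather than fixed word lists, but the co-occurrence idea is the same.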
21. Step 4: Identify pharmacogenomic relations between
each chemical/mutation pair
Takeuchi, Fumihiko, et al. "A genome-wide association study confirms VKORC1, CYP2C9, and CYP4F2 as principal genetic determinants of warfarin dose." PLoS genetics 5.3 (2009): e1000433.
Example Paper
… After these quality control steps, a total of 1053 warfarin
patients and 325,997 GWAS SNPs were retained for analysis.
The GWAS SNPs included two SNPs not on the
HumanCNV370 array but which are highly predictive of
warfarin dose [rs9923231 (VKORC1) and rs1799853
(CYP2C9*2)] which we genotyped by TaqMan assay (Applied
Biosystems).
Defining CNV Regions
Although we retained 325,997 GWAS SNPs for association
testing of ...
1000 sentences manually annotated
80/20% split for evaluation
Avg Precision: 78%
Avg Recall: 25%
22. Step 5: Convert to structured data
The GWAS SNPs included two SNPs not on the HumanCNV370 array but
which are highly predictive of warfarin dose [rs9923231 (VKORC1) and
rs1799853 (CYP2C9*2)] which we genotyped by TaqMan assay (Applied
Biosystems).
Chemical Gene Mutation PubMed ID
warfarin VKORC1 rs9923231 19300499
warfarin CYP2C9 rs1799853 19300499
warfarin CYP2C9 *2 19300499
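The structured rows above can be represented as simple records and deduplicated into unique gene/mutation pairs; this is a minimal sketch, not the project’s actual data model:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Association:
    chemical: str
    gene: str
    mutation: str
    pubmed_id: str

# The three rows extracted from the example sentence
rows = [
    Association("warfarin", "VKORC1", "rs9923231", "19300499"),
    Association("warfarin", "CYP2C9", "rs1799853", "19300499"),
    Association("warfarin", "CYP2C9", "*2", "19300499"),
]

# Deduplicate into unique gene/mutation pairs
unique_pairs = {(r.gene, r.mutation) for r in rows}
print(len(unique_pairs))  # -> 3
```

Aggregating records like these across thousands of papers yields the summary counts on the next slide.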
23. Results
# of papers: 7,170
# of sentences: 15,228
# of associations: 19,930
# of unique gene/mutation pairs: 6,099
% of associations found in full-text: 58.9
24. PGxMine Resource
• Relation extraction used to mine pharmacogenomics sentences from PubMed & PubMed Central
• Used by PharmGKB curators to prioritize new papers for curation
https://pgxmine.pharmgkb.org/
Lever, Jake, et al. "PGxMine: text mining for curation of PharmGKB." Pacific Symposium on Biocomputing 2020.
25. Successful evaluation by PharmGKB curators
Top 100 chemical/mutation associations not in PharmGKB:
• 57 lead directly to at least one curatable paper (37 to one paper, 16 to two papers, 3 to three papers, 1 to five papers)
• 24 would likely lead indirectly to curatable papers through citations
• 19 did not lead to curatable papers
83 curatable papers found directly with PGxMine in top 100 hits!
Lever, Jake, et al. "PGxMine: text mining for curation of PharmGKB." Pacific Symposium on Biocomputing 2020.
26. Now integrated into PharmGKB as automated annotations
Lever, Jake, et al. "PGxMine: text mining for curation of PharmGKB." Pacific Symposium on Biocomputing 2020.
27. An application in precision medicine
What are large language models?
Where do large language models fit?
What’s coming next?
29. Can you guess the next word?
Peter Piper picked a peck of pickled peppers,
A peck of pickled peppers Peter Piper picked;
If Peter Piper picked a peck of pickled …?
30. From previous text: “peppers” appears 2/2 times after “pickled”
Can you guess the next word?
Peter Piper picked a peck of pickled peppers,
A peck of pickled peppers Peter Piper picked;
If Peter Piper picked a peck of pickled …?
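The counting above can be sketched as a toy bigram model; this illustrates the idea of word-following statistics, not how modern neural models are actually built:

```python
from collections import Counter, defaultdict

# Count which word follows each word in the rhyme
text = ("peter piper picked a peck of pickled peppers "
        "a peck of pickled peppers peter piper picked "
        "if peter piper picked a peck of pickled").split()

follows = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    follows[prev][nxt] += 1

# "peppers" follows every completed occurrence of "pickled" so far
print(follows["pickled"].most_common(1))  # -> [('peppers', 2)]
```

Real language models replace these raw counts with probabilities learned by a neural network, but the prediction task is the same.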
31. • Look at lots of text (scraped from the internet)
• Some serious problems with bias and hate speech
• See what words follow one another
Where do these probabilities come from?
Example corpora from recent Gopher language paper
Rae, Jack W., et al. "Scaling language models: Methods, analysis & insights from training gopher." arXiv preprint arXiv:2112.11446 (2021).
32. You need more than the previous word to guess the next
it showed 9 o’clock on my ?
Some words are more important for context
33. Language models have often been named for Muppet characters:
• ELMo
• BERT
BERT used a new idea: Transformers
Language models with deep learning
https://muppet.fandom.com/wiki/Sesame_Street
34. Playing language games
Predict the next word: It showed 9 o’clock on my __________
Predict a masked word: It showed 9 _________ on my watch
Spot the corrupted word: It showed 9 o’clock on my stapler -> stapler is wrong
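These games can be illustrated with a toy scoring function; the counts below are invented purely for illustration and stand in for a trained model’s probabilities:

```python
# Invented counts for words seen after "my" (stand-in for a trained model)
counts_after_my = {"watch": 80, "phone": 15, "stapler": 1}
total = sum(counts_after_my.values())

def plausibility(word: str) -> float:
    """Probability the model assigns to this word in context."""
    return counts_after_my.get(word, 0) / total

# "Spot the corrupted word": a very unlikely word is suspicious
print(plausibility("watch") > plausibility("stapler"))  # -> True
```

Training on these games forces the model to learn which words matter for context, which is what the Transformer architecture is good at.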
35. • Building and using a language model requires huge amounts of computation
• GPUs become essential and expensive!
https://www.scan.co.uk/products/pny-nvidia-a100-80gb-hbm2-graphics-card-6912-cores-195-tflops-sp-97-tflops-dp
36. • Only large companies can build these very large language models
And scale up!
https://www.theverge.com/2023/3/13/23637675/microsoft-chatgpt-bing-millions-dollars-supercomputer-openai
Banks, Carl. The Sport of Tycoons. 1974
37. Once upon a time …
Using a language model to generate new text
Word Probability
there 65 / 100
a 22 / 100
it 11 / 100
the 3 / 100
38. Once upon a time there …
Using a language model to generate new text
Word Probability
was 61 / 100
were 23 / 100
had 3 / 100
could 1 / 100
39. Once upon a time there was a young man who lived
down by a river. He did not want to go to school so he…
Using a language model to generate new text
● Keep generating word-by-word
● Picking the most likely word doesn’t create the most interesting text
○ So pick randomly but weight by the probabilities
Word Probability
ran 22 / 100
hid 21 / 100
told 15 / 100
stole 6 / 100
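The weighted random choice described above can be sketched with the standard library, using the counts from this slide’s table:

```python
import random

# Next-word counts from the table (out of 100)
next_word_counts = {"ran": 22, "hid": 21, "told": 15, "stole": 6}

words = list(next_word_counts)
weights = list(next_word_counts.values())

# Sample the next word in proportion to its probability,
# rather than always taking the single most likely word
next_word = random.choices(words, weights=weights, k=1)[0]
print(next_word)
```

Repeating this choice at every step is how a language model generates a whole passage; the randomness is why regenerating gives a different story each time.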
41. Do language models know anything?
Word Probability
Glasgow 56 / 100
Edinburgh 35 / 100
London 7 / 100
Dundee 2 / 100
The largest city in Scotland is …
42. Do language models know anything?
Word Probability
Glasgow 56 / 100
Edinburgh 35 / 100
London 7 / 100
Dundee 2 / 100
The largest city in Scotland is …
How? The huge amount of text used to train the system must have contained text about Scottish cities
43. But, language models will hallucinate
Source: https://www.unite.ai/preventing-hallucination-in-gpt-3-and-other-complex-language-models/
44. • Stochastic parrots paper
• Fantastic summary of the limitations of large language models
• Language models encode the biases in the text used to train them
• They don’t “know” anything - they just regurgitate grammatical patterns that they’ve seen
Language models are stochastic parrots
Bender, Emily M., et al. "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?🦜." Proceedings of the 2021 ACM conference on fairness, accountability, and transparency. 2021.
45. An application in precision medicine
What are large language models?
Where do large language models fit?
What’s coming next?
46. • Prone to confident hallucination
• Probably mostly factual - but really difficult to spot mistakes
• Can’t cite their sources
• But they are very good at working with language
Asking language models directly for knowledge is tricky
47. Feed text to a large language model and ask for interpretation
48. Language is complex and large language models can summarize
Hall, Jeff M., et al. "Linkage of early-onset familial breast cancer to chromosome 17q21." Science 250.4988 (1990): 1684-1689.
49. They can also extract structured knowledge
Hall, Jeff M., et al. "Linkage of early-onset familial breast cancer to chromosome 17q21." Science 250.4988 (1990): 1684-1689.
50. • Ongoing work to understand if they are much better than existing methods
• Writing tasks as “prompts” is a bit of an art
How good are they at useful language tasks?
51. • This article does not exist
• Sometimes you get lucky and it cites real things. Other times, it may make up publications (with authors, journals, DOIs, etc.)
For better or worse, readers will ask chat systems for knowledge
52. An application in precision medicine
What are large language models?
Where do large language models fit?
What’s coming next?
53. • What happens when there is a new prime minister?
• Language models are trapped by the text they were trained on
• ChatGPT was trained on text only up to 2021
Language models get stuck in the past
The current prime minister of the UK is …
54. • Allow a language model to use a search engine to help it pick the next word
• Would enable citing of sources!
• But are the sources good?
Retrieval-enhanced language models
Guu, Kelvin, et al. "Retrieval augmented language model pre-training." International conference on machine learning. PMLR, 2020.
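The retrieval-enhanced idea can be sketched as follows. Both helper functions are hypothetical stand-ins (a real system would call an actual search index and a trained model), and the document text is an invented example:

```python
def retrieve(query: str) -> list[str]:
    # Hypothetical stand-in for a search-engine or index lookup
    return ["Rishi Sunak became UK prime minister in October 2022."]

def generate(question: str, context: list[str]) -> str:
    # Hypothetical stand-in for a language model conditioned on
    # retrieved documents; here it simply echoes the evidence
    return f"{context[0]} (source: retrieved document)"

def answer(question: str) -> str:
    # Retrieve first, then generate grounded in the retrieved text -
    # this grounding is what makes citing sources possible
    docs = retrieve(question)
    return generate(question, docs)

print(answer("Who is the current UK prime minister?"))
```

Because generation is conditioned on retrieved documents, the system can stay current and point to its evidence, though the answer is only as good as what retrieval returns.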
56. But even a language model with sources can still hallucinate
57. • Scientists cannot read all the necessary papers so text mining can help
• Large language models seem strangely good at many general tasks
• There is a lot of hype
• We need to be sceptical about their abilities and accuracy
Conclusions
Conclusions
58. Acknowledgements
Jones Lab @ UBC
● Steven Jones
● Martin Jones
● Eric Zhao
● Jasleen Grewal
● Luka Culibrk
● Melika Bonakdar
Griffith Lab @ WashU
● Obi Griffith
● Malachi Griffith
● Kilannin Krysiak
● Arpad Danos
● Jason Saliba
Helix Lab @ Stanford
● Russ Altman
● Teri Klein
● Michelle Whirl-Carrillo
● PharmGKB team
jake.lever@glasgow.ac.uk