SlideShare a Scribd company logo
1 of 61
Download to read offline
Bioevo seminars




The true story behind the annotation of
               a pathway
             Giovanni Dall'Olio,
             IBE (UPF-CEXS)
Summary of the talk
โ—   We have recently published two works:
    โ—   Dall'Olio GM, Jassal B, Montanucci L, Gagneux P, Bertranpetit J,
        Laayouni H. The annotation of the Asparagine N-linked
        Glycosylation pathway in the Reactome Database. Glycobiology.
        2011 Jan 2. PubMed PMID: 21199820.
    โ—   Dall'Olio GM, Bertranpetit J, Laayouni H. The annotation and the
        usage of scientific databases could be improved with public issue
        tracker software. Database (Oxford). 2010 Dec 23;2010:baq035. Print
        2010. PubMed PMID: 21186182; PubMed Central PMCID:
        PMC3011984.
โ—   One is about the annotation of a pathway in a
    database, the other about reporting errors to
    databases
What can you learn from this talk
โ—   The data annotated in scientific databases is
    not perfect, and can contain errors. Even when
    is correct, it can have multiple interpretations.
What can you learn from this talk
โ—   The data annotated in scientific databases is
    not perfect, and can contain errors. Even when
    is correct, it can have multiple interpretations.
โ—   Errors don't get fixed by themselves, and
    problems don't get solved alone. When you find
    something wrong, it is your duty to report it.
What is N-Glycosylation?
โ—   One of the most
    important forms of
    protein modification
โ—   A complex sugar
    composed by 14 units is             N-glycans
    attached to a protein,
    and later modified.

                              protein


                                         Cell membrane
What is N-Glycosylation? (II)
โ—   The surface of a cell is
    usually covered by N-
    glycosylated proteins
โ—   It enhances solubility
    and is required for the
    proper folding.
N-Glycosylation โ€“ how does it works
โ—   A common N-Glycan precursor is attached to a
    nascent protein, and then modified




                                           Millions of
    N-glycan     Intermediate
                                           different N-
    precursor    N-glycan
                                           glycans
Advanced N-Glycosylation
โ—   The same protein can have different N-
    Glycosylation on different tissues and times




    Protein A in the       Protein A in the     Protein A during a
    morning                evening              party




         Protein A on an     Protein A on a     Protein A on a dead
          eritrocite          epithelial cell   cell
N-Glycosylation is a text-book
                 example
โ—   The first part of this
    pathway was
    characterized in the 1980's
    by random mutagenesis in
    yeast (the ALG mutants)
โ—   N-glycosylation is
    important to
    biotechnologists because
    of its implications for drugs
    biosynthesis
Annotating the pathway
โ—   My PhD thesis is about selection within the
    genes of N-Glycosylation, so I had to study the
    biology behind it
Annotating the pathway
โ—   My PhD thesis is about selection within the
    genes of N-Glycosylation, so I had to study the
    biology behind it
โ—   I was not happy with the current annotation of
    the pathway in other databases, so I decided to
    make a new annotation by myself, and publish
    it on a public database
Reactome
โ—   A database for biological relevant pathways
    โ—   TCA cycle, apoptosis, telomerases, influenza...
Reactome - advantages
โ—   Open-source approach. Anyone can contribute
    to a pathway, and all contributions are public
    and transparent
โ—   Annotates a greater number of informations for
    each reaction
โ—   Does not artificially distinguish between
    metabolic/interaction pathways (uses GO terms
    instead)
The process of annotation in
                Reactome
โ—   Annotating a pathway in Reactome means that
    for each reaction, you have to details like:
    โ—   Description and name of the reaction
    โ—   Genes involved
    โ—   GO localization terms for input, outputs and
        enzymes
    โ—   References to experimental evidences
The process of annotation in
        Reactome




     Our N-Glycosylation pathway
The process of annotation in
                Reactome
โ—   Annotating a pathway in Reactome means that
    for each reaction, you have to provide details
    like:
    โ—   Description and name of the reaction
    โ—   Genes involved
    โ—   GO localization terms for input, outputs and
        enzymes
    โ—   References to experimental evidences
โ—   Each reaction must pass a peer-reviewed
    process before being included in Reactome
N-Glycosylation on Reactome
                 final result
โ—   It tooks about 6 months of work to annotate the
    pathway
    โ—   ~100 reactions
    โ—   lot of literature read
    โ—   problems with entries in other databases
    โ—   technical implementation problems
    โ—   problems, problems, problems, time wasted
Is it worth to annotate a pathway
            manually as I did?
โ—   If you are a PhD student: probably yes, and at the
    beginning
โ—   Pros:
    โ—   Helps you having good ideas
    โ—   Recognize how much you can trust database
        annotations
    โ—   Can be published
โ—   Cons:
    โ—   Takes time
    โ—   Not feasible for larger scale studies
The Reactome pathway, complete




       http://tinyurl.com/nglyco-reactome
N-Glycosylation in other databases
โ—   Historically, this pathway is well characterized
    and is a textbook example of a metabolic
    pathway
โ—   It is a good example to compare its annotations
    in different databases
N-Glycosylation in other databases
โ—   Historically, this pathway is well characterized
    and is a textbook example of a metabolic
    pathway
โ—   It is a good example to compare its annotations
    in different databases
โ—   If you dedicate 6 months to read tons of
    literature about what you are studying, which
    kind of errors do you expect to find in
    databases?
GeneOntology (GO)
โ—   GO is an ontology of terms to
    describe the function, localization
    of a gene
โ—   A dictionary designed to reduce
    the problem of synonyms and
    unclear terminology
What is a GO annotation?
โ—   A GO annotation is the association between a
    GO term and a gene, protein or event
    โ—   The gene ALG1 is Integral to the membrane (GO:
        32130)
    โ—   The protein XYZ has peroxidase activity
        (GO:03123)
GO annotations
โ—   GO annotations are frequently used in the
    scientific literature
    โ—   This set of genes is enriched of a specific GO term
    โ—   We subdivided the pathway in GO modules and
        studied selection within them
โ—   However, which are the procedures to
    associate a GO term with a gene?
โ—   Which are the possible sources of error in a GO
    association?
The GO term assignment
โ—   In order to assign a GO term to a protein in a
    species, there it must be an experimental
    evidence for the association in that species
The GO term assignment
โ—   In order to assign a GO term to a protein in a
    species, there it must be an experimental
    evidence for the association in that species




                 Lots of false negatives in GO!
Example of false negative in GO
โ—   We proposed the association of DPAGT1 with
    the term 'Integral to the ER membrane' in
    human
    โ—   All the literature assumes that this association
        exists
    โ—   This association is annotated in Kegg for Yeast
Example of false positive in GO
โ—   We proposed the association of DPAGT1 with
    the term 'Integral to the ER membrane' in
    human
    โ—   All the literature assumes that this association
        exists
    โ—   This association is annotated in Kegg for Yeast
โ—   The proposal has been rejected because there
    it were no experimental evidence in human
    โ—   Probably there it won't never be any
DPAGT1 on GO issue tracker
Ambigous interpretation of the term
     N-Glycosylation in GO
โ—   We found a big mistake with the term 'N-
    Glycosylation'
โ—   This was being associated to two classes of
    genes: those which are glycosylated, and those
    which participate to the process of glycosylation
Ambigous interpretation of the term
     N-Glycosylation in GO




  N-Glycosylation pathway                            N-Glycosylated protein
                                                     World J Gastroenterol. 2010 December 21; 16(47): 5916-
  Essentials of Glycobiology, 3rd edition            5924.




                                            merged
Summary of GO annotations for N-
         glycosylation
โ—   ~80 correct GO classification
โ—   ~50 new gene-term association proposed
โ—   10 genes had a clearly wrong classification
โ—   2 new GO terms proposed:
    โ—   Endoplasmic reticulum quality control compartment
    โ—   Intrinsic to the lumenal side of the endoplasmic
        reticulum :-)
The GO issue tracker
โ—   All the errors on GO elements are tracked on a
    publicly accessible online application, the issue
    tracker
โ—   From there, you can report new errors, or check
    whether there are reported cases for the terms
    you are using
The GO tracker
KEGG/Pathways
โ—   A database of
    Pathways, curated by
    experts in the field
    and manually drawn
KEGG/Pathway
โ—   KEGG/Pathways is the most renowned
    database for biological pathways
โ—   Pros:
    โ—   Lot of pathways annotated
โ—   Cons:
    โ—   Authors of the pathways are unknown, and no
        references are given for each reaction. Difficult to
        ask for clarification
N-Glycosylation on KEGG
KEGG/Pathways
โ—   KEGG/pathways
    diagrams show a
    simplified version of
    the pathway
โ—   These diagrams are
    not wrong, but they
    are there for
    visualization only.
โ—   Easy to interpret them
    erroneously
This is how the latter part of the
   pathway really looks like:

                    Advanced N-Glycosylation
                    in KEGG




                    Real represenation of
                    advanced N-Glycosylation
Why it is important to have
           references for reactions
โ—   For our knowledge of the
    literature, there it should be
    a trifurcation at this point
Why it is important to have
           references for reactions
โ—   For our knowledge of the
    literature, there it should be
    a trifurcation at this point
โ—   Instead, it is shown here:
Why it is important to have
            references for reactions
โ—   For our knowledge of the
    literature, there it should be a
    triple bifurcation at this point
โ—   Instead, it is shown here:
โ—   Without references, it is
    impossible to know why the
    annotator put the bifurcation
    there
String
โ—   Database of interactions among proteins
โ—   Collects informations from different resources
    and merge them together (example of meta-
    database)
False positives in interactions
                 databases
โ—   Interaction databases are usually full of false
    positives and negatives
    โ—   Yeast-two hybrids and similar
    โ—   Automatic processing of scientific papers
String โ€“ the pathway is not there
โ—   Most of the interactions in N-Glycosylation were
    not found in String
    โ—   Latest release has improved
โ—   The same term interaction is ambiguous
String, the ALG2 case
โ—   The biggest issue we
    found in String was due to
    two genes with similar
    names
โ—   There are two genes with
    the symbol ALG2:
    โ—   ALG2 (Asparagine Linked
        Glycosylation 2)
    โ—   ALG-2 (Apoptosis Linked
        Gene โ€“ 2)
โ—   In string, these two were
    confused
                                  ALG2   ALG-2
The ALG11 case
โ—   Similar case for
    ALG11
โ—   Was confused with
    ALG11, a
    Ribonucleosomal
    protein




                             ALG11                   ALG11
                        (N-Glycosylation)   (Ribonucleosomal protein)
What's wrong with String
    โ—   The main problem with String is not that there
        are many false positives/negatives,
    โ—   rather, is that there is no way to annotate or
        report wrong interactions


โ—   It would be good if I could
    tell other users about these
    false positives
Uniprot
โ—   Uniprot is a database for annotations of
    proteins
โ—   The standard resource to use when looking for
    information on a protein
Annotations in Uniprot
โ—   We have found very few errors in the
    annotations in Uniprot
โ—   Mostly they were due to outdated names or
    minor imprecisions
โ—   Every Uniprot entry has a 'Send me feedback'
    button
So, I have found an error in an
        annotation โ€“ now what?
โ—   Ignore the error
So, I have found an error in an
           annotation โ€“ now what?
โ—   Ignore the error
    โ—   Wrong because the error remains there and can
        affect other people's work
So, I have found an error in an
           annotation โ€“ now what?
โ—   Ignore the error
    โ—   Wrong because the error remains there and can
        affect other people's work
โ—   Use a different database
So, I have found an error in an
           annotation โ€“ now what?
โ—   Ignore the error
    โ—   Wrong because the error remains there and can
        affect other people's work
โ—   Use a different database
    โ—   Even worst than the previous: doesn't fix the error
        and waste times
So, I have found an error in an
           annotation โ€“ now what?
โ—   Ignore the error
    โ—   Wrong because the error remains there and can
        affect other people's work
โ—   Use a different database
    โ—   Even worst than the previous: doesn't fix the error
        and waste times
โ—   Report the error
So, I have found an error in an
           annotation โ€“ now what?
โ—   Ignore the error
    โ—   Wrong because the error remains there and can
        affect other people's work
โ—   Use a different database
    โ—   Even worst than the previous: doesn't fix the error
        and waste times
โ—   Report the error
    โ—   This is the best approach, however not everybody
        knows how to do it, and it requires time
Current state of reporting errors in
        the scientific community
โ—   Unfortunately, reporting errors is not
    encouraged in the modern scientific literature
    โ—   Time consuming, but not acknowledged
    โ—   Few people know how to do it
    โ—   Made difficult by lack of transparency
Issue trackers
โ—   Usually the best way to report an error is by
    using an issue tracker software
โ—   When you find an error in a database, always
    check whether there is a public issue tracker
โ—   If not, things get more difficult
How to report an error in GO
โ—   Check the instructions
โ—   Verify the error has not
    been reported yet
โ—   Go to the tracker Home
    page, and click on
    'Report new issue'
โ—   Describe the problem
    with a good title and
    references
Take-home messages
โ—   Before using any data from a scientific
    database:
    โ—   Check whether they have a public issue tracker and
        check whether there are errors reported about your
        data
    โ—   Read the literature!
Agradecimientos
                     (Thank you, grazie, moltes gracies)




                                                                        Thanks also to:
                                                                            โ—   Martin Sikora
                                                                            โ—   Kevin Keys
                                                                            โ—   Anna Bauer-Mehren
                                                                                from IMIM
                                                                            โ—   Everybody in BioEvo!
http://www.ibe.upf-csic.es/ibe/research/research-groups/bertranpetit.html

More Related Content

Similar to The true story behind the annotation of a pathway

1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docx1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docxfelicidaddinwoodie
ย 
GO slimming tips
GO slimming tipsGO slimming tips
GO slimming tipsValerie Wood
ย 
The Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resourcesThe Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resourcesMelanie Courtot
ย 
Glycoprotein by KK Sahu sir
Glycoprotein by KK Sahu sirGlycoprotein by KK Sahu sir
Glycoprotein by KK Sahu sirKAUSHAL SAHU
ย 
Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...
Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...
Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...Giovanni Marco Dall'Olio
ย 
Protein glycosylation and its associated disorders
Protein glycosylation and its associated disordersProtein glycosylation and its associated disorders
Protein glycosylation and its associated disordersSaranya Sankar
ย 
Plegable christian
Plegable christianPlegable christian
Plegable christianDaniel Estrada
ย 
JBEI Highlights September 2015
JBEI Highlights September 2015JBEI Highlights September 2015
JBEI Highlights September 2015Irina Silva
ย 
Melatonin Agonists
Melatonin AgonistsMelatonin Agonists
Melatonin AgonistsJessica Howard
ย 
PEGylation technique
PEGylation techniquePEGylation technique
PEGylation techniqueKushal Saha
ย 
05 j microb_methods2019 glycogen
05 j microb_methods2019 glycogen05 j microb_methods2019 glycogen
05 j microb_methods2019 glycogenHoney Phuong
ย 
G protein coupled receptor
G protein coupled receptorG protein coupled receptor
G protein coupled receptorSumit Kumar
ย 
G protein coupled receptor
G protein coupled receptorG protein coupled receptor
G protein coupled receptorSumit Kumar
ย 
gproteincoupledreceptor-170520064151 2.pdf
gproteincoupledreceptor-170520064151 2.pdfgproteincoupledreceptor-170520064151 2.pdf
gproteincoupledreceptor-170520064151 2.pdfbajosalimatou9
ย 
PHd defense presentation Final RIVES
PHd defense presentation Final RIVESPHd defense presentation Final RIVES
PHd defense presentation Final RIVESMarie-Laure Rives, PhD
ย 
JBEI Research Highlights November 2016
JBEI Research Highlights November 2016JBEI Research Highlights November 2016
JBEI Research Highlights November 2016Irina Silva
ย 
Ian Martin GCB Thesis
Ian Martin GCB ThesisIan Martin GCB Thesis
Ian Martin GCB ThesisIan Martin
ย 

Similar to The true story behind the annotation of a pathway (20)

Glycoproteins
Glycoproteins Glycoproteins
Glycoproteins
ย 
Glycoscience-Report-Brief-Final
Glycoscience-Report-Brief-FinalGlycoscience-Report-Brief-Final
Glycoscience-Report-Brief-Final
ย 
1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docx1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docx
ย 
GO slimming tips
GO slimming tipsGO slimming tips
GO slimming tips
ย 
The Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resourcesThe Gene Ontology & Gene Ontology Annotation resources
The Gene Ontology & Gene Ontology Annotation resources
ย 
Glycoprotein by KK Sahu sir
Glycoprotein by KK Sahu sirGlycoprotein by KK Sahu sir
Glycoprotein by KK Sahu sir
ย 
Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...
Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...
Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...
ย 
Protein glycosylation and its associated disorders
Protein glycosylation and its associated disordersProtein glycosylation and its associated disorders
Protein glycosylation and its associated disorders
ย 
Plegable christian
Plegable christianPlegable christian
Plegable christian
ย 
JBEI Highlights September 2015
JBEI Highlights September 2015JBEI Highlights September 2015
JBEI Highlights September 2015
ย 
Samwald ore 2014
Samwald   ore 2014Samwald   ore 2014
Samwald ore 2014
ย 
Melatonin Agonists
Melatonin AgonistsMelatonin Agonists
Melatonin Agonists
ย 
PEGylation technique
PEGylation techniquePEGylation technique
PEGylation technique
ย 
05 j microb_methods2019 glycogen
05 j microb_methods2019 glycogen05 j microb_methods2019 glycogen
05 j microb_methods2019 glycogen
ย 
G protein coupled receptor
G protein coupled receptorG protein coupled receptor
G protein coupled receptor
ย 
G protein coupled receptor
G protein coupled receptorG protein coupled receptor
G protein coupled receptor
ย 
gproteincoupledreceptor-170520064151 2.pdf
gproteincoupledreceptor-170520064151 2.pdfgproteincoupledreceptor-170520064151 2.pdf
gproteincoupledreceptor-170520064151 2.pdf
ย 
PHd defense presentation Final RIVES
PHd defense presentation Final RIVESPHd defense presentation Final RIVES
PHd defense presentation Final RIVES
ย 
JBEI Research Highlights November 2016
JBEI Research Highlights November 2016JBEI Research Highlights November 2016
JBEI Research Highlights November 2016
ย 
Ian Martin GCB Thesis
Ian Martin GCB ThesisIan Martin GCB Thesis
Ian Martin GCB Thesis
ย 

More from Giovanni Marco Dall'Olio

Fehrman Nat Gen 2014 - Journal Club
Fehrman Nat Gen 2014 - Journal ClubFehrman Nat Gen 2014 - Journal Club
Fehrman Nat Gen 2014 - Journal ClubGiovanni Marco Dall'Olio
ย 
Hg for bioinformatics, second part
Hg for bioinformatics, second partHg for bioinformatics, second part
Hg for bioinformatics, second partGiovanni Marco Dall'Olio
ย 
Hg version control bioinformaticians
Hg version control bioinformaticiansHg version control bioinformaticians
Hg version control bioinformaticiansGiovanni Marco Dall'Olio
ย 
Plotting data with python and pylab
Plotting data with python and pylabPlotting data with python and pylab
Plotting data with python and pylabGiovanni Marco Dall'Olio
ย 

More from Giovanni Marco Dall'Olio (20)

Fehrman Nat Gen 2014 - Journal Club
Fehrman Nat Gen 2014 - Journal ClubFehrman Nat Gen 2014 - Journal Club
Fehrman Nat Gen 2014 - Journal Club
ย 
Agile bioinf
Agile bioinfAgile bioinf
Agile bioinf
ย 
Version control
Version controlVersion control
Version control
ย 
Linux intro 5 extra: awk
Linux intro 5 extra: awkLinux intro 5 extra: awk
Linux intro 5 extra: awk
ย 
Linux intro 5 extra: makefiles
Linux intro 5 extra: makefilesLinux intro 5 extra: makefiles
Linux intro 5 extra: makefiles
ย 
Linux intro 4 awk + makefile
Linux intro 4  awk + makefileLinux intro 4  awk + makefile
Linux intro 4 awk + makefile
ย 
Linux intro 3 grep + Unix piping
Linux intro 3 grep + Unix pipingLinux intro 3 grep + Unix piping
Linux intro 3 grep + Unix piping
ย 
Linux intro 2 basic terminal
Linux intro 2   basic terminalLinux intro 2   basic terminal
Linux intro 2 basic terminal
ย 
Linux intro 1 definitions
Linux intro 1  definitionsLinux intro 1  definitions
Linux intro 1 definitions
ย 
Wagner chapter 5
Wagner chapter 5Wagner chapter 5
Wagner chapter 5
ย 
Wagner chapter 4
Wagner chapter 4Wagner chapter 4
Wagner chapter 4
ย 
Wagner chapter 3
Wagner chapter 3Wagner chapter 3
Wagner chapter 3
ย 
Wagner chapter 2
Wagner chapter 2Wagner chapter 2
Wagner chapter 2
ย 
Wagner chapter 1
Wagner chapter 1Wagner chapter 1
Wagner chapter 1
ย 
Hg for bioinformatics, second part
Hg for bioinformatics, second partHg for bioinformatics, second part
Hg for bioinformatics, second part
ย 
Hg version control bioinformaticians
Hg version control bioinformaticiansHg version control bioinformaticians
Hg version control bioinformaticians
ย 
Plotting data with python and pylab
Plotting data with python and pylabPlotting data with python and pylab
Plotting data with python and pylab
ย 
Pycon
PyconPycon
Pycon
ย 
Makefiles Bioinfo
Makefiles BioinfoMakefiles Bioinfo
Makefiles Bioinfo
ย 
biopython, doctest and makefiles
biopython, doctest and makefilesbiopython, doctest and makefiles
biopython, doctest and makefiles
ย 

Recently uploaded

Dubai Call Girls O525547&19 Calls Girls In Dubai (L0w+Charger)
Dubai Call Girls O525547&19 Calls Girls In Dubai (L0w+Charger)Dubai Call Girls O525547&19 Calls Girls In Dubai (L0w+Charger)
Dubai Call Girls O525547&19 Calls Girls In Dubai (L0w+Charger)kojalkojal131
ย 
VVIP Pune Call Girls Handewadi WhatSapp Number 8005736733 With Elite Staff An...
VVIP Pune Call Girls Handewadi WhatSapp Number 8005736733 With Elite Staff An...VVIP Pune Call Girls Handewadi WhatSapp Number 8005736733 With Elite Staff An...
VVIP Pune Call Girls Handewadi WhatSapp Number 8005736733 With Elite Staff An...SUHANI PANDEY
ย 
(๐Ÿ‘‰๏พŸ9999965857 ๏พŸ)๐Ÿ‘‰ VIP Call Girls Greater Noida ๐Ÿ‘‰ Delhi ๐Ÿ‘ˆ : 9999 Cash Payment...
(๐Ÿ‘‰๏พŸ9999965857 ๏พŸ)๐Ÿ‘‰ VIP Call Girls Greater Noida  ๐Ÿ‘‰ Delhi ๐Ÿ‘ˆ : 9999 Cash Payment...(๐Ÿ‘‰๏พŸ9999965857 ๏พŸ)๐Ÿ‘‰ VIP Call Girls Greater Noida  ๐Ÿ‘‰ Delhi ๐Ÿ‘ˆ : 9999 Cash Payment...
(๐Ÿ‘‰๏พŸ9999965857 ๏พŸ)๐Ÿ‘‰ VIP Call Girls Greater Noida ๐Ÿ‘‰ Delhi ๐Ÿ‘ˆ : 9999 Cash Payment...Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure
ย 
Top Rated Call Girls In Podanur ๐Ÿ“ฑ {7001035870} VIP Escorts Podanur
Top Rated Call Girls In Podanur ๐Ÿ“ฑ {7001035870} VIP Escorts PodanurTop Rated Call Girls In Podanur ๐Ÿ“ฑ {7001035870} VIP Escorts Podanur
Top Rated Call Girls In Podanur ๐Ÿ“ฑ {7001035870} VIP Escorts Podanurdharasingh5698
ย 
Dattawadi ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready Fo...
Dattawadi ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready Fo...Dattawadi ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready Fo...
Dattawadi ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready Fo...tanu pandey
ย 
Indapur - Virgin Call Girls Pune | Whatsapp No 8005736733 VIP Escorts Service...
Indapur - Virgin Call Girls Pune | Whatsapp No 8005736733 VIP Escorts Service...Indapur - Virgin Call Girls Pune | Whatsapp No 8005736733 VIP Escorts Service...
Indapur - Virgin Call Girls Pune | Whatsapp No 8005736733 VIP Escorts Service...SUHANI PANDEY
ย 
Teck Supplemental Information, May 2, 2024
Teck Supplemental Information, May 2, 2024Teck Supplemental Information, May 2, 2024
Teck Supplemental Information, May 2, 2024TeckResourcesLtd
ย 
AMG Quarterly Investor Presentation May 2024
AMG Quarterly Investor Presentation May 2024AMG Quarterly Investor Presentation May 2024
AMG Quarterly Investor Presentation May 2024gstubel
ย 
Pakistani Call girls in Ajman +971563133746 Ajman Call girls
Pakistani Call girls in Ajman +971563133746 Ajman Call girlsPakistani Call girls in Ajman +971563133746 Ajman Call girls
Pakistani Call girls in Ajman +971563133746 Ajman Call girlsgwenoracqe6
ย 
Call Girls Chandigarh Just Call 8868886958 Top Class Call Girl Service Available
Call Girls Chandigarh Just Call 8868886958 Top Class Call Girl Service AvailableCall Girls Chandigarh Just Call 8868886958 Top Class Call Girl Service Available
Call Girls Chandigarh Just Call 8868886958 Top Class Call Girl Service AvailableSheetaleventcompany
ย 
SME IPO Opportunity and Trends of May 2024
SME IPO Opportunity and Trends of May 2024SME IPO Opportunity and Trends of May 2024
SME IPO Opportunity and Trends of May 2024Transique Corporate Advisors
ย 
Corporate Presentation Probe Canaccord Conference 2024.pdf
Corporate Presentation Probe Canaccord Conference 2024.pdfCorporate Presentation Probe Canaccord Conference 2024.pdf
Corporate Presentation Probe Canaccord Conference 2024.pdfProbe Gold
ย 
B2 Interpret the brief.docxccccccccccccccc
B2 Interpret the brief.docxcccccccccccccccB2 Interpret the brief.docxccccccccccccccc
B2 Interpret the brief.docxcccccccccccccccMollyBrown86
ย 
Diligence Checklist for Early Stage Startups
Diligence Checklist for Early Stage StartupsDiligence Checklist for Early Stage Startups
Diligence Checklist for Early Stage StartupsTILDEN
ย 
BDSMโšกCall Girls in Hari Nagar Delhi >เผ’8448380779 Escort Service
BDSMโšกCall Girls in Hari Nagar Delhi >เผ’8448380779 Escort ServiceBDSMโšกCall Girls in Hari Nagar Delhi >เผ’8448380779 Escort Service
BDSMโšกCall Girls in Hari Nagar Delhi >เผ’8448380779 Escort ServiceDelhi Call girls
ย 
The Leonardo 1Q 2024 Results Presentation
The Leonardo 1Q 2024 Results PresentationThe Leonardo 1Q 2024 Results Presentation
The Leonardo 1Q 2024 Results PresentationLeonardo
ย 
Corporate Presentation Probe May 2024.pdf
Corporate Presentation Probe May 2024.pdfCorporate Presentation Probe May 2024.pdf
Corporate Presentation Probe May 2024.pdfProbe Gold
ย 
Nicola Mining Inc. Corporate Presentation May 2024
Nicola Mining Inc. Corporate Presentation May 2024Nicola Mining Inc. Corporate Presentation May 2024
Nicola Mining Inc. Corporate Presentation May 2024nicola_mining
ย 
Western Copper and Gold - May 2024 Presentation
Western Copper and Gold - May 2024 PresentationWestern Copper and Gold - May 2024 Presentation
Western Copper and Gold - May 2024 PresentationPaul West-Sells
ย 

Recently uploaded (20)

Dubai Call Girls O525547&19 Calls Girls In Dubai (L0w+Charger)
Dubai Call Girls O525547&19 Calls Girls In Dubai (L0w+Charger)Dubai Call Girls O525547&19 Calls Girls In Dubai (L0w+Charger)
Dubai Call Girls O525547&19 Calls Girls In Dubai (L0w+Charger)
ย 
VVIP Pune Call Girls Handewadi WhatSapp Number 8005736733 With Elite Staff An...
VVIP Pune Call Girls Handewadi WhatSapp Number 8005736733 With Elite Staff An...VVIP Pune Call Girls Handewadi WhatSapp Number 8005736733 With Elite Staff An...
VVIP Pune Call Girls Handewadi WhatSapp Number 8005736733 With Elite Staff An...
ย 
(๐Ÿ‘‰๏พŸ9999965857 ๏พŸ)๐Ÿ‘‰ VIP Call Girls Greater Noida ๐Ÿ‘‰ Delhi ๐Ÿ‘ˆ : 9999 Cash Payment...
(๐Ÿ‘‰๏พŸ9999965857 ๏พŸ)๐Ÿ‘‰ VIP Call Girls Greater Noida  ๐Ÿ‘‰ Delhi ๐Ÿ‘ˆ : 9999 Cash Payment...(๐Ÿ‘‰๏พŸ9999965857 ๏พŸ)๐Ÿ‘‰ VIP Call Girls Greater Noida  ๐Ÿ‘‰ Delhi ๐Ÿ‘ˆ : 9999 Cash Payment...
(๐Ÿ‘‰๏พŸ9999965857 ๏พŸ)๐Ÿ‘‰ VIP Call Girls Greater Noida ๐Ÿ‘‰ Delhi ๐Ÿ‘ˆ : 9999 Cash Payment...
ย 
Top Rated Call Girls In Podanur ๐Ÿ“ฑ {7001035870} VIP Escorts Podanur
Top Rated Call Girls In Podanur ๐Ÿ“ฑ {7001035870} VIP Escorts PodanurTop Rated Call Girls In Podanur ๐Ÿ“ฑ {7001035870} VIP Escorts Podanur
Top Rated Call Girls In Podanur ๐Ÿ“ฑ {7001035870} VIP Escorts Podanur
ย 
Dattawadi ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready Fo...
Dattawadi ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready Fo...Dattawadi ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready Fo...
Dattawadi ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready Fo...
ย 
(INDIRA) Call Girl Kashmir Call Now 8617697112 Kashmir Escorts 24x7
(INDIRA) Call Girl Kashmir Call Now 8617697112 Kashmir Escorts 24x7(INDIRA) Call Girl Kashmir Call Now 8617697112 Kashmir Escorts 24x7
(INDIRA) Call Girl Kashmir Call Now 8617697112 Kashmir Escorts 24x7
ย 
Indapur - Virgin Call Girls Pune | Whatsapp No 8005736733 VIP Escorts Service...
Indapur - Virgin Call Girls Pune | Whatsapp No 8005736733 VIP Escorts Service...Indapur - Virgin Call Girls Pune | Whatsapp No 8005736733 VIP Escorts Service...
Indapur - Virgin Call Girls Pune | Whatsapp No 8005736733 VIP Escorts Service...
ย 
Teck Supplemental Information, May 2, 2024
Teck Supplemental Information, May 2, 2024Teck Supplemental Information, May 2, 2024
Teck Supplemental Information, May 2, 2024
ย 
AMG Quarterly Investor Presentation May 2024
AMG Quarterly Investor Presentation May 2024AMG Quarterly Investor Presentation May 2024
AMG Quarterly Investor Presentation May 2024
ย 
Pakistani Call girls in Ajman +971563133746 Ajman Call girls
Pakistani Call girls in Ajman +971563133746 Ajman Call girlsPakistani Call girls in Ajman +971563133746 Ajman Call girls
Pakistani Call girls in Ajman +971563133746 Ajman Call girls
ย 
Call Girls Chandigarh Just Call 8868886958 Top Class Call Girl Service Available
Call Girls Chandigarh Just Call 8868886958 Top Class Call Girl Service AvailableCall Girls Chandigarh Just Call 8868886958 Top Class Call Girl Service Available
Call Girls Chandigarh Just Call 8868886958 Top Class Call Girl Service Available
ย 
SME IPO Opportunity and Trends of May 2024
SME IPO Opportunity and Trends of May 2024SME IPO Opportunity and Trends of May 2024
SME IPO Opportunity and Trends of May 2024
ย 
Corporate Presentation Probe Canaccord Conference 2024.pdf
Corporate Presentation Probe Canaccord Conference 2024.pdfCorporate Presentation Probe Canaccord Conference 2024.pdf
Corporate Presentation Probe Canaccord Conference 2024.pdf
ย 
B2 Interpret the brief.docxccccccccccccccc
B2 Interpret the brief.docxcccccccccccccccB2 Interpret the brief.docxccccccccccccccc
B2 Interpret the brief.docxccccccccccccccc
ย 
Diligence Checklist for Early Stage Startups
Diligence Checklist for Early Stage StartupsDiligence Checklist for Early Stage Startups
Diligence Checklist for Early Stage Startups
ย 
BDSMโšกCall Girls in Hari Nagar Delhi >เผ’8448380779 Escort Service
BDSMโšกCall Girls in Hari Nagar Delhi >เผ’8448380779 Escort ServiceBDSMโšกCall Girls in Hari Nagar Delhi >เผ’8448380779 Escort Service
BDSMโšกCall Girls in Hari Nagar Delhi >เผ’8448380779 Escort Service
ย 
The Leonardo 1Q 2024 Results Presentation
The Leonardo 1Q 2024 Results PresentationThe Leonardo 1Q 2024 Results Presentation
The Leonardo 1Q 2024 Results Presentation
ย 
Corporate Presentation Probe May 2024.pdf
Corporate Presentation Probe May 2024.pdfCorporate Presentation Probe May 2024.pdf
Corporate Presentation Probe May 2024.pdf
ย 
Nicola Mining Inc. Corporate Presentation May 2024
Nicola Mining Inc. Corporate Presentation May 2024Nicola Mining Inc. Corporate Presentation May 2024
Nicola Mining Inc. Corporate Presentation May 2024
ย 
Western Copper and Gold - May 2024 Presentation
Western Copper and Gold - May 2024 PresentationWestern Copper and Gold - May 2024 Presentation
Western Copper and Gold - May 2024 Presentation
ย 

The true story behind the annotation of a pathway

  • 1. Bioevo seminars The true story behind the annotation of a pathway Giovanni Dall'Olio, IBE (UPF-CEXS)
  • 2. Summary of the talk โ— We have recently published two works: โ— Dall'Olio GM, Jassal B, Montanucci L, Gagneux P, Bertranpetit J, Laayouni H. The annotation of the Asparagine N-linked Glycosylation pathway in the Reactome Database. Glycobiology. 2011 Jan 2. PubMed PMID: 21199820. โ— Dall'Olio GM, Bertranpetit J, Laayouni H. The annotation and the usage of scientific databases could be improved with public issue tracker software. Database (Oxford). 2010 Dec 23;2010:baq035. Print 2010. PubMed PMID: 21186182; PubMed Central PMCID: PMC3011984. โ— One is about the annotation of a pathway in a database, the other about reporting errors to databases
  • 3. What can you learn from this talk โ— The data annotated in scientific databases is not perfect, and can contain errors. Even when is correct, it can have multiple interpretations.
  • 4. What can you learn from this talk โ— The data annotated in scientific databases is not perfect, and can contain errors. Even when is correct, it can have multiple interpretations. โ— Errors don't get fixed by themselves, and problems don't get solved alone. When you find something wrong, it is your duty to report it.
  • 5. What is N-Glycosylation? โ— One of the most important forms of protein modification โ— A complex sugar composed by 14 units is N-glycans attached to a protein, and later modified. protein Cell membrane
  • 6. What is N-Glycosylation? (II) โ— The surface of a cell is usually covered by N- glycosylated proteins โ— It enhances solubility and is required for the proper folding.
  • 7. N-Glycosylation โ€“ how does it works โ— A common N-Glycan precursor is attached to a nascent protein, and then modified Millions of N-glycan Intermediate different N- precursor N-glycan glycans
  • 8. Advanced N-Glycosylation โ— The same protein can have different N- Glycosylation on different tissues and times Protein A in the Protein A in the Protein A during a morning evening party Protein A on an Protein A on a Protein A on a dead eritrocite epithelial cell cell
  • 9. N-Glycosylation is a text-book example โ— The first part of this pathway was characterized in the 1980's by random mutagenesis in yeast (the ALG mutants) โ— N-glycosylation is important to biotechnologists because of its implications for drugs biosynthesis
  • 10. Annotating the pathway โ— My PhD thesis is about selection within the genes of N-Glycosylation, so I had to study the biology behind it
  • 11. Annotating the pathway โ— My PhD thesis is about selection within the genes of N-Glycosylation, so I had to study the biology behind it โ— I was not happy with the current annotation of the pathway in other databases, so I decided to make a new annotation by myself, and publish it on a public database
  • 12. Reactome โ— A database for biological relevant pathways โ— TCA cycle, apoptosis, telomerases, influenza...
  • 13. Reactome - advantages โ— Open-source approach. Anyone can contribute to a pathway, and all contributions are public and transparent โ— Annotates a greater number of informations for each reaction โ— Does not artificially distinguish between metabolic/interaction pathways (uses GO terms instead)
  • 14. The process of annotation in Reactome โ— Annotating a pathway in Reactome means that for each reaction, you have to details like: โ— Description and name of the reaction โ— Genes involved โ— GO localization terms for input, outputs and enzymes โ— References to experimental evidences
  • 15. The process of annotation in Reactome Our N-Glycosylation pathway
  • 16. The process of annotation in Reactome โ— Annotating a pathway in Reactome means that for each reaction, you have to provide details like: โ— Description and name of the reaction โ— Genes involved โ— GO localization terms for input, outputs and enzymes โ— References to experimental evidences โ— Each reaction must pass a peer-reviewed process before being included in Reactome
  • 17. N-Glycosylation on Reactome final result โ— It tooks about 6 months of work to annotate the pathway โ— ~100 reactions โ— lot of literature read โ— problems with entries in other databases โ— technical implementation problems โ— problems, problems, problems, time wasted
  • 18. Is it worth to annotate a pathway manually as I did? โ— If you are a PhD student: probably yes, and at the beginning โ— Pros: โ— Helps you having good ideas โ— Recognize how much you can trust database annotations โ— Can be published โ— Cons: โ— Takes time โ— Not feasible for larger scale studies
  • 19. The Reactome pathway, complete http://tinyurl.com/nglyco-reactome
  • 20. N-Glycosylation in other databases โ— Historically, this pathway is well characterized and is a textbook example of a metabolic pathway โ— It is a good example to compare its annotations in different databases
  • 21. N-Glycosylation in other databases โ— Historically, this pathway is well characterized and is a textbook example of a metabolic pathway โ— It is a good example to compare its annotations in different databases โ— If you dedicate 6 months to read tons of literature about what you are studying, which kind of errors do you expect to find in databases?
  • 22. GeneOntology (GO) โ— GO is an ontology of terms to describe the function, localization of a gene โ— A dictionary designed to reduce the problem of synonyms and unclear terminology
  • 23. What is a GO annotation? โ— A GO annotation is the association between a GO term and a gene, protein or event โ— The gene ALG1 is Integral to the membrane (GO: 32130) โ— The protein XYZ has peroxidase activity (GO:03123)
  • 24. GO annotations โ— GO annotations are frequently used in the scientific literature โ— This set of genes is enriched of a specific GO term โ— We subdivided the pathway in GO modules and studied selection within them โ— However, which are the procedures to associate a GO term with a gene? โ— Which are the possible sources of error in a GO association?
  • 25. The GO term assignment โ— In order to assign a GO term to a protein in a species, there it must be an experimental evidence for the association in that species
  • 26. The GO term assignment โ— In order to assign a GO term to a protein in a species, there it must be an experimental evidence for the association in that species Lots of false negatives in GO!
  • 27. Example of false negative in GO โ— We proposed the association of DPAGT1 with the term 'Integral to the ER membrane' in human โ— All the literature assumes that this association exists โ— This association is annotated in Kegg for Yeast
  • 28. Example of false positive in GO โ— We proposed the association of DPAGT1 with the term 'Integral to the ER membrane' in human โ— All the literature assumes that this association exists โ— This association is annotated in Kegg for Yeast โ— The proposal has been rejected because there it were no experimental evidence in human โ— Probably there it won't never be any
  • 29. DPAGT1 on GO issue tracker
  • 30. Ambigous interpretation of the term N-Glycosylation in GO โ— We found a big mistake with the term 'N- Glycosylation' โ— This was being associated to two classes of genes: those which are glycosylated, and those which participate to the process of glycosylation
  • 31. Ambigous interpretation of the term N-Glycosylation in GO N-Glycosylation pathway N-Glycosylated protein World J Gastroenterol. 2010 December 21; 16(47): 5916- Essentials of Glycobiology, 3rd edition 5924. merged
  • 32. Summary of GO annotations for N- glycosylation โ— ~80 correct GO classification โ— ~50 new gene-term association proposed โ— 10 genes had a clearly wrong classification โ— 2 new GO terms proposed: โ— Endoplasmic reticulum quality control compartment โ— Intrinsic to the lumenal side of the endoplasmic reticulum :-)
  • 33. The GO issue tracker โ— All the errors on GO elements are tracked on a publicly accessible online application, the issue tracker โ— From there, you can report new errors, or check whether there are reported cases for the terms you are using
  • 35. KEGG/Pathways โ— A database of Pathways, curated by experts in the field and manually drawn
  • 36. KEGG/Pathway โ— KEGG/Pathways is the most renowned database for biological pathways โ— Pros: โ— Lot of pathways annotated โ— Cons: โ— Authors of the pathways are unknown, and no references are given for each reaction. Difficult to ask for clarification
  • 38. KEGG/Pathways โ— KEGG/pathways diagrams show a simplified version of the pathway โ— These diagrams are not wrong, but they are there for visualization only. โ— Easy to interpret them erroneously
  • 39. This is how the latter part of the pathway really looks like: Advanced N-Glycosylation in KEGG Real represenation of advanced N-Glycosylation
  • 40. Why it is important to have references for reactions โ— For our knowledge of the literature, there it should be a trifurcation at this point
  • 41. Why it is important to have references for reactions โ— For our knowledge of the literature, there it should be a trifurcation at this point โ— Instead, it is shown here:
  • 42. Why it is important to have references for reactions โ— For our knowledge of the literature, there it should be a triple bifurcation at this point โ— Instead, it is shown here: โ— Without references, it is impossible to know why the annotator put the bifurcation there
  • 43. String โ— Database of interactions among proteins โ— Collects informations from different resources and merge them together (example of meta- database)
  • 44. False positives in interactions databases โ— Interaction databases are usually full of false positives and negatives โ— Yeast-two hybrids and similar โ— Automatic processing of scientific papers
  • 45. String โ€“ the pathway is not there โ— Most of the interactions in N-Glycosylation were not found in String โ— Latest release has improved โ— The same term interaction is ambiguous
  • 46. String, the ALG2 case โ— The biggest issue we found in String was due to two genes with similar names โ— There are two genes with the symbol ALG2: โ— ALG2 (Asparagine Linked Glycosylation 2) โ— ALG-2 (Apoptosis Linked Gene โ€“ 2) โ— In string, these two were confused ALG2 ALG-2
  • 47. The ALG11 case โ— Similar case for ALG11 โ— Was confused with ALG11, a Ribonucleosomal protein ALG11 ALG11 (N-Glycosylation) (Ribonucleosomal protein)
  • 48. What's wrong with String โ— The main problem with String is not that there are many false positives/negatives, โ— rather, is that there is no way to annotate or report wrong interactions โ— It would be good if I could tell other users about these false positives
  • 49. Uniprot โ— Uniprot is a database for annotations of proteins โ— The standard resource to use when looking for information on a protein
  • 50. Annotations in Uniprot โ— We have found very few errors in the annotations in Uniprot โ— Mostly they were due to outdated names or minor imprecisions โ— Every Uniprot entry has a 'Send me feedback' button
  • 51. So, I have found an error in an annotation โ€“ now what? โ— Ignore the error
  • 52. So, I have found an error in an annotation โ€“ now what? โ— Ignore the error โ— Wrong because the error remains there and can affect other people's work
  • 53. So, I have found an error in an annotation โ€“ now what? โ— Ignore the error โ— Wrong because the error remains there and can affect other people's work โ— Use a different database
  • 54. So, I have found an error in an annotation โ€“ now what? โ— Ignore the error โ— Wrong because the error remains there and can affect other people's work โ— Use a different database โ— Even worst than the previous: doesn't fix the error and waste times
  • 55. So, I have found an error in an annotation โ€“ now what? โ— Ignore the error โ— Wrong because the error remains there and can affect other people's work โ— Use a different database โ— Even worst than the previous: doesn't fix the error and waste times โ— Report the error
  • 56. So, I have found an error in an annotation โ€“ now what? โ— Ignore the error โ— Wrong because the error remains there and can affect other people's work โ— Use a different database โ— Even worst than the previous: doesn't fix the error and waste times โ— Report the error โ— This is the best approach, however not everybody knows how to do it, and it requires time
  • 57. Current state of reporting errors in the scientific community โ— Unfortunately, reporting errors is not encouraged in the modern scientific literature โ— Time consuming, but not acknowledged โ— Few people know how to do it โ— Made difficult by lack of transparency
  • 58. Issue trackers โ— Usually the best way to report an error is by using an issue tracker software โ— When you find an error in a database, always check whether there is a public issue tracker โ— If not, things get more difficult
  • 59. How to report an error in GO โ— Check the instructions โ— Verify the error has not been reported yet โ— Go to the tracker Home page, and click on 'Report new issue' โ— Describe the problem with a good title and references
  • 60. Take-home messages โ— Before using any data from a scientific database: โ— Check whether they have a public issue tracker and check whether there are errors reported about your data โ— Read the literature!
  • 61. Agradecimientos (Thank you, grazie, moltes gracies) Thanks also to: โ— Martin Sikora โ— Kevin Keys โ— Anna Bauer-Mehren from IMIM โ— Everybody in BioEvo! http://www.ibe.upf-csic.es/ibe/research/research-groups/bertranpetit.html