BioNLP'09 Shared Task



 Farzaneh Sarafraz
 James Eales
 Reza Mohammadi
 Goran Nenadic
26 March 2009
                      
BioNLP'09 Task 1

        Events in abstracts
        Given: genes and gene products (proteins)
        Wanted: events, each with
            − type
            − trigger
            − participant(s)
            − cause (if applicable)

                                     
Example
    "I kappa B/MAD-3 masks the nuclear localization
      signal of NF-kappa B p65 and requires the
      transactivation domain to inhibit NF-kappa B
      p65 DNA binding."


    Event: negative regulation
    Trigger: masks
    Theme1: the first p65
    Cause: MAD-3
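The event structure on the slide can be sketched as a small Python data model. This is a minimal illustration assuming character offsets into a single sentence; the names `Mention`, `Event` and `mention()` are hypothetical and belong neither to the shared task's distribution nor to the system described here. Themes and causes may be proteins or other events, so both fields accept either.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Mention:
    """A protein mention, located by character offsets in the text."""
    text: str
    start: int
    end: int


@dataclass
class Event:
    """One event: its type, trigger word, theme(s) and optional cause."""
    type: str
    trigger: str
    themes: List[Union[Mention, Event]] = field(default_factory=list)
    cause: Optional[Union[Mention, Event]] = None


SENTENCE = ("I kappa B/MAD-3 masks the nuclear localization signal of "
            "NF-kappa B p65 and requires the transactivation domain to "
            "inhibit NF-kappa B p65 DNA binding.")


def mention(text: str, occurrence: int = 0) -> Mention:
    """Locate the n-th (0-based) occurrence of `text` in SENTENCE."""
    start = -1
    for _ in range(occurrence + 1):
        start = SENTENCE.index(text, start + 1)
    return Mention(text, start, start + len(text))


# "masks" negatively regulates the first p65; MAD-3 is the cause.
example = Event(
    type="negative regulation",
    trigger="masks",
    themes=[mention("p65", occurrence=0)],
    cause=mention("MAD-3"),
)
```

Locating mentions by occurrence matters here because the sentence contains p65 twice and only the first one is the theme.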


                             
Event Types

        Gene expression              Binding
        Transcription                Regulation
        Protein catabolism           Positive regulation
        Localisation                 Negative regulation
        Phosphorylation
    




                              
Training and Test Data

        Training data: 800 abstracts
        Development data: 150 abstracts
        Test data: 260 abstracts
    




                               
Our System

    1) Finding trigger and type
    2) Finding participants (themes)
    3) Post processing




                             
1) Finding Triggers and Types: CRF
"I kappa B/MAD-3 masks the nuclear localization..."
  0   0   0  0      9    0     0          0


"The binding of I kappa B/MAD-3 to NF-kappa B p65 is
  0      0    0 0    0  0   0    0     0    0  0   0
sufficient to retarget NF-kappa B p65 from the
    0       0     4        0    0   0   0    0
nucleus to the cytoplasm."
   0     0   0      0


0: not a trigger
9: negative regulation
4: localisation
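Reading triggers back out of the CRF output amounts to collecting the non-zero labels. A minimal sketch follows; the tokenization is our assumption (chosen to match the label row of the second example), as is the rule that adjacent tokens sharing one non-zero label form a single multi-word trigger. Only the label ids 4 and 9 come from the slide.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# Label ids shown on the slide; 0 marks tokens that are not triggers.
# The ids of the other event types are not given, so they are omitted.
LABEL_NAMES: Dict[int, str] = {4: "localisation", 9: "negative regulation"}


def decode_triggers(tokens: List[str], labels: List[int]) -> List[Tuple[str, str]]:
    """Turn per-token CRF labels into (trigger text, event type) pairs.

    Assumption: a run of adjacent tokens with the same non-zero label
    is read as one multi-word trigger.
    """
    if len(tokens) != len(labels):
        raise ValueError("need exactly one label per token")
    triggers = []
    pos = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label != 0:
            text = " ".join(tokens[pos:pos + length])
            triggers.append((text, LABEL_NAMES.get(label, f"type {label}")))
        pos += length
    return triggers


tokens = ["sufficient", "to", "retarget", "NF-kappa", "B", "p65", "from",
          "the", "nucleus", "to", "the", "cytoplasm."]
labels = [0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(decode_triggers(tokens, labels))  # [('retarget', 'localisation')]
```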
                          
CRF features for each token

        is-protein
        is-PPI-word
        generic POS tag
        log-frequency of the token being a trigger, for each event type (10 features)
        number of proteins in the sentence (sentence-level)
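The feature list can be sketched as a function that builds one feature dictionary per token, the input shape that Python CRF wrappers such as sklearn-crfsuite accept. Everything concrete here is hypothetical: the protein set, the PPI word list, the coarse-POS rule, and the use of log(1 + count) as the log-frequency. The slide mentions 10 log-frequency features; this sketch simply creates one per event type present in the counts table rather than reproducing that exact number.

```python
import math
from typing import Dict, List, Set


def token_features(tokens: List[str], pos_tags: List[str], i: int,
                   proteins: Set[str], ppi_words: Set[str],
                   trigger_counts: Dict[str, Dict[str, int]]) -> Dict[str, object]:
    """Feature dict for token i, one entry per feature on the slide."""
    word = tokens[i].lower()
    feats: Dict[str, object] = {
        "is_protein": tokens[i] in proteins,
        "is_ppi_word": word in ppi_words,
        # "Generic" POS tag: first two letters of the Penn tag (VBZ -> VB); an assumption.
        "pos": pos_tags[i][:2],
        # Sentence-level feature: the same value for every token in the sentence.
        "n_proteins": sum(tok in proteins for tok in tokens),
    }
    # One log-frequency feature per event type; log1p keeps unseen tokens at 0.0.
    for event_type, counts in trigger_counts.items():
        feats["logfreq_" + event_type] = math.log1p(counts.get(word, 0))
    return feats


# Hypothetical resources, for illustration only.
tokens = ["I", "kappa", "B/MAD-3", "masks", "the", "nuclear", "localization", "signal"]
pos_tags = ["NN", "NN", "NN", "VBZ", "DT", "JJ", "NN", "NN"]
proteins = {"B/MAD-3"}
ppi_words = {"binds", "interacts", "masks"}
trigger_counts = {"negative regulation": {"masks": 3, "inhibit": 12},
                  "localisation": {"localization": 5}}
print(token_features(tokens, pos_tags, 3, proteins, ppi_words, trigger_counts))
```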
    




                                
Trigger Detection Post Processing

        Positive discrimination
            − Manually inspecting false negatives
            − Adding recurring triggers
        Negative discrimination
            − Manually inspecting false positives
            − Filtering out tokens commonly mistaken for triggers
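The two kinds of discrimination behave like a whitelist and a blacklist applied to the CRF labels. A minimal sketch, assuming both lists were built by hand from the false negatives and false positives as the slide describes; the words in the example lists are placeholders, not the lists the system actually used.

```python
from typing import Dict, List, Set


def post_process(tokens: List[str], labels: List[int],
                 recurring_triggers: Dict[str, int],
                 mistaken_tokens: Set[str]) -> List[int]:
    """Adjust CRF trigger labels with two hand-built lists.

    Positive discrimination: a listed recurring trigger that the CRF
    labelled 0 receives the listed event label.
    Negative discrimination: a listed commonly mistaken token loses any
    trigger label the CRF gave it.
    """
    fixed = list(labels)
    for i, token in enumerate(tokens):
        word = token.lower()
        if fixed[i] == 0 and word in recurring_triggers:
            fixed[i] = recurring_triggers[word]
        elif fixed[i] != 0 and word in mistaken_tokens:
            fixed[i] = 0
    return fixed


# Hypothetical lists; label 4 = localisation, 9 = negative regulation.
print(post_process(["retarget", "binding", "activity"], [0, 0, 9],
                   recurring_triggers={"retarget": 4},
                   mistaken_tokens={"activity"}))  # [4, 0, 0]
```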




                                    
Trigger Detection Results

     Event class          #Gold       R       P   F-score
     Localisation            40    77.5   47.69     59.05
     Binding                180   33.33   54.55     41.38
     Gene expression        282    76.6   58.54     66.36
     Transcription           68   58.82    18.6     28.27
     Protein catabolism      19   84.21   88.89     86.49
     Phosphorylation         40    97.5   81.25     88.64
     Non-reg total          629   63.91   48.73      55.3
     Regulation             138   13.04   62.07     21.56
     Positive regulation    462   13.85   54.24     22.07
     Neg. regulation        153   29.41   45.92     35.86
     All total             1382   38.28   49.44     43.15
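The F-score column is the balanced F-measure, F = 2RP / (R + P), which can be checked against any row of the table, for example Protein catabolism (R 84.21, P 88.89, F 86.49). A small Python helper, plus the inverse formula P = FR / (2R - F) for recovering precision from recall and F-score:

```python
def f_score(recall: float, precision: float) -> float:
    """Balanced F-score: harmonic mean of recall and precision (same units in and out)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def precision_from(recall: float, f: float) -> float:
    """Invert F = 2RP / (R + P) for P, giving P = F*R / (2R - F)."""
    return f * recall / (2 * recall - f)


print(round(f_score(84.21, 88.89), 2))  # 86.49 (Protein catabolism)
print(round(f_score(97.5, 81.25), 2))   # 88.64 (Phosphorylation)
```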

                                  
2) Finding Participants

        Type and number of participants

            − 1 theme (protein)
                 Gene expression, Transcription, Protein catabolism,
                 Localisation, Phosphorylation
            − 1 theme and 1 cause (proteins/other events)
                 Regulation, Positive regulation, Negative regulation
            − 1 or more themes (protein)
                 Binding
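These arity constraints can be written down as a lookup table and used to sanity-check candidate events before output. A minimal sketch; the table mirrors the slide, while treating the regulation cause as optional (following "cause (if applicable)" on the task slide) and the names `ARITY` and `check_arity` are our assumptions.

```python
from typing import Dict, Tuple

# event type -> (min themes, max themes or None for unbounded, may have a cause)
ARITY: Dict[str, Tuple[int, object, bool]] = {
    "gene expression": (1, 1, False),
    "transcription": (1, 1, False),
    "protein catabolism": (1, 1, False),
    "localisation": (1, 1, False),
    "phosphorylation": (1, 1, False),
    "binding": (1, None, False),
    "regulation": (1, 1, True),
    "positive regulation": (1, 1, True),
    "negative regulation": (1, 1, True),
}


def check_arity(event_type: str, n_themes: int, has_cause: bool) -> bool:
    """True if the number of themes and the presence of a cause fit the type."""
    lo, hi, cause_allowed = ARITY[event_type]
    if n_themes < lo or (hi is not None and n_themes > hi):
        return False
    return cause_allowed or not has_cause


print(check_arity("binding", 3, False))             # True
print(check_arity("gene expression", 2, False))     # False
print(check_arity("negative regulation", 1, True))  # True
```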
             



                                       
Parse Tree Distance
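Parse tree distance between a trigger and a candidate protein can be measured as the number of edges on the path joining them; this is our assumption, since the slide gives no definition. A minimal sketch with the tree stored as a hypothetical child-to-parent map in which every node has a unique name:

```python
from typing import Dict, List

Tree = Dict[str, str]  # maps each node to its parent; the root has no entry


def path_to_root(node: str, parent: Tree) -> List[str]:
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a: str, b: str, parent: Tree) -> int:
    """Edges on the path between a and b, via their lowest common ancestor."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for j, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:  # first shared node is the lowest common ancestor
            return steps_from_a[n] + j
    raise ValueError("nodes are not in the same tree")


# Toy tree: "is" dominates "gene" and "expressed"; "gene" dominates the protein.
parent = {"gene": "is", "expressed": "is", "15-lipoxygenase": "gene"}
print(tree_distance("expressed", "15-lipoxygenase", parent))  # 3
```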




                   
Parse Tree Distance Analysis




                   
Theme in Subtree

        Fraction of events whose arguments lie in the trigger's parse subtree

        Single-theme events
            − Theme in subtree = 0.7054
            − Theme not in subtree = 0.2946
        Binding events
            − Any theme in subtree = 0.5435
            − No theme in subtree = 0.4565
        Regulation events
            − Theme or cause in subtree = 0.5919
            − Neither theme nor cause in subtree = 0.4081

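The proportions above can be computed by testing, for each argument, whether its node lies below the trigger's node. A minimal sketch, assuming the parse is stored as a child-to-parent map with unique node names (a hypothetical representation). It computes the single-theme figure; the binding and regulation variants would instead count an event once if any of its arguments passes the test.

```python
from typing import Dict, Iterable, Tuple

Tree = Dict[str, str]  # child -> parent; the root has no entry


def in_subtree(node: str, root: str, parent: Tree) -> bool:
    """True if `node` is `root` itself or one of its descendants."""
    while True:
        if node == root:
            return True
        if node not in parent:
            return False  # reached the tree root without meeting `root`
        node = parent[node]


def subtree_rate(pairs: Iterable[Tuple[str, str]], parent: Tree) -> float:
    """Fraction of (trigger, theme) pairs whose theme lies in the trigger's subtree."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(in_subtree(theme, trigger, parent) for trigger, theme in pairs)
    return hits / len(pairs)


# Hypothetical toy tree and (trigger, theme) pairs.
parent = {"gene": "is", "expressed": "is", "15-lipoxygenase": "gene", "p65": "expressed"}
print(subtree_rate([("expressed", "15-lipoxygenase"), ("expressed", "p65")], parent))  # 0.5
```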
Distance in Trigger Subtree




                    
Distances not in Trigger Subtree




                    
Rules Concerning Parse Tree Analysis

        For "binding", report as themes:
            − up to the two closest proteins in the subtree
            − and the closest protein in the rest of the tree

            "In contrast, gp41 failed to stimulate NF-kappaB
            binding activity in as much as no NF-kappaB bound to
            the main NF-kappaB-binding site 2 of the IL-10
            promoter after addition of gp41."

        The rule correctly leaves out the final gp41.
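The binding rule can be expressed as: sort the proteins inside the trigger's subtree by tree distance and keep at most two, then add the single closest protein from outside the subtree. A minimal sketch, assuming tree distance means edge count and using a hypothetical toy tree rather than a real parse; ties are broken by input order.

```python
from typing import Dict, List

Tree = Dict[str, str]  # child -> parent; the root has no entry


def path_to_root(node: str, parent: Tree) -> List[str]:
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a: str, b: str, parent: Tree) -> int:
    """Edges on the path between a and b, via their lowest common ancestor."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for j, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:
            return steps_from_a[n] + j
    raise ValueError("nodes are not in the same tree")


def binding_themes(trigger: str, proteins: List[str], parent: Tree,
                   max_inside: int = 2, max_outside: int = 1) -> List[str]:
    """Up to two closest proteins inside the trigger's subtree, plus the
    single closest protein elsewhere in the tree."""
    def dist(p: str) -> int:
        return tree_distance(trigger, p, parent)

    inside = [p for p in proteins if trigger in path_to_root(p, parent)]
    outside = [p for p in proteins if p not in inside]
    # sorted() is stable, so ties keep the order in which proteins were given.
    return (sorted(inside, key=dist)[:max_inside]
            + sorted(outside, key=dist)[:max_outside])


parent = {
    "binding": "S",
    "A": "binding",  # distance 1, inside
    "PP": "binding",
    "B": "PP",       # distance 2, inside
    "NP": "PP",
    "C": "NP",       # distance 3, inside: third closest, so dropped
    "D": "S",        # distance 2, outside
    "VP": "S",
    "E": "VP",       # distance 3, outside: dropped
}
print(binding_themes("binding", ["A", "B", "C", "D", "E"], parent))  # ['A', 'B', 'D']
```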
                                      
Example of a Missed (FN) Theme

        For gene expression
            − all the proteins in the trigger's subtree are reported as themes

        "The 15-lipoxygenase (lox) gene is expressed in a
          tissue-specific manner, predominantly in
          erythroid cells but also in airway epithelial
          cells and eosinophils."

                        is
                       /  \
                   gene    expressed
                     |
              15-lipoxygenase

        15-lipoxygenase attaches under "gene", outside the subtree of the
        trigger "expressed", so it is missed as a theme.
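Applied to the tree fragment above, the gene expression rule reproduces the miss. A minimal sketch using a hypothetical child-to-parent encoding of that fragment; `gene_expression_themes` is an illustrative name, not the system's code.

```python
from typing import Dict, List

Tree = Dict[str, str]  # child -> parent; the root has no entry


def in_subtree(node: str, root: str, parent: Tree) -> bool:
    """True if `node` is `root` itself or one of its descendants."""
    while True:
        if node == root:
            return True
        if node not in parent:
            return False
        node = parent[node]


def gene_expression_themes(trigger: str, proteins: List[str], parent: Tree) -> List[str]:
    """Report every protein inside the trigger's subtree as a theme."""
    return [p for p in proteins if in_subtree(p, trigger, parent)]


# The tree fragment from the slide, as a child -> parent map.
parent = {"gene": "is", "expressed": "is", "15-lipoxygenase": "gene"}
print(gene_expression_themes("expressed", ["15-lipoxygenase"], parent))  # []
```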
Evaluation on Development Data

     Event class          #Gold       R       P   F-score
     Localisation            53   67.92   46.75     55.38
     Binding                312   21.47   63.81     32.13
     Gene expression        356   64.61   76.33     69.98
     Transcription           82   53.66    89.8     67.18
     Protein catabolism      21   90.48   67.86     77.55
     Phosphorylation         47   91.49   53.09     67.19
     Non-reg total          871    50.4   68.44     58.05
     Regulation             172    5.23   33.33      9.05
     Positive regulation    632    3.48   21.36      5.99
     Neg. regulation        201    9.45   15.08     11.62
     Regulatory total      1005    4.98   19.53      7.93
     All total             1876   26.07   54.46     35.26
                                  
Evaluation on Test Data

     Event class          #Gold       R       P   F-score
     Localisation           174   44.83   53.06      48.6
     Binding                347   12.68   40.37      19.3
     Gene expression        722   52.63   69.34     59.84
     Transcription          137   15.33   67.74        25
     Protein catabolism      14   42.86      50     46.15
     Phosphorylation        135   78.52   53.81     63.86
     Non-reg total         1529   41.53   60.82     49.36
     Regulation             291    3.09   19.15      5.33
     Positive regulation    983    1.12    8.87      1.99
     Neg. regulation        379    12.4   20.52     15.46
     Regulatory total      1653    4.05   16.75      6.53
     All total             3182   22.06   48.61     30.35
                             
Results: Ranked 12 out of 24 teams

Rank     R       P     F-score       Rank     R       P     F-score
1      46.73   58.48    51.95        13     25.96   36.26    30.26
2      45.82   47.52    46.66        14     20.93   49.3     29.38
3      34.98   61.59    44.62        15     22.69   40.55     29.1
4      36.9    55.59    44.35        16     21.53   36.99    27.21
5      33.41   51.55    40.54        17     17.44   39.99    24.29
6      28.13   53.56    36.88        18     28.63   20.88    24.15
7      28.22   45.78    34.92        19     13.45   71.81    22.66
8      27.75   46.6     34.78        20     22.78   19.03    20.74
9      21.62   62.21    32.09        21     30.42   14.11    19.28
10     21.12   56.9      30.8        22     11.25   66.54    19.25
11     22.5    47.7     30.58        23     11.69   31.42    17.04
12     22.06   48.61    30.35        24      9.4    61.65    16.31
                                  
End.




        
Other Tasks

        Event detection and characterization
        Event argument recognition
        Negations and speculations
    




                               
Example
    "I kappa B/MAD-3 masks the nuclear localization
      signal of NF-kappa B p65 and requires the
      transactivation domain to inhibit NF-kappa B
      p65 DNA binding."


    Event: negative regulation
    Trigger: masks
    Theme1: the first p65
    Cause: MAD-3
    Site: nuclear localization signal

                             
Example
    "In contrast, NF-kappa B p50 alone fails to
      stimulate kappa B-directed transcription, and
      based on prior in vitro studies, is not
      directly regulated by I kappa B."


    Event: regulation
    Theme1: this p50
    Trigger: regulated
    Negation: true for this event
    Speculation: none
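Output for these tasks can be written as standoff annotation lines. The sketch below is loosely modelled on the shared task's .a2 files (text-bound `T` lines for triggers, `E` lines for events, `M` lines for Negation/Speculation); the field order and separators are our recollection rather than the official specification, so check the format description before relying on them. The IDs are hypothetical, and offsets are relative to this one sentence rather than the whole abstract.

```python
from itertools import count
from typing import Iterator, List, Optional


def trigger_line(tid: str, etype: str, text: str, trigger: str) -> str:
    """Text-bound trigger line: id, then type with start/end offsets, then the text."""
    start = text.index(trigger)  # first occurrence only; a simplification
    return f"{tid}\t{etype} {start} {start + len(trigger)}\t{trigger}"


def event_lines(eid: str, etype: str, trigger_id: str, themes: List[str],
                cause: Optional[str] = None, negated: bool = False,
                speculated: bool = False,
                mod_ids: Optional[Iterator[int]] = None) -> List[str]:
    """Event line, followed by Negation/Speculation lines when flagged."""
    mod_ids = mod_ids if mod_ids is not None else count(1)
    args = [f"{etype}:{trigger_id}"]
    for n, theme in enumerate(themes, start=1):
        args.append(f"Theme{'' if n == 1 else n}:{theme}")  # Theme, Theme2, ...
    if cause is not None:
        args.append(f"Cause:{cause}")
    lines = [eid + "\t" + " ".join(args)]
    for flag, name in ((negated, "Negation"), (speculated, "Speculation")):
        if flag:
            lines.append(f"M{next(mod_ids)}\t{name} {eid}")
    return lines


text = ("In contrast, NF-kappa B p50 alone fails to stimulate kappa B-directed "
        "transcription, and based on prior in vitro studies, is not directly "
        "regulated by I kappa B.")
print(trigger_line("T5", "Regulation", text, "regulated"))
# T2 is assumed to be the given p50 annotation; T5 is the predicted trigger.
for line in event_lines("E1", "Regulation", "T5", ["T2"], negated=True):
    print(line)
```

Passing a shared `mod_ids` counter across calls keeps the `M` identifiers unique within one output file.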

                             
