Author Disambiguation for
enhanced science 2.0 services
Generating groups based on “thin” data




Jeffrey Demaine
Institute for Research Information and Quality Assurance (iFQ)
Bonn, Germany
www.research-information.de
Outline
• Describe author-disambiguation efforts at iFQ
   – Good results based on limited metadata.
   – Simple approach can achieve remarkable results.

• Adapt the findings to the context of an online service.
• Most author-disambiguation articles achieve remarkable
  results by starting with clean (manually curated) data.
   – What if you don't have clean, curated data?
   – i.e.: the community-generated content of Science 2.0




           Institute for Research Information and Quality Assurance   Jeffrey Demaine
                                                                               08/2011
Challenge
• Web of Science database does not group author names
  by identity: each name-instance gets a different ID number.
• Example: There are 100 articles by Smith, J
   – 100 different authorID numbers
   – 100 different researchers?
   – 1 researcher published them all?
   – Something in-between?

• How can iFQ measure a researcher's productivity if we
  don't know who that person is?



Challenge
Goal:
• To reduce uncertainty about identity
• Identify and group homonyms
   – same spelling = same thing



Constraints:
• WoS only uses the first initial.
   – Few records have a first name
• Affiliation data is suspect
   – Corresponding author only


Ignoring synonyms
• Spelling mistakes, anglicisation of characters (changing
  ö into oe or o), name changes due to marriage, etc. are
  beyond the scope of this project.

• While J. Smith can be disambiguated, we will never
  know if Jane Smith is a mis-spelling of June Smith.
  (Maybe it’s really Janet Smith, or Jane Smyth, or…)

• Speculation about what could be cannot be resolved.



Matching on metadata
    Thiel, C
    Thiel, CM; Bentley, P; Dolan, RJ (2002) Effects of
    cholinergic enhancement on conditioning-related
    responses in human auditory cortex. EUR J NEUROSCI


    Thiel, C
    Thiel, CM; Henson, R; Morris, JS; Friston, KJ; Dolan, RJ
    (2001) Pharmacological modulation of behavioral and
    neuronal correlates of repetition priming. J NEUROSCI
     These articles have one co-author in common (Dolan, RJ): we can be
       pretty sure they were both written by the same Christiane Thiel.
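
The comparison on this slide can be sketched in a few lines of Python. This is only an illustration: the record layout and the function name are assumptions, and the author lists are copied from the two example articles above.

```python
# Hypothetical record layout; the author lists come from the two example articles.
record_2002 = {"authors": ["Thiel, CM", "Bentley, P", "Dolan, RJ"]}
record_2001 = {"authors": ["Thiel, CM", "Henson, R", "Morris, JS",
                           "Friston, KJ", "Dolan, RJ"]}


def shared_coauthors(rec_a, rec_b, target_surname):
    """Return the co-authors common to both records, ignoring the target author."""
    def others(rec):
        # Drop every author whose surname is the one being disambiguated.
        return {a for a in rec["authors"] if a.split(",")[0] != target_surname}
    return others(rec_a) & others(rec_b)


common = shared_coauthors(record_2002, record_2001, "Thiel")
print(common)  # {'Dolan, RJ'}
```

Filtering by surname is a simplification: it would also drop a genuine co-author who happens to share the surname.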
Matching on metadata
[Figure: author-instances labelled "Thiel, C", "Smith, J" and "Thiel, C"]

Author-instances are matched based on bibliographic coupling.




Types of matching tested
4 types of matching were tested:
• > 1 co-author
• > 1 co-author and/or > 1 reference
• > 1 co-author and/or > 2 references
• > 1 co-author and/or > 1 reference [Jaccard Similarity Coeff.]

Evaluation of matching:
• Fewer groups = simplification of identity (RATIO).
• How accurate are the groups? (PRECISION)
• How efficient are the groups? (RECALL)
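
A minimal sketch of how such matching rules might be expressed, assuming each record carries a set of co-author names and a set of cited references. The field names, the default thresholds and the way a Jaccard score would be thresholded are assumptions, not details taken from the slides.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_match(rec_a, rec_b, min_coauthors=1, min_refs=2):
    """Match two records if they share enough co-authors and/or references."""
    shared_co = len(set(rec_a["coauthors"]) & set(rec_b["coauthors"]))
    shared_refs = len(set(rec_a["refs"]) & set(rec_b["refs"]))
    return shared_co >= min_coauthors or shared_refs >= min_refs


# Illustrative records (not real data).
a = {"coauthors": {"Dolan, RJ"}, "refs": {"r1", "r2", "r3"}}
b = {"coauthors": {"Henson, R"}, "refs": {"r2", "r3", "r4"}}

print(is_match(a, b))                # True: two shared references
print(jaccard(a["refs"], b["refs"]))  # 0.5
```

The thresholds are parameters so that the four variants listed above can be tried by changing `min_coauthors` and `min_refs`, or by comparing `jaccard` scores against a cut-off of one's choosing.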
Types of matching tested
• Compare each group against the Emmy Noether
  data, which we know represents a single person.
• How accurate is the group with the most matches?

That is:
• We know that a J Smith wrote 15 articles.
• Which group contains the most of those?
• Which method of matching creates the best groups?
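
The check described above could look roughly like this, assuming groups are lists of record IDs and the known person's articles are given as a set of IDs. All names and the sample data are hypothetical.

```python
def best_group(groups, known_ids):
    """Find the group holding most of the known person's records.

    Returns (group, precision, share_of_known_found).
    """
    known = set(known_ids)
    best = max(groups, key=lambda g: len(known & set(g)))
    hits = len(known & set(best))
    return best, hits / len(best), hits / len(known)


# Illustrative data: records 1-4 are known to belong to one person.
groups = [[1, 2, 3, 9], [4, 5], [6]]
group, precision, found = best_group(groups, {1, 2, 3, 4})
print(group, precision, found)  # [1, 2, 3, 9] 0.75 0.75
```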



Goal #1: Reducing uncertainty
14 names from Emmy Noether dataset
(all records since 1992)

  Name tested      WoS records
  B Borasoy          59
  R Huber           945
  A Blaukat          44
  G Behre            98
  M Albrecht        652
  L Ackermann        60
  R Everaers         43
  P Neumann         317
  K Niebuhr          13
  C Ochsenfeld       33
  JW Pan            192
  T Pietschmann      37
  B Regenberg        30
  K Wiegand          28

Number of groups by matching method:

  Method                            Groups   Simplification of identity
  Co-author matching                    95   90%
  Co-authors & reference                65   93%
  Co-authors & 2 references             76   92%
  Co-authors & reference Jaccard       154   84%

((945 x 3) - 1) x ((944 x 3) - 1) > 8 million

R Huber - manual matching:

   70,839,681   co-author comparisons
    1,180,661   minutes (1 per second)
      147,583   8-hour days
       29,517   5-day work weeks
          615   years (4 weeks vacation)

Co-author matching took 5m 27sec
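
The arithmetic on this slide can be re-run as a quick check. The sketch below assumes that "simplification of identity" means 1 - groups / records, taken against R Huber's 945 records; that definition is inferred from the numbers, not stated on the slide. It also suggests that the slide's day, week and year figures correspond to dividing the minute count directly by 8; converting minutes to hours first gives roughly 2,460 working days, or about 10 years.

```python
# Figures copied from the slide; variable names are illustrative.
records = 945
groups_by_method = {
    "Co-author matching": 95,
    "Co-authors & reference": 65,
    "Co-authors & 2 references": 76,
    "Co-authors & reference Jaccard": 154,
}

# Assumed definition: simplification = 1 - groups / records.
simplification = {m: round(100 * (1 - g / records))
                  for m, g in groups_by_method.items()}

# The slide's estimate of the comparison space for 945 records.
comparison_space = ((945 * 3) - 1) * ((944 * 3) - 1)

# Manual effort at one comparison per second.
comparisons = 70_839_681
minutes = comparisons / 60
days_as_on_slide = minutes / 8        # minutes divided directly by 8
hours = minutes / 60
working_days = hours / 8
working_weeks = working_days / 5
working_years = working_weeks / 48    # 52 weeks minus 4 weeks vacation

print(simplification)
print(comparison_space)               # 8023054
print(round(days_as_on_slide), round(working_days), round(working_years))
```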




Types of matching tested
          Which method of matching creates the best groups?
[Chart: Precision (%) by matching method]

Matching by co-authors & 2 references: 75% precision
Matching by co-authors & 2 references & author starting year: 79% precision
Precision = the ratio of correct matches to all matches (% correct)

Goal #2: Split known names
• Use the co-author & 2-references matching technique.
• We know a priori the identities of the persons, so we
  know how many groups there should be.
• This allows us to calculate Recall.
• Also allows us to calculate correlation coefficients.




Who is E. Cardellach?
Esteve Cardellach                                              Estel Cardellach
Bellaterra, Spain                                              Bellaterra, Spain
Universitat Autònoma de Barcelona                              Institut de Ciències de l’Espai
• Environmental Engineering                                    • Computer Science, A.I.
• Environmental Sciences                                       • Electrical & Electronic Engineering
• Geochemistry & Geophysics                                    • Imaging Science & Photo. Tech.
• Geology                                                      • Multidisciplinary Geosciences
• Mineralogy                                                   • Remote Sensing
• Civil Engineering                                            • Telecommunications


      Web of Science identifies both as simply “Cardellach, E”


Using the program
• There are 41 distinct records published since 2002 that
  match the name “Cardellach, E”.
• The program grouped the names according to the co-
  authors and/or references (minimum 2) they share.
• Involved the comparison of 9202 co-authors and of
  19777 references, which took only 21 seconds (!).
• The RESULT was 4 groups containing all 41 records.
• Googled article titles to verify author name & address.
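
One way such a grouping step might be written is with a union-find structure over all record pairs. This is a sketch under assumptions: the slides do not say how the program builds its groups, so the transitive merging, the record layout and the thresholds (one shared co-author or two shared references) are illustrative choices.

```python
from itertools import combinations


def group_records(records, min_coauthors=1, min_refs=2):
    """Group record indices that are linked, directly or transitively, by a match."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        a, b = records[i], records[j]
        if (len(a["coauthors"] & b["coauthors"]) >= min_coauthors
                or len(a["refs"] & b["refs"]) >= min_refs):
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Illustrative records (not real data).
records = [
    {"coauthors": {"A"}, "refs": {"r1", "r2"}},
    {"coauthors": {"A", "B"}, "refs": {"r3"}},
    {"coauthors": {"C"}, "refs": {"r3", "r4"}},
    {"coauthors": {"D"}, "refs": {"r3", "r4", "r5"}},
]
print(group_records(records))  # [[0, 1], [2, 3]]
```

Comparing every pair is quadratic in the number of records, which is acceptable for a few dozen records per name but would need blocking or indexing for very common names.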



Precision & Recall
                             Condition (reality)   Condition (reality)
                                   TRUE                  FALSE
  Test (prediction) TRUE       True positive         False positive      Precision = TP ÷ (TP + FP)
  Test (prediction) FALSE      False negative        True negative
                               Recall = TP ÷ (TP + FN)

Precision: % of results that are correct (accuracy).

Recall: % of all relevant matches that were made (efficiency).
Must know the total number of correct answers there are.


The Matthews correlation coefficient
gives a statistical measure of the results:

MCC = (TP × TN − FP × FN) ÷ √((TP + FP) × (TP + FN) × (TN + FP) × (TN + FN))


Disambiguation of Estel and Esteve
                       In reality: Esteve   In reality: Estel
  Predicted: Esteve            25                   1            Precision: 96%
  Predicted: Estel              3                  12
                       Recall: 89%                               Matthews c.c.: 0.79



The one false positive occurred when Estel published a
paper in the field of Geochemistry & Geophysics.
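
The three figures on this slide can be reproduced from the four cell counts. The sketch below treats "Esteve" as the positive class, which is an assumption consistent with the reported precision and recall.

```python
from math import sqrt

# Cell counts from the slide, with "Esteve" taken as the positive class.
tp, fp = 25, 1   # predicted Esteve: really Esteve / really Estel
fn, tn = 3, 12   # predicted Estel:  really Esteve / really Estel

precision = tp / (tp + fp)
recall = tp / (tp + fn)
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(f"{precision:.0%} {recall:.0%} {mcc:.2f}")  # 96% 89% 0.79
```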




Disadvantages
• Not the most advanced approach.
   – Does not use NLP on title, abstract.
• High variability of precision: groups can contain errors.
• Smaller (“orphan”) groups require manual matching.
• Very common names produce a large number of
  groups (naturally). Requires some post-processing.




Advantages
• It is highly automated, requiring only the input of a
  name and a year with which to limit the search.
• It runs very quickly, with hundreds of names being
  disambiguated in a few minutes.
• Only have to verify one name-instance per group.
• Works with the real-world data we have (FIZ
  databases), not some hand-curated test set.




What about science 2.0?
• Most author-disambiguation articles achieve remarkable
  results by starting with a very clean data-set.
• In a science 2.0 context, content is user-generated and
  nobody cleans the data.
• But that does not mean one cannot extract interesting
  patterns from the metadata.
• Example: author-disambiguation based on co-authors,
  co-citation, and shared keywords.




“Loren M. Frank” found 7 times in Karlmarx20’s library


•    Emery N. Brown, David P. Nguyen, Loren M. Frank, Matthew A. Wilson, Victor Solo. “An analysis of neural
     receptive field plasticity by point process adaptive filtering” Proceedings of the National Academy of Sciences of
     the United States of America, Vol. 98, No. 21. (9 October 2001), pp. 12261-12266.
•    Emery N. Brown, Loren M. Frank, Dengda Tang, Michael C. Quirk, Matthew A. Wilson . “A Statistical Paradigm
     for Neural Spike Train Decoding Applied to Position Prediction from Ensemble Firing Patterns of Rat Hippocampal
     Place Cells” J. Neurosci., Vol. 18, No. 18. (15 September 1998), pp. 7411-7425.
•    Emery N. Brown, Riccardo Barbieri, Valérie Ventura, Robert E. Kass, Loren M. Frank . “The Time-Rescaling
     Theorem and Its Application to Neural Spike Train Data Analysis”. Neural Computation, Vol. 14, No. 2. (19
     December 2010), pp. 325-346.
•    Riccardo Barbieri, Loren M. Frank, Michael C. Quirk, Matthew A. Wilson, Emery N. Brown. “Diagnostic methods
     for statistical models of place cell spiking activity” Neurocomputing, Vol. 38-40 (June 2001), pp. 1087-1093.
•    Emery N. Brown, Riccardo Barbieri, Valerie Ventura, Robert E. Kass, Loren M. Frank. “The Time-Rescaling
     Theorem and Its Application to Neural Spike Train Data Analysis” Neural Comp., Vol. 14, No. 2. (February 2002),
     pp. 325-346.
•    Riccardo Barbieri, Michael C. Quirk, Loren M. Frank, Matthew A. Wilson, Emery N. Brown. “Construction and
     analysis of non-Poisson stimulus-response models of neural spiking activity”. Journal of Neuroscience Methods,
     Vol. 105, No. 1. (January 2001), pp. 25-37.
•    Uri T. Eden, Loren M. Frank, Riccardo Barbieri, Victor Solo, Emery N. Brown. “Dynamic Analysis of Neural
     Encoding by Point Process Adaptive Filtering” Neural Comp., Vol. 16, No. 5. (May 2004), pp. 971-998.



                 Disambiguation based on shared co-authors.
What about reference matches?
• CiteULike does not have citation data!
• The references collected in a user's library (“bookmarks”)
  are a form of citation.
   – Indicate common academic interest.
• So: leverage the personal collections generated by users
  to disambiguate authors based on “co-collection” of
  articles.
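
A rough sketch of "co-collection" matching, assuming each user's library is available as a set of article IDs. The user names, article IDs and the threshold of one shared library are hypothetical; CiteULike's actual data format is not described in these slides.

```python
def co_collection_count(libraries, article_a, article_b):
    """Count the user libraries that contain both articles."""
    return sum(1 for items in libraries.values()
               if article_a in items and article_b in items)


def co_collected(libraries, article_a, article_b, min_libraries=1):
    """Treat two articles as linked if enough users bookmarked both."""
    return co_collection_count(libraries, article_a, article_b) >= min_libraries


# Illustrative libraries: user name -> set of bookmarked article IDs.
libraries = {
    "user_1": {"art_1", "art_2", "art_3"},
    "user_2": {"art_2", "art_3"},
    "user_3": {"art_4"},
}

print(co_collection_count(libraries, "art_2", "art_3"))  # 2
print(co_collected(libraries, "art_1", "art_4"))         # False
```

Two author-instances whose articles are co-collected would then be candidates for the same group, in the same way that shared references were used with the Web of Science data.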




Matching on metadata
[Figure: two author-instances labelled "Thiel, C"]

Author-instances are matched based on co-citation.




“Wentai Liu” found 10 times in Karlmarx20’s library
•   Zhi Yang, Qi Zhao, Wentai Liu. “Improving spike separation using waveform derivatives”. Journal of Neural Engineering,
    Vol. 6, No. 4. (2009)
•   Zhi Yang, Qi Zhao, Wentai Liu. “Neural signal classification using a simplified feature set with nonparametric clustering“.
    Neurocomputing, Vol. 73, No. 1-3. (2009), pp. 412-422.
•   Zhi Yang, Qi Zhao, Wentai Liu. “Improving spike separation using waveform derivatives.” Journal of Neural Engineering,
    Vol. 6, No. 4. (2009)
•   Zhi Yang, Qi Zhao, Wentai Liu. “Energy based evolving mean shift algorithm for neural spike classification“. In 2009 31st
    Annual International Conference of the {IEEE} Engineering in Medicine and Biology Society. {EMBC} 2009, 3-6 Sept. 2009
    (2009), pp. 966-9.
•   Linh Hoang, Zhi Yang, Wentai Liu. "VLSI architecture of NEO spike detection with noise shaping filter and feature extraction
    using informative samples“ In 2009 31st Annual International Conference of the {IEEE} Engineering in Medicine and Biology
    Society. {EMBC} 2009, 3-6 Sept. 2009 (2009), pp. 978-81.
•   Linh Hoang, Zhi Yang, Wentai Liu. "VLSI architecture of NEO spike detection with noise shaping filter and feature extraction
    using informative samples“ Conference Proceedings: Annual International Conference of the IEEE Engineering in Medicine
    and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, Vol. 1 (2009), pp. 978-981.
•   Zhi Yang, Qi Zhao, Wentai Liu. “Energy based evolving mean shift algorithm for neural spike classification“. Conference
    Proceedings: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering
    in Medicine and Biology Society. Conference, Vol. 1 (2009), pp. 966-969.
•   Linh Hoang, Zhi Yang, Wentai Liu. "VLSI architecture of NEO spike detection with noise shaping filter and feature extraction
    using informative samples.” Conference Proceedings: Annual International Conference of the IEEE Engineering in Medicine
    and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, Vol. 1 (2009), pp. 978-981.
•   Tung-Chien Chen, Wentai Liu, Liang-Gee Chen. “VLSI architecture of leading eigenvector generation for on-chip principal
    component analysis spike sorting system“. Conference Proceedings: Annual International Conference of the IEEE
    Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, Vol. 1 (2008),
    pp. 3192-5.
•   Zhi Yang, Qi Zhao, Wentai Liu. “Improving spike separation using waveform derivatives“. Journal of Neural Engineering,
    Vol. 6, No. 4. (July 2009).

Matching based on collection
• Some precedent for this in the Science 2.0 literature:
      – Haustein, S. et al. treat bookmarking on CiteULike as a proxy
        for citations in measuring journal impact.
      – Usage ratio, diffusion, intensity based on counting user profiles.

• Matching based on membership in a collection: data that
  is extrinsic to the article itself.
      – Match on user-generated keywords


Haustein, S., Golov, E., Luckanus, K., Reher, S. & Terliesner, J. (2010). “Journal evaluation and science 2.0:
Using social bookmarks to analyze reader perception”. Proceedings of the 11th International Conference on
Science and Technology Indicators, Leiden, 117-119.



Practicalities
• The accuracy of the disambiguation is not mission-
  critical – it's just an online web-service.
• Could easily be implemented to run on-the-fly using
  JavaServer Pages (JSP) technology.
   – When a user loads a web-page, the server matches the
     references based on their metadata and profile membership,
     creating a web-page with groups.
• An example of an automated value-added service in
  a Science 2.0 context.



Thank you very much
                 for your attention!

   demaine@forschungsinfo.de




Institute for Research Information and Quality Assurance   Jeffrey Demaine
                                                                    08/2011

More Related Content

Recently uploaded

Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 

Recently uploaded (20)

Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 

Author disambiguation for enhanced science 2.0 services

  • 1. Author Disambiguation for enhanced science 2.0 services: Generating groups based on “thin” data. Jeffrey Demaine, Institute for Research Information and Quality Assurance (iFQ), Bonn, Germany. www.research-information.de
  • 2. Outline • Describe author-disambiguation efforts at iFQ – Good results based on limited metadata. – Simple approach can achieve remarkable results. • Adapt the findings to the context of an online service. • Most author-disambiguation articles achieve remarkable results by starting with clean (manually curated) data. – What if you don't have clean, curated data? – i.e.: the community-generated content of Science 2.0 Institute for Research Information and Quality Assurance Jeffrey Demaine 08/2011
  • 3. Challenge • The Web of Science database does not group author names by identity: each name-instance gets a different ID number. • Example: There are 100 articles by Smith, J – 100 different authorID numbers – 100 different researchers? – 1 researcher published them all? – Something in-between? • How can iFQ measure a researcher's productivity if we don't know who that person is?
  • 4. Challenge Goal: • To reduce uncertainty about identity • Identify and group homonyms – same spelling = same thing Constraints: • WoS only uses the first initial. – Few records have a first name • Affiliation data is suspect – Corresponding author only
  • 5. Ignoring synonyms • Spelling mistakes, anglicisation of characters (changing ö into oe or o), name changes due to marriage, etc. are beyond the scope of this project. • While J. Smith can be disambiguated, we will never know if Jane Smith is a misspelling of June Smith. (Maybe it’s really Janet Smith, or Jane Smyth, or…) • Speculation about what could be cannot be resolved.
  • 6. Matching on metadata • Are these two “Thiel, C” name-instances the same person? – Thiel, CM; Bentley, P; Dolan, RJ (2002) Effects of cholinergic enhancement on conditioning-related responses in human auditory cortex. EUR J NEUROSCI – Thiel, CM; Henson, R; Morris, JS; Friston, KJ; Dolan, RJ (2001) Pharmacological modulation of behavioral and neuronal correlates of repetition priming. J NEUROSCI • These articles have one co-author in common (Dolan, RJ): we can be pretty sure they were both written by the same Christiane Thiel.
  • 7. Matching on metadata [Diagram: two “Thiel, C” name-instances, both connected to “Smith, J”] Author-instances are matched based on Bibliographic Coupling.
  • 8. Types of matching tested 4 types of matching were tested: • > 1 co-author • > 1 co-author and/or > 1 reference • > 1 co-author and/or > 2 references • > 1 co-author and/or > 1 reference [Jaccard Similarity Coefficient] Evaluation of matching: • Fewer groups = simplification of identity (RATIO). • How accurate are the groups? (PRECISION) • How efficient are the groups? (RECALL)
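The matching rules on slide 8 can be sketched roughly as follows. This is a minimal illustration, not the program used at iFQ: the `AuthorInstance` class, the `is_match` and `jaccard` helpers, and the default thresholds (at least one shared co-author, or at least two shared references) are assumptions made for the example, and the slide does not say how the Jaccard coefficient was thresholded. The two records are the Thiel articles from slide 6; their reference lists are not given in the slides, so they are left empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorInstance:
    """One occurrence of an ambiguous name on one record (hypothetical structure)."""
    record_id: str
    coauthors: frozenset
    references: frozenset


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_match(x, y, min_coauthors=1, min_references=2):
    """Match two name-instances on shared co-authors and/or shared references.

    The thresholds are assumptions; the slides list several variants.
    """
    shared_coauthors = len(x.coauthors & y.coauthors)
    shared_references = len(x.references & y.references)
    return shared_coauthors >= min_coauthors or shared_references >= min_references


rec_2002 = AuthorInstance(
    "thiel-2002",
    frozenset({"Bentley, P", "Dolan, RJ"}),
    frozenset(),
)
rec_2001 = AuthorInstance(
    "thiel-2001",
    frozenset({"Henson, R", "Morris, JS", "Friston, KJ", "Dolan, RJ"}),
    frozenset(),
)

print(is_match(rec_2002, rec_2001))                    # shared co-author: Dolan, RJ
print(jaccard(rec_2002.coauthors, rec_2001.coauthors))  # 1 shared out of 5 distinct
```

Keeping the thresholds as parameters makes it easy to rerun the same data under each of the four variants and compare the resulting groups.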
  • 9. Types of matching tested • Compare each group against the Emmy Noether data, which we know represents a single person. • How accurate is the group with the most matches? That is: • We know that a J Smith wrote 15 articles. • Which group contains the most of those? • Which method of matching creates the best groups?
  • 10. Goal #1: Reducing uncertainty
    14 names from the Emmy Noether dataset (all records since 1992)
    Name tested      WoS records
    B Borasoy        59
    R Huber          945
    A Blaukat        44
    G Behre          98
    M Albrecht       652
    L Ackermann      60
    R Everaers       43
    P Neumann        317
    K Niebuhr        13
    C Ochsenfeld     33
    JW Pan           192
    T Pietschmann    37
    B Regenberg      30
    K Wiegand        28
    Simplification of identity: number of groups by matching type
    Matching type                        Groups   Simplification
    Co-author matching                   95       90%
    Co-authors & reference               65       93%
    Co-authors & 2 references            76       92%
    Co-authors & reference (Jaccard)     154      84%
    ((945 x 3) - 1) x ((944 x 3) - 1) > 8 million
    R Huber, manual matching: 70,839,681 co-author comparisons; 1,180,661 minutes (1 per second); 147,583 8-hour days; 29,517 5-day work weeks; 615 years (4 weeks vacation)
    Co-author matching took 5m 27sec
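The simplification percentages on slide 10 appear to be one minus the ratio of groups to records. That reading is an assumption: the slide does not define the ratio, but the four percentages are reproduced when the group counts are set against R Huber's 945 records. The function name `simplification` is hypothetical.

```python
def simplification(n_records, n_groups):
    """Share of name-instances 'collapsed' by grouping: 1 - groups / records."""
    return 1 - n_groups / n_records


RECORDS = 945  # R Huber's record count on the slide (assumed to be the baseline)

groups_by_method = {
    "Co-author matching": 95,
    "Co-authors & reference": 65,
    "Co-authors & 2 references": 76,
    "Co-authors & reference (Jaccard)": 154,
}

for method, n_groups in groups_by_method.items():
    print(f"{method}: {n_groups} groups, {simplification(RECORDS, n_groups):.0%}")

# The slide's own estimate of the comparison count; the factor 3 is not
# explained in the source and is reproduced here as given.
comparisons = (945 * 3 - 1) * (944 * 3 - 1)
print(comparisons > 8_000_000)
```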
  • 11. Types of matching tested Which method of matching creates the best groups? • Matching by co-authors & 2 references: 75% precision • Matching by co-authors & 2 references & author starting year: 79% precision Precision = the ratio of correct matches to all matches (% correct)
  • 12. Goal #2: Split known names • Use the co-author & 2-references matching technique. • We know a priori the identities of the persons, so we know how many groups there should be. • This allows us to calculate Recall. • Also allows us to calculate correlation coefficients.
  • 13. Who is E. Cardellach?
    Esteve Cardellach, Universitat Autònoma de Barcelona, Bellaterra, Spain: • Environmental Engineering • Environmental Sciences • Geochemistry & Geophysics • Geology • Mineralogy • Civil Engineering
    Estel Cardellach, Institut de Ciències de l’Espai, Bellaterra, Spain: • Computer Science, A.I. • Electrical & Electronic Engineering • Imaging Science & Photo. Tech. • Multidisciplinary Geosciences • Remote Sensing • Telecommunications
    Web of Science identifies both as simply “Cardellach, E”
  • 14. Using the program • There are 41 distinct records published since 2002 that match the name “Cardellach, E”. • The program grouped the names according to the co-authors and/or references (minimum 2) they share. • This involved the comparison of 9202 co-authors and of 19777 references, which took only 21 seconds (!). • The RESULT was 4 groups containing all 41 records. • Googled article titles to verify author name & address.
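One way to turn pairwise matches into groups, as slide 14 describes, is a union-find pass over all pairs of records. The slides do not say how the program builds its groups, so the transitive merging used here is an assumption, as are the function names, the record layout, and the toy data.

```python
from itertools import combinations


def shares_metadata(x, y, min_coauthors=1, min_references=2):
    """Assumed rule: at least one shared co-author or two shared references."""
    return (len(x["coauthors"] & y["coauthors"]) >= min_coauthors
            or len(x["references"] & y["references"]) >= min_references)


def group_instances(instances, is_match):
    """Merge name-instances into groups; matches are treated as transitive."""
    parent = list(range(len(instances)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(instances)), 2):
        if is_match(instances[i], instances[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i, instance in enumerate(instances):
        groups.setdefault(find(i), []).append(instance)
    return list(groups.values())


# Hypothetical records for one ambiguous name.
records = [
    {"id": "A", "coauthors": {"Lee, K"}, "references": {"r1", "r2"}},
    {"id": "B", "coauthors": {"Lee, K"}, "references": {"r3", "r4"}},
    {"id": "C", "coauthors": {"Diaz, M"}, "references": {"r3", "r4", "r5"}},
    {"id": "D", "coauthors": {"Okafor, N"}, "references": {"r9"}},
]

groups = group_instances(records, shares_metadata)
print([sorted(r["id"] for r in g) for g in groups])
```

Here A and B share a co-author and B and C share two references, so all three land in one group even though A and C share nothing directly; D stays alone as an "orphan" group.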
  • 15. Precision & Recall
                                Condition (reality) TRUE   Condition (reality) FALSE
    Test (prediction) TRUE      True positive (TP)         False positive (FP)
    Test (prediction) FALSE     False negative (FN)        True negative (TN)
    Precision = TP ÷ (TP + FP): % of results that are correct (accuracy).
    Recall = TP ÷ (TP + FN): % of all relevant matches that were made (efficiency). Must know the total number of correct answers there are.
    The Matthews correlation coefficient gives a statistical measure of the results: MCC = (TP × TN − FP × FN) ÷ √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
  • 16. Disambiguation of Estel and Esteve
                            In reality: Esteve   In reality: Estel
    Predicted: Esteve       25                   1
    Predicted: Estel        3                    12
    Precision: 96%   Recall: 89%   Matthews c.c.: 0.79
    The one false positive occurred when Estel published a paper in the field of Geochemistry & Geophysics.
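The three figures on slide 16 can be recomputed from the four cell counts. In this sketch "Esteve" is treated as the positive class, which is an assumption about how the slide reads the table; the function names are illustrative.

```python
from math import sqrt


def precision(tp, fp):
    """Share of predicted positives that are correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that were found."""
    return tp / (tp + fn)


def matthews(tp, fp, fn, tn):
    """Matthews correlation coefficient for a 2x2 confusion matrix."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Cell counts from slide 16, with "Esteve" as the positive class.
tp, fp, fn, tn = 25, 1, 3, 12

print(f"Precision: {precision(tp, fp):.0%}")
print(f"Recall: {recall(tp, fn):.0%}")
print(f"Matthews c.c.: {matthews(tp, fp, fn, tn):.2f}")
```

The four cells sum to 41, the number of “Cardellach, E” records reported on slide 14.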
  • 17. Disadvantages • Not the most advanced approach. – Does not use NLP on title, abstract. • High variability of precision: groups can contain errors. • Smaller (“orphan”) groups require manual matching. • Very common names produce a large number of groups (naturally). Requires some post-processing.
  • 18. Advantages • It is highly automated, requiring only the input of a name and a year with which to limit the search. • It runs very quickly, with hundreds of names being disambiguated in a few minutes. • One only has to verify one name-instance per group. • Works with the real-world data we have (FIZ databases), not some hand-curated test set.
  • 19. What about science 2.0? • Most author-disambiguation articles achieve remarkable results by starting with a very clean data-set. • In a science 2.0 context, content is user-generated and nobody cleans the data. • But that does not mean one cannot extract interesting patterns from the metadata. • Example: author-disambiguation based on co-authors, co-citation, and shared keywords.
  • 23. “Loren M. Frank” found 7 times in Karlmarx20’s library • Emery N. Brown, David P. Nguyen, Loren M. Frank, Matthew A. Wilson, Victor Solo. “An analysis of neural receptive field plasticity by point process adaptive filtering” Proceedings of the National Academy of Sciences of the United States of America, Vol. 98, No. 21. (9 October 2001), pp. 12261-12266. • Emery N. Brown, Loren M. Frank, Dengda Tang, Michael C. Quirk, Matthew A. Wilson. “A Statistical Paradigm for Neural Spike Train Decoding Applied to Position Prediction from Ensemble Firing Patterns of Rat Hippocampal Place Cells” J. Neurosci., Vol. 18, No. 18. (15 September 1998), pp. 7411-7425. • Emery N. Brown, Riccardo Barbieri, Valérie Ventura, Robert E. Kass, Loren M. Frank. “The Time-Rescaling Theorem and Its Application to Neural Spike Train Data Analysis”. Neural Computation, Vol. 14, No. 2. (19 December 2010), pp. 325-346. • Riccardo Barbieri, Loren M. Frank, Michael C. Quirk, Matthew A. Wilson, Emery N. Brown. “Diagnostic methods for statistical models of place cell spiking activity” Neurocomputing, Vol. 38-40 (June 2001), pp. 1087-1093. • Emery N. Brown, Riccardo Barbieri, Valerie Ventura, Robert E. Kass, Loren M. Frank. “The Time-Rescaling Theorem and Its Application to Neural Spike Train Data Analysis” Neural Comp., Vol. 14, No. 2. (February 2002), pp. 325-346. • Riccardo Barbieri, Michael C. Quirk, Loren M. Frank, Matthew A. Wilson, Emery N. Brown. “Construction and analysis of non-Poisson stimulus-response models of neural spiking activity”. Journal of Neuroscience Methods, Vol. 105, No. 1. (January 2001), pp. 25-37. • Uri T. Eden, Loren M. Frank, Riccardo Barbieri, Victor Solo, Emery N. Brown. “Dynamic Analysis of Neural Encoding by Point Process Adaptive Filtering” Neural Comp., Vol. 16, No. 5. (May 2004), pp. 971-998. Disambiguation based on shared co-authors.
  • 24. What about reference matches? • CiteULike does not have citation data! • The references collected in a user's library (“bookmarks”) are a form of citation. – Indicate common academic interest. • So: leverage the personal collections generated by users to disambiguate authors based on “co-collection” of articles.
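The "co-collection" idea on slide 24 can be sketched by counting, for each pair of articles that carry the ambiguous name, how many user libraries contain both. The data layout, the function name `co_collection_counts`, and the user and article identifiers below are all hypothetical; this is not CiteULike's data model or API.

```python
from collections import defaultdict
from itertools import combinations


def co_collection_counts(libraries, author_name):
    """Count, per pair of articles by `author_name`, the libraries holding both.

    `libraries` maps a user name to {article_id: set of author names}.
    """
    counts = defaultdict(int)
    for articles in libraries.values():
        hits = sorted(article_id for article_id, authors in articles.items()
                      if author_name in authors)
        for pair in combinations(hits, 2):
            counts[pair] += 1
    return dict(counts)


# Hypothetical user libraries.
libraries = {
    "user_a": {
        "art1": {"Wentai Liu", "Zhi Yang"},
        "art2": {"Wentai Liu", "Linh Hoang"},
    },
    "user_b": {
        "art1": {"Wentai Liu", "Zhi Yang"},
        "art2": {"Wentai Liu", "Linh Hoang"},
        "art3": {"Liang-Gee Chen"},
    },
    "user_c": {
        "art2": {"Wentai Liu", "Linh Hoang"},
        "art4": {"Wentai Liu"},
    },
}

counts = co_collection_counts(libraries, "Wentai Liu")
print(counts)
```

A pair collected together by several users is stronger evidence of a single author than a pair that co-occurs once; where to put the cut-off is a design choice the slides leave open.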
  • 25. Matching on metadata [Diagram: two “Thiel, C” name-instances] Author-instances are matched based on Co-citation.
  • 26. “Wentai Liu” found 10 times in Karlmarx20’s library • Zhi Yang, Qi Zhao, Wentai Liu. “Improving spike separation using waveform derivatives”. Journal of Neural Engineering, Vol. 6, No. 4. (2009) • Zhi Yang, Qi Zhao, Wentai Liu. “Neural signal classification using a simplified feature set with nonparametric clustering”. Neurocomputing, Vol. 73, No. 1-3. (2009), pp. 412-422. • Zhi Yang, Qi Zhao, Wentai Liu. “Improving spike separation using waveform derivatives.” Journal of Neural Engineering, Vol. 6, No. 4. (2009) • Zhi Yang, Qi Zhao, Wentai Liu. “Energy based evolving mean shift algorithm for neural spike classification”. In 2009 31st Annual International Conference of the {IEEE} Engineering in Medicine and Biology Society. {EMBC} 2009, 3-6 Sept. 2009 (2009), pp. 966-9. • Linh Hoang, Zhi Yang, Wentai Liu. “VLSI architecture of NEO spike detection with noise shaping filter and feature extraction using informative samples” In 2009 31st Annual International Conference of the {IEEE} Engineering in Medicine and Biology Society. {EMBC} 2009, 3-6 Sept. 2009 (2009), pp. 978-81. • Linh Hoang, Zhi Yang, Wentai Liu. “VLSI architecture of NEO spike detection with noise shaping filter and feature extraction using informative samples” Conference Proceedings: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, Vol. 1 (2009), pp. 978-981. • Zhi Yang, Qi Zhao, Wentai Liu. “Energy based evolving mean shift algorithm for neural spike classification”. Conference Proceedings: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, Vol. 1 (2009), pp. 966-969. • Linh Hoang, Zhi Yang, Wentai Liu. “VLSI architecture of NEO spike detection with noise shaping filter and feature extraction using informative samples.” Conference Proceedings: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, Vol. 1 (2009), pp. 978-981. • Tung-Chien Chen, Wentai Liu, Liang-Gee Chen. “VLSI architecture of leading eigenvector generation for on-chip principal component analysis spike sorting system”. Conference Proceedings: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, Vol. 1 (2008), pp. 3192-5. • Zhi Yang, Qi Zhao, Wentai Liu. “Improving spike separation using waveform derivatives”. Journal of Neural Engineering, Vol. 6, No. 4. (July 2009).
  • 27. Matching based on collection • Some precedent for this in the Science 2.0 literature: – Haustein, S. et al. treat bookmarking on CiteULike as a proxy for citations in measuring journal impact. – Usage ratio, diffusion, intensity based on counting user profiles. • Matching based on membership in a collection: data that is extrinsic to the article itself. – Match on user-generated keywords Haustein, S., Golov, E., Luckanus, K., Reher, S. & Terliesner, J. (2010). “Journal evaluation and science 2.0: Using social bookmarks to analyze reader perception”. Proceedings of the 11th International Conference on Science and Technology Indicators, Leiden, 117-119.
  • 29. Practicalities • The accuracy of the disambiguation is not mission-critical: it's just an online web-service. • Could be easily implemented to run on-the-fly using Java Server Page technology. – When a user loads a web-page, the server matches the references based on their metadata and profile membership, creating a web-page with groups. • An example of an automated value-added service in a Science 2.0 context.
  • 30. Thank you very much for your attention! demaine@forschungsinfo.de