
Peer judge: Praise and Criticism Detection in F1000Research reviews

Presentation by Mike Thelwall


  1. PeerJudge: Praise and Criticism Detection in F1000Research reviews
     Mike Thelwall, University of Wolverhampton
  2. PeerJudge Overview
     • Based on a dictionary of review sentiment terms and phrases from F1000Research reviews
     • Each dictionary term or phrase has a praise or criticism score:
       • "Well written": +2
       • "Flawed": -4
     • Reviews are given the maximum positive and negative scores of the words or phrases found in each sentence:
       • -1: no criticism … -5: very strong criticism
       • 1: no praise … 5: very strong praise
     • Also 12 linguistic rules to cope with negation and booster words (very, slightly)
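The scoring scheme above can be sketched as follows. This is a hypothetical reimplementation for illustration, not the real PeerJudge code: a toy dictionary maps terms to scores, and each sentence receives the strongest praise and strongest criticism score among the terms it contains.

```python
# Toy sketch of dictionary-based praise/criticism scoring (illustrative
# dictionary entries; the real PeerJudge dictionary is much larger).

DICTIONARY = {                 # term -> praise (+) or criticism (-) score
    "well written": 2,
    "flawed": -4,
    "poorly designed": -4,
    "compelling": 3,
}

def score_sentence(sentence):
    """Return (praise, criticism) on the slide's scales:
    praise 1 (none) to 5 (very strong), criticism -1 (none) to -5."""
    text = sentence.lower()
    praise, criticism = 1, -1
    for term, score in DICTIONARY.items():
        if term in text:
            if score > 0:
                praise = max(praise, score)      # strongest praise wins
            else:
                criticism = min(criticism, score)  # strongest criticism wins
    return praise, criticism

print(score_sentence("The paper is well written but the study is poorly designed."))
# (2, -4)
```

This reproduces the worked example on the next slide; the 12 negation and booster rules (e.g. "very", "slightly") are omitted for brevity.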
  3. PeerJudge Example
     • "The paper is well written but the study is poorly designed."
     • Praise: 2; Criticism: -4
     • Try it online: http://sentistrength.wlv.ac.uk/PeerJudge.html
  4. Part of the dictionary
     acceptabl*    3
     accurate      2
     adequat*      3
     appropriate   3
     arbitrary    -2
     balanced      2
     bewilder*    -3
     but           1
     careful*      3
     clarify      -2
     clear         4
     clearer      -3
     compelling    3
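A dictionary in this "term score" shape could be loaded and matched as below. This is an assumed file format inferred from the excerpt, not the documented PeerJudge format; note that a trailing `*` appears to mark a wildcard stem (so `acceptabl*` would cover "acceptable" and "acceptably"), while plain terms need an exact match so that "clear" (+4) does not accidentally fire on "clearer" (-3).

```python
# Hypothetical loader/matcher for a plain-text "term score" dictionary,
# as shown in the slide excerpt. Assumed format: one entry per line,
# term first, integer score last; trailing "*" = wildcard stem.

def load_dictionary(lines):
    entries = []
    for line in lines:
        term, score = line.rsplit(None, 1)   # split off the trailing score
        entries.append((term, int(score)))
    return entries

def term_matches(term, word):
    """Wildcard terms match any word sharing the stem; others match exactly."""
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term

entries = load_dictionary(["acceptabl* 3", "clear 4", "clearer -3"])
print(term_matches("acceptabl*", "acceptable"))  # True
print(term_matches("clear", "clearer"))          # False
```

Keeping the dictionary as external plain text, as the next slide notes, is what makes this kind of customization easy.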
  5. Technical details
     • Java jar program
       • Portable
     • Dictionaries are external plain text files
       • Easily customizable
     • Fast: 14,000 reviews per second
     • Explains its judgements
       • So it is transparent, and the owner can adjust the dictionary for recurrent problems
     • Agrees with reviewer scores above random chance
     • Because it is based on a dictionary, it does not "cheat" by identifying hot topics, fields, affiliations or jargon
  6. Where is the dictionary from?
     • Human evaluation of a development dataset of F1000Research reviews
     • Machine learning to suggest extra terms and different weights
  7. Limitations
     • Designed for F1000Research decisions: needs dictionary modification for good performance on other review datasets
     • F1000Research reviews are unbalanced: few negative decisions
     • F1000Research reviews have standard concluding text that had to be removed, so referees might not write their own conclusions
     • Referees often give judgements in field-specialist language, avoiding general conclusions
     • More substantial modifications may be needed for technical domains
     • This is difficult to do in advance because very few outlets publish reviews and scores
  8. Applications
     • Warning reviewers if their judgements are apparently out of line with their scores
     • Warning reviewers if they have not given any praise
     • The same warnings for editors
     • On a larger scale, allowing publishers to check for anomalies in the review process, such as by identifying journals with uncritical referees (low average criticism scores)
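The last application can be sketched as a simple aggregate check. This is an illustrative sketch under assumed data, not part of PeerJudge itself: given per-review criticism scores on the slide's scale (-1 = no criticism … -5 = very strong), flag journals whose average criticism is unusually weak. The threshold is an arbitrary example value.

```python
# Hypothetical publisher-side anomaly check: flag journals whose reviews'
# mean criticism score stays close to -1 (i.e. referees rarely criticize).
from statistics import mean

def flag_uncritical(journal_scores, threshold=-1.5):
    """Return journals whose mean criticism is weaker (closer to -1) than
    the threshold. Scores and threshold use the -1..-5 criticism scale."""
    return [journal for journal, scores in journal_scores.items()
            if mean(scores) > threshold]

scores = {
    "Journal A": [-1, -1, -2, -1],   # mostly uncritical reviews
    "Journal B": [-3, -4, -2, -3],   # substantial criticism present
}
print(flag_uncritical(scores))  # ['Journal A']
```

In practice the threshold would need calibrating against a baseline of comparable journals rather than being fixed in advance.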