Personalised statistical writing analysis

John Blake
Japan Advanced Institute of Science and Technology

Overview
• Introduction
– context, impetus
– focus, process
• Five aspects
– statistical analysis
• Personalised writing analysis
– sample extracts
• Interview survey
• Future direction

Context
• Proofreading for faculty
• Writing assistance for PhD candidates
[Figure showing 70% and 50%, labelled "science"]

Impetus
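21 email exchanges on various points, including:
• "I would like to use “minor scary incident” consistently."
• "I would like to use “minor scary incident” consistently instead of “near miss”."
• "I asked the journal I am submitting to. “Near accident” seems to be the usual term, so I have revised it to that."
• "I changed it to “near-miss incident”. Professor …. advised me to follow the instructions."
• "I changed every instance of “Near miss incident” to “Near miss incidents”."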
From one research article (RA)
Terms used: minor scary incident, near-miss incident, ヒヤリ・ハット (hiyari hatto, Japanese for a near miss)

Focus
Enable research articles to meet generic expectations of:
• Accuracy by being factually correct
• Clarity by avoiding ambiguity
• Formality by adopting appropriate style
Rhetorical structure, logic, originality, flawed methods, etc. are important, but…

Five aspects of generic integrity
1. Vocabulary fit
2. Readability
3. Word type balance
4. Style and usage
5. Lexicogrammatical errors
Summary statistics
Bhatia, V. K. (1993). Analysing genre: Language use in professional
settings. London: Longman.

Process for each research article
• Create target corpus (TC)
• Analyse RA and TC
• Identify errors in RA
• Compile ratios where possible
• Create feedback document

Five aspects
• Vocabulary fit: keyness of RA & TC
• Readability: readability statistics of RA & TC
• Word type balance: ratio of GSL, AWL and off-list words for RA & TC
• Style and usage: markedness, modality, register
• Lexicogrammar: vocabulary & grammatical errors

1. Vocabulary fit
Scott & Tribble (2006, p. 56): “keyness [is what a text] boils down to”
Hyland (2011): paper-journal fit
Hyland, K. (2011). Welcome to the Machine: Thoughts on writing for scholarly publication. Journal of Second Language Teaching and Research, 1(1), 58–68.
Scott, M., & Tribble, C. (2006). Textual Patterns: Key Words and Corpus Analysis in Language Education. Amsterdam, Philadelphia: John Benjamins.
TC keywords: firm, knowledge, market, international, foreign, performance, research, variables, markets, countries, export, country, relationship, business, model
RA keywords: organizational, TMSs, coordination, DOPPO, expertise, interactions, mechanisms, BLOCK, employee, leader, team, coordinate, informal, information, management
Prepared using AntConc 3.2.4w with the Brown Corpus as reference
TC = 243 RAs, c. 2.1 million words; RA = 10k words
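Keyness compares how often each word occurs in the RA with how often it occurs in a reference corpus. One widely used keyness statistic is Dunning's log-likelihood (G²). The sketch below computes it from raw counts and ranks positive keywords. It assumes pre-tokenised word lists and does not reproduce AntConc's exact settings; the function names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's log-likelihood (G2) for one word.

    a: frequency in the study text (e.g. the RA);  c: study text size in tokens
    b: frequency in the reference corpus;          d: reference corpus size in tokens
    """
    total = a + b
    e1 = c * total / (c + d)  # expected frequency in the study text
    e2 = d * total / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(study_tokens, reference_tokens, top_n=15):
    """Rank words used relatively more often in the study text than in the reference."""
    study = Counter(t.lower() for t in study_tokens)
    reference = Counter(t.lower() for t in reference_tokens)
    c, d = sum(study.values()), sum(reference.values())
    if c == 0 or d == 0:
        raise ValueError("both token lists must be non-empty")
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy usage with placeholder token lists (not real corpora)
ra = "the team leader supports team coordination and team expertise".split()
reference = "the and of the a to in the and of".split()
print(keywords(ra, reference, top_n=3))
```

With full corpora, the same ranking run over the TC and over the RA would yield keyword lists comparable to those above, although exact scores depend on tokenisation and the reference corpus.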

[Word cloud of the RA (10k words), prepared using Wordle]

2. Readability
[Bar chart: Gunning fog index, Flesch-Kincaid grade level and mean sentence length, Draft vs Target (y-axis 0 to 25)]
Bogert, J. (1985). In Defense of the Fog Index. Business Communication Quarterly, 48(2), 9-12.
Gilquin, G., & Paquot, M. (2008). Too chatty: Learner academic writing and register variation. English Text Construction, 1(1), 41-61.
McClure, G. (1987). Readability Formulas: Useful or Useless? IEEE Transactions on Professional Communication, 30(1), 12-15.
Bogert (1985) & McClure (1987) – factors affecting readability
Gilquin & Paquot (2008): learner academic writing is rather “chatty”
Research articles tend to have higher reading difficulty.
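The readability scores above are simple functions of sentence, word and syllable counts. The standard formulas are: Gunning fog = 0.4 × (words/sentences + 100 × complex words/words), where complex words have three or more syllables, and Flesch-Kincaid grade = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. This is a minimal sketch with an assumed vowel-group syllable heuristic. It ignores Gunning's exclusions (proper nouns, compounds, -ed/-es endings), so its scores will differ somewhat from those of dedicated tools. The function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, ignoring a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Mean sentence/word length, Gunning fog index and Flesch-Kincaid grade level."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = [count_syllables(w) for w in words]
    mean_sentence_length = len(words) / len(sentences)
    # Gunning fog treats words of three or more syllables as "complex".
    complex_ratio = sum(1 for s in syllables if s >= 3) / len(words)
    return {
        "mean_sentence_length": mean_sentence_length,
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "gunning_fog": 0.4 * (mean_sentence_length + 100 * complex_ratio),
        "flesch_kincaid_grade": 0.39 * mean_sentence_length
        + 11.8 * sum(syllables) / len(words)
        - 15.59,
    }


print(readability("The proposed mechanism improves coordination. Teams share expertise."))
```

Running the same function on a draft and on the target corpus gives the Draft vs Target comparison in the chart.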

3. Word type balance
Word level coverage in academic text:
1st 1000 words: 73.5%
2nd 1000 words: 4.6%
AWL: 8.5%
Other: 13.3%
[Pie chart, RA: first 2k words 69%, AWL 16%, off-list 15%]
Cobb, T. (2013). Web Vocabprofile. www.lextutor.ca/vp/
Nation, I.S.P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Used in EAP courses at PolyU and CityU in Hong Kong
Nation (2001, p. 17)
RA analysed by Web VP Classic v4 (Cobb, 2013)
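Word type balance is the share of running words that fall in each frequency band, with anything not on a list counted as off-list. The sketch below shows the calculation under a stated simplification: it matches exact word forms against small placeholder word sets, whereas Web VP uses the full GSL/BNC and AWL lists and groups words into families. The band contents and function name are hypothetical.

```python
from collections import Counter


def word_type_balance(tokens, bands):
    """Percentage of tokens in each band; tokens in no band count as off-list.

    bands: ordered list of (band_name, set_of_lowercase_words); the first match wins.
    """
    if not tokens:
        raise ValueError("no tokens to profile")
    counts = Counter()
    for token in tokens:
        word = token.lower()
        for name, words in bands:
            if word in words:
                counts[name] += 1
                break
        else:  # no band matched
            counts["off-list"] += 1
    names = [name for name, _ in bands] + ["off-list"]
    return {name: 100 * counts[name] / len(tokens) for name in names}


# Placeholder bands for illustration only; real profiling would load the full lists.
BANDS = [
    ("1k", {"the", "of", "and", "will", "to"}),
    ("AWL", {"analyse", "data", "research"}),
]
print(word_type_balance("The study will analyse the data".split(), BANDS))
```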

4. Style and usage errors
Marked usage | Ratio | Suggestion
People provide first | 0:9 (COCA) | People first provide
Hyland (1998) – hedging
Robb (2003) – “Google as a quick ‘n’ dirty corpus tool”
Hyland, K. (1998). Hedging in scientific research articles. Amsterdam: John Benjamins.
Robb, T. (2003). Google as a quick ‘n’ dirty corpus tool. TESL-EJ, 7(2).
Corpora: IS, KS, MS, BNC, COCA, WAC
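A ratio such as 0:9 compares how often the marked wording and the suggested wording occur in a reference corpus. The sketch below counts whole-word, case-insensitive phrase matches in a local plain-text corpus. It assumes the corpus text is available as a string; querying COCA or the other corpora listed is not shown. The function names are hypothetical.

```python
import re


def phrase_count(corpus: str, phrase: str) -> int:
    """Count case-insensitive, whole-word occurrences of a phrase."""
    words = [re.escape(w) for w in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, corpus, flags=re.IGNORECASE))


def usage_ratio(corpus: str, marked: str, suggested: str) -> str:
    """Return 'marked:suggested' frequencies, e.g. '0:9'."""
    return f"{phrase_count(corpus, marked)}:{phrase_count(corpus, suggested)}"


sample = "People first provide data. Teams first provide support. people first provide help."
print(usage_ratio(sample, "people provide first", "people first provide"))
```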

5. Lexicogrammatical errors
Grammatical or vocabulary errors
No. | Incorrect form | Correct form | Comment
1 | Taking account differences | Taking account of differences | preposition
2 | this study answers to two questions | this study answers two questions | answer to s.b. / answer s.th.
3 | former employee | a former employee | employee [singular]
4 | to participate to this study | to participate in this study | collocation (participate in)
5 | emphasis is given on XX | emphasis is placed on XX | collocation (give to / place on)
6 | for being responsible | to be responsible | general vs. specific purpose
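Some of the collocation errors above follow fixed patterns that a simple rule list could flag before manual review. The sketch below hard-codes a few rules taken from the examples in the table and from the "engage into" example; the rule list and function name are hypothetical. Most errors, such as missing articles or purpose clauses, still need human judgement.

```python
import re

# Hypothetical rule list built from the error examples: (regex, suggestion, comment)
RULES = [
    (r"\btaking account\b(?! of\b)", "taking account of", "preposition"),
    (r"\bparticipate to\b", "participate in", "collocation"),
    (r"\bemphasis is given on\b", "emphasis is placed on", "collocation"),
    (r"\bengage into\b", "engage in", "collocation"),
]


def check_collocations(text: str):
    """Return (matched text, suggestion, comment) for each rule hit, in rule order."""
    findings = []
    for pattern, suggestion, comment in RULES:
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            findings.append((match.group(0), suggestion, comment))
    return findings


print(check_collocations("Taking account differences, we asked staff to participate to this study."))
```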

Summary statistics
Based on requests for a simple, easy-to-understand evaluation
Caveat: subjective evaluations disguised as statistics

Personalised writing analysis
Selected statistics for subject 1

Readability | Yours | Target
Gunning fog index | 13.2 | 13.2
Mean sentence length | 15.49 | 19.37
Mean number of clauses/sentence | 1.19 | 1.54
Lexical density | 0.63 | 0.57

Word type balance | Yours (%) | Target (%)
1k words | 68.58 | 74.39
2k words | 6.69 | 5.29
AWL | 16.36 | 7.67
Off-list words | 8.36 | 12.65
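Lexical density is commonly defined as the proportion of content (lexical) words among all running words. The slides do not say how the values above were computed. The sketch below assumes a small, hypothetical function-word list, whereas a real analysis would more likely use a part-of-speech tagger. Mean clauses per sentence needs a syntactic parser and is not sketched here.

```python
import re

# Hypothetical, deliberately small function-word list for illustration only.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to", "for",
    "with", "by", "from", "is", "are", "was", "were", "be", "been", "this",
    "that", "these", "those", "it", "they", "we", "he", "she", "not", "as",
}


def lexical_density(text: str) -> float:
    """Share of words that are not in the function-word list."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text.lower())
    if not words:
        raise ValueError("text contains no words")
    content = [w for w in words if w not in FUNCTION_WORDS]
    return len(content) / len(words)


print(round(lexical_density("The study examines the coordination of teams."), 2))
```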

Style and usage
No. | Sentence | Ratio | Comment or correction
1 | minor scary incidents | 1:58,700 (WAC) | near-miss incidents
2 | falling-accident | 0:19 (COCA) | slips, trips and falls OR falling objects
3 | a medical examination by interview | 1:525 (WAC); 0:1 (COCA) | a medical consultation
4 | According to sex | 1:18 (WAC) | According to the gender
5 | 175 indoor workers | n/a | Use One hundred and ….
6 | Tomio,T. (1995) proposes | n/a | Omit initials in in-text citations unless …

Style and usage
No. | Sentence | Ratio | Comment or correction
1 | people provide first their expertise … | 0:9 (COCA) | people first provide their expertise …
2 | XX also engage into XX | 1:9000 (COCA) | XX also engage in XX
3 | The XX structure limits become | n/a | Use “limits” for boundaries and “limitations” for restrictions/inabilities
4 | future studies are able to | n/a | Use “may be” to show uncertainty
5 | employee simultaneous participation | 0:5 (WAC) | simultaneous participation of employees

Interview survey
Interviewer = me
Subjects = 4 faculty, 1 PhD candidate
Nationalities = 3 Japanese, 2 non-Japanese
Number = 5 participants
Interview time = 30 minutes
Location = private office on campus
Dates of interview = Jun-Jul 2013
Semi-structured interviews
e.g. “What revisions did you make to your paper since…..?”
“How can I make the feedback more useful?”

Survey results
• Explanatory notes
– too long
• Key word lists
– couldn't understand
• Three readability scores
– too complex
• Raw ratios
– too difficult
e.g. 47:211,120 1:4500
• Lexico-grammatical errors
• Word type balance
• Ratios for style and usage

Incremental improvements (made)
1. Create a summary statistics scorecard
2. Use a word cloud for vocabulary fit
3. Shorten explanatory notes
4. Simplify and approximate ratios
5. Show word type balance graphically with percentages
6. Select the “most useful” readability measure(s): mean sentence and word length?
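Simplifying a raw ratio such as 47:211,120 means dividing by the smaller count and rounding the result. The sketch below rounds to two significant figures by default; that precision is an assumption chosen because it reproduces the 1:4500 example. Ratios containing a zero, such as 0:9, are left as they are. The function names are hypothetical.

```python
import math


def round_sig(x: float, sig: int = 2):
    """Round a positive number to `sig` significant figures."""
    if x <= 0:
        raise ValueError("x must be positive")
    digits = sig - int(math.floor(math.log10(x))) - 1
    value = round(x, digits)
    return int(value) if value == int(value) else value


def approximate_ratio(a: int, b: int, sig: int = 2) -> str:
    """Rewrite a raw ratio a:b as 1:N or N:1, with N rounded to `sig` significant figures."""
    if a <= 0 or b <= 0:
        return f"{a}:{b}"  # ratios with a zero, such as 0:9, are left unchanged
    if a <= b:
        return f"1:{round_sig(b / a, sig)}"
    return f"{round_sig(a / b, sig)}:1"


print(approximate_ratio(47, 211_120))  # 1:4500
```

Note that Python's built-in `round()` rounds exact halves to the nearest even digit, so borderline values may round down.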

Future developments
• Integration of metrics into a one-stop online portal (thanks to a reviewer for the idea) for researchers to submit drafts
• Statistical comparison of draft and published versions to evaluate the success of feedback

Any questions, suggestions or
comments?
John Blake
johnb@jaist.ac.jp
