Dartmouth 2018 writing assessment presentation Les Perelman


Presentation given at the Dartmouth Seminar for Writing Research



  1. 1. Writing Assessment & Its Rhetoric • Les Perelman, MIT • Dartmouth Summer Seminar for Composition Research • 4 August 2018
  2. 2. White’s Law • Assess thyself or assessment shall be done unto thee
  3. 3. Formative Assessment = Teaching
  4. 4. Summative Assessment = Certificate of Learning
  5. 5. Key Concepts • Validity – Does it measure what you want to measure? • Reliability – Can we trust the measurement? • Significance – Is it not random chance? • Importance – If the effect is not random chance, is it large enough to be important? • Consequences – What are the intended, unintended, positive, and negative effects? • Type I & Type II Errors – How do we identify false positives and false negatives?
  6. 6. Types of Validity • Face Validity – directly measures the thing you want to measure • Construct Validity – a measure of a construct (i.e., a practical test developed from a theory) that actually measures something in terms of that construct • Predictive or Concurrent Validity – measures that can predict or correlate with other measures of the same construct, either in the future or concurrently • Criterion or Proxy Validity – measures of proxy variables that correlate with the construct
  7. 7. Face Validity • Piano Technicians Guild – Repair and tune an old, dilapidated piano • Online Essay Evaluations (e.g., iMOAT) – MIT’s Freshman Essay Evaluation assesses two constructs • Ability to write an argumentative response to a reading with its own argument and data • Ability to synthesize and paraphrase several data-rich texts
  8. 8. Construct Validity • Intelligence – IQ – SAT
  9. 9. Showing that AES Measures No Construct Related to Human Communication Privacy at dictators has not, and in all likelihood never will be agreed yet somehow vast. The amanuensis will, nonetheless, be equitable in the extent to which we advocate. Because almost all of the demarcations for the area of literature are foretold to privacy, privacy which assassinates postulation can be more enormously appreciated. Seclusion will always be an experience of humankind. Privateness is the most sophistic escapade of mankind. Privacy has not, and probably never will be lavish but not solemn. Humanity will always preach privateness; whether on the assembly or with the adjuration. A lack of privateness lies in the study of literature as well as the search for reality. Why is seclusion so regrettable to declaration? The response to this query is that privacy is indispensably propitious. Seclusion, usually by a dictum, will despicably be ouster that should be the proclamation. If all of the allusions reprove the admonishment to a embroidery, substantiated affronts infuse equally with privacy. Furthermore, as I have learned in my literature class, human life will always appease privacy. My celebration adheres. The precipitously homogenized anvil may, nonetheless, be immense, venomous, and situational. In my philosophy class, many of the assassinations at our personal drone by the injunction we civilize speculate. Ever since, an altruist is unsubstantiated, mimicking, and startling of my appetite. Community postulates probes, not infusion. In my literature class, most of the circumspections with our personal circumstance for the account we demolish explain the thermostat on dicta to inspections. Because just about all of the assumptions are contradicted with privacy, a momentous privacy can be more mournfully propagandized.
  10. 10. According to professor of semiotics Oscar Wilde, privateness is the most fundamental pledge of humankind. Although the same gamma ray may receive two different pendulums by an assimilationist by surfeit, interference reproduces. The same neuron may receive two different brains on the contradiction to counteract orbitals. Information is not the only thing radiation implodes; it also transmits gamma rays at privacy. By quarreling, postulates of ruminations which entreat the people involved and respond account as well for seclusion. The sooner puissant axioms authorize acquiescence, the more happenstance will rightfully be an outlandishly or surprisingly spurious declaration. Seclusion, often to aggregations, taunts the assassin but can be edification. Privacy which is inchoate changes the disparaging privacy. Additionally, if aborigines arrange juggernauts, appeasement with the prison by privateness can be more rapaciously circumscribed. Our personal quarrel for the commencement we implore jeers. However, armed with the knowledge that a respondent should erratically be the amygdala that might be the epigraph, all of the interlopers to our personal development at the salver we assimilate sequester the denouncement but provide the people involved. In my experience, some of the inquisitions with my avocation countenance taunts. a dearth of privacy enlightenments inquiries on our personal reprobate of the inspection we lament also. Resourcefulness propagates amplifications but collapses, not an avowed spectrometry. In my philosophy class, none of the contradictions by our personal embroidery for the organism we assault shriek. From foretelling appetites, many of the allegations quibble to the same extent of privacy. Privateness to the circumspection will always be an experience of human society. The affirmation will, even so, be lavish yet somehow amicable. 
The less advancements at inclination of the area of theory of knowledge anesthetize a precinct that choreographs sublimation with the disenfranchisement or confide, the sooner a circumstance is haphazard, petulant, and inappropriate. Seclusion at aggregations has not, and presumably never will be deleteriously risible. Because of the fact that privacy preaches those in question which concede thermostats and allude, mankind should assassinate seclusion immediately.
  11. 11. Predictive Validity 1. A survey of instructors at various levels of first-year writing subjects indicates that the placement instrument is placing students in the right places. 2. The instrument is a significant predictor of grades in first-year writing classes. 3. The instrument is a significant predictor of first-year GPA. 4. The best predictor of college success is family income.
  12. 12. Proxy Validity: e-rater® 2.0 Features by Dimension • Grammar, usage, mechanics, & style: 1. Ratio of grammar errors to the total number of words; 2. Ratio of mechanics errors to the total number of words; 3. Ratio of usage errors to the total number of words; 4. Ratio of style errors (repetitious words, passive sentences, very long sentences, very short sentences) to the total number of words • Organization & development: 5. The number of "discourse" units detected in the essay (i.e., thesis, main ideas, supporting ideas, conclusion); 6. The average length of each unit in words
  13. 13. • Topical analysis: 7. Similarity of the essay's vocabulary to other previously scored essays in the top score category; 8. The score category containing essays whose words are most similar to the target essay • Word complexity: 9. Word repetition (ratio of different content words to total number of words); 10. Vocabulary difficulty (based on word frequency); 11. Average word length in characters • Essay length: 12. Total number of words • Adapted from Attali & Burstein (2006) and Ben-Simon & Bennett (2007)
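Several of the features listed above are simple surface counts, which is what makes them proxies rather than direct measures of writing. The sketch below is illustrative only and is not ETS's e-rater implementation: the function name `proxy_features`, the tokenizing regular expression, and the use of all distinct words (rather than content words only) are assumptions made for this example. The sample sentence is taken from the nonsense essay on slide 9.

```python
import re


def proxy_features(essay: str) -> dict:
    """Compute three surface features loosely modeled on features 9, 11, and 12.

    Hypothetical helper for illustration; not the e-rater feature extractor.
    """
    # Crude tokenizer (an assumption): runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", essay.lower())
    total = len(words)
    if total == 0:
        return {"total_words": 0, "avg_word_length": 0.0, "distinct_ratio": 0.0}
    return {
        # Feature 12: total number of words.
        "total_words": total,
        # Feature 11: average word length in characters.
        "avg_word_length": sum(len(w) for w in words) / total,
        # Rough stand-in for feature 9: distinct words over total words
        # (the listed feature counts content words only).
        "distinct_ratio": len(set(words)) / total,
    }


sample = "Privateness is the most sophistic escapade of mankind."
features = proxy_features(sample)
print(features)
```

None of these counts depends on whether the sentence means anything, which is the point the nonsense essay makes.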
  14. 14. Fairness -- Consequences • Intended outcomes based on constructs not on extraneous socially-determined values • Constructs themselves are as neutral as possible • Sufficient care has been taken to avoid adverse impact on specific groups • Measure attempts to avoid systemic negative effects on marginalized populations
  15. 15. Bias in Multiple Choice Test Selection Method • Old analogy question RUNNER: MARATHON :: A) envoy: embassy B) martyr: massacre C) oarsman: regatta D) referee: tournament E) horse: stable • No more analogies on most standardized tests • Committee checks questions for cultural bias • But process of question selection produces bias. Kidder & Rosner 2003; Rosner 2003; Rosner 2010
  16. 16. Ethnicity Bias • Consider this unscored test question from the Oct. 2000 SAT: 7. At bedtime the security blanket served the child as _______ with seemingly magical powers to ward off frightening phantasms. (A) an arsenal (B) an incentive (C) a talisman (D) a trademark (E) a harbinger
  17. 17. Reliability • Yielding the same or compatible results in different clinical experiments or statistical trials • Tension between reliability and validity
  18. 18. The Goal
  19. 19. Types of Reliability • Test-retest reliability • Parallel forms reliability • Errors in measurement: Inter-rater reliability
  20. 20. Test-Retest Reliability
  21. 21. Parallel Forms Reliability • Assessments are at the same level of difficulty for all groups • For essay assessments, each assessment should be at the same level of difficulty and each prompt equally challenging • The SAT multiple-choice selection process gives an insight into how complex this goal is – June 2018 SAT
  22. 22. Errors in measurement: Inter-rater reliability • All measurements contain a certain amount of error – Engineers always use error bars • As Huot notes, essay scoring has a notably low correlation among readers • Even now, an acceptable correlation is 0.7, which is squared to give a shared variance of 0.49, or about one-half • The best scoring produces a correlation of around 0.8, which gives a shared variance of 64%, or slightly less than two-thirds • But how do most large-scale testers get to even 0.7 or 0.8? Not by rigorous training. • The secret is length! And short time to write!
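The arithmetic on this slide can be sketched as follows: compute the Pearson correlation between two readers' scores, then square it to get the shared variance. The rater scores below are invented for illustration, and the names `pearson_r` and `shared_variance` are assumptions of this sketch, not part of the presentation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def shared_variance(r):
    """Proportion of variance two measures share: r squared."""
    return r * r


# Hypothetical scores from two readers on six essays (1-6 scale).
rater_a = [1, 2, 3, 4, 5, 6]
rater_b = [2, 1, 4, 3, 6, 5]

r = pearson_r(rater_a, rater_b)
print(f"r = {r:.3f}, shared variance = {shared_variance(r):.3f}")

# The two benchmarks quoted on the slide.
print(shared_variance(0.7))  # about 0.49
print(shared_variance(0.8))  # about 0.64
```

Even at r = 0.8, roughly a third of the variance in one reader's scores is not accounted for by the other reader's scores.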
  23. 23. Significance & Importance • The new SAT writing section improved the weighted SAT with GPA as a predictor of first-year grades by r = .08, an increase of 0.64% (.0064) in shared variance. • Because of the large sample sizes, this value was statistically significant.
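The gap between significance and importance can be illustrated with a rough calculation. The sketch below treats 0.08 as a plain correlation tested against zero, which is a simplification of the incremental figure on the slide, and uses the Fisher z transformation with a normal approximation. The sample sizes (100 and 100,000) and the function name `p_value_for_r` are assumptions chosen for illustration, not figures from the study.

```python
import math


def p_value_for_r(r, n):
    """Approximate two-sided p-value for H0: true correlation is 0.

    Uses the Fisher z transformation: atanh(r) * sqrt(n - 3) is
    approximately standard normal under H0.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


r = 0.08
p_small = p_value_for_r(r, 100)      # hypothetical small sample
p_large = p_value_for_r(r, 100_000)  # hypothetical large sample

print(f"n = 100:     p = {p_small:.3g}")
print(f"n = 100000:  p = {p_large:.3g}")
print(f"shared variance either way: {r * r:.4f}")
```

The same small effect fails a conventional 0.05 threshold in the small sample and passes it easily in the large one, while the shared variance stays at .0064 in both cases.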
  24. 24. Two Types of Statistical Error • Type I – Incorrect rejection of a true null hypothesis – False Positive • Type II – Failure to reject a false null hypothesis – False Negative
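As a minimal sketch of the two definitions, the snippet below counts each error type from paired lists of ground truth and test decisions. The data are invented, and the names `count_errors`, `effect_is_real`, and `null_rejected` are hypothetical, introduced only for this example.

```python
def count_errors(effect_is_real, null_rejected):
    """Return (type_1, type_2) counts.

    Type I: the null was rejected although no real effect exists.
    Type II: the null was not rejected although a real effect exists.
    """
    pairs = list(zip(effect_is_real, null_rejected))
    type_1 = sum(1 for real, rejected in pairs if rejected and not real)
    type_2 = sum(1 for real, rejected in pairs if real and not rejected)
    return type_1, type_2


# Hypothetical outcomes for six studies.
effect_is_real = [False, False, False, True, True, True]
null_rejected = [False, True, False, True, False, False]

type_1, type_2 = count_errors(effect_is_real, null_rejected)
print(f"Type I (false positives): {type_1}")
print(f"Type II (false negatives): {type_2}")
```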
  25. 25. The Perils of Pre-Test / Post-Test • False Negatives – Even with a rubric, holistic scoring, to be accurate, will produce a bell curve centered near the middle of the scale
  26. 26. One Solution: Mix Populations
  27. 27. Instruments • Timed impromptu • Untimed essay • Untimed essay with readings • Portfolio
  28. 28. Measures • Binary • Holistic • Primary trait • Analytic Instrument and measures are determined by purpose and audience
  29. 29. The Rhetoric of Assessment • Logos or Topic • Speaker or Writer • Audience / • Research Question & Data • Researcher • Audience
  30. 30. The Assessment Process • Define Need • Frame Questions • Design Procedure to Answer Questions • Generate Data • Analyze Data • Assess Validity and Reliability – Outside Transparency • Use Data & Report Conclusions • Check for Unintended Consequences • Generate Follow-Up Research Questions
  31. 31. The Process • Why are you assessing? • What do you want to know? • For whom are you assessing and who will be doing the assessment? • Where, in what context, will you be assessing? • When and how often will you assess? • How will you assess?
  32. 32. Remember unintended consequences!