7. Content: Assessment; Evaluation: Models and Frameworks; Working Example: Evaluation of a Hand Hygiene Program; Assessment Instruments; Standards and Rigor: Assessment Instruments; Working Example: OSATS; Standards and Rigor: Evaluations; Summary and Take-home Message
17. Models and Frameworks CIPP Evaluation Model (Stufflebeam, 1983) Kirkpatrick’s Model of Evaluating Training Programs (Kirkpatrick, 1998) Moore’s Evaluation Framework (Moore et al., 2009) Miller’s Framework for Clinical Assessments (Miller, 1990)
19. CIPP Context What is the relation of the course to other courses? Is the time adequate? Should courses be integrated or separate?
20. CIPP Inputs What are the entering ability, learning skills and motivation of students? What is the students’ existing knowledge? Are the objectives suitable? Does the content match student abilities? What is the theory/practice balance? What resources/equipment are available? How strong are the teaching skills of teachers? How many students/teachers are there?
21. CIPP Process What is the workload of students? Are there any problems related to teaching/learning? Is there effective 2-way communication? Is knowledge only transferred to students, or do they use and apply it? Is the teaching and learning process continuously evaluated? Is teaching and learning affected by practical/institutional problems?
22. CIPP Product Is there one final exam at the end or several during the course? Is there any informal assessment? What is the quality of assessment? What are the students’ knowledge levels after the course? How was the overall experience for the teachers and for the students?
23. CIPP Methods used to evaluate the curriculum: discussions; informal conversation or observation; individual student interviews; evaluation forms; observation in class by colleagues; performance tests; questionnaires; self-assessment; written tests
24. CIPP CIPP focuses on the process and informs future improvements to the program/curriculum, but places only limited emphasis on outcomes.
26. Kirkpatrick’s Model of Evaluating Training Programs (levels: Reaction, Learning, Behavior, Results) Reaction: the degree to which the expectations of the participants about the setting and delivery of the learning activity were met. Measured with questionnaires completed by attendees after the activity.
27. Kirkpatrick’s Model of Evaluating Training Programs Learning: the degree to which participants recall and demonstrate, in an educational setting, what the learning activity intended them to be able to do. Measured with tests and observation in the educational setting.
28. Kirkpatrick’s Model of Evaluating Training Programs Behavior: the degree to which participants do in their practices what the educational activity intended them to be able to do. Measured through observation of performance in the patient care setting, patient charts, and administrative databases.
29. Kirkpatrick’s Model of Evaluating Training Programs Results: the degree to which the health status of patients, as well as the community of patients, changes due to changes in practice. Measured with health status measures recorded in patient charts or administrative databases, and with epidemiological data and reports.
30. Kirkpatrick’s Model of Evaluating Training Programs The importance of Kirkpatrick’s Model lies in its ability to identify a range of dimensions that need to be evaluated in order to inform us about the educational quality of a specific program. It focuses on outcomes; however, it provides limited information about the individual learner.
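The four levels and their example data sources listed on the slides above can be collected into a small lookup table. A minimal sketch in Python; the dictionary and function names are illustrative, not part of Kirkpatrick's model:

```python
# Kirkpatrick's four levels mapped to the example data sources named on
# the slides. KIRKPATRICK_LEVELS and data_sources are hypothetical names.
KIRKPATRICK_LEVELS = {
    "reaction": ["post-activity questionnaires"],
    "learning": ["tests", "observation in educational setting"],
    "behavior": ["observation in patient care setting",
                 "patient charts", "administrative databases"],
    "results": ["health status measures in charts/databases",
                "epidemiological data and reports"],
}

def data_sources(level):
    """Return the suggested evaluation data sources for a given level."""
    return KIRKPATRICK_LEVELS[level.lower()]
```

A structure like this makes the model's key point explicit: each level demands a different kind of evidence, collected in a different setting.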
32. Miller’s Framework for Clinical Assessments (pyramid, top to bottom): Does (Performance); Shows How (Competence); Knows How (Procedural Knowledge); Knows (Declarative Knowledge)
33. Miller’s Framework for Clinical Assessments Knows: declarative knowledge. The degree to which participants can state what the learning activity intended them to know. Measured with pre- and post-tests of knowledge.
34. Miller’s Framework for Clinical Assessments Knows how: procedural knowledge. The degree to which participants can state how to do what the learning activity intended them to know how to do. Measured with pre- and post-tests of knowledge.
35. Miller’s Framework for Clinical Assessments Shows how: competence. The degree to which participants show, in an educational setting, how to do what the learning activity intended them to be able to do. Measured by observation in the educational setting.
36. Miller’s Framework for Clinical Assessments Does: performance. The degree to which participants do in their practices what the learning activity intended them to be able to do. Measured through observation of performance in the patient care setting, patient charts, and administrative databases.
37. Miller’s Framework for Clinical Assessments The importance of Miller’s Framework is in its ability to identify learning objectives and link them with appropriate testing contexts (where) and instruments (how).
38. Assumption Moore’s and Kirkpatrick’s models assume a relationship between the different levels: if learners are satisfied, they will learn more, will be able to demonstrate the new skills, will transfer them to the clinical setting, and, consequently, the health of patients and communities will improve.
40. Building Skills, Changing Practice: Simulator Training for Hand Hygiene Protocols Canadian Institutes of Health Research Partnerships for Health System Improvement A. McGeer, MA. Beduz, A. Dubrowski
41. Purpose Current models of knowledge delivery about proper hand hygiene rely on didactic sessions; however, transfer to practice is low.
42. Purpose [Miller’s levels mapped to training settings: Knows and Knows How: didactic courses; Shows How: simulation; Does: clinical practice]
43. Purpose The primary purpose was to investigate the effectiveness of simulation-based training of hand hygiene on transfer of knowledge to clinical practice
44. Project [study flow: recruitment; baseline clinical monitoring (audits); simulation training, measuring Reaction and Δ in Learning; follow-up clinical monitoring (audits), measuring Δ in Behavior]
45. Results: Reactions Instruments: Simulation Design Scale and Educational Practices Questionnaire. 159 respondents; average response of ‘Agree’ or ‘Strongly Agree’ (4 or 5 on a 5-pt scale) to each of 40 items; 39 of these items were rated ‘Important’ (4 on a 5-pt scale).
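The reaction analysis above is essentially descriptive arithmetic over 5-point Likert items. A minimal sketch of that kind of computation; the function names and the example data are made up, not the study's actual code or results:

```python
# Summarize 5-point Likert responses: per-item means, and the items
# whose mean reaches 'Agree' (4) or higher. Illustrative only.
def item_means(responses):
    """responses: one list of item scores (1-5) per respondent."""
    n = len(responses)
    return [sum(r[i] for r in responses) / n
            for i in range(len(responses[0]))]

def agreed_items(means, threshold=4.0):
    """Indices of items whose mean score is at least the threshold."""
    return [i for i, m in enumerate(means) if m >= threshold]

# Three hypothetical respondents, four items:
scores = [[5, 4, 4, 3], [4, 5, 4, 3], [5, 4, 5, 4]]
means = item_means(scores)
high = agreed_items(means)
```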
50. Summary Outcome-based models of program evaluation may be of limited use (i.e., they sit mostly on the research branch). We need new, more complex models that incorporate both processes and outcomes (i.e., span more branches).
51. Summary Assessment instruments, which feed these models, are critical for successful evaluations. We need to invest effort in the standardization and rigorous development of these instruments.
55. Scientific Advisory Committee of the Medical Outcomes Trust The SAC defined a set of eight key attributes of instruments, which apply to instruments used for three purposes: distinguishing between two or more groups, assessing change over time, and predicting future status.
56. Scientific Advisory Committee of the Medical Outcomes Trust The SAC defined a set of eight key attributes of instruments: the Conceptual and Measurement Model; Reliability; Validity; Responsiveness (sensitivity to change); Interpretability; Burden; Alternative Forms of Administration; and Cultural and Language Adaptations. The Conceptual and Measurement Model: the concept to be measured needs to be defined properly and should match its intended use.
57. Reliability: the degree to which the instrument is free of random error, that is, free from errors in measurement caused by chance factors that influence measurement.
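In practice, reliability in this sense is often estimated with an internal-consistency coefficient such as Cronbach's alpha. A minimal sketch; alpha is one common choice of estimator, not something the SAC prescribes:

```python
# Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / variance(totals)),
# where k is the number of items.
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item, each with one score per
    respondent (all lists the same length)."""
    k = len(items)
    item_var_sum = sum(pvariance(scores) for scores in items)
    totals = [sum(col) for col in zip(*items)]
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))
```

For example, two perfectly parallel items, `cronbach_alpha([[1, 2, 3], [1, 2, 3]])`, yield an alpha of 1.0.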
58. Validity: the degree to which the instrument measures what it purports to measure.
59. Responsiveness (sensitivity to change): an instrument’s ability to detect change over time.
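Responsiveness is commonly quantified with a change statistic such as the standardized response mean (SRM): the mean pre-to-post change divided by the standard deviation of that change. A minimal sketch; the SRM is one illustrative choice, not mandated by the SAC:

```python
# Standardized response mean: mean(change) / sd(change) for paired scores.
from statistics import mean, stdev

def standardized_response_mean(pre, post):
    """pre, post: paired scores for the same respondents."""
    changes = [b - a for a, b in zip(pre, post)]
    return mean(changes) / stdev(changes)
```

Larger absolute values indicate an instrument that detects change more strongly relative to the noise in that change.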
60. Interpretability: the degree to which one can assign easily understood meaning to an instrument’s score.
61. Burden: the time, effort, and other demands placed on those to whom the instrument is administered (respondent burden) or on those who administer it (administrative burden).
62. Alternative Forms of Administration: self-report, interviewer-administered, computer-assisted, etc. It is often important to know whether these modes of administration are comparable.
63. Cultural and Language Adaptations: adaptation or translation of the instrument for use in other cultures and languages.
66. Working Example: OSATS Objective Structured Assessment of Technical Skills: bell-ringer exam (12 minutes per station); a minimum of 6-8 stations; direct or video-based assessment of technical performance.
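With 6-8 stations each scored on a rating scale, an examinee's overall OSATS result is some aggregate of per-station scores. A hypothetical sketch of a simple mean aggregate; the function name, the minimum-station check, and the use of a plain mean are illustrative assumptions, not the published OSATS procedure:

```python
# Aggregate per-station global ratings into one overall score.
def overall_osats(station_scores, n_min=6):
    """station_scores: one mean global rating per station."""
    if len(station_scores) < n_min:
        raise ValueError("fewer stations than the assumed minimum")
    return sum(station_scores) / len(station_scores)
```

Averaging across many short stations is what gives the bell-ringer format its reliability: no single case dominates the final score.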
67. Working Example: OSATS The Conceptual and Measurement Model Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skill via an innovative "bench station" examination. Am J Surg. 1997 Mar;173(3):226-30. Martin JA, Regehr G, Reznick R, MacRae H, Murnaghan J, Hutchison C, Brown M. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg. 1997 Feb;84(2):273-8.
68. Working Example: OSATS Reliability and Validity Faulkner H, Regehr G, Martin J, Reznick R. Validation of an objective structured assessment of technical skill for surgical residents. Acad Med. 1996 Dec;71(12):1363-5. Kishore TA, Pedro RN, Monga M, Sweet RM. Assessment of validity of an OSATS for cystoscopic and ureteroscopic cognitive and psychomotor skills. J Endourol. 2008 Dec;22(12):2707-11. Goff B, Mandel L, Lentz G, Vanblaricom A, Oelschlager AM, Lee D, Galakatos A, Davies M, Nielsen P. Assessment of resident surgical skills: is testing feasible? Am J Obstet Gynecol. 2005 Apr;192(4):1331-8; discussion 1338-40. Martin JA, Reznick RK, Rothman A, Tamblyn RM, Regehr G. Who should rate candidates in an objective structured clinical examination? Acad Med. 1996 Feb;71(2):170-5.
69. Working Example: OSATS Alternative Forms of Administration Maker VK, Bonne S. Novel hybrid objective structured assessment of technical skills/objective structured clinical examinations in comprehensive perioperative breast care: a three-year analysis of outcomes. J Surg Educ. 2009 Nov-Dec;66(6):344-51. Lin SY, Laeeq K, Ishii M, Kim J, Lane AP, Reh D, Bhatti NI. Development and pilot-testing of a feasible, reliable, and valid operative competency assessment tool for endoscopic sinus surgery. Am J Rhinol Allergy. 2009 May-Jun;23(3):354-9. Goff BA, VanBlaricom A, Mandel L, Chinn M, Nielsen P. Comparison of objective, structured assessment of technical skills with a virtual reality hysteroscopy trainer and standard latex hysteroscopy model. J Reprod Med. 2007 May;52(5):407-12. Datta V, Bann S, Mandalia M, Darzi A. The surgical efficiency score: a feasible, reliable, and valid method of skills assessment. Am J Surg. 2006 Sep;192(3):372-8.
70. Working Example: OSATS Adaptations Setna Z, Jha V, Boursicot KA, Roberts TE. Evaluating the utility of workplace-based assessment tools for specialty training. Best Pract Res Clin Obstet Gynaecol. 2010 Jun 30. Dorman K, Satterthwaite L, Howard A, Woodrow S, Derbew M, Reznick R, Dubrowski A. Addressing the severe shortage of health care providers in Ethiopia: bench model teaching of technical skills. Med Educ. 2009 Jul;43(7):621-7.
76. The Joint Committee on Standards for Educational Evaluation (1990) Utility Standards The utility standards are intended to ensure that an evaluation will serve the information needs of intended users.
77. The Joint Committee on Standards for Educational Evaluation (1990) Feasibility Standards The feasibility standards are intended to ensure that an evaluation will be realistic, prudent, diplomatic, and frugal.
78. The Joint Committee on Standards for Educational Evaluation (1990) Propriety Standards The propriety standards are intended to ensure that an evaluation will be conducted legally, ethically, and with due regard for the welfare of those involved in the evaluation, as well as those affected by its results.
79. The Joint Committee on Standards for Educational Evaluation (1990) Accuracy Standards The accuracy standards are intended to ensure that an evaluation will reveal and convey technically adequate information about the features that determine the worth or merit of the program being evaluated.
86. Take-home We need more complex evaluation models that look at both processes and outcomes. Choices will depend on the intended use (i.e., the branch on the Evaluation Tree). Both the evaluation models and the assessment instruments need standardization and rigor.