3. What are we going to talk about today?
• My needs in a tool
• Process assessment vs meaningful assessment.
• Specific tools
• Trainee at risk
– Being helpful
– Being careful
4. My needs
• PD small to med program
• Talking to myself at faculty meetings
• Running the program – the Art show
• Time and energy, administrative efficiency
– CEX? NO
– Standardized patients? NO
5. Process assessment
- going thru the motions -
• Chores
• Artificial
• Reciprocity assures fatal restraint
• Workarounds and shortcuts limit value
6. (more) Meaningful Assessments
• Genuine opportunities for growth.
• Honest and accurate assessments assured by:
– highly objective nature of the tool,
– Statistical certainty due to high volume of events
– Public display to multiple simultaneous evaluators
– Permanent record of measured variable
– No risk to the evaluator
7. Specific tools I use
• Nearly Useless:
– Self reflection (Fellow self assessment)
– Current endoscopy metrics (yep, that’s what he said)
– Attending global performance assessment
– End of rotation assessments
• Useful:
– Nurse and staff assessments
– Patient assessments
– Lecture, dictation assessments
– QI Projects
• Most Useful:
– The GTE exam
8. Nearly Useless Tools
• Fellow self-reflection
– How I use it: (automated survey to focus self-
directed learning)
– Why I still use it: (makes them read all the learning
objectives in the curriculum)
– Shortcomings: the best fellows are hypercritical,
and the ones who need self-reflection the most,
aren’t capable of it.
– Room for improvement: figure out how to teach
self-reflection better
9. Nearly Useless Tools
• Current endoscopy metrics
– How I use it: (self reporting, signed off by
attending supervisor)
– Why I use it (efforts to develop milestones in
Patient Care)
– Shortcomings: focus on simple procedural
metrics, universally achieved at a busy clinical
program.
– Room for improvement: TBD
10. Nearly Useless Tools
• Attending GPA and end-of-rotation assessments
– How I use it: (automated quarterly e-value surveys)
– Why I use it: (easy, satisfies means to assess up to 2
competencies, allows a mechanism for universal
condemnation)
– Shortcomings: useless assessments, compromised by
halo effect and fear of reciprocal evaluation, completion
rates poor without constant reminders and/or chief
enforcement.
– Room for improvement: scale could be improved to
Needs improvement, Meets expectations, or Exceeds.
11. Useful tools
• Nurse and staff assessments of fellows:
– HOW: automated semi-annual survey. If
problems, changed to quarterly.
– WHY: unsuspected evaluators, no reprisals,
many nurses individually work with each fellow.
These evaluations cover a lot of ground across the
more difficult competencies.
12. Useful tools
• Patient assessments:
– How: every fellow’s clinic patient gets a survey to
send back; responses are entered de-identified and
summated semi-annually
– Why: large numbers provide accurate snapshot
of questions attendings really can’t answer (e.g.
how well does this fellow explain a problem to a
patient, is fellow respectful, etc.)
13. Useful tools
• Lecture assessments:
– How: faculty evaluate by survey
– Why: evaluate performance as a teacher, assess
effort to evaluate current data, treat colleagues
with respect, etc. Allows fellows to demonstrate
strengths or weaknesses across competencies in
a manner not apparent from routine clinical
work.
14. Useful tools
• Dictation assessment:
– How: pull several dictations every few months
per fellow.
– Why: easy, gives good insight into
communication skills, language skills, thought
organization, professionalism, comprehensive
understanding of clinical medicine; based on
elements included or not, not recall and less
subjective than evaluating oral presentation.
15. Useful tools
• QI projects:
– How: required; standard measure, intervene, re-
measure approach. Fellow designs project idea,
carries out measurements, reports outcomes.
– Why: great tool for PBL competency, can be
combined with clinical research and published.
Fellows seem to enjoy these – more practical
utility, sense of positive accomplishment?
16. Most Useful tool
• GTE exam:
– How we use GTE: annual, required, program
pays. Our fellows are required to reach the 15th
percentile.
– Why use GTE: purely objective, identifies
problem performers early enough to remediate.
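A minimal sketch of the threshold check described on this slide, assuming the 15% cutoff refers to a percentile rank within the peer group (the slide does not say so explicitly). The function names and the cohort scores are hypothetical illustration values, not GTE data.

```python
from bisect import bisect_left


def percentile_rank(score, cohort_scores):
    """Return the percent of cohort scores strictly below `score` (0-100)."""
    ordered = sorted(cohort_scores)
    below = bisect_left(ordered, score)  # count of scores < score
    return 100.0 * below / len(ordered)


def needs_remediation(score, cohort_scores, cutoff=15.0):
    """True when the fellow's percentile rank falls below the cutoff."""
    return percentile_rank(score, cohort_scores) < cutoff


# Hypothetical cohort of percent-correct scores (illustration only).
cohort = [38, 42, 45, 47, 49, 50, 51, 52, 53, 54,
          55, 56, 57, 58, 60, 61, 63, 65, 68, 72]

print(percentile_rank(40, cohort), needs_remediation(40, cohort))
print(percentile_rank(55, cohort), needs_remediation(55, cohort))
```

The cutoff is a parameter so that a program using a different threshold can change it without touching the ranking logic.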
17. Why I like the GTE
• As of 2012, GTE includes:
– 97% of allopathic GI programs, and
– 82% of all GI fellows
• Although formative:
– High internal consistency (Cronbach’s coefficient
alpha is 0.89)
– Meaningful improvement F3>F2>F1 in global
performance and in each subject area, p<0.0001
Year   n (% of total)   Mean % correct
F1     352 (30%)        51 +/- 8.7
F2     430 (37%)        57 +/- 9
F3     378 (33%)        63 +/- 8.7
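For reference, the internal-consistency statistic quoted on this slide is conventionally defined as below for an exam with k items, where sigma^2_{Y_i} is the variance of scores on item i and sigma^2_X is the variance of total scores. This is the standard textbook definition, offered as background only; the slide does not describe how the GTE analysis itself was carried out.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)
```

Values closer to 1 indicate that the items vary together, which is why a figure of 0.89 is read as high internal consistency.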
18. Why I like the GTE
– With near global participation, the exam reflects
and compares a fellow’s Medical Knowledge
performance within their peer group, the same
cohort against which the fellow will take their
board exam.
– The ABIM GI board pass rate for first time takers
(regarded as a quality marker) ranges around
85%.
– More on GTE exam: Gastro 2012;142:201-204
19. Remediation
- The trainee at risk -
• Decision: Is the deficit the responsibility of the
training program?
• Zero tolerance conditions do exist.
• Levels of infraction vary (involve DIO/Dean GME ?)
• Program director must NOT/NEVER function as a
treating physician, mental health professional,
interpersonal counselor. Refer these functions to
employee health.
• Early assessment and reassessment are key.
20. Remediation
• HOW:
– Conversation/interaction should be constructive, not
judgmental, no emotional content
– Should always occur with multiple faculty (chief)
– Remediation plans should be written, signed by
fellow and signed copy retained by program.
– The written plan must clearly state what objective
was not met, identify target objective to be met, the
time course to meet, the interim monitoring process,
and the consequences of failure to remediate.
– Fellow’s progress monitored by committee, with
written progress reports filed by program.
21. Remediation
• Remediation failure:
– Program must decide whether the reason for failure is at issue:
• Lack of effort vs uncontrollable circumstances
• Program must decide whether failure to remediate
identifies a trainee who should not continue in
program – in that event, GME must be involved for
academic probation, if they were not already
involved.
22. Case examples:
• Case #1: Fellow A is extremely deferential to
attending physicians, and generally a bright
fellow, completes work, good team member,
attends conferences, punctual. Gets glowing
attending GPA assessments. However, on
nursing evaluations 5/9 nurses report fellow
unnecessarily rough with patients,
unresponsive to their requests for analgesia,
generally dismissive of the endoscopy nurse,
and rate the fellow Unsatisfactory in several
categories.
23. • Talked with fellow about negative nursing
evaluations, asked for fellow’s perception of
circumstances.
• Impressed upon fellow that while these
evaluations are subjective, these reports were
highly unusual and clearly an outlier.
• Behavior expected to improve.
• Nursing was separately informed that fellow
was counseled on these issues, and instructed
to monitor for changes, and not to judge
fellow on prior performance.
24. • Nursing evaluations issued quarterly for this
fellow.
• Nursing assessment of behavior improved
dramatically, all the way into the superior
category, and remained there the remainder
of fellowship.
25. • Case #2: 1st year fellow scores in the 3rd percentile on GTE
exam. No excuses, no distractions, personal
stresses, etc. Study plan written out,
resources provided, progress monitored
carefully by Associate PD with monthly
completion of assignments. Fellow signed off
on understanding that his next GTE exam percentile
must be above the 15th, or academic probation is an
option for the program.
My thanks to the course directors, Dr. Coyle and Dr. Onken, as well as the ASGE and AGA for inviting me to give this presentation.
Ultimately this talk looks at how my needs shaped which tools work best for me, why, and how I made them work for me.
This situation may not apply to you. You may be blessed with abundant resources and faculty – but I had to learn for at least 10 of 12 years how to get it all done myself. For years we had 4-5 faculty, and even at 11 faculty presently, 4 joined in the last two years. So for a very long time, I was the only faculty member truly engaged in assessment and the educational process, and I needed tools that would not consume too much of my clinical time, were efficiently administered, and were hopefully meaningful. Other faculty were not disinterested, just spread thin, overwhelmed and overcommitted. Given the demographics of the nation’s training programs (30% small and 50% medium sized), I suspect this may sound familiar to enough of you.
Examples of these include global performance assessments, post-rotation debriefs, fellow-on-fellow assessments, and low-stakes module or assignment completions. These assessments may start out idealistic, or may be better applied in residencies with large numbers of trainees, or by faculty with time to be engaged in the process – but for many smaller programs, a lot of traditional tools hold very little meaningful value. Again, faculty time pressures can reduce assessments to burdensome, intrusive chores; and faculty and fellows know that anonymity and confidentiality are practically impossible in a small program. You end up with mutual love assessments: faculty are afraid to be critical and give everyone “4” on the 1-5 scale. Even more germane, debriefing an assessment is a learned skill that requires investment and energy; select faculty must be engaged, and they are either present in a program or not. Fellows learn shortcuts and sharing (mutual support) strategies to lighten tasks.
Note that we will discuss them separately on the following slides.
I acknowledge that self-reflection does not have to be useless, but in the way I’ve used it, it is. HOW: all 188 learning objectives from the national curriculum are presented as an automated annual survey, with e-value targeting outlier responses, which we can then discuss to focus their attention for self-directed learning in the upcoming year. WHY: at some level I feel that I’ve at least presented to them a framework of everything they must know, and since they are forced to answer each one individually, they definitely had to read each one.
These involve a lot of unnecessary process. When our fellows complete 700-900 colonoscopies in training, there is very little chance, based on current research evidence, that they will not be competent, and there is also no chance that they will reach true mastery (the 10,000-hour issue), which takes years after graduation.
The halo effect is principally grounded in professionalism and communication issues, and verbal/non-verbal interactions reflecting general likeability or not.
The presence of the attending as an observer potentially changes the fellow’s behavior, and leaves the patient unsure about whom they are interacting with. Standardized patient assessments can be too artificial and staged.
Formative exams are generally hard, with a low percentage of correct answers overall; they are designed to expose areas for improvement and do not represent minimum threshold performance assessments.
Did the program place the trainee in a situation without adequate backup, supervision, or training? Did the program fail to provide the training that is the purpose of the program (e.g. fellow in research lab all year, cannot be assessed for colonoscopy progression)?
Generally risks to patient safety, or felony crimes – grounds for immediate suspension, consult with Dean.
Issues which involve complaints to HR should probably come to the Dean’s attention and be discussed with the Hospital Legal department. Not meeting a milestone in exam scores, or cecal intubation rate, or patient satisfaction with thoroughness of explanation in clinic does not rise to the level of involving the Dean, and is appropriate for the program to manage at first.
Making the first measurement for competency late in the training program and determining the trainee is not meeting standards is not fair, not appropriate, and reflects failure of program functioning. It would likely be legally challenged. There are some exceptions for skills likely to be acquired late, such as sphincterotomy.