Assessment Literacy in a Teacher Evaluation Frame

Andy Hegedus, Ed.D.
June 2012
Trying to gauge my audience and adjust my speed . . .

• How many of you think your literacy with assessments in general is
  “Good” or better?
• How many of you are currently figuring out how to use assessment data
  thoughtfully in a Teacher Evaluation process?
Go forth thoughtfully, with care

• What we’ve known to be true is now being shown to be true
  – Using data thoughtfully improves student achievement
• There are dangers present, however
  – Unintended consequences
Remember the old adage?

“What gets measured (and attended to), gets done”
An infamous example

• NCLB
  – Cast light on inequities
  – Improved performance of “Bubble Kids”
  – Narrowed taught curriculum
A patient’s health doesn’t change because we know their blood pressure.

It’s our response that makes all the difference.

It’s what we do that counts.
Data Use in Teacher Evaluation is our construct for today

Our nation has moved from a model of education reform that focused on
fixing schools to a model that is focused on fixing the teaching
profession.
Be considerate of the continuum of stakes involved

[Diagram: uses arranged along a continuum – Support, then Compensate,
then Terminate – with risk increasing as the stakes rise, against an
axis of increasing levels of required rigor.]
Let’s get clear on terms

• Growth
  – A depiction of progress over time along a cross-grade scale
• Value-Added
  – A determination of whether growth is greater for a particular
    student or group of students than would be expected
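
To make the distinction concrete, here is a minimal sketch with invented
numbers; the RIT-style scores and the +8 normative expectation are
hypothetical, not actual NWEA norms.

```python
# Growth vs value-added, in miniature. All numbers are invented.

def growth(score_fall: float, score_spring: float) -> float:
    """Progress over time along a cross-grade scale."""
    return score_spring - score_fall

def value_added(observed_growth: float, expected_growth: float) -> float:
    """Positive when growth beats what would be expected for a
    comparable student; negative when it falls short."""
    return observed_growth - expected_growth

# Hypothetical student: 205 -> 212 over the year, while comparable
# students starting at 205 typically grow 8 points.
g = growth(205, 212)        # 7 points of growth
va = value_added(g, 8.0)    # -1: slightly below expectation
print(g, va)
```

The point of the distinction: a student can show real growth and still
carry a negative value-added figure, and vice versa.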
Marcus’ growth

[Chart: Marcus’ trajectory plotted against the college readiness
standard, comparing his normal growth with the growth needed to reach
the standard.]
What question is being answered in support of using data in evaluating teachers?

Is the progress produced by this teacher dramatically different from
that of teaching peers who deliver instruction to comparable students
in comparable situations?
There are four key steps required to answer this question

1. The Test
2. The Growth Metric
3. The Evaluation
4. The Rating
The purpose and design of the instrument is significant

• Many assessments are not designed to measure growth
• Others do not measure growth equally well for all students
Both Status and Growth are important

[Chart: a 5th grade student’s scores at Time 1 and Time 2 on a vertical
scale running from Beginning Literacy to Adult Reading. The change
between the two points is the growth (Value Added = teacher
contribution to growth); the Time 2 point marks the status.]
Teachers encounter a distribution of student performance

[Chart: a 5th grade class scattered along the scale from Beginning
Literacy to Adult Reading, clustered around grade-level performance
with outliers above and below. Norm = “typical” for a reference
population.]
Traditional assessment uses items reflecting the grade level standards

[Chart: the scale from Beginning Literacy to Adult Reading, with 4th,
5th, and 6th grade marked. The traditional assessment item bank sits
within the grade level standards band.]
Traditional assessment uses items reflecting the grade level standards

[Chart: grade level standards for 4th, 5th, and 6th grade as adjacent
bands on the scale; the overlap between bands allows linking and scale
construction.]
Adaptive testing works differently

[Chart: the item bank can span the full range of achievement.]
Available item pool depth is crucial

[Chart: an adaptive test event converging on the estimated RIT score,
with item difficulty stepping up after each correct response and down
after each incorrect one.]
Tests are not equally accurate for all students

[Side-by-side comparison: California STAR vs NWEA MAP.]
These differences impact measurement error

[Chart: test information functions across the scale score range 165–245
(roughly the 1st through 86th percentile), spanning performance bands
from Academic Warning through Below, Meets, and Exceeds. The adaptive
test holds information high across the whole range; the traditional
test, built from 5th grade level items, peaks near the middle and drops
off at the extremes, where its error is significantly different.]
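
The shape of those information curves falls out of basic item response
theory. A hedged sketch, using a Rasch model with invented item
difficulties rather than the actual STAR or MAP item pools: test
information at an ability is the sum of p(1−p) over administered items,
and the standard error of measurement is 1/√information, so a fixed
form clustered at grade level loses precision at the extremes while an
adaptive administration keeps items near the student.

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def sem(theta: float, difficulties: list) -> float:
    """SEM = 1 / sqrt(test information); Rasch item information is p*(1-p)."""
    info = sum(rasch_p(theta, b) * (1.0 - rasch_p(theta, b)) for b in difficulties)
    return 1.0 / math.sqrt(info)

# Fixed form: 40 items packed around grade-level difficulty (0 on a logit scale).
fixed_form = [-1.0 + 2.0 * i / 39 for i in range(40)]

for theta in (-3.0, 0.0, 3.0):
    # Adaptive administration: 40 items targeted near this student's ability.
    adaptive = [theta - 0.5 + i / 39 for i in range(40)]
    print(f"ability {theta:+.1f}: fixed-form SEM {sem(theta, fixed_form):.2f}, "
          f"adaptive SEM {sem(theta, adaptive):.2f}")
```

In this toy setup the fixed form’s SEM roughly doubles at the extremes
while the adaptive SEM stays flat, which is the pattern the chart above
describes.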
Error can change your life!

• Think of a high stakes test – a state summative
  – Designed to identify if a student is proficient or not
• Do they do that well?
  – 93% correct on the proficiency determination
• Do they work well off-design?
  – 75% correct on performance-level determinations

*Testing: Not an Exact Science, Education Policy Brief, Delaware
Education Research & Development Center, May 2004,
http://dspace.udel.edu:8080/dspace/handle/19716/244
What is measured must be aligned to what is being taught

• Assessments must align with the teacher’s instructional responsibility
  – Validity: Is it assessing what you think it’s assessing?
  – Reliability: If we gave it again, would the results be consistent?
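
Reliability has a concrete statistical reading: how strongly two
administrations of the same test correlate. A minimal sketch; all score
pairs below are invented, not real test data.

```python
# Test-retest reliability read as a Pearson correlation between two
# administrations of the same test. All scores are invented.

def pearson_r(xs: list, ys: list) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

first  = [198, 203, 207, 211, 215, 222, 230]
retest = [200, 201, 209, 210, 217, 221, 233]
print(f"test-retest r = {pearson_r(first, retest):.2f}")  # near 1 = consistent
```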
The instrument must be able to detect instruction

“…when science is defined in terms of knowledge of facts that are
taught in school…(then) those students who have been taught the facts
will know them, and those who have not will…not. A test that assesses
these skills is likely to be highly sensitive to instruction.”

Black, P. and Wiliam, D. (2007) ‘Large-scale assessment systems: Design
principles drawn from international comparisons’, Measurement:
Interdisciplinary Research & Perspective, 5(1), 1–53
The more complex, the harder to detect and attribute to one teacher

“When ability in science is defined in terms of scientific
reasoning…achievement will be less closely tied to age and exposure,
and more closely related to general intelligence. In other words,
science reasoning tasks are relatively insensitive to instruction.”

Black, P. and Wiliam, D. (2007) ‘Large-scale assessment systems: Design
principles drawn from international comparisons’, Measurement:
Interdisciplinary Research & Perspective, 5(1), 1–53
Other issues
• Security and Cheating

• Proctoring

• Procedures
Mean spring and fall test duration in minutes by school

[Bar chart: mean test duration by school, 0–90 minutes, spring term vs
fall term.]
Ten minutes makes a difference ~ one RIT

[Chart: growth index (RIT) for roughly 70 students, from about -6 to
+8, comparing students taking 10+ minutes longer in spring than in fall
against all other students.]
Testing is complete . . . What is useful to answer our question?

1. The Test
2. The Growth Metric
3. The Evaluation
4. The Rating
The metric matters – let’s go underneath “Proficiency”

[Chart: difficulty of the New York “Meets” level, Grades 2–8, expressed
as a national percentile for math and reading, plotted against college
readiness and typical-performance reference lines.]
What gets measured and attended to really does matter

[Histogram: number of students by fall RIT in mathematics, flagged Up,
Down, or No Change under both the Proficiency and College Readiness
cuts. One district’s change in 5th grade mathematics performance
relative to the KY proficiency cut scores.]
Changing from Proficiency to Growth means all kids matter

[Histogram: number of students by fall mathematics score, split between
those below and those meeting or above projected growth. Number of 5th
grade students meeting projected mathematics growth in the same
district.]
How can we make it fair?

1. The Test
2. The Growth Metric
3. The Evaluation
4. The Rating
Consider . . .

• What if I skip this step?
  – Comparison is likely against normative data, so the comparison is
    to “typical kids in typical settings”
• How fair is it to disregard context?
  – Good teacher – bad school
  – Good teacher – challenging kids

How does your performance evaluation consider context?
Nothing is perfect

• Value-added models control for a variety of classroom, school level,
  and other conditions
  – There are over one hundred different value-added models
  – All attempt to minimize error
  – Variables outside the controls are assumed to be random
• Results are not stable
  – Using multiple years of data is highly recommended
  – Results are more likely to be stable at the extremes
Multiple years of data is necessary for some stability

[Bar chart: teachers with growth scores in the lowest and highest
quintile over two years using NWEA’s MAP (493 teachers), showing the
Year 1 and Year 2 counts in each quintile, with an audience prompt to
vote whether Year 2 lands above or below.]

Typical r values for measures of teaching effectiveness range between
.30 and .60 (Brown Center on Education Policy, 2010)
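
A toy simulation shows why correlations in that range translate into so
much quintile movement. This is not NWEA’s data: the teacher effects
and noise below are invented, with the noise scaled so the year-to-year
correlation lands near .5, the middle of the reported range.

```python
import random
random.seed(1)

# Each teacher has a stable true effect plus independent year noise of
# equal variance, which makes corr(year1, year2) about .5 by construction.
n = 493
true_effect = [random.gauss(0, 1) for _ in range(n)]
year1 = [t + random.gauss(0, 1) for t in true_effect]
year2 = [t + random.gauss(0, 1) for t in true_effect]

def in_bottom_quintile(values, x):
    """True when x sits in the lowest fifth of values."""
    return sum(v < x for v in values) < len(values) / 5

bottom = [i for i in range(n) if in_bottom_quintile(year1, year1[i])]
stayed = sum(in_bottom_quintile(year2, year2[i]) for i in bottom)
print(f"{stayed} of {len(bottom)} bottom-quintile teachers stayed there in year 2")
```

In runs of this sketch, on the order of 40–50% of bottom-quintile
teachers move out of the quintile the following year, echoing the
movement seen in the MAP sample.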
A variety of errors mean more stability only at the extremes

• Control for statistical error
  – All models attempt to address this issue
    • Error is compounded when combining two test events
  – Nevertheless, many teachers’ value-added scores will fall within
    the range of statistical error
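
The compounding is simple arithmetic: the SEM of a growth (difference)
score is the root sum of squares of the two events’ SEMs, and a class
mean only narrows that band by √n. A hedged sketch; the 3-RIT SEM and
class sizes are illustrative, and between-student variation in true
growth is ignored here.

```python
import math

def growth_sem(sem_fall: float, sem_spring: float) -> float:
    """SEM of a difference score: errors of the two test events compound
    as the root sum of squares (assuming independent errors)."""
    return math.sqrt(sem_fall ** 2 + sem_spring ** 2)

student_sem = growth_sem(3.0, 3.0)  # two 3-RIT events -> ~4.24 RIT on growth

# Measurement error alone puts a band around a class-mean growth score;
# it narrows only with the square root of the number of students.
for n in (10, 20, 60):
    band = 1.96 * student_sem / math.sqrt(n)
    print(f"n={n:3d}: 95% error band ~ +/-{band:.2f} RIT around the class mean")
```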
Range of teacher value-added estimates

[Chart: Mathematics Growth Index Distribution by Teacher – Validity
Filtered. Teachers are sorted by average growth index score (roughly
-12 to +12) and banded into quintiles Q1–Q5. Each line represents a
single teacher: the average growth index score (green line) plus or
minus the standard error of the growth index estimate (black line).
Students with tests of questionable validity, and teachers with fewer
than 20 students, were removed.]
With one teacher, error means a lot
Assumption of randomness can have risk implications

• Value-added models assume that variation is caused by randomness if
  not controlled for explicitly
  – Young teachers are assigned disproportionate numbers of students
    with poor discipline records
  – Parent requests for the “best” teachers are honored
  – Sound educational reasons for placement are likely to be defensible
Lower numbers can significantly impact a teacher-level analysis

• Idiosyncratic cases
  – In self-contained classrooms, one or two idiosyncratic cases can
    have a large effect on results
How tests are used to evaluate teachers

1. The Test
2. The Growth Metric
3. The Evaluation
4. The Rating
Translation into ratings can be difficult to inform with data

• How would you translate a rank order to a rating?
  – Data can be provided
  – A value judgment is ultimately used to set cut scores for points
    or a rating
Decisions are value based, not empirical

• What counts as far below a district’s expectation is subjective
• What about
  – The obligation to help teachers improve?
  – The quality of replacement teachers?
Even multiple measures need to be used well

• The system for combining elements and producing a rating is also a
  value-based decision
  – Multiple measures and principal judgment must be included
  – Evaluate the extremes to make sure the result makes sense
NY use of multiple measures provides an example

• Principal evaluation, state test, and local assessment scores are
  combined
  – Rating and points are generated separately for each category
  – The principal has 60% of the evaluation
• What happens at the extremes
  – A teacher at the low end of Developing (not Ineffective) on test
    scores requires a 98% rating from the principal to avoid falling
    to Ineffective
    • Effective needs 95%
  – A teacher rated highly effective on test scores needs 50% or
    higher on the principal evaluation to maintain that rating
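
A hedged sketch of how such a composite behaves at the extremes. The
60-point principal share comes from the slide; the 20/20 split for the
two test categories and the cut scores below are assumptions for
illustration, not the actual NY tables.

```python
# Composite rating in the style described above. Cut scores are
# hypothetical; only the 60% principal weight comes from the slide.
CUTS = [(91, "Highly Effective"), (75, "Effective"),
        (65, "Developing"), (0, "Ineffective")]

def rate(principal_pct: float, state_pts: float, local_pts: float) -> str:
    """principal_pct is the share earned of a 60-point category; the two
    test-based categories are assumed worth up to 20 points each."""
    total = 60 * principal_pct + state_pts + local_pts
    return next(label for cut, label in CUTS if total >= cut)

# With very low test-based points, only an extreme principal score
# keeps the teacher out of the bottom rating under these cuts:
print(rate(0.98, 4, 3))  # Developing -- barely clears the cut
print(rate(0.90, 4, 3))  # Ineffective
```

The design lesson is the one on the slide: check what score
combinations the system forces at the extremes before adopting it.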
Recommendations

• Be thoughtful
• Involve a variety of stakeholders
• Use multiple years of student achievement data
• Begin with pilots to understand the accuracy and unintended
  consequences
• Embrace the formative advantages of growth measurement as well as
  the summative
More information

• Presentations and other recommended resources are available at:
  – www.nwea.org
  – www.kingsburycenter.org
• Contact us:
  – NWEA Main Number: 503-624-1951
  – E-mail: andy.hegedus@nwea.org


Editor’s Notes

  1. Concept – If we fix schools we fix education. Schools actually did improve during this period. Race to the Top, Gates Foundation, Teach for America… Signaled in a number of ways. NCLB was about fixing schools – 100% proficient by 2014. Punishments for AYP – SES, Choice, Restructuring. Obama switch – Race to the Top. Fixing or improving teaching and the teaching profession. Recruiting teachers from alternative careers. Move from holding schools accountable to holding teachers accountable. Wrong? No. Different? Yes. David Brooks – Aug 2010 – Atlantic Monthly – Teachers are fair game – teachers under scrutiny – somewhat unfairly. BOEs are asking about test-based accountability. Charleston SC – any teacher without 50% of students on growth norm – Yr 1 on report, Yr 2 only rehired by approval by BOE. 50% Yr 1, 25% Yr 2 to be rehired. Our goal – make sure you are prepared. Understand the risk. Proper ways to implement, including legal issues. Clarify some of the implications – very complex – prepare you and a prudent course.
  2. Teacher evaluations and the use of data in them can take many forms. You can use them for supporting teachers and their improvement. You can use the evaluations to compensate teachers or groups of teachers differently or you can use them in their highest stakes way to terminate teachers. The higher the stakes put on the evaluation, the more risk there is to you and your organization from a political, legal, and equity perspective. Most people naturally respond with increasing the levels of rigor put into designing the process as a way to ameliorate the risk. One fact is that the risk can’t be eliminated. Our goal – Make sure you are prepared. Understand the risk. Proper ways to implement including legal issues. Clarify some of the implications – Very complex – Prepare you and a prudent course
  3. Contrast with what value added communicates. Plot normal growth for Marcus vs anticipated growth – value added. If you ask whether the teacher provided value added, the answer is Yes. The other line is what is needed for college readiness. The blue line is what is used to evaluate the teacher. Is he on the line the parents want him to be on? Probably not. Don’t focus on one at the expense of the other. NCLB – AYP vs what the parent really wants for goal setting. We can become so focused on measuring teachers that we lose sight of what parents value. We are better off moving towards the kids’ aspirations. As a parent I didn’t care if the school made AYP. I cared if my kids got the courses that helped them go where they wanted to go.
  4. This is the value-added metric. Not easy to make nuanced decisions. Can learn about the ends.
  5. Steps are quite important. People tend to skip some of these. Kids take a test – important that the test is aligned to the instruction being given. Metric – look at growth vs the growth norm and calculate a growth index. Two benefits – very transparent and simple. People tend to use our growth norms – if you hit 60% for a grade level within a school you are doing well. Norms – growth of a kid or group of kids compared to a nationally representative sample of students. Why isn’t this value added? Not all teachers can be compared to a nationally representative sample because they don’t teach kids that are just like the national sample. The third step controls for variables unique to the teacher’s classroom or environment. Fourth step – rating – how much below average before the district takes action, or how much above before someone gets performance pay. A particular challenge in NY state right now. Law requires it.
  6. State assessments are designed to measure proficiency – many items in the middle, not at the ends. Must use multiple points of data over time to measure this. We also believe that a principal should be more in control of the evaluation than the test – principals and teacher leaders are what change schools.
  7. 5th grade IL math cut scores shown
  8. Common core – very ambitious things they want to measure – tackle things like on an AP test. Write and show their work. A CC assessment to evaluate teachers can be a problem. Raise your hand if you know what the capital of Chile is. Santiago. Repeat after me. We will review in a couple of minutes. Facts can be relatively easily acquired and are instructionally sensitive. If you expose kids to facts in meaningful and engaging ways, it is sensitive to instruction.
  9. Problem – insensitive to instruction. Prereq skills – writing skills. Given events in N. Africa today, the question requires a lot of pre-req knowledge. Need to know the story. Put it into writing. Reasoning skills to put it together with events today. And I need to know what is going on today as well. One doesn’t develop this entire set of skills in the 9 months of instruction. Common core is what we want – just not for teacher evaluation. These questions are not that sensitive to instruction. Problematic when we hold teachers accountable for instruction or growth.
  10. (Same note as #9.)
  11. A teacher who cheats advantages herself and disadvantages the teacher who follows. Both legal and moral consequences. Security for paper and pencil – controls for materials, exposure, practice assessments. Newspapers would love to write about the cheating scandal in your town. Have written policies and make sure they are being followed.
  12. (Same note as #5 – the four key steps.)
  13. NCLB required everyone to get above proficient – the message was to focus on kids at or near proficient. School systems responded. MS standards are harder than the elem standards – an MS problem. No effort to calibrate them – no effort to project elem to MS standards. Start easy and ramp up. Proficient in elem and not in MS with normal growth. When you control for the difficulty of the standards, elem and MS performance are the same.
  14. Dramatic differences between standards-based vs growth. KY 5th grade mathematics. Sample of students from a large school system. X-axis fall score, Y number of kids. Blue are the kids who did not change status between the fall and the spring on the state test. Red are the kids who declined in performance over the spring – descenders. Green are kids who moved above it in performance over the spring – ascenders – bubble kids. About 10% based on the total number of kids. Accountability plans are typically made based on these red and green kids.
  15. Same district as before. Yellow – did not meet target growth – spread over the entire range of kids. Green – did meet growth targets. 60% vs 40% is doing well – this is a high performing district with high growth. Must attend to all kids – this is a good thing – ones in the middle and at both extremes. The old one was discriminatory – focus on some in lieu of others. Teachers who teach really hard at the standard for years – teachers need to be able to reach them all. This does a lot to move the accountability system to parents and our desires.
  16. (Same note as #5.)
  17. There are wonderful teachers who teach in very challenging, dysfunctional settings. The setting can impact the growth. HLM embeds the student in a classroom, the classroom in the school, and controls for the school parameters. Is it perfect? No. Is it better? Yes. The opposite is true and learning can be magnified as well. What if kids are a challenge – ESL or attendance, for instance? It can deflate scores, especially with a low number of kids in the sample being analyzed. Also need to make sure you have a large enough ‘n’ to make this possible – especially true in small districts. Our position is that a test can inform the decision, but the principal/administrator should collect the bulk of the data that is used in the performance evaluation process.
  18. Experts recommend multiple years of data to do the evaluation. Invalid to just use two points, and they will testify to it. Principals never fire anyone – NY rubber room – a myth. If they do, it’s not fast enough – need to speed up the process. This won’t make the process faster – principals doing intense evaluations will.
  19. The question we asked: Are teachers who are rated poorly or well in one year likely to stay there in the second year? Important if high stakes, where there is a belief that someone won’t improve. We did a VA assessment in year one and again in year two – 493 teachers. 40% of people in the bottom quintile moved out. Yr 1 and yr 2 correlations – these results are more highly correlated than most other studies. Ours is a best-case scenario. One class can impact results, so you need multiple years of data to get stable results.
  20. Measurement error is compounded in test 1 and test 2
  21. The green line is their VA estimate and the bar is the error of measure. Both on top and bottom, people can be in other quartiles. People in the middle can cross quintiles – just based on SEM. Cross country – winners spread out. At the end of the race there is spread; in the middle you get a pack, and moving up in the middle makes a big difference in the overall race. Instability and narrowness of ranges mean that for teachers in the middle, slight changes in performance can be a large change in performance ranking.
  22. Non-random assignments. Models control for various things – FRL, ethnicity, school effectiveness overall. Beyond this point assignment is assumed random. 1st year teachers get more discipline problems than teachers who have been in for 30 years and pick the kids they get. If the model doesn’t control for disciplinary record – none do; they don’t have that data – scores are inflated. Makes the model invalid. Principals do need to do non-random assignment – sound educational reasons for the placement – match adults to kids.
  23. One or two kids can impact a classroom but not a grade or a school. Why? A large n helps reduce the standard error.
  24. (Same note as #5.)
  25. Use NY point system as the example
  26. Assessment is ultimately to serve kids. Be thoughtful. Get help. Involve stakeholders in the creation of a comprehensive evaluation system with multiple measures of teacher effectiveness (Rand, 2010). Select the measures and VA models carefully. Bring as much data to bear as possible to create a body of evidence. Start small and learn. We wouldn’t be who we are if I didn’t stress using the data for formative purposes. That’s what we really value.