Topics: Error: What is it?; Standard Error of Measurement; Standard Deviation or Standard Error of Measurement; Why all the fuss about error?; Sources of Error; Sources of Error Influencing Various Reliability Coefficients; Band Interpretation

- 1. CHAPTER 17: ACCURACY AND ERROR. Reporter: SHELAMIE M. SANTILLAN, EDUC 243 student, 2nd Sem. S.Y. 2016-
- 2. When is a test score inaccurate? Almost always. All tests and scores are imperfect and are subject to
- 3. Error – What is it? No test measures perfectly, and many tests fail to measure as well as we would like them to. Tests make “mistakes”. They are always associated with some degree of error.
- 4. Error – What is it? Think about the last test you took. Did you obtain exactly the score you thought or knew you deserved?
- 5. Examples of errors that lower your obtained score: when you couldn’t sleep the night before the test; when you were sick but took the test anyway; when the essay test you were taking was so poorly constructed that it was hard to tell what was being tested.
- 6. Examples of errors that lower your obtained score: when the test had a 45-minute time limit but you were allowed only 38 minutes; when you took a test that had multiple defensible answers.
- 7. Examples of errors (or situations) that raised your obtained score: the time you just happened to see the answers on your neighbor’s paper; the time you got lucky guessing; the time you had 52 minutes for a 45-minute test.
- 8. Examples of errors (or situations) that raised your obtained score: the time the test was so full of unintentional clues that you were able to answer several questions based on the information given in other questions.
- 9. Then how does one go about discovering one’s true score? Unfortunately, we don’t have an answer. The true score and the error score are both theoretical or hypothetical values.
- 10. Why bother with the true score or error score? Because they allow us to illustrate some important points about test score reliability and test score
- 11. Simply keep in mind: Obtained score = true score + error score
- 12. Table 17.1 The relationship among obtained scores, hypothetical true scores, and hypothetical error scores for a ninth-grade math test

  | Student | Obtained Score | True Score* | Error Score* |
  |---|---|---|---|
  | Donna | 91 | 88 | +3 |
  | Jack | 72 | 79 | -7 |
  | Phyllis | 68 | 70 | -2 |
  | Gary | 85 | 80 | +5 |
  | Marsha | 90 | 86 | +4 |
  | Milton | 75 | 78 | -3 |

  *Hypothetical values
- 13. The Standard Error of Measurement (abbreviated Sm) is the standard deviation of the error scores of a test. We will use the error scores from Table 17.1 (3, -7, -2, 5, 4, -3).
- 14. Step 1: Determine the mean of the error scores. Error scores (Table 17.1): Donna +3, Jack -7, Phyllis -2, Gary +5, Marsha +4, Milton -3. $M = \frac{\sum X}{N} = \frac{0}{6} = 0$
- 15. Step 2: Subtract the mean from each error score to arrive at the deviation scores. Square each deviation score and sum the squared deviations.

  | X - M = x | x² |
  |---|---|
  | +3 - 0 = 3 | 9 |
  | -7 - 0 = -7 | 49 |
  | -2 - 0 = -2 | 4 |
  | +5 - 0 = 5 | 25 |
  | +4 - 0 = 4 | 16 |
  | -3 - 0 = -3 | 9 |

  ∑x² = 112
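- 16. Step 3: Plug the sum of squared deviations (∑x²) into the formula and solve for the standard deviation. $\text{Error score } SD = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{112}{6}} = \sqrt{18.67} = 4.32$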
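The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `population_sd` is an assumption made for this example, and it divides by N (not N - 1), matching the hand calculation on the slides.

```python
import math

# Hypothetical error scores from Table 17.1 (Donna, Jack, Phyllis, Gary, Marsha, Milton)
error_scores = [3, -7, -2, 5, 4, -3]


def population_sd(values):
    """Standard deviation with N in the denominator, as in Steps 1 to 3."""
    n = len(values)
    mean = sum(values) / n                                   # Step 1: the mean
    sum_of_squares = sum((v - mean) ** 2 for v in values)    # Step 2: sum of squared deviations
    return math.sqrt(sum_of_squares / n)                     # Step 3: square root of the average


sm = population_sd(error_scores)
print(round(sm, 2))  # 4.32
```

In practice this calculation cannot be carried out on real test data, because error scores are never observed; it only shows what Sm would be if they were known.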
- 17. Fortunately, a rather simple statistical formula can be used to estimate this standard deviation (Sm) without actually knowing the error scores: $S_m = SD\sqrt{1 - r}$, where r is the reliability of the test and SD is the test’s standard deviation.
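As a rough illustration of how this estimate behaves, the sketch below applies Sm = SD × √(1 − r) to a test with SD = 10 at several reliabilities. The function name `standard_error_of_measurement` and the list of reliabilities are assumptions chosen for the example, not values from the chapter (apart from SD = 10 and r = .91, which the band interpretation example uses later).

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Estimate Sm from a test's standard deviation and score reliability."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)


# Illustrative reliabilities: the more reliable the scores, the smaller Sm becomes.
for r in (0.50, 0.75, 0.91, 0.99):
    sm = standard_error_of_measurement(sd=10, reliability=r)
    print(f"r = {r:.2f}  ->  Sm = {sm:.2f}")
```

The pattern to notice is that Sm shrinks toward zero as reliability approaches 1 and approaches the full SD as reliability approaches 0.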
- 18. USING THE STANDARD ERROR OF MEASUREMENT In summary, then, we know that error scores: 1. are normally distributed 2. have a mean of zero 3. have a standard deviation called the standard error of measurement
- 19. USING THE STANDARD ERROR OF MEASUREMENT [Figure 17.1: The error score distribution, shown with Table 17.1]
- 20. This figure tells us that the distribution of error scores is a normal distribution. [Figure 17.2: The error score distribution for the test depicted in Table 17.1. Axis: error scores of the ninth-grade math test]
- 21. Fig. 17.3 The error score distribution for the test depicted in Table 17.1, with approximate normal curve percentages.
- 22. Let’s use the following number line to represent an individual’s obtained score, which we will simply call X:
- 23. Fig. 17.4 The error distribution around an obtained score of 90 for a test with Sm = 4.32
- 24. Fig. 17.5 The error distribution around an obtained score of 75 for a test with Sm = 4.32
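Figures 17.4 and 17.5 center the error distribution on an obtained score. A minimal sketch of the resulting score bands, assuming a hypothetical helper named `score_band` and the Sm of 4.32 computed from Table 17.1:

```python
def score_band(obtained, sm, n_sm=1):
    """Return the (low, high) band that lies n_sm standard errors around an obtained score."""
    return (obtained - n_sm * sm, obtained + n_sm * sm)


SM = 4.32  # standard error of measurement for the ninth-grade math test

for obtained in (90, 75):
    low, high = score_band(obtained, SM)
    print(f"Obtained score {obtained}: band of {low:.2f} to {high:.2f}")
# Obtained score 90: band of 85.68 to 94.32
# Obtained score 75: band of 70.68 to 79.32
```

Passing `n_sm=2` widens the band to two standard errors on each side, which corresponds to roughly 95% under the normal curve.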
- 25. Standard Deviation or Standard Error of Measurement?
  - Standard Deviation (SD): Is the variability of raw scores. It tells us how spread out the scores are in a distribution of raw scores. Is based on a group of
  - Standard Error of Measurement (Sm): Is the variability of error scores. Is based on a group of scores that is hypothetical.
- 26. Why all the fuss about error? For two reasons: 1. We want to make you aware of the fallibility of test scores. 2. We want to sensitize you
- 27. Classification of sources of error 1. Test Takers. 2. The test itself. 3. Test administration. 4. Test scoring.
- 28. Test Takers: Factors that would likely result in an obtained score lower than a student’s true score: • fatigue and illness • Accidentally seeing another
- 29. The test itself: Trick questions Reading level that is too high. Ambiguous questions. Items that are too difficult.
- 30. Test Administration: Physical Comfort Instructions & Explanations Test administrator Attitudes
- 31. Error in Scoring: When computer scoring is used, error can occur. When tests are hand scored, the likelihood of error increases greatly.
- 32. Sources of Error Influencing Various Reliability Coefficients Test-Retest Alternate Forms Internal Consistency
- 33. Test-Retest: Short-interval test-retest coefficients are not likely to be affected greatly by within-student error. Any problems that do exist in the test are present in both the first and second administrations, affecting scores the same way each time the test is administered.
- 34. Alternate Forms: Since alternate-forms reliability is determined by administering two different forms or versions of the same test to the same group close together in time, the effects of within-student error are negligible.
- 35. Alternate Forms: Error within the test, however, has a significant effect on alternate-forms reliability. As with the test-retest method, alternate-forms score reliability is not greatly affected by error in administering or scoring the test, as long as similar
- 36. Alternate Form
- 38. BAND INTERPRETATION uses the standard error of measurement to interpret and report groups of test scores more realistically.
- 40. BAND INTERPRETATION Formula to compute the reliability of the difference score is as follows:
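The slide text does not spell the formula out. One commonly cited form, given here as an assumption rather than as the slide's own equation, applies when the two measures have equal variances:

```latex
\[
r_{DD} = \frac{\dfrac{r_{11} + r_{22}}{2} - r_{12}}{1 - r_{12}}
\]
```

Here $r_{11}$ and $r_{22}$ are the score reliabilities of the two measures and $r_{12}$ is the correlation between them. The symbols are chosen for this illustration.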
- 41. BAND INTERPRETATION Step 1: List Data. Let’s assume M = 100, SD = 10, and score reliability = .91 for all subtests. Here are the subtest scores for John:
- 42. BAND INTERPRETATION Step 2: Determine Sm (standard error of measurement). Since SD and r are the same for each subtest in this example, the standard error of measurement will be the same for each subtest.
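With the values assumed in Step 1 (SD = 10, r = .91), the estimate works out as follows. This applies the formula Sm = SD × √(1 − r) and is shown only as a worked check:

```latex
\[
S_m = SD\sqrt{1 - r} = 10\sqrt{1 - .91} = 10\sqrt{.09} = 10(.3) = 3
\]
```

So each 68% band in this example extends 3 points above and 3 points below the obtained subtest score.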
- 43. BAND INTERPRETATION Step 3: Add and Subtract Sm
- 44. BAND INTERPRETATION Step 4: Graph the Results. Shade in the bands to represent the range of scores that has a 68% chance of capturing John’s
- 45. BAND INTERPRETATION Step 5: Interpret the Bands • Interpret the profile of bands by visually inspecting the bars to see which bands overlap and which do not. • Bands that overlap probably represent differences that occurred by chance.
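The band interpretation steps can be sketched in code. The subtest names and scores below are hypothetical, invented for this illustration (John's actual scores are not reproduced here); only SD = 10 and r = .91 come from the example. The helper names are likewise assumptions.

```python
import math
from itertools import combinations


def standard_error_of_measurement(sd, reliability):
    """Sm = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)


def band(score, sm):
    """68% band: one Sm below and one Sm above the obtained score."""
    return (score - sm, score + sm)


def bands_overlap(a, b):
    """True when two (low, high) bands share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical subtest scores, NOT taken from the source example.
scores = {"Reading": 103, "Writing": 105, "Math": 91}

sm = standard_error_of_measurement(sd=10, reliability=0.91)  # about 3 points
bands = {name: band(score, sm) for name, score in scores.items()}

for first, second in combinations(bands, 2):
    if bands_overlap(bands[first], bands[second]):
        verdict = "bands overlap (difference may be due to chance)"
    else:
        verdict = "bands do not overlap"
    print(f"{first} vs {second}: {verdict}")
```

With these made-up scores, the Reading and Writing bands overlap, while the Math band overlaps with neither of them.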
- 46. Final Word: Technically, there are more accurate statistical procedures for determining real differences between an individual’s test scores than the ones we have been able to present here. These procedures, however, are time-consuming, complex, and overly specific for the typical teacher. Within the classroom, band interpretation, properly used, makes for a practical alternative to those more advanced

- Think about the last test you took. Did you obtain exactly the score you thought or knew you deserved? Was your score higher than you expected? Was it lower than you expected? What about your obtained scores on all the other tests you have taken? Did they truly reflect your skill, knowledge, or ability, or did they sometimes underestimate your knowledge, ability, or skill? Or did they overestimate? If your obtained test scores did not always reflect your true ability, they were associated with some error.
- Your obtained scores may have been lower or higher than they should have been. In short, an obtained score has a true component (actual level of ability, knowledge) and an error component (which may act to lower or raise the obtained score).
- We never actually know an individual’s true score or error score.
- They are important concepts because they allow us to illustrate some important points about test score reliability and test score accuracy.
- The standard deviation of the error score distribution, also known as the standard error of measurement, is 4.32. If we could know what the error scores are for each test we administer, we could compute Sm in this manner. But, of course, we never know these error scores. If you are following so far, your next question should be, “But how in the world do you determine the standard deviation of the error scores if you never know the error scores?”
- Error scores are assumed to be random. As such, they cancel each other out. That is, obtained scores are inflated by random error to the same extent as they are deflated by error. Another way of saying this is that the mean of the error scores for a test is zero. The distribution of the error scores is also important, since it approximates a normal distribution closely enough for us to use the normal distribution to represent it.
- Returning to our example from the ninth-grade math test in Table 17.1, we recall that we obtained an Sm of 4.32 for the data provided.
- Figure 17.2 illustrates the distribution of error scores for these data. What does the distribution in Fig. 17.2 tell us? Before you answer, consider this: The distribution of error scores is a normal distribution. This is important since, as you learned in Chapter 13, the normal distribution has characteristics that enable us to make decisions about scores that fall between, above, or below different points in the distribution. We are able to do so because fixed percentages of scores fall between various score values in a normal distribution.
- (Fig. 17.3 should refresh your memory.) We listed along the baseline the standard deviation of the error score distribution. This is more commonly called the standard error of measurement (Sm) of the test. Thus we can see that 68% of the error scores for the test will be no more than 4.32 points higher or 4.32 points lower than the true scores. That is, if there were 100 obtained scores on this test, 68 of these scores would not be “off” their true scores by more than 4.32 points. The Sm, then, tells us about the distribution of obtained scores around true scores. By knowing an individual’s true score we can predict what his or her obtained score is likely to be.
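The 68% figure comes from the normal curve. As a quick check, the sketch below uses Python's standard-library `statistics.NormalDist` to compute the share of a normal distribution lying within 1, 2, and 3 standard deviations of the mean. The helper name `share_within` is an assumption made for this example.

```python
from statistics import NormalDist


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of its mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


SM = 4.32  # standard error of measurement from Table 17.1

for k in (1, 2, 3):
    percent = 100 * share_within(k)
    print(f"within {k} Sm (±{k * SM:.2f} points): about {percent:.1f}% of error scores")
```

The first line of output is the rounded 68% used throughout the chapter; two and three standard errors give roughly 95% and 99.7%.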
- The careful reader may be thinking, “That’s not very useful information. We can never know what a person’s true score is, only their obtained score.” This is correct. As test users, we work only with obtained scores. However, we can follow our logic in reverse. If 68% of obtained scores fall within 1 Sm of their true scores, then 68% of true scores must fall within 1 Sm of their obtained scores. Strictly speaking, this reverse logic is somewhat inaccurate, but it would be true 99% of the time (Gulliksen, 1987). Therefore the Sm is often used to determine how test error is likely to have affected individual obtained scores. That is, X plus or minus 4.32 (±4.32) defines the range or band
- Why all the fuss? Remember our original point. All test scores are fallible (tending to err); they contain a margin of error. The Sm is a statistic that estimates this margin for us. We are accustomed to reporting a single test score. In education, we have long had a tendency to overinterpret small differences in test scores since we too often consider obtained scores to be completely accurate. Incorporating the Sm in reporting test scores greatly minimizes the likelihood of overinterpretation and forces us to consider how fallible our test scores are. After considering the Sm from a slightly different angle, we will show how to incorporate it to make comparisons among test scores. This procedure is called band interpretation.
- You learned to compute and interpret SD in chapter 13.
- In reality, an individual’s obtained score is the best estimate of an individual’s true score. That is, in spite of the foregoing discussion, we usually use the obtained score as our best guess of a student’s true level of ability. Well, why all the fuss about error then?
- Generally, error due to within-student factors is beyond our control.
- Physical Comfort: Room temperature, humidity, lighting, noise, and seating arrangement are all potential sources of error for the test taker. Instructions and Explanations: Different test administrators provide differing amounts of information to test takers. Some spell words, provide hints, or tell whether it’s better to guess or leave blanks, while others remain fairly distant. Naturally, your score may vary depending on the amount of information you are provided. Test Administrator Attitudes: Administrators differ in the notions they convey about the importance of the test, the extent to which they are emotionally supportive of students, and the way in which they monitor the test. To the extent that these variables affect students differently, test score reliability and accuracy will be impaired.
- The computer, a highly reliable machine, is seldom the cause of such errors. But teachers and other test administrators prepare the scoring keys, introducing possibilities for error. And students sometimes fail to use No. 2 pencils or make extraneous marks on answer sheets, introducing another potential source of scoring error. Needless to say, when tests are hand scored, as most classroom tests are, the likelihood of error increases greatly. In fact, because you are human, you can be sure that you will make some scoring errors in grading the tests you give.
- With test-retest and alternate-forms reliability, within-student factors affect the method of estimating score reliability, since changes in test performance due to such problems as fatigue, momentary anxiety, illness, or just having an “off” day can be doubled because there are two separate administrations of the test. If the test is sensitive to those problems, it will record different scores from one administration to another, lowering the reliability (or correlation coefficient) between them. Obviously, we would prefer that the test not be affected by those problems. But if it is, we would like to know about it.
- List subtests and scores and the M, SD, and reliability (r) for each subtest. For purposes of illustration, let’s assume that the mean is 100, the standard deviation is 10, and the score reliability is .91 for all the subtests.
- Since SD and r are the same for each subtest in this example, the standard error of measurement will be the same for each subtest.
- To identify the band or interval of scores that has a 68% chance of capturing John’s true score, add Sm to and subtract Sm from each subtest score. If the test could be given to John 100 times (without John learning from taking the test), 68 out of 100 times John’s true score would be within the following bands: