International Journal of Mathematics and Statistics Invention (IJMSI)
E-ISSN: 2321 – 4767 P-ISSN: 2321 - 4759
www.ijmsi.org Volume 2 Issue 1 || January 2014 || PP 06-10

“Reliability of four-response type multiple choice questions of
pharmacology summative tests of II M.B.B.S students”
Bhavisha N. Vegada, Bharti N. Karelia, Ajita Pillai
Department of Pharmacology, P.D.U. Govt. Medical College, Rajkot, Gujarat, India.

ABSTRACT:
Introduction
One of the major concerns in the construction of test items for an examination is ensuring the reliability of the
test items. In order for assessments to be sound, they must be free of bias and distortion. Reliability and validity
are two concepts that are important for defining and measuring bias and distortion. Internal consistency is an
estimate of reliability based on the average correlation among items within a test and examines the degree to
which the MCQs in a test measure the same characteristics or domains of knowledge.
Methods
In this study, ten MCQ tests from 2008 to 2012 were selected and analyzed to obtain their mean, standard
deviation, reliability coefficient and standard error of measurement. Data entry was done using Microsoft
Excel 2007.
Results
The mean reliability coefficient was 0.54. Out of ten tests, two had low reliability, five had very low
reliability and three had questionable reliability. The mean standard deviation of the MCQ tests was 3.52
(range 3.05 to 4.01). The mean standard error of measurement was 2.37 (range 2.24 to 2.44).
Conclusion
The reliability of all MCQ tests was low and needs improvement. The standard error of measurement is a more
appropriate parameter for reliability than the reliability coefficient.

KEYWORDS: Reliability, Reliability coefficient, Standard error of measurement
I. INTRODUCTION

The educational objectives in medicine, as in other disciplines, are generally allotted to three
'domains': cognitive, psychomotor and affective. Hence, a medical examination should be designed to determine
whether an undergraduate has achieved these educational objectives by answering the following three
questions: (1) what does he know (cognitive)? (2) what can he do (psychomotor)? and (3) what sort of person is
he (affective)? Regrettably, the current medical examination system still cannot answer these questions
faithfully. [1]
Objective evaluation is becoming increasingly important in the field of education, both for
summative and formative purposes, as has been repeatedly emphasized in guidelines published by several
universities. One method of achieving this is the widespread use of objective written items, the
most popular form of which is the multiple choice question (MCQ). [2]
Designing MCQs is a complex and time-consuming process in a multidisciplinary integrated curriculum.
MCQs are used mostly for comprehensive assessment at the end of a semester or academic session and provide
feedback to teachers on their educational action. Having constructed and assessed a test, a teacher needs to
know how good the test questions are and whether the test items were able to reflect students' learning
performance in the course. Because of their versatile character, MCQs are the most commonly used tool for
assessing the knowledge of medical students. [3] There are different types of MCQs, such as five-response,
four-response, three-response and true/false or two-response. [4] One of the major concerns in the
construction of test items for an examination is ensuring the reliability of the test items. [3]
For assessments to be sound, they should be free from bias and distortion. Reliability and validity are
two concepts that are important for defining and measuring bias and distortion. Reliability refers to the extent to
which assessments are consistent. Validity refers to the accuracy of an assessment. [5] Concepts related to
reliability are consistency, precision, stability, equivalence and internal consistency (Beanland et al 1999 p328).
Internal consistency is an estimate of 'reliability based on the average correlation among items within a test'
(Nunnally & Bernstein 1994 p251) and examines the degree to which the MCQs in a test measure the same
characteristics or domains of knowledge (Beanland et al 1999, Polit & Hungler 1999). Typically, internal
consistency is measured by the calculation of a reliability coefficient (Cronbach 1990, Beanland et al 1999, Polit
& Hungler 1999). [6] Reliability depends both on the Standard Error of Measurement (SEM) and on the ability range
(standard deviation, SD) of candidates taking an assessment. [7]
The present study was undertaken to measure the reliability of MCQs.

II. MATERIAL & METHODS

The 1st and 2nd terminal examinations in pharmacology at our institute consist of an 80-mark theory
examination and a 50-mark practical examination. The theory examination includes 20 multiple choice questions
of 1 mark each. In the tables, the year indicates when the 1st terminal examination was held; A denotes the 1st
terminal and B the 2nd terminal examination.
2.1 Data collection
MCQ items were taken from the 10 summative test papers from the years 2008-2012 (each year having
two terminal examinations). A total of 200 test items were selected for the item analysis. Each MCQ consisted
of a stem and four choices, and the students were to select the one best answer from these four choices. A correct
response to an item was awarded 1 mark, an incorrect response was penalized with negative 0.25 marks, and a
no-attempt or blank response was given no mark.
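As an illustration of the marking scheme described above, the following minimal Python sketch computes a student's raw score on one 20-item test. The function name `mcq_score` and the example counts are hypothetical and are not taken from the study data.

```python
# Sketch of the marking scheme: +1 per correct response,
# -0.25 per incorrect response, 0 per blank response.
# Function name and example counts are illustrative assumptions.

ITEMS_PER_TEST = 20  # each theory paper carried 20 MCQs


def mcq_score(n_correct: int, n_incorrect: int, n_blank: int) -> float:
    """Return the raw score for one student on one 20-item test."""
    if min(n_correct, n_incorrect, n_blank) < 0:
        raise ValueError("response counts must be non-negative")
    if n_correct + n_incorrect + n_blank != ITEMS_PER_TEST:
        raise ValueError("response counts must add up to 20 items")
    return n_correct * 1.0 - n_incorrect * 0.25


# Hypothetical student: 12 correct, 4 incorrect, 4 left blank.
example_score = mcq_score(12, 4, 4)
print(example_score)  # 12 - 4 * 0.25 = 11.0
```

Because of the negative marking, raw scores can be fractional and, in principle, fall below zero.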
2.2 Data analysis
MCQ scores of students from different batches over the five years from 2008 to 2012 were included in the
analysis, and data entry was done using Microsoft Excel 2007. Statistical parameters such as the mean,
standard deviation, reliability coefficient, standard error of measurement and confidence interval were
calculated for the MCQ tests.
The equation for the reliability coefficient is as follows: [8]

$$\alpha = \frac{n}{n-1} \times \frac{\mathrm{Var}_t - \sum \mathrm{Var}_i}{\mathrm{Var}_t} \qquad (1)$$

where
$\alpha$ = estimated reliability of the full-length test
$n$ = number of items
$\mathrm{Var}_t$ = variance of the whole test (standard deviation squared)
$\sum \mathrm{Var}_i$ = sum of the variances of all $n$ items
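The reliability coefficient defined above can be sketched in Python as follows. This is only an illustration: the study itself used Microsoft Excel, the function name `cronbach_alpha` is hypothetical, and the small score matrix is made-up data, not data from the ten tests. Population variance is assumed for both the total and the item variances; using sample variance for both would give the same ratio.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Reliability coefficient from a students-by-items score matrix.

    item_scores: list of rows, one row per student, each row holding
    that student's score on each of the n items.
    """
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("at least two items are required")

    # Variance of the whole test (variance of students' total scores).
    totals = [sum(row) for row in item_scores]
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    # Sum of the variances of the individual items.
    sum_item_vars = sum(
        pvariance([row[i] for row in item_scores]) for i in range(n_items)
    )

    return (n_items / (n_items - 1)) * (var_total - sum_item_vars) / var_total


# Made-up example: 5 students, 4 items scored 1 (correct) or 0.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))  # (4/3) * (2.0 - 0.8) / 2.0 = 0.8
```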
The values for reliability coefficients range from 0 to 1.0. A coefficient of 0 means no reliability and
1.0 means perfect reliability. Since all tests have some error, reliability coefficients never reach 1.0. Generally,
if the reliability of a standardized test is above .80, it is said to have very good reliability; if it is below .50, it
would not be considered a very reliable test. [5]
Tests with a reliability coefficient of 0.90 and above were considered to have excellent reliability;
those between 0.80-0.90 were considered very good; those between 0.70-0.80 good; those between 0.60-0.70 low,
and therefore needing to be supplemented by other measures to determine grades; those between 0.50-0.60 very
low, needing revision of the test; and those of 0.50 and below were considered to have
questionable reliability. [9]
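The grading bands above can be expressed as a small Python sketch. The text does not say to which band a boundary value belongs; the handling below (0.60 counted as very low, 0.50 as questionable) is an assumption, chosen because it reproduces the counts reported in the Results when applied to the coefficients in Table 1. The function name `reliability_band` is hypothetical.

```python
from collections import Counter


def reliability_band(r: float) -> str:
    """Map a reliability coefficient to the descriptive bands used here.

    Boundary handling is an assumption, not stated in the source.
    """
    if r >= 0.90:
        return "excellent"
    if r >= 0.80:
        return "very good"
    if r >= 0.70:
        return "good"
    if r > 0.60:
        return "low"
    if r > 0.50:
        return "very low"
    return "questionable"


# Reliability coefficients of the ten tests, as listed in Table 1.
table1 = {
    "2008-A": 0.50, "2008-B": 0.54,
    "2009-A": 0.51, "2009-B": 0.60,
    "2010-A": 0.66, "2010-B": 0.38,
    "2011-A": 0.63, "2011-B": 0.53,
    "2012-A": 0.51, "2012-B": 0.49,
}

band_counts = Counter(reliability_band(r) for r in table1.values())
print(dict(band_counts))
```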
The equation for the standard error of measurement is as follows: [10]

$$\mathrm{SEM} = S\,(1 - r)^{1/2} \qquad (2)$$

where
$S$ = the standard deviation for the test
$r$ = the reliability coefficient for the test

III. RESULTS

As shown in Table 1, mean reliability coefficient was 0.54. Out of ten tests, two tests had low
reliability, five tests had very low and three tests had questionable reliability.
(Table-1)

As shown in Table 2, mean standard deviation of MCQ tests was 3.52 with range of 3.05 to 4.01. Mean
standard error of measurement was 2.37 with range of 2.24 to 2.44.
(Table-2)

IV. DISCUSSION

The reliability of an examination provides useful information about its performance (and it is self-evident that an examination with a very low reliability is unlikely to be a good or an effective examination, to
the point where zero reliability means that the marks from an examination are no more effective than are
random numbers at distinguishing between candidates). Having said that, the mere fact that an examination has
a high reliability does not ensure that it is necessarily functioning effectively, because the reliability is heavily
dependent upon the ability range of the candidates who are taking it. As has already been seen: [7]
1) The very same exam can apparently drop its reliability dramatically if it is retaken but only by those who
   have already passed it;
2) The reliability can be artificially inflated by encouraging very weak candidates to take it, thereby
   increasing the SD of the marks;
3) It is almost inevitable where successive examinations are taken, as with the Part 2 Written examination of
   MRCP(UK) being taken after Part 1, that the SD will necessarily be lower (only able candidates passing
   Part 1), and that the reliability of a second examination will usually be lower than the first examination;
4) When examinations have very small numbers of candidates, as with the SCEs, there is a greater risk that
   the reliability will be distorted by an unusually high or low spread of candidate abilities.

Reliability can always be increased by making an assessment progressively longer, thereby increasing the
number of examination items, although that is expensive in time, effort and opportunity cost. [7] Our results
showed that two tests had low reliability, which means they need to be supplemented by other measures and
probably contain some items that could be improved. Five tests had very low reliability, which means they need
revision and should be supplemented by other measures. Three tests had questionable reliability, which means
these tests should not contribute heavily to the course grade and need revision.
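The remark that a longer test is more reliable can be made concrete with the Spearman-Brown prophecy formula. This formula is not used or cited in the present study; it is offered here only as a rough sketch, under the assumption that any added items are parallel to (of the same quality as) the existing ones. The function name `spearman_brown` is hypothetical.

```python
def spearman_brown(r: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by length_factor.

    r: reliability of the current test.
    length_factor: new length divided by current length (2.0 = doubled).
    Assumes the added items are parallel to the existing items.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * r) / (1.0 + (length_factor - 1.0) * r)


# Mean reliability coefficient reported in this study for 20-item tests.
mean_r = 0.54
predicted_40_items = spearman_brown(mean_r, 2.0)
print(round(predicted_40_items, 2))  # 1.08 / 1.54, about 0.70
```

Under this assumption, doubling the tests from 20 to 40 comparable items would be expected to raise the mean reliability coefficient from 0.54 to roughly 0.70.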
Cortina considers a reliability coefficient of 0.70 or above to be adequate for classroom
assessment. [11] None of the tests met this criterion, so our MCQ tests have low reliability. High
reliability means that the questions of a test tended to "pull together." Students who answered a given question
correctly were more likely to answer other questions correctly. If a parallel test were developed by using similar
items, the relative scores of students would show little change. Low reliability means that the questions tended
to be unrelated to each other in terms of who answered them correctly. The resulting test scores reflect
peculiarities of the items or the testing situation more than students' knowledge of the subject matter. [9]
Another way to express reliability is in terms of the standard error of measurement. This measure
provides an estimate of how much an individual's score would be expected to change on re-testing, with no
change in knowledge and perception, with the same or an equivalent form of the test. Our results showed that
the standard deviation of candidate scores varied widely (3.05-4.01) compared with the variation in the standard
error of measurement (2.24-2.44).
Based on the assumption that any test score contains an error, the SEM is used to estimate a band or
interval within which a person's true score would fall, that is, the (hypothetical) score the student would
receive if there were no error of measurement. [12] The smaller the SEM, the narrower the interval. Narrow
intervals are more precise, containing less error, than wider intervals. The SEM is inversely related to the
reliability coefficient. [13]
For example, in our study, for the 2008-A examination the mean observed score was 8.11 and the SEM was 2.33.
We can say with 95% confidence that the true score of students of this batch lies in an interval within two SEM
of the observed score (between 3.45 and 12.77). An alternative interpretation is that 95 times out of 100 the
students' score on a retest would be between 3.45 and 12.77.
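The arithmetic of the 2008-A example can be checked with a few lines of Python. This is a minimal sketch: the function name `score_band` is hypothetical, and the multiplier of 2 follows the "Mean ± 2SEM" convention used in Table 2.

```python
def score_band(observed: float, sem: float, multiplier: float = 2.0):
    """Return (lower, upper) limits of the band observed ± multiplier * SEM."""
    half_width = multiplier * sem
    return observed - half_width, observed + half_width


# 2008-A examination: mean observed score 8.11, SEM 2.33.
lower, upper = score_band(8.11, 2.33)
print(f"{lower:.2f} to {upper:.2f}")  # prints "3.45 to 12.77"
```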
As shown in Table 1, the reliability coefficients of the first terminal examinations of 2009 and 2011 were
0.51 and 0.63 respectively, but the standard error of measurement of both examinations was the same, 2.44. So
when judging the reliability of examinations, the reliability coefficient should not be considered alone; the
standard error of measurement is also an important statistical parameter. The SEM is more appropriate for
assessing reliability than the reliability coefficient. [7]
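The comparison of the 2009-A and 2011-A examinations can be reproduced from equation (2), taking the standard deviations from Table 2 (3.49 and 4.01) and the reliability coefficients from Table 1. The sketch below is illustrative; the function name `standard_error_of_measurement` is hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, r: float) -> float:
    """SEM = S * (1 - r) ** 0.5, with S the test SD and r its reliability."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    return sd * math.sqrt(1.0 - r)


# 2009-A: SD 3.49, reliability 0.51; 2011-A: SD 4.01, reliability 0.63.
sem_2009a = standard_error_of_measurement(3.49, 0.51)
sem_2011a = standard_error_of_measurement(4.01, 0.63)
print(f"{sem_2009a:.2f} {sem_2011a:.2f}")  # prints "2.44 2.44"
```

The 2011-A test has the higher reliability coefficient only because its candidates' scores were more spread out (larger SD); the measurement error per candidate is essentially the same in both tests.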

Our results suggest that our MCQ tests are not very reliable and need improvement. The reliability
coefficient also shows problems when the numbers of candidates in examinations are low and sampling error
affects the range of candidate ability. The SEM is not subject to such problems; it is therefore a better
measure of the quality of an assessment and is recommended for routine use. [7]

V. CONCLUSION

The reliability of all MCQ tests was low and needs improvement. The standard error of measurement is a more
appropriate parameter for reliability than the reliability coefficient.

REFERENCES
[1] T. Ho, W. Yip, and J. Tay, The use of multiple choice questions in medical examination: An evaluation of scoring and analysis of
results, Singapore Medical Journal, 22(6), 1981, 361-367.
[2] N. Ananthakrishnan, Item analysis-validation and banking of MCQs, in N. Ananthakrishnan, K. Sethuraman, S. Kumar (Ed.),
Medical Education principles and practice, 2 (JIPMER, Pondicherry) 131-137.
[3] N. Mitra, H. Nagaraja, G. Ponndurai, et al, The levels of difficulty and discrimination indices in type A multiple choice questions
of pre-clinical semester 1 multidisciplinary summative tests, IeJSME, 3(1), 2009, 2-7.
[4] Understanding item analysis reports, [online] Available at:
http://www.washington.edu/oea/service/scanning-scoring/item_analysis.html [accessed December 20, 2012].
[5] Classroom assessment, [online] Available at: http://www.fcit.usf.edu/assessment/basic/basic.html
[accessed December 20, 2012].
[6] J. Considine, M. Botti, and S. Thomas, Design, format, validity & reliability of multiple choice question for use in nursing
research and education, Collegian, 12(1), 2005, 19-24.
[7] J. Tighe, I. McManus, N. Dewhurst, L. Chis, and J. Mucklow, The standard error of measurement is a more appropriate measure
of quality for postgraduate medical assessments than is reliability: an analysis of MRCP (UK) examinations, BMC Medical
Education, 10-40.
[8] Introduction to reliability, [online] Available at:
http://www.ncsu.deu/jlnietfe/EDP560_notes_files/reliability.pdf [accessed January 10, 2013].
[9] Understanding item analysis reports, [online] Available at:
http://www.washington.edu/oea/service/scanningscoring/scanning/itemanalysis [accessed January 10, 2013].
[10] Standard error of measurement, [online] Available at:
http://web.sau.edu/WaterStreetMaryA/NEW%20intro%20to%20tests%20&%20measures%20website_files/standard_error_of_measurement.htm
[accessed January 10, 2013].
[11] J. Cortina, What is Coefficient Alpha? An Examination of Theory and Applications, Journal of Applied Psychology, 78(1), 1993,
98-104.
[12] Test reliability, [online] Available at: http://www.indians.edu/best/testreliability [accessed January 10, 2013].
[13] L. Harvill, Standard error of measurement, Education measurement: issues & practice, summer, 33-41.

Table-1: Reliability coefficient of ten MCQ tests

Year    Examination    Reliability Coefficient
2008    A              0.50
2008    B              0.54
2009    A              0.51
2009    B              0.60
2010    A              0.66
2010    B              0.38
2011    A              0.63
2011    B              0.53
2012    A              0.51
2012    B              0.49
Mean                   0.54
Table-2: Standard deviation, standard error of measurement and confidence interval of ten MCQ tests

Year    Examination    Mean     Standard          Standard Error of     Confidence interval at
                                deviation (SD)    Measurement (SEM)     95% (CI) (Mean ± 2SEM)
2008    A               8.11    3.30              2.33                  3.45-12.77
2008    B               9.49    3.57              2.42                  4.65-14.33
2009    A               8.74    3.49              2.44                  3.86-13.62
2009    B               9.68    3.77              2.38                  4.92-14.44
2010    A              10.46    3.98              2.32                  5.82-15.10
2010    B               8.68    3.05              2.40                  3.88-13.48
2011    A               9.71    4.01              2.44                  4.83-14.59
2011    B               8.02    3.40              2.33                  3.36-12.68
2012    A              10.28    3.20              2.24                  5.80-14.76
2012    B               7.26    3.39              2.42                  2.42-12.10
Mean                    9.04    3.52              2.37

Mais conteúdo relacionado

Mais procurados

Practical Language Testing by Fulcher (2010)
Practical Language Testing by Fulcher (2010)Practical Language Testing by Fulcher (2010)
Practical Language Testing by Fulcher (2010)Mahsa Farahanynia
 
Validity and objectivity of tests
Validity and objectivity of testsValidity and objectivity of tests
Validity and objectivity of testsbushra mushtaq
 
Valiadity and reliability- Language testing
Valiadity and reliability- Language testingValiadity and reliability- Language testing
Valiadity and reliability- Language testingPhuong Tran
 
Characteristics of a good test
Characteristics of a good testCharacteristics of a good test
Characteristics of a good testALMA HERMOGINO
 
validity its types and importance
validity its types and importancevalidity its types and importance
validity its types and importanceIerine Joy Caserial
 
Qualities of good evaluation tools
Qualities of good evaluation toolsQualities of good evaluation tools
Qualities of good evaluation toolsJijiCk
 
Quantitative Analysis (Language and Literature Assessment)
Quantitative Analysis (Language and Literature Assessment)Quantitative Analysis (Language and Literature Assessment)
Quantitative Analysis (Language and Literature Assessment)Joy Labrador
 
Characteristics of a good test
Characteristics of a good testCharacteristics of a good test
Characteristics of a good testcyrilcoscos
 
Testing for Language Teachers Arthur Hughes
Testing for Language TeachersArthur HughesTesting for Language TeachersArthur Hughes
Testing for Language Teachers Arthur HughesRajputt Ainee
 
CHARACTERISTICS OF A GOOD INSTRUMENT
CHARACTERISTICS OF A GOOD INSTRUMENTCHARACTERISTICS OF A GOOD INSTRUMENT
CHARACTERISTICS OF A GOOD INSTRUMENTMusfera Nara Vadia
 
8 2008-normative data for the letter cancellation task in school children
8 2008-normative data for the letter cancellation task in school children8 2008-normative data for the letter cancellation task in school children
8 2008-normative data for the letter cancellation task in school childrenElsa von Licy
 

Mais procurados (18)

Reliability
ReliabilityReliability
Reliability
 
Practical Language Testing by Fulcher (2010)
Practical Language Testing by Fulcher (2010)Practical Language Testing by Fulcher (2010)
Practical Language Testing by Fulcher (2010)
 
Test validity
Test validityTest validity
Test validity
 
Validity and objectivity of tests
Validity and objectivity of testsValidity and objectivity of tests
Validity and objectivity of tests
 
Validity and reliablity
Validity and reliablityValidity and reliablity
Validity and reliablity
 
CMSS FIVE
CMSS FIVECMSS FIVE
CMSS FIVE
 
Valiadity and reliability- Language testing
Valiadity and reliability- Language testingValiadity and reliability- Language testing
Valiadity and reliability- Language testing
 
Validity and reliability
Validity and reliabilityValidity and reliability
Validity and reliability
 
Content validity
Content validityContent validity
Content validity
 
Characteristic of good test
Characteristic of good testCharacteristic of good test
Characteristic of good test
 
Characteristics of a good test
Characteristics of a good testCharacteristics of a good test
Characteristics of a good test
 
validity its types and importance
validity its types and importancevalidity its types and importance
validity its types and importance
 
Qualities of good evaluation tools
Qualities of good evaluation toolsQualities of good evaluation tools
Qualities of good evaluation tools
 
Quantitative Analysis (Language and Literature Assessment)
Quantitative Analysis (Language and Literature Assessment)Quantitative Analysis (Language and Literature Assessment)
Quantitative Analysis (Language and Literature Assessment)
 
Characteristics of a good test
Characteristics of a good testCharacteristics of a good test
Characteristics of a good test
 
Testing for Language Teachers Arthur Hughes
Testing for Language TeachersArthur HughesTesting for Language TeachersArthur Hughes
Testing for Language Teachers Arthur Hughes
 
CHARACTERISTICS OF A GOOD INSTRUMENT
CHARACTERISTICS OF A GOOD INSTRUMENTCHARACTERISTICS OF A GOOD INSTRUMENT
CHARACTERISTICS OF A GOOD INSTRUMENT
 
8 2008-normative data for the letter cancellation task in school children
8 2008-normative data for the letter cancellation task in school children8 2008-normative data for the letter cancellation task in school children
8 2008-normative data for the letter cancellation task in school children
 

Destaque

Compliance with International Financial Reporting Standards
Compliance with International Financial Reporting StandardsCompliance with International Financial Reporting Standards
Compliance with International Financial Reporting Standardsinventionjournals
 
Totally R*-Continuous and Totally R*-Irresolute Functions
Totally R*-Continuous and Totally R*-Irresolute FunctionsTotally R*-Continuous and Totally R*-Irresolute Functions
Totally R*-Continuous and Totally R*-Irresolute Functionsinventionjournals
 
An Improved Regression Type Estimator of Finite Population Mean using Coeffic...
An Improved Regression Type Estimator of Finite Population Mean using Coeffic...An Improved Regression Type Estimator of Finite Population Mean using Coeffic...
An Improved Regression Type Estimator of Finite Population Mean using Coeffic...inventionjournals
 
Inventory Model with Different Deterioration Rates with Stock and Price Depen...
Inventory Model with Different Deterioration Rates with Stock and Price Depen...Inventory Model with Different Deterioration Rates with Stock and Price Depen...
Inventory Model with Different Deterioration Rates with Stock and Price Depen...inventionjournals
 
Probabilistic diameter and its properties.
Probabilistic diameter and its properties.Probabilistic diameter and its properties.
Probabilistic diameter and its properties.inventionjournals
 
Congruence Lattices of A Finite Uniform Lattices
Congruence Lattices of A Finite Uniform LatticesCongruence Lattices of A Finite Uniform Lattices
Congruence Lattices of A Finite Uniform Latticesinventionjournals
 
On Estimation of Population Variance Using Auxiliary Information
On Estimation of Population Variance Using Auxiliary InformationOn Estimation of Population Variance Using Auxiliary Information
On Estimation of Population Variance Using Auxiliary Informationinventionjournals
 

Destaque (10)

Inter quiz
Inter quizInter quiz
Inter quiz
 
Compliance with International Financial Reporting Standards
Compliance with International Financial Reporting StandardsCompliance with International Financial Reporting Standards
Compliance with International Financial Reporting Standards
 
Totally R*-Continuous and Totally R*-Irresolute Functions
Totally R*-Continuous and Totally R*-Irresolute FunctionsTotally R*-Continuous and Totally R*-Irresolute Functions
Totally R*-Continuous and Totally R*-Irresolute Functions
 
An Improved Regression Type Estimator of Finite Population Mean using Coeffic...
An Improved Regression Type Estimator of Finite Population Mean using Coeffic...An Improved Regression Type Estimator of Finite Population Mean using Coeffic...
An Improved Regression Type Estimator of Finite Population Mean using Coeffic...
 
MISHRA DISTRIBUTION
MISHRA DISTRIBUTIONMISHRA DISTRIBUTION
MISHRA DISTRIBUTION
 
Inventory Model with Different Deterioration Rates with Stock and Price Depen...
Inventory Model with Different Deterioration Rates with Stock and Price Depen...Inventory Model with Different Deterioration Rates with Stock and Price Depen...
Inventory Model with Different Deterioration Rates with Stock and Price Depen...
 
Probabilistic diameter and its properties.
Probabilistic diameter and its properties.Probabilistic diameter and its properties.
Probabilistic diameter and its properties.
 
On Power Tower of Integers
On Power Tower of IntegersOn Power Tower of Integers
On Power Tower of Integers
 
Congruence Lattices of A Finite Uniform Lattices
Congruence Lattices of A Finite Uniform LatticesCongruence Lattices of A Finite Uniform Lattices
Congruence Lattices of A Finite Uniform Lattices
 
On Estimation of Population Variance Using Auxiliary Information
On Estimation of Population Variance Using Auxiliary InformationOn Estimation of Population Variance Using Auxiliary Information
On Estimation of Population Variance Using Auxiliary Information
 

Semelhante a International Journal of Mathematics and Statistics Invention (IJMSI)

Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...
Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...
Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...iosrjce
 
Assessment of-clinical-competence-wass-van-der-vleuten-shatzer-jones
Assessment of-clinical-competence-wass-van-der-vleuten-shatzer-jonesAssessment of-clinical-competence-wass-van-der-vleuten-shatzer-jones
Assessment of-clinical-competence-wass-van-der-vleuten-shatzer-jonesPROIDDBahiana
 
Assessment of-clinical-competence-wass-van-der-vleuten-shatzer-jones
Assessment of-clinical-competence-wass-van-der-vleuten-shatzer-jonesAssessment of-clinical-competence-wass-van-der-vleuten-shatzer-jones
Assessment of-clinical-competence-wass-van-der-vleuten-shatzer-jonesPROIDDBahiana
 
4. qualities of good measuring instrument
4. qualities of good measuring instrument4. qualities of good measuring instrument
4. qualities of good measuring instrumentJohn Paul Hablado
 
Basic Principles of Assessment
Basic Principles of AssessmentBasic Principles of Assessment
Basic Principles of AssessmentYee Bee Choo
 
Psychometrics for Clinical Skills Assessment
Psychometrics for Clinical Skills AssessmentPsychometrics for Clinical Skills Assessment
Psychometrics for Clinical Skills AssessmentINSPIRE_Network
 
Influence of Table of Specification on the Construction of Ordinary Level Phy...
Influence of Table of Specification on the Construction of Ordinary Level Phy...Influence of Table of Specification on the Construction of Ordinary Level Phy...
Influence of Table of Specification on the Construction of Ordinary Level Phy...ijtsrd
 
MCQ test item analysis
MCQ test item analysisMCQ test item analysis
MCQ test item analysisSoha Rashed
 
reliablity and validity in social sciences research
reliablity and validity  in social sciences researchreliablity and validity  in social sciences research
reliablity and validity in social sciences researchSourabh Sharma
 
The impact analysis of psychological reliability of population pilot study fo...
The impact analysis of psychological reliability of population pilot study fo...The impact analysis of psychological reliability of population pilot study fo...
The impact analysis of psychological reliability of population pilot study fo...Dr. Seyed Hossein Fazeli
 
CEFS 521 Quiz -3, Liberty University_3 Version answer, secure HIGHSCORE
CEFS 521 Quiz -3, Liberty University_3 Version answer, secure HIGHSCORECEFS 521 Quiz -3, Liberty University_3 Version answer, secure HIGHSCORE
CEFS 521 Quiz -3, Liberty University_3 Version answer, secure HIGHSCORENiniProton
 
Enhancing MCQ quality & fairness
Enhancing MCQ quality & fairnessEnhancing MCQ quality & fairness
Enhancing MCQ quality & fairnessSusie Macfarlane
 
©2014 Walden University 1 OM004 Improving Patient Saf.docx
©2014 Walden University   1 OM004 Improving Patient Saf.docx©2014 Walden University   1 OM004 Improving Patient Saf.docx
©2014 Walden University 1 OM004 Improving Patient Saf.docxgerardkortney
 
reliability and validity psychology 1234
reliability and validity psychology 1234reliability and validity psychology 1234
reliability and validity psychology 1234MajaAiraBumatay
 
Validity and reliability of questionnaires
Validity and reliability of questionnairesValidity and reliability of questionnaires
Validity and reliability of questionnairesVenkitachalam R
 
Adapted from Assessment in Special and incl.docx
Adapted from Assessment in Special and incl.docxAdapted from Assessment in Special and incl.docx
Adapted from Assessment in Special and incl.docxnettletondevon
 

Semelhante a International Journal of Mathematics and Statistics Invention (IJMSI) (20)

Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...
Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...
Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...
 
Assessment of-clinical-competence-wass-van-der-vleuten-shatzer-jones
Assessment of-clinical-competence-wass-van-der-vleuten-shatzer-jonesAssessment of-clinical-competence-wass-van-der-vleuten-shatzer-jones
Assessment of-clinical-competence-wass-van-der-vleuten-shatzer-jones
 
Assessment of-clinical-competence-wass-van-der-vleuten-shatzer-jones
Assessment of-clinical-competence-wass-van-der-vleuten-shatzer-jonesAssessment of-clinical-competence-wass-van-der-vleuten-shatzer-jones
Assessment of-clinical-competence-wass-van-der-vleuten-shatzer-jones
 
03 Assessment issues
03 Assessment issues03 Assessment issues
03 Assessment issues
 
EM&E.pptx
EM&E.pptxEM&E.pptx
EM&E.pptx
 
4. qualities of good measuring instrument
4. qualities of good measuring instrument4. qualities of good measuring instrument
4. qualities of good measuring instrument
 
Basic Principles of Assessment
Basic Principles of AssessmentBasic Principles of Assessment
Basic Principles of Assessment
 
Reliability and validity
Reliability and validityReliability and validity
Reliability and validity
 
Psychometrics for Clinical Skills Assessment
Psychometrics for Clinical Skills AssessmentPsychometrics for Clinical Skills Assessment
Psychometrics for Clinical Skills Assessment
 
International Journal of Mathematics and Statistics Invention (IJMSI)

  • 1. International Journal of Mathematics and Statistics Invention (IJMSI)
E-ISSN: 2321 – 4767 P-ISSN: 2321 - 4759
www.ijmsi.org Volume 2 Issue 1 ǁ January. 2014ǁ PP-06-10

“Reliability of four-response type multiple choice questions of pharmacology summative tests of II M.B.B.S students”
Bhavisha N. Vegada, Bharti N. Karelia, Ajita Pillai
Department of Pharmacology, P.D.U. Govt. Medical College, Rajkot, Gujarat, India.

ABSTRACT:
Introduction
One of the major concerns in the construction of test items for an examination is ensuring the reliability of the test items. In order for assessments to be sound, they must be free of bias and distortion. Reliability and validity are two concepts that are important for defining and measuring bias and distortion. Internal consistency is an estimate of reliability based on the average correlation among items within a test and examines the degree to which the MCQs in a test measure the same characteristics or domains of knowledge.
Methods
In this study ten MCQ tests from 2008 to 2012 were selected and analyzed to obtain their mean, standard deviation, reliability coefficient and standard error of measurement. Data entry was done using Microsoft Excel 2007.
Results
Mean reliability coefficient was 0.54. Out of ten tests, two tests had low reliability, five tests had very low and three tests had questionable reliability. Mean standard deviation of MCQ tests was 3.52 with a range of 3.05 to 4.01. Mean standard error of measurement was 2.37 with a range of 2.24 to 2.44.
Conclusion
Reliability of all MCQ tests was low and needs improvement. Standard Error of Measurement is a more appropriate parameter for reliability than Reliability Coefficient.

KEYWORDS: Reliability, Reliability coefficient, Standard error of measurement

I. INTRODUCTION
The educational objectives in medicine, as in other disciplines, are generally allotted to three ‘domains’: cognitive, psychomotor and affective. Hence, a medical examination should be designed to establish whether an undergraduate has achieved these educational objectives by answering the following three questions: (1) what does he know (cognitive)? (2) what can he do (psychomotor)? and (3) what sort of person is he (affective)? Regrettably, the current medical examination system still cannot answer these questions faithfully. [1]

Objective evaluation is becoming increasingly important in the field of education, for both summative and formative purposes, as guidelines published by several universities have repeatedly emphasized. One method of achieving this is the widespread use of objective written items, the most popular form of which is the multiple choice question (MCQ). [2]

Designing MCQs is a complex and time-consuming process in a multidisciplinary integrated curriculum. MCQs are used mostly for comprehensive assessment at the end of a semester or academic session and provide feedback to teachers on their educational action. Having constructed and assessed a test, a teacher needs to know how good the test questions are and whether the test items reflected students’ performance in the course in relation to learning. Because of their versatile character, MCQs are the most commonly used tool for assessing the knowledge capabilities of medical students. [3] There are different types of MCQs, such as five-response, four-response, three-response and true/false or two-response. [4]

One of the major concerns in the construction of test items for an examination is ensuring the reliability of the test items. [3] For assessments to be sound, they should be free from bias and distortion. Reliability and validity are two concepts that are important for defining and measuring bias and distortion. Reliability refers to the extent to which assessments are consistent. Validity refers to the accuracy of an assessment. [5] Concepts related to reliability are consistency, precision, stability, equivalence and internal consistency (Beanland et al 1999 p328).
  • 2. Internal consistency is an estimate of 'reliability based on the average correlation among items within a test' (Nunnally & Bernstein 1994 p251) and examines the degree to which the MCQs in a test measure the same characteristics or domains of knowledge (Beanland et al 1999, Polit & Hungler 1999). Typically, internal consistency is measured by the calculation of a reliability coefficient (Cronbach 1990, Beanland et al 1999, Polit & Hungler 1999). [6] Reliability depends both on the Standard Error of Measurement (SEM) and on the ability range (standard deviation, SD) of candidates taking an assessment. [7] The present study was undertaken with the objective of measuring the reliability of MCQs.

II. MATERIAL & METHODS
The 1st and 2nd terminal examinations in pharmacology at our institute each consist of an 80-mark theory examination and a 50-mark practical examination. The theory examination includes 20 multiple choice questions of 1 mark each. In the tables, the year indicates when the 1st terminal examination was held; A denotes the 1st terminal examination and B the 2nd terminal examination.

2.1 Data collection
MCQ items were taken from the 10 summative test papers from the years 2008-2012 (each year having two terminal examinations). A total of 200 test items were selected for the item analysis. Each MCQ consisted of a stem and four choices, and the students were to select the one best answer from these four choices. A correct response to an item was awarded 1 mark, an incorrect response received negative 0.25 marks, and an unattempted or blank response was given no mark.

2.2 Data analysis
MCQ scores of students of the different batches of the five years from 2008 to 2012 were included for analysis, and data entry was done using Microsoft Excel 2007. Statistical parameters such as the mean, standard deviation, reliability coefficient, standard error of measurement and confidence interval were calculated for the MCQ tests.
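The marking scheme described in 2.1 can be sketched in a few lines of Python. This is a minimal illustration only: the function name, the use of `None` for a blank response and the answer letters are assumptions made for the example, not details given in the paper.

```python
from typing import Optional, Sequence

CORRECT_MARK = 1.0    # marks awarded for a correct response
WRONG_PENALTY = 0.25  # marks deducted for an incorrect response


def score_paper(responses: Sequence[Optional[str]], key: Sequence[str]) -> float:
    """Score one student's MCQ paper under the negative-marking scheme.

    responses: the option chosen for each item, or None for a blank item
               (representing a blank as None is an assumption of this sketch).
    key:       the correct option for each item.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    total = 0.0
    for given, correct in zip(responses, key):
        if given is None:
            continue  # blank or unattempted item: no mark, no penalty
        total += CORRECT_MARK if given == correct else -WRONG_PENALTY
    return total
```

With a 20-item paper, a hypothetical student who answers 12 items correctly, 4 incorrectly and leaves 4 blank would score 12 - (4 x 0.25) = 11 marks.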
The equation for the reliability coefficient is as follows: [8]

$$\alpha = \frac{n}{n-1} \times \frac{Var_t - \sum Var_i}{Var_t} \qquad (1)$$

where
α (Alpha) = estimated reliability of the full-length test
n = number of items
Var_t = variance of the whole test (standard deviation squared)
ΣVar_i = sum of the variances of all n items

The values of reliability coefficients range from 0 to 1.0. A coefficient of 0 means no reliability and 1.0 means perfect reliability. Since all tests have some error, reliability coefficients never reach 1.0. Generally, if the reliability of a standardized test is above 0.80, it is said to have very good reliability; if it is below 0.50, it would not be considered a very reliable test. [5]

Tests with a reliability coefficient of 0.90 and above were considered to have excellent reliability; those between 0.80 and 0.90 were considered very good; those between 0.70 and 0.80 were considered good; those between 0.60 and 0.70 were considered low and therefore in need of being supplemented by other measures to determine grades; those between 0.50 and 0.60 were considered very low and in need of revision; and those of 0.50 and below were considered to have questionable reliability. [9]

The equation for the standard error of measurement is as follows: [10]

$$SEM = S\sqrt{1 - r} \qquad (2)$$

where
S = the standard deviation for the test
r = the reliability coefficient for the test

III. RESULTS
As shown in Table 1, the mean reliability coefficient was 0.54. Out of ten tests, two tests had low reliability, five tests had very low reliability and three tests had questionable reliability. (Table-1)
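Equation (1) from the methods can be sketched in Python as follows. The paper reports that the calculations were done in Microsoft Excel 2007, so this is only an illustrative equivalent, not the authors' procedure. The function name and the layout of the input (one row per student, one column per item) are assumptions made for the example.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Reliability coefficient (alpha) as in Equation (1).

    item_scores: one row per student, one column per item (assumed layout).
    Population variance is used for both the total and the item variances;
    using sample variance for both would give the same alpha, because the
    correction factor cancels in the ratio.
    """
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != n_items for row in item_scores):
        raise ValueError("every student must have a score for every item")

    # Var_t: variance of the students' total scores on the whole test
    totals = [sum(row) for row in item_scores]
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    # Sum of Var_i: variance of each item's scores, summed over all items
    sum_item_var = sum(
        pvariance([row[i] for row in item_scores]) for i in range(n_items)
    )

    return (n_items / (n_items - 1)) * (var_total - sum_item_var) / var_total
```

When every item ranks the students identically, the item variances account for only a small part of the total variance and alpha approaches 1; when items are unrelated to one another, the total variance is close to the sum of the item variances and alpha approaches 0.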
  • 3. As shown in Table 2, the mean standard deviation of the MCQ tests was 3.52 with a range of 3.05 to 4.01. The mean standard error of measurement was 2.37 with a range of 2.24 to 2.44. (Table-2)

IV. DISCUSSION
The reliability of an examination provides useful information about its performance (and it is self-evident that an examination with a very low reliability is unlikely to be a good or an effective examination, to the point where zero reliability means that the marks from an examination are no more effective than are random numbers at distinguishing between candidates). Having said that, the mere fact that an examination has a high reliability does not ensure that it is necessarily functioning effectively, because the reliability is heavily dependent upon the ability range of the candidates who are taking it. As has already been seen: [7]

1) The very same exam can apparently drop its reliability dramatically if it is retaken but only by those who have already passed it;
2) The reliability can be artificially inflated by encouraging very weak candidates to take it, thereby increasing the SD of the marks;
3) It is almost inevitable where successive examinations are taken, as with the Part 2 Written examination of MRCP(UK) being taken after Part 1, that the SD will necessarily be lower (only able candidates passing Part 1), and that the reliability of a second examination will usually be lower than the first examination;
4) When examinations have very small numbers of candidates, as with the SCEs, there is a greater risk that the reliability will be distorted by an unusually high or low spread of candidate abilities.

Reliability can always be increased by making an assessment progressively longer, thereby increasing the number of examination items, although that is expensive in time, effort and opportunity cost. [7]

Our results showed that two tests had low reliability, which means they need to be supplemented by other measures and probably contain some items that could be improved. Five tests had very low reliability, which means they need revision and need to be supplemented by other measures. Three tests had questionable reliability, which means these tests should not contribute heavily to the course grade and need revision. Cortina considers a reliability coefficient of at least 0.70 to be adequate for classroom assessment. [11] None of the tests fulfilled this criterion, so our MCQ tests have low reliability.

High reliability means that the questions of a test tended to "pull together." Students who answered a given question correctly were more likely to answer other questions correctly. If a parallel test were developed by using similar items, the relative scores of students would show little change. Low reliability means that the questions tended to be unrelated to each other in terms of who answered them correctly. The resulting test scores reflect peculiarities of the items or the testing situation more than students' knowledge of the subject matter. [9]

Another way to express reliability is in terms of the standard error of measurement. This measure provides an estimate of how much an individual’s score would be expected to change on re-testing with the same or an equivalent form of the test, with no change in knowledge and perception. Our results showed that the standard deviation of candidate scores varied widely (3.05-4.01) compared with the variation in the standard error of measurement (2.24-2.44). Based on the assumption that any test score contains an error, the SEM is used to estimate a band or interval within which a person’s true score would fall, that is, the (hypothetical) score the student would receive if there were no error of measurement. [12] The smaller the SEM, the narrower the interval. Narrow intervals are more precise, containing less error, than larger intervals. SEM is inversely related to the reliability coefficient. [13]

For example, in our study the mean observed score in the 2008-A examination was 8.11 and the SEM was 2.33. We can say with 95% confidence that the true score of students of this batch lies in an interval within two SEM of the observed score (between 3.45 and 12.77). An alternative interpretation states that 95 times out of 100 the students’ score on a retest would be between 3.45 and 12.77.

As shown in Table 1, the reliability coefficients of the first terminal examinations of 2009 and 2011 were 0.51 and 0.63 respectively, but the standard error of measurement of both examinations was the same, 2.44. So when judging the reliability of examinations, the reliability coefficient should not be considered alone; the standard error of measurement is also an important statistical parameter. SEM is more appropriate for reliability than the reliability coefficient. [7]
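The arithmetic in the two preceding paragraphs can be checked with a short Python sketch of Equation (2) and the "mean ± 2 SEM" band. The function names are illustrative assumptions; the input figures are the ones reported in Tables 1 and 2.

```python
from math import sqrt
from typing import Tuple


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = S * sqrt(1 - r), as in Equation (2)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def score_band(observed: float, sem: float, k: float = 2.0) -> Tuple[float, float]:
    """Interval of k SEMs either side of an observed score (k = 2 for ~95%)."""
    return (observed - k * sem, observed + k * sem)


# 2008-A: SD 3.30 and reliability 0.50 reproduce the reported SEM of 2.33,
# and mean 8.11 with that SEM reproduces the reported band 3.45-12.77.
sem_2008a = round(standard_error_of_measurement(3.30, 0.50), 2)
low, high = score_band(8.11, sem_2008a)
print(f"2008-A: SEM = {sem_2008a}, band = {low:.2f} to {high:.2f}")

# 2009-A and 2011-A: different reliability coefficients (0.51 and 0.63)
# but different SDs too (3.49 and 4.01), giving the same SEM once rounded.
sem_2009a = round(standard_error_of_measurement(3.49, 0.51), 2)
sem_2011a = round(standard_error_of_measurement(4.01, 0.63), 2)
print(f"2009-A: SEM = {sem_2009a}; 2011-A: SEM = {sem_2011a}")
```

The second comparison illustrates the point made in the text: the 2011-A cohort had a wider spread of scores, which raises the reliability coefficient without making any individual score more precise.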
  • 4. Our results suggest that our MCQ tests are not very reliable and need improvement. Reliability also shows problems when the number of candidates in an examination is low and sampling error affects the range of candidate ability. SEM is not subject to such problems; it is therefore a better measure of the quality of an assessment and is recommended for routine use. [7]

V. CONCLUSION
Reliability of all MCQ tests was low and needs improvement. Standard Error of Measurement is a more appropriate parameter for reliability than Reliability Coefficient.

REFERENCES
[1] T. Ho, W. Yip, and J. Tay, The use of multiple choice questions in medical examination: An evaluation of scoring and analysis of results, Singapore Medical Journal, 22(6), 1981, 361-367.
[2] N. Ananthakrishnan, Item analysis-validation and banking of MCQs, in N. Ananthakrishnan, K. Sethuraman, S. Kumar (Eds.), Medical Education principles and practice, 2 (JIPMER, Pondicherry) 131-137.
[3] N. Mitra, H. Nagaraja, G. Ponndurai, et al, The levels of difficulty and discrimination indices in type A multiple choice questions of pre-clinical semester 1 multidisciplinary summative tests, IeJSME, 3(1), 2009, 2-7.
[4] Understanding item analysis reports, [online] Available at: http://www.washington.edu/oea/service/scanning-scoring/item_analysis.html [accessed December 20, 2012].
[5] Classroom assessment, [online] Available at: http://www.fcit.usf.edu/assessment/basic/basic.html [accessed December 20, 2012].
[6] J. Considine, M. Botti, and S. Thomas, Design, format, validity & reliability of multiple choice question for use in nursing research and education, Collegian, 12(1), 2005, 19-24.
[7] J. Tighe, I. McManus, N. Dewhurst, L. Chis, and J. Mucklow, The standard error of measurement is a more appropriate measure of quality for postgraduate medical assessments than is reliability: an analysis of MRCP (UK) examinations, BMC Medical Education, 10-40.
[8] Introduction to reliability, [online] Available at: http://www.ncsu.deu/jlnietfe/EDP560_notes_files/reliability.pdf [accessed January 10, 2013].
[9] Understanding item analysis reports, [online] Available at: http://www.washington.edu/oea/service/scanningscoring/scanning/itemanalysis [accessed January 10, 2013].
[10] Standard error of measurement, [online] Available at: http://web.sau.edu/WaterStreetMaryA/NEW%20intro%20to%20tests%20&%20measures%20website_files/standard_error_of_measurement.htm [accessed January 10, 2013].
[11] J. Cortina, What is Coefficient Alpha? An Examination of Theory and Applications, Journal of Applied Psychology, 78(1), 1993, 98-104.
[12] Test reliability, [online] Available at: http://www.indians.edu/best/testreliability [accessed January 10, 2013].
[13] L. Harvill, Standard error of measurement, Educational Measurement: Issues & Practice, summer, 33-41.
  • 5. Table-1: Reliability coefficient of ten MCQ tests

Year   Examination   Reliability coefficient
2008   A             0.50
2008   B             0.54
2009   A             0.51
2009   B             0.60
2010   A             0.66
2010   B             0.38
2011   A             0.63
2011   B             0.53
2012   A             0.51
2012   B             0.49
Mean                 0.54

Table-2: Standard deviation, standard error of measurement and confidence interval of ten MCQ tests

Year   Examination   Mean    Standard deviation (SD)   Standard error of measurement (SEM)   Confidence interval at 95% (CI) (Mean ± 2SEM)
2008   A             08.11   3.30                      2.33                                  3.45-12.77
2008   B             09.49   3.57                      2.42                                  4.65-14.33
2009   A             08.74   3.49                      2.44                                  3.86-13.62
2009   B             09.68   3.77                      2.38                                  4.92-14.44
2010   A             10.46   3.98                      2.32                                  5.82-15.10
2010   B             08.68   3.05                      2.40                                  3.88-13.48
2011   A             09.71   4.01                      2.44                                  4.83-14.59
2011   B             08.02   3.40                      2.33                                  3.36-12.68
2012   A             10.28   3.20                      2.24                                  5.80-14.76
2012   B             07.26   3.39                      2.42                                  2.42-12.10
Mean                 09.04   3.52                      2.37
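As a cross-check on Table-1, the reliability bands described in the methods can be applied to the ten coefficients in a short Python sketch. The paper does not state how values that fall exactly on a band boundary are handled; the convention below (0.60 counted as "very low", 0.50 as "questionable") is an assumption inferred from the reported counts of two low, five very low and three questionable tests. The function and variable names are illustrative.

```python
from collections import Counter

# Reliability coefficients from Table-1, keyed by (year, examination).
TABLE_1 = {
    (2008, "A"): 0.50, (2008, "B"): 0.54,
    (2009, "A"): 0.51, (2009, "B"): 0.60,
    (2010, "A"): 0.66, (2010, "B"): 0.38,
    (2011, "A"): 0.63, (2011, "B"): 0.53,
    (2012, "A"): 0.51, (2012, "B"): 0.49,
}


def reliability_band(r: float) -> str:
    """Map a reliability coefficient to the band used in the paper.

    Boundary handling is an assumption of this sketch: apart from
    "0.90 and above", a value on a boundary falls in the lower band.
    """
    if r >= 0.90:
        return "excellent"
    if r > 0.80:
        return "very good"
    if r > 0.70:
        return "good"
    if r > 0.60:
        return "low"
    if r > 0.50:
        return "very low"
    return "questionable"


band_counts = Counter(reliability_band(r) for r in TABLE_1.values())
for band in ("low", "very low", "questionable"):
    print(f"{band}: {band_counts[band]} tests")
```

Under this boundary convention the counts agree with the Results section; a different convention for the values 0.50 and 0.60 would shift one test between adjacent bands.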