  1. The EdResearcher Journal e-Club meeting 23rd June 2021 Topic for discussion Reliability, Validity and Utility in Research Presentation by: Prabha
  2. Contents • What are reliability, validity and utility? • Understanding reliability • Understanding validity • Understanding utility • Summary
  3. What are reliability, validity and utility?
  4. [Illustration: reports delivered on Day 1, Day 2 and Day 3 (Report 1, Report 2, Report 3).] Reliability: the report is the same every time. Validity: the report is the same as what was asked for. The opposite case: different reports every time, or reports that differ from what was asked for.
  5. What do reliability, validity and utility mean? • The same information is obtained every time the same situation arises - RELIABILITY • The information is what is wanted - VALIDITY • It is practically possible to obtain the information - UTILITY • These qualities help us trust a person or a machine • They facilitate outcomes such as making friends, hiring employees, getting service deals for equipment, etc.
  6. Gareis and Grant (2008), in Teacher-made Assessments: How to Connect Curriculum, Instruction, and Student Learning p. 33, taken from is-reliability-and-validity/
  7. What are reliability, validity and utility? • Reliability is related to the consistency (precision) of measurement • Validity is related to success in measuring what is intended • If a tool is valid, it is also reliable, BUT if a tool is reliable, it is not necessarily valid • A tool may be reliable and valid, but it should also be practically possible to use it • When experiments, tests, or measuring procedures are reliable and valid, results from replicated studies can support claims of generalization of findings and contribute to research-based evidence
  8. Understanding ‘reliability’
  9. Sources of ‘error’ in measurement • ‘Error’ in measurement means a variation from the true reading • ‘Errors’ in measurement can arise for different reasons • The sampling of items: type of items, relation to the construct and its aspects, number of items • How the tool is used • How participants respond: guessing, marking answers incorrectly, skipping questions by mistake, misinterpreting test instructions • Types of error: random error, systematic error
  10. Any measurement can have two types of errors (variations in repeated readings) • Random error: caused by chance, e.g., if the responder is distracted. Addressed by checking reliability • Systematic error: caused by a specific reason, e.g., if the responder already has experience in the construct being tested. Addressed by checking validity
  11. How to test reliability? Test the measurement tool for: • Stability: Is the tool giving the same result on repeated measurements? • Internal consistency: Are the items in the tool measuring the same construct? • Equivalence: When two people administer the tool, will the results be the same? If we have two versions of the tool, will they give equivalent results?
  12. Stability • Question asked: Is the instrument or data collection (measurement) tool able to give the same results on repeated administrations? • The method: test-retest reliability • The instrument is administered twice (about 15 days apart), and the correlation coefficient between the two sets of readings is used as the reliability coefficient • Advantage: shows consistency across time • Limitations: can be affected by … • Memory, if the interval between tests is too short • Maturation, if the interval between tests is too long
  13. E.g., test-retest correlation for two sets of scores of many college students on the Rosenberg Self-Esteem Scale: Pearson’s r for these data is +.95
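The test-retest procedure described above can be sketched in a few lines of Python. This is only an illustration: the scores below are hypothetical (they are not the Rosenberg data from the slide), and `pearson_r` is a helper written for this sketch, not part of any library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores of the same six respondents, about 15 days apart.
time_1 = [22, 25, 18, 30, 27, 15]
time_2 = [23, 24, 19, 29, 28, 16]

test_retest_r = pearson_r(time_1, time_2)
print(f"test-retest reliability: r = {test_retest_r:.2f}")
```

A coefficient close to +1 suggests that respondents keep roughly the same rank order and spacing across the two administrations.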
  14. Internal Consistency • Question asked: Are the items in the tool measuring the same construct or concept, or parts of the same concept? • The methods: split-half reliability, Cronbach’s alpha (coefficient alpha) and Kuder-Richardson Formula 20 (KR-20) • In the split-half method, the items are split into two halves. The correlation coefficient between the two sets of readings from the two halves is taken as the split-half reliability coefficient • Advantage: is not affected by memory or maturation effects • Limitation: does not consider fluctuations across time
  15. E.g., split-half correlation between the odd- and even-numbered item scores of many college students on the Rosenberg Self-Esteem Scale: Pearson’s r for these data is +.88. Find the split-half correlations for all possible combinations of halves and take their mean. Conceptually, that mean is Cronbach’s alpha
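As a rough sketch of both ideas, the code below computes an odd/even split-half correlation and Cronbach's alpha for a small, hypothetical data set (5 respondents, 4 items). Note the assumption: alpha is computed here with the commonly used variance-based formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), rather than by literally averaging every possible split half as the slide describes conceptually.

```python
import math
from statistics import variance  # sample variance (n - 1 denominator)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def cronbach_alpha(rows):
    """Cronbach's alpha; each row holds one respondent's item scores."""
    k = len(rows[0])                       # number of items
    items = list(zip(*rows))               # transpose: one tuple per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical responses: 5 respondents x 4 items.
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [1, 2, 1, 2],
    [3, 3, 4, 4],
]

# Split-half: total of odd-numbered items vs total of even-numbered items.
odd_half = [row[0] + row[2] for row in scores]
even_half = [row[1] + row[3] for row in scores]
split_half_r = pearson_r(odd_half, even_half)

alpha = cronbach_alpha(scores)
print(f"split-half r = {split_half_r:.2f}, Cronbach's alpha = {alpha:.2f}")
```

With real data, statistical packages such as SPSS, PSPP or JASP report these coefficients directly; the sketch is only meant to make the arithmetic visible.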
  16. Equivalence • Questions asked: When two people administer the tool, will the results be the same? If we have two versions of the tool, will they give equivalent results? • The methods: inter-rater reliability, alternate-form reliability • The statistics used include: Kendall’s tau, intra-class correlations, Rasch’s item-response model (Bannigan & Watson, 2009; Drost, 2011)
  17. How to make tests more reliable? • Write items clearly • Make test instructions easy to understand • Train raters effectively by making the scoring rules clear • Add more items • Obtain a reliability coefficient of 0.7 to 0.8 • When the situation demands, a reliability coefficient of 0.9 can be sought
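To illustrate one of the statistics named above, here is a minimal sketch of Kendall's tau applied to two raters scoring the same respondents. The ratings are hypothetical, and the function implements the simple tau-a variant (no correction for ties), which is an assumption made for brevity; statistical software usually reports tie-corrected variants.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1    # both raters order this pair the same way
        elif direction < 0:
            discordant += 1    # the raters order this pair in opposite ways
        # Tied pairs (direction == 0) count as neither in tau-a.
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ratings given by two raters to the same five respondents.
rater_a = [1, 2, 3, 4, 5]
rater_b = [2, 1, 3, 5, 4]

tau = kendall_tau_a(rater_a, rater_b)
print(f"inter-rater agreement (Kendall's tau-a) = {tau:.2f}")
```

Values near +1 indicate that the two raters rank respondents in nearly the same order; values near 0 indicate little agreement in ordering.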
  18. Understanding Validity
  19. Most attributes that are measured in social science fields such as education are construct variables. Examples: happiness, intelligence, anxiety, academic achievement, fear, personality, etc. Construct variables are variables or attributes that are abstract in nature. They cannot be universally defined, and hence are not understood in the same way by everyone. Construct variables have to be operationally defined based on the theory about them
  20. Is my tool measuring what I want it to measure? Construct variable: cognitive ability
  21. What are the types of validity? • Content validity at two levels • Face validity • Content validity • Criterion validity – concurrent validity & predictive validity • Construct validity • convergent validity • divergent validity • factorial validity • discriminant validity • Statistical conclusion validity
  22. Content validity at two levels (Bannigan & Watson, 2009) • Face validity: the test tool is checked by subject experts or researchers to see if the items are reasonable and relevant; the focus is to confirm ‘subject’s acceptance of text’; informal • Content validity: critical review by an expert panel to check if the content of the test tool matches all aspects of the construct, for clarity and completeness; comparison with literature; or both of the above; formal: the content validity index (CVI) or content validity ratio (CVR) can be used
  23. Calculation of Content validity index • Calculate the item-wise content validity index (I-CVI) • I-CVI = number of experts who rate the item ‘very relevant’ / total number of experts • This measure is calculated for each item on the tool • Items that have I-CVI > 0.79 are taken as relevant • Items that have I-CVI between 0.70 and 0.79 are taken as needing revision • Items that have I-CVI below 0.70 are eliminated (Rodrigues, Adachi and Beattie, 2017)
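The I-CVI calculation can be sketched as follows. The expert ratings are hypothetical, and the function names are made up for this illustration; the cut-offs follow the usual reading of the rule (above 0.79 relevant, 0.70 to 0.79 needs revision, below 0.70 eliminated).

```python
def item_cvi(ratings, relevant_label="very relevant"):
    """I-CVI: share of experts who rated the item 'very relevant'."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return sum(1 for r in ratings if r == relevant_label) / len(ratings)


def classify_item(i_cvi):
    """Apply the cut-offs: > 0.79 relevant, 0.70-0.79 revise, < 0.70 eliminate."""
    if i_cvi > 0.79:
        return "relevant"
    if i_cvi >= 0.70:
        return "needs revision"
    return "eliminate"


# Hypothetical ratings from a panel of 10 experts for three items.
panel_ratings = {
    "item_A": ["very relevant"] * 9 + ["somewhat relevant"] * 1,
    "item_B": ["very relevant"] * 7 + ["somewhat relevant"] * 3,
    "item_C": ["very relevant"] * 5 + ["not relevant"] * 5,
}

results = {}
for item, ratings in panel_ratings.items():
    cvi = item_cvi(ratings)
    results[item] = (cvi, classify_item(cvi))
    print(f"{item}: I-CVI = {cvi:.2f} -> {results[item][1]}")
```

Keeping the classification in its own function makes it easy to adjust the thresholds if a different source recommends other cut-offs.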
  24. Criterion validity: comparison to established ‘criteria’ Concurrent validity • The test tool is compared to an already established ‘criterion’ by conducting both tests at the same time • Procedure applied to a scale under development • The correlation of each question with the criterion score is used to refine the questionnaire • E.g., a culture-relevant short test for self-esteem is created and compared to an existing self-esteem tool. Predictive validity • The test tool is compared to an already established ‘criterion’ by administering the test tool at one time and the criterion tool at a future time • Procedure used to check whether the test can predict performance in the construct • Correlation between test scores and targeted outcomes • E.g., TET test score used to predict how well teachers perform in their classes
  25. Construct validity: correlation between test tool and construct under investigation • It is an indirect approach • Relevant when scale or test tool has been developed based on the assumption of a particular theory • Starts with defining the topic or construct to be assessed. Here: • Hypotheses about correlations with other instruments • Respondents who would score low or high • Other findings that can be predicted from the scores
  26. Construct validity • Convergent validity: How similar is this scale to other scales measuring the same or related concepts? • Divergent validity: How different is this scale from scales measuring different concepts? • Factorial validity: What are the factors that are exactly covered by the items on the scale? Uses factor analysis. Exploratory factor analysis: many variables; find the variables that would relate to the construct. Confirmatory factor analysis: the variables and their relations are known; use data to test hypotheses about their relations • Discriminant validity: Can the scale discriminate among people with differing values on the construct? Uses discriminant analysis
  27. doi: Factorial Validity
  28. Statistical conclusion validity • Question asked: Is the inference obtained on the relationship tested trustworthy and dependable? • Threats to statistical conclusion validity are: • Low statistical power • Violation of assumptions • Reliability of measures • Reliability of treatment • Random irrelevancies of experimental setting • Random heterogeneity of respondents (Drost, 2011, p. 115)
  29. Understanding Utility or Practicality
  30. Utility or Practicality of a test • Time to administer • Ease of administering • Easy language • Does not cause boredom to respondents which can increase error • Need to test the scale in different settings as part of reliability • The tests should be cost effective Can my test tool or scale be actually used in the field?
  31. Summary • Reliability and validity are important checks to ensure the trustworthiness of research findings • Reliability addresses random errors in measurement • Validity addresses systematic errors in measurement • Utility addresses the practicability of using the measurement tool • Reliability measures are in the form of correlation coefficients • Validity measures are in the form of comments, reviews, and correlation coefficients • Correlation coefficients, factor analysis and discriminant analysis can be obtained using statistical software such as SPSS, PSPP and JASP
  32. References • measurement/ • Drost, E.A. (2011). Validity and Reliability in Social Science Research. Education Research and Perspectives, 38(1), 105-123. • Bannigan, K., & Watson, R. (2009). Reliability and validity in a nutshell. Journal of Clinical Nursing, 18(23), 3237-3243. • Rodrigues, I.B., Adachi, J.D., Beattie, K.A. et al. (2017). Development and validation of a new tool to measure the facilitators, barriers and preferences to exercise in people with osteoporosis. BMC Musculoskeletal Disorders, 18, 540. 017-1914- 5.pdf
  33. Thank you! Questions / comments, please … let’s discuss!