- Mohammad Rabbani
- PhD Scholar (LNIPE Gwalior)
We need to be a little more precise when we define reliability. In psychological
research, the term ‘reliability’ means repeatability or consistency. A measure is
reliable if it gives the same value in repeated testing (assuming that what we are
measuring is not changing).
Thus, reliability refers to the consistency of a measure. A test is reliable if we get the
same result across multiple administrations.
Classically, reliability is the “consistency of scores obtained by the same individuals
when re-examined with the same test on different occasions, or with different sets of
equivalent items, or under other variable examining conditions”.
- Thus, reliability refers to consistency of scores over a period, when all examinees
retain their relative ranks across two separate measurements with the same test, or
consistency of scores when the subjects who score high on one set of items also
score high on an equivalent set of items, and vice versa.
- The consistency of scores obtained in testing the same person over a period is known
as temporal stability, and the correlation coefficient indicating this temporal
stability is known as the coefficient of stability.
- The consistency of scores obtained from two equivalent sets of a single test after a
single administration is referred to as internal consistency. The correlation
coefficient indicating internal consistency is called the coefficient of internal
consistency.
- Test-Retest Method of Reliability
- Internal Consistency Reliability
  - Split-half Test
  - Kuder–Richardson Test
  - Cronbach’s Alpha
- Parallel-Forms Reliability
- This method measures the temporal stability of a psychological instrument.
- In this method, a test is administered to the same sample on two different occasions. This
kind of reliability is used to assess the consistency of a test across time.
- This approach assumes that there will be no substantial change in the construct being
measured between the two occasions.
- By administering the same measure on two different occasions, we get two sets of scores.
- The coefficient of correlation calculated for these two sets of scores is the reliability
coefficient.
- This reliability coefficient computed through the test-retest method is also known as the
temporal stability coefficient. It tells to what extent the respondents retain their
relative positions, as measured by test scores, over a given period. If the
respondents who obtain low (high) scores on the first administration also obtain low (high)
scores on the second administration, the coefficient of correlation between the two
sets of scores (test and retest) will be high.
- The higher the correlation, the more reliable the test.
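As a sketch of the computation (the scores below are invented for illustration), the test-retest coefficient is simply the Pearson correlation between the scores from the two administrations:

```python
# Hypothetical scores of 8 subjects on the same test, taken on two occasions.
scores_t1 = [12, 15, 11, 18, 14, 16, 10, 17]
scores_t2 = [13, 14, 12, 19, 13, 17, 11, 16]

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# The test-retest (temporal stability) coefficient:
print(f"r = {pearson_r(scores_t1, scores_t2):.3f}")
```

A coefficient near 1 means the subjects kept their relative positions across the two occasions.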
Internal consistency refers to the consistency of the results obtained within a test, such that
the various items measuring the same construct deliver consistent scores.
For example, if a knowledge test consists of 20 questions and if you divide these
questions into two different groups randomly or by using any other method, then the
knowledge test is said to have internal consistency if the correlation of these two sets of
score is high and positive, because in that case both the group of items measure the
same construct, i.e. knowledge test.
Depending on the way the items in the test are scored and grouped, three
different tests of internal consistency are used.
These tests are
- Split-half reliability,
- Kuder–Richardson test, and
- Cronbach’s alpha (α).
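As an illustrative sketch (the response matrix below is invented), split-half reliability correlates the two half-test scores and then steps the result up to full-test length with the Spearman–Brown formula:

```python
# Hypothetical responses: rows = 5 subjects, columns = 6 dichotomous items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
]

def pearson_r(x, y):
    """Pearson correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(data):
    # Odd-even split: each subject's score on the odd- and even-numbered items.
    odd = [sum(row[0::2]) for row in data]
    even = [sum(row[1::2]) for row in data]
    r_half = pearson_r(odd, even)
    # Spearman-Brown correction projects the half-test correlation
    # onto the full test length: r_full = 2r / (1 + r).
    return 2 * r_half / (1 + r_half)

print(f"split-half reliability = {split_half_reliability(responses):.3f}")
```

The odd-even split is one common choice; a random split of items works the same way.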
The Kuder–Richardson test is used to assess the reliability of a test.
This test can be used only in situations where the response to each test item is
dichotomous in nature.
The reliability of a test ensures that the test is consistent.
There are two versions of the Kuder–Richardson test: KR-20 and KR-21.
The choice between them depends upon whether the test items vary in difficulty or
not: KR-20 allows items of varying difficulty, whereas KR-21 assumes all items are
of equal difficulty. The two coefficients are computed as

KR-20: r = [n / (n − 1)] × [1 − (Σpq) / V]
KR-21: r = [n / (n − 1)] × [1 − M(n − M) / (nV)]

where n is the number of test items,
p is the proportion of the subjects passing an item,
q = 1 − p is the proportion of the subjects failing that item, and
M and V are the mean and variance of the total test scores, respectively.
The value of the Kuder–Richardson coefficient ranges from 0 to 1, where 0 indicates
no reliability and 1 perfect reliability.
As the coefficient increases, the reliability of the test also increases. A test can
be considered reliable if the value of the coefficient is 0.5 or more.
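Using the definitions above, KR-20 can be sketched as follows (the data matrix is hypothetical, and V is taken as the population variance of the total scores):

```python
# Hypothetical dichotomous responses: rows = 5 subjects, columns = 6 items.
data = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
]

def kr20(data):
    n = len(data[0])            # n: number of test items
    subjects = len(data)
    # p: proportion of subjects passing each item; q = 1 - p
    p = [sum(row[j] for row in data) / subjects for j in range(n)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    totals = [sum(row) for row in data]
    m = sum(totals) / subjects  # M: mean total score
    v = sum((t - m) ** 2 for t in totals) / subjects  # V: variance of totals
    return (n / (n - 1)) * (1 - sum_pq / v)

print(f"KR-20 = {kr20(data):.3f}")
```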
Cronbach’s alpha is also a measure of the reliability of a test, used when the responses
to test items are measured on a Likert scale.
It measures the internal consistency of the test.
Cronbach’s alpha is the most widely used measure in questionnaire construction.
The reliability of a questionnaire indicates how well it measures what it is supposed to
measure. For instance, if a questionnaire is developed for testing the creativity of
employees, then high reliability ensures that it actually measures creativity,
while low reliability indicates that it measures something else.
A Likert-type scale is a scale with three to nine response categories, in which the
intervals between responses are assumed to be equal, for example:
Strongly agree | Agree | Undecided | Disagree | Strongly disagree
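A minimal sketch of the computation, using invented 5-point Likert responses; alpha is k/(k − 1) × (1 − Σ item variances / total-score variance), where k is the number of items:

```python
# Hypothetical Likert responses (1-5): rows = 5 subjects, columns = 4 items.
data = [
    [5, 4, 5, 4],
    [4, 4, 4, 3],
    [3, 2, 3, 2],
    [2, 3, 2, 3],
    [1, 1, 2, 1],
]

def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(data):
    k = len(data[0])  # number of items
    item_vars = [variance([row[j] for row in data]) for j in range(k)]
    total_var = variance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

print(f"Cronbach's alpha = {cronbach_alpha(data):.3f}")
```

When all items track the same construct closely, the total-score variance dominates the summed item variances and alpha approaches 1.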
In this method, two different tests created from the same content are
compared.
This is done by preparing a large pool of test items that measure the same construct
and then randomly dividing the items into two different tests.
The two tests so developed are administered to the same subjects at the same time.
The correlation between the two sets of scores so obtained is calculated, which gives
an estimate of reliability.
The drawback of this method is that one needs to generate a large number of items that
reflect the same construct. This is often not an easy task.
Furthermore, this approach assumes that the randomly divided halves are parallel,
or equivalent. Even by chance, this will sometimes not be the case. If the two forms
of the test are not equivalent, the reliability coefficient may not be a true indicator.
Validity refers to the appropriateness of the test. It ensures that a test measures
the phenomenon for which it has been developed.
Validity refers to what the test is meant to measure and how well it measures it.
Validity of a test can be ensured by showing that it measures what it claims to
measure.
Thus, validity can be considered as the measure of correctness and usefulness of the
test.
The test manual should establish the validity of the test and must state the population for
which it is valid.
Unlike reliability, there is no single measure of validity or validity coefficient.
- Face Validity
- Construct Validity
- Content Validity
- Criterion-Related Validity
  - Predictive Validity
  - Concurrent Validity
This is a very basic form of validity, in which an investigator tries to determine whether a measure
appears to measure what it is supposed to measure.
This is not a scientific form of validity, as it is not assessed objectively.
Here, we make a subjective judgement about the contents to ensure that the test includes all the
dimensions for which it has been developed.
This is the weakest way to demonstrate the validity of a psychological instrument.
Since face validity depends upon subjective judgement, it is weak evidence of validity. Just
because the evidence is weak, however, does not mean it is wrong. Face validity can be improved
by taking the opinions of different experts on the issue. If they all agree that the instrument
contains all the items that actually measure the construct in question, the face validity is said to
have improved.
A test is said to have construct validity if it shows an association between the test
scores and the theoretical trait it is predicted to measure.
A construct represents a collection of behaviours that are associated in a
meaningful way to represent a phenomenon.
For example, frustration is a construct that represents a personality trait reflected
in behaviours such as losing one’s temper, getting irritated or becoming overexcited.
The existence of a construct is made visible by observing the collection of related
indicators. Any one sign may be associated with several constructs: a person with
difficulty in understanding may have a low IQ but may not be frustrated.
To establish construct validity, it is important to show that one’s data support the
theoretical structure.
- To ensure content validity, a test must include the entire range of possible items the
test should cover.
- The opinions of all stakeholders are taken to ensure that all relevant parameters measuring
the construct have been included in the test.
- For greater accuracy, two experts may be asked to rate the test separately. Items
that are rated as strongly relevant by both experts should be included in the final test.
- Content validity is also a subjective measure but, unlike face validity, here we ask
whether the content of a measure covers the full domain of the content.
- If a test is to be constructed for measuring lifestyle, one needs to first decide what
constitutes the relevant domain of content for lifestyle.
- For that, the views of all stakeholders, such as doctors, fitness and lifestyle experts,
psychologists and nutritionists, may be taken to ensure that the test covers all the relevant
dimensions that assess the lifestyle of an individual.
- Content validity is a subjective form of measurement, as it relies on people’s perceptions
to measure constructs that would otherwise be difficult to measure.
- In criterion-related validity, the accuracy of a test is ensured by comparing it with a test
that has already been proved to be valid.
- A correlation coefficient is calculated between the scores obtained on the test
in question and on the already-validated test.
- This correlation coefficient indicates the validity of the test. A high correlation
will exist if the test includes all those items which measure the criterion well.
Criterion-related validity is also known as instrumental validity.
- There are two types of criterion-related validity: predictive and concurrent.
- Predictive validity refers to the extent to which a test predicts what it is theoretically
able to predict.
- In other words, it reflects the extent to which a test predicts the expected outcomes.
- In computing predictive validity, the test is administered to the subjects, and after
some time (days, months or years), criterion measures are obtained for the same
subjects.
- If the two sets of results agree closely, we can conclude that the test has predictive validity.
- Examples of such tests are entrance examinations and personality tests for entry
into the armed forces.
- If the entrance examination results and the annual results are highly correlated, this
establishes the predictive validity of the entrance test.
Concurrent validity refers to the ability of a test to distinguish between groups that it
theoretically should be able to distinguish.
The test is said to be valid if its results match the results of an already valid test
measuring the same criterion.
In establishing concurrent validity, the test is administered to a set of subjects to
measure a construct. These subjects are then tested again with a test already known
to be valid for measuring the same construct.
The correlation obtained between these two sets of scores serves as an indicator of
concurrent validity. The higher the correlation, the better the concurrent validity of
the test.
While testing subjects, we depend on them to answer all questions honestly and
truthfully. It is assumed that the subjects can answer the questions being asked in
the test. For this reason, a pilot study is always a better proposition.
Validity, Reliability, Objective & Their Types