Criterion versus Convergent Validity

I have some difficulty distinguishing criterion and convergent validity. According to Wikipedia, the difference is that criterion validity focuses on predicting scores on another test, whilst convergent validity is concerned with finding correlations between tests or variables that are theoretically assumed to be related (https://en.wikipedia.org/wiki/Concurrent_validity; to avoid any confusion: I know that concurrent validity is a subtype of criterion validity). However, I found a multiple choice question online that confused me:

Scores on a final exam are related to student grade point average, the amount of time spent studying for the exam, and class attendance. What type of validity is demonstrated in this case? A) convergent validity B) discriminant validity C) criterion validity

Apparently, the right answer is A), but I think you could still argue for C) in the following manner: scores on the final exam are the outcome measure, and GPA, amount of time spent studying, and class attendance predict it.


Convergent, incremental, and criterion-related validity of multi-informant assessments of adolescents' fears of negative and positive evaluation

Andres De Los Reyes, Department of Psychology, University of Maryland at College Park, College Park, MD, USA.





Concurrent Validity vs Convergent Validity

I'm studying for a psych research final right now, and I've been trying to understand and differentiate Concurrent Validity and Convergent Validity! Is there anyone who could explain the difference and provide some examples? Sorry if this isn't the right place to post this, but I can't think of any other subreddit.

Could you possibly explain here what's making it difficult for you to differentiate the two from one another?

Like I'm reading up definitions online and they're saying this for concurrent validity: "Concurrent validity is the extent to which performance on a measurement is related to current performance on a similar, previously established measurement. For example, let’s say employers have designed their own scale to assess the leadership skills of their employees. They give employees the scale to fill out on the same day as a similar, but longer and decades-old leadership test. If the scores for each employee on both tests are closely related, then the new scale is said to have high concurrent validity."

And this for convergent validity:

"Convergent validity refers to the degree to which scores on a test correlate with (or are related to) scores on other tests that are designed to assess the same construct. For example, if scores on a specific form a aggressiveness test are similar to people's scores on other aggressiveness tests, then convergent validity is high (there is a positively correlation between the scores from similar tests of aggressiveness)."

Like to me the examples given just seem so similar, I have no clue how to differentiate them because in both examples they're comparing scores between two tests.


Criterion and convergent validity for 4 measures of pain in a pediatric sickle cell disease population

Objective: To evaluate the psychometric properties of 4 measures of acute pain in youth with sickle cell disease (SCD) during a medical procedure.

Methods: Heart rate, child self-report, parent proxy-report, and observable pain behaviors were examined in 48 youth with SCD ages 2 to 17 years. Criterion validity for acute pain was assessed by responsiveness to a standardized painful stimulus (venipuncture) in a prospective pre-post design. Convergent validity was evaluated through the correlation across measures in reactivity to the stimulus.

Results: Child self-reported pain, parent proxy-report, and behavioral distress scores increased in response to venipuncture (criterion and convergent validity). In contrast, heart rate did not reliably change in response to venipuncture. Extent of change in response to venipuncture showed moderate intercorrelation across child and parent pain ratings, and behavioral distress. Preprocedure pain ratings correlated with pain experienced during the procedure. An item analysis of observable pain behaviors suggested differences in the presentation of pain in SCD compared with previous pediatric research.

Conclusions: Criterion and convergent validity were demonstrated for child-report, parent-report, and observable pain behaviors. These measures seem to tap into distinct, yet overlapping aspects of the pain experience. Assessment of acute procedural pain responses in SCD requires evaluation of preprocedural pain due to the frequent presence of low-level, baseline pain.


Standardized Assessment

Types of Validity

Concurrent validity. Concurrent validity indicates the amount of agreement between two different assessments. Generally, one assessment is new while the other is well established and has already been proven to be valid. An author of a new assessment would want her assessment to have high concurrent validity with well-respected, well-established assessments.

Construct validity. Construct validity is the ability of the assessment to represent or evaluate the construct in question. This form of validity answers the question "Does this assessment tool truly measure what it says it measures?"

Content validity. Content validity refers to the ability of the instrument to measure or evaluate all aspects of the construct it intends to assess. For example, an assessment that examines only socialization or communication would have low content validity for the assessment of autism because the measure ignores repetitive behavior, one of the core domains of impairment found in autism.

Predictive validity. Predictive validity indicates the ability of a measure to predict performance on some outcome variable. For instance, an autism screening measure used for infants and toddlers (e.g., BISCUIT – Part 1; Matson, Boisjoli, & Wilkins, 2007) should have good predictive validity for future autism diagnoses based on full evaluations. That is, infants and toddlers deemed "at risk" should be more likely to receive an autism diagnosis later when they receive a full diagnostic evaluation.


Test validity

Reliability and validity

An early definition of test validity identified it with the degree of correlation between the test and a criterion. Under this definition, one can show that reliability of the test and the criterion places an upper limit on the possible correlation between them (the so-called validity coefficient). Intuitively, this reflects the fact that reliability involves freedom from random error and random errors do not correlate with one another. Thus, the less random error in the variables, the higher the possible correlation between them. Under these definitions, a test cannot have high validity unless it also has high reliability. However, the concept of validity has expanded substantially beyond this early definition and the classical relationship between reliability and validity need not hold for alternative conceptions of reliability and validity. Within classical test theory, predictive or concurrent validity (correlation between the predictor and the predicted) cannot exceed the square root of the correlation between two versions of the same measure — that is, reliability limits validity.
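Stated as a formula (a standard classical-test-theory bound, written out here for concreteness rather than taken verbatim from the text above), with $r_{XX'}$ and $r_{YY'}$ denoting the reliabilities of the test and the criterion:

$$r_{XY} \;\le\; \sqrt{r_{XX'}\, r_{YY'}} \;\le\; \sqrt{r_{XX'}}$$

The right-hand bound is the "square root of the correlation between two versions of the same measure" referred to above; the observed validity coefficient cannot be high unless both measures are reasonably reliable.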

Types

Test validity can be assessed in a number of ways, and thorough test validation typically involves more than one line of evidence in support of the validity of an assessment method (e.g., a structured interview or a personality survey). The current Standards for Educational and Psychological Testing follow Samuel Messick in treating the various types of validity evidence as contributions to a single summative validity judgment. These include construct-related evidence, content-related evidence, and criterion-related evidence, which breaks down into two subtypes (concurrent and predictive) according to the timing of the data collection.

Construct validity evidence involves the empirical and theoretical support for the interpretation of the construct. Such lines of evidence include statistical analyses of the internal structure of the test including the relationships between responses to different test items. They also include relationships between the test and measures of other constructs. As currently understood, construct validity is not distinct from the support for the substantive theory of the construct that the test is designed to measure. As such, experiments designed to reveal aspects of the causal role of the construct also contribute to construct validity evidence.

Content validity evidence involves the degree to which the content of the test matches a content domain associated with the construct. For example, a test of the ability to add two-digit numbers should cover the full range of combinations of digits. A test with only one-digit numbers, or only even numbers, would not have good coverage of the content domain. Content-related evidence typically involves subject matter experts (SMEs) evaluating test items against the test specifications.

Criterion validity evidence involves the correlation between the test and a criterion variable (or variables) taken as representative of the construct. For example, employee selection tests are often validated against measures of job performance. Measures of risk of recidivism among those convicted of a crime can be validated against measures of recidivism. If the test data and criterion data are collected at the same time, this is referred to as concurrent validity evidence. If the test data is collected first in order to predict criterion data collected at a later point in time, then this is referred to as predictive validity evidence.

Construct validity

Construct validity refers to the totality of evidence about whether a particular operationalization of a construct adequately represents what is intended by the theoretical account of the construct being measured. (In other words, an element is shown to be valid by relating it to another element that is itself taken to be valid.)

There are two approaches to construct validity, sometimes referred to as 'convergent validity' and 'divergent validity' (or discriminant validity).

Convergent validity

Convergent validity refers to the degree to which a measure is correlated with other measures that it is theoretically predicted to correlate with.

Discriminant validity

Discriminant validity describes the degree to which the operationalization does not correlate with other operationalizations that it theoretically should not be correlated with.
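To illustrate the convergent/discriminant pattern, here is a minimal sketch in Python using simulated, hypothetical data (the scale names and numbers are invented for illustration, not taken from any study discussed here):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: a latent "anxiety" trait drives two anxiety scales,
# while a sociability scale is generated independently of anxiety.
anxiety_latent = rng.normal(size=n)
anxiety_scale_a = anxiety_latent + rng.normal(scale=0.5, size=n)
anxiety_scale_b = anxiety_latent + rng.normal(scale=0.5, size=n)
sociability = rng.normal(size=n)

# Convergent validity: two measures of the same construct should correlate highly.
r_convergent = np.corrcoef(anxiety_scale_a, anxiety_scale_b)[0, 1]

# Discriminant validity: a measure should correlate weakly with measures of
# constructs it is theoretically unrelated to.
r_discriminant = np.corrcoef(anxiety_scale_a, sociability)[0, 1]

print(f"convergent r   = {r_convergent:.2f}")   # high (around .80 with this setup)
print(f"discriminant r = {r_discriminant:.2f}")  # near zero
```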

Content validity

Content validity is a non-statistical type of validity that involves "the systematic examination of the test content to determine whether it covers a representative sample of the behaviour domain to be measured" (Anastasi & Urbina, 1997, p. 114).

A test has content validity built into it by careful selection of which items to include (Anastasi & Urbina, 1997). Items are chosen so that they comply with the test specification, which is drawn up through a thorough examination of the subject domain. Foxcraft et al. (2004, p. 49) note that the content validity of a test can be improved by using a panel of experts to review the test specifications and the selection of items. The experts will be able to review the items and comment on whether they cover a representative sample of the behaviour domain.

Representation validity

Representation validity is also known as translation validity.

Face validity

Face validity is an estimate of whether a test appears to measure a certain criterion; it does not guarantee that the test actually measures phenomena in that domain. Indeed, when a test is subject to faking (malingering), low face validity might make the test more valid.

Face validity is very closely related to content validity. While content validity depends on a theoretical basis for judging whether a test assesses all domains of a certain criterion (e.g., does assessing addition skills yield a good measure of mathematical skills? To answer this, you have to know what different kinds of arithmetic skills mathematical skills include), face validity relates only to whether a test appears to be a good measure. This judgment is made on the "face" of the test, so it can also be made by a layperson.

Criterion validity

Criterion-related validity reflects the success of measures used for prediction or estimation. There are two types of criterion-related validity: concurrent and predictive validity. A good example of criterion-related validity is the validation of employee selection tests: in this case, scores on a test or battery of tests are correlated with employee performance scores.

Concurrent validity

Concurrent validity refers to the degree to which the operationalization correlates with other measures of the same construct that are measured at the same time. Going back to the selection test example, this would mean that the tests are administered to current employees and then correlated with their scores on performance reviews.

Predictive validity

Predictive validity refers to the degree to which the operationalization can predict (or correlate with) other measures of the same construct that are measured at some time in the future. Again, with the selection test example, this would mean that the tests are administered to applicants, all applicants are hired, their performance is reviewed at a later time, and then the scores on the two measures are correlated.


Reliability

Reliability refers to the consistency of a measure. Psychologists consider three types of consistency: over time (test-retest reliability), across items (internal consistency), and across different researchers (inter-rater reliability).

Test-Retest Reliability

When researchers measure a construct that they assume to be consistent across time, then the scores they obtain should also be consistent across time. Test-retest reliability is the extent to which this is actually the case. For example, intelligence is generally thought to be consistent across time. A person who is highly intelligent today will be highly intelligent next week. This means that any good measure of intelligence should produce roughly the same scores for this individual next week as it does today. Clearly, a measure that produces highly inconsistent scores over time cannot be a very good measure of a construct that is supposed to be consistent.

Assessing test-retest reliability requires using the measure on a group of people at one time, using it again on the same group of people at a later time, and then looking at test-retest correlation between the two sets of scores. This is typically done by graphing the data in a scatterplot and computing Pearson’s r. Figure 5.2 shows the correlation between two sets of scores of several university students on the Rosenberg Self-Esteem Scale, administered two times, a week apart. Pearson’s r for these data is +.95. In general, a test-retest correlation of +.80 or greater is considered to indicate good reliability.

Figure 5.2 Test-Retest Correlation Between Two Sets of Scores of Several College Students on the Rosenberg Self-Esteem Scale, Given Two Times a Week Apart

Again, high test-retest correlations make sense when the construct being measured is assumed to be consistent over time, which is the case for intelligence, self-esteem, and the Big Five personality dimensions. But other constructs are not assumed to be stable over time. The very nature of mood, for example, is that it changes. So a measure of mood that produced a low test-retest correlation over a period of a month would not be a cause for concern.
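For readers who want to see the computation itself, here is a minimal sketch in Python (the scores are invented for illustration; any correlation routine would do):

```python
from scipy.stats import pearsonr

# Hypothetical self-esteem scores for the same eight students,
# measured once and then again one week later.
time_1 = [22, 25, 18, 30, 27, 20, 24, 29]
time_2 = [21, 26, 19, 29, 26, 21, 25, 28]

# Test-retest reliability is the Pearson correlation between the two administrations.
r, p = pearsonr(time_1, time_2)
print(f"test-retest r = {r:.2f}")  # +.80 or greater is conventionally considered good
```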

Internal Consistency

A second kind of reliability is internal consistency, which is the consistency of people’s responses across the items on a multiple-item measure. In general, all the items on such measures are supposed to reflect the same underlying construct, so people’s scores on those items should be correlated with each other. On the Rosenberg Self-Esteem Scale, people who agree that they are a person of worth should tend to agree that they have a number of good qualities. If people’s responses to the different items are not correlated with each other, then it would no longer make sense to claim that they are all measuring the same underlying construct. This is as true for behavioural and physiological measures as for self-report measures. For example, people might make a series of bets in a simulated game of roulette as a measure of their level of risk seeking. This measure would be internally consistent to the extent that individual participants’ bets were consistently high or low across trials.

Like test-retest reliability, internal consistency can only be assessed by collecting and analyzing data. One approach is to look at a split-half correlation. This involves splitting the items into two sets, such as the first and second halves of the items or the even- and odd-numbered items. Then a score is computed for each set of items, and the relationship between the two sets of scores is examined. For example, Figure 5.3 shows the split-half correlation between several university students’ scores on the even-numbered items and their scores on the odd-numbered items of the Rosenberg Self-Esteem Scale. Pearson’s r for these data is +.88. A split-half correlation of +.80 or greater is generally considered good internal consistency.

Figure 5.3 Split-Half Correlation Between Several College Students’ Scores on the Even-Numbered Items and Their Scores on the Odd-Numbered Items of the Rosenberg Self-Esteem Scale
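For concreteness, a minimal sketch of the odd/even split-half computation described above, using simulated item responses rather than real data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical item responses: 100 respondents by 10 items of one scale,
# all items driven by the same underlying trait plus noise.
trait = rng.normal(size=(100, 1))
items = trait + rng.normal(scale=1.0, size=(100, 10))

# Score the odd-numbered and even-numbered items separately.
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)

# The split-half correlation is the Pearson correlation between the two half scores.
r_split = np.corrcoef(odd_half, even_half)[0, 1]
print(f"split-half r = {r_split:.2f}")
```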

Perhaps the most common measure of internal consistency used by researchers in psychology is a statistic called Cronbach’s α (the Greek letter alpha). Conceptually, α is the mean of all possible split-half correlations for a set of items. For example, there are 252 ways to split a set of 10 items into two sets of five. Cronbach’s α would be the mean of the 252 split-half correlations. Note that this is not how α is actually computed, but it is a correct way of interpreting the meaning of this statistic. Again, a value of +.80 or greater is generally taken to indicate good internal consistency.
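In practice, alpha is computed from item and total-score variances rather than by averaging every split-half correlation. A minimal sketch follows; the cronbach_alpha helper is our own illustration, and `items` refers to the simulated respondents-by-items matrix from the previous sketch:

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                              # number of items
    item_variances = item_scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)  # variance of the total score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Using the simulated `items` matrix from the split-half sketch:
# print(f"alpha = {cronbach_alpha(items):.2f}")
```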

Interrater Reliability

Many behavioural measures involve significant judgment on the part of an observer or a rater. Inter-rater reliability is the extent to which different observers are consistent in their judgments. For example, if you were interested in measuring university students’ social skills, you could make video recordings of them as they interacted with another student whom they are meeting for the first time. Then you could have two or more observers watch the videos and rate each student’s level of social skills. To the extent that each participant does in fact have some level of social skills that can be detected by an attentive observer, different observers’ ratings should be highly correlated with each other. Inter-rater reliability would also have been measured in Bandura’s Bobo doll study. In this case, the observers’ ratings of how many acts of aggression a particular child committed while playing with the Bobo doll should have been highly positively correlated. Interrater reliability is often assessed using Cronbach’s α when the judgments are quantitative or an analogous statistic called Cohen’s κ (the Greek letter kappa) when they are categorical.
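A minimal sketch of an inter-rater check for categorical judgments, using scikit-learn's implementation of Cohen's kappa and invented ratings:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical codings by two observers of the same ten children
# (1 = aggressive act observed, 0 = no aggressive act).
rater_1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
rater_2 = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

# Cohen's kappa corrects the raw agreement rate for agreement expected by chance.
kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's kappa = {kappa:.2f}")
```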


What is Concurrent Validity? (Definition & Examples)

In statistics, we’re often interested in understanding if the value of some explanatory variable can predict the value of some response variable. This response variable is sometimes called a criterion variable.

For example, we might want to know how well some college entrance exam is able to predict the first semester grade point average of students.

The entrance exam would be the explanatory variable and the criterion variable would be the first semester GPA.

We want to know if it’s valid to use this particular explanatory variable as a way to predict the criterion variable. If it is valid, then we say that criterion validity exists.

There are two types of criterion validity:

1. Predictive Validity – This tells us if it’s valid to use the value of one variable to predict the value of some other variable in the future.

2. Concurrent Validity – This tells us if it’s valid to use the value of one variable to predict the value of some other variable measured concurrently (i.e. at the same time).

For example, a company might administer some type of test to see if the scores on the test are correlated with current employee productivity levels.

The benefit of this approach is that we don’t have to wait until some point in the future to take a measurement on the criterion variable we’re interested in.

Note that we usually measure both types of validity using the Pearson correlation coefficient, which takes on a value between -1 and 1, where:

  • -1 indicates a perfectly negative linear correlation between two variables
  • 0 indicates no linear correlation between two variables
  • 1 indicates a perfectly positive linear correlation between two variables

The further away the correlation coefficient is from zero, the stronger the association between the two variables.
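As a quick sketch of how this looks in code (the scores below are made up for illustration):

```python
from scipy.stats import pearsonr

# Hypothetical concurrent-validity check: scores on a new test versus a
# criterion measured at the same time (current productivity ratings).
test_scores = [71, 85, 64, 90, 78, 82, 69, 75, 88, 60]
productivity = [3.1, 4.2, 2.8, 4.6, 3.7, 4.0, 3.0, 3.4, 4.4, 2.6]

# The Pearson correlation coefficient quantifies the linear association.
r, p = pearsonr(test_scores, productivity)
print(f"r = {r:.2f}, p = {p:.3f}")  # a strong positive r supports concurrent validity
```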

Examples of Concurrent Validity

The following examples illustrate more scenarios in which we can use concurrent validity to determine whether or not some explanatory variable can be used to predict the value of some criterion variable.

Example 1: A Test of Knowledge

A researcher creates a new test that is designed to assess the knowledge of college students in the subject of biology.

The researcher gives out the test to all biology majors at a certain university and compares the scores of his test with their current GPA.

If there is a high correlation between the grades on his test and the current GPA of the students, we can say that concurrent validity exists.

Example 2: A Test of Endurance

A track coach creates a new endurance challenge that is designed to assess the endurance levels of his athletes. He lets each of his athletes perform the challenge and compares their scores to their current performance levels.

If there is a high correlation between the endurance challenge and the current performance levels, then he can say that concurrent validity exists.

In other words, it would be valid to use the endurance challenge to assess the performance levels of the athletes.

Example 3: A Test of Leadership

A business executive creates a new test to assess the leadership ability of employees at a company. She gives out the test to each employee at a company and compares their scores to current peer-assessed levels of leadership.

If there is a high correlation between the test and current peer-assessed levels of leadership, then she can say that concurrent validity exists.

In other words, it would be valid to use the test to assess leadership levels of the various employees at the company.


Further evaluation of the construct, convergent and criterion validity of the Gambling Urge Scale with university-student gamblers

Background: Research has documented the prevalence of problem gambling among university students, and craving is one factor that may provoke and maintain episodes of gambling.

Objectives: We designed this study to assess elements of construct, convergent and criterion validity of the Gambling Urge Scale (GUS) when administered to regularly gambling university students.

Methods: Students (n = 250) recruited from three universities during the spring semester, 2012, were randomly assigned to one of four conditions to test the impact of cue exposure to one of two types of stimuli (gambling versus non-gambling activity), and two types of presentation format (photographic versus imagery scripts), on current craving to gamble.

Results: Self-reported craving increased significantly following exposure to gambling cues, but not following exposure to engaging non-gambling cues, regardless of the format by which cues were presented. Among those exposed to gambling cues, GUS craving scores were significantly correlated with all three subscales of another measure of craving to gamble, as well as with gambling-related problems, passionate attachment to gambling, distorted gambling beliefs, and gambling refusal self-efficacy.

Conclusions: These findings provide further support for the construct, convergent and criterion validity of the GUS as a measure of subjective craving in university student gamblers.