Research on “stereotype threat” (Aronson, Quinn, & Spencer, 1998; Steele, 1997; Steele & Aronson, 1995) suggests that the social stigma of intellectual inferiority borne by certain cultural minorities can undermine the standardized test performance and school outcomes of members of these groups. This research tested two assumptions about the necessary conditions for stereotype threat to impair intellectual test performance. First, we tested the hypothesis that to interfere with performance, stereotype threat requires neither a history of stigmatization nor internalized feelings of intellectual inferiority, but can arise and become disruptive as a result of situational pressures alone. Two experiments tested this notion with participants for whom no stereotype of low ability exists in the domain we tested and who, in fact, were selected for high ability in that domain (math-proficient white males). In Study 1 we induced stereotype threat by invoking a comparison with a minority group stereotyped to excel at math (Asians). As predicted, these stereotype-threatened white males performed worse on a difficult math test than a nonstereotype-threatened control group. Study 2 replicated this effect and further tested the assumption that stereotype threat is in part mediated by domain identification and, therefore, most likely to undermine the performances of individuals who are highly identified with the domain being tested. The results are discussed in terms of their implications for the development of stereotype threat theory as well as for standardized testing.
Cohen, G. L., Garcia, J., Apfel, N., Master, A., Reducing the racial achievement gap: a social-psychological intervention
Dar-Nimrod, I. and Heine, S. J., Exposure to Scientific Theories Affects Women's Math Performance
C. Goldin and C. Rouse, Orchestrating Impartiality: The Impact of “Blind” Auditions on Female Musicians
Gonzales, P.M., Blanton, H., and Williams, K.J., The effects of stereotype threat and double-minority status on the test performance of Latino women
Hunter, J. E., Schmidt, F. L., Racial and gender bias in ability and achievement tests: Resolving the apparent paradox
Keller, J., Blatant Stereotype threat and women’s math performance: Self-handicapping as a strategic means to cope with obtrusive negative performance expectations
Keller, J. & Dauenheimer, D., Stereotype threat in the classroom: Dejection mediates the disrupting threat effect on women’s math performance
Langenfeld, T. E., Test Fairness: Internal and External Investigations of Gender Bias in Mathematics Testing
Martens, A., Johns, M., Greenberg, J., and Schimel, J., Combating stereotype threat: The effect of self-affirmation on women’s intellectual performance
McCornack, R.L., McLeod, M.M., Gender Bias in the Prediction of College Course Performance
Shepard, L., Camilli, G., & Averill, M., Comparison of procedures for detecting test-item bias with both internal and external ability criteria
Spencer, S.J., Steele, C.M., & Quinn, D.M., Stereotype threat and women’s math performance
Steele, C.M., & Aronson, J., Stereotype threat and the intellectual test performance of African-Americans
Walton, G. M. and Cohen, G. L., Stereotype Lift
Walton, G. M. and Spencer, S. J., Latent Ability: Grades and Test Scores Systematically Underestimate the Intellectual Ability of Negatively Stereotyped Students
Two randomized field experiments tested a social-psychological intervention
designed to improve minority student performance and increase our
understanding of how psychological threat mediates performance in
chronically evaluative real-world environments. We expected that the risk of
confirming a negative stereotype aimed at one's group could undermine
academic performance in minority students by elevating their level of
psychological threat. We tested whether such psychological threat could be
lessened by having students reaffirm their sense of personal adequacy or
"self-integrity." The intervention, a brief in-class writing assignment,
significantly improved the grades of African American students and reduced
the racial achievement gap by 40%. These results suggest that the racial
achievement gap, a major social concern in the United States, could be
ameliorated by the use of timely and targeted social-psychological
interventions.
Stereotype threat occurs when stereotyped groups perform worse as
their group membership is highlighted. We investigated whether stereotype
threat is affected by accounts for the origins of stereotypes. In two
studies, women who read of genetic causes of sex differences performed worse
on math tests than those who read of experiential causes.
Claudia Goldin and Cecilia Rouse analyzed the results of musicians'
auditions for positions at U.S. symphony orchestras between 1970 and 1996.
Based on these data, Goldin and Rouse estimate that the use of a screen in
auditions increased by 50 percent the probability that a woman would be
advanced from certain preliminary rounds and increased several-fold the
likelihood that a woman would be selected in the final round. Similar
empirical studies have examined the acceptance of articles by academic
journals and the evaluation of fellowship applications.
This study investigated the interactive influences of diagnosticity
instructions, gender, and ethnicity as they related to task performance. In
a laboratory experiment of 120 male and female, Latino and White college
students, both a gender-based and an ethnicity-based stereotype-threat
effect were found to influence performance on a test of mathematical and
spatial ability. Closer inspection revealed that the gender effect was
qualified by ethnicity, whereas the ethnicity effect was not qualified by
gender. This suggests that the ethnicity of Latino women sensitized them to
negative stereotypes about their gender, leading to a performance decrement
in a context in which stereotype threat was activated. In contrast, it
appeared that the gender of Latino women did not sensitize them to negative
stereotypes about their ethnicity, because both male and female Latinos
evidenced ethnicity-based stereotype threat. These findings have
implications for the interplay between multiple group identities as they
relate to concern for confirming negative stereotypes.
The study of potential racial and gender bias in individual test items is a major research area today. The fact that research has established that total scores on ability and achievement tests are predictively unbiased raises the question of whether there is in fact any real bias at the item level. No theoretical rationale for expecting such bias has been advanced. It appears that findings of item bias (differential item functioning; DIF) can be explained by three factors: failure to control for measurement error in ability estimates, violations of the unidimensionality assumption required by DIF detection methods, and reliance on significance testing (causing tiny artifactual DIF effects to be statistically significant because sample sizes are very large). After taking into account these artifacts, there appears to be no evidence that items on currently used tests function differently in different racial and gender groups.
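The third factor named above (tiny effects becoming statistically significant in very large samples) can be illustrated with a small sketch. It uses a plain two-proportion z-test, which is an assumption made here for illustration and is not a DIF procedure, since DIF methods also condition on ability. The item proportions (70% vs. 69% correct) and the sample sizes are hypothetical.

```python
import math

def two_proportion_z_test(p1, p2, n_per_group):
    """Two-sided z-test for a difference between two proportions (equal group sizes)."""
    pooled = (p1 + p2) / 2.0
    standard_error = math.sqrt(2.0 * pooled * (1.0 - pooled) / n_per_group)
    z = (p1 - p2) / standard_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical item: 70% correct in one group, 69% in the other.
# The observed difference is identical; only the sample size changes.
for n in (500, 500_000):
    z, p = two_proportion_z_test(0.70, 0.69, n)
    print(f"n per group = {n:>7}: z = {z:.2f}, p = {p:.3g}")
```

With the same one-point difference, the test is far from significant at 500 examinees per group and highly significant at 500,000, which is why an effect-size criterion is often reported alongside the significance test.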
Examined the impact of increased salience of negative stereotypic
expectations on math performance among high school students. Results
indicated that female students in the condition of heightened salience of
negative stereotypic expectations underperformed in comparison to their
control group counterparts. Blatant stereotype threat increased
self-handicapping tendencies in women, which led to significantly impaired
math performance.
Research on stereotype threat, which is defined as the risk of confirming a
negative stereotypic expectation about one’s group, has demonstrated that
the applicability of negative stereotypes disrupts the performance of
stigmatized social groups. While it has been shown that a reduction of
stereotype threat leads to improved performance by members of stigmatized
groups, there is a lack of clear-cut findings about the mediating processes.
The aim of the present study is to provide a better understanding of the
mechanisms that stereotype threat triggers in women working on mathematical
problems. In addition, the study set out to test stereotype threat theory in
a natural environment: high school classrooms. The experiment involved the
manipulation of the gender fairness of a math test. The results indicate
that the stereotype threat effect exists in this everyday setting. Moreover,
it appears that dejection emotions mediate the effect of threat manipulation.
What two major approaches have been used to study gender bias in test scores? How do statistical DIF detection methods differ? How does DIF screening of items affect mean score differences?
The present studies were designed to investigate the effects of
self-affirmation on the performance of women under stereotype threat. In
Study 1, women performed worse on a difficult math test when it was
described as diagnostic of math intelligence (stereotype threat condition)
than in a non-diagnostic control condition. However, when women under
stereotype threat affirmed a valued attribute, they performed at levels
comparable to men and to women in the no-threat control condition. In Study
2, men and women worked on a spatial rotation test and were told that women
were stereotyped as inferior on such tasks. Approximately half the women and
men self-affirmed before beginning the test. Self-affirmation improved the
performance of women under threat, but did not affect men’s performance.
Is the relationship of college grades to the traditional predictors of
aptitude test scores and high school grades different for men and women?
The usual gender bias of underpredicting the grade point averages of women
may result from gender-related course selection effects. This study
controlled course selection effects by predicting single course grades
rather than a composite grade from several courses. In most of the large
introductory courses studied, no gender bias was found that would hold up
on cross-validation in a subsequent semester. Usually, it was
counterproductive to adjust grade predictions according to gender. Grade
point average was predicted more accurately than single course grades.
Test bias is conceptualized as differential validity. Statistical
techniques for detecting biased items work by identifying items that may
be measuring different things for different groups; they identify deviant
or anomalous items in the context of other items. The conceptual basis and
technical soundness were reviewed for the following item bias methods:
transformed item difficulties, item discriminations, one- and
three-parameter item characteristic curve methods, and chi-square methods.
Sixteen bias indices representing these approaches were computed for
black-white and Chicano-white comparisons on both the verbal and nonverbal
Lorge-Thorndike Intelligence Tests. In addition, bias indices were
recomputed for the Lorge-Thorndike tests using an external criterion.
Convergent validity among bias methods was examined in correlation
matrices, by factor analysis of the method correlations, and by ratios of
agreements in the items found to be "most biased" by each method. Although
evidence of convergent validity was found, there will still be important
practical differences in the items identified as biased by different
methods. The signed full chi-square procedure may be an acceptable
substitute for the theoretically preferred but more costly three-parameter
signed indices. The external criterion results also reflect on the
validity of the methods; arguments were advanced, however, as to why
internal bias methods should not be thought of as proxies for a predictive
validity model of unbiasedness.
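As a rough illustration of the chi-square family of item bias methods mentioned above, the sketch below splits examinees into total-score intervals, compares the two groups' correct and incorrect counts on one item within each interval, and sums the per-interval chi-squares. It is a simplified sketch under stated assumptions, not the exact procedure evaluated in the study: the function names, the interval layout, and all counts are hypothetical, and the signed sum simply gives each interval's chi-square the sign of the group difference.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # a degenerate interval carries no information
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

def item_chi_square(strata):
    """Sum per-interval chi-squares for a single item.

    strata: one tuple per total-score interval, holding the counts
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    Returns (unsigned_sum, signed_sum). A positive signed contribution means
    the reference group had the higher proportion correct in that interval.
    """
    unsigned = 0.0
    signed = 0.0
    for ref_c, ref_i, foc_c, foc_i in strata:
        chi2 = chi_square_2x2(ref_c, ref_i, foc_c, foc_i)
        unsigned += chi2
        # ref_c*foc_i - ref_i*foc_c > 0 exactly when the reference group's
        # proportion correct exceeds the focal group's.
        direction = 1.0 if ref_c * foc_i - ref_i * foc_c >= 0 else -1.0
        signed += direction * chi2
    return unsigned, signed

# Hypothetical counts for three score intervals (low, middle, high).
equal_item = [(30, 70, 15, 35), (60, 40, 30, 20), (90, 10, 45, 5)]
shifted_item = [(30, 70, 10, 40), (60, 40, 20, 30), (90, 10, 35, 15)]

print(item_chi_square(equal_item))
print(item_chi_square(shifted_item))
```

An item on which both groups succeed at the same rate within every interval contributes nothing, while an item that is consistently harder for one group at equal total scores accumulates a large sum. When the direction of the difference reverses across intervals, the signed sum partly cancels and the unsigned sum does not.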
Shepard, L., Camilli, G., & Williams, D., Accounting for statistical artifacts in item bias research
Theoretically preferred IRT bias detection procedures were applied to both
a mathematics achievement and vocabulary test. The data were from black
and white seniors on the High School and Beyond data files. To account for
statistical artifacts, each analysis was repeated on randomly equivalent
samples of blacks and whites (n's = 1,500). Furthermore, to establish a
baseline for judging bias indices that might be attributable only to
sampling fluctuations, bias analyses were conducted comparing randomly
selected groups of whites. To assess the effect of mean group differences
on the appearance of bias, pseudo-ethnic groups were created, that is,
samples of whites were selected to simulate the average black-white
difference. The validity and sensitivity of the IRT bias indices were
supported by several findings. A relatively large number of items (10 of
29) on the math test were found to be consistently biased; they were
replicated in parallel analyses. The bias indices were substantially
smaller in white-white analyses. Furthermore, the indices (with the
possible exception of χ²) did not find bias in the pseudo-ethnic
comparison. The pattern of between-study correlations showed high
consistency for parallel ethnic analyses where bias was plausibly present.
Also, the indices met the discriminant validity test: the correlations were
low between conditions where bias should not be present. For the math
test, where a substantial number of items appeared biased, the results
were interpretable. Verbal math problems were systematically more
difficult for blacks. Overall, the sums-of-squares statistics (weighted by
the inverse of the variance errors) were judged to be the best indices for
quantifying ICC differences between groups. Not only were these statistics
the most consistent in detecting bias in the ethnic comparisons, but they
also intercorrelated the least in situations of no bias.
Recent studies have documented that performance in a domain is hindered when
individuals feel that a sociocultural group to which they belong is
negatively stereotyped in that domain. We report that implicit activation of
a social identity can facilitate as well as impede performance on a
quantitative task. When a particular social identity was made salient at an
implicit level, performance was altered in the direction predicted by the
stereotype associated with the identity. Common cultural stereotypes hold
that Asians have superior quantitative skills compared with other ethnic
groups and that women have inferior quantitative skills compared with men.
We found that Asian-American women performed better on a mathematics test
when their ethnic identity was activated, but worse when their gender
identity was activated, compared with a control group who had neither
identity activated. Cross-cultural investigation indicated that it was the
stereotype, and not the identity per se, that influenced performance.
When women perform math, unlike men, they risk being judged by the negative
stereotype that women have weaker math ability. We call this predicament
stereotype threat and hypothesize that the apprehension it causes may
disrupt women's math performance. In Study 1 we demonstrated that the
pattern observed in the literature that women underperform on difficult (but
not easy) math tests was observed among a highly selected sample of men and
women. In Study 2 we demonstrated that this difference in performance could
be eliminated when we lowered stereotype threat by describing the test as
not producing gender differences. However, when the test was described as
producing gender differences and stereotype threat was high, women performed
substantially worse than equally qualified men did. A third experiment
replicated this finding with a less highly selected population and explored
the mediation of the effect. The implication that stereotype threat may
underlie gender differences in advanced math performance, even those that
have been attributed to genetically rooted sex differences, is discussed.
Stereotype threat is being at risk of confirming, as self-characteristic, a
negative stereotype about one's group. Studies 1 and 2 varied the stereotype
vulnerability of Black participants taking a difficult verbal test by
varying whether or not their performance was ostensibly diagnostic of
ability, and thus, whether or not they were at risk of fulfilling the racial
stereotype about their intellectual ability. Reflecting the pressure of this
vulnerability, Blacks underperformed in relation to Whites in the
ability-diagnostic condition but not in the nondiagnostic condition (with
Scholastic Aptitude Tests controlled). Study 3 validated that
ability-diagnosticity cognitively activated the racial stereotype in these
participants and motivated them not to conform to it, or to be judged by it.
Study 4 showed that mere salience of the stereotype could impair Blacks'
performance even when the test was not ability diagnostic. The role of
stereotype vulnerability in the standardized test performance of
ability-stigmatized groups is discussed.
When a negative stereotype impugns the ability or worth of an outgroup,
people may experience stereotype lift - a performance boost that occurs
when downward comparisons are made with a denigrated outgroup. In a
meta-analytic review, members of non-stereotyped groups were found to
perform better when a negative stereotype about an outgroup was linked
to an intellectual test than when it was not (d = .24, p < .0001). Notably,
people appear to link negative stereotypes to evaluative tests more or
less automatically. Simply presenting a test as diagnostic of ability
was thus sufficient to induce stereotype lift. Only when negative
stereotypes were explicitly invalidated or rendered irrelevant to the
test did the lift effect disappear.
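The effect size d reported above is a standardized mean difference: a value of .24 means the two conditions differed by roughly a quarter of a pooled standard deviation. The sketch below computes Cohen's d for a single two-group comparison. The scores are hypothetical, and the sketch does not reproduce the meta-analytic pooling across studies used in the review.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference (Cohen's d) using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)

# Hypothetical test scores for two conditions (not data from the review).
stereotype_linked = [12, 14, 16]
stereotype_not_linked = [10, 12, 14]

print(cohens_d(stereotype_linked, stereotype_not_linked))
```

In this toy example the means differ by 2 points and the pooled standard deviation is 2, so d is 1.0, a far larger effect than the .24 found in the meta-analysis.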
Past research has assumed that group differences
in academic performance entirely reflect genuine
differences in ability. In contrast, extending research on
stereotype threat, we suggest that standard measures of
academic performance are biased against non-Asian ethnic
minorities and against women in quantitative fields.
This bias results not from the content of performance
measures, but from the context in which they are assessed -
from psychological threats in common academic
environments, which depress the performances of people
targeted by negative intellectual stereotypes. Like the time
of a track star running into a stiff headwind, such performances
underestimate the true ability of stereotyped
students. Two meta-analyses, combining data from 18,976
students in five countries, tested this latent-ability hypothesis.
Both meta-analyses found that, under conditions
that reduce psychological threat, stereotyped students
performed better than nonstereotyped students at the same
level of past performance. Walton & Spencer discuss implications for the
interpretation of and remedies for achievement gaps.