Educational Achievement Systems, Seattle

*The following article is a summary of a chapter in Adams, G.,
& Engelmann, S. (1996). Research on Direct Instruction. Ordering
information follows the article.*

**Project Participants and Models**

The Follow Through project was the largest, most expensive
educational experiment ever conducted. This federal program was
originally designed to be a service-oriented project similar to Head
Start. However, because of funding cutbacks the emphasis was shifted
from service to program evaluation. Over 75,000 low-income children
in 170 communities were involved in this massive project designed to
evaluate different approaches to educating economically disadvantaged
students from kindergarten through grade 3. State, school, and
national officials nominated school districts that had high numbers
of economically disadvantaged students. Parent representatives of
these school districts chose to participate after hearing
presentations from the 20 different program designers (sponsors).
Each participating district implemented the selected sponsor's
approach in one or more schools. For participating, each district
received $750 per student beyond the normal level of funding.

Each sponsor was required to:

·"provide the community with a well-defined, theoretically consistent and coherent approach that could be adapted to local conditions;

·provide the continuous technical assistance, training, and guidance necessary for local implementation of the approach;

·exercise a 'quality control' function by consistently monitoring the progress of program implementation;

·serve as an agent for change as well as a source of program consistency by assisting the community in retaining a consistent focus on the objectives and requirements of the approach rather than responding in an ad hoc manner to the daily pressures of project operations;

·ensure implementation of a total program, rather than a small fragment, such as reading, with a resulting possibility for a major impact on the child's life, and

·provide a foundation for comprehending and describing results of evaluation efforts" (Stebbins, St. Pierre & Proper, 1977, p. 5).

The orientation of the sponsors varied from the loosely-structured
open classroom approach to the highly-structured behavior analysis
approach. Nine of the original sponsors qualified for inclusion in
the evaluation. To be included, a sponsor had to have more than three
active sites that could be compared to control sites in the same
communities.

Abt Associates used the system developed by White to classify the
approaches of the different models. The first dimension was the
theoretical orientation of the models:

·The behavioristic approach is based on the belief that all behaviors are learned. The reason that disadvantaged children are behind is that no one has taught them the necessary social and academic skills. The training is based on selecting the behavioral objectives that are needed. Then teachers reinforce the steps in the behavioral objectives. The general label for this group became the Basic Skills Models.

·The cognitive development approach is based on the sequence of normal cognitive growth. The reason that disadvantaged children are behind is that they have insufficient normal cognitive experiences. The orientation of this approach is to provide interactions between children and teachers. During these interactions, children learn how to solve problems and learn verbal skills based on a self-directed process. Emphasis is placed on the teacher providing age-appropriate cognitive materials and experiences. The general label for this group was the Cognitive/Conceptual Skills Models.

·The psychodynamic approach is based on the assumption that socioemotional development (the development of the "whole child") is essential to educational improvement. Emphasis is placed on trying to improve children's self-esteem and peer interactions. The goal for the teacher is to provide an environment in which children can move toward the goal of self-actualization through making their own free choices. It is assumed, moreover, that children know what is best for their personal growth. The general label for this group was the Affective Skills Models.

**Basic Skills Models**

Direct Instruction Model (University of Oregon): Developed by
Siegfried Engelmann and Wes Becker, this model used the DISTAR
(an acronym for Direct Instruction System for Teaching And
Remediation) reading, arithmetic, and language programs. The model
assumes that the teacher is responsible for what the children
learn.

Behavior Analysis Model (University of Kansas): Developed by Donald Bushell, this model used a behavioral (reinforcement) approach for teaching reading, arithmetic, handwriting, and spelling. Social praise and tokens were given to the children for correct responses, and the tokens were traded for desired activities. Teachers used programmed reading programs in which the task was presented in small steps. The instructional program was not specified by the model. Two sites used the DISTAR materials. Many used Sullivan Programmed Phonics. Students were monitored and corrective procedures were implemented to ensure student progress.

Language Development (Bilingual) Model (Southwest Educational Development Laboratory): This curriculum-based model used an eclectic approach based on language development. When appropriate, material was presented first in Spanish and then in English.

**Cognitive/Conceptual Skills Models**

Cognitively-Oriented Curriculum (High Scope Foundation): This popular program was directed by David Weikart and was based on Piaget's belief that there are underlying cognitive processes. Children were encouraged to schedule their own activities and then follow their schedules. The teacher modeled language through the use of labeling and explaining causal relationships. Also, the teacher fostered a positive self-concept through the way the students were given choices.

Florida Parent Education Model (University of Florida): Based on the work of Ira Gordon, this program taught parents of disadvantaged children to teach their children. At the same time, students were taught in the classroom using a Piagetian approach. Parent trainers coordinated the teaching. Emphasis included not only language instruction, but also affective, motor, and cognitive skill instruction.

Tucson Early Education Model (University of Arizona): Developed by Marie Hughes, TEEM used a language-experience approach (much like the whole language approach) that attempted to elaborate the child's present experience and interests. The model was based on the assumption that children have different learning styles, so child-directed choices are important. The teacher assists by helping children compare, recall, and locate relationships.

**Affective Skills Models**

Bank Street College Model (Bank Street College of Education): This model used the traditional middle-class nursery school approach that was adopted by Head Start. Through the use of learning centers, children had many options, such as counting blocks and quiet areas for reading. The teacher is responsible for program implementation by taking advantage of learning situations. The classroom is structured to increase learning opportunities.

Open Education Model (Education Development Center): Derived from the British Infant School model, this model focused on building the children's responsibility for their own learning. Reading and writing were not taught directly, but through stimulating a desire to communicate.

Responsive Education Model (Far West Laboratory): Developed by Glen Nimnicht, this is an eclectic model drawing on the work of O.K. Moore, Maria Montessori, and Martin Deutsch. The model used learning centers and the child's interests to determine when and where the child is stationed. The development of self-esteem is considered essential to the acquisition of academic skills.

Each model had 4 to 8 sites with children who started school in kindergarten, and some models also had sites with children who started in first grade. Each Follow Through (FT) school district identified a non-Follow Through (NFT) comparison school for each Follow Through site. The comparison school acted as a control group. Unfortunately, the NFT sites that were selected tended to have children who were less economically disadvantaged than the Follow Through sites. Because of this problem, Abt Associates used analysis of covariance to adjust for initial differences.

A total of 9,255 FT and 6,485 NFT children were in the final analysis group. Students in each school district site were tested at entry and then each spring until the third grade. The DI Model group included low-income students in 20 communities. These communities varied widely: rural and urban; blacks, whites, Mexican Americans, Spanish Americans, Native Americans, and a diverse mixture of other ethnic groups.

The Stanford Research Institute was initially awarded a contract for data collection and Abt Associates received a contract for data analysis. The Office of Education determined the final design of the project with consultation from the Huron Institute. Because the sponsors had different approaches, the data collection was comprehensive. Assessment information was collected in the areas of basic skills (academic), cognitive, and affective behavior. Selecting appropriate assessment instruments was an arduous task given the time constraints: the evaluators had to choose the most reliable and valid tests that could be administered in the least amount of time.

The following tests were used to assess basic skills, cognitive, and affective achievement: the Metropolitan Achievement Test (MAT), the Wide Range Achievement Test (WRAT), the Raven's Coloured Progressive Matrices, the Intellectual Achievement Responsibility Scale (IARS+ and IARS-), and the Coopersmith Self-Esteem Inventory. The MAT is a respected achievement test that assesses Basic Skills and Cognitive-Conceptual Skills. The Basic Skills scales of the MAT included Listening for Sound (sound-symbol relationships), Word Knowledge (vocabulary words), Word Analysis (word identification), Mathematics Computation (math calculations), Spelling, and Language (punctuation, capitalization, and word usage). The WRAT measured number recognition, spelling, word reading, and oral and written math problems.

The Cognitive Skills scales of the MAT included Reading (comprehension of written passages), Mathematics Concepts (knowledge of math principles and relationships), and Mathematical Problem Solving (the use of reasoning with numbers). Also, the Raven's Coloured Progressive Matrices was used. The Raven's test, however, did not prove to discriminate between models or show change in scores over time.

Affective skills were assessed using two instruments. The IARS was designed to assess whether children attribute their successes (+) or failures (-) to themselves or to external forces. The Coopersmith Self-Esteem Inventory is designed to assess how children feel about themselves, the way they think other people feel about them, and their feelings about school.

**Comparisons Across Follow Through Sponsors**

Students started in either kindergarten or first grade and were
retested yearly through the end of third grade. While critics have
complained about test selection and have usually suggested more
testing, the assessment effort of this study was well beyond any
other educational study conducted before, or since.

**Significant Outcomes Comparison**

Abt Associates analyzed the data by comparing each Follow Through
model's scores to both the local comparison group and the national
pooled comparison group (created by combining the comparison groups
from all nine Follow Through models). Local comparison scores and
national pooled comparison scores were used as covariates to analyze
each variable. A plus (+) was given if (a) the Follow Through (FT)
group exceeded the Non-Follow Through (NFT) group by one-fourth
standard deviation (.25 effect size) and (b) the difference was
statistically significant. A minus (-) was given if the NFT score
exceeded the FT score by one-fourth standard deviation (.25 effect
size) and was statistically significant. If the results did not reach
either the plus or the minus criterion, the difference was null and
left blank.

The following index is based on a comparison of each model's site
with the local and pooled national comparison groups. If either the
pooled or local comparison were plus (+), the effect is recorded as a
plus. If either or both was a minus (-), the effect is recorded as a
minus. Then the plus and minus values are averaged across the outcome
variables and multiplied by 100, so the possible range of scores is
from -100 to 100. If the Follow Through model group scored
consistently higher than the comparison group on a variable, then the
index would be a positive number. If the comparison group scored
higher, the index would be negative. If there was no difference
between the two groups, the score would be zero (0.00).
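The scoring-and-index procedure described above can be sketched in code. This is an illustrative reconstruction, not the Abt Associates program; the function names and thresholds reflect only what the text states, and the minus-takes-precedence rule follows the wording "if either or both was a minus."

```python
# Hedged sketch of the Abt outcome index described above (illustrative only).

def score_variable(local_effect, pooled_effect, local_sig, pooled_sig):
    """Return +1, -1, or 0 for one outcome variable.

    An effect counts only if it exceeds .25 standard deviations
    (in either direction) AND is statistically significant.
    """
    def rating(effect, significant):
        if significant and effect >= 0.25:
            return 1
        if significant and effect <= -0.25:
            return -1
        return 0

    local = rating(local_effect, local_sig)
    pooled = rating(pooled_effect, pooled_sig)
    if local == -1 or pooled == -1:   # any minus makes the variable a minus
        return -1
    if local == 1 or pooled == 1:     # otherwise any plus makes it a plus
        return 1
    return 0

def outcome_index(ratings):
    """Average the per-variable ratings and scale to the -100..100 range."""
    return 100 * sum(ratings) / len(ratings)
```

A model whose FT groups reliably beat the comparisons on every variable would score 100; one that reliably lost on every variable would score -100.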

Figure 1 shows the results of this analysis. As the number of
negative scores indicates, the local or national pooled comparison
groups scored higher than most Follow Through models.

Only the Direct Instruction model had positive scores on all three
types of outcomes (Basic Skills, Cognitive, and Affective). Overall,
the Direct Instruction model was highest on all three types of
measures.

Figure 1: Significant Outcomes Comparison Across Follow Through
Models

The results were very different from expectations suggested by the
model orientations. Of the three orientations, the Basic Skills
models (Direct Instruction, Behavior Analysis, and Southwest Lab) had
the best basic skills, cognitive skills, and affective skills scores.
The Cognitive models (Parent Education, TEEM, and Cognitively-Oriented
Curriculum) ranked second in cognitive skills scores; however, their
average rank of 5.0 is far below the average rank of 2.8 for the
Basic Skills models. The Affective models had the worst affective
ranks (6.7, compared to 2.7 for the Basic Skills models).

Figure 1 provides more details on the models' rankings. The DI model
had, by far, the highest basic skills scores while the other two
Basic Skills models had more modest results (the Behavior Analysis
model had a slight positive score and the Southwest Labs model score
was 0.0).

Figure 1 also shows that none of the Cognitive Models had positive
cognitive scores. In fact, the Direct Instruction Model was the only
model of the nine that had a positive cognitive score (and the
results were extremely positive: over 35 on the index). In contrast, students in
two of the three cognitively-oriented models [TEEM and Cognitive
Curriculum (High Scope)] had the lowest cognitive scores.

Critics have often complained that the DI model was a pressure cooker
environment that would negatively impact students' social growth and
self-esteem. As the Abt Associates' authors note:

Critics of the model have predicted that the emphasis of the model on
tightly controlled instruction might discourage children from freely
expressing themselves and thus inhibit the development of self-esteem
and other affective skills. (Stebbins, St. Pierre & Proper, 1977, p. 8)

Because of this expectation, the affective scores are of interest.
Three of the five lowest scoring models on the affective domain were
models that targeted improving affective behavior; none of the
affective models had positive affective scores. In contrast, all
Basic Skills models had positive affective scores with the Direct
Instruction model achieving the highest scores. The theory that an
emphasis on basic skills instruction would have a negative impact on
affective behavior is not supported by the data. Instead, it appears
that the models that focused on an affective education not only had a
negative impact on their students' basic skills and cognitive skills,
but also on their affective skills.

**Fine Tuning the Results**

Bereiter and Kurland reanalyzed the Abt Associates data using more stringent criteria:

·Using the site means as the dependent variable.

·Using these site scores as covariates: socio-economic status and ethnic and linguistic difference from the mainstream.

·Using only models that had data from 6 or more sites.

Each model had the possibility of 77 statistically significant
differences (7 other models times 11 MAT subscale scores). Fifty of
the 77 (65%) possible differences for the DI group were statistically
significant based on Newman-Keuls tests (p = .05). In contrast, the
Behavior Analysis group showed only 18 of 77 (23%) significant
differences.

None of the other six models showed any statistically significant
differences on any of the 11 MAT subscales (0 of 396 possible
combinations). This means, for example, that none of the 11 MAT Bank
Street scores differed significantly from any of the Responsive
Education, TEEM, Cognitive Curriculum, Parent Education, or Open
Education mean scores.

Another way of showing the difference between models was through the
use of effect size comparisons. Figure 2 shows a different display of
the information provided by Bereiter and Kurland (also Figure 2 in
the Bereiter & Kurland review). In Figure 2, the effect size of
the DI model is compared to the average effect size for the other
Follow Through models. The differences are dramatic, even though the
DI data include the Grand Rapids site that did not truly implement
the DI model. The differences would be greater if only DI sites with
implementation fidelity were included.

Figure 2: Effect Size Comparison
(DI to Other Models)

To provide a clearer picture of the differences, Figures 3-4 display
the Bereiter-Kurland findings according to domain. First, Figure 3
shows a comparison of effects for the Basic Skills scores between the
DI group and the average effect size of the other Follow Through
groups. Remember that an effect size of .25 is thought to be
educationally significant. Differences in some MAT Basic Skills
subscale scores are over 3.0 (Total Language and Language B). The average difference
in Basic Skills scores between Direct Instruction and the other
models was 1.8.

Figure 3: Bereiter Analysis of
Basic Skills Abt Data*

Figure 4 shows the differences in the cognitive scores between the DI
models and the average Follow Through model. Effect sizes are above
1.0 for all but one difference.

Figure 4: Bereiter Analysis of
Cognitive Ability Abt Data

Overall, the Bereiter-Kurland reanalysis provides even stronger
support for the effectiveness of Direct Instruction. As the authors
noted, only the DI and Behavior Analysis models had positive results
and the DI model results were vastly superior.

**Changing the Abt Report Criteria**

Becker and Carnine (1981) had two other complaints about the Abt
Associates report, which resulted in the report underrepresenting the
superiority of the DI model. First, because of the problem of
mismatches between comparison groups that initially had higher entry
scores than the Follow Through model groups, Abt Associates deleted
these data from subsequent analyses. Unfortunately for the DI model,
sometimes the scores for the comparison groups were significantly
higher at entry, but by the end of third grade the DI group scored
significantly higher than the comparison groups. Abt Associates
decided to delete these groups because of the initial entry
differences. Also, data were excluded if there were significant
differences between the two groups in preschool experience per site,
even though preschool experience (e.g., Head Start) had only a very
low correlation with later achievement (-0.09). (This variable was
not used in the previously cited Bereiter-Kurland study.) Overall,
approximately one-third of the data was excluded from most Follow
Through models because of these decision rules.

Figures 5-7 show the differences in results based on these analyses.
When data were kept for sites where there were initial performance
differences, the highest scoring model (DI) scored even higher
whereas the lower scoring models (Cognitive Curriculum and Open
Education) scored even lower. The scores for the other models stayed
roughly the same.

Figure 5: Index for Significant Outcomes for Cognitive Measures

Figure 6: Index for Significant
Outcomes for Basic Skills Measures

Figure 7: Index for Significant
Outcomes for Affective Measures

Figure 8: Percentile scores across nine Follow Through models

Becker and Carnine also reanalyzed the Abt Associates results without
the Grand Rapids site, which had stopped using Direct Instruction
when there was a change in program director. Even though this problem
was well documented, Abt Associates had included the Grand Rapids
site in the DI data. Figures 6-8 show that the already high scores
for the DI group became even higher when the Grand Rapids data were
removed.

**Norm-Referenced Comparisons**

Another way of looking at the Abt Associates data is to compare
median grade-equivalent scores on the norm-referenced Metropolitan
Achievement Test that was used to evaluate academic progress. Unlike
the previous analysis that compared model data to local and pooled
national sites, the following norm-referenced comparisons show each
model's MAT scores based on the MAT norms. Figure 8 shows the results
across four academic subjects. The comparisons are made to a baseline
rate of the 20th percentile which was the average expectation of
disadvantaged children without special help. The figure displays the
results in one-fourth standard deviation intervals.

Clearly, children in the DI model showed consistently higher scores than children in the other models. Moreover, in all four academic subjects, students in the Southwest Lab and Open Education models scored below expected levels of achievement based on norms of performance in traditional schools.

Only three of 32 possible reading scores of the other eight models were above the 30th percentile. The DI students scored 7 percentile points higher than the second place group (Behavior Analysis) and over 20 percentile points higher than the Cognitive Curriculum (High Scope), Open Education, and Southwest Lab Models.

Except for children in the DI model, the math results are consistently dismal. The only other model above the 20th percentile was the Behavior Analysis model. DI students scored 20 percentile points ahead of the second place group (Behavior Analysis) and 37 percentile points higher than the last place group (Cognitive Curriculum/High Scope).

In spelling, the DI model and the Behavior Analysis model were within the normal range. DI students scored 2 percentile points above the second place group (Behavior Analysis), 19 percentile points above the third place group, and 33 percentile points above the last place group (Open Education).

Like the previous academic subjects, the DI model was clearly superior in language. DI students scored 29 percentile points above the second place group (Behavior Analysis) and 38 percentile points above the last place group (Cognitive Curriculum/High Scope).

For many people the use of normed scores is more familiar than the use of the index described in the previous section. No matter which analysis is used, children who were in the DI model made the most gains when compared to the other eight models. With the possible exception of the Behavior Analysis model, all other models seem to have had little positive effect on the academic progress of their children.

The increased amounts of money, people, materials, health and dental care, and hot lunches did not cause gains in achievement. Becker (1978) observed that most Follow Through classrooms had two aides and an additional $350 per student, but most models did not show significant achievement gains.

Popular educational theories of Piaget and others suggest that children should interact with their environment in a self-directed manner. The teacher is supposed to be a facilitator and to provide a responsive environment. In contrast, the successful DI model used thoroughly field-tested curricula that teachers should follow for maximum success. The Follow Through models that were based on a self-directed learner model approach were at the bottom of academic and affective achievement. The cognitively-oriented approaches produced students who were relatively poor in higher-order thinking skills and models that emphasized improving students' self-esteem produced students with the poorest self-esteem.

**Subsequent Analyses**

**Variability Across DI Sites**

The Abt Associates findings were criticized by some (House, Glass,
McLean, & Walker, 1978) and then defended by others (Anderson,
St. Pierre, Proper, & Stebbins, 1978; Becker, 1977; Bereiter &
Kurland, 1981-82; Wisler, Burns, & Iwamoto, 1978). One Abt
Associates finding was that there was more variability within a model
than between models.

This statement is consistent with the often-cited belief that
"different programs work for different children," or, put another
way, "not all programs work with all children." The following
sections provide research results that contradict this statement: it
simply does not match the data.

Gersten (1984) provided an interesting picture of the consistency of
achievement scores of urban DI sites after the Abt report was
completed. Figure 9 shows the results in 3rd grade reading scores
from 1973 to 1981 in four cities. The reading scores are
consistently around the 40th percentile. Based on non-Follow Through
students in large Northwest urban settings, the expected score is the
28th percentile on the MAT. Some variability is due to differences
between tests, as some districts changed tests over the nine-year
period. Also, Gersten mentioned that the drop in the New York scores
in 1978 and 1979 may have been caused by budgetary reductions during
those years.

Figure 10 shows the stability of math scores. The math scores for
these three sites tend to be consistently in the 50th percentile
range. New York did not collect math data during this period. Based
on the math scores of large Northwest cities, non-Follow Through
students would be expected to score at the 18th percentile.

Figure 9: Total reading scores for K-3 students. Stability of effects: Percentile equivalents at the end of Grade 3.

Figure 10: Total math scores for K-3 students. Stability of effects: Percentile equivalents at the end of Grade 3.

**Follow-Up Studies**

**Fifth and Sixth Grade Follow-up**

Some critics of DI have indicated that many, if not most, early DI
achievement gains will disappear over time. There are different
reasons given for this prediction. One reason given is that the DI
students were "babied" through sequences that made instruction easy
for them. They received reinforcement and enjoyed small group
instruction, but they would find it difficult to transition to the
realities of the "standard" classroom.

DI supporters give different reasons for suggesting that DI results
would decrease over time. The DI students were accelerated because
they had been taught more during the available time than they would
have been taught during the same time in a traditional program. Upon
leaving Follow Through, they would be in instructional settings that
teach relatively less than the Follow Through setting achieved.
Statistically, there would be a tendency for them to have a
regression toward the mean effect. Phenomenologically, students would
be provided with relatively fewer learning opportunities and would
tend to learn less accordingly.

In any case, the effects observed at a later time are not the effects
of Follow Through. They are the effects of either three or four years
of Follow Through and the effects of intervening instructional
practices. Engelmann (1996) observed that because the typical
instruction provided for poor children in grades 4 and beyond has not
produced exemplary results, there is no compelling reason to use
results of a follow-up to evaluate anything but the intervening
variables and how relatively effective they were in maintaining
earlier gains.

**Junior and Senior High School Follow-up
**

PS 137 was the only DI Follow Through site in New York City. Meyer selected a comparison school that matched the DI school on many variables. Over 90% of the students were minority students and over 75% were from low-income families.

Meyer retrieved the rosters of the first three cohort groups (1969, 1970, and 1971) and included students who received either three or four years of DI instruction. With the cooperation of the New York City Central Board of Education and the Office of the Deputy Chancellor for Instruction, students were located through the computer database. Meyer and staff were able to locate 82% of the former DI students and 76% of the control students. These rates should be considered high because it would be expected that over time many students would move out of the area entirely.

Table 1* shows the grade equivalent scores for the DI and comparison groups of the three cohort groups. At the end of 9th grade, the three DI groups were on average one year above the three comparison groups in reading (9.20 versus 8.21) (p < .01), with an effect size of .43. In math, the DI groups were approximately 7 months ahead of the comparison groups (8.59 versus 7.95), a difference that was not statistically significant (p = .09) but educationally significant based on an effect size of .28.
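The effect sizes reported here are standardized mean differences. As a hedged sketch (the standard deviation used in the report is not given in this summary; the 2.3 grade-equivalent units below is a back-calculated assumption):

```python
# Illustrative sketch: an effect size as a standardized mean difference
# (Cohen's d style). The report's exact computation may differ.

def effect_size(treatment_mean, comparison_mean, sd):
    """Mean difference expressed in standard deviation units."""
    return (treatment_mean - comparison_mean) / sd

# With an assumed sd of about 2.3 grade-equivalent units, the reported
# reading means (9.20 vs 8.21) yield roughly the .43 effect size cited,
# and the math means (8.59 vs 7.95) yield roughly .28.
```

Note that the same assumed sd reproduces both reported effect sizes, which is why a .25 threshold for "educationally significant" can make a nonsignificant difference (p = .09) still noteworthy.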

Table 1: Results of t-tests comparisons and effect sizes for reading and math at the end of 9th grade*

Gersten, Keating, and Becker (1988) provide similar information for other sites. Table 2* shows the effect sizes of the E. St. Louis and Flint sites at the end of ninth grade. Most effect sizes were above the .25 level of being educationally significant. It should be noted that the 3-K East St. Louis group, which started in kindergarten instead of first grade and therefore had four years of instruction (not three), had the second highest effect size (.49).

Table 2: Ninth Grade Reading Achievement Results from E. St. Louis, Flint, and New York*

Table 3: Ninth Grade Math Achievement Results from E. St. Louis, Flint, and New York*

Table 3* shows similar effectiveness in math. The results of these two analyses clearly show that while the superiority of DI diminishes with time spent in traditional curricula, the advantage of DI lasts. Educationally significant differences occurred in reading (overall effect size = .43) and in math (overall effect size = .25).

Darch, Gersten, & Taylor (1987) tracked Williamsburg (SC) students in the original Abt study (students entering first grade in 1969 and 1970) to compare graduation rates. All students were black and had stayed in the county school system. Table 4* shows a statistically significant difference in drop-out rate for Group 1 (the 1969 group), but the difference in drop-out rate was not statistically significant for Group 2 (the 1970 group).

Table 4. Longitudinal Follow-up Study: Percentage of Graduates and Dropouts for Direct Instruction Follow Through and Local Comparison Groups.*

A total of 65.8% of the Group 1 Follow Through students graduated on time, in contrast to 44.8% of the comparison group (a statistically significant difference, p < .001). For Group 2, 87.1% of the Follow Through group and 74.6% of the comparison group graduated on time (a statistically nonsignificant difference). Also, 27% of the Group 1 Follow Through students were accepted into college in contrast to 13% of the comparison group; the difference for Group 2 in college admission was not significant.
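The graduation-rate comparisons above are tests of differences between two proportions. A minimal sketch of one such test follows; it is illustrative only (the original analyses may have used chi-square or other procedures), and the sample sizes in the example are hypothetical because the study's group sizes are not given in this summary.

```python
from math import sqrt, erf

# Hedged sketch of a two-proportion z-test of the kind used to compare
# FT and NFT graduation rates.

def two_proportion_z(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)          # pooled proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Example with hypothetical group sizes of 100 each:
# two_proportion_z(0.658, 100, 0.448, 100) gives z near 3, p < .01
```

With larger real samples the same percentage gap would be even more strongly significant, which is consistent with the p < .001 reported for Group 1.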

Meyer, Gersten, & Gutkin (1983) calculated the rates of graduation, retention, dropping out, applying to college, and acceptance to college for the three cohort groups in the New York City site. Statistical analyses showed that the DI group had statistically significantly higher rates of graduation (p < .001), applying to college (p < .001), and acceptance to college (p < .001), and lower rates of retention (p < .001) and dropping out (p < .001). The differences in graduation rates were consistent across the three cohort groups, with over 60% of the DI students graduating in contrast to less than a 40% graduation rate for the three comparison groups. Meyer mentioned in her report that the difference in retention rate between Cohort II and Cohorts I and III may have been due to the principal retaining all students below grade level one year.

Table 5: Percentages of Cohorts 1, 2, and 3 Students: Graduated High School, Retained, Dropped Out, Applied to College, and Accepted to College*

Educational reformers search for programs that produce superior outcomes with at-risk children; that are replicable and can therefore be implemented reliably in given settings; that can serve as the basis for a whole-school implementation involving all students in a single program sequence; and that result in students feeling good about themselves. The Follow Through data confirm that DI has these features. The program works across various sites and types of children (urban blacks, rural populations, and non-English-speaking students). It produces positive achievement benefits in all subject areas: reading, language, math, and spelling. It produces superior results for basic skills and for higher-order cognitive skills in reading and math. And it produced the strongest positive self-esteem results of the Follow Through programs.

One feature these various achievements do not directly capture is the efficiency of the system. Some Follow Through sponsors performed poorly in math because they spent very little time on math; most of the day focused on reading and related language arts. Although time estimates are not available for the various sponsors, some of them possibly spent twice as much time on reading as DI sites did. Even with this additional time, these sites achieved less than the DI sites. For a system to achieve first place in virtually every measured outcome, it must be very efficient, using the limited student-contact time to produce a higher rate of learning than other approaches achieve. If the total amount of "learning" induced over the four-year period could be represented for the various sponsors, the learning achieved per unit of time would probably be about twice as high for the DI sites as for the non-DI sponsors.

Perhaps the most disturbing aspect of the Follow Through results is the persistence of models based on what the data confirm is whimsical theory. The approach to teaching reading used by the Tucson Early Education Model was language experience, which is quite similar in structure and procedures to the whole language approach. The fact that TEEM performed so poorly on the various measures should have carried some implications for later reforms; it didn't. The notion of the teacher as a facilitator who provides children with incidental teaching was used by the British infant school model (Open Education). It was a flagrant failure, an outcome that should have carried some weight in the design of later reforms in the US. It didn't. Ironically, it was based on a system that was denounced in England by its own Department of Education and Science in 1992. At the same time, states like California, Pennsylvania, Kentucky, and Ohio were in full swing with the National Association for the Education of Young Children's idiom of "developmentally appropriate practices," which is based on the British system.

Equally disturbing is the fact that while states like California were immersed in whole language and developmentally appropriate practices from the 1980s through the mid 1990s, there was no serious attempt to find models or practices that work. Quite the contrary: DI was abhorred in California, and only a few DI sites survived. Most of those that did survived through deceit, pretending to do whole language. Meanwhile, the places implementing whole language reading and the current idiom of math were producing failures at a tragic rate.

Possibly the major message of Follow Through is that there seems to be no magic in education. Gains are achieved only by starting at the skill level of the children and carefully building foundations that support higher-order structures. Direct Instruction has no peer in this enterprise.

*Tables in this article could not be reproduced clearly in electronic format. Please refer to the original text.

References

Adams, G., & Engelmann, S. (in press). Research on Direct
Instruction. Seattle, WA: Educational Achievement Systems.

Anderson, R., St. Pierre, R., Proper, E., & Stebbins, L. (1978).
Pardon us, but what was the question again? A response to the
critique of the Follow Through evaluation. Harvard Educational
Review, 48(2), 162-170.

Becker, W. (1977). Teaching reading and language to the
disadvantaged: What we have learned from field research. Harvard
Educational Review, 47, 518-543.

Becker, W. C. (1978). National Evaluation of Follow Through:
Behavior-theory-based programs come out on top. Education and Urban
Society, 10, 431-458.

Becker, W., & Carnine, D. (1981). Direct Instruction: A behavior
theory model for comprehensive educational intervention with the
disadvantaged. In S. Bijou (Ed.), Contributions of behavior
modification in education (pp. 1-106). Hillsdale, NJ: Lawrence
Erlbaum.

Bereiter, C., & Kurland, M. (1981-82). A constructive look at
Follow Through results. Interchange, 12, 1-22.

Darch, C., Gersten, R., & Taylor, R. (1987). Evaluation of
Williamsburg County Direct Instruction Program: Factors leading to
success in rural elementary programs. Research in Rural Education, 4,
111-118.

Gersten, R. (1984). Follow Through revisited: Reflections on the site
variability issue. Educational Evaluation and Policy Analysis, 6,
411-423.

Gersten, R., Keating, T., & Becker, W. (1988). The continued
impact of the Direct Instruction model: Longitudinal studies of
Follow Through students. Education and Treatment of Children, 11(4),
318-327.

House, E., Glass, G., McLean, L., & Walker, D. (1978). No simple
answer: Critique of the Follow Through evaluation. Harvard
Educational Review, 48(2), 128-160.

Meyer, L. A., Gersten, R., & Gutkin, J. (1983). Direct Instruction: A
Project Follow Through success story in an inner-city school.
Elementary School Journal, 84, 241-252.

Meyer, L. A. (1984). Long-term academic effects of the Direct
Instruction Project Follow Through. Elementary School Journal, 84,
380-394.

Stebbins, L. B., St. Pierre, R. G., & Proper, E. C. (1977).
Education as experimentation: A planned variation model (Volume IV-A
& B), Effects of Follow Through models. Cambridge, MA: Abt
Associates.

Wisler, C., Burns, G. P., Jr., & Iwamoto, D. (1978). Follow Through
redux: A response to the critique by House, Glass, McLean, and
Walker. Harvard Educational Review, 48(2), 171-185.

